Improved over-sampling techniques based on sparse representation for imbalance problem

Abstract

The classification of imbalanced datasets has received much attention in recent years. The imbalance problem occurs when the ratio between the sizes of the classes is high. Many techniques have been developed to tackle it in supervised learning. The Synthetic Minority Over-sampling Technique (SMOTE) is one of the most effective over-sampling methods for this problem; it changes the distribution of the training set to balance the number of examples in each class. However, SMOTE randomly synthesizes minority instances along the line joining a minority instance and its selected nearest neighbors, ignoring nearby majority instances and isolated points, which can degrade the final classification result. In this paper, we propose two improved techniques based on SMOTE and sparse representation theory: Sparse-SMOTE and SROT (Sparse Representation-Based Over-Sampling Technique). Sparse-SMOTE replaces the k nearest neighbors of SMOTE with a sparse representation, and SROT uses a sparse dictionary to create synthetic samples directly. The experiments are performed on 10 UCI datasets with C4.5 as the learning algorithm. The results show that both proposed methods achieve better TP-Rate, F-Measure, G-Mean and AUC values, and that they are more effective than SMOTE and several other approaches.

Keywords

Imbalanced dataset; over-sampling; SMOTE; Sparse-SMOTE; SROT

1. Introduction

In real life, datasets are usually imbalanced, meaning that some classes have fewer samples than others. Most existing data is imbalanced, whereas data satisfying balance requirements is difficult to acquire and therefore scarce. The Imbalance Ratio (IR), defined as the number of majority-class samples divided by the number of minority-class samples, expresses the extent to which a dataset is imbalanced: a dataset with IR equal to 1 is perfectly balanced, and the higher the IR, the more imbalanced the dataset [1]. Imbalanced datasets are very common in network intrusion detection, medical diagnosis, text classification and other practical applications [2]. Although most datasets are multi-class, many problems can be converted into binary classification problems [4], so this work focuses on binary classification (the minority versus the majority class) with examples randomly and uniformly distributed in a two-dimensional real-valued space.
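To make the IR definition concrete, the short Python sketch below computes it for the Abalone dataset. The minority count of 391 is derived from the 9.36% minority share reported in Table 1 (0.0936 × 4177 ≈ 391), so it is an approximation, and the function name `imbalance_ratio` is illustrative rather than taken from the paper.

```python
def imbalance_ratio(n_majority: int, n_minority: int) -> float:
    """IR = (# majority-class samples) / (# minority-class samples); IR == 1 means balanced."""
    if n_minority <= 0:
        raise ValueError("the minority class must be non-empty")
    return n_majority / n_minority


# Abalone (Table 1): 4177 examples; 9.36% of them (about 391) are the minority "class 7".
n_total, n_min = 4177, 391
ir = imbalance_ratio(n_total - n_min, n_min)
print(f"IR = {ir:.1f}")  # IR = 9.7
```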

At present, effective solutions to the class imbalance problem fall into three types of strategies: data pre-processing, algorithm modification and prediction post-processing [5]. Among the three, the data-level strategy is the most popular because it is independent of the classifier and easily adapted. SMOTE, one of the most effective over-sampling methods for imbalanced problems, has been highly influential. SMOTE generates synthetic minority examples to over-sample the minority class. For every minority example, its k nearest neighbors of the same class are calculated, and some of them are randomly selected according to the over-sampling rate [6]. New synthetic examples are then generated along the line between the minority example and its selected nearest neighbors. However, an inappropriate value of k can cause synthetic minority samples to be generated inside majority-class regions, producing noise that hinders classification [7, 8]. Additionally, SMOTE only interpolates between minority class samples and does not fundamentally change the sparsity of the sample distribution: after interpolation, minority samples in dense regions are still relatively dense, and those in sparse regions are still relatively sparse [9, 10]. Moreover, SMOTE does not take isolated points and noise into account. Consequently, minority samples in sparse regions remain hard for the classifier to identify and are prone to misclassification. Thus, SMOTE brings little improvement for such sparsely distributed samples [11].
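A minimal sketch of the SMOTE procedure described above, written in plain Python for illustration. It is an assumed simplification, not the reference implementation of [6]: neighbors are found by brute-force Euclidean distance, and one random gap is drawn per synthetic sample so that the new point lies on the segment between the sample and the chosen neighbor, matching the "along the line" description. The names `smote` and `k_nearest` are hypothetical.

```python
import math
import random


def k_nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i itself."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def smote(minority, n_per_sample, k=5, rng=None):
    """Generate n_per_sample synthetic points for every minority sample, each on the
    segment between the sample and one of its k nearest same-class neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for i, x in enumerate(minority):
        neighbours = k_nearest(minority, i, k)
        if not neighbours:
            raise ValueError("SMOTE needs at least two minority samples")
        for _ in range(n_per_sample):
            y = minority[rng.choice(neighbours)]
            gap = rng.random()  # one scalar in [0, 1) keeps the point on the segment
            synthetic.append([xa + gap * (ya - xa) for xa, ya in zip(x, y)])
    return synthetic


minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 3.0], [3.0, 2.0]]
new = smote(minority, n_per_sample=2, k=2)
print(len(new))  # 8
```

Because each synthetic point is a convex combination of two minority samples, SMOTE can never place a point outside the region already spanned by the minority class, which is the sparsity limitation discussed above.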

Accordingly, this paper proposes two improved algorithms for the imbalance problem: Sparse-SMOTE and SROT (Sparse Representation-Based Over-Sampling Technique). Our research is inspired by the widely used SMOTE algorithm and by sparse representation theory. The key idea is to replace the k nearest neighbors of SMOTE with a sparse representation, within the framework of Compressive Sensing (CS). Methods based on sparse representation express a sample in terms of the other samples of a dataset and obtain the corresponding coefficients, which are then used, for example, to classify the data. In principle, sparse representation is not limited to particular datasets, and it has been shown to solve a similar but constrained problem well [12, 13]. In our methods, the minority class samples are first used to construct a sparse dictionary, and the sparse solution of each current point is obtained by solving an $L_{1}$-norm minimization problem.

Sparse-SMOTE uses the samples corresponding to the nonzero entries of the sparse solution in place of the k nearest neighbors used by SMOTE, whereas SROT perturbs the sparse solution and obtains new samples from the sparse dictionary. Experimental results show that our algorithms effectively improve the classifier's ability to distinguish the classes of imbalanced datasets.

The paper is organized as follows. Section 2 briefly describes related work on handling the class imbalance problem. Section 3 describes our improved over-sampling methods. Section 4 presents the experiments and results, and Section 5 draws conclusions and outlines future work.

2. Related works

The data pre-processing strategy addresses the imbalance problem by changing the size of the imbalanced training dataset, following the four criteria proposed by Breiman [14]; that is, the dataset is resampled to bring the majority and minority classes back into balance. Resampling techniques include under-sampling and over-sampling. Random under-sampling selects a subset of the majority class in a certain proportion and removes it from the original dataset, reducing the size of the majority class. Random over-sampling randomly duplicates minority class samples to increase the size of the minority class. As research has progressed, several important approaches to imbalanced domains have been proposed, and different results have been reported when over-sampling strategies are used with decision trees [15]. For instance, over-sampling is reported to be ineffective in [16], whereas other studies (e.g. [17]) report that it has advantages over under-sampling strategies. Random under-sampling, redux and the introduction of Gaussian noise are among the other strategies that have been discussed. SMOTE (Synthetic Minority Over-sampling Technique), proposed by Chawla et al. in 2002 [6], is the most representative over-sampling technique. Although SMOTE was developed to generate synthetic examples of the minority class, it was combined with random under-sampling in the paper where it was proposed. Many modified versions of SMOTE have since been proposed [11], such as Borderline-SMOTE [18], SMOTE-TL [19], SMOTE-D [20] and SMOTE-IPF [21]. Random over-sampling can improve classifier performance because it decreases the degree of imbalance in the sample space.
However, its disadvantages cannot be ignored: because it simply copies the original data, it produces a large number of repeated samples, which makes the classifier's decision regions too small. This eventually leads to overfitting and seriously reduces the generalization performance of the algorithm.
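The two random resampling schemes can be sketched as follows (illustrative helper names, fixed seed for reproducibility). Note that every over-sampled point is an exact copy of an existing minority sample, which is the source of the overfitting problem just described.

```python
import random


def random_oversample(minority, n_target, rng=None):
    """Random over-sampling: append copies of randomly chosen minority samples
    until the minority class has n_target samples. No new information is added."""
    rng = rng or random.Random(0)
    copies = [list(rng.choice(minority)) for _ in range(max(0, n_target - len(minority)))]
    return [list(s) for s in minority] + copies


def random_undersample(majority, n_target, rng=None):
    """Random under-sampling: keep a random subset of n_target majority samples."""
    rng = rng or random.Random(0)
    return [list(s) for s in rng.sample(majority, n_target)]


minority = [[1.0, 2.0], [1.5, 1.0], [2.0, 2.5]]
majority = [[float(i), float(i % 3)] for i in range(12)]
balanced_up = random_oversample(minority, len(majority))
balanced_down = random_undersample(majority, len(minority))
print(len(balanced_up), len(balanced_down))  # 12 3
```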

The algorithm-level strategy modifies existing machine learning algorithms so that they attach more importance to the minority class of concern. It includes cost-sensitive learning [22], ensemble learning [23], one-class learning [24], etc. In imbalanced classification, the costs of misclassifying different classes are not equal: misclassifying a minority sample as a majority sample incurs a more serious cost. The prediction post-processing strategy considers two main types of solutions: threshold methods [25] and cost-sensitive post-processing [20]. Prediction post-processing approaches use the original dataset and a standard learning algorithm, and only manipulate the models' predictions according to the user preferences and the imbalance of the data. Their advantage is that the user's preference biases need not be known at learning time; their drawback is that the models themselves do not reflect these preferences [5].

3. Methodology

As mentioned before, applying a preprocessing step to balance the class distribution is a useful solution to the imbalance problem. Specifically, we improve over-sampling techniques with sparse representation. In this section, the main details of Compressive Sensing and sparse representation theory are first given in Section 3.1. Next, Section 3.2 presents the motivation for using sparse representation to improve over-sampling. Then, the two proposed methods are described in depth: Sparse-SMOTE in Section 3.3 and SROT in Section 3.4.

3.1 Compressive sensing and sparse representation

Compressed Sensing (also known as Compressive Sensing, Compressive Sampling or Sparse Sampling) is a signal processing technique for efficiently acquiring and reconstructing a signal by finding solutions of underdetermined linear systems, as detailed in [26]. The Shannon/Nyquist sampling theorem [27] states that if a function $x(t)$ contains no frequencies higher than B hertz, it is completely determined by its values at a series of points spaced $1/(2B)$ seconds apart. This theorem has underpinned digital signal processing for decades. Compressive Sensing theory, by contrast, states that as long as a signal is compressible or sparse in some transform domain, it can be projected into a low-dimensional space by an observation matrix that is incoherent with the transform basis. The original signal can then be reconstructed with high probability from this small number of projections by solving an optimization problem, which shows that such projections contain enough information to reconstruct the signal. In this framework, the sampling rate no longer depends on the bandwidth of the signal but mainly on two basic criteria: the sparsity of the signal and the incoherence of the sampling system [26]. The theory thus allows compression and sampling to be performed simultaneously. It is frequently used in academia and industry and has attracted wide attention in information theory, image processing, earth science, optics, microwave imaging, pattern recognition and other fields.
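For instance, with a bandwidth of $B=4$ kHz (a value chosen only for illustration), the sampling interval and rate required by the Nyquist theorem are:

```latex
T_s = \frac{1}{2B} = \frac{1}{2 \times 4000\ \text{Hz}} = 1.25 \times 10^{-4}\ \text{s} = 125\ \mu\text{s},
\qquad
f_s = \frac{1}{T_s} = 8000\ \text{samples/s}.
```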

The theory of Compressive Sensing mainly includes three parts [26, 28]:

1. Sparse representation of the signal.
2. Design of the measurement matrix, in order to reduce the dimension while ensuring minimum loss of information about the original signal $X$.
3. Design of the signal recovery algorithm, which restores the original signal of length $N$ without distortion from $M$ observations.

The procedure of Compressive Sensing can be summarized in the following steps:

Step 1: Assume that the signal $X$ of length $N$ is $K$-sparse (i.e., contains $K$ nonzero values) in an orthogonal basis $\psi$.
Step 2: Find an observation (measurement) matrix $A$ that is incoherent with $\psi$.
Step 3: Measure the original signal with $A$ to obtain the observation vector $Y$ of length $M$.
Step 4: Recover $X$ from $Y$ with high probability by solving an optimization problem.
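Steps 1 to 3 can be simulated in a few lines of plain Python. This is a toy sketch under stated assumptions: $\psi$ is taken to be the identity basis (so $X$ itself is $K$-sparse), and $A$ is a random Gaussian matrix scaled by $1/\sqrt{M}$, a common choice because such matrices are incoherent with a fixed basis with high probability. Step 4 would pass $A$ and $Y$ to an $L_{1}$ solver and is not shown. All function names are hypothetical.

```python
import random


def k_sparse_signal(n, k, rng):
    """Step 1: a length-n signal with exactly k nonzero entries (psi = identity here)."""
    x = [0.0] * n
    for idx in rng.sample(range(n), k):
        x[idx] = rng.choice([-1.0, 1.0]) * (1.0 + rng.random())  # magnitude in [1, 2)
    return x


def gaussian_measurement_matrix(m, n, rng):
    """Step 2: an m x n matrix with i.i.d. N(0, 1/m) entries."""
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def measure(a, x):
    """Step 3: the observation vector y = A x, of length m."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


rng = random.Random(1)
N, M, K = 64, 20, 3
x = k_sparse_signal(N, K, rng)
A = gaussian_measurement_matrix(M, N, rng)
y = measure(A, x)
print(len(y), sum(v != 0 for v in x))  # 20 3
```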

Next, the theory of Compressive Sensing and sparse representation is explained from the mathematical point of view.

Assume that $x$ is a one-dimensional signal of length $N$ with sparsity $k$. Stack the measurements $y_{i}$ into the $M\times 1$ vector $y\in R^{M\times 1}$, and let $A\in R^{M\times N}$ ($M\ll N$) be a matrix whose columns are basis vectors. Our goal is to recover $x$ from $y$ through the linear system $y=A\cdot x$. However, this system is underdetermined and therefore ill-posed: because there are more unknowns than equations, it has infinitely many solutions whenever it has one, so $x$ cannot be determined uniquely from $y$.

Figure 1. Schematic of sparse representation.

However, if $x$ is required to be sparse, that is, if $||x||_{0}$ (the $L_{0}$-norm of $x$, i.e., its number of nonzero entries) is as small as possible, the effective number of unknowns decreases significantly, which makes signal reconstruction possible. Sparse representation is illustrated in Fig. 1.

This leads to the following optimization problem:

$\displaystyle(l_{s}^{0}):\hat{x}_{0}=\arg\min||x||_{0},s.t.Ax=y$ (1)

In 2004, Donoho and Elad proved that when $A$ satisfies certain conditions, Eq. (1) has a unique solution [29]. However, $L_{0}$-minimization is a non-convex combinatorial problem that is NP-hard, so no algorithm is known that solves it in polynomial time in general. In 2006, Tao and Donoho proved that, under the restricted isometry property (RIP), the $L_{0}$-norm can be replaced by the $L_{1}$-norm [26]. Both problems then have the same sparse solution, but with the $L_{1}$-norm the Compressive Sensing problem becomes a convex optimization problem that can be solved in polynomial time:

$\displaystyle(l_{s}^{1}):\hat{x}_{1}=\arg\min||x||_{1},s.t.Ax=y$ (2)

This result laid the foundation of the Compressive Sensing (CS) framework. In practice, Eq. (3) is usually used instead of Eq. (2) to take noise into account.

$\displaystyle(l_{s}^{1}):\hat{x}_{1}=\arg\min||x||_{1},s.t.||Ax-y||_{2}\leqslant\varepsilon$ (3)

where $A$ is the sparse dictionary, $x$ is the sparse solution, and $\varepsilon$ is the error tolerance, which controls the trade-off between sparsity and representation fidelity.

The goal of the reconstruction algorithm is to find $x$; the core of the whole problem is therefore the sparse representation of $y$.

3.2 Motivational analysis of using sparse representation

Real signals in nature are generally not sparse themselves but are sparse in some transform domain; such signals are called compressible. In theory, any signal is compressible as long as a corresponding sparse representation space can be found. The sparsity or compressibility of a signal is an important prerequisite and the theoretical basis of Compressive Sensing [31]. The optimization model of sparse representation was designed from the perspective of signal reconstruction, but sparse representation has since performed very well in pattern recognition and artificial intelligence [32]. For example, Sparse Representation-based Classification (SRC) [33, 34] has achieved great success in face recognition, image denoising and other applications.

Thus, inspired by Compressive Sensing and SRC, this paper addresses the imbalance problem with sparse representation. To synthesize new minority class samples, we first compute the sparse solution to find the corresponding minority samples, and then produce more evenly distributed minority samples by interpolation or by adding Gaussian noise.

3.3 Sparse-SMOTE

In this subsection, we improve the SMOTE algorithm by using sparse representation instead of k nearest neighbors; the resulting technique is called Sparse-SMOTE.

Constructing the sparse dictionary is an important step in our method. There are currently two types of construction methods: hand-designed dictionaries and dictionaries learned from training data. The former include the isotropic Gabor dictionary [35] and the anisotropic Refinement-Gaussian dictionary [36]; the latter include dictionary learning algorithms such as K-SVD [37]. In this paper, we construct the sparse dictionary directly from the training samples.

First, given a training set, all minority class samples are extracted from it. Denote the minority class by $S_{\min}\in R^{m\times n}$, where $m$ is the number of samples and $n$ is their dimension. For each current sample point $x_{i}\in S_{\min}$, the method uses the remaining minority class samples (all except $x_{i}$) to construct the sparse dictionary $D\in R^{n\times(m-1)}$.

Next, each sample is normalized by its $L_{2}$-norm as follows:

$\displaystyle y^{\prime}_{i,j}=\frac{y_{i,j}}{\sqrt{\sum_{l=1}^{n}y_{i,l}^{2}}},\quad i=1,2,\ldots,m-1,\ j=1,2,\ldots,n$ (4)

where $y^{\prime}_{i}$, the normalized $y_{i}$, is a sample (column) of the sparse dictionary $D$.
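The leave-one-out dictionary construction and the normalization of Eq. (4) might look like this in plain Python. The function `build_dictionary` is hypothetical; it returns $D$ as a list of $n$ rows (so each column is one normalized minority sample) together with the original index of each column, which is needed later to map nonzero coefficients back to samples.

```python
import math


def build_dictionary(minority, i):
    """Leave-one-out dictionary for x_i = minority[i].

    Every other minority sample is scaled to unit L2 norm (Eq. (4)) and used as one
    column, so D has n rows (attributes) and m - 1 columns (atoms). The second return
    value maps each column back to its index in `minority`.
    """
    atoms, columns = [], []
    for j, y in enumerate(minority):
        if j == i:
            continue
        norm = math.sqrt(sum(v * v for v in y))
        atoms.append([v / norm if norm > 0 else 0.0 for v in y])  # guard all-zero rows
        columns.append(j)
    n = len(minority[0])
    D = [[atom[r] for atom in atoms] for r in range(n)]  # transpose: atoms become columns
    return D, columns


minority = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
D, columns = build_dictionary(minority, i=2)
print(D, columns)  # [[0.6, 1.0], [0.8, 0.0]] [0, 1]
```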

After $D$ is obtained, the sparse solution $w$ is found by solving the following $L_{1}$-minimization problem, where $x$ denotes the current point $x_{i}$:

$\displaystyle(l_{s}^{1}):\hat{w}_{1}=\arg\min||w||_{1},s.t.||Dw-x||_{2}\leqslant\varepsilon$ (5)

We solve Eq. (5) with the Homotopy algorithm, which is among the fastest and most accurate $L_{1}$ solvers [38]. For low noise levels, the Homotopy algorithm considers the following unconstrained (Lagrangian) form of the problem:

$\displaystyle\min_{w}\ \frac{1}{2}||x-Dw||_{2}^{2}+\lambda||w||_{1}$ (6)

where $\lambda>0$ is the Lagrange multiplier, which balances the reconstruction error against sparsity. Details of the Homotopy implementation are given in [38].
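The paper solves Eq. (6) with the Homotopy algorithm [38], which is not reproduced here. As a clearly different stand-in, the sketch below uses ISTA (iterative shrinkage-thresholding), a simple proximal-gradient method for the same objective; the value of $\lambda$ is an assumption and plays a role similar to $\varepsilon$ in Eq. (5). The function names are hypothetical.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista_lasso(D, x, lam, n_iter=500):
    """Approximately minimise 0.5*||x - D w||_2^2 + lam*||w||_1 (Eq. (6)) with ISTA.

    D is a list of n rows with p entries each (p atoms). ISTA is used here as a
    simple stand-in for the Homotopy solver of the paper.
    """
    n, p = len(D), len(D[0])
    # Step size 1/L with L >= largest eigenvalue of D^T D; the squared Frobenius
    # norm is a cheap, conservative upper bound.
    L = sum(d * d for row in D for d in row) or 1.0
    w = [0.0] * p
    for _ in range(n_iter):
        r = [sum(D[a][j] * w[j] for j in range(p)) - x[a] for a in range(n)]  # D w - x
        grad = [sum(D[a][j] * r[a] for a in range(n)) for j in range(p)]      # D^T (D w - x)
        w = [soft_threshold(w[j] - grad[j] / L, lam / L) for j in range(p)]
    return w


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w = ista_lasso(identity, [2.0, 0.2, -1.0], lam=0.5)
print([round(v, 6) for v in w])                       # [1.5, 0.0, -0.5]
print([j for j, v in enumerate(w) if abs(v) > 1e-8])  # [0, 2]
```

With $D$ equal to the identity, the minimizer of Eq. (6) is simply the soft-thresholded $x$, which gives an easy sanity check. In Sparse-SMOTE, the indices of the nonzero entries of the returned $w$ are what replace the $k$ nearest neighbors.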

After obtaining the sparse solution, we randomly select one of the sample points corresponding to the nonzero entries of $w$ and interpolate between it and $x_{i}$ to synthesize new samples.

Specifically, we store the indices of the nonzero entries of the sparse solution of $x_{i}$ in the array nnarray. Next, we randomly select an element of nnarray; it is the index of a data point $y_{k}$ among the minority class samples other than $x_{i}$. Then we calculate the difference between $y_{k}$ and $x_{i}$ as in Eq. (7):

$\displaystyle\textit{dif}_{l}=y_{k,l}-x_{i,l},\quad l=1,2,\ldots,\textit{num}_{\textit{attr}}$ (7)

where $\textit{dif}_{l}$ is the $l$-th component of the difference vector between $y_{k}$ and $x_{i}$, and $\textit{num}_{\textit{attr}}$ is the number of attributes.

A new sample is then synthesized as in Eq. (8):

$\displaystyle\textit{Syn\_Samp}_{k,l}=x_{i,l}+g\ast\textit{dif}_{l},\quad l=1,2,\ldots,\textit{num}_{\textit{attr}}$ (8)

where $g\in[0,1]$ is a random number and $\textit{Syn\_Samp}_{k,l}$ is the $l$-th attribute of the new sample synthesized from $y_{k}$ and $x_{i}$.
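Eqs. (7) and (8) translate into the following sketch. Two choices are assumptions rather than details given in the text: interpolation uses the original (unnormalized) samples in the same column order as $D$, and a fresh $g$ is drawn for every attribute, which is how we read the later remark that the interpolation is applied in each dimension. `sparse_smote_sample` is a hypothetical name.

```python
import random


def sparse_smote_sample(x_i, others, w, rng, tol=1e-8):
    """One synthetic sample in the spirit of Eqs. (7)-(8).

    `others` are the unnormalized minority samples in the same order as the columns
    of D, and `w` is the sparse solution of x_i. Returns None if w has no nonzero entry.
    """
    nnarray = [k for k, wk in enumerate(w) if abs(wk) > tol]  # nonzero coefficients
    if not nnarray:
        return None
    y_k = others[rng.choice(nnarray)]
    synthetic = []
    for x_l, y_l in zip(x_i, y_k):
        dif = y_l - x_l                  # Eq. (7)
        g = rng.random()                 # assumption: a new g in [0, 1) per attribute
        synthetic.append(x_l + g * dif)  # Eq. (8)
    return synthetic


rng = random.Random(0)
x_i = [0.0, 0.0]
others = [[10.0, 10.0], [2.0, 4.0]]
w = [0.0, 0.7]  # e.g. from an L1 solver: only the second column is used
print(sparse_smote_sample(x_i, others, w, rng))
```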

Figure 2. (a) Original data distribution. (b) Current point $x_{i}$ and its 5 nearest neighbors. (c) Synthetic result obtained by interpolating between point 2 and the current point. (d) Final synthetic result of SMOTE.

Figure 3. (a) Original data distribution. (b) Current point $x_{i}$ and the two nonzero points of the sparse solution. (c) Synthetic result obtained by interpolating between point 1 and the current point. (d) Final synthetic result of Sparse-SMOTE.

Sparse-SMOTE interpolates toward the samples selected by the sparse solution, which are not necessarily located around the current point; as a result, the synthetic samples are distributed more uniformly. At the same time, the sparse solution preserves minority class information and improves the performance of the algorithm. The overall procedure of Sparse-SMOTE is described in Algorithm Sparse-SMOTE. In Lines 11 to 15, Sparse-SMOTE obtains the indices of the nonzero entries of the sparse solution of Point_i instead of computing the k nearest neighbors of Point_i. To do so, we design a function named Get_Sparse_Index, given in Lines 29 to 31, that calculates these indices. Finally, in Lines 18 to 27, Sparse-SMOTE generates the synthetic samples by interpolation. Figure 3 illustrates the synthesis process of Sparse-SMOTE. (a) shows the original data distribution, with a minority-to-majority ratio of 20:400. (b) marks the current point $x_{i}$ and the two nonzero points of the sparse solution. (c) shows the synthetic samples obtained by interpolating between point 1 and $x_{i}$. The synthetic points generally do not lie on the line joining the two points, because the interpolation is applied separately in each dimension. The final synthetic result is shown in Fig. 3d. Compared with the SMOTE result in Fig. 2, the samples synthesized by Sparse-SMOTE (red stars) are more uniformly distributed. Within the red circle of Fig. 2, the red stars generally lie around the existing minority samples, so SMOTE does not fundamentally change the sparsity of the sample distribution. Within the red circle of Fig. 3, by contrast, the red stars are spread randomly among the samples, which fundamentally changes the sparsity of the sample distribution.

3.4 SROT

Whereas Sparse-SMOTE generates synthetic samples by interpolating toward the samples selected by the sparse solution, in this section we use the sparse dictionary directly to generate new samples. This method is called SROT.

First, we build the sparse dictionary $D$ in the same way as in Sparse-SMOTE and obtain the sparse solution $w$ of $x_{i}$. Then the following relation holds:

$\displaystyle x_{i}=Dw+\varepsilon$ (9)

where $\varepsilon$ represents noise, which can be ignored because the samples of $D$ differ little from $x_{i}$. Next, we modify $w$ by adding Gaussian noise to its nonzero entries as follows:

$\displaystyle w_{i}=\textit{sgn}(w_{i})\ast\left(|w_{i}|+N(0,\sigma)\right),\quad i=1,2,\ldots,k$ (10)

where $k$ is the number of nonzero entries in $w$ and $N(0,\sigma)$ denotes a Gaussian random number with mean 0 and standard deviation $\sigma$. The value of $\sigma$ depends on $w_{i}$: the larger $|w_{i}|$, the stronger the perturbation. We therefore set $\sigma=\textit{abs}(\beta\ast w_{i})$, where $\beta$ is a tunable parameter. If $\sigma$ is too small, the perturbation is meaningless, so $\sigma$ should be at least 1.

In addition, better results are obtained by perturbing only a fraction of the nonzero entries of $w$ rather than all of them; this fraction is denoted by $\alpha$. Finally, because the time complexity of SROT grows with the size of the sparse dictionary, only a subset of the minority class samples is used to create the sparse dictionary and obtain the sparse representation of $x_{i}$; the fraction of samples used is controlled by $\gamma$.
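One SROT synthesis step, combining Eqs. (9) and (10) with the $\alpha$ and $\gamma$ settings, might be sketched as follows. Several details are our assumptions: the requirement that $\sigma$ be at least 1 is implemented as a floor `sigma_min`, the perturbed entries are chosen uniformly at random, and the helper names `srot_sample` and `dictionary_subset` are hypothetical. The example uses $\alpha=0.9$ and $\beta=0.5$, the values reported in Section 4.

```python
import random


def dictionary_subset(others, gamma, rng):
    """Keep a random fraction gamma of the remaining minority samples as atoms."""
    return rng.sample(others, max(1, round(gamma * len(others))))


def srot_sample(D, w, alpha, beta, rng, sigma_min=1.0, tol=1e-8):
    """Perturb a fraction alpha of the nonzero entries of w as in Eq. (10), then map the
    perturbed code back to the input space with x = D w (Eq. (9), noise term dropped)."""
    nonzero = [j for j, wj in enumerate(w) if abs(wj) > tol]
    chosen = rng.sample(nonzero, round(alpha * len(nonzero)))
    w_new = list(w)
    for j in chosen:
        sigma = max(abs(beta * w[j]), sigma_min)  # sigma = |beta * w_j|, floored at sigma_min
        sign = 1.0 if w[j] > 0 else -1.0
        w_new[j] = sign * (abs(w[j]) + rng.gauss(0.0, sigma))  # Eq. (10)
    sample = [sum(D[r][j] * w_new[j] for j in range(len(w_new))) for r in range(len(D))]
    return sample, w_new


rng = random.Random(0)
D = [[1.0, 0.0, 0.6], [0.0, 1.0, 0.8]]  # 2 attributes, 3 normalized atoms
w = [0.0, 0.4, 0.9]
new_sample, w_perturbed = srot_sample(D, w, alpha=0.9, beta=0.5, rng=rng)
print(new_sample)
```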

SROT then generates the synthetic samples from the perturbed sparse solutions and the sparse dictionary. The overall procedure of SROT is described in Algorithm SROT. Experimentally, SROT obtains better results when $\beta\in\left[{0.1,0.8}\right]$, $\alpha\in\left[{0.6,1.0}\right]$ and $\gamma\in\left[{0.1,1.0}\right]$. In Lines 9 to 15, SROT generates the synthetic samples by Compressed Sensing. In particular, Line 13 calls our function Get_Sparse, which implements the synthesis strategy and is detailed in Lines 16 to 34.

Figure 4. (a) Original data distribution. (b) Current point $x_{i}$ and the nonzero points of the sparse solution. (c) Final synthetic result of SROT.

Fig. 4a shows the same data distribution as before. SROT calculates the sparse solution of every minority class sample and adds random Gaussian noise to perturb it. New sample points are then synthesized with the sparse dictionary; they are marked with pentagrams in (b). The final synthetic result is shown in (c).

Compared with the results of SMOTE and Sparse-SMOTE, the samples generated by SROT are distributed more randomly, although a few of them fall outside the distribution area of the original samples; these outliers inject some noise.

On the one hand, on some datasets SROT generates noise points, so that, owing to the noise and the randomness of the distribution, some minority samples overlap the boundary of the majority class, which degrades overall classification performance. On the other hand, in some datasets parts of the minority class are relatively sparse because minority samples are lacking. SMOTE can alleviate but not solve this sparsity problem, whereas SROT can make the distribution of samples more uniform. Of course, a more uniform distribution does not always improve the final classification performance; this depends on how important the regional sparsity is relative to the overall distribution.

4. Results and analysis

4.1 Experimental design

In this section, we briefly describe the experimental datasets and the statistical tests used in the experimental study. We use C4.5 [39] as the learning algorithm, since it has been identified as one of the top 10 algorithms in data mining [40] and has been widely used for imbalance problems [41]. TP-Rate (True Positive Rate) [42], F-Measure [43], G-Mean (Geometric Mean) [44] and AUC (Area Under the ROC Curve) [45] are used as the evaluation measures.
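For reference, the four measures can be computed from a confusion matrix with the minority class as the positive class. The text does not say how AUC was computed; the sketch assumes the single-operating-point form $(\text{TPR}+\text{TNR})/2$ often used for crisp classifiers in the imbalanced-learning literature, which would be replaced by a ranking-based ROC area if class scores were used. The confusion-matrix counts below are made up for illustration, and `imbalance_metrics` is a hypothetical name.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Minority class = positive. Returns TP-Rate, F-Measure, G-Mean and the
    single-threshold AUC approximation (TPR + TNR) / 2."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    g_mean = math.sqrt(tpr * tnr)
    auc = (tpr + tnr) / 2
    return {"TP-Rate": tpr, "F-Measure": f_measure, "G-Mean": g_mean, "AUC": auc}


m = imbalance_metrics(tp=8, fn=2, fp=4, tn=86)
print({k: round(v, 3) for k, v in m.items()})
# {'TP-Rate': 0.8, 'F-Measure': 0.727, 'G-Mean': 0.874, 'AUC': 0.878}
```

Under this AUC form, a classifier that never predicts the minority class (TPR = 0, TNR = 1) scores exactly 0.5.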

Table 1 describes these datasets. For each one, it lists the number of examples (#Ex.), the number of attributes (#Atts.), the name of each class (minority, majority), the class distribution and the IR.

Table 1
Description of the datasets used in the experiments

Data set	#Ex.	#Atts.	Class (min., maj.)	%Class (min., maj.)	IR
Abalone	4177	8	(class 7, remainder)	(9.36, 90.64)	9.7
Balance	625	5	(balanced, remainder)	(7.84, 92.16)	11.8
Ionosphere	351	34	(bad, good)	(35.90, 64.10)	1.8
Letter	20000	16	(A, remainder)	(3.95, 96.05)	24.3
Mf-morph	2000	6	(class 10, remainder)	(10.00, 90.00)	9.0
Mf-zernike	2000	47	(class 10, remainder)	(10.00, 90.00)	9.0
Pima	768	8	(positive, negative)	(34.84, 66.16)	1.9
Satimage	6435	36	(class 4, remainder)	(9.37, 90.27)	9.3
Vehicle	846	18	(Opel, remainder)	(25.06, 74.84)	3.0
Wpbc	198	34	(recur, nonrecur)	(23.74, 76.26)	3.2

The datasets in Table 1 include both real-world datasets and training-testing datasets. The real-world datasets were chosen following the work on imbalanced classification with noisy and borderline examples presented in [46, 47]. The attributes are of two types, nominal and continuous, marked by N and C respectively. Nominal attributes cannot be used directly in numerical calculations, so a vectorization method is applied: a nominal attribute with $k$ possible values is replaced by a $k$-dimensional indicator vector. For example, if a nominal attribute takes the three values A, B and C, they are replaced by the vectors $(1,0,0)$, $(0,1,0)$ and $(0,0,1)$, respectively. After vectorization, the sample dimension $m$ increases by $k-1$, to $m+k-1$.
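The vectorization step is ordinary one-hot encoding. A minimal sketch (hypothetical helper `one_hot`, categories ordered by first appearance) shows how a sample with $m=2$ attributes, one of them nominal with $k=3$ values, grows to $m+k-1=4$ dimensions.

```python
def one_hot(values):
    """Replace a nominal column by k indicator columns, one per distinct value
    (categories kept in order of first appearance)."""
    categories = list(dict.fromkeys(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories


rows = [[5.1, "A"], [4.9, "B"], [6.2, "C"]]  # m = 2 attributes; the second is nominal, k = 3
encoded, categories = one_hot([r[1] for r in rows])
vectorised = [[r[0]] + e for r, e in zip(rows, encoded)]
print(vectorised[1], len(vectorised[1]))  # [4.9, 0, 1, 0] 4
```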

A multi-class problem can be converted into a binary classification problem by choosing a target class value: samples with the target value form the minority class, and all other samples form the majority class. The target value of each dataset and its imbalance degree are shown in Table 1. However, some original UCI datasets are regularly distributed in some regions, so we break up the concentrated samples and vectorize the nominal attributes before the experiments. Because our research concerns the data-level strategy, we also wish to stay close to the original data; therefore, the synthetic values are rounded to integers after synthesis.

Finally, for each method, all measures are estimated by stratified 10-fold cross-validation. To reduce the effect of randomness, every reported value is the average of five independent runs of 10-fold cross-validation.
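Stratified fold assignment, repeated five times as in the protocol above, can be sketched as follows. This is a simple round-robin scheme written for illustration (hypothetical `stratified_folds`), not Weka's own implementation.

```python
import random


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign every sample to a fold so that each fold keeps roughly the overall class
    proportions: shuffle the indices of each class, then deal them out round-robin."""
    rng = random.Random(seed)
    fold_of = [None] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            fold_of[i] = pos % n_folds
    return fold_of


labels = [1] * 20 + [0] * 200                                   # 20 minority, 200 majority
repeats = [stratified_folds(labels, seed=r) for r in range(5)]  # five independent 10-fold splits
fold0 = [i for i, f in enumerate(repeats[0]) if f == 0]
print(sum(labels[i] for i in fold0), len(fold0))                # 2 22
```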

All experiments were conducted in MATLAB using the Weka machine learning and data mining software. The classifier is Weka's J48 implementation of C4.5 [19], with the default parameter values recommended by the platform. The parameters of SROT, by contrast, are adjustable; the settings used in this work were determined experimentally to fit the characteristics of imbalanced datasets with noisy examples. In our experiments, $\alpha=0.9$, $\beta=0.5$ and $\gamma=0.2$. All of the following results were obtained in the same experimental environment.

4.2 Comparison of classification result

Tables 2–5 and Figs. 5–8 show the TP-rate, F-Measure, G-Mean and AUC values of the five algorithms (None: no resampling; RO: random over-sampling; SMOTE; S-SMOTE: Sparse-SMOTE; SROT) on the ten datasets. The over-sampling rate is set to the value that balances the dataset. For instance, for the Pima dataset, whose IR is 1.9, the over-sampling rate is set to 100%; after synthesis, the dataset is approximately balanced, with a minority-to-majority ratio of 536/500.
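The Pima numbers can be checked with a little arithmetic: 536/500 implies 268 minority and 500 majority samples (IR ≈ 1.87), and balancing them exactly would require about 86.6% over-sampling. The sketch assumes, as an illustration, that the over-sampling rate is rounded up to a multiple of 100%, which yields the 100% quoted above; `balancing_rate` is a hypothetical helper.

```python
import math


def balancing_rate(n_min, n_maj, step=100):
    """Smallest over-sampling rate (in %, a multiple of `step`) that makes the
    minority class at least as large as the majority class."""
    needed = max(0, n_maj - n_min) / n_min * 100  # exact percentage required
    return math.ceil(needed / step) * step


n_min, n_maj = 268, 500  # Pima: 768 examples, IR = 500/268 ≈ 1.9
rate = balancing_rate(n_min, n_maj)
n_min_after = n_min + n_min * rate // 100
print(rate, n_min_after)  # 100 536
```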

Table 2
Comparison of the TP-rate of the five algorithms on the test data

Data set	None	RO	SMOTE	S-SMOTE	SROT
Abalone	0.000 $\pm$ 0.000	0.490 $\pm$ 0.000	0.451 $\pm$ 0.020	0.600 $\pm$ 0.052	0.529 $\pm$ 0.065
Balance	0.000 $\pm$ 0.000	0.465 $\pm$ 0.000	0.521 $\pm$ 0.132	0.587 $\pm$ 0.024	0.653 $\pm$ 0.105
Ionosphere	0.841 $\pm$ 0.000	0.841 $\pm$ 0.000	0.853 $\pm$ 0.021	0.841 $\pm$ 0.018	0.839 $\pm$ 0.029
Letter	0.938 $\pm$ 0.000	0.952 $\pm$ 0.000	0.978 $\pm$ 0.003	0.989 $\pm$ 0.004	0.984 $\pm$ 0.003
Mf-morph	0.000 $\pm$ 0.000	0.932 $\pm$ 0.000	0.964 $\pm$ 0.005	0.966 $\pm$ 0.003	0.969 $\pm$ 0.000
Mf-zernike	0.068 $\pm$ 0.000	0.565 $\pm$ 0.000	0.665 $\pm$ 0.010	0.655 $\pm$ 0.060	0.724 $\pm$ 0.029
Pima	0.614 $\pm$ 0.000	0.687 $\pm$ 0.000	0.767 $\pm$ 0.023	0.805 $\pm$ 0.026	0.716 $\pm$ 0.020
Satimage	0.547 $\pm$ 0.000	0.597 $\pm$ 0.000	0.705 $\pm$ 0.017	0.753 $\pm$ 0.015	0.665 $\pm$ 0.008
Vehicle	0.480 $\pm$ 0.000	0.622 $\pm$ 0.000	0.686 $\pm$ 0.027	0.695 $\pm$ 0.058	0.584 $\pm$ 0.041
Wpbc	0.447 $\pm$ 0.000	0.361 $\pm$ 0.000	0.595 $\pm$ 0.052	0.589 $\pm$ 0.053	0.511 $\pm$ 0.028

Table 3
Comparison of the F-Measure of the five algorithms on the test data

Data set	None	RO	SMOTE	S-SMOTE	SROT
Abalone	0.000 $\pm$ 0.000	0.342 $\pm$ 0.000	0.319 $\pm$ 0.025	0.343 $\pm$ 0.025	0.363 $\pm$ 0.034
Balance	0.000 $\pm$ 0.000	0.121 $\pm$ 0.000	0.159 $\pm$ 0.024	0.184 $\pm$ 0.014	0.332 $\pm$ 0.029
Ionosphere	0.858 $\pm$ 0.000	0.857 $\pm$ 0.000	0.836 $\pm$ 0.029	0.843 $\pm$ 0.022	0.854 $\pm$ 0.021
Letter	0.949 $\pm$ 0.000	0.940 $\pm$ 0.000	0.925 $\pm$ 0.002	0.807 $\pm$ 0.007	0.886 $\pm$ 0.008
Mf-morph	0.000 $\pm$ 0.000	0.605 $\pm$ 0.000	0.615 $\pm$ 0.004	0.618 $\pm$ 0.002	0.623 $\pm$ 0.001
Mf-zernike	0.051 $\pm$ 0.000	0.415 $\pm$ 0.000	0.450 $\pm$ 0.002	0.423 $\pm$ 0.030	0.478 $\pm$ 0.010
Pima	0.614 $\pm$ 0.000	0.627 $\pm$ 0.000	0.644 $\pm$ 0.010	0.655 $\pm$ 0.026	0.601 $\pm$ 0.010
Satimage	0.551 $\pm$ 0.000	0.558 $\pm$ 0.000	0.568 $\pm$ 0.013	0.566 $\pm$ 0.007	0.461 $\pm$ 0.005
Vehicle	0.483 $\pm$ 0.000	0.552 $\pm$ 0.000	0.575 $\pm$ 0.017	0.564 $\pm$ 0.030	0.545 $\pm$ 0.024
Wpbc	0.434 $\pm$ 0.000	0.319 $\pm$ 0.000	0.469 $\pm$ 0.014	0.458 $\pm$ 0.041	0.429 $\pm$ 0.016

Figure 5. TP-rate comparison of the five algorithms on the ten datasets.

Table 4
Comparison of the G-Mean of the five algorithms on the test data

Data set	None	RO	SMOTE	S-SMOTE	SROT
Abalone	0.000 $\pm$ 0.000	0.647 $\pm$ 0.000	0.620 $\pm$ 0.019	0.693 $\pm$ 0.031	0.662 $\pm$ 0.050
Balance	0.000 $\pm$ 0.000	0.316 $\pm$ 0.000	0.500 $\pm$ 0.107	0.586 $\pm$ 0.017	0.676 $\pm$ 0.048
Ionosphere	0.884 $\pm$ 0.000	0.885 $\pm$ 0.000	0.877 $\pm$ 0.021	0.878 $\pm$ 0.014	0.885 $\pm$ 0.011
Letter	0.967 $\pm$ 0.000	0.974 $\pm$ 0.000	0.986 $\pm$ 0.002	0.986 $\pm$ 0.002	0.988 $\pm$ 0.002
Mf-morph	0.000 $\pm$ 0.000	0.904 $\pm$ 0.000	0.919 $\pm$ 0.003	0.920 $\pm$ 0.002	0.923 $\pm$ 0.000
Mf-zernike	0.107 $\pm$ 0.000	0.701 $\pm$ 0.000	0.755 $\pm$ 0.004	0.742 $\pm$ 0.032	0.788 $\pm$ 0.016
Pima	0.695 $\pm$ 0.000	0.707 $\pm$ 0.000	0.714 $\pm$ 0.011	0.723 $\pm$ 0.026	0.675 $\pm$ 0.011
Satimage	0.721 $\pm$ 0.000	0.749 $\pm$ 0.000	0.805 $\pm$ 0.010	0.825 $\pm$ 0.008	0.761 $\pm$ 0.004
Vehicle	0.630 $\pm$ 0.000	0.702 $\pm$ 0.000	0.727 $\pm$ 0.016	0.720 $\pm$ 0.028	0.687 $\pm$ 0.021
Wpbc	0.602 $\pm$ 0.000	0.438 $\pm$ 0.000	0.637 $\pm$ 0.029	0.639 $\pm$ 0.040	0.593 $\pm$ 0.017

Table 5

Comparison of the AUC of five algorithms on the test sets

	None	RO	SMOTE	S-SMOTE	SROT
Abalone	0.500 $\pm$ 0.000	0.647 $\pm$ 0.000	0.726 $\pm$ 0.020	0.756 $\pm$ 0.007	0.817 $\pm$ 0.009
Balance	0.500 $\pm$ 0.000	0.508 $\pm$ 0.000	0.521 $\pm$ 0.051	0.574 $\pm$ 0.019	0.715 $\pm$ 0.032
Ionosphere	0.883 $\pm$ 0.000	0.891 $\pm$ 0.000	0.881 $\pm$ 0.037	0.889 $\pm$ 0.017	0.901 $\pm$ 0.011
Letter	0.987 $\pm$ 0.000	0.974 $\pm$ 0.000	0.985 $\pm$ 0.002	0.986 $\pm$ 0.003	0.990 $\pm$ 0.002
Mf-morph	0.500 $\pm$ 0.000	0.903 $\pm$ 0.000	0.923 $\pm$ 0.003	0.925 $\pm$ 0.006	0.926 $\pm$ 0.001
Mf-zernike	0.107 $\pm$ 0.000	0.717 $\pm$ 0.000	0.776 $\pm$ 0.006	0.754 $\pm$ 0.014	0.796 $\pm$ 0.020
Pima	0.758 $\pm$ 0.000	0.714 $\pm$ 0.000	0.735 $\pm$ 0.017	0.760 $\pm$ 0.034	0.704 $\pm$ 0.014
Satimage	0.761 $\pm$ 0.000	0.771 $\pm$ 0.000	0.806 $\pm$ 0.009	0.819 $\pm$ 0.016	0.767 $\pm$ 0.006
Vehicle	0.729 $\pm$ 0.000	0.710 $\pm$ 0.000	0.753 $\pm$ 0.029	0.725 $\pm$ 0.035	0.755 $\pm$ 0.026
Wpbc	0.623 $\pm$ 0.000	0.590 $\pm$ 0.000	0.667 $\pm$ 0.030	0.680 $\pm$ 0.071	0.637 $\pm$ 0.048

Figure 6.

F-Measure value comparison for five algorithms on ten datasets.

Figure 7.

G-Mean value comparison for five algorithms on ten datasets.

It can be seen that all four sampling techniques generally improve the classifier's performance on the imbalanced datasets compared with using the original data. However, Random Over-Sampling leads to classifier over-fitting, which limits its improvement, as the experiments show. SMOTE, on the other hand, can avoid over-fitting [10] and improves performance on the imbalance problem by combining over-sampling with interpolation. Yet SMOTE does not always outperform Random Over-Sampling: on the Abalone and Ionosphere datasets, for example, SMOTE gives lower F-Measure and G-Mean values than Random Over-Sampling. The experiments confirm that no single algorithm is best for all imbalanced datasets.

Figure 8.

AUC value comparison for five algorithms on ten datasets.

Of the two algorithms proposed in this paper, Sparse-SMOTE is the more stable. It outperforms SMOTE on seven of the ten datasets, especially Abalone, Balance, Pima and Satimage, and on the other three its performance is very close to that of SMOTE. Allowing for the randomness of the sampling, we conclude that Sparse-SMOTE is superior to SMOTE overall.

The performance of SROT, however, is less stable. SROT outperforms SMOTE on almost half of the datasets, including Abalone, Balance and Mf-zernike, and on Balance in particular it clearly outperforms both SMOTE and Sparse-SMOTE. It is less effective on Pima, Satimage, Vehicle and Wpbc. Even so, SROT improves the classification results on the imbalance problem.

From the experimental results we draw the following conclusions. The randomness used in SROT can distribute synthetic samples more uniformly, enlarge the decision region and improve classification performance, which benefits sparse datasets that lack typical samples. However, excessive randomness makes the minority class overlap with the majority class and generates noise. SROT is therefore better suited to datasets on which the classifier fails completely because of the imbalance, such as Abalone and Balance. We also observed an interesting phenomenon: not every imbalance seriously degrades classifier performance. The Letter dataset, for example, has an imbalance degree of 24.3, the highest of the ten datasets, yet the classifier recognizes it well. The same holds for the Ionosphere dataset, where the original, unprocessed data gives results comparable to those of the sampling algorithms, and even the best F-Measure. For such datasets, over-sampling may bring the classifier no benefit, and the artificial data may even interfere with it. These results show that the imbalance problem is more complex than the degree of imbalance alone suggests, and they expose the limitations of over-sampling.

Table 6

Ranking obtained through Friedman’s test

Methods	Average ranking (lower is better)
SMOTE-FRST	3.166
SROT	3.278
SMOTE-RSB	3.564
Sparse-SMOTE	3.913
SMOTE-TL	3.997
SMOTE	4.426
SMOTE-BL1	5.830
SMOTE-BL2	6.090

Figure 9.

Average AUC values over all datasets in Table 1 for each method.

To verify the effectiveness of our algorithms, we compare them with other over-sampling methods. Fig. 9 gives the average AUC value of each method over the ten datasets of Table 1. Both Sparse-SMOTE and SROT clearly outperform SMOTE, SMOTE-BL1 and SMOTE-BL2 [18]. Sparse-SMOTE performs about as well as SMOTE-TL but does not surpass SMOTE-FRST [1] or SMOTE-RSB [48], whereas SROT outperforms SMOTE-TL and SMOTE-RSB but not SMOTE-FRST. Overall, Fig. 9 shows that over-sampling based on sparse representation still achieves good results.

To compare the results statistically, we apply non-parametric multiple comparison procedures [49]. We use Friedman’s procedure to compute, for each algorithm, an average rank that reflects its effectiveness (a lower rank is better). Table 6 shows that SROT ranks second and Sparse-SMOTE fourth among the eight methods, both ahead of SMOTE.
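Friedman's procedure ranks the methods separately on each dataset (rank 1 for the best score, with tied scores sharing the mean of their ranks) and then averages the ranks over the datasets. The following is a minimal pure-Python sketch of that computation, together with the standard Friedman statistic $\chi^2_F = \frac{12N}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$ for $N$ datasets and $k$ methods with average ranks $R_j$. As an illustration only, it is applied to the mean AUC values of Table 5 for the five algorithms compared there; Table 6 ranks a different set of eight methods, so its numbers are not reproduced here. The function names `average_ranks` and `friedman_statistic` are hypothetical.

```python
# Mean AUC values copied from Table 5; columns follow METHODS.
METHODS = ["None", "RO", "SMOTE", "S-SMOTE", "SROT"]
AUC_TABLE5 = {
    "Abalone":    [0.500, 0.647, 0.726, 0.756, 0.817],
    "Balance":    [0.500, 0.508, 0.521, 0.574, 0.715],
    "Ionosphere": [0.883, 0.891, 0.881, 0.889, 0.901],
    "Letter":     [0.987, 0.974, 0.985, 0.986, 0.990],
    "Mf-morph":   [0.500, 0.903, 0.923, 0.925, 0.926],
    "Mf-zernike": [0.107, 0.717, 0.776, 0.754, 0.796],
    "Pima":       [0.758, 0.714, 0.735, 0.760, 0.704],
    "Satimage":   [0.761, 0.771, 0.806, 0.819, 0.767],
    "Vehicle":    [0.729, 0.710, 0.753, 0.725, 0.755],
    "Wpbc":       [0.623, 0.590, 0.667, 0.680, 0.637],
}


def average_ranks(rows):
    """Friedman average ranks: on each dataset (row) the highest score gets
    rank 1; tied scores share the mean of the ranks they occupy."""
    n_methods = len(rows[0])
    totals = [0.0] * n_methods
    for row in rows:
        order = sorted(range(n_methods), key=lambda j: row[j], reverse=True)
        pos = 0
        while pos < n_methods:
            end = pos  # extend the group while the next score is tied
            while end + 1 < n_methods and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared_rank = (pos + end) / 2 + 1  # mean of 1-based ranks pos+1..end+1
            for idx in order[pos:end + 1]:
                totals[idx] += shared_rank
            pos = end + 1
    return [t / len(rows) for t in totals]


def friedman_statistic(avg_ranks, n_datasets):
    """Friedman chi-square statistic computed from the average ranks."""
    k = len(avg_ranks)
    return 12 * n_datasets / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)


ranks = average_ranks(list(AUC_TABLE5.values()))
for name, r in sorted(zip(METHODS, ranks), key=lambda p: p[1]):
    print(f"{name:8s} {r:.2f}")
print(f"chi2_F = {friedman_statistic(ranks, len(AUC_TABLE5)):.2f}")
```

On these Table 5 means the sketch gives average ranks of 1.90 for SROT, 2.20 for S-SMOTE, 2.90 for SMOTE and 4.00 for both None and RO, with $\chi^2_F \approx 15.44$; these figures describe only this five-method AUC subset, not the eight-method comparison of Table 6.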

4.3 Settings of synthetic ratio

In the experiments above, we chose the synthetic ratio so that the classes become roughly balanced. This value, however, may not be the best ratio [5, 43], and we examine this issue in this section. Figs. 10 to 13 show how the results of the four over-sampling techniques change with the synthetic ratio on the Balance dataset. The dotted line in each figure marks the ratio used in the earlier experiments, 1100, which means that 11 new samples are synthesized from each minority sample.

Figure 10.

Change of TP-Rate value with the synthetic ratio.

Figure 11.

Change of F-Measure value with the synthetic ratio.

Figs. 10 to 13 show that all four algorithms can perform better at ratios other than the one marked by the dotted line, so the ratio that exactly balances the classes is not the best value. When the minority class becomes larger than the majority class, the classifiers identify the minority class better. At the same time, however, if the classifiers are biased too strongly toward the minority class, recognition accuracy on the majority class suffers. A balance between the two must therefore be found to achieve the best overall recognition.

Figure 12.

Change of G-Mean value with the synthetic ratio.

Figure 13.

Change of AUC value with the synthetic ratio.

In general, the four metrics of all four sampling algorithms first increase with the synthetic ratio and then decrease, but the “turning point” differs between algorithms. On the Balance dataset, SROT maintains good recognition performance, whereas the performance of Sparse-SMOTE and SMOTE becomes unsatisfactory once the synthetic ratio exceeds about 1500 to 1800, even though their TP-Rate keeps increasing. These two algorithms thus gain recognition of the minority class at the expense of recognition of the majority class. The “turning point” of Random Over-Sampling is also about 1500, but beyond this value its performance remains stable.
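The trade-off described above can be made concrete with the usual confusion-matrix definitions, taking the minority class as positive: TP-rate = TP/(TP+FN), TN-rate = TN/(TN+FP), G-Mean = √(TP-rate × TN-rate), and F-Measure as the harmonic mean of precision and TP-rate. The sketch assumes these standard definitions; the confusion counts are hypothetical and are not taken from the experiments, and `imbalance_metrics` is an illustrative name.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """TP-rate, F-Measure and G-Mean with the minority class as positive.
    Any ratio whose denominator is zero is reported as 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0

    tp_rate = ratio(tp, tp + fn)          # recall on the minority class
    tn_rate = ratio(tn, tn + fp)          # recall on the majority class
    precision = ratio(tp, tp + fp)
    f_measure = ratio(2 * precision * tp_rate, precision + tp_rate)
    g_mean = math.sqrt(tp_rate * tn_rate)
    return {"TP-rate": tp_rate, "F-Measure": f_measure, "G-Mean": g_mean}


# Hypothetical test set with 10 minority and 100 majority samples,
# given as (TP, FN, FP, TN).
scenarios = {
    "all majority": (0, 10, 0, 100),  # never predicts the minority class
    "moderate":     (7, 3, 10, 90),
    "aggressive":   (9, 1, 45, 55),   # more minority hits, many majority errors
}
for name, counts in scenarios.items():
    metrics = imbalance_metrics(*counts)
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

In this toy case the aggressive classifier raises TP-rate from 0.70 to 0.90, yet F-Measure falls from about 0.52 to 0.28 and G-Mean from about 0.79 to 0.70. A classifier that never predicts the minority class scores 0 on all three metrics, which is consistent with the 0.000 entries in the None column for datasets such as Abalone and Balance.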

In summary, the figures show that these algorithms achieve close to their best TP-rate, F-Measure, G-Mean and AUC values on the Balance dataset when the synthetic ratio is set to 1500. The minority and majority classes then contain 784 and 576 samples respectively, so the imbalance degree becomes 576/784 ≈ 0.735. The best ratio naturally varies from dataset to dataset, but our experiments indicate that, when a dataset is highly imbalanced, increasing the proportion of the minority class moderately beyond balance gives better classification performance.
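The arithmetic behind these ratios can be checked directly. The sketch assumes that SMOTE's amount N% is a multiple of 100, so that each minority sample yields N/100 synthetic samples, and that the imbalance degree is the majority size divided by the minority size, as defined in Section 1. The Balance minority size of 49 is inferred from 784 = 49 × 16; the function name `oversampled_counts` is hypothetical.

```python
def oversampled_counts(n_min, n_maj, amount_percent):
    """Class sizes after SMOTE-style over-sampling with amount N%.

    Returns (synthetic samples per minority sample, new minority size,
    imbalance degree = majority size / new minority size).
    """
    if amount_percent % 100:
        raise ValueError("this sketch assumes N is a multiple of 100")
    per_sample = amount_percent // 100
    new_min = n_min * (1 + per_sample)  # originals plus synthetic samples
    return per_sample, new_min, n_maj / new_min


# Balance dataset: 576 majority samples and (inferred) 49 minority samples.
for n in (1100, 1500):
    k, m, ir = oversampled_counts(49, 576, n)
    print(f"N={n}: {k} synthetic per sample, minority={m}, imbalance degree={ir:.3f}")
```

With N = 1100 the minority class grows to 588 samples (imbalance degree about 0.980, i.e. roughly balanced), and with N = 1500 it grows to 784 samples (imbalance degree about 0.735).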

5. Conclusion and future work

The imbalance problem is one of the most active research topics in machine learning, pattern recognition and related fields. Traditional classification algorithms that ignore the class distribution have difficulty identifying the minority class, so their performance drops sharply. We therefore proposed two sampling techniques based on sparse representation: Sparse-SMOTE and SROT. On the ten UCI datasets of Table 1, Sparse-SMOTE is the more stable of the two methods and consistently obtains good classification results across datasets. We attribute this to a weakness of the k-nearest-neighbor step in SMOTE: it does not reflect the real distribution of the data, because new synthetic points are placed around the original samples, so sparse regions remain sparse and dense regions remain dense. Sparse-SMOTE overcomes this defect by distributing synthetic points more uniformly, which enlarges the decision region. Because it is based on a sparse representation of the whole sample, it also preserves the information of the original dataset and avoids the blindness problem of neighbor selection. SROT improves the uniformity of the data, although it is likely to cause class overlap and amplify noise. Compared with the original data and random over-sampling, SROT clearly improves classifier performance and effectively alleviates the imbalance problem. Both approaches could therefore be applied widely, for example in information theory, earth science and the life sciences.
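The conclusion contrasts SMOTE's k-nearest-neighbor step with the sparse representation used by Sparse-SMOTE. The sketch below, which assumes NumPy, shows SMOTE's interpolation $x_{new} = x + \lambda(\hat{x} - x)$ with $\lambda$ drawn uniformly from $[0, 1)$ and $\hat{x}$ a chosen partner sample, together with a hypothetical sparse partner selector. That selector codes a minority sample over the remaining minority samples with orthogonal matching pursuit (OMP) and treats the selected atoms as partners. OMP, the unit-norm dictionary and the sparsity level are our assumptions for illustration, not the Sparse-SMOTE procedure of Section 3.3, and all function names are hypothetical.

```python
import numpy as np


def knn_neighbors(X, i, k):
    """SMOTE's neighbor step: the k minority samples closest to X[i]."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                      # never pick the sample itself
    return np.argsort(d)[:k]


def omp_neighbors(X, i, k):
    """HYPOTHETICAL sparse alternative (not the paper's exact procedure):
    code X[i] over the other minority samples with orthogonal matching
    pursuit and return the indices of the (at most k) selected samples.
    Assumes no minority sample is the zero vector."""
    others = np.delete(np.arange(len(X)), i)
    D = X[others].T / np.linalg.norm(X[others], axis=1)  # unit-norm atoms as columns
    residual, support = X[i].copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if j in support:
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], X[i], rcond=None)
        residual = X[i] - D[:, support] @ coef      # residual after projection
    return others[support]


def synthesize(X, i, neighbors, rng):
    """SMOTE interpolation: a random point on the segment from X[i] to one partner."""
    x_nn = X[rng.choice(neighbors)]
    gap = rng.random()                 # uniform in [0, 1)
    return X[i] + gap * (x_nn - X[i])


rng = np.random.default_rng(0)
X_min = rng.normal(size=(20, 6))       # toy minority class
for pick in (knn_neighbors, omp_neighbors):
    nbrs = pick(X_min, 0, 3)
    print(pick.__name__, nbrs, synthesize(X_min, 0, nbrs, rng).round(3))
```

Because OMP selects samples by how well they jointly reconstruct $x$ rather than by distance alone, the chosen partners need not be the nearest ones; this is one way to read the paper's argument that a sparse representation can spread synthetic points beyond dense local neighborhoods.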

In future work, we plan to apply further data-cleaning techniques to datasets preprocessed by our methods. We also intend to explore other strategies for computing the sparse solution used to generate synthetic samples, and to reduce the time complexity of the methods.

Acknowledgments

This study was supported by Scientific and Technological Development Scheme of Jilin Province under Grant No. 20180101048JC.

References

1. Ramentol, Verbiest and Bello, SMOTE-FRST: a new resampling method using fuzzy rough set theory//10th International FLINS conference on uncertainty modelling in knowledge engineering and decision making (to appear), 2012.

2. Huang, Yang, King et al., Biased Minimax Probability Machine for Medical Diagnosis//AMAI, 2004.

3. Chen, Lin and Xiong, Exploiting probabilistic topic models to improve text categorization under class imbalance, Information Processing & Management 47 (2011), 202–214.

4. and Garcia E.A., Learning from imbalanced data, IEEE Transactions on Knowledge and Data Engineering 21 (2009), 1263–1284.

5. Branco, Torgo and Ribeiro R.P., A survey of predictive modeling on imbalanced domains, ACM Computing Surveys (CSUR) 49(2) (2016), 31.

6. Chawla N.V., Bowyer K.W., Hall L.O. et al., SMOTE: synthetic minority over-sampling technique, Journal of Artificial Intelligence Research (2002), 321–357.

7. Martínez-Trinidad J.F., SMOTE-D a Deterministic Version of SMOTE//Pattern Recognition: 8th Mexican Conference, MCPR 2016, Guanajuato, Mexico, June 22–25, 2016. Proceedings. Springer, 9703 (2016), 177.

8. Yun and Lee J.S., Automatic Determination of Neighborhood Size in SMOTE//Proceedings of the 10th International Conference on Ubiquitous Information Management and Communication. ACM, 2016, 100.

9. Bunkhumpornpat, Sinapiromsaran and Lursinsap, Safe-level-smote: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem, Advances in Knowledge Discovery and Data Mining (2009), 475–482.

10. Dong and Wang, A new over-sampling approach: random-SMOTE for learning from imbalanced data sets//International Conference on Knowledge Science, Engineering and Management. Springer Berlin Heidelberg, 2011: 343–352.

11. Wang B.X. and Japkowicz, Imbalanced data set learning with synthetic samples//Proc. IRIS Machine Learning Workshop. 2004, p. 19.

12. Aharon, Elad and Bruckstein, K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation, IEEE Transactions on Signal Processing 54(11) (2006), 4311.

13. Wright, Yang A.Y., Ganesh et al., Robust Face Recognition via Sparse Representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 31(2) (2009), 210–227.

14. Breiman, Friedman and Stone C.J., Classification and regression trees, CRC Press, 1984.

15. Di Martino, Hernández and Fiori, A new framework for optimal classifier design, Pattern Recognition 46(8) (2013), 2249–2255.

16. Drummond and Holte R.C., C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling//Workshop on Learning from Imbalanced Datasets II, volume 11. Citeseer, 2003.

17. Batista G.E.A.P.A., Prati R.C. and Monard M.C., A study of the behavior of several methods for balancing machine learning training data, ACM SIGKDD Explorations Newsletter 6(1) (2004), 20–29.

18. Han, Wang W.Y. and Mao B.H., Borderline-SMOTE: a new over-sampling method in imbalanced datasets learning//Advances in intelligent computing. Springer Berlin Heidelberg, 2005, pp. 878–887.

19. Batista G.E., Prati R.C. and Monard M.C., A study of the behavior of several methods for balancing machine learning training data, ACM SIGKDD Explorations Newsletter 6(1) (2004), 20–29.

20. Martínez-Trinidad J.F., SMOTE-D a Deterministic Version of SMOTE//Pattern Recognition: 8th Mexican Conference, MCPR 2016, Guanajuato, Mexico, June 22–25, 2016. Proceedings. Springer, 9703 (2016), 177.

21. Sáez J.A., Luengo and Stefanowski, SMOTE-IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering, Information Sciences 291 (2015), 184–203.

22. Elkan, The foundations of cost-sensitive learning//International joint conference on artificial intelligence, Lawrence Erlbaum Associates Ltd 17 (2001), 973–978.

23. Dietterich T.G., Ensemble methods in machine learning//Multiple classifier systems. Springer Berlin Heidelberg, 2000, pp. 1–15.

24. Padmaja T.M., Krishna P.R. and Bapi R.S., Majority filter-based minority prediction (MFMP): An approach for imbalanced datasets//TENCON 2008–2008 IEEE Region 10 Conference. IEEE, 2008, pp. 1–6.

25. Hall, Frank and Holmes, The WEKA data mining software: an update, ACM SIGKDD Explorations Newsletter 11(1) (2009), 10–18.

26. Donoho D.L., Compressed sensing, IEEE Transactions on Information Theory 52 (2006), 1289–1306.

27. Jerri A.J., The Shannon sampling theorem – Its various extensions and applications: A tutorial review, Proceedings of the IEEE 65 (1977), 1565–1596.

28. Compressed sensing: theory and applications. Cambridge University Press, 2012.

29. Donoho D.L. and Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization, Proceedings of the National Academy of Sciences 100 (2003), 2197–2202.

30. Donoho D.L., Compressed sensing, IEEE Transactions on Information Theory 52 (2006), 1289–1306.

31. Huang and Aviyente, Sparse representation for signal classification//NIPS, 19 (2006), 609–616.

32. Wright and Mairal, Sparse representation for computer vision and pattern recognition, Proceedings of the IEEE 98(6) (2010), 1031–1044.

33. Wright, Yang A.Y. and Ganesh, Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 31(2) (2009), 210–227.

34. Cao, Zhao and Lai, Landmark recognition with sparse representation classification and extreme learning machine, Journal of the Franklin Institute 352(10) (2015), 4528–4545.

35. Bergeaud and Mallat, Matching pursuit of images//Time-Frequency and Time-Scale Analysis, Proceedings of the IEEE-SP International Symposium on. IEEE, 1994, pp. 330–333.

36. Figueras i Ventura R.M., Vandergheynst and Frossard, Low-rate and flexible image coding with redundant representations, IEEE Transactions on Image Processing 3 (2015), 726–739.

37. Michal, Elad and Alfred, K-SVD: An algorithm for designing over-complete dictionaries for sparse representation, IEEE Transactions on Signal Processing 54 (2006), 4311–4322.

38. Yang A.Y., Sastry S.S., Ganesh and Ma, Fast ℓ1-minimization algorithms and an application in robust face recognition: A review//Image Processing (ICIP), 2010 17th IEEE International Conference on. IEEE, 2010: 1849–1852.

39. Quinlan J.R., C4.5: programs for machine learning, Morgan Kaufmann, San Mateo, CA, 1992.

40. Kumar, Quinlan J.R., Ghosh, Yang, Motoda, McLachlan G.J., Liu P.S., Zhou Z.-H., Steinbach, Hand D.J. and Steinberg, Top 10 Algorithms in Data Mining, Knowl Inf Syst 14(1) (2008), 1–37.

41. Batista G.E.A.P.A., Prati R.C. and Monard M.C., A study of the behaviour of several methods for balancing machine learning training data, SIGKDD Explor 6(1) (2004), 20–29.

42. Han, Wang W.Y. and Mao B.H., Borderline-SMOTE: a new over-sampling method in imbalanced datasets learning//Advances in intelligent computing. Springer Berlin Heidelberg, 2005, pp. 878–887.

43. Kubat, Holte and Matwin, Learning when negative examples abound//Machine Learning: ECML-97, Springer Berlin Heidelberg, 1997, pp. 146–153.

44. Yen S.J. and Lee Y.S., Cluster-based under-sampling approaches for imbalanced data distributions, Expert Systems with Applications 36 (2009), 5718–5727.

45. Swets J.A., Measuring the accuracy of diagnostic systems, Science 240 (1988), 1285–1293.

46. Sáez J.A., Luengo and Stefanowski, SMOTE-IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering, Information Sciences 291 (2015), 184–203.

47. Napierala, Stefanowski and Wilk, Learning from imbalanced data in presence of noisy and borderline examples, in: Rough Sets and Current Trends in Computing, Lecture Notes in Computer Science, vol. 6086, Springer, Berlin/Heidelberg, 2010, pp. 158–167.

48. Ramentol, Caballero and Bello, SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory, Knowledge and Information Systems 33(2) (2012), 245–265.

49. García, Fernández and Luengo, A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability, Soft Computing 13(10) (2009), 959–977.
