Optimization of Imbalanced and Multidimensional Learning Under Bayes Minimum Risk and Savings Measure

Abstract

The full potential of data analysis is crippled by imbalanced and high-dimensional data, which makes these topics significantly important. Consequently, substantial research efforts have been directed to obtain dimension reduction and resolve data imbalance, especially in the context of fraud detection analysis. This work aims to investigate the effectiveness of hybrid learning methods for alleviating the class imbalance and integrating dimensionality reduction techniques. In this regard, the current study examines different classification combinations to achieve optimal savings and improve classification performance. Against this background, several well-known machine learning models are selected such as logistic regression, random forest, CatBoost (CB), and XGBoost. These models are constructed and optimized based on Bayes minimum risk (BMR) associated with the oversampling method synthetic minority oversampling technique (SMOTE) and different feature selection (FS) techniques, both univariate and multivariate. To investigate the performance of the proposed approach, different possible scenarios are analyzed both with and without balancing, with and without FS, and optimization using BMR. With a major insight about the best method to use, BMR shows a good optimization when used with SMOTE, symmetrical uncertainty for FS, and CB as a boosted classifier, principally in terms of F1 score and savings metrics.

Introduction

Fraudulent activities have increased considerably during the past decade due to the wide use of online resources for financial transactions. Consequently, fraud detection has recently emerged as an important research area, and several studies have been proposed to address this problem.¹ Fraudulent transactions are categorized into two groups: online fraud and offline fraud. The latter involves using stolen physical cards to obtain financial gain, while the former is performed by stealing people's digital identities, such as credit card numbers, user identities (ID) and passwords, and security certificates for online transactions.²

Because of the increase in online fraudulent activities, efficient and effective fraud detection systems should be put in place. A good fraud detection system must be capable of accurately recognizing a fraudulent transaction and of doing so in real time. To achieve this goal, several approaches have been presented to identify fraudulent transactions using machine learning and deep learning models.³,⁴ However, the classification of fraudulent transactions with highly imbalanced classes remains a challenging task.⁵ In fact, the number of normal transactions (samples of the normal class) is much larger than the number of fraudulent transactions (samples of the abnormal class). A classification model trained on such data has to separate the samples of the minority class from those of the majority class. Although a small difference in the number of samples per class does not affect the model's performance, large differences often have a substantial impact on the evaluation metrics.

This is because machine learning models are biased toward the samples of the majority class and perform better on it than on the minority class. Class imbalance has typically been handled in the literature as an independent problem, especially in application fields where the number of features is more critical. Besides data imbalance, the curse of dimensionality is considered one of the most critical issues in fraud data analysis. Using a large number of features for classification problems may negatively affect the performance of machine learning algorithms, not only in terms of computational efficiency but also in terms of final predictive accuracy, because the generalization ability of the induced models may degrade significantly when the search space is large.⁶

The studies⁷,⁸ investigate the strengths and weaknesses of the different selection approaches in fraud detection. Indeed, hybrid methods that leverage different heuristics at different stages of the feature selection (FS) process have shown better performance than single models.⁹,¹⁰ For example, support vector machine (SVM)-based correction classifier and SVM-THR (SVM with threshold adjustment) ensemble approaches provide unique and stable features without ignoring the aspect of predictive accuracy.¹¹

For fraud detection, many approaches have been proposed to deal with imbalanced data sets, such as sampling-based balancing methods and cost-sensitive classification.¹² The sampling-based methods adjust the distribution of the samples so that the minority and majority classes contain a similar number of samples. Common sampling-based methods are random undersampling and random oversampling, which decrease the instances of the majority class and increase the instances of the minority class, respectively. For example, the synthetic minority oversampling technique (SMOTE) generates synthetic samples of the minority class to equalize the class distribution.¹³ However, most research efforts have explored the issues of high dimensionality and class imbalance independently. Only a few studies have considered both problems simultaneously.¹⁴–¹⁶

Keeping in view the fact that several fraud data sets are both high-dimensional and class-imbalanced, this study investigates the effectiveness of learning techniques designed to handle both issues. By extending our previous research,¹⁷ we explore the combination of sampling-based balancing method and cost-sensitive classification with suitable FS techniques, selected to be representatives of different selection approaches both univariate and multivariate. Using two patterns of challenging fraud data sets, we experimentally evaluate the extent to which the resulting learning schemes are advantageous. In this study, we focus on optimizing the classification performance of class-imbalanced learning under high dimensionality. Based on cost-sensitive learning combined with FS and sampling methods, we can reduce the impact of high dimensionality and imbalanced data and establish a good classification. The key points of this study are as follows:

Proposing a strategy that optimizes the classification of the imbalanced data sets under high-dimensionality difficulty. For this purpose, four machine learning models have been utilized including logistic regression (LR), random forest (RF), CatBoost (CB), and XGBoost (XG).

Testing the model performance based on the preprocessing order where sampling comes before FS and vice versa. In this regard, two well-known data sets from Kaggle have been selected that present the imbalanced data with high-dimensionality difficulties.

Comparing the results of different combinations between FS methods and resampling for analysis. Later, integrating Bayes minimum risk (BMR) to optimize the classification performance of the best combination.

The rest of the article is organized as follows. The Related Work section briefly presents the research works related to the current study. The Materials and Methods section introduces the data sets and the methods included in our study, involving the considered FS techniques and their combination with strategies for reducing the class imbalance. The proposed approach is presented in the Proposed Method section. The Results and Discussion section reports the findings of this work by discussing and analyzing the results. The Conclusion section offers the concluding remarks.

Related Work

The importance of studying imbalanced learning that coexists with high dimensionality has encouraged researchers to look for potential solutions by adapting existing methods and developing novel approaches. In Ref.,¹⁸ the researchers select eleven distinct classifiers and test them on 71 data sets to analyze the performance of the selected classifiers. The study concludes that gradient boosting decision trees perform better and faster than SVM and RF. The authors in Ref.¹⁹ perform fraud detection using RF, balanced bagging ensemble, and naive Bayes, and find that the bagging learner gives the best prediction; yet, RF shows high adaptability to high-dimensional data sets. Similarly, Ref.²⁰ proposed a hybrid classifier combining several classifiers and compared its performance with RF and XG using the traditional known measures.

Predominantly, studies compare classifiers' performance without balancing or FS and do not even consider a cost-sensitive approach. However, several works considered the problem as an unsupervised problem and introduced solutions accordingly. In Ref.,²¹ the authors used anomaly detection techniques. One-class classification based on SVM and multivariate control charts applied to real-world data produced high accuracy and a low false-positive rate. In Ref.,²² autoencoders are used with restricted Boltzmann machines to deal with fraud detection problems.

More recently, the problem has been treated as cost-sensitive, and approaches are distinguished by the stage at which cost-sensitivity is applied, as shown in Figure 1. In Ref.,²³ the focus was on improving the SVM algorithm to handle the class-imbalance problem using cost-sensitive learning; the results confirm that the cost-sensitive SVM performs better on almost all the data sets used, based on the area under the curve (AUC) evaluation. In Ref.,²⁴ a new approach named cost-sensitive LR has been introduced. Model bias is reduced by selecting the appropriate variable discretization and cost-sensitive LR with the best class weights. Combining multiple models also shows good performance, as in Ref.,²⁵ which combines the BMR wrapping method with LR, RF, and decision tree (DT) classifiers for the same task, using undersampling instead of SMOTE to balance the data.

FIG. 1.

Different cost-sensitive algorithms are grouped according to their system level of use.

Similarly in Ref.,²⁶ the researchers suggested a cost-sensitive RF-based ensemble learning technique where results show that the algorithm outperforms two existing cost-sensitive implementations of RF. In Ref.,²⁷ the authors altered boosting classifiers to be the cost-sensitive version by including the cost of misclassification during the training stage. Moreover, in terms of evaluation measures, the researchers managed to get more optimized results using only F-score and cost as measures.

In Taha and Malebary,²⁸ an optimized light gradient boosting machine (GBM) is used, in which Bayesian-based hyperparameter optimization tunes the parameters of the light GBM. The model outperformed other models in terms of accuracy and F-measure. The article²⁹ presented a combination of multiple classifiers through a stacking ensemble technique for credit card fraud detection. Using the fuzzy-rough nearest neighbor and sequential minimal optimization, the simulation results, compared with seven other algorithms, showed that the ensemble model can detect credit card fraud with a sufficiently high accuracy of 84.90%. In addition, improving the resampling technique could further enhance the results of imbalanced learning, as in Ref.,³⁰ where the authors introduced the “SMOTE-local outlier factor” technique to identify noise in the synthetic minority data produced. Results show better Gmean and F-measure compared to the standard SMOTE.

At the same time, several studies prefer to combine a feature elimination method with SMOTE to enhance model performance. By exploiting this combination, the study³¹ proposed a hybrid approach that combined recursive feature elimination, to reduce the number of features, with SMOTE for oversampling. The model achieved a higher F-measure and AUC than traditional machine learning classifiers.

Table 1 provides a comprehensive summary of the discussed related works.

Table 1.

A Comprehensive Summary of the Discussed Research Works

Refs.	Year	Methods	Results	Limitation	Our approach
¹⁸	2017	Compare the results of 11 classifiers applied on 71 data sets.	Decision Tree and Gradient boosting represented better and faster performance.	- Savings measure not included during performance evaluation.	- Boosting classifiers. - Using savings for evaluation.
¹⁹	2018	RF, NB, and Balanced Bagging ensemble.	BBE has the best performance, but RF is the better with a large data size.	- Accuracy as an evaluation measure - Classification cost not considered. - No balancing.	- Example-dependent cost-sensitive. - Using savings for evaluation. - SMOTE for rebalancing
²⁰	2018	Stacking classifier (LR as metaclassifier), RF, and XGB.	Stacking classifier is the most promising classifier, followed by the RF and XGB classifier.	- Using traditional evaluation measures. - Classification cost not considered.	- Using savings for evaluation. - SMOTE for rebalancing
²¹	2018	One-class SVM and T2 control charts	High accuracy and low FP rate.	- Accuracy as an evaluation measure - Classification cost not considered. - No balancing.	- Example-dependent cost-sensitive. - Using savings for evaluation. - SMOTE for rebalancing.
²²	2018	Autoencoders with restricted Boltzmann machine	Better AUC for almost all used data sets.	- AUC as an evaluation measure. - Classification cost not considered. - No balancing.	- Using savings for evaluation. - BMR to include the cost in our method. - SMOTE for rebalancing.
²³	2019	Cost-sensitive SVM	Better results have been achieved almost for all used data sets.	- AUC as an evaluation measure. - No balancing.	- Using savings for evaluation. - SMOTE for rebalancing.
²⁴	2018	Cost-sensitive LR	Better AUC results have been achieved almost for all used data sets.	- AUC as an evaluation measure. - No balancing. - Not using fraud detection data set.	- Using savings for evaluation. - SMOTE for rebalancing. - Boosting algorithms.
²⁶	2019	Cost-sensitive weighted RF.	G-mean, F-measure, and AUC values.	- AUC as an evaluation measure. - No balancing.	- Using savings for evaluation. - SMOTE for rebalancing. - Boosting algorithms.
²⁷	2020	ICSAdaBoost, ICSRealBoost and ICSGentleBoost.	Better F-measure and cost results.	- Not using fraud detection data set. - No balancing.	- Using savings for evaluation. - SMOTE for rebalancing.

AUC, area under the curve; BBE, Balanced Bagging Ensemble; BMR, Bayes minimum risk; FP, number of false positives; LR, logistic regression; NB, naive Bayes; RF, random forest; SMOTE, synthetic minority oversampling technique; SVM, support vector machine; XGB, eXtreme Gradient Boosting.

This article compares the example-dependent, cost-sensitive BMR wrapping method for the algorithms LR, RF, CB, and XG, combined with FS techniques (ReliefF, symmetrical uncertainty [SU], gain ratio [GR], chi-squared, and A Weighted Support Vector Machine [SVM-AW]) and SMOTE in two different preprocessing orders. We then evaluate these algorithms using traditional metrics in addition to the AUC and savings measures.

Materials and Methods

Data sets used for experiments

To evaluate the effectiveness and performance of the proposed approach, extensive experiments are performed on two imbalanced data sets with high-dimensionality characteristics. Table 2 summarizes the selected data sets, listing the number of instances, the number of features, the number of majority and minority instances, and the imbalance ratio of each data set.

Table 2.

Benchmark Data Sets

Data set	Instances	Features	Majority	Minority	Ratio
IEEE-CIS fraud detection	590,540	433	569,875	20,663	27.52
Credit card fraud detection	284,807	31	284,315	492	577.87

IEEE-CIS, IEEE Computational Intelligence Society.

To create independence between transactions, we deleted the “ID” of the credit card for each transaction, and so, two transactions cannot be linked to the same credit card. Furthermore, we scaled both attributes, “Time” and “Amount,” to have the same opportunity during the classification as the other features do.

Machine learning models and techniques

Machine learning models have been widely applied to obtain highly accurate predictions.²,³² Owing to the results reported in the literature, several well-known models are investigated in this study.

Logistic regression

The LR algorithm is a statistical method that analyzes the relationship between multiple independent variables and a categorical dependent variable. It is a simple and efficient method for binary and linear classification problems. This model measures the probability of occurrence of an event by fitting the data to a logistic curve.²⁴ A logistic function is a common “S”-shaped or sigmoid curve, where the vertical axis refers to the probability of a given classification and the horizontal axis indicates the value of x. It assumes that the distribution of $y | x$ is the Bernoulli distribution. The formula of LR is given in Equation (1):

$f(x) = \frac{L}{1 + e^{-m(x - v_{o})}},$ (1)

where e is the base of the natural logarithm (also known as Euler's number), $v_{o}$ is the x-value of the sigmoid midpoint, L is the curve's maximum value, and m is the steepness of the curve.
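As an illustration of Equation (1), the following minimal Python sketch evaluates the logistic curve and turns its output into a class label. The function names and the 0.5 cut-off are assumptions made for this example only; they are not taken from the experimental setup.

```python
import math


def logistic(x, L=1.0, m=1.0, v0=0.0):
    """Equation (1): L / (1 + exp(-m * (x - v0))).

    L is the curve's maximum value, m its steepness, v0 the midpoint.
    Very large negative inputs can overflow math.exp; this sketch does
    not guard against that.
    """
    return L / (1.0 + math.exp(-m * (x - v0)))


def predict_label(x, threshold=0.5):
    """Illustrative decision rule: class 1 when the probability reaches the threshold."""
    return 1 if logistic(x) >= threshold else 0


if __name__ == "__main__":
    for value in (-2.0, 0.0, 2.0):
        print(value, round(logistic(value), 4), predict_label(value))
```

With the default arguments (L = 1, m = 1, midpoint 0), the curve passes through 0.5 at x = 0 and is symmetric around that point.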

Random forest

The RF classifier is an ensemble model for classification based on the bagging method. RF combines decision tree (DT) classifiers, where each tree contributes a single vote toward assigning the most frequent class to the input vector (w):

$\hat{C}_{rf}^{B} = \text{majority vote} \{\hat{C}_{b}(w)\}_{1}^{B},$ (2)

where $\hat{C}_{b}(w)$ is the class prediction of the $b$th RF tree.

The final prediction is made by voting between the RF trees based on the majority class predictions. Combining many DTs gives RF some exceptional characteristics, and thus, RF differs significantly from traditional tree-based algorithms.³³ We can also define RF as the simplest case of majority voting, where $\hat{pc}$ is the predicted class obtained by majority voting over the trees in the RF, as shown in Equation (3):

$\begin{matrix} \hat{pc} = \text{mode}\{T_{1}(x), T_{2}(x), T_{3}(x), \dots, T_{n}(x)\} \\ \hat{pc} = \text{mode}\{\sum_{i=1}^{n} T_{i}(x)\} \end{matrix}.$ (3)

Let 0, 0, and 1 be the predictions from RF trees $T_{1}$, $T_{2}$, and $T_{3}$ for a given sample; then the final prediction can be made using

$\hat{pc} = \text{mode}\{0, 0, 1\} = 0.$ (4)

In this case, the final prediction is 0 because the majority of the trees ($T_{1}$ and $T_{2}$) predict the class as 0. These characteristics of RF have attracted much interest because RF is highly accurate and robust to noisy data.
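The voting step of Equations (3) and (4) can be sketched in a few lines of Python. The helper name majority_vote is hypothetical, and the tie-breaking behavior (the first label encountered wins) is a choice of this sketch that the text does not specify.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most frequent class label among the per-tree predictions."""
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    counts = Counter(tree_predictions)
    # most_common(1) yields [(label, count)] for the modal label
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Three trees predicting 0, 0 and 1, as in Equation (4)
    print(majority_vote([0, 0, 1]))
```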

XGBoost

XGBoost (XG), short for eXtreme Gradient Boosting, is a boosting method for classification tasks. XG approaches sequential tree building using a parallelized implementation of DT. The first DT classifier is fitted to the data set; the following DT is then trained on the errors of the first classifier and added to it, and so on. This sequential coupling of classifiers helps to reduce errors and boost classification accuracy. It handles sparse data and proposes a theoretically justified weighted quantile sketch for approximate learning.⁷ XG defines a mean square error (MSE) loss and minimizes it as in

$Loss = MSE = \sum_{i} (v_{i} - v_{i}^{p})^{2},$ (5)

where $v_{i}$ is the ith target value, and $v_{i}^{p}$ is the ith output prediction. XG updates the prediction outputs based on the learning rate and determines the values where the MSE is minimized.
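To make the sequential fitting on errors concrete, the sketch below boosts one-split regression stumps on a single feature under the squared-error loss of Equation (5). It is a deliberately simplified illustration and not the XGBoost algorithm itself, which additionally uses second-order information, regularization, and approximate split finding. The names fit_stump and boost, and the default values n_rounds = 20 and eta = 0.3, are assumptions of this example.

```python
def squared_error(targets, predictions):
    """Loss of Equation (5): sum of squared differences."""
    return sum((v - p) ** 2 for v, p in zip(targets, predictions))


def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals (least squares)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # all feature values identical: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, targets, n_rounds=20, eta=0.3):
    """Each round fits a stump to the current errors and adds it, scaled by eta."""
    preds = [0.0] * len(xs)
    history = [squared_error(targets, preds)]
    for _ in range(n_rounds):
        residuals = [v - p for v, p in zip(targets, preds)]
        stump = fit_stump(xs, residuals)
        preds = [p + eta * stump(x) for x, p in zip(xs, preds)]
        history.append(squared_error(targets, preds))
    return preds, history


if __name__ == "__main__":
    preds, history = boost([1, 2, 3, 4], [0, 0, 1, 1])
    print([round(p, 3) for p in preds], round(history[0], 3), round(history[-1], 6))
```

Because each stump is a least-squares fit to the residuals and eta lies in (0, 1], the loss recorded in history never increases from one round to the next.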

CatBoost

CB is a boosting method based on ensemble learning that focuses on categorical columns using permutation methods, one_hot_max_size (OHMS), and target-based statistics. CB deals with the exponential growth of feature combinations by using a greedy method at each split of the current tree. For each feature with more categories than OHMS (an input parameter), CB uses the following steps:

Dividing the records into subsets randomly.

Converting the labels to integer numbers.

Transforming the categorical to numerical features, using Equation (6):

$\text{avgTarget} = \frac{\text{countInClass} + \text{prior}}{\text{totalCount} + 1},$ (6)

where countInClass is the number of instances in the target for a given categorical feature, totalCount is the number of previous objects, and prior is identified by the starting parameters.³⁴
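The target statistic of Equation (6) can be illustrated with the short Python sketch below. It assumes binary labels (0/1), takes the rows in the order given in place of the random permutation of the first step, and uses a hypothetical prior of 0.5; it illustrates the formula only and does not reproduce CatBoost's internal implementation.

```python
def ordered_target_statistic(categories, labels, prior=0.5):
    """Encode each categorical value using only the rows that precede it.

    count_in_class: earlier rows with the same category and label 1
    total_count:    earlier rows with the same category
    """
    count_in_class = {}
    total_count = {}
    encoded = []
    for cat, y in zip(categories, labels):
        c = count_in_class.get(cat, 0)
        n = total_count.get(cat, 0)
        encoded.append((c + prior) / (n + 1))  # Equation (6)
        # Update the running counts only after encoding the current row
        count_in_class[cat] = c + y
        total_count[cat] = n + 1
    return encoded


if __name__ == "__main__":
    print(ordered_target_statistic(["a", "b", "a", "a"], [1, 0, 0, 1]))
```

Updating the counts only after a row has been encoded keeps that row's own label out of its encoding, which is the purpose of relying on previous objects.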

FS techniques

A very common preprocessing step in high-dimensional data analysis, when a subset of meaningful features is required for a descriptive/predictive model, is to use a ranking-based selection technique combined with a suitable threshold T,³⁵ so that only the T top-ranked features are selected. If needed, the resulting feature subset may be further refined through more sophisticated search methods that are utilized for dimensionality reduction.

The FS process can be executed using different ranking methods. This study adopts five techniques that are representative of quite different heuristics. In particular, we considered three univariate methods (SU, GR, and chi-squared), which evaluate the relevance of each feature independently of the others. In addition, we considered two multivariate methods (ReliefF and SVM-AW), which take the interdependencies among the features into account. More details about these techniques, along with a discussion of their pattern of agreement, can be found in Ref.³⁶ These approaches are briefly described below for completeness.

SU and GR rely on the concept of information gain, which measures the extent to which the class entropy decreases when the value of a given feature is known. Nevertheless, SU and GR differ in the way they compensate for the bias of information gain toward features with more values.

Chi-squared evaluates each feature by measuring its chi-squared statistic concerning the class: the larger the chi-squared, the higher the relevance of the feature for the task at hand.

ReliefF orders the features based on their ability to differentiate between data instances that are near to each other in the attribute space.

SVM-AW exploits a linear SVM classifier that has an embedded competence of assigning a weight to each feature based on the deduced hyperplane function. The absolute value of this weight (AW) is used to rank the features. Although the iterative variant of this method is proposed as a good option for fraud analysis, it is not used due to its poor stability.³⁷
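As a sketch of the univariate, entropy-based rankers described above, the Python code below computes information gain, SU, and GR for discrete features and keeps the T top-ranked ones. It uses the standard definitions SU = 2·IG / (H(X) + H(Y)) and GR = IG / H(X); continuous features would first need to be discretized, which is not shown. The function names and the toy data are assumptions of this example.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, target):
    """Decrease in class entropy once the feature value is known."""
    n = len(feature)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [t for f, t in zip(feature, target) if f == value]
        conditional += (count / n) * entropy(subset)
    return entropy(target) - conditional


def symmetrical_uncertainty(feature, target):
    """SU normalizes information gain by the sum of both entropies."""
    denom = entropy(feature) + entropy(target)
    return 0.0 if denom == 0 else 2.0 * information_gain(feature, target) / denom


def gain_ratio(feature, target):
    """GR normalizes information gain by the entropy of the feature alone."""
    h = entropy(feature)
    return 0.0 if h == 0 else information_gain(feature, target) / h


def top_ranked(features, target, T, score=symmetrical_uncertainty):
    """Rank features (dict of name -> column) and keep the T best."""
    ranked = sorted(features, key=lambda name: score(features[name], target), reverse=True)
    return ranked[:T]


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    columns = {"informative": [0, 0, 1, 1], "noise": [0, 1, 0, 1]}
    print(top_ranked(columns, y, T=1))
```

The two measures differ only in the normalizing term, which is how each one counteracts the preference of raw information gain for features with many distinct values.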

Synthetic minority oversampling technique

A study¹³ developed a method of creating synthetic instances instead of merely copying existing instances in the data set. This technique is known as SMOTE. In the SMOTE algorithm, the training set is changed by adding synthetically generated minority class instances, which balances the class distribution and thereby addresses the imbalanced data problem. A parameter given to the SMOTE algorithm sets the amount of synthetic data to generate in order to balance the majority and minority classes. In general, the procedure of this algorithm can be summarized by the steps below:

For each observation x of the minority class, identify its k nearest neighbors.

Select randomly a few neighbors (the number depends on the rate of oversampling).

Artificial observations are spread along the line joining the original observation x to its nearest neighbor.
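The three steps above can be sketched in plain Python as follows. This is a simplified illustration, not the reference SMOTE implementation: it draws the base observation at random instead of looping over every minority observation, uses squared Euclidean distance, and the names smote, n_synthetic, and seed are assumptions of this example.

```python
import random


def smote(minority, n_synthetic, k=5, seed=42):
    """Generate n_synthetic points by interpolating between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("at least two minority observations are required")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # Step 1: k nearest minority neighbors of x (excluding x itself)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        # Step 2: pick one of the neighbors at random
        nb = rng.choice(neighbours)
        # Step 3: place the new point on the segment joining x and the neighbor
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


if __name__ == "__main__":
    fraud = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(fraud, n_synthetic=3, k=2):
        print([round(v, 3) for v in point])
```

Because every synthetic point lies on a segment between two existing minority observations, the new points stay inside the region already occupied by the minority class.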

Performance evaluation metrics

In credit card fraud detection, we have a binary classification objective where imbalanced learning occurs with high-dimensionality difficulties. The performance of the selected techniques was evaluated in terms of classification metrics relevant to fraud detection. In this study, the evaluation measures used are accuracy, precision, recall, F1, AUC, and savings. Those metrics can be defined as follows:

$Accuracy = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}},$ (7)

$Precision = \frac{TP}{TP + FP},$ (8)

$Recall = \frac{TP}{TP + FN},$ (9)

$F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall},$ (10)

where TP, TN, FP, and FN represent the number of true positives, the number of true negatives, the number of false positives, and the number of false negatives, respectively.
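A minimal Python sketch of Equations (7) to (10), computed directly from the four confusion-matrix counts, is given below. The function name and the convention of returning 0.0 when a denominator is zero are assumptions of this example.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts: 6 frauds caught, 2 missed, 2 false alarms, 90 correct rejections
    print(classification_metrics(tp=6, tn=90, fp=2, fn=2))
```

In this hypothetical case accuracy is 0.96 while precision and recall are both 0.75, which shows how accuracy can look favorable on imbalanced data even when a quarter of the frauds are missed.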

Although these evaluation metrics derived from the confusion matrix are very common, they assume that all misclassification errors carry the same cost, so they may not be the most appropriate for evaluation in our study. Hence, the example-dependent cost matrix proposed in Ref.³⁸ is more suitable for assessment. Based on the matrix in Figure 2, we introduce the cost and savings, respectively, as follows:

FIG. 2.

Example-dependent cost-sensitive matrix.

$Cost(f(S)) = \sum_{i=1}^{N} \left( y_{i}(1 - c_{i}) Amt_{i} + c_{i} C_{a} \right),$ (11)

$Savings(f(S)) = \frac{Cost_{l}(S) - Cost(f(S))}{Cost_{l}(S)},$ (12)

where

$Cost_{l}(S) = \min\{Cost(f_{0}(S)), Cost(f_{1}(S))\},$ (13)

N is the number of transactions in the set S, $y_{i}$ and $c_{i}$ are the true and predicted labels of transaction i, f₀ is the classifier that predicts all transactions in S as belonging to class 0, f₁ is the classifier that predicts all transactions in S as belonging to class 1, $C_{a}$ is the administrative cost, and $Amt_{i}$ is the amount of transaction i, which is lost when a fraudulent transaction is classified as a false negative.
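The cost and savings measures of Equations (11) to (13) can be sketched in Python as follows, reading $y_{i}$ as the true label and $c_{i}$ as the predicted label of transaction i. The function names, the transaction amounts, and the administrative cost of 20 are hypothetical values chosen for this example.

```python
def cost(y_true, y_pred, amounts, admin_cost):
    """Equation (11): missed frauds cost their amount, every alert costs admin_cost."""
    return sum(
        y * (1 - c) * amt + c * admin_cost
        for y, c, amt in zip(y_true, y_pred, amounts)
    )


def savings(y_true, y_pred, amounts, admin_cost):
    """Equations (12) and (13): relative gain over the cheaper trivial classifier."""
    n = len(y_true)
    cost_f0 = cost(y_true, [0] * n, amounts, admin_cost)  # predict all legitimate
    cost_f1 = cost(y_true, [1] * n, amounts, admin_cost)  # predict all fraudulent
    baseline = min(cost_f0, cost_f1)
    if baseline == 0:
        raise ValueError("baseline cost is zero; savings is undefined")
    return (baseline - cost(y_true, y_pred, amounts, admin_cost)) / baseline


if __name__ == "__main__":
    y_true = [0] * 8 + [1, 1]
    amounts = [10.0] * 8 + [100.0, 50.0]
    perfect = list(y_true)
    partial = [0] * 8 + [1, 0]  # catches only the larger fraud
    print(round(savings(y_true, perfect, amounts, admin_cost=20.0), 4))
    print(round(savings(y_true, partial, amounts, admin_cost=20.0), 4))
```

Even a perfect classifier does not reach a savings of 1 here, because each correctly flagged fraud still incurs the administrative cost.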

For analysis, we adopt the savings measure instead of the normalized cost,³⁸ because, in the credit card fraud detection field, companies do not adopt a predictive model that is assessed only with traditional evaluation measures (recall, precision, Gmean, F-measure). Therefore, the savings measure makes more sense for this application.

To proceed, we first evaluate the classifiers (LR, RF, XG, CB) with the selected evaluation metrics; the results are shown in Tables 4, 5, 7, and 8. We then focus on the savings, F-measure, and AUC measures, which are more useful for our problem. In Table 6, we present the results of the classifiers using BMR, with and without SMOTE and FS, in terms of the savings, F-measure, and AUC measures. As an evaluation protocol, parameter tuning was performed with fivefold cross-validation, and the parameters in Table 3 were selected according to the best AUC achieved.
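As a sketch of the selection criterion used in this protocol, the Python code below computes the AUC as the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one, and then picks the candidate setting with the highest mean AUC across folds. The function names, the candidate labels, and the per-fold values are hypothetical; the actual tuning grid is not reported in this section.

```python
def auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def select_by_mean_auc(fold_aucs):
    """fold_aucs maps a candidate setting to its list of per-fold AUC values."""
    return max(fold_aucs, key=lambda name: sum(fold_aucs[name]) / len(fold_aucs[name]))


if __name__ == "__main__":
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    # Hypothetical per-fold AUCs for two candidate settings
    candidates = {
        "max_depth=6": [0.90, 0.92, 0.91, 0.93, 0.90],
        "max_depth=10": [0.95, 0.93, 0.94, 0.96, 0.94],
    }
    print(select_by_mean_auc(candidates))
```

The pairwise form is quadratic in the number of samples, so it is suitable only for illustration; a rank-based computation would be preferable on data sets of the size used here.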

Table 3.

Algorithm Settings

Algorithm	Parameters
SMOTE	sampling_strategy = minority, random_state = 42, k_neighbors = 7
LR	solver = liblinear, max_iter = 500
RF	n_estimators = 100, max_depth = 10
XG	objective: reg:squarederror, max_depth: 10, eta: xgb_learning_rate, reg_alpha: 6, min_child_weight: 100
CB	loss_function: Logloss, grow_policy: Lossguide, iterations: 250, depth: 12

CB, CatBoost; XG, XGBoost.
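For reference, the settings of Table 3 can be written as plain Python dictionaries, as sketched below. The sketch assumes that the parameter names follow imbalanced-learn (SMOTE, whose neighbor parameter is spelled k_neighbors), scikit-learn (LR and RF), XGBoost, and CatBoost, which is consistent with the names in the table but is not stated in the text. The value behind xgb_learning_rate is not reported, so it is left as None and must be set before use.

```python
# Settings transcribed from Table 3; the library mapping is an assumption.
SMOTE_PARAMS = {"sampling_strategy": "minority", "random_state": 42, "k_neighbors": 7}

LR_PARAMS = {"solver": "liblinear", "max_iter": 500}

RF_PARAMS = {"n_estimators": 100, "max_depth": 10}

# The value of xgb_learning_rate is not given in Table 3.
XGB_LEARNING_RATE = None

XG_PARAMS = {
    "objective": "reg:squarederror",
    "max_depth": 10,
    "eta": XGB_LEARNING_RATE,
    "reg_alpha": 6,
    "min_child_weight": 100,
}

CB_PARAMS = {
    "loss_function": "Logloss",
    "grow_policy": "Lossguide",
    "iterations": 250,
    "depth": 12,
}

if __name__ == "__main__":
    for name, params in [("SMOTE", SMOTE_PARAMS), ("LR", LR_PARAMS), ("RF", RF_PARAMS),
                         ("XG", XG_PARAMS), ("CB", CB_PARAMS)]:
        print(name, params)
```

Under that assumption, each dictionary could be unpacked into the corresponding constructor, for example RandomForestClassifier(**RF_PARAMS) in scikit-learn.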

Table 4.

Classifier Results for Different Combinations of FS + SMOTE for the IEEE CIS Data Set

	Without BMR				With BMR
	RF	LR	XG	CB	RF	LR	XG	CB
SU + SMOTE
Recall	0.993	0.983	0.996	0.981	0.995	0.986	0.997	0.981
Precision	0.604	0.913	0.826	0.871	0.742	0.836	0.865	0.892
Gmean	0.994	0.849	0.997	0.998	0.945	0.899	0.977	0.999
F1	0.973	0.836	0.939	0.986	0.866	0.489	0.716	0.818
Savings	0.842	0.737	0.846	0.897	0.972	0.906	0.962	0.989
GR + SMOTE
Recall	0.813	0.843	0.836	0.851	0.825	0.936	0.937	0.921
Precision	0.524	0.773	0.756	0.691	0.672	0.716	0.705	0.832
Gmean	0.914	0.799	0.857	0.848	0.865	0.779	0.827	0.839
F1	0.823	0.726	0.856	0.829	0.626	0.399	0.576	0.759
Savings	0.317	0.124	0.301	0.420	0.645	0.445	0.675	0.766
Chi-squared + SMOTE
Recall	0.863	0.853	0.886	0.881	0.865	0.846	0.857	0.801
Precision	0.464	0.773	0.746	0.811	0.612	0.656	0.705	0.742
Gmean	0.884	0.749	0.877	0.818	0.865	0.819	0.797	0.829
F1	0.813	0.746	0.866	0.879	0.626	0.459	0.586	0.669
Savings	0.418	0.133	0.361	0.401	0.574	0.476	0.601	0.762
ReliefF + SMOTE
Recall	0.93	0.923	0.906	0.901	0.965	0.906	0.967	0.921
Precision	0.844	0.823	0.896	0.821	0.822	0.876	0.875	0.852
Gmean	0.924	0.839	0.907	0.988	0.925	0.869	0.927	0.919
F1	0.973	0.786	0.906	0.979	0.696	0.469	0.716	0.719
Savings	0.538	0.213	0.321	0.487	0.704	0.566	0.701	0.722
SVM-AW + SMOTE
Recall	0.915	0.965	0.953	0.925	0.985	0.983	0.975	0.985
Precision	0.837	0.895	0.853	0.915	0.931	0.943	0.895	0.956
Gmean	0.957	0.953	0.945	0.969	0.875	0.895	0.835	0.629
F1	0.865	0.803	0.883	0.845	0.698	0.495	0.493	0.534
Savings	0.243	0.157	0.186	0.211	0.615	0.406	0.647	0.691

Values in bold indicate the algorithm with the best result for each evaluation metric.

GR, gain ratio; SU, symmetrical uncertainty; SVM-AW, A Weighted Support Vector Machine.

Table 5.

Classifier Results for Different Combinations of FS + SMOTE for the Credit Card Fraud Detection Data Set

	Without BMR				With BMR
	RF	LR	XG	CB	RF	LR	XG	CB
SU + SMOTE
Recall	0.973	0.913	0.968	0.991	0.925	0.886	0.977	0.9804
Precision	0.554	0.883	0.766	0.771	0.712	0.746	0.785	0.8915
Gmean	0.974	0.829	0.907	0.968	0.875	0.869	0.917	0.9988
F-measure	0.977	0.886	0.904	0.997	0.896	0.789	0.716	0.8189
Savings	0.782	0.756	0.865	0.899	0.92	0.892	0.904	0.925
GR + SMOTE
Recall	0.903	0.783	0.976	0.901	0.755	0.866	0.867	0.9209
Precision	0.494	0.733	0.656	0.651	0.622	0.706	0.615	0.8317
Gmean	0.824	0.769	0.887	0.878	0.855	0.769	0.757	0.8381
F-measure	0.953	0.716	0.876	0.839	0.616	0.379	0.516	0.7583
Savings	0.247	0.114	0.261	0.36	0.605	0.405	0.665	0.7653
Chi-squared + SMOTE
Recall	0.813	0.823	0.836	0.871	0.845	0.776	0.807	0.8005
Precision	0.374	0.703	0.726	0.771	0.512	0.586	0.615	0.7411
Gmean	0.804	0.719	0.837	0.728	0.855	0.789	0.757	0.8282
F-measure	0.793	0.666	0.796	0.849	0.606	0.449	0.576	0.6688
Savings	0.328	0.043	0.301	0.391	0.484	0.376	0.541	0.8611
ReliefF + SMOTE
Recall	0.933	0.833	0.906	0.951	0.905	0.836	0.917	0.9209
Precision	0.794	0.753	0.946	0.781	0.782	0.776	0.985	0.8515
Gmean	0.854	0.789	0.907	0.938	0.885	0.819	0.987	0.9183
F-measure	0.923	0.726	0.866	0.909	0.666	0.409	0.626	0.7185
Savings	0.478	0.183	0.241	0.437	0.614	0.556	0.661	0.8214
SVM-AW + SMOTE
Recall	0.925	0.965	0.943	0.915	0.975	0.973	0.935	0.9841
Precision	0.787	0.815	0.763	0.875	0.921	0.863	0.815	0.9555
Gmean	0.857	0.933	0.865	0.879	0.825	0.875	0.805	0.628
F-measure	0.825	0.773	0.783	0.805	0.608	0.2995	0.463	0.5336
Savings	0.213	0.057	0.136	0.131	0.525	0.396	0.597	0.7907

Values in bold indicate the algorithm with the best result for each evaluation metric.

Table 6.

Classification Results of Different Combinations With and Without Synthetic Minority Oversampling Technique, Feature Selection, and Bayes Minimum Risk

	IEEE-CIS fraud detection			Credit card fraud detection
	F1	AUC	Savings	F1	AUC	Savings
Without FS/SMOTE/BMR
RF	0.8998	0.9723	0.4084	0.9318	0.923	0.5084
LR	0.7448	0.9475	0.3408	0.8048	0.915	0.3408
CB	0.8470	0.9699	0.6124	0.8470	0.978	0.5024
XG	0.8624	0.9688	0.5058	0.8624	0.961	0.4758
Without FS/SMOTE
RF + BMR	0.7420	0.9723	0.746	0.821	0.9723	0.752
LR + BMR	0.6574	0.9475	0.73	0.76	0.9475	0.798
CB + BMR	0.7890	0.9699	0.757	0.831	0.9699	0.845
XG + BMR	0.7694	0.9688	0.746	0.792	0.9688	0.837
Without FS
SMOTE + RF + BMR	0.8966	0.973	0.972	0.8946	0.973	0.972
SMOTE + LR + BMR	0.8192	0.959	0.906	0.8192	0.981	0.906
SMOTE + CB + BMR	0.8996	0.994	0.979	0.8656	0.992	0.976
SMOTE + XG + BMR	0.8750	0.990	0.962	0.8250	0.991	0.9762
Without SMOTE
SU + RF + BMR	0.7873	0.897	0.812	0.8321	0.876	0.802
SU + LR + BMR	0.8199	0.89	0.8776	0.8625	0.8192	0.829
SU + CB + BMR	0.8394	0.894	0.926	0.892	0.8556	0.862
SU + XG + BMR	0.839	0.899	0.8262	0.855	0.8585	0.812

Values in bold indicate the algorithm with the best result for each evaluation metric.

FS, feature selection.

Table 7.

Classifier Results for Different Combinations of SMOTE + FS for the IEEE CIS Data Set

	Without BMR				With BMR
	RF	LR	XG	CB	RF	LR	XG	CB
SMOTE + SU
Recall	0.963	0.933	0.986	0.951	0.945	0.926	0.947	0.941
Precision	0.534	0.873	0.816	0.841	0.712	0.826	0.855	0.812
Gmean	0.934	0.829	0.907	0.948	0.885	0.829	0.897	0.929
F-measure	0.943	0.796	0.926	0.979	0.726	0.559	0.726	0.789
Savings	0.558	0.223	0.341	0.417	0.724	0.606	0.651	0.752
SMOTE + GR
Recall	0.763	0.783	0.826	0.821	0.735	0.906	0.887	0.851
Precision	0.454	0.743	0.716	0.641	0.592	0.636	0.675	0.812
Gmean	0.864	0.729	0.847	0.808	0.795	0.749	0.767	0.829
F-measure	0.803	0.666	0.776	0.769	0.566	0.309	0.526	0.729
Savings	0.227	0.044	0.251	0.38	0.575	0.365	0.585	0.716
SMOTE + Chi-squared
Recall	0.813	0.773	0.816	0.871	0.835	0.826	0.787	0.721
Precision	0.454	0.733	0.726	0.791	0.522	0.606	0.615	0.682
Gmean	0.874	0.679	0.847	0.798	0.855	0.749	0.717	0.779
F-measure	0.783	0.696	0.776	0.839	0.586	0.379	0.536	0.639
Savings	0.408	0.093	0.271	0.391	0.554	0.416	0.511	0.712
SMOTE + ReliefF
Recall	0.89	0.893	0.816	0.861	0.955	0.856	0.917	0.871
Precision	0.824	0.773	0.816	0.771	0.812	0.796	0.825	0.822
Gmean	0.884	0.809	0.877	0.898	0.835	0.779	0.897	0.879
F-measure	0.923	0.766	0.826	0.899	0.626	0.399	0.686	0.679
Savings	0.508	0.203	0.241	0.467	0.684	0.516	0.621	0.682
SMOTE + SVM-AW
Recall	0.845	0.935	0.903	0.845	0.905	0.973	0.885	0.895
Precision	0.817	0.885	0.813	0.825	0.911	0.893	0.885	0.866
Gmean	0.917	0.943	0.915	0.879	0.845	0.825	0.755	0.589
F-measure	0.805	0.773	0.813	0.795	0.638	0.465	0.413	0.444
Savings	0.223	0.077	0.106	0.171	0.585	0.356	0.617	0.671

Values in bold indicate the algorithm with the best result for each evaluation metric.

Proposed Method

Most classification algorithms assign the same cost to every misclassified instance because they focus fundamentally on accuracy.

Nevertheless, in some real-world problems, this methodology is insufficient and risky. In fraud detection, for example, classifying a nonfraudulent transaction as fraudulent incurs a cost, known as the “administrative cost,” which is linked to the analysis of the transaction by human investigators and to contacting the cardholder. Conversely, classifying a fraudulent transaction as nonfraudulent incurs a cost that may differ from one transaction to another. The literature on cost-sensitive learning distinguishes between class-dependent problems, where the cost of misclassification is linked to the class (fraud or not fraud), and example-dependent problems, where the cost of misclassification is tied to each example.³⁹ The class-dependent assumption is highly restrictive because it supposes that costs are constant within the same class, which is not realistic in fraud detection.

Our study aims to examine the effectiveness of the different proposed models and to demonstrate that including the example-dependent cost can improve the classification results. In addition, to deal with imbalanced learning under the high-dimensionality issue, we combine different techniques, including oversampling and FS techniques. Algorithm 1 presents the steps followed in the proposed approach.

In the preprocessing stage, the models use SMOTE for oversampling in combination with FS methods. Considering a binary classification setting, the evaluation is implemented in two main ways:

Algorithm 1: Algorithm for the proposed approach
Input: $i, j$
subset_FS_technique = {SU, GR, Chi2, ReliefF, SVM-AW}
subset_classifiers = {LR, RF, CB, XG}
// Preprocessing order SMOTE + FS
for i in subset_FS_technique do
    apply SMOTE for rebalancing, then i for FS
    for j in subset_classifiers do
        apply j
        return(model_without_BMR)
        apply BMR
        return(model_with_BMR)
    end for
end for
// Preprocessing order FS + SMOTE
for i in subset_FS_technique do
    apply i for FS, then SMOTE for rebalancing
    for j in subset_classifiers do
        apply j
        return(model_without_BMR)
        apply BMR
        return(model_with_BMR)
    end for
end for
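As a reading aid, the nested loops of Algorithm 1 can be unrolled into an explicit list of experimental configurations. The sketch below only enumerates the combinations; the names `ORDERS`, `FS_TECHNIQUES`, `CLASSIFIERS`, and `enumerate_runs` are illustrative assumptions and do not come from the authors' implementation.

```python
from itertools import product

# Labels taken from Algorithm 1; the variable names themselves are hypothetical.
ORDERS = ["SMOTE+FS", "FS+SMOTE"]
FS_TECHNIQUES = ["SU", "GR", "Chi2", "ReliefF", "SVM-AW"]
CLASSIFIERS = ["LR", "RF", "CB", "XG"]


def enumerate_runs():
    """List every (order, FS technique, classifier, BMR on/off) combination."""
    runs = []
    for order, fs, clf in product(ORDERS, FS_TECHNIQUES, CLASSIFIERS):
        # Each trained classifier is reported twice: as is, then wrapped with BMR.
        for use_bmr in (False, True):
            runs.append({"order": order, "fs": fs, "classifier": clf, "bmr": use_bmr})
    return runs


runs = enumerate_runs()
print(len(runs))  # 2 orders x 5 FS techniques x 4 classifiers x 2 = 80
```

Under this reading, each data set yields 80 evaluated models, half of them with BMR.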

Resampling with FS learning schemes: SMOTE + FS. First, the original data are resampled to reduce the level of class imbalance to a prespecified ratio R, meaning that there are R instances of the majority class for each instance of the minority class. This is achieved by adding a suitable number of synthetic minority-class instances, as SMOTE does. FS is then carried out on the resampled data before the classifier selected for the study is run.

FS with resampling learning schemes: FS + SMOTE. We first select a subset of meaningful features from the original data set and then perform data sampling. As a final step, the classifier is built.
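The resampling step in both schemes can be illustrated with a small sketch. It assumes that the ratio R is read as "R majority instances per minority instance" and uses a simplified, pure-Python version of SMOTE's core idea (interpolating between a minority point and one of its nearest minority neighbours). The function names `synthetic_needed` and `smote_like` are hypothetical, and this is not the implementation used in the study.

```python
import math
import random


def synthetic_needed(n_majority, n_minority, ratio):
    """Synthetic minority instances to add so that the majority-to-minority
    ratio is at most `ratio` (assumed reading of R)."""
    target_minority = math.ceil(n_majority / ratio)
    return max(0, target_minority - n_minority)


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours.
    Requires at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy data: 20 majority instances (not shown) and 4 minority instances.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
n_new = synthetic_needed(n_majority=20, n_minority=len(minority), ratio=2)
new_points = smote_like(minority, n_new)
print(n_new, len(new_points))  # 6 6
```

In the SMOTE + FS order this step would run on all original features, whereas in the FS + SMOTE order the neighbours would be computed only on the selected features, which is one reason the two orders can give different results.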

During training, we apply different classifiers, namely RF, LR, XG, and CB. Then, we select BMR as our cost-sensitive wrapping technique, which combines the probabilities estimated by the machine learning models with the cost matrix. The BMR method takes the probabilities estimated by the trained classifier, calculates the risk of predicting each class, and then chooses the class with the minimum risk (see Fig. 2). Following Ref.,²⁵ we define the risk of the fraud and nonfraud classes, respectively, as given in the equations below: $R (p_{f} | x) = L (p_{f} | y_{f}) P (p_{f} | x) + L (p_{f} | y_{l}) P (p_{l} | x),$ (14) $R (p_{l} | x) = L (p_{l} | y_{l}) P (p_{l} | x) + L (p_{l} | y_{f}) P (p_{f} | x),$ (15)

where $p_{f}, p_{l}$ are the predictions of fraud and nonfraud, $y_{f}, y_{l}$ are the true targets of fraud and nonfraud, $L (a | b)$ is the loss incurred when a transaction is predicted as $a$ and its true target is $b$, and $P (p_{f} | x), P (p_{l} | x)$ are the estimated probabilities of the fraud and nonfraud classes.

So, the transaction will be predicted as fraud if: $R (p_{f} | x) \leq R (p_{l} | x).$ (16)

Equivalently, according to Figure 2, the transaction is considered a fraud when: $C_{a} P (p_{f} | x) + C_{a} P (p_{l} | x) \leq \mathrm{Amt}_{i} P (p_{f} | x),$ (17)

where $C_{a}$ is the administrative cost and $\mathrm{Amt}_{i}$ is the amount of transaction $i$, which is lost when the transaction is a false negative.
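The decision rule of Eqs. (14) to (16) can be sketched in a few lines. The cost matrix used here is an assumption consistent with the terms of Eq. (17): predicting fraud costs the administrative cost whether or not the transaction is fraudulent, a missed fraud costs the transaction amount, and a correctly accepted transaction costs nothing. The function name `bmr_predict` and the numeric values are illustrative only.

```python
def bmr_predict(p_fraud, amount, admin_cost):
    """Return 1 (fraud) or 0 (nonfraud) by comparing the two risks.

    Assumed cost matrix: true positive and false positive cost admin_cost,
    false negative costs the transaction amount, true negative costs 0.
    """
    p_legit = 1.0 - p_fraud
    risk_fraud = admin_cost * p_fraud + admin_cost * p_legit  # Eq. (14)
    risk_legit = 0.0 * p_legit + amount * p_fraud             # Eq. (15)
    return 1 if risk_fraud <= risk_legit else 0               # Eq. (16)


# Hypothetical administrative cost of 5 monetary units.
print(bmr_predict(p_fraud=0.10, amount=500.0, admin_cost=5.0))  # 1: risk 5 <= 50
print(bmr_predict(p_fraud=0.10, amount=20.0, admin_cost=5.0))   # 0: risk 5 > 2
print(bmr_predict(p_fraud=0.60, amount=5.0, admin_cost=5.0))    # 0: risk 5 > 3
```

The last call shows why BMR can lower the number of detected fraud cases: a transaction with a fraud probability above 0.5 is still accepted when its amount is too small to justify the cost of an investigation.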

For an easier understanding, Figure 3 illustrates the main steps to optimize imbalanced data classification under high-dimensionality difficulties. For each selected data set and each preprocessing order, we choose one of the five FS techniques and combine it with SMOTE. We then apply the RF, LR, CB, or XG classifier to the resulting data set and, finally, optimize the predictions using BMR. The results of the different scenarios are presented in the Analysis of Results section.

FIG. 3.

Architecture of the proposed methodology. AUC, area under the curve; BMR, Bayes minimum risk; GR, gain ratio; LR, logistic regression; RF, random forest; SMOTE, synthetic minority oversampling technique; SU, symmetrical uncertainty; SVM-AW, A Weighted Support Vector Machine.

This study used different methods to solve the imbalanced data set classification problem under the high-dimensionality issue. The implementation details of all the used methods with their hyperparameters are shown in Table 3.

Results and Discussion

As mentioned before, the main challenge in our study is imbalanced data under high dimensionality. In this section, we focus on the performance of different combinations of our selected classifiers with SMOTE/FS/BMR. Based on recall, precision, Gmean, F-measure, AUC, and savings, we identify the best combination and discuss the advantages of each method and its effect on the optimization of the classification.

Analysis of results

For the results shown in Tables 4 and 5, preprocessing starts by reducing the number of attributes using FS methods, and then SMOTE is applied. Among the FS methods, SU generally performs best across the different evaluation metrics, while ReliefF takes second place. However, based on F-measure, a large difference between the classifiers' outputs is observed: LR performs worse than XG, while RF and CB consistently show better performance. Moreover, comparing the results with and without BMR, we observe that the savings measure increases significantly with BMR, especially for the CB and XG classifiers, whereas F-measure decreases. This difference has an important interpretation, since data analysts may be more interested in the monetary amount of the detected fraud cases than in the number of predicted fraud cases.

For Tables 7 and 8, preprocessing starts with SMOTE for oversampling, and then the number of attributes is reduced using FS methods. Results indicate that the SU, GR, chi-squared, and SVM-AW methods lead to very similar results, while ReliefF mostly performs better in terms of recall, precision, and Gmean. Among the considered classifiers, the best results are achieved with RF. CB follows closely and also performs well, especially concerning F-measure. Furthermore, comparing the results with and without BMR, as anticipated, BMR increases the savings measure, while F-measure decreases compared with the model without BMR.

Table 8.

Classifier Results for Different Combinations of SMOTE + FS for the Credit Card Fraud Detection Data Set

	Without BMR				With BMR
	RF	LR	XG	CB	RF	LR	XG	CB
SMOTE + SU
Recall	0.923	0.903	0.908	0.981	0.905	0.806	0.947	0.9004
Precision	0.524	0.873	0.706	0.741	0.622	0.676	0.725	0.8715
Gmean	0.964	0.749	0.887	0.898	0.815	0.779	0.837	0.9188
F1	0.933	0.686	0.806	0.889	0.636	0.419	0.646	0.7488
Savings	0.458	0.143	0.281	0.377	0.654	0.496	0.561	0.871
SMOTE + GR
Recall	0.783	0.763	0.766	0.721	0.685	0.816	0.807	0.8409
Precision	0.454	0.663	0.646	0.611	0.552	0.656	0.545	0.7717
Gmean	0.774	0.689	0.777	0.738	0.795	0.699	0.697	0.7881
F1	0.713	0.686	0.696	0.689	0.596	0.289	0.466	0.7183
Savings	0.157	0.104	0.191	0.35	0.535	0.315	0.645	0.6853
SMOTE + Chi-squared
Recall	0.803	0.793	0.766	0.821	0.835	0.756	0.757	0.7905
Precision	0.354	0.693	0.666	0.761	0.482	0.556	0.595	0.6911
Gmean	0.724	0.689	0.827	0.718	0.785	0.709	0.687	0.7882
F1	0.773	0.596	0.736	0.829	0.546	0.409	0.556	0.6188
Savings	0.318	0.033	0.261	0.381	0.444	0.326	0.451	0.8311
SMOTE + ReliefF
Recall	0.873	0.743	0.876	0.931	0.865	0.766	0.887	0.8509
Precision	0.754	0.673	0.916	0.771	0.762	0.736	0.955	0.7815
Gmean	0.804	0.699	0.857	0.898	0.825	0.799	0.917	0.8583
F1	0.873	0.696	0.816	0.879	0.626	0.359	0.556	0.6285
Savings	0.428	0.133	0.211	0.357	0.554	0.536	0.611	0.7414
SMOTE + SVM-AW
Recall	0.765	0.825	0.933	0.845	0.965	0.943	0.845	0.9641
Precision	0.747	0.765	0.723	0.835	0.831	0.783	0.775	0.8855
Gmean	0.817	0.863	0.845	0.859	0.775	0.865	0.765	0.588
F1	0.745	0.683	0.763	0.765	0.528	0.2495	0.383	0.5036
Savings	0.183	0.013	0.306	0.081	0.455	0.346	0.577	0.6207

Values in bold indicate the algorithm with the best result for each evaluation metric.

Notably, based on the results shown so far, the results where preprocessing starts with FS are largely consistent and better than those obtained with the second preprocessing order. For the rest of our discussion, we focus on the SU FS method since it achieved the best performance compared with the other techniques.

To facilitate the comparison and investigate the impact of each method on the overall performance, we performed several experiments. Table 6 shows the results for other possible combinations, with and without FS, SMOTE, and BMR, evaluated using F-measure, AUC, and savings. Results suggest that without FS, SMOTE, or BMR, the CB classifier achieves the best results regarding savings, while RF performs best concerning F-measure and AUC. Without SMOTE and FS but with BMR, CB and RF still achieve good results, although we identify a decrease in F-measure against a considerable increase in savings, while AUC remains almost unchanged. In the case without FS, SMOTE boosts the results of all classifiers, especially RF, which improves considerably. Overall, CB and XG achieve the best results, on par with RF, when used with SMOTE.

In the case without SMOTE, we observe almost the same behavior as without FS. Nevertheless, the presence of SMOTE seems to be more beneficial for our classification results than using FS alone, as can be seen in the results for both data sets despite the noticeable difference in their number of features.

Since SMOTE is highly important for our classification optimization, Figure 4 illustrates the effect of using SMOTE with and without FS, as well as BMR, on the final results. Focusing on both the F-measure and savings metrics, we notice that using SMOTE generally increases both, which signifies detecting more fraud cases and saving larger transaction amounts. However, using BMR raises the savings metric at the expense of the F-measure, which indicates recovering higher amounts while identifying fewer fraud cases. Concerning the classifiers, LR was the worst for both savings and F-measure, while CB and RF show the best results for both.

FIG. 4.

Comparison of F1-score and savings with and without BMR. CB, CatBoost; XG, XGBoost.
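The trade-off between F-measure and savings can be reproduced on a toy example. The sketch below assumes that savings is the relative cost reduction with respect to using no model at all (every fraud missed), in the spirit of the cost-sensitive literature cited in Refs. 25 and 38; the exact baseline used in this study is defined outside this section. All transaction values, the administrative cost, and the function names are hypothetical.

```python
ADMIN_COST = 5.0  # hypothetical administrative cost per alert

# (is_fraud, amount, estimated fraud probability): toy values, not from the paper
DATA = [
    (1, 1000.0, 0.30), (1, 8.0, 0.55), (1, 6.0, 0.70),
    (0, 200.0, 0.05), (0, 50.0, 0.05), (0, 30.0, 0.10), (0, 400.0, 0.02),
]


def threshold_rule(p, amount):
    """Standard decision: flag as fraud when the probability reaches 0.5."""
    return 1 if p >= 0.5 else 0


def bmr_rule(p, amount):
    """Flag as fraud when the alert cost does not exceed the expected loss."""
    return 1 if ADMIN_COST <= amount * p else 0


def evaluate(rule):
    """Return (F1, savings) of a decision rule on the toy data."""
    tp = fp = fn = 0
    cost = 0.0
    for y, amount, p in DATA:
        if rule(p, amount) == 1:
            cost += ADMIN_COST  # every alert is investigated
            tp += y
            fp += 1 - y
        elif y == 1:
            cost += amount      # a missed fraud loses the full amount
            fn += 1
    f1 = 2 * tp / (2 * tp + fp + fn)
    base_cost = sum(a for y, a, _ in DATA if y == 1)  # no model: all frauds missed
    return f1, (base_cost - cost) / base_cost


f1_thr, sav_thr = evaluate(threshold_rule)
f1_bmr, sav_bmr = evaluate(bmr_rule)
print(f"threshold 0.5: F1={f1_thr:.3f} savings={sav_thr:.3f}")
print(f"BMR:           F1={f1_bmr:.3f} savings={sav_bmr:.3f}")
```

On this toy data the 0.5 threshold catches the two small frauds but misses the large one, so its F1 is higher while its savings are close to zero. The BMR rule does the opposite: it catches the large fraud, ignores the two small ones, and raises two inexpensive false alerts.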

Statistical analysis

To analyze our results statistically, this study used a t-test to determine whether the difference between the models' results is statistically significant.¹⁷ We consider two hypotheses to interpret the t-test results.

The null hypothesis (H0) is that the difference between the evaluation metrics in the results is not significant.

The alternative hypothesis (Ha) is that the difference between those metrics is significant.

The t-test showed that the null hypothesis could not be rejected in favor of the alternative hypothesis when comparing the RF and CB algorithms. However, when the test was performed between LR and RF, between LR and XG, and between LR and CB, it favored the alternative hypothesis, indicating that the differences in F-measure, AUC, and savings are statistically significant.
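The text does not state which t-test variant or which samples were used, so the following is only an illustration of the procedure, not a reproduction of the authors' computation. It assumes a paired t-test applied to the F-measure values without BMR from the five SMOTE + FS rows of Table 7, with a two-sided significance level of 0.05.

```python
import math
from statistics import mean, stdev

# F-measure without BMR for the five SMOTE + FS rows of Table 7.
F_RF = [0.943, 0.803, 0.783, 0.923, 0.805]
F_LR = [0.796, 0.666, 0.696, 0.766, 0.773]
F_CB = [0.979, 0.769, 0.839, 0.899, 0.795]

T_CRIT = 2.776  # two-sided critical value, alpha = 0.05, 4 degrees of freedom


def paired_t(a, b):
    """Paired t statistic for two equally long lists of scores."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


t_rf_lr = paired_t(F_RF, F_LR)
t_rf_cb = paired_t(F_RF, F_CB)
print(f"RF vs LR: t={t_rf_lr:.2f}, reject null: {abs(t_rf_lr) > T_CRIT}")
print(f"RF vs CB: t={t_rf_cb:.2f}, reject null: {abs(t_rf_cb) > T_CRIT}")
```

Under these assumptions the RF versus LR difference exceeds the critical value while the RF versus CB difference does not, which matches the direction of the conclusions reported above.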

Conclusion

This study analyzes the credit card fraud detection problem using FS to handle high dimensionality and resampling to handle imbalanced data. Several combinations of FS methods and the oversampling method SMOTE are investigated using two different preprocessing orders. Two well-known machine learning classifiers, RF and LR, are applied at the preprocessing level with two robust boosting classifiers, CB and XG, for the classification stage. In addition, BMR is used as an example-dependent cost-sensitive learner for optimization. The results are analyzed using the F1-score, AUC, and savings measures. In the experiments, we focus on the order of preprocessing and the FS technique to increase the efficacy of the proposed approach. Results indicate that SU is the best FS technique in terms of performance when it is applied before oversampling.

Second, the results are compared with and without FS, SMOTE, and BMR separately, and the best scenario is found to be the combination of SU + SMOTE + classifier concerning AUC and F-measure, particularly for the RF and CB classifiers. Generally, the use of SMOTE with SU boosts both the savings measure and F-measure. Moreover, using BMR increases the savings at the expense of the F-measure, which means recovering larger fraudulent amounts while detecting fewer fraud cases. As a result, the decision maker can prioritize saving a high fraudulent amount over the number of predicted fraud cases. In future work, we plan to further optimize the classification process using a modified cost-sensitive loss function during the training stage.

Footnotes

Author Disclosure Statement

No competing financial interests exist.

Funding Information

No funding was received for this article.

References

1. Nagrecha, Johnson, Chawla. Fraudbuster: Reducing fraud in an auto insurance market. Big Data. 2018; 6:3–12.

2. Rasheed, Jamil, Hameed, et al. Improving stock prediction accuracy using CNN and LSTM. In: 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI). Bahrain: IEEE, 2020. pp. 1–5.

3. Makki, Assaghir, Taher, et al. An experimental study with imbalanced classification approaches for credit card fraud detection. IEEE Access. 2019; 7:93010–93022.

4. Shen, Tong, Deng. Application of classification models on credit card fraud detection. In: 2007 International Conference on Service Systems and Service Management. China: IEEE, 2007. pp. 1–4.

5. Kelotra, Pandey. Stock market prediction using optimized deep-convlstm model. Big Data. 2020; 8:5–24.

6. Bolón-Canedo, Sánchez-Maroño, Alonso-Betanzos. Feature selection for high-dimensional data. Belgium: Springer, 2015.

7. Chen, Guestrin. Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016. pp. 785–794.

8. Dal Pozzolo. Adaptive machine learning for credit card fraud detection. Belgium: Universite libre de Bruxelles, 2015.

9. Khalid, Ashraf, Mehmood, et al. GbSVM: Sentiment classification from unstructured reviews using ensemble classifier. Appl Sci. 2020; 10:2788.

10. Mehmood, On B-W, Lee, et al. Spam comments prediction using stacking with ensemble learning. J Phys Conf Ser. 2017; 933:012012.

11. Ben Brahim, Limam. Ensemble feature selection for high dimensional data: A new method and a comparative study. Adv Data Anal Classif. 2018; 12:937–952.

12. Abdallah, Maarof, Zainal. Fraud detection system: A survey. J Netw Comput Appl. 2016; 68:90–113.

13. Chawla, Bowyer, Hall, et al. SMOTE: Synthetic minority over-sampling technique. J Artif Intell Res. 2002; 16:321–357.

14. Khoshgoftaar, Fazelpour, Dittman, et al. Classification performance of three approaches for combining data sampling and gene selection on bioinformatics data. In: Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014). USA: IEEE, 2014. pp. 315–321.

15. Ksieniewicz, Woźniak. Dealing with the task of imbalanced, multidimensional data classification using ensembles of exposers. In: First International Workshop on Learning with Imbalanced Domains: Theory and Applications. USA: PMLR, 2017. pp. 164–175.

16. Blagus, Lusa. Improved shrunken centroid classifiers for high-dimensional class-imbalanced data. BMC Bioinformatics. 2013; 14:1–13.

17. Omar, Rustam, Mehmood, et al. Minimizing the overlapping degree to improve class-imbalanced learning under sparse feature selection: Application to fraud detection. IEEE Access. 2021; 9:28101–28110.

18. Zhang, Liu, Zhang, et al. An up-to-date comparison of state-of-the-art classification algorithms. Expert Syst Appl. 2017; 82:128–150.

19. Mohammed, Wong K-W, Shiratuddin, et al. Scalable machine learning techniques for highly imbalanced credit card fraud detection: a comparative study. In: Pacific Rim International Conference on Artificial Intelligence. Cham: Springer, 2018. pp. 237–246.

20. Dhankhad, Mohammed, Far. Supervised machine learning algorithms for credit card fraudulent transaction detection: A comparative study. In: 2018 IEEE International Conference on Information Reuse and Integration (IRI). IEEE, 2018. pp. 122–125.

21. Tran, Tran, Huong, et al. Real time data-driven approaches for credit card fraud detection. In: Proceedings of the 2018 International Conference on E-business and Applications. USA: Association for Computing Machinery, 2018. pp. 6–9.

22. Pumsirirat, Yan. Credit card fraud detection using deep learning based on auto-encoder and restricted Boltzmann machine. Int J Adv Comput Sci Appl. 2018; 9:18–25.

23. Park, Luo, Parhi, et al. Seizure prediction with spectral power of EEG using cost-sensitive support vector machines. Epilepsia. 2011; 52:1761–1770.

24. Zhang, Ray, Priestley, Tan. A descriptive study of variable discretization and cost-sensitive logistic regression on imbalanced credit data. J Appl Stat. 2020; 47:568–581.

25. Bahnsen, Stojanovic, Aouada, et al. Cost sensitive credit card fraud detection using Bayes minimum risk. In: 2013 12th International Conference on Machine Learning and Applications, Vol. 1. USA: IEEE, 2013. pp. 333–338.

26. Devi, Biswas, Purkayastha. A cost-sensitive weighted random forest technique for credit card fraud detection. In: 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT). India: IEEE, 2019. pp. 1–6.

27. Sharifnia, Boostani. Instance-based cost-sensitive boosting. Intern J Pattern Recogn Artif Intell. 2020; 34:2050002.

28. Taha, Malebary. An intelligent approach to credit card fraud detection using an optimized light gradient boosting machine. IEEE Access. 2020; 8:25579–25587.

29. Hussein, Khairy, Najeeb SMM, et al. Credit card fraud detection using fuzzy rough nearest neighbor and sequential minimal optimization with logistic regression. Int J Interact Mob Technol. 2021; 15.

30. , Maulidevi, Surendro. SMOTE-LOF for noise identification in imbalanced data classification. J King Saud Univ Comput Inf Sci. 2021. [Epub ahead of print]; DOI: 10.1016/j.jksuci.2021.01.014.

31. Rtayli, Enneya. Enhanced credit card fraud detection based on SVM-recursive feature elimination and hyper-parameters optimization. J Inf Secur Appl. 2020; 55:102596.

32. Alhasan, Audah, Alabbas. Energy overhead evaluation of security trust models for IOT applications. Pakistan: Little Lion Scientific, 2020.

33. Shalev-Shwartz, Ben-David. Understanding machine learning: From theory to algorithms. United Kingdom: Cambridge University Press, 2014.

34. Dorogush, Ershov, Gulin. Catboost: Gradient boosting with categorical features support. USA: Cornell University, arXiv preprint arXiv:1810.11363, 2018.

35. Al-Khshali, Ilyas, Ucan. Effect of pe file header features on accuracy. In: 2020 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2020. pp. 1115–1120.

36. Dessì, Pes. Similarity of feature selection methods: An empirical study across data intensive classification tasks. Expert Syst Appl. 2015; 42:4632–4642.

37. Pes. Feature selection for high-dimensional data: The issue of stability. In: 2017 IEEE 26th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE). IEEE, 2017. pp. 170–175.

38. Bahnsen, Aouada, Stojanovic, et al. Feature engineering strategies for credit card fraud detection. Expert Syst Appl. 2016; 51:134–142.

39. Correa Bahnsen. Example-dependent cost-sensitive classification with applications in financial risk modeling and marketing analytics. PhD thesis, University of Luxembourg, Luxembourg, 2015.