Abstract
Imbalanced and high-dimensional data limit the full potential of data analysis, which makes both problems important research topics. Consequently, substantial research effort has been directed toward dimensionality reduction and resolving data imbalance, especially in the context of fraud detection. This work investigates the effectiveness of hybrid learning methods that alleviate class imbalance and integrate dimensionality reduction techniques. To this end, the study examines different classification combinations to achieve optimal savings and improve classification performance. Several well-known machine learning models are selected: logistic regression, random forest, CatBoost (CB), and XGBoost. These models are constructed and optimized based on Bayes minimum risk (BMR) in association with the synthetic minority oversampling technique (SMOTE) and different feature selection (FS) techniques, both univariate and multivariate. To investigate the performance of the proposed approach, different scenarios are analyzed: with and without balancing, with and without FS, and with and without optimization using BMR. The main finding is that BMR provides good optimization when used with SMOTE, symmetrical uncertainty for FS, and CB as a boosted classifier, principally in terms of F1 score and savings.
Introduction
Fraudulent activities have increased considerably during the past decade due to the wide use of online resources for financial transactions. Consequently, fraud detection has recently emerged as an important research area, and several studies have been proposed to address this problem.1 Fraudulent transactions are categorized into two groups: online fraud and offline fraud. The latter involves using stolen physical cards to obtain financial gain, while the former is performed by stealing people's digital identities, such as credit card numbers, user identities (IDs) and passwords, and security certificates for online transactions.2
Because of the increase in online fraudulent activities, efficient and effective fraud detection systems should be put in place. A good fraud detection system must be capable of accurately recognizing fraudulent transactions and of doing so in real time. To achieve this goal, several approaches have been presented to identify fraudulent transactions using machine learning and deep learning models.3,4 However, the classification of fraudulent transactions with highly imbalanced classes remains a challenging task.5 In fact, the number of normal transactions (samples of the normal class) is much larger than the number of fraudulent transactions (samples of the abnormal class). The samples for minority and majority classes are separated when a classification model is applied to the sample for training. Although a small difference in the number of samples between classes would not affect the model's performance, large differences often have a substantial impact on the evaluation metrics.
This is because machine learning models are biased toward the samples of the majority class and perform better on the majority class than on the minority class. The class imbalance has typically been handled in the literature as an independent problem, especially in application fields where the number of features is more critical. Besides data imbalance, the curse of dimensionality is considered one of the most critical issues in fraud data analysis. Using a high number of features for classification problems may negatively impact the performance of machine learning algorithms, not only in terms of computational efficiency but also in terms of the final predictive accuracy, because the generalization ability of the induced models may degrade significantly when the size of the search space is large.6
The studies7,8 investigate the strengths and weaknesses of the different selection approaches in fraud detection. Indeed, hybrid methods that leverage different heuristics at different stages of the feature selection (FS) process have shown better performance than single models.9,10 For example, support vector machine (SVM)-based correction classifier and SVM-THR (SVM with threshold adjustment) ensemble approaches provide unique and stable features without ignoring the aspect of predictive accuracy. 11
For fraud detection, many approaches have been proposed to deal with imbalanced data sets, such as sampling-based balancing methods and cost-sensitive classification.12 The sampling-based methods adjust the distribution of the samples so that the minority and majority classes contain a similar number of samples. Common sampling-based methods are random undersampling and random oversampling, which decrease the instances of the majority class and increase the instances of the minority class, respectively. For example, the synthetic minority oversampling technique (SMOTE) generates synthetic samples of the minority class to equalize the sample distribution.13 However, several research efforts have explored the issues of high dimensionality and class imbalance independently, and only a few studies have considered both problems simultaneously.14–16
Keeping in view the fact that several fraud data sets are both high-dimensional and class-imbalanced, this study investigates the effectiveness of learning techniques designed to handle both issues. By extending our previous research,17 we explore the combination of a sampling-based balancing method and cost-sensitive classification with suitable FS techniques, selected to be representative of different selection approaches, both univariate and multivariate. Using two challenging fraud data sets, we experimentally evaluate the extent to which the resulting learning schemes are advantageous. In this study, we focus on optimizing the classification performance of class-imbalanced learning under high dimensionality. Based on cost-sensitive learning combined with FS and sampling methods, we can reduce the impact of high dimensionality and imbalanced data and establish a good classification. The key points of this study are as follows:
Proposing a strategy that optimizes the classification of imbalanced data sets under high-dimensionality difficulty. For this purpose, four machine learning models have been utilized: logistic regression (LR), random forest (RF), CatBoost (CB), and XGBoost (XG).
Testing the model performance based on the preprocessing order, where sampling comes before FS and vice versa. In this regard, two well-known data sets from Kaggle that present imbalanced data with high-dimensionality difficulties have been selected.
Comparing the results of different combinations of FS methods and resampling.
Integrating Bayes minimum risk (BMR) to optimize the classification performance of the best combination.
The rest of the article is organized as follows. The Related Work section briefly presents the research works related to the current study. The Materials and Methods section introduces the data sets and the methods included in our study, involving the considered FS techniques and their combination with strategies for reducing the class imbalance. The proposed approach is presented in the Proposed Method section. The Results and Discussion section reports the findings of this work by discussing and analyzing the results. The Conclusion section offers the concluding remarks.
Related Work
The importance of studying imbalanced learning that coexists with high-dimensionality problems encourages researchers to look for potential solutions by adapting existing methods and developing novel approaches. In Ref. 18, the researchers select 11 distinct classifiers and test them on 71 data sets to analyze their performance. The study concludes that gradient boosting decision trees perform better and faster than SVM and RF. The authors in Ref. 19 perform fraud detection using RF, balanced bagging ensemble, and naive Bayes and find that the bagging learner gives the best prediction; yet, RF shows high adaptability for the high-dimensional data set. Similarly, Ref. 20 proposes a hybrid classifier combining several classifiers and compares its performance with RF and XG using traditional measures.
Predominantly, studies compare classifiers' performance without balancing or FS and do not consider a cost-sensitive approach. However, several works treated the task as an unsupervised problem and introduced solutions accordingly. In Ref. 21, the authors used anomaly detection techniques: one-class classification based on SVM and multivariate control charts applied to real-world data produced high accuracy and a low false-positive rate. In Ref. 22, autoencoders are used with restricted Boltzmann machines to deal with fraud detection problems.
More recently, the problem has been treated as cost-sensitive, with a focus on the point at which cost sensitivity is applied, as shown in Figure 1. In Ref. 23, however, the focus was on improving the SVM algorithm to handle the class-imbalance problem using cost-sensitive learning; the study shows that the cost-sensitive SVM gives better results for almost all the data sets used, based on the area under the curve (AUC). In Ref. 24, a new approach named cost-sensitive LR has been introduced. Model bias is reduced by selecting the appropriate variable discretization and cost-sensitive LR with the best class weights. Combining multiple models also shows good performance, as in Ref. 25, which combines the BMR wrapping method with LR, RF, and decision tree (DT) classifiers for the same task, using undersampling instead of SMOTE to balance the data.

Figure 1. Different cost-sensitive algorithms are grouped according to their system level of use.
Similarly, in Ref. 26, the researchers suggested a cost-sensitive RF-based ensemble learning technique, and the results show that the algorithm outperforms two existing cost-sensitive implementations of RF. In Ref. 27, the authors altered boosting classifiers into cost-sensitive versions by including the cost of misclassification during the training stage. Moreover, in terms of evaluation measures, the researchers obtained better results using only F-score and cost as measures.
In Taha and Malebary,28 an optimized light gradient boosting machine (GBM) is used, where Bayesian-based hyperparameter optimization is applied to tune the parameters of the light GBM. The model outperformed other models in terms of accuracy and F-measure. The article in Ref. 29 presented the combination of multiple classifiers through a stacking ensemble technique for credit card fraud detection. Using the fuzzy-rough nearest neighbor and sequential minimal optimization, the simulation results, compared with seven other algorithms, showed that the ensemble model can detect credit card fraud with a sufficiently high accuracy of 84.90%. In addition, improving the resampling technique could further enhance the results of imbalanced learning, as in Ref. 30, where the authors introduced the “SMOTE-local outlier factor” technique to identify noise in the generated synthetic minority data. Results show better Gmean and F-measure compared with standard SMOTE.
At the same time, several studies prefer to combine a feature elimination method with SMOTE to enhance model performance. By exploiting this combination, the study in Ref. 31 proposed a hybrid approach that combines recursive feature elimination, to reduce the number of features, with SMOTE for oversampling. The model achieved higher F-measure and AUC than traditional machine learning classifiers.
Table 1 provides a comprehensive summary of the discussed related works.
Table 1. A Comprehensive Summary of the Discussed Research Works
AUC, area under the curve; BBE, Balanced Bagging Ensemble; BMR, Bayes minimum risk; FP, number of false positives; LR, logistic regression; NB, naive Bayes; RF, random forest; SMOTE, synthetic minority oversampling technique; SVM, support vector machine; XGB, eXtreme Gradient Boosting.
This article compares the example-dependent, cost-sensitive BMR wrapping method for algorithms LR, RF, CB, and XG, combined with the FS technique (ReliefF, symmetrical uncertainty [SU], gain ratio [GR], chi-squared, and A Weighted Support Vector Machine [SVM-AW]) and SMOTE in two different preprocessing orders. Then we use traditional metrics for evaluation of these algorithms in addition to AUC and savings measures.
Materials and Methods
Data sets used for experiments
To evaluate the effectiveness and performance of the proposed approach, extensive experiments are performed on two imbalanced data sets with high-dimensionality characteristics. Table 2 summarizes the selected data sets, listing the number of instances, the number of features, the number of majority and minority instances, and the imbalance ratio of each data set.
Table 2. Benchmark Data Sets
IEEE-CIS, IEEE Computational Intelligence Society.
To create independence between transactions, we deleted the “ID” of the credit card for each transaction, and so, two transactions cannot be linked to the same credit card. Furthermore, we scaled both attributes, “Time” and “Amount,” to have the same opportunity during the classification as the other features do.
Machine learning models and techniques
Machine learning models have been widely applied to obtain highly accurate predictions.32,2 Owing to the results reported in the literature, several well-known models are investigated in this study.
Logistic regression
LR is a statistical method that analyzes the relationship between multiple independent variables and a categorical dependent variable. It is a simple and efficient method for binary and linear classification problems. This model measures the probability of occurrence of an event by fitting the data to a logistic curve.24 A logistic function is a common “S”-shaped or sigmoid curve, where the vertical axis refers to the probability for a given classification and the horizontal axis indicates the value of x. It takes the form
$$f(x) = \frac{L}{1 + e^{-m(x - v_0)}}$$
where e is the base of the natural logarithm (also known as Euler's number), v_0 is the x-value of the sigmoid midpoint, L is the curve's maximum value, and m is the steepness of the curve.
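As an illustration, the logistic curve with maximum value L, steepness m, and midpoint v0 can be sketched in a few lines of Python. The function name and the default parameter values are illustrative choices, not settings taken from this study.

```python
import math

def logistic(x, L=1.0, m=1.0, v0=0.0):
    """General logistic curve: L / (1 + e^(-m * (x - v0)))."""
    return L / (1.0 + math.exp(-m * (x - v0)))

# With L = 1 the output can be read as a class probability.
p_mid = logistic(0.0)   # at the midpoint v0 the curve equals L / 2
p_far = logistic(6.0)   # far to the right of v0 the curve approaches L
print(p_mid, p_far)
```

Setting L = 1 restricts the output to the interval (0, 1), which is the form used when the curve is interpreted as the probability of the positive class.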
Random forest
The RF classifier is an ensemble model for classification based on the bagging method. RF involves a combination of decision tree classifiers, where each decision tree contributes a single vote to the assignment of the most frequently targeted class for the input vector (w):
where
The final prediction is made by voting between the RF trees based on the majority class predictions. The combination of many DTs indicates that RF has some exceptional characteristics, and thus, RF differs significantly from traditional tree-based algorithms.33
We can also define RF as the simplest case of majority voting. For example, let 0, 0, and 1 be the predictions from three RF trees. In this case, the final prediction is 0 because the majority of the trees predict class 0.
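The voting step in this example can be sketched as follows. This is a minimal illustration of majority voting over per-tree predictions, not the full RF training procedure; the function name is hypothetical.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    counts = Counter(tree_predictions)
    # most_common(1) returns [(label, count)] for the most frequent label
    return counts.most_common(1)[0][0]

final = majority_vote([0, 0, 1])
print(final)
```

In the case of a tie, this sketch returns the label that was encountered first; a real implementation would need an explicit tie-breaking rule.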
XGBoost
XGBoost (XG), short for eXtreme Gradient Boosting, is a boosting method for classification tasks. XG approaches the process of sequential tree building using a parallelized implementation of DT. The first DT classifier is fitted on the data set; the following DT is then trained on the errors of the first classifier and added to it, and so on. This sequential coupling of classifiers helps to reduce errors and boost classification accuracy. It handles sparse data and proposes a theoretically justified weighted quantile sketch for approximate learning.7 XG defines a mean square error (MSE) and minimizes it as in
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(v_i - \hat{v}_i\right)^2$$
where v_i is the ith target value, and \hat{v}_i is the corresponding predicted value over the n training instances.
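The idea of fitting each new learner to the errors of the current ensemble can be illustrated with a toy example. The sketch below is not the XGBoost algorithm (which adds regularization, second-order gradient information, and tree-level optimizations); it only shows sequential residual fitting with one-split stumps on a single feature, and every name and data value in it is hypothetical.

```python
def mse(targets, predictions):
    """Mean square error between target values and predictions."""
    return sum((v - p) ** 2 for v, p in zip(targets, predictions)) / len(targets)

def fit_stump(xs, residuals):
    """Best single-threshold split on one feature, judged by squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, targets, n_rounds=5, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    predictions = [sum(targets) / len(targets)] * len(targets)
    history = [mse(targets, predictions)]
    for _ in range(n_rounds):
        residuals = [v - p for v, p in zip(targets, predictions)]
        stump = fit_stump(xs, residuals)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
        history.append(mse(targets, predictions))
    return predictions, history

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # toy feature values
targets = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]   # toy target values
_, history = boost(xs, targets)
print(history)  # the MSE shrinks after every boosting round on this toy data
```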
CatBoost
CB is a boosting method based on ensemble learning that focuses on categorical columns using permutation methods, following these steps:
Dividing the records into subsets randomly.
Converting the labels to integer numbers.
Transforming the categorical features to numerical features, using Equation (6):
$$\text{avg\_target} = \frac{\text{countInClass} + \text{prior}}{\text{totalCount} + 1} \qquad (6)$$
where countInClass is the number of instances in the target for a given categorical feature, totalCount is the number of previous objects, and prior is identified by the starting parameters.34
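To make the categorical-to-numerical transformation concrete, the following sketch encodes one categorical column with a statistic of the form (countInClass + prior) / (totalCount + 1), using only the objects that precede each row in a random permutation. It is a simplified illustration under stated assumptions (binary 0/1 labels, a single permutation, a hypothetical prior of 0.5) and not the CatBoost library's implementation.

```python
import random

def ordered_target_statistic(categories, labels, prior=0.5, seed=0):
    """Encode a categorical column from the objects seen earlier in a random order."""
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)   # random permutation of the records
    count_in_class = {}                  # positives seen so far, per category value
    total_count = {}                     # objects seen so far, per category value
    encoded = [0.0] * len(categories)
    for idx in order:
        cat = categories[idx]
        seen_pos = count_in_class.get(cat, 0)
        seen_all = total_count.get(cat, 0)
        encoded[idx] = (seen_pos + prior) / (seen_all + 1)
        # update the running counts only after encoding the current row
        count_in_class[cat] = seen_pos + labels[idx]
        total_count[cat] = seen_all + 1
    return encoded

categories = ["a", "b", "a", "a"]   # toy categorical feature
labels = [1, 0, 1, 0]               # toy binary target
encoded = ordered_target_statistic(categories, labels)
print(encoded)
```

Because each row is encoded before its own label is counted, the encoding of a row never depends on that row's target value, which is the motivation for processing the records in a permuted order.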
FS techniques
A very common preprocessing step in high-dimensional data analysis, when a subset of meaningful features is required for a descriptive/predictive model, is to use a ranking-based selection technique combined with a desirable threshold t,35 so that only the t top-ranked features are selected. If needed, the resulting feature subset may be further refined through more sophisticated search methods that are utilized for dimensionality reduction.
The FS process can be executed using different ranking methods. This study adopts five techniques that are representative of quite different heuristics. In particular, we considered three univariate methods (SU, GR, and chi-squared), which evaluate the relevance of each feature independently from the others, and two multivariate methods (ReliefF and SVM-AW), which consider the interdependencies among the features. More details about these techniques, along with a discussion of their pattern of agreement, can be found in Ref. 36. These approaches are briefly described below for completeness.
SU and GR rely on the information gain concept, which is a measure of the extent to which the class entropy decreases when the value of a given feature is known. Nevertheless, SU and GR differ in the way they compensate for the information gain's bias toward features with more values.
Chi-squared evaluates each feature by measuring its chi-squared statistic concerning the class: the larger the chi-squared, the higher the relevance of the feature for the task at hand.
ReliefF orders the features based on their ability to differentiate between data instances that are near to each other in the attribute space.
SVM-AW exploits a linear SVM classifier that has an embedded competence of assigning a weight to each feature based on the deduced hyperplane function. The absolute value of this weight (AW) is used to rank the features. Although the iterative variant of this method is proposed as a good option for fraud analysis, it is not used due to its poor stability. 37
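As a minimal sketch of the two entropy-based rankers, the code below computes information gain, GR, and SU for discrete features and then keeps the t top-ranked features. It assumes the commonly used definitions GR = IG / H(feature) and SU = 2 IG / (H(feature) + H(class)); continuous features would first need to be discretized, and the feature names and data are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature, target):
    """IG = H(target) - H(target | feature) for discrete values."""
    n = len(target)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [t for f, t in zip(feature, target) if f == value]
        conditional += (count / n) * entropy(subset)
    return entropy(target) - conditional

def gain_ratio(feature, target):
    """IG normalized by the entropy of the feature itself."""
    split_info = entropy(feature)
    return information_gain(feature, target) / split_info if split_info > 0 else 0.0

def symmetrical_uncertainty(feature, target):
    """IG normalized by the sum of the feature and class entropies."""
    denom = entropy(feature) + entropy(target)
    return 2.0 * information_gain(feature, target) / denom if denom > 0 else 0.0

def top_ranked(features, target, t, score=symmetrical_uncertainty):
    """Return the names of the t features with the highest score."""
    ranked = sorted(features, key=lambda name: score(features[name], target), reverse=True)
    return ranked[:t]

target = [0, 0, 1, 1]
features = {
    "perfect": ["a", "a", "b", "b"],  # fully determines the class
    "noise": ["a", "b", "a", "b"],    # carries no information about the class
}
print(top_ranked(features, target, t=1))
```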
Synthetic minority oversampling technique
The study in Ref. 13 developed a method of creating synthetic instances instead of merely copying existing instances in the data set. This technique is known as SMOTE. In the SMOTE algorithm, the training set is modified by adding synthetically generated minority class instances, which balances the class distribution and thus addresses the imbalanced data problem. A parameter is given to the SMOTE algorithm to set a threshold value for the synthetic data needed to balance the majority and minority classes. In general, the procedure of this algorithm can be summarized by the steps below:
For each observation x of the minority class, identify its k nearest neighbors.
Randomly select a few of these neighbors (the number depends on the rate of oversampling).
Generate artificial observations along the line joining the original observation x to each selected neighbor.
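The three steps above can be sketched in plain Python as follows. This is a simplified illustration (Euclidean distance, numeric features only, a fixed number of synthetic points per minority observation), not the reference SMOTE implementation; the parameter names per_sample, k, and seed are hypothetical.

```python
import math
import random

def smote(minority, per_sample=1, k=5, seed=0):
    """Create synthetic minority points by interpolating toward nearest neighbors."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)   # cannot use more neighbors than are available
    synthetic = []
    for i, x in enumerate(minority):
        # Step 1: k nearest minority neighbors of x (excluding x itself)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(per_sample):
            # Step 2: pick one of the neighbors at random
            neighbor = rng.choice(neighbors)
            # Step 3: place a new point on the segment between x and the neighbor
            gap = rng.random()
            synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy minority class
new_points = smote(minority, per_sample=2, k=2)
print(len(new_points))
```

Because every synthetic point is a convex combination of two existing minority observations, the generated data stay inside the region already occupied by the minority class.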
Performance evaluation metrics
In credit card fraud detection, we have a binary classification objective where imbalanced learning occurs with high-dimensionality difficulties. The performance of the selected techniques was evaluated in terms of classification metrics relevant to fraud detection. In this study, the evaluation measures used are accuracy, precision, recall, F1, AUC, and savings. The confusion-matrix-based metrics can be defined as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
where TP, TN, FP, and FN represent the number of true positives, the number of true negatives, the number of false positives, and the number of false negatives, respectively.
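For reference, the confusion-matrix-based metrics can be computed directly from the four counts, using the standard definitions. The counts in the example are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for an imbalanced problem: few frauds, many normal transactions
metrics = classification_metrics(tp=80, tn=9900, fp=20, fn=20)
print(metrics)
```

With these counts the accuracy is above 0.99 even though one in five frauds is missed, which illustrates why accuracy alone is a weak indicator on imbalanced data.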
Although these evaluation metrics derived from the confusion matrix are very common, they assume that all misclassification errors carry the same cost, so they may not be the most appropriate for evaluation in our study. Hence, the example-dependent cost matrix proposed in Ref. 38 is more beneficial for assessment. Based on the matrix in Figure 2, we introduce the cost and savings, respectively, as follows:

Figure 2. Example-dependent cost-sensitive matrix.
where
N is the number of transactions in a selected set, f0 is the classifier that predicts all transactions in a set S as belonging to class 0, f1 is the classifier that predicts all transactions in a set S as belonging to class 1, Ca is the administrative cost, while
For analysis, we adopt the savings measure instead of the normalized cost38 because, in the credit card fraud detection field, companies would not adopt a predictive model that is assessed using only traditional evaluation measures (recall, precision, Gmean, F-measure). Therefore, the savings measure makes more sense for this application.
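A minimal sketch of how the cost and savings measures could be computed is given below. It assumes a cost matrix in which every transaction flagged as fraud (true or false positive) costs the administrative cost Ca, every missed fraud costs the transaction amount, and correctly accepted transactions cost nothing; this layout is an assumption about Figure 2, and the function names and data are hypothetical. Savings is taken as the relative cost reduction with respect to the cheaper of the two trivial classifiers f0 (predict all class 0) and f1 (predict all class 1).

```python
def total_cost(y_true, y_pred, amounts, admin_cost):
    """Total cost of a set of predictions under the assumed cost matrix."""
    cost = 0.0
    for y, c, amount in zip(y_true, y_pred, amounts):
        if c == 1:
            cost += admin_cost   # every alert is investigated (TP and FP)
        elif y == 1:
            cost += amount       # a missed fraud loses the transaction amount
    return cost

def savings(y_true, y_pred, amounts, admin_cost):
    """Relative cost reduction compared with the best trivial classifier."""
    n = len(y_true)
    cost_f0 = total_cost(y_true, [0] * n, amounts, admin_cost)  # flag nothing
    cost_f1 = total_cost(y_true, [1] * n, amounts, admin_cost)  # flag everything
    baseline = min(cost_f0, cost_f1)
    if baseline == 0:
        return 0.0
    return (baseline - total_cost(y_true, y_pred, amounts, admin_cost)) / baseline

# Hypothetical set of 10 transactions with two frauds (amounts 400 and 600)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
amounts = [50.0] * 8 + [400.0, 600.0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]   # one false alarm, one fraud caught
result = savings(y_true, y_pred, amounts, admin_cost=150.0)
print(result)
```

In this toy case the model costs 700 (two investigations plus one missed fraud of 400) against a baseline of 1000, so a positive savings value means the model is cheaper than either trivial policy.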
To proceed, we first evaluate the classifiers (LR, RF, XG, CB) with the selected evaluation metrics; the results are shown in Tables 4, 5, 7, and 8. We then focus on the savings, F-measure, and AUC measures, which are more useful for our problem. In Table 6, we report the results of the classifiers using BMR, with and without SMOTE and FS, using the savings, F-measure, and AUC measures. As an evaluation protocol, parameter tuning was implemented by fivefold cross-validation, and the parameters in Table 3 were selected based on the best achieved AUC.
Table 3. Algorithm Settings
CB, CatBoost; XG, XGBoost.
Table 4. Classifier Results for Different Combinations of FS + SMOTE for the IEEE CIS Data Set
Values in bold indicate the best result for each evaluation metric.
GR, gain ratio; SU, symmetrical uncertainty; SVM-AW, A Weighted Support Vector Machine.
Table 5. Classifier Results for Different Combinations of FS + SMOTE for the Credit Card Fraud Detection Data Set
Values in bold indicate the best result for each evaluation metric.
Table 6. Classification Results of Different Combinations With and Without Synthetic Minority Oversampling Technique, Feature Selection, and Bayes Minimum Risk
Values in bold indicate the best result for each evaluation metric.
FS, feature selection.
Table 7. Classifier Results for Different Combinations of SMOTE + FS for the IEEE CIS Data Set
Values in bold indicate the best result for each evaluation metric.
Proposed Method
Most classification algorithms focus primarily on accuracy and therefore do not differentiate between the costs of different classification outcomes.
Nevertheless, in some real-world problems, this methodology is insufficient and risky. For instance, in fraud detection, classifying a nonfraudulent transaction as fraudulent results in a cost, known as “The Administrative Cost,” which is linked to the analysis of the transaction by human investigators and to contacting the cardholders. Conversely, classifying a fraudulent transaction as nonfraudulent involves a cost that may differ from one transaction to another. The literature on cost-sensitive learning distinguishes between class-dependent problems, where the cost of misclassification is linked to the class (fraud or not fraud), and example-dependent problems, where the cost of misclassification is related to each example.39 The class-dependent setting is highly restrictive, as it assumes that the costs are constant within the same class, which is not realistic in fraud detection.
Our study aims to examine the effectiveness of different proposed models and to demonstrate that including the example-dependent cost could improve the classification results. In addition, to deal with imbalanced learning under the high-dimensionality issue, we combine different techniques, including oversampling and FS techniques. Algorithm 1 represents the steps followed in the proposed approach.
In the preprocessing stage, the models use SMOTE for oversampling in combination with FS methods. Considering a binary classification setting, the evaluation is implemented in two main ways:
Resampling with FS learning schemes: SMOTE + FS. First, resample the original data to reduce the level of class imbalance to a prespecified ratio R, which means having R instances of the majority class for each instance of the minority class. This is achieved by adding a proper number of synthetic instances of the minority class using SMOTE. FS is then carried out on the resampled data before executing the classifier selected for the study.
FS with resampling learning schemes: FS + SMOTE. We first select a subset of meaningful features from the original data set and then perform data sampling. As a final step, the classifier is built.
During training, we apply different classifiers such as RF, LR, XG, and CB. Then, we select BMR as our cost-sensitive wrapping technique and use machine learning models based on the estimated probability and the cost matrix. The BMR method takes the estimated probability produced by the trained classifier and calculates the risk of predicting each class, then chooses the class with the minimum risk (see Fig. 2). From Ref. 25, we can define the risk of the fraud and nonfraud classes, respectively, as given in the equations below:
where
So, the transaction will be predicted as fraud if:
In another way, the transaction is considered a fraud, according to Figure 2 when:
where Ca is the administrative cost.
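The decision rule can be illustrated with a short sketch. It assumes the same cost layout as above (Ca for any transaction predicted as fraud, the transaction amount for a missed fraud, zero for a correctly accepted transaction), which is an assumption about Figure 2 rather than a reproduction of it. Under this assumption, the risk of predicting fraud is Ca, the risk of predicting nonfraud is the amount multiplied by the estimated fraud probability, and the rule reduces to flagging a transaction when its fraud probability is at least Ca divided by the amount. The function name and inputs are hypothetical.

```python
def bmr_predict(prob_fraud, amount, admin_cost):
    """Predict the class with the minimum expected cost for one transaction."""
    # Risk of predicting fraud: Ca * p + Ca * (1 - p) = Ca
    risk_fraud = admin_cost
    # Risk of predicting nonfraud: amount * p + 0 * (1 - p)
    risk_nonfraud = amount * prob_fraud
    return 1 if risk_fraud <= risk_nonfraud else 0

# A large transaction is flagged even at a modest fraud probability ...
print(bmr_predict(prob_fraud=0.2, amount=1000.0, admin_cost=50.0))
# ... while a very small one is not worth investigating even at a high probability.
print(bmr_predict(prob_fraud=0.9, amount=10.0, admin_cost=50.0))
```

This example shows why BMR can raise savings while lowering the F-measure: the rule deliberately ignores likely frauds whose amount is smaller than the cost of investigating them.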
For an easier understanding, Figure 3 illustrates the main steps to optimize imbalanced data classification under high-dimensionality difficulties. For each selected data set, and each preprocessing order, we choose one from the five FS techniques combined with SMOTE. Then we apply the RF/LR/CB/XG classifier on the obtained data set; then, we opt for an optimization using BMR. The results of different scenarios are introduced in the Analysis of Results section.

Figure 3. Architecture of the proposed methodology. AUC, area under the curve; BMR, Bayes minimum risk; GR, gain ratio; LR, logistic regression; RF, random forest; SMOTE, synthetic minority oversampling technique; SU, symmetrical uncertainty; SVM-AW, A Weighted Support Vector Machine.
This study used different methods to solve the imbalanced data set classification problem under the high-dimensionality issue. The implementation details of all the used methods with their hyperparameters are shown in Table 3.
Results and Discussion
As mentioned before, the main challenge in our study is imbalanced data under high dimensionality. In this section, we focus on the performance of different combinations of our selected classifiers with SMOTE/FS/BMR. Based on recall, precision, Gmean, F-measure, AUC, and savings, we investigate the best combination and discuss the advantage of each method and its effect on the optimization of the classification.
Analysis of results
For the results shown in Tables 4 and 5, the preprocessing starts with reducing the number of attributes using FS methods, and then SMOTE is applied. Focusing on FS methods, results generally show a strong performance of SU compared with the other methods across the evaluation metrics, while the ReliefF method takes second place. However, based on F-measure, a large difference between the classifiers' outputs has been observed: LR has poorer results than XG, while RF and CB consistently show better performance. Moreover, comparing the results with and without BMR, we observe that the savings measure increases significantly with BMR, especially with the CB and XG classifiers, whereas F-measure decreases. This difference has an important interpretation, since data analysts could be more interested in the monetary amount of the discovered fraud than in the number of predicted fraud cases.
For Tables 7 and 8, preprocessing starts with SMOTE for oversampling, then it reduces the number of attributes using FS methods. Results indicate that SU, GR, chi-squared, and SVM-AW methods lead to very similar results, while ReliefF performs mostly better in terms of recall, precision, and Gmean. Extending the analysis to the considered classifiers, the best results have been achieved with the RF classifier. With a slight difference, CB also shows a good performance, especially concerning F-measure. Furthermore, comparing both results with and without BMR, as anticipated, the BMR helps the model to increase the savings measure, while it keeps F-measure decreasing compared with the model without BMR.
Table 8. Classifier Results for Different Combinations of SMOTE + FS for the Credit Card Fraud Detection Data Set
Values in bold indicate the best result for each evaluation metric.
Notably, based on the results shown so far, the results obtained when preprocessing starts with FS are largely consistent and better than those obtained with the second preprocessing order. For the rest of our discussion, we focus on the SU FS method since it achieved the best performance compared with the other techniques.
To facilitate the comparison and investigate the impact of each method on the overall performance, we performed several experiments. Table 6 shows the results for other possible combinations, with and without FS, SMOTE, and BMR, and evaluates the outputs using F-measure, AUC, and savings. Results suggest that without FS, SMOTE, or BMR, the CB classifier achieved the best results regarding savings, while RF performed best in terms of F-measure and AUC. Without SMOTE and FS but with BMR, CB and RF still achieved good results, although we observed a decrease in F-measure against a considerable increase in savings, while AUC remained almost unchanged. In the case of not using FS, SMOTE boosted the performance of all classifiers, especially RF, which improved considerably. Altogether, CB and XG, like RF, achieve the best results when used with SMOTE.
On the contrary, in the case of not using SMOTE, we observe almost the same behavior as not using FS. Nevertheless, the presence of SMOTE seems to be more beneficial for our classification results than using only FS, as we can see for both data sets' results despite the noticeable difference in terms of the number of features.
Since SMOTE plays an important role in our classification optimization, Figure 4 illustrates the effect of using SMOTE with and without FS, as well as BMR, on the final results. Focusing on both F-measure and savings metrics, we notice that using SMOTE generally increases both the savings and the F-measure, which signifies detecting more fraud cases and saving more transaction amounts. However, using BMR increases the savings metric at the expense of the F-measure, which indicates detecting higher amounts while identifying fewer fraud cases. Concerning the classifiers, LR was the worst for both savings and F-measure, while CB and RF show the best results for both.

Figure 4. Comparison of F1-score and savings with and without BMR. CB, CatBoost; XG, XGBoost.
Statistical analysis
To analyze our results statistically, this study used a t-test to check whether the difference between the models' results is statistically significant.17 Thus, we consider two hypotheses to illustrate the t-test results.
The null hypothesis (H0) is that the difference between the evaluation metrics in the results is not significant.
The alternative hypothesis (Ha) is that the difference between those metrics is significant.
The t-test showed that the null hypothesis could not be rejected in favor of the alternative hypothesis for the RF and CB algorithms. However, when the test was performed between LR and RF, between LR and XG, and between LR and CB, it favored the alternative hypothesis, indicating that the difference in terms of F-measure, AUC, and savings is statistically significant.
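As an illustration of this kind of comparison, the sketch below computes a paired t statistic on per-fold scores of two models and compares it with the two-sided critical value for four degrees of freedom at the 5% level (2.776). The choice of a paired test, the fivefold setting, and all scores are assumptions made for the example; the text does not state which t-test variant was used, and the numbers are not results from this study.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """t statistic for the mean of the paired differences between two models."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    # stdev() is the sample standard deviation (n - 1 in the denominator);
    # it is zero when all differences are identical, which would divide by zero.
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

T_CRITICAL_DF4 = 2.776  # two-sided 5% critical value for 4 degrees of freedom

# Hypothetical F-measure values over five folds
model_a = [0.90, 0.91, 0.89, 0.92, 0.90]
model_b = [0.80, 0.82, 0.79, 0.81, 0.80]   # clearly lower than model_a
model_c = [0.90, 0.90, 0.90, 0.91, 0.90]   # very close to model_a

t_ab = paired_t_statistic(model_a, model_b)
t_ac = paired_t_statistic(model_a, model_c)
print(abs(t_ab) > T_CRITICAL_DF4)  # reject H0: the difference is significant
print(abs(t_ac) > T_CRITICAL_DF4)  # fail to reject H0
```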
Conclusion
This study analyzes the credit card fraud detection problem using FS to address high dimensionality and resampling to handle imbalanced data. Several combinations of FS and the oversampling method SMOTE are investigated using two different preprocessing orders. Two well-known machine learning classifiers, RF and LR, are applied at the preprocessing level, with two robust boosting classifiers, CB and XG, for the classification stage. In addition, BMR is used as an example-dependent cost-sensitive learner for optimization. The results are analyzed using F1-score, AUC, and savings measures. The experiments focus on the order of preprocessing and the FS technique to increase the efficacy of the proposed approach. Results indicate that SU is the best-performing FS technique when it is used before the oversampling.
Second, the results are compared with and without FS, SMOTE, and BMR separately, and it is found that the best scenario is the combination of SU + SMOTE + classifier concerning AUC and F-measure, particularly for the RF and CB classifiers. Generally, the use of SMOTE with SU boosts both the savings measure and F-measure. Moreover, using BMR increases the savings at the expense of the F-measure, which means recovering a higher fraudulent amount while detecting fewer fraud cases. As a result, the decision maker can prioritize saving a high fraudulent amount over the number of predicted fraud cases. In future work, we plan to further optimize the classification process using a modified loss function of cost sensitivity during the training stage.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
Funding Information
No funding was received for this article.
