BAT algorithm based feature selection: Application in credit scoring

Abstract

Credit scoring plays a vital role for financial institutions in estimating the risk associated with an applicant for a credit product. The risk is estimated from the applicant’s credentials and directly affects the viability of the issuing institution. However, a credit scoring dataset may contain a large number of irrelevant features, which can lead to poorer classification performance and higher model complexity. Removing redundant and irrelevant features may overcome this problem. In this work, we emphasize the role of feature selection in enhancing the predictive performance of credit scoring models. For feature selection, the Binary BAT optimization technique is utilized with a novel fitness function. Further, the proposed approach is combined with “Radial Basis Function Neural Network (RBFN)”, “Support Vector Machine (SVM)” and “Random Forest (RF)” for classification. The proposed approach is validated on four benchmark credit scoring datasets obtained from the UCI repository. Further, a comprehensive analysis of the experimental results is carried out to compare the classification performance obtained with features selected by various approaches and with other state-of-the-art approaches for credit scoring.

Keywords

BAT algorithm, credit score, feature selection

1 Introduction

Credit risk is the principal concern for financial institutions, and effective credit risk assessment is essential to their endurance and improvement. As indicated by Thomas et al. [25], “Credit scoring is a set of decision models and their underlying techniques that aid credit lenders in the granting of credit” [17]. In other words, credit scoring is a way to determine the risk associated with credit products and applicants [19, 27]. Based on the evaluated credit risk, an applicant can be categorized into the “good credit or legitimate”, “bad credit or suspicious”, or “moderate credit” class [26, 29]. The discriminative ability of the credit scoring model is important for financial institutions, and a slight improvement in predictive precision could result in a noteworthy increase in revenues or reduction in potential losses [6, 28]. Financial institutions carry out credit scoring in several steps: application scoring evaluates the legitimacy or suspiciousness of a new applicant based on social, financial and other status; behavioural scoring is applied to active consumers to analyse their behavioural patterns in support of “dynamic portfolio management processes”; collection scoring separates consumers into groups so that appropriate attention can be given to each group; and fraud detection ranks consumers according to the relative likelihood that a consumer may be dishonest [13, 22].

In the literature, credit risk evaluation has been viewed as a classification problem, and machine learning has been found to be a reliable way to explore hidden patterns in applicants’ details. In this context, a range of Machine Learning (ML) techniques such as “Artificial Neural Network (ANN)” and “Support Vector Machine (SVM)” have been utilized to model risk evaluation systems and to improve credit risk prediction. Li et al. [14] proposed a credit assessment model using SVM with an optimized hyperplane that maximizes the margin of separation between the two classes in order to recognize potential candidates for consumer loans. An approach based on “Least Squares Support Vector Machine (LS-SVM)” with a Bayesian evidence framework has been used to categorize the reliability of potential corporate clients [31]. Zhou et al. [37] applied weighted SVM with Genetic Algorithm (GA) based parameter optimization and t-test based feature weighting for credit scoring. West [34] presented a comparative analysis of various neural networks, parametric models and non-parametric models for classification and evaluated their performance in terms of classification accuracy.

A credit scoring dataset comprises various kinds of status information (financial, social, personal, etc.) about a credit applicant. It therefore has a large number of features, some of which may be irrelevant. With high-dimensional and heterogeneous features, credit scoring models become unstable and have high computational complexity [33]. Hence, selecting significant features (or eliminating extraneous features) is a way to decrease the computational complexity and to obtain better accuracy [18, 33]. Feature selection is an approach for choosing the input features that are most relevant to a particular outcome [6]. Its major intention is to make prediction more accurate, faster and more cost-effective, because with a large number of features noise may be amplified, which can affect classification accuracy, and training the classifier also takes more time. Feature selection methods can be used to isolate and eliminate from a dataset the irrelevant and redundant attributes that reduce the predictive capability of a model. Suppose that there are N features in total in the dataset; then for the selection of M features there are N! / (M! * (N-M)!) possible combinations. Checking all possible combinations to find the best feature subset is an exhaustive search, which cannot be completed in reasonable time when N is large. Thus, the meta-heuristic Binary BAT Algorithm (BBA) has been utilized to find an approximately optimal subset of the features. Moreover, a new objective function is proposed for this combinatorial optimization problem.
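To give a sense of how quickly the search space grows, the following minimal Python sketch evaluates the combination count for the feature counts listed in Table 1. The function names are hypothetical and chosen only for illustration; the count of all non-empty subsets (2^N - 1) is included as an assumption about what an exhaustive search over every subset size would face.

```python
import math


def subset_count(n_features: int, m_selected: int) -> int:
    """Number of ways to choose m_selected features out of n_features."""
    return math.comb(n_features, m_selected)


def total_nonempty_subsets(n_features: int) -> int:
    """Number of non-empty feature subsets of any size: 2^N - 1."""
    return 2 ** n_features - 1


if __name__ == "__main__":
    # Feature counts of the AUS, JPD, GCD and GND datasets (Table 1).
    for n in (14, 15, 20, 24):
        print(n, subset_count(n, n // 2), total_nonempty_subsets(n))
```

Every candidate subset would additionally require training and testing a classifier, which is what makes an exhaustive search impractical.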

The remainder of the article is structured as follows: Section 2 describes the BAT optimization approach, Section 3 presents the proposed approach for feature selection, and Section 4 presents a comparative analysis with state-of-the-art approaches for credit scoring, followed by the concluding remarks.

2 Binary BAT algorithm

Bats (microbats) have a special ability of echolocation. Basically, bats emit a loud, short pulse of sound and wait for a short time. When the echo returns, they can calculate the distance to the object. Based on this ability, Yang (2010) [36] introduced a new meta-heuristic optimization technique named the BAT Algorithm (BA). In this algorithm, a team of bats searches for food or prey using their echolocation ability. Yang (2010) [36] proposed idealized rules based on the echolocation characteristics of bats, which are presented below.

${Freq}_{i} = {Freq}_{\min} + ({Freq}_{\max} - {Freq}_{\min}) * \beta$ (1)

${Vl}_{i} (t + 1) = {Vl}_{i} (t) + ({Pos}_{i} (t) - G_{best}) * {Freq}_{i}$ (2)

${Pos}_{i} (t + 1) = {Pos}_{i} (t) + {Vl}_{i} (t + 1)$ (3)

Where Freq_i represents the frequency of the i^th bat, which is updated in each iteration as in Equation 1, β ∈ [0, 1] is a randomly generated number, and G_best is the best solution obtained so far. Vl_i (t) and Pos_i (t) represent the velocity and position of the i^th bat at the t^th iteration.
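The three update rules can be read as one move of a single bat. The sketch below is a minimal Python illustration under stated assumptions: positions and velocities are plain lists of floats, the frequency range [0, 2] is an illustrative default that is not taken from this article, and the function name `update_bat` is hypothetical.

```python
import random


def update_bat(pos, vel, g_best, freq_min=0.0, freq_max=2.0, rng=random):
    """One continuous BA move for a single bat (Equations 1-3).

    freq_min and freq_max are assumed defaults, not values from the article.
    """
    beta = rng.random()                                # random number in [0, 1)
    freq = freq_min + (freq_max - freq_min) * beta     # Equation 1
    new_vel = [v + (p - g) * freq                      # Equation 2
               for v, p, g in zip(vel, pos, g_best)]
    new_pos = [p + v for p, v in zip(pos, new_vel)]    # Equation 3
    return new_pos, new_vel, freq
```

A bat that already sits on G_best keeps its velocity, because the difference term in Equation 2 is zero in every dimension.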

Equations 1-3 could guarantee the exploitability of the BA; in addition, in order to enhance its exploitation capabilities, a random walk procedure has been added, as in Equation 4.

${Pos}_{new} = {Pos}_{old} + \varepsilon * L^{t}$ (4)

$L_{i} (t + 1) = \alpha * L_{i} (t)$ (5)

$R_{i} (t + 1) = R_{i} (0) [1 - e^{- \gamma t}]$ (6)

Where ε denotes a random number in the range -1 to 1. L and R indicate the loudness and pulse rate of the i^th bat at the t^th iteration; both are updated as in Equations 5 and 6 respectively when a new solution moves towards the improved solution. α and γ are pre-defined constants.
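As a rough illustration of Equations 4-6, the following Python sketch implements the random walk and the two schedule updates. The function names and the defaults α = 0.9 and γ = 0.9 are assumptions made for the example, and ε is drawn independently for each dimension, which the text does not specify.

```python
import math
import random


def local_random_walk(pos_old, loudness_t, rng=random):
    """Equation 4: perturb a solution by eps * L^t, with eps in [-1, 1].

    eps is drawn per dimension here, which is an assumption.
    """
    return [p + rng.uniform(-1.0, 1.0) * loudness_t for p in pos_old]


def update_loudness(loudness, alpha=0.9):
    """Equation 5: L_i(t+1) = alpha * L_i(t); alpha=0.9 is an assumed default."""
    return alpha * loudness


def update_pulse_rate(r0, t, gamma=0.9):
    """Equation 6: R_i(t+1) = R_i(0) * (1 - exp(-gamma * t))."""
    return r0 * (1.0 - math.exp(-gamma * t))
```

With 0 < α < 1 the loudness shrinks geometrically, while the pulse rate rises from 0 towards R_i(0) as t grows.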

BA works on a continuous search space, whereas feature selection is an optimization problem over a binary search space. In the continuous case, the new position of a bat can be obtained from Equation 3 by adding the velocity to its previous position. In discrete or binary spaces, however, the position must be represented by either 1 or 0, so updating the position in binary spaces differs from continuous spaces. A binary version of the BAT algorithm (BBA), which is similar to BA but differs in the transfer function and position updating, has been proposed [20, 30]. Mirjalili et al. (2014) [20] revised the transfer function to map the continuous search space to a discrete search space, as given below in Equation 7.

$S ({Vl}_{i}^{k} (t)) = \frac{1}{1 + e^{- {Vl}_{i}^{k} (t)}}$ (7)

Where ${Vl}_{i}^{k} (t)$ is the velocity of bat i in the k^th dimension at the t^th iteration; it is updated by Equation 2 in each iteration.

Further, the bats’ positions are updated as in Equation 8, using the transfer function of Equation 7 applied to the updated velocity.

${Pos}_{i}^{k} (t + 1) = \begin{cases} 1 & \text{if } rand > S ({Vl}_{i}^{k} (t + 1)) \\ 0 & \text{if } rand < S ({Vl}_{i}^{k} (t + 1)) \end{cases}$ (8)

Where ${Vl}_{i}^{k} (t)$ and ${Pos}_{i}^{k} (t)$ represent the velocity and position of the i^th bat at the t^th iteration in the k^th dimension.
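A small Python sketch of the sigmoid transfer function and the position rule of Equation 8 is given here for illustration. The function names are hypothetical, and the case rand = S, which Equation 8 leaves open, is mapped to 0 as an assumption.

```python
import math
import random


def sigmoid_transfer(v):
    """Equation 7: S(v) = 1 / (1 + exp(-v)).

    math.exp overflows for very large negative v; no guard is added here.
    """
    return 1.0 / (1.0 + math.exp(-v))


def sigmoid_position_update(vel, rng=random):
    """Equation 8 as printed: a bit is 1 when rand > S(v), otherwise 0."""
    return [1 if rng.random() > sigmoid_transfer(v) else 0 for v in vel]
```

Note that the new bit depends only on the velocity and not on the current position, so a bat with a large velocity is pushed to a fixed bit value rather than being encouraged to change.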

The limitation of Equations 7 and 8 is that a hard threshold converts the Pos values into either 1 or 0, so the positions of the bats do not change when their velocities increase. To solve this problem, the authors of [20] proposed a V-shaped transfer function and position updating rule, as in Equations 9 and 10.

$V ({Vl}_{i}^{k} (t)) = \left| \frac{2}{\pi} \arctan \left( \frac{\pi}{2} {Vl}_{i}^{k} (t) \right) \right|$ (9)

${Pos}_{i}^{k} (t + 1) = \begin{cases} ({Pos}_{i}^{k} (t))^{- 1} & \text{if } rand < V ({Vl}_{i}^{k} (t + 1)) \\ {Pos}_{i}^{k} (t) & \text{if } rand \geq V ({Vl}_{i}^{k} (t + 1)) \end{cases}$ (10)

Where ${Vl}_{i}^{k} (t)$ and ${Pos}_{i}^{k} (t)$ represent the velocity and position of the i^th bat at the t^th iteration in the k^th dimension respectively. $({Pos}_{i}^{k} (t))^{- 1}$ represents the inverse (complement) of ${Pos}_{i}^{k} (t)$, i.e. 0 becomes 1 and 1 becomes 0.
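The V-shaped rule can be sketched in a few lines of Python. This is an illustrative reading of Equations 9 and 10 with hypothetical function names; positions are lists of 0/1 integers, so the complement of a bit p is computed as 1 - p.

```python
import math
import random


def v_transfer(v):
    """Equation 9: V(v) = |(2/pi) * arctan((pi/2) * v)|."""
    return abs((2.0 / math.pi) * math.atan((math.pi / 2.0) * v))


def v_position_update(pos, vel, rng=random):
    """Equation 10: flip a bit when rand < V(v), otherwise keep it."""
    return [1 - p if rng.random() < v_transfer(v) else p
            for p, v in zip(pos, vel)]
```

Because V is symmetric in the sign of the velocity, a large velocity in either direction raises the probability that the bit flips, while a zero velocity leaves the position unchanged.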

3 Proposed methodology

The proposed work focuses on improving classification accuracy by reducing the number of features in credit scoring datasets. This section presents the feature selection approach and the proposed fitness function.

3.1 Feature selection

In this work, a new feature selection algorithm using BBA with a novel fitness function is proposed. In feature subset selection, the main motive is to select a small set of valuable features that improves classification performance. The proposed architecture for feature selection using BBA is shown in Fig. 1.

Fig. 1

Architecture of the proposed model for feature selection.

For feature selection, the dataset with all features is divided into two parts, a training dataset and a testing dataset, denoted Tr and Ts respectively. In this algorithm, first the population, positions, loudness and pulse emission rate of the bats are initialized. The position of each bat is selected randomly as a vector of 0s and 1s whose length equals the total number of features in the dataset. A value of 1 indicates that the corresponding feature is present; otherwise it is absent. Further, new training and testing datasets, D1 and D2, are generated from Tr and Ts according to each bat’s position. The classifier is trained on D1 and tested on D2 to calculate the fitness value of each bat. Further, the loudness L_i and the rate of pulse emission R_i are updated as per Equations 5 and 6 respectively if a new solution has been accepted. Generally, the pulse emission rate increases and the loudness decreases after a bat catches its prey. The steps of the feature selection algorithm are as follows:

1. Initialize the population and positions of the bats. The length of each bat must equal the number of features in the dataset, and each element is randomly chosen as either 1 or 0, where 1 means that the feature at that position is present and 0 means that it is not.

2. Initialize the velocity, loudness and frequency of the bats.

3. Create the training and testing datasets from the original dataset.

4. For each bat, generate the training and testing datasets with the selected features (D1 and D2).

5. Calculate the fitness value for each bat and find the local best based on the fitness value.

6. Update the velocity, loudness and frequency of each bat.

7. Repeat from step 2 until the maximum number of iterations is reached or the fitness value reaches the threshold.

8. Find the global best, G_best; the position of G_best gives the selected features.
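The steps above can be sketched as a compact search loop. The Python code below is only an illustration under several assumptions: the classifier-based fitness is replaced by a hypothetical `toy_fitness` that rewards three "informative" feature indices and penalizes subset size; the V-shaped rule of Equations 9 and 10 is used for the position update; the local search step (copying one bit from G_best) is an assumed binary analogue of the random walk; and all parameter defaults are illustrative rather than the article's settings.

```python
import math
import random


def v_transfer(v):
    """V-shaped transfer function (Equation 9)."""
    return abs((2.0 / math.pi) * math.atan((math.pi / 2.0) * v))


def toy_fitness(mask, informative=(0, 2, 5)):
    """Hypothetical stand-in for the classifier-based fitness value."""
    return sum(mask[k] for k in informative) - 0.1 * sum(mask)


def binary_bat_select(fitness, n_features, n_bats=10, n_iter=30,
                      f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # Random binary positions, zero velocities, initial loudness and pulse rate.
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_bats)]
    vel = [[0.0] * n_features for _ in range(n_bats)]
    loud = [1.0] * n_bats
    r0 = [0.5] * n_bats
    rate = list(r0)
    # Evaluate every bat and record the best one.
    fit = [fitness(p) for p in pos]
    best = max(range(n_bats), key=fit.__getitem__)
    g_best, g_fit = list(pos[best]), fit[best]
    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()        # Equation 1
            cand = []
            for k in range(n_features):
                vel[i][k] += (pos[i][k] - g_best[k]) * freq      # Equation 2
                flip = rng.random() < v_transfer(vel[i][k])      # Equation 9
                cand.append(1 - pos[i][k] if flip else pos[i][k])  # Equation 10
            if rng.random() > rate[i]:
                # Assumed local search: copy one random bit from the global best.
                k = rng.randrange(n_features)
                cand[k] = g_best[k]
            if sum(cand) == 0:
                continue  # an empty feature subset is skipped
            cand_fit = fitness(cand)
            if cand_fit > fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] = alpha * loud[i]                        # Equation 5
                rate[i] = r0[i] * (1.0 - math.exp(-gamma * t))   # Equation 6
            if fit[i] > g_fit:
                g_best, g_fit = list(pos[i]), fit[i]
    return g_best, g_fit
```

For example, `binary_bat_select(toy_fitness, 8)` returns a 0/1 mask of length 8 together with its fitness. In the setting described in this section, `fitness` would instead train the classifier on D1 and test it on D2 for the feature subset encoded by the mask.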

3.2 Proposed fitness function

The fitness function is formed from the sensitivity, the specificity and the cost of the selected feature set. The main objective is to find the bat position that gives higher classification performance with fewer features. With these points in mind, a fitness function is designed that combines multiple criteria, namely Sensitivity (Sen), Specificity (Spe) and the number of features, as defined in Equation 11. Sensitivity is associated with a pre-defined weight W_a, which can be set to 1 if sensitivity is the most important criterion; in the same way, specificity and the cost of the selected features are associated with weights W_b and W_c respectively. The cost of the selected features is the ratio of the total number of features, F_t, to the number of features selected (as per the bat position), F_s. The main motive is to improve classification performance, which is a maximization problem, with fewer features, which is a minimization problem. Maximization and minimization cannot be mapped directly into a single fitness function, so the number of features in the feature set is converted into a maximization term by $T = \frac{F_{t}}{F_{s}}$. If a large number of features is selected, T will be small; with fewer features, T will be large. Now the classification performance and T can be mapped into a single objective function as a maximization problem, as in Equation 11.

$fitness\text{-}value = W_{a} * Sen + W_{b} * Spe + W_{c} * \frac{F_{t}}{F_{s}}$ (11)

A bat with a higher fitness value has a high probability of being preserved for the next generation. The weights (for sensitivity, specificity and the cost of the features) can be adjusted as required to optimize the fitness value.
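As a worked illustration of the weighted fitness function numbered (11), the Python sketch below computes the fitness value from confusion-matrix counts. The function names are hypothetical; the default weights 0.48, 0.48 and 0.04 are the values reported in the experimental section, and sensitivity and specificity follow their standard definitions TP/(TP+FN) and TN/(TN+FP).

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Standard definitions: Sen = TP/(TP+FN), Spe = TN/(TN+FP)."""
    sen = tp / (tp + fn) if (tp + fn) else 0.0
    spe = tn / (tn + fp) if (tn + fp) else 0.0
    return sen, spe


def fitness_value(sen, spe, n_total, n_selected,
                  w_a=0.48, w_b=0.48, w_c=0.04):
    """W_a * Sen + W_b * Spe + W_c * (F_t / F_s)."""
    if n_selected <= 0:
        raise ValueError("at least one feature must be selected")
    return w_a * sen + w_b * spe + w_c * (n_total / n_selected)
```

With equal classification performance, halving the number of selected features doubles the ratio F_t / F_s, so the smaller subset receives the higher fitness value.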

4 Experimental analysis

This section comprises three sub-sections, which present the description of the credit approval datasets and performance measures used in this experimental work, a comparative analysis of the proposed approach against some existing feature selection approaches, and a comparative analysis against prior works.

4.1 Credit approval datasets and performance measures

Four datasets are used in this experiment, namely “Australian Dataset (AUS)”, “German (categorical) Dataset (GCD)”, “German (numerical) Dataset (GND)” and “Japanese Dataset (JPD)”, acquired from the UCI Machine Learning Repository [1]; a comprehensive description of the datasets is given in Table 1. All of these are real-world benchmark credit scoring datasets and have a combination of categorical and numerical attributes.

Table 1
Description of credit scoring datasets

S. No	Dataset	Samples	Class-1/Class-2	Features
1	AUS	690	307/383	14
2	GCD	1000	700/300	20
3	GND	1000	700/300	24
4	JPD	690	307/383	15

There are various performance measures for evaluating classification. The most popular performance measure is accuracy. However, when a dataset is imbalanced towards a specific class, accuracy alone is not an adequate measure of a model’s performance, because correctly predicting only the majority-class samples can already yield high accuracy. So, in this work, another measure is also considered: “F1-score is a measure of a test’s accuracy and considers both the positive and negative accuracies of the test to compute the score”. Accuracy and F1-score are calculated as per Equations 12 and 13 respectively.

$Accuracy (ACC) = \frac{TP + TN}{TP + TN + FP + FN}$ (12)

$F1\text{-}score (F1\text{-}S) = \frac{2 * TP}{2 * TP + FP + FN}$ (13)

Where TP, FP, TN and FN denote “True Positive”, “False Positive”, “True Negative” and “False Negative”, respectively.
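The difference between the two measures on imbalanced data can be seen with a small Python sketch. The function names are hypothetical, and the example assumes a 700/300 class split in which the minority class is treated as the positive class; which class counts as positive is an assumption of this example only.

```python
def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """F1-S = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # A classifier that predicts every sample as the majority (negative) class.
    tp, fp, tn, fn = 0, 0, 700, 300
    print(accuracy(tp, tn, fp, fn), f1_score(tp, fp, fn))
```

Such a majority-only classifier reaches 70% accuracy while its F1-score is zero, which is the situation that motivates reporting both measures.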

4.2 Comparative analysis

The experiments described in this section were performed on an HP PC with a 3.60 GHz 8th-generation Intel Core i7 CPU, 16 GB RAM and the 64-bit Windows 10 operating system. The implementation was done in Matlab R2012a. Pre-processing is an imperative phase in machine learning; here, “treatment for the missing values”, “data-transformation” and “data-normalization”, followed by “data-sampling”, are considered. After the pre-processing of datasets, AUS, GCD, GND and JPD datasets have 250, 653, 1000, 1000 and 601 samples respectively. Each pre-processed dataset is then separated into a training dataset (75%) and a test dataset (25%). Meanwhile, the proportion of the two classes (healthy or bankruptcy) in both the training and testing sets remains the same as in the original dataset. The proposed fitness function (Equation 11) requires three predefined weights W_a, W_b and W_c. For this work, we have considered W_a = 0.48, W_b = 0.48 and W_c = 0.04. As BBA is a population-based approach, all the experiments are conducted with a population size of 50 and 100 iterations. In this work, we have applied the proposed approach with three classifiers, namely RBFN, RF and SVM. For RBFN, the spread parameter σ = 0.3 is used; for RF, we have experimented with 100 trees; and for SVM, the regularization parameter is C = 0.7.
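A class-preserving 75%/25% split of the kind described above can be sketched in Python as follows. This is a minimal standard-library illustration rather than the Matlab procedure used in the experiments; the function name `stratified_split`, the seed and the rounding rule are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split sample indices so each class keeps the same train/test proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(train_fraction * len(idxs))  # per-class training size
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)
```

For a dataset with a 700/300 class ratio, this yields 525 and 225 training samples of the two classes, so the ratio in the training and testing parts matches the original one.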

Figures 2a–2d present the convergence curves of the feature selection approach for the respective datasets and classifiers. In these figures, the horizontal axis denotes the number of iterations and the vertical axis denotes the corresponding fitness value. BBA-RBFN, BBA-RF and BBA-SVM denote the convergence curves of BBA with the classifiers RBFN, RF and SVM respectively. As depicted in Fig. 2a, on the Australian dataset BBA converged within the 20^th, 10^th and 40^th iterations with RBFN, RF [24] and SVM respectively, and BBA-RBFN obtained the highest fitness value. Similarly, for the German categorical, German numerical and Japanese datasets, depicted in Figs. 2b, 2c and 2d respectively, the proposed approach with RBFN (BBA-RBFN) achieves the highest fitness values.

Fig. 2

Convergence curves of BBA-based feature selection with the respective classifiers on the (a) AUS, (b) GCD, (c) GND and (d) JPD datasets.

As BBA is a population-based optimization approach, it sometimes converges well and sometimes does not. So, in order to show the stability of the proposed approach, the procedure is repeated for 10 iterations with different training and testing sets. In each iteration, the dataset with the optimized feature set is partitioned by 10-fold cross-validation (10-FCV), and the mean of the 10-FCV results is considered for comparative analysis. The mean 10-FCV results in terms of accuracy and F1-score are depicted in Figs. 3–6 for the respective datasets and classifiers. From Figs. 3–6, it is observed that RBFN has better classification accuracy and F1-score than SVM and RF in most cases.
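One possible reading of this evaluation protocol, repeated 10-fold cross-validation with the mean taken over all folds and repetitions, is sketched below in Python. The helper names are hypothetical, `score_fn` stands for any user-supplied routine that trains on the training indices and returns a score on the test indices, and using the repetition number as the shuffle seed is an assumption.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k shuffled, disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def repeated_cv_mean(score_fn, n_samples, k=10, repeats=10):
    """Mean of score_fn(train, test) over `repeats` runs of k-fold CV."""
    scores = []
    for r in range(repeats):
        for train, test in k_fold_indices(n_samples, k, seed=r):
            scores.append(score_fn(train, test))
    return mean(scores)
```

Averaging over 10 repetitions of 10 folds gives 100 scores per dataset and classifier, which smooths out the run-to-run variation of a stochastic search.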

Fig. 3

Comparison graph of RBFN, SVM and RF on Australian dataset.

Fig. 4

Comparison graph of RBFN, SVM and RF on German (categorical) dataset.

Fig. 5

Comparison graph of RBFN, SVM and RF on German (numerical) dataset.

Fig. 6

Comparison graph of RBFN, SVM and RF on Japanese dataset.

Further, for comparative analysis, various feature selection approaches such as “Stepwise Regression (STEP) [35]”, “Classification & Regression Tree (CART) [3]”, “Multivariate Adaptive Regression Splines (MARS) [7]”, “Correlation Coefficient (CORR) [8]” and “Multi-Cluster Feature Selection (MCFS) [4]” are considered. The mean of 10-FCV over 10 iterations is used for the comparison, and the results are presented in Table 2 for the respective datasets, where PA denotes the proposed approach. On the Australian dataset, PA with RBFN achieves 87.61% accuracy and 84.93% F1-score. PA with RBFN beats the best feature selection method, CART, with improvements of 0.96% and 0.23% in accuracy and F1-score respectively. Further, PA is also applied with two more classifiers, RF and SVM. With both classifiers, PA achieves better performances. Overall, PA improves the classification performances of RBFN, RF and SVM, and PA with RBFN has the best classification performances compared to the feature selection approaches STEP, CART, MARS, CORR and MCFS with RBFN, SVM and RF based classification. Similar to the Australian dataset, the proposed approach has the best classification performances compared to the other approaches, except that MCFS with RF and CORR with SVM have the best performances on the Japanese dataset. Overall, it can be concluded that the proposed approach for feature selection has better classification performances with RBFN, RF and SVM on the four credit scoring datasets.

Table 2

Performances of RBFN, RF and SVM with various feature selection approaches on various credit scoring datasets

Approach	RBFN		RF		SVM
	ACC	F1-S	ACC	F1-S	ACC	F1-S
Australian Dataset
STEP	84.46	81.31	83.33	81.08	85.23	83.36
CART	86.65	84.67	84.56	82.88	86.22	81.62
MARS	79.88	78.43	74.51	72.98	79.94	78.79
CORR	84.96	83.56	87.35	84.39	86.13	83.18
MCFS	85.94	84.82	85.12	83.03	87.07	84.73
PA	87.61	84.93	85.73	83.66	86.98	82.35
Japanese Dataset
STEP	85.03	82.79	84.31	82.22	86.23	81.73
CART	85.61	83.92	86.09	83.41	86.74	82.35
MARS	85.23	83.36	85.32	82.67	86.28	83.03
CORR	84.91	83.97	85.71	83.46	83.78	81.78
MCFS	84.62	84.53	87.78	85.72	82.28	80.71
PA	87.93	84.12	86.97	84.13	87.69	83.47
German Categorical Dataset
STEP	80.34	79.36	78.93	74.70	79.10	79.24
CART	83.49	83.07	80.30	80.46	78.70	77.97
MARS	82.50	81.93	79.22	80.56	78.30	79.47
CORR	78.93	80.46	78.06	80.03	79.31	80.67
MCFS	79.39	80.93	79.97	80.73	76.39	80.73
PA	84.34	84.86	82.80	81.40	84.11	82.89
German Numerical Dataset
STEP	74.05	68.40	75.50	61.93	71.20	64.34
CART	73.93	63.29	76.50	61.62	72.20	72.48
MARS	76.38	75.08	75.30	62.18	71.00	62.76
CORR	75.67	74.38	78.84	68.09	76.06	71.84
MCFS	75.13	74.83	79.27	70.67	75.67	72.15
PA	77.32	76.39	76.83	62.02	72.69	72.38

4.3 Comprehensive comparative analysis

This sub-section compares the outcomes obtained by the proposed method with outcomes reported in the literature on credit scoring datasets. These results are tabulated in Table 3 in terms of classification accuracy, with the respective datasets, approaches and references. From Table 3, it is determined that the proposed approach achieves the best performance on the Japanese, German (categorical) and German (numerical) datasets, and the 4^th best performance on the Australian dataset. As this study focuses on combining feature selection with classification, overall it can be determined that the proposed approach (PA-RBFN) has the best performance on most of the real-world credit scoring datasets.

Table 3
Performance of various credit scoring models on credit scoring datasets

Method	AUS	JPD	GCD	GND	References
Classification
ANN	84.10	–	72.80	–	[2]
KNN	83.60	–	66.90	–	[2]
SVM-L	87.40	–	74.80	–	[2]
SVM-R	86.10	–	75.90	–	[2]
CART	85.90	–	55.90	–	[2]
J48	84.50	–	64.10	–	[2]
LR-R	86.20	–	75.40	–	[2]
RBFN	87.14	–	74.60	–	[34]
MLP	85.84	–	73.28	–	[34]
LVQ	82.97	–	68.37	–	[34]
Hybrid approach
SR+ANN	84.09	–	–	–	[35]
Sampling+F-score+SVM	86.76	–	76.84	–	[9]
NRS based FS	–	85.48	74.50	–	[10]
SVM + Grid search	85.51	–	76.00	–	[11]
SVM + Grid search + F-score	84.20	–	77.50	–	[11]
SVM + GA	86.90	–	77.92	–	[11]
NRS+SVM+Grid search	87.52	–	76.60	–	[23]
GA+NB	85.56	–	–	74.03	[15]
LDA+MLP	86.00	–	–	73.44	[15]
HGA-NN	–	–	78.90	–	[21]
PSO+SVM	91.03	–	81.63	–	[16]
LDA + SVM	88.10	–	73.60	–	[5]
GA+SVM	90.19	–	84.24	–	[12]
SVM-RBF	–	–	78.00	–	[1]
RS+TS+LR	–	86.40	–	–	[32]
Proposed approach
PA+RBFN	87.61	87.93	84.34	77.32	This study
PA+SVM	86.98	87.69	84.11	72.69	This study
PA+RF	85.73	86.97	82.80	76.83	This study

5 Conclusion

In this paper, a Binary BAT algorithm based feature selection method has been proposed with a novel fitness function. The fitness function is based on the classification performance and the cost of the features selected (bat position) by the Binary BAT algorithm. The proposed approach has been evaluated on credit scoring datasets with three classifiers, namely RBFN, RF and SVM. Further, the results are compared with various feature selection approaches such as STEP, CART, MARS, CORR and MCFS with RBFN, RF and SVM, and with various credit scoring models from the literature. From the experimental outcomes, it is found that the proposed method for feature selection with RBFN performs better than the same method with SVM and RF in terms of accuracy, F1-score and convergence rate. The features selected by the proposed approach are more representative and improve the classification performances of RBFN, SVM and RF compared to existing feature selection approaches. So, overall it can be concluded that the proposed approach has the best performance on most of the real-world credit scoring datasets.

References

1. UCI machine learning repository (Last Accessed 2019/12/25), https://archive.ics.uci.edu/ml/index.php

2. Bequé and Lessmann, Extreme learning machines for credit scoring: An empirical evaluation, Expert Systems with Applications (2017).

3. Breiman, Friedman, Stone C.J. and Olshen R.A., Classification and regression trees. CRC Press (1984).

4. Cai, Zhang and He, Unsupervised feature selection for multi-cluster data. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2010), 333–342.

5. Chen F.L. and Li F.C., Combination of feature selection approaches with svm in credit scoring, Expert Systems with Applications 37(7) (2010), 4902–4909.

6. Edla D.R., Tripathi, Cheruku and Kuppili, An efficient multi-layer ensemble framework with bpsogsa-based feature selection for credit scoring data analysis, Arabian Journal for Science and Engineering 43(12) (2018), 6909–6928.

7. Friedman J.H., Multivariate adaptive regression splines, The Annals of Statistics (1991), 1–67.

8. Hall M.A., Correlation-based feature selection for machine learning (1999).

9. Hens A.B. and Tiwari M.K., Computational time reduction for credit scoring: An integrated approach based on support vector machine and stratified sampling method, Expert Systems with Applications 39(8) (2012), 6774–6781.

10. , Yu, Liu and Wu, Neighborhood rough set based heterogeneous feature subset selection, Information Sciences 178(18) (2008), 3577–3594.

11. Huang C.L., Chen M.C. and Wang C.J., Credit scoring with a data mining approach based on support vector machines, Expert Systems with Applications 33(4) (2007), 847–856.

12. Huang C.L. and Wang C.J., A ga-based feature selection and parameters optimization for support vector machines, Expert Systems with Applications 31(2) (2006), 231–240.

13. Kuppili, Tripathi and Reddy Edla, Credit score classification using spiking extreme learning machine, Computational Intelligence 36(2) (2020), 402–426.

14. Li S.T., Shiue and Huang M.H., The evaluation of consumer loans using support vector machines, Expert Systems with Applications 30(4) (2006), 772–782.

15. Liang, Tsai C.F. and Wu H.T., The effect of feature selection on financial distress prediction, Knowledge-Based Systems 73 (2015), 289–297.

16. Lin S.W., Ying K.C., Chen S.C. and Lee Z.J., Particle swarm optimization for parameter determination and feature selection of support vector machines, Expert Systems with Applications 35(4) (2008), 1817–1824.

17. Louzada, Ara and Fernandes G.B., Classification methods applied to credit scoring: Systematic review and overall comparison, Surveys in Operations Research and Management Science (2016).

18. Maldonado, Weber and Basak, Simultaneous feature selection and classification using kernel-penalized support vector machines, Information Sciences 181(1) (2011), 115–128.

19. Mester L.J., et al., What’s the point of credit scoring?, Business Review 3(Sep/Oct) (1997), 3–16.

20. Mirjalili, Mirjalili S.M. and Yang X.S., Binary bat algorithm, Neural Computing and Applications 25(3–4) (2014), 663–681.

21. Oreski and Oreski, Genetic algorithm-based heuristic for feature selection in credit risk assessment, Expert Systems with Applications 41(4) (2014), 2052–2064.

22. Paleologo, Elisseeff and Antonini, Subagging for credit scoring models, European Journal of Operational Research 201(2) (2010), 490–499.

23. Ping and Yongheng, Neighborhood rough set and svm based hybrid credit scoring classifier, Expert Systems with Applications 38(9) (2011), 11300–11304.

24. Rodriguez J.J., Kuncheva L.I. and Alonso C.J., Rotation forest: A new classifier ensemble method, IEEE Transactions on Pattern Analysis and Machine Intelligence 28(10) (2006), 1619–1630.

25. Thomas L.C., Edelman D.B. and Crook J.N., Credit scoring and its applications, SIAM (2002).

26. Tripathi, Cheruku and Bablani, Relative performance evaluation of ensemble classification with feature reduction in credit scoring datasets. In: Advances in Machine Learning and Data Science, pp. 293–304. Springer (2018).

27. Tripathi, Edla D.R. and Cheruku, Hybrid credit scoring model using neighborhood rough set and multi-layer ensemble classification, Journal of Intelligent & Fuzzy Systems 34(3) (2018), 1543–1549.

28. Tripathi, Edla D.R., Cheruku and Kuppili, A novel hybrid credit scoring model based on ensemble feature selection and multilayer ensemble classification, Computational Intelligence 35(2) (2019), 371–394.

29. Tripathi, Edla D.R., Kuppili, Bablani and Dharavath, Credit scoring model based on weighted voting and cluster based feature selection, Procedia Computer Science 132 (2018), 22–31.

30. Tripathi, Edla D.R., Kuppili and Dharavath, Binary bat algorithm and RBFN based hybrid credit scoring model, Multimedia Tools and Applications (2020), 1–24.

31. Van Gestel, Baesens, Suykens J.A., Van den Poel, Baestaens D.E. and Willekens, Bayesian kernel based classification for financial distress detection, European Journal of Operational Research 172(3) (2006), 979–1003.

32. Wang, Guo and Wang, Rough set and tabu search based feature selection for credit scoring, Procedia Computer Science 1(1) (2010), 2425–2432.

33. Wang, Hedar A.R., Wang and Ma, Rough set and scatter search metaheuristic based feature selection for credit scoring, Expert Systems with Applications 39(6) (2012), 6123–6128.

34. West, Neural network credit scoring models, Computers & Operations Research 27(11) (2000), 1131–1152.

35. Wongchinsri and Kuratach, Sr-based binary classification in credit scoring. In: 2017 14th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), pp. 385–388. IEEE (2017).

36. Yang X.S., A new metaheuristic bat-inspired algorithm. In: Nature inspired cooperative strategies for optimization (NICSO 2010), pp. 65–74. Springer (2010).

37. Zhou, Lai K.K. and Yen, Credit scoring models with auc maximization based on weighted svm, International Journal of Information Technology & Decision Making 8(04) (2009), 677–696.
