Abstract
Credit scoring is a procedure for estimating the risk associated with credit products, calculated from applicants’ credentials and historical data. However, the data may contain redundant and irrelevant information and features, which lower the accuracy of the credit scoring model. Eliminating the redundant features can therefore alleviate this problem. In this work, we propose a hybrid credit scoring model based on dimensionality reduction by the Neighborhood Rough Set (NRS) algorithm and layered ensemble classification with a weighted voting approach to improve classification performance. For ranking classifiers, we propose a novel classifier ranking algorithm as an underlying model that represents the ranks of the classifiers based on classifier accuracy. It is applied to seven heterogeneous classifiers to find their ranks. The five best-ranked classifiers are then used as base classifiers in the layered ensemble framework. Results of the ensemble frameworks (Majority Voting (MV), Weighted Voting (WV), Layered Majority Voting (LMV) and Layered Weighted Voting (LWV)) with all features and after feature reduction by various existing feature selection algorithms are compared in terms of accuracy, sensitivity, specificity and G-measure. Further, results of the ensemble frameworks with NRS are compared by ROC curve analysis. The experimental outcomes show the effectiveness of the proposed methods on two benchmark credit scoring datasets (Australian credit scoring and German loan approval) obtained from the UCI repository.
Introduction
Credit scoring is a way of calculating the risk associated with credit products [1] using applicants’ credentials (such as annual income, job status and residential status), historical data and statistical techniques. It tries to isolate the effects of various applicant characteristics on delinquent behavior and defaults. The main focus of a credit scoring model is to determine whether a credit consumer belongs to the legitimate or the suspicious customer group. Credit scoring is not a single-step process; financial institutions carry it out in several phases, namely application scoring, behavioral scoring, collection scoring and fraud detection [2]. When a new application for credit arrives, application scoring evaluates whether the new applicant is legitimate or suspicious. That evaluation is based on social, financial and other data collected at the time of the application. Behavioral scoring is similar to application scoring, but it applies to existing customers and analyzes their behavior patterns to support dynamic portfolio management processes. Collection scoring separates customers into different groups (early, middle and late recovery) so that more, moderate or less attention can be given to each group. Fraud scoring models rank applicants according to the relative probability that an applicant is dishonest. In this paper, our focus is on the application scoring problem.
Many researchers [3–5] have proposed bio-inspired algorithms for feature selection to improve the performance of classifiers. Ping et al. (2011) [5] proposed a hybrid classifier for credit scoring based on the Neighborhood Rough Set and the Support Vector Machine (SVM). Hu et al. [6, 7] presented a neighborhood rough set model to address the problem of heterogeneous feature subset selection. Neighborhood relations are a kind of similarity relation that satisfies reflexivity and symmetry. They group objects by similarity or indistinguishability in terms of distance, so the samples in the same neighborhood granule are close to each other.
Conventional credit scoring models are based on individual classifiers or a simple combination of these classifiers, which tend to show moderate performance. Many classifiers, such as Naive Bayes (NB), SVM, Decision Tree (DT) and Neural Network based classifiers, have been proposed so far. However, each has its own strengths and weaknesses, so each is good only for specific problems, and there is no specific way to recognize which classifier is better for a given problem. An ensemble classifier is therefore a strong approach for producing a near-optimal classifier for any problem [8]. This method reinforces the ensemble in error-prone subspaces and hence can lead to better classification performance. Generally, combining diverse classifiers results in better classification [9, 10]. There are two basic types of ensemble framework: homogeneous and heterogeneous [10, 11]. A homogeneous ensemble framework uses base classifiers of the same type, whereas a heterogeneous ensemble framework is composed of base classifiers of different types. The most popular ways to combine the base classifiers are majority voting and weighted voting [9, 12]. A multi-layer classifier ensemble framework [10, 11] is based on the optimal combination of heterogeneous classifiers. The multi-layer model overcomes conventional performance bottlenecks by utilizing an ensemble of five heterogeneous classifiers.
The rest of the paper is organized as follows. Section 2 describes our proposed work flow for credit scoring data classification. Section 3 presents the experimental results obtained from the proposed model and the relative performance of different feature selection techniques, followed by the ensemble classification performance. Concluding remarks and references follow.
Proposed methodology
Here we propose a hybrid approach that combines the NRS approach for feature selection with layered ensemble classification, as shown in Fig. 1. The proposed approach for credit scoring works in three phases: the first phase finds the ranks of the classifiers and assigns weights to them, the second performs feature selection using the NRS algorithm, and the third constructs the ensemble framework. Preprocessed data, obtained through data cleaning, data transformation and data normalization, is used in the classifier rank assignment and feature selection phases. The hybrid approach is described in detail in the following subsections.

Fig. 1. Proposed work flow for credit scoring.
There is no specific way to recognize which classifier is better for a specific problem, so ensemble learning is a strong approach for producing a near-optimal classifier for any problem. In this phase, seven classifiers, namely Naive Bayes (NB), Multilayer Feed Forward Neural Network (MLFFNN), Decision Tree (DT), Quadratic Discriminant Analysis (QDA), Time Delay Neural Network (TDNN), Deep Tenser Neural Network (DTNN) and Probabilistic Neural Network (PNN), are initially used to find the rank of each classifier. Equation (1) is used as the underlying model for representing the ranking of the classifiers. The five best-ranked classifiers are then arranged as in Fig. 1. In the multi-layer ensemble framework, the two highest-ranked classifiers, C1 and C2, are placed in the second layer and the remaining three classifiers are placed in the first layer.
In this work we use the weighted voting approach with a heterogeneous multi-layer ensemble framework. In weighted voting, a weight is assigned to each base classifier. Classification accuracy is used as the parameter for calculating the weights: the classifier with the highest accuracy is assigned the highest weight, and the classifier with the lowest accuracy is assigned the lowest weight. These weights are calculated by Equation (1). Initially, equal weights are assigned to each base classifier, and the dataset is then classified to calculate the accuracy. The weights are then updated using Equation (1). This procedure is repeated for n iterations, and the mean of the updated weights over the n iterations is assigned to the respective classifier.
Where,
Feature selection is one of the most fundamental problems in machine learning. Its main aim is to determine a minimal subset of features from a set of features. In other words, feature selection is the process of finding, from the original set of features in a given dataset, a subset that is ideally necessary and sufficient to describe the target concept. Here we use an NRS feature selection approach to select the best features from the feature set. The complete procedure is given in Algorithm 1 [6, 13].
1: ∀a ∈ A_c: compute the equivalence relation R_a;
2: ∀a ∈ A_n: compute the neighborhood relation N_a (or k_a);
3: ∅ → red; (red is the pool that contains the selected attributes)
4: for each a_i ∈ A - red
       compute SIG(a_i, red, D) = γ_{red ∪ a_i}(D) - γ_red(D),
       where γ_B(D) = |POS_B(D)| / |U| is the dependency of the decision D on the attribute subset B, as defined in [6], and U is the set of samples
   end
5: select the attribute a_k that satisfies
       SIG(a_k, red, D) = max_i SIG(a_i, red, D)
6: if SIG(a_k, red, D) > 0
       red ∪ a_k → red
       go to step 4
   else
       return red
7: end
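The forward greedy search above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work (which was written in Matlab): all attributes are assumed to be numeric and scaled to [0, 1], the neighborhood is assumed to be a Euclidean ball of radius delta, and the function names and the default delta are hypothetical. The equivalence-relation treatment of categorical attributes (step 1) is omitted.

```python
import numpy as np

def neighborhood_dependency(X, y, features, delta):
    """gamma_B(D): fraction of samples whose delta-neighborhood on the
    attribute subset `features` contains only samples of their own class."""
    if not features:
        return 0.0  # empty attribute set: dependency taken as zero
    n = X.shape[0]
    sq_dist = np.zeros((n, n))
    for j in features:  # accumulate squared Euclidean distance per attribute
        col = X[:, j]
        sq_dist += (col[:, None] - col[None, :]) ** 2
    neighbors = sq_dist <= delta ** 2          # assumed neighborhood relation
    same_class = y[:, None] == y[None, :]
    # A sample is in the positive region if every neighbor shares its class.
    in_positive_region = np.all(~neighbors | same_class, axis=1)
    return float(in_positive_region.mean())

def nrs_forward_selection(X, y, delta=0.15):
    """Greedy forward attribute reduction (steps 3 to 7 of the algorithm)."""
    remaining = list(range(X.shape[1]))
    red, gamma_red = [], 0.0
    while remaining:
        # Significance of each candidate: gain in dependency when it is added.
        sig = [neighborhood_dependency(X, y, red + [a], delta) - gamma_red
               for a in remaining]
        k = int(np.argmax(sig))
        if sig[k] <= 0:        # no attribute improves the dependency: stop
            break
        gamma_red += sig[k]
        red.append(remaining.pop(k))
    return red

# Toy data (hypothetical): attribute 0 separates the classes, attribute 1 does not.
X = np.array([[0.00, 0.5], [0.05, 0.1], [0.90, 0.5], [0.95, 0.1]])
y = np.array([0, 0, 1, 1])
print(nrs_forward_selection(X, y, delta=0.2))  # [0]
```

The pairwise distance matrix makes each dependency evaluation quadratic in the number of samples, which is acceptable for datasets of the size used here but would need a neighbor index for much larger data.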
Table 1. Descriptions of the datasets used
The proposed ensemble framework is shown in Fig. 1, where C1, C2, C3, C4 and C5 are the classifiers chosen in phase 1 as the best among the seven heterogeneous classifiers. Data with the selected features is fed to the classifiers, together with the weights assigned to each classifier, to evaluate the final result for each input sample. The five best-ranked classifiers are arranged as in Fig. 1: C1 and C2, which have the highest ranking, are in the second layer, and the remaining three classifiers are in the first layer. Combiner-1 aggregates the results of the three classifiers associated with it, and combiner-2 aggregates the results of the two classifiers associated with it together with the result of combiner-1.
Combiner-1 and combiner-2 aggregate the outputs predicted by their associated classifiers using Equation (2),
where W_i and X_i are the weight and the predicted output of the i-th classifier, respectively.
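The two-layer aggregation can be sketched as below. The exact form of Equation (2) is not available here, so the sketch makes explicit assumptions: class labels are encoded as 0 and 1, each combiner outputs 1 when the weight-normalized sum of W_i X_i is at least 0.5, and the function names and example weights are hypothetical.

```python
import numpy as np

def weighted_vote(predictions, weights):
    """Assumed combiner: threshold the weight-normalized sum of W_i * X_i.

    predictions has shape (n_classifiers, n_samples) with 0/1 labels."""
    predictions = np.asarray(predictions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    score = weights @ predictions / weights.sum()
    return (score >= 0.5).astype(int)

def layered_weighted_vote(layer1_preds, layer1_weights,
                          layer2_preds, layer2_weights):
    """Combiner-1 merges C3, C4, C5; combiner-2 merges combiner-1, C1, C2."""
    combiner1 = weighted_vote(layer1_preds, layer1_weights)
    stacked = np.vstack([combiner1, np.asarray(layer2_preds)])
    return weighted_vote(stacked, layer2_weights)

# Hypothetical predictions for three samples.
c3, c4, c5 = [1, 0, 1], [0, 0, 1], [1, 1, 0]   # first-layer classifiers
c1, c2 = [1, 1, 0], [0, 1, 0]                   # second-layer classifiers
final = layered_weighted_vote([c3, c4, c5], [0.4, 0.3, 0.3],   # W11, W12, W13
                              [c1, c2], [0.3, 0.4, 0.3])       # W21, W22, W23
print(final.tolist())  # [1, 1, 0]
```

Placing the two highest-ranked classifiers in the second layer means their votes are not diluted inside combiner-1, which is one way to read the layering described above.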
This section has four subsections, covering the datasets used in this work, the performance measures used to evaluate the proposed work, the feature selection results of various feature selection approaches together with the results obtained by the ensemble frameworks, and ROC curve analysis.
Datasets used in experiment
The Australian dataset and the German dataset (categorical) are used in this work. Both were obtained from the UCI Machine Learning Repository [14, 15] and contain a mix of continuous and nominal attributes. The Australian dataset relates to credit approval and the German dataset relates to loan applications. To protect the confidentiality of the data, the values of some attributes have been replaced by random meaningless symbols. The datasets are described in detail in Table 1.
Performance measures
Several measures for evaluating classification performance are available in the literature. Accuracy (Equation 3) is not sufficient as a performance measure if the dataset is significantly imbalanced towards one class. The datasets used in this experimental work are binary-class datasets with positive (credit approved) and negative (credit not approved) classes. Sensitivity (Equation 4) is the prediction accuracy on positive samples only, and specificity (Equation 5) is the prediction accuracy on negative samples. G-measure (Equation 6) considers both the positive and the negative accuracies and can be interpreted as the geometric mean of sensitivity and specificity; it is 1 in the best case and 0 in the worst case. All four measures are defined in terms of the confusion-matrix counts True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN), where positive corresponds to credit approved and negative to credit not approved. A sample predicted positive that is actually positive is a TP, one predicted negative that is actually negative is a TN, one predicted positive that is actually negative is an FP, and one predicted negative that is actually positive is an FN.
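The four measures can be computed from the confusion-matrix counts as in the sketch below. It assumes the standard definitions of accuracy, sensitivity and specificity, and takes G-measure as the geometric mean of sensitivity and specificity, as described above; the function names are hypothetical. The geometric-mean form is consistent with the G-measure values reported later for the proposed approach.

```python
import math

def credit_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity, g_measure) from counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)        # accuracy on actual positives
    specificity = tn / (tn + fp)        # accuracy on actual negatives
    g_measure = math.sqrt(sensitivity * specificity)
    return accuracy, sensitivity, specificity, g_measure

def g_measure(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sensitivity * specificity)

# Hypothetical counts: 100 actual positives, 100 actual negatives.
acc, sens, spec, g = credit_metrics(tp=90, fp=20, tn=80, fn=10)
print(acc, sens, spec, round(g, 4))  # 0.85 0.9 0.8 0.8485

# Check against the reported sensitivity/specificity pairs.
print(round(g_measure(0.9969, 0.9086), 4))  # 0.9517 (Australian)
print(round(g_measure(0.9878, 0.7233), 4))  # 0.8453 (German)
```

Because the geometric mean collapses towards zero when either class is predicted poorly, G-measure penalizes a classifier that achieves high accuracy by favoring the majority class.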
The experiments described in this section were performed on a DELL PC with a 3.60 GHz Intel Core i7 vPro CPU, 8 GB RAM and the 64-bit Windows 7 operating system. The implementation was done in Matlab R2012a. As per the proposed model, the preprocessed dataset is passed to the layered ensemble framework for classification. In this work, preprocessing consists of four main steps: treatment of missing values, transformation, normalization and feature selection. The results of feature selection on the Australian and German datasets by various approaches, namely Stepwise Regression (STEP), Classification and Regression Tree (CART), Correlations (CORR), Multivariate Adaptive Regression Splines (MARS), T-Test and NRS, are presented in Tables 2 and 3 respectively.
Table 2. Selected features in the Australian dataset
Table 3. Selected features in the German dataset
In this work, we use the weighted voting approach. The weights assigned to the classifiers are calculated as described in the earlier section for both cases, in which the five classifiers are combined by weighted voting in the single-layer and in the multi-layer approach. Table 4 presents the weights assigned to the classifiers in the single-layer approach, where W1, W2, W3, W4 and W5 are the weights assigned to classifiers C1, C2, C3, C4 and C5 respectively for each dataset. Table 5 presents the weights assigned in the multi-layer approach, where W11, W12 and W13 are the weights assigned to C3, C4 and C5 (the classifiers in layer 1) respectively, and W21, W22 and W23 are the weights assigned to the results obtained by combiner-1, C1 and C2 respectively for each dataset.
Table 4. Weights assigned to classifiers in the single-layer ensemble framework
Table 5. Weights assigned to classifiers in the layered ensemble framework
To make the 10-Fold Cross Validation (10-FCV) balanced, each fold receives a similar share of the samples of each class, because the German dataset is imbalanced and classification algorithms suffer when the class distribution is skewed towards one of the classes [16]. First, the whole dataset is divided into two parts, data-1 and data-2, which contain the samples of class 1 and class 2 respectively. Data-1 and data-2 are then each divided into 10 parts. Part-1 of data-1 and part-1 of data-2 together form fold-1, and fold-2 onwards are formed in the same way. In this section, we compare the 10-FCV averages obtained on the two credit scoring datasets by our proposed approach with those obtained using existing feature selection approaches, in terms of classification accuracy, sensitivity, specificity and G-measure. These results are presented in Tables 6 and 7 for the Australian dataset and the German dataset respectively.
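The fold construction described above can be sketched as follows. This is an illustrative version only: the shuffling within each class and the function name `balanced_folds` are assumptions, since the text does not say whether samples are permuted before splitting.

```python
import numpy as np

def balanced_folds(y, n_folds=10, seed=0):
    """Split each class into n_folds parts; fold k is the union of the
    k-th part of every class. Returns a list of index arrays."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_folds)]
    for label in np.unique(y):
        # Indices of this class, shuffled (shuffling is an assumption).
        idx = rng.permutation(np.flatnonzero(y == label))
        for k, part in enumerate(np.array_split(idx, n_folds)):
            folds[k].extend(part.tolist())
    return [np.array(sorted(f)) for f in folds]

# Labels with a 700/300 class split, mirroring an imbalanced two-class dataset.
y = np.array([1] * 700 + [2] * 300)
folds = balanced_folds(y)
print([len(f) for f in folds])                    # ten folds of 100 samples
print([int((y[f] == 2).sum()) for f in folds])    # 30 minority samples per fold
```

In each round, one fold serves as the test set and the remaining nine as the training set, so every test fold preserves the class ratio of the full dataset.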
Table 6. Performance comparisons on the Australian dataset
Table 7. Performance comparisons on the German dataset
On the Australian dataset, the proposed approach achieves an accuracy, sensitivity, specificity and G-measure of 95.39%, 99.69%, 90.86% and 95.17% respectively. On the German dataset, it achieves 86.47%, 98.78%, 72.33% and 84.53% respectively. As per Tables 6 and 7, the results obtained by the proposed approach are the best in terms of accuracy, sensitivity and G-measure on both datasets.
To validate the separation and discrimination ability of the proposed models, and to measure their sensitivity and specificity over various thresholds, ROC curves are plotted for the MV, WV, LMV and LWV approaches. Figures 2 and 3 display the ROC curves of these ensemble approaches with features selected by NRS on the two datasets for fold-1 (the first 90% of the data for training and the rest for testing).

Fig. 2. Fold-1 ROC on the Australian dataset.

Fig. 3. Fold-1 ROC on the German dataset.
For the Australian dataset, the ROC curves of all other ensemble classifiers lie below the LWV ROC curve for all threshold values. This shows that, for the Australian dataset, LWV is the best method across all operating values of sensitivity and specificity. For the German dataset, the conclusions are the same.
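A ROC curve of this kind can be traced by sweeping a threshold over a continuous ensemble score, as in the sketch below. It assumes that each ensemble exposes such a score (for example, the weight-normalized vote before thresholding) and that labels are encoded as 1 for positive and 0 for negative; the function names and the toy scores are hypothetical.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays, one point per distinct score threshold."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = (y_true == 1).sum()
    n_neg = (y_true == 0).sum()
    fpr, tpr = [0.0], [0.0]                 # start at the origin
    for t in np.unique(scores)[::-1]:       # thresholds from high to low
        predicted_pos = scores >= t
        tpr.append((predicted_pos & (y_true == 1)).sum() / n_pos)  # sensitivity
        fpr.append((predicted_pos & (y_true == 0)).sum() / n_neg)  # 1 - specificity
    return np.array(fpr), np.array(tpr)

def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# Hypothetical test-fold labels and ensemble scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_points(y_true, scores)
print(round(area_under_curve(fpr, tpr), 4))  # 0.8889
```

One curve lying above another for every threshold, as observed for LWV, implies a larger area under the curve, so the area gives a single-number summary of the same comparison.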
In this paper, a hybrid approach for credit scoring is proposed, based on NRS for feature selection and multi-layer ensemble classification aggregated by weighted voting. The results of the proposed approach are compared with those of MV, WV and LMV, both with all features and with features selected by STEP, CART, CORR and MARS, in terms of accuracy, sensitivity, specificity and G-measure on two benchmark datasets, the Australian and German datasets. According to the results, the proposed approach outperforms all other approaches in terms of accuracy, sensitivity and G-measure on both datasets. It also outperforms the other ensemble approaches (MV, WV and LMV) in terms of ROC curve analysis on both datasets.
