Using machine learning techniques to develop prediction models for detecting unpaid credit card customers

Abstract

Predicting customer behavior is becoming increasingly important in the banking sector, as in many other sectors. This study proposes a model to predict whether credit card users will pay their debts. Using the proposed model, potential non-payment risks can be predicted and necessary actions can be taken in time. To predict customers’ payment status for the next month, we use Artificial Neural Network (ANN), Support Vector Machine (SVM), Classification and Regression Tree (CART) and C4.5, which are widely used artificial intelligence and decision tree algorithms. Our dataset includes 10713 customer records obtained from a well-known bank in Taiwan. These records consist of customer information such as the amount of credit, gender, education level, marital status, age, past payment records, invoice amount and amount of credit card payments. We apply cross-validation and hold-out methods to divide our dataset into training and test sets. Then we evaluate the algorithms with the proposed performance metrics. We also optimize the parameters of the algorithms to improve prediction performance. The results show that the model built with the CART algorithm, one of the decision tree algorithms, predicts customers’ payment status for the next month with high accuracy (about 86%). Optimizing the algorithm parameters increases classification accuracy and performance.

Keywords

Credit card, machine learning, classification, parameter optimization, ANN, SVM, CART

1 Introduction

Credit cards, which are indispensable for many consumers, have reached a tremendous volume in the market with rapid advances in technology. A credit card is an alternative to cash and can be used for paying at points of sale with POS devices or for cash advances from bank ATMs. Expenses made by credit card must be paid to the bank on a monthly basis or in installments. For banks, credit card risk can be defined as the losses that may arise when credit card customers do not fulfill their liabilities in a timely manner. Managing unpaid credit card debt is gaining importance due to the increasing number of credit cards and the negative effects of credit card risk. Banks may suffer losses when their customers fail, partially or fully, to fulfill their contractual obligations. Moreover, a successive debt burden can bring about severe consequences for credit card users, such as depression, suicide, divorce, violence and theft. Thus, banks should reduce the negative effects of unpaid credit card debts both on themselves and on their credit card customers. For the management of such risks, predicting customer behavior is a significant subject in the banking sector. In recent years, researchers have studied this problem using statistical and mathematical techniques such as discriminant analysis, nearest neighbor, logistic regression, decision trees, neural networks and other machine learning methods.

The main purpose of this study is to propose a model that predicts with high accuracy whether credit card users will pay their debts. With the proposed model, potential non-payment risks can be predicted and necessary actions can be taken in time. The contribution of this study is to apply ANN and SVM, which are traditional artificial intelligence algorithms, and CART and C4.5, which are widely used decision tree algorithms, to forecast customers’ payment status for the coming months. We compare the predictive power of these four methods using various performance criteria and identify the methods that are inferior to the others for the presented prediction problem. Moreover, we conduct an extensive literature review on parameter optimization and show the usefulness of parameter optimization for each method in terms of prediction accuracy.

The study is organized as follows. The second section presents a literature review on machine learning methods and risk prediction in the banking sector. The third section presents the theoretical fundamentals of the prediction methodologies used in this study. The fourth section describes the dataset and variables. The fifth section analyzes the results of the proposed methods, and the last section discusses the findings and conclusions.

2 Literature review

Machine learning is used to extract structure from data and validate this structure using specialized algorithms. It is a discipline focused on two related questions: “How can computer systems evolve automatically with experience?” and “What are the basic statistics, calculation, knowledge and theory rules governing all learning systems, including computers, people and organizations?”. Machine learning is applied in many areas such as speech/voice recognition, computer vision, biological surveillance, robot control etc.

One of the most common aims of machine learning is to make predictions about the future using historical data. Studies in the related literature that use machine learning techniques for forecasting in different sectors and areas can be summarized as follows: stock-market predictions have been made with multi-layer perceptron, dynamic and hybrid ANNs [10] as well as the Nonlinear Autoregressive Network with Exogenous inputs (NARX) neural network [6]; household electricity consumption has been forecast with a hybrid Autoregressive Integrated Moving Average (ARIMA) and ANN model [45]; financial time series (e.g. exchange rates) have been forecast with a hybrid method combining SVM with ARIMA [26]; rainfall has been forecast using an evolutionary hybrid of the adaptive neuro-fuzzy inference system (ANFIS) with the Firefly Algorithm (ANFIS-FFA) [29]; a port’s container throughput has been forecast with ANN and SVM [15]; the Remaining Useful Life (RUL) of aircraft engines has been predicted with ARIMA and SVM [30]; tourist demand has been forecast using an autoregressive neural network [51]; and personal consumption expenditure inflation has been predicted using three machine learning models (ANN, k-Nearest Neighbor (k-NN) and Support Vector Regression (SVR)) [43].

In the banking sector, there are also many studies that use machine learning algorithms to predict bank failure, credit card risk, fraud and customer churn. A brief review of applications of machine learning techniques in the banking sector follows. Boyacioğlu et al. [25] used four different neural network models, three multivariate statistical methods and SVM to predict bank failures. Le and Viviani [14] compared the efficiency of discriminant analysis and logistic regression, as traditional statistical techniques, with ANN, SVM and k-NN, as machine learning approaches, for predicting bank failure. The results showed that the ANN and k-NN methods performed better. Another study on bank failure prediction was carried out by Gogas et al. [32] using SVM. Carmona et al. [31] applied the extreme gradient boosting method to bank failure prediction. To determine the important factors in predicting bank financial strength ratings, Öğüt et al. [16] used multiple discriminant analysis and ordered logistic regression and compared the model performances with SVM and ANN. Patil et al. [39] proposed three machine learning algorithms for credit card fraud detection: logistic regression, the Iterative Dichotomiser 3 (ID3) decision tree algorithm and the random forest algorithm. Their empirical analysis showed that the random forest algorithm performed better in terms of accuracy, precision and recall. Shirazi and Mohammadi [13] used CART for churn prediction. Nami and Shajari [38] proposed a method that applied a dynamic random forest algorithm to detect fraudulent payment card transactions. Smeureanu et al. [18] compared the prediction performance of neural networks and SVM for customer segmentation at a commercial bank in Romania. Both neural networks and SVM performed well, but the SVM model with a radial basis kernel function outperformed the multilayer perceptron. Serengil and Ozpinar [36] focused on workforce planning in bank operation centers using a hybrid multi-level machine learning algorithm. They compared the neural network with alternative exponential smoothing algorithms to evaluate the results. For credit risk assessment, Zhang et al. [42] employed a multiple instance learning method based on the radial basis function. The studies mentioned above that used machine learning methods for prediction are summarized in Table 1.

Table 1
Summary of literature review

Author(s)	Study	Methodology	Specific Point of the Study/Property
Öğüt et al. [16]	Prediction of bank financial strength ratings: The case of Turkey	Multiple Discriminant Analysis, Ordered Logistic Regression, SVM and ANN	Prediction of bank financial strength ratings
Boyacioğlu et al. [25]	Predicting bank financial failures using neural networks, SVM and multi-variate statistical methods: A comparative analysis in the sample of savings deposit insurance fund (SDIF) transferred banks in Turkey	Multivariate Statistical Methods, Neural Network Methods, SVM	Predicting bank financial failures
Smeureanu et al. [18]	Customer segmentation in private banking sector using machine learning techniques	Neural Networks and SVM	Customer segmentation
Serengil & Ozpinar [36]	Workforce Optimization for Bank Operation Centers: A Machine Learning Approach	ANN	Planning bank operation centers
Zhang et al. [42]	Multiple instance learning for credit risk assessment with transaction data	Multiple Instance Learning (MIL)	Credit risk assessment
Gogas et al. [32]	Forecasting bank failures and stress testing: A machine learning approach	SVM	Bank failure prediction
Patil et al. [39]	Predictive Modelling for Credit Card Fraud Detection Using Data Analytics	Logistic Regression, ID3 and Random Forest	Fraud prediction
Nami & Shajari [38]	Cost-sensitive payment card fraud detection based on dynamic random forest and k-nearest neighbors	Dynamic Random Forest and Minimum Risk Model	Detecting fraudulent payment card transactions
Le & Viviani [14]	Predicting bank failure: An improvement by implementing a machine-learning approach to classical financial ratios	Discriminant Analysis, Logistic Regression, ANN, SVM and k-NN	Predicting the failure of banks
Carmona et al. [31]	Predicting failure in the U.S. banking sector: An extreme gradient boosting approach	Extreme Gradient Boosting Algorithm, Logistic Regression and Random Forest	Predicting the bank failure
Shirazi & Mohammadi [13]	A big data analytics model for customer churn prediction in the retiree segment	Classification and Regression Tree	Predicting customer churn
Climent et al. [11]	Anticipating bank distress in the Eurozone: An Extreme Gradient Boosting approach	Extreme Gradient Boosting	Bank failure prediction
Bao et al. [45]	Integration of unsupervised and supervised machine learning algorithms for credit risk assessment	Unsupervised and Supervised Machine Learning Algorithms	Credit risk assessment
Shen et al. [12]	A novel ensemble classification model based on neural networks and a classifier optimization technique for imbalanced credit risk evaluation	Back Propagation Neural Network	Credit risk evaluation

To avoid creating underfitting or overfitting models, the parameters must be optimized. However, the literature offers no consensus on exact rules for finding the best parameter values for each machine learning model. Table 2 presents a literature review on parameter settings for the ANN, SVM, CART and C4.5 methods. For each technique, the most important parameters, parameter settings and optimization methods are listed.

Table 2

Summary of literature review about parameter settings of SVM, C4.5, CART, ANN

Machine learning techniques	Authors	Parameter	Selected parameter settings	Optimization method
SVM	Lin & Zhang [21]	Kernel function	Kernel function = RBF	Grid search
		Kernel parameter gamma	C = (2^–15, 2¹⁵)
		Kernel penalty factor C	Gamma = (2^–15, 2¹⁵)
	Tao et al. [52]	Kernel function	Kernel function = RBF	Genetic algorithm
		Kernel parameter gamma	C = NG	Particle swarm optimization
		Kernel penalty factor C	Gamma = NG
	Yu et al. [20]	Kernel function	Kernel function = RBF	Weighted least squares SVM (LSSVM) classifier with design of experiment (DOE)
		Kernel parameter sigma	C = (2^–5, 2⁹)
		Kernel penalty factor C	Sigma = (2^–9, 2⁵)
	Wang et al. [48]	Kernel function	Kernel function = Gaussian kernel	Genetic algorithm
		Kernel parameter sigma	(1) For the grid search method; Gamma = [2^–3:2:2⁵] C = [2⁰:2:2¹²]	Grid search
		Kernel penalty factor C	(2) For the GA method; Population size = 10, Maximum number of variations = 20, Gamma = [0.0001:10] C = [1:1000]	Particle swarm optimization (PSO)
			(3) For the PSO method; Inertia weight = 1, Acceleration coefficients = 1.5 and 1.7 respectively, Maximum rate coefficient = 0.6, Gamma = [0.0001:10], C = [1:1000]
C4.5	Lin & Chen [41]	Minimum cases (M) for the leaf	M = [2, 22]	SS-based approach
		Pruning confidence level (CF)	CF = 1 to 35%
	Beck et al. [19]	CF (confidence factor)	CF = 25%, M = 12, 22, 32, 42
		MS (minimum numbers of split-off cases)	M = 2, CF = 75%, 50%, 25%, 10%, 5%, 1%, 0.1%, 0.01%, and 0.001%
CART	Yang et al. [49]	Maximum depth	Maximum depth = [1, 8]	Grid search
		Minimum samples split	Minimum samples split = [0, 60]
		Minimum samples leaf	Minimum samples leaf = [0, 30]
	Sarkar et al. [40]	Complexity parameter	Complexity parameter = [0, 1]	Particle swarm optimization
		Minimum samples split	Minimum samples split = [1, 200]	Genetic algorithm
		Maximum depth	Maximum depth = [1, 30]
	Hamze-Ziabari & Bakhshpoori [37]	Minimum leaf size; depth of tree	Minimum leaf size = [4, 70]; depth of tree = [1, 11]	248 different models over different ranges were developed and evaluated in terms of Root Mean Squared Error (RMSE) values.
ANN	Chakraborty & Elzarka [9]	Hidden layer size	Hidden layer size = (10, 500)	Randomized Search CV
		Activation functions	Activation functions = [‘identity’, ‘logistic’, ‘tanh’, ‘relu’]
		Learning rate; regularization parameter alpha	Learning rate = (0.001, 0.05); Alpha = logspace(–5, 3, 5)
	Rajendra et al. [28]	Learning rates	Learning rates = (0.5, 0.9)	Genetic algorithm
		Number of cycles	Number of cycles = (5000, 50000) times
	Jo et al. [50]	Number of hidden layers	Number of hidden layers = {1, 2, 3}	Random search
		Number of hidden units	Number of hidden units: [3, 20]	Tree-structured Parzen estimator
		Types of activation function	Types of activation function: [‘Linear’, ‘Sigmoid’, ‘Tanh’, ‘Hard-sigmoid’]	Hyper-parameter optimization via radial basis function and dynamic coordinate search
		Weight initialization methods	Weight initialization methods: [‘Xauniform’, ‘Xanormal’, ‘Heuniform’, ‘Henormal’]
		Maximum number of epochs	Maximum number of epochs: [10, 100]
	Benvidi et al. [2]	Number of iterations		Genetic algorithms
		Number of hidden layers		Particle swarm optimization
	Nantasenamat et al. [8]	Number of hidden nodes		Geometry optimization
		Learning epochs
		Learning rate
		Momentum

3 Methodology

We use ANN and SVM, which are traditional artificial intelligence algorithms, as well as CART and C4.5, which are widely used decision tree algorithms, to forecast customers’ payment status for the coming months. The features, most commonly used parameters and advantages/disadvantages of these four machine learning techniques are summarized as follows.

3.1 Support vector machine (SVM)

SVM is a supervised machine learning algorithm that can be used for classification or regression problems [44]. In supervised learning, the existing data set is called the training set and is composed of pairs of input variables “X” and an output variable “Y”, shown as S = ((x₁, y₁), (x₂, y₂), ..., (x_l, y_l)) ∈ (Rⁿ×Y)^l, where l is the number of observations. The learning type differs according to the nature of the output space Y. If y_i ∈ {1, -1}, the problem is a binary classification problem; if y_i ∈ {1, 2, 3, ..., m}, it is a multi-class classification problem. In binary classification, the overall aim is to find a function h(x) on Rⁿ for predicting the output value y from the input variables x: $f (x) = sgn (h (x))$ (1)

When using a linear classifier, the dot product between two vectors, also referred to as an inner product or scalar product is defined as w^Tx = Σ w_ix_i. A linear classifier is based on a linear discriminant function of the form: $f (x) = w^{T} x + w_{0}$ (2)

However, in real-life applications a nonlinear classifier commonly provides better accuracy. Yet, linear classifiers have advantages, one of them being that they often have simple training algorithms. A simple way of making a nonlinear classifier out of a linear classifier is to map the data from the input space X to a feature space F using a nonlinear function. While achieving mapping of data to a high-dimensional feature space, kernel methods perform well in terms of memory usage and computational time.

Moreover, an SVM classifier is built around the concept of a margin. In SVM, support vectors are the points closest to the hyperplane separating the positive and negative examples. A margin separating the two classes is specified using these support vectors. To obtain a maximum-margin classifier, the following convex quadratic program is solved: $min_{w, w_{0}, ξ} \frac{1}{2} \| w \|^{2} + C \sum_{i = 1}^{l} ξ_{i}$ (3) $\text{s.t.} \; y_{i} (w^{T} x_{i} + w_{0}) \geq 1 - ξ_{i}, i = 1, \dots, l$ (4) $ξ_{i} \geq 0, i = 1, \dots, l$ (5) where C > 0 is a penalty factor weighting the two terms of the objective function. In the objective function, the first term maximizes the margin (by minimizing ‖w‖), and the second term penalizes violations of the constraints y_i (w^Tx_i + w₀) ≥ 1, i = 1, …, l. This problem is solved using its Lagrangian dual problem: $min_{α} \frac{1}{2} \sum_{i = 1}^{l} \sum_{j = 1}^{l} α_{i} α_{j} y_{i} y_{j} K (x_{i}, x_{j}) - \sum_{j = 1}^{l} α_{j}$ (6) $\text{s.t.} \; \sum_{i = 1}^{l} y_{i} α_{i} = 0$ (7) $0 \leq α_{i} \leq C, i = 1, \dots, l$ (8) where K(x, x’) is the kernel function.
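As a rough illustration of the primal problem (3)-(5), the sketch below evaluates the objective for a given linear classifier, taking each slack ξ_i as the smallest value that satisfies constraints (4) and (5). It does not solve the quadratic program; the function names and the toy data are assumptions introduced only for this example.

```python
def decision_value(w, w0, x):
    """Linear discriminant h(x) = w^T x + w0, as in equation (2)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w0


def primal_objective(w, w0, X, y, C):
    """Value of objective (3) for fixed (w, w0).

    Each slack is the smallest xi_i >= 0 with y_i * h(x_i) >= 1 - xi_i,
    i.e. xi_i = max(0, 1 - y_i * h(x_i)).
    """
    margin_term = 0.5 * sum(wj * wj for wj in w)
    slacks = [max(0.0, 1.0 - yi * decision_value(w, w0, xi))
              for xi, yi in zip(X, y)]
    return margin_term + C * sum(slacks), slacks


def predict(w, w0, x):
    """Binary prediction f(x) = sgn(h(x)), as in equation (1)."""
    return 1 if decision_value(w, w0, x) >= 0 else -1


# Hypothetical toy data: the second point lies on the hyperplane,
# so it violates the margin and receives a slack of 1.
X = [[2.0, 0.0], [0.0, 0.0], [-2.0, 0.0]]
y = [1, 1, -1]
value, slacks = primal_objective([1.0, 0.0], 0.0, X, y, C=10.0)
```

A larger C makes each margin violation more expensive relative to the margin term, which is the trade-off that C is described as weighting.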

SVM has many key parameters that should be set properly. The classification accuracy of SVM is sensitive to two factors: (i) the algorithm used to solve the quadratic program and (ii) the parameter settings. Generally, different parameter settings lead to different classification performance. One of the most important parameters is the kernel function. The most commonly used kernel functions are the linear, polynomial, normalized polynomial, Fourier, radial basis (RBF), sigmoid and MLP kernel functions. The formulations of some kernel functions are shown in Table 3.

Table 3

Formulation of some kernel functions [21, 48]

Kernel Function	K(x, x_j)
Linear	x^Tx_j
Polynomial	(γx^Tx_j + C)^d
Sigmoid	tanh(γx^Tx_j + C)
MLP Kernel	tanh(βx^Tx_j + θ)
RBF	exp(-γ ∥x - x_j∥²/σ²)
Gauss Kernel	exp(-γ ∥x - x_j∥²)

In addition to the kernel function, SVM performance depends on the values of the kernel parameters and the penalty factor C. The penalty parameter C controls the trade-off between classification accuracy and acceptable misclassification errors [20].
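To make the kernel formulations concrete, the following sketch implements a few of the kernels listed in Table 3 in plain Python and builds the kernel matrix K(x_i, x_j) that appears in the dual problem (6). The function names and default parameter values are illustrative assumptions, not settings used in this study.

```python
import math


def dot(x, z):
    """Inner product x^T z."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, gamma=1.0, c=1.0, d=2):
    # (gamma * x^T z + c)^d; c is the kernel constant, not the penalty factor C
    return (gamma * dot(x, z) + c) ** d


def sigmoid_kernel(x, z, gamma=1.0, c=0.0):
    return math.tanh(gamma * dot(x, z) + c)


def gaussian_kernel(x, z, gamma=0.5):
    # exp(-gamma * ||x - z||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def gram_matrix(X, kernel):
    """Matrix of kernel values K(x_i, x_j) for all pairs of rows in X."""
    return [[kernel(xi, xj) for xj in X] for xi in X]
```

For the Gaussian kernel, a larger gamma makes the kernel value decay faster with distance, which is one reason the kernel parameter is usually tuned together with C.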

3.2 Artificial neural network (ANN)

ANN is one of the most powerful machine learning algorithms and is used in many different fields such as estimation, classification, signal processing and pattern recognition. In this study, we use a multilayer perceptron neural network with back-propagation learning.

The ANN algorithm is performed in two steps: a forward step and a backward step. In the forward step, the signals are propagated from input to output through the computational units. In the backward step, the synaptic weights are updated according to an error signal using a supervised learning method.

The ANN model is presented below [35].

Forward Computation: $v_{j}^{(l)} (n) = \sum_{i = 0}^{m_{0}} w_{ji}^{(l)} (n) y_{i}^{(l - 1)} (n)$ (9) where $y_{i}^{(l - 1)} (n)$ denotes the output signal of neuron i in layer l-1 at iteration n and $w_{ji}^{(l)} (n)$ denotes the synaptic weight of neuron j in layer l, fed from neuron i in layer l-1, at iteration n.

The output signal of neuron j in layer l is as follows: $y_{j}^{(l)} (n) = φ_{j} (v_{j}^{(l)} (n))$ (10) where φ_j denotes the activation function. The activation function, which transforms the activation level of a neuron into an output signal, introduces non-linearity into the output of the neuron. The most commonly used activation functions are as follows:

Logistic Activation Function: $f (x) = \frac{1}{1 + e^{- x}}$ (11)

Hyperbolic Tangent Activation Function: $f (x) = \frac{e^{x} - e^{- x}}{e^{x} + e^{- x}}$ (12)
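The two activation functions can be written directly in Python. This is a minimal sketch using the standard logistic form 1/(1 + e^{-x}) and the hyperbolic tangent of equation (12). The derivative helpers are additions for illustration, since the backward computation needs φ′.

```python
import math


def logistic(x):
    """Logistic activation: maps any real input to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_derivative(x):
    """Derivative of the logistic function, s * (1 - s) with s = logistic(x)."""
    s = logistic(x)
    return s * (1.0 - s)


def hyperbolic_tangent(x):
    """Hyperbolic tangent activation: maps any real input to (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def hyperbolic_tangent_derivative(x):
    """Derivative of tanh, 1 - tanh(x)^2."""
    return 1.0 - hyperbolic_tangent(x) ** 2
```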

In the output layer (l = L): $y_{j}^{(L)} = o_{j} (n)$ (13)

and the error signal can be computed as follows: $e_{j} (n) = d_{j} (n) - o_{j} (n)$ (14) where d_j (n) denotes the j-th element of the desired response vector, which contains the real (observed) values.

Backward Computation:

Local gradients (δ) are calculated: $δ_{j}^{(l)} (n) = \begin{cases} e_{j}^{(L)} (n) φ_{j}^{'} (v_{j}^{(L)} (n)) & \text{for neuron } j \text{ in output layer } L \\ φ_{j}^{'} (v_{j}^{(l)} (n)) \sum_{k} δ_{k}^{(l + 1)} (n) w_{kj}^{(l + 1)} (n) & \text{for neuron } j \text{ in hidden layer } l \end{cases}$ (15) where φ_j^' denotes the derivative of the activation function.

The synaptic weights of the network are updated according to the following equation: $w_{ji}^{(l)} (n + 1) = w_{ji}^{(l)} (n) + α [Δ w_{ji}^{(l)} (n - 1)] + η δ_{j}^{(l)} (n) y_{i}^{(l - 1)} (n)$ (16) where η denotes the learning rate, α denotes the momentum constant used to avoid oscillation, and Δw_ji^(l) (n - 1) denotes the weight change applied in the previous iteration. This process is repeated until the epochs are completed or the stop criterion is met. Afterwards, classification is carried out using the forward computation with the resulting weights.
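The forward and backward computations can be sketched for a network with a single hidden layer. The code below is only an illustration under several assumptions: logistic activations in both layers (so φ′(v) = y(1 − y)), a bias handled as a constant input of 1 at index 0, momentum applied to the previous weight change, and hypothetical names, network sizes and default values for η and α.

```python
import math


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, W_hidden, W_out):
    """Forward computation (equations 9, 10 and 13) for one hidden layer."""
    y0 = [1.0] + list(x)  # index 0 is the constant bias input
    y1 = [1.0] + [logistic(sum(w * y for w, y in zip(row, y0)))
                  for row in W_hidden]
    o = [logistic(sum(w * y for w, y in zip(row, y1))) for row in W_out]
    return y0, y1, o


def backprop_step(x, d, W_hidden, W_out, dW_hidden, dW_out,
                  eta=0.5, alpha=0.5):
    """One weight update (equations 14 to 16); returns new weights,
    the weight changes to reuse as momentum, and the error signal."""
    y0, y1, o = forward(x, W_hidden, W_out)
    e = [dj - oj for dj, oj in zip(d, o)]                      # eq. (14)
    # Local gradients, eq. (15), with logistic derivative y * (1 - y)
    delta_out = [ej * oj * (1.0 - oj) for ej, oj in zip(e, o)]
    delta_hid = [
        y1[j + 1] * (1.0 - y1[j + 1])
        * sum(delta_out[k] * W_out[k][j + 1] for k in range(len(W_out)))
        for j in range(len(W_hidden))
    ]
    # Weight changes: momentum term plus gradient term, eq. (16)
    new_dW_out = [[alpha * dW_out[k][i] + eta * delta_out[k] * y1[i]
                   for i in range(len(y1))] for k in range(len(W_out))]
    new_dW_hidden = [[alpha * dW_hidden[j][i] + eta * delta_hid[j] * y0[i]
                      for i in range(len(y0))] for j in range(len(W_hidden))]
    new_W_out = [[w + dw for w, dw in zip(wr, dr)]
                 for wr, dr in zip(W_out, new_dW_out)]
    new_W_hidden = [[w + dw for w, dw in zip(wr, dr)]
                    for wr, dr in zip(W_hidden, new_dW_hidden)]
    return new_W_hidden, new_W_out, new_dW_hidden, new_dW_out, e
```

In practice the step would be repeated over all training records for a number of epochs, with the weight changes from one call passed into the next.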

The main parameters of ANN are the number of hidden layers, the number of hidden neurons and the transfer function, all of which directly affect the reliability of the ANN model [3]. Hidden layers, which lie between the input and output layers of an ANN, give the network its nonlinear modelling capability [27].

3.3 Classification and regression tree (CART)

Decision trees are mainly based on the division of input data into groups by means of a clustering algorithm. In decision tree learning, a tree structure is created. The leaves of this tree show the class labels. The branches leading to these leaves represent the tests on the features. During decision tree learning, the information learned is modeled on a tree. All interior nodes of this tree represent inputs.

The CART model is presented as follows [33]:

First step: The candidate split of each parameter is determined.

Second step: The probability of the left candidate split (P_L) and the probability of the right candidate split (P_R) are calculated. $P_{L} = \frac{\text{number of records in left candidate split } t_{L}}{\text{number of records in the training data}}$ (17) $P_{R} = \frac{\text{number of records in right candidate split } t_{R}}{\text{number of records in the training data}}$ (18) where t_L indicates the left candidate split at a decision node t and t_R indicates the right candidate split at decision node t.

Third Step: The probability of the left candidate split for every class (P(j|t_L)) and the probability of the right candidate split for every class (P(j|t_R)) are calculated. $P (j | t_{L}) = \frac{\text{number of class } j \text{ records in left candidate split } t_{L}}{\text{number of class } j \text{ records in the training data}}$ (19) $P (j | t_{R}) = \frac{\text{number of class } j \text{ records in right candidate split } t_{R}}{\text{number of class } j \text{ records in the training data}}$ (20)

Fourth Step: The value of 2P_LP_R for the candidate split is calculated. $2 P_{L} P_{R} = 2 \times P_{L} \times P_{R}$ (21)

Fifth Step: The sum of the absolute differences between P(j|t_L) and P(j|t_R) over all classes is calculated. $Q (s | t) = \sum_{j = 1}^{category} | P (j | t_{L}) - P (j | t_{R}) |$ (22)

Sixth Step: θ_(s|t) is calculated. $θ_{(s | t)} = 2 P_{L} P_{R} \sum_{j = 1}^{category} | P (j | t_{L}) - P (j | t_{R}) |$ (23)

After this goodness measure has been computed for every candidate split, the split with the maximum θ_(s|t) is selected for the main node. The iterations then continue until the leaf nodes are reached and a complete decision tree is formed.
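The fourth to sixth steps and the final selection can be sketched as follows. The code takes P_L, P_R and the per-class probabilities as already computed inputs, so it illustrates only equations (21) to (23) and the choice of the maximum θ_(s|t). The function names, split names and numbers are hypothetical.

```python
def split_goodness(p_left, p_right, class_probs_left, class_probs_right):
    """theta(s|t) = 2 * P_L * P_R * sum_j |P(j|t_L) - P(j|t_R)|."""
    classes = set(class_probs_left) | set(class_probs_right)
    q = sum(abs(class_probs_left.get(j, 0.0) - class_probs_right.get(j, 0.0))
            for j in classes)
    return 2.0 * p_left * p_right * q


def best_split(candidates):
    """Return the name of the candidate split with the maximum theta(s|t).

    `candidates` maps a split name to the tuple
    (P_L, P_R, class probabilities on the left, class probabilities on the right).
    """
    return max(candidates, key=lambda s: split_goodness(*candidates[s]))


# Hypothetical candidate splits for a two-class problem
candidates = {
    "split_a": (0.375, 0.625,
                {"yes": 1 / 3, "no": 2 / 3}, {"yes": 0.8, "no": 0.2}),
    "split_b": (0.5, 0.5,
                {"yes": 0.5, "no": 0.5}, {"yes": 0.5, "no": 0.5}),
}
chosen = best_split(candidates)
```

The measure is large when the two children are of similar size (large 2P_LP_R) and have very different class compositions (large Q(s|t)), which is why "split_b" scores zero in this toy case.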

To establish a reliable CART model, two main parameters of this algorithm must be adjusted: the minimum leaf size and the depth of the tree. The minimum leaf size specifies the minimum number of instances required to produce a leaf. The depth of the tree specifies the number of levels to which the tree is grown [37].

3.4 C4.5 Tree

In this study, we also apply one of the most widely used decision tree algorithms, the C4.5 tree algorithm. The C4.5 tree is an extension of ID3 and is used for classification problems. It grows the tree by splitting the data at each node on the feature with the highest information gain. The decision tree construction process has two main phases: the growth phase and the pruning phase. As in the ID3 algorithm, the central choice during the growth phase is selecting which attribute to test at each node so that the samples are classified in the most useful way. The basic difference between the C4.5 tree and CART is that C4.5 uses an entropy-based evaluation function as the selection criterion. Calculating the entropy evaluation function involves five steps [22]. The purpose of these calculations is to determine the predictor that provides the highest information gain.

First step: To identify the class of the training set S, calculate Info(S). $Info (S) = - \sum_{i = 1}^{k} \frac{freq (C_{i}, S)}{| S |} \log_{2} \frac{freq (C_{i}, S)}{| S |}$ (24) where C_i is class i, |S| is the number of cases and freq (C_i, S) is the number of cases in S that belong to class C_i.

Second Step: For a feature X used to partition S, calculate the expected information value. ${Info}_{x} (S) = \sum_{i = 1}^{L} \frac{| S_{i} |}{| S |} Info (S_{i})$ (25) where L is the number of outputs for X, S_i is the i^th output subset of S and |S_i| is the number of cases in subset S_i.

Third Step: Calculate the information gained. $Gain (X) = Info (S) - {Info}_{x} (S)$ (26)

Fourth Step: Calculate the partition information value. $SplitInfo (X) = - \sum_{i = 1}^{L} (\frac{| S_{i} |}{| S |}) \log_{2} (\frac{| S_{i} |}{| S |})$ (27)

Fifth Step: Calculate the gain ratio. $Gain Ratio (X) = \frac{Gain (X)}{Split Info (X)}$ (28)
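The five steps can be sketched in Python for a single categorical feature. This is an illustrative implementation, with Info_x(S) taken as the weighted average of the subset entropies. The function names and toy labels are assumptions, and returning 0 when SplitInfo(X) is 0 is a convention chosen here to avoid division by zero.

```python
import math
from collections import Counter, defaultdict


def info(labels):
    """Entropy Info(S) of a list of class labels, equation (24)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """Gain ratio of a categorical feature X, equations (25) to (28)."""
    n = len(labels)
    subsets = defaultdict(list)
    for value, label in zip(feature_values, labels):
        subsets[value].append(label)
    # Expected information after partitioning S by X
    info_x = sum(len(s) / n * info(s) for s in subsets.values())
    gain = info(labels) - info_x
    split_info = -sum(len(s) / n * math.log2(len(s) / n)
                      for s in subsets.values())
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: four customers and their payment status
labels = ["yes", "yes", "no", "no"]
ratio_perfect = gain_ratio(["a", "a", "b", "b"], labels)
ratio_many_values = gain_ratio(["a", "b", "c", "d"], labels)
ratio_uninformative = gain_ratio(["a", "b", "a", "b"], labels)
```

The second feature separates the classes just as well as the first but splits the data into four subsets, so its gain ratio is lower. This shows how SplitInfo(X) penalizes features with many distinct values.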

The aim of the pruning phase is to remove parts of the tree that do not contribute to classification, in order to avoid overfitting the training data. Before applying the C4.5 algorithm, two main parameters must be set: the minimum number of cases that reach a leaf, also called the minimum number of split-off cases (M), and the pruning confidence factor (CF). In the pruning phase, CF influences whether a node of the tree will be deleted. A small value of CF causes more aggressive pruning of tree nodes. The parameter M affects the size of the tree grown in the construction phase. If M is small, the tree can be very large and have many branches. Parameter settings for C4.5 are crucial for avoiding overfitting or underfitting and for achieving high classification accuracy. Quinlan [22], the developer of the C4.5 algorithm, suggests default values of 2 and 25% for M and CF, respectively. The major advantage of C4.5 is that it can work with both continuous and discrete data. C4.5 can also handle incomplete or missing data. Its prime weakness is that small variations in the data can produce different decision trees.

3.5 Parameter optimization

Although parameter setting is crucial for model accuracy and performance, most machine learning algorithms suffer from incorrect parameter settings. To avoid creating underfitting or overfitting models, the parameters must be optimized. However, there is no consensus on exact rules for finding the best parameter values for each machine learning model.

As with other machine learning algorithms, the parameters of the ANN, SVM, CART and C4.5 methods must be set before the methods are applied. Using the review of parameter optimization approaches in Section 2, we identify the most important parameters, parameter settings and optimization methods for each method. The performance of SVM is highly dependent on its kernel function type, the kernel parameter gamma or sigma, and the penalty factor C. For C4.5, the minimum number of cases for a leaf (M) and the pruning confidence level (CF) affect the performance of the model. The accuracy and performance of the CART model depend on the maximum depth, minimum samples split and minimum samples leaf. For ANNs, parameters such as the hidden layer size, activation function, learning rate, regularization parameter alpha and number of iterations can be significant.

Selecting proper parameters is of utmost importance for improving the accuracy of a machine learning model. Generally, a trial-and-error method is used, but there are also numerous optimization techniques that search for the best parameters. Grid search, genetic algorithms, particle swarm optimization and the SS-based approach are just some of them.
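Of these techniques, grid search is the simplest to sketch: every combination of candidate parameter values is scored and the best one is kept. The code below is a generic illustration, not the procedure used in this study. The scoring function `toy_score` is a hypothetical stand-in for a validation accuracy obtained by training and testing a model, and the power-of-two grids for C and gamma are illustrative assumptions.

```python
import math
from itertools import product


def grid_search(param_grid, score_fn):
    """Exhaustively score every parameter combination and keep the best.

    `param_grid` maps parameter names to lists of candidate values;
    `score_fn` takes a dict of parameters and returns a score to maximize.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical score surface that peaks at C = 2^3 and gamma = 2^-5;
    # a real study would return cross-validated accuracy here.
    return -((math.log2(params["C"]) - 3) ** 2
             + (math.log2(params["gamma"]) + 5) ** 2)


grid = {
    "C": [2.0 ** k for k in range(-5, 16, 2)],
    "gamma": [2.0 ** k for k in range(-15, 4, 2)],
}
best_params, best_score = grid_search(grid, toy_score)
```

The cost grows with the product of the grid sizes (here 11 × 10 = 110 evaluations), which is one motivation for the metaheuristic alternatives listed above.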

4 Empirical analysis

In this section, we first describe the dataset that is used to train and test our model. Then we present the criteria used to evaluate the performance of the proposed model. After that, we present parameter selection and optimization. Finally, we report the experimental results using the predictions obtained.

4.1 Description of the dataset

In this study, we use data obtained from a major bank in Taiwan [17]. The dataset includes 10713 customer records. These records include customer information such as the amount of credit, gender, education level, marital status, age, past payment records, invoice amounts and amounts of credit card payments. We use these customer data as the input variables of our model. Details of the variables in the dataset are shown in Table 4.

Table 4
Description of dataset

Variable	Definition of variable	Type of variable	Variable value
I1	The amount of credit provided. It includes not only the individual consumer credit but also his or her family credit	Numerical	{10000, 16000, ..., 1000000} in dollars
I2	Gender	Numerical	1 means male; 2 means female
I3	Education level of customer	Numerical	1 means graduate school; 2 means university; 3 means high school; 4 means others
I4	Marital status of customer	Numerical	1 means married; 2 means single; 3 means others
I5	Age of customer	Numerical	21, 22, ..., 79
I6–I11	Past payment status of the last six months	Numerical	-1 means pay duly; 1 means payment delay for one month; 2 means payment delay for two months; ...; 8 means payment delay for eight months; 9 means payment delay of nine months or more
I12–I17	Invoice amounts for the last six months	Numerical	{0, 1, 2, ..., 534, ...} in dollars
I18–I23	Amount of credit card payment for the last six months	Numerical	{0, 1, 2, ..., 1284, ...} in dollars
O24	Payment status for the next month	Categorical	“yes”, “no”

We use two different methods to divide the dataset into a model training part and a model validation part: the first is cross validation and the second is the hold-out method.

k-fold Cross Validation: The steps of the k-fold cross validation method, which is a variant of the Monte-Carlo cross validation method, are as follows [4]:

First step: The elements of dataset X are divided into K sets of approximately equal size. The elements of the k-th set constitute the validation set X_val. The other sets constitute the learning dataset X_learn.

Second step: The training of the model g is done using X_learn and the error E_k(g) is calculated with the following equation: $E_{k} (g) = \frac{\sum_{i = 1}^{\frac{N}{K}} {(g (x_{i}^{val}) - y_{i}^{val})}^{2}}{\frac{N}{K}}$ (29)

Third step: The first and second steps are repeated for each k from 1 to K. The average error is calculated by the following equation: $E_{gen} (g) = \frac{\sum_{k = 1}^{K} E_{k} (g)}{K}$ (30)
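The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by Weka: the fold assignment (every K-th element goes to the same fold) and the toy model `fit_slope` are hypothetical choices made only to keep the example self-contained.

```python
def k_fold_error(xs, ys, K, fit):
    """Average validation error E_gen(g) over K folds, following Eqs. (29) and (30)."""
    n = len(xs)
    fold_errors = []
    for k in range(K):
        # Assumption: elements k, k+K, k+2K, ... form the k-th validation set.
        val_idx = set(range(k, n, K))
        x_learn = [xs[i] for i in range(n) if i not in val_idx]
        y_learn = [ys[i] for i in range(n) if i not in val_idx]
        g = fit(x_learn, y_learn)  # train model g on X_learn
        # Eq. (29): mean squared error on the validation set.
        e_k = sum((g(xs[i]) - ys[i]) ** 2 for i in val_idx) / len(val_idx)
        fold_errors.append(e_k)
    # Eq. (30): average of the K fold errors.
    return sum(fold_errors) / K


def fit_slope(x_learn, y_learn):
    """Hypothetical toy model: least-squares line through the origin."""
    slope = sum(x * y for x, y in zip(x_learn, y_learn)) / sum(x * x for x in x_learn)
    return lambda x: slope * x


xs = [0, 1, 2, 3, 4, 5]
ys = [2 * x for x in xs]  # noise-free toy data, so the validation error is zero
err = k_fold_error(xs, ys, K=3, fit=fit_slope)
```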

Hold-out: The hold-out method randomly divides a dataset into two groups at a given rate. One group is used to train the model and the other group is used to test it [46].
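A hold-out split can be sketched as follows. The function name `hold_out_split`, the default rate of 0.8 (chosen to mirror the 80% split reported later) and the fixed random seed are assumptions made for illustration only.

```python
import random


def hold_out_split(records, train_rate=0.8, seed=0):
    """Randomly split records into a training group and a test group."""
    shuffled = list(records)
    # A fixed seed (an assumption here) makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_rate * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Ten hypothetical record identifiers.
train, test = hold_out_split(list(range(10)), train_rate=0.8)
```

Unlike k-fold cross validation, each record is used either for training or for testing, never both, so the estimate depends more strongly on the particular random split.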

4.2 Performance criteria

In order to evaluate the performance of the proposed algorithms, we use performance criteria such as the kappa statistic, precision, recall, F-measure, ROC (Receiver Operating Characteristic) area, PRC (Precision-Recall Curve) area, mean absolute error and, above all, the correctly classified rate. The confusion matrix required for calculating the performance criteria is shown in Table 5:

Table 5
Confusion matrix

	Predicted: YES	Predicted: NO
Actual: YES	True Positive (TP)	False Negative (FN)
Actual: NO	False Positive (FP)	True Negative (TN)

Correctly Classified Rate: The correctly classified rate, one of the most widely used performance criteria, shows the prediction success. It is calculated with the following equation: $\text{Correctly Classified Rate} = \frac{TP + TN}{TP + FP + FN + TN}$ (31)

Kappa Statistics: The kappa statistic, which is one of the performance criteria used to evaluate the proposed model, measures the degree of consistency between the predicted and observed values [1]. The kappa statistic is obtained from the following equation: $\kappa = \frac{\sum_{i = j} O_{ij} - \sum_{i = j} E_{ij}}{n - \sum_{i = j} E_{ij}}$ (32)

Here O_ij denotes the observed values, E_ij the expected values and n the total number of cases.
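As a worked illustration of Eq. (32), the sketch below computes the kappa statistic from a confusion matrix laid out as in Table 5. The expected diagonal counts are taken as (row total × column total) / n, which is the usual chance-agreement estimate; the example counts are hypothetical and are not taken from the study.

```python
def kappa(matrix):
    """Kappa statistic from a square confusion matrix (rows: actual, columns: predicted)."""
    size = len(matrix)
    n = sum(sum(row) for row in matrix)
    # Sum of observed diagonal entries O_ii (the agreements).
    observed = sum(matrix[i][i] for i in range(size))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    # Sum of expected diagonal entries E_ii under chance agreement.
    expected = sum(row_totals[i] * col_totals[i] / n for i in range(size))
    return (observed - expected) / (n - expected)


# Hypothetical counts: [[TP, FN], [FP, TN]]
example = [[40, 10], [5, 45]]
k = kappa(example)  # observed = 85, expected = 50, so kappa = 35 / 50
```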

Precision: Precision can be expressed as the probability that the detected positive cases are correct [24]. Precision is obtained from the following equation: $\text{Precision} = \frac{TP}{TP + FP}$ (33)

Recall: Recall measures how many of the truly positive cases were identified by the model [5]. Recall is calculated by the following equation: $\text{Recall} = \frac{TP}{TP + FN}$ (34)

F-Measure: The F-measure is defined as the harmonic mean of the precision and recall performance metrics [7]. It is calculated with the following equation: $\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$ (35)
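The criteria in Eqs. (31) and (33)–(35) follow directly from the four cells of the confusion matrix. The short sketch below computes them for the positive class only; the counts are hypothetical, and the function assumes that none of the denominators is zero.

```python
def classification_metrics(tp, fn, fp, tn):
    """Correctly classified rate, precision, recall and F-measure from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)                   # Eq. (31)
    precision = tp / (tp + fp)                                   # Eq. (33)
    recall = tp / (tp + fn)                                      # Eq. (34)
    f_measure = 2 * precision * recall / (precision + recall)   # Eq. (35)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts, not taken from the study.
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
```

With these counts the correctly classified rate is 0.85, precision is 40/45, recall is 0.8 and the F-measure is 80/95.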

ROC Area: The area under the ROC curve is one of the commonly used performance metrics to indicate overall discrimination. The ROC area varies between 0.5 (no discrimination) and 1.0 (perfect discrimination) [23].
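One way to see what the ROC area measures is its rank interpretation: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below uses this pairwise formulation rather than integrating the curve; the scores and labels are hypothetical.

```python
def roc_area(scores, labels):
    """ROC area as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order counts 0.
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores and true labels (1 = positive class).
auc = roc_area([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])  # 3 of 4 pairs ordered correctly
```

The pairwise loop is quadratic in the number of cases, so it is suitable only as an illustration on small samples.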

PRC Area: The area under the precision-recall curve is a performance metric for assessing the quality of the model when the dataset contains imbalanced classes. A value closer to 1 indicates a good classifier [34].

Mean Absolute Error: The mean absolute error measures the difference between the predicted value and the observed value and is calculated by the following equation. The closer this measure is to zero, the better the prediction. $MAE = \frac{1}{N} \sum_{i = 1}^{N} | x_{pred, i} - x_{obs, i} |$ (36)
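Eq. (36) can be illustrated with a few lines of Python. The predicted values below are hypothetical class probabilities and the observed values are 0/1 outcomes; both are made up for the example.

```python
def mean_absolute_error(predicted, observed):
    """Mean absolute difference between predicted and observed values, Eq. (36)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical predicted probabilities and observed outcomes.
mae = mean_absolute_error([0.9, 0.2, 0.6, 0.4], [1, 0, 1, 0])  # (0.1 + 0.2 + 0.4 + 0.4) / 4
```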

5 Empirical results

In this study, after executing the ANN and SVM artificial intelligence algorithms as well as the CART and C4.5 decision tree algorithms to predict the customers’ payment status for the next month, we also optimize the parameters of each algorithm in order to improve prediction accuracy. We aim to show whether modifying the default values of the parameters has a valuable effect on the performance of each algorithm. We use the Weka 3.8 software to make predictions with the different machine learning algorithms and Weka’s CVParameter Selection to perform parameter selection for our four classification algorithms. Based on the literature review on parameter selection and optimization, we select the parameters shown in Table 6 to improve the performance of our proposed model.

Table 6
Parameter selection

SVM	Parameter	Kernel Function	C
	Range	{RBF, Polynomial}	[1, 100]
	Optimum Value	RBF	74
ANN	Parameter	Hidden Layer	Learning Rate
	Range	[10, 50]	[0.1, 0.9]
	Optimum Value	40	0.1
C4.5	Parameter	C	M
	Range	[0.1, 0.5]	[2, 22]
	Optimum Value	0.1	20
CART	Parameter	Minimum Samples Leaf
	Range	[0, 31]
	Optimum Value	0

The algorithms are evaluated using the performance metrics for each of the training and test set configurations. The results of the algorithms according to the performance criteria are shown in Tables 7 and 8. ROC and PRC curves showing the performance of the algorithms are given in Figs. 1 and 2.

Table 7

Results of algorithms with respect to performance criteria for 10-fold cross-validation and 80% split

Algorithms	Cross-validation/Percentage split	Correctly Classified Rate %	Kappa statistic	Precision	Recall	F-Measure	ROC Area	PRC Area	Mean absolute error
SVM	10-fold	82.769	0.631	0.842	0.828	0.821	0.803	0.765	0.172
ANN	10-fold	84.374	0.672	0.847	0.844	0.841	0.876	0.870	0.221
C4.5	10-fold	83.683	0.658	0.839	0.837	0.834	0.837	0.807	0.212
CART	10-fold	85.709	0.697	0.866	0.857	0.854	0.855	0.849	0.225
SVM	80%	82.221	0.621	0.834	0.822	0.816	0.799	0.758	0.178
ANN	80%	83.574	0.655	0.840	0.836	0.833	0.866	0.859	0.228
C4.5	80%	83.061	0.646	0.932	0.831	0.828	0.832	0.807	0.218
CART	80%	84.228	0.667	0.849	0.842	0.839	0.860	0.840	0.226

Table 8

Results of algorithms with respect to performance criteria for parameter optimization

Algorithms	Optimized Parameter	Correctly Classified Rate %	Kappa Statistic	Precision	Recall	F-Measure	ROC Area	PRC Area	Mean Absolute Error	Improvement in Correctly Classified Rate (%)
SVM	Default Parameter	82.769	0.631	0.842	0.828	0.821	0.803	0.765	0.172	0.3
	Kernel Function	82.769	0.631	0.842	0.828	0.821	0.803	0.765	0.172
	C	82.769	0.631	0.842	0.828	0.821	0.803	0.765	0.172
	Both	83.067	0.638	0.846	0.831	0.824	0.806	0.768	0.169
ANN	Default Parameter	84.374	0.672	0.847	0.844	0.841	0.876	0.870	0.221	0.2
	Hidden layer	84.458	0.674	0.848	0.845	0.842	0.874	0.870	0.223
	Learning Rate	84.458	0.674	0.847	0.845	0.842	0.875	0.872	0.224
	Both	84.551	0.675	0.849	0.846	0.843	0.879	0.875	0.220
C4.5	Default Parameter	83.683	0.658	0.839	0.837	0.834	0.837	0.807	0.212	2.5
	C	85.224	0.688	0.858	0.852	0.849	0.860	0.842	0.217
	M	85.410	0.692	0.859	0.854	0.851	0.879	0.871	0.217
	Both	85.746	0.699	0.865	0.857	0.854	0.870	0.863	0.221
CART	Default Parameter	85.709	0.697	0.866	0.857	0.854	0.855	0.849	0.225	0.1
	Minimum Samples Leaf	85.803	0.699	0.867	0.858	0.854	0.856	0.851	0.225

Fig. 1

ROC curve and PRC curve for 10-fold cross validation and 80% split.

Fig. 2

ROC curve and PRC curve for 10-fold cross validation with optimized parameters.

As shown in Table 7, when the dataset is divided into training and test sets using 10-fold cross validation and the proposed algorithms are applied, the CART algorithm gives the best result, with a correctly classified rate of 85.709%, which is one of our most important performance metrics. Although the SVM algorithm gives the best mean absolute error and the ANN algorithm the best ROC area and PRC area, the CART algorithm can be said to have the best forecasting success among the proposed algorithms when all performance criteria are considered.

When the dataset is divided into training and test sets using an 80% split, as seen in Table 7, the CART algorithm again has the best result, with a correctly classified rate of 84.228%. While the SVM algorithm yields the best mean absolute error and the ANN algorithm the best ROC area and PRC area, the CART algorithm gives good results for the other performance measures.

After executing the algorithms with default parameter settings, we optimize some important parameters for each algorithm to improve the performance of the models. Because the 10-fold cross-validation method has a positive effect on the results compared to the split method, we optimize parameters only for 10-fold cross validation. We execute the SVM, ANN, CART and C4.5 algorithms using the default parameter settings, with each selected parameter optimized separately, and with the selected parameters optimized together. The results show that the prediction performance of the models improves after parameter optimization, as measured by the improvement in correctly classified rate. According to Table 8, the improvements for SVM, ANN and CART are limited, whereas the improvement in prediction accuracy for C4.5 is larger.

According to the optimized final results in Table 8, the CART algorithm with the optimized minimum samples leaf parameter has the best prediction result, with a correctly classified rate of 85.803%. The SVM algorithm has the best mean absolute error and the ANN algorithm the best ROC area and PRC area. These results parallel the 10-fold cross validation results obtained with default parameters. Moreover, parameter optimization improves the correctly classified rate for all algorithms.

6 Discussion and conclusions

Risk is a concept that is present in every field and every sector and seriously affects competition in today’s world. Predicting customer behavior is significant for managing risks in the banking sector, as in other sectors and fields. For this reason, we aim to propose models that predict with high accuracy whether credit card users will pay their debts or not.

We use ANN and SVM as traditional artificial intelligence algorithms, and CART and C4.5, which are widely used decision tree algorithms, for forecasting the customers’ payment status for the next month. We apply cross validation and the hold-out method to divide the dataset into training and test sets, and we evaluate the algorithms with the proposed performance metrics. Moreover, we present a literature review on parameter optimization of these algorithms and compare the improvements obtained for each.

The results show that the model built with the CART algorithm, one of the most basic decision tree algorithms, provides the highest accuracy (about 86%) in forecasting the customers’ payment status for the next month. The model built with the CART algorithm has results comparable with the ANN and SVM algorithms, which are traditional artificial intelligence algorithms frequently used in prediction models. Furthermore, the CART algorithm can be considered an effective and preferable algorithm for this problem, since building and applying the model is simpler than for the other proposed models. In addition, the 10-fold cross-validation method had a positive effect on the results compared to the split method, which suggests that cross validation provided better learning for this problem.

After executing the algorithms with default parameter settings, we optimize some important parameters for each algorithm to improve the performance of the models. Because we obtain higher accuracy with the 10-fold cross-validation method than with the split method, we optimize parameters only for 10-fold cross validation. Experimental results demonstrate that the optimized parameter settings give better performance for all algorithms in predicting customers’ payment status.

Future studies can be conducted in the following directions. First, a sensitivity analysis can be carried out for each parameter selected in the present study to ensure the robustness of the proposed models. Furthermore, different parameters of ANN, SVM, CART and C4.5 can be selected for optimization. In this study, we use Weka’s CVParameter Selection to perform parameter selection for our four classification algorithms. In future studies, different optimization techniques such as grid search, genetic algorithms, particle swarm optimization and scatter search (SS)-based approaches can be used, and the optimization performance of these techniques can be compared. Second, different machine learning methods or ensemble methods such as Bagging and AdaBoost can be used to build a model for forecasting customers’ payment status. Finally, the proposed model can be applied to other real-world problems to determine whether it can solve them efficiently.

References

1. Azari, Janeja V.P. and Mohseni, Predicting hospital length of stay (PHLOS): A multi-tiered data mining approach, In Data Mining Workshops (ICDMW), (2012), 17–24.
2. Benvidi, Abbasi, Gharaghani, Dehghan Tezerjani and Masoum, Spectrophotometric determination of synthetic colorants using PSO–GA-ANN, Food Chemistry 220 (2017), 377–384.
3. Farizhandi A.A.K., Zhao and Lau, Modeling the change in particle size distribution in a gas-solid fluidized bed due to particle attrition using a hybrid artificial neural network-genetic algorithm approach, Chemical Engineering Science 155 (2016), 210–220.
4. Lendasse, Wertz and Verleysen, Model selection with cross-validations and bootstraps—application to time series prediction with RBFN models, In Artificial Neural Networks and Neural Information Processing ICANN/ICONIP, Springer, Berlin, Heidelberg, (2003), 573–580.
5. Rosario and Hearst M.A., Classifying semantic relations in bioscience texts, In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, (2004), 430.
6. Cocianu C.L. and Grigoryan, An Artificial Neural Network for Data Forecasting Purposes, Informatică Economică 19(2) (2015), 34–45.
7. Goutte and Gaussier, A probabilistic interpretation of precision, recall and F-score, with implication for evaluation, In European Conference on Information Retrieval, Springer, Berlin, Heidelberg, (2005), 345–359.
8. Nantasenamat, Monnor, Worachartcheewan, Mandi, Isarankura-Na-Ayudhya and Prachayasittikul, Predictive QSAR modeling of aldose reductase inhibitors using Monte Carlo feature selection, European Journal of Medicinal Chemistry 76 (2014), 352–359.
9. Chakraborty and Elzarka H., Advanced machine learning techniques for building performance simulation: a comparative analysis, Journal of Building Performance Simulation 12(2), 193–207.
10. Guresen, Kayakutlu and Daim T.U., Using artificial neural network models in stock market index prediction, Expert Systems with Applications 38 (2011), 10389–10397.
11. Climent, Momparler and Carmona, Anticipating bank distress in the Eurozone: An Extreme Gradient Boosting approach, Journal of Business Research 61 (2019).
12. Shen, Zhao, Li and Meng, A novel ensemble classification model based on neural networks and a classifier optimisation technique for imbalanced credit risk evaluation, Physica A: Statistical Mechanics and Its Applications 526 (2019), 121073.
13. Shirazia and Mohammadi, A big data analytics model for customer churn prediction in the retiree segment, International Journal of Information Management 58 (2019), 238–253.
14. H.H. and Viviani J.L., Predicting bank failure: An improvement by implementing a machine-learning approach to classical financial ratios, Research in International Business and Finance 44 (2018), 16–25.
15. Chan H.K., Xu and Qi, A comparison of time series methods for forecasting container throughput, International Journal of Logistics Research and Applications 22(3) (2019), 294–303.
16. Öğüt, Doğanay M.M., Ceylan N.B. and Aktaş, Prediction of bank financial strength ratings: The case of Turkey, Economic Modelling 29 (2012), 632–640.
17. Yeh I.C. and Lien, The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients, Expert Systems with Applications 36(Part 1) (2009), 2473–2480.
18. Smeureanu, Ruxanda and Badea L.M., Customer segmentation in private banking sector using machine learning techniques, Journal of Business Economics & Management 14(5) (2013), 923–939.
19. Beck, Garcia R.M., Zhong, Georgiopoulos and Anagnostopoulos G.C., A Backward Adjusting Strategy and Optimization of the C4.5 Parameters to Improve C4.5’s Performance, Proceedings of the Twenty-First International FLAIRS Conference, (2008), 35–42.
20. , Yao, Wang and Lai K.K., Credit risk evaluation using a weighted least squares SVM classifier with design of experiment for parameter selection, Expert Systems with Applications 38(12) (2011), 15392–15399.
21. Lin and Zhang, A Fast Parameters Selection Method of Support Vector Machine Based on Coarse Grid Search and Pattern Search, Fourth Global Congress on Intelligent Systems (GCIS), 2010 Second WRI Global Congress On, (2013), 77–81.
22. Quinlan J.R., C4.5: programs for machine learning, Morgan Kaufmann, Menlo Park, (1993).
23. Janssen K.J., Donders A.R.T., Harrell F.E. Jr, Vergouwe, Chen, Grobbee D.E. and Moons K.G., Missing covariate data in medical research: to impute is better than to ignore, Journal of Clinical Epidemiology 63(7) (2010), 721–727.
24. Abou-Nasr, Lessmann, Stahlbock and Weiss G.M., Real world data mining applications, 17 Springer (2014).
25. Boyacioglu M.A., Kara and Baykan Ö.K., Predicting bank financial failures using neural networks, support vector machines and multivariate statistical methods: A comparative analysis in the sample of savings deposit insurance fund (SDIF) transferred banks in Turkey, Expert Systems with Applications 2 (2009), 3355–3366.
26. Khairalla M.A. and Ning, Financial Time Series Forecasting Using Hybridized Support Vector Machines and ARIMA Models, WCNA2017, October 20–22, (2017), Shenzhen, China.
27. Gul and Guneri A.F., An artificial neural network-based earthquake casualty estimation model for Istanbul city, Natural Hazards 84(3) (2016), 2163–2178.
28. Rajendra, Jena P.C. and Raheman, Prediction of optimized pretreatment process parameters for biodiesel production using ANN and GA, Fuel 5 (2019), 868–875.
29. Zeynoddin, Bonakdari, Azari, Ebtehaj, Gharabaghi and Riahi Madavar, Novel hybrid linear stochastic with non-linear extreme learning machine methods for forecasting monthly rainfall a tropical climate, Journal of Environmental Management 222 (2018), 190–206.
30. Celestino, Fernando S.L., Javier R.-P. and Javier de Cos J.F., A hybrid ARIMA–SVM model for the study of the remaining useful life of aircraft engines, Journal of Computational and Applied Mathematics 346 (2019), 184–191.
31. Carmona, Climent and Momparler, Predicting failure in the U.S. banking sector: An extreme gradient boosting approach, International Review of Economics and Finance (2019).
32. Gogas, Papadimitriou and Agrapetidou, Forecasting bank failures and stress testing: A machine learning approach, International Journal of Forecasting 34 (2018), 440–455.
33. Putri P.K., Putra, Gede I.K., Marini Mandenni N.I. and Ika, An Expert System to Detect Car Damage by Using Cart Method, Journal of Theoretical & Applied Information Technology 66(3) (2014).
34. Abdulhammed, Faezipour, Abuzneid and Alessa, Enhancing Wireless Intrusion Detection Using Machine Learning Classification with Reduced Attribute Sets, Proceedings of the International Wireless Communications & Mobile Computing Conference (IWCMC), (2018), 524–529.
35. Haykin, Neural networks: a comprehensive foundation, Prentice Hall PTR, (1994).
36. Serengil S.I. and Ozpinar, Workforce Optimization for Bank Operation Centers: A Machine Learning Approach, International Journal of Interactive Multimedia and Artificial Intelligence 4(6) (2017), 81–87.
37. Hamze-Ziabari S.M. and Bakhshpoori, Improving the prediction of ground motion parameters based on an efficient bagging ensemble model of M5′ and CART algorithms, Applied Soft Computing 68 (2018), 147–161.
38. Nami and Shajari, Cost-sensitive payment card fraud detection based on dynamic random forest and k-nearest neighbors, Expert Systems with Applications 110 (2018), 381–392.
39. Patil, Nemade and Soni P.K., Predictive Modelling for Credit Card Fraud Detection Using Data Analytics, Procedia Computer Science 132 (2018), 385–395.
40. Sarkar, Maiti, Raj, Vinay and Pratihar D.K., An optimization-based decision tree approach for predicting slip-trip-fall accidents at work, Safety Science 118(n.d), 57–69.
41. Lin S.W. and Chen S.C., Parameter determination and feature selection for C4.5 algorithm using scatter search approach, Soft Computing: A Fusion of Foundations Methodologies and Applications 16(1) (2012), 63.
42. Zhang, Zhang, Xu and Hao, Multiple Instance Learning for Credit Risk Assessment with Transaction Data, Knowledge-Based Systems 161 (2018), 65–77.
43. Ulke, Sahin and Subasi, A comparison of time series and machine learning models for inflation forecasting: empirical evidence from the USA, Neural Computing & Applications 30(5) (2018), 1519–1527.
44. Vapnik, The nature of statistical learning theory, Springer-Verlag, New-York, (1995).
45. Bao, Lianju and Yue, Integration of unsupervised and supervised machine learning algorithms for credit risk assessment, Expert Systems With Applications 128 (2019), 301–315.
46. Sweeney W.P., Musavi M.T. and Guidi J.N., Classification of chromosomes using a probabilistic neural network, Cytometry: The Journal of the International Society for Analytical Cytology 16(1) (1994), 17–24.
47. Wibowo, Dwijantari and Hartati, Time Series Machine Learning: Implementing ARIMA and Hybrid ARIMA-ANN for Electricity Forecasting Modeling, Communications in Computer and Information Science, Springer Verlag 788 (2018), 126–139.
48. Wang, Huang and Cheng, Super-parameter selection for Gaussian-Kernel SVM based on outlier-resisting, Measurement 58 (2014), 147–153.
49. Yang, Fang, Li, Guo, Li, Huang and Li, Pre-diabetes diagnosis based on ATR-FTIR spectroscopy combined with CART and XGBoots, Optik 180 (2019), 189–198.
50. , Min, Jung, Sunwoo and Han, Comparative study of the artificial neural network with three hyper-parameter optimization methods for the precise LP-EGR estimation using in-cylinder pressure in a turbocharged GDI engine, Applied Thermal Engineering 149 (2019), 1324–1334.
51. Yao, Cao, Ding, Zhai, Liu, Luo and Zou, A paired neural network model for tourist arrival forecasting, Expert Systems with Applications 114 (2018), 588–614.
52. Tao, Huiling, Wenwen and Xia, GA-SVM based feature selection and parameter optimization in hospitalization expense modeling, Applied Soft Computing Journal 75 (2019), 323–332.
