Integrating deep neural network with logic rules for credit scoring

Abstract

Credit scoring is an important topic in financial activities and bankruptcy prediction, and it has been extensively explored using deep neural network (DNN) methods. The accuracy of DNN-based credit scoring models relies heavily on large amounts of labeled data. However, purely data-driven learning makes it difficult to encode human intent that guides the model toward the desired patterns, and it leads to low model transparency. Therefore, we propose the Probabilistic Soft Logic Posterior Regularization (PSLPR) framework for integrating prior knowledge in the form of logic rules with neural networks. First, the PSLPR framework calculates the rule satisfaction distance for each instance using probabilistic soft logic formulas. Second, the logic rules are integrated into the posterior distribution of the DNN output to form a logic output. Finally, a novel discrepancy loss, which measures the difference between the true label and the logic output, is used to incorporate the logic rules into the parameters of the neural network. Extensive experiments were conducted on two datasets, the Australian credit dataset and the default of credit card clients dataset. Several performance metrics were used to evaluate the resulting models, including PCC, Recall, F1 and AUC. Compared with the standard DNN model, the four metrics increase by 7.14%, 14.29%, 8.15%, and 5.43%, respectively, on the Australian credit dataset.

Keywords

Credit scoring; deep neural network; Probabilistic Soft Logic; posterior regularization

1. Introduction

Credit scoring methods have been widely investigated by researchers. As an important part of the financial industry, credit scoring plays a key role in activities such as credit customer selection, risk measurement, pre- and post-loan supervision, comprehensive performance evaluation, and asset portfolio risk management [1]. In the financial industry, the increasing number of bank collapses and massive losses has led international banking regulators to demand more appropriate credit risk models for scoring financial loan portfolios [2]. A credit scoring model classifies credit applicants as good or bad based on features such as annual income, bank account type and balance, occupation type, marital status, age, and education level [3]. Hence, credit scoring can be regarded as a binary classification problem.

With the deep integration of Internet technology into the financial industry, previous studies have applied a variety of machine learning methods to credit scoring, such as decision trees (DT), logistic regression (LR), discriminant analysis, and support vector machines (SVMs). Over the last decade, deep neural networks (DNNs) have emerged as a popular artificial intelligence technique employed in a wide range of areas, and DNN-based credit scoring models have gradually become a research hotspot owing to their excellent accuracy [4]. The success of DNNs in credit scoring rests on their data-driven nature: they learn from a tremendous number of examples. However, there are many circumstances in which purely data-driven approaches reach their limits or lead to unsatisfactory results [5]. The most obvious case is when there is not enough data to train models that perform well and generalize. Another important issue is that purely data-driven models may not meet constraints such as the transparency conditions set by regulatory or security guidelines, which is a major challenge for credit scoring models. A recent scientific report on artificial intelligence states: “ML and AI practice treats datasets in the same way and ignore domain knowledge that extends far beyond the raw data. Improving our ability to systematically incorporate diverse forms of domain knowledge can impact every aspect of AI.” [6] The field of credit scoring has a wealth of theoretical knowledge, but current DNN-based credit scoring models rely entirely on labeled datasets and ignore the rich logic rules of credit scoring. Therefore, incorporating prior knowledge to guide the training of credit scoring models is a promising research direction.

Recently, several studies have explored methods for integrating DNNs with logic rules. A fairly standard approach is to introduce an additional term into the loss function that the network optimises; this term encodes the constraints imposed by the logic rules that the DNN should learn. Inspired by these studies, we propose a DNN-based credit scoring model built on the Probabilistic Soft Logic Posterior Regularization (PSLPR) framework, which formulates probabilistic soft logic rules of credit scoring knowledge as a posterior regularization term to integrate the DNN with logic rules. Moreover, prior work has found that combining DNNs with logic rules, and exploiting the flexibility of logic rules to improve the interpretability of DNNs, is a feasible approach [7]. In summary, our main contributions are as follows:

(1)
The PSLPR framework enables the DNN model to learn not only from the labeled dataset for training but also from the logic rules.
(2)
The PSLPR framework improves the DNN-based credit scoring model’s prediction accuracy by incorporating the logic rules into the DNN model.
(3)
Integrating DNNs with structured logic rules harnesses the flexibility of the rules and improves the interpretability of the neural models.

The rest of this paper is organized as follows. Section 2 reviews related research on credit scoring models and on methods for integrating DNNs with logic rules. Section 3 reviews the definition of soft logic rules and the background on posterior regularization. Section 4 establishes the PSLPR framework. Section 5 presents the experimental verification and analysis. Finally, Section 6 presents conclusions and directions for future work.
2. Related work

In this paper, we propose the Probabilistic Soft Logic Posterior Regularization (PSLPR) framework for DNN-based credit scoring models. Accordingly, this section reviews credit scoring models and methods for integrating DNNs with logic rules.

2.1 Credit scoring model

A broad range of techniques has been applied to the credit scoring problem, mainly statistical methods, machine learning and deep learning. Statistical techniques were often used, including logistic regression (LR) [8] and linear discriminant analysis (LDA) [9]. The rationale behind statistical models is to find an optimal linear combination of explanatory input variables that can model, analyze, and predict default risk [10]. In general, the main drawback of statistical methods is their insufficient accuracy, whereas their main advantage is simplicity.

Later on, machine learning techniques were considered to achieve higher accuracy on complex credit risk datasets. Applications of machine learning techniques include SVM, DT, KNN, and Random Forest (RF). Among SVM-based methods, Huang et al. [11] employed three strategies to build hybrid SVM-based credit scoring models that evaluate each applicant's credit score from the input features. Their results showed that SVMs outperformed the other data mining methods compared. Chern et al. [12] proposed a decision tree credit assessment approach (DTCAA) to cope with vast, messy data sources and ever-changing loan regulations. KNN, LR and Bayes classifiers were used by Itoo and Singh [13] to assess the creditworthiness of applicants. Ensemble methods, which combine the advantages of various single classifiers, have developed rapidly in recent years. For example, Zhang et al. [14] proposed a novel multi-stage ensemble model with enhanced outlier adaptation, which outperformed other base classifiers on ten real-world credit scoring datasets. Compared with traditional statistical models, machine learning models do not rely on individual subjective judgment and have relatively strong predictive power on complex nonlinear systems, making them more suitable for personal credit assessment with complicated features.

In recent years, researchers have shown interest in applying DNN models to credit scoring. For instance, Neagoe et al. [15] applied Deep Convolutional Neural Networks (DCNN) and a Deep Multilayer Perceptron (DMLP) to credit scoring, and the results showed that the DCNN significantly outperformed the DMLP. Kvamme et al. [16] applied a CNN to consumer transaction data to predict mortgage default. Dastile et al. [17] proposed an explainable deep learning model for credit scoring that converts tabular datasets into images and then applies a 2D CNN. The predictions of the 2D CNN were explained using saliency maps [18], Gradient-weighted Class Activation Mapping (Grad-CAM) [19] and Local Interpretable Model-Agnostic Explanations (LIME) [20]. Yu et al. [21] proposed a multi-level deep belief network (DBN) ensemble learning method based on the Extreme Learning Machine (ELM) to improve the accuracy of credit scoring. Ala’raj et al. [22] used a directional Long Short-Term Memory (LSTM) neural network for credit scoring, and the results showed that this method performed better than traditional methods. Despite the superior performance of DNN-based credit models, their complex networks make learning harder when the amount of training data is insufficient. Moreover, the automation in DNNs makes it challenging to inject prior knowledge to guide the training process.

2.2 Methods for integrating DNN with logic rules

Xu et al. [23] proposed a semantic loss that measures how well the outputs of a DNN match given constraints encoded as propositional rules. However, propositional rules have limited expressive power, which makes it difficult to express complex background knowledge. Fischer et al. [24] constructed DL2, a system for training a DNN with domain knowledge encoded as logical constraints. Unlike the semantic loss, DL2 translates each individual logic operation (such as negation, conjunction, and disjunction) into a loss term. The approach closest to our setting is based on Probabilistic Soft Logic (PSL). Zhou et al. [25] proposed a method based on Probabilistic Soft Logic Regularization to extract clinical temporal relations with global reasoning. This method summarizes clinical knowledge as first-order logic (FOL) rules [26], uses soft logic to calculate the satisfaction distance between each sample and each rule, and then uses this distance as a loss to transfer the clinical knowledge to the neural network. All three approaches above investigate better ways to express prior knowledge: they express it using propositional rules, DL2 and PSL, respectively, and then transform it through mathematical axioms into predictions that satisfy the prior knowledge. Although these methods can convert prior knowledge into differentiable terms and obtain predictions that satisfy it, they cannot guarantee that the logic prediction satisfies the prior knowledge while remaining close to the prediction of the DNN. Hence, the PSLPR framework obtains the logic prediction through posterior regularization, which efficiently incorporates indirect supervision via constraints on the posterior distributions of probabilistic models with latent variables. Hu et al. [7] fused logical knowledge into deep models through knowledge distillation [27] and posterior regularization (PR) [28]. Specifically, this method improves knowledge distillation and uses the logic rules as a loss to transfer them into the weights of the deep neural network at each iteration. Krishna et al. [29] conducted ablation experiments on the effectiveness of the techniques of Hu et al. [7].

As discussed in Section 1, the field of credit scoring has a wealth of theoretical knowledge, yet current DNN-based credit scoring models rely entirely on labeled datasets. It is therefore important to study how to integrate this domain knowledge. This study proposes the PSLPR framework, which integrates credit scoring logic rules with the DNN model to further improve its predictive performance.

3. Preliminaries

3.1 Probabilistic Soft Logic

Probabilistic Soft Logic (PSL), proposed by Kimmig et al. [30], builds a joint probabilistic model over atoms based on first-order logic rules. Specifically, soft logic allows continuous truth values in $[0,1]$ instead of $\left\{0,1\right\}$. The definitions and calculations of probabilistic soft logic are explained in detail below.

Definition 3.1.

A predicate $p$ is a relation defined by a unique identifier and is used to describe properties of objects. An atom $a$ is a predicate applied to a sequence of objects. Atoms in PSL take continuous truth values in the unit interval $[0,1]$.

Definition 3.2.

A PSL rule $r$ is a disjunctive clause of atoms or negated atoms, written in the implication form of Eq. (1).

$\displaystyle A\leftarrow B_{1}\land B_{2}\land\ldots\land B_{n}$ (1)

where $A$ is the rule head, denoting the conclusion of the rule, and $B_{1}\land B_{2}\land\ldots\land B_{n}$ is the rule body, denoting its premise.

Definition 3.3.

$I(a)$ denotes the soft truth value of atom $a$, and $I(r)$ denotes the truth value of rule $r$. The basic logical operations in PSL, namely conjunction ($\land$), disjunction ($\lor$), and negation ($\neg$), are defined with the Łukasiewicz operators given by Eqs (2)–(4).

$\displaystyle I(a_{1}\land a_{2})=\max\left\{0,I(a_{1})+I(a_{2})-1\right\}$ (2)

$\displaystyle I(a_{1}\lor a_{2})=\min\left\{I(a_{1})+I(a_{2}),1\right\}$ (3)

$\displaystyle I(\neg a_{1})=1-I(a_{1})$ (4)

The PSL rule in Definition 3.2 can also be represented as Eq. (5).

$\displaystyle I(r_{\textit{head}}\leftarrow r_{\textit{body}})=I(\neg r_{\textit{body}}\lor r_{\textit{head}})$ (5)

Definition 3.4.

The distance to satisfaction $s_{r}$ of rule $r$ is defined in Eq. (6).

$\displaystyle s_{r}=\max\left\{0,I(r_{\textit{body}})-I(r_{\textit{head}})\right\}$ (6)

A PSL program considers rule $r$ satisfied when $I(r_{\textit{head}})\geqslant I(r_{\textit{body}})$, that is, when $s_{r}=0$.

The following example encodes a simple model that predicts loan default from a credit scoring rule, given in Eq. (7).

$\displaystyle\textit{default}(x)\leftarrow\textit{less}_{100}(x)\land\neg\textit{married}(x)$ (7)

Consider loan applicants described by annual income and marital status. This rule states that applicants who earn less than 100K and are not married will default. Given $I(\textit{default}(x))=0.3$, $I(\textit{less}_{100}(x))=0.8$ and $I(\neg\textit{married}(x))=0.7$, the distance to satisfaction is computed by Eq. (8).

$\displaystyle s_{r}=\max\left\{0,I(\textit{less}_{100}(x)\land\neg\textit{married}(x))-I(\textit{default}(x))\right\}=\max\left\{0,(0.8+0.7-1)-0.3\right\}=0.2$ (8)

By the previous definition, this instance does not satisfy rule $r$, and its distance to satisfaction is 0.2.
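The operators of Eqs (2)–(6) are simple enough to check by hand. The short Python sketch below (the function names are our own, hypothetical choices) reproduces the worked example of Eqs (7)–(8) and also confirms that, under these operators, the truth value of the rule in Eq. (5) equals $1-s_{r}$.

```python
def conj(a: float, b: float) -> float:
    """Lukasiewicz conjunction, Eq. (2)."""
    return max(0.0, a + b - 1.0)


def disj(a: float, b: float) -> float:
    """Lukasiewicz disjunction, Eq. (3)."""
    return min(a + b, 1.0)


def neg(a: float) -> float:
    """Soft negation, Eq. (4)."""
    return 1.0 - a


def distance_to_satisfaction(body: float, head: float) -> float:
    """Eq. (6): how far the rule 'head <- body' is from being satisfied."""
    return max(0.0, body - head)


# Worked example of Eqs. (7)-(8)
i_default = 0.3
i_less_100 = 0.8
i_not_married = 0.7

body = conj(i_less_100, i_not_married)           # max(0, 0.8 + 0.7 - 1) = 0.5
s_r = distance_to_satisfaction(body, i_default)  # max(0, 0.5 - 0.3) = 0.2
rule_truth = disj(neg(body), i_default)          # Eq. (5): I(not body or head) = 0.8

print(round(body, 4), round(s_r, 4), round(rule_truth, 4))  # 0.5 0.2 0.8
```

The same identity, truth value $=1-s_{r}$, holds for any inputs in $[0,1]$, because $\min\{1-I(r_{\textit{body}})+I(r_{\textit{head}}),1\}=1-\max\{0,I(r_{\textit{body}})-I(r_{\textit{head}})\}$.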

3.2 Posterior regularization

Posterior regularization (PR) was first proposed by Ganchev et al. [28] to make machine learning models imitate the human learning process, acquiring knowledge not only from large amounts of labeled data but also from experience related to specific problems. PR provides a formal way to express prior knowledge as mathematical constraints and incorporates it into the model by constraining the model's posterior probability distribution.

In PR, the log-likelihood of a model is penalized by the KL divergence between the distributions consistent with the prior knowledge and the model posterior. We define the constraint set $Q$, which contains all distributions satisfying the expectations given by the prior knowledge, as shown in Eq. (9).

$\displaystyle Q=\left\{q(Y):E_{q}\left[\phi(X,Y)\right]\leqslant b\right\}$ (9)

where $\phi(X,Y)$ is the constraint feature and $b$ is its expected bound, so $Q$ is the set of distributions under which the expectation of the constraint feature is bounded by $b$. Let $L(\theta)$ be the log-likelihood of the model without posterior regularization; the posterior-regularized objective under the constraint set $Q$ is given by Eq. (10).

$\displaystyle J_{Q}(\theta)=L(\theta)-\textit{KL}(Q\parallel p_{\theta}(Y|X))$ (10)

where $\textit{KL}(Q\parallel p_{\theta}(Y|X))=\min_{q\in Q}\textit{KL}(q(Y)\parallel p_{\theta}(Y|X))$.

4. Probabilistic soft logic posterior regularization framework

4.1 Framework overview

The PSLPR framework consists of three parts: the DNN model, the probabilistic soft logic posterior regularization unit, and the error calculation unit. Their functions are as follows.

(1)
DNN model takes the dataset as input and generates a DNN output for each credit applicant.
(2)
The probabilistic soft logic posterior regularization unit takes the DNN output as input, constrains it using the posterior regularization principle, and generates a logic output.
(3)
The error calculation unit introduces a new loss term and computes the discrepancy of both the DNN output and the logic output with respect to the true label.

In general, the PSLPR framework enables the DNN model to learn from both labeled datasets and soft logic rules. This is achieved by constraining the DNN outputs at each iteration through posterior regularization and updating the DNN model throughout training. The parameters of the DNN are updated with the backpropagation (BP) algorithm until convergence. In this framework, domain experts are responsible for providing datasets and domain knowledge; the domain knowledge encodes human intention to guide the model toward the desired patterns. The framework is shown in Fig. 1.

Figure 1.
Probabilistic soft logic posterior regularization framework.

4.2 Multilayer perceptron

The DNN model in this study is a Multilayer Perceptron (MLP), a feedforward artificial neural network containing multiple layers of nodes: an input layer, one or more hidden layers, and an output layer, with each layer fully connected to the next. The MLP model is shown in Fig. 2.

Figure 2.

Multilayer perceptron.

For credit scoring, we consider an MLP with an input layer of $m$ neurons, $H$ hidden layers of $n$ neurons each, and an output layer of two neurons. Assume that hidden neuron $h_{j}$ receives a signal from the $i$th input neuron with weight $w_{ij}$ and has bias $b_{j}$, where $i=1,2,\ldots,m$ and $j=1,2,\ldots,n$. Denoting the value of the $i$th input neuron by $\alpha_{i}$, the weighted input $\beta_{j}$ of $h_{j}$ is given by Eq. (11).

$\displaystyle\beta_{j}=\sum_{i=1}^{m}\alpha_{i}w_{ij}+b_{j}$ (11)

In this paper, the ReLU function, given by Eq. (12), is used as the activation function of the hidden layers.

$\displaystyle f(x)=\max(0,x)$ (12)

For a single hidden layer, the value of the $k$th output neuron can be expressed as a function of the input values and the network parameters, as in Eq. (13).

$\displaystyle Y_{k}=\sum_{j=1}^{n}f\left(\sum_{i=1}^{m}\alpha_{i}w_{ij}+b_{j}\right)w_{jk}+b_{k}$ (13)

where $w_{ij}$ and $w_{jk}$ are the weights of the hidden and output layers, respectively, and $b_{k}$ is the bias of the $k$th output neuron. The output layer then predicts the applicant class from the two outputs $Y_{k}$ ($k=1,2$) with the softmax function, as in Eq. (14).

$\displaystyle p(y_{k}|x)=\textit{softmax}(Y_{k})$ (14)

Specifically, the MLP-based credit scoring model learns from the credit scoring dataset. Let $D(x,y)$ denote the historical credit applicant data. To train the MLP-based credit scoring model, we compute the binary cross-entropy loss over the instances, as in Eq. (15).

$\displaystyle L_{\textit{DNN}}=-\sum_{d}\{y_{d}\log p_{d}+(1-y_{d})\log(1-p_{d})\}$ (15)

where $y_{d}\in\{0,1\}$ denotes the true class label and $p_{d}\in[0,1]$ denotes the DNN output for customer $d$.
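As an illustration of Eqs (11)–(15), the NumPy sketch below implements the forward pass of an MLP with a single hidden layer and the summed binary cross-entropy loss. It is not the authors' TensorFlow implementation: the random weights, the toy labels, the input and hidden sizes borrowed from Table 3, and the convention that output column 1 is the default class are all assumptions made for the example.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(0.0, x)  # Eq. (12)


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mlp_forward(x, w_in, b_in, w_out, b_out):
    """Single hidden layer, Eqs. (11)-(14); x has shape (batch, m)."""
    hidden = relu(x @ w_in + b_in)   # f(sum_i alpha_i w_ij + b_j)
    logits = hidden @ w_out + b_out  # Y_k for k = 1, 2
    return softmax(logits)           # p(y_k | x)


def bce_loss(y, p_default, eps=1e-12):
    """Eq. (15): summed binary cross entropy; y in {0, 1}, p_default in [0, 1]."""
    p = np.clip(p_default, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))


rng = np.random.default_rng(0)
m, n = 14, 41  # input and hidden sizes as in Table 3 (one hidden layer only here)
x = rng.normal(size=(5, m))
y = np.array([1, 0, 1, 0, 0])  # toy labels, 1 = default (assumed coding)
w_in, b_in = rng.normal(scale=0.1, size=(m, n)), np.zeros(n)
w_out, b_out = rng.normal(scale=0.1, size=(n, 2)), np.zeros(2)

probs = mlp_forward(x, w_in, b_in, w_out, b_out)
print(probs.shape, bce_loss(y, probs[:, 1]))
```

Stacking further `relu(h @ w + b)` layers before the output layer gives the deeper configurations of Tables 3 and 4.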

4.3 Probabilistic soft logic posteriori regularization unit

The probabilistic soft logic posterior regularization unit is designed to incorporate the logic rules into the DNN model through posterior regularization. We organize the prior knowledge into a soft logic rule base $R=\left\{r_{l}\right\}^{L}_{l=1}$, where $r_{l}$ is the $l$th rule. Given a dataset $D(x,y)$, the value of rule $r_{l}$ on the instances of the dataset is denoted $s_{l}(X,Y)$ and is calculated according to Eq. (6); the values of all rules on the dataset instances are $\{s_{1},s_{2},\ldots,s_{L}\}$.

The distribution obtained after incorporating the rules, called the logic output, is defined as $q(y|x)$ and denoted $u^{\theta}$. To incorporate the rules, $q(y|x)$ has to fit them, which is imposed through the constraint $E_{q(y|x)}\left[s_{l}(X,Y)\right]=1$. This constraint defines the regularization space of all valid distributions. In addition, $q(y|x)$ has to stay close to the DNN output $p_{\theta}(y|x)$, so the KL divergence between $q(y|x)$ and $p_{\theta}(y|x)$ is used to measure their distance and is minimized. Combining these two components and adding slack variables yields the optimization problem in Eq. (16).

$\displaystyle\min_{q,\xi}\ KL(q(y|x)\parallel p_{\theta}(y|x))+C\sum_{l}\xi_{l}\quad\text{s.t.}\quad 1-E_{q(y|x)}[s_{l}(x,y)]\leqslant\xi_{l},\quad l=1,\ldots,L$ (16)

where $\xi_{l}$ is the slack variable for the $l$th logic rule and $C$ is the regularization parameter. This problem can be viewed as projecting $p_{\theta}(y|x)$ into the constrained subspace, and its solution is given by Eq. (17).

$\displaystyle q(y|x)\propto p_{\theta}(y|x)\exp\left\{-\sum_{l}C(1-s_{l}(x,y))\right\}$ (17)
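A minimal NumPy sketch of the projection in Eq. (17) follows. It assumes, following the convention of Hu et al. [7], that $s_{l}(x,y)$ in Eqs (16)–(17) is read as the soft truth value of rule $r_{l}$ when the instance is labelled $y$, i.e. $1$ minus the distance to satisfaction of Eq. (6). The function name `logic_output`, the class order and the single toy rule are hypothetical.

```python
import numpy as np


def logic_output(p: np.ndarray, rule_truth: np.ndarray, C: float) -> np.ndarray:
    """Sketch of Eq. (17) for a batch.

    p:          (batch, classes) DNN posterior p_theta(y|x)
    rule_truth: (batch, classes, L) soft truth value of each rule r_l evaluated
                with candidate label y (assumed to be 1 - s_r from Eq. (6))
    C:          regularization parameter
    """
    penalty = C * np.sum(1.0 - rule_truth, axis=-1)     # sum_l C (1 - r_l(x, y))
    unnorm = p * np.exp(-penalty)                        # p_theta(y|x) exp{-...}
    return unnorm / unnorm.sum(axis=-1, keepdims=True)   # renormalize over y


# One applicant and one rule 'Default(x) <- body(x)' whose body is fully true.
p = np.array([[0.7, 0.3]])          # columns: [non-default, default] (assumed order)
truth = np.array([[[0.0], [1.0]]])  # labelling the applicant non-default violates the rule
q = logic_output(p, truth, C=1.0)
print(q.round(3))
```

In the example, labelling the applicant as non-default violates the rule, so $q$ shifts probability mass towards the default class relative to $p$; when every rule is satisfied for every label, the penalty vanishes and $q=p$.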

4.4 The error calculation unit

After obtaining the logic output $q(y|x)$, which encodes the logic rules, we must also consider how the information in the dataset and the prior knowledge can be transferred to the parameters of the neural network. Therefore, the training loss should measure not only the discrepancy between the DNN output and the true label, but also that between the logic output and the true label. The new loss term $L_{\textit{logic}}$ measures the logic output against the true label, as shown in Eq. (18).

$\displaystyle L_{\textit{logic}}=-\sum_{d}\{y_{d}\log q_{d}+(1-y_{d})\log(1-q_{d})\}$ (18)

where $y_{d}\in\{0,1\}$ denotes the true class label and $q_{d}\in[0,1]$ denotes the logic output for customer $d$. The overall loss function is given by Eq. (19).

$\displaystyle L=(1-\lambda)L_{\textit{DNN}}+\lambda L_{\textit{logic}}$ (19)

where $\lambda$ controls the relative importance of the two learning sources, the labeled data and the logic rules. We apply gradient descent to minimize the loss function and update the model parameters. The training process of the PSLPR framework is shown in Algorithm 1.

Algorithm 1: Training process of the PSLPR framework
Input: Dataset $D=\left\{(x_{n},y_{n})\right\}_{n=1}^{N}$; rule base $R=\left\{r_{l}\right\}_{l=1}^{L}$; regularization parameter $C$; relative importance $\lambda$ of the two learning sources; number of training steps $K$
Output: Trained neural network
1: Initialize the network parameters $w_{ij}$ and $b_{j}$
2: for $i=1,\ldots,K$ do
3:   for all $(x_{d},y_{d})\in D$ do
4:     Calculate the neural network prediction $p(y|x)$
5:     Calculate the logic rule prediction $q(y|x)$ according to Eq. (17)
6:     Calculate the loss according to Eq. (19) and update the parameters of the neural network
7:   end for
8: end for
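The following is a full-batch NumPy sketch of Algorithm 1 under simplifying assumptions: a linear softmax model stands in for the MLP, a single hypothetical rule `Default(x) <- body(x)` is used, the data are synthetic, and the loss of Eq. (19) is averaged rather than summed over instances (which only rescales the learning rate). Because $p_{\theta}(y|x)=\textit{softmax}(z)$ for logits $z$, Eq. (17) can be written as $q(y|x)=\textit{softmax}(z-\textit{penalty})$ with $\textit{penalty}_{y}=C\sum_{l}(1-I(r_{l}\mid x,y))$, so the gradient of Eq. (19) with respect to $z$ is $(1-\lambda)(p-y)+\lambda(q-y)$, with $y$ one-hot encoded. As in the sketch of Eq. (17), the rule truth value is assumed to be $1-s_{r}$. Gradients flow through $q$ as well; if $q$ were treated as a constant, $L_{\textit{logic}}$ would not affect the parameters.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(onehot: np.ndarray, prob: np.ndarray, eps: float = 1e-12) -> float:
    """Mean cross entropy; for two classes it equals the binary form of Eqs. (15) and (18)."""
    return float(-np.mean(np.sum(onehot * np.log(np.clip(prob, eps, 1.0)), axis=1)))


def train_pslpr(x, y, body, C=1.0, lam=0.5, steps=200, lr=0.1, seed=0):
    """Hypothetical full-batch sketch of Algorithm 1 with a linear softmax model.

    x: (N, m) features; y: (N,) labels, 1 = default (assumed coding);
    body: (N,) soft truth value of the body of one rule 'Default(x) <- body(x)'.
    Returns the weights, the bias and the Eq. (19) loss recorded at each step.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=(x.shape[1], 2))
    b = np.zeros(2)
    onehot = np.eye(2)[y]
    # C * (1 - I(r | x, y)) per candidate label: labelling x as non-default gives
    # I(r) = 1 - body, labelling it as default always satisfies the rule (I(r) = 1).
    penalty = C * np.stack([body, np.zeros_like(body)], axis=1)
    history = []
    for _ in range(steps):
        z = x @ w + b
        p = softmax(z)            # DNN output p(y|x)
        q = softmax(z - penalty)  # Eq. (17): p(y|x) exp(-penalty), renormalized
        history.append((1 - lam) * cross_entropy(onehot, p) + lam * cross_entropy(onehot, q))
        # Gradient of the batch-averaged Eq. (19) loss with respect to the logits z
        grad_z = ((1 - lam) * (p - onehot) + lam * (q - onehot)) / len(y)
        w -= lr * (x.T @ grad_z)
        b -= lr * grad_z.sum(axis=0)
    return w, b, history


rng = np.random.default_rng(1)
x = rng.normal(size=(200, 3))
y = (x[:, 0] > 0).astype(int)         # synthetic labels, not a real credit dataset
body = (x[:, 0] > 0.5).astype(float)  # hypothetical crisp rule body
w, b, history = train_pslpr(x, y, body)
accuracy = np.mean((x @ w + b).argmax(axis=1) == y)  # predictions from the DNN output p
print(f"loss {history[0]:.3f} -> {history[-1]:.3f}, training accuracy {accuracy:.2f}")
```

Here predictions are taken from the network output $p$, which is an assumption of this sketch; in the full framework the linear model would be replaced by the MLP of Section 4.2 and the gradient propagated through its layers by backpropagation.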

5. Experimental results and discussion

The experimental environment is configured as follows: CPU Intel(R) Core(TM) i7-8565U CPU @ 1.80 GHz 1.99 GHz; RAM 8 GB; OS Windows 10 64-bit. The development language is Python 3.6, and the deep learning framework is TensorFlow 2.0. To test the performance of the model, we conduct the following three main experiments on two different datasets.

(1)
Parameter sensitivity experiments, which evaluate the model's performance under different values of the logic-rule importance parameter $\lambda$.
(2)
Comparison of the improved DNN-based credit scoring model with other credit scoring models.
(3)
Comparison of the improved DNN-based credit scoring model using the PSLPR framework with other methods for integrating DNNs with logic rules.

5.1 Experimental settings

5.1.1 Data set

Two datasets are used in evaluating the performance of our method. The Australian credit dataset and the default of credit card clients dataset are both from the UCI repository of machine learning databases (http://archive.ics.uci.edu/ml). These two datasets have been widely used in credit scoring research. A summary of the characteristics of the above two datasets is reported in Table 1.

Table 1
The characteristics of the two datasets used in the experiment.

	Total cases	Good/bad cases	No. of attributes	Missing values
Australian credit	690	307/383	14	NA
The default of credit card clients dataset	30000	23364/6636	23	NA

The Australian credit dataset has 14 features, of which 6 are numerical and 8 are categorical. Their names have been changed to meaningless symbols to protect the confidentiality of the data. Each record belongs to one of two classes: application rejected or application approved. The bad cases make up 55.5% of the whole set, whereas the remaining 44.5% are good cases, so the Australian credit dataset is roughly balanced.

The default of credit card clients dataset has 23 numerical features. Four demographic features cover the gender, education, marital status and age of each client. These are followed by 18 features describing the history of payments and bill statements, i.e., the repayment status, the amount of bill statements, and the amount of previous payments over six consecutive months. Like the Australian credit dataset, this dataset is divided into two classes. The good cases make up 77.88% of the whole set, whereas the remaining 22.12% are bad cases, so the default of credit card clients dataset is imbalanced. In this paper, 70% of the data is randomly selected as the training set and 20% as the test set.

5.1.2 Evaluation metrics

Performance evaluation criteria are an indispensable part of model assessment, and many evaluation metrics are used in the literature. The most popular metrics for assessing credit scoring models include the Percentage Correctly Classified (PCC), the Kolmogorov-Smirnov statistic (K-S), Recall, F1 and the Area Under the Receiver Operating Characteristic Curve (AUC).

A confusion matrix (Table 2) consists of True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN) and is used to calculate the metrics discussed in this section. To test the accuracy of our model, we selected four evaluation indexes, which are derived from the confusion matrix according to Eqs (20)–(23).

$\displaystyle\textit{PCC}=\frac{\textit{TP}}{\textit{TP}+\textit{FP}}$ (20)

$\displaystyle\textit{Recall}=\frac{\textit{TP}}{\textit{TP}+\textit{FN}}$ (21)

$\displaystyle\textit{F1}=\frac{2\times\textit{Recall}\times\textit{PCC}}{\textit{PCC}+\textit{Recall}}$ (22)

$\displaystyle\textit{AUC}=\frac{1}{2}\left(1+\frac{\textit{TP}}{\textit{TP}+\textit{FN}}-\frac{\textit{FP}}{\textit{FP}+\textit{TN}}\right)$ (23)
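A direct transcription of Eqs (20)–(23) into Python is shown below for a hypothetical confusion matrix. Note that Eq. (20) as defined here equals TP/(TP+FP), the quantity usually called precision, and Eq. (23) is the single-threshold AUC, i.e. the mean of the true positive rate and the true negative rate.

```python
def metrics_from_confusion(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Eqs. (20)-(23) computed from the four cells of the confusion matrix."""
    pcc = tp / (tp + fp)                                 # Eq. (20), as defined in this paper
    recall = tp / (tp + fn)                              # Eq. (21)
    f1 = 2 * recall * pcc / (pcc + recall)               # Eq. (22)
    auc = 0.5 * (1 + tp / (tp + fn) - fp / (fp + tn))    # Eq. (23)
    return {"PCC": pcc, "Recall": recall, "F1": f1, "AUC": auc}


# Hypothetical counts for illustration only
m = metrics_from_confusion(tp=40, fn=10, fp=5, tn=45)
print({name: round(value, 4) for name, value in m.items()})
```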

Table 2

Confusion matrix

		Predicted
		Positives	Negatives
Actual	Positives	TP	FN
	Negatives	FP	TN

Table 3

MLP model parameter settings on the Australian credit dataset

Layer	Neuron	Activation functions
Input	14	Relu
Hidden	41	Relu
Hidden	41	Relu
Hidden	41	Relu
Hidden	41	Relu
Hidden	41	Relu
Hidden	41	Relu
Output	2	Softmax

5.1.3 Comparison algorithm

To demonstrate the advantages of our method, we conduct a detailed comparative analysis against two types of models: credit scoring models and methods for integrating DNNs with logic rules. The details are as follows.

(1)
Credit scoring models: LR (Logistic Regression), SVM (Support Vector Machine), RF (Random Forest), CNN (Convolutional Neural Network).
(2)
Methods for integrating DNNs with logic rules: HDNNLR (Harnessing Deep Neural Networks with Logic Rules) [7], PSLR (Probabilistic Soft Logic Regularization) [25].

5.1.4 Parameter and logic rules

The PSLPR framework is agnostic to the network architecture, and thus applicable to general types of DNN models, including MLP and CNN. In order to verify the effectiveness of the PSLPR framework, the MLP-based credit scoring model was chosen for the experiments in this study. The MLP model is shown in Fig. 2, and the parameter settings of the model on different datasets are shown in Tables 3 and 4.

Table 4
MLP model parameter settings on the default of credit card clients dataset

Layer	Neuron	Activation functions
Input	23	Relu
Hidden	50	Relu
Hidden	40	Relu
Hidden	30	Relu
Hidden	20	Relu
Hidden	10	Relu
Output	2	Softmax

Table 5

Parametric sensitivity result on the Australian credit dataset

$\lambda$	PCC	Recall	F1	AUC
0	87.50	72.41	79.25	93.16
0.1	86.59	77.59	81.82	92.03
0.2	88.89	82.76	85.71	94.63
0.3	89.58	74.14	81.13	94.74
0.4	91.67	73.90	70.21	94.76
0.5	85.11	68.97	76.19	99.27
0.6	93.62	75.86	83.81	94.76
0.7	89.58	74.14	81.13	92.95
0.8	93.75	77.59	84.91	93.41
0.9	91.30	72.41	80.77	94.38

The purpose of this study is to construct a framework that fuses DNN models with logic rules to optimize the models. The logic rules for the Australian credit dataset are derived from the experimental results of Marian et al. [3], and those for the default of credit card clients dataset from the experimental results of Kampfer [31]. For the Australian credit dataset, $a_{1}$ indicates that $A_{8}=0$ and $a_{2}$ indicates that $900\leqslant A_{13}\leqslant 1100$. The logic rule is written as Eq. (24).

$\displaystyle\textit{Default(x)}\leftarrow a_{1}(x)\land a_{2}(x)$ (24)

Equation (24) indicates that for each instance with $A_{8}=0$ and $900\leqslant A_{13}\leqslant 1100$, the applicant will default. For the default of credit card clients dataset, $b_{1}$ indicates that $\textit{PAY}_{1}\geqslant 1.5$ and $b_{2}$ indicates that $\textit{PAY}_{3}\geqslant-0.5$. The logic rule is written as Eq. (25).

$\displaystyle\textit{Default(x)}\leftarrow b_{1}(x)\land b_{2}(x)$ (25)

Equation (25) indicates that for each instance with $\textit{PAY}_{1}\geqslant 1.5$ and $\textit{PAY}_{3}\geqslant-0.5$, the applicant will default.
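The crisp predicates behind Eqs (24)–(25) can be evaluated per applicant as sketched below. The dictionary keys `A8`, `A13`, `PAY_1` and `PAY_3` follow the notation of this section and are assumed column names, not necessarily those of the raw UCI files. Because each predicate is either true or false, the Łukasiewicz conjunction of Eq. (2) yields a body value of 0 or 1, and the distance of Eq. (6) with head Default(x) is non-zero only when an applicant who satisfies the body is labelled as non-default.

```python
def australian_body(row: dict) -> float:
    """Body of Eq. (24): a1 = (A8 == 0) and a2 = (900 <= A13 <= 1100)."""
    a1 = 1.0 if row["A8"] == 0 else 0.0
    a2 = 1.0 if 900 <= row["A13"] <= 1100 else 0.0
    return max(0.0, a1 + a2 - 1.0)  # Lukasiewicz conjunction, Eq. (2)


def credit_card_body(row: dict) -> float:
    """Body of Eq. (25): b1 = (PAY_1 >= 1.5) and b2 = (PAY_3 >= -0.5)."""
    b1 = 1.0 if row["PAY_1"] >= 1.5 else 0.0
    b2 = 1.0 if row["PAY_3"] >= -0.5 else 0.0
    return max(0.0, b1 + b2 - 1.0)


def rule_distance(body: float, default_label: int) -> float:
    """Eq. (6) with head Default(x): the head is true (1) only if the label is default."""
    head = float(default_label)
    return max(0.0, body - head)


row = {"PAY_1": 2, "PAY_3": 0}  # hypothetical applicant
body = credit_card_body(row)
print(body, rule_distance(body, 0), rule_distance(body, 1))  # 1.0 1.0 0.0
```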

5.2 Results and discussion

5.2.1 Parameter sensitivity

In this section, we explore how the labeled dataset and the logic rules affect the performance of our method. We vary the parameter $\lambda$, which sets the relative importance of the labeled dataset and the logic rules, from 0 to 0.9 in steps of 0.1. According to Eq. (19), $\lambda=0$ means that the logic rules have no effect on the DNN model, which is then equivalent to the standard DNN model. As $\lambda$ increases, the logic rules have an increasing impact on model training. Tables 5 and 6 report the PCC, Recall, F1, and AUC of our method on the Australian credit dataset and the default of credit card clients dataset, respectively.

Table 6
Parametric sensitivity on the default of credit card clients dataset

$\lambda$	PCC	Recall	F1	AUC
0	96.79	83.89	89.88	78.41
0.1	96.79	84.22	90.06	79.41
0.2	96.89	84.09	90.04	79.29
0.3	96.85	84.11	90.03	79.45
0.4	96.51	84.36	90.03	79.45
0.5	96.30	84.45	89.99	79.45
0.6	96.49	84.50	90.10	79.33
0.7	96.18	84.61	90.02	79.20
0.8	96.11	84.64	90.01	79.22
0.9	95.48	85.33	90.12	78.60

Figure 3. Parametric sensitivity results on the Australian credit dataset. (a) PCC. (b) Recall. (c) F1. (d) AUC.

Figure 4. Parametric sensitivity results on the default of credit card clients dataset. (a) PCC. (b) Recall. (c) F1. (d) AUC.

Figure 3 shows the influence of the parameter $\lambda$ on PCC, Recall, F1, and AUC on the Australian credit dataset. The main observations are as follows.

(1)

The highest PCC, 93.75%, is achieved at $\lambda=0.8$, an increase of 7.14% over the standard MLP. The highest Recall (82.76%) and the highest F1 (85.71%) are both achieved at $\lambda=0.2$, increases of 14.29% and 8.15%, respectively, over the standard MLP. The highest AUC, 99.27%, is achieved at $\lambda=0.5$, an increase of 5.43% over the standard MLP.

(2)

Taking all four evaluation metrics into account, the best overall performance is obtained at $\lambda=0.6$, where PCC, Recall, F1, and AUC are 93.62%, 75.86%, 83.81%, and 94.76%, respectively.
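The PCC, Recall and F1 gains quoted in item (1) are relative improvements over the standard MLP, that is, (new - baseline) / baseline × 100, with the MLP row of Table 7 as the baseline. The check below reproduces them; `relative_gain` is a hypothetical helper name.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Standard MLP on the Australian credit dataset (Table 7) vs. best PSLPR values from item (1)
mlp = {"PCC": 87.50, "Recall": 72.41, "F1": 79.25}
pslpr_best = {"PCC": 93.75, "Recall": 82.76, "F1": 85.71}
for metric, base in mlp.items():
    print(f"{metric}: +{relative_gain(pslpr_best[metric], base):.2f}%")
# PCC: +7.14%, Recall: +14.29%, F1: +8.15%
```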

Figure 4 shows the influence of the parameter $\lambda$ on PCC, Recall, F1, and AUC on the default of credit card clients dataset. The main observations are as follows.

(1)

PCC reaches a maximum of 96.89% at $\lambda=0.2$, an increase of 0.103% over the standard MLP. Recall and F1 reach their maxima of 85.33% and 90.12% at $\lambda=0.9$, increases of 1.72% and 0.27%, respectively, over the standard MLP. AUC reaches a maximum of 79.45% at $\lambda=0.3$, 0.4, and 0.5, an increase of 1.33% over the standard MLP.

(2)

Taking all four evaluation metrics into account, the best overall performance is obtained at $\lambda=0.2$, where PCC, Recall, F1, and AUC are 96.89%, 84.09%, 90.04%, and 79.29%, respectively.
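The per-metric maxima in item (1) can also be read off Table 6 programmatically. The sketch below encodes the table as printed, treats the $\lambda=0$ row as the standard MLP, and reports for each metric the maximum, every $\lambda$ that attains it (ties included), and the relative gain over $\lambda=0$; the names are hypothetical.

```python
METRICS = ("PCC", "Recall", "F1", "AUC")

# Table 6 as printed: lambda -> (PCC, Recall, F1, AUC), in percent
TABLE6 = {
    0.0: (96.79, 83.89, 89.88, 78.41),
    0.1: (96.79, 84.22, 90.06, 79.41),
    0.2: (96.89, 84.09, 90.04, 79.29),
    0.3: (96.85, 84.11, 90.03, 79.45),
    0.4: (96.51, 84.36, 90.03, 79.45),
    0.5: (96.30, 84.45, 89.99, 79.45),
    0.6: (96.49, 84.50, 90.10, 79.33),
    0.7: (96.18, 84.61, 90.02, 79.20),
    0.8: (96.11, 84.64, 90.01, 79.22),
    0.9: (95.48, 85.33, 90.12, 78.60),
}


def best_lambdas(table):
    """Per metric: (max value, lambdas attaining it, relative gain over lambda = 0 in %)."""
    summary = {}
    for i, name in enumerate(METRICS):
        best = max(row[i] for row in table.values())
        lams = [lam for lam, row in table.items() if row[i] == best]
        base = table[0.0][i]
        summary[name] = (best, lams, (best - base) / base * 100.0)
    return summary


for name, (best, lams, gain) in best_lambdas(TABLE6).items():
    print(f"{name}: max {best:.2f} at lambda={lams}, +{gain:.2f}% vs lambda=0")
```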

Overall, the PSLPR framework improves the performance of the MLP model, with particularly large improvements on the Australian credit dataset. In the experiments on both datasets, the improvement in Recall exceeds that of the other metrics. This may be due to an enhanced ability of the model to identify negative samples, since the logic rules incorporated into the model all describe when a sample should be identified as negative. This further indicates that the logic rules can be effectively integrated into the DNN model through the PSLPR framework proposed in this study.

5.2.2 Credit scoring models’ comparison experiment

To verify the effectiveness of the MLP-based credit scoring model with integrated logic rules, its performance was compared with five commonly used credit scoring models, namely LR, SVM, RF, CNN, and the standard MLP. Table 7 reports the PCC, Recall, F1, and AUC of these credit scoring models on the Australian credit dataset and the default of credit card clients dataset.

Figure 5. Histogram of each model's metric values on the Australian credit dataset. (a) PCC. (b) Recall. (c) F1. (d) AUC.

Table 7

Performance comparison of five different credit scoring models

Database	Model	PCC (%)	Recall (%)	F1 (%)	AUC (%)
Australian credit	LR	82.26	87.93	85.00	87.09
	SVM	82.76	82.76	82.76	85.13
	RF	88.00	75.86	84.18	84.18
	MLP	87.50	72.41	79.25	93.16
	CNN	83.02	75.86	79.28	93.1
	Ours	93.75	77.59	84.91	93.41
Default of credit card clients	LR	97.84	82.49	89.52	60.1
	SVM	96.58	84.25	89.80	64.52
	RF	94.63	84.08	89.05	63.83
	MLP	96.79	83.89	89.88	78.41
	CNN	96.28	84.14	89.80	76.25
	Ours	96.89	84.09	90.04	79.29

Figure 6. Histogram of each model's metric values on the default of credit card clients dataset. (a) PCC. (b) Recall. (c) F1. (d) AUC.

Figure 5 shows the PCC, Recall, F1, and AUC of the credit scoring models on the Australian credit dataset. Overall, the MLP-PSLPR credit scoring model performs better than the other credit scoring models. The initial findings are as follows.

(1)

On the Australian credit dataset, the MLP-PSLPR credit scoring model achieves the highest PCC and AUC. The LR model achieves the highest Recall and F1, but its PCC is the lowest of all the models.

(2)

In terms of AUC, the DNN-based credit scoring models (MLP and CNN) outperform all of the machine learning-based credit scoring models, and both exceed LR and SVM in PCC (only RF has a slightly higher PCC than MLP). However, their Recall and F1 values are generally lower than those of the machine learning-based models. The MLP-PSLPR model compensates for these drawbacks by integrating logic rules into the MLP model, which yields higher Recall and F1 values while maintaining PCC and AUC.

Figure 6 presents the PCC, Recall, F1, and AUC of the models on the default of credit card clients dataset. Overall, the MLP-PSLPR model again performs better than the other credit scoring models, as detailed below.

(1)

The MLP-PSLPR credit scoring model achieves the highest F1 and AUC. The LR model achieves the highest PCC and the SVM model the highest Recall; however, both perform poorly on AUC (60.1% and 64.52%, respectively).

(2)

The improvement of the MLP-PSLPR model on the default of credit card clients dataset is smaller than that on the Australian credit dataset. This may be due to the class imbalance of the default of credit card clients dataset, in which negative samples make up only a small proportion. As a result, when the logic rules are used to identify negative samples, the model's overall performance does not increase substantially.
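The per-metric leaders discussed for both datasets can be checked against Table 7 with a sketch like the following, which encodes the table as printed (values in percent) and keeps ties; the data layout and the function name are hypothetical.

```python
METRICS = ("PCC", "Recall", "F1", "AUC")

# Table 7 as printed: dataset -> model -> (PCC, Recall, F1, AUC)
TABLE7 = {
    "Australian credit": {
        "LR": (82.26, 87.93, 85.00, 87.09),
        "SVM": (82.76, 82.76, 82.76, 85.13),
        "RF": (88.00, 75.86, 84.18, 84.18),
        "MLP": (87.50, 72.41, 79.25, 93.16),
        "CNN": (83.02, 75.86, 79.28, 93.10),
        "Ours": (93.75, 77.59, 84.91, 93.41),
    },
    "Default of credit card clients": {
        "LR": (97.84, 82.49, 89.52, 60.10),
        "SVM": (96.58, 84.25, 89.80, 64.52),
        "RF": (94.63, 84.08, 89.05, 63.83),
        "MLP": (96.79, 83.89, 89.88, 78.41),
        "CNN": (96.28, 84.14, 89.80, 76.25),
        "Ours": (96.89, 84.09, 90.04, 79.29),
    },
}


def leaders(results):
    """Map each metric to the model(s) with the highest value, keeping ties."""
    out = {}
    for i, metric in enumerate(METRICS):
        top = max(scores[i] for scores in results.values())
        out[metric] = [model for model, scores in results.items() if scores[i] == top]
    return out


for dataset, results in TABLE7.items():
    print(dataset, leaders(results))
```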

5.2.3 Comparison experiment of methods for integrating DNN with logic rules

To demonstrate the effectiveness of our framework for integrating neural networks with logic rules, we compare it with two other approaches for integrating neural networks with logic rules: HDNNLR [7] and PSLR [25]. Each method requires its own choice of $\lambda$; to keep the comparison fair, each method uses the $\lambda$ value at which its model performs best overall. The results of the comparison are shown in Table 8.

Table 8
Performance comparison of two different methods for integrating neural networks with logic rules

Database	Model	PCC (%)	Recall (%)	F1 (%)	AUC (%)
Australian credit	MLP-PSLR	90.00	77.59	83.33	92.92
	MLP-HDNNLR	89.80	75.86	82.24	94.12
	Ours	93.75	77.59	84.91	93.41
Default of credit card clients	MLP-PSLR	96.03	84.64	89.98	79.21
	MLP-HDNNLR	96.83	83.89	89.88	79.33
	Ours	96.85	84.11	90.03	79.45

Table 8 shows how the MLP model performs on the two datasets after being improved by the different approaches for integrating logic rules. On the Australian credit dataset, HDNNLR improves the AUC more than our approach does (94.12% vs. 93.41%). On the default of credit card clients dataset, PSLR gives the largest improvement in the Recall of the MLP model (84.64%). Nevertheless, the MLP-PSLPR model achieves the best overall results on both datasets, so PSLPR provides the largest overall improvement to the MLP model.

6. Conclusion

Credit scoring is an important component of critical financial decisions. DNN-based credit scoring models have gradually become a research hotspot. However, their over-reliance on high-quality data and their low transparency limit further development. Therefore, we propose the PSLPR framework, which integrates logic rules into the DNN. In particular, we formulate probabilistic soft logic rules encoding credit scoring knowledge as a posterior regularization term to integrate deep learning and logic rules, and introduce a discrepancy loss to incorporate the logic rules into the parameters of the DNN model.

To measure the effectiveness of the proposed model, this paper conducts experimental validation on two public credit scoring datasets. The experimental results show that the proposed approach can use prior credit scoring knowledge together with the dataset to guide the training of a DNN-based credit scoring model, improving several evaluation metrics. It also outperforms other approaches for integrating DNNs with logic rules. Meanwhile, the flexible first-order logic rules provide some interpretability for the DNN model.

Although the proposed framework incorporates prior knowledge of credit scoring into the modeling process, provides some interpretability, and improves model accuracy, it still cannot achieve the inference-stage transparency of a decision tree model. Future work will therefore investigate neural network rule extraction methods to make the inference process interpretable, and use the extracted rules to enrich the rule base.


Acknowledgments

This work was funded by the National Key Research and Development Program of China under Grant (2019YFB1405000) and the Shaanxi Natural Science Foundation of China (2020JQ-758). The authors also acknowledge with thanks the professors at Xi’an University of Science and Technology for their theoretical support.

References

1. Kozeny, Genetic algorithms for credit scoring: Alternative fitness function performance comparison, Expert Systems with Applications 46 (2015), 2998–3004.

2. Florez-Lopez and Ramon-Jeronimo J.M., Enhancing accuracy and interpretability of ensemble strategies in credit risk assessment. A correlated-adjusted decision forest proposal, Expert Systems with Applications 42 (2015), 5737–5753.

3. Gorzałczany M.B. and Rudziński, A multi-objective genetic optimization for fast, fuzzy rule-based credit classification with balanced accuracy and interpretability, Applied Soft Computing 40 (2016), 206–220.

4. Dastile, Celik and Potsane, Statistical and machine learning models in credit scoring: A systematic literature survey, Applied Soft Computing 91 (2020), 106263.

5. von Rueden, Mayer, Beckh, Georgiev, Giesselbach, Heese, Kirsch, Pfrommer, Pick, Ramamurthy, Walczak, Garcke, Bauckhage and Schuecker, Informed Machine Learning – A Taxonomy and Survey of Integrating Knowledge into Learning Systems, IEEE Transactions on Knowledge and Data Engineering 99 (2021), 1.

6. Stevens, Taylor, Nichols, Maccabe A.B., Yelick and Brown, AI for Science, Technical report, Argonne National Lab. (ANL), Argonne, IL (United States), 2020.

7. … and Liu, Harnessing Deep Neural Networks with Logic Rules, In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016.

8. Jones and Hensher D.A., Modelling corporate failure: A multinomial nested logit analysis for unordered outcomes, The British Accounting Review 39 (2007), 89–107.

9. Altman E.I., Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, The Journal of Finance 23 (1968), 589–609.

10. Chen, Ribeiro and Chen, Financial credit risk assessment: a recent review, Artificial Intelligence Review 45 (2016), 1–23.

11. Huang C.L., Chen M.C. and Wang C.J., Credit scoring with a data mining approach based on support vector machines, Expert Systems with Applications 33 (2007), 847–856.

12. Chern C.C., Lei W.U., Huang K.L. and Chen S.Y., A decision tree classifier for credit assessment problems in big data environments, Information Systems and e-Business Management 19 (2021), 363–386.

13. Itoo and Singh, Comparison and analysis of logistic regression, Naïve Bayes and KNN machine learning algorithms for credit card fraud detection, International Journal of Information Technology 13 (2021), 1503–1511.

14. Zhang, Yang, Zhang, Jose H.A. and Wu, A novel multi-stage ensemble model with enhanced outlier adaptation for credit scoring, Expert Systems with Applications 165 (2021), 113872.

15. Neagoe V.E., Ciotec A.D. and Cucu G.S., Deep convolutional neural networks versus multilayer perceptron for financial prediction, In 2018 International Conference on Communications (COMM), 2018, pp. 201–206.

16. Kvamme, Sellereite, Aas and Sjursen, Predicting mortgage default using convolutional neural networks, Expert Systems with Applications 102 (2018), 207–217.

17. Dastile and Celik, Making Deep Learning-Based Predictions for Credit Scoring Explainable, IEEE Access 9 (2021), 50426–50440.

18. Samek, Montavon, Vedaldi, Hansen L.K. and Müller K.R., Explainable AI: interpreting, explaining and visualizing deep learning, Lecture Notes in Computer Science, 2019.

19. Selvaraju R.R., Cogswell, Das, Vedantam, Parikh and Batra, Grad-CAM: Visual explanations from deep networks via gradient-based localization, In Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626.

20. Ribeiro M.T., Singh and Guestrin, “Why should I trust you?” Explaining the predictions of any classifier, In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135–1144.

21. Yang and Tang, A novel multistage deep belief network based extreme learning machine ensemble learning paradigm for credit risk assessment, Flexible Services and Manufacturing Journal 28 (2016), 576–592.

22. Ala’raj, Abbod M.F. and Majdalawieh, Modelling customers credit card behaviour using bidirectional LSTM neural networks, Journal of Big Data 8 (2021), 1–27.

23. Zhang, Friedman, Liang and Broeck, A semantic loss function for deep learning with symbolic knowledge, In International Conference on Machine Learning, PMLR, 2018, pp. 5502–5511.

24. Fischer, Balunovic, Drachsler-Cohen, Gehr, Zhang and Vechev, DL2: training and querying neural networks with logic, In International Conference on Machine Learning, PMLR, 2019, pp. 1931–1941.

25. Zhou, Yan, Han, Caufield J.H., Chang K.W., Sun, Ping and Wang, Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference, In Proceedings of the AAAI Conference on Artificial Intelligence, 2021, pp. 14647–14655.

26. Enderton H.B., A mathematical introduction to logic, Elsevier, 2001.

27. Hinton, Vinyals and Dean, Distilling the Knowledge in a Neural Network, Computer Science 14 (2015), 38–39.

28. Ganchev, Graça, Gillenwater and Taskar, Posterior regularization for structured latent variable models, The Journal of Machine Learning Research 11 (2010), 2001–2049.

29. Krishna, Jyothi and Iyyer, Revisiting the importance of encoding logic rules in sentiment classification, In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 4743–4751.

30. Kimmig, Bach, Broecheler, Huang and Getoor, A short introduction to probabilistic soft logic, In Proceedings of the NIPS Workshop on Probabilistic Programming: Foundations and Applications, 2012, pp. 1–4.

31. Kampfer T.L., Performance and Interpretability of Machine Learning Algorithms for Credit Risk Modelling, Ph.D. Dissertation, Ludwig-Maximilians-University Munich, 2018.

