Parallel double-layer prediction model construction and empirical analysis for enterprise credit assessment

Abstract

Credit is part of the external image of an enterprise and directly affects its interests. At present, most research on enterprise credit prediction uses a single algorithm model, or optimizes a single model, to predict an enterprise's credit score. The accuracy of these models varies, and their generalization ability is generally weak. To improve both the generalization ability of the model and the accuracy of its predictions, this paper proposes a parallel double-layer prediction model. The model is based on the Stacking and Bagging methods, which improve generalization ability while maintaining high accuracy. In experiments, we compare the parallel double-layer prediction model with three single algorithm models and four ensemble learning models that use other combination strategies. Relative to these seven models, the average of the four evaluation indexes is improved by 4.2349%, 63.1464%, 34.11837%, 1.26104%, 15.7862%, 10.1457% and 25.6310%, respectively. The results show that the parallel double-layer prediction model is accurate and feasible.

Keywords

Integration technology, credit scoring, machine learning, neural network, stacking

1. Introduction

Credit is an effective “ID card” and a reliable “pass” for enterprises [1]. It reflects the external image of an enterprise and has a great impact on its interests [2]. Traditional single regression algorithms and neural networks are widely used in many evaluation and prediction tasks, including credit evaluation.

Common single regression algorithms include the linear regression algorithm [4], the Support Vector Machine (SVM) algorithm [5], the K-Nearest Neighbor (KNN) regression algorithm [6], etc. Single regression models train quickly and are relatively easy to understand. However, they have shortcomings: polynomial regression, for example, is difficult to design for nonlinear data. Their performance is unstable across different data sets, and the reliability of the prediction results is low.

Neural networks [7, 8] have many applications, including credit score prediction, but they have some defects. They lack interpretability, so the cause of a forecast result is unknown, and they need a large amount of training data.

Furthermore, many models use an optimization algorithm to tune a single regression algorithm and so improve prediction accuracy. Even then, the resulting model predicts well only on one specific data set or in one aspect; that is, it is a weak supervised model [9]. Across different data sets, the problem of weak generalization ability remains. Ensemble learning methods can address this problem.

Ensemble learning [9, 10] combines multiple weak supervised models to obtain a better and more comprehensive supervised model. Even if one weak learner makes a wrong prediction, the other weak learners can correct the error, which reduces variance or improves the prediction results [11]. The combination strategy of ensemble learning can be formulated in different ways, and different strategies lead to differences in the accuracy and generalization ability of the ensemble. To improve the prediction accuracy and generalization ability of models, this paper develops a new combination strategy and proposes a parallel double-layer prediction model. The contributions of the model are as follows.

•
To suit different data sets, the base learners can be reselected according to the actual data.
•
The model mitigates overfitting while keeping high prediction accuracy, and improves generalization ability. Experiments on several data sets support the accuracy and stability of the model.
•
Our method improves the reliability of credit evaluation and provides partial support for agent intelligent inspection.

The rest of this paper is organized as follows. Section 2 reviews related research on credit evaluation. Section 3 presents the methods and establishes the parallel double-layer prediction model. Section 4 reports experimental verification and analysis. Finally, we present conclusions and look forward to future work.

2. Related work

There has been much previous work on credit scoring. Most of it uses machine learning regression algorithms or neural networks, such as the Support Vector Machine (SVM) and the Back Propagation (BP) neural network. According to our literature review, current credit evaluation methods mainly fall into single traditional algorithms, neural network methods and ensemble learning methods.

2.1 Traditional single algorithm

Fan et al. [12] proposed a support vector machine credit scoring model based on a premature convergence index and adaptive mutation particle swarm optimization, which solved the premature convergence problem of traditional particle swarm optimization. Hu et al. [13] proposed a credit rating evaluation method for commercial websites based on the Weighted Support Vector Machine (WSVM), and associated it with website construction time. Maldonado et al. [14] proposed two formulations based on Support Vector Machines for simultaneous classification and feature selection that explicitly incorporate attribute acquisition costs. Niu et al. [15] used the SMOTE algorithm to optimize the model and improve its evaluation performance, and evaluated and analyzed the complex credit risk of online lending. Junior et al. [16] evaluated the applicability of dynamic selection techniques to the credit scoring problem and proposed a method that reduces the number of K-Nearest Neighbors (RMKNN). This method improved on the state of the art in defining the local region used by dynamic selection techniques on imbalanced credit scoring data sets. Zhang et al. [17] noted that two typical variants of the ensemble Decision Tree (DT) method, Random Forest (RF) and Gradient Boosting Decision Tree (GBDT), have been used in recent credit scoring studies.

2.2 Neural network method

Hu et al. [18] used particle swarm optimization to train a BP neural network and improved the existing algorithm. By changing the speed of particle search in the weight space, the mean square error of the network output was gradually reduced. Li et al. [19] improved the optimal segmentation algorithm and applied it to the training of Radial Basis Function (RBF) neural network parameters, realizing adaptive selection of the number of hidden nodes and establishing a credit rating model. Pławiak et al. [20] proposed the Deep Genetic Hierarchical Network of Learners (DGHNL) method, whose novelty lies in appropriate information flow and fusion.

2.3 Integrated learning method

Xia et al. [21] proposed a sequential ensemble credit scoring model based on a gradient boosting machine, which provided feature importance scores and a decision graph and enhanced the interpretability of the credit scoring model. Yeh et al. [22] proposed a credit rating prediction model with market information as the prediction variable, which provided valuable information, better classification results and meaningful rules for credit rating.

To improve credit score prediction, Trivedi et al. [23] used different feature selection techniques and machine learning classifiers. Based on the Consensus Approach (ConsA) of different classification algorithms, Ala’raj et al. [24] proposed a new classifier combination rule. Niu et al. [25] built models with three machine learning algorithms (random forest, AdaBoost, and LightGBM) to demonstrate the predictive power of social network information. To achieve transparency and simplicity on credit scoring data sets with heterogeneous attributes, Hayashi et al. [26] used a one-dimensional fully-connected layer first CNN together with the recursive rule extraction (Re-RX) algorithm combined with the J48graft decision tree.

2.4 Others

Besides, there are other approaches. Yu et al. [27] established a credit evaluation weight optimization model, which overcame the drawback that existing research cannot guarantee an evaluation result with maximum discrimination power once the weights are given. Based on a theoretical model from the economics and credit risk management literature, Hodgkinson et al. [28] used an expert system to evaluate the credit value of an applicant enterprise and decide whether to grant it a credit line. Yin et al. [29] proposed a framework that identifies legal judgments that can effectively predict credit risk, extracts the relevant features they contain, and uses them to evaluate the credit risk of SMEs. Zhang et al. [30] proposed a novel multi-criteria optimization classifier based on kernel, fuzzification and penalty factors (KFPMCOC), which can improve the efficiency of credit risk scoring and the generalization of credit rating prediction for new applicants. Guo et al. [31] proposed a case-based credit risk assessment model for P2P lending investment decisions.

Some of the above methods use optimization methods to tune traditional algorithms, or use ensemble techniques to combine several algorithms. These methods generally improve accuracy, but most of them pursue accuracy alone and neglect generalization ability.

3. Double-layer prediction model

3.1 Approach overview

Building on ensemble learning and combining the ideas of Bagging and Stacking, we propose a parallel double-layer prediction model by changing the ensemble strategy. The data processing flow of the model is as follows.

•
A processed standard data set is fed into several base learners to obtain multiple outputs $\textit{Output}_{1},\textit{Output}_{2},\cdots,\textit{Output}_{n}$ .
•
A neural network is used to calculate the weight of each output $\omega_{1},\omega_{2},\cdots,\omega_{n}$ .
•
$\textit{Output}=\textit{Output}_{1}\omega_{1}+\textit{Output}_{2}\omega_{2}+\cdots+\textit{Output}_{n}\omega_{n}$ is calculated as the new sample value and written back to the data set, forming a new standard data set $D^{\prime}$ . Here, $\textit{Output}$ represents the new sample value.
•
The new standard data set $D^{\prime}$ is fed into the base learner with the best prediction performance to obtain the final evaluation results.
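
The weighted combination in the third step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `combine_outputs` and `build_updated_dataset`, the three prediction lists and the weight values are hypothetical and are not taken from the experiments in this paper.

```python
def combine_outputs(outputs, weights):
    """Return Output = Output_1*w_1 + ... + Output_n*w_n for every sample."""
    if len(outputs) != len(weights):
        raise ValueError("one weight per base learner is required")
    n_samples = len(outputs[0])
    return [
        sum(out[i] * w for out, w in zip(outputs, weights))
        for i in range(n_samples)
    ]


def build_updated_dataset(features, combined):
    """Pair each feature row with its combined value to form D'."""
    return [(row, value) for row, value in zip(features, combined)]


# Hypothetical predictions of three base learners on four samples.
outputs = [
    [0.60, 0.70, 0.20, 0.90],  # Output_1
    [0.50, 0.80, 0.30, 0.85],  # Output_2
    [0.55, 0.75, 0.10, 0.95],  # Output_3
]
weights = [0.5, 0.3, 0.2]      # assumed weights, not values from the paper
features = [[0.1, 0.4], [0.3, 0.9], [0.8, 0.2], [0.5, 0.5]]

combined = combine_outputs(outputs, weights)
d_prime = build_updated_dataset(features, combined)
```

In the full model, the weights would come from the neural network of the second step, and `d_prime` would then be used to train the best base learner.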

After using a training data set to train the model, we can input the relevant data of enterprises for credit evaluation. The parallel double-layer prediction model combines many kinds of basic learning algorithms. The overall combination method adopts double-layer structure of Stacking method, and the local part adopts Bagging structure. The specific combination strategy is shown in Fig. 1.

Figure 1.
Parallel double-layer prediction model.

Overall, the combination strategy described in this paper is built on the Stacking structure, which is then enhanced by using the Bagging structure locally. Specifically, the combination of several base learners in the preliminary combination step in Fig. 1 is the Bagging parallel structure. As a key step of the parallel double-layer prediction model, this step has two main functions.

•
It improves the generalization ability of the model. This avoids the situation where a single learner performs well on one data set but poorly on a new one.
•
In a data set, the value of the label may not match the values of the features. This step prepares for updating the label in the data set, making the data more reasonable and easier to fit.

The second main step is the second layer of the Stacking framework, which improves the accuracy of the prediction results. The updated data set is used to train the best base learner, which improves the accuracy of the parallel double-layer prediction model.

The Bagging algorithm is one of the best-known representatives of parallel ensemble learning methods in machine learning. Its flow chart is shown in Fig. 2.

Figure 2.
Bagging algorithm flow.

The Stacking algorithm is a hierarchical model integration framework; a double-layer structure is adopted in this paper. The first layer is composed of multiple base learners, whose input is the original training set. The second-layer model takes the outputs of the first-layer base learners as features of a new training set and is retrained on it, yielding a complete Stacking model. The Stacking algorithm is described in Algorithm 1.

Algorithm 1: Description of Stacking algorithm
Input: Training set $D=\{(x_{1},y_{1}),(x_{2},y_{2}),\ldots,(x_{m},y_{m})\}$ ; base learning algorithms $\xi_{1},\xi_{2},\ldots,\xi_{T}$ ; secondary learning algorithm $\xi$
Output: $H(x)=h^{\prime}(h_{1}(x),h_{2}(x),\ldots,h_{T}(x))$
for $t=1,2,\ldots,T$ do
    $h_{t}=\xi_{t}(D)$
end for
$D^{\prime}=\emptyset$
for $i=1,2,\ldots,m$ do
    for $t=1,2,\ldots,T$ do
        $z_{it}=h_{t}(x_{i})$
    end for
    $D^{\prime}=D^{\prime}\cup((z_{i1},z_{i2},\ldots,z_{iT}),y_{i})$
end for
$h^{\prime}=\xi(D^{\prime})$
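
The control flow of the Stacking algorithm can be illustrated with a minimal Python sketch. The two toy learners (`mean_learner` and `first_feature_learner`) and the four-sample data set are assumptions made only for illustration; they are not the DT, SVM and BP learners used later in this paper.

```python
def mean_learner(data):
    """Toy base learner: always predicts the mean label."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y


def first_feature_learner(data):
    """Toy learner: least-squares line on the first feature."""
    xs = [x[0] for x, _ in data]
    ys = [y for _, y in data]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x[0] - mx)


def stacking(data, base_algorithms, secondary_algorithm):
    """Train base learners on D, build D', then train the secondary learner."""
    base = [xi_t(data) for xi_t in base_algorithms]     # h_t = xi_t(D)
    d_prime = []                                        # D' starts empty
    for x_i, y_i in data:
        z_i = [h_t(x_i) for h_t in base]                # z_it = h_t(x_i)
        d_prime.append((z_i, y_i))                      # add ((z_i1..z_iT), y_i)
    h_prime = secondary_algorithm(d_prime)              # h' = xi(D')
    return lambda x: h_prime([h_t(x) for h_t in base])  # H(x)


# Hypothetical data following y = 2x + 1.
data = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0), ([3.0], 7.0)]
H = stacking(data, [first_feature_learner, mean_learner], first_feature_learner)
prediction = H([4.0])
```

As in the algorithm description, the secondary learner here is trained on predictions made for the same samples the base learners were fitted on; practical implementations often use held-out or cross-validated predictions instead.
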
3.2 Construction of standard data set

Accurate and comprehensive data samples are the basis of model training. The credit data set used in this paper has 29 features. Except for the first attribute, which is of character type, the attributes are of floating-point or integer type; the data set contains more than 50000 records. The original data contain non-standard entries, which affect the prediction results. Therefore, to obtain more accurate results, we standardize the data before they are fed into the models. The main preparation steps are as follows.

•
Manual elimination of character features. For example, in the credit data set used in this paper, the first attribute is the user code, which has no effect on the credit score, so we remove it manually.
•
Data normalization. We use the min-max standardization method to apply a linear transformation to the original data, so that the resulting values are mapped into [0, 1]. This step can greatly reduce the training time of the neural network. The transformation function is shown in Eq. (1).

$\displaystyle x^{\prime}={\displaystyle\frac{x-x_{\min}}{x_{\max}-x_{\min}}}$ (1)

Where $x$ is raw data, $x^{\prime}$ is normalized data, $x_{\max}$ and $x_{\min}$ represent the maximum and minimum values of raw data respectively.
•
Correlation analysis. After the character type features are eliminated, the remaining data are continuous variables. We use the Pearson correlation coefficient to measure their correlation. For two samples $X$ and $Y$ , the Pearson correlation coefficient is given by Eq. (2).

$\displaystyle\rho_{X,Y}={\displaystyle\frac{\textit{Cov}(X,Y)}{\sigma_{X}\sigma_{Y}}}$ (2)

Here, $\textit{Cov}(X,Y)$ is the covariance of $X$ and $Y$ , and $\sigma_{X}$ and $\sigma_{Y}$ are the standard deviations of $X$ and $Y$ , respectively. $\textit{Cov}(X,Y)$ can be expanded as shown in Eq. (3), where $E[\cdot]$ denotes the expected value.

$\displaystyle\textit{Cov}(X,Y)=E[(X-E[X])(Y-E[Y])]=E[XY]-2E[X]E[Y]+E[X]E[Y]=E[XY]-E[X]E[Y]$ (3)

The correlation coefficient of sample is expressed by $R$ , and the expression of $R$ is obtained by transforming Eq. (2) with Eq. (3), as shown in Eq. (4).

$\displaystyle R={\displaystyle\frac{\sum_{i=1}^{n}(X_{i}-\bar{X})(Y_{i}-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}}\sqrt{\sum_{i=1}^{n}(Y_{i}-\bar{Y})^{2}}}}$ (4)

Here, $\bar{X}$ and $\bar{Y}$ are the mean values of samples $X$ and $Y$ , respectively. The Pearson correlation coefficient is used to check the direction and degree of the trend between two variables. Its value ranges from $-$ 1 to $+$ 1: 0 means that the two variables are not linearly related, a positive value means positive correlation, and a negative value means negative correlation. The larger the absolute value, the stronger the correlation.

Using correlation analysis, we test whether there are variables that are not related to the target variables, and if they exist, we delete them.
•
Missing value processing. Some values in the data set are missing. Missing key data greatly reduce the value of the whole data set and prevent accurate results, so the missing data must be processed.
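
The normalization of Eq. (1) and the correlation filter based on Eq. (4) can be sketched as follows. This is a minimal illustration: the helper names, the threshold of 0.05 and the toy columns are assumptions, since the paper does not state the cut-off used to decide that a variable is unrelated to the target.

```python
import math


def min_max_normalize(values):
    """Eq. (1): map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: mapped to 0.0 by assumption
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson_r(xs, ys):
    """Eq. (4): sample Pearson correlation coefficient R."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs)) * math.sqrt(
        sum((y - mean_y) ** 2 for y in ys)
    )
    return num / den if den else 0.0


def select_features(columns, target, threshold=0.05):
    """Keep columns whose |R| with the target reaches the assumed threshold."""
    return {
        name: col
        for name, col in columns.items()
        if abs(pearson_r(col, target)) >= threshold
    }


# Hypothetical columns: "a" is perfectly correlated with the target,
# "noise" has zero correlation with it.
columns = {"a": [1.0, 2.0, 3.0, 4.0], "noise": [1.0, -1.0, -1.0, 1.0]}
target = [10.0, 20.0, 30.0, 40.0]

normalized = min_max_normalize([2.0, 4.0, 6.0])
selected = select_features(columns, target)
```

Here `select_features` drops the `noise` column, which mirrors the deletion of variables that are unrelated to the target variable.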

We use the above methods to process the original data, making the data set more reliable and complete. This plays an important role in the later stages of the model and improves the accuracy of the predicted enterprise credit score.

3.3 The selection and combination of base learner model

Section 3.1 described the combination strategy based on Stacking and Bagging and constructed the parallel double-layer prediction model. Next, we choose appropriate base learners and build the corresponding neural network to obtain a specific prediction model.

3.3.1 Decision tree model

In the comparative study of Abellan and Castellano (2017), the Decision Tree (DT) algorithm is regarded as the best base classifier for building an ensemble credit scoring model. We therefore choose the decision tree algorithm as a base learner. After the standard data set $D$ is generated, the DT algorithm is trained to produce the decision tree learner $T_{DT}$ needed in the first stage. The decision tree learner then predicts the training data to generate the prediction data $R_{DT}$ , which becomes part of the training input of the second-stage support vector machine model. The decision tree model constructed in this paper takes MAE as the criterion for feature selection. As Fig. 3 shows, MAE is smallest when the maximum depth of the tree is 7, so we choose 7 as the maximum depth.

Figure 3.

MAE value of different maximum depth.

3.3.2 Support vector machine model

Because the Support Vector Machine (SVM) follows the maximum-margin hyperplane principle, it usually provides good generalization ability, which partly explains the popularity of SVM in credit scoring research [34]. We choose the support vector machine algorithm as another base learner. After the standard data set $D$ is generated, the SVM algorithm is trained to produce the support vector machine learner $T_{\textit{SVM}}$ in the first stage. The SVM learner then predicts the training data to generate the prediction data $R_{\textit{SVM}}$ , which becomes part of the training input of the second-stage support vector machine model. This paper uses the radial basis function as the kernel function, with a penalty factor of 1.0.

3.3.3 Back-Propagation network model

The Back Propagation (BP) neural network has self-learning, self-adaptive and generalization abilities. We choose the BP neural network as the third base learner. After the standard data set $D$ is generated, the BP neural network is trained to produce the BP learner $T_{BP}$ needed in the first stage. The BP learner then predicts the training data to generate the prediction data $R_{BP}$ , which becomes part of the input of the second stage. This paper designs a BP neural network model with a single hidden layer.

The input data is $x$ ; the parameters from the input layer to the hidden layer are $w$ and $b_{1}$ , and those from the hidden layer to the output layer are $v$ and $b_{2}$ ; the activation functions are $g_{1}$ and $g_{2}$ . In this paper, $g_{1}$ is the tanh function, $g_{2}$ is the sigmoid function, and $y$ is the real value. The mapping from the input layer to the hidden layer is given by Eqs (5) and (6).

$\displaystyle\textit{net}_{1}=w^{T}x+b_{1}$ (5)

$\displaystyle h=g_{1}(\textit{net}_{1})$ (6)

The hidden layer to the output layer is shown in Eqs (7) and (8).

$\displaystyle\textit{net}_{2}=v^{T}h+b_{2}$ (7)

$\displaystyle\hat{y}=g_{2}(\textit{net}_{2})$ (8)

The whole BP network model can be obtained by combining Eqs (5)–(8), as shown in Eq. (9).

$\displaystyle\hat{y}=g_{2}(\textit{net}_{2})=g_{2}(v^{T}g_{1}(\textit{net}_{1})+b_{2})=g_{2}(v^{T}g_{1}(w^{T}x+b_{1})+b_{2})$ (9)

The loss function is shown in Eq. (10).

$\displaystyle E(y,\hat{y})=\frac{1}{n}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}$ (10)

Where $y$ is the true value of samples and $\hat{y}$ is the predictive value of samples. Specifically,

$\displaystyle x=\begin{bmatrix}x_{1}\\ x_{2}\\ \vdots\\ x_{28}\\ \end{bmatrix}\;w=\begin{bmatrix}w_{1\;1}&w_{1\;2}&\cdots&w_{1\;40}\\ w_{2\;1}&w_{2\;2}&\cdots&w_{2\;40}\\ \vdots&\vdots&\ddots&\vdots\\ w_{28\;1}&w_{28\;2}&\cdots&w_{28\;40}\\ \end{bmatrix}$ $\displaystyle b_{1}=\begin{bmatrix}b_{1\;1}\\ b_{1\;2}\\ \vdots\\ b_{1\;40}\\ \end{bmatrix}\;v=\begin{bmatrix}v_{1}\\ v_{2}\\ \vdots\\ v_{40}\\ \end{bmatrix}\;b_{2}=\begin{bmatrix}b_{2\;1}\\ \end{bmatrix}$
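
The forward pass of Eq. (9) and the loss of Eq. (10) can be written out directly with the dimensions given above (28 inputs, 40 hidden units, one output). The sketch below is illustrative only: the random initialization range and the seed are assumptions, and no training step is shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w, b1, v, b2):
    """Eq. (9): y_hat = g2(v^T g1(w^T x + b1) + b2), g1 = tanh, g2 = sigmoid."""
    n_hidden = len(b1)
    # Eq. (5): net_1 = w^T x + b_1, with w stored as (inputs x hidden).
    net1 = [
        sum(w[i][j] * x[i] for i in range(len(x))) + b1[j]
        for j in range(n_hidden)
    ]
    h = [math.tanh(value) for value in net1]               # Eq. (6)
    net2 = sum(v[j] * h[j] for j in range(n_hidden)) + b2  # Eq. (7)
    return sigmoid(net2)                                   # Eq. (8)


def mse_loss(y_true, y_pred):
    """Eq. (10): mean squared error over n samples."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


random.seed(0)           # arbitrary seed, for reproducibility only
N_IN, N_HIDDEN = 28, 40  # dimensions stated in the text
w = [[random.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)] for _ in range(N_IN)]
b1 = [0.0] * N_HIDDEN
v = [random.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]
b2 = 0.0

x = [random.random() for _ in range(N_IN)]  # one hypothetical normalized sample
y_hat = forward(x, w, b1, v, b2)
```

Because $g_{2}$ is the sigmoid function, `y_hat` always lies in (0, 1), which matches the range of labels after min-max normalization.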

3.3.4 Automatic weight configuration

Unlike the simple averaging of the traditional Bagging algorithm, the automatic weight configuration model assigns a different weight to each base learner, because the importance of each base learner in the ensemble differs. After the base learners are selected and trained, we automatically configure the weights of the three base learners and obtain the new prediction data $y^{\prime}$ . In this paper, a neuron is constructed for automatic weight configuration; its structure is shown in Fig. 4.

Figure 4.

Neuron structure of Automatic weight configuration.

The prediction data generated by the base learners are combined and taken as the input data $X^{\prime}$ . The input and output parameters are $W$ and $b$ , the activation function $G$ is the sigmoid function, and $y$ is the real value. The predicted value is $\hat{y}=G(W*x)$ , where $x$ is a sample of $X^{\prime}$ ; we calculate the error by Eq. (10) and update $W$ until the stopping condition is reached.
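
A single sigmoid neuron of this kind can be trained with plain gradient descent on the loss of Eq. (10). The sketch below is one possible reading of the procedure, not the authors' implementation: the learning rate, the fixed number of epochs used as stopping condition, the equal initial weights and the toy data are all assumptions, and the bias is omitted because the stated prediction is $G(W*x)$.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_weights(inputs, targets, lr=0.5, epochs=2000):
    """Learn W so that sigmoid(W . x) approximates the targets under MSE."""
    n = len(inputs)
    n_learners = len(inputs[0])
    weights = [1.0 / n_learners] * n_learners  # start from equal weights
    history = []
    for _ in range(epochs):
        preds = [
            sigmoid(sum(w * xi for w, xi in zip(weights, x))) for x in inputs
        ]
        history.append(sum((y - p) ** 2 for y, p in zip(targets, preds)) / n)
        # Gradient of the MSE with respect to each weight.
        grads = [0.0] * n_learners
        for x, y, p in zip(inputs, targets, preds):
            delta = -2.0 * (y - p) * p * (1.0 - p) / n
            for k, xi in enumerate(x):
                grads[k] += delta * xi
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights, history


# Hypothetical rows of (R_DT, R_SVM, R_BP) with their real values y.
inputs = [
    [0.20, 0.30, 0.25],
    [0.80, 0.70, 0.90],
    [0.50, 0.60, 0.40],
    [0.90, 0.95, 0.85],
]
targets = [0.30, 0.75, 0.55, 0.90]

weights, history = fit_weights(inputs, targets)
```

The learned `weights` play the role of $\omega_{1},\omega_{2},\omega_{3}$ in Section 3.1; `history` records the loss per epoch so that convergence can be checked.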

4. Experimental results and conclusions

To evaluate the accuracy and generalization ability of the parallel double-layer prediction model, we implement the prediction model in code. The data set of this experiment is a public credit score data set downloaded from GitHub. All experimental code is written in Python; the integrated development environment is PyCharm Community Edition 2019.1.3. The PC has an Intel(R) Core(TM) i5-6300HQ CPU @ 2.30 GHz, 8.00 GB of memory and a 64-bit Windows 10 operating system.

In Section 3.3, we selected three base learners to build the parallel double-layer prediction model. To test the effectiveness of the model, we use four other ensemble learning models for comparison. The details are as follows.

•
BP neural network optimized by the particle swarm optimization algorithm (PSO-BP). The swarm size is 40 and the number of iterations is 15.
•
Support vector machine optimized by the particle swarm optimization algorithm (PSO-SVM). The swarm size is 40 and the number of iterations is 40.
•
XGBoost (eXtreme Gradient Boosting), which adopts the boosting iteration idea. The maximum depth of a single tree is 6 and the number of iterations is 10.
•
Automatic weight allocation prediction model (Auto-Conf), which configures the weight of each base learner through a neuron. The structure of the neuron is described in Section 3.3.4.

4.1 Model performance evaluation index

Performance evaluation criteria are an indispensable part of assessing a model. Many error measures can evaluate how well a model matches the observed data, such as MAE (Mean Absolute Error), CA (Classification Accuracy), $R^{2}$ (coefficient of determination), etc. [35]. To test the accuracy of the parallel double-layer prediction model, we select the following four evaluation indexes.

•
EVS (Explained Variance Score) measures how much of the variance of the dependent variable the regression model explains; its best value is 1. The closer it is to 1, the better the independent variables explain the variance of the dependent variable; the smaller the value, the worse the fit.
•
MAE (Mean Absolute Error), defined in Eq. (11).

$\displaystyle\textit{MAE}(y,\hat{y})=\frac{1}{n}\sum_{i=1}^{n}|y_{i}-\hat{y}_{i}|$ (11)

Here, $n$ is the number of samples, $y_{i}$ is the true value of sample $i$ , and $\hat{y}_{i}$ is its predicted value. MAE measures how close the predicted results are to the true values of the samples. The smaller the value, the better the fit.
•
MSE (Mean Squared Error), which is the same as the loss in Eq. (10), as shown in Eq. (12).

$\displaystyle\textit{MSE}(y,\hat{y})=\frac{1}{n}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}$ (12)

The smaller the MSE value, the better the fit.
•
$R^{2}$ (coefficient of determination), which also measures how much of the variance the regression model explains. The formula is shown in Eq. (13).

$\displaystyle R^{2}(y,\hat{y})=1-\frac{\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}/n}{\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}/n}$ (13)

Here, $\bar{y}$ is the average value of $y_{i}$ . The value of $R^{2}(y,\hat{y})$ is at most 1, and it can be negative when the model fits worse than always predicting $\bar{y}$ . The closer it gets to 1, the better the independent variables explain the variance of the dependent variable.
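
The four indexes can be computed with a few helper functions. MAE, MSE and $R^{2}$ follow Eqs (11) to (13); the paper gives no formula for EVS, so the sketch assumes the commonly used definition $1-\mathrm{Var}(y-\hat{y})/\mathrm{Var}(y)$. The sample values are hypothetical.

```python
def _mean(values):
    return sum(values) / len(values)


def mae(y, y_hat):
    """Eq. (11): mean absolute error."""
    return _mean([abs(a - b) for a, b in zip(y, y_hat)])


def mse(y, y_hat):
    """Eq. (12): mean squared error."""
    return _mean([(a - b) ** 2 for a, b in zip(y, y_hat)])


def r2(y, y_hat):
    """Eq. (13): coefficient of determination."""
    y_bar = _mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def evs(y, y_hat):
    """Explained variance score, assumed as 1 - Var(y - y_hat) / Var(y)."""
    residuals = [a - b for a, b in zip(y, y_hat)]
    r_bar, y_bar = _mean(residuals), _mean(y)
    var_res = _mean([(r - r_bar) ** 2 for r in residuals])
    var_y = _mean([(a - y_bar) ** 2 for a in y])
    return 1.0 - var_res / var_y


# Hypothetical true values and predictions with a constant offset of 0.1.
y = [0.2, 0.4, 0.6, 0.8]
y_hat = [0.3, 0.5, 0.7, 0.9]

scores = {"EVS": evs(y, y_hat), "MAE": mae(y, y_hat),
          "MSE": mse(y, y_hat), "R2": r2(y, y_hat)}
```

With a constant offset the residuals have no variance, so EVS stays close to 1 while $R^{2}$ drops to about 0.8. This is why the two indexes are reported separately even though both describe explained variance.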

4.2 Model accuracy test

Using the four indicators introduced in Section 4.1, we analyze the accuracy of the models. We calculate $\textit{EVS},\textit{MAE},\textit{MSE}$ and $R^{2}$ for the three base learners and the five ensemble methods, and then visualize them.

Figure 5.

Histogram of each model index value. (a) EVS. (b) MAE. (c) MSE. (d) $R^{2}$ .

According to the four evaluation indexes in Fig. 5, the parallel double-layer prediction model using ensemble technology is superior to the other models on all four indexes. The specific values of the four evaluation indexes are shown in Table 1.

Table 1

Performance comparison of eight different models

Name	EVS	MAE	MSE	$R^{2}$
SVM	0.61502	0.06993	0.00892	0.59197
BP	0.46876	0.11945	0.03264	0.48058
DT	0.44633	0.07926	0.01146	0.43651
PSO-SVM	0.63709	0.06937	0.00797	0.64042
PSO-BP	0.58387	0.07051	0.01209	0.55914
XGBoost	0.58331	0.07495	0.00945	0.61283
AUTO-Conf	0.46618	0.07781	0.01103	0.51916
Parallel double-layer	0.64086	0.06814	0.00791	0.64552

The values in Table 1 represent the four index values of each model. It can be seen as follows.

•

Among the three base learners, the SVM model performs best, and the ensemble learning model with automatic weight configuration alone is not as good as the SVM model.

•

After the corresponding ensemble improvement of a single base learner, all four indicators improve for the PSO-SVM model relative to the SVM model, the PSO-BP model relative to the BP model, and the XGBoost model relative to the DT model.

•

The parallel double-layer prediction model is better than the other four ensemble models and the three base learner models.

Specifically, compared with the PSO-SVM, PSO-BP, XGBoost and Auto-Conf models, respectively, the results are as follows.

•

EVS value of the parallel double-layer prediction model is 0.00377, 0.05699, 0.05755 and 0.17468 higher.

•

MAE value of the parallel double-layer prediction model is 0.00123, 0.00237, 0.00681 and 0.00967 lower.

•

MSE value of the parallel double-layer prediction model is 0.00006, 0.00418, 0.00154 and 0.0031 lower.

•

$R^{2}$ value of the parallel double-layer prediction model is 0.00510, 0.08638, 0.03269 and 0.12636 higher.

In order to accurately get the actual improvement effect of the method proposed in this paper, we transform Table 1 to get data as shown in Table 2.

Table 2

The improvement of the parallel double-layer prediction model compared with others

Name	EVS (%)	MAE (%)	MSE (%)	$R^{2}$ (%)
SVM	4.202	2.560	1.132	9.046
BP	36.714	105.785	75.766	34.321
DT	43.584	14.030	30.977	47.882
PSO-SVM	1.722	1.773	0.753	0.796
PSO-BP	9.761	3.361	34.574	15.449
XGBoost	9.866	9.086	16.296	5.334
AUTO-Conf	37.471	12.428	28.287	24.339
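
The way such an improvement ratio can be derived from Table 1 is sketched below. The formula is our reading of the tables rather than one stated in the text: the gain is divided by the value of the compared model, and for MAE and MSE a lower value counts as the gain. The function name `promotion_ratio` is hypothetical. Applied to the PSO-BP row of Table 1, this appears to reproduce the PSO-BP row of Table 2.

```python
def promotion_ratio(ours, other, lower_is_better=False):
    """Relative improvement of `ours` over `other`, in percent."""
    gain = (other - ours) if lower_is_better else (ours - other)
    return 100.0 * gain / other


# Values copied from Table 1.
proposed = {"EVS": 0.64086, "MAE": 0.06814, "MSE": 0.00791, "R2": 0.64552}
pso_bp = {"EVS": 0.58387, "MAE": 0.07051, "MSE": 0.01209, "R2": 0.55914}
LOWER_IS_BETTER = {"MAE", "MSE"}

ratios = {
    name: round(
        promotion_ratio(proposed[name], pso_bp[name], name in LOWER_IS_BETTER), 3
    )
    for name in proposed
}
```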

In Table 2, the values of the four indicators are the improvement ratios of the parallel double-layer prediction model over the other models, expressed as percentages. From Tables 1 and 2, we observe that

•

The PSO-SVM, PSO-BP and XGBoost models improve on their corresponding base learner models (SVM, BP network and DT).

•

Among them, the PSO-BP and XGBoost models improve greatly.

•

After using the Bagging method, the automatic weight configuration model is better than the single BP network model and the DT model, but not as good as the single SVM model.

After adding the Stacking method on the basis of the Bagging method, the parallel double-layer prediction model improves greatly on the three base learner prediction models. At the same time, compared with the PSO-SVM, PSO-BP, XGBoost and automatic weight configuration models, respectively, we observe that

•

EVS value is improved by 1.722%, 9.761%, 9.866% and 37.471%.

•

MAE value is reduced by 1.773%, 3.361%, 9.086% and 12.428%.

•

MSE value is reduced by 0.753%, 34.574%, 16.296% and 28.287%.

•

$R^{2}$ value is improved by 0.796%, 15.449%, 5.334% and 24.339%.

The prediction accuracy of the PSO-SVM, PSO-BP and XGBoost models depends on the number of iterations. We train these three models with different numbers of iterations, and the results are shown in Fig. 6.

Figure 6.

Accuracy curve of different iterations.

As Fig. 6 shows, the PSO-SVM model tends to be stable when the number of iterations is close to 40, the PSO-BP model performs best when the number of iterations is 15, and the XGBoost model performs best when the number of iterations is 10. Therefore, at the beginning of Section 4, we chose 40, 15 and 10 as the numbers of iterations of the PSO-SVM, PSO-BP and XGBoost models.

4.3 Model generalization ability test

To test the generalization ability of the model, we test five data sets and obtain the four index values for each data set. We use line charts to visualize the $\textit{EVS},\textit{MAE},\textit{MSE}$ and $R^{2}$ of the PSO-SVM model, PSO-BP model, XGBoost model, automatic weight configuration model and the parallel double-layer prediction model. The results are shown in Fig. 7, from which the following can be seen.

Figure 7.

Index curve of each model. (a) EVS. (b) MAE. (c) MSE. (d) $R^{2}$ .

•

In Fig. 7a, the parallel double-layer prediction model is better than the PSO-SVM model over about 7/8 of the curve, better than the PSO-BP model over about 7/8, better than the XGBoost model over about 3/4, and better than the automatic weight configuration model over about 7/8.

•

In Fig. 7b, the parallel double-layer prediction model is better than the PSO-SVM model over about 3/4 of the curve, better than the PSO-BP model over about 15/16, better than the XGBoost model over about 3/4, and better than the automatic weight configuration model over about 15/16.

•

In Fig. 7c, the parallel double-layer prediction model is better than the PSO-SVM model over about 3/4 of the curve, better than the PSO-BP model over about 11/16, better than the XGBoost model over about 3/4, and better than the automatic weight configuration model over about 7/8.

•

In Fig. 7d, the parallel double-layer prediction model is better than the PSO-SVM model over about 15/16 of the curve, better than the PSO-BP model over about 15/16, better than the XGBoost model over about 11/16, and better than the automatic weight configuration model over about 15/16.

Here, the abscissa "Number of data" represents the five data sets. Fig. 7 shows that the parallel double-layer prediction model is better than the other four ensemble learning models on the whole, which indicates that the model has good generalization ability.

4.4 Conclusions

Credit evaluation has an increasing influence on the external image of enterprises, and how to improve its accuracy has become a hot issue. Many enterprises use traditional single algorithms and neural network algorithms in credit evaluation, or use optimization algorithms to tune a single model. These approaches have shortcomings in interpretability, accuracy or generalization ability. To improve the accuracy and generalization ability of prediction, this paper proposes a parallel double-layer prediction model based on the Stacking and Bagging ensemble techniques, which not only improves accuracy but also has good generalization ability.

Specifically, this paper integrates the DT model, the SVM model and the BP neural network model. Departing from the traditional way of integration, we use the three models to predict the training data set, process the results and write them back to the data set. On the data set used in this experiment, the SVM model has the best prediction performance, so the second-layer SVM model is trained with the updated data set. Finally, the model generates the final credit score.

The experimental analysis and comparison show the following.

•

The parallel double-layer prediction model is superior to the other models on the four evaluation indexes.

•

The parallel double-layer prediction model improves the accuracy of enterprise credit scoring and also performs best on the other data sets, showing good generalization ability.

The combination strategy of this paper offers a practical method for evaluating enterprise credit. The research results provide a way of thinking about the problem of enterprise credit evaluation, and also provide a solid foundation for theoretical research on subject intelligent check.

Acknowledgments

This work was funded by the National Key Research and Development Program of China under Grant 2019YFB1405000. The authors also gratefully acknowledge the professors at Xi’an University of Science and Technology for their theoretical support.
