Abstract
To address the need for risk identification and tiered prevention at bank outlets, where operational efficiency and risk control capability are difficult to assess, a risk data measurement and warning classification model based on information entropy and a BP neural network is proposed. The model establishes two-level risk data measurement elements across three dimensions. Working from the data set itself, it uses information entropy to determine the weights of the second-level risk elements and then calculates the risk quantity of each record under the first-level risk measurement elements. A BP neural network then outputs the risk classification results without the measurement weights having to be set in advance. The proposed model reduces the dimensionality of the risk elements and achieves high classification accuracy at relatively low computational cost. Experiments show that the model measures and classifies risk data with a very low misjudgment rate and a small misjudgment deviation.
Introduction
Financial risk is the biggest obstacle in the development of the financial industry. The occurrence of financial risk is often accompanied by financial crisis, which seriously affects the pace of economic and social development. Effective financial management methods can therefore be used to predict financial risk to a certain extent and so support financial risk prevention and control. Financial risks come in different types, and their impact on the financial industry differs accordingly. Whatever the type, however, they pose a serious threat to the sustainable and stable development of the industry. Identifying financial risks and preventing their occurrence is therefore essential.
The application of big data to financial risk identification and management is concentrated mainly in credit risk identification, transaction anti-fraud and similar fields. Existing research and practice show, however, that this application is still relatively limited. How to collate and use large volumes of banking business data to extract effective information for identifying potential financial risks remains a challenging topic for both industry and the academic community.
In research on liquidity risk contagion, the pioneering work of Freixas et al. laid the foundation for network analysis: when the banking system suffers liquidity shocks, banks adopt the “pecking” principle to meet liquidity demand, and different network structures lead to different scales of risk contagion [1]. Georg found that the relationship between the degree of bank correlation and the scale of risk contagion is not monotonic [2]: increasing the degree of correlation helps to disperse risk, but once the connection exceeds a certain degree, the level of risk contagion increases.
In research on solvency risk contagion, Narayan et al. argue that the network of debt and creditor’s rights formed by inter-bank market transactions has a dual nature [3]: it can realize risk sharing, but it can also cause system collapse. Upper uses the maximum entropy method to estimate the interbank debt-claim matrix [4] and, by assuming that one or more banks go bankrupt, examines the path and scale of risk contagion. Many scholars have used this method to assess the risk contagion effect in various countries, such as the study of the German banking system by Memmel and Sachs [5]. As for the conclusions, scholars who studied the inter-bank market before 2010 found that the systemic risk of the domestic banking system was low and the scope of contagion small, whereas scholars who studied the same market after 2010 found that its systemic risk was accumulating year by year and that the contagion effect of solvency risk had become more obvious.
Based on this, this paper establishes two levels of risk measurement elements under three risk dimensions and proposes a risk data measurement and classification model that does not require the weights of the risk measurement elements to be set in advance. The model uses Shannon information entropy to determine the weights of the second-level risk elements, then calculates the risk quantity of the data under each first-level risk element, which reduces the dimensionality of the risk elements, and finally classifies the risk data with a BP neural network.
Methodology
Entropy
Assume that a system $X$ has $n$ states, denoted $X = \{x_1, x_2, \ldots, x_n\}$, and that $p(x_i)$ is the probability that state $x_i$ occurs in $X$. The Shannon information entropy $H(X)$ of system $X$ is then defined as Equation (1):

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i) \tag{1}$$
Shannon’s theory of information entropy holds that the larger the information entropy, the more disordered the information and the less information it carries; the smaller the information entropy, the lower the degree of disorder and the more information it carries.
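The definition in Equation (1) can be illustrated with a minimal sketch. The function name `shannon_entropy` and the choice of base 2 for the logarithm are assumptions made for this illustration; the text does not state which base is used.

```python
import math


def shannon_entropy(probs, base=2.0):
    """Shannon entropy H(X) = -sum_i p(x_i) * log p(x_i).

    States with zero probability contribute nothing to the sum.
    The logarithm base (2 here) is an assumption, not taken from the paper.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0)


# A uniform distribution is maximally disordered, a skewed one much less so.
uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])
skewed = shannon_entropy([0.97, 0.01, 0.01, 0.01])
print(uniform, skewed)
```

The uniform distribution yields the larger entropy, which matches the statement that higher entropy corresponds to more disorder.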
A BP neural network is a multi-layer feedforward network trained by error back propagation, and its training algorithm is called the BP algorithm. The basic idea of the algorithm is gradient descent: gradient search is used to minimize the mean square error between the actual output and the expected output of the network.
A BP neural network consists of an input layer, a hidden layer and an output layer. Every neuron is fully connected to the neurons in the next layer, but there are no connections between neurons in the same layer. Fig. 1 shows the structure of a BP neural network with four input-layer neurons, five hidden-layer neurons and three output-layer neurons.

Fig. 1. Structure of BP Neural Network.
The greatest advantage of a BP neural network is that it can learn and store a large number of input-output relations without the underlying mathematical relationship having to be revealed in advance. Its learning process comprises the forward propagation of signals and the back propagation of errors. In forward propagation, the input signal passes through the hidden layer to the output nodes and produces the output signal through a non-linear transformation. If the actual output does not match the expected output, the process switches to error back propagation. In back propagation, the output error is transmitted layer by layer from the output layer through the hidden layer towards the input layer, and the error is apportioned to all units in each layer. The error signal obtained at each layer serves as the basis for adjusting the weights of its units.
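Forward propagation and error back propagation can be sketched for the 4-5-3 structure of Fig. 1 as follows. This is a minimal illustration, not the authors' implementation: the sigmoid activation, the bias weights, the weight initialization range, the learning rate of 0.5 and the single training sample are all assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class BPNetwork:
    """One-hidden-layer feedforward network trained by error back propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each row holds the incoming weights of one neuron; the last entry is a bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = hidden + [1.0]
        output = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return hidden, output

    def train_step(self, x, target, lr):
        """One gradient-descent step; returns the mean square error before the update."""
        hidden, output = self.forward(x)
        xb, hb = list(x) + [1.0], hidden + [1.0]
        # Error signal at the output layer (squared error through the sigmoid).
        delta_out = [(o - t) * o * (1.0 - o) for o, t in zip(output, target)]
        # Error signal at the hidden layer, propagated back through w2.
        delta_hid = [
            h * (1.0 - h) * sum(delta_out[k] * self.w2[k][j]
                                for k in range(len(delta_out)))
            for j, h in enumerate(hidden)
        ]
        # Adjust the weights of each layer using its own error signal.
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * delta_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * delta_hid[j] * xb[i]
        return sum((o - t) ** 2 for o, t in zip(output, target)) / len(target)


net = BPNetwork(n_in=4, n_hidden=5, n_out=3)
errors = [net.train_step([1, 0, 1, 0], [1, 0, 1], lr=0.5) for _ in range(200)]
print(errors[0], errors[-1])
```

The hidden-layer error signal is computed before the output weights are updated, so both layers are adjusted against the same forward pass.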
Every record in a bank outlet carries some amount of risk information, but this information has no clear definition at the risk level. On the one hand, the information in the same record represents very different amounts of risk for different customers; on the other hand, whether a piece of data has risk value differs among risk information collectors who mine data for different purposes. In addition, the correlation between the information in different records, the timeliness of the information, the diversity of application scenarios and the subjectivity of risk participants towards the concept of risk are all key factors in quantifying data risk. If all of these factors are taken into account, the weights of the risk measurement elements become very difficult to determine. Moreover, the measurement process is multidimensional and complex, and measurement results obtained for specific needs are not universal. The purpose of this paper is to propose a general risk identification and early warning classification model for bank outlets that measures and classifies data risk reasonably at low computational cost, without the weights of the risk measurement elements being set in advance.
This paper therefore proposes a risk identification and warning classification model for bank outlets based on information entropy and a BP neural network. Its basic framework is divided into three modules: risk data regularization, risk factor measurement and weighting, and risk classification. The basic idea is as follows. First, the data in network traffic are divided into records by unit time window, and the risk elements contained in the records are analysed and regularized. Next, the weights of the secondary risk elements are determined by calculating the information entropy of the same secondary risk element across different records, the risk quantity of each record under the first-level risk elements is calculated accordingly, and a preliminary measurement of the data is obtained. Finally, the risk level of each record is obtained from the trained BP neural network for risk classification.
Risk data regularization module
The concept of risk is very broad. Accurate measurement and classification of data risk involves many risk factors, but too many risk factors reduce the efficiency of risk measurement and classification. Based on the analysis in the relevant literature of how sensitive different risk individuals are to different aspects of data risk, this paper takes rural bank outlets as an example and selects representative elements as the indicators of risk measurement and classification from the four dimensions of coverage depth (F1), coverage breadth (F2), coverage quality (F3) and coverage area (F4), as shown in Table 1.
Table 1. Risk measurement elements
In the dimension of coverage quality (F3), this paper assumes that risk measurement and classification are based on user location trajectories, and chooses the second-level elements cycle and efficiency and service information feedback to correspond to the first-level elements accurate information and fuzzy information, respectively. The second-level elements can be replaced according to actual needs. The measured data set $D$ is divided into $n$ risk records by unit time window, denoted $D = \{d_1, \ldots, d_n\}$. Each record is analyzed in the four dimensions of coverage depth (F1), coverage breadth (F2), coverage quality (F3) and coverage area (F4). The first-level elements are denoted $L = \{L_1, \ldots, L_n\}$, and each first-level element $L_a$ has the second-level elements $L_a = \{l_{a1}, \ldots, l_{ab}\}$, whose values are recorded for every record.
This paper establishes an information entropy measurement matrix for each first-level element of the $n$ records on the three measurement dimensions. Assuming that a first-level element $L_a$ contains $b$ second-level elements, the measurement results for the $n$ records are calculated from the information entropy measurement matrix of its second-level elements, which has size $n \times b$. The specific steps are as follows.
In the matrix, $b_{ij}$ is the normalized value of the $j$-th second-level element in the $i$-th record, and it takes the value 0 or 1.
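The sketch below illustrates how weights can be derived from such an $n \times b$ matrix of 0/1 values. It uses the textbook entropy weight method, which is an assumption: the paper's own weighting formulas are not reproduced in this text and may differ. The function names `entropy_weights` and `risk_quantities` and the sample matrix are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Textbook entropy weight method over the columns of an n x b matrix.

    Assumption: this is the standard formulation, not necessarily the
    paper's own procedure.
    """
    n, b = len(matrix), len(matrix[0])
    divergences = []
    for j in range(b):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total == 0 or n < 2:
            entropy = 1.0  # an element that never occurs does not discriminate
        else:
            probs = [v / total for v in column if v > 0]
            entropy = -sum(p * math.log(p) for p in probs) / math.log(n)
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    norm = sum(divergences)
    if norm == 0:
        return [1.0 / b] * b  # no column discriminates: fall back to equal weights
    return [d / norm for d in divergences]


def risk_quantities(matrix, weights):
    """Weighted sum of second-level values: one risk quantity per record."""
    return [sum(w * v for w, v in zip(weights, row)) for row in matrix]


# Four records (rows) and three second-level elements (columns), values 0 or 1.
records = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
weights = entropy_weights(records)
quantities = risk_quantities(records, weights)
print(weights, quantities)
```

In this formulation an element that is present in every record receives a weight of zero, because it cannot distinguish one record from another, while rarely occurring elements receive more weight.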
This paper builds a BP neural network to obtain the final classification of the network risk data. The number of input-layer nodes is set to $b$, the number of first-level elements, so that each input node receives the risk measurement of one first-level element. The output layer has 3 nodes: the output for the highest risk level is set to (1,1,1) and the output for the lowest risk level is set to (0,0,0). By analogy, the eight possible output vectors correspond to eight risk levels.
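One way to realize the three-node output coding is shown below. Only the two end points, (0,0,0) for the lowest level and (1,1,1) for the highest, come from the text; the binary ordering of the intermediate levels, the zero-based level index and the 0.5 decoding threshold are assumptions, and the function names are hypothetical.

```python
def encode_level(level_index, bits=3):
    """Map a level index 0..7 (0 = lowest risk, assumed) to a 3-bit target vector."""
    if not 0 <= level_index < 2 ** bits:
        raise ValueError("level index out of range")
    return tuple((level_index >> shift) & 1 for shift in reversed(range(bits)))


def decode_output(output, threshold=0.5):
    """Round each network output to 0 or 1 and read the result as a level index."""
    level = 0
    for value in output:
        level = (level << 1) | (1 if value >= threshold else 0)
    return level


print(encode_level(0), encode_level(7), decode_output([0.9, 0.2, 0.7]))
```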
In each round of BP neural network training, 10% of the records in the training data are drawn at random. The risk factor measurement vectors of these records are obtained with the information-entropy-based risk factor measurement method in Section 3.2 and normalized to form the training samples. The training process is shown in Fig. 2.
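The per-round sampling and normalization can be sketched as follows. Sampling without replacement and min-max scaling are assumptions, since the text specifies neither; `sample_round` and `min_max_normalize` are hypothetical helper names.

```python
import random


def sample_round(records, fraction=0.10, rng=None):
    """Draw a random subset of the training records for one training round."""
    rng = rng or random.Random()
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)  # sampling without replacement (assumed)


def min_max_normalize(vectors):
    """Scale each component to [0, 1]; constant components are mapped to 0."""
    columns = list(zip(*vectors))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in vectors
    ]


batch = sample_round(list(range(8000)), rng=random.Random(1))
normalized = min_max_normalize([[1, 5], [3, 5], [2, 5]])
print(len(batch), normalized)
```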

Fig. 2. BP Neural Network Hierarchical Module Training.
The implementation process of risk data classification module based on BP neural network is shown in Fig. 3.

Fig. 3. Model implementation.
The risk measurement output of the model consists of three parts: 1) The set of measurement values. The set of risk measurement vectors produced by the risk factor measurement module is screened, aggregated and stored directly. In this paper, principal component analysis is used to screen out, under each of the three risk measurement dimensions, the first-level elements that mainly reflect the risk status of a record, and their measurement values are summed. The measures of the record in the three dimensions are added and stored together with the risk measurement vectors as output. 2) The risk measurement level of each record, obtained from the BP neural network classification module. 3) The risk ranking of each record in the data set. Records are ranked first by measurement level. When two records have the same level, their risk measurement values under the three dimensions are compared in turn, and the record with the larger measurement value in two or more dimensions is ranked first.
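The ranking rule in part 3) can be sketched as a pairwise comparison. The record layout `(name, level, measures)`, the convention that a higher level number means higher risk, and the sample values are assumptions made for illustration.

```python
from functools import cmp_to_key


def compare_records(a, b):
    """Return a negative number if record a should be ranked before record b."""
    _, level_a, measures_a = a
    _, level_b, measures_b = b
    # First criterion: the measurement level (higher level first, assumed).
    if level_a != level_b:
        return -1 if level_a > level_b else 1
    # Tie-break: the record that is larger in two or more dimensions goes first.
    wins_a = sum(x > y for x, y in zip(measures_a, measures_b))
    wins_b = sum(y > x for x, y in zip(measures_a, measures_b))
    if wins_a >= 2:
        return -1
    if wins_b >= 2:
        return 1
    return 0


# Hypothetical records: (name, level, measures in the three dimensions).
records = [
    ("a", 5, (0.2, 0.4, 0.1)),
    ("b", 7, (0.1, 0.1, 0.1)),
    ("c", 5, (0.3, 0.5, 0.0)),
    ("d", 5, (0.1, 0.2, 0.05)),
]
ranking = [name for name, _, _ in sorted(records, key=cmp_to_key(compare_records))]
print(ranking)
```

A pairwise majority rule of this kind is not guaranteed to be transitive for arbitrary data, so a practical implementation may need an additional tie-breaking criterion.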
In the proposed model, the weighting and risk calculation in the risk element measurement module are simple logarithmic and multiplicative operations. The experiments therefore focus on the risk data classification module based on the BP neural network, namely the training test of the neural network and the accuracy test of the classification. Following Section 3.1, data sets for eight risk levels are simulated according to the presence or absence of secondary measurement elements. Each risk level contains 1000 risk records, giving a total of 8000 records as the training data set.
The number of training rounds is set to 1100, the learning rate to 0.11, and the target error to 0.0001. According to formula

Fig. 4. BP Neural Network Training Error Curve.
Of the three parts of the model output, the measurement set is calculated from the information entropy of the corresponding risk elements of the data, and the risk ranking is generated by jointly comparing the measurement set and the measurement level. The accuracy test of the model therefore focuses on the accuracy of the risk classification. First, the concept of error rate is introduced to describe the degree of deviation between the measurement results and the actual risk situation. Its calculation formula is given in Equation (8).
On this basis, 980 records are randomly drawn from the training data set as test samples to test the classification accuracy of the proposed risk data measurement and classification model. The test results are shown in Table 2.
Table 2. Accuracy test of model risk classification
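A per-level misjudgment rate of the kind reported in the accuracy test can be computed as in the sketch below. It assumes the plain definition, namely the share of records at a given true level that are assigned to a different level; it does not reproduce Equation (8) or the misjudgment deviation rate, whose definitions are not given in this text. The function name and the sample labels are hypothetical.

```python
from collections import Counter


def misjudgment_rates(actual, predicted):
    """Overall and per-level share of misclassified records (assumed definition)."""
    totals = Counter(actual)
    wrong = Counter(a for a, p in zip(actual, predicted) if a != p)
    per_level = {level: wrong[level] / totals[level] for level in sorted(totals)}
    overall = sum(wrong.values()) / len(actual)
    return overall, per_level


# Hypothetical true and predicted risk levels for six records.
actual = [1, 1, 2, 2, 2, 3]
predicted = [1, 2, 2, 2, 3, 3]
overall, per_level = misjudgment_rates(actual, predicted)
print(overall, per_level)
```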
The test results show that the overall classification accuracy of the proposed risk data measurement and classification model reaches 97.8%, and that the classification accuracy for each individual risk level exceeds 95%. The misjudgment rates of the individual risk levels further show that the model classifies data more accurately at the levels near the risk boundaries (risk levels 1, 2, 3, 7, 8) than at the intermediate levels (risk levels 4, 5, 6), which accords with the fact that the more risk information the data provide, the easier it is to measure their risk. This also reflects the variation trend of the misjudgment rate (E), the misjudgment deviation rate (ɛ) and

Fig. 5. Change Trend of Misjudgment Rate and Misjudgment Deviation Rate.

Fig. 6. Change Trend of
Fig. 5 shows that the variation trend of the misjudgment deviation rate of the model is similar to that of the misjudgment rate over the data set as a whole. Analysis of Equation (8) shows that if the value of ɛ is exponential to
This paper presents a risk data measurement and classification model based on information entropy and a BP neural network. The model uses information entropy to calculate the risk of the data set hierarchically, and then uses the BP neural network to measure and classify the risk data accurately, without the weights of the risk measurement elements being preset and without the mathematical relationship between input and output being revealed in advance. Future work can proceed in two directions: 1) research on automatic data parsing in mass network traffic environments, leading to solutions for the automatic acquisition of data risk elements; 2) research on the intrinsic principles and optimization techniques of the BP neural network within the proposed risk measurement and classification model, to further improve the efficiency and accuracy of the model.
Acknowledgments
The authors acknowledge the National Natural Science Foundation of China (Grant: 71873118).
