The prediction algorithm of credit risk of science and technology finance based on cloud computing

Abstract

To improve the reliability of credit risk prediction for science and technology finance, a credit risk prediction algorithm based on cloud computing is proposed. A logistic regression model is used for prediction, with the financial indicators of science and technology credit as the model covariates. Because these financial indicators are numerous and strongly correlated, this paper first constructs the final index system of online supply chain technology credit risk evaluation for SMEs. Principal component analysis is then used to select the principal components. Combined with the penalty method, the dimension of the financial indicator data space is further reduced and uncorrelated principal components are obtained. On this basis, a logistic regression model is established that predicts credit risk with the selected principal components as covariates. The experimental results show that the algorithm fits the credit risk of 16 science and technology credit enterprises well and that the risk prediction ability is significantly improved, so the algorithm can effectively support science and technology credit risk prediction.

Keywords

Cloud computing, science and technology finance, credit risk, risk prediction

1. Introduction

Credit risk is the main risk faced by the modern economic system. With the development of the financial market, new financial instruments continue to emerge and the scale of financial derivatives transactions continues to expand. Credit derivatives in particular have rapidly expanded the scope and scale of credit risk [1]. Improving the credit risk prediction capability of science and technology finance has become a widely discussed issue, and data is a key factor for effective and accurate risk measurement. With the development of information technology and the Internet, financial risk management increasingly depends on massive amounts of data [2]. Big data refers to massive and complex data sets that cannot be extracted, stored, searched, shared, analyzed, and processed by existing software tools, and credit risk is increasingly analyzed by modeling large amounts of non-financial data [3, 4]. Existing work in related fields assesses the credit risk of science and technology finance by consistently quantifying the credit risk indicators of technology-based small and micro enterprises and constructing a fuzzy measurement matrix, from which the weight of each credit risk indicator is obtained; the comprehensive impact matrix of credit risk is then analyzed and the initial values are defuzzified.

To address these issues, this paper studies and optimizes a technology credit risk prediction algorithm based on cloud computing. A logistic regression model combined with the penalty method and principal component analysis is used to predict China’s science and technology credit risk. Before the index system is constructed, the basic principles of construction are set out, and a secondary screening is conducted according to the principles of comprehensiveness, scientificity and operability. This yields the final index system of online supply chain technology credit risk evaluation for small and medium-sized enterprises. The model is built from seven aspects of China’s science and technology credit: the ordinary linear model is introduced first and is then extended to the penalty method and the logistic regression model, which together form the penalized logistic model. The data are preprocessed so that the dimensions of the indicators are uniform, and factor analysis is carried out to eliminate collinearity between the indicators. In this way the effect of science and technology credit risk prediction is ensured and the prediction accuracy is improved.

2. Credit risk prediction algorithm of science and technology finance

2.1 Collection of credit risk characteristic index of science and technology finance

Credit risk is the oldest kind of risk in the financial market. It refers to the possibility of losses caused by the default of borrowers or market counterparties. There are various views on the definition of credit risk in the field of finance. Traditional credit risk comes mainly from the loan business [5]. Because loans have poor liquidity and lack an active secondary market, the value of loan assets is measured mainly by historical cost rather than market price. Credit risk includes two aspects. On the one hand, it refers to the possibility that the debtor is unwilling or unable to perform the contract on time when it falls due, that is, the risk of breach of contract. On the other hand, it refers to the risk that the debtor’s credit condition deteriorates, that is, the risk of credit spread. This paper focuses on default risk [6]. The measurement of credit risk has developed from initial qualitative methods, beginning with expert systems, to quantitative measurement models, which analyze an enterprise’s credit risk from its relevant data and therefore reach more objective and reasonable conclusions. China’s science and technology credit data generally does not follow a normal distribution, contains many indicators, and exhibits multicollinearity among the indicators [7]. In view of these characteristics, this paper uses a logistic regression model combined with the penalty method and principal component analysis to predict China’s science and technology credit risk [8]. In the accounts receivable financing mode, accounts receivable that are not yet due are pledged to a financial institution in order to obtain financial support from it. The collection mode of technology credit risk is shown in Fig. 1.

The basic methods of risk control include risk avoidance, risk transfer, risk retention, risk compensation, loss control and risk shielding. The credit rating of small and medium-sized enterprises is generally low, but once self-compensating technology is used in the credit mode, the impact of their low credit level on credit security is effectively isolated, so the credit risk can be greatly reduced. In addition, in line with the content of credit risk management, core enterprises and logistics enterprises with good credit standing and a well-developed online platform should be chosen, which reduces the cost of credit risk control as much as possible [9]. From the perspective of financing enterprises, the confirmed warehouse financing mode helps them complete leveraged procurement and batch sales of suppliers, which makes financing convenient. From the perspective of the lending institution, confirmed warehouse financing not only uncovers potential customers but also allows the acceptance bill to serve as the joint liability guarantee and real right guarantee of suppliers, which greatly reduces the credit risk of the business [10]. As the financial market becomes more diversified, these traditional approaches can no longer meet the needs of risk management. Monetary supervision institutions have developed a rating system method, which is mainly used to assess the adequacy of the loan loss reserve. Under this method, loans are divided into two grades: low-quality and high-quality [11]. The low-quality grade contains four categories (special attention, not up to standard, suspicious, and loss) and the high-quality grade contains one (qualified). The loan loss reserve requirements for each category are shown in Table 1.

Table 1
Loan loss reserve requirements for each category

Loan level	Loan category	Loan loss reserve requirements
Low quality level	Special attention	0%
	Not up to standard	20%
	Suspicious	50%
	Loss	100%
High quality level	Qualified	0%
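
As an illustration of how the reserve requirements in Table 1 translate into an amount, the following minimal sketch computes the required reserve for a loan portfolio. The reserve rates are taken from Table 1; the portfolio balances, the dictionary `RESERVE_RATE` and the function `required_reserve` are hypothetical names introduced only for this example.

```python
# Reserve rates as listed in Table 1 (fraction of the outstanding balance).
RESERVE_RATE = {
    "qualified": 0.00,
    "special attention": 0.00,
    "not up to standard": 0.20,
    "suspicious": 0.50,
    "loss": 1.00,
}


def required_reserve(portfolio):
    """Sum balance * reserve rate over (category, balance) pairs."""
    return sum(balance * RESERVE_RATE[category] for category, balance in portfolio)


# Hypothetical portfolio, balances in arbitrary currency units.
portfolio = [
    ("qualified", 1000.0),
    ("special attention", 200.0),
    ("not up to standard", 100.0),
    ("suspicious", 40.0),
    ("loss", 10.0),
]

total_reserve = required_reserve(portfolio)
print(f"required loan loss reserve: {total_reserve:.2f}")
```

The calculation only weights balances by category, which is the limitation of the rating system method: it does not weight individual risk indicators.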

Figure 1.

Collection mode of science and technology credit risk.

Table 1 shows that the low-quality level is divided into special attention, not up to standard, suspicious, and loss, with loan loss reserve requirements of 0%, 20%, 50%, and 100% respectively. The high-quality level consists of qualified loans, with a loan loss reserve requirement of 0%. The rating system method makes only a simple qualitative analysis of the financial and operating data of enterprises, which limits it. Although financial indicators are analyzed, the selected index variables are not assigned explicit risk weights, and no multivariate evaluation system with differentiated weights is established [12]. In the modern sense, credit risk includes both the risk of asset loss caused by the direct default of counterparties and the risk arising from changes in the likelihood that counterparties default. A credit portfolio suffers losses not only when counterparties (including loan borrowers, bond issuers, etc.) default directly. Its market value also changes when counterparties do not default but their credit status changes (for example through a credit rating reduction or a decline in profitability), which may cause losses to the creditors (the holders of the portfolio assets). Because enterprise default is a small-probability event and loan income and loss are asymmetric, the probability distribution of credit risk is skewed [13]. Market prices fluctuate around their expectation and are concentrated on both sides of it. Generally speaking, the return distribution of market risk is therefore relatively symmetrical and can be roughly described by the normal distribution curve [14]. In contrast, the distribution of credit risk is not symmetric but skewed. One end of the income distribution curve inclines downward to the left, and a fat tail appears on the left. The probability distribution of credit risk is shown in Fig. 2.

Figure 2.

Probability distribution of credit risk.

As for the credit risk of science and technology, this paper studies the risk of default, and takes the “financial distress” of science and technology credit as the sign of “default”. Based on the existing literature, whether a company is in financial distress is generally judged by “special treatment company/normal company” and measured by financial data [15]. Credit risk prediction index system is the basis for enterprises to evaluate credit risk. Before constructing the index system, we need to understand the basic principles of construction, as follows:

The content of the index system should comprehensively reflect all the factors that affect the credit status of the forecast object, not only consider the existing performance, but also predict the future development trend, not only study the forecast object itself, but also consider the surrounding environment and its impact.

When constructing the index system, it is necessary to ensure the comparability of the selected indexes in a wide range, and turn the overall situation of the enterprise which is not comparable and quantifiable into the index data which can be calculated and compared [16].

At present, the construction of index system mostly uses quantitative financial index data, but there are still some that can not be obtained through statistics, such as the quality of leaders, market competitiveness and so on, which can only be judged qualitatively.

The selection of indicators should be objective and realistic, and the data must be as accurate as possible and easy to understand and standardize, so as to ensure the effectiveness of the whole forecast.

It is necessary to comprehensively consider the degree of information reflection and the cost of sample data collection, and understand the evaluation indicators of scientific and technological credit risk prediction to ensure easy access to data and economic feasibility.

2.2 Evaluation index of science and technology credit risk prediction

With the deepening of research on credit risk, the definition of credit risk has changed accordingly. Credit risk refers to the risk of loss caused by the debtor’s default or by a change in the debtor’s credit rating or performance ability [17]. For lending institutions, this risk comes mainly from the possibility of the borrower’s default or credit deterioration, which makes the creditor suffer losses. In the traditional view, credit risk is regarded as a kind of default risk, that is, the possibility that the debtor is unable to repay or unwilling to perform the debt service agreement, resulting in losses for creditors. Another view is that credit risk can be divided into a broad sense and a narrow sense [18]. The narrow sense is consistent with the traditional view, while the broad sense covers all the risks caused by the default of customers, including the deterioration of asset quality caused by the borrower’s failure to repay principal and interest on time in the asset business, runs by depositors in the liability business that aggravate payment difficulties, and the default of counterparties in the off-balance-sheet business [19]. The third view is that credit risk refers to the possibility of loss caused by the default of the borrower or the market counterparties, including losses caused by a change in the market value of the debt following a change in the borrower’s credit rating and performance ability. In the context of supply chain finance, lending institutions have gradually placed less weight on the financial indicators of small and medium-sized financing enterprises and more on the overall supply chain, the core enterprises and the logistics service providers.

According to the above scoring results and expert opinions, the index selection after the second screening is in line with the principles of comprehensiveness, scientificity and operability [20]. The final index system of technology credit risk evaluation of online supply chain based on small and medium-sized enterprises is constructed, which is composed of 4 first level indicators, 17 second level indicators and 24 third level indicators. The evaluation index system of science and technology credit risk is shown in Table 2.

Table 2
Evaluation index system of science and technology credit risk

First level indicators	Secondary indicators	Third level indicators	Index description
Small and medium sized financing enterprises	Basic quality of enterprises	Management level (status)	Management system, organizational structure, assessment system, etc
	Reputation status	Default rate	Number of defaults/total number of transactions
	Operating capacity	Accounts receivable turnover	Log (sales revenue/average balance of accounts receivable)
		Inventory turnover	Log (cost of sales/average net inventory)
	Solvency	Current ratio	Current assets/current liabilities
		Asset liability ratio	Total liabilities/total assets
	Profitability	Sales profit margin	Sales profit/sales revenue
	Growth ability	Sales growth rate	Current sales revenue growth/previous sales revenue
		Asset growth rate	(Current total assets-previous total assets)/ previous total assets
	Pledge and characteristics of accounts receivable	Price stability	Quarterly fluctuation
		Aging and accounting period	Average aging
Core enterprise	Credit status	Credit rating of core enterprises	Bank credit rating
		External guarantee	The dispersion degree of guarantee amount and the strength of the guaranteed enterprise
	Solvency	Current ratio	Current assets/current liabilities, compared with the same industry
	Profitability	Sales profit margin	Sales profit/sales revenue, compared with the same industry score
	Platform construction	Degree of online platform construction	Smooth connection with commercial banks and real-time information transmission

Different financial indicators in scientific and technological credit financial data reflect the financial situation of enterprises from different aspects. Most domestic research on credit risk prediction uses only a few financial indicators, typically the same ones selected by previous scholars. Too few financial indicators cannot reflect the financial situation of a company comprehensively [21]. Considering the availability of data and the previous research of domestic and foreign scholars, this paper selects as many financial indicators as possible to build the model from seven aspects of China’s science and technology credit: per share index, capital structure, cash flow, solvency, operating ability, profitability and risk prevention ability. Among them, solvency refers to the ability of an enterprise to repay its debt when it matures. Solvency is the basic premise for the survival and sustainable development of an enterprise and is an important index in the analysis of enterprise credit. It reflects the financial status and operation ability of an enterprise: the stronger the solvency, the better the financial status and operation ability. Many financial indicators are used to explain the solvency of an enterprise. Those in general use are the current ratio, express (quick) ratio, cash ratio, property right ratio, interest protection ratio and net asset liability ratio. The following six indicators are selected to measure solvency. The indicators of solvency of science and technology enterprises are shown in Table 3.

Table 3

Indicators of solvency of science and technology enterprises

Index name	Name of secondary indicators	Symbol	Calculation formula
Solvency	Current ratio	X1	Current assets/current liabilities
	Express ratio	X2	(Current assets $-$ inventory)/current liabilities
	Cash ratio	X3	(Monetary capital $+$ securities)/current liabilities
	Equity ratio	X4	Liabilities/owners’ equity
	Interest cover	X5	(Net profit $+$ income tax $+$ financial expenses)/financial expenses
	Net asset liability ratio	X6	Total liabilities/total net assets
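
The solvency indicators X1 to X6 in Table 3 can be computed directly from statement items. The sketch below is illustrative only: the field names in the `statement` dictionary and all figures are hypothetical, and the express (quick) ratio is taken in its standard form, current assets minus inventory over current liabilities.

```python
def solvency_ratios(fs):
    """Return the six solvency indicators X1..X6 for one statement.

    `fs` is a dict of statement items; the key names are assumptions
    made for this example, not names used in the paper.
    """
    cl = fs["current_liabilities"]
    return {
        "X1": fs["current_assets"] / cl,                              # current ratio
        "X2": (fs["current_assets"] - fs["inventory"]) / cl,          # express (quick) ratio
        "X3": (fs["monetary_capital"] + fs["securities"]) / cl,       # cash ratio
        "X4": fs["total_liabilities"] / fs["owners_equity"],          # equity ratio
        "X5": (fs["net_profit"] + fs["income_tax"] + fs["financial_expenses"])
        / fs["financial_expenses"],                                   # interest cover
        "X6": fs["total_liabilities"] / fs["net_assets"],             # net asset liability ratio
    }


# Hypothetical statement, arbitrary currency units.
statement = {
    "current_assets": 500.0,
    "inventory": 200.0,
    "monetary_capital": 80.0,
    "securities": 20.0,
    "current_liabilities": 250.0,
    "total_liabilities": 400.0,
    "owners_equity": 800.0,
    "net_assets": 800.0,
    "net_profit": 60.0,
    "income_tax": 20.0,
    "financial_expenses": 10.0,
}

ratios = solvency_ratios(statement)
for symbol, value in ratios.items():
    print(symbol, round(value, 4))
```

A production version would need to guard against zero denominators, for example an enterprise with no financial expenses.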

Table 3 shows that solvency is measured by the current ratio, express ratio, cash ratio, equity ratio, interest cover and net asset liability ratio. They are calculated as current assets/current liabilities, (current assets $-$ inventory)/current liabilities, (monetary capital $+$ securities)/current liabilities, liabilities/owners’ equity, (net profit $+$ income tax $+$ financial expenses)/financial expenses, and total liabilities/total net assets respectively. Profitability refers to the ability of an enterprise to earn profits and increase the value of its capital. The stronger the profitability of an enterprise, the higher the profits it earns and the more stable its survival and development. There is a strong positive correlation between profitability and solvency: the stronger the profitability, the higher the solvency. Many indicators are used to explain profitability. Those in general use are return on net assets, return on total assets, net interest rate on sales, profit rate on cost and expense, net interest rate on total assets and net interest rate on business. The following eight indicators are selected to measure profitability. The profitability indicators of science and technology enterprises are shown in Table 4.

Table 4

Profitability index of science and technology enterprises

Index name	Name of secondary indicators	Symbol	Calculation formula
Profitability	Earnings per share	X7	Net profit/number of common shares at the end of the period
	Return on net assets	X8	Net profit/average shareholders’ equity
	Return on total assets	X9	Total profit/total assets
	Net profit rate of sales	X10	Net profit/operating income
	Cost profit margin	X11	Total profit/total cost
	Net interest rate of total assets	X12	Net profit/total assets
	Net operating rate	X13	Net profit/operating income
	Operating profit margin before interest and tax	X14	(Net profit $+$ income tax $+$ financial expenses)/operating income

Operating capacity refers to the ability of an enterprise to manage and turn over its capital. Its strength depends on the speed of capital circulation. The faster the capital of small and medium-sized enterprises circulates, the more efficiently their assets are used, the more profit they earn in a given period, and the stronger their operating capacity. Generally, the indicators reflecting operating capacity include the inventory turnover rate, accounts receivable turnover rate, total assets turnover rate, current assets turnover rate and accounts payable turnover rate. The following six indicators are selected to reflect operating capacity. The indicators of operating capacity of science and technology enterprises are shown in Table 5.

Table 5

Index of operation ability of science and technology enterprises

Index name	Name of secondary indicators	Symbol	Calculation formula
Operating capacity	Inventory turnover	X15	Main business cost/average inventory balance
	Accounts receivable turnover	X16	Operating income/average accounts receivable
	Turnover of total assets	X17	Operating income/average total assets
	Turnover of current assets	X18	Operating income/ending balance of current assets
	Accounts payable turnover	X19	Operating cost/closing balance of accounts payable
	Working capital turnover	X20	Operating income/working capital

Risk defense capability is measured here by the growth capability of small and medium-sized enterprises in the course of their development. Compared with large enterprises, small and medium-sized enterprises have a smaller asset scale and a lower ability to withstand risk. Risk defense capability is therefore a core index of their credit risk. It relates to the future of the enterprise and reflects its future development speed and future value, so the analysis of risk defense capability should be added to the index system. Generally, the indicators reflecting an enterprise’s risk defense capability include the growth rate of operating revenue, growth rate of operating profit, growth rate of net assets, growth rate of total assets and growth rate of net profit. The following six indicators are selected to measure risk defense capability. The indicators of risk defense capability of science and technology enterprises are shown in Table 6.

Table 6

Index of risk defense capability of science and technology enterprises

Index name	Name of secondary indicators	Symbol	Calculation formula
Growth ability	Growth rate of operating revenue	X21	(Current operating income $-$ previous operating income)/ previous operating income
	Growth rate of operating profit	X22	(Current operating profit $-$ previous operating profit)/previous operating profit
	Growth rate of net assets	X23	(Ending net assets $-$ beginning net assets)/beginning net assets
	Growth rate of total assets	X24	(Ending total assets $-$ beginning total assets)/beginning total assets
	Growth rate of R&D expenses	X25	(Current R&D expenses $-$ previous R&D expenses)/previous R&D expenses
	Net profit growth rate	X26	(Current net profit $-$ previous net profit)/previous net profit
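
All six growth indicators in Table 6 share one pattern, (current value minus previous value) divided by a previous value. The sketch below assumes that each rate is taken relative to the previous value of the same quantity; the function name `growth_rate` and the sample figures are hypothetical.

```python
def growth_rate(current, previous):
    """Period-over-period growth: (current - previous) / previous.

    Raises ValueError when the base is zero. A negative base (for example
    a previous-period loss) gives a ratio whose sign is hard to interpret,
    so callers may want to flag such cases separately.
    """
    if previous == 0:
        raise ValueError("previous value is zero; growth rate is undefined")
    return (current - previous) / previous


# Hypothetical (current, previous) pairs for indicators X21..X26.
observations = {
    "X21": (120.0, 100.0),  # operating revenue
    "X22": (18.0, 15.0),    # operating profit
    "X23": (440.0, 400.0),  # net assets
    "X24": (900.0, 750.0),  # total assets
    "X25": (12.0, 10.0),    # R&D expenses
    "X26": (9.0, 12.0),     # net profit (declining)
}

growth = {name: growth_rate(cur, prev) for name, (cur, prev) in observations.items()}
for name, value in growth.items():
    print(name, f"{value:+.2%}")
```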

The ability to obtain cash mainly refers to the ability to obtain cash from current operating activities. Sufficient cash flow is the basis on which SMEs repay their debts, because cash flow is the direct source of debt repayment. The ability to obtain cash is also a strong guarantee for the future development of small and medium-sized enterprises and an important factor in the analysis of enterprise credit. The analysis of the ability to obtain cash should therefore be added to the index system. The indicators usually used to reflect an enterprise’s ability to obtain cash include the proportion of net cash flow from operating activities, the net cash content of operating income and the net cash content of net profit. These three indicators are used here. The indicators of the ability of science and technology enterprises to obtain cash are shown in Table 7.

Table 7

Index of the ability of science and technology enterprises to obtain cash

Index name	Name of secondary indicators	Symbol	Calculation formula
Cash access	Proportion of net cash flow from operating activities	X27	Net cash flow from operating activities/total net cash flow
	Net cash content of operating income	X28	Net cash flow from operating activities/operating income
	Net profit net cash content	X29	Net cash flow from operating activities/net profit

To sum up, the financial indicators in the credit risk prediction index system are shown as Table 8.

Table 8

Financial indicators in the credit risk prediction index system

Index name	Name of secondary indicators	Symbol
Solvency	Current ratio	X1
	Express ratio	X2
	Cash ratio	X3
	Equity ratio	X4
	Interest cover	X5
	Net asset liability ratio	X6
Profitability	Earnings per share	X7
	Return on net assets	X8
	Return on total assets	X9
	Net profit rate of sales	X10
	Cost profit margin	X11
	Net interest rate of total assets	X12
	Net operating rate	X13
	Operating profit margin before interest and tax	X14
Operating capacity	Inventory turnover	X15
	Accounts receivable turnover	X16
	Turnover of total assets	X17
	Turnover of current assets	X18
	Accounts payable turnover	X19
	Working capital turnover	X20
Growth ability	Growth rate of operating revenue	X21
	Growth rate of operating profit	X22
	Growth rate of net assets	X23
	Growth rate of total assets	X24
	Growth rate of R&D expenses	X25
	Net profit growth rate	X26
Cash access	Proportion of net cash flow from operating activities	X27
	Net cash content of operating income	X28
	Net profit net cash content	X29

Compared with the discriminant analysis method, the regression analysis method places relatively loose requirements on the data distribution of personal credit evaluation indicators, and it yields the probability of default directly in personal credit risk assessment. Among regression methods, logistic regression is suitable for problems in which the explained (dependent) variable is qualitative. Its classification effect in personal credit risk assessment is better than that of discriminant analysis, and it is one of the most widely used models in that field.

The science and technology credit risk budget is then completed by improving the regression model.

2.3 The realization of science and technology credit risk budget

Different financial indicators in the financial data of science and technology credit reflect the financial status of enterprises from different aspects. In order to study the credit risk of enterprises comprehensively through the financial data system, 106 financial indicators are selected as variables to build the model, covering seven aspects: profitability, solvency, risk defense, cash flow, capital structure, operating capacity and per share indicators.

Profitability is the ability of an enterprise to obtain profits. Profit is the capital source for investors to obtain investment income and creditors to collect principal and interest. It is an important index to measure the long-term development ability of enterprises.

Solvency index refers to the ability of an enterprise to repay its long-term debt and short-term debt with its assets; dynamically speaking, it refers to the ability to repay its debt with its assets and the income created by the operation process.

The index of risk defense capability is a longitudinal analysis of the various indexes of an enterprise compared with those in previous years. Through the analysis of the index of risk defense capability, we can roughly judge the change trend of an enterprise, so as to make an accurate prediction of the future development of the enterprise.

Cash flow indicators reflect cash flow information. The financial indicators calculated according to the accrual basis principle of the current accounting system can not fully reflect the real financial and profit situation of science and technology credit, and the indicators expressed by cash flow can complement each other.

Capital structure refers to the value composition and proportional relationship of various kinds of capital in an enterprise, and it is the result of financing combination in a certain period of time.

Operation ability refers to the operation ability of an enterprise, that is the ability of an enterprise to make profits by using various assets.

The per share index refers to the financial indicators of the enterprise expressed per share, reflecting the financial situation attributable to each share. The value and volatility of the company’s assets are calculated from:

$\displaystyle V_{E}=V_{A}N({d_{1}})-e^{-rT}DN({d_{2}})$ (1)

Among them:

$\displaystyle d_{1}=\frac{\ln({V_{A}/D})+({r+\sigma_{A}^{2}/2})T}{\sigma_{A}\sqrt{T}}$ (2)

$\displaystyle d_{2}=d_{1}-\sigma_{A}\sqrt{T}$ (3)

$\displaystyle\sigma_{E}=\frac{V_{A}}{V_{E}}N({d_{1}})\sigma_{A}$ (4)

where $N$ is the standard normal distribution function; $V_{E}$, $\sigma_{E}$, $D$ and $r$ are the equity value, the equity volatility, the total liabilities and the risk-free rate of return, which are known or can be calculated from market observation data; and $T$ is the repayment period, which is a fixed value.
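
Equations (1) and (4) are two equations in the two unknowns $V_{A}$ and $\sigma_{A}$. The following minimal sketch solves them by a simple fixed-point iteration. It is an assumption-laden illustration, not the authors' procedure: it uses the standard Black-Scholes-Merton form (discount factor $e^{-rT}$ and $d_{1}$ denominator $\sigma_{A}\sqrt{T}$), the starting values and function names are hypothetical, and convergence of this particular iteration is not guaranteed for every input.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def d1_d2(v_a, sigma_a, debt, r, t):
    """d1 and d2 as in Eqs. (2) and (3)."""
    d1 = (math.log(v_a / debt) + (r + 0.5 * sigma_a ** 2) * t) / (sigma_a * math.sqrt(t))
    return d1, d1 - sigma_a * math.sqrt(t)


def equity_value(v_a, sigma_a, debt, r, t):
    """Right-hand side of Eq. (1)."""
    d1, d2 = d1_d2(v_a, sigma_a, debt, r, t)
    return v_a * norm_cdf(d1) - math.exp(-r * t) * debt * norm_cdf(d2)


def solve_assets(v_e, sigma_e, debt, r, t, tol=1e-10, max_iter=500):
    """Fixed-point iteration on Eqs. (1) and (4) for (V_A, sigma_A)."""
    v_a = v_e + debt                # starting guess: book value of assets
    sigma_a = sigma_e * v_e / v_a   # starting guess: de-levered equity volatility
    for _ in range(max_iter):
        d1, d2 = d1_d2(v_a, sigma_a, debt, r, t)
        # Eq. (1) rearranged for V_A, Eq. (4) rearranged for sigma_A.
        new_v_a = (v_e + math.exp(-r * t) * debt * norm_cdf(d2)) / norm_cdf(d1)
        new_sigma_a = sigma_e * v_e / (new_v_a * norm_cdf(d1))
        converged = abs(new_v_a - v_a) < tol and abs(new_sigma_a - sigma_a) < tol
        v_a, sigma_a = new_v_a, new_sigma_a
        if converged:
            break
    return v_a, sigma_a


# Hypothetical inputs: V_E = 100, sigma_E = 0.4, D = 80, r = 3%, T = 1 year.
asset_value, asset_vol = solve_assets(100.0, 0.4, 80.0, 0.03, 1.0)
print(f"V_A = {asset_value:.4f}, sigma_A = {asset_vol:.4f}")
```

In practice a general root finder is often preferred over a hand-written fixed-point loop, since it handles enterprises close to default more robustly.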

The credit risk indicators of technology-based small and micro enterprises are quantified in a consistent manner, and a fuzzy measurement matrix is constructed to obtain the weight of each credit risk measurement indicator. The comprehensive influence matrix of credit risk is then analyzed and the initial values are defuzzified. The error rate of each base learner under the distribution of its prediction error is calculated, namely:

$\displaystyle\zeta_{t}=H_{x}-M_{t}$ (5)

The RUSBoost algorithm will finally output a weighted linear combination of T base learners:

$\displaystyle L_{t}=\zeta_{t}({V_{E}\sqrt{T}-NM_{t}})$ (6)

Only the asset value $V_{A}$ and the asset volatility $\sigma_{A}$ are unknown variables, which can be solved by an iterative method:

$\displaystyle D_{O}=\frac{E({V_{A}^{T}-\textit{DPT}})}{V_{E}\sigma_{E}}$ (7)

The regression model, the penalty method and principal component analysis are all extensions of the ordinary linear model, so this section first introduces the ordinary linear model. It is then extended to the penalty method and the logistic regression model, whose combination is the penalized logistic model, and finally principal component analysis is introduced. General linear model:

$\displaystyle Y=X\beta+D_{O}\varepsilon$ (8)

The dependent variable $y$ represents the possibility of the event, and $y\in R$. With the critical point set to 0.5, the event is taken to occur ($y=1$) when the underlying value $y^{\ast}>0.5$, and not to occur ($y=0$) when $y^{\ast}\leqslant 0.5$, as shown in the formula:

$\displaystyle y_{i}=\begin{cases}1,&y_{i}^{\ast}>0.5\\ 0,&y_{i}^{\ast}\leqslant 0.5\end{cases}$ (9)

If the dependent variable, $y$ and the independent variable, $x$ are linear, the expression is as follows:

$\displaystyle y^{x}=Y\alpha+\beta x_{i}+\varepsilon_{i}$ (10)

Let $P$ be the probability of the occurrence of an event:

$\displaystyle P({y_{i}=1\mid x_{i}})=y^{x}P[{({\alpha+\beta x_{i}+\varepsilon_% {i}})>0}]$ (11)

If the error term follows a logistic distribution, we obtain the following result:

$\displaystyle P({y_{i}=1\mid x_{i}})=P[{\varepsilon_{i}\leqslant({\alpha+\beta x_{i}})}]=\frac{1}{1+e^{-({\alpha+\beta x_{i}})}}$ (12)
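The mapping from a linear predictor to a probability, followed by the 0.5 cut-off of Eq. (9), can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard logistic form, in which the probability depends on $\alpha+\beta x_{i}$; the function names and the coefficient values are hypothetical and are not taken from the fitted model.

```python
import math


def logistic_probability(x, alpha, beta):
    """Return P(y = 1 | x) = 1 / (1 + exp(-(alpha + beta * x))) for one covariate."""
    z = alpha + beta * x
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, threshold=0.5):
    """Return 1 (event occurs) if the probability exceeds the threshold, else 0."""
    return 1 if probability > threshold else 0


# Hypothetical coefficients, chosen only for illustration.
alpha, beta = -1.0, 2.0
for x in (0.0, 0.5, 1.0):
    p = logistic_probability(x, alpha, beta)
    print(x, round(p, 3), classify(p))
```

With these illustrative coefficients the linear predictor is zero at x = 0.5, so the probability sits exactly on the critical point and the sample is assigned to class 0, in line with the $\leqslant$ branch of Eq. (9).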

According to the index system, 29 financial indicators and 10 non-financial indicators are selected. Under China's ST system, using data from year t-1 to predict the financial situation of an enterprise in year t would clearly exaggerate the predictive ability of the model. At the same time, to account for the influence of the enterprise's situation before year t-1 on its current credit risk, the financial data of years t-2, t-3 and t-4 are used to predict the credit risk in year t, so the financial part of each sample is a 29 $\times$ 3 matrix. Since the non-financial data of an enterprise is basically unchanged, it can be regarded as static data, so this part of the input is a 10 $\times$ 1 vector. The ultimate goal of prediction is to judge whether the enterprise will have credit risk in year t, so the output should be normal (quantified as 0) or credit risk (quantified as 1). According to the above analysis, the input and output forms of the network are as follows:

$\displaystyle\textit{Input}_{1}=\left\{\begin{array}{ccc}x_{1}t_{1},&x_{1}t_{2},&x_{1}t_{3}\\ x_{2}t_{1},&x_{2}t_{2},&x_{2}t_{3}\\ &\ldots&\\ x_{29}t_{1},&x_{29}t_{2},&x_{29}t_{3}\end{array}\right.$ (13)

$\displaystyle\textit{Input}_{2}=\left\{\begin{array}{c}x_{30}\\ x_{31}\\ \cdots\\ x_{39}\end{array}\right.$ (14)

$\displaystyle\textit{output}=\left\{\begin{array}{c}\mathbf{y}_{1}\\ \mathbf{y}_{2}\end{array}\right.$ (15)
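The two input blocks can be assembled as plain nested lists, as in the following sketch. The function name `build_inputs`, the lag labels, and the assumption that the three columns are ordered t-2, t-3, t-4 are illustrative choices, and the numbers are placeholders rather than real enterprise data.

```python
N_FINANCIAL = 29      # financial indicators x_1 ... x_29
N_NON_FINANCIAL = 10  # non-financial indicators x_30 ... x_39
LAGS = ("t-2", "t-3", "t-4")  # assumed column order of the 29 x 3 matrix


def build_inputs(financial_by_year, non_financial):
    """Arrange one enterprise's data into a 29 x 3 matrix and a 10 x 1 vector.

    financial_by_year: dict mapping each lag label to a list of 29 values.
    non_financial: list of 10 static values.
    """
    for lag in LAGS:
        if len(financial_by_year[lag]) != N_FINANCIAL:
            raise ValueError("expected 29 financial indicators for " + lag)
    if len(non_financial) != N_NON_FINANCIAL:
        raise ValueError("expected 10 non-financial indicators")
    # Row i holds indicator x_(i+1) across the three lagged years.
    input_1 = [[financial_by_year[lag][i] for lag in LAGS]
               for i in range(N_FINANCIAL)]
    # Column vector of the static non-financial indicators.
    input_2 = [[value] for value in non_financial]
    return input_1, input_2


# Placeholder values only, to show the resulting shapes.
financial = {lag: [float(i) for i in range(N_FINANCIAL)] for lag in LAGS}
static = [0.0] * N_NON_FINANCIAL
input_1, input_2 = build_inputs(financial, static)
print(len(input_1), len(input_1[0]), len(input_2), len(input_2[0]))
```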

The index data are standardized, that is, the original value of each index is converted into a standardized value in the interval [0, 1], which eliminates the influence of the index dimension. In the formulas below, $x_{ij}$ is the standardized value of the $j$-th index of the $i$-th enterprise, $v_{ij}$ is the original value of the $j$-th index of the $i$-th enterprise, and $n$ is the total number of enterprises. For positive indicators, such as the “quick ratio”, the larger the value is, the better the enterprise credit is. The standardized value $x_{ij}$ of a positive index is:

$\displaystyle x_{ij}=\frac{v_{ij}-\min_{1\leqslant i\leqslant n}({v_{ij}})}{\max_{1\leqslant i\leqslant n}({v_{ij}})-\min_{1\leqslant i\leqslant n}({v_{ij}})}$ (16)

For negative indicators, the larger the value is, the worse the enterprise credit is, such as “asset liability ratio” and other indicators. The standardized values of negative indicators are as follows:

$\displaystyle x_{ij}=\frac{\max_{1\leqslant i\leqslant n}({v_{ij}})-v_{ij}}{\max_{1\leqslant i\leqslant n}({v_{ij}})-\min_{1\leqslant i\leqslant n}({v_{ij}})}$ (17)
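Eqs (16) and (17) are the usual min-max transforms and can be sketched as a single Python function applied to one index column at a time. The function name, the handling of a constant column, and the sample values are assumptions made for illustration.

```python
def standardize(values, positive=True):
    """Min-max standardize one index across n enterprises.

    positive=True applies Eq. (16); positive=False applies Eq. (17).
    """
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    if positive:
        return [(v - lowest) / span for v in values]
    return [(highest - v) / span for v in values]


# Illustrative values for three enterprises.
quick_ratio = [1.0, 2.0, 4.0]                # positive indicator
asset_liability_ratio = [0.25, 0.50, 0.75]   # negative indicator
print(standardize(quick_ratio, positive=True))
print(standardize(asset_liability_ratio, positive=False))
```

After the transform, the enterprise with the best value of either kind of indicator receives 1 and the worst receives 0, so positive and negative indicators point in the same direction.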

Given the limited sample size, an initial index system with too many input indexes would not only waste computing resources and reduce operating efficiency; multicollinearity between the indexes would also degrade the training effect. Therefore, the financial indexes are screened twice and then combined with the processed non-financial indexes as the model input. The process of screening and forecasting technology credit risk indicators is shown in Fig. 3.

As shown in Fig. 3, because the indicators have non-uniform dimensions and some data are missing, the data must be preprocessed first. Significance test of indicators: when the data are normally distributed, the independent-samples $t$-test is used to test the differences between indicators and form the main indicator system. On the basis of the main index system obtained by the $t$-test, factor analysis is carried out to eliminate the collinearity between the indexes, so as to safeguard the effect of science and technology credit risk prediction and improve the prediction accuracy.
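The significance screening step can be sketched with a pooled-variance independent-samples $t$ statistic computed from the Python standard library. The paper does not state the significance level or group sizes, so the critical value below (2.447, the two-sided 5% point of the $t$ distribution with 6 degrees of freedom), the indicator names, and the data are all assumptions for illustration.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Student's independent two-sample t statistic with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def screen_indicators(normal, risky, critical_value):
    """Keep the indicators whose |t| exceeds the critical value.

    normal, risky: dicts mapping an indicator name to that group's values.
    """
    kept = []
    for name in normal:
        t_value = pooled_t_statistic(normal[name], risky[name])
        if abs(t_value) > critical_value:
            kept.append(name)
    return kept


# Hypothetical data: four normal and four risky enterprises.
normal = {"quick_ratio": [1.8, 2.0, 2.2, 1.9],
          "inventory_turnover": [5.0, 6.0, 5.5, 6.5]}
risky = {"quick_ratio": [0.7, 0.9, 0.8, 1.0],
         "inventory_turnover": [5.2, 6.1, 5.4, 6.6]}
print(screen_indicators(normal, risky, critical_value=2.447))
```

In this toy data only the quick ratio differs clearly between the two groups, so it is the only indicator that survives the screen.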

3. Experimental results

To verify the practical application effect of the technology credit risk prediction algorithm, experiments are carried out. The experimental data are integrated, preprocessed and subjected to feature extraction on the big data financial cloud platform, and the relevant classification algorithms are implemented in Python. The following introduces the whole experimental process and the experimental results of the improved XGBoost algorithm. In this paper, 106 financial indicators of 63 science and technology credit companies in China are selected as research samples. First, the financial indicators that are strongly correlated with other financial indicators are eliminated. Then, based on the conclusion of Chapter 5, the elastic net method is selected to screen the financial indicators. The screened financial indicators are used as new model variables, and the principal components are extracted by principal component analysis. These components are used as covariates to construct the logistic regression model, after which the parameters are estimated and the model is tested. Finally, the prediction effect of the model on China’s science and technology credit risk is evaluated by the confusion matrix and the ROC curve. For binary classification on imbalanced data sets, the confusion matrix is the usual starting point. The risk index confusion matrix is given in Table 9.

Table 9
Risk index confusion matrix

	Samples of positive classes	Samples of negative classes	Total number of samples
Forecast positive class	TP	FP	A
Forecast negative class	FN	TN	B
Total number of samples	C	D	A + B

Figure 3.

Process of screening and forecasting technology credit risk indicators.

According to the confusion matrix, the accuracy of classification can be obtained by the following formula:

$\displaystyle\textit{ACC}=\frac{TP+TN}{TP+FP+FN+TN}$ (18)
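The four cells of the confusion matrix and the accuracy of Eq. (18) can be computed directly from label lists, as sketched below. The encoding of the positive class as 1 and the example labels are assumptions for illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN), treating label 1 as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """ACC = (TP + TN) / (TP + FP + FN + TN), as in Eq. (18)."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical labels for eight enterprises.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn, accuracy(tp, fp, fn, tn))  # 2 1 1 4 0.75
```

On imbalanced data, accuracy alone can look good even when the minority class is predicted poorly, which is one reason the ROC curve is reported alongside it.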

The development environment mainly includes the development platform and the development framework. PyCharm is selected as the development platform to build the Python version of the convolutional neural network model. The deep learning framework adopted is TensorFlow, which is developed by Google. It computes with data flow graphs, in which nodes represent mathematical operations and edges represent the tensors passed between them. TensorFlow can be deployed on one or more CPUs, GPUs, or mobile devices. The configuration of the development platform is given in Table 10.

Table 10

Development platform configuration

Processor	2.2 GHz Intel Core i7
Memory	64 GB
Hard disk	512 GB SSD
Graphics card	Intel Iris Pro 1536 MB

The five principal components are used as covariates to construct the logistic regression model. The ROC curve of the prediction results of the model is shown in Fig. 4.

Figure 4.

ROC curve of model prediction results.

The performance of the final logistic regression model can be seen in Fig. 4. The regression model fits the credit risk of 16 science and technology credit companies well and predicts it well: its AUC reaches 86.1%, which indicates good discriminative ability.
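The AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The following sketch computes it by direct pairwise comparison; the function name, the labels and the scores are hypothetical and do not reproduce the reported 86.1%.

```python
def auc_score(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.2]
print(round(auc_score(y_true, scores), 4))
```

The pairwise form is quadratic in the sample size, which is acceptable for a sample of a few dozen companies; a rank-based formula would be preferable for large data sets.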

To improve robustness, the XGBoost pre-training is run for 30 cycles to obtain and rank the feature importance scores in the dataset. Then the individual classifiers (XGBoost, ANN and LR) and the benchmark model used for comparative analysis are each run 36 times, and the prediction results of these runs are averaged to reduce the impact of any single data partition. The class distributions of the training data and test data are shown in Figs 5 and 6.

Figure 5.

Class distribution ratio of training data.

Figure 6.

Class distribution ratio of test data.

According to Figs 5 and 6, the number of missing values of each sample in the training set and test set is counted. The samples are sorted by the number of missing values from small to large; with the serial number as the abscissa and the number of missing values as the ordinate, the distributions of the number of missing values in the training set and test set are drawn in Figs 7 and 8.

Figure 7.

Distribution of missing values in training set.

Figure 8.

Distribution of missing values in test set.

In Figs 7 and 8, the black curve is the scatter diagram of missing values, and the blue curve is the cumulative number of negative samples (i.e. default samples). It is a strict straight line, which indicates that there is a certain correlation between the sample category and the number of missing values.
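The two curves described for Figs 7 and 8 can be derived from the raw samples as sketched below. Missing values are represented here by `None`, and the label that marks a default sample is passed as a parameter because the encoding is an assumption; the data are placeholders.

```python
def missing_value_profile(samples, labels, default_label=1):
    """Sort samples by missing-value count and accumulate default samples.

    Returns (sorted_counts, cumulative_defaults): the number of missing
    values per sample in ascending order, and the running number of
    default samples in that same order.
    """
    counts = [sum(1 for value in row if value is None) for row in samples]
    order = sorted(range(len(samples)), key=lambda i: counts[i])
    sorted_counts = [counts[i] for i in order]
    cumulative_defaults = []
    running = 0
    for i in order:
        if labels[i] == default_label:
            running += 1
        cumulative_defaults.append(running)
    return sorted_counts, cumulative_defaults


# Placeholder samples: None marks a missing value.
samples = [[1.0, None, 3.0],
           [None, None, 2.0],
           [0.5, 0.1, 0.2],
           [None, 1.0, None]]
labels = [0, 1, 0, 1]  # assumed encoding: 1 = default sample
sorted_counts, cumulative_defaults = missing_value_profile(samples, labels)
print(sorted_counts)
print(cumulative_defaults)
```

Plotting `sorted_counts` and `cumulative_defaults` against the serial number gives the two curves; how the cumulative curve bends as the missing-value count grows shows whether default samples cluster among samples with many or few missing values.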

Based on the constructed credit risk prediction system and the results of the simulation experiments, a number of science and technology credit companies are selected as research samples for model construction and credit risk prediction, and 100 rounds of cyclic training are conducted. The variables first entered into the model are 106 financial index variables, covering profitability, debt-paying ability, risk defense ability, cash flow, capital structure, operating ability and per-share indexes. Through correlation analysis of the financial index variables, 8 variables that are strongly correlated with other variables are eliminated. On this basis, the elastic net penalty method is used to screen the model indicators. The 11 significant financial index variables are further analyzed by principal component analysis, and five principal components are finally obtained. The logistic regression model is then constructed to predict the risk, and the experimental results are satisfactory: the amount of data is clearly reduced, the risk prediction ability is clearly improved, and the algorithm shows a good practical application effect.
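The first screening step, eliminating variables that are strongly correlated with others, can be sketched with a greedy Pearson-correlation filter. The paper does not give its correlation threshold or elimination rule, so the threshold of 0.9, the greedy order, the variable names and the data below are all assumptions for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(columns, threshold):
    """Keep a variable only if |r| with every already kept variable is <= threshold."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) <= threshold
               for other in kept.values()):
            kept[name] = values
    return list(kept)


# Hypothetical financial index variables for four enterprises.
columns = {
    "current_ratio": [1.0, 2.0, 3.0, 4.0],
    "quick_ratio": [0.9, 2.1, 2.9, 4.2],   # nearly collinear with current_ratio
    "return_on_equity": [0.3, 0.1, 0.4, 0.2],
}
print(drop_correlated(columns, threshold=0.9))
```

The variables that survive this filter would then be passed to the elastic net screening and the principal component analysis described above.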

4. Conclusion

Credit risk is a hot issue in financial risk management and is closely related to national macroeconomic conditions and international development trends. The emergence and development of Internet credit and other emerging business models have had a huge impact on financial technology. How to give full play to the value of science and technology credit data has become an urgent problem for data mining technology. Science and technology credit risk assessment with data technology as its core can build a more accurate and broader credit risk model, and it will become a powerful supplement to, and a development trend of, the traditional credit assessment system.

To verify the practical application effect of the technology credit risk prediction algorithm, experimental testing was carried out. The elastic net method was selected to screen financial indicators, and the selected financial indicators were used as new model variables. The confusion matrix and the ROC curve were used to evaluate the prediction effect of the model on China’s science and technology credit risk, and the classification accuracy was calculated from the confusion matrix, as is usual for binary classification on imbalanced data sets. The logistic regression model whose ROC curve is shown in Fig. 4 was constructed to predict the credit risk of technology credit companies, and it predicted well. The experiment ran 30 cycles to obtain and rank the feature importance scores in the dataset. The individual classifiers (XGBoost, ANN and LR) and the benchmark model used for comparative analysis were then each run 36 times, and the prediction results were averaged to reduce the impact of any single data partition. The number of missing values of each sample in the training set and test set was counted, the samples were sorted by that number from small to large, and the distributions were drawn with the serial number as the abscissa and the number of missing values as the ordinate. These distributions show a certain correlation between the sample category and the number of missing values. Overall, the amount of data is significantly reduced, the risk prediction ability is significantly improved, and the algorithm has a good practical application effect.

This paper analyzes the challenges faced by science and technology credit risk assessment, expounds the key technologies and applications of data mining in the process of data selection, preprocessing and credit risk modeling, and puts forward some ideas for the future research direction of science and technology credit risk assessment.

Acknowledgments

This work is supported by the open funding project of Financial Synergy Innovation of Science and Technology Center in Hebei Province and Science and Technology Finance Key Laboratory of Hebei Province in 2020 (Grant No. STFCIC202003).
