Enterprise financial early warning based on ensemble learning and stacked generalization fusion algorithm model

Abstract

With the growing complexity of enterprise financial data, traditional financial warning models face limitations in handling large datasets and outliers. This study proposes a novel financial warning model integrating ensemble learning and stacked generalization. A two-layer fusion model is constructed using stacked generalization, while SMOTETomek addresses data imbalance. Model parameters are optimized via grid search and five-fold cross validation. Experimental results demonstrate superior performance, with an average accuracy of over 90%. The accuracy on the training and testing sets reaches 0.93 and 0.95, respectively. The model achieves a low false positive rate (3.8%) and false negative rate (3.2%) in the low debt category, outperforming the comparison models. It also exhibits high resource efficiency and low time costs, making it well suited to enterprise financial early warning. The model aids in identifying financial risks, enabling proactive response strategies, promoting healthy financial management, and enhancing market stability.

Keywords

ensemble learning; stacked generalization fusion; enterprise financial warning; machine learning; SMOTETomek technology

Introduction

Building accurate and timely financial warning systems has become an important component of modern enterprise financial management.¹ An efficient financial warning system not only helps businesses identify potential risks and ensure financial stability, but also improves overall operational efficiency and competitiveness in a fierce and unpredictable market environment. As information technology and big data advance rapidly, traditional financial warning models can no longer cope with the diversity and complexity of enterprise financial data.² Recently, the use of machine learning and ensemble learning (EL) in financial early warning has become a research hotspot. These technologies offer significant advantages in model performance, prediction accuracy, and generalization ability. Machine learning algorithms analyze and predict financial data in a data-driven manner, capturing potential nonlinear relationships.³ EL reduces the bias and variance of a single model by combining multiple base models, which enhances the robustness and reliability of the forecasting system.⁴ An enterprise financial warning system predicts the company’s upcoming financial condition and helps management identify potential financial risks in advance and implement prompt countermeasures, making it an important component of enterprise management.⁵ The instability of the economic market makes financial risk management for enterprises both more complex and more necessary. An effective financial warning system can improve the stability of the financial market, especially for listed companies, and accurate financial warnings contribute to the sound development of both the company and the financial market. 
Against the backdrop of increasingly volatile economic markets and accelerating globalization, the financial risks faced by enterprises have become more complex, and traditional financial analysis methods can no longer fulfill the risk management requirements of modern enterprises.⁶

Numerous researchers globally have investigated improving the accuracy of financial forecasting for enterprises. To compensate for the shortcomings of a single machine learning model in predictive performance, Yi combined Support Vector Machine (SVM) with Backpropagation Neural Network (BPNN) to construct a business financial and budgetary advance alert system. The simulation experiment outcomes showed that the combined model surpassed the single algorithm model in the context of convergence speed and prediction accuracy, and had a positive impact on the fiscal stability of publicly traded firms.⁷ Jiao believed that the rational utilization of financial resources is helpful for achieving business goals. Therefore, he studied the use of chaotic particle swarm optimization algorithm to improve the BPNN and constructed a financial management early warning model for circular economy in enterprises. The experiment showed that the model had low error and fast convergence speed, significantly better than traditional models, and had high application value in enterprise financial management.⁸ Li developed an enterprise financial risk identification system grounded on logistic regression models to handle the problem of low recognition accuracy in existing financial warning models. By comparing companies with losses and those with normal financial conditions, it was found that the model had an accuracy of 94.68% in identifying financial risks in enterprises. The high accuracy of the model in identifying financial risks in scientific research enterprises was verified, providing an effective risk management tool for enterprise management.⁹ Li and Chen proposed a financial distress prediction framework grounded on the Improved Fruit Fly Optimization Algorithm (IFOA) and studied the use of quantum computing to select financial crisis indicators and optimize the FOA. 
By optimizing the diversity of solutions and individual cross performance, the prediction accuracy and convergence speed of the model were significantly improved, making it suitable for financial crisis warning in the manufacturing industry.¹⁰ Zeng proposed a financial risk warning model grounded on the Internet of Things to improve the financial management capabilities of enterprises. BPNN was used to mine financial data as model input, and a mobile edge computing (MEC) service was introduced to improve the timeliness of financial information processing. Experiments showed that optimizing service preloading improved the response speed of the model and had important reference value for enterprise financial management.¹¹

Current research shows that EL-based financial early warning models offer stronger adaptability and prediction performance than traditional single models when dealing with complex and changeable financial data.¹² By integrating multiple algorithms and data processing techniques, EL models can effectively enhance the performance of early warning systems.¹³ Gao et al. proposed an improved hybrid Bayesian network structure learning approach grounded on EL, which is particularly suitable for small datasets. The study used elite-based structural learners and genetic algorithms as the base learners, computed a weighted average of their adjacency matrices, and then filtered the result according to a preset threshold to get the final Bayesian network structure. The experimental outcomes showed that this algorithm surpassed the others in accuracy and reliability.¹⁴ Anisha et al. proposed a classification model grounded on EL for classifying liver lesions on Computed Tomography (CT) images. They improved classification accuracy through deep feature fusion and optimized feature selection, using pre-trained Deep Neural Network (DNN) models, and reached an accuracy of 98.3% through hybrid optimization methods.¹⁵ Zuo et al. proposed a multi-focus image fusion algorithm grounded on random feature embedding (RFE) and EL, which decreased computational workload and raised accuracy. The experimental results showed that the algorithm approximated the kernel function through RFE, eliminated outliers in the decision graph, decreased the possibility of overfitting, and improved the generalization capability of the algorithm.¹⁶ Fei et al. used unmanned aerial vehicle multi-sensor data fusion and EL algorithms to predict wheat yield. 
By integrating five machine learning algorithms including Cubist, SVM, CNN, Ridge Regression (RR), and Random Forest (RF), the accuracy of crop yield prediction was significantly improved, providing efficient decision support for large-scale agricultural breeding.¹⁷ Hou et al. utilized chaos theory phase space reconstruction and stacked EL methods for rolling load prediction, and combined multiple algorithms to construct a stacked EL model for load prediction, significantly improving prediction accuracy. The experiment results showed that compared with a single model, the integrated learning model had higher accuracy in power load forecasting.¹⁸ To more intuitively demonstrate the characteristics of the relevant research and the advantages of the proposed methods, the relevant work is further organized as shown in Table 1.

Table 1.

Comparative summary table of related research.

Reference	Methodology	Key metrics	Strengths	Limitations
Yi (2022)	SVM + BPNN fusion model	/	Fast convergence speed, better accuracy and recall than single models	Lack of in-depth exploration of feature selection mechanisms
Jiao (2024)	Chaos particle swarm optimization BPNN	Accuracy: 92%	Fast convergence, low error, high prediction accuracy	Only validated in circular economy scenarios
Li Z (2024)	Logistic regression model	Accuracy: 94.68%	Constructing a financial indicator discrimination system with high recognition accuracy	The feature source depends on sample comparison and has limited scalability
Li S & Chen (2024)	Financial crisis early warning model based on IFOA	/	Introducing orthogonal design and quantum computing to enhance global search capability	The model structure is complex and the training cost is high
Zeng (2022)	Financial risk warning model based on internet of things	Accuracy: 75.0%	Introducing a service preloading mechanism to improve response speed	High dependence on MEC and IoT environments, limited applicability scenarios
Gao (2023)	Hybrid Bayesian network structure learning algorithm	/	Suitable for small sample problems, improving the quality of BN learning	No predictive tasks involved
Anisha (2023)	Classification model based on EL	Accuracy: 98.3%	Pre-trained feature optimization with extremely high accuracy	Medical imaging scene specific, with high data requirements
Zuo (2022)	A new algorithm based on RFE and EL	/	Reduce the risk of overfitting	Experimental verification of non-traditional structured financial data
Fei (2023)	Multi model integration + multi-sensor data fusion	$R^{2}$ = 0.692, RMSE = 0.916	Multi source fusion improves prediction performance with low cost and high throughput	Complex data sources and high deployment requirements
Hou (2022)	Rolling load prediction method based on chaos theory phase space reconstruction and stacked EL	/	Adapt to rolling prediction tasks, flexible model fusion	Feature data has strong dependence on sequences
This study	RF + GBDT + XGBoost + DT + SVM (Stacking)	AUC: 0.93, ACC: >90%	High robustness, low FPR/FNR, scalable	High training cost

According to the comparison of related work in Table 1, although existing research has made progress, most models still face challenges such as data imbalance, difficulty in feature selection, and insufficient generalization ability. In view of this, an enterprise financial warning model based on EL and a stacked generalization fusion algorithm is proposed. By integrating multiple machine learning algorithms, namely RF, Gradient Boosting Decision Tree (GBDT), eXtreme Gradient Boosting (XGBoost), Decision Tree (DT), and SVM, a high-performance and robust fusion model is constructed. The model is intended to help enterprises maintain financial health in complex and ever-changing market environments, support business strategy decisions, and contribute to the healthy development of financial markets.

Methods and materials

The study first constructed a financial warning indicator system based on 8 dimensions, aiming to provide comprehensive monitoring and management of the fiscal well-being of companies. Meanwhile, to tackle the problem of unequal data distribution, the study adopted the SMOTETomek sampling method, which combines oversampling and undersampling to ensure the balance of the dataset. In terms of model construction, the study adopted EL and stacked generalization techniques, integrating five machine learning algorithms to construct a two-layer stacked fusion model. Model parameters were optimized through grid search and five-fold cross validation to ensure good robustness and high prediction accuracy of the model.

Construction of enterprise financial early warning indicator system

A comprehensive indicator system is the key foundation for the efficient and accurate financial warning model of a company. Based on the principles of comprehensiveness, operability, comparability, and sensitivity, the study comprehensively considered multiple key dimensions to guarantee the comprehensiveness and accuracy of the model and constructed a preliminary financial warning indicator system, as shown in Figure 1.¹⁹

Figure 1.

Preliminary financial warning indicator system.

In Figure 1, when constructing the enterprise financial early warning system, the study considered eight main dimensions, namely industry classification, operational capability, profitability, debt paying capability, development capability, risk level, market performance, and internal governance, to ensure that the model is comprehensive and accurate.⁵ Financial indicators are the core, directly related to financial crises, and also easy to obtain and operate. Financial losses, insufficient cash flow, and negative net profit are clear signs of a crisis, and these indicators can be obtained through financial statements. The market efficiency hypothesis supports stock price changes as warning indicators. The study also emphasizes the importance of internal governance, such as shareholder affiliation and executive dual roles, which can raise the predictive precision of the model. The study selected 29 financial indicators, 4 market performance indicators, and 4 internal governance indicators, totaling 37 dimensions, to construct a comprehensive financial risk monitoring and management tool. In reality, the proportion of companies facing financial difficulties among all listed companies is relatively small, and the proportion between companies with financial crises and those with good financial conditions is extremely uneven. If this imbalanced data are directly used for analysis, the predicted results are likely to lean towards samples with good financial conditions. Faced with this imbalance problem, previous studies often increased the number of financial crisis samples through random oversampling, or reduced the number of financial health samples through random undersampling, in order to achieve data balance. However, oversampling may lead to overfitting of the model to positive samples, while undersampling may result in loss of key information. 
Considering the shortcomings of using oversampling or undersampling separately, a comprehensive sampling strategy was adopted in the study, which combines SMOTE oversampling and Tomek link undersampling methods to achieve data balance. The SMOTETomek comprehensive sampling process is shown in Figure 2.²⁰

Figure 2.

SMOTETomek comprehensive sampling flowchart. (a) SMOTE (b) Tomek link.

Figure 2(a) and (b) respectively show the process of SMOTE oversampling and Tomek link undersampling, which are combined to perform data balancing operations at a 1:1 target ratio. The SMOTE oversampling technique involves finding $d$ nearest neighboring samples for each positive sample x, then extracting u samples based on the magnification U, combining them with x, and generating new samples through random linear interpolation to balance the number of positive and negative samples. The interpolation formula is shown in equation (1).²¹

x_{i}^{'} = x + \mathrm{rand}(0, 1) \cdot (y_{i} - x), \quad i = 1, 2, \ldots, m

(1)

In equation (1), $x_{i}^{'}$ means the newly generated sample, $y_{i}$ represents the $i$th sample among the nearest neighboring samples, and $\mathrm{rand}(0, 1)$ means a random number within the interval (0, 1). The oversampling rate $U$ is determined by the ratio of negative samples to positive samples, as shown in equation (2).²²

U = \mathrm{round}(IR)

(2)

In equation (2), $IR$ represents the ratio of negative samples to positive samples. SMOTE oversampling may cause newly generated positive samples to overlap with negative samples, making it difficult to classify boundary samples and increasing the risk of misjudgment. The Tomek link undersampling technique can solve this problem by removing overlapping samples. If two samples of different categories, $x_{i}$ and $x_{j}$, are each other’s nearest neighbors, that is, every other sample $x_{d}$ satisfies $D (x_{i}, x_{j}) < D (x_{d}, x_{i})$ and $D (x_{i}, x_{j}) < D (x_{d}, x_{j})$, where $D$ denotes the distance between two samples, then $x_{i}$ and $x_{j}$ form a Tomek link and may be boundary samples or noise. Removing these samples reduces the interference of noise and ultimately generates a new dataset with a clear boundary between financial crisis and financial health, and a positive and negative sample ratio of 1:1. Finally, the indicators are optimized through feature selection steps and the final indicator system is constructed. Feature filtering is the process of selecting the most effective variables from the raw data to reduce data complexity and raise the model’s generalization capability. The study utilized the Pearson correlation coefficient to identify highly correlated variables and used the RF algorithm to evaluate the impact of each variable on prediction performance, removing redundant variables. Finally, a 22-dimensional indicator system was constructed, which covers eight aspects: profitability, debt paying ability, growth ability, operational ability, risk level, internal governance, market performance, and industry classification, as presented in Table 2.
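As a rough illustration of the sampling steps above, the following pure-Python sketch applies the interpolation of equation (1), the rate of equation (2), and a mutual-nearest-neighbor test for Tomek links. It is not the study's implementation: the Euclidean distance, the toy coordinates, and the helper names (`smote_sample`, `oversampling_rate`, `tomek_links`) are assumptions made for the example.

```python
import math
import random


def smote_sample(x, neighbors, rng):
    """Equation (1): interpolate between x and one randomly chosen neighbor."""
    y = rng.choice(neighbors)
    r = rng.random()  # plays the role of rand(0, 1)
    return [xi + r * (yi - xi) for xi, yi in zip(x, y)]


def oversampling_rate(n_negative, n_positive):
    """Equation (2): U = round(IR), with IR = negatives / positives.
    Note that Python's round() sends exact halves to the nearest even integer."""
    return round(n_negative / n_positive)


def nearest_index(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    others = (j for j in range(len(points)) if j != i)
    return min(others, key=lambda j: math.dist(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j) that are mutual nearest neighbors with different labels."""
    links = []
    for i in range(len(points)):
        j = nearest_index(points, i)
        if i < j and labels[i] != labels[j] and nearest_index(points, j) == i:
            links.append((i, j))
    return links


# Hypothetical toy data: label 1 = financial crisis, 0 = financially healthy.
rng = random.Random(0)
new_point = smote_sample([0.0, 0.0], [[1.0, 1.0], [2.0, 0.0]], rng)
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 6.0]]
labels = [1, 0, 0, 0]
links = tomek_links(points, labels)
rate = oversampling_rate(95, 5)
```

In this toy set the first two points sit next to each other but carry different labels, so they are flagged as a Tomek link and would be removed. In practice a library routine such as `SMOTETomek` from imbalanced-learn would normally be used; the source does not state which implementation was adopted.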

Table 2.

The final financial warning indicator system for enterprises.

First level indicator	Secondary indicators
Business capability	Accounts receivable turnover ratio
Business capability	Total asset turnover ratio
Profitability	Net profit margin on total assets (ROA)
Profitability	Return on equity
Profitability	Long term capital return rate
Profitability	Operating gross profit margin
Profitability	Operating net profit margin
Debt paying ability	Cash ratio
Debt paying ability	Asset liability ratio
Debt paying ability	Net cash flows from operating activities
Debt paying ability	Long term debt to working capital ratio
Development capability	Capital accumulation rate
Development capability	Total asset growth rate
Development capability	Revenue growth rate
Risk level	Financial leverage
Risk level	Operating leverage
Internal governance	Shareholding ratio of the top ten shareholders
Internal governance	Separation rate of two rights
Internal governance	Statistics on the consistency of working places between independent directors and listed companies
Market performance	Price to book ratio
Market performance	Book to market ratio
Industry classification	Industry code
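The redundancy filtering that leads to Table 2 can be pictured with the sketch below, which keeps features in order of importance and skips any feature that is highly correlated with one already kept. This is only one plausible reading of the procedure: the 0.9 correlation threshold, the indicator names, the toy values, and the importance scores (standing in for RF importances) are all hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def drop_redundant(columns, importance, threshold=0.9):
    """Visit features from most to least important and keep a feature only if
    its absolute correlation with every kept feature is at most `threshold`."""
    kept = []
    for name in sorted(columns, key=importance.get, reverse=True):
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical indicators; "roe" is nearly collinear with "roa".
columns = {
    "roa": [0.10, 0.20, 0.30, 0.40],
    "roe": [0.21, 0.39, 0.62, 0.80],
    "cash_ratio": [0.9, 0.1, 0.8, 0.2],
}
importance = {"roa": 0.5, "roe": 0.3, "cash_ratio": 0.2}
selected = drop_redundant(columns, importance)
```

Here `roe` is dropped because it duplicates the information in the more important `roa`, while `cash_ratio` is retained.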

Establishment of enterprise financial early warning model grounded on EL and stacked generalization fusion algorithm

Because enterprise financial early warning is essentially a binary classification problem, and machine learning is powerful in computation and classification, this study adopted five classic machine learning algorithms and a stacked generalization ensemble model to construct a financial crisis early warning model. EL algorithms combine multiple simple learners to form a powerful learner, with the aim of reducing variance and bias while improving predictive performance. The EL algorithms used here are RF, GBDT, and XGBoost, while the single-model algorithms are DT and SVM.²³,²⁴ RF consists of multiple decision trees, each of which predicts independently, with diversity increased through random sampling and feature selection. RF enhances robustness through the Bagging mechanism over multiple decision trees, making it suitable for processing high-dimensional feature data and effective at suppressing overfitting. Although the bias of individual trees may slightly increase, the overall model performance is significantly improved. Unlike RF, GBDT optimizes model performance by gradually fitting residuals and excels at handling nonlinear relationships and feature interactions. The $n$th decision tree depends on the previous $n - 1$ trees, and it learns the residual left by the preceding trees, as shown in equation (3).²⁵

r_{n} = - \left[ \frac{\partial L (b, f (a))}{\partial f (a)} \right]_{f (a) = f_{n - 1} (a)}

(3)

In equation (3), $r_{n}$ means the residual of the $n$th iteration, $a$ and $b$ are the input sample and target value, respectively, $L$ is the loss function, and $\frac{\partial L (b, f (a))}{\partial f (a)}$ is its partial derivative with respect to the current prediction, evaluated at the model of the previous iteration $f_{n - 1}$. The $n$th decision tree is then fitted to these residuals, which determines the leaf node regions of the $n$th tree. Within each leaf node region, the value that minimizes the loss is selected, as shown in equation (4).

c_{nm} = \arg\min_{c} \sum_{a \in R_{nm}} L (b, f_{n - 1} (a) + c)

(4)

In equation (4), $c$ represents the correction term for the predicted value, $R_{nm}$ is the region of the $m$th leaf node of the $n$th tree, $c_{nm}$ represents the optimal correction value for the $m$th leaf node in the $n$th iteration, and $L$ represents the loss function. After completing the iteration, the predicted results of all trees are accumulated and output, as shown in equation (5).

f (a) = \sum_{n = 1}^{N} \sum_{m = 1}^{M} c_{nm} I (a \in R_{nm})

(5)

In equation (5), $I (\cdot)$ is the indicator function, which determines whether sample $a$ belongs to a given leaf node. Compared to GBDT, XGBoost has higher computational efficiency. Building on GBDT, it introduces regularization terms, missing value processing, and second-order derivative optimization, which further improve training efficiency and generalization ability. The objective function (OF) is shown in equation (6).²⁶
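The residual-fitting loop of equations (3) to (5) can be sketched for the special case of squared loss, where the negative gradient in equation (3) is simply the target minus the current prediction and the leaf value in equation (4) is the mean residual in that leaf. The one-split trees, the single feature, the constant initial model, and the toy data are assumptions chosen to keep the example short; they are not taken from the study.

```python
def fit_stump(a, residual):
    """Fit a one-split regression tree on a 1-D feature.
    Returns (threshold, left_value, right_value) with the lowest squared error.
    Assumes the feature takes at least two distinct values."""
    best = None
    for threshold in sorted(set(a))[:-1]:
        left = [r for x, r in zip(a, residual) if x <= threshold]
        right = [r for x, r in zip(a, residual) if x > threshold]
        c_left, c_right = sum(left) / len(left), sum(right) / len(right)
        loss = (sum((r - c_left) ** 2 for r in left)
                + sum((r - c_right) ** 2 for r in right))
        if best is None or loss < best[0]:
            best = (loss, threshold, c_left, c_right)
    return best[1:]


def predict_stump(stump, x):
    threshold, c_left, c_right = stump
    return c_left if x <= threshold else c_right


def fit_gbdt(a, b, n_trees=10):
    """Boost stumps on squared loss, starting from the mean of the targets."""
    f0 = sum(b) / len(b)
    pred = [f0] * len(b)
    trees = []
    for _ in range(n_trees):
        residual = [bi - pi for bi, pi in zip(b, pred)]  # eq. (3), squared loss
        stump = fit_stump(a, residual)                   # leaf means solve eq. (4)
        trees.append(stump)
        pred = [p + predict_stump(stump, x) for p, x in zip(pred, a)]
    return f0, trees


def predict_gbdt(model, x):
    """Eq. (5): accumulate the leaf values of all trees (plus the initial model)."""
    f0, trees = model
    return f0 + sum(predict_stump(t, x) for t in trees)


# Hypothetical toy data: one feature, binary target.
a = [1, 2, 3, 4, 5, 6]
b = [0, 0, 1, 0, 1, 1]
model = fit_gbdt(a, b)
mean_b = sum(b) / len(b)
baseline_mse = sum((bi - mean_b) ** 2 for bi in b) / len(b)
boosted_mse = sum((bi - predict_gbdt(model, x)) ** 2 for x, bi in zip(a, b)) / len(b)
```

Comparing `boosted_mse` with `baseline_mse` shows how each added tree corrects what the previous trees left unexplained. Production implementations usually add a learning rate and deeper trees, which this sketch omits.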

L (ϕ) = \sum_{p = 1}^{P} l (\hat{q}_{p}, q_{p}) + \sum_{k = 1}^{K} Ω (f_{k})

(6)

In equation (6), $L (ϕ)$ is the OF, $l (\hat{q}_{p}, q_{p})$ represents the prediction error of the $p$th sample, with $\hat{q}_{p}$ the predicted value and $q_{p}$ the true value, $Ω (f)$ represents the regularization factor in XGBoost, $K$ means the number of base functions, and $P$ means the number of samples. The XGBoost algorithm introduces the regularization factor $Ω (f)$ in the OF to balance the reduction of model loss against model complexity. The expression of $Ω (f)$ is shown in equation (7).

Ω (f) = γ M + \frac{1}{2} λ \sum_{m = 1}^{M} w_{m}^{2}

(7)

In equation (7), $γ$ is the penalty coefficient for the number of leaf nodes, $λ$ means the regularization strength coefficient, $w_{m}$ means the weight of the $m$th leaf node in the model, and $M$ represents the number of leaf nodes in the current tree. In addition, a second-order Taylor approximation was applied to the OF, which makes the optimal solution easier to find, as shown in equation (8).

O^{t} \approx \sum_{p = 1}^{P} \left[ l (\hat{q}_{p}^{t - 1}, q_{p}) + g_{p} f_{t} (a_{p}) + \frac{1}{2} h_{p} f_{t}^{2} (a_{p}) \right] + Ω (f_{t}) + z

(8)

In equation (8), $O^{t}$ represents the OF of the $t$th iteration, $a_{p}$ is the $p$th input sample, $g_{p}$ represents the first derivative of the loss function with respect to the previous prediction value, $h_{p}$ represents the second derivative of the loss function with respect to the previous prediction value, and $z$ is a constant term. The final optimal solution is shown in equation (9).

O = - \frac{1}{2} \sum_{m = 1}^{M} \frac{G_{m}^{2}}{H_{m} + λ} + γ M

(9)

In equation (9), $O$ is the value of the OF, and $G_{m}$ and $H_{m}$ represent the sums of the first-order and second-order derivatives of the loss function with respect to the predicted value over the samples at leaf node $m$. XGBoost adopts a pre-sorting strategy, which sorts the feature values before the iteration process begins in order to determine the best split point during traversal. When dealing with large-scale datasets, this strategy may become very time-consuming and occupy a lot of memory, resulting in reduced operational efficiency. Before model training, a data preprocessing process was designed to ensure the integrity, comparability, and stability of the input data for model learning, as shown in Figure 3.
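Equations (7) and (9) can be checked numerically with a small sketch. It relies on the standard result, not stated explicitly in the text, that the leaf weight minimizing the second-order objective is $w_{m} = - G_{m} / (H_{m} + λ)$; substituting it back into the per-leaf terms of equation (8) plus the regularization term should reproduce equation (9). The squared loss, the toy gradients, and the values of `lam` and `gamma` are assumptions for illustration only.

```python
def leaf_statistics(gradients, hessians, leaf_of_sample, n_leaves):
    """Sum first- and second-order derivatives per leaf (G_m and H_m)."""
    G = [0.0] * n_leaves
    H = [0.0] * n_leaves
    for g, h, m in zip(gradients, hessians, leaf_of_sample):
        G[m] += g
        H[m] += h
    return G, H


def optimal_leaf_weights(G, H, lam):
    """Standard closed-form minimizer: w_m = -G_m / (H_m + lambda)."""
    return [-g / (h + lam) for g, h in zip(G, H)]


def regularization(weights, lam, gamma):
    """Equation (7): gamma * M + 0.5 * lambda * sum of squared leaf weights."""
    return gamma * len(weights) + 0.5 * lam * sum(w * w for w in weights)


def structure_score(G, H, lam, gamma):
    """Equation (9): objective value of a tree structure at its optimal weights."""
    return -0.5 * sum(g * g / (h + lam) for g, h in zip(G, H)) + gamma * len(G)


# Hypothetical example with squared loss l = 0.5 * (q_hat - q)^2,
# so g = q_hat - q and h = 1 for every sample.
previous_prediction = [0.5, 0.5, 0.5, 0.5]
target = [1, 1, 0, 0]
gradients = [p - q for p, q in zip(previous_prediction, target)]
hessians = [1.0] * len(target)
leaf_of_sample = [0, 0, 1, 1]  # which leaf each sample falls into
lam, gamma = 1.0, 0.1

G, H = leaf_statistics(gradients, hessians, leaf_of_sample, n_leaves=2)
weights = optimal_leaf_weights(G, H, lam)
objective_at_weights = (
    sum(g * w + 0.5 * h * w * w for g, h, w in zip(G, H, weights))
    + regularization(weights, lam, gamma)
)
score = structure_score(G, H, lam, gamma)
```

The agreement between `objective_at_weights` and `score` illustrates why equation (9) can be used to compare candidate tree structures without refitting the leaf weights each time.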

Figure 3.

Data preprocessing process.

In Figure 3, for the original financial data, mean imputation and industry median imputation were used to handle missing values, and abnormal data were eliminated. In the feature selection stage, variables are initially screened through information gain, and then the top 30% of core features are retained according to the RF importance ranking as the feature set for subsequent training. Considering the significantly uneven distribution of sample categories in enterprise financial early warning, this study further introduces SMOTETomek sampling, which balances the positive and negative sample ratio through a combination of oversampling and undersampling and enhances the model’s ability to identify minority financial crisis samples. Stacked generalization is a very effective strategy in EL, especially when different machine learning models each have their own advantages and disadvantages. It can fully utilize the strengths of the different models, integrate their predictive abilities, and thus improve overall performance. Therefore, the study chose stacked generalization as the model fusion strategy and constructed a higher-performing enterprise financial warning model by integrating the RF, GBDT, XGBoost, DT, and SVM learning algorithms. The structure of the stacked generalization fusion algorithm is shown in Figure 4.
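The missing-value step described for Figure 3 might look like the sketch below. The text names mean imputation and industry median imputation but does not say how the two are combined, so the rule used here (industry median first, overall mean as a fallback when an industry has no observed value) is an assumption, as are the field names and the toy records.

```python
import statistics


def impute_by_industry(rows, field):
    """Fill missing values (None) of `field` with the median of the same
    industry, falling back to the overall mean if the industry has none."""
    observed = [r[field] for r in rows if r[field] is not None]
    overall_mean = statistics.mean(observed)

    by_industry = {}
    for r in rows:
        if r[field] is not None:
            by_industry.setdefault(r["industry"], []).append(r[field])

    filled = []
    for r in rows:
        value = r[field]
        if value is None:
            peers = by_industry.get(r["industry"])
            value = statistics.median(peers) if peers else overall_mean
        filled.append({**r, field: value})  # copy, leaving the input untouched
    return filled


# Hypothetical records with two industries, "A" and "B".
rows = [
    {"industry": "A", "roa": 0.1},
    {"industry": "A", "roa": 0.3},
    {"industry": "A", "roa": None},
    {"industry": "B", "roa": None},
    {"industry": "A", "roa": 0.8},
]
filled = impute_by_industry(rows, "roa")
```

The third record receives the median of industry A, while the record from industry B, which has no observed peers, receives the overall mean.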

Figure 4.

Structure of stacked generalization fusion algorithm.

In Figure 4, the first layer base learner of the stacked generalization fusion algorithm needs to have both strong classification performance and differentiation. Therefore, RF, GBDT, XGBoost, and DT were selected as the basic models for the first layer in the study. Among them, RF has good generalization ability by introducing random subspaces and bagging mechanism. GBDT and XGBoost respectively use traditional and improved Boosting strategies to process complex data, while XGBoost also improves efficiency through regularization. The DT model has a simple structure and is suitable for capturing patterns dominated by a few features. These tree-based models can effectively handle nonlinear relationships and high-dimensional data, and their training strategies complement each other. The second layer uses SVM as the meta learner. SVM excels in high-dimensional classification tasks and boasts a relatively straightforward architectural design. It can learn the implicit relationships output by the base model and reduce the risk of overfitting. Compared with complex networks, SVM is more conducive to improving generalization and stability. The workflow is as follows: Firstly, the first layer model segments the data through five-fold cross validation, leaving one fold as the test set and the rest as the training set. Then, the prediction results from five cross validations are accumulated to form the training data for the second layer model. Meanwhile, five predictions are made on the original test set, and the average is taken as the test data for the second layer model. In this way, the output of the first layer model becomes the input of the second layer model, and ultimately the prediction results are output by the second layer model. In addition, building an efficient prediction model not only requires filtering out useful features, but also determining the optimal parameter settings. 
The study used grid search and five-fold cross validation to optimize the model parameters. First, grid search systematically combines all candidate parameter values to construct a “grid” containing every candidate combination. Subsequently, five-fold cross validation is performed for each parameter combination to calculate five sets of generalization performance indicators, which are then averaged. Finally, the parameter combination with the highest average score is selected as the optimal parameter configuration for the model. The five-fold cross validation process is shown in Figure 5.²⁷
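The grid search procedure above can be sketched generically as follows. The fold construction (interleaved, without shuffling or stratification), the toy one-dimensional k-nearest-neighbor scorer, and the candidate values are assumptions for illustration; the source does not list the actual parameter grids.

```python
import itertools
import statistics


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


def grid_search(param_grid, score_fn, n_samples, k=5):
    """Score every parameter combination by its mean k-fold score and
    return (best_params, best_mean_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, train, val)
                  for train, val in k_fold_indices(n_samples, k)]
        mean_score = statistics.mean(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Hypothetical toy problem: one feature, binary label, k-NN by majority vote.
x = [0.1, 0.2, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def knn_accuracy(params, train_idx, val_idx):
    k = params["n_neighbors"]
    correct = 0
    for i in val_idx:
        nearest = sorted(train_idx, key=lambda j: abs(x[j] - x[i]))[:k]
        predicted_positive = 2 * sum(y[j] for j in nearest) > k
        correct += int(predicted_positive == (y[i] == 1))
    return correct / len(val_idx)


best_params, best_score = grid_search(
    {"n_neighbors": [1, 3, 5]}, knn_accuracy, n_samples=len(x)
)
```

With scikit-learn, the same idea is usually expressed through `GridSearchCV` with `cv=5`; whether the study used that class is not stated.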

Figure 5.

Five-fold cross validation process diagram.

In Figure 5, each iteration produces a test result, so there are five test results in total, labeled Test result 1 to 5. Averaging these five results gives the final result, which reduces the bias caused by any single data partition and yields a more robust evaluation of model performance. In stacked generalization, the models in the first layer need to perform well and at a similar level, otherwise they may degrade the overall capability of the model. Therefore, DT, RF, GBDT, and XGBoost were selected as the first layer models. To prevent overfitting, a relatively simple SVM classifier is used as the second layer model, as shown in Figure 6.

Figure 6.

Stacking generalization flowchart.

Figure 6 further illustrates the input–output data flow relationship between various models during the training and prediction stages of the model, and more specifically reflects how the output of the base learner in Stacking serves as the input for the meta learner. Each model in the first layer generates new feature training and testing sets through five rounds of cross validation, and these new datasets are then input into the SVM learner in the second layer to obtain the final enterprise financial prediction results.
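The data flow described above, out-of-fold predictions forming the second-layer training set and averaged fold predictions forming the second-layer test set, can be sketched as follows. The two toy base learners (nearest class mean on a single feature) merely stand in for RF, GBDT, XGBoost, and DT, and the meta learner is omitted: the resulting `meta_train` and `meta_test` matrices are what the second-layer SVM would be trained and evaluated on. All names and data here are hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


def nearest_class_mean(feature):
    """Toy base learner factory: classify by the closer class mean of one feature."""
    def fit(X, y):
        pos = [x[feature] for x, label in zip(X, y) if label == 1]
        neg = [x[feature] for x, label in zip(X, y) if label == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        return lambda x: (
            1.0 if abs(x[feature] - mean_pos) < abs(x[feature] - mean_neg) else 0.0
        )
    return fit


def stack_features(base_fits, X_train, y_train, X_test, k=5):
    """Build second-layer inputs: one column per base learner."""
    meta_train = [[0.0] * len(base_fits) for _ in X_train]
    meta_test = [[0.0] * len(base_fits) for _ in X_test]
    for col, fit in enumerate(base_fits):
        for train_idx, val_idx in k_fold_indices(len(X_train), k):
            predict = fit([X_train[i] for i in train_idx],
                          [y_train[i] for i in train_idx])
            for i in val_idx:
                meta_train[i][col] = predict(X_train[i])  # out-of-fold prediction
            for t, x in enumerate(X_test):
                meta_test[t][col] += predict(x) / k       # average over k models
    return meta_train, meta_test


X_train = [[0.1, 1.0], [0.2, 0.9], [0.3, 0.8], [0.35, 0.7], [0.4, 0.75],
           [0.6, 0.3], [0.7, 0.2], [0.8, 0.25], [0.9, 0.1], [0.95, 0.15]]
y_train = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
X_test = [[0.15, 0.95], [0.85, 0.2]]

base_fits = [nearest_class_mean(0), nearest_class_mean(1)]
meta_train, meta_test = stack_features(base_fits, X_train, y_train, X_test)
```

Because every entry of `meta_train` comes from a model that never saw that sample, the second layer is not trained on leaked labels. scikit-learn's `StackingClassifier` wraps a similar cross-validated scheme, although it refits the base learners on the full training set for prediction instead of averaging fold models.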

Results

The experiments involved setting the experimental parameters, setting up the experimental environment, and selecting Compustat as the dataset for algorithm performance testing. Comparative experiments were conducted between the proposed EL and stacked generalization fusion algorithm, the Z-score model, and the Differential Evolution SVM (DE-SVM). Finally, the three models were run in the financial management system of a financial enterprise, and real-time data analysis was conducted to compare them and confirm the validity of the proposed model in enterprise financial early warning.

Performance testing of EL and stacked generalization fusion algorithm

To confirm the capability of the proposed algorithm in data classification balance, Compustat was selected as the dataset for algorithm performance testing. The Compustat dataset is a commonly applied source of corporate financial and market data, containing financial information, market performance, and industry data of listed companies from multiple countries and regions around the world. It is an important tool for conducting corporate financial analysis, financial research, and economic studies. To ensure the timeliness of enterprise financial crisis warning analysis, the study screened and controlled the data time range in the experimental design. The study selected annual financial data between 2018 and 2022 to avoid interference from early samples during model training and ensure that the samples reflect the real operational status and risk structure of enterprises in recent years. The configuration details of the experimental setup are provided in Table 3.

Table 3.

The configuration details of the experimental setup.

Component	Configuration details
OS	Ubuntu 22.04 LTS (64-bit)
Central processing unit	Intel Xeon Gold 6400R (32 cores, 2.9 GHz)
GPU	NVIDIA T-V100 (64 GB)
RAM	512 GB DDR4
Storage	2 TB NVMe SSD
Programming language	Python 3.10

On the basis of the experimental environment in Table 3, the Compustat dataset was split into training and testing subsets, with a ratio of 70% for training and 30% for testing, and with an iteration number of 500. The fusion algorithm based on EL and stacked generalization proposed by the research was compared and tested with the Z-score model⁹ and DE-SVM.²⁸ The accuracy results are shown in Figure 7.

Figure 7.

Changes in accuracy of different algorithms. (a) Training set (b) Test set.

Figure 7(a) and (b) show the accuracy changes of different models on the training and testing sets as the number of iterations increases. In Figure 7(a), the proposed algorithm showed the best accuracy and the fastest convergence speed during training. By the 50th iteration, its accuracy already exceeded 0.8, and it ultimately reached 0.93. DE-SVM improved rapidly in the early stage, but its accuracy in the later stage was slightly lower than that of the proposed model. The performance of Z-score was significantly lower than the other two models, indicating that using Z-score standardization alone is not sufficient to significantly improve the capability of the model. In Figure 7(b), the proposed algorithm also achieved the best results on the test set, with an accuracy of nearly 0.95. DE-SVM ranked second with a relatively stable accuracy of around 0.9. Z-score performed the worst, with an accuracy of about 0.75 and a slow convergence speed. The proposed algorithm performed best on both the training and testing sets, with higher accuracy than the other models, indicating fast convergence and good generalization performance. The changes in true and false positive rates of different algorithms at different thresholds are shown in Figure 8.

Figure 8.

ROC curves and AUC results for different models. (a) Training set (b) Test set.

Figure 8(a) and (b) compare the Receiver Operating Characteristic (ROC) curves and Area Under Curve (AUC) values of the different models on the training and testing sets, respectively, to evaluate their classification capability. In Figure 8(a), the proposed model had the highest AUC value on the training set, 0.9258. Its ROC curve lay above the others and covered the largest area, demonstrating very high classification performance. The AUC value of DE-SVM was 0.6906, which was inferior to the proposed model but clearly better than the Z-score model. The AUC value of Z-score was 0.5436, close to 0.5, indicating that the model was almost unable to distinguish between positive and negative samples. In Figure 8(b), the proposed model again achieved the best results on the test set, with an AUC value of 0.9288, demonstrating excellent generalization performance. The AUC value of DE-SVM on the test set improved to 0.7983 but was still lower than that of the proposed model. The AUC value of Z-score on the test set was 0.5846, slightly higher than on the training set but still poor. The proposed model thus performed well on both the training and testing sets, with high classification accuracy and generalization ability. The error differences of the three models were then compared using Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination $R^{2}$. The output layer of EL models often produces continuous probability values, so indicators such as MSE can be used to evaluate the model’s fit to risk probability, reflecting its sensitivity and stability with respect to risk trends. In addition, regression error indicators can help evaluate the overall fitting bias of the model and are especially suitable for scenarios with soft labels, fuzzy boundaries, or threshold partitioning. The results are given in Table 4.

Table 4.

Comparative experimental results.

Algorithm	Data set	MSE	RMSE	MAE	$R^{2}$
Z-score	TRN	0.056	0.236	0.181	0.47
Z-score	TST	0.054	0.232	0.175	0.45
DE-SVM	TRN	0.056	0.236	0.178	0.54
DE-SVM	TST	0.048	0.219	0.168	0.55
Ours	TRN	0.024	0.155	0.121	0.78
Ours	TST	0.015	0.122	0.118	0.77

In Table 4, the MSE, RMSE, and MAE of the proposed algorithm were the smallest on both the training and testing sets, indicating the lowest prediction error. On the training set, the MSE, RMSE, and MAE were 0.024, 0.155, and 0.121, demonstrating a good fit. The $R^{2}$ values of 0.78 and 0.77 indicated good goodness of fit on the training and testing sets, respectively. On the test set, the proposed algorithm was the best on all four metrics. DE-SVM exhibited moderate error on both the training and testing sets, with a clear improvement over Z-score, demonstrating the enhancement effect of differential evolution optimization on SVM models. The $R^{2}$ values of DE-SVM on the training and testing sets were 0.54 and 0.55, showing that the model’s explanatory power for the data was only moderate. Z-score performed the worst on all indicators, with the largest error. Its $R^{2}$ values on the training and testing sets were 0.47 and 0.45, indicating weak fitting and explanatory power. Overall, the proposed algorithm performed best on all indicators, with the smallest error and the highest $R^{2}$ value, and its similar training and testing results suggest good generalization capability without overfitting. The precision-recall (PR) curves and F1 values of the three methods are shown in Figure 9.
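
The four error indicators reported in Table 4 can be illustrated with a short sketch that treats the model output as a risk probability and the label as 0 or 1. The function name `regression_errors` and the toy labels and probabilities are assumptions made for illustration; they are not the study's data or code.

```python
import math


def regression_errors(y_true, y_prob):
    """Return MSE, RMSE, MAE and R^2 between labels and predicted probabilities."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_prob)]
    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all labels are identical")
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Toy example: 1 = at-risk enterprise, 0 = healthy enterprise (made-up values).
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.1]
errors = regression_errors(y_true, y_prob)
print(errors)
```

Because RMSE is the square root of MSE, the two columns of Table 4 can be cross-checked against each other; for example, an MSE of 0.024 corresponds to an RMSE of about 0.155.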

Figure 9.

Accuracy results of different models. (a) PR curves (b) F1 value results.

Figure 9(a) and (b) compare the different models in terms of PR curve and F1 value. In Figure 9(a), the PR curve of the proposed algorithm was clearly located at the top, and its overall performance was better than that of the other models. At high recall rates, its precision remained high, demonstrating strong overall performance. The PR curve of DE-SVM lay in the middle, with a moderate balance between precision and recall. At high recall rates, its precision decreased significantly, indicating that the model’s predictions for a large number of samples were not accurate enough. The PR curve of Z-score was located at the bottom, with the lowest precision. At low recall rates its precision improved slightly, but the overall performance was poor, indicating that the model’s ability to identify positive samples was limited. The proposed algorithm performed best on the PR curve, indicating that it could maintain high precision at different recall rates and was suitable for scenarios requiring both high precision and high recall. In Figure 9(b), the F1 value of the proposed algorithm consistently remained above 90%, showing the most stable performance. As the number of iterations grew, the fluctuation of its F1 value decreased, demonstrating strong stability and robustness. The F1 value of DE-SVM fluctuated between 80% and 85%, with some volatility but a relatively stable overall trend. The F1 value of Z-score ranged from 70% to 75%, significantly lower than the other two models. Its F1 curve also fluctuated greatly, especially in the early and middle stages of iteration, indicating poor stability. The proposed method performed best in both PR curve and F1 value, indicating clear advantages in precision, recall, and stability.
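
To make the quantities behind Figure 9 concrete, the sketch below computes precision, recall, and F1 at a given decision threshold and then sweeps the threshold to obtain PR curve points. The function names, the toy scores, and the convention that precision is 1.0 when no sample is predicted positive are assumptions for illustration only.

```python
def precision_recall_f1(y_true, y_score, threshold=0.5):
    """Precision, recall and F1 when scores >= threshold are predicted positive."""
    tp = fp = fn = 0
    for truth, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 1:
            fn += 1
    # Convention (assumed here): precision is 1.0 if nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def pr_curve(y_true, y_score):
    """One (threshold, precision, recall, f1) tuple per distinct score, high to low."""
    thresholds = sorted(set(y_score), reverse=True)
    return [(th,) + precision_recall_f1(y_true, y_score, th) for th in thresholds]


# Toy scores (made-up values): 1 = at-risk enterprise, 0 = healthy enterprise.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
precision, recall, f1 = precision_recall_f1(y_true, y_score, threshold=0.5)
curve = pr_curve(y_true, y_score)
print(precision, recall, f1)
```

Lowering the threshold moves along the curve toward higher recall, usually at the cost of precision, which is the trade-off the PR curves visualize.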

Analysis of the effect of enterprise financial early warning grounded on EL and stacked generalization fusion algorithm

To confirm the practical utility of the proposed enterprise financial early warning model based on EL and the stacked generalization fusion algorithm, the model was integrated into the financial management system of a financial enterprise, where it analyzed the latest financial data in real time and generated early warning reports. Its performance was compared with that of enterprise financial warning models based on Z-score and DE-SVM. Positive samples were classified into loss, cash flow shortage, and high debt, while negative samples were classified into profit, cash flow abundance, and low debt. The resulting confusion matrices are shown in Figure 10.

Figure 10.

Classification confusion matrix. (a) Z-score (b) DE-SVM (c) Ours.

Figure 10(a)–(c) show the confusion matrices of Z-score, DE-SVM, and the proposed model, respectively. Each confusion matrix reflects the classification ability of the model in predicting the financial health status of enterprises. In Figure 10(a), the accuracy of predicting loss-making enterprises was 0.87, indicating high recognition ability. The prediction accuracy for high debt enterprises was 0.91, the best among the risk (positive) categories for this model. The prediction accuracy for cash flow abundant enterprises and low debt enterprises was 0.93 and 0.94, respectively, which was also relatively good. However, the ability to identify profitable enterprises and cash flow deficient enterprises was weak, and the misjudgment rate was high. There was confusion between high debt and loss-making enterprises, which might lead to underreporting of risky enterprises. In Figure 10(b), the prediction accuracy for low debt enterprises was the highest, reaching 0.96, indicating strong classification ability. The prediction accuracy for high debt enterprises and cash flow abundant enterprises was 0.92 and 0.94, respectively, which was also good. The prediction accuracy for loss-making enterprises and cash flow deficient enterprises was 0.88 and 0.89, slightly better than the Z-score model. The prediction accuracy of this model for profitable enterprises was again relatively low, at only 0.85. In Figure 10(c), the proposed model showed high accuracy in all categories, especially for low debt enterprises, with a prediction accuracy of 0.98. The prediction accuracy for high debt enterprises, cash flow abundant enterprises, and profitable enterprises was 0.94, 0.96, and 0.88, respectively, all better than the other two models. The prediction accuracy for loss-making enterprises and cash flow deficient enterprises also reached 0.89 and 0.90, respectively, demonstrating good robustness.
The proposed model significantly reduced false positives and false negatives, and performed best in identifying cash flow abundant and low debt enterprises. Overall, the proposed EL model performed best in enterprise financial warning, with clearly better accuracy and robustness than the other two models, providing more reliable financial risk warning support for enterprises. The three models were further compared on two indicators: false positive rate and false negative rate. The results are shown in Table 5.

Table 5.

Comparison of false positive rate and false negative rate results.

Type	False positive rate (%)			False negative rate (%)		
	Z-score	DE-SVM	Ours	Z-score	DE-SVM	Ours
Loss	12.5	8.9	5.2	15.3	10.1	4.8
Cash flow shortage	13.2	9.5	5.7	14.8	9.9	5.1
High debt	10.9	7.6	4.9	11.3	8.5	3.7
Profit	11.7	8.3	4.5	13.9	9.7	4.3
Abundant cash flow	12.1	7.8	4.1	13.5	8.9	3.9
Low debt	10.5	6.9	3.8	12.7	8.2	3.2

In Table 5, the Z-score model had high false positive rates, with 13.2%, 12.5%, and 12.1% in the cash flow shortage, loss, and abundant cash flow categories, respectively, indicating a high misjudgment rate in identifying normal enterprises. The false positive rates of the DE-SVM model were lower than those of Z-score, but some categories remained high, such as cash flow shortage and loss, at 9.5% and 8.9%, respectively. The proposed model performed best, achieving the lowest false positive rate in all categories. Its false positive rate was only 3.8% in the low debt category and 4.1% in the abundant cash flow category, indicating a stronger ability to distinguish between normal and risky enterprises. In terms of false negative rate, the Z-score model was the highest, reaching 15.3% in the loss category and 14.8% in the cash flow shortage category, indicating significant shortcomings in detecting truly risky enterprises. Compared to Z-score, the DE-SVM model showed lower false negative rates across all categories, such as 8.5% in the high debt category. The proposed model also performed best in terms of false negative rate: 3.7% in the high debt category and 3.2% in the low debt category. These results indicate that the model had high sensitivity in identifying financial risks and could effectively reduce underreporting. The proposed model exhibited the lowest false positive and false negative rates in every category, demonstrating clear advantages. Based on the indicator system constructed in this study, predictable indicators were selected and the prediction results were compared, as shown in Table 6.
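
One common way to obtain per-category false positive and false negative rates from a multi-class confusion matrix is a one-vs-rest reading, sketched below. Whether the study computed Table 5 this way is not stated, so this is an assumption; the function name `per_class_error_rates`, the three-class subset, and the counts are made up for illustration.

```python
def per_class_error_rates(confusion, class_names):
    """One-vs-rest false positive and false negative rates per class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    total = sum(sum(row) for row in confusion)
    rates = {}
    for k, name in enumerate(class_names):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                  # class k predicted as another class
        fp = sum(row[k] for row in confusion) - tp   # other classes predicted as k
        tn = total - tp - fn - fp
        rates[name] = {
            "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
            "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
        }
    return rates


# Made-up counts for three of the six categories, 100 true samples each.
class_names = ["loss", "high debt", "low debt"]
confusion = [
    [90, 6, 4],
    [5, 92, 3],
    [2, 3, 95],
]
rates = per_class_error_rates(confusion, class_names)
for name in class_names:
    print(name, rates[name])
```

Under this reading, the false negative rate of a category is the share of its true members assigned elsewhere, and the false positive rate is the share of all other samples wrongly assigned to it.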

Table 6.

Analysis of the application effect of evaluation indicators.

First level indicator	Secondary indicators	Z-score		DE-SVM		Ours	
		ACC	AUC	ACC	AUC	ACC	AUC
Business capability	Accounts receivable turnover ratio						
Business capability	Total asset turnover ratio	80%	0.70	85%	0.78	90%	0.85
Profitability	ROA	78%	0.68	83%	0.76	88%	0.83
Profitability	Return on equity	82%	0.72	87%	0.81	92%	0.88
Debt paying ability	Cash ratio	79%	0.69	85%	0.77	89%	0.84
	Debt to asset ratio	76%	0.66	82%	0.74	87%	0.82
	Net cash flows from operating activities	80%	0.70	84%	0.78	91%	0.86
Development capability	Total asset growth rate	78%	0.68	83%	0.76	90%	0.85
Development capability	Revenue growth rate	79%	0.69	85%	0.77	92%	0.88
Risk level	Financial leverage	77%	0.67	82%	0.75	88%	0.83
Risk level	Operating leverage	75%	0.65	80%	0.73	86%	0.81
Internal governance	Shareholding ratio of top ten shareholders	76%	0.66	81%	0.74	88%	0.83
Market performance	Price-to-book ratio	78%	0.68	83%	0.76	89%	0.84
Industry classification	Industry code	79%	0.69	84%	0.77	90%	0.85

In Table 6, the Z-score model performed relatively poorly on all secondary indicators, with AUC values ranging from 0.65 to 0.72, reflecting its weak ability to distinguish complex, multidimensional financial data. Its accuracy was generally at or below 80%, and it performed poorly on the total asset turnover and cash ratio indicators. The DE-SVM model showed a clear improvement over the Z-score model, achieving an accuracy of 80% or more on all indicators, and its AUC values also increased, lying within the range of 0.73–0.81. However, on the financial leverage and total asset growth rate indicators, its accuracy did not yet reach the best level. The proposed model performed best on all indicators, with accuracy mostly exceeding 88% and AUC values between 0.81 and 0.88, demonstrating high sensitivity to different financial health states. Especially for profitability and development capability, the performance of this model was clearly better than that of the other models, indicating its robustness and predictive accuracy under multidimensional indicators. The proposed model thus outperformed the other two models in both accuracy and AUC value, demonstrating advantages in multidimensional financial indicator prediction. This indicates that the model could better identify different types of financial health status in enterprise financial warning systems and provide more accurate decision support for management. To ensure a fair comparison between models and reproducible results, all models were run independently on the same testing platform, with exclusive use of CPU and GPU resources and no interference from concurrent computing tasks. Finally, the resource consumption and running time of the three models were compared, as shown in Figure 11.

Figure 11.

Resource consumption and time cost analysis. (a) Resource consumption (b) Time cost.

Figure 11(a) and (b) compare the three models in terms of resource consumption and time cost, respectively. In Figure 11(a), the DE-SVM model had the highest energy consumption, at over 80%, followed by the Z-score model. The proposed model had the lowest energy consumption, around 60%, indicating that it was more energy-efficient during execution and suitable for deployment in resource-constrained environments. The CPU usage rate of the DE-SVM model was the highest, close to 85%, while that of the Z-score model was about 75%. The CPU usage rate of the proposed model was the lowest, close to 65%. This indicated that the DE-SVM model relied heavily on CPU resources and might generate high loads in environments with multi-threaded processing or concurrent requests, while the proposed model was more CPU-efficient. Similarly, the DE-SVM model consumed a significant amount of GPU resources, about 80%, while the Z-score model consumed about 70%. The GPU utilization rate of the proposed model was the lowest, at around 60%. In Figure 11(b), the training time of the Z-score model was 8.13 seconds, while that of the DE-SVM model was slightly lower at 7.38 seconds. The training time of the proposed model was the shortest, only 6.21 seconds. In terms of prediction time, the Z-score model and the DE-SVM model required 8.23 seconds and 8.04 seconds, respectively, while the proposed model had the shortest prediction time of only 5.86 seconds, indicating a faster response speed in practical applications.
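
Training and prediction times such as those in Figure 11(b) are typically measured with a wall-clock harness around the fit and predict calls. The sketch below shows one such harness using `time.perf_counter`; the paper does not describe its measurement procedure, and `time_call` and `MeanThresholdModel` are hypothetical stand-ins introduced only for illustration.

```python
import time


def time_call(func, *args, repeats=5):
    """Run func(*args) several times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


class MeanThresholdModel:
    """Hypothetical toy classifier: thresholds at the midpoint of the class means."""

    def fit(self, xs, ys):
        positives = [x for x, y in zip(xs, ys) if y == 1]
        negatives = [x for x, y in zip(xs, ys) if y == 0]
        pos_mean = sum(positives) / len(positives)
        neg_mean = sum(negatives) / len(negatives)
        self.threshold = (pos_mean + neg_mean) / 2
        return self

    def predict(self, xs):
        return [1 if x >= self.threshold else 0 for x in xs]


# Made-up one-dimensional data; 1 = at-risk, 0 = healthy.
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]
model = MeanThresholdModel()
train_seconds, _ = time_call(model.fit, xs, ys)
predict_seconds, predictions = time_call(model.predict, xs)
print(f"train: {train_seconds:.6f} s, predict: {predict_seconds:.6f} s")
```

Taking the best of several repeats reduces the influence of transient system load, which complements running each model with exclusive use of the hardware.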

To ensure the objectivity and stability of the model performance evaluation as far as possible, the study explicitly considered macroeconomic and external shocks in the data processing stage, such as global economic fluctuations, trade policy adjustments, and systemic risks caused by the Covid-19 pandemic. The study selected corporate financial report data from 2018 to 2022, avoiding abnormal years such as financial crises and the early stages of the pandemic. A lag mechanism was introduced in sample label construction to improve the stability of the response to external economic changes. A further comparison was made with financial risk warning models based on Light Gradient Boosting Machine (LightGBM)²⁹ and Dempster Shafer’s theory and Ensemble Classifier (DS-EC).³⁰ The results are shown in Table 7.

Table 7.

Model performance comparison.

Model	ACC	AUC	F1 value	False positive rate (%)	False negative rate (%)
LightGBM	0.908	0.881	0.891	5.6	5.1
DS-EC	0.891	0.862	0.873	6.8	6.3
Ours	0.932	0.928	0.912	3.8	3.2

As Table 7 shows, the proposed model outperformed the comparison models LightGBM and DS-EC on all key indicators, demonstrating stronger financial risk warning capability. Specifically, the accuracy of the proposed model was 0.932, the AUC value reached 0.928, and the F1 value was 0.912, the best result on each of these measures of discriminative ability. Meanwhile, the false positive rate and false negative rate of this model were 3.8% and 3.2%, respectively, clearly lower than those of the other models, indicating stronger reliability in reducing false positives and false negatives. In contrast, although LightGBM performed well in efficiency and generalization ability, it fell slightly short in controlling errors. The DS-EC model improved decision stability by integrating evidence reasoning mechanisms, but its ability to handle complex classification boundaries was still inferior to that of the deep fusion structure. Overall, the proposed model balanced prediction accuracy and error control while maintaining robustness, and had clear advantages in enterprise financial risk warning tasks.
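
For reference, the AUC values reported in this section can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes AUC directly from that pairwise definition, counting ties as one half; the function name `roc_auc` and the toy scores are illustrative assumptions, not the study's implementation.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0   # correctly ordered pair
            elif pos == neg:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy scores (made-up values): 1 = at-risk enterprise, 0 = healthy enterprise.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(y_true, y_score)
print(round(auc, 4))
```

The pairwise loop is quadratic in the number of samples, so for large datasets a rank-based formulation is usually preferred; the simple form is used here because it mirrors the definition. An AUC near 0.5 under this definition means positives and negatives are ranked almost at random.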

Discussion and conclusion

To address the inability of traditional models to cope with complex financial data and outliers in enterprise financial early warning, this study proposed an enterprise financial early warning model based on EL and a stacked generalization fusion algorithm. By integrating multiple machine learning algorithms such as RF, GBDT, XGBoost, DT, and SVM, the proposed model demonstrated clear advantages in classification accuracy and robustness. The experimental results showed that the average accuracy of the proposed model exceeded 90% and its AUC value was close to 0.93, consistently outperforming the benchmark models on multiple indicators. In terms of resource consumption, the proposed model exhibited advantages in energy consumption and CPU/GPU utilization. Specifically, its average energy consumption was about 60%, its CPU usage rate was about 65%, and its GPU usage rate was about 60%, all lower than those of the other two models. This indicated that the proposed model was not only more accurate than traditional models but also more efficient in its use of computational resources, making it suitable for environments with high real-time requirements or limited resources. In addition, the proposed model maintained low false positive and false negative rates across the financial risk categories. For example, in the “high debt” category, its false positive rate was only 4.9% and its false negative rate was 3.7%, clearly better than the other two comparison models. This further supported the reliability and stability of the proposed model in predicting financial risks.
In summary, the enterprise financial warning model based on EL performed well in accuracy, resource efficiency, and predictive stability, providing strong support for enterprise financial risk management. However, the study also has limitations. The model did not fully consider external factors such as the macroeconomic environment and industry changes, which may significantly affect the financial situation of enterprises. Future research can introduce more external data, such as macroeconomic indicators and industry characteristics, to improve the predictive capability and adaptability of the model. In addition, the complexity and training cost of the model are high, and further optimization may be needed in practical applications to reduce the demand for computing resources and the difficulty of deployment.

Footnotes

ORCID iD

Baicheng Chen

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Alekserova, Fedoryshyna. Financial management activities of enterprises in the modern economic conditions. Balt J Eco Stud 2024; 10(3): 58–68.

2. Research on financial risk early warning system model based on second-order blockchain differential equation. Intell Decis Technol 2023; 18(1): 327–342.

3. Wang, Yang, et al. Stacking ensemble learning-based load identification considering feature fusion by cyber-physical approach. IEEE Sens J 2023; 23(6): 5997–6007.

4. Wang, Liu, et al. A novel ensemble model of multi-class credit assessment based on multi-source fusion theory. J Intell Fuzzy Syst 2024; 46(1): 419–431.

5. Zhang, Xie, Yan. Analysis of the application of Z3 model in the early warning of corporate financial risk. Rev Adhe Adhesiv 2023; 11(3): 612–630.

6. Garai, Paul, Kumar, et al. Intra-annual national statistical accounts based on machine learning algorithm. J Data Sci Intell Sys 2023; 2(2): 12–15.

7. Research on enterprise financial economics early warning based on machine learning method. J Comput Methods Sci Eng 2022; 22(2): 529–539.

8. Jiao. Financial management early warning model of enterprise circular economy based on chaotic particle swarm optimization algorithm. J Ind Prod Eng 2024; 41(3): 217–228.

9. Identification of enterprise financial risk based on logistics model. Int J Bus Intell Data Min 2024; 25(1): 106–127.

10. Chen. An effective financial crisis early warning model based on an IFOA-BP neural network. J Internet Technol 2024; 25(3): 435–446. DOI: 10.53106/160792642024052503009.

11. Zeng. Influences of mobile edge computing-based service preloading on the early-warning of financial risks. J Supercomput 2022; 78(9): 11621–11639.

12. Rahman, Zhu. Predicting accounting fraud using imbalanced ensemble learning classifiers–evidence from China. Acc Finan 2023; 63(3): 3455–3486.

13. Price, Lizier. Professional learning of academics enacting work-integrated learning. Prof Dev Educ 2024; 50(3): 474–486.

14. Gao, Zeng, et al. An improved hybrid structure learning strategy for Bayesian networks based on ensemble learning. Intell Data Anal 2023; 27(4): 1103–1120.

15. Anisha, Jijib, Ajith Bosco Raj. Deep feature fusion and optimized feature selection based ensemble classification of liver lesions. Imag Sci J 2023; 71(6): 518–536.

16. Zuo, Zhao, Chen, et al. Multi-focus image fusion algorithm based on random features embedding and ensemble learning. Opt Express 2022; 30(5): 8234–8247.

17. Fei, Hassan, Xiao, et al. UAV-based multi-sensor data fusion and machine learning algorithm for yield prediction in wheat. Precis Agric 2023; 24(1): 187–212.

18. Hou, Liu, Wang, et al. Load forecasting combining phase space reconstruction and stacking ensemble learning. IEEE Trans Ind Appl 2022; 59(2): 2296–2304.

19. Allaj, Sanfelici. Early Warning Systems for identifying financial instability. Int J Forecast 2023; 39(4): 1777–1803.

20. Rhmann, Ishrat. Imbalanced data preprocessing model for web service classification. Int J Syst Assur Eng Manag 2024; 5(10): 4825–4837.

21. Ren, Tan, et al. Fault diagnosis of HVAC system with imbalanced data using multi-scale convolution composite neural network. Building Simulation 2024; 17(3): 371–386. DOI: 10.1007/s12273-023-1086-1.

22. Tariq, Sargano, Iftikhar, et al. Comparing different oversampling methods in predicting multi-class educational datasets using machine learning techniques. Cybern Inf Technol 2023; 23(4): 199–212.

23. Rhodes, Cutler, Moon. Geometry- and accuracy-preserving random forest proximities. IEEE Trans Pattern Anal Mach Intell 2023; 45(9): 10947–10959.

24. Jain, Rastogi. Parametric non-parallel support vector machines for pattern classification. Mach Learn 2024; 113(4): 1567–1594.

25. Dong, Guo, Wang. GBDT-based multivariate structural stress data analysis for predicting the sinking speed of an open caisson foundation. Georisk 2024; 18(2): 333–345.

26. Kosaka, Wandale, Ichige. RSSI-based indoor localization using two-step XGBoost. IEICE Commun Express 2023; 12(12): 647–650.

27. Mekbib, Cai, et al. Reproducibility and sensitivity of resting‐state fMRI in patients with Parkinson’s disease using cross validation‐based data censoring. J Magn Reson Imag 2024; 59(5): 1630–1642.

28. Liu, Zhou, Sun. The influence mechanism of real estate enterprises’ status on debt default risk. J Property Invest Finance 2024; 42(1): 28–49.

29. Gao, Balyan. Construction of a financial default risk prediction model based on the LightGBM algorithm. J Intell Syst 2022; 31(1): 767–779.

30. Liu. Research of Dempster-Shafer’s theory and ensemble classifier financial risk early warning model based on Benford’s law. Comput Econ 2024; 65(6): 1–29.