Financial distress prediction using SVM ensemble based on earnings manipulation and fuzzy integral

Abstract

Financial distress prediction (FDP) has received considerable attention from both practitioners and researchers. This paper proposes a novel support vector machine (SVM) classifier ensemble framework based on earnings manipulation and fuzzy integral for FDP named SEMFI. We use financial data from the previous three years to predict companies’ current financial situation and divide the companies in each year into different categories according to whether they manipulate the earnings. Then, SVM is trained on different categories. The outputs of SVM are combined by fuzzy integral which adopts a new fuzzy measure determination method. This method considers the fact that recent financial data are more valuable for FDP. Additionally, when using the model trained by historical data for FDP, the external environment may have dramatically changed. Therefore, a fuzzy measure dynamic adjustment method is proposed by considering the confidence of each single classifier’s output, the consistency between each single classifier’s output and the diversity among classifiers. To verify the performance of SEMFI, an empirical study using real financial data is conducted. The results indicate that the introduction of earnings manipulation, the new fuzzy measure determination and dynamic adjustment method to FDP can significantly enhance the prediction performance.

Keywords

Financial distress prediction, support vector machine, classifier ensemble, earnings manipulation, fuzzy integral

1. Introduction

The increasing volatility of markets signifies the importance of financial distress prediction (FDP) for companies. Financial distress can cause serious adverse effects on the development of companies. Financial distress leads to significant losses for investors and negatively affects the entire society. Therefore, FDP receives considerable attention from both companies and researchers.

Many methods have been developed to study FDP. Statistical techniques, including single ratio analysis [45], multiple discriminant analysis [11, 15] and logistic regression [4], were first used for FDP. Later, with the rapid development of artificial intelligence, neural networks [30], decision trees [17], support vector machines (SVM) [29] and case-based reasoning [20] were applied to FDP. SVM is one of the most useful tools to predict financial distress. The main concept of an SVM is to map the input vectors into a high-dimensional space using a kernel function and then find an optimal separating hyperplane to classify the samples [5]. As a relatively new machine learning method, the SVM has been widely used in the field of FDP in recent years because of its advantages in handling small-sample, nonlinear and high-dimensional problems [16]. Sun and Hui [21] applied the SVM to FDP with Chinese listed companies. Chen and Hsiao [29] proposed an SVM method based on a genetic algorithm for their FDP model. Bae [23] developed an FDP model for the South Korean manufacturing industry based on a radial basis function (RBF) SVM. Li et al. [14] constructed an SVM model for FDP by using statistical indices of ranking-order information. All of these studies show that the SVM has satisfactory prediction performance in FDP.

Early studies mainly used single classifiers for distress prediction in the field of FDP. In recent years, a number of studies have examined the classifier ensemble approaches. Because different classifiers may have different prediction performances, classifier ensembles can utilize various single classifiers and perform better than single classification techniques [39]. A classifier ensemble first designs a number of single classifiers and then combines the outputs of different classifiers to obtain the final result. Sun and Li [22] used a weighted majority voting combination of multiple classifiers for FDP. Kim and Kang [35] proposed two neural network ensemble models by using bagging and boosting for bankruptcy prediction. Both methods are based on the assumption that there is no interaction between single classifiers. However, this assumption is invalid in many real problems [36]. The prediction accuracy of classifier ensembles may decrease if the interaction between classifiers is neglected [8]. To overcome this weakness, classifier ensembles based on fuzzy integrals have received considerable attention. The interaction between classifiers can be measured accurately by a fuzzy measure. Kim and Cho [26] presented a novel data fusion approach for web content mining using the fuzzy integration of structure adaptive SOMs. Verikas and Lipnickas [2] proposed an ensemble method based on a fuzzy integral and neural network for FDP. Abdallah et al. [3] applied a method to the problem of landmine detection by applying a fuzzy integral to different regions of the feature space. All of these studies show that a fuzzy integral outperforms other ensemble methods that do not account for the interactions between single classifiers.

A key issue in the application of a fuzzy integral is the determination of the fuzzy measure. The overall recognition accuracy of each single classifier on the training dataset is typically regarded as the fuzzy measure. Li et al. [44] proposed an ensemble model by using heterogeneous features and a fuzzy integral in which the prediction accuracy of the classifier over the training dataset was used to define the fuzzy measure. Xiong et al. [13] proposed an effective method for fault diagnosis based on the RBF neural network and a fuzzy integral in which prediction accuracy was used to define the fuzzy measure. Another commonly used method is confusion matrices. In this case, the prediction accuracy of each single classifier with respect to each class is used to determine the fuzzy measure. Pizzi and Pedrycz [37] used confusion matrices to compute the fuzzy measure and combined the classifiers with a fuzzy integral. Cao [48] applied confusion matrices to compute a prior fuzzy measure and proposed a Choquet integral-based classifier ensemble method for FDP. However, these two methods share a common problem, namely, that the computation of the fuzzy measure is time consuming. For example, if the model has n classifiers, $2^{n}$ parameters are required to determine the fuzzy measure. Yeung et al. [10] showed that a linear programming problem by minimizing the quadratic error between the desired output and actual output of a classifier was an elegant solution. However, this method was sensitive to the values chosen for the desired outputs [1]. In addition, the financial data of companies have an important feature, namely, that the recent data are more useful for evaluating the current financial situation. The above three methods do not consider the effects of this feature. 
In this study, a new approach that attempts to minimize the misclassification of an ensemble by considering the features of financial data and removing the need to set desired outputs is proposed to determine the fuzzy measure.

In fuzzy integral ensemble models, the fuzzy measure of each single classifier is typically based on its performance in the training stage. In the prediction stage, these initial fuzzy measures remain unchanged. However, the classifier that performs well in the training stage may perform poorly in the prediction stage. If the initial fuzzy measures are not adjusted according to the prediction condition, the accuracy of the ensemble model may deteriorate [7]. In many previous studies, the confidence of each single classifier’s output and the consistency between each of the two classifiers’ outputs were considered to adjust the initial fuzzy measures in the classifier ensembles. Cao [48] proposed a dynamically adjusted fuzzy integral to combine different classifiers for FDP and improved the performance of the recognition model. The performance of the classifier ensemble system depends on not only the performances of single classifiers but also the diversity among classifiers [8]. In this study, diversity is considered to adjust the proposed fuzzy measure.

Previous studies ignore an important factor for FDP: earnings manipulation. Earnings manipulation is the act of managers intentionally altering financial statements to maximize their own interests [38]. Managers have many motivations to manipulate earnings, such as capital market incentives and meeting a regulatory threshold [24]. Previous studies have shown that earnings manipulation is common among companies [43]. Because earnings manipulation alters the financial statement, certain financial data in the statement may be contaminated [12]. Therefore, the financial statement of a company that manipulates its earnings may differ from that of a company that does not. When using classifier ensemble models, it is inappropriate to directly combine the outputs of single classifiers without considering the different characteristics of the companies’ financial data. However, to the best of our knowledge, earnings manipulation has not been considered in classifier ensemble models for FDP thus far.

The objective of this research is to propose a novel framework of SVM classifier ensembles based on earnings manipulation and a fuzzy integral to improve the prediction performance of FDP, called SEMFI. The contributions of this work are three-fold. First, earnings manipulation is introduced to divide companies into different categories for training by single SVM classifiers. Second, the ensemble strategies of single classifiers are studied based on different situations in which the company manipulates earnings in the previous three years. Third, a new fuzzy measure determination and dynamic adjustment method is proposed to obtain the final ensemble results.

The remainder of this paper is organized as follows. Section 2 is devoted to the methodologies, including earnings manipulation, the support vector machine and the fuzzy integral. Section 3 describes the SEMFI framework. Section 4 discusses the determination and dynamic adjustment of a fuzzy measure. Section 5 illustrates the experimental results and analysis. Section 6 summarizes the major findings and proposes future research.

2. Literature review

2.1 Earnings manipulation

Earnings manipulation has received considerable attention since the 1980s and has evolved into an important subfield of accounting research. According to [38], earnings manipulation is the act of managers intentionally altering financial statements, either to mislead stakeholders regarding the real financial situation of their companies or to improve contractual outcomes that depend on reported financial performance.

The motivations for managers to manipulate the earnings are varied [24]. Managers may want to obtain desirable stock valuations in the capital market through such manipulation. Managers can also use this manipulation to avoid a reduction of their earnings-based bonuses. In addition, in China, the managers of listed companies manipulate the earnings to meet the regulatory thresholds to avoid special treatment or being delisted.

Because of the existence of earnings manipulation, data in the financial statements will be influenced, and the financial statements may not reveal the real financial situation of the companies. This issue will cause losses to the investors and make the market unstable [12]. Hence, detecting the earnings manipulation of companies is important and necessary.

Many models have been developed to determine whether a company manipulates the earnings. This study adopts the widely used modified-Jones model, which has suitable performance [41]. This model divides the total accruals into discretionary accruals, which are used to measure the earnings manipulation of the companies, and nondiscretionary accruals. Following Kothari’s adaptation of the modified-Jones model, this study includes an intercept term, $\alpha_{0}$ , for better performance [40]. The equation of the model is as follows:

$\displaystyle\frac{\textit{NDA}_{j,t}}{\textit{TA}_{j,t-1}}=\alpha_{0}+\alpha_{1}\left(\frac{1}{\textit{TA}_{j,t-1}}\right)+\alpha_{2}\left(\frac{\Delta\textit{REV}_{j,t}-\Delta\textit{REC}_{j,t}}{\textit{TA}_{j,t-1}}\right)+\alpha_{3}\left(\frac{\textit{PPE}_{j,t}}{\textit{TA}_{j,t-1}}\right)$ (1)

where $\textit{NDA}_{j,t}$ is the nondiscretionary accruals of firm $j$ in year $t$ ; $\textit{TA}_{j,t-1}$ is the total assets of firm $j$ at the end of year $t-1$ ; $\Delta\textit{REV}_{j,t}$ is the change in sales of firm $j$ between years $t-1$ and $t$ ; $\textit{PPE}_{j,t}$ is the gross value of property, plant and equipment of firm $j$ in year $t$ ; $\Delta\textit{REC}_{j,t}$ is the change in accounts receivable of firm $j$ between years $t-1$ and $t$ .

$\alpha_{i}(i=0,\ldots,3)$ are the characteristic parameters, which can be obtained by regression analysis on the data of each year and industry. The total accruals equal the net profit minus the cash flows from operations. The nondiscretionary accruals are obtained by Eq. 1. The discretionary accruals are then obtained by subtracting the nondiscretionary accruals from the total accruals. In addition, all the financial data above are scaled by total assets [41]. If the discretionary accruals of a company are zero, the company is regarded as not manipulating its earnings; otherwise, it is regarded as manipulating its earnings.
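As a rough illustration of how Eq. 1 can be used, the following sketch estimates $\alpha_{0},\ldots,\alpha_{3}$ by ordinary least squares for one industry-year and then computes the scaled discretionary accruals as total accruals minus fitted nondiscretionary accruals. The field names (`ta_prev`, `d_rev`, `d_rec`, `ppe`, `total_accruals`), the helper functions, and the choice to fit the regression on exactly the regressors of Eq. 1 are assumptions made for this example; they are not details taken from the study.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def regressors(firm):
    """Right-hand-side terms of Eq. 1 for one firm-year (hypothetical field names)."""
    ta_prev = firm["ta_prev"]
    return [1.0,
            1.0 / ta_prev,
            (firm["d_rev"] - firm["d_rec"]) / ta_prev,
            firm["ppe"] / ta_prev]


def fit_alphas(firms):
    """Ordinary least squares of scaled total accruals on the Eq. 1 regressors."""
    xs = [regressors(f) for f in firms]
    ys = [f["total_accruals"] / f["ta_prev"] for f in firms]
    k = 4
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear(xtx, xty)


def discretionary_accruals(firm, alphas):
    """Scaled total accruals minus the nondiscretionary accruals from Eq. 1."""
    nda = sum(a * x for a, x in zip(alphas, regressors(firm)))
    return firm["total_accruals"] / firm["ta_prev"] - nda
```

In practice, estimated discretionary accruals are rarely exactly zero, so a small tolerance around zero may be needed when applying the zero test described above; that tolerance would be a further assumption.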

2.2 Support vector machine

The SVM is a classification method based on statistical learning theory. The SVM was first proposed by Vapnik in the 1990s and was subsequently rapidly developed by researchers. It is based on the structural risk minimization principle [5]. The SVM is widely used in many fields, such as pattern recognition, regression estimation and FDP, because of its advantages in handling small-sample, nonlinear and high-dimensional pattern recognition problems [21, 42].

Let the training data be $T=\{(x_{1},y_{1}),\ldots,(x_{l},y_{l})\}$ , where $x_{i}$ is the $i$ th input vector, and $y_{i}$ is the corresponding label, $x_{i}\in R^{n}$ and $y_{i}\in\{-1,+1\}$ . For the linear binary classification problem, the SVM classifier is trained to find the optimal separating hyperplane: $w^{T}x+b=0$ , which can not only perfectly classify the two classes but also maximize the classification margin of the two classes. When the training set is linearly separable, the constraint optimization model is as follows:

$\displaystyle\min\limits_{w,b}\ \frac{1}{2}\|w\|^{2}\quad\text{s.t.}\quad y_{i}(w^{T}x_{i}+b)\geqslant 1,\quad i=1,\ldots,l$ (2)

However, when the training set is linearly inseparable, the optimal separating hyperplane cannot classify all of the samples correctly and subsequently leads to misclassifications; therefore, the constraint optimization model is rewritten as follows:

$\displaystyle\min\limits_{w,\xi,b}\ \frac{1}{2}\|w\|^{2}+C\sum\limits_{i=1}^{l}\xi_{i}\quad\text{s.t.}\quad y_{i}(w^{T}x_{i}+b)\geqslant 1-\xi_{i},\quad\xi_{i}\geqslant 0,\quad i=1,\ldots,l$ (3)

where $\xi$ is the slack variable that represents the situation of misclassification and $C(C>0)$ is the penalty parameter, which is added to the model to minimize misclassification.

For a nonlinear binary classification problem, the input vectors are mapped into a high-dimension feature space through a mapping function $\Phi(x)$ such that the samples can be classified as a linear problem in this space. The kernel function is defined as:

$\displaystyle K(x_{i},x_{j})=\langle\Phi(x_{i}),\Phi(x_{j})\rangle$ (4)

where $x\in\mathscr{X}$ , $\Phi(x)\in\mathscr{H}$ ; $\mathscr{X}$ is a subset of $n$ -dimensional real space $R^{n}$ ; $\mathscr{H}$ is the Hilbert space and $\langle\cdot,\cdot\rangle$ denotes the inner product.

Any kernel function that satisfies Mercer’s condition corresponds to an inner product and can be used in the SVM. The most widely used kernel functions are the linear kernel, the polynomial kernel, and the RBF kernel.

Then, the classification hyperplane for the nonlinear binary classification problem is $w^{T}\Phi(x)+b=0$ . According to [29], finding the optimal classification hyperplane for the nonlinear binary classification problem is equivalent to solving the following optimization problem:

$\displaystyle\min\limits_{w,\xi,b}\ \frac{1}{2}\|w\|^{2}+C\sum\limits_{i=1}^{l}\xi_{i}\quad\text{s.t.}\quad y_{i}[w^{T}\Phi(x_{i})+b]\geqslant 1-\xi_{i},\quad\xi_{i}\geqslant 0,\quad i=1,\ldots,l$ (5)

Then the dual formulation of the above optimization problem is as follows:

$\displaystyle\min\limits_{\alpha}\ \frac{1}{2}\sum\limits_{i=1}^{l}\sum\limits_{j=1}^{l}\alpha_{i}\alpha_{j}y_{i}y_{j}\langle\Phi(x_{i}),\Phi(x_{j})\rangle-\sum\limits_{j=1}^{l}\alpha_{j}\quad\text{s.t.}\quad\sum\limits_{i=1}^{l}y_{i}\alpha_{i}=0,\quad 0\leqslant\alpha_{i}\leqslant C,\quad i=1,2,\ldots,l$ (6)

By solving the dual formulation, we obtain the optimal solution $\alpha^{*}$ and then the optimal hyperplane:

$\displaystyle w^{*T}\Phi(x)+b^{*}=0$ (7)

The final decision function is

$\displaystyle f(x)=\operatorname{sgn}(w^{*T}\Phi(x)+b^{*})$ (8)

where $w^{*}=\sum\nolimits_{i=1}^{l}y_{i}\alpha_{i}^{*}\Phi(x_{i})$ and $b^{*}=y_{j}-\sum\nolimits_{i=1}^{l}y_{i}\alpha_{i}^{*}\langle\Phi(x_{i}),\Phi(x_{j})\rangle$ for any index $j$ with $0<\alpha_{j}^{*}<C$ .
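To make the role of the kernel in Eq. 8 concrete, the sketch below evaluates the decision function in its kernel form, $f(x)=\operatorname{sgn}\left(\sum_{i}y_{i}\alpha_{i}^{*}K(x_{i},x)+b^{*}\right)$ , which follows from substituting $w^{*}$ into Eq. 8. The function names and the kernel parameters (`degree`, `coef0`, `gamma`) are illustrative assumptions, and the dual solution `alphas` is taken as given rather than computed by a solver.

```python
import math


def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, degree=2, coef0=1.0):
    return (linear_kernel(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def bias(xs, ys, alphas, j, kernel):
    """b* computed from training point j, assumed to satisfy 0 < alpha_j < C."""
    return ys[j] - sum(y * a * kernel(x, xs[j]) for x, y, a in zip(xs, ys, alphas))


def decide(x, xs, ys, alphas, b, kernel):
    """Sign of sum_i y_i * alpha_i * K(x_i, x) + b (kernel form of Eq. 8)."""
    score = sum(y * a * kernel(xi, x) for xi, y, a in zip(xs, ys, alphas)) + b
    return 1 if score >= 0 else -1


# Toy one-dimensional problem whose dual solution is known in closed form:
# two points at +1 and -1 with a linear kernel give alpha = (0.5, 0.5).
xs, ys, alphas = [(1.0,), (-1.0,)], [1, -1], [0.5, 0.5]
b = bias(xs, ys, alphas, j=0, kernel=linear_kernel)
label = decide((2.0,), xs, ys, alphas, b, linear_kernel)
```

Swapping `linear_kernel` for `rbf_kernel` changes only the similarity function; the decision rule itself is unchanged, which is the practical point of the kernel formulation.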

2.3 Fuzzy integral

Classifier ensembles entail training a number of single classifiers and combining the outputs of these classifiers to obtain the final result for an unknown sample. Traditional ensemble methods, such as ordered weighted averaging (OWA), define the weight of each single classifier to represent its importance and obtain the final result by combining the weighted outputs of the classifiers. However, these methods assume that there is no interaction between the single classifiers, which is not realistic. In 1974, Sugeno introduced the concept of the fuzzy measure, which measures the importance of each single classifier and also expresses the interaction between classifiers [34]. Because of these advantages, a fuzzy integral, which is based on a fuzzy measure, has been used as the ensemble method in many studies [2, 3]. In this section, certain basic definitions of fuzzy measures and fuzzy integrals are briefly reviewed.

Let $X=\{x_{1},x_{2},\ldots,x_{n}\}$ be the set of classifiers, $P(X)$ be the power set of $X$ .

Definition 1. Let $X$ be a finite set. A fuzzy measure on $X$ is a set function $\mu:P(X)\to[0,1]$ , satisfying the following conditions:

(1) $\mu(\phi)=0,\mu(X)=1$ .
(2) If $A,B\in P(X)$ and $A\subseteq B$ , then $\mu(A)\leqslant\mu(B)$ .

For a subset $s\subseteq X$ , $\mu(s)$ is considered to be the importance of classifier set $s$ .

In previous studies, diverse fuzzy measures were proposed, such as the Sugeno $\lambda$ -measure and $k$ -order additive discrete fuzzy measures [33, 37, 48]. The Sugeno $\lambda$ -measure is used in this paper because of its computational simplicity and limited number of parameters.

Definition 2. (Sugeno $\lambda$ -measure): A fuzzy measure $\mu$ is called a Sugeno $\lambda$ -measure if, for all $A,B\subseteq X$ with $A\cap B=\phi$ , it satisfies:

$\displaystyle\mu(A\cup B)=\mu(A)+\mu(B)+\lambda\mu(A)\mu(B),\quad\lambda\in(-1,+\infty)$ (9)

For a subset with a single element $x_{i}$ , $\mu(\{x_{i}\})$ is called a fuzzy density and can be denoted as $\mu_{i}=\mu(\{x_{i}\})$ .

Given the fuzzy densities, $\lambda$ can be determined by solving the following equation:

$\displaystyle 1+\lambda=\prod_{i=1}^{n}(1+\lambda\mu_{i})$ (10)
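Eq. 10 generally has no closed-form solution for more than two classifiers, so $\lambda$ is usually found numerically. The sketch below uses bisection; the function name `solve_lambda`, the tolerance used to detect the additive case, and the restriction to at least two densities strictly inside $(0,1)$ are assumptions of this example. It relies on the known sign pattern of the root: $\lambda\in(-1,0)$ when the densities sum to more than 1, $\lambda>0$ when they sum to less than 1, and $\lambda=0$ when they sum to exactly 1.

```python
from math import prod


def solve_lambda(densities, iterations=200):
    """Root of 1 + lam = prod(1 + lam * mu_i) with lam > -1 (Eq. 10)."""
    if len(densities) < 2 or not all(0.0 < m < 1.0 for m in densities):
        raise ValueError("need at least two densities, each strictly in (0, 1)")
    total = sum(densities)
    if abs(total - 1.0) < 1e-6:
        return 0.0  # densities (numerically) sum to 1: the measure is additive

    def g(lam):
        return prod(1.0 + lam * m for m in densities) - (1.0 + lam)

    if total > 1.0:
        lo, hi = -1.0, -1e-8   # g(lo) > 0 > g(hi): root lies in (-1, 0)
    else:
        lo, hi = 1e-8, 1.0     # g(lo) < 0; grow hi until g(hi) > 0
        while g(hi) <= 0.0:
            hi *= 2.0
    sign_lo = g(lo) > 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0.0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For two classifiers the result can be checked by hand: with densities 0.6 and 0.7, Eq. 10 reduces to $0.42\lambda^{2}+0.3\lambda=0$ , whose nonzero root is $\lambda=-0.3/0.42$ .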

As previously noted, a fuzzy integral, which is based on a fuzzy measure, plays an important role in the study of classifier ensembles. Many different fuzzy integral methods have been proposed, such as the Sugeno integral, the Choquet integral and the upper integral [49]. Among these fuzzy integral methods, the Choquet integral has been proven to be a useful tool to combine the results of single classifiers [31], and it is applied in this study.

Definition 3. Let $\mu$ be a fuzzy measure on $X$ . The Choquet integral of function $f:X\to[0,1],f(x_{i})=f_{i},i=1,\ldots,n$ , with respect to $\mu$ is defined by:

$\displaystyle C_{\mu}(f)=\sum_{i=1}^{n}\mu(A_{(i)})(f_{(i)}-f_{(i-1)})$ (11)

where $(i)$ indicates a permutation on $X$ such that $0\leqslant f_{(1)}\leqslant f_{(2)}\leqslant\ldots\leqslant f_{(n)}$ , $f_{(0)}=0$ and $A_{(i)}=\{x_{(i)},\ldots,x_{(n)}\}$ .
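The definitions above can be combined into a short computation. The following sketch evaluates Eq. 11 for a Sugeno $\lambda$ -measure, building each $\mu(A_{(i)})$ from the fuzzy densities with Eq. 9. The function name `choquet_lambda` and its argument layout are assumptions of this example, and `lam` is expected to satisfy Eq. 10 for the given densities so that $\mu(X)=1$ .

```python
def choquet_lambda(values, densities, lam):
    """Choquet integral (Eq. 11) of `values` w.r.t. a Sugeno lambda-measure."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # ascending f
    # Build mu(A_(i)) for A_(i) = {x_(i), ..., x_(n)}, from the last set backwards.
    tail_measures = [0.0] * len(order)
    running = 0.0  # measure of the empty set
    for pos in range(len(order) - 1, -1, -1):
        mu_i = densities[order[pos]]
        running = mu_i + running + lam * mu_i * running  # Eq. 9 for disjoint sets
        tail_measures[pos] = running
    total, previous = 0.0, 0.0  # `previous` starts as f_(0) = 0
    for pos, idx in enumerate(order):
        total += tail_measures[pos] * (values[idx] - previous)
        previous = values[idx]
    return total
```

As a hand check, take two classifiers with densities 0.6 and 0.7, $\lambda=-0.3/0.42$ , and outputs 0.8 and 0.4. Then $\mu(A_{(1)})=\mu(X)=1$ and $\mu(A_{(2)})=0.6$ , so the integral is $1\cdot 0.4+0.6\cdot(0.8-0.4)=0.64$ . With $\lambda=0$ the computation reduces to a weighted average of the outputs.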

3. SEMFI framework

Considerable research has shown that the SVM provides promising performance in classification problems [14, 23]. In the field of FDP, the SVM is widely used as a single classifier due to its excellent learning ability and generalization performance. A fuzzy integral is a strong reasoning method under conditions of uncertainty. The ensemble based on a fuzzy integral can address the interaction between classifiers, which is beyond the capabilities of many typical ensemble methods. The Choquet integral and Sugeno integral are two typical fuzzy integral methods. An important difference between them is that the Choquet integral uses all values of the single classifiers to compute the integral, whereas the Sugeno integral discards a considerable amount of the information. Many empirical studies have shown that the Choquet integral ensemble can utilize different single classifiers and provides satisfactory performance [2, 3, 26]. In addition, many studies indicate that earnings manipulation can lead to the distortion of financial data and thereby affect the characteristics of the financial statement, which would directly affect the accuracy of FDP. Based on the above analysis, we propose an SVM classifier ensemble framework based on earnings manipulation and the Choquet integral for FDP, called SEMFI, as shown in Fig. 1. For a specific FDP process, the new framework involves three steps.

Figure 1. SEMFI framework.

In step 1, an initial dataset, which consists of distressed companies and normal companies, is prepared as samples, and appropriate financial indicators are selected for FDP. When predicting the financial situation of a company in year $T$ , financial data of this company in years $T-1$ , $T-2$ and $T-3$ , which represent one previous year, two previous years and three previous years, respectively, are collected. Therefore, the sample set can be divided into three sets, datasets $S_{T-1}$ , $S_{T-2}$ and $S_{T-3}$ , which consist of the financial data of sample companies in years $T-1$ , $T-2$ and $T-3$ , respectively. Financial indicators have been considered by researchers as a major basis for FDP. An indicator system must be developed to summarize the financial situation of companies comprehensively. We will discuss this issue in Section 5.1.

In step 2, an earnings manipulation determination is conducted to divide each dataset into two subsets. Thus, each dataset $S_{k}(k=T-1,T-2,T-3)$ in step 1 can be classified into two subsets by determining whether the companies in the dataset manipulate the earnings: subset $S_{k}^{1}(k=T-1,T-2,T-3)$ , in which the companies manipulate their earnings in year ${k}$ , and subset $S_{k}^{0}(k=T-1,T-2,T-3)$ , in which the companies do not manipulate their earnings in year ${k}$ . Because earnings manipulation alters the data in the financial statements, the financial data of companies that manipulate the earnings are distorted and may have different characteristics compared with the financial data of companies that do not manipulate the earnings. Thus, it is reasonable to divide the companies in each year into two different categories according to whether they manipulate their earnings. Then, six SVM classifiers are trained on these different categories. $\text{SVM}_{k}^{0}(k=T-1,T-2,T-3)$ denotes the SVM classifier trained on the financial data of companies that do not manipulate the earnings in year ${k}$ , whereas $\text{SVM}_{k}^{1}(k=T-1,T-2,T-3)$ is the SVM classifier trained on the financial data of companies that manipulate the earnings in year ${k}$ .
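The partition in step 2 can be sketched as a simple grouping of firm-year records into the six subsets $S_{k}^{0}$ and $S_{k}^{1}$ , one per SVM classifier. The record keys `offset` and `da`, the function name, and the tolerance parameter `tol` are hypothetical and introduced only for this illustration; with `tol = 0` the rule mirrors the zero test on discretionary accruals from Section 2.1.

```python
def partition_by_manipulation(records, tol=0.0):
    """Split firm-year records into subsets keyed by (year offset, manipulation flag).

    Each record is a dict with hypothetical keys "offset" (1, 2 or 3 for years
    T-1, T-2, T-3) and "da" (discretionary accruals). A firm-year counts as
    manipulating when |da| > tol; tol = 0 mirrors the zero test in Section 2.1.
    """
    subsets = {(k, flag): [] for k in (1, 2, 3) for flag in (0, 1)}
    for rec in records:
        flag = 1 if abs(rec["da"]) > tol else 0
        subsets[(rec["offset"], flag)].append(rec)
    return subsets
```

Each of the six lists returned here would serve as the training set of the corresponding classifier $\text{SVM}_{k}^{0}$ or $\text{SVM}_{k}^{1}$ .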

In step 3, the Choquet integral is applied to combine the outputs of single classifiers obtained in step 2 to obtain the final result. For a specific company, there are two strategies to combine the results of single classifiers in step 2 to obtain the final classification results based on whether the company manipulates earnings in years $T-1$ , $T-2$ and $T-3$ .

Strategy 1:

If a company manipulates the earnings in all three years (i.e., $T-1$ , $T-2$ and $T-3$ ), then the company’s financial data in the years $T-1$ , $T-2$ and $T-3$ are assigned to $\text{SVM}_{T-1}^{1}$ , $\text{SVM}_{T-2}^{1}$ and $\text{SVM}_{T-3}^{1}$ , respectively, and the classification result of each classifier is obtained. Then, the results of the three classifiers are directly combined by the Choquet integral to obtain the final prediction result, as shown in Fig. 2. If a company does not manipulate the earnings in any of the three years, then the company’s financial data in the years $T-1$ , $T-2$ and $T-3$ are assigned to $\text{SVM}_{T-1}^{0}$ , $\text{SVM}_{T-2}^{0}$ and $\text{SVM}_{T-3}^{0}$ , respectively. Similar to the procedure shown in Fig. 2, the results of the three classifiers are directly combined by the Choquet integral to obtain the final prediction result.

Figure 2. Classifier ensemble procedure of strategy 1.

Figure 3. Classifier ensemble procedure of strategy 2.

Strategy 2:

If a company manipulates the earnings in one of the three years, for example, in year $T-1$ , then the company’s financial data in the years $T-1$ , $T-2$ and $T-3$ are assigned to $\text{SVM}_{T-1}^{1}$ , $\text{SVM}_{T-2}^{0}$ and $\text{SVM}_{T-3}^{0}$ , respectively. $\text{SVM}_{T-2}^{0}$ and $\text{SVM}_{T-3}^{0}$ both handle years in which the company does not manipulate the earnings ( $T-2$ and $T-3$ ); thus, the results of these two single classifiers are combined first. Then, the combined result of $\text{SVM}_{T-2}^{0}$ and $\text{SVM}_{T-3}^{0}$ is combined with the result of $\text{SVM}_{T-1}^{1}$ to obtain the final prediction result of the company. The procedure of the combination is shown in Fig. 3. Similarly, if a company manipulates the earnings in two of the three years, the results of the two single SVM classifiers that handle the manipulated years are combined first. Then, this combined result is combined with the result of the classifier that handles the year without manipulation to obtain the final prediction result; the procedure of this combination is similar to that in Fig. 3.
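The choice between the two strategies depends only on the pattern of manipulation over the three years, so it can be written as a small lookup. The sketch below returns which classifiers are used and in what order their outputs are combined; the function name `ensemble_plan`, the string labels for the classifiers, and the dictionary layout of the result are assumptions made for illustration.

```python
def ensemble_plan(flags):
    """Return the combination plan for one company.

    flags maps the year offset k (1, 2, 3 for T-1, T-2, T-3) to True when the
    company manipulated its earnings in that year.
    """
    names = {k: f"SVM_T-{k}^{int(flags[k])}" for k in (1, 2, 3)}
    manipulated = [k for k in (1, 2, 3) if flags[k]]
    clean = [k for k in (1, 2, 3) if not flags[k]]
    if not manipulated or not clean:
        # Strategy 1: same behaviour in all three years, combine directly.
        return {"strategy": 1, "combine": [names[k] for k in (1, 2, 3)]}
    # Strategy 2: combine the two same-category classifiers first,
    # then combine that result with the remaining classifier.
    pair, single = (clean, manipulated) if len(clean) == 2 else (manipulated, clean)
    return {"strategy": 2,
            "first": [names[k] for k in pair],
            "then": names[single[0]]}
```

For the example in the text (manipulation only in year $T-1$ ), the plan combines the two non-manipulation classifiers first and then merges the result with the classifier for $T-1$ .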

If the look-back period is extended, for example, to 4 or 5 years, the number of possible combinations increases. For each combination, a corresponding ensemble strategy can be designed. Because older data are less useful for evaluating the current financial situation of companies, few studies use data from more than four years.

4. Determination and adjustment of the fuzzy measure

Before applying the Choquet integral to combine the outputs of single classifiers, the fuzzy measure of each classifier set must be determined. In the classifier ensemble process, a fuzzy measure can be viewed as the degree of importance of the classifier set. To improve the accuracy of classifier ensembles based on a fuzzy integral, it is necessary to determine the fuzzy measure appropriately. In this study, nonlinear programming that attempts to minimize the classification error of an ensemble is applied to determine the fuzzy measure. Furthermore, the method accounts for the differing value of financial data from different years in evaluating a company’s financial situation.

Grabisch and Nicolas [32] proposed a minimum classification error approach that is widely used for the determination of the fuzzy measure. They applied a mean square error (MSE) cost function to compute the cost of ensemble misclassification. Let $C=\{C_{1},\ldots,C_{N}\}$ be the set of classes, $X=\{x_{1},x_{2},\ldots,x_{n}\}$ be a set of $n$ classifiers, and $\Omega$ be the set of samples to be classified. For each $\omega\in\Omega$ , we define a function $f_{\omega}:X\to[0,1],f_{\omega}(x_{i})=f_{\omega i},i=1,\ldots,n$ . For the Choquet integral ensemble, the MSE cost function is

$\displaystyle E^{2}=\sum_{\omega\in C_{1}}(C_{\mu}(f_{\omega})-\alpha_{1})^{2}+\ldots+\sum_{\omega\in C_{N}}(C_{\mu}(f_{\omega})-\alpha_{N})^{2}$ (12)

where $\alpha_{i}\in[0,1](i=1,\ldots,N)$ denotes the desired output, which can be set subjectively, and $C_{\mu}(f_{\omega})$ denotes the Choquet integral of $f_{\omega}$ with respect to a fuzzy measure $\mu$ .

This function has a disadvantage, namely, that the prediction performance is sensitive to the desired outputs, and it is difficult to determine the ideal value of the desired outputs [1]. In this study, we apply the cost function based on the difference between the probabilities of each sample belonging to different classes, which removes the need to set desired outputs. These differences are called the dissimilarity measures.

If the actual class of $\omega$ is class $C_{i}$ , the dissimilarity measure of sample $\omega$ is calculated as follows:

$\displaystyle d_{i}(f_{\omega})=-C_{\mu_{i}}(f_{\omega})+\max_{j\neq i}C_{\mu_{j}}(f_{\omega})$ (13)

where $C_{\mu_{i}}(f_{\omega})$ and $C_{\mu_{j}}(f_{\omega})$ denote the possibilities of sample $\omega$ belonging to class $C_{i}$ and class $C_{j}$ , respectively. Then a loss function $l_{i}(f_{\omega})$ is introduced to compute the loss of misclassification of sample $\omega$ :

$\displaystyle l_{i}(f_{\omega})=\begin{cases}2\left(\dfrac{1}{1+e^{-\alpha d_{i}(f_{\omega})}}-\dfrac{1}{2}\right),&\alpha>0,\ d_{i}(f_{\omega})>0\\ 0,&d_{i}(f_{\omega})\leqslant 0\end{cases}$ (14)

In Eq. 14, the parameter $\alpha$ , which controls the slope of the loss function at its center, affects the computing speed of the proposed fuzzy measure determination algorithm. There is no generally agreed upon value of $\alpha$ ; we adopt 0.5 in this paper.

In this function, the losses of correctly classified samples are zero, and the losses of incorrectly classified samples are positive. Thus, only misclassified samples contribute to the cost. For a sample, the dissimilarity measure increases as the possibility of misclassification increases; thus, the loss also increases. With this loss function and dissimilarity measure, we use the following cost function to compute the cost of the classification error of the ensemble for $N$ classes:

$\displaystyle L=\sum_{\omega\in C_{1}}l_{1}(f_{\omega})+\ldots+\sum_{\omega\in C_{N}}l_{N}(f_{\omega})$ (15)
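Eqs. 13 to 15 translate directly into code. In the sketch below, `scores` holds the ensemble outputs $C_{\mu_{1}}(f_{\omega}),\ldots,C_{\mu_{N}}(f_{\omega})$ for one sample and `true_class` is the zero-based index of its actual class; these names, and the representation of the sample set as a list of `(scores, true_class)` pairs, are assumptions made for this example. The default `alpha=0.5` follows the value adopted in the text.

```python
import math


def dissimilarity(scores, true_class):
    """Eq. 13: best competing class score minus the true class score."""
    best_other = max(s for j, s in enumerate(scores) if j != true_class)
    return -scores[true_class] + best_other


def loss(d, alpha=0.5):
    """Eq. 14: zero for correctly classified samples, sigmoid-shaped otherwise."""
    if d <= 0:
        return 0.0
    return 2.0 * (1.0 / (1.0 + math.exp(-alpha * d)) - 0.5)


def total_cost(samples, alpha=0.5):
    """Eq. 15: sum of the losses over all (scores, true_class) pairs."""
    return sum(loss(dissimilarity(scores, c), alpha) for scores, c in samples)
```

Because $2\left(\frac{1}{1+e^{-x}}-\frac{1}{2}\right)=\tanh(x/2)$ , the loss of a misclassified sample is bounded by 1 and grows smoothly with the dissimilarity measure, so no desired output values need to be chosen.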

Financial data have a common feature, namely, that recent data are more useful for evaluating the current financial situation of the companies compared with older data. Thus, we should assign a higher importance to single classifiers trained on the recent data in the fuzzy integral ensemble. In other words, it is reasonable to provide higher fuzzy measures to single classifiers trained on recent data. Suppose that classifier $x^{(p)}$ is trained on financial data in year $p$ , and $x^{(q)}$ is trained on financial data in year $q$ . When $q$ is before $p$ , the fuzzy measures of classifier $x^{(p)}$ and classifier $x^{(q)}$ with respect to class $C_{i}$ should satisfy the following constraints:

$\displaystyle\mu_{i}(x^{(p)})>\mu_{i}(x^{(q)})$ (16)

Then, we obtain the following nonlinear programming problem:

$\displaystyle\begin{aligned}\min\quad&\sum_{\omega\in C_{1}}l_{1}(f_{\omega})+\ldots+\sum_{\omega\in C_{N}}l_{N}(f_{\omega})\\ \text{s.t.}\quad&\mu_{i}(x^{(p)})>\mu_{i}(x^{(q)}),\quad i=1,\ldots,N;\quad p>q\\ &0\leqslant\mu_{i}\leqslant 1\end{aligned}$ (17)

The fuzzy measure of each classifier set can be obtained by solving the above nonlinear programming problem.
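As a worked instance of constraint (16), suppose there is one single classifier per training year, trained on data from years $T-1$ , $T-2$ and $T-3$ as in the empirical analysis. Writing these classifiers as $x^{(T-1)}$ , $x^{(T-2)}$ and $x^{(T-3)}$ (a notation adopted here for illustration only), the pairwise constraints reduce to a single chain for each class $C_{i}$ :

$\displaystyle 0\leqslant\mu_{i}(x^{(T-3)})<\mu_{i}(x^{(T-2)})<\mu_{i}(x^{(T-1)})\leqslant 1$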

For FDP, the external environment may significantly change when using the model trained by historical data to predict the current financial situation of companies. If the initial fuzzy measure obtained in the training stage is fixed in the prediction stage, the prediction performance of the model may decline. Therefore, we should adjust the initial fuzzy measure obtained in the training stage when using the SEMFI shown in Fig. 1 to predict the financial situation of companies.

Two aspects must be considered when adjusting the fuzzy measure: the confidence of each single classifier’s output, which is represented by a confidence correction coefficient, $\alpha$ , and the consistency between each single classifier’s output, which is represented by a consistency correction coefficient, $\beta$ . The importance of the classifier will decrease when the confidence of the classifier output is smaller or the output consistency between one single classifier and the others is smaller [7, 48].

Suppose that $\mu^{k}$ denotes the initial fuzzy measure of single classifier, $x_{k}$ , which is obtained in the training stage. $\Psi$ represents the set of testing samples to be classified. The fuzzy measure adjustment can be summarized as follows.

For each sample $\varpi\in\Psi$ , assume that $[h_{k1},h_{k2},\ldots,h_{kN}]$ is the prediction result of classifier $x_{k}$ with respect to $\varpi$ , where $h_{kj}$ represents the possibility of $\varpi$ belonging to class $C_{j}$ , predicted by classifier $x_{k}$ . The confidence correction coefficient $\alpha^{k}$ and consistency correction coefficient $\beta^{k}$ of classifier $x_{k}$ with respect to the sample $\varpi$ can be obtained as follows:

$\displaystyle\alpha^{k}=a_{kl}-\sup_{j,j\neq l}(h_{kj})$ (18)

$\displaystyle\beta^{k}=\frac{\sum\nolimits_{j,j\neq k}\gamma^{kj}}{n}$ (19)

where $a_{kl}=\max(h_{ki})(i=1,\ldots,N)$ ; $\gamma^{kj}$ represents the consistency between the output of classifier $x_{k}$ and the output of classifier $x_{j}$ . Suppose that classifier $x_{k}$ classifies the sample $\varpi$ into class $C_{X}$ and classifier $x_{j}$ classifies the sample $\varpi$ into class $C_{Y}$ ; then $\gamma^{kj}$ can be computed as follows:

$\displaystyle\gamma^{kj}=\begin{cases}1-\lvert h_{kX}-h_{jY}\rvert,&X\neq Y\\ 1,&X=Y\end{cases}$ (20)

The final fuzzy measure $\mu_{\varpi}^{k}$ of classifier $x_{k}$ for sample $\varpi$ can be obtained by:

$\displaystyle\mu_{\varpi}^{k}=\mu^{k}\times\alpha^{k}\times\beta^{k}$ (21)
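Eqs. 18 to 21 can be traced for a single test sample with the following sketch. All function names are hypothetical. Each classifier is assumed to assign the sample to the class with the largest output, and the divisor $n$ in Eq. 19 is assumed to be the number of classifiers unless another value is passed in.

```python
def confidence(h_k):
    """Eq. 18: largest output minus the largest of the remaining outputs."""
    ordered = sorted(h_k, reverse=True)
    return ordered[0] - ordered[1]


def consistency(h_k, h_j):
    """Eq. 20: consistency between the outputs of two classifiers."""
    x = max(range(len(h_k)), key=h_k.__getitem__)  # class chosen by x_k
    y = max(range(len(h_j)), key=h_j.__getitem__)  # class chosen by x_j
    if x == y:
        return 1.0
    return 1.0 - abs(h_k[x] - h_j[y])


def adjusted_measures(mu, outputs, n=None):
    """Eq. 19 and Eq. 21: adjust each initial fuzzy measure for one sample.

    mu[k] is the initial measure of classifier k and outputs[k] is its
    vector of class possibilities for the sample.
    """
    m = len(outputs)
    n = m if n is None else n  # assumption: n is the number of classifiers
    adjusted = []
    for k in range(m):
        alpha_k = confidence(outputs[k])
        beta_k = sum(
            consistency(outputs[k], outputs[j]) for j in range(m) if j != k
        ) / n
        adjusted.append(mu[k] * alpha_k * beta_k)
    return adjusted
```

For example, with three classifiers whose outputs are `[0.9, 0.1]`, `[0.8, 0.2]` and `[0.3, 0.7]`, the third classifier disagrees with the other two and is also the least confident, so its measure is reduced the most relative to its initial value.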

However, the performance of the classifier ensemble system depends not only on the performances of the single classifiers but also on the diversity among classifiers [8]. When classifier $x_{k}$ is dissimilar to other classifiers and the output of classifier $x_{k}$ is consistent with other classifiers, it is reasonable to assign a higher degree of importance to classifier $x_{k}$ . A dissimilarity coefficient $d_{kj}$ is applied to measure the diversity between the classifier $x_{k}$ and the classifier $x_{j}$ .

Given $S$ samples, the disagreement measure proposed by Skalak [9] is used to measure the diversity between the classifier $x_{k}$ and the classifier $x_{j}$ :

$\displaystyle d_{kj}=\frac{S_{01}+S_{10}}{S}$ (22)

where $S_{01}$ represents the number of samples that are correctly classified by classifier $x_{k}$ and misclassified by classifier $x_{j}$ , $S_{10}$ represents the number of samples that are correctly classified by classifier $x_{j}$ and misclassified by classifier $x_{k}$ .
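The disagreement measure of Eq. 22 can be computed from two lists of per-sample correctness flags, as in this minimal sketch. The function name `disagreement` and the boolean input layout are assumptions made for illustration.

```python
def disagreement(correct_k, correct_j):
    """Eq. 22: share of samples on which exactly one classifier is correct.

    correct_k[s] is True when classifier x_k classifies sample s correctly;
    correct_j is the same for classifier x_j.
    """
    if len(correct_k) != len(correct_j) or not correct_k:
        raise ValueError("inputs must be non-empty and of equal length")
    s01 = sum(1 for a, b in zip(correct_k, correct_j) if a and not b)
    s10 = sum(1 for a, b in zip(correct_k, correct_j) if b and not a)
    return (s01 + s10) / len(correct_k)
```

The value is 0 when the two classifiers are right and wrong on exactly the same samples and 1 when exactly one of them is right on every sample.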

According to the method applied by Si et al. [28], the consistency correction coefficient $\beta^{k}$ of classifier $x_{k}$ for the sample $\varpi$ can be rewritten as $\tilde{\beta}^{k}$ :

$\displaystyle\tilde{\beta}^{k}=\frac{\sum\nolimits_{j,j\neq k}(\gamma^{kj})^{d_{kj}}}{n}$ (23)

Thus, the final fuzzy measure $\tilde{\mu}_{\varpi}^{k}$ of classifier $x_{k}$ for sample $\varpi$ can be adjusted as:

$\displaystyle\tilde{\mu}_{\varpi}^{k}=\mu^{k}\times\left[a_{kl}-\sup_{j,j\neq l}(h_{kj})\right]\times\frac{\sum\nolimits_{j,j\neq k}(\gamma^{kj})^{d_{kj}}}{n}$ (24)

where $\mu^{k}$ denotes the initial fuzzy measure of classifier $x_{k}$ .
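A minimal sketch of Eq. 24 for one classifier and one sample follows. It assumes that the confidence term, the consistency values $\gamma^{kj}$ and the dissimilarity coefficients $d_{kj}$ have already been computed. The function name and argument layout are hypothetical.

```python
def diversity_adjusted_measure(mu_k, alpha_k, gammas, dissimilarities, n):
    """Eq. 24: initial measure times confidence times the Eq. 23 coefficient.

    gammas[j] and dissimilarities[j] are the consistency and dissimilarity
    of classifier x_k with respect to each other classifier x_j (j != k);
    n is the divisor used in Eq. 23.
    """
    if len(gammas) != len(dissimilarities):
        raise ValueError("one dissimilarity is needed per consistency value")
    beta_tilde = sum(g ** d for g, d in zip(gammas, dissimilarities)) / n
    return mu_k * alpha_k * beta_tilde
```

When every $\gamma^{kj}$ and $d_{kj}$ lies in $[0,1]$, each term $(\gamma^{kj})^{d_{kj}}$ is at least $\gamma^{kj}$ , so for the same inputs this sketch never returns a value below the one given by Eq. 21.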

5. Empirical analysis

5.1 The dataset and indicators

Listed manufacturing companies on the Chinese stock exchange are selected to verify the prediction performance of SEMFI in this paper. To protect the investors and the market, the China Securities Regulatory Commission implements special treatment mechanisms for listed companies in poor financial situations. Consistent with most previous research methods for the FDP of Chinese listed companies, in this paper, we regard the special treatment companies as distressed companies and regard the non-special treatment companies as normal companies [21, 22]. According to [46], special treatment companies include the companies that are labeled ST, *ST or S*ST: companies that have negative net profits for two consecutive years are labeled ST; companies that suffer losses for three consecutive years are labeled *ST; and companies that suffer losses for four consecutive years are labeled S*ST. S*ST companies have the worst financial situations, and *ST companies have a worse financial situation than ST companies. The 116 manufacturing companies, which were determined to be ST, *ST or S*ST for the first time during the 2007–2013 period, are selected as the distressed companies. In addition, 116 manufacturing companies, which are not labeled ST, *ST or S*ST during the same period, are selected as the normal companies to match the distressed companies by the ratio of 1:1.

Among the 232 manufacturing companies, 79 distressed companies and 79 normal companies are randomly selected for training data, and the remaining 37 distressed companies and 37 normal companies are regarded as testing data. Financial data in years $T-1$ , $T-2$ and $T-3$ are used to predict financial situation in year $T$ . All of the data are collected from the iFinD database.

The financial indicator system is an important aspect of FDP and has a considerable impact on the performance of the FDP model. Financial indicators are typically divided into many categories, such as profitability, liability and market value, and relevant indicators are selected from these categories for FDP by the researchers [19, 47]. To comprehensively summarize the financial situation of companies and consider the particularity of the Chinese stock market and accounting system, 32 financial indicators from 5 categories are selected to develop the indicator system. The indicator system is shown in Table 1.

Table 1
Financial indicator system

Category	Financial indicators
Debt paying ability ratios	Current ratio, quick ratio, cash ratio, interest coverage ratio, asset-liability ratio, equity multiplier, equity ratio.
Growth ratios	Capital preservation increment rate, growth rate of fixed assets, growth rate of total assets, growth rate of net profit, growth rate of operating profit, growth rate of operating income, growth rate of owners’ equity.
Activity ratios	Accounts receivable turnover, inventory turnover, accounts payable turnover, current assets turnover, fixed assets turnover, total assets turnover, equity turnover.
Market value ratios	Earnings per share, net asset value per share, operating earnings per share, price earnings ratio, price to book ratio, price-to-sales ratio, Tobin’s Q.
Profitability ratios	Operating profit ratio, rate of return on total assets ratio, net profit to fixed assets, return on assets.

5.2 Results analysis

First, we design three SVM classifiers that are trained on financial data in years $T-1$ , $T-2$ and $T-3$ , respectively: SVM ${}_{T-1}$ , SVM ${}_{T-2}$ and SVM ${}_{T-3}$ , to test the prediction performances of single SVM classifiers trained on financial data in different years. Libsvm is used to train these single SVM classifiers in this study [6]. RBF is selected as the kernel function for the SVM classifiers, and the optimal parameters $c$ and $\delta$ for the RBF kernel are determined by the grid search method.

To test the necessity of considering earnings manipulation in this paper, the prediction results are split into two situations. Situation 1 considers earnings manipulation, and Situation 2 does not consider earnings manipulation. The classification procedures of the SVM classifiers in situation 1 and situation 2 are shown in Figs 4 and 5, respectively. The prediction results are presented in Table 2. The overall accuracy, Type I error and Type II error are used to measure the prediction performances of the SVM classifiers. The Type I error denotes the percentage of normal companies misclassified as distressed companies, whereas the Type II error denotes the percentage of distressed companies misclassified as normal companies.
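The three measures can be computed directly from the four cells of a confusion matrix, following the definitions above. The sketch below uses a hypothetical function name and argument order, and applies it to the counts reported in Table 2 for SVM ${}_{T-3}$ in situation 1.

```python
def fdp_metrics(dist_as_dist, dist_as_normal, normal_as_dist, normal_as_normal):
    """Return (overall accuracy, Type I error, Type II error) as fractions."""
    total = dist_as_dist + dist_as_normal + normal_as_dist + normal_as_normal
    accuracy = (dist_as_dist + normal_as_normal) / total
    # Type I: normal companies misclassified as distressed
    type_i = normal_as_dist / (normal_as_dist + normal_as_normal)
    # Type II: distressed companies misclassified as normal
    type_ii = dist_as_normal / (dist_as_dist + dist_as_normal)
    return accuracy, type_i, type_ii


# Counts for SVM T-3 in situation 1 (Table 2): 30, 7, 9 and 28.
acc, type_i, type_ii = fdp_metrics(30, 7, 9, 28)
print(f"{acc:.1%} {type_i:.1%} {type_ii:.1%}")  # 78.4% 24.3% 18.9%
```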

Figure 4.

Classification procedure of the SVM classifier in situation 1.

Figure 5.

Classification procedure of the SVM classifier in situation 2.

Figure 6.

The classification procedure of classifiers ensemble in situation 2.

Table 2 illustrates that the overall accuracy of SVM ${}_{T-1}$ is the highest and that of SVM ${}_{T-3}$ is the lowest in both situations 1 and 2. These results show that more recent data can describe the financial situation of companies more accurately in FDP. This finding means that we should assign a higher degree of importance to single classifiers trained on recent data compared with classifiers trained on older data. Thus, it is rational to provide higher fuzzy measures to the classifiers trained on more recent data in the proposed nonlinear programming method.

Table 2

Prediction results of SVM classifiers with a sample ratio of 1:1

		Situation 1						Situation 2
Model	Actual	Prediction class		Overall	Type I	Type II	Actual	Prediction class		Overall	Type I	Type II
	class	Distressed	Normal	accuracy	error	error	class	Distressed	Normal	accuracy	error	error
SVM ${}_{T-3}$	Distressed	30	7	78.4%	24.3%	18.9%	Distressed	27	10	74.3%	24.3%	27.0%
	Normal	9	28				Normal	9	28
SVM ${}_{T-2}$	Distressed	29	8	82.4%	13.5%	21.6%	Distressed	28	9	79.7%	16.2%	24.3%
	Normal	5	32				Normal	6	31
SVM ${}_{T-1}$	Distressed	28	9	83.8%	8.1%	24.3%	Distressed	27	10	81.1%	10.8%	27.0%
	Normal	3	34				Normal	4	33

Then, we compare the proposed framework with seven other classification methods. PAFI and CMFI denote fuzzy integral ensemble methods in which the fuzzy measure determination methods are based on the overall prediction accuracy and confusion matrices, respectively. OWA and DS denote classifier ensemble methods based on ordered weighted averaging and the Dempster-Shafer evidence theory, respectively. The Dempster-Shafer evidence theory, which was first proposed by Dempster and developed by Shafer, is an effective information combination method [50]. The logit, BP neural network (BPNN) and SVM classifiers are single classifiers that are widely used in FDP. Similar to previous studies, we use financial data in year $T-2$ to train these three types of single classifiers to avoid overestimating the prediction performance [18, 22].

To test the necessity of considering earnings manipulation in this paper, we also provide the prediction results of SEMFI and the above seven classification methods in two situations. In situation 1, the classification procedures of PAFI, CMFI, OWA and DS are similar to that of SEMFI, as shown in Fig. 1, and the classification procedures of the three kinds of single classifiers are similar to that in Fig. 4. In situation 2, the classification procedures of PAFI, CMFI, OWA, DS and SEMFI are shown in Fig. 6, and the classification procedures of the three kinds of single classifiers are similar to that in Fig. 5. The results are shown in Table 3.

Table 3

Prediction results of SEMFI and seven classification methods with a sample ratio of 1:1

		Situation 1						Situation 2
Model	Actual	Prediction class		Overall	Type I	Type II	Actual	Prediction class		Overall	Type I	Type II
	class	Distressed	Normal	accuracy	error	error	class	Distressed	Normal	accuracy	error	error
Logit	Distressed	28	9	77.0%	21.6%	24.3%	Distressed	27	10	74.3%	24.3%	27.0%
	Normal	8	29				Normal	9	28
BPNN	Distressed	27	10	79.7%	15.6%	27.0%	Distressed	24	13	73.0%	18.9%	35.1%
	Normal	5	32				Normal	7	30
SVM	Distressed	29	8	82.4%	13.5%	21.6%	Distressed	28	9	79.7%	16.2%	24.3%
	Normal	5	32				Normal	6	31
OWA	Distressed	29	8	85.1%	8.1%	21.6%	Distressed	27	10	81.1%	10.8%	27.0%
	Normal	3	34				Normal	4	33
DS	Distressed	30	7	87.8%	5.9%	20.6%	Distressed	29	8	83.8%	10.8%	21.6%
	Normal	2	35				Normal	4	33
CMFI	Distressed	28	9	83.8%	8.1%	24.3%	Distressed	28	9	81.1%	13.5%	24.3%
	Normal	3	34				Normal	5	32
PAFI	Distressed	29	8	85.1%	8.1%	21.6%	Distressed	28	9	81.1%	13.5%	24.3%
	Normal	3	34				Normal	5	32
SEMFI	Distressed	31	6	90.5%	2.7%	16.2%	Distressed	30	7	87.8%	5.4%	18.9%
	Normal	1	36				Normal	2	35

Table 3 shows that SEMFI has satisfactory performance. Taking the results in situation 1 as an example, SEMFI has a higher overall accuracy (90.5%) and a lower Type I error (2.7%) and Type II error (16.2%) than the two typical fuzzy measure determination methods, i.e., PAFI (85.1%, 8.1% and 21.6%, respectively) and CMFI (83.8%, 8.1% and 24.3%, respectively). The overall accuracy of SEMFI is also higher than those of OWA (85.1%) and DS (87.8%); in addition, the Type I and Type II errors of SEMFI are lower than those of OWA (8.1% and 21.6%, respectively) and DS (5.9% and 20.6%, respectively). SEMFI also has clear advantages over the logit, BPNN and SVM classifiers in terms of prediction performance. For example, although the SVM performs best among the three single classifiers, its overall accuracy is 8.1 percentage points lower than that of SEMFI, and its Type I and Type II errors are 10.8 and 5.4 percentage points higher, respectively, than those of SEMFI. The comparison results in situation 2 are similar to those in situation 1. These results indicate that SEMFI performs better regardless of whether earnings manipulation is considered.

Table 3 also reveals that SEMFI and the other seven classification methods perform better when considering earnings manipulation, as shown in Figs 7 and 8. Compared with situation 2, in situation 1, the average overall accuracy is 3.7 percentage points higher, and the average Type I error and average Type II error are 3.7 and 3.2 percentage points lower, respectively.
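The averages quoted above can be reproduced from the percentage columns of Table 3. The sketch below copies those percentages as printed, in the row order Logit, BPNN, SVM, OWA, DS, CMFI, PAFI and SEMFI. The names `TABLE3` and `average_gap` are introduced here for illustration only.

```python
from statistics import mean

# (situation 1, situation 2) percentages as printed in Table 3
TABLE3 = {
    "overall accuracy": (
        [77.0, 79.7, 82.4, 85.1, 87.8, 83.8, 85.1, 90.5],
        [74.3, 73.0, 79.7, 81.1, 83.8, 81.1, 81.1, 87.8],
    ),
    "Type I error": (
        [21.6, 15.6, 13.5, 8.1, 5.9, 8.1, 8.1, 2.7],
        [24.3, 18.9, 16.2, 10.8, 10.8, 13.5, 13.5, 5.4],
    ),
    "Type II error": (
        [24.3, 27.0, 21.6, 21.6, 20.6, 24.3, 21.6, 16.2],
        [27.0, 35.1, 24.3, 27.0, 21.6, 24.3, 24.3, 18.9],
    ),
}


def average_gap(situation1, situation2):
    """Mean of situation 1 minus mean of situation 2, in percentage points."""
    return mean(situation1) - mean(situation2)


gaps = {name: round(average_gap(s1, s2), 1) for name, (s1, s2) in TABLE3.items()}
print(gaps)
# {'overall accuracy': 3.7, 'Type I error': -3.7, 'Type II error': -3.2}
```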

Figure 7.

Overall accuracy comparison with a sample ratio of 1:1.

Figure 8.

Type II error comparison with a sample ratio of 1:1.

The ratio of distressed to normal companies selected in the above experiment is 1:1. However, normal companies far surpass distressed companies in the real world. Therefore, an experiment with unbalanced samples is conducted in this study to further validate the effectiveness of the proposed framework. Furthermore, to evaluate the real prediction performance of SEMFI, the sample balance techniques, such as sample weighting and data resampling [25, 27], are not used in this experiment. A total of 116 distressed companies and 464 normal companies are randomly selected as the samples with a ratio of 1:4 during the 2007–2013 period. Among the 580 sampled companies, 79 distressed companies and 316 normal companies are randomly selected as training data, whereas the remaining 37 distressed companies and 148 normal companies are regarded as testing data.

The prediction results of the SVM classifiers that use financial data in years $T-1$ , $T-2$ and $T-3$ are shown in Table 4. The results in Table 4 are similar to those in Table 2, i.e., the SVM classifier using the more recent data has a higher overall accuracy. For SVM classifiers using financial data from the same year, the overall accuracies of the methods that consider earnings manipulation are higher than those of the corresponding methods that do not. The prediction results of SEMFI and the other seven classification methods are shown in Table 5 and Figs 9 and 10. These results indicate that when the ratio of distressed to normal samples is 1:4, SEMFI continues to have the highest overall accuracy and the lowest Type I error and Type II error. Table 5 also shows that when considering earnings manipulation, the overall accuracies of SEMFI and the other seven classification methods are higher and the Type I error and Type II error are lower. These experimental results show that the proposed framework based on considering earnings manipulation is also effective when the numbers of distressed companies and normal companies are imbalanced.

Table 4

Prediction results of SVM classifiers with a sample ratio of 1:4

		Situation 1						Situation 2
Model	Actual	Prediction class		Overall	Type I	Type II	Actual	Prediction class		Overall	Type I	Type II
	class	Distressed	Normal	accuracy	error	error	class	Distressed	Normal	accuracy	error	error
SVM ${}_{T-3}$	Distressed	4	33	82.2%	0.0%	89.2%	Distressed	0	37	80.0%	0.0%	100.0%
	Normal	0	148				Normal	0	148
SVM ${}_{T-2}$	Distressed	18	19	87.0%	3.4%	51.4%	Distressed	15	22	85.4%	3.4%	59.5%
	Normal	5	143				Normal	5	143
SVM ${}_{T-1}$	Distressed	15	22	87.6%	0.7%	59.5%	Distressed	14	23	85.9%	2.0%	62.2%
	Normal	1	147				Normal	3	145

Table 5

Prediction results of SEMFI and seven classification methods with a sample ratio of 1:4

		Situation 1						Situation 2
Model	Actual	Prediction class		Overall	Type I	Type II	Actual	Prediction class		Overall	Type I	Type II
	class	Distressed	Normal	accuracy	error	error	class	Distressed	Normal	accuracy	error	error
Logit	Distressed	13	24	84.9%	2.7%	64.9%	Distressed	8	29	81.6%	3.4%	78.4%
	Normal	4	144				Normal	5	143
BPNN	Distressed	17	20	88.6%	0.7%	54.1%	Distressed	8	29	83.8%	0.7%	78.4%
	Normal	1	147				Normal	1	147
SVM	Distressed	18	19	87.0%	3.4%	51.4%	Distressed	15	22	85.4%	3.4%	59.5%
	Normal	5	143				Normal	5	143
OWA	Distressed	15	22	87.6%	0.7%	59.5%	Distressed	12	25	85.9%	0.7%	67.6%
	Normal	1	147				Normal	1	147
DS	Distressed	12	25	86.5%	0.0%	67.6%	Distressed	11	26	85.4%	0.7%	70.3%
	Normal	0	148				Normal	1	147
CMFI	Distressed	15	22	87.6%	0.7%	59.5%	Distressed	14	23	87.0%	0.7%	62.2%
	Normal	1	147				Normal	1	147
PAFI	Distressed	16	21	88.1%	0.7%	56.8%	Distressed	14	23	87.0%	0.7%	62.2%
	Normal	1	147				Normal	1	147
SEMFI	Distressed	23	14	92.4%	0.0%	37.8%	Distressed	16	21	88.1%	0.7%	56.8%
	Normal	0	148				Normal	1	147

The statistical significance of the differences between the two situations in the overall accuracy, Type I error and Type II error of SEMFI and the other seven methods is further assessed using the Wilcoxon signed-rank test. The Wilcoxon test is a nonparametric test method that is used when the population distribution is unknown. The Wilcoxon test results of overall accuracy, Type I error, and Type II error in the two situations, i.e., considering earnings manipulation and not considering earnings manipulation, are shown in Table 6.

Table 6

Statistical comparison using the Wilcoxon test

Test statistics	Distressed : Normal $=$ 1:1			Distressed : Normal $=$ 1:4
	Overall accuracy	Type I error	Type II error	Overall accuracy	Type I error	Type II error
$Z$	$-2.565$	$-2.555$	$-2.410$	$-2.524$	$-1.732$	$-2.527$
$\alpha$	0.010	0.011	0.016	0.012	0.083	0.012

When the ratio of distressed companies to normal companies is 1:1, the values of the $Z$ statistic for the overall accuracy, Type I error and Type II error are $-2.565$, $-2.555$, and $-2.410$, respectively. The concomitant probabilities, $\alpha$ , of the three indicators are 0.010, 0.011, and 0.016, which indicate that the overall accuracy, Type I error and Type II error of SEMFI and the other seven classification methods are significantly better when earnings manipulation is considered than when it is not, at the significance level of 0.05.

When the ratio of distressed companies to normal companies is 1:4, the value of the $Z$ statistic for the Type I error is $-1.732$, and the corresponding concomitant probability $\alpha$ is 0.083, which indicates that the difference in the Type I error is not significant at the significance level of 0.05. This result arises because, when the numbers of normal and distressed companies are imbalanced and balance techniques are not applied, companies tend to be classified into the majority class, since the model is over-trained on the majority class samples. In fact, as shown in Table 5, the Type I errors of SEMFI and the other seven classification methods are extremely low in both situations. In contrast, the values of the $Z$ statistic for the overall accuracy and Type II error are $-2.524$ and $-2.527$, respectively. The concomitant probabilities, $\alpha$ , of these two indicators are both 0.012, which is smaller than the significance level of 0.05; this indicates that the overall accuracy and Type II error of SEMFI and the other seven classification methods are significantly better when earnings manipulation is considered than when it is not. The statistical analysis demonstrates that considering earnings manipulation is effective and practical when using classifier ensembles for FDP.
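The $Z$ statistics in Table 6 can be checked with a minimal pure-Python sketch of the Wilcoxon signed-rank test. The paper does not state which variant or software was used, so the sketch assumes the usual normal approximation with zero differences dropped, average ranks for ties, a tie correction in the variance and a two-sided probability. The function name is hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (Z, two-sided p) for paired samples x and y."""
    # Rounding removes floating-point noise so equal gaps are treated as ties.
    diffs = [round(a - b, 6) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j share one rank
        for idx in order[i:j + 1]:
            ranks[idx] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p


# Overall accuracy at a 1:1 ratio (Table 3), situation 1 vs situation 2.
acc_s1 = [77.0, 79.7, 82.4, 85.1, 87.8, 83.8, 85.1, 90.5]
acc_s2 = [74.3, 73.0, 79.7, 81.1, 83.8, 81.1, 81.1, 87.8]
z_acc, p_acc = wilcoxon_signed_rank(acc_s1, acc_s2)

# Type I error at a 1:4 ratio (Table 5), situation 1 vs situation 2.
t1_s1 = [2.7, 0.7, 3.4, 0.7, 0.0, 0.7, 0.7, 0.0]
t1_s2 = [3.4, 0.7, 3.4, 0.7, 0.7, 0.7, 0.7, 0.7]
z_t1, p_t1 = wilcoxon_signed_rank(t1_s1, t1_s2)
```

Under these assumptions the first comparison gives $Z\approx-2.565$ with a probability of about 0.010, and the second gives $Z\approx-1.732$ with a probability of about 0.083. In the second comparison only three of the eight methods differ between the two situations, which is why the Type I difference cannot reach significance at the 0.05 level.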

Figure 9.

Overall accuracy comparison with a sample ratio of 1:4.

Figure 10.

Type II error comparison with a sample ratio of 1:4.

6. Conclusion

This research proposes a novel SVM classifier ensemble framework based on earnings manipulation and a fuzzy integral for FDP called SEMFI. The financial data from the previous three years are used to predict the current financial situation of companies. The companies in each year are divided into two categories according to whether they manipulate earnings. Then, single SVM classifiers are trained on the different categories, and a fuzzy integral is applied as the ensemble method. A new fuzzy measure determination method is proposed to improve the performance of the fuzzy integral. This method takes into account an important feature of companies’ financial data, namely, that recent data are more valuable for evaluating the current financial situation of the companies. Furthermore, considering that the external environment may have significantly changed when using the model trained by historical data to predict the current financial situation of companies, we propose a fuzzy measure dynamic adjustment method that considers the confidence of each single classifier’s output, the consistency between the outputs of each pair of single classifiers and the diversity among classifiers. An experimental study that compares the performance of SEMFI with three single classifiers and four other ensemble methods is conducted on real data of Chinese listed companies. Two cases are considered, in which the ratio of distressed companies to normal companies is balanced (1:1) and unbalanced (1:4). In the experimental results, SEMFI obtains the highest overall accuracy and the lowest Type I error and Type II error compared with the single classifiers and the other ensemble methods. The results also show that for SEMFI and the other seven methods, considering earnings manipulation significantly improves the overall accuracy and reduces the Type II error at the 0.05 level for both the balanced and unbalanced datasets; the reduction in the Type I error is significant for the balanced dataset but not for the unbalanced one.
Future research will focus on examining the performance of the heterogeneous classifier ensembles, such as combining the logit, BPNN and SVM classifiers using a fuzzy integral based on the new fuzzy measure determination method for FDP.

Acknowledgments

This research was supported by NSFC Grant (No. 71201024) and NSFC Grant (No. 71671038).

References

1. Mendez-Vazquez, Gader, Keller J.M. et al., Minimum classification error training for Choquet integrals with applications to landmine detection, Fuzzy Systems, IEEE Transactions on 16 (2008), 225–238.

2. Verikas and Lipnickas, Fusing neural networks through space partitioning and fuzzy integration, Neural Processing Letters 16 (2002), 53–65.

3. Abdallah A.C.B., Frigui and Gader, Adaptive local fusion with fuzzy integrals, Fuzzy Systems, IEEE Transactions on 20 (2012), 849–864.

4. Lau A.H.L., A five-state financial distress prediction model, Journal of Accounting Research (1987), 127–138.

5. Cortes and Vapnik, Support-vector networks, Machine Learning (1995), 273–297.

6. Chang C.C. and Lin C.J., LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems & Technology 2 (2011), 389–396.

7. Chibelushi C.C., Deravi and Mason J.S.D., Adaptive classifier integration for robust pattern recognition, Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on 29 (1999), 902–907.

8. Štefka and Holeňa, Dynamic classifier aggregation using interaction-sensitive fuzzy measures, Fuzzy Sets and Systems 270 (2015), 25–52.

9. Skalak D.B., The sources of increased accuracy for two proposed boosting algorithms, American Association for Artificial Intelligence, AAAI-96, Integrating Multiple Learned Models Workshop 1129 (1996), 120–125.

10. Yeung D.S., Wang X.Z. and Tsang E.C.C., Handling interaction in fuzzy production rule reasoning, Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on 34 (2004), 1979–1987.

11. Altman E.I., Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, The Journal of Finance 23 (1968), 589–609.

12. Chen F.H., Chi D.J. and Wang Y.C., Detecting biotechnology industry’s earnings management using Bayesian network, principal component analysis, back propagation neural network, and decision tree, Economic Modelling 46 (2015), 1–10.

13. Xiong, Shi, Chen et al., Divisional fault diagnosis of large-scale power systems based on radial basis function neural network and fuzzy integral, Electric Power Systems Research 105 (2013), 9–19.

14. C.J., X.J. et al., Statistics-based wrapper for feature selection: An implementation on financial distress identification with support vector machine, Applied Soft Computing 19 (2014), 57–67.

15. Sun, J.C. and Yan X.Y., Forecasting business failure using two-stage ensemble of multivariate discriminant analysis and logistic regression, Expert Systems 30 (2013), 385–397.

16. Tserng H.P., Lin G.F., Tsai L.K. and Chen P.C., An enforced support vector machine model for construction contractor default prediction, Automation in Construction 20 (2011), 1242–1249.

17. Sun and Li, Data mining method for listed companies’ financial distress prediction, Knowledge-Based Systems 21 (2008), 1–5.

18. Sun and Li, Financial distress prediction based on serial combination of multiple classifiers, Expert Systems with Applications 36 (2009), 8659–8666.

19. Sun and Li, Financial distress prediction using support vector machines: Ensemble vs. individual, Applied Soft Computing 12 (2012), 2254–2265.

20. Sun and Li, Majority voting combination of multiple case-based reasoning for financial distress prediction, Expert Systems with Applications 36 (2009), 4363–4373.

21. Sun and Hui X.F., An application of support vector machine to companies’ financial distress prediction, Lecture Notes in Artificial Intelligence 5 (2006), 274–282.

22. Sun and Li, Listed companies’ financial distress prediction based on weighted majority voting combination of multiple classifiers, Expert Systems with Applications 35 (2008), 818–827.

23. Bae J.K., Predicting financial distress of the South Korean manufacturing industries, Expert Systems with Applications 39 (2012), 9159–9165.

24. Kuo J.M., Ning and Song, The real and accrual-based earnings management behaviors: Evidence from the split share structure reform in China, The International Journal of Accounting 49 (2014), 101–136.

25. Polat, Data weighting method on the basis of binary encoded output to solve multi-class pattern classification problems, Expert Systems with Applications 40 (2013), 4637–4647.

26. Kim K.J. and Cho S.B., Fuzzy integration of structure adaptive SOMs for web content mining, Fuzzy Sets and Systems 148 (2004), 43–60.

27. Peng, Zhang, Yang et al., A new approach for imbalanced data classification based on data gravitation, Information Sciences 288 (2014), 347–373.

28. Si, Wang, Tan et al., A novel approach for coal seam terrain prediction through information fusion of improved D-S evidence theory and neural network, Measurement 54 (2014), 140–151.

29. Chen L.H. and Hsiao H.D., Feature selection to diagnose a business crisis by using a real GA-based support vector machine: An empirical study, Expert Systems with Applications 35 (2008), 1145–1155.

30. Salchenberger L.M., Cinar and Lash N.A., Neural networks: A new tool for predicting thrift failures, Decision Sciences 23 (1992), 899–916.

31. Grabisch and Labreuche, A decade of application of the Choquet and Sugeno integrals in multi-criteria decision aid, 4OR 6 (2008), 1–44.

32. Grabisch and Nicolas J.M., Classification by fuzzy integral: Performance and tests, Fuzzy Sets and Systems 65 (1994), 255–271.

33. Grabisch, K-order additive discrete fuzzy measures and their representation, Fuzzy Sets and Systems 92 (1997), 167–189.

34. Sugeno, Theory of Fuzzy Integral and Its Application, Ph.D. Dissertation, Tokyo Institute of Technology, 1974.

35. Kim M.J. and Kang D.K., Ensemble with neural networks for bankruptcy prediction, Expert Systems with Applications 37 (2010), 3373–3379.

36. Kochi and Wang, An algebraic method and a genetic algorithm to the identification of fuzzy measures based on Choquet integrals, Journal of Intelligent and Fuzzy Systems 26 (2014), 1393–1400.

37. Pizzi N.J. and Pedrycz, Effective classification using feature selection and fuzzy integration, Fuzzy Sets and Systems 159 (2008), 2859–2872.

38. Healy P.M. and Wahlen J.M., A review of the earnings management literature and its implications for standard setting, Accounting Horizons 13 (1999), 365–383.

39. Kumar P.R. and Ravi, Bankruptcy prediction in banks and firms via statistical and intelligent techniques-A review, European Journal of Operational Research 180 (2007), 1–28.

40. R.S., Predicting earnings management: A nonlinear approach, International Review of Economics & Finance 30 (2014), 1–25.

41. Kothari S.P., Leone A.J. and Wasley C.E., Performance matched discretionary accrual measures, Journal of Accounting and Economics 39 (2005), 163–197.

42. Bellotti and Crook, Support vector machines for credit scoring and discovery of significant features, Expert Systems with Applications 36 (2009), 3302–3308.

43. and Kim J.B., Real earnings management and the cost of new corporate bonds, Journal of Business Research 67 (2014), 641–647.

44. Wang and Chai, Flame image-based burning state recognition for sintering process of rotary kiln using heterogeneous features and fuzzy integral, Industrial Informatics, IEEE Transactions on 8 (2012), 780–790.

45. Beaver W.H., Alternative accounting measures as predictors of failure, Accounting Review (1968), 113–122.

46. Chen, Wang and Wu D.D., Credit risk measurement and early warning of SMEs: An empirical study of listed SMEs in China, Decision Support Systems 49 (2010), 301–310.

47. Song, Ding, Huang et al., Feature selection for support vector machine in financial crisis prediction: A case study in China, Expert Systems 27 (2010), 299–310.

48. Cao, Aggregating multiple classification results using Choquet integral for financial distress early warning, Expert Systems with Applications 39 (2012), 1830–1836.

49. Wang, Lee K.H. et al., Lower integrals and upper integrals with respect to nonadditive set functions, Fuzzy Sets and Systems 159 (2008), 646–660.

50. Xiao, Yang, Niu et al., A new evaluation method based on D-S generalized fuzzy soft sets and its application in medical diagnosis problem, Applied Mathematical Modelling 36 (2012), 4592–4604.
