Identifying financial ratios associated with companies’ performance using fuzzy logic tools

Abstract

This study introduces a computerized model for evaluating the corporate performance of companies traded in the main world stock markets. The main contribution of this study is to apply “Soft Regression”, a soft computing modeling tool based on fuzzy logic, to financial statement analysis. Specifically, the tool is used to identify the financial ratios that best explain the performance (as reflected by Operating Income Margin) of publicly traded companies in the manufacturing industries (SIC codes 2000–3999). We used data extracted from the XBRL database for the years 2012 to 2016.

The main results and conclusions of the study are:

The study identified relevant financial ratios for the manufacturing industry. It also revealed the relative importance of the various categories of financial ratios.

A detailed comparison of the results for 2012 and 2016 indicated a high degree of consistency and stability over time.

Not all financial ratios are equally relevant for all industries.

Proxy variables belonging to the same category of financial ratios are interchangeable in our model. It does not matter which of the ratios belonging to the same category are used; the results are very similar for both 2012 and 2016.

All the resulting indicators imply that the model is highly reliable and robust.

The main contribution of this study is to present a soft computing modeling tool based on fuzzy logic which is intuitive, stable and not based on restrictive assumptions.

Keywords

Modeling corporate earnings; financial ratios; XBRL; soft regression; corporate evaluation

1 Introduction

The main objective of this study is to use “Soft Regression”, a Soft Computing tool based on Fuzzy Logic, to identify indicators associated with the successful performance of corporations in terms of corporate earnings. One of the main endeavors of financial analysts is to evaluate corporate earnings, and financial statements are an important input in this process. Financial statement analysis is used to provide stock recommendations to investors and to generate benchmarks [1]. While financial ratios play an important role in this process, the question is which financial ratios, among the hundreds that can be computed, should be analyzed [2].

The aim of this study is to build a model using various financial ratios as explanatory variables, and to identify the ratios that are most associated with the companies’ earnings. There are many financial ratios, all of them well known and widely accepted by professionals involved in analyzing financial reports. Our study attempts to find out which ratios are significant in explaining the behavior of the dependent variable (companies’ profitability) and, even more importantly, what the relative importance of these ratios is among themselves. Identifying the financial ratios most closely associated with earnings can make an important contribution to designing investment strategies.

An important factor that makes it very difficult to build a model of financial ratios with conventional modeling tools is the substantial mathematical correlation among various financial ratios. Multicollinearity prevents all the relevant financial ratios from being incorporated into the same equation, thus undermining the reliability of the results due to model misspecification. In addition, excluding some explanatory variables because of multicollinearity makes the computed relative importance of the whole set of variables incorrect, because the excluded variables are implicitly assigned a weight of zero, even in the case of important variables that could become significant under a different model specification.

Therefore, in this study we utilize “Soft Regression” (SR), a Soft Computing modeling tool based on Fuzzy Information Processing. SR does not require independence of the explanatory variables, so multicollinearity does not affect the reliability of the modeling results. In other words, SR allows explanatory variables to be incorporated into the same equation even if they are mathematically correlated. In addition, SR generates a reliable computation of the relative importance of the explanatory variables among themselves [3]. More details regarding SR are presented below.

Financial report ratios are computed utilizing the eXtensible Business Reporting Language (XBRL). Since 2011, the Securities and Exchange Commission (SEC) has mandated the XBRL format for the reporting of financial data by all publicly traded companies. XBRL facilitates information gathering and processing: the data can easily be downloaded from the internet and translated into Excel format, which should benefit users of financial reporting information. We use annual data for 2012 to 2016 in order to demonstrate the stability and consistency of the results, and thus the reliability and robustness of the model.

2 Literature survey

Financial ratios are the traditional tool used by decision makers, including investors and researchers, to evaluate firm performance. Financial ratios express the relationship between amounts reported in the financial statements, allowing comparisons to be made across companies of different industries and sizes, and within a company across time. The main issue raised over time is which of these ratios, among the hundreds that can be computed, should be analyzed to obtain the information needed for a given decision.

Because not all ratios are informative or provide high discrimination power, it is necessary to filter out unrepresentative variables from a given data set through feature selection techniques [4]. Many well-known feature selection/extraction techniques have been used as a first step in bankruptcy prediction; the more traditional methods are the correlation matrix [5], t-test, factor analysis, and stepwise logistic regression [6]. Logistic regression has also been a key feature selection method in research on the usefulness of accounting ratios for predicting earnings movements, which can in turn serve as the basis for a profitable investment strategy [7–12].

Feature selection/extraction has also been found to enhance the performance of AI methods. Principal Component Analysis (PCA) has been found to improve the performance of models that use financial ratios for bankruptcy prediction [13] and, more generally, of bankruptcy prediction models [14].

A comparison of five well-known feature selection methods in bankruptcy prediction was carried out by [13]. The paper compared the t-test, correlation matrix, stepwise regression, PCA, and factor analysis, using multi-layer perceptron neural networks as the prediction model. The results showed that the t-test feature selection method outperformed the other methods.

In feature selection, we choose a subset of features from the original set of features. In feature extraction, we create new features from the original ones (for example, with PCA we might retain 7 principal components (PCs) out of 60 features, where each PC is built from all the features). That is, feature selection chooses among the existing features, whereas feature extraction constructs new features based on them.

Financial time series data have been found to be characterized by noise, chaos, and a high degree of uncertainty, and to contain strong nonlinearity and outliers [15, 16].

Soft Regression (SR) is an Artificial Intelligence (Soft Computing) modeling tool based on Fuzzy and Heuristic Information Processing. It has been evolving since the 1990s (for more details see [17]). A comparison of SR with the Multivariate Regression (MVR) method appears in [18]. The computation of the relative importance of explanatory variables (RELIMP) using SR versus traditional regression methods is presented in [19]. A detailed explanation of RELIMP (based on SR) and an evaluation of its reliability are presented in [3].

Extensive literature addressing the precision and reliability of XBRL data is presented in detail below.

2.1 Extensible business reporting language

XBRL (eXtensible Business Reporting Language) is a freely available and global standard designed for exchanging business information. XBRL allows the expression of semantic meaning commonly required in business reporting. One use of XBRL is to define and exchange financial information, such as financial statements.

The U.S. Securities and Exchange Commission (SEC) has created the XBRL U.S. GAAP Financial Reporting Taxonomy. This taxonomy is a collection of accounting data concepts and rules that enables companies to present their financial reports electronically. The SEC’s deployment was launched in phases in 2008, and all public U.S. GAAP companies were required to file their financial reports using the XBRL reporting technology starting from June 15, 2011.

XBRL has several advantages over COMPUSTAT, which has been a popular source of financial information for both academics and practitioners. First, XBRL data are freely available, while COMPUSTAT is costly. XBRL filings also have a time advantage: it takes an average of 14 weekdays from the time a company files with the SEC for the data to appear in COMPUSTAT [20, 21], whereas XBRL data are published concurrently with the related PDF versions and are immediately available. In addition, the reliability of COMPUSTAT has been questioned: prior studies have shown that COMPUSTAT data may differ from the original corporate financial data [22–24] and from data found in other accounting databases [25, 26].

3 The model

Dependent Variable:

The dependent variable is A20-Operating Income Margin (operating income divided by total revenues)

Explanatory Variables:

Financial ratios have played an important part in evaluating the financial condition of companies [2], and many different ratios and a variety of financial ratio classification systems have been suggested [27]. In this paper we follow one of the most common classifications, as presented in numerous textbooks [28].

The ratios are commonly classified as follows:

Liquidity and Efficiency:

Liquidity refers to the ability to pay short-term liabilities, that is, current liabilities that mature within the next year. Payment is expected to come from currently liquid assets as well as from assets that are expected to become liquid within the next year.

Efficiency, measured by the cash conversion cycle, refers to the ability to sell inventory, collect payment from customers, and pay suppliers. The efficiency classification is very similar to the liquidity classification, since for most companies the ability to pay liabilities within the next year depends on the ability to collect cash from customers. It is therefore common in many ratio classification schemes to lump these two classifications together.

Solvency: the ability to pay long-term debt, i.e., liabilities that mature in more than one year.

Profitability: the fundamental goal of a business is to earn a profit, so profitability measures are of great importance. Profitability represents the company’s ability to create future positive cash flows in excess of liabilities.

Market ratios: analyzing shares as an investment. Investors buy shares to earn a return on their investment; this return can be achieved through gains (or losses) from selling the shares and/or through dividends.

The investment decision is usually based on two important factors: risk and return. Of the classifications presented above, the first two represent the company’s risk level, i.e., its ability to pay its debts, fund its operations, and survive in the short and the long run. The last two represent the return to the investor: profitability represents the potential for return, while the market ratios represent the actual return.

It should be noted that the Price/Earnings (P/E) ratio, which is classified as a market ratio and represents the price an investor is willing to pay for one unit of earnings, is a special case in terms of its relationship with future earnings. Traditional capital markets theory assumes that the market is efficient in the sense that useful information, such as earnings information, influences the adjustment of share prices [29, 30]. In other words, earnings changes can be used as an explanatory variable of the market price. However, the relationship underlying the P/E ratio has also been shown to run in the opposite direction: current price changes may be used as an explanatory variable of future earnings [31, 32].

The results of the analysis presented in our study demonstrate that the variables found significant represent all four categories of financial ratios discussed above:

Liquidity and Efficiency:

The company’s liquidity and efficiency are represented by the sales to total cash ratio. This is an inclusive ratio that represents the company’s ability to generate cash (and not just accounts receivable) from its current sales and to remain liquid. The ability to generate cash is pertinent to the company’s ability to pay off its current debts (efficiency).

Solvency:

The company’s solvency is represented by the Interest Coverage Ratio and the Cash Flow from Operations to total debt.

The first ratio measures the extent to which operating income covers interest payments. Since interest payments are usually made on a long-term basis, they are often treated as an ongoing expense. This ratio is also used to indicate the company’s capitalization efficiency, i.e., the impact of the company’s choices in raising capital.

The second ratio representing the company’s solvency is cash flow from operations to total debt. It indicates how long it would take the company to pay off all of its debt if it devoted all of its cash flow from operations to debt repayment (the reciprocal of the annual ratio gives this time in years), and it provides a snapshot of the overall financial health of the company.

Profitability:

It is reasonable to expect that profitability will be the most prominent classification and will contain the most significant variables. Profitability ratios are relative measures of the earnings (profits) the company generated, and therefore have the closest association with the earnings themselves.

Market Ratios:

The market ratios represent the relationship between the company’s actual profits and investor returns (gains from an increase in the share price or from the distribution of dividends). Both types of gains are represented: gains from shares (P/E ratio) and gains from dividends (payment of dividends as a % of operating cash flow).

Proxy variables

The four types of financial ratios presented above are represented by measurable quantitative proxy variables as presented below. Appendix 1 shows all the accounting descriptors examined in the first phase of analysis. From these descriptors, the proxy variables for each “financial ratios” category were selected as follows:

Liquidity and Efficiency:

A52-Sales to total cash

A54-Sales to total working capital

Solvency:

A47-Times Interest Earned

A73-Cash From Operations (CFO) to Total Debt

Profitability:

A35-ROA (Return on Assets)

A50-Pre-tax Income over Sales

A51-Net Profit Margin

A57-Research & Development Expense to Sales

A59-Operating Income to Total Assets

A70-EBITDA Margin Ratio: EBITDA (Earnings Before Interest, Taxes, Depreciation and Amortization) to Total Revenue

Market ratios:

A21-P/E Ratio

A75-Payment of Dividends as % of OCF (Operating Cash Flow)
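To make the variable list concrete, the sketch below computes the dependent variable and the proxy ratios from a single company's annual figures. It is a minimal illustration, not the XBRL Analyst calculation: the `Financials` field names are hypothetical (they are not XBRL tags), the formulas are common textbook forms, Times Interest Earned uses operating income as a stand-in for EBIT, and EBITDA is approximated as operating income plus depreciation and amortization. All of these are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Financials:
    """Hypothetical annual figures for one company (illustrative names, not XBRL tags)."""
    revenue: float
    operating_income: float
    pretax_income: float
    net_income: float
    interest_expense: float
    depreciation_amortization: float
    rd_expense: float
    cash: float
    current_assets: float
    current_liabilities: float
    total_assets: float
    total_debt: float
    cfo: float              # cash from operations (OCF)
    dividends_paid: float
    share_price: float
    eps: float              # earnings per share


def ratios(f: Financials) -> dict:
    """Textbook-style versions of A20 and the proxy ratios, keyed by the A-codes above."""
    working_capital = f.current_assets - f.current_liabilities
    # Assumption: EBITDA approximated as operating income plus depreciation and amortization.
    ebitda = f.operating_income + f.depreciation_amortization
    return {
        "A20": f.operating_income / f.revenue,           # Operating Income Margin (dependent)
        "A52": f.revenue / f.cash,                       # Sales to total cash
        "A54": f.revenue / working_capital,              # Sales to total working capital
        "A47": f.operating_income / f.interest_expense,  # Times Interest Earned (EBIT proxy)
        "A73": f.cfo / f.total_debt,                     # CFO to total debt
        "A35": f.net_income / f.total_assets,            # ROA
        "A50": f.pretax_income / f.revenue,              # Pre-tax income over sales
        "A51": f.net_income / f.revenue,                 # Net profit margin
        "A57": f.rd_expense / f.revenue,                 # R&D expense to sales
        "A59": f.operating_income / f.total_assets,      # Operating income to total assets
        "A70": ebitda / f.revenue,                       # EBITDA margin
        "A21": f.share_price / f.eps,                    # P/E ratio
        "A75": f.dividends_paid / f.cfo,                 # Dividends paid as a share of OCF
    }


example = Financials(
    revenue=1000, operating_income=150, pretax_income=130, net_income=100,
    interest_expense=20, depreciation_amortization=50, rd_expense=80, cash=200,
    current_assets=500, current_liabilities=300, total_assets=2000, total_debt=600,
    cfo=180, dividends_paid=36, share_price=40, eps=2,
)
print(ratios(example)["A20"])  # 0.15
```

Keeping every ratio keyed by its A-code makes it straightforward to assemble one column per explanatory variable for the SR computations that follow.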

4 Data

Using the NASDAQ company list (http://www.nasdaq.com/screening/company-list.aspx), we identified all 6,670 companies (tickers) listed on the three major US stock exchanges (AMEX, NASDAQ, and NYSE).

The annual financial data were obtained using XBRL Analyst (created by FinDynamics), an Excel plugin that allows users to access a company’s XBRL-tagged data from its SEC filing via the XBRL US database. This software not only allows easy access to and analysis of the data but also allows missing balances to be calculated. For example, the total liabilities balance, when it is not available in the original XBRL filing, is extracted and calculated using XBRL Analyst. The data obtained were annual filings from 2012 to 2016 (5 years).

The process of selecting a subset of relevant features for model construction was also used to create the financial ratios. Of the 6,670 tickers originally identified from the NASDAQ company list, 2,561 were removed for the following reasons: no data were reported in XBRL format; the ticker was for a security other than common stock; the company had its IPO between 2012 and 2016; or the company had more than one ticker (the same CIK).

The final sample included 4,109 companies (61.6% of all listed tickers) that were publicly traded in Q3 2017. For the purpose of this study, it was decided to examine only one industry, manufacturing (SIC codes 2000–3999), which is the largest industry in the sample: 1,597 tickers (38.9% of the total sample), of which 1,585 reported operating income.

Sixty variables (based on [7]) were extracted from the XBRL filing database (Appendix 1). It should be noted that some of the variables had to be calculated from the original filings, whereas others were already calculated by the XBRL Analyst tool. We ended up with 622 companies with positive operating income for five consecutive years (2012–2016), 246 companies with negative operating income for five consecutive years, and 398 companies with both positive and negative operating income over the five years.

5 Method

The above description of the explanatory variables points to a possibility that there is a mathematical correlation among some of the variables described above. This means that it becomes impossible to include all of them together in the model when utilizing traditional modeling tools such as MVR (see [3]). Due to multicollinearity, some of the explanatory variables become insignificant not because they are not related enough to the dependent variable, but because of technical limitations of the MVR. We avoid this problem by utilizing SR modeling tool, where explanatory variables are not required to be independent of each other.

Soft Regression

SR is a modeling tool based on soft computing concepts such as Fuzzy Logic [33]. The technical details of the SR method are described in [3, 19].

We will briefly describe several of the important characteristics of the SR that are different from those of traditional MVR, and thus justify using it in this study. These characteristics are:

Soft regression does not require precise model specification. This regression tool is based on Fuzzy Logic, which was designed in the first place to handle information under severe conditions of uncertainty and imprecision [33]. The idea is to give up on building a precise model and to settle for working with whatever data are available. We generate a partial, less precise model that can still be very reliable in the general direction of its conclusions because it avoids misspecification bias. In short, it is preferable to have imprecise but broadly correct results (SR) rather than precise results (containing a small statistical error) that are incorrect (due to misspecification bias, as in MVR). Indeed, in modeling projects where some potentially important variables are excluded as insignificant because of multicollinearity (MVR method), the resulting models are misspecified by definition.

Explanatory variables are not required to be independent of each other. In fields such as economics and finance, the variables are usually intangible concepts that are often highly correlated mathematically, even though logically each may represent a separate and (at least to some extent) independent concept. When using MVR, correlation among explanatory variables causes some important explanatory variables to appear insignificant and therefore to be removed from the model, leading to model misspecification. Hence, this feature of SR (not requiring independence of explanatory variables and thus not removing variables due to multicollinearity) constitutes a major advantage over MVR.

The relative importance of the explanatory variables among themselves is not affected by adding or removing variables. When a partial model is constructed, the significance of the explanatory variables and their relative importance among themselves are not affected by adding variables to the model or removing some variables from it. This is in contrast to MVR, where the addition or removal of an explanatory variable can drastically change the significance, and even the coefficient sign, of other explanatory variables in the model. This characteristic of SR adds an important element of stability to research and decision making.

The method requires normalized data. We introduce heuristically determined maximum and minimum thresholds for the normalization of the data (see explanation below). This helps to handle distortions due to outlying values with a user-based logical approach (in contrast to the strictly mathematical methods used in sophisticated traditional techniques such as Robust Regression).

In SR there is one dependent variable and m numerical vectors (m columns) of explanatory variables. Let $Y = (y_1, y_2, \dots, y_n)$ be the n-dimensional vector of the dependent variable to be explained, and let $\{X_j\}_{j=1}^{m}$ be the corresponding n-dimensional vectors of explanatory variables, where $X_j = (x_{j,1}, x_{j,2}, \dots, x_{j,n})$.

We denote $V = (v_0, v_1, \dots, v_m)$, where $v_0 = Y$ and $v_j = X_j$ for all $j = 1, 2, \dots, m$. (1)

Normalizing data: the conversion of numerical vectors into fuzzy sets requires their projection into equivalent vectors of the corresponding grades of membership (between zero and one, where 1 represents full membership and 0 represents no membership at all in the set), based on a predefined membership function that is logically expected to reflect the membership of each element in the fuzzy set. It is a critical requirement of this method that the membership function be in line with human logic and common sense. This is why the normalizing process is described in great detail below.

Based on [17], we define the membership function as follows. Let $\max_j$ be the value in a given vector such that all elements equal to or greater than $\max_j$ have full membership in the fuzzy set; we assign the value one to all such elements. Let $\min_j$ be the value in that vector such that all elements equal to or smaller than $\min_j$ have zero membership in the fuzzy set (do not belong to the fuzzy set at all); we assign the value zero to all such elements. $\max_j$ and $\min_j$ must be determined based on logic and common sense for each domain, so as to maintain the integrity of the data. Thus $\max_j$ and $\min_j$ are the maximum and minimum cut-off points, respectively.

For all other elements (strictly between $\min_j$ and $\max_j$), we project each element $v_j(i)$ proportionally into the interval [0, 1], for all vectors, by

$v_j^{Norm}(i) = \begin{cases} 0, & v_j(i) \leq \min_j \\ \dfrac{v_j(i) - \min_j}{\max_j - \min_j}, & \min_j < v_j(i) < \max_j \\ 1, & \max_j \leq v_j(i) \end{cases}$ for all $j = 0, \dots, m$ (2)

The result is $V^{Norm} = (v_0^{Norm}, v_1^{Norm}, \dots, v_m^{Norm})$. (3)
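Equation (2) is a piecewise-linear membership function clipped at the two cut-off points. A minimal sketch in plain Python, assuming the cut-offs for the vector have already been chosen (the function name `normalize` is hypothetical):

```python
def normalize(values, min_j, max_j):
    """Map raw values of one vector v_j to fuzzy membership grades in [0, 1], per equation (2)."""
    if max_j <= min_j:
        raise ValueError("max_j must be greater than min_j")
    grades = []
    for v in values:
        if v <= min_j:
            grades.append(0.0)                               # no membership
        elif v >= max_j:
            grades.append(1.0)                               # full membership
        else:
            grades.append((v - min_j) / (max_j - min_j))    # proportional grade
    return grades


print(normalize([-1, 0, 5, 10, 12], min_j=0, max_j=10))  # [0.0, 0.0, 0.5, 1.0, 1.0]
```

Values beyond either cut-off are clipped, so an extreme outlier cannot stretch the scale for the remaining observations; this is the practical effect of the heuristic thresholds described earlier.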

Normalizing the data - the implementation:

All the companies in our data base were divided into three groups:

The group of “Winners”: companies that were continuously profitable, i.e., reported a positive net income on an annual basis for every year from 2012 to 2016 (inclusive).

The group of “Losers”: companies that reported a negative net income on an annual basis for every year from 2012 to 2016 (inclusive).

The “Middle Group”: all the remaining companies.

$\max_{j}$ for every year was determined as follows (for every variable): the values of the companies belonging to the group of “Winners” were arranged from the lowest to the highest, and then divided into four quarters. The highest value of the lowest quarter (i.e., the 25th percentile or the first quartile) was selected as $\max_{j}$ .

$\min_{j}$ for every year was determined as follows (for every variable): the values of the companies belonging to the group of “Losers” were arranged from lowest to highest, and then divided into four quarters. The lowest value of the highest quarter (i.e., the 75th percentile or the third quartile) was selected as $\min_{j}$ .

Justification: As stated above, the process must be in line with human logic and common sense, and modelers should be able to defend their decisions. For example, for $\max_j$, instead of the selection made above, we could have selected the minimum value among all companies in the “Winners” category. Such a selection would include all the companies in the “Winners” group as full members of the fuzzy set of “Winners”. However, it would also include an unknown number of borderline cases, whose values of the explanatory variables (which reflect their performance) often intermix with those of the more successful performers from the “Middle Group”. By defining only the top 75% of the “Winners” as full members of the fuzzy set representing the Winner group, we prevent the vast majority of the borderline cases from being considered full members of the group, thus making the identification of the group more clear-cut. Moreover, the 25% of the “Winners” that are not assigned the value of 1 (full membership in the fuzzy set) will be assigned a grade of membership close to 1, which still accurately reflects the relative strength of their performance, and hence the integrity of the data is maintained. This is in contrast to the Boolean method, where all those not assigned the value of 1 get the value of 0, which is an important source of distortions in numerous statistical methods.

Similar, but inverse reasoning applies to $\min_{j}$ .

Every variable for which $\max_j > \min_j$ is a valid variable, ready to be normalized using equation (2). If $\max_j$ is not greater than $\min_j$, this explanatory variable, if it is related to the dependent variable at all, will be inversely related to it. In other words, it is a variable characterized by large values in the group of “Losers” and small values in the group of “Winners”. In this case, we define $\max_j$ and $\min_j$ as follows:

$\max_{j}$ for every year was determined as follows (for every variable): the values of the companies belonging to the group of “Losers” were arranged from lowest to highest, and then divided into four quarters. The highest value of the lowest quarter was selected as $\max_{j}$ .

$\min_{j}$ for every year was determined as follows (for every variable): the values of the companies belonging to the group of “Winners” were arranged from lowest to highest, and then divided into four quarters. The lowest value of the highest quarter was selected as $\min_{j}$ .
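The cut-off rules above, including the fallback for inversely related variables, can be sketched for one variable in one year as follows. The text identifies the relevant points as the 25th and 75th percentiles; the interpolation rule between observations is not specified, so this sketch assumes linear interpolation via Python's `statistics.quantiles(..., method="inclusive")`. The function names are hypothetical.

```python
from statistics import quantiles


def quartile(values, q):
    """Return the q-th quartile (q = 1 or 3), assuming linear interpolation between order statistics."""
    return quantiles(values, n=4, method="inclusive")[q - 1]


def cutoffs(winners, losers):
    """Return (max_j, min_j) for one explanatory variable in one year.

    winners, losers: the variable's values for the "Winners" and "Losers" groups.
    """
    max_j = quartile(winners, 1)   # 25th percentile of the Winners
    min_j = quartile(losers, 3)    # 75th percentile of the Losers
    if max_j > min_j:
        return max_j, min_j
    # Inverse case: large values are typical of Losers, small values of Winners.
    max_j = quartile(losers, 1)    # 25th percentile of the Losers
    min_j = quartile(winners, 3)   # 75th percentile of the Winners
    return max_j, min_j


print(cutoffs([5, 6, 7, 8, 9], [1, 2, 3, 4, 5]))  # (6.0, 4.0)
```

With the group values swapped (a variable that is large for Losers), the first test fails and the fallback branch returns cut-offs taken from the opposite groups; the direct or inverse orientation is then resolved later by the distance comparison in the similarity step.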

Computing Similarity ($S_{Y,X_j}$):

We compute the similarity between the dependent variable and every explanatory variable $v_j$ ($j = 1, \dots, m$) as follows. We define the distance for a direct relation between the variables:

$d_{Y,X_j}^{direct}(i) = |v_0^{Norm}(i) - v_j^{Norm}(i)| = |y_i^{Norm} - x_{j,i}^{Norm}|$ for all $j = 1, \dots, m$ (4)

and the distance for an inverse relation between the variables:

$d_{Y,X_j}^{inverse}(i) = |v_0^{Norm}(i) - (1 - v_j^{Norm}(i))| = |y_i^{Norm} - (1 - x_{j,i}^{Norm})|$ for all $j = 1, \dots, m$ (5)

If $\sum_{i = 1}^{n} d_{Y, X_{j}}^{direct} (i) < \sum_{i = 1}^{n} d_{Y, X_{j}}^{inverse} (i)$ then $d_{Y, X_{j}} (i) = d_{Y, X_{j}}^{direct} (i)$ for all i = 1, . . . , n, else $d_{Y, X_{j}} (i) = d_{Y, X_{j}}^{inverse} (i)$ for all i = 1, . . . , n.

The similarity, or closeness, of each explanatory variable $X_j$ to $Y$, denoted by $S_{Y,X_j}$, is then computed as $S_{Y,X_j} = 1 - \frac{1}{n}\sum_{i=1}^{n} d_{Y,X_j}(i)$ for all $j = 1, \dots, m$. (6)

The similarity measure indicates the degree to which an explanatory variable follows a pattern similar (direct or inverse) to that of the dependent variable. The measure of similarity $S_{Y,X_j}$ therefore parallels traditional statistical measures of significance (t-tests or sig.). However, in addition to a significant relation ($S_{Y,X_j} \geq 0.8$), there is an option of partial significance ($0.7 < S_{Y,X_j} < 0.8$): the closer $S_{Y,X_j}$ is to 0.7, the closer the variable is to insignificance. This gradual transition from full significance to full insignificance adds a further element of stability to the modeling process when utilizing soft regression.
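Equations (4) to (6) and the significance bands above fit in a few lines. A minimal sketch, assuming both inputs are already normalized vectors of equal length (all function names are hypothetical):

```python
def distances(y_norm, x_norm):
    """Per-observation distances d_{Y,X_j}(i): direct (eq. 4) or inverse (eq. 5), whichever sums lower."""
    direct = [abs(y - x) for y, x in zip(y_norm, x_norm)]
    inverse = [abs(y - (1.0 - x)) for y, x in zip(y_norm, x_norm)]
    # As in the text: use the direct distances only if their sum is strictly smaller.
    return direct if sum(direct) < sum(inverse) else inverse


def similarity(y_norm, x_norm):
    """S_{Y,X_j} = 1 - mean distance (eq. 6)."""
    d = distances(y_norm, x_norm)
    return 1.0 - sum(d) / len(d)


def significance(s):
    """Label a similarity value using the 0.8 and 0.7 thresholds from the text."""
    if s >= 0.8:
        return "significant"
    if s > 0.7:
        return "partially significant"
    return "insignificant"


y = [0.0, 0.5, 1.0]
print(similarity(y, [1.0, 0.5, 0.0]))  # 1.0 (a perfect inverse relation)
```

Because the orientation is chosen per variable, a perfectly inverse variable scores the same as a perfectly direct one, which is how SR accommodates ratios that move against the dependent variable.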

Computing combined similarity ($S_{Y,X_1,\dots,X_m}^{Comb}$):

Once similarity measures are computed for all the explanatory variables, the next step is to calculate the collective contribution of all the explanatory variables combined in explaining the behavior of the dependent variable. For every observation, we select the element from the explanatory variable(s) that is most similar to (has the shortest distance from) the dependent variable, thus creating the vector of minimum distances:

$d_{Y,X_1,\dots,X_m}^{Min}(i) = \min_{1 \leq j \leq m} d_{Y,X_j}(i)$ for all $i = 1, \dots, n$ (7)

The combined similarity of all the explanatory variables to the dependent variable is $S_{Y,X_1,\dots,X_m}^{Comb} = 1 - \frac{1}{n}\sum_{i=1}^{n} d_{Y,X_1,\dots,X_m}^{Min}(i)$ (8)

$S_{Y,X_1,\dots,X_m}^{Comb}$ expresses the degree to which all the explanatory variables combined explain the behavior of the dependent variable, and in this respect it parallels $R^2$. One important difference between the two measures is that $S_{Y,X_1,\dots,X_m}^{Comb}$ allows explanatory variables to overlap in their relations with the dependent variable (which is more reasonable and more in line with “real world” behavior), and therefore explanatory variables are not required to be independent of each other.
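Equations (7) and (8) reduce to a column-wise minimum followed by an average. A minimal sketch, assuming the per-variable distance vectors have already been oriented (direct or inverse) as in the similarity step; the function name is hypothetical:

```python
def combined_similarity(distance_vectors):
    """Return (S_comb, d_min) per equations (7) and (8).

    distance_vectors: list of m lists, the j-th holding d_{Y,X_j}(i) for i = 1..n.
    """
    # Equation (7): for each observation i, keep the smallest distance over all variables.
    d_min = [min(column) for column in zip(*distance_vectors)]
    # Equation (8): one minus the mean of the minimum distances.
    s_comb = 1.0 - sum(d_min) / len(d_min)
    return s_comb, d_min


s_comb, d_min = combined_similarity([[0.2, 0.0, 0.4], [0.1, 0.3, 0.0]])
print(d_min)  # [0.1, 0.0, 0.0]
```

Since each minimum is no larger than any single variable's distance, $S^{Comb}$ is always at least as large as the largest individual $S_{Y,X_j}$; overlapping variables are not penalized, which is the property contrasted with $R^2$ above.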

Computing relative importance ($Relimp_j$):

The relative importance of the explanatory variables is computed by determining how much each of them contributes to the vector of minimum distances (7) that was used to compute $S_{Y,X_1,\dots,X_m}^{Comb}$. This is done by finding the difference between the vector of minimum distances $d_{Y,X_1,\dots,X_m}^{Min}(i)$ (the overall closeness of all the explanatory variables combined to the dependent variable) and the distance of each explanatory variable from the dependent variable, $d_{Y,X_j}(i)$ (see Yosef et al., 2015).

Therefore, relative importance in the SR (in contrast to traditional regression methods) is not affected by correlation with other explanatory variables, and is determined solely by the contribution of a given explanatory variable to explaining the behavior of the dependent variable.

We can calculate the relative weight, or relative importance (denoted by $Relimp_j$), of each explanatory variable in explaining the behavior of the dependent variable based on the following principles (for more details see Yosef et al., 2015):

$\mathrm{Relimp}_{j} = \frac{\mathrm{Contrib}_{j} - 0.7}{\sum_{r = 1}^{m} (\mathrm{Contrib}_{r} - 0.7)} \quad \text{for all } j = 1, 2, \dots, m$ (9), where the contribution of each explanatory variable ($\mathrm{Contrib}_j$) is:

$\mathrm{Contrib}_{j} = 1 - \frac{1}{n} \sum_{i = 1}^{n} \left| d_{Y, X_{1}, \dots, X_{m}}^{Min}(i) - d_{Y, X_{j}}(i) \right| \quad \text{for all } j = 1, 2, \dots, m$ (10)
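Eqs. (9) and (10) can be sketched in Python as follows, assuming precomputed distances stored one row per observation and one column per explanatory variable. The 0.7 offset is the constant that appears in Eq. (9); the function names and distance values are hypothetical. Note that Eq. (9) makes the Relimp values sum to 1 by construction.

```python
from typing import List, Sequence

def contributions(d: Sequence[Sequence[float]]) -> List[float]:
    """Eq. (10): Contrib_j = 1 - (1/n) * sum_i |d^Min(i) - d_{Y,X_j}(i)|.

    d[i][j] is the precomputed distance between Y and X_j at observation i.
    """
    n, m = len(d), len(d[0])
    d_min = [min(row) for row in d]  # Eq. (7)
    return [1.0 - sum(abs(d_min[i] - d[i][j]) for i in range(n)) / n
            for j in range(m)]

def relative_importance(d: Sequence[Sequence[float]],
                        offset: float = 0.7) -> List[float]:
    """Eq. (9): Relimp_j = (Contrib_j - 0.7) / sum_r (Contrib_r - 0.7)."""
    shifted = [c - offset for c in contributions(d)]
    total = sum(shifted)
    if total <= 0:
        raise ValueError("Eq. (9) needs a positive denominator")
    return [s / total for s in shifted]

# Hypothetical distances: n = 3 observations (rows), m = 2 variables (columns).
d = [[0.10, 0.30],
     [0.25, 0.05],
     [0.40, 0.60]]
print([round(c, 3) for c in contributions(d)])        # -> [0.933, 0.867]
print([round(r, 3) for r in relative_importance(d)])  # -> [0.583, 0.417]
```

A variable that is often the closest one (so its distances coincide with the minimum vector) gets a contribution near 1 and hence a larger Relimp.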

6 Results

This study involved a very large number of regression runs, covering all the possible combinations of variables for the years 2012 and 2016, because each of the four financial ratio categories was represented by more than one proxy variable, and each proxy variable covers an important aspect of its category and cannot be ignored. Such a large number of regression runs is required for each year under study. A major challenge in summarizing the results was to present them concisely while still exposing all the main and most interesting outcomes.

Table 1 presents the similarity measures ($S_{Y,X_j}$) of all the proxy variables used in this study for all the years under study. The most important conclusion from this table is that all the included variables are important to some degree on a consistent basis. For all the years covered in this study there was not a single case of an insignificant $S_{Y,X_j}$: all the values are greater than 0.7. In addition, comparing the values on the same row shows that the similarity measures of each proxy variable change little over the years. This consistency and stability are indicative of a solid, stable model: the variables with relatively high $S_{Y,X_j}$ are consistently high over the years, and the variables that are only partially significant are partially significant in all the years.

Table 1
Similarity over the years

Category	Variable	2012	2013	2014	2015	2016
Liquidity and Efficiency	A52	0.745	0.761	0.763	0.764	0.748
	A54	0.728	0.726	0.712	0.719	0.717
Solvency	A47	0.901	0.901	0.893	0.902	0.894
	A73	0.896	0.882	0.896	0.892	0.892
Profitability	A35	0.932	0.934	0.931	0.931	0.932
	A50	0.972	0.971	0.972	0.974	0.974
	A51	0.959	0.957	0.957	0.960	0.957
	A59	0.954	0.953	0.950	0.951	0.958
	A70	0.976	0.976	0.976	0.977	0.976
Market ratios	A21	0.838	0.823	0.817	0.824	0.809
	A75	0.805	0.803	0.793	0.787	0.781

Note: all the explanatory variables are directly related to the dependent variable.
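The two observations drawn from Table 1 (no value at or below 0.7, and little change along each row) can be checked directly against the published numbers. A short sketch, with the values transcribed from Table 1 and the 0.7 cut-off taken from the text; the helper names are illustrative.

```python
from typing import Dict, List, Tuple

# Similarity S_{Y,X_j} transcribed from Table 1, years 2012, 2013, 2014, 2015, 2016.
TABLE_1: Dict[str, List[float]] = {
    "A52": [0.745, 0.761, 0.763, 0.764, 0.748],
    "A54": [0.728, 0.726, 0.712, 0.719, 0.717],
    "A47": [0.901, 0.901, 0.893, 0.902, 0.894],
    "A73": [0.896, 0.882, 0.896, 0.892, 0.892],
    "A35": [0.932, 0.934, 0.931, 0.931, 0.932],
    "A50": [0.972, 0.971, 0.972, 0.974, 0.974],
    "A51": [0.959, 0.957, 0.957, 0.960, 0.957],
    "A59": [0.954, 0.953, 0.950, 0.951, 0.958],
    "A70": [0.976, 0.976, 0.976, 0.977, 0.976],
    "A21": [0.838, 0.823, 0.817, 0.824, 0.809],
    "A75": [0.805, 0.803, 0.793, 0.787, 0.781],
}
THRESHOLD = 0.7  # values above this are treated as significant in the text

def all_above(table: Dict[str, List[float]], threshold: float) -> bool:
    """True if every similarity value in the table exceeds the threshold."""
    return all(v > threshold for row in table.values() for v in row)

def widest_spread(table: Dict[str, List[float]]) -> Tuple[str, float]:
    """Variable whose similarity varies most across the years, and that range."""
    name = max(table, key=lambda k: max(table[k]) - min(table[k]))
    return name, max(table[name]) - min(table[name])

print(all_above(TABLE_1, THRESHOLD))  # -> True
name, spread = widest_spread(TABLE_1)
print(name, round(spread, 3))         # -> A21 0.029
```

Even the least stable proxy moves by less than 0.03 over the five years.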

Because several proxy variables were used from each of the four financial ratio categories, a very large number of regression runs was required to try all the possible combinations of proxy variables from all four categories. As explained in the theoretical section, the similarity measure of a given explanatory variable remains the same regardless of the other variables in a given regression run. Therefore, presenting the $S_{Y,X_j}$ measures in Table 1 was simple. This is not the case for the relative importance measure, Relimp: as explained above, the Relimp of a given explanatory variable differs in every regression run based on a different set of explanatory variables. Summarizing the Relimps of the explanatory variables is therefore more difficult. In Table 2 we present, as an example, arbitrarily selected results of seven different regression runs (each of the seven columns of the table represents a separate regression run). The table presents results for the year 2012 only. The first four rows display the variables included in the various regression runs. The $S_{Y,X_j}$ measures (rows 5–8) are, as expected, the same as in Table 1 for the year 2012. The most interesting part of the table is the Relimp and $S_{Y, X_{1}, \dots, X_{m}}^{Comb}$ measures. Rows 9–12 of Table 2 contain the Relimp measures for the variables appearing in rows 1–4, in the corresponding order. Note that the variables in the same row belong to the same group of financial ratios, whereas each column represents a different regression run, each consisting of a unique combination of proxy variables. Observing the values along each row, we can see that the Relimp values located on the same row (belonging to the same category of financial ratios) are consistent and of the same order of magnitude.
Hence, the important conclusion is that no matter which proxy variable we select from each of the four groups of financial ratios, the results remain very similar, which points to a high degree of robustness of the model. In addition, the relative importance measures of the various proxy variables reflect the relative importance of the whole categories of financial ratios to which they belong. For example, the category “Profitability” is the most important (having the highest Relimp values), followed by “Solvency”. Next, and much weaker (but still significant), is “Market Ratios”. The last and weakest category is “Liquidity and Efficiency”, whose relative importance is only a fraction of that of the leading groups.
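The number of combinations implied by Table 1 can be counted with a Cartesian product. This sketch assumes, as in Table 2, that each regression run uses exactly one proxy variable per category; the text does not state the total number of runs, so the count below covers only that design for a single year.

```python
from itertools import product

# Proxy variables per category, as listed in Table 1.
PROXIES = {
    "Liquidity and Efficiency": ["A52", "A54"],
    "Solvency": ["A47", "A73"],
    "Profitability": ["A35", "A50", "A51", "A59", "A70"],
    "Market ratios": ["A21", "A75"],
}

# One regression run = one proxy chosen from each category (assumed design).
runs = list(product(*PROXIES.values()))
print(len(runs))  # -> 40
print(runs[0])    # -> ('A52', 'A47', 'A35', 'A21')
```

The first combination coincides with the first column of Table 2; the count doubles for the two years compared in Table 3.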

Table 2

Example of selected regression runs for 2012

Liquidity and Efficiency	A52	A52	A52	A52	A52	A54	A54
Solvency	A47	A47	A47	A47	A73	A47	A73
Profitability	A35	A50	A51	A70	A50	A50	A50
Market ratios	A21	A21	A21	A21	A75	A21	A75
$S_{Y,X_j}$	0.745	0.745	0.745	0.745	0.745	0.728	0.728
	0.901	0.901	0.901	0.901	0.896	0.901	0.896
	0.932	0.972	0.959	0.976	0.972	0.972	0.972
	0.838	0.838	0.838	0.838	0.805	0.838	0.805
Relimp	0.107	0.083	0.091	0.082	0.089	0.082	0.087
	0.311	0.302	0.304	0.303	0.313	0.302	0.313
	0.355	0.402	0.386	0.402	0.426	0.402	0.426
	0.227	0.213	0.219	0.213	0.172	0.213	0.174
$S_{Y, X_{1}, \dots, X_{m}}^{Comb}$	0.963	0.984	0.978	0.985	0.983	0.985	0.983

The last row in Table 2 displays $S_{Y, X_{1}, \dots, X_{m}}^{Comb}$ for every regression run. Since this measure ranges from 0 to 1, the conclusion is that, whatever combination of proxy variables we select from the relevant groups, all the regression runs are highly successful in explaining the behavior of the dependent variable; the lowest value is 0.963, which is still very high.
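Two properties of the Relimp rows of Table 2 can be checked from the published values: Eq. (9) makes the Relimp values of each run sum to 1 (here up to three-decimal rounding), and every run ranks the four categories in the same order. A sketch with the values transcribed from Table 2; the helper names and the rounding tolerance are assumptions.

```python
from typing import List, Tuple

CATEGORIES = ["Liquidity and Efficiency", "Solvency", "Profitability", "Market ratios"]

# Relimp from Table 2 (2012); one list per regression run (table column),
# in the category order above.
RELIMP_2012: List[List[float]] = [
    [0.107, 0.311, 0.355, 0.227],
    [0.083, 0.302, 0.402, 0.213],
    [0.091, 0.304, 0.386, 0.219],
    [0.082, 0.303, 0.402, 0.213],
    [0.089, 0.313, 0.426, 0.172],
    [0.082, 0.302, 0.402, 0.213],
    [0.087, 0.313, 0.426, 0.174],
]

def sums_to_one(run: List[float], tol: float = 0.002) -> bool:
    """Eq. (9) normalizes Relimp to sum to 1; tol absorbs 3-decimal rounding."""
    return abs(sum(run) - 1.0) <= tol

def ranking(run: List[float]) -> Tuple[str, ...]:
    """Category names ordered from highest to lowest Relimp."""
    return tuple(c for _, c in sorted(zip(run, CATEGORIES), reverse=True))

print(all(sums_to_one(run) for run in RELIMP_2012))  # -> True
print({ranking(run) for run in RELIMP_2012})
# -> {('Profitability', 'Solvency', 'Market ratios', 'Liquidity and Efficiency')}
```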

Table 3 compares the results for 2012 with those for 2016. It differs from Table 2 in the following aspects:

Table 3

Comparison of 2012 to 2016

	2012			2016
	$S_{Y,X_j}$	Relimp	$S_{Y, X_{1}, \dots, X_{m}}^{Comb}$	$S_{Y,X_j}$	Relimp	$S_{Y, X_{1}, \dots, X_{m}}^{Comb}$
Liquidity and Efficiency	[0.728,0.745]	[0.082,0.107]	[0.963,0.986]	[0.717,0.748]	[0.091,0.128]	[0.958,0.984]
Solvency	[0.896,0.901]	[0.302,0.313]		[0.892,0.894]	[0.296,0.383]
Profitability	[0.932,0.976]	[0.355,0.426]		[0.932,0.976]	[0.282,0.413]
Market ratios	[0.805,0.838]	[0.172,0.227]		[0.781,0.809]	[0.177,0.206]

Table 2 presents the results of a selected sample of regression runs for 2012 only. Its purpose is to demonstrate the stability and consistency of the results when different proxy variables are selected from the four groups of variables, as explained above.

Table 3 compares the regression results of 2012 and 2016, which demonstrates the consistency of the model results over time. Table 3 is based on a very large number of regression runs (including all the possible combinations of explanatory variables). To present the results of so many regression runs in the most comprehensible and concise form, we report ranges of values that contain all the results of the various regression runs. This makes the comparison between the results for 2012 and 2016 much simpler and more convenient.
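Because $S_{Y,X_j}$ does not depend on the other variables in a run, the $S_{Y,X_j}$ ranges in Table 3 can be rebuilt as the minimum and maximum over the proxies of each category in Table 1. The sketch below does this for 2012 and 2016 (helper name illustrative); the Relimp and $S_{Y, X_{1}, \dots, X_{m}}^{Comb}$ ranges would require the full set of regression runs, which is not reproduced here.

```python
from typing import Dict, List, Tuple

# S_{Y,X_j} transcribed from Table 1, grouped by category.
S_2012: Dict[str, List[float]] = {
    "Liquidity and Efficiency": [0.745, 0.728],              # A52, A54
    "Solvency": [0.901, 0.896],                              # A47, A73
    "Profitability": [0.932, 0.972, 0.959, 0.954, 0.976],    # A35, A50, A51, A59, A70
    "Market ratios": [0.838, 0.805],                         # A21, A75
}
S_2016: Dict[str, List[float]] = {
    "Liquidity and Efficiency": [0.748, 0.717],
    "Solvency": [0.894, 0.892],
    "Profitability": [0.932, 0.974, 0.957, 0.958, 0.976],
    "Market ratios": [0.809, 0.781],
}

def category_ranges(table: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """(min, max) of the proxy values in each category."""
    return {cat: (min(vals), max(vals)) for cat, vals in table.items()}

for year, table in (("2012", S_2012), ("2016", S_2016)):
    print(year, category_ranges(table))
```

For example, Solvency in 2016 comes out as (0.892, 0.894).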

In addition to Table 3, Figures 1 and 2 present the same results visually. Figure 1 compares 2012 and 2016 in terms of the $S_{Y,X_j}$ measures, while Figure 2 compares them in terms of the Relimp measures. Both figures are based on the mid-points of the ranges in Table 3.
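The mid-points plotted in the figures are simple averages of each range's endpoints. A sketch using the $S_{Y,X_j}$ and Relimp ranges from Table 3; the dictionary layout and function name are illustrative.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]

# Ranges from Table 3: category -> (2012 range, 2016 range).
S_RANGES: Dict[str, Tuple[Range, Range]] = {
    "Liquidity and Efficiency": ((0.728, 0.745), (0.717, 0.748)),
    "Solvency": ((0.896, 0.901), (0.892, 0.894)),
    "Profitability": ((0.932, 0.976), (0.932, 0.976)),
    "Market ratios": ((0.805, 0.838), (0.781, 0.809)),
}
RELIMP_RANGES: Dict[str, Tuple[Range, Range]] = {
    "Liquidity and Efficiency": ((0.082, 0.107), (0.091, 0.128)),
    "Solvency": ((0.302, 0.313), (0.296, 0.383)),
    "Profitability": ((0.355, 0.426), (0.282, 0.413)),
    "Market ratios": ((0.172, 0.227), (0.177, 0.206)),
}

def midpoint(r: Range) -> float:
    """Mid-point of a [low, high] range; insensitive to endpoint order."""
    return (r[0] + r[1]) / 2

for name, ranges in (("S_{Y,X_j}", S_RANGES), ("Relimp", RELIMP_RANGES)):
    for cat, (r2012, r2016) in ranges.items():
        print(f"{name:9} {cat:25} 2012: {midpoint(r2012):.4f}  2016: {midpoint(r2016):.4f}")
```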

Fig. 1. Similarity.

Fig. 2. Relimp.

The consistency of the $S_{Y,X_j}$ and Relimp measures is clearly visible in Table 3 when comparing the ranges of these values for 2012 and 2016. It is even easier to observe this consistency visually in Figures 1 and 2. The consistency and stability of the model over time are important indicators of its reliability.

7 Summary and conclusions

In this study we applied the computerized modeling tool “Soft Regression”, a Soft Computing tool based on Fuzzy Logic, to model the earnings (Operating Income Margin) of companies in the manufacturing industries (SIC codes 2000–3999). We used several categories of financial ratios as explanatory variables and included several financial ratios from each category as possible proxy variables for their category.

The main conclusions are:

All the categories of the financial ratios included in this study have been validated.

All the proxy variables selected from the four main categories of financial ratios came out either fully significant or partially significant. No variables came out insignificant.

The financial ratio category “Profitability” came out as the most important (having the highest Relimp values), followed by “Solvency”, and then by the much weaker (but still significant) category “Market Ratios”. The last and weakest category was “Liquidity and Efficiency”, characterized by only partially significant $S_{Y,X_j}$ measures.

Comparing the results of 2012 with those of 2016 leads to the conclusion that the model is stable and consistent over the years. A similar conclusion can be reached by comparing the $S_{Y,X_j}$ results for all five years, 2012–2016.

Incorporating different explanatory variables from the various categories of financial ratios led to similar and consistent results, implying a high degree of robustness of the model.

The very high $S_{Y, X_{1}, \dots, X_{m}}^{Comb}$ scores for all the regression runs indicate a very high explanatory power of the model. This suggests a successful selection of explanatory variables combined with an appropriate modeling technique.

Stability, consistency, robustness and strong explanatory power are all important indicators of model reliability.

The main contribution of this study is to demonstrate the effectiveness of a soft computing modeling tool based on fuzzy logic. The resulting model is robust, consistent and stable, and thus very reliable. It validates relevant financial ratios and determines their relative importance, which is critical information for the success of financial investments.

The logical follow-up for future research is to incorporate the method presented here into a decision support system for financial investments. This will require integrating several additional soft computing/fuzzy logic technologies.


Appendix 1 – All accounting descriptors examined in the first phase of analysis

	Accounting Descriptors
1	Accounts Receivable Turnover
2	Current Ratio
3	Quick Ratio
4	Inventory Turnover
5	Total Debt To Equity
6	ROA
7	ROE
8	Gross Profit Margin
9	Days Sales in Accounts Receivable
10	Inventory to total assets
11	Depreciation over Plant
12	Long-Term Debt/Equity
13	Equity/Fixed assets
14	Times Interest Earned
15	Sales/Total Assets
16	Pre-tax income/Sales
17	Net Profit Margin
18	Sales to total cash
19	Sales to total Inventory
20	Sales to total working capital
21	Sales to Fixed assets
22	Working capital to total assets
23	Operating Income to Total assets
24	EBITDA Margin Ratio
25	Cash From Operations (CFO) to Total Debt
26	Payment Of Dividends as % of OCF
27	Net Income over OCF
28	ΔDepreciation (&Amortization), IS
29	Δinventory
30	ΔResearch & Development Expense
31	ΔTotal Assets
32	ΔTotal Long-Term Debt
33	ΔTotal Revenue
34	ΔCurrent Ratio
35	ΔQuick Ratio
36	ΔInventory Turnover
37	ΔDividends per share
38	ΔTotal Debt To Equity
39	ΔROE
40	ΔGross Profit Margin
41	ΔWorking capital
42	ΔDays Sales in Accounts Receivable
43	ΔInventory to total assets
44	ΔDepreciation over Plant
45	ΔCapital Expenditures/total assets
46	ΔLong-Term Debt/Equity
47	ΔEquity/Fixed assets
48	ΔTimes Interest Earned
49	ΔSales/Total Assets
50	ΔPre-tax income/Sales
51	ΔNet Profit Margin
52	ΔSales to total Inventory
53	ΔSales to total working capital
54	ΔResearch & Development Expense to Sales
55	ΔWorking capital to total assets
56	ΔOperating Income to Total assets
57	ΔEBITDA Margin Ratio
58	ΔCapital Expenditures/total assets
59	ΔTotal Depreciation
60	ΔTotal Debt

References

1. Lev and Gu, The End of Accounting and the Path Forward for Investors and Managers, John Wiley & Sons (2016).

2. Chen K. and Shimerda T., An Empirical Analysis of Useful Financial Ratios, Financ Manag 10 (1981), 51–61. doi: 10.2307/3665113

3. Shnaider and Yosef, Relative Importance of Explanatory Variable: Traditional Method vs Soft Regression, Int J Intell Syst 33 (2018), 1180–1196.

4. Tsai C.F. and Hsiao Y.C., Combining Multiple Feature Selection Methods for Stock Prediction: Union, Intersection, and Multi-Intersection Approaches, Decis Support Syst 50 (2010), 258–269. doi: 10.1016/j.dss.2010.08.028

5. Atiya A.F., Bankruptcy Prediction for Credit Risk Using Neural Networks: A Survey and New Results, IEEE Trans Neural Networks 12 (2001), 929–935. doi: 10.1109/72.935101

6. Shin K.S., Lee T.S. and Kim H.J., An Application of Support Vector Machines in Bankruptcy Prediction Model, Expert Syst Appl 28 (2005), 127–135. doi: 10.1016/j.eswa.2004.08.009

7. Ou J.A. and Penman S.H., Financial Statement Analysis and the Prediction of Stock Returns, J Account Econ 11 (1989), 295–329. doi: 10.1016/0165-4101(89)90017-7

8. Holthausen R.W. and Larcker D.F., The Prediction of Stock Returns Using Financial Statement Information, J Account Econ 15 (1992), 373–411. doi: 10.1016/0165-4101(92)90025-W

9. Bernard, Thomas and Wahlen, Accounting-Based Stock Price Anomalies: Separating Market Inefficiencies from Risk, Contemp Account Res 14 (1997), 89–136. doi: 10.1111/j.1911-3846.1997.tb00529.x

10. Stober T.L., Summary Financial Statement Measures and Analysts’ Forecasts of Earnings, J Account Econ 15 (1992), 347–372. doi: 10.1016/0165-4101(92)90024-V

11. Setiono and Strong, Predicting Stock Returns Using Financial Statement Information, J Bus Financ Account 25 (1998), 631–657. doi: 10.1111/1468-5957.t01-1-00205

12. Bird, Gerlach and Hall A.D., The Prediction of Earnings Movements Using Accounting Data: An Update and Extension of Ou and Penman, J Asset Manag 2 (2001), 180–195. doi: 10.1057/palgrave.jam.2240044

13. Tsai C.F., Feature Selection in Bankruptcy Prediction, Knowledge-Based Syst 22 (2009), 120–127. doi: 10.1016/j.knosys.2008.08.002

14. Min J.H. and Lee Y.C., Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters, Expert Syst Appl 28 (2005), 603–614. doi: 10.1016/j.eswa.2004.12.008

15. Chandwani and Saluja M.S., Stock Direction Forecasting Techniques: An Empirical Study Combining Machine Learning System with Market Indicators in the Indian Context, Int J Comput Appl 92 (2014), 8–17. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.685.7842&rep=rep1&type=pdf

16. Niu, Chen and Xu, Twin support vector regression with Huber loss, J Intell Fuzzy Syst 32 (2017), 4247–4258. doi: 10.3233/JIFS-16629

17. Kandel, Last and Bunke, Data Mining and Computational Intelligence, Physica-Verlag Publishing (2001).

18. Yosef, Haruvy and Shnaider, Soft Regression vs Linear Regression, Pioneer J Theor Appl Stat 10 (2015), 31–46.

19. Yosef and Shnaider, On Measuring the Relative Importance of Explanatory Variables in a Soft Regression Method, Adv Appl Stat 50 (2017), 201–228.

20. D’Souza J.M., Ramesh and Shen, The Interdependence Between Institutional Ownership and Information Dissemination by Data Aggregators, Account Rev 85 (2010), 159–193. doi: 10.2308/accr.2010.85.1.159

21. Baranes and Palas, The Prediction of Earnings Movements Using Accounting Data: Using XBRL, Int J Account Res 04 (2017), 1–7. doi: 10.4172/2472-114X.1000143

22. Miguel J.G.S., The Reliability of R&D Data in COMPUSTAT and 10-K Reports, Account Rev 52 (1977), 638–641. http://links.jstor.org/sici?sici=0001-4826(197707)52:3%3C638:TRORDI%3E2.0.CO;2-#

23. Kinney M.R. and Swanson E.P., The Accuracy and Adequacy of Tax Data in COMPUSTAT, J Am Tax Assoc 15 (1993), 121. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=6147470&site=ehost-live

24. Tallapally, Luehlfing M.S. and Motha, The Partnership Of EDGAR Online And XBRL - Should Compustat Care?, Rev Bus Inf Syst 15 (2011), 39–46. http://search.proquest.com/docview/900720360?accountid=11262

25. Rosenberg and Houglet, Error Rates in CRSP and COMPUSTAT Data Bases and Their Implications, J Finance 29 (1974), 1303–1310. doi: 10.1111/j.1540-6261.1974.tb03107.x

26. Yang D.C., Vasarhelyi M.A. and Liu, A Note on the Using of Accounting Databases, Ind Manag Data Syst 103 (2003), 204–210. doi: 10.1108/02635570310465689

27. Pinches G.E., Eubank A.A., Mingo K.A. and Caruthers J.K., The Hierarchical Classification of Financial Ratios, J Bus Res 3 (1975), 295–310.

28. Harrison W.T., Horngren C.T., Thomas W.C. and Suwardy, Financial Accounting - International Financial Reporting Standards, 8th ed., Pearson Education South Asia, Singapore (2011).

29. Ball and Brown, An Empirical Evaluation of Accounting Income Numbers, J Account Res 6 (1968), 159–178. doi: 10.2307/2490232

30. Ball, The Earnings-Price Anomaly, J Account Econ 15 (1992), 319–345. doi: 10.1016/0165-4101(92)90023-U

31. Beaver, Lambert and Morse, The Information Content of Security Prices, J Account Econ 2 (1980), 3–28. doi: 10.1016/0165-4101(80)90013-0

32. Ou J.A. and Penman S.H., Accounting Measurement, Price-Earnings Ratio, and the Information Content of Security Prices, J Account Res 27 (1989), 111–144. doi: 10.2307/2491068

33. Zadeh, Fuzzy Sets, Inf Control 8 (1965), 338–353.