Predicting Financial Health of Banks for Investor Guidance Using Machine Learning Algorithms

Abstract

While earlier studies have focused excessively on predicting bank bankruptcy, this study classifies banks by financial strength from the perspective of retail depositors, who currently lack an authentic guiding framework for identifying banks with higher risk profiles. Using machine learning techniques, we classify 44 Indian banks into distinct categories of financial health based on 12 years of data from 2005 to 2017. We first use unsupervised learning to identify patterns that yield logical groups in terms of financial health and then move to supervised learning for prediction. Using linear discriminant analysis (LDA), Classification and Regression Trees (CART) and Random Forest methods, we predict cluster membership and report the associated explanatory power. We also compare our classification with the credit ratings awarded by rating agencies and highlight discrepancies between our models' predictions and those ratings.

JEL Codes: C53; M10

Keywords

Emerging markets, financial inclusion, government policy and regulation, market efficiency

1. Introduction

Collapse of banks caused by mismanagement affects deposit holders, investors and employees, and has a negative bearing on the economy as a whole (Huang, Chang, & Liu, 2012). Bank failure could trigger adverse financial repercussions, such as a massive bail-out cost for the failing bank and a loss of confidence among investors and depositors (Tung, Quek, & Cheng, 2004). Studies have shown that the growth forgone by the economy due to an overhang of non-performing loans can exceed two percentage points annually (Balgova et al., 2016). Weak institutions lacking competence during periods of credit expansion rapidly accumulate non-performing loans, which can deplete bank capital unless resolved in a timely manner; this underlines the importance of financial supervision (Steigum, 2011). An early warning system, combined with timely intervention by the central bank, can safeguard deposit holders, investors and the economy.

The Reserve Bank of India (RBI) has highlighted the deteriorating stability of Indian banks, driven primarily by weaknesses in capital adequacy, non-performing advances, profitability, liquidity and efficiency (RBI Financial Stability Report, 2018). Non-performing assets (NPAs) are loans in default or arrears on which the bank has not received interest or principal due for more than 90 days. Stress in the Indian banking sector persists, with deteriorating asset quality and declining profitability exerting pressure on banks' regulatory capital ratios (Kadanda & Raj, 2018). Indian banks fall short of the capital adequacy norms stipulated under the global capital framework (Basel III). The deteriorating NPA position calls for a higher level of capital infusion, which the banks are unable to raise from the market, and it has been challenging for the government to provide the additional capital required (Economic Times, 2018a). In effect, most Indian banks are undercapitalised, exposing depositors and regulators to a higher level of risk.
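The 90-day rule above can be expressed as a simple check. The sketch below is illustrative only: it assumes each loan is summarised by its outstanding amount and its days past due, it ignores the finer sub-classification of NPAs applied in practice, and the function names and loan data are hypothetical.

```python
def is_non_performing(days_past_due: int, threshold_days: int = 90) -> bool:
    """A loan is non-performing if interest or principal has been overdue
    for more than `threshold_days` days (90 in the definition above)."""
    return days_past_due > threshold_days


def gross_npa_ratio(loans: list[tuple[float, int]]) -> float:
    """Gross NPAs as a percentage of gross advances.

    `loans` holds (outstanding_amount, days_past_due) pairs.
    """
    total = sum(amount for amount, _ in loans)
    npa = sum(amount for amount, dpd in loans if is_non_performing(dpd))
    return 100.0 * npa / total


# Hypothetical loan book: (amount in ₹ crore, days past due)
book = [(500.0, 0), (300.0, 45), (120.0, 91), (80.0, 200)]
print(gross_npa_ratio(book))  # 20.0
```

A loan exactly 90 days overdue is not yet an NPA under this rule; only the last two loans qualify, giving 200 of 1,000 in advances.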

India has lacked a robust risk-based supervisory framework for monitoring scheduled commercial banks. Although oversight has evolved recently, large frauds such as the ₹114 billion Punjab National Bank scam continue to be reported in the Indian banking system, deepening the trust deficit between banks and deposit holders (Firstpost, 2017). Deposit insurance in India is governed by the Deposit Insurance and Credit Guarantee Corporation (DICGC) Act, 1961, which insures deposit holders in the event of a bank failure. Compared with developed and developing economies, the insured limit on deposits would need to increase at least 15-fold to cover at least 90 per cent of fixed deposits in India (Economic Times, 2018b). Given also that no ranking of scheduled commercial banks by risk and stability is available, the interest of deposit holders is not fully secured, and bank failures are a major concern for them. The recent withdrawal caps imposed by the RBI on deposit holders of Punjab and Maharashtra Cooperative Bank (PMB), one of the top five cooperative banks in India, following a ₹65 billion scam, have resulted in a public outcry, further widening the trust deficit (Firstpost, 2019).

Credit rating agencies provide guidance to deposit holders to a certain extent. However, virtually every government inquiry into the 2008–2009 global financial crisis has assigned some blame to credit rating agencies for inaccurate ratings (Kashyap & Kovrijnykh, 2015). Recently, India's credit rating agencies have come under severe scrutiny over their sudden downgrading of one of the country's largest non-banking financial institutions, Infrastructure Leasing & Financial Services (IL&FS). In India, the 'issuer-pays' model is prevalent: rating agencies charge the banks that issue a debt instrument a fee for the credit rating assessment, which creates a conflict of interest. Deposit holders are thus vulnerable and exposed to bank instability and potential defaults, with no agency to provide an effective warning system. Predicting the bankruptcy of commercial banks has been studied extensively using a multitude of methodologies (Affes & Kaffel, 2019; Jones, 2017; Kolari & Sanz, 2017; Migliore & Chinta, 2017). However, there is a lack of studies proposing robust rating mechanisms that identify unstable banks and thereby offer an authentic guiding framework from the retail depositor's perspective. This is the research gap this study has identified.

Our study makes three main contributions. First, whereas earlier studies focused excessively on bankruptcy prediction, our key objective is to classify banks by financial strength from the perspective of retail depositors, who currently lack an authentic guiding framework for identifying banks with higher risk profiles. We classify banks into three risk categories (high, medium and low), helping retail investors make informed decisions when choosing a commercial bank in which to place their savings as deposits. The study covers all 44 Indian public and private sector banks, excluding foreign banks, in operation during the 12-year period from 2005 to 2017. Second, our study uses a more refined methodology, employing machine learning techniques to classify the Indian scheduled commercial banks into three distinct risk categories: it first uses clustering, an unsupervised learning technique, and then moves to supervised learning with linear discriminant analysis (LDA), classification and regression trees (CART) and Random Forest. Third, we compare our classification of banks into high, medium and low financial risk categories with the credit ratings awarded by the top Indian rating agencies to the deposits issued by the banks, and demonstrate disparities in those ratings. Rating agencies could benefit substantially from such a framework.

This article is organised as follows: the second section sets out the literature review. The third section describes the data, and the fourth section elucidates the methodology. The fifth section analyses the results, and the sixth section sets out the discussions, conclusions and managerial implications. The seventh section lists out the limitations of the research and scope for further research.

2. Literature Review

The gross non-performing assets of Indian scheduled commercial banks increased to 11.6 per cent of advances, and net non-performing assets to 6.1 per cent. Stressed advances, which include non-performing and restructured advances, reached 12.5 per cent of total advances. The position of the public sector banks (PSBs) is more severe, with net non-performing advances at 8.6 per cent and stressed advances at an alarming 16.7 per cent. The capital adequacy ratio of Indian scheduled commercial banks fell to 13.8 per cent, with the Tier-I capital adequacy ratio falling to 7.0 per cent. Net interest at 2.7 per cent, yielding a return on assets of −0.2 per cent, demonstrates deteriorating asset quality (RBI Financial Stability Report, 2018). The Insolvency and Bankruptcy Code (IBC), 2016, enacted to resolve stressed assets, has been ineffective owing to lengthy processes and frequent court interventions, defeating the very purpose of time-bound stressed-asset resolution (RBI Annual Report, 2018). Of the 977 cases admitted, only 34 had been successfully resolved two years after the Code's inception (IBC Delays, 2018).

Global banks fall short of regulatory norms, and the stress tests used are not robust enough to differentiate between weak and sound banks (Goldstein, 2017). While the overall and Tier-I capital adequacy ratios of Indian banks are lower than the Basel III norms and have been deteriorating, the problem is severe for banks placed under Prompt Corrective Action (PCA), whose capital ratios are well below 9 per cent of risk-weighted assets (RBI Financial Stability Report, 2018).

The RBI has classified three large banks (SBI, ICICI and HDFC), each with assets exceeding 2 per cent of the country's Gross Domestic Product (GDP), as 'too big to fail', designating them domestic systemically important banks (DSIBs). These institutions are large and highly interconnected, and hence important for the orderly functioning of the national financial system and the real economy (Financial Express, 2017). The government guarantees support to these banks in the event of financial stress. However, four other large banks (Axis Bank, Bank of Baroda, Bank of India and Punjab National Bank) are not designated as DSIBs, which exposes them and their deposit holders to higher financial risks (Firstpost, 2017) and signals that they would receive lower priority for government support in times of financial stress; this is again a concern for deposit holders.

Historically, no Indian PSB has failed, whereas more than 35 private banks have either been placed under moratorium in the public interest due to mismanagement or have gone out of existence in the last 40 years; mergers of such weak private banks with PSBs have frequently been resorted to (Newsclick, 2018). The major challenge for the government and the RBI has been to monitor the financial position of banks effectively, and to take corrective action and infuse funds in a timely manner. More importantly, weak bank financials not only steeply increase retail depositors' risk exposure but also draw on scarce taxpayers' money, which eventually weakens the government's finances. In the recent past, the government and the RBI have been strengthening the risk-based supervision of scheduled commercial banks.

India has long lacked a risk-based supervisory framework for monitoring scheduled commercial banks. The RBI introduced a Risk Based Supervision (RBS) framework in 2012, which was later renamed the Supervisory Program for Assessment of Risk and Capital (SPARC). Such risk-based oversight calls for ongoing interaction between banks and supervisors, well beyond periodic inspections. Supervising banks, and especially ranking them in terms of risk and stability, has been an important role of the regulators.

Despite the strengthened regulatory oversight framework, large frauds continue to be reported in the Indian banking system, deepening the trust deficit among deposit holders; the ₹114 billion scam at Punjab National Bank, the second largest public sector bank in the country, has shown the Indian banking system in a poor light in the eyes of deposit holders (Livemint, 2018b). Several lenders, including Axis Bank, Bank of India, ICICI Bank and Yes Bank, have under-reported non-performing advances, and this has continued for many years (Times of India, 2018). Recently, it has come to light that PMB under-reported its non-performing assets to the tune of 73 per cent (Firstpost, 2019).

Explicit, limited deposit insurance schemes, which protect each depositor against loss up to a limited amount, have become the norm around the globe (Davis, 2020). In the USA, the Federal Deposit Insurance Corporation (FDIC) uses CAMELS data to develop a rating system that ranks banks on stability criteria (Affes & Kaffel, 2019). The FDIC insures deposit holders to a limited extent in the event of bank failures and, through its rating system, provides an early warning to investors by drawing their attention to banks with a high probability of default. Deposit insurance in India is governed by the DICGC Act, 1961; the total amount payable by the DICGC in the event of a bank failure cannot exceed ₹100,000 per depositor. Compared with developed and developing economies, the insured limit would need to increase at least 15-fold to cover at least 90 per cent of fixed deposits in India (Economic Times, 2018b). The government has attempted legislation such as the Financial Resolution and Deposit Insurance (FRDI) Bill, 2017, to protect depositors' interests, but such measures have failed to take off. The interest of deposit holders is therefore not fully secured, and bank failures are a major concern for deposit holders when they choose where to place their deposits. Credit rating agencies provide guidance to deposit holders to a certain extent.

However, virtually every government inquiry into the 2008–2009 financial crisis has assigned some blame to credit rating agencies. Kashyap and Kovrijnykh (2013) show that inaccurate AAA credit ratings introduced risk into the US financial system, and further that rating errors are larger when the firm orders the rating than when investors do. Recently, India's credit rating agencies have come under severe scrutiny over their sudden downgrading of one of the country's largest non-banking financial institutions, IL&FS. The IL&FS group, comprising over 100 companies, accounted for more than 2 per cent of all commercial paper issued in the country, 1 per cent of all corporate debentures and nearly 1 per cent of all banking system loans outstanding. IL&FS's aggregate group borrowings reached US$13 billion (Livemint, 2018a); notably, in the financial year 2017–2018, these borrowings rose steeply, by over 44 per cent over the previous year. Until August 2018, two of India's top credit rating agencies, ICRA Limited and Credit Analysis & Research (CARE), continued to award top ratings of AAA or AA to both the long-term and the short-term borrowings, even after one of the IL&FS subsidiaries had defaulted on its debt obligations. In September 2018, however, it came to light that IL&FS had defaulted on multiple credit obligations, after which the agencies drastically downgraded IL&FS to the junk grade of D (The Wire, 2018). The top credit rating agencies failed to comprehend the disaster as it unfolded around them, even though safeguarding investors' interests by setting out the financial health of the issuer is precisely their responsibility. Within 7 months, 50 per cent of the IL&FS group's assets had been put on the block, showing how seriously the rating agencies had failed to forewarn of the group's deteriorating financial situation (The Hindu, 2019).

Researchers attribute the failure of rating agencies to an incentive problem: since credit rating agencies are paid by issuers, their interests are more likely to be aligned with issuers than with investors (Pagano & Volpin, 2010). In India, the prevalent 'issuer-pays' model, under which rating agencies charge the banks that issue the debt instrument, creates such a conflict of interest (Livemint, 2018). Given this conflict, researchers recommend that rating agencies be paid by investors rather than issuers (Pagano & Volpin, 2010). Because the credit rating business is reputation-based, when rating agencies lose reputation and their ratings become less credible, investors demand more reliable disclosures and qualitative information from debt issuers. Sharing more information can thus reduce information asymmetry and adverse selection in lending markets (Sethuraman, 2018). However, such additional disclosure comes at a cost that can further weaken the financials of the banks issuing the deposits. Recent studies have shown that machine learning techniques for establishing credit ratings can give promising results and can be designed to predict credit rating changes (Hsu, Chen, & Chen, 2018).

2.1 Research Gaps

2.1.1 Research Gap 1

Predicting bankruptcy has been studied extensively (Abdou, Abd Allah, Mulkeen, Ntim, & Wang, 2017; Affes & Kaffel, 2019; Altman, 1968; Ekinci & Erdal, 2017; Halteh, Kumar, & Gepp, 2018; Kolari & Sanz, 2017; Tung et al., 2004; Jones, 2017). While earlier studies focused excessively on bankruptcy prediction, there is a lack of studies that classify banks by financial strength from the perspective of retail depositors, in particular to help retail investors make informed decisions when choosing a commercial bank in which to place their savings. This is the first research gap this study identifies and proposes to address. We propose a risk classification that helps retail investors choose wisely among the private and public sector banks in India when placing their savings as deposits.

2.1.2 Research Gap 2

A plethora of methodologies have been used to predict bankruptcy: multivariate discriminant analysis and logistic regression (Jones, 2017), decision trees (DTs) and Random Forest (Halteh et al., 2018), and CART and multivariate adaptive regression splines (MARS) (Affes & Kaffel, 2019). Techniques such as the Generic Self-organising Fuzzy Neural Network have been used (Tung et al., 2004), as have non-parametric trait recognition techniques combined with the neural network methodology of self-organising maps (Kolari & Sanz, 2017). More recently, machine learning (Migliore & Chinta, 2017), ensemble learning models (random subspaces) and hybrid ensemble learning models (bagging and multi-boosting) have been applied (Ekinci & Erdal, 2017), along with the chi-square automatic interaction detector (CHAID), CART, multilayer-perceptron neural networks, discriminant analysis and logistic regression (Abdou et al., 2017).

However, many of these models use an a priori classification scheme derived heuristically from experience and judgement. Such an approach cannot robustly guarantee homogeneity within clusters and heterogeneity between clusters, and it is highly subjective in its grouping; very often, it leads to high variance within a cluster. This is the second research gap this study has identified. The hidden pattern in the data can be unravelled only by an unsupervised learning algorithm, and profiling that pattern identifies the target outcome. In our case, we use K-Means clustering to identify distinct groups and then profile them. Having unravelled the pattern through unsupervised learning, we use supervised learning algorithms to predict the category to which each bank belongs. We thus address this research gap by combining unsupervised and supervised learning to ensure homogeneity within a cluster with minimum variance.
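The variance argument can be made concrete with the within-cluster sum of squares (WCSS), the quantity K-Means seeks to minimise. The sketch below compares the WCSS of two groupings of the same points: a hypothetical judgement-based grouping that mixes dissimilar banks, and a grouping that keeps similar values together. The single-ratio data are invented for illustration.

```python
def wcss(points: list[float], labels: list[int]) -> float:
    """Within-cluster sum of squared deviations from each cluster mean."""
    total = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        mean = sum(members) / len(members)
        total += sum((p - mean) ** 2 for p in members)
    return total


# Hypothetical gross NPA ratios (%) for six banks
gnpa = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
heuristic = [0, 0, 1, 1, 0, 1]    # judgement-based grouping that mixes profiles
data_driven = [0, 0, 0, 1, 1, 1]  # grouping that keeps similar values together
print(wcss(gnpa, heuristic), wcss(gnpa, data_driven))
```

Here the data-driven grouping has a WCSS of 4.0, far below that of the mixed grouping, which is the sense in which a clustering algorithm yields more homogeneous groups than an ad hoc assignment.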

2.1.3 Research Gap 3

Since existing credit ratings offer deposit holders little reliable guidance, this study compares our classification of banks into distinct groups with the credit ratings awarded by the top Indian rating agencies to the deposits issued by the banks, and demonstrates disparities in those ratings. This is the third research gap the study addresses.

3. Data

The Indian banking sector comprises scheduled commercial banks and cooperative banks. Scheduled commercial banks, the backbone of the Indian banking system, comprise around 27 PSBs, 30 private banks, 40 foreign banks and a number of regional rural banks. Our data include all 44 public and private sector banks, excluding foreign banks, in operation during the 12-year period from 2005 to 2017. The 44 banks (26 PSBs and 18 private banks) include well-known PSBs such as State Bank of India and its group banks, Indian Bank, Punjab National Bank, Indian Overseas Bank and Bank of India, and private banks such as ICICI, HDFC, Yes Bank and Kotak Mahindra Bank. The data comprise financials drawn from the banks' balance sheets and income statements, as well as key annual financial metrics published by the RBI for 2005 to 2017. The study also uses RBI reports and guidelines, press releases and reports from the Ministry of Finance.

Previous studies have relied extensively on financial ratios from the CAMELS system, covering capital base, asset quality, management soundness, earning capacity, liquidity and sensitivity; each of these characteristics influences bank failure (Huang et al., 2012). A modified CAMELS is used by the RBI and other regulators in the Indian context. Very often, bank failures are due to financial distress (Tung et al., 2004). CAMELS is a six-component bank rating system, applied to a large number of banks and credit unions in the USA and used for supervision by various banking regulators, including the Federal Reserve and the FDIC (Affes & Kaffel, 2019). Ekinci and Erdal (2016) used 35 financial ratios based on CAMELS. Huang et al. (2012) showed that equity to assets and net interest margin were negatively related to financial distress and were the best predictors of distress for banks in developing economies, in parts of the South East Asian region and in the European Union. Tung et al. (2004) also model financial ratios based on CAMELS.

Following the recommendations of the Padmanabhan Committee, commercial banks incorporated in India are rated on the 'CAMELS' model (Capital adequacy, Asset quality, Management, Earnings, Liquidity, and Systems & control), while branches of foreign banks operating in India are rated under the 'CALCS' model (Capital adequacy, Asset quality, Liquidity, Compliance, and Systems & control). The banking stability indicator (BSI) defined by the RBI evaluates the risk exposure of scheduled commercial banks along five dimensions: capital adequacy, asset quality, profitability, liquidity and efficiency (RBI Financial Stability Report, 2018).

Based on a comparative study of these models, nine key accounting ratios for the public and private sector banks over the past 12 years were selected. These nine ratios adequately represent the key dimensions from a deposit holder's perspective: credit risk, capital adequacy, profitability, efficiency and liquidity. This data set forms the basis of the statistical analysis and model building. The ratios and their definitions are given in Table 1.

Given that our study is conducted from the point of view of bank deposit holders, we also consulted other secondary sources, including analyst research reports, business dailies and the views of eminent subject matter experts in banking. Time effects in the data are not considered in this study, as the ratios are used directly as model inputs.

Table 1.
Financial Ratios and Definitions

Measure	Short Name	Definition of the Metric
Credit–deposit ratio	CDR	Funds lent out as a share of the total amount raised through deposits
Ratio of net interest income to total assets (net interest margin)	NITA	(Interest income − interest expenses)/average earning assets
Return on assets	ROA	(Profit after tax/average total assets) × 100
Capital adequacy ratio	CAR	(Total capital/total risk-weighted assets) × 100
Capital adequacy ratio (Tier I)	CART I	(Tier 1 capital/total risk-weighted assets) × 100
Capital adequacy ratio (Tier II)	CART II	(Tier 2 capital/total risk-weighted assets) × 100
Ratio of net NPA to net advances	NNPATA	(Net NPA/total advances) × 100
Liquid assets to total assets ratio	LATA	Liquid assets/total assets
Gross NPAs to gross advances ratio (%)	GNPATA	(Gross NPA/total advances) × 100

Source: The authors.
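The Table 1 definitions translate directly into code. The sketch below computes a subset of the ratios from hypothetical balance-sheet and income-statement figures; the field names and values are illustrative assumptions, averaging (e.g., average total assets) is taken as already done upstream, CDR is computed as advances/deposits (the usual credit–deposit definition, assumed here), and NITA and LATA are omitted because they would need further hypothetical inputs. Treating total capital as Tier 1 plus Tier 2 is also an assumption of the sketch; it makes CAR the sum of CART I and CART II, which helps explain the high correlation between CAR and CAR Tier I.

```python
from dataclasses import dataclass


@dataclass
class BankYear:
    """Hypothetical balance-sheet and income-statement items for one bank-year."""
    advances: float
    deposits: float
    gross_npa: float
    net_npa: float
    profit_after_tax: float
    avg_total_assets: float
    tier1_capital: float
    tier2_capital: float
    risk_weighted_assets: float


def table1_ratios(b: BankYear) -> dict[str, float]:
    """A subset of the Table 1 ratios, expressed in per cent."""
    return {
        "CDR": 100 * b.advances / b.deposits,  # assumed: advances/deposits
        "ROA": 100 * b.profit_after_tax / b.avg_total_assets,
        # assumed: total capital = Tier 1 + Tier 2
        "CAR": 100 * (b.tier1_capital + b.tier2_capital) / b.risk_weighted_assets,
        "CART I": 100 * b.tier1_capital / b.risk_weighted_assets,
        "CART II": 100 * b.tier2_capital / b.risk_weighted_assets,
        "NNPATA": 100 * b.net_npa / b.advances,
        "GNPATA": 100 * b.gross_npa / b.advances,
    }


example = BankYear(advances=7200, deposits=10000, gross_npa=288, net_npa=144,
                   profit_after_tax=90, avg_total_assets=12000,
                   tier1_capital=900, tier2_capital=300, risk_weighted_assets=10000)
r = table1_ratios(example)
print(r["CAR"], r["CART I"] + r["CART II"])  # 12.0 12.0
```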

4. Methodology

The objective of this study is to classify the 44 banks into risk categories of financial health, with a view to protecting the interests of retail deposit holders. To achieve this, the study first uses unsupervised machine learning to identify patterns and then moves to supervised learning. Because the data involve nine financial ratios for a diverse set of banks and are therefore highly heterogeneous, clustering is used to achieve homogeneity within clusters and heterogeneity between clusters. Groups could instead have been formed heuristically from experience and judgement, but such an approach cannot robustly assure homogeneity within clusters and heterogeneity between them, and very often leads to high variance within a cluster; our methodology overcomes this shortcoming.

First, our study uses an unsupervised machine learning algorithm, K-Means clustering, to ensure homogeneity within clusters with minimum variance, which leads to three distinct clusters. Profiling these clusters into high-, medium- and low-risk categories provides insights, a key objective of unsupervised learning. We then move from unsupervised to supervised learning, confirming the three clusters in terms of the predictor variables by classifying each record into one of them. This paves the way for classifying and predicting a new record, given the values of its predictor variables. A correlation study is then carried out to minimise the effect of multicollinearity and reduce the number of variables, and the relative importance of the variables, in terms of ranks, is inferred from the correlation ratios.
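A minimal sketch of the K-Means step, written in plain Python so that it runs without third-party libraries. It implements Lloyd's algorithm (alternating nearest-centroid assignment and mean updates) with a deterministic farthest-first seeding. The paper does not report its seeding, iteration limit or whether the ratios were standardised, so z-scoring the columns (to stop ratios measured on larger scales, such as CAR, from dominating the distances) and the three-ratio toy data are assumptions.

```python
import math


def zscore(rows):
    """Standardise each column to mean 0 and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, sds)) for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and mean-update steps until
    the assignments stop changing. Returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical bank-year rows: (GNPATA, ROA, CAR), all in per cent
rows = [(2.0, 1.4, 16.0), (2.3, 1.3, 15.5), (2.1, 1.5, 16.2),
        (3.2, 0.8, 12.5), (3.4, 0.7, 12.3), (3.0, 0.9, 12.6),
        (12.0, 0.1, 11.0), (13.0, 0.0, 11.2), (12.5, 0.05, 11.1)]
labels, _ = kmeans(zscore(rows), k=3)
print(labels)
```

The cluster numbers K-Means returns are arbitrary labels, which is why the clusters must then be profiled before 'High', 'Medium' and 'Low' can be attached to them.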

Second, the study uses a supervised machine learning method, LDA, to predict cluster membership and assess its explanatory power. Predictive accuracy is cross-validated with the jackknife procedure, which drops one record at a time, builds the model on the remaining records and then predicts the cluster of the omitted record; the resulting predictions are summarised in a confusion matrix. This is further cross-validated using Fisher's discriminant function and the Mahalanobis classification procedure.
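The jackknife procedure can be sketched generically: any classifier exposed as a fit function and a predict function can be plugged in. To keep the sketch short and dependency-free, a nearest-class-mean rule stands in for LDA here (it is not the paper's model), and the records and labels are hypothetical.

```python
from collections import Counter


def centroid_fit(X, y):
    """Stand-in classifier (not the paper's LDA): store each class mean."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def centroid_predict(model, x):
    """Assign x to the class whose mean is nearest (squared Euclidean)."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))


def jackknife_confusion(X, y, fit, predict):
    """Leave-one-out: for each record i, fit on all other records and
    predict record i. Returns a Counter keyed by (actual, predicted)."""
    cm = Counter()
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        cm[(y[i], predict(model, X[i]))] += 1
    return cm


# Hypothetical (GNPATA, ROA) records with cluster profiles as labels
X = [(2.0, 1.4), (2.3, 1.3), (2.1, 1.5),
     (3.2, 0.8), (3.4, 0.7), (3.0, 0.9),
     (12.0, 0.1), (13.0, 0.0), (12.5, 0.05)]
y = ["High"] * 3 + ["Medium"] * 3 + ["Low"] * 3

cm = jackknife_confusion(X, y, centroid_fit, centroid_predict)
accuracy = sum(n for (actual, pred), n in cm.items() if actual == pred) / len(y)
for actual in ["High", "Medium", "Low"]:
    print(actual, [cm[(actual, pred)] for pred in ["High", "Medium", "Low"]])
print("jackknife accuracy:", accuracy)
```

Each printed row is one row of the confusion matrix (actual cluster), with columns giving the predicted cluster; off-diagonal counts are misclassifications.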

Third, we split the data set into training data (70%) and testing data (30%) and build CART and Random Forest models on the training data. To test classification performance with alternative approaches, we use recursive partitioning analysis (RPA), which builds a DT for classification and prediction. RPA creates a DT that aims to classify members of the population correctly by splitting them into sub-groups based on several dichotomous independent variables. We then use Random Forest, an ensemble learning method for classification and regression that constructs a multitude of DTs from the training data. Using CART and Random Forest, we predict the clusters and assess both explanatory power and predictive accuracy, which is critical for protecting deposit holders' capital. Finally, we establish the accuracy of our predictive models.
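A compact, dependency-free sketch of the CART step: a binary classification tree grown by choosing, at each node, the feature and midpoint threshold with the lowest weighted Gini impurity, trained on 70% of the records and evaluated on the remaining 30%. The data are hypothetical, and the split is taken within each profile by position (first seven rows train, last three test) to keep the example deterministic; the paper does not state how its 70/30 split was drawn.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Try every feature and every midpoint between adjacent distinct values;
    return (weighted_gini, feature, threshold) for the lowest impurity."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth=0, max_depth=4):
    """Grow a binary tree; a leaf is the majority label and an internal
    node is (feature, threshold, left_subtree, right_subtree)."""
    majority = Counter(y).most_common(1)[0][0]
    split = best_split(X, y) if gini(y) > 0 and depth < max_depth else None
    if split is None:
        return majority
    _, f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))


def predict(tree, row):
    """Follow the splits from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def make_class(label, gnpa0, dg, roa0, dr, n=10):
    """Hypothetical (GNPATA, ROA) rows for one profile, with small offsets."""
    return [((gnpa0 + dg * i, roa0 - dr * i), label) for i in range(n)]


blocks = [make_class("High", 2.0, 0.05, 1.4, 0.02),
          make_class("Medium", 3.0, 0.05, 0.8, 0.02),
          make_class("Low", 12.0, 0.1, 0.1, 0.01)]
# 70/30 split within each profile: first 7 rows train, last 3 test
train = [r for b in blocks for r in b[:7]]
test = [r for b in blocks for r in b[7:]]

tree = build_tree([x for x, _ in train], [lab for _, lab in train])
accuracy = sum(predict(tree, x) == lab for x, lab in test) / len(test)
print("test accuracy:", accuracy)
```

A Random Forest, as described above, would grow many such trees on bootstrap samples of the training rows, consider only a random subset of features at each split, and classify a record by majority vote across the trees.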

Fourth, we compare our classification of banks that are high, medium and low in terms of financial health with the credit ratings awarded by top Indian rating agencies to the deposits of respective banks, highlighting the disparities in the credit rating awards.
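The comparison with agency ratings amounts to mapping each rating grade onto a financial-health category and listing the banks where the two disagree. The sketch below assumes such a mapping exists; the grade-to-category table, bank names, model outputs and ratings are all hypothetical, and choosing the mapping is itself a judgement that the paper does not specify.

```python
# Hypothetical mapping from a deposit rating grade to the financial-health
# category it would imply; the grades and cut-offs are illustrative only.
RATING_TO_HEALTH = {"AAA": "High", "AA+": "High", "AA": "Medium",
                    "AA-": "Medium", "A+": "Low", "A": "Low"}


def rating_discrepancies(model_health, agency_rating):
    """Return (bank, model category, rating-implied category) for every bank
    where the two disagree, sorted by bank name."""
    out = []
    for bank in sorted(model_health):
        implied = RATING_TO_HEALTH[agency_rating[bank]]
        if implied != model_health[bank]:
            out.append((bank, model_health[bank], implied))
    return out


# Hypothetical banks, model outputs and agency ratings
model_health = {"Bank A": "High", "Bank B": "Medium", "Bank C": "Low"}
agency_rating = {"Bank A": "AAA", "Bank B": "AA", "Bank C": "AAA"}
print(rating_discrepancies(model_health, agency_rating))
# [('Bank C', 'Low', 'High')]
```

A bank the model places in the 'Low' category while it carries a top rating, as with the hypothetical Bank C, is the kind of disparity the comparison is meant to surface.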

5. Analysis and Results

5.1 Unsupervised Learning

Basic exploratory analysis is carried out after the necessary data cleaning. Descriptive statistics of the sample are presented in Table 2. The K-Means clustering algorithm, an unsupervised method, identifies three clusters. Banks with a low ratio of gross NPA to gross advances (GNPATA), high return on assets (ROA), high ratio of net interest income to total assets (net interest margin, NITA), high capital adequacy ratio (CAR) and high credit–deposit ratio (CDR) have relatively strong financial health. Cluster 3 can hence be profiled as banks with 'High' financial health, cluster 1 as banks with 'Medium' financial health and cluster 2 as banks with 'Low' financial health.

Table 2.
Descriptive Statistics

Descriptive Statistics	CDR	NITA	ROA	CAR	CART I	CART II	NNPATA	LATA	GNPATA
Minimum	42.39	0.23	0.00	7.51	4.88	0.17	0.00	3.57	0.00
1st Quartile	67.29	2.29	0.52	11.57	7.93	2.17	0.64	6.26	1.71
Median	71.88	2.68	0.88	12.63	9.03	3.17	1.22	7.78	2.82
Mean	73.05	2.72	0.88	13.22	9.95	3.27	1.97	8.32	3.84
3rd Quartile	77.39	3.12	1.26	13.91	11.20	4.42	2.35	9.55	4.73
Maximum	300.70	5.62	2.13	56.41	55.93	8.80	16.89	32.77	25.68

Source: The authors.

Following the clustering analysis, the characteristics of each cluster are studied in order to profile the clusters. Figure 1 shows bar plots of the input variables, which are compared across clusters to arrive at the profiles.

Figure 1.

Source: The authors.

Table 3 sets out the mean values of the input variables for the three clusters. Comparing the cluster means across the nine input variables shows that cluster 3 has the lowest GNPATA and ratio of net NPA to net advances (NNPATA), followed by clusters 1 and 2. Cluster 3 also has the highest ROA, NITA, CAR and credit–deposit ratio (CDR), followed in each case by clusters 1 and 2, and the highest CAR Tier I, on which clusters 1 and 2 are tied.
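The profiling logic can be made explicit by ranking the clusters on each variable and averaging the ranks. The sketch below uses the cluster means from Table 3 for the seven variables named in the text; the average-rank scoring rule is an illustrative assumption (the paper profiles the clusters by inspection), and CART II and LATA are omitted because the text does not say which direction is favourable for them.

```python
# Cluster means from Table 3 for the variables used in the profiling text
means = {
    1: {"CDR": 72.76, "NITA": 2.55, "ROA": 0.79, "CAR": 12.46,
        "CART I": 8.53, "NNPATA": 1.63, "GNPATA": 3.24},
    2: {"CDR": 65.07, "NITA": 2.12, "ROA": 0.06, "CAR": 11.11,
        "CART I": 8.53, "NNPATA": 7.61, "GNPATA": 12.49},
    3: {"CDR": 76.69, "NITA": 3.37, "ROA": 1.40, "CAR": 15.91,
        "CART I": 14.08, "NNPATA": 0.79, "GNPATA": 2.25},
}
# For the two NPA ratios, lower values are more favourable
HIGHER_IS_BETTER = {"CDR", "NITA", "ROA", "CAR", "CART I"}


def average_ranks(means):
    """Rank clusters on each variable (1 = most favourable) and average."""
    clusters = sorted(means)
    variables = list(means[clusters[0]])
    totals = dict.fromkeys(clusters, 0)
    for v in variables:
        # sorted() is stable, so tied clusters keep their original order
        order = sorted(clusters, key=lambda c: means[c][v],
                       reverse=v in HIGHER_IS_BETTER)
        for rank, c in enumerate(order, start=1):
            totals[c] += rank
    return {c: totals[c] / len(variables) for c in clusters}


ranks = average_ranks(means)
profile = dict(zip(sorted(ranks, key=ranks.get), ["High", "Medium", "Low"]))
print(profile)  # {3: 'High', 1: 'Medium', 2: 'Low'}
```

Because every variable except the tied CAR Tier I orders the clusters the same way, the result does not depend on how that tie is broken.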

Before proceeding with model building, a correlation study is carried out to minimise the effect of multicollinearity. The correlation matrix is presented in Table 4.

The correlation analysis shows that CAR is highly correlated with CAR Tier I (r = 0.92) and GNPATA with NNPATA (r = 0.93). As these are derived financial ratios, we retain all ratios except CAR and NNPATA; seven financial ratios are therefore used in the linear discriminant analysis.
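The screening step can be reproduced directly from Table 4: list every pair of variables whose absolute correlation exceeds 0.9, then drop one member of each pair. The 0.9 threshold mirrors the text; which member to drop is a judgement call, so the sketch hard-codes the paper's choice rather than deriving it.

```python
NAMES = ["CDR", "NITA", "ROA", "CAR", "CART I", "CART II", "NNPATA", "LATA", "GNPATA"]
# Correlation matrix from Table 4 (rows and columns in NAMES order)
R = [
    [1.0000, -0.0916, 0.1823, 0.1638, 0.1553, -0.0288, -0.1749, -0.2896, -0.2429],
    [-0.0916, 1.0000, 0.5736, 0.3439, 0.3770, -0.1914, -0.3219, -0.0037, -0.2349],
    [0.1823, 0.5736, 1.0000, 0.3848, 0.3441, -0.0140, -0.6480, -0.0447, -0.5892],
    [0.1638, 0.3439, 0.3848, 1.0000, 0.9224, -0.1092, -0.2547, 0.2357, -0.2005],
    [0.1553, 0.3770, 0.3441, 0.9224, 1.0000, -0.4847, -0.1786, 0.1985, -0.1322],
    [-0.0288, -0.1914, -0.0140, -0.1092, -0.4847, 1.0000, -0.1171, 0.0231, -0.1138],
    [-0.1749, -0.3219, -0.6480, -0.2547, -0.1786, -0.1171, 1.0000, 0.1121, 0.9343],
    [-0.2896, -0.0037, -0.0447, 0.2357, 0.1985, 0.0231, 0.1121, 1.0000, 0.1933],
    [-0.2429, -0.2349, -0.5892, -0.2005, -0.1322, -0.1138, 0.9343, 0.1933, 1.0000],
]


def highly_correlated_pairs(names, R, threshold=0.9):
    """All variable pairs (i < j) whose absolute correlation exceeds threshold."""
    return [(names[i], names[j])
            for i in range(len(names)) for j in range(i + 1, len(names))
            if abs(R[i][j]) > threshold]


pairs = highly_correlated_pairs(NAMES, R)
print(pairs)  # [('CAR', 'CART I'), ('NNPATA', 'GNPATA')]

# The paper drops CAR and NNPATA, keeping CAR Tier I and GNPATA
dropped = {"CAR", "NNPATA"}
kept = [n for n in NAMES if n not in dropped]
print(len(kept), kept)
```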

5.2 Linear Discriminant Analysis

We next move to model building for predicting membership of the clusters, starting with LDA, a proven approach especially when the predictor variables are continuous. LDA is a statistical technique that classifies an observation into one of two or more a priori groups. It is also known as multiple discriminant analysis (MDA). In bankruptcy prediction, for example, there are two predefined groups: bankrupt and non-bankrupt firms. Classification is accomplished by a discriminant function, which is a linear combination of the independent variables (Fisher, 1936), and a cut-off score is derived to assign each observation to a group. A confusion matrix then shows the misclassification errors, which provide a measure of predictive accuracy.
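A minimal sketch of Fisher's two-group discriminant function described above, in plain Python for two predictors so that the 2 × 2 within-group scatter matrix can be inverted by hand. The data points are hypothetical, and placing the cut-off at the midpoint of the two projected group means assumes equal priors; the study itself uses the three-group version.

```python
# HYPOTHETICAL data: two groups described by two predictors (x1, x2).
group_a = [(1.5, 2.0), (1.8, 1.0), (1.2, 1.5), (2.0, 2.5)]
group_b = [(0.2, 12.0), (-0.5, 15.0), (0.1, 10.0), (-1.0, 18.0)]


def mean(rows):
    """Component-wise mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]


def scatter(rows, m):
    """Within-group scatter matrix (2 x 2) around the group mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def fisher_lda(ga, gb):
    """Weights w = Sw^-1 (mean_a - mean_b) and the midpoint cut-off c."""
    ma, mb = mean(ga), mean(gb)
    sa, sb = scatter(ga, ma), scatter(gb, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cut-off: discriminant score of the midpoint between the group means.
    c = sum(w[k] * (ma[k] + mb[k]) / 2 for k in range(2))
    return w, c


def classify(x, w, c):
    """Score z = w . x; scores above the cut-off are assigned to group A."""
    z = w[0] * x[0] + w[1] * x[1]
    return "A" if z > c else "B"


w, c = fisher_lda(group_a, group_b)
print([classify(x, w, c) for x in group_a + group_b])
```

Because the within-group scatter matrix is positive definite, group A's projected mean always lies above the cut-off, so the single comparison `z > c` is enough to separate the groups.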

Table 3.
Mean Values of the Input Variables for the Three Clusters

Cluster	CDR	NITA	ROA	CAR	CART I	CART II	NNPATA	LATA	GNPATA
1	72.76	2.55	0.79	12.46	8.53	3.92	1.63	8.13	3.24
2	65.07	2.12	0.06	11.11	8.53	2.59	7.61	10.24	12.49
3	76.69	3.37	1.40	15.91	14.08	1.84	0.79	8.11	2.25

Source: The authors.

Table 4.

Correlation Matrix

Variables	CDR	NITA	ROA	CAR	CART I	CART II	NNPATA	LATA	GNPATA
CDR	1.0000	–0.0916	0.1823	0.1638	0.1553	–0.0288	–0.1749	–0.2896	–0.2429
NITA	–0.0916	1.0000	0.5736	0.3439	0.3770	–0.1914	–0.3219	–0.0037	–0.2349
ROA	0.1823	0.5736	1.0000	0.3848	0.3441	–0.0140	–0.6480	–0.0447	–0.5892
CAR	0.1638	0.3439	0.3848	1.0000	0.9224	–0.1092	–0.2547	0.2357	–0.2005
CART I	0.1553	0.3770	0.3441	0.9224	1.0000	–0.4847	–0.1786	0.1985	–0.1322
CART II	–0.0288	–0.1914	–0.0140	–0.1092	–0.4847	1.0000	–0.1171	0.0231	–0.1138
NNPATA	–0.1749	–0.3219	–0.6480	–0.2547	–0.1786	–0.1171	1.0000	0.1121	0.9343
LATA	–0.2896	–0.0037	–0.0447	0.2357	0.1985	0.0231	0.1121	1.0000	0.1933
GNPATA	–0.2429	–0.2349	–0.5892	–0.2005	–0.1322	–0.1138	0.9343	0.1933	1.0000

Source: The authors.

Having profiled the three clusters, we apply LDA to separate them. With three groups, at most two discriminant functions can be extracted (the number of groups minus one). Table 5 sets out the output of the Fisher discriminant analysis.

The first discriminant function accounts for 82.5 per cent of the discriminating information and the second for the remaining 17.5 per cent (Table 5). As seen in Table 6 (input variables and discriminant function), CART II and ROA, followed by CART I, are highly correlated with the two discriminant functions, indicating the relative importance of these three variables in separating the three clusters by financial health.
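A small arithmetic check of these shares, assuming the Value column of Table 5 holds the eigenvalues of the two discriminant functions: each function's share is its eigenvalue divided by the sum of the eigenvalues. The maximum number of functions, min(groups − 1, predictors), is computed alongside.

```python
# "Value" column of Table 5, assumed to hold the eigenvalues.
eigenvalues = {"df1": 14.23, "df2": 3.02}

n_groups, n_predictors = 3, 7
# At most min(groups - 1, predictors) discriminant functions can be extracted.
n_functions = min(n_groups - 1, n_predictors)

total = sum(eigenvalues.values())
share = {k: round(100 * v / total, 1) for k, v in eigenvalues.items()}
print(n_functions, share)  # 2 {'df1': 82.5, 'df2': 17.5}
```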

We then apply multivariate analysis of variance (MANOVA) to test whether the clusters differ on the predictors. The hypotheses are as follows. H0: the mean vectors of the seven predictor variables are equal across the three clusters, that is, µ_High = µ_Medium = µ_Low, where each µ is the 7 × 1 vector of predictor means (together forming a 3 × 7 matrix of cluster means). H1: at least one pair of cluster mean vectors differs, that is, µ_i ≠ µ_j for some i ≠ j. Table 6 also sets out the results of the MANOVA and the Wilks' lambda test. All seven independent variables are statistically significant (p-value less than 0.005) in sharply differentiating the clusters, namely High, Medium and Low financial health. Their relative importance can be inferred from the correlation ratios displayed; the correlation ratio measures the strength of an independent variable in separating the clusters by financial health. The resulting ranking of the input variables is also listed in Table 6.
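A minimal sketch of how the per-variable columns of Table 6 relate. Each Wilks' lambda there equals 1 minus the correlation ratio η², and the one-way ANOVA F statistic is F = (η²/(g − 1)) / ((1 − η²)/(N − g)). The total N = 583 is an assumption, inferred from the row totals of the Table 7 confusion matrix; with it, the recomputed F values agree with Table 6 to within rounding of η².

```python
# Correlation ratios (eta squared) for the seven predictors, from Table 6.
ETA_SQ = {"CDR": 0.0414, "NITA": 0.3428, "ROA": 0.5159, "CART I": 0.3534,
          "CART II": 0.3382, "LATA": 0.0350, "GNPATA": 0.6238}

N = 583  # ASSUMED total number of records: row sum of the Table 7 matrix
G = 3    # number of clusters


def univariate_test(eta2, n=N, groups=G):
    """Wilks' lambda and one-way ANOVA F statistic for a single predictor."""
    wilks = 1.0 - eta2
    f_stat = (eta2 / (groups - 1)) / (wilks / (n - groups))
    return wilks, f_stat


results = {v: univariate_test(e) for v, e in ETA_SQ.items()}
# Rank 1 = largest correlation ratio = strongest separation of the clusters.
ranking = sorted(ETA_SQ, key=ETA_SQ.get, reverse=True)

for rank, v in enumerate(ranking, start=1):
    wilks, f_stat = results[v]
    print(rank, v, round(wilks, 4), round(f_stat, 1))
```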

Table 5.

Discriminant Functions

df	Value	Proportion (%)	Cumulative (%)
df1	14.23	82.5	82.5
df2	3.02	17.5	100

Source: The authors.

For each record in the data set, a score is computed for each of the three groups using the three classification functions in Table 6, and the record is assigned to the group with the largest score. Table 7 sets out the resulting confusion matrix. On the training data, classification accuracy is 95.5 per cent, indicating a good fit. Because a model can fit its training data well and still over-fit, we check its robustness with jackknife cross-validation: the procedure drops one record at a time, builds the model on the remaining records and then predicts the class of the omitted record. The jackknife results are close to the confusion matrix of the training data, with a similarly high accuracy rate (95.36%).
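The scoring rule just described can be sketched with the Mahalanobis classification-function constants and coefficients of Table 6 and three banks' 2017 ratios from Tables 8–10. The recomputed scores agree with the published ones to within about 0.1, a gap we assume comes from the three-decimal rounding of the Table 6 coefficients; the function names are illustrative.

```python
PREDICTORS = ["CDR", "NITA", "ROA", "CART I", "CART II", "LATA", "GNPATA"]

# (constant, coefficients in PREDICTORS order) for each group, from Table 6.
FUNCTIONS = {
    "High": (-55.910, [0.525, 12.345, 4.115, 0.726, 0.658, 0.985, 0.869]),
    "Medium": (-43.905, [0.513, 10.507, -2.004, 0.365, 2.379, 1.098, 0.953]),
    "Low": (-56.725, [0.521, 8.724, -2.850, 0.237, 2.105, 1.279, 2.877]),
}


def scores(bank):
    """Linear classification score of one bank for every group."""
    return {group: const + sum(c * bank[v] for c, v in zip(coefs, PREDICTORS))
            for group, (const, coefs) in FUNCTIONS.items()}


def classify(bank):
    """Assign the bank to the group with the largest score."""
    s = scores(bank)
    return max(s, key=s.get)


# 2017 ratios transcribed from Tables 8, 9 and 10.
city_union = dict(zip(PREDICTORS, [79.14, 3.57, 1.5, 15.35, 0.48, 8.16, 2.83]))
sbi = dict(zip(PREDICTORS, [76.83, 2.44, 0.41, 10.35, 2.76, 6.36, 6.9]))
sb_mysore = dict(zip(PREDICTORS, [43.93, 2.18, -2.29, 8.09, 4.32, 26.78, 25.68]))

for name, bank in [("City Union", city_union), ("SBI", sbi), ("SB Mysore", sb_mysore)]:
    print(name, classify(bank))  # High, Medium, Low respectively
```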

Table 6.

Discriminant Functions, Correlations, MANOVA, Wilks’ Lambda and Mahalanobis Classification Functions

Variables	Input variables and discriminant function		Correlation between discriminant function and input variables		Results of MANOVA and Wilks’ lambda test					Mahalanobis classification function
	df1	df2	df1	df2	Correlation ratio	Wilks’ lambda	F-statistic	p-value	Rank	High	Medium	Low
Constant	0.0742	–2.4359	0.2320	–0.0052						–55.910	–43.905	–56.725
CDR	–0.0013	0.0049	0.4728	–0.3893	0.0414	0.9586	12.5150	0.0000	6	0.525	0.513	0.521
NITA	0.4531	0.1853	0.7370	–0.2242	0.3428	0.6572	151.2450	0.0000	4	12.345	10.507	8.724
ROA	0.4135	1.4884	0.2940	–0.5913	0.5159	0.4841	309.0130	0.0000	2	4.115	–2.004	–2.850
CART I	0.0413	0.0743	0.0064	0.7521	0.3534	0.6466	158.4920	0.0000	3	0.726	0.365	0.237
CART II	–0.0061	–0.5063	–0.2221	–0.1339	0.3382	0.6618	148.2250	0.0000	5	0.658	2.379	2.105
LATA	–0.0431	0.0006	–0.9577	–0.4325	0.0350	0.965	10.5060	0.0000	7	0.985	1.098	1.279
GNPATA	–0.4172	–0.3069	0.2320	–0.0052	0.6238	0.3762	480.9280	0.0000	1	0.869	0.953	2.877

Source: The authors.

Table 7.

Confusion Matrix

Training data				Jackknife cross-validation
Original \ Predicted	High	Low	Medium	Original \ Predicted	High	Low	Medium
High	136	0	13	High	136	0	13
Low	0	44	10	Low	0	44	10
Medium	3	0	377	Medium	4	0	376

Source: The authors.

Note: Error rate: 0.0445.
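The accuracies can be recomputed from Table 7 as the diagonal share of each matrix: 557/583 for the training data (error 26/583 ≈ 0.0446) and 556/583 ≈ 95.37 per cent for the jackknife, which the text reports as 95.36%. The sketch also shows the jackknife loop itself; because the fitted LDA is not reproduced here, it uses a hypothetical nearest-class-mean classifier and toy data purely to illustrate the leave-one-out mechanics.

```python
# Table 7: rows = original class, columns = predicted (High, Low, Medium).
TRAINING = [[136, 0, 13], [0, 44, 10], [3, 0, 377]]
JACKKNIFE = [[136, 0, 13], [0, 44, 10], [4, 0, 376]]


def accuracy(cm):
    """Share of records on the diagonal of a confusion matrix."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def leave_one_out(xs, ys, fit, predict):
    """Jackknife: refit without record i, then predict record i."""
    hits = 0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        hits += predict(model, xs[i]) == ys[i]
    return hits / len(xs)


# HYPOTHETICAL stand-in classifier (nearest class mean), not the study's LDA.
def fit_centroids(xs, ys):
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: sum(v) / len(v) for y, v in groups.items()}


def predict_centroid(centroids, x):
    return min(centroids, key=lambda y: abs(centroids[y] - x))


print(round(100 * accuracy(TRAINING), 2))   # 95.54
print(round(100 * accuracy(JACKKNIFE), 2))  # 95.37

toy_x = [1.0, 1.2, 0.8, 5.0, 5.5, 4.8]      # HYPOTHETICAL one-predictor data
toy_y = ["Low"] * 3 + ["High"] * 3
print(leave_one_out(toy_x, toy_y, fit_centroids, predict_centroid))
```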

Tables 8–10 present the results of classifying the banks in the 2017 test data into ‘High’, ‘Medium’ and ‘Low’ financial health using the Mahalanobis classification functions.

All the banks classified as having ‘High’ financial health are from the private sector, which resonates with popular perception and published data. Popular banks such as State Bank of India (SBI) and Indian Bank fall under medium financial health, a group that mixes public sector undertaking (PSU) and private sector banks. All the associate banks of the State Bank group, together with Indian Overseas Bank, Punjab National Bank, Bank of Baroda and Corporation Bank, fall under ‘Low’ financial health.

A report published by the RBI in August 2017 showed that six banks had been identified on the basis of their overall performance, with a key focus on bad loans, and that the RBI had initiated PCA against them, which bars them from expanding their banking business, for example by opening branches, hiring or paying higher rates on deposits. These banks are Dena Bank, Central Bank of India, IDBI Bank, Indian Overseas Bank, Bank of Maharashtra and UCO Bank (United Commercial Bank). Other PSBs that might come under PCA include Canara Bank, Union Bank of India, Andhra Bank, Punjab National Bank and Punjab & Sind Bank. All of these banks are classified as having ‘Low’ financial health in our study (Table 10).

Table 8.

List of Banks with ‘High’ Financial Health

								Mahalanobis Classification Function
Bank	CDR	NITA	ROA	CART I	CART II	LATA	GNPATA	High	Medium	Low	Maximum score	Health
City Union Bank Limited	79.14	3.57	1.5	15.35	0.48	8.16	2.83	57.8935	49.6452	34.6389	57.89349	High
DCB Bank Limited	82	3.69	0.93	11.87	1.89	4.96	1.59	52.6911	50.895	33.2843	52.69108	High
Federal Bank Ltd	75.09	2.91	0.84	11.81	0.58	6.48	2.33	40.3004	38.5725	24.4353	40.30036	High
HDFC Bank Ltd.	86.16	4.13	1.88	12.79	1.76	5.67	1.04	64.9829	55.9982	35.8255	64.98288	High
ICICI Bank Limited	94.73	2.91	1.35	14.36	3.03	5.25	8.74	60.5278	59.1398	55.8372	60.52782	High
IndusInd Bank Ltd	89.34	3.77	1.86	14.72	0.59	10.43	0.93	67.3545	56.9331	38.1578	67.35447	High
Karur Vysya Bank Ltd	76.18	3.43	1	11.85	0.69	6.2	3.58	48.8577	45.4294	32.5565	48.85767	High
Kotak Mahindra Bank Ltd.	86.44	3.99	1.73	15.9	0.87	10.7	2.59	70.8116	61.0352	44.9733	70.81156	High
Tamilnad Mercantile Bank Ltd	68.26	3.19	0.86	13.27	0.75	9.92	2.91	45.2169	43.1508	29.9658	45.21686	High
Yes Bank Ltd.	92.57	3.05	1.81	13.3	3.7	8.7	1.52	59.7503	56.6424	39.3699	59.75032	High

Source: The authors.

Table 9.

List of Banks with ‘Medium’ Financial Health

Bank	CDR	NITA	ROA	CART I	CART II	LATA	GNPATA	High	Medium	Low	Maximum score	Health
State Bank of India	76.83	2.44	0.41	10.35	2.76	6.36	6.9	37.868	44.263	39.706	44.263	Medium
Indian Bank	69.97	2.44	0.67	12.2	1.44	4.6	7.47	34.517	36.321	32.385	36.321	Medium
Syndicate Bank	76.63	2.07	0.12	9.26	2.77	8.44	8.5	34.587	44.229	44.179	44.229	Medium
Vijaya Bank	71.08	2.34	0.49	9.96	2.77	3.83	6.59	30.808	36.827	31.339	36.827	Medium
Axis Bank Limited	90.03	3.17	0.65	11.87	3.08	8.36	5.21	56.575	60.092	50.963	60.092	Medium
Catholic Syrian Bank Ltd	54.45	1.97	0.01	11.54	0.61	7.47	7.25	19.445	25.455	23.219	25.455	Medium
Karnataka Bank Ltd	65.22	2.47	0.74	12.21	1.09	6.81	4.21	31.839	32.582	22.719	32.582	Medium
Lakshmi Vilas Bank Ltd	77.66	2.45	0.83	8.75	1.63	13.46	2.67	41.517	44.402	33.145	44.402	Medium
RBL Bank Limited	85.14	2.78	1.08	11.39	2.33	9.55	1.2	47.823	48.165	32.105	48.165	Medium
South Indian Bank Ltd	70.16	2.43	0.57	10.88	1.49	11.1	2.45	35.234	38.536	26.385	38.536	Medium

Source: The authors.

Table 10.

List of Banks with ‘Low’ Financial Health

Bank	CDR	NITA	ROA	CART I	CART II	LATA	GNPATA	High	Medium	Low	Maximum score	Health
State Bank of Bikaner and Jaipur								32.04	49.07	63.34	63.34	Low
State Bank of Hyderabad	55.94	2.32	–1.55	9.22	2.50	17.64	20.77	39.46	60.73	86.83	86.83	Low
State Bank of Mysore	43.93	2.18	–2.29	8.09	4.32	26.78	25.68	42.04	73.22	110.84	110.84	Low
State Bank of Patiala	69.47	1.78	–2.80	7.78	3.40	4.33	23.15	23.25	53.76	84.09	84.09	Low
State Bank of Travancore	42.39	2.03	–1.75	9.94	2.25	23.99	16.79	31.09	53.98	74.12	74.12	Low
Allahabad Bank	74.68	2.22	–0.13	8.49	2.96	9.30	13.09	38.78	50.79	59.69	59.69	Low
Andhra Bank	70.02	2.62	0.08	9.17	3.21	7.98	12.25	40.82	50.82	56.78	56.78	Low
Bank of Baroda	63.70	1.98	0.20	9.93	2.31	21.65	10.46	41.92	52.02	58.15	58.15	Low
Bank of India	67.86	1.91	–0.24	8.90	3.24	15.31	13.22	37.51	51.85	62.54	62.54	Low
Bank of Maharashtra	68.69	1.98	–0.86	9.01	2.17	10.45	16.93	34.06	49.94	67.57	67.57	Low
Canara Bank	69.05	1.74	0.20	9.77	3.09	10.08	9.63	30.03	40.53	43.25	43.25	Low
Central Bank of India	46.99	2.06	–0.80	8.62	2.32	23.63	17.81	37.41	55.01	76.37	76.37	Low
Corporation Bank	63.64	1.84	0.23	8.90	2.42	10.11	11.70	29.38	38.90	45.64	45.64	Low
Dena Bank	63.69	1.83	–0.67	9.05	2.34	4.83	16.27	24.39	39.04	54.42	54.42	Low
IDBI Bank Limited	71.06	1.56	–1.37	7.81	2.89	9.03	21.25	29.96	51.59	78.44	78.44	Low
Indian Overseas Bank	66.46	1.99	–1.21	8.21	2.28	9.40	22.39	34.74	53.59	81.88	81.88	Low
Oriental Bank of Commerce	71.90	1.99	–0.46	8.88	2.76	6.89	13.73	31.52	45.29	55.66	55.66	Low
Punjab and Sind Bank	68.20	2.17	0.20	9.14	1.91	4.75	10.45	29.13	36.51	39.47	39.47	Low
Punjab National Bank	67.47	2.16	0.19	8.91	2.75	12.26	12.53	38.22	48.24	56.38	56.38	Low
UCO Bank	59.48	1.60	–0.75	8.27	2.66	7.82	17.12	22.36	39.21	57.21	57.21	Low
Union Bank of India	75.71	2.08	0.13	9.02	2.77	7.25	11.16	35.22	44.97	49.81	49.81	Low
United Bank of India	52.10	1.43	0.16	8.94	2.20	9.23	15.53	20.23	30.91	45.64	45.64	Low
Jammu & Kashmir Bank Ltd	68.75	3.10	–2.04	8.70	2.10	8.05	11.20	35.37	55.66	60.91	60.91	Low

Source: The authors.

5.3 Random Forest and Classification and Regression Tree

As an alternative approach to classifying the data into clusters, we also test the classification performance of CART and Random Forest. The recursive partitioning algorithm (RPA) is associated with decision trees (DTs) for classification and prediction: recursive partitioning creates a DT that aims to classify members of the population correctly by repeatedly splitting them into two subgroups on the independent variables. The most popular recursive partitioning algorithm is CART. When the outcome variable is categorical, a classification tree is used; when it is continuous, a regression tree is used. In our study, the outcome variable is categorical, with three distinct clusters. Random Forest is an ensemble learning method for classification and regression that constructs a multitude of DTs from the training data. Its output is the mode of the classes predicted by the individual trees (classification) or the mean of their predictions (regression). Random Forest is well known for giving high prediction accuracy, though it can sometimes over-fit.
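A minimal sketch of two Random Forest mechanics named here: each tree trains on a bootstrap sample, and the forest outputs the mode of the trees' votes. The trees themselves are omitted, the helper names are hypothetical, and the 411-row size is borrowed from the row total of Table 12 only for illustration. Rows a tree never draws are its out-of-bag (OOB) rows, the basis of the OOB error reported in Table 12.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement: one tree's training sample."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag(n, sample):
    """Rows this tree never saw; they are used to estimate its error."""
    drawn = set(sample)
    return [i for i in range(n) if i not in drawn]


def forest_vote(tree_predictions):
    """Classification output of the forest: the mode of the trees' votes."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(42)   # fixed seed so the example is repeatable
n_rows = 411              # illustrative size (row total of Table 12)
sample = bootstrap_indices(n_rows, rng)
oob = out_of_bag(n_rows, sample)
print(len(oob) / n_rows)  # about 1/e (roughly 0.37) on average
print(forest_vote(["2", "2", "3", "2", "1"]))
```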

The data set was split into a training set (70%) and a test set (30%). We then apply the RPA to all available independent variables, training the model with 10-fold cross-validation. The minsplit parameter (the minimum number of observations a node must contain before a split is attempted) is set to 30, about 5% of the total number of observations, and minbucket (the minimum number of observations in any leaf node) is set to 10. The unpruned DT is plotted in Figure 2.
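A minimal sketch of the split search CART performs at each node, assuming the Gini impurity criterion (the default for classification trees in R's rpart) and enforcing minsplit and minbucket as defined above. It scans a single predictor; a full tree would repeat the search over every predictor and recurse into the children. The data are hypothetical.

```python
def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((k / n) ** 2 for k in counts.values())


def best_split(xs, ys, minsplit=30, minbucket=10):
    """Best threshold on one predictor by weighted Gini impurity.

    Returns (threshold, impurity), or None when the node has fewer than
    `minsplit` records or no split leaves `minbucket` (>= 1) in each child.
    """
    n = len(xs)
    if n < minsplit:
        return None
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(minbucket, n - minbucket + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # tied values cannot be separated by a threshold
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or impurity < best[1]:
            best = ((pairs[k - 1][0] + pairs[k][0]) / 2, impurity)
    return best


# HYPOTHETICAL node: 20 records of cluster "1" and 20 of cluster "3".
xs = [0.10 + 0.01 * i for i in range(20)] + [0.40 + 0.01 * i for i in range(20)]
ys = ["1"] * 20 + ["3"] * 20
print(best_split(xs, ys))            # threshold between the groups, impurity 0
print(best_split(xs[:25], ys[:25]))  # fewer than minsplit records -> None
```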

Figure 2.

Source: The authors.

The tree suggests that records in cluster ‘3’ mostly have CART I greater than 0.29. The model summary also ranks the independent variables by importance (Table 11).

Tested on the test data set, the model achieves an overall accuracy of 87.79 per cent. To improve accuracy further, the model is pruned; Figure 3 visualises the pruned tree.

Table 11.

Summary of Model

Variable	Importance (%)
CART I	28
ROA	22
GNPATA	18
CART II	15
NITA	8
CDR	7
LATA	2

Source: The authors.

CART I is the best predictor variable and forms the root of the tree; the other variable used is GNPATA. Records in clusters ‘1’ and ‘2’ have CART I less than 0.29. Tested on the test data set, the pruned model's accuracy increases to 89.53 per cent. Next, we build a Random Forest model on the training set, considering all available independent variables; the model is set out in Table 12.

Figure 4 plots the error of the Random Forest model.

The black line shows the overall error rate, which falls below 10 per cent. The red, green and blue lines show the error rates for ‘cluster 1’ (high), ‘cluster 2’ (medium) and ‘cluster 3’ (low), respectively. The relative importance of the variables is plotted in Figure 5.

Figure 3.

Source: The authors.

Table 12.

Random Forest Model

Confusion Matrix
Original \ Predicted	1	2	3	Class error
1	39	3	0	0.0714
2	4	258	5	0.0337
3	0	7	95	0.0686
OOB estimate of error rate				0.0462

Source: The authors.
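The class errors and the OOB estimate in Table 12 follow directly from the matrix, assuming (as in R's randomForest output) that rows are the original clusters and columns the predicted ones: each class error is the off-diagonal share of its row, and the OOB error is the off-diagonal share of all 411 records.

```python
# Table 12 confusion matrix; rows = original cluster, columns = predicted.
CM = [[39, 3, 0],
      [4, 258, 5],
      [0, 7, 95]]

# Class error: share of each row's records predicted as another cluster.
class_error = [round(1 - row[i] / sum(row), 4) for i, row in enumerate(CM)]

# OOB error: share of all off-diagonal records.
total = sum(map(sum, CM))
correct = sum(CM[i][i] for i in range(len(CM)))
oob_error = round(1 - correct / total, 4)

print(class_error)       # [0.0714, 0.0337, 0.0686]
print(total, oob_error)  # 411 0.0462
```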

Figure 4.

Source: The authors.

Figure 5.

Source: The authors.

The final step is to make predictions on the test data set. Table 13 summarises the performance of the three models in terms of accuracy: LDA and Random Forest produce better accuracy than the CART model. Interestingly, all seven predictor variables are important in all three models, with only slight changes in their order.

Table 13.

Summary of Model Performance

Model	Accuracy in %
CART	89.5
Random forest	95.93
LDA	95.36

Source: The authors.

We now compare our classification of banks into ‘High’, ‘Medium’ and ‘Low’ categories with the credit ratings awarded by the top Indian rating agencies (Table 14). There are a number of disparities in the credit rating awards that need reconciliation. For example, HDFC Bank (‘high’ financial health in our classification), State Bank of Bikaner and Jaipur (‘low’) and Indian Bank (‘medium’) are all awarded AAA by the rating agency CRISIL.

Similarly, IndusInd Bank, State Bank of Travancore and Axis Bank, all from different clusters as per our classification, have been awarded AAA by the rating agency ICRA.

Table 14.

Comparison of our Classification of Banks with Ratings from Credit Rating Agencies

Sr. No.	Year	Bank	Health	CRISIL (certificates of deposit, short-term FDs, Tier 1 bonds, Tier 2 bonds, infrastructure bonds)	ICRA (long-term rating scale)	CARE (long-term/medium-term instruments)
1	2017	City Union Bank Limited	High	CRISIL A1+	ICRA AA–	Not available
2	2017	DCB Bank Limited	High	CRISIL A1+	ICRA A+ (hybrid)	Not available
3	2017	Federal Bank Ltd	High	CRISIL A1+	Not available	CARE AA
4	2017	HDFC Bank Ltd.	High	CRISIL AAA	ICRA AAA (SO)	CARE AAA
5	2017	ICICI Bank Limited	High	CRISIL AAA	ICRA AAA/AA+	CARE AAA
6	2017	IndusInd Bank Ltd	High	CRISIL A1+	ICRA AA	Not available
7	2017	Karur Vysya Bank Ltd	High	CRISIL A1+	ICRA A+ (hybrid)	Not available
8	2017	Kotak Mahindra Bank Ltd.	High	CRISIL A1+/CRISIL AAA	ICRA AAA	Not available
9	2017	Tamilnad Mercantile Bank Ltd	High	CRISIL A1+	Not available
10	2017	Yes Bank Ltd.	High	Not available	ICRA AA (hybrid)	CARE AA/AA+
11	2017	State Bank of Bikaner and Jaipur	Low	CRISIL AAA	ICRA AAA	CARE AAA
12	2017	State Bank of Hyderabad	Low	Not available	ICRA AAA	CARE AAA
13	2017	State Bank of Mysore	Low	Not available	ICRA AAA (hybrid)	CARE AAA
14	2017	State Bank of Patiala	Low	Not available	ICRA AAA (hybrid)	CARE AAA
15	2017	State Bank of Travancore	Low	Not available	ICRA AAA	CARE AAA
16	2017	Allahabad Bank	Low	CRISIL A1+/CRISIL AA−	ICRA A1+ (short-term funds)	CARE A/A+
17	2017	Andhra Bank	Low	CRISIL AA/AA+/AA−	Not available	CARE A1+
18	2017	Bank of Baroda	Low	CRISIL AAA	Not available	CARE AA+
19	2017	Bank of India	Low	CRISIL BBB+/AA−/A1+	Not available	CARE AA−
20	2017	Bank of Maharashtra	Low	CRISIL BBB+/AA−/A−/A1+	ICRA A/A+	CARE BBB+/A+/A
21	2017	Canara Bank	Low	CRISIL AAA−/A1+	ICRA AAA	Not available
22	2017	Central Bank of India	Low	CRISIL A−/A+	ICRA A+/A−	CARE A−/A
23	2017	Corporation Bank	Low	CRISIL AA−/A−/A+	ICRA A1+ (short-term funds)	CARE AA−/A+
24	2017	Dena Bank	Low	CRISIL A1+/AA−/A+/A−	Not available	CARE BBB+/A+/A
25	2017	IDBI Bank Limited	Low	CRISIL BBB+/AA−/A/A1+	ICRA BBB+&	CARE A
26	2017	Indian Overseas Bank	Low	CRISIL A1+/A+/A−	ICRA A1+ (short-term funds)	CARE BBB
27	2017	Oriental Bank of Commerce	Low	CRISIL A1+/FAAA+	ICRA A+/A−	CARE A
28	2017	Punjab and Sind Bank	Low	CRISIL AA	ICRA A+ (hybrid)	CARE AA/A+
29	2017	Punjab National Bank	Low	CRISIL AAA	ICRA A+	CARE AA−/AA/AA+
30	2017	UCO Bank	Low	CRISIL A1+/AA+/A+	ICRA A+/A−&	CARE A+
31	2017	Union Bank of India	Low	CRISIL AA+/−	ICRA AA/AA+	CARE AA/AA+/AA−
32	2017	United Bank of India	Low	CRISIL BBB+/AA−/A/A1+	ICRA A+	CARE A+/A−
33	2017	Jammu & Kashmir Bank Ltd	Low	CRISIL A+/FAA−	Not available	CARE AA−
34	2017	State Bank of India	Medium	CRISIL A1+/FAAA/AAA/AA+	ICRA AA+ (hybrid)/ICRA AAA	CARE AAA
35	2017	Indian Bank	Medium	CRISIL AAA/AA+	ICRA AA+	CARE AAA
36	2017	Syndicate Bank	Medium	CRISIL AA	ICRA AA (hybrid)	CARE AA−
37	2017	Vijaya Bank	Medium	Not available	Not available	CARE AA−
38	2017	Axis Bank Limited	Medium	CRISIL A1+/CRISIL AAA/AA+	ICRA AAA/AA+ (hybrid)	CARE AAA
39	2017	Catholic Syrian Bank Ltd	Medium	Not available	Not available	Not available
40	2017	Karnataka Bank Ltd	Medium	Not available	ICRA A/A (hybrid)	CARE A
41	2017	Lakshmi Vilas Bank Ltd	Medium	Not available	Not available	Not available
42	2017	RBL Bank Limited	Medium	Not available	ICRA A1+	CARE AA−
43	2017	South Indian Bank Ltd	Medium	CRISIL A1+	Not available	Not available
44	2017	The Dhanalakshmi Bank Ltd	Medium	Not available	Not available	Not available

Source: The authors and credit ratings awarded by CRISIL, ICRA and CARE.

6. Discussions, Conclusions and Managerial Implications

The first research gap we address is the classification of banks into three categories of financial health: high, medium and low. The negative impact of growing non-performing assets on the economy is well documented (Balgova et al., 2016). Our study captures the banks that are under severe stress from high levels of non-performing assets: all the banks classified as having ‘Low’ financial health, except Canara Bank (9.63 per cent), have a double-digit GNPATA (Table 10), whereas all the banks classified as having ‘High’ and ‘Medium’ financial health are in single digits (Tables 8 and 9). Falling capital adequacy is a problem of high magnitude (RBI Financial Stability Report, 2018); consistent with this, the average CART I of banks classified as having ‘High’ financial health is substantially higher than that of banks classified as ‘Medium’ or ‘Low’ (Tables 8–10). Our classification shows a similar pattern for ROA, NITA and GNPATA. These results support the appropriateness of the classification and the accuracy of the prediction model. More importantly, the composition of the three clusters shows that some of the largest banks in the country, perceived as the most reputed and safest, such as State Bank of India, fall under the ‘medium’ rather than the ‘high’ category. Our classification also runs counter to stock market valuations. For example, Karur Vysya Bank, a smaller bank, is valued by the stock market at a price-to-book ratio (PB) of 0.72 but finds a place in the ‘high’ cluster, while State Bank of India, with a PB of 1.15, is positioned in the ‘medium’ cluster. Comparing the individual membership of banks in our three clusters with market valuation data (PB) shows many such anomalies.
It is understandable that stock market valuations go well beyond the perspective of deposit holders and consider various other dimensions in valuing a bank. While past studies focused on bankruptcy prediction, our study and classification provide authentic guidance to deposit holders on the financial strength of banks. This is our first research contribution.
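The group comparisons in this discussion can be checked directly against Tables 8–10. The sketch below transcribes two columns, computes each group's mean CART I and tests the single-digit versus double-digit GNPATA split; State Bank of Bikaner and Jaipur is left out because its ratios are blank in Table 10. Among the banks with reported ratios, Canara Bank (9.63 per cent) is the one ‘Low’ bank with a single-digit GNPATA.

```python
# CART I and GNPATA (per cent) for 2017, transcribed from Tables 8-10.
HIGH = {"CART I": [15.35, 11.87, 11.81, 12.79, 14.36, 14.72, 11.85, 15.9, 13.27, 13.3],
        "GNPATA": [2.83, 1.59, 2.33, 1.04, 8.74, 0.93, 3.58, 2.59, 2.91, 1.52]}
MEDIUM = {"CART I": [10.35, 12.2, 9.26, 9.96, 11.87, 11.54, 12.21, 8.75, 11.39, 10.88],
          "GNPATA": [6.9, 7.47, 8.5, 6.59, 5.21, 7.25, 4.21, 2.67, 1.2, 2.45]}
# State Bank of Bikaner and Jaipur omitted: its ratios are blank in Table 10.
LOW = {"CART I": [9.22, 8.09, 7.78, 9.94, 8.49, 9.17, 9.93, 8.90, 9.01, 9.77, 8.62,
                  8.90, 9.05, 7.81, 8.21, 8.88, 9.14, 8.91, 8.27, 9.02, 8.94, 8.70],
       "GNPATA": [20.77, 25.68, 23.15, 16.79, 13.09, 12.25, 10.46, 13.22, 16.93, 9.63,
                  17.81, 11.70, 16.27, 21.25, 22.39, 13.73, 10.45, 12.53, 17.12, 11.16,
                  15.53, 11.20]}


def mean(values):
    return sum(values) / len(values)


for name, g in [("High", HIGH), ("Medium", MEDIUM), ("Low", LOW)]:
    print(name, round(mean(g["CART I"]), 2), "max GNPATA:", max(g["GNPATA"]))

# 'Low' banks whose GNPATA is below 10 per cent (single digit).
single_digit_low = [x for x in LOW["GNPATA"] if x < 10]
print(single_digit_low)  # [9.63]
```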

The accuracy of our predictive model stems from the choice of a refined methodology that employs machine learning techniques to classify banks; this is our second research contribution. In the process, we also compare the predictive accuracy of techniques such as discriminant analysis and Random Forest, thereby contributing incrementally to the quantitative literature. Although Random Forest can quantify the relative importance of the input variables, it cannot directly measure the change in the outcome caused by an incremental change in an input variable; in other words, Random Forest cannot precisely assess the explanatory power of the input variables. Since Altman (1968), discriminant analysis has been found to be an appropriate technique for classification problems involving financial indicators, such as separating solvent companies from bankrupt ones. Fisher's LDA quantifies the relative importance of the input variables in sharply separating the clusters and is statistically more parsimonious than Random Forest. In our study, the difference in classification accuracy between discriminant analysis (95.36%) and Random Forest (95.93%) is marginal. Hence, discriminant analysis may be preferred to Random Forest for its substantive explanatory power.

Our third research contribution is to bring out anomalies between our classification and the credit ratings awarded by reputed rating agencies to the deposits and other instruments issued by the respective banks. Long-term instruments rated AAA by CRISIL are considered to have the highest degree of safety regarding timely servicing of financial obligations and to carry the lowest credit risk. It seems appropriate that CRISIL has assigned the AAA rating to the long-term instruments issued by HDFC Bank, as our study classifies HDFC Bank as ‘high’ in terms of financial health (see Table 8). However, assigning the same rating to the long-term instruments issued by State Bank of Bikaner and Jaipur, which is classified as ‘Low’, and to Indian Bank, which is classified as ‘Medium’ in our study, clearly seems to be an anomaly (Table 14). The anomaly becomes more evident when we compare, for these three banks, some of the input variables with high relative importance. The ROA for HDFC Bank, Indian Bank and State Bank of Bikaner and Jaipur is 1.88 per cent, 0.67 per cent and a negative 1.22 per cent, respectively (Tables 8–10). Their GNPATA of 1.04 per cent, 7.47 per cent and 15.52 per cent, respectively, underscores our argument, as does their CART I of 12.79 per cent, 12.20 per cent and 7.13 per cent. On all the key variables of relative importance, there is a wide disparity among the three banks, yet their long-term instruments all carry a AAA rating, which is intended to signal the highest degree of safety and timely servicing of financial obligations. The other two top rating agencies in the country, ICRA and CARE, also award very similar ratings to these three banks, which have very different financial strengths and risk profiles as modelled in this study.
Our results are consistent with the RBI's acknowledgement of weaknesses in the current credit rating structure: it has commented that rating agencies are intended to be forward-looking but have always been laggards (Economic Times, 2019). To prevent rating-shopping and conflicts of interest, the RBI is exploring the feasibility of determining rating assignments itself, paid for from a fund created out of contributions from the banks and the RBI (Indian Express, 2017). More importantly, credit rating agencies need to deploy machine learning techniques, which have recently been providing promising results, in establishing credit ratings; the model developed in this study could go a long way in providing a sophisticated guidance framework to deposit holders. This is one of the most important incremental contributions of this study.

The IL&FS episode has shaken up the regulators; India’s corporate affairs secretary has highlighted that 100 of the IL&FS group companies would need a risk-based classification and would need to be categorised as high-risk or medium-risk companies (The Hindu, 2019). Our study has developed a model that addresses precisely such a situation and can be applied to any financial institution, including commercial banks. Machine learning techniques that classify banks by financial health can help identify, in a timely manner, changes in the risk profile of banks over time. This can greatly benefit deposit holders, who could be alerted in time to reallocate their savings from banks whose financials are weakening to more secure banks. From a regulator’s perspective, the RBI, for instance, monitors banks on a continuous basis and periodically moves certain banks into the list of banks requiring PCA, and vice versa (Business Line, 2019); machine learning-based predictive frameworks could come in handy for this PCA process. Beyond deposit holders, an oversight mechanism based on a machine learning framework can support regulators in intervening in a timely manner. Weak financial institutions with high non-performing assets, in particular, can deplete bank capital unless the problem is resolved in time (Steigum, 2011); a machine learning-driven dynamic algorithm that can support an RBS framework is the need of the hour. The regulators are moving in precisely this direction (The Hindu, 2019), and our study and the model we propose are an incremental contribution to it.

7. Limitations and Scope for Further Research

As more training data become available, predictive accuracy could improve further over time. This article does not integrate bank-level corporate governance differentials, especially the lack of separation between management and control, which is a key challenge faced by all Indian banks, both private sector banks and PSBs. Integrating governance differentials with the financial health of banks, in the context of guiding deposit holders, could be an interesting area for future research. It would also be interesting to explore neural networks and support vector machines (SVMs) as classifiers for possible improvements in predictive accuracy.

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The authors received no financial support for the research, authorship and/or publication of this article.

References

Abdou, Abd Allah, Mulkeen, Ntim, C. G., & Wang (2017). Prediction of financial strength ratings using machine learning and conventional techniques. Investment Management and Financial Innovations, 14(4), 194–211.

Affes, & Hentati-Kaffel (2019). Forecast bankruptcy using a blend of clustering and MARS model: Case of US banks. Annals of Operations Research, 281(1–2), 27–64.

Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance, 23(4), 589–609.

Balgova, Nies, & Plekhanov (2016). The economic impact of reducing non-performing loans (Working Paper No. 193). European Bank for Reconstruction and Development.

Business Line (2019). RBI takes three banks out of PCA framework. Retrieved April 20, 2020, from https://www.thehindubusinessline.com/money-andbanking/corp-bank-allahabad-bank-dhanlaxmi-bank-out-of-pca/article26378237.ece

Davis (2020). Regulatory changes to bank liability structures: Implications for deposit insurance design. Journal of Banking Regulation, 21(1), 95–106.

Economic Times (2018a). Government likely to make additional capital infusion in PSU Banks: Subhash Chandra Garg. Retrieved from https://economictimes.indiatimes.com/news/economy/policy/government-likely-to-make-additional-capital-infusion-in-psu-banks-subhash-chandra-garg/articleshow/67165267.cms

Economic Times (2018b). FRDI: Deposit insurance may need to rise up to ₹15 Lakh to cover at least 90% of FDs. Retrieved from https://economictimes.indiatimes.com/news/economy/policy/frdi-deposit-insurance-may-need-to-rise-up-to-rs–15-lakh-to-cover-at-least–90-of-fds/articleshow/62343336.cms

Economic Times (2019). Credit rating firms face criticism from RBI. Retrieved from https://economictimes.indiatimes.com/news/economy/finance/credit-rating-firms-came-under-criticism-from-rbi/articleshow/68312205.cms?from=mdr

Ekinci, & Erdal, İ. (2017). Forecasting bank failure: Base learners, ensembles and hybrid ensembles. Computational Economics, 49(4), 677–686.

Financial Express (2017). Too-Big-To-Fail Indian Banks: Here’s what RBI is doing about it. Retrieved from https://www.financialexpress.com/industry/banking-finance/are-indian-banks-too-big-to-fail-heres-what-rbi-is-doing-about-it-rbi-monetary-policy-august–2018-rate-hike–2nd-time–6–5/1266140/

Firstpost (2017). SBI, ICICI, HDFC Bank In RBI's Too-Big-To-Fail list: All you need to know about the categorisation. Retrieved April 20, 2020, from https://www.firstpost.com/business/sbi-icici-hdfc-bank-in-rbis-too-big-to-fail-list-all-you-need-to-know-about-the-categorisation-4010333.html

Firstpost (2019). PMC Bank crisis: Economic offences wing files case against bank management, promoters of HDIL; SIT To probe case. Retrieved April 20, 2020, from https://www.firstpost.com/business/pmc-bank-crisis-economic-offences-wing-files-case-against-bank-management-promoters-of-hdil-sit-to-probe-case-7429291.html

Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.

Goldstein (2017). Banking’s final exam: Stress testing and bank-capital reform. Washington, DC: Columbia University Press.

Halteh, Kumar, & Gepp (2018). Financial distress prediction of Islamic banks using tree-based stochastic techniques. Managerial Finance, 44, 759–773.

Hsu, F. J., Chen, M. Y., & Chen, Y. C. (2018). The human-like intelligence with bio-inspired computing approach for credit ratings prediction. Neurocomputing, 279, 11–18.

Huang, D. T., Chang, & Liu, Z. C. (2012). Bank failure prediction models: For the developing and developed countries. Quality & Quantity, 46(2), 553–558.

IBC Delays (2018). IBC delays. Retrieved from https://www.thehindubusinessline.com/opinion/letters/letters-to-the-editor/article25252065.ece

Indian Express (2017). RBI to revamp oversight panel; bigger role for rating agencies. Retrieved April 20, 2020, from https://indianexpress.com/article/business/banking-and-finance/rbi-to-revamp-oversight-panel-bigger-role-for-rating-agencies-4668940/

Jones (2017). Corporate bankruptcy prediction: A high dimensional analysis. Review of Accounting Studies, 22(3), 1366–1422.

Kadanda & Raj (2018). Non-performing assets (NPAs) and its determinants: A study of Indian public sector banks. Journal of Social and Economic Development, 20, 193–212.

Kashyap, A. K., & Kovrijnykh (2015). Who should pay for credit ratings and how? The Review of Financial Studies, 29(2), 420–456.

Kolari, J. W., & Sanz, I. P. (2017). Systemic risk measurement in banking using self-organizing maps. Journal of Banking Regulation, 18(4), 338–358.

Livemint (2018a). How credit rating agencies missed the IL&FS crisis. Retrieved April 20, 2020, from https://www.livemint.com/Companies/kDBrz7DB4Ti4Pz2TdxG85N/How-credit-rating-agencies-missed-the-ILFS-crisis.html

Livemint (2018b). PNB fraud explained: How India’s 2nd largest PSU bank lost ₹11,400 crore. Retrieved April 20, 2020, from https://www.livemint.com/Industry/YegzgaJhyB66N2byVCGv7L/PNB-fraud-explained-How-Indias-2nd-largest-PSU-bank-lost-R.html

Migliore, L. A., & Chinta (2017). Demystifying the big data phenomenon for strategic leadership. SAM Advanced Management Journal, 82(1), 48–58.

Newsclick (2018). ‘Efficient’ private banks? Here is the list of failed private banks in India. Retrieved April 20, 2020, from https://www.newsclick.in/efficient-private-banks-here-list-failed-private-banks-india

Pagano & Volpin (2010). Credit ratings failures and policy options. Economic Policy, 25(62), 401–431.

RBI Annual Report (2018). Retrieved December 21, 2018, from https://rbidocs.rbi.org.in/rdocs/AnnualReport/PDFs/0ANREPORT201718077745EC9A874DB38C991F580ED14242.PDF

RBI Financial Stability Report (2018). Retrieved December 21, 2018, from https://rbidocs.rbi.org.in/rdocs/PublicationReport/Pdfs/0FSR_JUNE2018A3526EF7DC8640539C1420D256A470FC.PDF

Sethuraman (2018). The effect of reputation shocks to rating agencies on corporate disclosures. The Accounting Review.

Steigum (2011). The Norwegian banking crisis in the 1990s: Effects and lessons.

The Hindu (2019). First phase of resolution for IL&FS in next few months. Retrieved from https://www.thehindu.com/business/first-phase-of-resolution-for-ilfs-in-next-few-months/article26379163.ece

The Wire (2018). IL&FS and the La-La land that is Indian credit rating. Retrieved from https://thewire.in/business/ilfs-moodys-fitch-care-icra-rating-companies

Times of India (2018). RBI puts 200 stressed account under scanner. Retrieved from https://timesofindia.indiatimes.com/business/india-business/rbi-puts-200-stressed-account-under-scanner/articleshow/65413491.cms

Tung, W. L., Quek, & Cheng (2004). GenSo-EWS: A novel neural-fuzzy based early warning system for predicting bank failures. Neural Networks, 17(4), 567–587.


Table 3. Mean Values of the Input Variables for the Three Clusters

Cluster  CDR    NITA  ROA   CAR    CART I  CART II  NNPATA  LATA   GNPATA
1        72.76  2.55  0.79  12.46  8.53    3.92     1.63    8.13   3.24
2        65.07  2.12  0.06  11.11  8.53    2.59     7.61    10.24  12.49
3        76.69  3.37  1.40  15.91  14.08   1.84     0.79    8.11   2.25