Abstract
Credit scoring models have been successfully applied in the traditional credit card industry and by mortgage firms to distinguish defaulting customers from non-defaulting ones. In the light of growing competition in the microfinance industry, over-indebtedness and other factors, the industry has come under increased regulatory supervision. Our study provides evidence from a large microfinance institution (MFI) in India; we apply both a credit scoring method and a neural network (NN) method and compare the results. In this article, we demonstrate the capability of credit scoring models for an India-based microfinance firm in terms of predicting default probability as well as the relative importance of each of its associated drivers. A logistic regression model and an NN have been used as the predictive analytic tools for sifting the key drivers of default.
Introduction
As a sequel to the microfinance crisis which took place in Andhra Pradesh in India in October 2010, the microfinance institutions (MFIs) in India have come under stringent regulations. Among other things, the regulators have sought to put a cap on the interest margins that MFIs can receive from their operations. Given this fact, assessment of credit risk in any MFI assumes great relevance. The Indian microfinance industry is also in a critical growth phase in which MFIs are attempting to make the transition from non-profit entities to profit-making enterprises capable of economic viability in the years to come. Increasing awareness among investors and lenders of commercial viability, growing competition and shrinking returns in the microfinance market are the key factors forcing MFIs to improve the efficiency of their lending operations. It is in this context that predicting credit default looms large. Predictive analytic models are being used to estimate the probability of default as well as to differentiate defaulters from non-defaulters in terms of important characteristics/variables. To our knowledge, no study of risk measurement in the Indian microfinance sector has used credit scoring methodology.
Credit scoring models apply quantitative analysis to past loan data to predict default probability, assuming that the relationships observed in the past will continue to hold in the future. The models do two things. First, they predict the probability of default. Second, they provide a classification table that sets the defaulters and non-defaulters predicted by the model alongside the actual defaulters and non-defaulters, thus enabling us to evaluate the predictive power of the model. Scoring models rely primarily on the computing power available today to estimate default probabilities and classify borrowers using advanced statistical techniques.
Review of Literature
According to Lewis (1992), credit scoring models are built using statistical techniques that assign points to the variables that form part of a credit system for deciding whether or not to grant a loan. These models require identifying characteristics that help differentiate potential credit defaulters from non-defaulters.
One of the earliest studies that applied credit scoring models to microfinance is Vigano (1993). Empirical evidence in this area is rather limited, particularly with respect to developing countries. Saunders (1999) focussed on the use of credit scores for classifying credit categories into prompt payers, insolvent, good or bad and desirable or not. This classification can help the credit analyst decide whether or not to grant the requested credit. Schreiner (2000) demonstrates that scoring models do have a role to play in microfinance. Although models will not be a substitute for the judgement and sound personal knowledge of loan officers or loan groups with regard to characteristics that differentiate a defaulter from a non-defaulter, they can certainly improve estimates of risk and thus help predict the probability of a loan default. Thomas (2000) explained the differences between credit approval models and behavioural scoring models. While the primary aim of credit approval models is to estimate the probability of a new credit applicant becoming insolvent with the institution within a particular time horizon, behavioural scoring models aim to estimate the probability of insolvency for a client who has already availed a credit facility from the institution.
Andrade (2004) pointed out that while some institutions still practice judgmental credit scoring, models such as discriminant analysis, logistic regression and neural network (NN) are being used extensively. Schreiner (2004) discusses the advantages and limitations of credit scoring applied to microfinance. Analytic models do have predictive capability to significantly improve the evaluation of the risk associated with loans. This article also points out the basic steps in a scoring project. Carmona and Araujo (2011) used credit score models for a microcredit institution for credit approval and behavioural scoring. The multivariate model used for scoring was logistic regression. The results of their study revealed a predictive accuracy of about 80 per cent. It was also pointed out that the two critical problems, namely, insolvency and high operational costs, that adversely impact the financial sustainability could be mitigated substantially, resulting in reduction in insolvency incidence and decrease in operational costs.
Maves (1991) observes that, because of their adaptability to change, NNs can be ‘retrained’ much more quickly than discriminant analysis-based techniques when markets, products and the economy change. Neural networks have served as a versatile predictive analytic tool in a variety of complex environments. In finance, they have been successfully applied to predicting bankruptcy and loan default as well as to credit evaluation (DeLurgio & Hays, 2001; Jain & Nag, 1995). Ghatge and Halkarnikar (2013) point out that a feed-forward back-propagation NN, when used to predict credit default based on selected parameters, shows the ability of the network to learn the patterns and to be robust in classification.
In this article, we attempt to fill an important gap in the empirical literature pertaining to India. We use primary data collected from one of the largest MFIs in India and evaluate the accuracy of forecasts obtained with the credit scoring methodology, applying logistic regression and NN techniques. In the current context, this article assumes a great deal of importance since the potential for microfinance to ease the path towards total financial inclusion and then on to inclusive growth is undeniable (Gangopadhyay & Shanthi, 2012).
We further find answers to some important questions. (i) What are the key variables that can significantly predict a credit default in the microfinance context? (ii) How effectively can we use the logistic regression method for this purpose? (iii) Can we corroborate its predictive accuracy using another modelling technique, which is also used in the literature, namely, the NN? and (iv) Apart from identifying the key variables, can we also identify the relative importance of the key drivers?
Data
We have used primary data collected from a leading MFI, ABC, in Tamil Nadu. The company has been in the business of extending microcredit to people who are unable to get finance from the mainstream banking avenues. In this context, the alternative source of finance is private money lenders, whose rates of interest are in the vicinity of 30–100 per cent. The mission of ABC is to make finance available at reasonable cost to such customers in a transparent manner and, in the process, to achieve acceptable returns on investment to ensure economic viability.
ABC is one of the largest microfinance companies in India. As of July 2014, the company has a client/beneficiary base of close to two million, employee strength of about 3400 and total outstanding credit of about ₹1.73 billion. The company currently has about 335 branches spread over India.
ABC’s Lending Model
Customers are formed into groups comprising five members. Three to six groups are to be amalgamated into a centre. Each group and centre will have one leader. The groups have joint liability in the sense each member of the group stands guarantee to the loan repayment of the other members of the group. Areas are identified for microfinance customers by their teams and the sales officer communicates the salient features of the company’s schemes to prospective borrowers. The applicants are then screened for their credit risk. Criteria used include length of stay in the same place of residence, nature of business, income, expenditure, age, caste, among others.
ABC’s Operational Risk Management
ABC has an operational risk team, which conducts three types of operational risk audit:
Member audit: this audit is done by a field risk officer. A random sample is selected, a member audit form is filled in and a member audit risk score is calculated.
Centre-meeting audit: a risk team picks a random sample, visits the field and audits the centre meetings. A centre-meeting audit form is filled in and a score is calculated.
Branch audit: there are a total of 280 branches in ABC. Every month all 280 branches are audited, and a branch audit score is calculated.
These scores reflect the operational efficiency of the sales officers and relationship officers working on the ground.
Methodology
Logistic regression is a variation of ordinary regression in which the dependent variable is binary, taking the value 0 or 1. The dependent variable is categorical and usually represents the occurrence or non-occurrence of an event, while the independent variables can be continuous, categorical or both. Logistic regression has been widely used in the financial services industry for credit scoring models. On theoretical grounds, logistic regression is a more appropriate statistical tool than linear regression, given that the dependent variable in credit risk is categorical with two discrete classes, namely, a customer is either a defaulter or a non-defaulter. Ordinary least squares (OLS) regression is fraught with problems in predicting the probability of default, which has to lie between 0 and 1, because it cannot guarantee that the estimated probability will always fall in that range. Logistic regression, by contrast, ensures that the estimated probability falls in the range 0–1 because it is based on a sigmoid function. In logistic regression, the individual parameters can be tested for statistical significance. The model also yields an explicit equation connecting the dependent variable with a host of independent variables, which facilitates predicting the default probability for a new customer asking for a loan.
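The contrast between the two models can be illustrated with a minimal Python sketch. The intercept and slope below are hypothetical values chosen for illustration, not estimates from the ABC data; the sketch only shows that a linear score can leave the 0–1 range while its sigmoid transform cannot.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    # Branch on the sign of z so that exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients (illustrative only, not from the study).
INTERCEPT = 0.2
SLOPE = 0.15


def linear_probability(x: float) -> float:
    """OLS-style 'probability': a straight line, hence unbounded."""
    return INTERCEPT + SLOPE * x


def logistic_probability(x: float) -> float:
    """Logistic regression: the same linear score passed through the sigmoid."""
    return sigmoid(INTERCEPT + SLOPE * x)


if __name__ == "__main__":
    for x in (-20, -5, 0, 5, 20):
        print(f"x={x:>4}  linear={linear_probability(x):6.2f}  "
              f"logistic={logistic_probability(x):6.4f}")
```

For extreme values of the predictor the linear score falls below 0 or rises above 1, whereas the logistic output stays strictly between the two bounds.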
Neural networks can be used effectively in corporate credit decisions and in fraud detection. The initial work on NNs was motivated by the study of the human brain and the idea of neurons as its building blocks. Artificial intelligence researchers introduced a computing neuron model to simulate the way neurons work in the human brain. This model provided the basis for many later NN developments. Neural networks are universal approximators and are extremely powerful as a predictive analytic tool. If the main objective is hypothesis testing, then one should use traditional and proven statistical modelling. If the main objective is predictive power, then an NN is a strong contender and can often provide more accurate results than statistical regression modelling. However, an NN cannot directly assess the change in the dependent variable caused by a change in an independent variable. In other words, it cannot provide a satisfactory answer to the question ‘if the independent variable increases by one unit, what is the change in the dependent variable?’. It is also hard to write down the final equation in NN-based modelling that is required for predicting the dependent variable. For example, in predicting the probability of default for a new customer, we need the equation connecting the dependent and independent variables, and this is hard to extract from an NN that may have many layers. These shortcomings could perhaps be overcome by performing sensitivity analysis on the independent variables.
In this article, we would like to take advantage of both these techniques to confirm the significant independent variables that impact the behaviour of credit default. For prediction, logistic regression will be used, since in it the independent variables are more precisely defined. We then use the NN method to find support for the results obtained using logistic regression.
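The idea of a sensitivity analysis on the independent variables can be sketched as follows. The tiny network, its weights and the input values are all hypothetical and merely stand in for a trained MLP; the sketch only shows the mechanics of nudging one input at a time and recording the change in the predicted default probability.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for a 3-input, 2-hidden-unit, 1-output network.
# They are illustrative only and were not estimated from any data.
W_HIDDEN = [[0.8, -0.5, 0.3],
            [-0.4, 0.9, 0.1]]
B_HIDDEN = [0.1, -0.2]
W_OUT = [1.2, -0.7]
B_OUT = 0.05


def predict(x):
    """Forward pass: inputs -> tanh hidden layer -> sigmoid output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W_HIDDEN, B_HIDDEN)]
    return sigmoid(sum(w * h for w, h in zip(W_OUT, hidden)) + B_OUT)


def sensitivities(x, step=1e-4):
    """Central finite-difference estimate of d(output)/d(input_i)."""
    result = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += step
        down[i] -= step
        result.append((predict(up) - predict(down)) / (2 * step))
    return result


if __name__ == "__main__":
    applicant = [0.5, -1.0, 0.2]  # hypothetical standardised inputs
    names = ("input_1", "input_2", "input_3")
    for name, s in zip(names, sensitivities(applicant)):
        print(f"{name}: {s:+.4f}")
```

Each printed value approximates how much the predicted probability moves per unit change in that input at the chosen point. Because an NN is non-linear, these values are local and would differ for another applicant.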
The Credit Risk Models
In this article, we focus on the analysis of credit risk, which is a part of the financial risk to the service provider, the other part of financial risk being the market risk. The main focus of this article is to find out the relative merits of using NNs and logistic regression methods in modelling credit risk. We have also explored the issue of these two methodologies reinforcing one another and giving us a better model fit.
For the purposes of this study, the following 10 variables have been identified in the context of predicting credit risk, based on a detailed discussion with the organisation. These are the variables that the microfinance organisation uses in trying to understand the credit risks involved. The modelling techniques proposed here, however, are different: they have better scientific underpinnings and are expected to perform better. ABC Ltd. does not use either NNs or logistic regression in its modelling process. Therefore, the purpose of this work is also to provide the organisation with a modelling technique that performs better in predicting credit risk.
Age
Total family members
Length of stay—duration of stay in the house
Loan amount requested—loan principal amount
Total income of family
Monthly expenses
Toilet—attached or public toilet
Type of house—tiled or RCC—concrete or sheet or thatched
Religion
Caste
A sample of 640 customers, comprising 504 good accounts (non-defaulters) and 136 bad accounts (defaulters), was selected from the company’s database to model the behaviour of credit default. For classification in terms of prediction versus actual, and also for sifting the relative importance of the independent variables that impact default behaviour, logistic regression and a neural network based on the multilayer perceptron (MLP) have been used. SPSS software was used to obtain the results for both logistic regression and the NN (MLP). As the scope of this research article is confined to answering the four specific research questions set out earlier at the end of the review of literature, the following points are addressed succinctly in terms of the two techniques.
Discussion of Results
Logistic Regression Model
Our results show that the logistic regression model has an overall predictive accuracy of 88.9 per cent (Table 1) in terms of correct classification, so the model performs well in terms of its overall predictive power. Out of the 504 actual cases that belong to ‘no overdue’, the model has incorrectly predicted 25 as ‘yes overdue’, which is only about 5 per cent of that category. This is a measure of type I error. Out of the 136 cases in the actual data that are in the overdue category, the model has incorrectly predicted 33.8 per cent (about 46 cases) as ‘no overdue’. This is a measure of type II error. The type II error is large, and thus we observe that the model has not been able to strike a proper balance between type I and type II errors, although the overall predictive accuracy is satisfactory (88.9 per cent).
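The arithmetic behind these figures can be checked with a short Python sketch. The counts of 504, 136 and 25 come from the text; the number of misclassified defaulters is not stated directly and is derived here, as an assumption, from the reported 33.8 per cent of 136. The majority-class baseline is an added point of comparison, not a figure reported in the study.

```python
# Counts reported in the text.
ACTUAL_NO_OVERDUE = 504
ACTUAL_OVERDUE = 136
FALSE_POSITIVES = 25  # 'no overdue' cases predicted as 'yes overdue'

# Assumption: derived from the reported 33.8 per cent, not stated as a count.
FALSE_NEGATIVES = round(0.338 * ACTUAL_OVERDUE)


def classification_summary(n_neg, n_pos, false_pos, false_neg):
    """Return accuracy and error rates from a 2x2 classification table."""
    true_neg = n_neg - false_pos
    true_pos = n_pos - false_neg
    total = n_neg + n_pos
    return {
        "accuracy": (true_neg + true_pos) / total,
        "type_i_rate": false_pos / n_neg,    # good accounts flagged as overdue
        "type_ii_rate": false_neg / n_pos,   # overdue accounts missed
        # Accuracy of always predicting the larger class ('no overdue').
        "majority_baseline": max(n_neg, n_pos) / total,
    }


if __name__ == "__main__":
    summary = classification_summary(
        ACTUAL_NO_OVERDUE, ACTUAL_OVERDUE, FALSE_POSITIVES, FALSE_NEGATIVES
    )
    for key, value in summary.items():
        print(f"{key}: {value:.3f}")
```

Under these assumptions the derived count reproduces the reported overall accuracy, and the baseline shows how much of that accuracy is attributable to the predominance of non-defaulters in the sample.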
Results–Logistic Regression
From Table 2, the following insights can be drawn. Length of stay, total income, loan amount required and expenses are overwhelmingly significant predictors of loan default at the 5 per cent level (see the p values under the column Sig in Table 2).
Logistic Regression—Relative Importance of Variables
Logistic Regression-Variables in the Equation
Type of house and age are highly significant at the 5 per cent level, indicating that they are good predictors of loan default.
Caste as a factor is an overwhelmingly significant predictor of default at the 5 per cent level (the p value is very small).
Total family (number of members) is moderately significant (significant at 6.4 per cent) as a predictor of default.
The Exp(B) column in Table 2 has an interesting interpretation. These values are odds ratios: whenever the value is more than 1, the odds of default, and hence the probability of default, increase with every additional unit of the independent variable concerned, and whenever it is less than 1, they decrease. By this criterion, we find that length of stay, total income, total family, expenses, type of house and caste are critical in assessing default behaviour.
Loan amount required has an odds ratio (0.999) very close to 1 and, being statistically significant (Table 2), can also be taken to be a critical predictor of risk.
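Exp(B) is conventionally read as an odds ratio, that is, the factor by which the odds of default are multiplied for a one-unit increase in a predictor. A minimal Python sketch of that reading follows. The value 0.999 is the one reported for loan amount required; the 1,000-unit change and the baseline odds of 0.25 are hypothetical, and the sketch assumes the loan amount enters the model in single currency units, which the text does not state.

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Exp(B): multiplicative change in the odds for a one-unit increase."""
    return math.exp(coefficient)


def scaled_odds_ratio(per_unit_ratio: float, units: float) -> float:
    """Odds ratio implied by a change of `units` in the predictor."""
    return per_unit_ratio ** units


def odds_to_probability(odds: float) -> float:
    """Convert odds p / (1 - p) back to the probability p."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    per_unit = 0.999           # Exp(B) reported for loan amount required
    change = 1000              # hypothetical increase in the loan amount
    baseline_odds = 0.25       # hypothetical, i.e. probability 0.20

    ratio = scaled_odds_ratio(per_unit, change)
    new_odds = baseline_odds * ratio
    print(f"odds ratio for +{change} units: {ratio:.3f}")
    print(f"probability before: {odds_to_probability(baseline_odds):.3f}")
    print(f"probability after:  {odds_to_probability(new_odds):.3f}")
```

The point of the sketch is that an odds ratio close to 1 per unit can still translate into a sizeable change in the odds when the predictor varies over hundreds or thousands of units.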
In Table 3, we present the results obtained from using NN methodology. In terms of predictive power, NN outperforms logistic regression. The predictive accuracy is in the vicinity of 93 per cent for the training sample and 94 per cent for the testing sample.
It is significant to note that the type I error for both the training data (3.3 per cent) and the testing data (4.2 per cent) is smaller than that of logistic regression. The type II error is also substantially lower than that of logistic regression, both for the training data (21.9 per cent) and for the testing data (15 per cent). The NN therefore also balances type I and type II errors better than logistic regression does.
From Figure 1, giving relative importance of the independent variables, we deduce that total income, expenses, length of stay, age, type of house, loan amount, caste, total family, toilet type and religion are the order in which the independent variables are ranked in terms of their importance in predicting default risk.
As discussed earlier, because the shortcomings of the NN with regard to explanatory variables and the precise form of the equation cannot be fully redressed, we confine ourselves to the logistic regression model for predicting loan default.
As a corroborative tool, the NN confirms the significant predictors identified by logistic regression.
Neural Network Model
Unifying both logistic regression and NN results, we confirm that the significant predictors of credit risk for the case under consideration are length of stay, total income, loan amount required, expenses, age, type of house, total family, caste and toilet type.
Conclusion

In this research article, we have demonstrated the capability of credit scoring models for an India-based microfinance firm in terms of predicting default probability. Further, we have been able to sift the relative importance of each of its associated drivers. The strengths and limitations of logistic regression and NN have been discussed in the context of the predictive power of credit risk modelling. In terms of predictive power, NN outperforms logistic regression. The predictive accuracy of NN is in the vicinity of 93 per cent for the training sample and 94 per cent for the testing sample, which is higher than that of logistic regression (88.9 per cent). However, because the shortcomings of NN with regard to explanatory variables and the precise form of the equation cannot be fully redressed, we confine ourselves to the logistic regression model for predicting loan default. We have combined the advantages of both techniques to confirm the significant independent variables that impact the behaviour of credit default. We conclude that, for prediction, logistic regression is preferred over NN because it offers the statistical rigour of testing the significance of the independent variables. The results of logistic regression have been corroborated by NN. The key drivers of credit default, unifying the logistic regression and NN results, are length of stay, total income, loan amount required, expenses, age, type of house, total family, caste and toilet type.
Although an analytic model is not necessarily a complete substitute for the judgement-based credit scoring practices prevalent in the microfinance sector, it can be used alongside them as an important decision support mechanism so that credit risk can be kept to a minimum.
