Exploring impact on auto loan assessment and predictive modeling of credit scores with machine learning algorithms

Abstract

The main role of Credit Information Bureau India Limited (CIBIL) in the auto loan procedure is to give lenders vital information regarding borrowers’ creditworthiness. CIBIL scores help lenders assess loan risk. This study uses two machine learning (ML) techniques, Gaussian Process Classification (GPC) and Gradient Boosting Classification (GBC), to predict CIBIL scores, with the aim of improving accuracy and reliability in evaluating creditworthiness for vehicle loan approvals. To enhance the accuracy of these predictive models, the Electric Eel Foraging Optimization (EEFO) and the Political Optimizer Algorithm (POA) are incorporated as optimization methods. Integrating the ML models with these optimization algorithms is intended to yield highly accurate predictions of CIBIL scores, which can improve the efficiency and reliability of the auto loan approval process. GBEE (GBC optimized with EEFO) achieved the highest accuracy in both the training (0.965) and testing (0.903) phases. GBPO (GBC optimized with POA) followed closely, showing robust predictive power. The standalone GBC was reliable, particularly in high- and mid-probability conditions, although it trailed GBEE and GBPO.

Keywords

Credit Information Bureau India; automobile loan; machine learning models; optimization

1. Introduction

Credit scoring is an increasingly popular topic of discussion. Lenders, including banks and credit card companies, use credit scores to establish credit limits, set interest rates, and decide who is eligible for a loan.¹ A credit score is a numerical representation of an individual's creditworthiness or, more specifically, an assessment of their financial fitness and discipline based on a detailed study of their credit files.² Credit report data, usually obtained from credit agencies, is the main source of a credit score. A person's CIBIL credit score is a 3-digit figure that summarizes their credit history and credit rating.³ Scores range from 300 to 900.⁴ People with no prior credit history are given a score of −1.⁵ The score is reported as zero if the credit history is less than 6 months old.⁶ Generally, it takes 18 to 36 months to improve one's CIBIL credit score.

The importance of the CIBIL score in processing a vehicle loan application should therefore not be underestimated.⁷ Lenders regard the CIBIL score as one of the most important factors in approving or rejecting an applicant for a car loan.⁸ The score is a numerical representation of an individual's creditworthiness based on their financial activities and past credit performance.⁹ A high CIBIL score, usually above 750, indicates very good credit health and increases the chances of having a loan approved on the best terms and conditions, such as the best interest rates and highest loan amounts.¹⁰ Conversely, a poor CIBIL score may lead to rejection, or to approval only under strict terms and conditions, such as higher interest rates and more stringent repayment policies.¹¹

Auto loans provide financing for consumers to purchase automobiles and repay the amount over a pre-determined period, usually through monthly installments.¹² These loans may be secured or unsecured. Secured loans are more common because they carry lower interest rates and pose far less risk to the lender, since the car serves as collateral.¹³

Consider how reluctant one would be to lend a large sum of money to a friend, even when the friend's repayment history has been stellar. The decision becomes difficult in the absence of a licensed and approved organization capable of offering a trustworthy credit assessment. This is where banks and other financial organizations benefit from CIBIL. CIBIL scores provide a quick overview of a person's credit history and health, helping lenders determine the borrower's propensity and capacity for timely debt repayment.¹⁴

A few crucial steps need to be taken to keep the CIBIL score high and increase the chances of securing an auto loan. Firstly, bills should be paid on time, every time. Lenders view late payments unfavorably, whereas prompt bill payment demonstrates financial responsibility and discipline and raises the credit score. Secondly, balances should be kept low so that credit utilization stays under control. Excessive credit balances can lower the credit score and indicate financial pressure. Ideally, credit card balances are kept far below their limits, with no more than thirty percent of available credit in use.¹⁵

While those with lower credit scores might be offered loans with stricter terms or higher interest rates, those with higher credit scores might be eligible for cheaper interest rates and better loan terms.¹⁶ Joint, guaranteed, and co-signed accounts also need to be checked regularly. In such accounts, both parties are equally responsible for late payments, so carelessness by one party may damage the creditworthiness of the other. Periodic review of these accounts ensures that payments are not delayed and thereby protects the credit score. If the above guidelines are followed, a high CIBIL score can be sustained, which in turn makes one a good applicant for an auto loan or other credit opportunities.¹⁷

1.1. Study aims

This study investigates how well CIBIL scores can be predicted for automobile loan assessment. Two ML techniques, the GPC and GBC models, were employed for this purpose, and two optimization algorithms, EEFO and POA, were applied to improve the accuracy of the results. The predictive models and optimizers were then combined into hybrid models so that the inherent challenges and limitations of each component could be addressed jointly. By leveraging the strengths of both the predictive models and the optimization algorithms, the hybrid models showed better predictive power, with lower errors and more reliable predictions for loan approval. This comprehensive approach clarifies the relationship between CIBIL score and automobile loan approval and demonstrates the potential of advanced computational techniques in financial decision-making.

1.2. Literature review

The first empirical research on the existence of credit reporting operations, covering about 40 countries worldwide, was presented in an earlier study.¹⁸ Its authors also examined the effects on the whole economy, considering factors such as credit availability, credit volume, credit pricing, and credit portfolio quality. In “A Portfolio View of Consumer Credit,” Musto and Souleles¹⁹ analyzed the “covariance risk” of individual consumers, i.e., the correlation of their default risk with overall consumer default rates, using a unique panel dataset of credit bureau records. They found significant heterogeneity in covariance risk among consumers, with high covariance risk linked to low credit scores, and a significant positive correlation between credit scores and credit obtained, with a smaller negative effect from covariance risk.

In their 2003 study, Galindo A. and Miller M.J.²⁰ examined global private and public information-sharing arrangements, complemented by empirical analysis using a freshly collected dataset. They observed a positive correlation between the breadth of credit markets and the intensity of information sharing. Both public and private information sharing were found to reduce defaults, with stronger evidence favoring the former. Interestingly, the presence of private credit registers decreased the likelihood of establishing public credit registers. They noted that government intervention in information sharing tends to be more prevalent in nations lacking private arrangements and having weak creditor rights protection. This study represents an important initial step in understanding the impact of information sharing on credit markets, and it underscores the need for further investigation, particularly concerning developing countries and the conduct of banks and borrowers. According to the microeconomic data of Galindo A. and Miller M.J.,²⁰ nations with highly developed credit registers face fewer financial constraints than those with less developed credit bureaus. Effective credit registries were found to greatly lessen financial constraints by lowering the sensitivity of businesses’ investment decisions to cash flow availability.

According to Sareef Jameel Malbery and Altab Althar Taha, the development of e-commerce and communication technology has led to a rise in the usage of credit cards as a payment method. Transaction fraud, however, has also increased. Bayesian hyper-parameter optimization was used to tune the parameters of an optimized light gradient boosting machine (LightGBM). The approach was evaluated on two publicly accessible real-world data sets, one of which included fraudulent transactions and the other of which did not. The accuracy of the proposed system was higher than that of other approaches: it yields a 56.95% F1-score, 92.88% area under the receiver operating characteristic curve (AUC), 97.34% precision, and 98.40% accuracy.²¹ According to a study by Baker, Mohammed Rashad, Zuhair Norii Mahmood, and Ehab Hashim Shaker, credit card fraud causes sizeable financial losses. Many researchers have tried to devise innovative ways to stop this loss, but most of these approaches are time-consuming, costly, and labor-intensive. Based on a number of experimental studies, the authors concluded that class imbalance in the dataset is the main reason for incorrect findings: an unbalanced dataset leads to inaccurate model predictions and, in turn, economic harm. They found that LR, SVM, ANN, and the C5.0 decision tree method are the best algorithms in terms of sensitivity, AUCPR, and accuracy when trained on the balanced dataset.²² Ni, Lina, et al. suggested a novel multi-stage process. First, the cardholder's transactions are collected; they are then grouped based on behavioral patterns, the dataset is classified, the model is trained, and finally the model is tested. A feedback mechanism alerts the system to abnormal behavior as it happens.²³ Peter, A., K. Manoj, and P. Kumar presented an ensemble learning approach for credit card fraud detection, motivated by the ratio of fraudulent to genuine transactions. Their ensemble combines Random Forest with neural networks, and they found that the two work better together to detect fraud instances more accurately. Large real-world credit card transaction data were used in their studies.²⁴

2. Dataset details

The dataset includes some variables that are important for forecasting CIBIL scores for those who are seeking auto loans. Important factors consist of:

Requested Amount: The loan amount requested by the applicant.

Age: The age of the applicant.

Loan Term: The duration of the loan repayment period.

Gender Desc: The gender of the applicant, categorized descriptively.

Manufacturer: The make of the vehicle for which the loan is being applied.

Marital Status: The marital status of the applicant.

Residence Owned by Desc: Indicates the ownership status of the applicant's residence (e.g., owned, rented).

Employment Type Desc: The nature of the applicant's employment (e.g., salaried, self-employed).

Ex-showroom Price: The price of the vehicle before taxes and additional fees.

Current Valuation: The present market value of the applicant's assets.

Number of Years at Residence: Duration the applicant has lived at their current address.

Number of Years in City: Duration the applicant has resided in the current city.

Segment Desc: The category of the vehicle (e.g., sedan, SUV).

Cost of Vehicle: The total cost of the vehicle.

CIBIL Score: The credit score of the applicant, which assesses their creditworthiness.

Figure 1 shows the relationships between the dataset's input and output variables as a correlation plot. Each cell of the plot displays the correlation coefficient, which indicates the strength and direction of the association. For instance, the requested amount and the ex-showroom price have a strong positive association, indicating that the amount requested for an auto loan tends to increase in proportion to the ex-showroom price of the car. The plot also shows a strong positive link between the requested amount and the current valuation, which implies that the requested loan amount tends to increase in tandem with the asset's current valuation. Table 1 presents the statistical properties of the variables in the dataset.

Figure 1.

The correlation plot of the inputs and output variables.
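
As an illustration of what each cell of such a plot contains, the following minimal sketch computes a Pearson correlation coefficient in plain Python. The source does not state which correlation measure was used, so Pearson is an assumption here, and the sample values are hypothetical rather than taken from the study's dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values (not from the study): the requested amount is a
# fixed fraction of the ex-showroom price, so the correlation is maximal.
ex_showroom_price = [5.0, 7.5, 10.0, 12.5]
requested_amount = [4.0, 6.0, 8.0, 10.0]
r = pearson(ex_showroom_price, requested_amount)
```

A value near +1 corresponds to the strong positive association described for the requested amount and the ex-showroom price; a value near −1 would indicate an equally strong inverse relationship.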

Table 1.

The statistical properties of the input and output variables of the CIBIL dataset.

		Indicators
Variables	Category	Min	Max	Avg	St Dev.
Requested Amount	Input	0.000	1.000	0.000	0.008
Age	Input	0.000	1.000	0.319	0.154
Loan Term	Input	0.000	1.000	0.034	0.014
Manufacturer Desc	Input	0.000	1.000	0.503	0.262
Gender Desc	Input	0.000	1.000	0.896	0.306
Marital Status Desc	Input	0.000	1.000	0.524	0.108
Resid Owned By Desc	Input	0.000	1.000	0.614	0.172
Employment Type Desc	Input	0.000	1.000	0.742	0.133
Ex Showroom Price	Input	0.000	1.000	0.009	0.013
Current Valuation	Input	0.000	1.000	0.000	0.008
No Of Years At Residence	Input	0.000	1.000	0.201	0.166
No Of Years In City	Input	0.000	1.000	0.127	0.104
Segment Desc	Input	0.000	1.000	0.339	0.322
Cost Of Vehicle	Input	0.000	1.000	0.010	0.017
cibil_score	Output	0.000	2.000	1.826	0.387
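
Every input in Table 1 has a minimum of 0 and a maximum of 1, which is consistent with min-max scaling of the raw features. The source does not describe the preprocessing, so the sketch below is only an assumption about how such a range could be produced; the function name and the sample ages are hypothetical.

```python
def min_max_scale(values):
    """Rescale a sequence linearly so that its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical applicant ages, not taken from the study's dataset.
ages = [21, 30, 45, 60]
scaled_ages = min_max_scale(ages)
```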

All available features in the dataset were used directly for model development without applying feature selection or dimensionality reduction techniques such as Principal Component Analysis (PCA), recursive feature elimination, or mutual information-based selection. The decision to retain the complete feature set was based on three considerations:

Preservation of Information: By using all features, the predictive models had access to the full range of potential predictors, ensuring that no variable with possible influence on credit score prediction was excluded.

Model and Optimizer Robustness: The selected machine learning models—Gaussian Process Classification (GPC) and Gradient Boosting Classification (GBC)—are inherently capable of handling high-dimensional input spaces. Furthermore, the optimization algorithms (Electric Eel Foraging Optimization and Political Optimizer Algorithm) were employed to fine-tune model hyperparameters, enabling the models to effectively balance complexity and generalization even with the complete feature set.

Computational Feasibility: Given the dataset size and computational resources, the inclusion of all features did not introduce prohibitive computational costs. This made it feasible to avoid dimensionality reduction while still achieving high predictive accuracy.

3. Mathematical framework

3.1. Gaussian process classification (GPC)

Models

Gaussian process priors provide well-defined non-parametric priors over functions. To carry out classification with this prior, the process is squashed through a sigmoidal inverse-link function, and a Bernoulli likelihood conditions the data on the transformed function values.²⁵ The binary class observations are denoted as $y = \{y_{1}, y_{2}, \dots, y_{N}\}$, and the input data are organized in a design matrix $X = \{x_{1}, x_{2}, \dots, x_{N}\}$. Following standard practice, the covariance function is evaluated for every pair of input vectors to produce the covariance matrix $K_{nn}$.

The prior distribution over the values of the Gaussian process function at the input locations is $p(f) = N(f | 0, K_{nn})$. The probit inverse-link function is $\phi(x) = \int_{-\infty}^{x} N(a | 0, 1) \, da$, and the Bernoulli distribution is $B(y_{n} | \phi(f_{n})) = \phi(f_{n})^{y_{n}} (1 - \phi(f_{n}))^{1 - y_{n}}$. The joint distribution of the data and the latent variables is therefore:²⁶

\begin{aligned} p(y, f) = \prod_{n=1}^{N} B(y_{n} | \phi(f_{n})) \, N(f | 0, K_{nn}) \end{aligned}

(1)

The primary goal is to estimate the posterior distribution of the function values, $p(f | y)$. An estimate of the marginal likelihood, $p(y)$, is also required in order to optimize (or marginalize out) the covariance function parameters. Several approximation methods have been proposed, but they all involve $O(N^{3})$ computation.
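
The likelihood part of Eq. (1) can be sketched in a few lines of plain Python. This is a minimal illustration of the probit link and the Bernoulli term only; it does not include the Gaussian prior or any posterior approximation, and the latent values used below are hypothetical.

```python
import math


def probit(x):
    """Standard normal CDF, used here as the inverse-link function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bernoulli_likelihood(y, f):
    """Bernoulli term of Eq. (1): link(f)^y * (1 - link(f))^(1 - y) for y in {0, 1}."""
    p = probit(f)
    return p if y == 1 else 1.0 - p


def log_likelihood(ys, fs):
    """Sum of log Bernoulli terms over all observations."""
    return sum(math.log(bernoulli_likelihood(y, f)) for y, f in zip(ys, fs))


labels = [1, 0, 1]
latent = [0.0, -1.0, 2.0]  # hypothetical latent function values f_n
ll = log_likelihood(labels, latent)
```

A latent value of 0 maps to a class probability of 0.5, and large positive or negative latent values push the probability toward 1 or 0, which is the squashing behavior the inverse-link function provides.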

3.2. Gradient boost classification algorithm (GBC)

Both the loss function and the base-learner model can be specified arbitrarily, on demand. In practice, given a specific loss function $\Psi(y, f)$ and/or a custom base-learner $h(x, \theta)$, the parameter estimates may be difficult to obtain. To address this, it was proposed to choose a new function $h(x, \theta_{t})$ that is most parallel to the negative gradient $\{g_{t}(x_{i})\}_{i=1}^{N}$ along the observed data:

\begin{aligned} g_{t}(x) = E_{y} \left[ \frac{\partial \Psi(y, f(x))}{\partial f(x)} \,\Big|\, x \right]_{f(x) = \hat{f}^{t-1}(x)} \end{aligned}

(2)

Rather than looking for a general solution for the boost increment in the function space, one can choose the new function that is closest to $-g_{t}(x)$. This replaces a potentially challenging optimization problem with the standard least-squares minimization problem:

\begin{aligned} (\rho_{t}, \theta_{t}) = \arg\min_{\rho, \theta} \sum_{i=1}^{N} \left[ -g_{t}(x_{i}) + \rho \, h(x_{i}, \theta) \right]^{2} \end{aligned}

(3)

To summarize, the complete gradient boosting algorithm can be formulated as originally proposed by Friedman (2001). The exact form of the resulting algorithm and its supporting formulas depends strongly on the design decisions made for $\Psi(y, f)$ and $h(x, \theta)$. Friedman provides some commonly used examples of these algorithms.²⁷
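
The idea of fitting each new base-learner to the negative gradient by least squares can be sketched as follows. This is a toy illustration under stated assumptions, not the configuration used in the study: the loss is the binary logistic loss, the base-learner is a one-dimensional regression stump, a fixed learning rate stands in for the step size, and all names and data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, targets):
    """Least-squares regression stump on 1-D inputs.

    Returns (threshold, left_value, right_value) minimizing the squared error.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    return best[1:]


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Fit a sequence of stumps, each one to the current negative gradient."""
    stumps = []
    scores = [0.0] * len(xs)
    for _ in range(rounds):
        # Negative gradient of the logistic loss with respect to the score.
        neg_grad = [y - sigmoid(f) for y, f in zip(ys, scores)]
        thr, lv, rv = fit_stump(xs, neg_grad)
        stumps.append((thr, lv, rv))
        scores = [f + learning_rate * (lv if x <= thr else rv)
                  for x, f in zip(xs, scores)]
    return stumps


def predict(stumps, x, learning_rate=0.5):
    """Sum the stump outputs and threshold the resulting probability at 0.5."""
    score = sum(learning_rate * (lv if x <= thr else rv) for thr, lv, rv in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical, linearly separable toy data.
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]
model = boost(xs, ys)
predictions = [predict(model, x) for x in xs]
```

Each round replaces a direct minimization of the loss with a least-squares fit to the current negative gradient, which is the substitution Eq. (3) describes.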

3.3. Electric Eel foraging optimization (EEFO)

Optimizers

In contrast to GOA, EEFO is an algorithm that draws inspiration from nature.²⁸,²⁹ Specifically, EEFO is based on the electric eel's foraging behavior during 4 stages: interaction, resting, hunting, and migration. The update process for new EEFO solutions is derived from the mathematical formulation of these stages and is given as follows:

3.3.1. The stage of interaction

The eel's progress in the search space at this point is determined by its direction, the neighborhood eel, and the comparison between its current position and the random eel in the search space. The eel's movement in this stage can be expressed as follows:

\begin{aligned} E_{i}^{IS} = \begin{cases} \begin{cases} E_{j} + rand \times DV_{1} \times (RE - E_{i}), & \text{if } PB_{1} > 0.5 \\ E_{j} + rand \times DV_{2} \times (RE - E_{i}), & \text{if } PB_{1} \leq 0.5 \end{cases} & \text{if } F_{E_{j}} < F_{E_{i}} \\ \begin{cases} E_{j} + rand \times DV_{1} \times (RE - E_{i}), & \text{if } PB_{2} > 0.5 \\ E_{j} + rand \times DV_{2} \times (RE - E_{i}), & \text{if } PB_{2} \leq 0.5 \end{cases} & \text{if } F_{E_{j}} \geq F_{E_{i}} \end{cases} \end{aligned}

(4)

where $E_{i}^{IS}$ represents the position of the $i^{th}$ eel in the interaction stage, with $i = 1, 2, \dots, n$; $N_{pz}$ is the population size; $E_{j}$ is the position of the neighborhood eel; $DV_{1}$ and $DV_{2}$ are the direction vectors of the eel's movement; $RE$ is the random eel chosen within the population; $PB_{1}$ and $PB_{2}$ are the probabilities of choosing the moving method; and $F_{E_{j}}$ and $F_{E_{i}}$ are the fitness values of the neighborhood eel and the current eel, respectively.

3.3.2. The resting stage

At this point, each eel in the population is represented as moving in the following way:

\begin{aligned} E_{i}^{RS} = E_{R} + Rn \times (E_{R} - E_{i}) \end{aligned}

(5)

In Eq. (5), $E_{i}^{RS}$ represents the $i^{th}$ eel's resting position, and $E_{R}$ denotes the location in the search area where the eel will arrive and rest.

3.3.3. The stage of migration

During this phase, the eel's movement will be determined by the difference in position between it and the prey, which may be found in the equation below:

\begin{aligned} E_{i}^{M S} = - R n d \times E_{i} + R n d \times R - L F \times (R - E_{i}) \end{aligned}

(6)

The new location of the eel during its migration stage is denoted by $E_{i}^{MS}$ in the equation above, while the prey's position is indicated by $R$, and the value of the Levy flight function is indicated by $LF$.

3.3.4. The stage of hunting

The eel in the hunting stage can be expressed as follows:

\begin{aligned} E_{i}^{H S} = R + A F \times R \times E_{i} \end{aligned}

(7)

where $AF$ is the amplifying factor, whose value is drawn randomly between 0 and 1, and $E_{i}^{HS}$ is the new position of the $i^{th}$ eel in the hunting stage.
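
The resting, migration, and hunting updates of Eqs. (5) to (7) can be transcribed directly, applied dimension by dimension. The sketch below follows the equations as printed in this section rather than any reference implementation of EEFO. It assumes that the two $Rnd$ terms in Eq. (6) are independent draws, that $R$ in Eq. (7) is the same prey position as in Eq. (6), and that the products are element-wise; the random quantities are passed in as arguments so that the behavior is reproducible.

```python
def resting_update(e_rest, e_i, rn):
    """Eq. (5): E_i^RS = E_R + Rn * (E_R - E_i), per dimension."""
    return [er + rn * (er - ei) for er, ei in zip(e_rest, e_i)]


def migration_update(e_i, prey, rnd_a, rnd_b, levy):
    """Eq. (6): E_i^MS = -Rnd * E_i + Rnd * R - LF * (R - E_i), per dimension.

    rnd_a and rnd_b are assumed to be two independent random numbers.
    """
    return [-rnd_a * ei + rnd_b * r - levy * (r - ei) for ei, r in zip(e_i, prey)]


def hunting_update(e_i, prey, af):
    """Eq. (7) as printed: E_i^HS = R + AF * R * E_i, element-wise."""
    return [r + af * r * ei for ei, r in zip(e_i, prey)]


# Hypothetical two-dimensional positions.
rested = resting_update(e_rest=[1.0, 2.0], e_i=[0.0, 0.0], rn=0.5)
migrated = migration_update(e_i=[1.0], prey=[2.0], rnd_a=0.5, rnd_b=0.5, levy=1.0)
hunted = hunting_update(e_i=[1.0], prey=[2.0], af=0.5)
```

In a full optimizer these updates would be wrapped in a loop that draws the random factors, evaluates the fitness of each candidate position, and keeps a move only when it improves the solution; those parts are not specified here and are left out.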

3.4. Political optimizer algorithm (POA)

The political election process in human society serves as the inspiration for the POA, a novel intelligent optimization algorithm.³⁰ Every party member in POA can be thought of as a candidate solution, and the votes a member receives serve as the evaluation measure, corresponding to the fitness value of the candidate solution. In contrast to conventional algorithms that rely only on the election itself, POA takes into account every stage of the electoral process, encompassing 5 stages: party formation and constituency allocation, election campaigning, party switching, inter-party election, and parliamentary affairs. POA uses a multi-stage iterative method to find the best solution; Figure 2 illustrates the primary algorithm flow of POA. The 5 primary phases of POA are introduced below.³¹

Figure 2.

The convergence curve of the 3 presented hybrid models.

Party formation and constituency allocation. At the start of POA, the entire population of $n^{2}$ members is split into $n$ parties, with $n$ members (candidate solutions) in each party. Every party member also assumes the role of an election candidate; that is, a constituency is formed by choosing one member from each party. The division into political parties is shown by the red dotted line in Figure 2, and the division into constituencies is indicated by the blue dotted line. In the mathematical model, the entire population is divided into $n$ political parties, as indicated by Eq. (8), and each party is made up of $n$ party members, as represented by Eq. (9).

\begin{aligned} P = \{P_{1}, P_{2}, P_{3}, \dots, P_{n}\} \end{aligned}

(8)

\begin{aligned} P_{i} = \{P_{i}^{1}, P_{i}^{2}, P_{i}^{3}, \dots, P_{i}^{n}\} \end{aligned}

(9)

Since every party member is a candidate for office, the whole population can also be viewed as $n$ constituencies, as represented by Eq. (10). It must be emphasized that, although the logical division is different, the constituency members are the same individuals as the party members. The membership of each constituency is given by Eq. (11).

\begin{aligned} C = \{C_{1}, C_{2}, C_{3}, \dots, C_{n}\} \end{aligned}

(10)

\begin{aligned} C_{j} = \{P_{1}^{j}, P_{2}^{j}, P_{3}^{j}, \dots, P_{n}^{j}\} \end{aligned}

(11)

Moreover, after the fitness of each member has been computed, the leader of the $i$th party is denoted as $P_{i}^{*}$, and the set of all party leaders is represented by $P^{*}$, as shown in Eq. (12). Similarly, the winners of the constituencies are named the parliamentarians following the election: $C_{j}^{*}$ denotes the winner of the $j$th constituency, and $C^{*}$ groups all the winners, as shown in Eq. (13).

\begin{aligned} P^{*} = \{P_{1}^{*}, P_{2}^{*}, P_{3}^{*}, \dots, P_{n}^{*}\} \end{aligned}

(12)

\begin{aligned} C^{*} = \{C_{1}^{*}, C_{2}^{*}, C_{3}^{*}, \dots, C_{n}^{*}\} \end{aligned}

(13)

Run for office. This stage, which is the central portion of the algorithm, handles the position update of the search agents. Party members adjust their positions based on the leader $P^{*}$ of the party they belong to and the winner $C^{*}$ of their constituency. In addition, by employing a position update mechanism known as the recent past-based position updating strategy (RPPUS), given in Eqs. (14) and (15), they also benefit from the experience of the previous election. The primary goal of RPPUS is to identify promising regions based on the numerical relationship between the subgroup optimal solution (party leader or constituency winner) and the search agent's current and historical fitness.

\begin{aligned} p_{i,k}^{j}(t+1) = \begin{cases} m^{*} + r \, (m^{*} - p_{i,k}^{j}(t)), & \text{if } p_{i,k}^{j}(t-1) \leq p_{i,k}^{j}(t) \leq m^{*} \text{ or } p_{i,k}^{j}(t-1) \geq p_{i,k}^{j}(t) \geq m^{*} \\ m^{*} + (2r - 1) \, | m^{*} - p_{i,k}^{j}(t) |, & \text{if } p_{i,k}^{j}(t-1) \leq m^{*} \leq p_{i,k}^{j}(t) \text{ or } p_{i,k}^{j}(t-1) \geq m^{*} \geq p_{i,k}^{j}(t) \\ m^{*} + (2r - 1) \, | m^{*} - p_{i,k}^{j}(t-1) |, & \text{if } m^{*} \leq p_{i,k}^{j}(t-1) \leq p_{i,k}^{j}(t) \text{ or } m^{*} \geq p_{i,k}^{j}(t-1) \geq p_{i,k}^{j}(t) \end{cases} \end{aligned}

(14)

\begin{aligned} p_{i,k}^{j}(t+1) = \begin{cases} m^{*} + (2r - 1) \, | m^{*} - p_{i,k}^{j}(t) |, & \text{if } p_{i,k}^{j}(t-1) \leq p_{i,k}^{j}(t) \leq m^{*} \text{ or } p_{i,k}^{j}(t-1) \geq p_{i,k}^{j}(t) \geq m^{*} \\ p_{i,k}^{j}(t-1) + r \, (p_{i,k}^{j}(t) - p_{i,k}^{j}(t-1)), & \text{if } p_{i,k}^{j}(t-1) \leq m^{*} \leq p_{i,k}^{j}(t) \text{ or } p_{i,k}^{j}(t-1) \geq m^{*} \geq p_{i,k}^{j}(t) \\ m^{*} + (2r - 1) \, | m^{*} - p_{i,k}^{j}(t-1) |, & \text{if } m^{*} \leq p_{i,k}^{j}(t-1) \leq p_{i,k}^{j}(t) \text{ or } m^{*} \geq p_{i,k}^{j}(t-1) \geq p_{i,k}^{j}(t) \end{cases} \end{aligned}

(15)

where $t$ denotes the current iteration number, $r$ is a random number between 0 and 1, and $m^{*}$ denotes the leader of a party or the winner of a constituency.

Party switching. To balance exploration and exploitation, a variable known as the party switching rate is introduced during the party switching phase. Every party member may be selected at random and switched to a different party. As shown in Eq. (16), the probability of switching is given by $λ$, which starts at $λ_{max}$ (here 1) and decreases linearly to 0 as the iteration number $t$ approaches the maximum number of iterations $T$.

\begin{aligned} λ = (1 - \frac{t}{T}) \times λ_{m a x} \end{aligned}

(16)
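
The three-way case analysis of Eqs. (14) and (15) and the linear decay of Eq. (16) can be sketched as below for a single coordinate $k$ of one party member. The function names are hypothetical, the random number $r$ is passed in explicitly so that the result is reproducible, and the sketch says nothing about which of the two equations a given member should use in a given iteration.

```python
def rppus_eq14(m_star, p_prev, p_cur, r):
    """Position update of Eq. (14) for one coordinate."""
    if p_prev <= p_cur <= m_star or p_prev >= p_cur >= m_star:
        return m_star + r * (m_star - p_cur)
    if p_prev <= m_star <= p_cur or p_prev >= m_star >= p_cur:
        return m_star + (2 * r - 1) * abs(m_star - p_cur)
    # Remaining orderings: m* <= p(t-1) <= p(t) or m* >= p(t-1) >= p(t).
    return m_star + (2 * r - 1) * abs(m_star - p_prev)


def rppus_eq15(m_star, p_prev, p_cur, r):
    """Position update of Eq. (15) for one coordinate."""
    if p_prev <= p_cur <= m_star or p_prev >= p_cur >= m_star:
        return m_star + (2 * r - 1) * abs(m_star - p_cur)
    if p_prev <= m_star <= p_cur or p_prev >= m_star >= p_cur:
        return p_prev + r * (p_cur - p_prev)
    # Remaining orderings: m* <= p(t-1) <= p(t) or m* >= p(t-1) >= p(t).
    return m_star + (2 * r - 1) * abs(m_star - p_prev)


def switching_rate(t, total_iterations, lambda_max=1.0):
    """Eq. (16): party switching rate decaying linearly from lambda_max to 0."""
    return (1 - t / total_iterations) * lambda_max


# Hypothetical scalar positions: previous, current, and subgroup best.
moved_toward_best = rppus_eq14(m_star=1.0, p_prev=0.0, p_cur=0.5, r=0.5)
halfway_rate = switching_rate(t=50, total_iterations=100)
```

The three conditions cover every possible ordering of the previous position, the current position, and $m^{*}$, which is why the last branch can be written as a fall-through.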

Election. At this stage, the fitness of each candidate solution is evaluated, and Eqs. (17) and (18) are used to update the party leaders and constituency winners.

\begin{aligned} q = \arg\min_{1 \leq j \leq n} f(p_{i}^{j}), \quad p_{i}^{*} = p_{i}^{q} \end{aligned}

(17)

\begin{aligned} q = \arg\min_{1 \leq i \leq n} f(p_{i}^{j}), \quad c_{j}^{*} = p_{q}^{j} \end{aligned}

(18)
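
The party and constituency structure of Eqs. (8) to (13), together with the election step of Eqs. (17) and (18), maps naturally onto an $n \times n$ grid: row $i$ is party $i$ and column $j$ is constituency $j$. The following minimal sketch assumes a minimization problem; the scalar members and the quadratic fitness function are hypothetical.

```python
def elect(population, fitness):
    """Return (party_leaders, constituency_winners) for an n x n population.

    population[i][j] is member j of party i, who also stands in constituency j.
    """
    n = len(population)
    # Eq. (17): the best member of each party (row) becomes its leader.
    party_leaders = [min(population[i], key=fitness) for i in range(n)]
    # Eq. (18): the best candidate in each constituency (column) wins the seat.
    constituency_winners = [
        min((population[i][j] for i in range(n)), key=fitness) for j in range(n)
    ]
    return party_leaders, constituency_winners


# Hypothetical population of n^2 = 4 scalar candidate solutions.
population = [[3.0, -1.0], [0.5, 2.0]]
leaders, winners = elect(population, fitness=lambda p: p * p)
```

The same individual can be both a party leader and a constituency winner, since parties and constituencies are two logical groupings of the same population.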

Affairs of Parliament. The goal of the party-switching phase is to shift the party's viewpoint, whereas the goal of the parliamentary affairs phase is to shift the viewpoint of the constituency. The constituency winners interact with one another to improve their fitness. Each constituency winner updates its position with respect to another randomly chosen constituency winner. Note that the move is applied only if the fitness of $c_{j}^{*}$ improves.

3.5. Conceptual summary and intuitive methods explanation

The purpose of this section is to provide readers who might not be familiar with complex mathematical formalism with clear explanations and high-level summaries of the main techniques used in this investigation. It complements the detailed mathematical descriptions given in the preceding sections.

Gaussian Process Classification (GPC)

By creating smooth curves through the data, GPC can be viewed as a method of predicting the likelihood of a particular outcome. It operates under the supposition that each point in the input space corresponds to a random variable with a joint Gaussian distribution. GPC indicates the likelihood that a sample belongs to a particular class rather than providing explicit labels (such as “class A” or “class B”). The raw predictions are transformed into probabilities using a “squashing” function, such as the sigmoid or probit function. Because of this, it is particularly helpful when working with noisy or uncertain data.

Gradient Boosting Classification (GBC)

GBC is an ensemble technique that combines numerous weak classifiers (usually decision trees) to create a strong one. Imagine it as a relay team in which each model attempts to correct the errors of the one before it. At each stage, the algorithm adds a new model that concentrates on the residual errors, so the combined output improves in accuracy over time. It does so by minimizing a loss function, which is essentially a gauge of how inaccurate the model's predictions are, and by updating in the direction that reduces this error the fastest (i.e., along the negative gradient).

Electric Eel Foraging Optimization (EEFO)

EEFO mimics how electric eels forage in water, drawing inspiration from their foraging habits. When resources are limited, they alternate between hunting prey, resting to reorient, interacting with their surroundings, and migrating. To steer a collection of solutions toward the best outcome, these behaviors are mathematically modeled. When it comes to avoiding local optima and preserving solution diversity, EEFO excels.

Political Optimizer Algorithm (POA)

To direct optimization, the POA draws on election tactics and political systems. Every “candidate solution” acts like a party member taking part in different political stages such as campaigning, party switching, and parliamentary debate. After a number of iterations, the population of candidates converges on the best political strategy, that is, the best solution. This metaphor allows the algorithm to strike a structured balance between exploration and exploitation.

4. Performance evaluators

Overall predictive performance is measured by accuracy (Ac), which is calculated as the ratio of correctly predicted instances to total instances. Precision (Pr) measures the proportion of positive predictions that are actually positive. Recall (Re) is the percentage of actual positive instances that were correctly identified. When working with imbalanced datasets, the F1-score (F1) is particularly helpful because it provides a balanced assessment of a model's performance by combining precision and recall into a single statistic. Eqs. (19) to (22) show the formulation of these metrics:

\begin{aligned} A c = \frac{T P + T N}{T P + T N + F P + F N} \end{aligned}

(19)

\begin{aligned} P r = \frac{T P}{T P + F P} \end{aligned}

(20)

\begin{aligned} R e = T P R = \frac{T P}{P} = \frac{T P}{T P + F N} \end{aligned}

(21)

\begin{aligned} F1 = \frac{2 \times Re \times Pr}{Re + Pr} \end{aligned}

(22)

$TP$ (True Positive) denotes the number of instances that belong to the positive class and are correctly predicted as positive by the model. $P$ indicates the total number of instances in the positive class. $FN$ (False Negative) represents instances that are actually positive but are incorrectly classified as negative by the model. $FP$ (False Positive) represents instances that are actually negative but are incorrectly classified as positive. True Positive Rate, or $TPR$ for short, is another name for recall or sensitivity. $TN$ (True Negative) represents instances that the model correctly labels as negative.
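
Eqs. (19) to (22) can be evaluated directly from the four confusion-matrix counts. The sketch below uses the binary formulation given above with hypothetical counts; how the metrics are aggregated across more than two score classes is not specified here and is not covered.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts.

    Assumes at least one predicted positive and one actual positive,
    so that no denominator is zero.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)            # Eq. (19)
    precision = tp / (tp + fp)                            # Eq. (20)
    recall = tp / (tp + fn)                               # Eq. (21)
    f1 = 2 * recall * precision / (recall + precision)    # Eq. (22)
    return accuracy, precision, recall, f1


# Hypothetical counts for 100 loan applicants.
accuracy, precision, recall, f1 = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts, accuracy is 0.85 and recall is 0.80, while precision is 40/45; the F1-score lies between precision and recall because it is their harmonic mean.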

5. Observations and analysis

This section covers the presentation and analysis of the findings of the study.

Convergence

The convergence curve shows how a model's performance metric evolves over successive optimization iterations, and therefore how closely the search has approached an optimal solution. Figure 2 shows that the models' performance improves as iterations are added. GBEE improves most markedly, with its Ac rising from about 0.7 at 40 iterations to almost 0.9 at 200 iterations. GBPO follows a highly similar convergence profile. GPPO and GPEE also behave in parallel, which points to consistent optimization dynamics across these models. In summary, the convergence curves show how iterative refinement improved model performance and highlight the iterative nature of the underlying optimization processes.

The training configurations and hyperparameters of the machine learning models used in this study are summarized in Table 2. Hyperparameters were optimized using the Electric Eel Foraging Optimization (EEFO) and the Political Optimizer Algorithm (POA) to achieve high predictive performance while maintaining model generalization. In the table, “----” indicates that the parameter is not applicable to the respective model. GBEE refers to Gradient Boosting with EEFO optimization, GBPO to Gradient Boosting with POA optimization, and GBC to standard Gradient Boosting Classification; GPEE and GPPO are the Gaussian Process Classification variants optimized with EEFO and POA, respectively, and GPC is the standard Gaussian Process Classification. For hyperparameter selection, EEFO and POA were applied to each hybrid model to explore the hyperparameter space efficiently. Candidate hyperparameter sets were evaluated on cross-validated accuracy and F1-score on the training set, and the configuration yielding the best balance of accuracy and generalization was selected for the final model.
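The selection loop described above can be sketched as follows. This is not EEFO or POA: plain random search is swapped in as a simple stand-in so that the structure (sample a candidate, score it, keep the best, record the convergence curve) stays visible. The search ranges and the `placeholder_cv_score` function are hypothetical; a real run would replace the latter with cross-validated accuracy and F1 on the training set.

```python
import random

# Illustrative search ranges; these are assumptions, not the bounds used in the study.
SEARCH_SPACE = {
    "n_estimators": (50, 300),
    "learning_rate": (0.01, 0.3),
    "max_depth": (2, 10),
}


def sample_candidate(rng):
    """Draw one hyperparameter set uniformly from SEARCH_SPACE."""
    return {
        "n_estimators": rng.randint(*SEARCH_SPACE["n_estimators"]),
        "learning_rate": rng.uniform(*SEARCH_SPACE["learning_rate"]),
        "max_depth": rng.randint(*SEARCH_SPACE["max_depth"]),
    }


def placeholder_cv_score(params):
    """Hypothetical stand-in for a cross-validated accuracy/F1 score.

    A real run would train the classifier with `params` on each fold and
    average the validation metrics; this toy surface only keeps the sketch
    self-contained.
    """
    penalty = (
        abs(params["learning_rate"] - 0.05)
        + abs(params["max_depth"] - 4) / 10
        + abs(params["n_estimators"] - 180) / 1000
    )
    return max(0.0, 1.0 - penalty)


def search(score_fn, n_iterations=200, seed=0):
    """Random search that records the best score seen after each iteration."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    convergence_curve = []
    for _ in range(n_iterations):
        candidate = sample_candidate(rng)
        score = score_fn(candidate)
        if score > best_score:
            best_params, best_score = candidate, score
        convergence_curve.append(best_score)
    return best_params, best_score, convergence_curve


best_params, best_score, convergence_curve = search(placeholder_cv_score)
print(best_params, round(best_score, 3))
```

Because `convergence_curve` stores the best score found so far, it can never decrease, which is the same shape a convergence plot of an iterative optimizer takes.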

Table 2.

Hyperparameter results of the models utilized in the study.

	Critical parameter
Models	n_estimators	learning_rate	max_depth	min_samples_split	n_restarts	length_scale
GBEE	182	0.0287	233	2	----	----
GBPO	114	0.1148	142	2	----	----
GBC	100	0.1	3	2	----	----
GPEE	----	----	----	----	87	392
GPPO	----	----	----	----	56	266
GPC	----	----	----	----	1	----

Models Performance

Table 3 presents the training and testing performance of the models on the 4 metrics. In training, GBEE performed best, with the highest Ac of 0.965, followed by GBPO at 0.952. GBC delivered mid-level performance with an Ac of 0.940, while GPEE and GPPO recorded 0.925 and 0.908, respectively. GPC was lowest, with an Ac of 0.893. In testing, GBEE again led with an Ac of 0.903, ahead of GBPO at 0.894. Of the two base models, GBC reached an Ac of 0.884, while GPC trailed at 0.835. GPEE marginally outperformed GPPO, with Ac values of 0.852 and 0.849, respectively.

Table 3.

Results achieved by the GBC- and GPC-based models on the performance evaluators.

		Metrics
Phase	Model	Accuracy	Precision	Recall	F1-Score
Training	GBEE	0.965	0.966	0.965	0.964
	GBPO	0.952	0.955	0.952	0.949
	GBC	0.940	0.941	0.940	0.935
	GPEE	0.925	0.922	0.925	0.922
	GPPO	0.908	0.903	0.908	0.903
	GPC	0.893	0.885	0.893	0.885
Testing	GBEE	0.903	0.910	0.903	0.888
	GBPO	0.894	0.901	0.894	0.875
	GBC	0.884	0.883	0.884	0.864
	GPEE	0.852	0.836	0.852	0.840
	GPPO	0.849	0.834	0.849	0.838
	GPC	0.835	0.814	0.835	0.820

During the training phase, GBEE outperformed GBPO, with Pr, Re, and F1 values of 0.966, 0.965, and 0.964 against 0.955, 0.952, and 0.949 for GBPO. Between the base models, GBC performed better, with Pr, Re, and F1 values of 0.941, 0.940, and 0.935, while GPC yielded 0.885, 0.893, and 0.885. Similarly, GPEE outperformed GPPO, with Pr, Re, and F1 values of 0.922, 0.925, and 0.922 against 0.903, 0.908, and 0.903 for GPPO. Overall, GBEE is the top-performing model in training across all 4 metrics, while GPC shows the lowest performance among the models evaluated.
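In Table 3, recall equals accuracy in every row. That pattern is what one would expect if the multiclass metrics are support-weighted averages of per-class values, although the averaging scheme is an assumption here and is not stated in the text. The sketch below uses a hypothetical 3-class confusion matrix to show why support-weighted recall always coincides with accuracy.

```python
def weighted_metrics(confusion):
    """Return accuracy and support-weighted precision, recall and F1.

    confusion[i][j] is the number of instances of true class i that were
    predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n_classes)) / total
    weighted = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for i in range(n_classes):
        tp = confusion[i][i]
        support = sum(confusion[i])                                 # actual class i
        predicted = sum(confusion[r][i] for r in range(n_classes))  # predicted class i
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        weight = support / total
        weighted["precision"] += weight * precision
        weighted["recall"] += weight * recall
        weighted["f1"] += weight * f1
    return accuracy, weighted


# Hypothetical confusion matrix (rows: true class, columns: predicted class).
confusion = [
    [90, 8, 2],
    [10, 35, 5],
    [3, 2, 15],
]
accuracy, weighted = weighted_metrics(confusion)
print(round(accuracy, 3), {k: round(v, 3) for k, v in weighted.items()})
```

Each class contributes (support / total) × (correct / support) = correct / total to the weighted recall, so the sum is exactly the accuracy. Under the same assumption, the weighted F1 is an average of per-class F1 values and need not lie between the reported precision and recall.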

Table 4 presents the performance of the models under 3 conditions: high probability, low probability, and mid probability. Under the high probability condition, GBEE achieved a Pr value of 0.941, while GBPO had a Pr value of 0.929. Between GBC and GPC, GBC performed better, with a Pr value of 0.920 compared to GPC's 0.899. Furthermore, GPEE surpassed GPPO, with Pr values of 0.922 and 0.913, respectively. GBEE therefore had the highest precision under this condition. Under the low probability condition, GBEE achieved a Pr value of 0.976, closely followed by GBC at 0.971, while GBPO, GPEE, GPPO, and GPC all attained a perfect Pr value of 1.000. Under the mid-probability condition, GBEE again led with a Pr value of 0.992, with GBPO close behind at 0.987. GPC was the weakest model in this condition, with a Pr value of 0.693.

Table 4.

Models’ performance in the 3 different conditions.

		Metric
Model	Condition	precision	recall	F1-Score
GBEE	High probability	0.941	0.999	0.969
	Low probability	0.976	0.851	0.909
	Mid probability	0.992	0.690	0.814
GBPO	High probability	0.929	0.998	0.962
	Low probability	1.000	0.723	0.840
	Mid probability	0.987	0.624	0.765
GBC	High probability	0.920	0.994	0.955
	Low probability	0.971	0.723	0.829
	Mid probability	0.949	0.576	0.717
GPEE	High probability	0.922	0.965	0.943
	Low probability	1.000	0.851	0.920
	Mid probability	0.774	0.596	0.673
GPPO	High probability	0.913	0.959	0.936
	Low probability	1.000	0.851	0.920
	Mid probability	0.730	0.551	0.628
GPC	High probability	0.899	0.959	0.928
	Low probability	1.000	0.851	0.920
	Mid probability	0.693	0.465	0.557

Under the high probability condition, GBEE achieved the highest Re value of 0.999, compared to GBPO's 0.998. Between GBC and GPC, GBC achieved a Re value of 0.994, while GPC had 0.959. Between GPEE and GPPO, GPEE performed better, with a Re value of 0.965 compared to GPPO's 0.959. In the mid-probability condition, GBEE, GBPO, GBC, GPEE, GPPO, and GPC had Re values of 0.690, 0.624, 0.576, 0.596, 0.551, and 0.465, respectively, so GBEE was the best-performing model and GPC the weakest. Under the low probability condition, both GBPO and GBC achieved Re values of 0.723, while GBEE, GPEE, GPPO, and GPC all attained 0.851. Overall, GBEE achieved the highest recall, or tied for it, in all 3 conditions.

As depicted in Figure 3, under the high probability condition, GBEE correctly predicted 12501 out of 12517 measured instances, GBPO 12496, and GBC 12438, so GBEE performed best. Under the mid-probability condition, GBEE correctly predicted 1742 out of 2524 instances, GBPO 1575, and GBC 1453. Under the low probability condition, GBEE correctly predicted 40 out of 47 instances, while GBPO and GBC each predicted 34 correctly. GBEE therefore leads the gradient boosting models in all 3 conditions. Among the Gaussian process models, under the high probability condition GPEE correctly predicted 12079 out of 12517 instances, GPPO 12004, and GPC 11999. Under the mid-probability condition, GPEE correctly predicted 1503 out of 2524 instances, GPPO 1390, and GPC 1174. Overall, GBEE consistently emerges as the best-performing model across all conditions.
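The counts reported above can be cross-checked against Table 4, since per-condition recall is simply correct predictions divided by the number of instances in that condition. The short check below does this for GBEE, using the counts from the text and the precision values from Table 4; only the variable names are introduced here.

```python
# (correct predictions, total instances) for GBEE, as reported in the text.
gbee_counts = {
    "High probability": (12501, 12517),
    "Mid probability": (1742, 2524),
    "Low probability": (40, 47),
}
# GBEE precision per condition, as reported in Table 4.
gbee_precision = {
    "High probability": 0.941,
    "Mid probability": 0.992,
    "Low probability": 0.976,
}

gbee_recall = {}
gbee_f1 = {}
for condition, (correct, total) in gbee_counts.items():
    recall = correct / total
    precision = gbee_precision[condition]
    gbee_recall[condition] = round(recall, 3)
    # Eq. (22): harmonic mean of precision and recall.
    gbee_f1[condition] = round(2 * precision * recall / (precision + recall), 3)

print(gbee_recall)
print(gbee_f1)
```

The recomputed recall and F1 values agree with the GBEE rows of Table 4 to 3 decimal places.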

Figure 3.

Column plot for the difference percentage of the models.

As shown in Figure 4, GBPO correctly predicted 12496 instances of the high probability condition and misclassified 21 as mid probability. In comparison, GBEE correctly predicted 12501 high probability instances, misclassifying only 1 as low probability and 15 as mid probability. For the low probability condition, GBPO correctly predicted 34 instances and misclassified 13 as high probability, whereas GBEE correctly predicted 40 and misclassified 7 as high probability. Between GPPO and GPEE, GPPO correctly predicted 12004 high probability instances and misclassified 513 as mid probability, while GPEE correctly predicted 12079 and misclassified 438 as mid probability. Overall, these results highlight the superior performance of GBEE, especially under the high probability condition, where it shows higher accuracy and fewer misclassifications than the other models.

Figure 4.

Confusion matrix for the accuracy of each model.

Plotting the true positive rate (TPR) against the false positive rate (FPR) at various thresholds, the ROC curves illustrate the effectiveness of the best hybrid models. A curve that approaches the top-left corner indicates superior model performance and accuracy, as it signifies a high TPR and a low FPR. Figure 5 shows that under high probability conditions, the ROC curve is closest to the top-left corner, indicating that the model reaches a TPR of 1 early, thus demonstrating the best performance. Models under the mid-probability condition demonstrate a promising start with an initial TPR of approximately 0.3 at an FPR of 0.0, outperforming other models at the early stage. However, at an FPR of 0.4, models under the high probability condition surpass others, reaching a TPR of 1.0 earlier, indicating superior performance. Subsequently, the mid-probability models also achieve a TPR of 1.0. In contrast, models under low probability conditions exhibit weaker performance, as they remain below the macro average line and reach a TPR of 1.0 at a later stage.
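To make the construction of such a curve concrete, the sketch below sweeps a decision threshold over predicted scores for one condition treated one-vs-rest and collects the resulting (FPR, TPR) points, then integrates them with the trapezoidal rule. The scores and labels are hypothetical and are not the study's predictions.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step.

    labels: 1 for the condition of interest (one-vs-rest), 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process all instances tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (FPR, TPR) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores for one condition and the matching one-vs-rest labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
points = roc_points(scores, labels)
auc = area_under_curve(points)
print(points)
print(auc)
```

A curve that climbs to a TPR of 1.0 at a low FPR encloses more area, which is why curves nearer the top-left corner correspond to better separation of that condition from the rest.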

Figure 5.

The ROC curves for the performance of the most efficient hybrid models.

The SHAP (SHapley Additive exPlanations) sensitivity analysis in Figure 6 explains how each feature affects the predictions of the top-performing model. According to this analysis, the “cost of the vehicle”, “the borrower's age”, and the “ex-showroom price” have a significant impact on CIBIL scores and automobile loan assessments, and their effect is stronger under the high and mid-probability conditions than under the low-probability condition. Conversely, “marital status” and “current valuation” have less impact. The “number of years in the city” and “the number of years at the current residence” have similar effects. The “Segment Desc” variable has a greater effect, especially under the mid and high-probability conditions.
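SHAP attributions are based on Shapley values. As a minimal illustration of that underlying idea, the sketch below computes exact Shapley values for a tiny hypothetical scoring function by averaging each feature's marginal contribution over all feature orderings. The function `toy_model`, the feature names, and the values are assumptions for illustration only; this is not the study's model, and practical SHAP tooling relies on approximations because full enumeration is only feasible for a handful of features.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values via enumeration of all feature orderings."""
    features = list(instance)
    contributions = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for ordering in orderings:
        current = dict(baseline)  # start from the reference input
        previous_output = model(current)
        for name in ordering:
            current[name] = instance[name]  # switch this feature to its actual value
            output = model(current)
            contributions[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in contributions.items()}


def toy_model(x):
    """Hypothetical score with one interaction term (not the paper's model)."""
    return (
        0.5 * x["vehicle_cost"]
        + 0.3 * x["borrower_age"]
        + 0.2 * x["vehicle_cost"] * x["marital_status"]
    )


instance = {"vehicle_cost": 1.0, "borrower_age": 2.0, "marital_status": 1.0}
baseline = {"vehicle_cost": 0.0, "borrower_age": 0.0, "marital_status": 0.0}
phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

The attributions sum to the difference between the model output for the instance and for the baseline, and the interaction term is split evenly between the two features involved, which is why the low-weight `marital_status` feature still receives a small share.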

Figure 6.

The SHAP sensitivity analysis of the best-performing model.

Practical Implications and Implementation in Real-World Financial Settings

Beyond algorithmic performance, several other factors determine whether machine learning models can be applied successfully in financial environments. The suggested models, refined using metaheuristic optimizers, show high predictive accuracy for CIBIL score conditions in auto loan assessment. However, putting these models into practice requires attention to a number of important factors:

Integration with Current Systems: When evaluating credit risk, financial institutions frequently turn to legacy systems. To improve existing workflows without completely redesigning infrastructure, our models can be incorporated into decision support systems as modules or through API-based architectures.

Data Availability and Quality: Real-world application requires consistent access to high-quality, up-to-date financial data. Financial institutions must ensure robust data pipelines and implement preprocessing routines (e.g., handling missing values, standardizing formats) similar to those used in our study.

Model Interpretability: Regulatory frameworks such as Basel III and GDPR emphasize the need for explainable models. Although some of the suggested models are complex (for example, the gradient boosting ensembles), explainability tools like SHAP (SHapley Additive exPlanations) can help shed light on their predictions.

Regulatory and Ethical Compliance: The deployment of the model must guarantee ethical data use and conform to financial regulations. Features used in predictions should not lead to discriminatory outcomes or privacy breaches.

Scalability and Real-Time Prediction: This study's optimized models are computationally effective and scalable for real-time financial decision-making, such as dynamic credit scoring, automated loan approvals, and early warning systems for financial distress.

Cost-Benefit Analysis: Prior to implementation, organizations can conduct a cost-benefit analysis that compares the suggested system with conventional models in terms of predictive accuracy, false positive and false negative rates, and potential monetary savings or losses.
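One simple way to frame the cost-benefit comparison is to attach a monetary cost to each type of error and compare totals between two models. The sketch below assumes the positive class means "creditworthy", so a false positive is a risky applicant who gets approved and a false negative is a creditworthy applicant who gets rejected. All counts and unit costs are hypothetical placeholders, not figures from the study.

```python
def misclassification_cost(false_positives, false_negatives, cost_fp, cost_fn):
    """Total monetary cost of the two error types."""
    return false_positives * cost_fp + false_negatives * cost_fn


# Hypothetical unit costs (same currency unit for both).
COST_FP = 5000.0  # expected loss when a risky applicant is approved
COST_FN = 800.0   # margin lost when a creditworthy applicant is rejected

# Hypothetical error counts for a conventional model and a candidate model.
conventional_cost = misclassification_cost(120, 300, COST_FP, COST_FN)
candidate_cost = misclassification_cost(90, 260, COST_FP, COST_FN)
savings = conventional_cost - candidate_cost
print(conventional_cost, candidate_cost, savings)
```

Because the two unit costs usually differ, a model with slightly lower accuracy can still be cheaper overall if it makes fewer of the expensive errors.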

Theoretical Basis for Hybridization

The hybridization of machine learning classifiers with nature-inspired optimizers in this study is grounded in complementary strengths of the methods. Gaussian Process Classification (GPC) inherently provides probabilistic predictions and models uncertainty in the input space. When combined with Electric Eel Foraging Optimization (EEFO) or the Political Optimizer Algorithm (POA), the optimizer efficiently tunes hyperparameters such as length scale and number of restarts, enhancing GPC's ability to generalize across unseen data. Gradient Boosting Classification (GBC), on the other hand, is a strong ensemble method that builds additive models in a forward stage-wise fashion. Its performance is sensitive to hyperparameters like the number of estimators, learning rate, and maximum tree depth. By integrating GBC with EEFO and POA, the search for optimal hyperparameters is guided to regions that maximize predictive accuracy while preventing overfitting, leading to the superior performance observed in GBEE and GBPO.
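To illustrate why the length scale matters for a Gaussian process, the sketch below evaluates a radial basis function (RBF) kernel at two length scales. The choice of an RBF kernel is an assumption made for illustration, and the input points are hypothetical; only the value 392 is taken from Table 2 (GPEE).

```python
import math


def rbf_kernel(x, y, length_scale):
    """RBF kernel: exp(-||x - y||^2 / (2 * length_scale^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * length_scale ** 2))


# Two hypothetical feature vectors that are 100 units apart.
point_a = [0.0, 0.0]
point_b = [60.0, 80.0]

similarity_short = rbf_kernel(point_a, point_b, length_scale=1.0)
similarity_long = rbf_kernel(point_a, point_b, length_scale=392.0)
print(similarity_short, similarity_long)
```

With a short length scale the two points are treated as essentially unrelated, while a long length scale treats them as highly similar, so the tuned value controls how far information from one training instance extends to its neighbors.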

The observed differences in predictive performance between the hybrid models can thus be theoretically attributed to the interaction between the model's inherent capabilities and the optimizer's search strategy. For instance, GPC benefits more from precise tuning of length scale and restart parameters, which directly influence the model's probabilistic predictions, while GBC's improvements arise primarily from optimized ensemble parameters that control model complexity and learning rate. This rationale explains why GBEE achieved the highest training and testing accuracy, closely followed by GBPO, while standard GBC, although reliable, trailed slightly due to the absence of optimizer-guided hyperparameter refinement. By presenting this theoretical framework, the study highlights the practical and methodological reasoning behind the choice of each classifier-optimizer pair.

6. Conclusion

The primary objective of CIBIL in the auto loan process is to furnish lenders with crucial insights into borrowers’ creditworthiness. By assessing applicants’ credit histories and financial behaviors, lenders gauge the risk associated with lending to them, and CIBIL contributes to this assessment by providing credit reports and ratings. A high CIBIL score indicates good credit health and timely loan repayments. In this study, the machine learning techniques GPC and GBC, enhanced with the optimization algorithms EEFO and POA, were used to predict CIBIL scores. Integrating these optimization algorithms with the machine learning techniques improved prediction accuracy, which can strengthen the reliability of the auto loan approval process. The GBEE hybrid model performed best in both the training and testing phases. In training, it had an Ac of 0.965, Pr of 0.966, Re of 0.965, and F1 of 0.964. In testing, it had an Ac of 0.903, Pr of 0.910, Re of 0.903, and F1 of 0.888. Under the high-probability condition, GBEE correctly predicted 12,501 out of 12,517 measured instances. Under the mid-probability condition, it correctly predicted 1,742 out of 2,524, and under the low-probability condition, 40 out of 47.

By contrast, GPC was the weakest model on these metrics. In training, GPC had an Ac of 0.893, Re of 0.893, Pr of 0.885, and F1 of 0.885. In testing, it had an Ac of 0.835, Re of 0.835, Pr of 0.814, and F1 of 0.820. Across the probability conditions, GPC correctly predicted 40 out of 47 measured instances under the low-probability condition, 1,174 out of 2,524 under the mid-probability condition, and 11,999 out of 12,517 under the high-probability condition.

It is equally important to recognize the limitations of ML, however helpful it has been as a tool. Given the complexity of human behavior and financial factors, ML predictions of CIBIL ratings and auto loan outcomes have inherent limits. Because they rely heavily on historical data, these models may fail to capture the subtleties of a person's credibility and financial situation. Moreover, repayment capability forecasts can be invalidated by unforeseen events or changes in the economy. Fairness in loan decisions and data privacy are 2 further ethical factors that must be taken into account. Therefore, although ML provides insightful information, the intricacy of financial behavior and socioeconomic variables ultimately limits its predictive power.

Footnotes

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

