Clinical decision system for chronic kidney disease staging using machine learning

Abstract

Background:

Chronic Kidney Disease (CKD) is a prevalent health condition that requires personalized treatment planning at each of its five stages. Machine Learning (ML) and Generative AI have shown promise in predicting CKD progression based on patient data. However, existing prediction models are limited in generalizability, interpretability, and resource requirements.

Objective:

This study aims to develop a clinical support system using ML models to classify CKD stages accurately. The research focuses on feature selection strategies and model performance evaluation to enhance prediction accuracy and guide personalized treatment planning for CKD patients.

Methods:

The study utilizes ML algorithms, including Gradient Boosting, XGBoost, CatBoost, and GAN AML, to categorize CKD stages. Various feature selection techniques such as Recursive Feature Elimination, chi-square test, and SHAP are employed to identify relevant features for improved prediction accuracy. The models are evaluated based on precision, recall, F1-score, accuracy, and AUC-ROC metrics.

Conclusions:

The findings demonstrate the effectiveness of CatBoost and GAN AML in accurately classifying CKD stages, highlighting the importance of expert knowledge in choosing feature selection strategies to enhance ML model performance. Future research directions include validating the models on diverse datasets, integrating them with clinical practice, and improving interpretability and explainability in CKD prediction models.

Keywords

Machine learning, XGBoost, CatBoost, SHAP, feature selection

1. Introduction

Chronic Kidney Disease (CKD) is a condition in which the kidneys gradually lose function over time. It affects millions of people globally. The leading causes are high blood pressure, diabetes, and glomerulonephritis; less common but equally severe causes include autoimmune illnesses and long-term exposure to particular drugs or toxins. Lifestyle habits also contribute to the increase in CKD. The prevalence of the disease is rising worldwide as the population ages and as predisposing conditions become more common. CKD has long-lasting implications for those who suffer from it. Complications such as cardiovascular disease, anaemia, bone diseases, and metabolic problems significantly impact the patient's health and quality of life.^1,2 CKD is also costly: medication, hospitalization, and productivity lost to ongoing therapy can present a considerable financial burden for healthcare systems. Public health campaigns that raise awareness, promote screening programmes, and drive research are fundamental to fighting CKD.

There are different stages of chronic renal disease, each with varying levels of kidney function and symptoms. Stages 3a and 3b show reduced renal function with symptoms like fluid retention, tiredness, and itching.^3,4 Stage 4 is characterized by severe nephron capacity loss, leading to anaemia and bone damage. In Stage 5, kidney function is drastically reduced, often requiring dialysis or a kidney transplant. Bones are crucial in overall health, protecting vital organs and mobility. Fractures can have severe consequences, highlighting the need for accurate classification systems. The proposed Systematized Attention Gate UNet (SAG-UNet)⁵ achieved a high accuracy rate in classifying bone fractures compared to existing models. To address data maintenance challenges in healthcare, Deep Federated Collaborative Learning (DFCL)⁶ has been suggested for improved diagnoses and analysis rates. In skin disease identification, an IoT-based monitoring system has shown significant improvements in detecting moles, skin tags, and warts, contributing to early diagnosis and treatment.⁷

The paper's main contribution is discussed as follows: This study aims to develop a clinical support system for accurately classifying CKD stages using machine learning (ML) models. It emphasizes feature selection strategies and model performance evaluation to improve prediction accuracy and guide personalized treatment planning for CKD patients. The research utilizes several ML algorithms to categorize CKD stages, including Gradient Boosting, XGBoost, CatBoost, and GAN AML. Various feature selection techniques, such as Recursive Feature Elimination (RFE), chi-square test, and SHAP, are employed to identify relevant features for enhanced prediction accuracy. The models are evaluated based on precision, recall, F1-score, accuracy, and AUC-ROC metrics. The findings demonstrate that CatBoost and GAN AML are particularly effective in accurately classifying CKD stages, underscoring the importance of expert knowledge in selecting feature selection strategies to boost ML model performance.

The paper is organized as follows: the Introduction (Section 1) provides an overview of CKD; the Related Works (Section 2) summarize relevant research; the Proposed Methodology (Section 3) describes the approach; the Experimental Results (Section 4) present the findings; and the Conclusion (Section 5) summarizes the main points and makes suggestions for future work.

2. Related works

Deep learning and machine learning models predict end-stage renal disease in CKD patients.⁸ Eight models and four correction methods have been used. The deep learning model, with an AUC-ROC of 0.8991, identified positive associations with CKD progression. The research supports using attribution algorithms in deep learning to identify CKD progression's comprehensible characteristics, enhancing clinical management and treatment. An AHDCNN is proposed for early CKD diagnosis using CT images and hidden layers to identify unusual patterns.⁹ AI algorithms are investigated for CKD diagnosis, employing extreme gradient boosting (XGBoost) as a baseline model.¹⁰ AI is used to analyze patients’ electronic health information and forecast the probability that they will have severe CKD.¹¹ The above-average Matthews correlation coefficient (MCC) values of +0.499 in both cases and +0.469 in the third case were reached. Age, estimated glomerular filtration rate (eGFR), creatinine, hypertension, smoking, and diabetes were among the most significant clinical characteristics subsequently identified by a feature ranking analysis.

A new hybrid deep-learning network model (HDLNet) is created to find and identify CKD early on.¹² The Sooty Tern Optimization Algorithm (STOA) can identify CKD and non-CKD renal illnesses. The suggested method surpasses the present state-of-the-art method on every metric: specificity, accuracy, sensitivity, MCC, PPV, FPR, and FNR. A modified Deep Belief Network model is proposed to develop an intelligent renal illness prediction and classification model.¹³ Compared to other models, the prediction of the occurrence of CKD yielded a sensitivity level of 87.5% and an accuracy level of 98.5%.

The proposed optimization framework balances test accuracy and explainability to build a CKD prediction model.¹⁴ First, the model employs an extreme gradient boosting classifier based on hypertension, specific gravity, and haemoglobin to diagnose CKD as quickly as possible. Second, the model reached 99.2% and 97.5% accuracy using 5-fold cross-validation and newly-seen data, respectively. Third, the prediction was affected mainly by haemoglobin, secondly by specific gravity, and lastly by hypertension. The study¹⁵ aimed to evaluate machine learning algorithms that require the fewest features to predict CKD. The researchers used logistic regression, support vector machine, random forest, and boosting; the Gradient Boosting classifier achieved an accuracy of 99.1%. Haemoglobin was the most significant contributor to detecting CKD, while albumin contributed the least.

A new model was introduced that anticipates chronic renal disease, cardiac heart disease, hypertension, and many other diseases.¹⁶ A PSWN is a patient symptom-weighted network where the health condition correlation appears in similar patients. This method enhances healthcare development and disease identification by smart surveillance and forecasting accurately. It assists healthcare providers in classifying risk beforehand and advancing chronic disease healthcare with increased performance scores.

Machine-learning designs that establish clinical test characteristics to facilitate early and precise CKD diagnosis using specific cores’ pathological groups have been discussed.¹⁷ The random forest classifier performed best on the optimized datasets and identified prominent attributes for diagnosing CKD. Thus, this method could facilitate improved CKD screening and treatment. Feature selection methods have been used alongside five unsupervised algorithms: K-Means Clustering, DB-Scan, I-Forest, and Autoencoder.¹⁸ According to the results of a computer-aided design system, clinical data on CKD and non-CKD was classified at a high accuracy of 99%. A method for diagnosing CKD¹⁹ was presented using a dataset from the University of California, Irvine. Models were built using six machine learning methods, and random forest showed the best diagnosis accuracy of 99.75%. After ten simulations, an integrated model combined logistic regression and gradient boosting with a support vector machine, random forest, or Perceptron to reach an average accuracy of 99.83%.

An ensemble learning-based technique for detecting CKD on UCI machine-learning datasets was presented.²⁰ The proposed technique had the highest averages: 99.75% accuracy, 99.40% precision, 99.41% recall, 99.61% F-measure, and 99.57% AUC-ROC. The method also increased the average detection rate by 5.64%, 1%, 2.04%, 8.63%, 1.99%, 2.84%, 2.42%, and 4.76% compared to the other approaches. An efficient e-healthcare risk prediction system powered by a heterogeneous network centred around licensed medical practitioners has also been proposed.²¹ It enhances monogenic score, density accuracy, execution time, and prediction accuracy by 73.98%.

The COVID-19 pandemic has resulted in millions of deaths globally. A new Deep Neural Network (IDA-DNN) has been developed to detect various diseases in COVID-19 patients.^22,23 A hybrid feature selection approach with different Machine Learning algorithms was developed to build a diagnosis system for chronic renal disease.^24,25 The evaluation of 14 classifiers revealed that CKD can be detected by the Extra Trees classifier in 98% of cases under a 2% actual negative rate, with no data leakage. The Extra Trees classifier was tested on the CKD dataset under the hybrid wrapper feature selection approach (Chi2-MI), where the accuracy score increased.

Machine learning and feature optimization have also been applied to build a high-accuracy automatic CKD detection model from patient clinical variables. Among the feature selection methods compared, linear discriminant analysis (LDA) was the best feature optimizer, with 99.5% accuracy (an overall error rate of 0.5%). Existing approaches nevertheless have limitations. Deep learning models are complex and difficult for non-experts to interpret. Imbalanced datasets and biases can skew performance in healthcare applications. Some models require excessive computational resources, making them unsuitable for resource-constrained healthcare settings. Limited collaboration and data sharing can lead to weak models; researchers may work on private datasets, which limits practicality. Without careful consideration, models may overpredict non-severe CKD, missing opportunities for early intervention and personalized treatment.

3. Proposed methodology

Machine learning algorithms create a predictive model to estimate CKD stages based on data analysis of vital signs, health behaviour, lab results, and demographic information. The model includes data preprocessing, critical feature engineering, hyperparameter-optimized algorithms training, and performance analysis. Key features include creatinine levels, eGFR, urine protein, and blood pressure. The model is then integrated into a clinical support system for healthcare providers. Over time, the model's accuracy in predicting CKD stages can be improved by monitoring its operation in real-world conditions, receiving feedback, and periodically retraining using the latest data and advances in machine learning and CKD studies. This approach helps identify individuals at risk or in the early stages of CKD.

3.1 Data collection

Developing a clinical support system for identifying CKD stages requires several types of data: demographics, vital signs, laboratory test results, and health behaviour. Demographic data, here gender as recorded through self-reporting and medical records, make it possible to interpret uneven prevalence and outcomes of CKD. The data were collected from the University of California, Irvine. Alongside laboratory results and gender, the dataset records patient history and behaviour, including a family history of heart attack and CKD, diabetes, and smoking. In total, 23 characteristics of 491 patients were collected.

The effectiveness of the clinical support system depends on collecting data across the whole population spectrum, with particular emphasis on CKD patients. This approach makes it possible to build a robust predictive model that can distinguish CKD stages across populations with varying degrees of CKD severity. Healthcare professionals train the CKD support system on a dataset that includes demographics, vital signs, laboratory results, and patient behaviour data, which allows them to make fast and comprehensive decisions about CKD diagnosis and effective treatments. Table 1 lists the collected variables.

Table 1.
Data collection.

Category	Variable Name	Type	Description
Demographic Information	Gender	Nominal (binary: 0/1)	Indicates the gender of the individuals in the dataset.
Vital Signs	Age Baseline	Ordinal (continuous)	Age of the individuals at baseline – the beginning of the study or observation period.
	sBP Baseline	Ordinal (continuous)	Systolic blood pressure at baseline, measured in millimetres of mercury (mmHg).
	dBP Baseline	Ordinal (continuous)	Diastolic blood pressure at baseline, measured in millimetres of mercury (mmHg).
	BMI Baseline	Ordinal (continuous)	Body Mass Index (BMI) at baseline measures body fat based on height and weight.
Health Behaviour	History Smoking	Nominal (binary: 0/1)	Indicates whether the individual has a smoking history.
Lab Tests	History Diabetes	Nominal (binary: 0/1)	Specifies if the individual has a history of diabetes.
	History CHD	Nominal (binary: 0/1)	Specifies if the individual has a history of coronary heart disease.
	History Vascular	Nominal (binary: 0/1)	Provides information about the individual's vascular history.
	History HTN	Nominal (binary: 0/1)	Indicates whether a hypertension history exists for the individual.
	History DLD	Nominal (binary: 0/1)	Provides information about the individual's dyslipidemia history.
	History Obesity	Nominal (binary: 0/1)	Specifies if the individual has a history of obesity.
	DLDmeds	Nominal (binary: 0/1)	Determines whether the individual has taken drugs for dyslipidemia.
	DMmeds	Nominal (binary: 0/1)	Determines whether the individual has taken drugs for diabetes.
	HTNmeds	Nominal (binary: 0/1)	Determines whether the individual has taken drugs for hypertension (HTN).
	ACEIARB	Nominal (binary: 0/1)	Determines whether the individual has taken ACE inhibitors or ARBs.
	CKD	Nominal (binary: 0/1)	Represents the occurrence of chronic kidney disease (CKD).
	Time To Event Months	Ordinal (continuous)	Represents the measurement of months relative to the occurrence of events.
	TIME_YEAR	Ordinal (continuous)	Represents the measurement of years as a temporal reference in the study.
	eGFR Baseline	Ordinal (continuous)	Represents the level of eGFR at baseline, indicating kidney function.
	Cholesterol Baseline	Ordinal (continuous)	Represents the initial cholesterol level.
	Creatinine Baseline	Ordinal (continuous)	Represents the starting level of creatinine for kidney evaluation.

3.2 Data preprocessing

Data preprocessing was carried out to ensure that the data were of high quality and suitable for analysis when building the clinical decision support system for CKD. Each step is described below, with equations where applicable. The steps are as follows:

3.2.1. Missing values handling

Missing values are frequent in Electronic Health Record (EHR) datasets. They arise for several reasons, such as mistakes during data collection, equipment malfunction, and measurements that were never taken. To handle missing values in the EHR data, K-Nearest Neighbours (KNN) imputation was applied instead of mean imputation. KNN imputation identifies the k nearest neighbours of the data point with the missing value, using a distance metric such as Euclidean distance. The missing value is then replaced by the average or weighted average of the corresponding feature values from these nearest neighbours.

Algorithm: K-Nearest Neighbours (KNN) Imputation

Input:

Dataset D comprises missing values

The number of closest neighbours (k).

Initialization:

Determine the attributes connected to every missing value.

Imputation:

For every absent value x_i in dataset D:

Determine the distance between x_i and every other data point x_j in the dataset using the selected distance metric. In most cases, the Euclidean distance is used:

\mathrm{Distance}(x_{i}, x_{j}) = \sqrt{\sum_{f = 1}^{F} (x_{if} - x_{jf})^{2}}

(1)

Where x_if and x_jf represent the values of feature f in data points x_i and x_j, respectively.

F denotes the total number of features.

Using the calculated distances, choose the k closest neighbours.

Take the average (or weighted average) of the feature values from the chosen closest neighbours to determine the imputed value for x_i. Typically, this is expressed as:

\hat{x}_{i} = \frac{1}{k} \sum_{j = 1}^{k} x_{ij}

(2)

Where:

\hat{x}_{i} denotes the imputed value for the missing data point.

x_ij is the value of the corresponding feature in the j-th nearest neighbour of the data point x_i, and k is the number of neighbours.

Output:

Revised dataset D with KNN Imputation filled in for missing values.
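The procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the function name `knn_impute`, the toy rows, and the choice to compute distances only over features observed in both rows are assumptions introduced here. In practice a library routine such as scikit-learn's `KNNImputer` would likely be used.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of the k nearest neighbours (Eq. 2).

    Distances follow Eq. (1) but are computed only over features observed
    in both rows. That handling of partially observed rows is an assumption
    of this sketch; the text does not specify it.
    """
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for f, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                # A neighbour must be a different row with feature f observed.
                if j == i or other[f] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                candidates.append((dist, other[f]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][f] = sum(nearest) / len(nearest)
    return imputed


# Hypothetical toy data: the last row is missing its third feature.
toy_rows = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [9.0, 9.0, 50.0],
    [1.0, 2.0, None],
]
filled = knn_impute(toy_rows, k=2)
```

With k = 2, the two closest rows to the incomplete one are the first two, so the missing entry is replaced by the mean of their third feature. Because Euclidean distance is scale-dependent, features measured in large units can dominate the neighbour search unless they are put on a comparable scale first.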

3.2.2. Data normalization

In data preprocessing, Z-score normalization, or standardization, is a widely used method in which numerical features are rescaled to have a mean of 0 and a standard deviation of 1. This normalization method works well when the feature distribution is close to normal (Gaussian). To normalize the data, the mean of the feature is subtracted from each data point, and the resulting value is divided by the standard deviation. The main principle of Z-score normalization is to standardize the data so that all features have the same magnitude and can be compared on a common scale. The formula for Z-score normalization is as follows:

Z = \frac{(X - μ)}{σ}

(3)

Here, Z is a standardized value for the data point X, which is the actual value of the feature, µ indicates the feature mean, and σ indicates its standard deviation.
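Equation (3) can be illustrated with a short Python sketch. The function name `z_score`, the sample readings, and the use of the population standard deviation are assumptions made for this example; the text does not state which variant of the standard deviation was used.

```python
import statistics


def z_score(values):
    """Standardize values with Eq. (3): Z = (X - mu) / sigma."""
    mu = statistics.fmean(values)
    # Population standard deviation; an assumption of this sketch.
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("a constant feature cannot be standardized")
    return [(x - mu) / sigma for x in values]


# Hypothetical readings for one numerical feature (mean 5, std 2).
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
standardized = z_score(readings)
```

After the transformation the values have mean 0 and standard deviation 1, which is the property the normalization step relies on.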

Z-score normalization centres each feature at zero and scales it to unit variance by subtracting the mean and dividing by the standard deviation. Because the mean and standard deviation are themselves influenced by extreme values, the method is not robust to outliers on its own; outliers are therefore handled separately (Section 3.2.4). The method remains relevant for medical datasets with high variability, such as those used to detect a patient's CKD stage. It is simple to implement and interpret, involving only a subtraction and a division per value, and it is widely used across machine learning and analytical algorithms. In this task, it places the numerical features on a common scale for modelling and analysis. Z-score normalization is therefore an appropriate option for normalizing numerical features in the CKD stage detection project: it is simple and widely used.

3.2.3. Encoding categorical values

One-hot encoding is a critical step in preprocessing data for the clinical support system that identifies the CKD stage. It converts categorical variables into a form that machine learning algorithms can use. One-hot encoding represents each category in the data as a binary feature, so that each categorical variable becomes a binary vector. The approach does not imply any ordering or rank one category above another. For each category, the encoding returns 1 if the observation belongs to the category and 0 otherwise. The formula that describes one-hot encoding is:

\mathrm{Category} = \begin{cases} 1, & \text{if the finding fits the description} \\ 0, & \text{otherwise} \end{cases}

(4)

Several categorical features have been selected for the CKD stage detection research, such as those presented in Table 2. Each of these must be transformed into binary features before a machine learning algorithm can use it. One-hot encoding makes this possible by representing each category as its own binary feature. For example, “Smoking Status” yields distinct binary features for smoking, never smoking, and unknown/missing. The same logic creates binary features for “Race”: black, white, and unknown. Overall, one-hot encoding is a vital preprocessing step that allows the CKD stage detection algorithm to use categorical information effectively.
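As a minimal sketch of Equation (4), the smoking-status example can be encoded as follows. The function name `one_hot`, the category labels, and the sample observations are illustrative assumptions, not values taken from the dataset.

```python
def one_hot(values, categories):
    """Return one binary vector per observation (Eq. 4).

    Each position holds 1 if the observation equals that category
    and 0 otherwise.
    """
    return [[1 if value == category else 0 for category in categories]
            for value in values]


# Hypothetical category labels and observations.
smoking_categories = ["smoking", "never smoking", "unknown"]
observations = ["smoking", "never smoking", "unknown", "smoking"]
encoded = one_hot(observations, smoking_categories)
# Each row contains exactly one 1, e.g. "smoking" maps to [1, 0, 0].
```

Passing the category list explicitly keeps the column order fixed, so the same encoding can be reused on new patient records.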

Table 2.

Feature selection using different feature selection methods.

Feature Selection Technique	Model	Given Input Features	Selected Output Features	Number of Features
No Feature Selection (NFS)	Gradient Boost	All 23 Features	All Features	
	XGBoost		All Features	
	CatBoost		All Features	
	GAN-AML		All Features	
Recursive Feature Elimination (RFE)	Gradient Boost	All 23 Features	Age, HistoryDiabetes, HistoryVascular, CholesterolBaseline, CreatinineBaseline, eGFRBaseline, sBPBaseline, dBPBaseline, BMIBaseline, TimeToEventMonths	10
	XGBoost		Age, HistoryDiabetes, HistoryCHD, HistoryVascular, DMmeds, ACEIARB, CholesterolBaseline, eGFRBaseline, dBPBaseline, TimeToEventMonths	10
	CatBoost		Age, HistoryDiabetes, ACEIARB, CholesterolBaseline, CreatinineBaseline, eGFRBaseline, dBPBaseline, BMIBaseline, TimeToEventMonths, TIME_YEAR	10
	GAN-AML		Age, DMmeds, ACEIARB, CholesterolBaseline, CreatinineBaseline, eGFRBaseline, dBPBaseline, BMIBaseline, TimeToEventMonths, TIME_YEAR	10
Chi-square Test	Gradient Boost	All 23 Features	HistoryDiabetes, HistoryCHD, HistorySmoking, HistoryHTN, DLDmeds, DMmeds, HTNmeds, ACEIARB, TimeToEventMonths, TIME_YEAR	10
	XGBoost		HistoryDiabetes, HistoryCHD, HistorySmoking, HistoryHTN, DLDmeds, DMmeds, HTNmeds, ACEIARB, TimeToEventMonths, TIME_YEAR	10
	CatBoost		HistoryDiabetes, HistoryCHD, HistorySmoking, HistoryHTN, DLDmeds, DMmeds, HTNmeds, ACEIARB, TimeToEventMonths, TIME_YEAR	10
	GAN-AML		HistoryDiabetes, HistoryCHD, HistoryHTN, HistoryDLD, DLDmeds, DMmeds, HTNmeds, ACEIARB, TimeToEventMonths, TIME_YEAR	10
Information Gain	Gradient Boost	All 23 Features	Age, CholesterolBaseline, CreatinineBaseline, eGFRBaseline, TimeToEventMonths	5
	XGBoost		HistoryDiabetes, HistoryCHD, HistoryVascular, DMmeds, ACEIARB, CholesterolBaseline, eGFRBaseline, TimeToEventMonths	8
	CatBoost		EGFRBaseline, TimeToEventMonths, TIME_YEAR, HistoryDiabetes, CreatinineBaseline, Age, ACEIARB, HTNmeds, HistoryCHD, HistoryDLD	10
	GAN-AML		TimeToEventMonths, eGFRBaseline, TIME_YEAR, AgeBaseline, CreatinineBaseline, DMmeds, HistoryVascular, HistoryDiabetes, ACEIARB, sBPBaseline	10
SHAP	Gradient Boost	All 23 Features	TimeToEventMonths, eGFRBaseline, Age, HistoryDiabetes, dBPBaseline, CholesterolBaseline, BMIBaseline, CreatinineBaseline, sBPBaseline, DMmeds	10
	XGBoost		TimeToEventMonths, eGFRBaseline, BMIBaseline, dBPBaseline, CholesterolBaseline, Age, HistoryDiabetes, ACEIARB, CreatinineBaseline, sBPBaseline	10
	CatBoost		EGFRBaseline, TimeToEventMonths, TIME_YEAR, Age, dBPBaseline, HistoryDiabetes, CreatinineBaseline, ACEIARB, DMmeds, CholesterolBaseline	10
	GAN-AML		TimeToEventMonths, eGFRBaseline, AgeBaseline, TIME_YEAR, dBPBaseline, DMmeds, CreatinineBaseline, ACEIARB, BMIBaseline, HistoryDiabetes	10

3.2.4. Outlier detection and removal

In constructing a clinical support system for detecting the CKD stage, data preprocessing can be strengthened by detecting outliers and reducing their influence with a Box-Cox transformation. After the earlier preprocessing steps (missing value handling, numerical feature normalization, and categorical variable encoding), statistical methods are used to determine whether outliers are present. These methods identify data points that differ markedly from the rest of the set and may indicate anomalies. When outliers are detected, the Box-Cox transformation is applied to the affected numerical features. The transformation, which brings the data distribution closer to a normal distribution, is defined as:

y^{(\lambda)} = \begin{cases} \frac{y^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0 \\ \log(y), & \text{if } \lambda = 0 \end{cases}

(5)

Applying this transformation makes the data distribution more similar to a normal distribution and reduces the influence of outliers. As a result, the models and analyses in the CKD stage detection work are more reliable, since the data are more consistent and their distribution is closer to a regular shape.
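Equation (5) translates directly into code. The sketch below applies the transformation for a given λ; the function name `box_cox` and the sample values are assumptions for illustration. Two points are worth stating explicitly: the transformation is defined only for strictly positive values, and in practice λ is usually estimated from the data by maximum likelihood (for example, `scipy.stats.boxcox` does this when no λ is supplied) rather than fixed by hand as here.

```python
import math


def box_cox(values, lam):
    """Apply the Box-Cox transformation of Eq. (5) for a fixed lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


# Hypothetical right-skewed feature with one extreme value.
skewed = [1.0, 4.0, 9.0, 100.0]
transformed = box_cox(skewed, 0.5)   # square-root-like compression
log_version = box_cox(skewed, 0)     # lambda = 0 reduces to the logarithm
```

With λ = 0.5, the extreme value 100 is mapped to 18 while 9 is mapped to 4, so the gap between the outlier and the rest of the data shrinks considerably.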

3.3. Feature selection and engineering

Feature selection is essential in developing a clinical support system to detect CKD. This phase aims to narrow down the current dataset and select the most relevant predictors. Proper feature selection ensures that the model focuses on the most critical variables, which can enhance performance and interpretability. Several feature selection methods are relevant to the current project: RFE, the chi-square test, information gain, and SHapley Additive exPlanations (SHAP). The features selected by each method are summarized in Table 2.

3.3.1. No feature selection (NFS)

This technique uses all the features in the dataset without applying any feature selection process. No algorithm or method is used to determine the most significant or relevant features, and no feature is given priority during training or prediction. The main advantage is that all information in the dataset, including all potentially relevant information, is retained, and no feature selection step is needed before feeding the data into the algorithm. However, this approach may increase computational complexity, reduce interpretability, and decrease prediction accuracy if the dataset contains unnecessary or redundant features.

3.3.2. Recursive feature elimination (RFE)

The goal of RFE here is to extract the best features from a long list of potential predictors, such as history of diabetes, demographic information, and lab results. The identified features are critical for developing CKD stage classification models that are both effective and interpretable. The technique involves several steps, each adapted to the dataset and the selected classification model. The first step is initialization: a gradient boosting-based estimator (for instance, Gradient Boosting Classifier, XGB Classifier, or CatBoost Classifier) is selected, along with the complete feature set and the target number of features. The second step combines model fitting with feature ranking: the gradient boosting model is trained on the current set of features, the importance of each feature is assessed using metrics such as feature importance scores, and the least important features are eliminated. These operations repeat until the target number of features remains. Lastly, the final classification model is developed with the subset of features selected through RFE.

Algorithm: Recursive Feature Elimination:

Input:

X: A matrix that represents samples as rows and features as columns.

y: The target vector that contains the labels for each sample in X, indicating the stage of CKD.

k: The Hyperparameter that determines the number of features to be selected.

Estimator: An estimator is a classification model used for feature selection. Examples include gradient boosting models like XGBoost, CatBoost, or classic gradient boosting.

Output:

X_selected: Subset of features chosen via Recursive Feature Elimination (RFE).

Procedure:

Initialization

X_selected= X

Selected_features = all_features

Score = 0

Feature Ranking

Apply the estimator to the specified dataset (X_selected) and calculate the feature importance scores.

Denote the function importance(X_selected,y) that computes the significance of X_selected features to predict y.

Assign a ranking to the features according to their respective significance scores.

Feature Elimination

Eliminate the least significant attribute(s) from the chosen X_selected and modify selected_features appropriately.

Stopping Criteria

If the number of features remaining in selected_features equals the target number k, terminate the process.

Otherwise, repeat the Feature Ranking and Feature Elimination steps.

Final Model Training

Build a final classification model using the chosen subset of characteristics (X_selected) and the target vector (y).
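The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the importance function is a stand-in (absolute Pearson correlation with the numerically encoded stage) for the feature importance scores of a gradient boosting estimator, and the names `rfe`, `importance`, `step`, and the toy records are assumptions introduced here.

```python
from statistics import mean

def abs_correlation(xs, ys):
    """Absolute Pearson correlation; returns 0.0 for a constant column."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return abs(cov) / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0

def importance(X, y, features):
    """Stand-in for importance(X_selected, y); a real run would use the estimator's scores."""
    return {f: abs_correlation([row[f] for row in X], y) for f in features}

def rfe(X, y, k, importance_fn=importance, step=1):
    """Drop the least important feature(s) until exactly k remain."""
    selected = list(X[0].keys())                     # Initialization: all features
    while len(selected) > k:                         # Stopping criterion
        scores = importance_fn(X, y, selected)       # Feature ranking
        ranked = sorted(selected, key=lambda f: scores[f])
        n_drop = min(step, len(selected) - k)
        dropped = set(ranked[:n_drop])               # Feature elimination
        selected = [f for f in selected if f not in dropped]
    return selected

# Toy records (hypothetical values), with CKD stage encoded as 1..5.
X = [
    {"creatinine": 0.8, "egfr": 95, "noise": 3},
    {"creatinine": 1.2, "egfr": 70, "noise": 1},
    {"creatinine": 1.9, "egfr": 45, "noise": 4},
    {"creatinine": 3.1, "egfr": 22, "noise": 1},
    {"creatinine": 5.6, "egfr": 10, "noise": 3},
]
y = [1, 2, 3, 4, 5]
selected_features = rfe(X, y, k=2)
```

The final model would then be trained only on the columns named in `selected_features`. Passing a different `importance_fn` is the point at which a tree-based estimator's importance scores would be plugged in.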

3.3.3. Chi-Square test

The Chi-square test is a statistical technique used in many research fields, including medical studies, to evaluate the relationship between categorical factors. When the Chi-square test is applied to CKD stage prediction, categorical factors related to demographics, medical history, and laboratory findings are considered as predictors. At least six such variables can be mentioned: gender, age group, the presence or absence of diabetes, the presence or absence of hypertension, smoking status, and intake of prescribed drugs. The target variable is the stage of CKD, which is identified with the help of eGFR and several other findings, such as complaints and symptoms, physical examination information, and additional notes in medical histories. The Chi-square test is used to determine whether the distribution of CKD stages differs across the levels of each categorical predictor.

The Chi-square statistic measures the discrepancy between the observed and expected frequencies of CKD stages across all levels of a predictor. A high Chi-square value indicates a strong relationship between the predictor and the stages of CKD. In such cases, the predictor can help detect the disease and is valuable for present and future studies. After the Chi-square test is performed, the p-value must be interpreted. The p-value is the probability of obtaining a Chi-square statistic at least as large as the observed one if the predictor and the CKD stage were independent. The relationship is unlikely to be accidental if the p-value is low, usually less than 0.05. If the p-value exceeds 0.05, there is not enough evidence to reject the null hypothesis that the two variables, the predictor and the stage of CKD, are independent.

Algorithm: Chi-Square Test for CKD Stage Detection

Input:

X: Matrix of Predictor variables.

y: Labels vector for CKD stage.

α: Level of significance.

Output:

The chi-square statistic(χ2)

p-value (p)

Procedure:

Determine the Expected and Observed Frequencies:

Create a contingency table that compares each categorical predictor variable with the CKD stages.

Determine the expected (E) and observed (O) frequencies for every cell in the contingency table.

Chi-square Statistic computation:

Apply the following formula to get the Chi-square statistic (χ2):

\chi^{2} = \sum \frac{{(O - E)}^{2}}{E}

Degrees of Freedom computation:

The degrees of freedom (df) can be calculated as (r−1) × (c−1), where r and c represent the number of rows and columns, respectively, in the contingency table.

Calculate the p-value:

The p-value (p) can be determined by employing the Chi-square distribution with degrees of freedom (df).

p\text{-value} = P (\chi^{2} > \chi_{\mathrm{observed}}^{2})

Decision:

If p-value < α, reject the null hypothesis and conclude a substantial connection between categorical predictor and CKD stages.

For p-values ≥ α, the null hypothesis is not rejected, indicating no significant correlation between categorical predictor variables and CKD stages.
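The algorithm above can be illustrated with a short, dependency-free Python sketch. It is only an illustration under stated assumptions: the function names `chi_square_test` and `chi2_sf` are hypothetical, the p-value is obtained from a series expansion of the regularized incomplete gamma function rather than from a statistics library, and the toy data are invented.

```python
import math
from collections import Counter

def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x), via a series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi_square_test(feature, stages):
    """Return (chi2, df, p) for one categorical predictor and the CKD stage labels."""
    n = len(feature)
    row_totals = Counter(feature)                 # levels of the predictor
    col_totals = Counter(stages)                  # CKD stages
    observed = Counter(zip(feature, stages))      # contingency table cells (O)
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n      # E under independence
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df == 0:                                   # a single row or column: nothing to test
        return chi2, 0, 1.0
    return chi2, df, chi2_sf(chi2, df)

# Toy example: diabetes status against two stages (hypothetical data).
diabetes = ["yes"] * 10 + ["no"] * 10
stage = ["stage 4"] * 10 + ["stage 1"] * 10
chi2, df, p = chi_square_test(diabetes, stage)
alpha = 0.05
reject_independence = p < alpha
```

In this toy table every diabetic patient is in stage 4 and every non-diabetic patient is in stage 1, so the statistic is large and the null hypothesis of independence is rejected at α = 0.05.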

3.3.4. Information gain

Information Gain (IG) is a machine learning approach for feature selection that aims to identify the subset of features most relevant for accurately assigning patients to their specific CKD stages. In the case of IG, the features describe a patient's attributes, and the result of the selection is a subset of features used to classify the CKD stages.

Selecting features with Information Gain for CKD stage classification usually includes the following steps:

Entropy Calculation: Entropy is a measure that conveys the amount of uncertainty or unpredictability associated with a given amount of information. It is calculated using the following formula:

\mathrm{Entropy} (S) = - \sum_{i = 1}^{n} p_{i} \log_{2} (p_{i})

Where p_i is the probability associated with each class i.

Calculating Conditional Entropy: Considering the target variable, the conditional entropy of every feature is inferred. It calculates the target variable's average entropy by splitting the dataset according to every feature.

Computation of Information Gain: The Information Gain of any feature is calculated by subtracting the feature's conditional entropy from the target variable's entropy.

\mathrm{IG} (X) = \mathrm{Entropy} (S) - \mathrm{ConditionalEntropy} (S | X)

Where X stands for every feature.

Feature Ranking: Each feature is sorted according to its Information Gain. In the feature subset, features with higher IG values are considered more informative and have a higher priority in being included.

Feature Selection: The final feature subset to identify the CKD stage should be selected by choosing the top k features with the highest IGs.

The subset of features selected by high Information Gain features is expected to have the most relevant data for properly predicting CKD stages. By prioritizing the most distinct features, the dimensionality of the dataset will decrease, and the classification models’ efficiency will increase. Thus, selecting information gain features is necessary to improve the accuracy and interpretability of models that detect the CKD stage.
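The steps above translate directly into code. The following is a minimal sketch for categorical features, assuming that continuous measurements (for example, eGFR) have already been discretized; the function names and the toy columns are illustrative and do not come from the study.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(feature, labels):
    """Average entropy of the labels after splitting the samples by feature value."""
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def information_gain(feature, labels):
    return entropy(labels) - conditional_entropy(feature, labels)

def top_k_features(columns, labels, k):
    """Rank features by Information Gain and keep the k most informative ones."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]

# Toy data (hypothetical): one informative and one uninformative predictor.
stages = ["stage 1", "stage 1", "stage 3", "stage 3"]
columns = {
    "gender": ["f", "m", "f", "m"],
    "hypertension": ["no", "no", "yes", "yes"],
}
best = top_k_features(columns, stages, k=1)
```

Here `hypertension` separates the two stages perfectly, so its Information Gain equals the full entropy of the labels (1 bit), whereas `gender` leaves the uncertainty unchanged and has a gain of 0.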

3.3.5. SHAP

SHAP (SHapley Additive exPlanations) is a machine learning method for explaining predictions made by models. SHAP can explain how each attribute contributed to the model's prediction regarding a patient's CKD stage.

SHAP values are based on cooperative game theory and attribute to each feature a contribution score for a given prediction. A SHAP value shows the feature's impact on the model's output. When the SHAP value is positive, the feature increases the likelihood that a specific stage of CKD will be predicted; when it is negative, the feature decreases that likelihood.

SHAP values are determined by utilizing Shapley values derived from cooperative game theory. These values are computed according to the subsequent equation:

\phi_{i} (v) = \sum_{S \subseteq N \setminus \{i\}} \frac{| S |! (| N | - | S | - 1)!}{| N |!} [v (S \cup \{i\}) - v (S)]

Where, \phi_{i} (v) - denotes the SHAP value of a feature i

N – is the entire set of features.

v(S) - denotes a model's prediction when a particular set of features is considered.

S∪{i} - means including the feature i in this subset.

|S| denotes the cardinality of the subset S.

|N| is the total number of all features.
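For a handful of features, the Shapley formula can be evaluated exactly by enumerating every subset S. The sketch below does this for an arbitrary value function v; the function names and the two toy value functions are assumptions made for illustration. Because the enumeration grows exponentially with the number of features, it is practical only for very small feature sets.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values; value_fn maps a frozenset of features to v(S)."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):                       # |S| ranges over 0 .. |N| - 1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi

# Toy additive "model": each feature shifts the output by a fixed amount.
effects = {"egfr": -2.0, "creatinine": 1.5, "blood_pressure": 0.5}

def additive_value(subset):
    return sum(effects[f] for f in subset)

phi_additive = shapley_values(list(effects), additive_value)

# Toy interaction: the output is 1 only when both features are present.
def interaction_value(subset):
    return 1.0 if {"a", "b"} <= subset else 0.0

phi_interaction = shapley_values(["a", "b"], interaction_value)
```

For the additive toy model each Shapley value equals the feature's own effect, and for the interaction example the credit is split equally between the two features. In both cases the values sum to v(N) − v(∅).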

3.4. Model selection

The model selection stage of the CKD stage detection project covers two aspects: the selection of machine learning models and the exploration of machine learning combined with generative AI. ML models such as LightGBM, CatBoost, and XGBoost are evaluated in terms of performance, interpretability, and scalability. Additionally, machine learning is fused with generative artificial intelligence to develop interpretable models that can efficiently represent complex relationships. The models are described below:

3.4.1. Gradient boosting

For the CKD stage detection project, the gradient boosting framework LightGBM is beneficial because of its efficiency, scalability, and performance with large datasets. LightGBM iteratively minimizes a loss function by sequentially fitting new models to the residuals of the prior ones, so that each new model corrects the errors of its predecessors. LightGBM also groups feature values into discrete bins, which makes it ‘lighter’ in terms of memory use. Training is therefore faster and more memory-efficient, which is critical for CKD data with numerous features. Finally, the leaf-wise tree growth strategy produces deeper tree structures, which has good potential for improving accuracy. The prediction of a tree is a weighted sum over its leaf regions, which can be written as

\hat{y}_{i} = \sum_{k = 1}^{K} w_{k} I (x_{i} \in R_{k})

(6)

Where,

\hat{y}_{i} - The predicted output for the sample i

K - The number of leaves in the tree

w_k - The weight associated with the leaf k

R_k - The region corresponding to the leaf k

I - Indicator function. It evaluates to 1 if x_i belongs to the region R_k; otherwise, it is 0.

3.4.2. XGBoost

Extreme Gradient Boosting, abbreviated as XGBoost, is an efficient method for determining the stage of CKD. This approach uses gradient boosting to construct a predictive model that assigns patients to CKD stages based on their demographic information, vital signs, lab test results, and health behaviour. XGBoost repeatedly adds weak learners, usually decision trees, to an ensemble to improve the model's effectiveness. The method is particularly suitable for medical data, such as CKD data, because of its scalability, efficiency, and capability to model large datasets with many attributes. The model supports different types of regularization, handles missing data, and allows the relevance of features to be interpreted. The equation for XGBoost is given below:

\hat{y_{i}} = \sum_{k = 1}^{K} f_{k} (x_{i})

(7)

Where,

\hat{y}_{i} - Predicted output for sample i

K - Number of trees in the model,

f_k - Prediction of the k-th tree

x_i - Feature vector for sample i.
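Equation (7) states that the prediction is the sum of the outputs of K trees. The following sketch illustrates that additive structure with the simplest possible setting: one numeric feature, depth-one regression trees (stumps), and a squared-error loss. It is not XGBoost itself, which additionally uses regularization and second-order gradient information; the function names, the learning rate, and the toy data are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split under squared error (needs at least 2 distinct x values)."""
    best = None
    for t in sorted(set(xs))[:-1]:                 # the largest value is never a threshold
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_boosted_stumps(xs, ys, n_trees=20, learning_rate=0.5):
    """Fit each new tree f_k to the residuals left by the trees before it."""
    trees = []
    preds = [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        f_k = lambda x, s=stump: learning_rate * s(x)
        trees.append(f_k)
        preds = [p + f_k(x) for p, x in zip(preds, xs)]
    return trees

def predict(trees, x):
    """Eq. (7): the prediction is the sum of the K tree outputs."""
    return sum(f_k(x) for f_k in trees)

def mse(trees, xs, ys):
    return sum((y - predict(trees, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data (hypothetical): eGFR values and CKD stage encoded as a number.
egfr = [10, 22, 45, 70, 95]
stage = [5, 4, 3, 2, 1]
one_tree = fit_boosted_stumps(egfr, stage, n_trees=1)
many_trees = fit_boosted_stumps(egfr, stage, n_trees=20)
```

Comparing `mse(one_tree, egfr, stage)` with `mse(many_trees, egfr, stage)` shows the training error shrinking as more trees are added to the sum.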

3.4.3. CatBoost

CatBoost, an algorithm based on gradient boosting, is designed to handle categorical features, such as the use of a particular drug or items of medical history. The formula used by the tree-building algorithm is:

\mathrm{PredictedValue} = \sum_{i} \alpha_{i} \, h \left( x_{i} - y_{i} \sum_{k < i} \alpha_{k} \, h (x_{i} - y_{k}) \right)

(8)

In this expression, x_i is the input feature, h is a base learner, commonly a decision tree, and α_i determines the learning rate of each tree; the splitting points are defined within each tree. The trees are combined into an ensemble, and the predictions are refined by repeated updates. Before applying CatBoost to the CKD dataset, it is essential to prepare the data and carefully select the appropriate features. The regularization strength and the learning rate were among the key hyperparameters that had to be adjusted.

3.4.4. GAN augmented machine learning (GAN-AML)

A prospective approach for CKD identification is to combine Generative Adversarial Networks (GANs) with machine learning models to enhance their performance. GAN-Augmented ML is a data augmentation approach that uses GANs to strengthen traditional ML models by providing them with a wider variety of data samples. GANs generate synthetic data that mimic the empirical distribution of actual patient data. The synthetic data creation process is framed as a two-player game: the discriminator network must distinguish between real and synthetic data, whereas the generator network generates synthetic samples. After the GAN has been trained on the original dataset, the generator can replicate the complicated features and behaviours of actual patient data.

The resulting synthetic data are then combined with the real patient records in the training dataset, so that the ML model is trained on a larger and more diverse set. The more comprehensive dataset improves the ML model's ability to generalize and reliably detect CKD stages across diverse patient populations. By combining generative AI with ML models, the GAN-AML hybrid model aims to strike a satisfactory balance between the two and to outperform the individual models in healthcare environments.

Algorithm:

Initialize GAN Model

Set the initial values of θ_G and θ_D for the generator G and discriminator D networks of the GAN model.

Define the hyperparameters for the training process, which consist of the learning rate (η), batch size (B), and number of epochs (N_epochs).

Train GAN Models

Repeat over a predetermined number of epochs (N_epochs):

Sample a batch of B actual patient records X_real from the original dataset.

Use the generator network to generate a batch of synthetic data samples X_synthetic= G(Z), where Z is a noise vector taken from a latent space.

Train the Discriminator network to discriminate:

Use the binary cross-entropy loss to calculate the discriminator loss L_D for both real and synthetic data samples:

L_{D} = - \frac{1}{B} \sum_{i = 1}^{B} [\log (D (x_{i})) + \log (1 - D (G (Z_{i})))]

Using gradient descent, update the discriminator weights to minimize the discriminator loss:

\theta_{D} \leftarrow \theta_{D} - \eta \nabla_{\theta_{D}} L_{D}

Train the Generator Network:

Use the discriminator's feedback to calculate the generator loss L_G:

L_{G} = - \frac{1}{B} \sum_{i = 1}^{B} \log (D (G (Z_{i})))

Using gradient descent, update the generator weights to minimize the generator loss:

\theta_{G} \leftarrow \theta_{G} - \eta \nabla_{\theta_{G}} L_{G}

Generate Synthetic Data

Utilize the trained generator network to create artificial data samples after training.

Augment Training Data Set

Construct an augmented training dataset by combining the authentic patient data with the generated synthetic data (X_augmented= X_real ∪ X_synthetic).

Initialize ML Model

Choose the Gradient Boosting Machine Learning model to detect CKD stages.

Set up the ML model's hyperparameters.

Train ML Model

Use the augmented training dataset to train the ML model:

Split the expanded dataset into training and validation sets.

Train the machine learning model using the training data.

Optimize the hyperparameters through cross-validation.

Assess the model's performance on the validation set.
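The two loss functions and the weight update in the GAN training step can be written out as a small numeric sketch. It deliberately leaves out the networks themselves: the inputs are assumed to be the discriminator's output probabilities for a batch of real records, D(x_i), and for a batch of generated records, D(G(Z_i)). The function names and the example values are hypothetical.

```python
import math

def discriminator_loss(d_real, d_fake):
    """L_D = -(1/B) * sum_i [log D(x_i) + log(1 - D(G(Z_i)))]."""
    assert len(d_real) == len(d_fake)
    batch_size = len(d_real)
    return -sum(math.log(r) + math.log(1.0 - f) for r, f in zip(d_real, d_fake)) / batch_size

def generator_loss(d_fake):
    """L_G = -(1/B) * sum_i log D(G(Z_i))."""
    return -sum(math.log(f) for f in d_fake) / len(d_fake)

def gradient_step(theta, grad, eta):
    """theta <- theta - eta * grad, applied element-wise to a flat weight list."""
    return [t - eta * g for t, g in zip(theta, grad)]

# An undecided discriminator outputs 0.5 for every sample.
undecided_d = discriminator_loss([0.5, 0.5], [0.5, 0.5])
undecided_g = generator_loss([0.5, 0.5])

# A confident discriminator scores real records high and synthetic records low.
confident_d = discriminator_loss([0.9, 0.9], [0.1, 0.1])
confident_g = generator_loss([0.1, 0.1])
```

When the discriminator becomes more confident, L_D falls and L_G rises, which is the tension that drives the two-player game. In a full implementation the gradients passed to `gradient_step` would come from automatic differentiation in a deep learning framework.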

3.4.5. Model evaluation

During model evaluation, the models fitted on the training set are assessed on the testing set using appropriate measures: F1-score, recall, accuracy, precision, and the area under the ROC curve.^26 The model is also tested for its ability to detect CKD cases with high specificity, that is, with a low rate of false positives. These evaluations and analyses indicate whether the model can be applied in practice for CKD stage identification.
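As an illustration of how these measures are obtained from predictions, the sketch below computes accuracy together with macro-averaged precision, recall, and F1-score for a multi-class problem. Macro averaging (an unweighted mean over the stages) is an assumption made here, since the averaging scheme is not stated; AUC-ROC is omitted because it requires predicted probabilities rather than labels. The function name and the toy labels are hypothetical.

```python
def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n_classes = len(labels)
    return {
        "accuracy": sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1s) / n_classes,
    }

# Toy test set (hypothetical): true and predicted CKD stages for six patients.
true_stage = [1, 1, 2, 2, 3, 3]
pred_stage = [1, 1, 2, 3, 3, 3]
scores = evaluate(true_stage, pred_stage)
```

In this toy example one stage 2 patient is misclassified as stage 3, which lowers the recall for stage 2 and the precision for stage 3 while leaving stage 1 unaffected.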

3.4.6. Deployment and integration

Deployment and integration, which entails embedding the trained model in a clinical support system interface, is another component of the workflow. Healthcare providers are given an interface through which they can easily enter patient data. The system then offers predictions, based on the input data, of the probability of CKD and its stage. The system also provides reasons or justifications for its predictions, which helps healthcare professionals understand, trust, and rely on the model's recommendations when managing CKD.

4. Experimental results

The experimental findings report the performance of four ML models, namely Gradient Boosting, XGBoost, CatBoost, and GAN AML, in categorizing CKD stages. To assess how well the models predict the stages, various feature selection strategies were applied to identify relevant features for enhanced prediction accuracy. Tables 3, 4, 5, 6, and 7 present the performance evaluation of all models for NFS, RFE, Chi-square test, Information Gain, and SHAP, respectively. Without feature selection, CatBoost and GAN AML performed best on every metric: both achieved 92% precision, 98% recall, 95% F1-score, 91% accuracy, and 90% AUC-ROC. This suggests that these models have built-in mechanisms that allow them to separate the stages of CKD without a feature selection strategy. After RFE was applied, most performance measures improved for Gradient Boosting, XGBoost, and GAN AML, while CatBoost remained at a similar level. Gradient Boosting had 93% precision, 95% recall, 94% F1-score, 90% accuracy, and 86% AUC-ROC. For XGBoost, the results were 94% precision, 93% recall, 94% F1-score, 89% accuracy, and 87% AUC-ROC. CatBoost had 92% precision, 97% recall, 94% F1-score, 90% accuracy, and 90% AUC-ROC. The performance of GAN AML was also remarkable, with 96% precision, 97% recall, 97% F1-score, 94% accuracy, and 96% AUC-ROC. These findings suggest that the features retained by RFE are relevant for separating the CKD stages.

Table 3.
Performance evaluation of all Models for NFS.

Model	Precision (%)	Recall (%)	F1-score (%)	Accuracy (%)	AUC-ROC (%)
Gradient Boosting	91	95	93	88	85
XGBoost	92	94	93	88	83
CatBoost	92	98	95	91	90
GAN AML	92	98	95	91	90

Table 4.

Performance evaluation of all Models for RFE.

Model	Precision (%)	Recall (%)	F1-score (%)	Accuracy (%)	AUC-ROC (%)
Gradient Boosting	93	95	94	90	86
XGBoost	94	93	94	89	87
CatBoost	92	97	94	90	90
GAN AML	96	97	97	94	96

Table 5.

Performance evaluation of all Models for the Chi-square Test.

Model	Precision (%)	Recall (%)	F1-score (%)	Accuracy (%)	AUC-ROC (%)
Gradient Boosting	93	97	95	91	85
XGBoost	92	97	94	90	82
CatBoost	93	98	96	92	88
GAN AML	95	97	96	93	96

Table 6.

Performance evaluation of all Models for Information Gain.

Model	Precision (%)	Recall (%)	F1-score (%)	Accuracy (%)	AUC-ROC (%)
Gradient Boosting	91	93	92	86	87
XGBoost	91	98	94	90	83
CatBoost	92	93	93	87	92
GAN AML	95	98	96	93	95

Table 7.

Performance evaluation of all Models for SHAP.

Model	Precision (%)	Recall (%)	F1-score (%)	Accuracy (%)	AUC-ROC (%)
Gradient Boosting	91	95	93	88	82
XGBoost	92	94	93	88	83
CatBoost	92	98	95	91	91
GAN AML	90	95	93	98	100

With the Chi-square test for feature selection, the performance of the Gradient Boosting and CatBoost models improved on most metrics. Gradient Boosting reached 93% precision, 97% recall, 95% F1-score, 91% accuracy, and 85% AUC-ROC. CatBoost reached 93% precision, 98% recall, 96% F1-score, 92% accuracy, and 88% AUC-ROC. These improvements indicate that the chosen features capture data relevant to the categorization task. However, the results of feature selection with Information Gain differed across the models. Gradient Boosting had 91% precision, 93% recall, 92% F1-score, 86% accuracy, and 87% AUC-ROC. XGBoost had 91% precision, 98% recall, 94% F1-score, 90% accuracy, and 83% AUC-ROC. CatBoost had 92% precision, 93% recall, 93% F1-score, 87% accuracy, and 92% AUC-ROC. GAN AML had 95% precision, 98% recall, 96% F1-score, 93% accuracy, and 95% AUC-ROC. These results suggest that, for specific models, the features chosen by Information Gain may not fully capture the dataset's ability to discriminate between CKD stages.

Figure 1.

Comparison of evaluation metrics with NFS.

Figure 2.

Comparison of evaluation metrics with RFE.

Figure 3.

Comparison of evaluation metrics with Chi-square test.

Figure 4.

Comparison of evaluation metrics with Information Gain.

Figure 5.

Comparison of evaluation metrics with SHAP.

RFE:

Figure 6.

Features selected using GB with RFE.

Figure 7.

Features selected using XGBoost with RFE.

Figure 8.

Features selected using CatBoost with RFE.

Figure 9.

Features selected using GAN AML with RFE.

Chi-square Test:

Figure 10.

Features selected using XGB with Chi-square test.

Figure 11.

Features selected using GB with Chi-square test.

Figure 12.

Features selected using CatBoost with Chi-square test.

Figure 13.

Features selected using GAN-AML with Chi-square test.

Information Gain:

Figure 14.

Features selected using XGBoost with IG.

Figure 15.

Features selected using GB with IG.

Figure 16.

Features selected using CatBoost with IG.

Figure 17.

Features selected using GAN-AML with IG.

SHAP (SHapley Additive exPlanations):

Figure 18.

Features selected using XGBoost with SHAP.

Figure 19.

Features selected using GB with SHAP.

Figure 20.

Features selected using CatBoost with SHAP.

Figure 21.

Features selected using GAN-AML with SHAP.

Figure 22.

ROC curve for XGBoost with SHAP.

Figure 23.

ROC curve for GB with SHAP.

Figure 24.

ROC curve for CatBoost with SHAP.

Figure 25.

ROC curve for GAN-AML with SHAP.

Finally, SHAP feature selection had varying impacts on overall model performance. Relative to NFS, Gradient Boosting showed a marginal decrease in AUC-ROC (from 85% to 82%) and XGBoost was unchanged, whereas CatBoost and GAN AML achieved the highest accuracy and AUC-ROC. Specifically, CatBoost reached a precision of 92%, recall of 98%, F1-score of 95%, accuracy of 91%, and AUC-ROC of 91%, and GAN AML reached a precision of 90%, recall of 95%, F1-score of 93%, accuracy of 98%, and AUC-ROC of 100%. This suggests that both models handle feature importance more robustly and depend less on additional selection methods. In conclusion, the main findings of the experiment highlight the need for feature selection methods chosen according to the characteristics of the ML algorithm and of the CKD stage classification task. While some models benefit from such measures, CatBoost and GAN AML performed consistently well even without explicit feature selection. Therefore, these methods must be considered carefully to achieve optimal performance in CKD stage classification tasks. Figures 1–25 depict the comparison of evaluation metrics, the selected features, and the ROC curves for NFS, RFE, Chi-square test, Information Gain, and SHAP.

4.1. Discussion

The study presents a comprehensive approach to developing a Clinical Decision System for accurately classifying CKD stages using machine learning (ML) models. It identifies significant drawbacks in current CKD prediction models, such as their inability to generalize across diverse populations and datasets, reliance on limited data leading to unreliable predictions, and the complexity of deep learning models that makes them difficult for non-experts to interpret. Additionally, imbalanced datasets can skew performance metrics, particularly in healthcare applications where equal representation is crucial.

The research emphasizes the importance of feature selection and model performance evaluation, utilizing various ML algorithms, including Gradient Boosting, XGBoost, CatBoost, and GAN AML, to enhance prediction accuracy, with key features such as creatinine levels, eGFR, urine protein, and blood pressure being integral to the model's development. It underscores the necessity of collecting diverse data, including demographics, vital signs, and laboratory test results, to create a robust predictive model, with the dataset comprising 491 patients allowing for a comprehensive understanding of CKD across different populations and severity levels.

The study outlines a systematic approach to model evaluation, employing metrics such as F1-score, recall, accuracy, precision, and the area under the ROC curve to ensure effective detection of CKD cases with high specificity and a low rate of false positives, which is critical for real-world application. Furthermore, it suggests that continuous monitoring and feedback can enhance the model's accuracy over time, with periodic retraining using the latest data and advancements in CKD studies enabling the system to adapt to evolving healthcare needs and improve early detection and intervention strategies for CKD patients.

Overall, this study contributes to the field of CKD management by proposing a machine learning-based clinical support system that addresses the limitations of existing models and emphasizes the importance of data diversity and model evaluation, with findings that have the potential to improve patient outcomes through timely and accurate CKD stage identification.

5. Conclusion

This research aims to develop a clinical support system for CKD that uses data from the University of California, Irvine. The paper studies different machine learning models, comprising Gradient Boosting, XGBoost, CatBoost, and GAN AML, to classify the stages of CKD, and applies feature selection strategies to enhance prediction accuracy. The dataset, which contains demographic information, vital signs, laboratory test results, and health behaviour data for 491 patients, was preprocessed with missing value imputation using K-nearest Neighbors, Z-score normalization of numerical features, and one-hot encoding of categorical features.

Outliers were then identified and removed using statistical methods and the Box-Cox transformation. The four models were evaluated under various feature selection strategies: no feature selection, RFE, chi-square test, information gain, and SHAP. The empirical results show consistently higher performance for CatBoost and GAN AML, both with the feature selection strategies applied and without explicit feature selection. Most models performed better with RFE, which indicates that feature selection is influential in classification tasks. The benefits of information gain are mixed, indicating the need for feature selection methods suited to the situation at hand. Ultimately, these results demonstrate the importance of expert knowledge in choosing feature selection strategies to enhance the performance of machine learning models for the classification of CKD stages. In future work, longitudinal patient data covering many years will be considered to provide insight into the progression of CKD stages, including the identification of predictive biomarkers or risk factors related to the progression of the disease. The research recognizes the shortcomings of current models for predicting CKD, including their inability to be broadly applied, the intricacy of deep learning models, the unbalanced nature of datasets, and the need for computational resources. These restrictions significantly affect the models’ applicability and dependability in healthcare settings. The study investigates the possibility of employing a variety of datasets to forecast CKD using machine learning, and it makes recommendations for more investigation, better feature selection, and the incorporation of Gen AI. Its interpretability and explainability approach focuses on enhancing trust among healthcare providers.

Footnotes

ORCID iDs

E. Chandralekha

T. R. Saravanan

N. Vijayaraj

Authors’ contributions

E. Chandralekha and T. R. Saravanan designed the framework, analyzed performance, validated the results, and wrote the article. N. Vijayaraj collected the information required for the framework, provided software, conducted critical reviews, and administered the process.

Code availability

Not applicable.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability

No datasets were generated or analyzed during the current study.

References

Sun

Liu

, et al. Detection and diagnosis of chronic kidney disease using deep learning-based heterogeneous modified artificial neural network. Future Gener Comput Syst 2020; 111: 17–26.

Tekale

Shingavi

Wandhekar

, et al. Prediction of chronic kidney disease using a machine learning algorithm. Int J Adv Res Comp Commun Eng 2018; 7: 92–96.

Nishat

Faisal

Dip

, et al. A comprehensive analysis of detecting chronic kidney disease by employing machine learning algorithms. EAI Endorsed Trans Pervas Health Technol 2022; 7: e1.

Swain

Mehta

Bhatt

, et al. A robust chronic kidney disease classifier using machine learning. Electronics 2023; 12: 212.

Devi, Aruna, Almufti, et al. Bone feature quantization and systematized attention gate UNet-based deep learning framework for bone fracture classification. Intell Data Anal 2024: 1–29.

Pradhan. PIRAP: Medical cancer rehabilitation healthcare centre data maintenance based on IoT-based deep federated collaborative learning. Int J Coop Inf Syst 2023; 33: 2350005.

Vivekananda, Almufti, Suresh, et al. Retracing-efficient IoT model for identifying the skin-related tags using automatic lumen detection. Intell Data Anal 2023; 27: 161–180.

Liang, Yang, Wang, et al. Deep learning identifies intelligible predictors of poor prognosis in chronic kidney disease. IEEE J Biomed Health Inform 2023; 27: 3677–3685.

Chen, Ding, et al. Prediction of chronic kidney disease using adaptive hybridized deep convolutional neural network on the Internet of Medical Things platform. IEEE Access 2020; 8: 100497–100508.

10. Ogunleye, Wang. XGBoost model for chronic kidney disease diagnosis. IEEE/ACM Trans Comput Biol Bioinf 2020; 17: 2131–2140.

11. Chicco, Lovejoy, Oneto. A machine learning analysis of health records of patients with chronic kidney disease at risk of cardiovascular disease. IEEE Access 2021; 9: 165132–165144.

12. Venkatrao, Kareemulla. HDLNET: a hybrid deep learning network model with intelligent IoT for detection and classification of chronic kidney disease. IEEE Access 2023; 11: 99638–99652.

13. Elkholy, Rezk, Saleh. Early prediction of chronic kidney disease using deep belief network. IEEE Access 2021; 9: 135542–135549.

14. Moreno-Sánchez. Data-driven early diagnosis of chronic kidney disease: development and evaluation of an explainable AI model. IEEE Access 2023; 11: 38359–38369.

15. Almasoud, Ward. Detection of chronic kidney disease using machine learning algorithms with the least number of predictors. Int J Soft Comput Appl 2019; 10: 100–110.

16. Rehman, Xing, Hussain, et al. HCDP-DELM: heterogeneous chronic disease prediction with temporal perspective enabled deep extreme learning machine. Knowl Based Syst 2024; 284: 111316.

17. Rashed-Al-Mahfuz, Haque, Azad, et al. Clinically applicable machine learning approaches to identify attributes of chronic kidney disease (CKD) for use in low-cost diagnostic screening. IEEE J Transl Eng Health Med 2021; 9: 1–1.

18. Antony, Azam, Ignatious, et al. A comprehensive unsupervised framework for chronic kidney disease prediction. IEEE Access 2021; 9: 126481–126501.

19. Qin, Chen, Liu, et al. A machine learning methodology for diagnosing chronic kidney disease. IEEE Access 2020; 8: 20991–21002.

20. Yadav, Sharma, Mahadeva, et al. Exploring hyper-parameters and feature selection for predicting non-communicable chronic disease using stacking classifier. IEEE Access 2023; 11: 80030–80055.

21. Sathyaprakash, Alagarsundaram, Devarajan, et al. Medical practitioner-centric heterogeneous network powered efficient E-Healthcare risk prediction on Health Big Data. Int J Coop Inf Syst 2024: 2450012. https://doi.org/10.1142/S0218843024500126

22. Sivaparthipan, Muthu, Fathima, et al. Blockchain assisted disease identification of COVID-19 patients with the help of IDA-DNN classifier. Wirel Pers Commun 2022; 126: 2597–2620.

23. Meenakshi, Anbarasan, Murugesan, et al. Development of an efficient CNN model with hyperparameter tuning for early prediction of lung diseases. In: 2023 International Conference on Data Science, Agents & Artificial Intelligence (ICDSAAI), 21 December 2023, pp. 1–4. IEEE.

24. Rahman, Al-Amin, Hossain. Machine learning models for chronic kidney disease diagnosis and prediction. Biomed Signal Process Control 2024; 87: 105368.

25. Dey, Uddin, Babu, et al. Chi2-MI: a hybrid feature selection-based machine learning approach in the diagnosis of chronic kidney disease. Intell Syst Appl 2022; 16: 200144.

26. Sharma. Performance-based evaluation of various machine learning classification techniques for chronic kidney disease diagnosis. arXiv preprint arXiv:1606.09581, 2016.

Table 3. Performance evaluation of all models with no feature selection (NFS).

Model              Precision  Recall  F1-score  Accuracy  AUC-ROC
Gradient Boosting  91         95      93        88        85
XGBoost            92         94      93        88        83
CatBoost           92         98      95        91        90
GAN AML            92         98      95        91        90
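
As a consistency check on Table 3, the F1-score can be recomputed from precision and recall as their harmonic mean, F1 = 2PR / (P + R). The short Python sketch below does this for the four rows of the table. The dictionary `table3` and the helper `f1_from_pr` are names introduced here for illustration only; they are not part of the study's code, and the check assumes the reported values are rounded to whole numbers.

```python
# Illustrative check only: `table3` and `f1_from_pr` are hypothetical names.
# Each entry holds (precision, recall, reported F1) copied from Table 3.
table3 = {
    "Gradient Boosting": (91, 95, 93),
    "XGBoost": (92, 94, 93),
    "CatBoost": (92, 98, 95),
    "GAN AML": (92, 98, 95),
}


def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall, on the same scale as the inputs."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for every model and round to the table's whole-number precision.
recomputed = {
    model: round(f1_from_pr(p, r)) for model, (p, r, _) in table3.items()
}

# Collect any model whose recomputed F1 disagrees with the reported value.
mismatches = {
    model for model, (_, _, f1) in table3.items() if recomputed[model] != f1
}

for model, (p, r, f1) in table3.items():
    print(f"{model}: reported F1 = {f1}, recomputed F1 = {f1_from_pr(p, r):.2f}")
```

An empty `mismatches` set indicates that the reported F1-scores agree with the reported precision and recall after rounding.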