Application of Artificial Intelligence-Based Classifiers to Predict the Outcome Measures and Stone-Free Status Following Percutaneous Nephrolithotomy for Staghorn Calculi: Cross-Validation of Data and Estimation of Accuracy

Abstract

Objective:

To develop a decision support system (DSS) that predicts the postoperative outcome of kidney stone treatment, specifically percutaneous nephrolithotomy (PCNL), and can serve as a tool for counseling before an operation.

Materials and Methods:

The overall procedure comprised data collection and prediction model development. Pre- and postoperative variables of 100 patients with staghorn calculus who underwent PCNL were collected. The feature vector comprised patient history variables, kidney stone parameters, and laboratory data. The prediction model was developed using machine learning techniques, including dimensionality reduction and supervised classification. A multiple-classifier scheme was used for prediction. The derived DSS was evaluated by running the leave-one-patient-out cross-validation approach on the data set.

Results:

The system provided favorable accuracy (81%) in predicting the outcome of a treatment procedure. With the Minimum Redundancy Maximum Relevance (MRMR) feature selection method, accuracy in predicting the stone-free rate was 67% using the top 3 features with Random Forest (RF), 63% using the top 5 features with RF, and 62% using the top 10 features with Decision Tree. The best areas under the curve (AUCs) obtained with Linear Discriminant Analysis (LDA) and with MRMR were compared for statistical significance using their standard errors. The AUC obtained with the LDA approach (0.81) was significantly higher than that obtained with MRMR (0.64) (z = 2.21, p = 0.027, at the 0.05 significance level).

Conclusion:

The developed DSS showed promising results and could assist urologists in counseling patients, predicting surgical outcomes, and ultimately choosing an appropriate surgical treatment for removing kidney stones.

Introduction

Minimally invasive procedures have made huge progress in the past two decades, especially for the treatment of renal stones. As per the European Association of Urology (EAU) guidelines, stones can be classified by largest diameter into those measuring up to 5, 5 to 10, 10 to 20, and >20 mm, and percutaneous nephrolithotomy (PCNL) is considered the first line of management for large renal calculi (>20 mm).^1–6 Various nomogram-based studies have been performed in the past to predict the results of PCNL.^7–11

The outcomes of these studies were not unexpected, but they showed that prediction does not rest only on the preoperative and intraoperative variables taken as a whole; the impact of each individual factor is critical in determining the outcome of the procedure. One of the major drawbacks of these prediction methods is that they are structured on an expert's findings or on findings of previously constructed studies, with a limited number of variables taken into consideration.¹² A few did not take patient factors into consideration, and these systems cannot be further enhanced or made more accurate by using a new data set. Keeping these limitations in mind, studies have been performed to predict the outcomes of PCNL in the management of renal stones using artificial intelligence (AI) models, with satisfactory results.^12–15

In our study, we used various AI (machine learning [ML]) models to develop a decision support system (DSS) for prediction of postoperative outcomes following a PCNL procedure and to compare the results of each of these models. To the best of our knowledge, this is the first report of the use of an AI-based system to evaluate PCNL outcomes in partial and complete staghorn calculi.

Materials and Methods

Data collection

The study protocol was registered with the Kasturba Medical College and Kasturba Hospital Institutional Ethics Committee and approved (IEC-313/2020). Retrospective data from 100 patients with staghorn calculus (partial and complete) who underwent PCNL between March 2017 and February 2020 at our university teaching hospital were collected. Partial staghorn calculi were defined as stones occupying the renal pelvis and extending into two calyces. Complete staghorn calculi were defined as stones occupying the renal pelvis and extending into all the major calyces, with presence in at least 80% of the collecting system.

Patients who underwent bilateral PCNL were excluded from the study. As part of the standard protocol, a computed tomography (CT) scan was performed for all patients, followed by routine blood investigations and a coagulation profile. A single dose of intravenous cefoperazone and sulbactam was administered to all patients before the procedure. Standard PCNL in the prone position was performed via a tract size of 24–30F in all patients using a 20.8F nephroscope (Richard Wolf GmbH, Knittlingen). Tract dilatation was carried out using Alken metal dilators. All cases were performed by experienced endourologists at the center in conjunction with a single resident or trainee. Stones were fragmented using a pneumatic lithotripter or holmium laser, and the decision to place a postprocedure nephrostomy (14F/16F) was taken by the operating surgeon. All patients underwent conventional ureteric stenting after the procedure. Fluoroscopy, ultrasound scan (USS), or kidney, ureter, and bladder radiograph (KUB) was performed to check for the stone-free status. Complications were graded as per the modified Clavien–Dindo classification, and the need for ancillary procedures was also recorded. Detailed information regarding patient demographics and preoperative, intraoperative, and postoperative variables is provided in Table 1.

Table 1.

Details Regarding Preoperative, Intraoperative, and Postoperative Variables

Age (mean ± SD) (years)	48.64 ± 13.02
M:F	65:35
BMI (mean ± SD) (kg/m²)	25.48 ± 4.79
Comorbidities (total = 28)
Hypertension	10
Ischemic heart disease	6
Diabetes	9
Asthma	1
Cirrhosis	1
Hypothyroidism	1
History of stone surgery in target kidney	8
Right:left	52:48
Complete:partial	17:83
Stone size (mean ± SD) (mm)	27.587 ± 9.09
Stone density (mean ± SD) (Hounsfield units)	1008 ± 124.6
Percutaneous access
Superior calix	51
Inferior calix	62
Middle calix	12
No. of tracts
1	76
2	23
3	1
After PCNL nephrostomy	17
Fluoroscopy time (mean ± SD) (min)	4.29 ± 1.14
Operative time (mean ± SD) (min)	74.43 ± 19.18
Hemoglobin drop (mean ± SD) (g/dL)	0.422 ± 0.31
Stone free, n (%)	84 (84%)
Ancillary procedures (n = 16)
URSL	6
PCNL	10
Hospital duration (mean) (days)	3.8 (range: 3–8)
Complications (as per the modified Clavien–Dindo)
Grade I (n = 44)
Fever	12
Transient elevation of SCr (>0.5 mg/dl)	32
Grade II (n = 6)
Blood transfusion	2
Infections requiring additional antibiotics	4
Grade ≥ III (n = 2)
Renal angioembolization	1
ICD insertion	1

BMI = body mass index; ICD = intercostal drainage; PCNL = percutaneous nephrolithotomy; SCr = serum creatinine; URSL = ureteroscopic lithotripsy.

Dimensionality reduction

Redundant features can skew the results and affect our understanding of the data. Removing irrelevant and redundant features from a data set therefore becomes a high-priority task. We use Linear Discriminant Analysis (LDA) for this purpose. Balakrishnama and Ganapathiraju have given a detailed description of the LDA, but in short, this algorithm aims to project the data onto a new axis in such a way that it minimizes the variance within each class and maximizes the distance between the means of the two classes (Fig. 1).¹⁶ We also tried the same classification algorithms after reducing dimensionality using the Minimum Redundancy Maximum Relevance (MRMR) feature selection method.¹⁷ The MRMR tries to keep those features (independent variables) that have a high correlation with the dependent variable (the outcome) but a low correlation with the other features.
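The MRMR idea described above can be illustrated with a minimal sketch. It assumes absolute Pearson correlation as the measure of both relevance and redundancy and a greedy "relevance minus mean redundancy" score; these choices, the function names, and the toy data are illustrative assumptions and not necessarily the variant used in this study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def mrmr_select(features, target, k):
    """Greedy MRMR sketch. features: {name: list of values}. Returns k names in selection order."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        best_name, best_score = None, -math.inf
        for name, col in features.items():
            if name in selected:
                continue
            # Redundancy: mean absolute correlation with the features already selected.
            if selected:
                redundancy = sum(abs(pearson(col, features[s])) for s in selected) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected

# Toy data (hypothetical, not the study data): "a_copy" is almost a duplicate of "a".
toy = {
    "a": [1, 2, 3, 4, 5, 6],
    "a_copy": [1, 2, 3, 4, 5, 6.5],
    "b": [0, 1, 0, 1, 1, 1],
}
outcome = [0, 0, 0, 1, 1, 1]
print(mrmr_select(toy, outcome, 2))  # the near-duplicate is skipped in favor of "b"
```

In this toy example the near-duplicate feature is individually more relevant than "b", yet it is passed over because it adds little beyond the feature already selected.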

FIG. 1.

DSS 1 predicts the success of PCNL. DSS = decision support system; LDA = Linear Discriminant Analysis; PCNL = percutaneous nephrolithotomy; SVM = Support Vector Machines.

ML classifiers

Four supervised ML classifiers, (1) Logistic Regression (LR), (2) Support Vector Classification (SVC), (3) Decision Tree (DT), and (4) Random Forest (RF), were explored along with one clustering algorithm, (5) K-Means Clustering.

Logistic Regression

LR is used when the output (dependent variable) is categorical. We use binary LR because our output has only two possible outcomes. It gives the estimated probability that an observation belongs to a given outcome class. A linear combination of the input variables is passed through a logistic function, which maps it to a probability for the categorical dependent variable.¹⁸
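As a minimal sketch of this mechanism, the snippet below passes a linear combination of the inputs through the logistic function and thresholds the resulting probability. The weights, bias, and 0.5 threshold are hypothetical values chosen for illustration; in practice the weights are fitted to the training data.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Estimated probability of the positive class for feature vector x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

def predict(weights, bias, x, threshold=0.5):
    """Binary decision obtained by thresholding the estimated probability."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0

# Hypothetical weights for two features (illustration only).
print(predict_proba([1.0, -2.0], 0.0, [2.0, 0.5]))
```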

Support Vector Classification

SVC is based on the Library for Support Vector Machines (LibSVM). Support Vector Machines (SVM) are used for supervised classification and regression tasks. In simple terms, this algorithm tries to find a hyperplane in the N-dimensional feature space that separates the classes. The optimal hyperplane maximizes the margin, that is, the distance between the hyperplane and the nearest data points of each class.¹⁹
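The geometry of a linear separating hyperplane can be sketched as follows. The snippet assumes a linear SVM whose weight vector w and offset b have already been learned (the values shown are hypothetical) and only illustrates how a point is classified and how the margin width relates to w; it does not perform the training itself.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Class label (+1 or -1) from the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between the two margin boundaries w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical learned parameters (illustration only).
w, b = [3.0, 4.0], -5.0
print(classify(w, b, [2.0, 1.0]), margin_width(w))
```

A smaller norm of w corresponds to a wider margin, which is why training seeks the separating hyperplane with the smallest such norm.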

Decision Tree

The DT algorithm can be used for classification tasks. A tree consists of nodes, branches, and leaves that route each data point to an appropriate class. At each node the data are split according to a splitting criterion, which can be provided as a parameter. DTs can handle categorical and numerical data at the same time. DTs have been used previously with medical data and have shown promising results.²⁰
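How a single node chooses its split can be sketched as below. The example assumes Gini impurity as the splitting criterion and binary 0/1 labels; Gini is one common choice and is used here as an assumption, since the criterion applied in this study is not stated.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels; 0.0 means a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'."""
    best = (None, float("inf"))
    candidates = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Toy feature values and outcomes (hypothetical).
print(best_threshold([1, 2, 3, 4], [0, 0, 1, 1]))
```

A full tree repeats this search recursively on the left and right subsets until a stopping rule is met.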

Random Forest

RF is an ensemble-based learning algorithm that builds many DTs. It collects the votes from all the DTs and aggregates them to determine the final class of the data point. We used 100 trees and aggregated their results to give the final output. This algorithm has been previously used in cancer diagnosis studies and provided good results.²¹
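The voting step can be sketched as follows. The three "trees" are hypothetical threshold rules standing in for fitted DTs, and the bootstrap helper shows how each tree would typically receive its own resampled training set; tree fitting itself is omitted from this illustration.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap replicate)."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(votes):
    """Most common class label among the per-tree votes."""
    return Counter(votes).most_common(1)[0][0]

def forest_predict(trees, x):
    """trees: list of callables, each mapping a feature vector to a class label."""
    return majority_vote([tree(x) for tree in trees])

# Three hypothetical 'trees', each a simple rule (illustration only).
trees = [
    lambda x: 1 if x[0] > 2.0 else 0,
    lambda x: 1 if x[1] > 0.5 else 0,
    lambda x: 1 if x[0] + x[1] > 4.0 else 0,
]
print(forest_predict(trees, [3.0, 0.9]))  # two of the three rules vote for class 1
```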

K-Means Clustering

Clustering algorithms divide the entire data set into a group of classes, in our case two. They make sure that the distance between the data points present in one cluster is kept to a minimum while increasing the distance between clusters. These algorithms are mostly used for unsupervised learning tasks, but their application can be leveraged in classification tasks as well.²²
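A minimal sketch of the clustering loop (Lloyd's algorithm) with two clusters is given below. The initial centroids are passed in by the caller, which is an assumption made to keep the example deterministic, and the toy points are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def kmeans(points, centers, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centers)), key=lambda k: dist2(p, centers[k])) for p in points]
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Keep the old centroid if a cluster ends up empty.
            new_centers.append(mean_point(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

points = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centers = kmeans(points, centers=[[0, 0], [10, 10]])
print(labels, centers)
```

Because the cluster indices are arbitrary, using the clusters for classification requires an extra step that maps each cluster to an outcome class, for example by the majority label of its training members.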

Evaluation

The performance of the algorithms is measured using three performance indices: accuracy, precision, and recall. To get an unbiased measurement, we use Stratified K-fold cross-validation.²³ Stratified cross-validation shuffles the data and then splits them into n groups (folds) so that each group preserves approximately the class proportions of the full data set, which limits the effect of class imbalance. Each group in turn is used as the test set, with the remaining groups used as training data. Finally, we take the mean of each index over the groups to get an overall picture.
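The evaluation procedure can be sketched as below: one helper deals the shuffled indices of each class across the folds so that class proportions are roughly preserved, and another computes accuracy, precision, and recall for binary labels. The function names and the round-robin dealing are illustrative assumptions; the class counts (84 stone free, 16 not) come from Table 1.

```python
import random

def stratified_folds(labels, n_splits, seed=0):
    """Return n_splits lists of test indices with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % n_splits].append(i)
    return folds

def scores(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# 84 stone-free and 16 not stone-free patients, as in Table 1 (order is arbitrary).
labels = [1] * 84 + [0] * 16
folds = stratified_folds(labels, n_splits=10)
print([len(f) for f in folds])
```

Each fold then serves once as the test set, and the three indices are averaged over the folds.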

Results

The classification performance of the established algorithms/DSSs is described in Table 2. For each classifier, the parameter(s) were determined experimentally and are displayed in the second column of each row. A Stratified K-fold approach was used to estimate these parameters. Table 2 presents the accuracy of the developed DSSs in predicting the stone-free status of a kidney after the PCNL treatment. Incorporating the LDA to reduce the dimensionality of the data proved effective. Compared with not using dimensionality reduction, it increased classification accuracy by approximately 10%, which is a substantial change, especially in medical applications. Because LDA produces at most n − 1 discriminant features for n classes and there are only two classes here, it yields a single feature.
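To illustrate why two classes give a single LDA feature, the sketch below computes the two-class Fisher discriminant direction w = Sw⁻¹(m1 − m0) and projects each sample onto it. It is restricted to two input features so that the 2×2 matrix inverse can be written in closed form; the data and function names are hypothetical.

```python
def mean_vec(rows):
    """Mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

def scatter(rows, m):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix of rows around mean m."""
    sxx = sum((r[0] - m[0]) ** 2 for r in rows)
    sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
    syy = sum((r[1] - m[1]) ** 2 for r in rows)
    return sxx, sxy, syy

def fisher_direction(class0, class1):
    """Direction w = Sw^{-1} (m1 - m0) for two classes with two features each."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    a0, b0, c0 = scatter(class0, m0)
    a1, b1, c1 = scatter(class1, m1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1  # pooled within-class scatter Sw
    det = a * c - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Closed-form inverse of the symmetric 2x2 matrix [[a, b], [b, c]].
    return [(c * dx - b * dy) / det, (-b * dx + a * dy) / det]

def project(w, row):
    """The single LDA feature: the sample's coordinate along w."""
    return w[0] * row[0] + w[1] * row[1]

# Toy classes (hypothetical, not the study data).
class0 = [[1, 2], [2, 1], [2, 3], [3, 2]]
class1 = [[5, 2], [6, 1], [6, 3], [7, 2]]
w = fisher_direction(class0, class1)
print(w, [project(w, r) for r in class0 + class1])
```

Every sample is reduced to one number, and a classifier is then trained on that single projected feature.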

Table 2.

Performance of the Decision Support Systems in Predicting the Stone-Free Rate of a Kidney After Percutaneous Nephrolithotomy Treatment with Linear Discriminant Analysis and Minimum Redundancy Maximum Relevance Treatment

Performance in predicting the SFR with LDA treatment
Methods	Parameters	Recall (%)	Precision (%)	Accuracy (%)	AUC
SVM	C = 1.5	79	81	79	0.787
LR		77	83	79	0.79
DT		67	80	73	0.733
RF	n_estimators = 100	83	82	81	0.81
K Means	n = 2	52	57	53	0.53

Performance in predicting the SFR with MRMR treatment extracting top 3 features (BMI, number of tracts, nephrostomy)
SVM	C = 1.5	96	60	63	0.62
LR		93	61	64	0.63
DT		54	64	59	0.59
RF	n_estimators = 100	87	65	67	0.64
K Means	n = 2	2	5	45	0.47

Performance in predicting the SFR with MRMR treatment extracting top 5 features (top 3 features plus history of open renal surgery, complete staghorn)
SVM	C = 1.5	59	93	62	0.61
LR		78	56	56	0.55
DT		53	66	61	0.62
RF	n_estimators = 100	84	60	63	0.63
K Means	n = 2	18	10	47	0.48

Performance in predicting the SFR with MRMR treatment extracting top 10 features (top 5 features plus secondary access, sex of the patient, staged PCNL, comorbidities, laterality)
SVM	C = 1.5	64	60	60	0.6
LR		66	64	61	0.61
DT		62	69	62	0.63
RF	n_estimators = 100	68	59	60	0.60
K Means	n = 2	38	54	52	0.54

DSS = decision support system; DT = Decision Tree; LR = Logistic Regression; RF = Random Forest; SFR = stone-free rate; SVM = Support Vector Machines.

With LDA, RF gave the best recall, accuracy, and AUC, and its precision (82%) was within one percentage point of the best value (83%, LR). These percentages are the mean over 10 splits of the data (Stratified K-fold). We expect the accuracy and our understanding of the models to improve with a larger data set. For comparison between the two dimensionality reduction methods, we also reduced dimensionality with the MRMR method and applied the same ML algorithms to the resulting data set. The parameters predicting the stone-free status were ranked, and the top 10 features in descending order were: body mass index (BMI), number of tracts, stone location, history of open renal surgery, complete staghorn, secondary access, sex of the patient, staged PCNL, comorbidities, and laterality. The top 1/3/5/10 features were captured, and the results obtained with the top 3/5/10 features are given in Table 2. The top feature selected by the MRMR was BMI, while the top 3 selected features were BMI, number of tracts, and nephrostomy, which indicates the importance of these features in the data.

Similarly, in addition to the features mentioned above, the top 5 and top 10 features selected by the MRMR as having an impact on the stone-free rate (SFR) were history of open renal surgery, complete staghorn, secondary access, sex of the patient, staged PCNL, comorbidities, and laterality (in the mentioned order). Figure 2a and b show the receiver operating characteristic curves obtained using the RF algorithm with the MRMR and LDA dimensionality reduction methods, respectively. From the area under the curve (AUC), we observe that the LDA gives better results than the MRMR. The best AUCs obtained from the LDA and MRMR were compared for statistical significance using their standard errors. The AUC obtained with the LDA approach (0.81) was significantly higher than that obtained with the MRMR (0.64) (z = 2.21, p = 0.027, at the 0.05 significance level).
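The relationship between the reported z statistic and p value can be checked with a short sketch. It assumes a two-sided test on a standard normal statistic and, for the z formula, that the two AUC estimates are independent; the standard errors in the usage line are hypothetical, since they are not reported here.

```python
import math

def z_statistic(auc_a, se_a, auc_b, se_b):
    """z for the difference between two AUCs, treating the estimates as independent."""
    return (auc_a - auc_b) / math.sqrt(se_a ** 2 + se_b ** 2)

def two_sided_p(z):
    """Two-sided p value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# The reported z = 2.21 corresponds to the reported p value.
print(round(two_sided_p(2.21), 3))  # → 0.027

# Hypothetical standard errors, for illustration of the z formula only.
print(z_statistic(0.81, 0.05, 0.64, 0.06))
```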

FIG. 2.

(a) ROC curve for the RF algorithm using MRMR top 3 features. (b) ROC curve for the RF algorithm using the LDA. AUC = area under the curve; MRMR = Minimum Redundancy Maximum Relevance Feature; RF = Random Forest; ROC = receiver operating characteristic curve. Color images are available online.

Discussion

Although minimally invasive techniques for the management of stone disease have advanced, patient counseling and apt clinical decision-making remain a challenge, especially for complex stones such as staghorn stones. Morphometric parameters and stone burden have been considered the main prognostic factors for the SFR. Guy's scoring system was developed by Thomas and colleagues based on the stone burden and radiologic findings. This scoring system was easy to apply but had the limitation of being based on expert opinions and results of previously performed studies.¹¹

The Seoul National University renal stone complexity score was later developed in 2013 by Jeong and colleagues and showed an accuracy of 0.86 in the prediction of the SFR after PCNL. The CROES nomogram developed by Smith and colleagues also included preoperative clinical variables, which were missing from other scoring systems. This model achieved an accuracy of 0.76.⁹ Based on a review of various studies, AI models offer higher accuracy, AUC, and precision than standard nomograms and scoring systems in predicting the outcomes of the procedure.²⁴ In this study, the main objective was to develop an AI-based system for predicting outcomes of the PCNL procedure for staghorn calculi using various classifiers and to compare them.

The results obtained are comparable to other studies, and systems were developed based on five different algorithms, namely SVM, LR, DT, RF, and K Means.^12–15 The best accuracy in predicting the stone-free status of a kidney after the first treatment was 81%, provided by the RF classifier. Previous studies have shown that multiple classifier systems perform better than a single classifier.^24–27 Although a preliminary attempt, the results of the present study support the use of ML for the prediction of the SFR for staghorn calculus (Fig. 3). The proposed method helps the clinician to counsel the patient before PCNL and to estimate the probable SFR. Figure 3 illustrates how the proposed method can translate the preoperatively available data into the SFR.

FIG. 3.

Patient pathway followed in our study.

Advantages over existing systems

The proposed model addresses a shortcoming of the existing systems: its prediction accuracy can be increased by providing a new data set. The models presented in this article have several advantages over existing models: (1) the model can be modified over time by exposing it to a new data set; (2) both stone parameters and patient attributes are considered as input to the model, whereas existing models include only stone parameters; (3) the model predicts not only the stone-free status but also postoperative complications and the need for an ancillary surgical procedure; and (4) it provides greater accuracy of prediction when compared with existing models.¹⁵

Limitations

Although all the PCNL cases were performed or supervised by competent and experienced endourologists at our institution, the surgeon's experience was not considered while predicting the results. We did not measure stone burden or the impact of different energy sources, and the SFR was assessed on KUB and USS rather than on CT scan. Also, as this was a preliminary study, a comparison of the accuracy of the proposed DSS with the existing standard scoring systems for predicting the SFR was not included. The results of the study are promising, but a larger data set is essential in the future to confirm the predictive ability and reliability of the developed systems.

Area of future research

Routine use of AI will require further training with more high-quality data from various centers to increase the prediction accuracy and its reproducibility across all demographics. We expect this method to be of practical use for urologists in the future. Future work could extend it to both prone and supine PCNL techniques, the use of Hounsfield units to measure stone density, and patients' anatomical abnormalities or malformations.

Conclusion

The proposed DSS can play an important role in patient counseling and decision-making, especially while performing PCNL for staghorn stones. As per the developed model, LDA for dimensionality reduction combined with the RF classifier appears to achieve reasonable predictive value and relatively high accuracy. In the future, similar models can also be compared with, as well as combined with, the existing nomograms and scoring systems to assess the increase in predictive precision and accuracy.

Footnotes

Author Disclosure Statement

No competing financial interests exist.

Funding Information

No funding was received for this article.

References

1. Hesse, Brändle, Wilbert, Köhrmann, Alken. Study on the prevalence and incidence of urolithiasis in Germany comparing the years 1979 vs. 2000. Eur Urol, 2003; 44:709–713.
2. Leusmann, Blaschke, Schmandt. Results of 5,035 stone analyses: A contribution to epidemiology of urinary stone disease. Scand J Urol Nephrol, 1990; 24:205–210.
3. Leusmann. Whewellite, weddellite and company: Where do all the strange names originate? BJU Int, 2000; 86:411–413.
4. Kim, Burns, Lingeman, Paterson, McAteer, Williams Jr. Cystine calculi: Correlation of CT-visible structure, CT number, and stone morphology with fragmentation by shock wave lithotripsy. Urol Res, 2007; 35:319–324.
5. Ruhayel, Tepeler, Dabestani, et al. Tract sizes in miniaturized percutaneous nephrolithotomy: A systematic review from the European Association of Urology Urolithiasis Guidelines Panel. Eur Urol, 2017; 72:220–235.
6. Pearle, Lingeman, Leveillee, et al. Prospective randomized trial comparing shock wave lithotripsy and ureteroscopy for lower pole caliceal calculi 1 cm or less. J Urol, 2008; 179(5 Suppl):S69–S73.
7. Elsevier. Campbell-Walsh Urology 11th Edition Review - 2nd Edition. https://www-elsevier-com-s.web.bisu.edu.cn/books/campbell-walsh-urology-11th-edition-review/mcdougal/978-0-323-32830-2. Published November 26, 2015. Accessed March 9, 2021.
8. Jeong, Jung, Cha, et al. Seoul National University renal stone complexity score for predicting stone-free rate after percutaneous nephrolithotomy. PLoS One, 2013; 8:e65888.
9. Smith, Averch, Shahrour, et al. A nephrolithometric nomogram to predict treatment success of percutaneous nephrolithotomy. J Urol, 2013; 190:149–156.
10. Imamura, Kawamura, Sazuka, et al. Development of a nomogram for predicting the stone-free rate after transurethral ureterolithotripsy using semi-rigid ureteroscope. Int J Urol, 2013; 20:616–621.
11. Thomas, Smith, Hegarty, Glass. The Guy's stone score—Grading the complexity of percutaneous nephrolithotomy procedures. Urology, 2011; 78:277–281.
12. Shabaniyan, Parsaei, Aminsharifi, et al. An artificial intelligence-based clinical decision support system for large kidney stone treatment. Australas Phys Eng Sci Med, 2019; 42:771–779.
13. Aminsharifi, Irani, Tayebi, Jafari Kafash, Shabanian, Parsaei. Predicting the postoperative outcome of percutaneous nephrolithotomy with machine learning system: Software validation and comparative analysis with Guy's stone score and the CROES nomogram. J Endourol, 2020; 34:692–699.
14. Aminsharifi, Irani, Pooyesh, et al. Artificial neural network system to predict the postoperative outcome of percutaneous nephrolithotomy [published correction appears in J Endourol 2017 Jun;31(6):621]. J Endourol, 2017; 31:461–467.
15. Kadlec, Ohlander, Hotaling, Hannick, Niederberger, Turk. Nonlinear logistic regression model for outcomes after endourologic procedures: A novel predictor. Urolithiasis, 2014; 42:323–327.
16. Balakrishnama, Ganapathiraju. Linear discriminant analysis—A brief tutorial. Inst Signal Inf Process, 1998; 18:1–8.
17. Mandal, Mukhopadhyay. An improved minimum redundancy maximum relevance approach for feature selection in gene expression data. Procedia Technol, 2013; 10:20–27.
18. Hosmer, Lemeshow, Sturdivant. Applied Logistic Regression. Hoboken, NJ: John Wiley & Sons, 2013.
19. Chang CC. LIBSVM: A Library for Support Vector Machines. https://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf. Accessed March 9, 2021.
20. Saraee, Theodoulidis, Keane, Tjortjis. Using T3, an improved decision tree classifier, for mining stroke-related medical data. Methods Inf Med, 2007; 46:523–529.
21. Nguyen, Wang, Nguyen. Random forest classifier combined with feature selection for breast cancer diagnosis and prognostic. J Biomed Sci Eng, 2013; 06:551–560.
22. Biswas, Cranny, Gupta, et al. Recognizing upper limb movements with wrist worn inertial sensors using k-means clustering classification. Hum Mov Sci, 2015; 40:59–76.
23. Zeng, Martinez. Distribution-balanced stratified cross-validation for accuracy estimation. J Exp Theor Artif Intell, 2000; 12:1–12.
24. Shah, Naik, Somani, Hameed BMZ. Artificial intelligence (AI) in urology-Current use and future directions: An iTRUE study. Turk J Urol, 2020; 46(Supp. 1):S27–S39.
25. Amirmoezzi, Salehi, Parsaei, Kazemi, Torabi Jahromi. A knowledge-based system for brain tumor segmentation using only 3D FLAIR images. Australas Phys Eng Sci Med, 2019; 42:529–540.
26. Amiri, Movahedi, Kazemi, Parsaei. 3D cerebral MR image segmentation using multiple-classifier system. Med Biol Eng Comput, 2017; 55:353–364.
27. Parsaei, Stashuk. SVM-based validation of motor unit potential trains extracted by EMG signal decomposition. IEEE Trans Biomed Eng, 2012; 59:183–191.