Abstract
Purpose:
To validate the output of machine learning-based software as an intelligible interface for predicting multiple outcomes after percutaneous nephrolithotomy (PCNL). We compared the performance of this system with Guy's stone score (GSS) and the Clinical Research Office of the Endourological Society (CROES) nomogram.
Patients and Methods:
Data from 146 adult patients (87 males, 59%) who underwent PCNL at our institute were used. To validate the system, accuracy of the software for predicting each postoperative outcome was compared with the actual outcome. Similarly, preoperative data were analyzed with GSS and CROES nomograms to determine stone-free status as predicted by these nomograms. A receiver operating characteristic (ROC) curve was generated for each scoring system, and the area under the ROC curve (AUC) was calculated and used to assess the predictive performance of all three models.
Results:
Overall stone-free rate was 72.6% (106/146). Forty of 146 patients (27.4%) were scheduled for 42 ancillary procedures (extracorporeal shockwave lithotripsy [SWL] [n = 31] or repeat PCNL [n = 11]) to manage residual renal stones. Overall, the machine learning system predicted the PCNL outcomes with an accuracy ranging between 80% and 95.1%. For predicting the stone-free status, the AUC for the software (0.915) was significantly larger than the AUC for GSS (0.615) or CROES nomograms (0.621) (p < 0.001).
Conclusion:
At the internal institutional level, the machine learning-based software was a promising tool for recording, processing, and predicting outcomes after PCNL. Validation of this system against an external dataset is highly recommended before its widespread application.
Introduction
In the era of minimally invasive surgery, percutaneous nephrolithotomy (PCNL) remains the standard of care for managing large renal calculi. 1,2 Along with the widespread use of PCNL, several groups have recently proposed and tested scoring systems to predict stone-free status after PCNL. 3–5 Predictive models are potentially useful to facilitate clinical decision making and patient counseling. To predict the stone-free rate (SFR) after PCNL, Thomas and colleagues were the first to introduce Guy's stone score (GSS), a simple qualitative grading scale (grade I–IV) based on stone shape and configuration, as well as the presence or absence of renal or skeletal anomalies. 3 This approach further evolved when Smith and colleagues designed a quantitative scoring nomogram: the Clinical Research Office of the Endourological Society (CROES) nomogram. 4 To predict SFR after PCNL, several stone characteristics (burden, number, location, multiple, staghorn) together with institute-level case volume are included in this nomogram. A notable feature of the CROES method is that it considers the weighting of each parameter.
Recently, our group designed a machine learning-based system to predict post-PCNL outcomes such as SFR and the need for ancillary procedures. 6 More than 20 preoperative and intraoperative variables and their relative weights are involved in this intelligent network. Preliminary work indicated that this system achieved promising accuracy (81.0%–98.2%) in predicting stone-free status and the need for blood transfusion or ancillary procedures. 6 Based on machine learning algorithms, we then designed software with a user-friendly interface (Fig. 1).

Figure 1. The digital interface of the machine learning-based software consists of different panels for importing preoperative and intraoperative data.
In this study, we sought to validate the output of this machine learning-based toolkit as an intelligible interface able to predict post-PCNL outcomes. We then compared the performance and accuracy of this system with GSS grading and the CROES nomogram, two widely used prognostic tools for post-PCNL stone-free status. To our knowledge, this is the first comparative analysis to contrast the performance of an artificial intelligence system in predicting PCNL outcomes with that of currently available approaches.
Patients and Methods
Ethics
Approval for this study was obtained from our institutional review board. All patients were informed about the aims of this study and provided their informed consent to take part. All information imported into the software was de-identified and coded. Two-factor authentication was used for data encryption. Only authorized users had access to the information recorded in our system.
Patients
Data for all adult patients who underwent PCNL between September 2016 and November 2017 at our institute were included. All procedures were done by two fully qualified, competent endourologists (A.A., D.I.). Patients with bilateral PCNL or chronic kidney disease were excluded. All patients underwent CT scanning as part of the preoperative imaging studies. Before all procedures, a normal preoperative coagulation profile and a negative urine culture were verified, and a single dose of intravenous ceftriaxone was administered at the time of anesthesia. All procedures consisted of standard fluoroscopy-guided PCNL with the patient in the prone position. 7 A pneumatic lithotripter was used for stone fragmentation, and a 14F nephrostomy tube was placed in the collecting system at the end of surgery.
Postoperative stone-free status was assessed by CT scan, and it was defined as the absence of residual stones larger than 4 mm. 8 Major surgical complications and the need for ancillary procedures to manage residual stones were also recorded. The preoperative, intraoperative, and postoperative variables used in this study are summarized in Supplementary Table S1.
Design and validation of machine learning-based software
The machine learning technique used for data analysis, classification, and regression as well as for identifying the connections between input and output variables is based on the support vector machine (SVM) model. 6,9,10 SVM networks have an efficient training phase and are accurate, especially for clean datasets with well-defined input and output variables. 9,10 Figure 2A–C illustrates this model when two or more variables are to be classified and analyzed. The software was implemented in MATLAB (MathWorks, Natick, MA) and its user interface is shown in Figure 1. The user can easily import the preoperative values into the appropriate panels, and after computation, the predicted outcomes can be extracted from the output panel. The software can also be used as a registry to record and retrieve data in different formats. Technical details of the development of the machine learning tools used in the software were described earlier. 11

Figure 2. As a supervised machine learning algorithm, SVM is based on finding hyperplanes to classify datasets.
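The hyperplane idea behind SVM classification can be sketched in a few lines of Python. This is an illustration under stated assumptions only, not the MATLAB implementation used in the study (whose technical details are given in reference 11): it assumes a linear kernel, two hypothetical scaled input variables, made-up toy data, and arbitrarily chosen hyperparameters (`lam`, `lr`, `epochs`). The function names `train_linear_svm` and `predict` are likewise hypothetical.

```python
# Illustrative only: a tiny linear SVM trained by subgradient descent on the
# L2-regularized hinge loss. Data and hyperparameters are assumptions, not
# values from the study or its software.

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=1000):
    """Return (w, b) of a hyperplane w.x + b = 0; labels in y must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Sample lies inside the margin: the hinge term contributes.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Sample is safely classified: only the regularizer contributes.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Classify x by the side of the hyperplane on which it falls."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical toy data: two scaled input variables per case.
X = [[0.2, 0.1], [0.4, 0.3], [0.3, 0.5], [1.6, 1.8], [1.9, 1.5], [1.7, 2.0]]
y = [1, 1, 1, -1, -1, -1]  # +1 = stone free, -1 = residual stones (illustrative)

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

In practice, a nonlinear kernel and many more input variables would probably be needed, and a well-tested library implementation would be preferable to a hand-written optimizer.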
Clinical application and validation of the software
Preoperative data for 146 adult patients were consecutively imported into the software, and its output was extracted. To validate the system, the accuracy of the software for predicting each postoperative outcome was compared with the actual outcome.
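The comparison of predicted and actual outcomes can be sketched as follows. This minimal example assumes that "accuracy" means the proportion of patients whose predicted binary outcome matches the observed one; the labels are invented for illustration and are not study data, and the function name `accuracy` is hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of cases in which the predicted label equals the observed one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical labels for ten patients (1 = stone free, 0 = residual stones).
predicted_stone_free = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
actual_stone_free = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1]

stone_free_accuracy = accuracy(predicted_stone_free, actual_stone_free)
print(f"Accuracy: {stone_free_accuracy:.1%}")  # prints "Accuracy: 80.0%"
```

The same function could be applied separately to each outcome (for example, need for blood transfusion or ancillary procedures) to obtain one accuracy value per endpoint.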
Similarly, preoperative data were analyzed with GSS grades and CROES nomogram, and stone-free status was predicted by these nomograms. 3,4 The predictive performance of the nomograms was then calculated. A receiver operating characteristic (ROC) curve was generated for each scoring system, and the area under the ROC curve (AUC) was calculated and used to assess the predictive accuracy of nomograms versus the machine learning software. As expected, the GSS and CROES nomograms can directly predict postoperative stone-free status, but they are unable to directly capture other post-PCNL outcomes.
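As a rough illustration of how an AUC can be obtained for different scoring systems, the sketch below computes it as the probability that a randomly chosen stone-free patient receives a higher score than a randomly chosen patient with residual stones (the Mann-Whitney formulation). All data are invented, the function name `roc_auc` is hypothetical, and the sketch does not reproduce the statistical test used in the study to compare AUCs, which is not specified here.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one.

    Equals the Mann-Whitney U statistic divided by (n_pos * n_neg);
    a tie between a positive and a negative score counts as one half.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes: 1 = stone free, 0 = residual stones.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical predicted probabilities of stone-free status from a model.
model_probability = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
# Hypothetical ordinal grades (1-4); negated so that a higher score
# corresponds to a higher expected chance of being stone free.
grade = [1, 2, 2, 3, 2, 3, 4, 4]
grade_score = [-g for g in grade]

auc_model = roc_auc(model_probability, labels)
auc_grade = roc_auc(grade_score, labels)
print(auc_model, auc_grade)
```

Because an ordinal grade takes only a few distinct values, ties are frequent, which is why the half-credit rule for ties matters when grading systems are compared with continuous model outputs.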
Participating surgeons were blinded to data collection and input into the nomograms and software. Each predictive model was used in its own Excel table, and separate teams processed each model.
Results
During the study period, 146 patients (87 males, 59%) were enrolled. Mean age was 49.3 ± 12.6 years, and mean stone burden was 451.2 ± 427.8 mm². The demographic, preoperative, and intraoperative data for this cohort are summarized in Table 1. Mean hospital stay was 2.87 ± 0.69 days, during which the presence of postoperative residual stones was determined with noncontrast CT scans. All patients were discharged after removal of their nephrostomy tube. Table 2 shows the actual postoperative data for these patients. Overall SFR was 72.6% (106/146). Prolonged urine leakage requiring ureteroscopy (URS) and Double-J stent placement occurred in 12 patients (8.2%). Postoperative blood transfusion due to significant blood loss was required in 11 patients (7.5%). Forty of 146 patients (27.4%) were scheduled for 42 ancillary procedures (extracorporeal shockwave lithotripsy [SWL] [n = 31] or repeat PCNL [n = 11]) to manage residual renal stones (Table 2).
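The reported counts and percentages can be cross-checked with simple arithmetic. The short sketch below uses only figures stated in the text and in the Table 2 footnote (two patients received both SWL and repeat PCNL); the variable names are hypothetical.

```python
total_patients = 146
stone_free = 106
swl_procedures = 31
repeat_pcnl_procedures = 11
patients_with_both = 2  # per the table footnote: SWL plus subsequent PCNL

ancillary_procedures = swl_procedures + repeat_pcnl_procedures
patients_with_ancillary = ancillary_procedures - patients_with_both

stone_free_rate = 100 * stone_free / total_patients
ancillary_rate = 100 * patients_with_ancillary / total_patients

print(ancillary_procedures, patients_with_ancillary)
print(f"{stone_free_rate:.1f}% stone free, {ancillary_rate:.1f}% ancillary")
```

The 42 procedures therefore correspond to 40 patients, and the 106 stone-free patients plus these 40 account for the full cohort of 146.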
Table 1. Demographic, Preoperative, and Intraoperative Characteristics of Patients
Stone burden = length × width × 0.78.
Largest diameter.
Defined as duration of X-ray exposure from insertion of the access needle to the start of nephroscopy.
From insertion of needle access to insertion of final nephrostomy.
CROES = Clinical Research Office of the Endourological Society.
Table 2. Postoperative Outcomes in the Cohort of Patients with Percutaneous Nephrolithotomy
Stone burden = length × width × 0.78.
Two patients received both procedures (SWL+subsequent PCNL) for management of their residual stones.
PCNL = percutaneous nephrolithotomy; SWL = extracorporeal shockwave lithotripsy; URS = ureteroscopy.
After preoperative data were analyzed and PCNL outcomes were predicted by the software, we compared the software results with the actual outcomes to calculate the performance of the software for predicting each variable (Table 3). In general, the software predicted the PCNL outcomes with an accuracy ranging between 80% and 95.1%. When post-PCNL stone-free status was evaluated according to GSS grades and CROES nomogram score, higher Guy's stone grades and lower CROES nomogram scores were significantly associated with a lower SFR (p = 0.01 and p = 0.03, respectively). Figure 3 shows the stone-free status according to each classification system. When ROC curves were plotted for each predictive model for stone-free status (Fig. 4), the AUC for the machine learning software (0.915) was significantly larger than the AUC for GSS (0.615) or CROES (0.621) nomograms (p < 0.001). The machine learning system recognized stone burden and the presence of staghorn or multiple renal stones as the most highly weighted preoperative factors affecting the post-PCNL SFR.

Figure 3. The stone-free rate in each subgroup of Guy's stone score grades and the CROES nomogram. CROES = Clinical Research Office of the Endourological Society.

Figure 4. ROC curve for stone-free status. The AUC for the machine learning software (0.915) was significantly larger than the AUC for Guy's gradings (0.615) and CROES (0.621) nomograms. AUC = area under the ROC curve; ROC = receiver operating characteristic.
Table 3. Performance of the Machine Learning-Based Software in Predicting Stone-Free Status, Need for Blood Transfusion, and Different Ancillary Procedures in the Cohort
To manage urine leakage.
Discussion
In the era of minimally invasive stone surgery, PCNL has been considered the standard of care for managing large renal stones. According to a global study conducted by CROES, the overall stone-free and complication rates after PCNL were 75.5% and 20.5%, respectively, with stone burden and morphometry as the main predictors of SFR. 12 In recent years, several scoring systems have been developed for the prognostic evaluation of PCNL. These predictive models are important for patient counseling, patient selection (e.g., risk adjustment and referral to tertiary centers), and the evaluation of the quality of care and treatment efficacy. The GSS grades and CROES nomogram were originally reported in 2011 and 2013, respectively. 3,4
As a simple, subjective, and reproducible scoring system, GSS can be used to categorize patients into four grades based on their stone burden, configuration, and the presence of kidney or skeletal anomalies. 3,13 Several case series validated the performance of this system to predict post-PCNL stone-free status (AUC = 0.69–0.79). 14–16 Yet despite its efficacy, GSS was originally developed from qualitative image analyses based on expert opinion, not from data-driven analyses. Moreover, a number of preoperative and intraoperative variables cannot be processed by this system.
On the other hand, the CROES nomogram is a data-driven model based on a global cohort of patients from 96 centers worldwide. 4,12 This system applies multiple regression analysis to several preoperative variables, taking into account the relative weight of each input, to predict post-PCNL SFR. 4,12,13 Since this nomogram is based on global data, understandably, several groups subsequently validated the CROES nomogram in light of its acceptable performance in predicting SFR (AUC = 0.641–0.76). 4,17–20 However, as the developers acknowledged, 4,13 several important preoperative variables are not considered in this model. Moreover, the CROES nomogram is admittedly complex and its application may not be practical at high-volume or nonacademic centers. 13,20
Several studies have compared the predictive performance of GSS as a quick qualitative measure versus the CROES nomogram as a complex quantitative model. 20 In general, these studies showed a comparable performance of GSS grading (AUC = 0.629–0.821) versus the CROES nomogram (AUC = 0.627–0.820) in predicting post-PCNL SFR. 21–27 In a systematic review, Withington and colleagues showed that although the validity of GSS is supported by a marginally higher quality of evidence compared with other nomograms, in general, the performance of all systems is similar for stone-free status. However, their questionable efficacy for predicting post-PCNL adverse events called attention to the need for further improvement. 28
The use of machine learning approaches is advancing in urology, particularly in the fields of uro-oncology and urolithiasis. Appropriately trained machine learning systems can be exposed to new inputs, which can endow them with the capacity for continuous learning and improve their ability to recognize patterns and associations between variables. 29 In 2017, we reported the accuracy of an artificial neural network (ANN) algorithm (ranging between 81.0% and 98.2%, AUC = 0.861) in predicting SFR, the need for post-PCNL ancillary procedures, and the need for blood transfusion. 6 Supervised learning algorithms have previously been used to adjust weight vectors and classifiers. 6 In this study, we set out to validate the accuracy of our adequately trained SVM-based software in processing prospective data from a new cohort. After about 2 years of applying these intelligent systems in our practice, we consistently observed high performance in predicting stone-free status, both in the current cohort (AUC = 0.915) and in our initial report (AUC = 0.861). 6 Not only was the predictive performance of the machine learning system better than that of the GSS grading and CROES systems, but its intelligible digital interface also facilitates application in almost any facility. Further, the machine learning software is able to simultaneously process and report multiple endpoints. The system can also function as registry software to document a variety of current and predicted outputs. Previously, artificial intelligence systems were found to show similar or better efficacy than statistical data mining models in the evaluation of stone-free status after SWL. 30,31 Nevertheless, additional comparative studies are needed to evaluate the prognostic accuracy of these modern predictive approaches compared with statistical models in the field of PCNL.
The ability of intelligent systems to simultaneously predict the need for ancillary procedures and/or blood transfusion with an accuracy of ≥80% is also noteworthy. Bleeding is the most significant complication of PCNL. All patients in this study had standard 26F access to the collecting system. It has been shown that reducing the tract size may diminish the risk of major bleeding while maintaining equal efficacy of stone clearance. 32
Moreover, patients undergoing miniperc or microperc PCNL may also be potential candidates for tubeless and ambulatory PCNL. 32 With widespread implementation of machine learning systems in these novel concepts of PCNL, counseling patients about their postoperative morbidities and care pathways might become easier and more objective. The same is true if a digital intelligent system can predict the need for postoperative ancillary procedures and thus provide an advance estimate of the cost-effectiveness of the procedure. 32,33 In the era of value-based health care, this information may be helpful not only for patients and caregivers but also for administrators seeking to balance costs against surgical outcomes. The benefit of machine learning systems in improving the efficiency of the health care system and optimizing the value of care remains a hot topic for further cost analysis studies.
This article presents, to our knowledge, the first comparative study of a machine learning digital application versus well-known nomograms currently used to predict PCNL outcomes. However, the limitations of our study need to be noted. When machine learning systems process data, they may find and report associations that are not necessarily meaningful or important for clinical practice. Therefore, frequent provider supervision and system evaluation are essential. We are also aware that this software was designed based on a “classical” PCNL procedure in adults. We used the conventional cutoff of 4 mm to define significant postprocedural residual fragments, which may not be optimal for calculating the “true” SFR, 33 and the outcome of patients with residual fragments <4 mm was not captured.
As the next step, we will evaluate how this system can be used for “decision making” at our institute. Because of differences between institutes in procedural details, case volumes, and surgical experience, this preliminary step needs to be externally investigated and validated. Wider validation is essential because reporting from a single center may introduce bias.
Noteworthy advantages of machine learning-based software are that it can be easily updated, and more input and output variables and customized features can be added. Therefore, other clinical applications (e.g., patients amenable for ambulatory PCNL) may be addressed. Whether machine learning-based software is potentially applicable as a universal integrated decision-making tool that can be adopted by other centers and for other surgical procedures for urolithiasis remains a hot topic for future research.
Footnotes
Acknowledgment
The authors thank K. Shashok (AuthorAID in the Eastern Mediterranean) for improving the use of English in the article.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This article is based on the theses by S.T. and T.J.K. for the specialty degree in urology awarded by Shiraz University of Medical Sciences (grants no. 95-01-01-13978, 95-01-01-13254).
Supplementary Material
Supplementary Table S1
