Abstract
Objective:
To evaluate predictive capability and clinical applicability of the current nephrolithometric scoring systems of S.T.O.N.E. score, Guy's scoring system (GSS), CROES (Clinical Research Office of the Endourological Society) nomogram, and S-ReSC (Seoul National University Renal Stone Complexity) score for percutaneous nephrolithotomy (PCNL) outcomes in the same cohort in a prospective study.
Methods:
Consecutive patients undergoing PCNL between 2015 and 2018 were included, and the four scores were calculated in the same cohort. Stone-free status (SFS), complications, operative time (OT), estimated blood loss (EBL), fluoroscopy time, and length of hospital stay were investigated. Receiver operating characteristic (ROC) curves were generated to assess predictive accuracy, and regression analysis was performed to identify predictors of SFS.
Results:
In all, 162 PCNLs were accomplished and analyzed. Overall, SFS was 75.9% and complication rate was 30.9%. The mean acquisition time of scores was 52.9 ± 0.5 seconds for GSS, 105.1 ± 0.3 seconds for S.T.O.N.E. score, 224.4 ± 3.1 seconds for CROES, and 102.6 ± 3.5 seconds for S-ReSC score. SFS had the best association with CROES grade. Clavien grade had the best association with S.T.O.N.E. score. Moreover, EBL and OT had the best association with S-ReSC score. All scores had comparable predictive accuracy on ROC curves regarding SFS. Stone essence and tract length did not differ significantly between stone-free cases and cases with residual stones. Number of involved calices, single vs multiple stones, and renal pelvic obstruction were significant predictors of SFS in regression analysis.
Conclusion:
The four scoring systems had comparable predictive accuracy for SFS. However, S.T.O.N.E. and S-ReSC scores were easily applicable and provided better association with EBL and OT compared with the GSS score. Number of involved calices, stone multiplicity, and renal pelvic obstruction were significant predictors of SFS; hence, further studies are needed to develop a universally accepted scoring system that addresses the reported shortcomings of the currently used scores.
Introduction
The merit of percutaneous nephrolithotomy (PCNL), as a minimally invasive technique to manage large renal stones, lies in its successful outcome and acceptable complication rate. 1 However, efficient prediction of success and postoperative complications is still a challenge. 2 Many scoring systems have been devised to predict PCNL outcomes. Guy's scoring system (GSS), 3 the Clinical Research Office of the Endourological Society (CROES) nomogram, 4 S.T.O.N.E. nephrolithometry, 5 and Seoul National University Renal Stone Complexity (S-ReSC) score 6 are the most popular scoring systems. GSS divides PCNL complexity into four grades (grades I, II, III, and IV) depending on the patient's imaging characteristics. 3 The CROES nephrolithometric nomogram uses surgeon and patient factors, including the case volume per year and preoperative radiologic data, 4 where each parameter is assigned a numeric value. It was further classified into four grades (grade I: 0–100, grade II: 101–150, grade III: 151–200, and grade IV: 201–350). 7 The S.T.O.N.E. score is calculated from five variables: stone size, tract length (skin-to-stone distance), degree of obstruction, number of involved calices, and stone essence (density). 5 The S-ReSC score was generated based on expert opinions about what affects the outcome of PCNL regarding stone-free status (SFS) and complications; the authors proposed that the complex distribution of stones in the kidney is the most important factor affecting PCNL success. They devised a 9-point system, with 1 to 9 assigned to specific pelvic and caliceal locations. The score is obtained by summing the locations involved, and the authors also created scoring tiers (low = 1–2 points, medium = 3–4 points, and high risk = 5–9 points), which were also predictive of SFS. 6 These nomograms predict surgical success and postoperative complications, which is important for outcome reporting and patient counseling. 3 –6 To date, prospective evaluations comparing these four standardized nomograms are lacking. 8 In this study, we compared the acquisition times and predictive accuracy of the four nomograms and investigated the possible predictors of SFS in the same cohort.
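The grade and tier cut-offs quoted above for the CROES nomogram and the S-ReSC score can be written as simple lookups. The following is a minimal sketch, not the original authors' implementation: the function names are hypothetical, and treating non-integer CROES values between two bands (for example 100.5) as belonging to the higher band is an assumption made here for illustration.

```python
def croes_grade(score: float) -> str:
    """Map a CROES nomogram score to grade I-IV using the bands quoted in the text.

    Bands: I = 0-100, II = 101-150, III = 151-200, IV = 201-350.
    Assumption: values falling between two integer bands go to the higher band.
    """
    if not 0 <= score <= 350:
        raise ValueError("CROES score is expected to lie between 0 and 350")
    if score <= 100:
        return "I"
    if score <= 150:
        return "II"
    if score <= 200:
        return "III"
    return "IV"


def sresc_tier(score: int) -> str:
    """Map an S-ReSC score (1-9) to the tiers quoted in the text.

    Tiers: low = 1-2 points, medium = 3-4 points, high = 5-9 points.
    """
    if not 1 <= score <= 9:
        raise ValueError("S-ReSC score is expected to lie between 1 and 9")
    if score <= 2:
        return "low"
    if score <= 4:
        return "medium"
    return "high"


if __name__ == "__main__":
    print(croes_grade(130), sresc_tier(4))
```

Keeping the cut-offs in one place makes it easier to apply the same grouping consistently when the scores are compared across a cohort.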
Methods
This prospective study was carried out from May 2015 to January 2018 after obtaining appropriate written informed consent and in accordance with our institutional Research Ethics Committee. Consecutive cases undergoing PCNL for renal stones were included. Exclusion criteria were age <18 years, concomitant ipsilateral ureteral calculi, previous preoperative nephrostomy tube or stent placement, and admission for second-look PCNL.
Measurements
Patients' demographics and perioperative data were collected prospectively for all cases, including age, gender, body mass index, surgical and medical history, and all data necessary for calculation of scores. Preoperative noncontrast CT (NCCT) was performed, and all scores were calculated as previously described, 3 –6 with the acquisition time of each score recorded. GSS and CROES nomograms were used in four groups each, S.T.O.N.E. score in three categories (5–6, 7–8, and 9–13), 9 and S-ReSC in three scoring tiers (low = 1–2 points, medium = 3–4 points, and high risk = 5–9 points). 6 Difficulties during calculation were resolved by agreement between authors. All procedures were performed under general anesthesia and in the prone position. Plain radiographs of the kidney, ureter, and bladder and abdominopelvic ultrasound were carried out on postoperative day 1, and a second-look PCNL was performed when required through the same or a new tract after 72 hours. Postoperative complications were reported according to the modified Clavien system. 10 Other data collected included stone burden in mm² (calculated using the ellipsoid formula: length × width × π × 0.25, where π = 3.14), 11 tract length, operative time (OT), estimated blood loss (EBL), fluoroscopy time (FT), and length of hospital stay (LOS). Postoperatively, follow-up NCCT was carried out within 3 months for all patients, and SFS was defined as absence of residual fragments ≥4 mm.
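The stone burden formula and the three S.T.O.N.E. categories described above can be illustrated with a short sketch. The function names and the example dimensions are hypothetical and are not taken from the study data; π is fixed at 3.14 only because that is the value stated in the text.

```python
PI = 3.14  # value of pi as stated in the text


def stone_burden_mm2(length_mm: float, width_mm: float) -> float:
    """Stone burden in mm^2 using the formula quoted in the text:
    length x width x pi x 0.25."""
    if length_mm <= 0 or width_mm <= 0:
        raise ValueError("stone dimensions must be positive")
    return length_mm * width_mm * PI * 0.25


def stone_score_category(score: int) -> str:
    """Group a S.T.O.N.E. score into the three categories used in the text:
    5-6, 7-8, and 9-13."""
    if not 5 <= score <= 13:
        raise ValueError("S.T.O.N.E. score is expected to lie between 5 and 13")
    if score <= 6:
        return "5-6"
    if score <= 8:
        return "7-8"
    return "9-13"


if __name__ == "__main__":
    # Hypothetical stone measuring 20 mm x 10 mm on NCCT.
    print(stone_burden_mm2(20, 10))
    print(stone_score_category(7))
```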
Statistical analysis
An a priori sample size was computed using G*Power software for a chi-square test to detect the difference between two groups regarding SFS. Assuming a medium effect size of 0.3, an α error probability of 0.05, and a power of 95%, a total sample size of 145 patients was required for the study. The Statistical Package for the Social Sciences for Windows (IBM Corp., Armonk, NY) version 20 was used for statistical analysis. Normality of data was tested using the Kolmogorov–Smirnov and Shapiro–Wilk tests. Continuous variables were presented as median and interquartile range when data were not normally distributed. Categorical variables were presented as numbers and percentages. Comparisons were performed using the Mann–Whitney U-test, chi-square test, or Fisher's exact test as appropriate. One-way analysis of variance with Bonferroni post hoc testing was used for pairwise comparison of the acquisition times of the scores. Correlations between EBL, OT, FT, and LOS and scores were assessed using Pearson's or Spearman's correlation coefficients. Receiver operating characteristic (ROC) curves were generated to assess the predictive accuracy of each scoring system for SFS. Binary logistic regression analysis using the forward stepwise (likelihood ratio) method was run, entering factors that showed significance in univariate analysis one at a time. A value of p < 0.05 was considered statistically significant, and all statistical tests were two sided.
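To make the ROC-based comparison concrete, the area under the curve (AUC) can be computed as the probability that a randomly chosen case with the outcome has a higher score than a randomly chosen case without it, with ties counted as one half. The sketch below uses that rank-based definition in plain Python; it is not the SPSS procedure used in the study, the function name is hypothetical, and the toy data are invented for illustration.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], outcome: Sequence[int]) -> float:
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5."""
    positives = [s for s, y in zip(scores, outcome) if y == 1]
    negatives = [s for s, y in zip(scores, outcome) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Invented complexity scores; outcome 1 = residual stones, 0 = stone free.
    toy_scores = [5, 6, 7, 9, 10, 12]
    toy_residual = [0, 0, 0, 1, 0, 1]
    print(roc_auc(toy_scores, toy_residual))
```

For a score in which higher values indicate a better chance of being stone free, the outcome coding would be reversed (1 = stone free) so that an AUC above 0.5 still indicates useful discrimination.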
Results
Of 166 cases performed in the study period, a total of 162 PCNLs were accomplished and were available for analysis. Baseline demographics and perioperative characteristics are described in Table 1. The mean ± standard error of acquisition time was 52.9 ± 0.5 seconds for GSS, 105.1 ± 0.3 seconds for S.T.O.N.E. score, 224.4 ± 3.1 seconds for CROES score, and 102.6 ± 3.5 seconds for S-ReSC score. The acquisition time of GSS differed significantly from that of S.T.O.N.E. score, CROES system, and S-ReSC score (p < 0.001 for each). Furthermore, there was a significant difference between S.T.O.N.E. and S-ReSC scores vs CROES (p < 0.001 for each), whereas the difference between S-ReSC and S.T.O.N.E. scores was not significant.
Table 1. Baseline Patients' Demographics and Perioperative Characteristics Regarding Stone-Free Status
Data are given as median and IQR or n (%).
Clavien grade I, fever with no antibiotics—urine leakage and watchful waiting. Grade II, Blood transfusion—fever with antibiotics. Grade IIIa, renal pelvic perforation and prolonged nephrostomy. Grade IIIb, bleeding necessitating angioembolization.
Mann–Whitney U-test.
Fisher's exact test.
Chi-square test.
BMI = body mass index; CROES = Clinical Research Office of the Endourological Society; IQR = interquartile range; S-ReSC = Seoul National University Renal Stone Complexity.
Overall SFS was 75.9% and complication rate was 30.9%. Univariate analysis shows the significant factors associated with SFS (Table 1). All scoring systems were significantly associated with SFS (p < 0.001 for each) and postoperative complications (p < 0.001) (Table 2).
Table 2. Stone-Free Status and Complications According to S.T.O.N.E., Guy's, CROES, and S-ReSC Nephrolithometry Scoring Systems
SFS had significant correlation with S.T.O.N.E. score, GSS, S-ReSC score, and CROES grade (rho = 0.511, 0.441, 0.543, and −0.554, respectively, with p < 0.001 for each). Clavien grade had modest correlation with S.T.O.N.E. categories, GSS, S-ReSC score, and CROES grade (rho = 0.469, 0.449, 0.448, and −0.352, respectively, with p < 0.001 for each). The S.T.O.N.E. categories had significant correlation with EBL, OT, and FT (rho = 0.506, 0.603, and 0.576, respectively, with p < 0.001 for each). The GSS had significant correlation with EBL, OT, and FT (rho = 0.567, 0.678, and 0.615, respectively, with p < 0.001 for each). The S-ReSC score had significant correlation with EBL, OT, and FT (rho = 0.693, 0.764, and 0.534, respectively, with p < 0.001 for each). The CROES grade, in contrast, had significant negative correlation with EBL, OT, and FT (rho = −0.598, −0.672, and −0.477, respectively, with p < 0.001 for each). The best correlation for SFS was recorded with CROES grade, and the best association with complications, using Clavien grade, was found with S.T.O.N.E. score. Furthermore, the best association with EBL and OT was observed with S-ReSC score, and the best correlation with FT was observed with GSS. However, LOS had no significant correlation with any of the nephrolithometric scoring systems.
The ROC curves were generated for the four scores determining the predictive accuracy in detecting SFS (Fig. 1a, b). Area under the curve (AUC) for GSS was 0.785, S.T.O.N.E. score was 0.838, S-ReSC score was 0.860, and CROES nomogram was 0.858, with asymptotic significance of <0.001 for each. The four scores had comparable predictive accuracy for SFS.

Factors that were found significantly associated with SFS in univariate analysis (Table 1) were entered in the logistic regression analysis using forward stepwise (likelihood ratio) method entering one predictor at a time.
These factors included continuous variables (stone burden, and number of involved calices) and categorical variables (staghorn stone where 1 = other vs staghorn, renal pelvic obstruction where 1 = no/mild vs moderate/severe, and number of stones where 1 = single stone vs multiple).
The analysis excluded stone burden and staghorn stone from the regression equation, whereas the other factors were significant predictors. The resulting model correctly classified SFS in 85.6% of cases. The number of involved calices had a significant negative impact on SFS, whereas no/mild renal pelvic obstruction and presence of a single stone had a significant positive impact on SFS (Table 3). Cases with more involved calices were less likely to be stone free (odds of 0.136). Single stones were 9.6-fold more likely to be stone free than multiple stones, and cases with no or mild renal pelvic obstruction were 4-fold more likely to be stone free than those with moderate to severe renal pelvic obstruction.
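The figures reported above (0.136, 9.6, and 4) can be read as odds ratios from the logistic regression model; that reading is an assumption made here for illustration. The sketch below shows how an odds ratio relates to a regression coefficient and how it shifts a probability. The function names are hypothetical, and the 0.5 baseline probability is an arbitrary illustrative value, not a result of the study.

```python
import math


def coefficient_from_odds_ratio(odds_ratio: float) -> float:
    """Logistic regression coefficient (log-odds) corresponding to an odds ratio."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio)


def probability_after_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a baseline probability and return the new probability."""
    if not 0 < baseline_prob < 1:
        raise ValueError("baseline probability must lie strictly between 0 and 1")
    baseline_odds = baseline_prob / (1 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


if __name__ == "__main__":
    baseline = 0.5  # arbitrary illustrative baseline, not from the study
    for label, odds_ratio in [
        ("number of involved calices", 0.136),
        ("single vs multiple stones", 9.6),
        ("no/mild vs moderate/severe obstruction", 4.0),
    ]:
        beta = coefficient_from_odds_ratio(odds_ratio)
        prob = probability_after_odds_ratio(baseline, odds_ratio)
        print(f"{label}: coefficient {beta:.3f}, probability {prob:.3f}")
```

An odds ratio below 1 corresponds to a negative coefficient and lowers the probability of being stone free, whereas an odds ratio above 1 raises it; the change in probability depends on the baseline, so a 4-fold change in odds is not a 4-fold change in probability.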
Table 3. Logistic Regression Variables for Stone-Free Status
CI = confidence interval.
Discussion
Tools were developed to provide a model for preoperative patient counseling and a standardized method for prediction of SFS and postoperative complications after PCNL. 3 –6 A perfect nephrolithometric scoring system should have a high capability to predict SFS and complications, be easily applicable in everyday practice, and yield reproducible outcomes with negligible or very little subjectivity. Yet, there is no universally accepted scoring system that covers the deficiencies in current scoring systems. The aim of this prospective study was to compare the accuracy and clinical applicability of these scoring systems in a single cohort of patients as a step toward refining these tools, which might help the future creation of a universal and widely accepted scoring system. In this study, the four scoring systems discriminated SFS well in univariate analysis and were of comparable accuracy on ROC curves. The CROES nomogram had the best association with SFS, the S.T.O.N.E. score had the best association with Clavien grade, and GSS had the best association with FT. The S-ReSC score had the best association with EBL and OT. None of the four systems correlated with LOS. Data acquisition time was significantly shorter for GSS in comparison with the other three scores (p < 0.001) (Table 1). Although the S.T.O.N.E. and S-ReSC scores took slightly longer to calculate, they had better association with other outcomes, such as Clavien grade, EBL, and OT, compared with GSS.
In external validation and comparative studies of these scoring systems, no score has proven superior to the others. Many studies compared either two or three scores and their capability to predict SFS and complications after PCNL. In this regard, Labadie and colleagues compared GSS, S.T.O.N.E. score, and CROES nomogram in a retrospective study including 246 patients and concluded that these scoring systems and the stone burden equally predicted SFS. The GSS and S.T.O.N.E. scores were associated with EBL and LOS, whereas the CROES nomogram was not. 11 In another recent study, Vicentini and colleagues calculated GSS, S.T.O.N.E. score, and CROES nomogram in 48 cases and found that the three nomograms showed similar ability to predict success of PCNL; however, the GSS was the quickest to apply, 12 which is congruent with the current results. In a retrospective study where GSS and S.T.O.N.E. score were compared in 185 PCNLs, the authors found that both had comparable accuracy in predicting SFS and were associated with EBL, OT, and LOS but not with complications. 13 In a prospective study comparing GSS, S.T.O.N.E. score, and CROES nomogram in 48 kidney units, the three scores equally predicted SFS but had only a modest association with complications, and standardization of scores was deemed necessary. 14 In a broader retrospective study in 160 patients with staghorn stones, SFS was 59%; the authors found that the four scores had potential association with SFS, OT, FT, and LOS, whereas only GSS and S-ReSC score were associated with complications. 15 These results differ from those of this study, possibly owing to the inclusion of staghorn stones only and the retrospective design. In a recent retrospective cross-sectional study in 298 cases with 300 procedures, comparing GSS, S.T.O.N.E., and CROES nomogram, the authors found that the CROES system had the highest predictive value, specificity, and sensitivity, which is in part consistent with the results of this study. 16 In another study in 248 obese patients, comparing the three scores, the authors found that GSS and CROES system showed good predictive accuracy, whereas S.T.O.N.E. did not discriminate SFS and none was associated with complications. 17 The results of this study and the contradictory findings in the literature further confirm that no single scoring system is perfect. The superiority of the CROES nephrolithometric nomogram reported in many former studies 11 –17 could be because the nomogram uses patient factors and imaging findings and hence provides a more comprehensive assessment. Yet the time needed for its calculation could hinder its routine use in daily practice.
In univariate analysis, stone burden, number of involved calices, presence of staghorn stone, renal pelvic obstruction, and multiplicity of stones were significantly associated with SFS, whereas stone essence and tract length were not significant factors in the present series. Regression analysis excluded stone burden and staghorn stone from the regression equation as insignificant predictors, whereas number of involved calices, single vs multiple stones, and absence of renal pelvic obstruction were significant predictors of SFS. These results need to be refined and studied in a larger sample to construct a valid scoring system. From the obtained results, we believe that S.T.O.N.E. score and S-ReSC score were more easily applicable. Preoperative CT is routinely performed; hence, S.T.O.N.E. score can be easily applied. It was created based on a review of the literature about factors that impact PCNL outcome, whereas the S-ReSC score was based on expert opinion. Singla et al. 14 omitted tract length and stone essence (T and E) from the score and calculated an SON score with an AUC consistent with that of the S.T.O.N.E. score. The three significant predictors in our model (stone multiplicity, renal pelvic obstruction, and number of involved calices) can be easily determined from preoperative CT. Although this is the first prospective study to compare the four nephrolithometry scoring systems with a standardized definition of SFS using preoperative NCCT, some limitations should be acknowledged. The scores were calculated by only one urologist; however, these are validated scoring systems, and there was no need to assess inter-rater agreement. Another limitation is that the cases were performed by four urologists. Future multicenter prospective studies are needed to confirm our results.
Conclusion
The four scoring systems had comparable predictive accuracy for SFS. S.T.O.N.E. and S-ReSC scores were easily applicable and provided better correlations with EBL and OT compared with the GSS score. Number of involved calices, stone multiplicity, and renal pelvic obstruction were significant predictors of SFS in the present series. Further studies are needed to develop a universally accepted scoring system that addresses the reported shortcomings of the currently used scores.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
Funding Information
No funding was received for this article.
