Abstract
Introduction:
Preoperative assessment of renal stones is essential to selecting treatment options and achieving high success rates; thus, some nephrolithometric scoring systems have been developed by using preoperative clinical data and stone characteristics. Initially, nomograms predicting stone-free rates (SFRs) were designed for percutaneous nephrolithotomy. After this, some were modified, and new scoring systems were developed for retrograde intrarenal surgery (RIRS). In this study, we aimed at validating and comparing the accuracy of four scoring systems predicting the SFR of RIRS.
Materials and Methods:
We conducted a prospective study. The data of 110 consecutive patients who required RIRS for renal stones between May 2018 and February 2020 were evaluated. The patients were divided into four groups regarding total score: 0, 1, 2, ≥3 according to the Resorlu-Unsal Stone Score (RUSS). The scores were calculated between 5 and 15 for the size of the stone, topography or location, degree of obstruction of the urinary system, number of stones, and evaluation of Hounsfield units (S.T.O.N.E.) scoring system. Modified Seoul National University Renal Stone Complexity (S-ReSC) scores of the patients were between 1 and 12. Finally, the patients were classified between 4 and 10 points with the R.I.R.S. scoring system.
Results:
The mean RUSS, S.T.O.N.E., R.I.R.S., and modified S-ReSC scores were 1.14 (±0.818), 10.78 (±1.499), 6.50 (±1.305), and 2.29 (±1.710), respectively. The area under curve values of RUSS, S.T.O.N.E., R.I.R.S., and S-ReSC were 0.735 (95% confidence interval [CI] 0.623–0.826), 0.725 (95% CI 0.626–0.823), 0.752 (95% CI 0.646–0.857), and 0.755 (95% CI 0.660–0.849), respectively. Logistic regression analysis revealed that the RUSS was an independent predictive factor for SFR (p = 0.028).
Conclusion:
The results showed that all four scoring systems predict the SFRs for RIRS accurately. However, surgeons should prefer RUSS when all four nomograms are available, except when assessing single renal stones. In that case, S-ReSC should be used for assessment. The three nomograms other than the S.T.O.N.E. scoring system can be suitable for the assessment of lower caliceal stones.
Introduction
Retrograde intrarenal surgery (RIRS) is a minimally invasive treatment option for kidney stones and has been very popular among urologists over the past three decades. The European Association of Urology (EAU) recommends shockwave lithotripsy, percutaneous nephrolithotomy (PCNL), and ureteroscopic lithotripsy as first-line treatment methods. PCNL is the gold standard treatment option for stones larger than 20 mm. 1 With the help of innovations brought by technological advances, RIRS can be applied even for kidney stones larger than 20 mm but is not yet considered the default treatment option for these cases.
Preoperative assessment of patients is essential to select a treatment option and achieve high success rates when planning an intervention; thus, some nephrolithometric scoring systems have been developed by using the preoperative clinical data of the patient and the stone characteristics. Initially, nomograms predicting the stone-free rates (SFRs) were designed for PCNL. After this, some were modified, and new scoring systems were developed for RIRS. There are four nephrolithometric scoring systems defined for RIRS: the size of the stone, topography or location, degree of obstruction of the urinary system, number of stones, and evaluation of Hounsfield units (S.T.O.N.E.) nephrolithometry scoring system, the Resorlu-Unsal Stone Score (RUSS), the Modified Seoul National University Renal Stone Complexity (S-ReSC), and the R.I.R.S. scoring system. None of them has yet been universally accepted to predict SFR for RIRS.
Although the RUSS and S-ReSC scoring systems have been externally validated, the R.I.R.S. and S.T.O.N.E. scoring systems have not yet been validated for RIRS. 2,3 In this research, we planned to validate and compare the accuracy of these four scoring systems in predicting RIRS outcomes in terms of SFR.
Materials and Methods
After receiving Institutional Review Board approval, we conducted a prospective study, and the data of 110 patients who required RIRS between May 2018 and February 2020 were analyzed. The patients and stone characteristics were recorded in follow-up forms according to the parameters of the four scoring systems. An experienced endo-urologist recorded the size, location, average HU and number of stones, the grade of obstruction in the collecting system, whether the patient possessed a urinary abnormality, infundibulopelvic angle (IPA), and the length of the infundibulum preoperatively. Perioperative findings were recorded in the operating room after the patient recovered. The stone-free status was described as not having residual stone fragments greater than 4 mm, and operation outcomes were checked and recorded 4 weeks after the operation.
Both preoperative noncontrast CT (NCCT) and intravenous urography were used to evaluate the patients. The sum of maximal diameters was calculated as the stone size. The locations and the size of the stones were evaluated by NCCT and recorded. The IPA was measured by using Elbahnasy's method. 4 Obstruction and degree of hydronephrosis (grade 0–4) were recorded. Also, renal abnormalities and infundibulum length were recorded.
Patients who were younger than 18 or older than 85 years, or who had bilateral urinary system stones, a concurrent ureteral stone, a prior Double-J catheter, a urinary tract infection, or missing data, were excluded from the study.
All RIRS procedures were performed in lithotomy position under general anesthesia. A 9.5F ureteroscope (Karl Storz®, Tuttlingen, Germany) was used to access the ureter for safe dilatation under the guidance of a guidewire in all cases. The ureter was viewed to detect any abnormalities up to the ureteropelvic junction. A guidewire was placed into the renal pelvis before a ureteral access sheath (Elite Flex®, Ankara, Turkey) was inserted in the ureter under fluoroscopy. A 7.5F flexible ureteroscope (Flex-X2; Karl Storz) was used for the RIRS procedure. A 200 μm laser fiber (Ho:YAG laser; Dornier MedTech®, Munich, Germany/Dornier Med-Tech GmbH, Medilas H20 and HSolvo, Wessling, Germany) was used for laser lithotripsy for all patients included in the study. Finally, a Double-J catheter was placed in all patients. Operation time was recorded from the beginning of cystoscopy to the end of ureteral stent placement. Intraoperative data were recorded, and patients who had no complications were discharged the day after their operation. All patients were checked on the 21st day after intervention, and Double-J catheters were removed if there were no complications or residual stone fragments requiring an auxiliary procedure.
All of the cases were evaluated, and patients were labeled regarding total score: 0, 1, 2, ≥3 according to RUSS. 5 The scores were calculated between 5 and 15 with the S.T.O.N.E. scoring system. 6 Modified S-ReSC scores of the patients were between 1 and 12. 7 Finally, the patients were classified between 4 and 10 points with the R.I.R.S. scoring system. 8
To evaluate interobserver agreement, an expert, a faculty member, a senior resident, and a junior resident participated in the assessment of the four scoring systems for each patient. The expert analyzed the data and rated each patient according to the four scoring systems. Among 110 patients, a total of 20 patients for RUSS, 45 patients for S.T.O.N.E., 35 patients for R.I.R.S. and 50 patients for modified S-ReSC, per 5 cases for each point, were selected for interobserver and test–retest reliabilities. Test–retest reliability was evaluated at a 2-week interval. Intraclass correlation coefficients and Cohen's kappa were used to assess interobserver and test–retest reliabilities.
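As an illustration of the agreement statistic named above, the following is a minimal sketch of unweighted Cohen's kappa for two raters. It is not the authors' SPSS procedure; the function name `cohens_kappa` and the example ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases given identical scores.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal score frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical RUSS groups (0, 1, 2, and 3 standing for >=3); not study data.
expert = [0, 0, 1, 1, 2, 2, 3, 3, 1, 0]
resident = [0, 0, 1, 2, 2, 2, 3, 3, 1, 1]
kappa = cohens_kappa(expert, resident)
print(f"kappa = {kappa:.3f}")
```

Because the scores are ordinal, a weighted kappa or the intraclass correlation coefficient may be more informative than the unweighted form sketched here.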
Statistical Package for the Social Sciences version 25 was used for statistical analyses (IBM, Inc., SPSS®, IL, USA). Continuous variables were shown as mean and standard deviation. The normality of the distribution was assessed by using the Kolmogorov–Smirnov test. Normally distributed variables were compared by Student's t-test, and nonparametric data were analyzed with Mann–Whitney U tests. The Pearson chi-square test and Fisher's exact test were used to compare qualitative data. The Spearman correlation analysis was performed to evaluate the relationships between variables. A p-value of <0.05 was considered statistically significant. Logistic regression analysis was used to identify independent predictive factors for SFR. The areas under curve (AUCs) of the receiver operating characteristics (ROC) of the RUSS, S.T.O.N.E., R.I.R.S., and modified S-ReSC scores were compared to assess the predictive accuracy of SFR.
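For readers less familiar with the AUC, the sketch below computes it through its equivalence with the Mann–Whitney statistic: the proportion of (stone-free, not stone-free) patient pairs that the score ranks correctly, with ties counted as one half. This is an illustrative sketch rather than the SPSS routine used in the study. It assumes that lower nomogram scores indicate a higher chance of being stone-free; the function name `roc_auc` and the data are hypothetical.

```python
def roc_auc(scores, stone_free):
    """AUC via the Mann-Whitney formulation, with ties counted as 0.5.

    Assumption: a lower score predicts stone-free status, so a pair is
    concordant when the stone-free case has the lower score.
    """
    pos = [s for s, y in zip(scores, stone_free) if y]      # stone-free cases
    neg = [s for s, y in zip(scores, stone_free) if not y]  # residual-stone cases
    if not pos or not neg:
        raise ValueError("need at least one case in each outcome group")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p < q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical scores and outcomes; not study data.
scores = [0, 0, 1, 1, 1, 2, 2, 3]
stone_free = [True, True, True, True, False, True, False, False]
auc = roc_auc(scores, stone_free)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two outcome groups.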
Results
The data of 110 patients who met the inclusion criteria were evaluated; demographic data of the patients, stone characteristics, and operation outcomes are summarized in Table 1. The population of the study consisted of 62 males and 48 females. The mean stone number and the mean stone volume were 1.53 (±0.738) and 923.73 (±922.530) mm³, respectively. Eighty-one (73.6%) patients who enrolled in this study were stone-free. The mean operation time and fluoroscopy time were calculated as 71.45 (±34.272) minutes and 18.55 (±18.95) seconds, respectively.
Table 1. Patients' Demographics, Stone Characteristics, and Outcomes
Nineteen (17.3%) of the patients had complications. One patient had urosepsis and was treated with appropriate antibiotics in the intensive care unit. Surgical interventions were performed on three patients; one due to the migration of the Double-J catheter to the distal ureter and two due to renal colic caused by Steinstrasse. The rest of the complications were low grade and managed with conservative treatment.
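The reported percentages follow directly from the counts given in the text (81 stone-free patients and 19 patients with complications out of 110). A short arithmetic check, with the helper name `percent` chosen only for illustration:

```python
def percent(count, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


stone_free_rate = percent(81, 110)    # stone-free patients
complication_rate = percent(19, 110)  # patients with any complication
print(stone_free_rate, complication_rate)
```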
Interobserver reliability for the RUSS, S.T.O.N.E., R.I.R.S., and modified S-ReSC demonstrated high levels of agreement among participants, and the intraclass correlation coefficients were 0.987 (95% confidence interval [CI]: 0.983–0.991), 0.993 (95% CI: 0.991–0.995), 0.991 (95% CI: 0.987–0.993), and 0.994 (95% CI: 0.993–0.996), respectively. Test–retest reliabilities were also high for all four participants. Interobserver and test–retest reliabilities for each scoring system are shown in Table 2.
Table 2. Interobserver and Test–Retest Reliability of Four Scoring Systems
RIRS = retrograde intrarenal surgery; RUSS = Resorlu-Unsal Stone Score; S-ReSC = Seoul National University Renal Stone Complexity; S.T.O.N.E. = size of the stone, topography or location, degree of obstruction of the urinary system, number of stones, and evaluation of Hounsfield units.
The mean RUSS, S.T.O.N.E., R.I.R.S., and modified S-ReSC scores were 1.14 (±0.818), 10.78 (±1.499), 6.50 (±1.305), and 2.29 (±1.710), respectively. The operation outcomes were compared according to the stone-free status and are shown in Table 3. The mean stone number, mean stone size, mean stone volume, presence of urinary anomaly, and the mean scores of all four scoring systems differed significantly between the stone-free (+) and stone-free (−) groups (p < 0.05). The score distribution of all cases according to the four scoring systems and the comparison regarding stone-free status are shown in Table 4.
Table 3. Operation Outcomes
Bold values denote statistical significance.
Table 4. Correlation Between the Scoring Systems and Stone-Free Status
Bold values denote statistical significance.
Receiver operating characteristics (ROC) analyses of all nomograms were performed and are shown in Figures 1 and 2. The AUC values of RUSS, S.T.O.N.E., R.I.R.S., and modified S-ReSC were 0.735 (95% CI 0.623–0.826), 0.725 (95% CI 0.626–0.823), 0.752 (95% CI 0.646–0.857), and 0.755 (95% CI 0.660–0.849), respectively.

Figure 1. ROC analyses of scoring systems.

Figure 2. ROC analyses of four scoring systems.
The logistic regression analysis revealed that the RUSS was an independent predictive factor for SFR (p = 0.028). Subgroup analyses showed that the four nomograms did not differ in the accuracy of predicting SFR with different stone sizes (p > 0.05). Only modified S-ReSC was predictive for single kidney stones (p = 0.028), but there was no difference among the four nomograms for multiple stones. Analysis of stones located in the lower calix demonstrated that RUSS (p = 0.048), R.I.R.S. (p = 0.032), and S-ReSC (p = 0.048) had statistical significance for predicting SFR but that the S.T.O.N.E. scoring system did not (p > 0.05).
Discussion
The treatment of urinary system stones changed after Fernström and Johansson developed and described PCNL in 1976. 9 Previously, renal stones were managed with open surgery, but the minimally invasive nature of PCNL made it popular among urologists. The increase in the use of PCNL came with the challenge of predicting operation success. Many factors affect the postoperative outcomes of PCNL; thus, many scoring systems predicting PCNL outcomes emerged over time. 7,10–12 Although there are several nephrolithometric scoring systems defined in the literature, none of them has been accepted as the gold standard nomogram.
After the development of the flexible ureteroscope and laser fibers, RIRS began to take PCNL's place as a treatment for renal stones. Current guidelines recommend RIRS for renal stones smaller than 20 mm. However, several studies evaluated the safety and efficacy of RIRS for renal stones greater than 20 mm, and all indicated that PCNL and RIRS present some of the same challenges. 13–15 Predicting the outcomes of a procedure is crucial for urologists, so various scoring systems have been developed. Nevertheless, none of them is universally accepted as the gold standard scoring system or is widely used.
The prediction of SFR for renal stone treatment is essential because of potential ancillary surgeries and complications. It can provide useful information for patient counseling. More than one treatment option can be suggested for some kidney stones. Scoring the stone of the patient according to nomograms can help urologists know which surgical procedures to present as options to patients and how to explain the potential advantages and disadvantages of these choices. This should improve patient satisfaction and prevent unrealistic expectations. It also provides a framework for clinical decision making and suggests the best modality for patients.
The scoring systems developed for RIRS are the RUSS, S.T.O.N.E., R.I.R.S., and modified S-ReSC scoring systems. To date, only two of them have been validated. In this study, we planned to validate four nomograms. The reliability tests showed that all four scoring systems are reliable and valid.
The first nephrolithometric scoring system defined for RIRS is the Resorlu-Unsal Stone Score (RUSS). 5 This nomogram is based on the complexity of the case regarding stone size, the number of stones, IPA <45° with lower calix localization, and the presence of a renal anomaly. The main limitation of the RUSS study was the small number of complex cases. Although the authors included 207 patients, only two patients had renal malformations, and 11 patients were in the ≥3 points group. Although it is a simple nomogram, calculation of the lower calix IPA is the most demanding parameter for inexperienced urologists.
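The four RUSS components described above can be sketched as a simple additive score, one point per adverse feature, grouped as 0, 1, 2, or ≥3 as in this study. The text does not state the cutoffs for stone size or stone number; the thresholds used here (stone size >20 mm, stones in more than one calix) are assumptions for illustration and should be checked against the original RUSS publication. The function names are hypothetical.

```python
def russ_score(stone_size_mm, lower_pole_stone, ipa_degrees,
               stones_in_multiple_calices, renal_anomaly):
    """Additive RUSS sketch: one point per adverse feature (0 to 4)."""
    points = 0
    if stone_size_mm > 20:                     # assumed size cutoff
        points += 1
    if lower_pole_stone and ipa_degrees < 45:  # lower calix stone with IPA <45 degrees
        points += 1
    if stones_in_multiple_calices:             # assumed definition of stone number
        points += 1
    if renal_anomaly:                          # presence of a renal anomaly
        points += 1
    return points


def russ_group(points):
    """Collapse the total into the groups used in this study: 0, 1, 2, >=3."""
    return "≥3" if points >= 3 else str(points)


# Hypothetical patient: 25 mm lower pole stone, IPA 40 degrees, stones in
# more than one calix, no renal anomaly.
score = russ_score(25, True, 40, True, False)
print(score, russ_group(score))
```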
The final SFR was 86% for the study population. The RUSS study reported that size, composition, and number of stones, renal malformations, and IPA had a significant influence on the results in multivariate analysis and concluded that there was a significant association between RUSS and SFR (p < 0.05).
Sfoungaristos and colleagues studied the external validation of RUSS with 85 patients and reported that RUSS was an independent predictive factor for SFR. They calculated the AUC as 0.707 (95% CI: 0.572–0.842), which indicated high predictive accuracy (p = 0.004). 2 In our study, the reliability tests revealed a high level of agreement for the nomogram (p < 0.001), and the AUC was 0.735 (95% CI 0.623–0.826). The logistic regression analysis showed that RUSS was the only independent predictive factor among the four nomograms we analyzed (p = 0.028) (Table 5).
Table 5. Logistic Regression Analyses for Predictors of Stone-Free Status
Bold values denote statistical significance.
CI = confidence interval.
One of these four scoring systems is the modified S-ReSC scoring system that was developed by Jung and coworkers with 88 patients. 7 The scoring system was first designed for PCNL but was later modified for RIRS. It is based solely on stone location, on the hypothesis that stone localization is the most significant predictor of SFR. This is the main limitation of the system: because other patient or stone characteristics such as size, volume, number, and renal anomaly are not considered, it cannot account for differences in stone characteristics or renal anatomic variations. However, it is accurate and easy to use in daily practice.
In the original S-ReSC study, the AUC was reported as 0.806 (95% CI: 0.707–0.882), and the nomogram was considered easy to use and predictive of SFR. The authors also compared their scoring system with RUSS and stated that the modified S-ReSC scoring system was superior in terms of accuracy in predicting SFR. Park and colleagues performed an external validation of the modified S-ReSC and showed that it was valid and reliable (p < 0.001). The AUC was calculated as 0.732 (95% CI: 0.650–0.813). 16 In our study, the reliability tests revealed a high level of agreement for the nomogram (p < 0.001), and external validation of the modified S-ReSC scoring system revealed an AUC of 0.755 (95% CI: 0.660–0.849). Although the original authors reported that the predictive accuracy of the modified S-ReSC scoring system was higher than that of RUSS, the outcomes of our study did not support this finding.
Although the two scoring systems discussed earlier had been validated for RIRS before our study, the S.T.O.N.E. scoring system had not. Molina and coworkers evaluated 200 rigid and flexible ureteroscopy procedures and weighted cases according to the stone size, topography, level of obstruction, number of stones, and stone density. Finally, each case was graded between 5 and 15 points. The SFR of the cohort was 82%. The S.T.O.N.E. study revealed that the AUC was 0.764, and the multivariate regression model AUC was 0.837. The main limitations of the study were the retrospective method, standardization of evaluating the stone-free status, and small sample size for the higher scores.
In our study, the reliability tests revealed a high level of agreement for the nomogram (p < 0.001), and external validation of the S.T.O.N.E. scoring system revealed an AUC of 0.725 (95% CI 0.626–0.823). The predictive accuracy of the nomogram was high and statistically similar to that of the other scoring systems, so it can be used efficiently.
Finally, Xiao and colleagues described a new scoring system called R.I.R.S. in 2017. 8 The nomogram was based on renal stone density, IPA, infundibular length, and stone volume. The final score was between 4 and 10. The SFR was 61.5% on the first day after the operation and 73.6% one month after the operation. When patient stone-free status was compared, the criteria of the nomogram were statistically significant in univariate analysis. In multivariate logistic regression, stone density, inferior pole stone with narrow IPA, stone burden, and renal infundibular length were independent factors.
The ROC analysis of the R.I.R.S. study for SFR showed that the AUC was 0.828 on the first postoperative day and 0.904 one month postoperation. In our study, the reliability tests revealed a high level of agreement for the nomogram (p < 0.001) and the AUC was calculated as 0.752 (95% CI 0.646–0.857). The results of our study revealed that the R.I.R.S. scoring system predicts SFR accurately, but the calculation of the lower IPA and infundibular length may be challenging and time-consuming for inexperienced urologists. Also, the main limitation of the research is the absence of cases with renal malformation, which may influence operation outcomes.
In the literature, two studies evaluate and compare the scoring systems for RIRS. 17,18 Erbin and coworkers compared RUSS and modified S-ReSC for RIRS, and they reported that the RUSS is an independent predictive factor for SFR. However, they also stated that both nomograms had low predictive accuracy for identifying SFR. Karsiyakali and colleagues compared Guy's stone score (GSS), the S.T.O.N.E., Clinical Research Office of the Endourological Society (CROES), and S-ReSC. The study reported that the nomograms are not superior to the stone volume in predicting operation success. However, only two of the nomograms in their research were described or modified for RIRS, and only the modified S-ReSC was validated for RIRS. GSS and CROES are nomograms described for PCNL. Our study is the first to validate and compare the four nomograms described or modified for RIRS in the literature.
In this study, SFR was 73.6%, and the complication rate was 17.3%. These values are relatively close to those described in previous literature. 17,18 Although the studies comparing the scoring systems reported several independent predictive components for SFR, our logistic regression analysis showed that the RUSS was the only independent predictive factor for SFR (p = 0.028). The other three nomograms, as well as stone size, stone density, stone volume, and stone number, were not statistically significant.
Further, the nomograms were developed for all renal stone profiles, but in our study, we performed subgroup analysis for some special stone groups. All four nomograms predicted SFR equally for any stone size; none was superior. Likewise, all nomograms predicted SFR equally for multiple stones, but the S-ReSC scoring system was superior to the other three in predicting SFR for a single kidney stone. Finally, the SFR predictions of all nomograms for the lower caliceal stones, which are the most challenging stone group for many urologists, were statistically significant, except with the S.T.O.N.E. scoring system.
Despite its prospective design, this study has some limitations. The sample size is relatively small and includes few cases with high scores for each scoring system. The study was conducted in a single tertiary institution, and only 14 out of 110 patients had a urinary anomaly. Also, there was no consensus about the definition of stone-free status among the four nomograms; our study defined it as having no residual stone fragments greater than 4 mm.
Conclusion
The comparison of the four scoring systems showed that there are pros and cons for each nomogram. The results showed that all four scoring systems predict the SFRs for RIRS accurately and similarly. All nomograms can be used for patient counseling and decision making in daily practice. However, we observed that the RUSS is an independent predictive factor for SFR in the logistic regression analysis. We, therefore, conclude that surgeons should prefer RUSS when all four nomograms are available, except when assessing single renal stones. In that case, S-ReSC should be used for assessment. The three nomograms other than the S.T.O.N.E. scoring system can be suitable for the assessment of lower caliceal stones. Nevertheless, more comprehensive prospective studies with larger sample sizes are necessary to verify the results of this study.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
Funding Information
No funding was received.
