Abstract
Objectives:
The modified Seoul National University Renal Stone Complexity scoring system (S-ReSC-R) for retrograde intrarenal surgery (RIRS) was developed as a tool to predict stone-free rate (SFR) after RIRS. We externally validated the S-ReSC-R.
Materials and Methods:
We retrospectively reviewed 159 patients who underwent RIRS. The S-ReSC-R was assigned from 1 to 12 according to the location and number of sites involved. The stone-free status was defined as no evidence of a stone or with clinically insignificant residual fragment stones less than 2 mm. Interobserver and test–retest reliabilities were evaluated. Statistical performance of the prediction model was assessed by its predictive accuracy, predictive probability, and clinical usefulness.
Results:
Overall SFR was 73.0%. The SFRs were 86.7%, 70.2%, and 48.6% in low-score (1–2), intermediate-score (3–4), and high-score (5–12) groups, respectively (p<0.001). External validation of S-ReSC-R revealed an area under the curve (AUC) of 0.731 (95% CI 0.650–0.813). The AUC of the three-tiered S-ReSC-R was 0.701 (95% CI 0.609–0.794). The calibration plot showed that the predicted probability of SFR was concordant with the observed frequency. The Hosmer–Lemeshow goodness of fit test revealed a p-value of 0.01 for the S-ReSC-R and 0.90 for the three-tiered S-ReSC-R. Interobserver and test–retest reliabilities revealed an almost perfect level of agreement.
Conclusions:
The present study proved the predictive value of S-ReSC-R for SFR following RIRS in an independent cohort. Interobserver and test–retest reliabilities confirmed that S-ReSC-R was reliable and valid.
Introduction
Several predictive scoring systems for renal stone surgery have been developed. 5,6 However, none of them was externally validated before being used clinically. Recently, Jeong et al. published the Seoul National University Renal Stone Complexity scoring system for percutaneous nephrolithotomy (PCNL) (S-ReSC-P) and a modified S-ReSC scoring system for RIRS (S-ReSC-R). 7,8 S-ReSC-P was shown to be feasible and accurate for predicting SFR after PCNL, and it was externally validated using an independent external cohort. 9 S-ReSC-R is basically similar to S-ReSC-P in that both scoring systems evaluate surgical difficulty and surgical outcomes according to the number and site of renal stones. Because S-ReSC-P assigns a single point to each site involved with a stone, its final scores range from 0 to 9. S-ReSC-R, however, gives double points for inferior pole stones, which are difficult to remove during RIRS, so its final scores range from 0 to 12.
We previously reported that the number of stones and the number of involved sites were predictors of SFR after RIRS, which supports the rationale of this scoring method. 10 Consequently, we externally validated S-ReSC-R using an independent external cohort and assessed interobserver and test–retest reliabilities in the present study.
Materials and Methods
Subjects
The present study and the use of patients' information were approved by the Institutional Review Board (IRB) at the Seoul Metropolitan Government-Seoul National University, Boramae Medical Center. The approval number was 16-2012-21. The requirement for informed consent was waived because the present study was a retrospective study without personal identifiers and the data were analyzed anonymously. The present study was conducted according to the ethical principles laid out in the 1964 Declaration of Helsinki and its later amendments.
A total of 159 consecutive patients who underwent RIRS from January 2010 to May 2014 were included in this study. Investigators selected appropriate candidates for RIRS according to the EAU guidelines for urolithiasis. 11 Patients with febrile urinary-tract infection, bleeding tendency, anatomical anomaly, or ureteral stricture were excluded from the analysis. Patients with bilateral stones were also excluded from the analysis. Patients' data were reviewed for medical history, physical examination, urinalysis, complete blood count, serum biochemistry, and coagulation tests. Computed tomography was used to evaluate stone characteristics preoperatively. The largest diameter of the main stone was measured. Stone volume was calculated using the ellipsoid formula ($\pi/6 \times D^{3}$). Total stone volume was calculated as the sum of the individual stone volumes. Patients were routinely evaluated for any residual stones by a follow-up computed tomography scan within 3 months postoperatively. A clinically stone-free status was defined as no evidence of a stone or only clinically insignificant residual fragments smaller than 2 mm. The RIRS procedures were described in our previous publications. 8,10
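The volume calculation described above can be sketched as follows. This is a minimal illustration, assuming that D is the largest diameter of each stone expressed in millimetres; the function names are hypothetical and are not taken from the study.

```python
import math


def stone_volume(diameter_mm: float) -> float:
    """Volume (mm^3) from the ellipsoid formula pi/6 * D^3.

    Assumption: D is the largest stone diameter in millimetres.
    """
    if diameter_mm < 0:
        raise ValueError("diameter must be non-negative")
    return math.pi / 6.0 * diameter_mm ** 3


def total_stone_volume(diameters_mm) -> float:
    """Total stone volume as the sum of the individual stone volumes."""
    return sum(stone_volume(d) for d in diameters_mm)
```

For example, `total_stone_volume([10, 5])` adds the volume of a 10 mm stone to that of a 5 mm stone.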
Comparison between the validation and the development groups
To determine whether this scoring system is applicable to heterogeneous groups, both the validation and the development groups were included in the analysis. The data for the development group were obtained from a previous investigation. 8
S-ReSC-R scoring system
As in the S-ReSC-P, the S-ReSC-R gave a single point for each involved site among the renal pelvis (site 1), the superior major calix (site 2), and the anterior and posterior minor calices of the superior (sites 4–5) and middle (sites 6–7) caliceal groups. Unlike the S-ReSC-P, the S-ReSC-R gave two points for each involved site among the inferior major calix (site 3) and the anterior and posterior minor calices of the inferior caliceal group (sites 8–9). Thus, the total S-ReSC-R scores ranged from 1 to 12 points. The S-ReSC-R scores were classified into the following three-tiered S-ReSC-R groups: low-score (1–2 points), intermediate-score (3–4 points), and high-score (5–12 points) groups.
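The scoring rule described above can be sketched as follows. This is an illustrative sketch only: the site numbering (1–9) follows the text, whereas the function and constant names are hypothetical and the input is assumed to be the set of site numbers in which a stone is present.

```python
# Sites worth two points: inferior major calix (3) and the anterior and
# posterior minor calices of the inferior group (8, 9), as stated in the text.
DOUBLE_POINT_SITES = {3, 8, 9}
ALL_SITES = set(range(1, 10))


def s_resc_r_score(involved_sites) -> int:
    """Sum the points for every site (1-9) that contains a stone."""
    sites = set(involved_sites)
    if not sites or not sites <= ALL_SITES:
        raise ValueError("expected a non-empty set of site numbers from 1 to 9")
    return sum(2 if site in DOUBLE_POINT_SITES else 1 for site in sites)


def s_resc_r_group(score: int) -> str:
    """Map a score (1-12) to the three-tiered group used in the text."""
    if not 1 <= score <= 12:
        raise ValueError("S-ReSC-R score must be between 1 and 12")
    if score <= 2:
        return "low"
    if score <= 4:
        return "intermediate"
    return "high"
```

Under these assumptions, a stone confined to the renal pelvis scores 1 point, a stone confined to the inferior major calix scores 2 points, and involvement of all nine sites gives the maximum of 12 points.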
External validation, interobserver, and test–retest reliabilities
For evaluating interobserver agreement, 9 a faculty member as an expert (S.Y.C.), a fellow, a junior resident, and a surgical assistant nurse participated in the evaluation of S-ReSC-R for each patient. The expert evaluated the images and rated the S-ReSC-R on a score from 1 to 12 for all patients. Among 159 patients, a total of 15 cases (five cases for each score group) were selected for interobserver and test–retest reliabilities. Test–retest reliability was evaluated at a 2-week interval. Intraclass correlation coefficients and Cohen's kappa were used to evaluate interobserver and test–retest reliabilities.
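Agreement between two raters can be quantified with Cohen's kappa as in the minimal sketch below. It computes the unweighted statistic for a pair of raters; whether the study used a weighted or an unweighted kappa is not stated, so this choice is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(ratings_a)
    # Observed agreement: fraction of cases given identical ratings.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1.0 - p_expected)
```

A value of 1 indicates complete agreement, and a value of 0 indicates agreement no better than chance.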
Continuous variables are presented as the mean±SD. The prediction model was statistically assessed for predictive accuracy, predictive probability, and clinical usefulness, as described previously. 9 The predictive accuracy for SFR following RIRS was assessed by the area under the curve (AUC) of the receiver operating characteristic curve. The relationship between predicted and observed SFR was shown in a calibration plot using 200 bootstrap resamples. Clinical utility of the prediction model was appraised by decision curve analysis. Significant predictors of SFR following RIRS were analyzed by univariate and multivariate logistic regression analyses. Statistical significance was considered at p<0.05. Statistical analyses were performed using IBM SPSS Statistics version 20 (IBM, Inc., Chicago, IL) and R version 3.0.1 (
Results
Patient characteristics
The characteristics of the 159 patients enrolled in the validation group and the 88 patients in the original development group are summarized in Table 1. No parameter except the largest diameter differed significantly between the two groups. SFRs according to the S-ReSC-R and the three-tiered S-ReSC-R are summarized in Table 2. The overall SFR for the validation group was 73.0%. When S-ReSC-R scores were treated as a continuous variable, the SFR tended to decrease with increasing S-ReSC-R scores. However, such a trend did not apply to all scores. When patients were stratified into three subgroups based on the three-tiered S-ReSC-R, the SFRs were 86.7% (65/75), 70.2% (33/47), and 48.6% (18/37) in the low-score (1–2), intermediate-score (3–4), and high-score (5–12) subgroups, respectively. The SFR differed significantly among the three subgroups (p<0.001).
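The reported stone-free rates can be reproduced from the patient counts given above, as in the following short sketch. The counts come from the text; the variable and function names are illustrative only.

```python
# (stone-free patients, total patients) in each three-tiered group, from the text.
groups = {
    "low (1-2)": (65, 75),
    "intermediate (3-4)": (33, 47),
    "high (5-12)": (18, 37),
}


def sfr_percent(stone_free: int, total: int) -> float:
    """Stone-free rate as a percentage rounded to one decimal place."""
    return round(100.0 * stone_free / total, 1)


group_sfr = {name: sfr_percent(sf, n) for name, (sf, n) in groups.items()}
overall_sfr = sfr_percent(
    sum(sf for sf, _ in groups.values()),
    sum(n for _, n in groups.values()),
)
```

The three subgroups sum to 116 stone-free patients out of 159, which corresponds to the overall SFR of 73.0%.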
Interobserver and test–retest reliability
Interobserver reliability for the S-ReSC-R showed an almost perfect level of agreement among participants (Table 3). The intraclass correlation coefficient was 0.947 (95% CI 0.904–0.971, p<0.001). Test–retest reliability also demonstrated an almost perfect level of agreement for most participants. However, the assistant nurse showed only a substantial level of agreement (0.719, 95% CI 0.477–0.905, p<0.001).
Prediction accuracy, predictive probability, and clinical usefulness
External validation of S-ReSC-R revealed that the AUCs of S-ReSC-R and three-tiered S-ReSC-R were 0.732 (95% CI 0.650–0.813) and 0.702 (95% CI 0.609–0.794), respectively (Fig. 1A, B). The calibration plot showed that the predicted probability of SFR was concordant with the observed frequency, with most predictions within a 5% error margin (Fig. 1C). The mean absolute error rates were 0.041 and 0.047 for S-ReSC-R and three-tiered S-ReSC-R, respectively. The Hosmer–Lemeshow goodness of fit test revealed p-values of 0.01 and 0.90 for S-ReSC-R and three-tiered S-ReSC-R, respectively. In decision curve analysis, the prediction model provided a superior net benefit with reduction of a probability threshold at around 20% (Fig. 1D).
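Two of the performance measures reported above can be illustrated with the minimal sketch below. It is not the authors' analysis code, which was run in SPSS and R: the AUC is computed here from its rank-based definition, and the net benefit uses the standard decision curve formula TP/n − (FP/n) × p_t/(1 − p_t). The function names and the coding of outcomes as 1 (event) and 0 (no event) are assumptions.

```python
def auc(scores, labels) -> float:
    """AUC as the probability that a randomly chosen event case has a
    higher score than a randomly chosen non-event case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each outcome")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(probs, labels, threshold: float) -> float:
    """Net benefit of acting on cases with predicted probability >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    true_pos = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)
```

Because a higher S-ReSC-R score is associated with a lower SFR, the score would be paired with the outcome "not stone-free" (or negated) before being passed to `auc`. A decision curve is obtained by evaluating `net_benefit` over a range of threshold probabilities.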

Uni- and multivariate logistic regression analyses for predictors of stone-free status
As shown in Table 4, the three-tiered S-ReSC-R and total stone volume were significant predictors of SFR in univariate logistic regression analysis. However, only the three-tiered S-ReSC-R remained a significant predictor of SFR in multivariate logistic regression analysis.
Discussion
Prediction of SFR after RIRS
Active treatments for renal stones usually include extracorporeal shock wave lithotripsy (ESWL), PCNL, and RIRS. ESWL is known to achieve an excellent SFR for renal stones smaller than 20 mm. However, the SFR is low when stones are associated with unfavorable prognostic factors such as lower pole calix location, a steep infundibulopelvic angle, a long or narrow infundibulum, hard stones, and obesity. 12 On the other hand, PCNL is suitable for removing a large stone burden, such as stones with a maximal diameter larger than 20 mm and staghorn stones. However, antegrade access for tract formation inevitably involves renal injury, which can result in postoperative complications such as renal function deterioration, bleeding, and infection. 13 Recently, RIRS has attracted considerable attention as a new standard treatment modality for midsized renal stones because it can minimize the renal parenchymal damage induced by antegrade penetration while maximizing the SFR of lower pole stones, which are difficult to remove by ESWL. 14 As state-of-the-art video systems become widely used and surgical know-how accumulates, RIRS could achieve the same level of SFR as ESWL or PCNL. 15
Although demand for and interest in RIRS have increased, few predictive models for SFR after RIRS are useful in practice. Resorlu et al. evaluated the prognostic factors affecting SFR by analyzing 207 cases of RIRS. 5 The Resorlu–Unsal scoring system grades surgical difficulty on a five-point scale (0–4) according to the following factors: (1) stone size >20 mm; (2) lower pole stone location and infundibulopelvic angle <45°; (3) more than one stone in different calices; (4) abnormal renal anatomy such as horseshoe kidney or pelvic kidney. This scoring system is easy to use in clinical practice. However, it has some limitations because few cases scored more than three points. In addition, external validation has not been conducted to date. For clinical application of a scoring system, external validation is an essential step. Jeong et al. recently developed S-ReSC-P and S-ReSC-R. 7,8 They compared the AUCs of the S-ReSC-R and the Resorlu–Unsal scoring system and showed that the S-ReSC-R is feasible and accurate for predicting SFR after RIRS. This scoring system is based on the number and site of renal stones, which is consistent with the findings of our previous investigation. 10 Therefore, we decided to externally validate S-ReSC-R in the present study.
External validation of modified S-ReSC scoring system
To the best of our knowledge, the present study is the first external validation of a scoring system that predicts SFR after RIRS. S-ReSC-P and S-ReSC-R are basically similar scoring methods that reflect the importance of the number and site of renal stones, indicating that the distributional complexity of renal stones is the most important prognostic factor in predicting SFR following PCNL or RIRS, in accordance with the result of our previous investigation. 10 Although our results showed the feasibility and clinical applicability of S-ReSC-R, it is not easy to count the exact number and location of all renal stones of various sizes. Therefore, we counted only stones larger than 1 mm, which made it relatively simple to classify intrarenal stones into low-score (1–2), intermediate-score (3–4), and high-score (5–12) groups. This three-tiered S-ReSC-R scoring system was well validated because the AUC in the validation data set was 0.702 and the Hosmer–Lemeshow goodness of fit test revealed a p-value of 0.90. These values were nevertheless lower in the validation data set than in the original development data set, presumably because of differences in the study populations. RIRS still has much room for future technical and academic development. S-ReSC-R would be an attractive and easy-to-use predictive model for hospitals that have started performing RIRS for intrarenal stones.
The analysis of interobserver and test–retest reliability showed an almost perfect level of agreement for the scoring system. However, the subanalysis of kappa values for each variable in the scoring system showed considerable discordance. The mean kappa value was significantly higher in the low-score group than in the intermediate- and high-score groups. This may be because of the small number of patients included in the analysis. However, we believe that it is mainly due to the difficulty of counting tiny stones on the CT scan in the intermediate- and high-score groups. For example, a small dumbbell-shaped stone can be counted either as a single stone or as two stones lying in the same calix.
Limitations of the study
There could be a risk of bias because the outcome of RIRS is strongly dependent on the operator's skill. However, our results provided an accurate statistical method in this regard. To obtain reliable results, we included only cases performed by an experienced expert after the learning curve had been overcome. 10
Conclusion
The present study proved the predictive value of S-ReSC-R for SFR following RIRS in an independent cohort. Interobserver and test–retest reliabilities confirmed that S-ReSC-R was reliable and valid. Further investigations are necessary to evaluate the clinical significance of this scoring system.
Footnotes
Acknowledgment
The authors wish to thank Ha Ni Lee, an assistant nurse at the Department of Urology of Seoul Metropolitan Government-Seoul National University, Boramae Medical Center, for surgical technical support.
Disclosure Statement
No competing financial interests exist.
