External Validation and Evaluation of Reliability and Validity of the Modified Seoul National University Renal Stone Complexity Scoring System to Predict Stone-Free Status After Retrograde Intrarenal Surgery

Abstract

Objectives:

The modified Seoul National University Renal Stone Complexity scoring system (S-ReSC-R) for retrograde intrarenal surgery (RIRS) was developed as a tool to predict stone-free rate (SFR) after RIRS. We externally validated the S-ReSC-R.

Materials and Methods:

We retrospectively reviewed 159 patients who underwent RIRS. S-ReSC-R scores from 1 to 12 were assigned according to the location and number of sites involved. Stone-free status was defined as no evidence of a stone or only clinically insignificant residual fragments smaller than 2 mm. Interobserver and test–retest reliabilities were evaluated. Statistical performance of the prediction model was assessed by its predictive accuracy, predictive probability, and clinical usefulness.

Results:

Overall SFR was 73.0%. The SFRs were 86.7%, 70.2%, and 48.6% in low-score (1–2), intermediate-score (3–4), and high-score (5–12) groups, respectively (p<0.001). External validation of S-ReSC-R revealed an area under the curve (AUC) of 0.731 (95% CI 0.650–0.813). The AUC of the three-tiered S-ReSC-R was 0.701 (95% CI 0.609–0.794). The calibration plot showed that the predicted probability of SFR was concordant with the observed frequency. The Hosmer–Lemeshow goodness of fit test revealed a p-value of 0.01 for the S-ReSC-R and 0.90 for the three-tiered S-ReSC-R. Interobserver and test–retest reliabilities revealed an almost perfect level of agreement.

Conclusions:

The present study confirmed the value of S-ReSC-R for predicting SFR following RIRS in an independent cohort. Interobserver and test–retest reliabilities confirmed that S-ReSC-R was reliable and valid.

Introduction

The incidence of renal stones is increasing.¹ Accordingly, the number of surgical cases of percutaneous nephrolithotomy (PCNL) and retrograde intrarenal surgery (RIRS) for treatment of renal stones has been increasing.²⁻⁴ Prediction of the postoperative stone-free rate (SFR) is important because it helps determine the type of surgery and anticipate the need for ancillary procedures before surgery.

Some previous studies have developed predictive scoring systems for renal stone surgery.⁵,⁶ However, none of them was externally validated before clinical use. Recently, Jeong et al. published the Seoul National University Renal Stone Complexity scoring system for PCNL (S-ReSC-P) and modified it for RIRS (S-ReSC-R).⁷,⁸ S-ReSC-P proved feasible and accurate in predicting SFR after PCNL, and it was externally validated using an independent cohort.⁹ S-ReSC-R is similar to S-ReSC-P in that both scoring systems evaluate surgical difficulty and surgical outcomes according to the number and sites of renal stones. Because S-ReSC-P assigns a single point to each site involved with a stone, its final scores range from 0 to 9. S-ReSC-R, however, assigns double points to inferior pole stones, which are difficult to remove during RIRS, so its final scores range from 0 to 12.

We previously reported that the number of stones and the number of sites involved were predictors of SFR after RIRS, which supports this scoring approach.¹⁰ Consequently, in the present study we externally validated S-ReSC-R using an independent cohort and assessed its interobserver and test–retest reliabilities.

Materials and Methods

Subjects

The present study and the use of patients' information were approved by the Institutional Review Board (IRB) at the Seoul Metropolitan Government-Seoul National University, Boramae Medical Center. The approval number was 16-2012-21. The requirement for informed consent was waived because the present study was retrospective, used no personal identifiers, and analyzed the data anonymously. The present study was conducted according to the ethical principles laid out in the 1964 Declaration of Helsinki and its later amendments.

A total of 159 consecutive patients who underwent RIRS from January 2010 to May 2014 were included in this study. Investigators selected appropriate candidates for RIRS according to the EAU guidelines for urolithiasis.¹¹ Patients with febrile urinary-tract infection, bleeding tendency, anatomical anomaly, or ureteral stricture were excluded from the analysis. Patients with bilateral stones were also excluded from the analysis. Patients' data were reviewed for medical history, physical examination, urinalysis, complete blood count, serum biochemistry, and coagulation tests. Computed tomography was used to evaluate stone characteristics preoperatively. The largest diameter of the main stone was measured. Stone volume was calculated using the ellipsoid formula (π/6×D³). Total stone volume was calculated as the sum of the individual stone volumes. Patients were routinely evaluated for any residual stones by a follow-up computed tomography scan within 3 months postoperatively. A clinically stone-free status was defined as no evidence of a stone or only clinically insignificant residual fragments smaller than 2 mm. The RIRS procedures were described in our previous publications.⁸,¹⁰
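The volume calculation described above can be illustrated with a minimal sketch. It assumes that D is the stone diameter in millimeters and that volumes are reported in cm³ (as in Table 1); the function names and the example diameters are hypothetical and are not taken from the study data.

```python
import math


def stone_volume_mm3(diameter_mm: float) -> float:
    """Volume of one stone from the formula quoted in the text, pi/6 * D^3 (mm^3)."""
    return math.pi / 6.0 * diameter_mm ** 3


def total_stone_volume_cm3(diameters_mm) -> float:
    """Sum the per-stone volumes and convert mm^3 to cm^3 (1 cm^3 = 1000 mm^3)."""
    return sum(stone_volume_mm3(d) for d in diameters_mm) / 1000.0


# Hypothetical patient with two stones of 10 mm and 6 mm diameter.
example_total = total_stone_volume_cm3([10.0, 6.0])
print(f"total stone volume: {example_total:.3f} cm^3")
```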

Comparison between the validation and the development groups

To determine whether this scoring system is applicable to heterogeneous groups, the validation and the development groups were compared. The data in the development group were obtained from a previous investigation.⁸

S-ReSC-R scoring system

As in the S-ReSC-P, the S-ReSC-R assigned a single point to each involved site among the renal pelvis (site 1), the superior major calix (site 2), and the anterior and posterior minor caliceal groups of the superior (sites 4–5) and middle (sites 6–7) calices. In addition, the S-ReSC-R assigned two points to each involved site among the inferior major calix (site 3) and the anterior and posterior minor caliceal groups of the inferior calices (sites 8–9). Thus, the total S-ReSC-R scores ranged from 1 to 12 points. The S-ReSC-R scores were classified into the following three-tiered S-ReSC-R groups: low-score (1–2 points), intermediate-score (3–4 points), and high-score (5–12 points) groups.
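The scoring rule above can be sketched in a few lines. This is an illustration under the assumption that the numbers 1–9 in the description are site labels (1 renal pelvis, 2 superior major calix, 3 inferior major calix, 4–5 superior minor, 6–7 middle minor, 8–9 inferior minor caliceal groups); the names SITE_POINTS, s_resc_r_score, and score_group are hypothetical and do not come from the original scoring publication.

```python
# Points per involved site: inferior sites (3, 8, 9) count double.
SITE_POINTS = {1: 1, 2: 1, 3: 2, 4: 1, 5: 1, 6: 1, 7: 1, 8: 2, 9: 2}


def s_resc_r_score(involved_sites) -> int:
    """Sum the points of every distinct site that contains a stone."""
    sites = set(involved_sites)
    unknown = sites - set(SITE_POINTS)
    if unknown:
        raise ValueError(f"unknown site labels: {sorted(unknown)}")
    return sum(SITE_POINTS[s] for s in sites)


def score_group(score: int) -> str:
    """Map a total score (1-12) to the three-tiered group."""
    if not 1 <= score <= 12:
        raise ValueError("S-ReSC-R score must be between 1 and 12")
    if score <= 2:
        return "low"
    if score <= 4:
        return "intermediate"
    return "high"


# Hypothetical case: stones in the renal pelvis and one inferior minor calix.
example_score = s_resc_r_score([1, 8])
print(example_score, score_group(example_score))
```

Under these assumptions, involvement of all nine sites gives the maximum of 12 points, which matches the stated score range.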

External validation, interobserver, and test–retest reliabilities

For evaluating interobserver agreement,⁹ a faculty member as an expert (S.Y.C.), a fellow, a junior resident, and a surgical assistant nurse participated in the evaluation of S-ReSC-R for each patient. The expert evaluated the images and rated the S-ReSC-R on a score from 1 to 12 for all patients. Among 159 patients, a total of 15 cases (five cases for each score group) were selected for interobserver and test–retest reliabilities. Test–retest reliability was evaluated at a 2-week interval. Intraclass correlation coefficients and Cohen's kappa were used to evaluate interobserver and test–retest reliabilities.
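For readers unfamiliar with Cohen's kappa, the following sketch shows how agreement between two rating sessions can be computed. It assumes the unweighted form of kappa (the text does not state whether a weighted variant was used), and the ratings shown are hypothetical, not the study's 15 cases.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b) -> float:
    """Unweighted Cohen's kappa for two equally long lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from the marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical test-retest data: 15 cases, one case rated differently at retest.
first = ["low"] * 5 + ["intermediate"] * 5 + ["high"] * 5
second = ["low"] * 5 + ["intermediate"] * 4 + ["high"] * 6
example_kappa = cohens_kappa(first, second)
print(f"kappa = {example_kappa:.3f}")
```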

Continuous variables are presented as the mean±SD. The prediction model was statistically assessed for predictive accuracy, predictive probability, and clinical usefulness, as described previously.⁹ The predictive accuracy for SFR following RIRS was assessed by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The relationship between predicted and observed SFR was shown in a calibration plot using 200 bootstrap resamples. Clinical utility of the prediction model was appraised by decision curve analysis. Significant predictors of SFR following RIRS were analyzed by univariate and multivariate logistic regression analyses. Statistical significance was set at p<0.05. Statistical analyses were performed using IBM SPSS Statistics version 20 (IBM, Inc., Chicago, IL) and R version 3.0.1 (www.r-project.org).
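Decision curve analysis compares strategies by their net benefit at a chosen threshold probability. The sketch below uses the standard net benefit definition, true positives per patient minus false positives per patient weighted by the odds of the threshold; it is an illustration only, not the R code used in this study, and the counts are hypothetical.

```python
def net_benefit(true_pos: int, false_pos: int, n: int, threshold: float) -> float:
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    weight = threshold / (1.0 - threshold)
    return true_pos / n - false_pos / n * weight


# Hypothetical cohort of 100 patients, 70 of whom have the outcome of interest.
# A model flags 80 patients at a 20% threshold: 60 correctly, 20 incorrectly.
model_nb = net_benefit(true_pos=60, false_pos=20, n=100, threshold=0.20)
# Reference strategy that flags every patient.
treat_all_nb = net_benefit(true_pos=70, false_pos=30, n=100, threshold=0.20)
print(f"model: {model_nb:.3f}, treat all: {treat_all_nb:.3f}")
```

A decision curve repeats this calculation over a range of thresholds and plots the result for each strategy.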

Results

Patient characteristics

The characteristics of the 159 patients in the validation group and the 88 patients in the original development group are summarized in Table 1. No parameter except the largest diameter differed significantly between the two groups. SFRs according to S-ReSC-R and three-tiered S-ReSC-R are summarized in Table 2. The overall SFR for the validation group was 73.0%. When S-ReSC-R scores were treated as a continuous variable, the SFR tended to decrease with increasing S-ReSC-R scores, although this trend did not hold for every score. When patients were stratified into three subgroups based on the S-ReSC-R scoring group system, the SFRs were 86.7% (65/75), 70.2% (33/47), and 48.6% (18/37) in the low-score (1–2), intermediate-score (3–4), and high-score (5–12) subgroups, respectively. The SFR differed significantly both across individual S-ReSC-R scores and across the three-tiered S-ReSC-R score groups (p<0.001 for each; Table 2).

Table 1.

Comparison of Patients' Demographics Between the Validation Data and the Original Development Data of Modified S-ReSC (Seoul National University Renal Stone Complexity) for Retrograde Intrarenal Surgery (S-ReSC-R)

	Validation group	Development group	p-Value
No. of patients	159	88
Age, year	54.8±14.8	53.9±14.0	0.651
Body–mass index, kg/m²	24.9±3.8	24.4±3.5	0.345
Gender, No. (%)			0.178
Male	103 (64.8%)	51 (58.0%)
Female	56 (35.2%)	37 (42.0%)
Previous treatment, No. (%)
Shock wave lithotripsy	32 (20.1%)	10 (11.4%)	0.156
Retrograde intrarenal surgery	16 (10.1%)	2 (2.3%)	0.058
Percutaneous nephrolithotomy	7 (4.4%)	10 (11.4%)	0.091
Laterality, No. (%)			0.691
Right	73 (45.9%)	38 (43.2%)
Left	86 (54.1%)	50 (56.8%)
The largest diameter, mm	15.3±7.2	12.0±6.1	<0.001
The stone volume, cm³	1.6±2.1	2.5±5.2	0.060
Major stone composition, No. (%)			0.242
Calcium oxalate monohydrate	56 (35.2%)	54 (61.4%)
Calcium oxalate dihydrate	3 (1.9%)	4 (4.5%)
Carbonate apatite	7 (4.4%)	8 (9.1%)
Uric acid	19 (11.8%)	10 (11.4%)
Struvite	2 (1.3%)	2 (2.3%)
Cystine	0 (0%)	4 (4.5%)
Others	0 (0%)	1 (1.1%)
Missing	72 (45.3%)	5 (5.7%)

Table 2.

Stone-Free Rates According to S-ReSC-R

S-ReSC-R score	Stone-free rate	S-ReSC-R score group	Stone-free rate	p-Value	OR	95% CI
1	95.9% (47/49)	Low (1–2)	86.7% (65/75)	Reference
2	69.2% (18/26)
3	73.5% (25/34)	Intermediate (3–4)	70.2% (33/47)	0.029	0.363	0.145–0.904
4	61.5% (8/13)
5	27.3% (3/11)	High (5–12)	48.6% (18/37)	<0.001	0.146	0.058–0.368
6	53.3% (8/15)
7	83.3% (5/6)
8	25.0% (1/4)
11	100% (1/1)
	p<0.001		p<0.001

OR=odds ratio; CI=confidence interval.
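The odds ratios in Table 2 can be recomputed from the group counts. The sketch below assumes a Wald-type 95% confidence interval on the log odds ratio, which is what an unadjusted logistic regression would typically report; the function name is hypothetical.

```python
import math


def odds_ratio_wald(sf_ref, fail_ref, sf_cmp, fail_cmp, z=1.96):
    """Odds ratio of being stone-free (comparison vs reference) with a Wald CI."""
    odds_ratio = (sf_cmp / fail_cmp) / (sf_ref / fail_ref)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / sf_ref + 1 / fail_ref + 1 / sf_cmp + 1 / fail_cmp)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# (stone-free, not stone-free) counts per group, taken from Table 2.
low = (65, 75 - 65)
intermediate = (33, 47 - 33)
high = (18, 37 - 18)

or_int = odds_ratio_wald(*low, *intermediate)
or_high = odds_ratio_wald(*low, *high)
print("intermediate vs low: OR %.3f (%.3f-%.3f)" % or_int)
print("high vs low:         OR %.3f (%.3f-%.3f)" % or_high)
```

Under this assumption the recomputed values agree with the tabulated odds ratios and intervals to three decimal places.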

Interobserver and test–retest reliability

Interobserver reliability for the S-ReSC-R showed an almost perfect level of agreement among participants (Table 3). The intraclass correlation coefficient was 0.947 (95% CI 0.904–0.971, p<0.001). Test–retest reliability also demonstrated almost perfect agreement for most participants. However, the assistant nurse showed only substantial agreement (kappa 0.719, 95% CI 0.477–0.905, p<0.001).

Table 3.

Interobserver and Test–Retest Reliability of S-ReSC-R at 2-Week Intervals

	Interobserver agreement		Test–retest reliability
	Cronbach's alpha (95% CI)	p-Value	Cohen's kappa (95% CI)	p-Value
Expert	0.947 (0.904–0.971)	<0.001	0.951 (0.813–0.999)	<0.001
Fellow			0.949 (0.826–0.999)	<0.001
Resident			0.903 (0.735–0.999)	<0.001
Assistant nurse			0.719 (0.477–0.905)	<0.001

Prediction accuracy, predictive probability, and clinical usefulness

External validation of S-ReSC-R revealed that AUCs of S-ReSC-R and three-tiered S-ReSC-R were 0.732 (95% CI 0.650–0.813) and 0.702 (95% CI 0.609–0.794), respectively (Fig. 1A, B). The calibration plot showed that the predicted probability of SFR was concordant with the observed frequency, with most predictions within a 5% error margin (Fig. 1C). The mean absolute errors were 0.041 and 0.047 for S-ReSC-R and three-tiered S-ReSC-R, respectively. The Hosmer–Lemeshow goodness of fit test revealed p-values of 0.01 and 0.90 for S-ReSC-R and three-tiered S-ReSC-R, respectively. In decision curve analysis, the prediction model provided a superior net benefit with reduction of a probability threshold at around 20% (Fig. 1D).
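The AUC point estimates can be checked against the counts in Table 2. The sketch below computes the AUC as the probability that a randomly chosen stone-free patient has a lower score than a randomly chosen patient with residual stones, counting ties as one half (the Mann–Whitney formulation); it does not reproduce the confidence intervals, and the helper names are hypothetical.

```python
# S-ReSC-R score -> (stone-free patients, all patients), from Table 2.
TABLE2 = {1: (47, 49), 2: (18, 26), 3: (25, 34), 4: (8, 13), 5: (3, 11),
          6: (8, 15), 7: (5, 6), 8: (1, 4), 11: (1, 1)}


def auc_from_counts(table) -> float:
    """table maps an ordered score to (stone_free, not_stone_free) counts."""
    levels = sorted(table)
    n_sf = sum(table[s][0] for s in levels)
    n_fail = sum(table[s][1] for s in levels)
    fail_above = n_fail  # failures with a score above the current level
    concordant = 0.0
    for s in levels:
        sf, fail = table[s]
        fail_above -= fail
        # Pairs where the failure scores higher, plus half of the tied pairs.
        concordant += sf * (fail_above + 0.5 * fail)
    return concordant / (n_sf * n_fail)


def tier(score: int) -> int:
    """0 = low (1-2), 1 = intermediate (3-4), 2 = high (5-12)."""
    return 0 if score <= 2 else 1 if score <= 4 else 2


per_score = {s: (sf, total - sf) for s, (sf, total) in TABLE2.items()}
three_tier = {}
for s, (sf, fail) in per_score.items():
    prev_sf, prev_fail = three_tier.get(tier(s), (0, 0))
    three_tier[tier(s)] = (prev_sf + sf, prev_fail + fail)

auc_score = auc_from_counts(per_score)
auc_tier = auc_from_counts(three_tier)
print(f"AUC per score: {auc_score:.4f}, AUC three-tiered: {auc_tier:.4f}")
```

The values obtained this way (about 0.7315 and 0.7015) fall between the figures quoted in the abstract (0.731 and 0.701) and those quoted in this section (0.732 and 0.702), so the small discrepancy between the two is consistent with rounding.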

FIG. 1.

(A) ROC curve of S-ReSC-R. (B) ROC curve of three-tiered S-ReSC-R subgroups. (C) A calibration plot for the predicted probability of stone-free rate. (D) A decision curve analysis showed that the prediction model provided a superior net benefit using S-ReSC-R.

Uni- and multivariate logistic regression analyses for predictors of stone-free status

As shown in Table 4, three-tiered S-ReSC-R and total stone volume were significant predictors of SFR in univariate logistic regression analysis. However, only three-tiered S-ReSC-R remained a significant predictor of SFR in multivariate logistic regression analysis.

Table 4.

Uni- and Multivariate Logistic Regression Analyses for Predictors of Stone-Free Status

	Univariate			Multivariate
	OR	95% CI	p-Value	OR	95% CI	p-Value
Age	1.002	0.979–1.026	0.874
BMI	1.009	0.921–1.106	0.840
Laterality	0.799	0.394–1.619	0.533
S-ReSC-R score group
Low versus intermediate	0.363	0.145–0.904	0.029	0.390	0.154–0.983	0.046
Low versus high	0.146	0.058–0.368	<0.001	0.173	0.065–0.459	<0.001
Largest diameter	0.965	0.920–1.012	0.141
Total stone volume	0.999	0.998–0.999	0.016	0.999	0.998–0.999	0.295

BMI=body–mass index.

Discussion

Prediction of SFR after RIRS

Active treatments for renal stones usually include extracorporeal shock wave lithotripsy (ESWL), PCNL, and RIRS. ESWL is known to achieve an excellent SFR for renal stones smaller than 20 mm. However, the SFR is low when stones are associated with unfavorable prognostic factors such as the lower pole calix, a steep and long infundibulopelvic angle, a narrow infundibulum, hard stones, and obesity.¹² On the other hand, PCNL is suitable for removing a large stone burden, such as stones with a maximal diameter larger than 20 mm and staghorn stones. However, antegrade access for tract formation is inevitably accompanied by renal injury, which can result in postoperative complications such as renal function deterioration, bleeding, and infection.¹³ Recently, RIRS has attracted considerable attention as a new standard treatment modality for midsized renal stones because it can minimize the renal parenchymal damage induced by antegrade penetration while maximizing the SFR of lower pole stones, which are difficult to remove with ESWL.¹⁴ As state-of-the-art video systems become widely used and surgical know-how accumulates, RIRS could achieve the same level of SFR as ESWL or PCNL.¹⁵

Although demand for and interest in RIRS have increased, few predictive models for SFR after RIRS are useful in practice. Resorlu et al. evaluated the prognostic factors affecting SFR by analyzing 207 cases of RIRS.⁵ The Resorlu–Unsal scoring system grades surgical difficulty on a five-point scale (0–4) according to the following factors: (1) stone size >20 mm; (2) lower pole stone location with an infundibulopelvic angle <45°; (3) stones in more than one calix; (4) abnormal renal anatomy such as horseshoe kidney or pelvic kidney. This scoring system is easy to use in clinical practice. However, it is limited because few cases scored over three points. In addition, external validation has not yet been conducted, although it is an essential step before a scoring system is applied clinically. Jeong et al. recently developed S-ReSC-P and S-ReSC-R.⁷,⁸ They compared the AUCs of the S-ReSC-R and the Resorlu–Unsal scoring system and showed the feasibility and accuracy of S-ReSC-R in predicting SFR after RIRS. This scoring system is based on the number and sites of renal stones, which is consistent with the findings of our previous investigation.¹⁰ Therefore, we decided to externally validate S-ReSC-R in the present study.

External validation of modified S-ReSC scoring system

To the best of our knowledge, the present study is the first external validation of a scoring system that predicts SFR after RIRS. S-ReSC-P and S-ReSC-R are similar scoring methods based on the number and sites of renal stones, indicating that the distributional complexity of renal stones is the most important prognostic factor for SFR following PCNL or RIRS, in accordance with the result of our previous investigation.¹⁰ Although our results showed the feasibility and clinical applicability of S-ReSC-R, it is not easy to count the exact number and location of all renal stones of various sizes. Therefore, we counted only stones larger than 1 mm, which made it relatively simple to classify intrarenal stones into low-score (1–2), intermediate-score (3–4), and high-score (5–12) groups. This three-tiered S-ReSC-R scoring system was well validated because the AUC in the validation data set was 0.702 and the Hosmer–Lemeshow goodness of fit test revealed a p-value of 0.90. These values were nonetheless lower in the validation data set than in the original development data set, presumably because of differences in the study populations. RIRS still has much room for future technical and academic development. S-ReSC-R would be an attractive and easy-to-use predictive model for hospitals that have recently started performing RIRS for intrarenal stones.

The analysis of interobserver and test–retest reliability showed an almost perfect level of agreement for the scoring system. However, the subanalysis of kappa values for each variable in the scoring system showed considerable discordance. The mean kappa value was significantly higher in the low-score group than in the intermediate- and high-score groups. This may be because of the small number of patients included in the analysis. However, we believe that it is mainly due to the difficulty of counting tiny stones on CT scans in the intermediate- and high-score groups. For example, a small dumbbell-shaped stone can be counted either as a single stone or as two stones gathered in the same calix.

Limitations of the study

There could be a risk of bias because the outcome of RIRS is strongly dependent on the operator's skill. However, we applied rigorous statistical methods in this regard, and we tried to ensure reliable results by including only cases performed after an experienced expert had overcome the learning curve.¹⁰

Conclusion

The present study confirmed the predictive value of S-ReSC-R for SFR following RIRS in an independent cohort. Interobserver and test–retest reliabilities confirmed that S-ReSC-R was reliable and valid. Further investigations are necessary to evaluate the clinical significance of this scoring system.

Footnotes

Acknowledgment

The authors wish to thank Ha Ni Lee, an assistant nurse at the Department of Urology of Seoul Metropolitan Government-Seoul National University, Boramae Medical Center, for surgical technical support.

Disclosure Statement

No competing financial interests exist.

References

1. Romero, Akpinar, Assimos. Kidney stones: A global picture of prevalence, incidence, and associated risk factors. Rev Urol, 2010; 12:e86–e96.

2. Morris, Wei, Taub, Dunn, Wolf Jr., Hollenbeck. Temporal trends in the use of percutaneous nephrolithotomy. J Urol, 2006; 175:1731–1736.

3. Rosa, Usai, Miano, et al. Recent finding and new technologies in nephrolitiasis: A review of the recent literature. BMC Urol, 2013; 13:10.

4. […], Autorino, Kim, et al. Percutaneous nephrolithotomy versus retrograde intrarenal surgery: A systematic review and meta-analysis. Eur Urol, 2015; 67:125–137.

5. Resorlu, Unsal, Gulec, Oztuna. A new scoring system for predicting stone-free rate after retrograde intrarenal surgery: The “resorlu-unsal stone score”. Urology, 2012; 80:512–518.

6. Labadie, Okhunov, Akhavein, et al. Evaluation and comparison of urolithiasis scoring systems in percutaneous kidney stone surgery. J Urol, 2015; 193:154–159.

7. Jeong, Jung, Cha, et al. Seoul national university renal stone complexity score for predicting stone-free rate after percutaneous nephrolithotomy. PLoS One, 2013; 8:e65888.

8. Jung, Lee, Park, et al. Modified Seoul National University Renal Stone Complexity score for retrograde intrarenal surgery. Urolithiasis, 2014; 42:335–340.

9. Choo, Jeong, Jung, et al. External validation and evaluation of reliability and validity of the S-ReSC scoring system to predict stone-free status after percutaneous nephrolithotomy. PLoS One, 2014; 9:e83628.

10. Cho, Choo, Jung, et al. Cumulative sum analysis for experiences of a single-session retrograde intrarenal stone surgery and analysis of predictors for stone-free status. PLoS One, 2014; 9:e84878.

11. Türk, Knoll, Petrik, Sarica, Skolarikos, Straub, Seitz. Guidelines on Urolithiasis. Updated 2014 [Internet]. Arnhem: European Association of Urology, c2014 [cited 2015 Feb 6]. 2014. Available from: www.uroweb.org/guidelines/online-guidelines.

12. Ruggera, Beltrami, Ballario, Cavalleri, Cazzoletti, Artibani. Impact of anatomical pielocaliceal topography in the treatment of renal lower calyces stones with extracorporeal shock wave lithotripsy. Int J Urol, 2005; 12:525–532.

13. Kyriazis, Panagopoulos, Kallidonis, Ozsoy, Vasilas, Liatsikos. Complications in percutaneous nephrolithotomy. World J Urol, 2014 [Epub ahead of print].

14. Resorlu, Oguz, Resorlu, Oztuna, Unsal. The impact of pelvicaliceal anatomy on the success of retrograde intrarenal surgery in patients with lower pole renal stones. Urology, 2012; 79:61–66.

15. Akman, Binbay, Ugurlu, et al. Outcomes of retrograde intrarenal surgery compared with percutaneous nephrolithotomy in elderly patients with moderate-size kidney stones: A matched-pair analysis. J Endourol, 2012; 26:625–629.