Multicenter External Validation and Comparison of Stone Scoring Systems in Predicting Outcomes After Percutaneous Nephrolithotomy

Abstract

Background and Purpose:

Several scoring systems have recently emerged to predict stone-free rate (SFR) and complications after percutaneous nephrolithotomy (PCNL). We aimed to compare the most commonly used scoring systems (Guy's stone score, S.T.O.N.E. nephrolithometry, and CROES nomogram), assess their predictive accuracy for SFR and other postoperative variables, and develop a risk group stratification based on these scoring systems.

Materials and Methods:

We performed a retrospective review of patients who underwent PCNL at four academic institutions between 2006 and 2013. Primary outcome was SFR within 3 weeks of the surgery and secondary outcomes were operative time (OT), complications, and length of stay (LOS). We performed chi-squared, t-test, logistic, linear, and Poisson regressions, as well as receiver operating characteristic curve analysis with area under the curve (AUC) calculation.

Results:

We identified 586 patients eligible for analysis. Of these, 67.4% were stone free. Guy's, S.T.O.N.E., and CROES scores were predictive of SFR on multivariable logistic regression (odds ratio [OR]: 1.398, 95% confidence interval [CI]: 1.056, 1.852, p = 0.019; OR: 1.417, 95% CI: 1.231, 1.631, p < 0.001; OR: 0.993, 95% CI: 0.988, 0.998, p = 0.004) and had similar predictive accuracy with AUCs of 0.629, 0.671, and 0.646, respectively. On multivariable linear regression, only S.T.O.N.E. was an independent predictor of longer OT (β = 14.556, 95% CI: 12.453, 16.660, p < 0.001). None of the scores were independent predictors of postoperative complications or a longer LOS. Poisson regression allowed for risk group stratification and showed the S.T.O.N.E. score and CROES nomogram to have the most distinct risk groups.

Conclusions:

The three evaluated scoring systems have similar predictive accuracy of SFR. S.T.O.N.E. has additional value in predicting OT. Risk group stratification can be used for patient counseling. Further research is needed to identify whether or not any is superior to the others with regard to clinical usefulness and predictive accuracy.

Introduction

The prevalence of urolithiasis is on the rise with an increase from 3.2% in the 1970s to 8.8% in 2010.^1,2 Moreover, a recent analysis using the U.S. Nationwide Inpatient Sample identified a 12% increase in hospitalizations for renal calculi in 2009 compared to 1999.³ Although the proportion of percutaneous nephrolithotomy (PCNL) surgeries performed for urolithiasis did not increase, the absolute number of surgeries has more than doubled in the past 10 years.^4–6 Despite all the technological advances and surgery modifications, the complication rate has increased along with the number of surgeries performed.^6,7

Despite years of research on preoperative variables as predictors of outcomes after PCNL, there remains a lack of standardized reporting of preoperative patient and stone-related data.^8–10

Until recently, a useful scoring system for prognostic evaluation of success and complication rates of PCNL surgeries was unavailable. Since 2011, Guy's stone score (GSS), S.T.O.N.E. nephrolithometry, and the Clinical Research Office of the Endourological Society (CROES) nomogram have been proposed as means of preoperative assessment using patient and stone characteristics.^11–13 These scoring systems are based on data that are easily obtainable before the surgery from preoperative imaging and patient history and could become a means of standardized reporting of preoperative cohort data. The different variables used in each of the scoring systems are listed and compared in Table 1. To date, there have been a few studies validating these scoring systems on single institutional patient cohorts. Applicability and generalizability of these scoring systems remain to be elucidated.

Table 1.

Variables Included in Each of the Scoring Systems

Variable	Guy's	S.T.O.N.E.	CROES
Number of stones	Single vs multiple		Single vs multiple
Renal anatomy	Simple vs abnormal; Diverticulum = 3
Stone location	Pelvis/mid/lower pole vs upper pole		Middle>pelvic>lower>upper>multiple
Staghorn configuration	Partial = 3; Full = 4		Yes vs no (no definition of partial or full)
Spina bifida/spinal injury	Yes = 4
Stone size/burden		Length × width in mm² on CT	Length × width × π/4 in mm²
Tract length		<100 mm vs >100 mm, calculated from axial CT measurements
Hydronephrosis		None vs severe
Number of calices involved		1–2 vs 3 vs staghorn (>3)
Hounsfield units		<950 vs >950
Prior treatment			None>SWL/PCNL/URS>multiple>open surgery
Case volume/year			See Nomogram

PCNL = percutaneous nephrolithotomy; SWL = extracorporeal shockwave lithotripsy; URS = ureteroscopy; CROES = Clinical Research Office of the Endourological Society.
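The two stone-burden definitions in Table 1 differ only by a constant factor: S.T.O.N.E. uses length × width, whereas CROES multiplies the same product by π/4, which is the area of an ellipse with those two diameters. A minimal Python sketch of both definitions follows; the function names and the example dimensions are illustrative assumptions and are not taken from the study data.

```python
import math


def stone_burden_rectangular(length_mm: float, width_mm: float) -> float:
    """Stone burden as length x width in mm^2 (S.T.O.N.E. column of Table 1)."""
    return length_mm * width_mm


def stone_burden_elliptical(length_mm: float, width_mm: float) -> float:
    """Stone burden as length x width x pi/4 in mm^2 (CROES column of Table 1)."""
    return length_mm * width_mm * math.pi / 4


# Hypothetical 30 mm x 20 mm stone (not a study measurement).
rect_mm2 = stone_burden_rectangular(30, 20)    # 600 mm^2
ellipse_mm2 = stone_burden_elliptical(30, 20)  # about 471 mm^2
```

For the same stone, the CROES burden is therefore always about 79% of the S.T.O.N.E. burden, which matters when size values are compared across studies that use different scoring systems.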

To the best of our knowledge, we present the largest multicenter cohort study evaluating and comparing these three scoring systems for their accuracy in predicting postoperative outcomes and clinical applicability.

Materials and Methods

After obtaining internal Institutional Review Board approval at each participating institution, we performed a retrospective chart review of patients who underwent a PCNL between 2006 and 2013 at four academic institutions.

Selection criteria

Patients who were 18 years or older at the time of surgery and had an available preoperative CT were included in the study. All surgeries included were performed as primary PCNL surgeries. Secondary surgeries for treatment of residual fragments after the initial surgery were excluded from this analysis. All surgeries were performed in academic referral centers for stone disease by experienced fellowship-trained endourologists.

Measurements

A single observer from each center reviewed the images and reported on all the variables obtainable from CT, necessary for calculation of the GSS, S.T.O.N.E. score, and CROES nomogram, as described by Thomas et al.,¹¹ Okhunov et al.,¹² and Smith et al.¹³ We defined a partial staghorn as a stone extending into one or two calices, and stones extending in more than two calices were categorized as staghorn calculi.

Demographic and perioperative data

Patient demographics collected were age, gender, body mass index (BMI), American Society of Anesthesiology (ASA) score, and previous surgical and medical history. Perioperative data collected included operative time (OT), length of stay (LOS), number and location of percutaneous tracts, and postoperative complications, according to the Clavien scores assigned to each complication post PCNL as described by de la Rosette et al.¹⁴

Outcomes

Primary outcome of our analysis was stone-free rate (SFR) on postoperative day 1 or within 3 weeks of the surgery as assessed by noncontrast CT or kidney, ureter, and bladder radiograph with renal ultrasound. We used a cutoff size of 2 mm for clinically insignificant residual fragments.¹⁵ As a secondary analysis, we aimed to identify the predictive value and accuracy of the scoring systems for OT, LOS, and complications according to the adjusted Clavien classification.¹⁴ The final goal of our analysis was to create risk groups based on the scoring systems.

Statistical analysis

For comparison of the variables between patients who are stone free and not stone free, we used chi-square test for categorical variables and Student's t-test for continuous variables. For multiple group comparisons, between-group comparison was performed with an adjusted p-value according to Holm.¹⁶ To assess predictability of SFR, we performed univariable and multivariable logistic regression analysis. Univariable and multivariable linear regression were used to assess predictability of OT, postoperative complications, and LOS. The S.T.O.N.E. score was categorized in four groups and CROES nomogram was divided in quartiles for risk group stratification. We used a modified Poisson regression model with a robust variance estimator to estimate relative risk for residual stone after a single PCNL surgery. To identify the predictive accuracy of each of the stone scoring systems, we generated receiver operating characteristic (ROC) curves with area under the curve (AUC) analysis. AUCs were compared according to Hanley and McNeil.¹⁷ Significance was established with a p-value <0.05. Statistical analysis was performed using SPSS version 22 (IBM Corp., Armonk, NY).
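The Holm adjustment used for the between-group comparisons can be sketched as follows. This is a generic implementation of the step-down procedure, not the SPSS routine used in the study; the function name and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Illustrative raw p-values from three pairwise comparisons.
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

A raw p-value below 0.05 can lose significance after adjustment; in this illustrative case the raw values 0.04 and 0.03 both become 0.06.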

Results

Out of a total cohort of 1696 patients treated, 586 patients had all data necessary for analysis available and were included in the study. Patient demographics and clinical characteristics, as well as perioperative variables of patients who are stone free and not stone free, are available in Table 2. After the surgery, 67.4% of patients were considered stone free. The average stone size was significantly larger in the cohort that was not stone free than in the stone-free cohort (1045 vs 557 mm², respectively, p < 0.001). Patients with stones in multiple locations had significantly lower success rates compared to patients who had a stone in a single location (p = 0.002) on post hoc analysis. SFRs for lower pole and renal pelvis stones were higher than for stones in other or multiple locations (p = 0.041 and 0.013), but did not reach statistical significance considering adjusted p-values for multiple comparison. The median scores of the three scoring systems were significantly different for patients who are stone free vs patients who are not stone free: Guy's 2 vs 3, p < 0.001; S.T.O.N.E. 7 vs 9, p < 0.001; and CROES 220 vs 183, p < 0.001.

Table 2.

Patient Demographics and Clinical Characteristics

Variable	Total N (%) or mean (SD)	Stone-free N (%) or mean (SD)	Not stone-free N (%) or mean (SD)	p
N	586	395 (67.4%)	191 (32.6%)
Age, years	55.8 (15.1)	55.4 (15.1)	56.5 (14.9)	0.434
Gender
Male	308 (52.6%)	215 (69.8%)	93 (30.2%)	0.217
Female	278 (47.4%)	180 (64.7%)	98 (35.3%)
BMI	30.5 (8.6)	30.5 (9.1)	30.5 (7.7)	0.950
ASA score
1	53 (9.7%)	39 (73.6%)	14 (26.4%)	0.850
2	239 (43.8%)	162 (67.8%)	77 (32.2%)
3	230 (42.1%)	155 (67.4%)	75 (32.6%)
4	24 (4.4%)	16 (66.7%)	8 (33.3%)
Laterality
Left	313 (53.4%)	209 (66.8%)	104 (33.2%)	0.791
Right	273 (46.6%)	186 (68.1%)	87 (31.9%)
Renal anomaly	69 (11.8%)	47 (68.1%)	22 (31.9%)	1.00
Stone size, mm²	714 (817)	557 (475)	1045 (1197)	<0.001
Stone density, HU	895 (357)	888 (354)	904 (355)	0.579
Stone location
Upper pole	52 (8.9%)	31 (59.6%)	21 (40.4%)	0.005
Middle pole	37 (6.3%)	25 (67.6%)	12 (32.4%)
Lower pole	97 (16.6%)	74 (76.3%)	23 (23.7%)
Renal pelvis	158 (27.0%)	119 (75.3%)	39 (24.7%)
Multiple locations	242 (41.3%)	146 (60.3%)	96 (39.7%)
Staghorn
Partial	32.0%	69.1%	30.9%	0.736
Full	20.7%	52.2%	47.8%	<0.001
Number of stones
Single	229 (39.1%)	174 (76%)	55 (24%)	<0.001
Multiple	357 (60.9%)	221 (61.9%)	136 (38.1%)
Number of calices involving stone	1.3 (1.6)	1.1 (1.5)	1.8 (1.7)	<0.001
Hydronephrosis
Yes	161 (27.5%)	94 (58.4%)	67 (41.6%)	0.004
No	424 (72.5%)	300 (70.8%)	124 (29.2%)
Tract length, mm	113 (30)	112 (30)	114 (28)	0.372
Operative time, minutes	100 (54)	92 (47)	118 (63)	<0.001
Number of tracts	1.06 (0.283)	1.04 (0.211)	1.11 (0.389)	0.003
Length of stay, days	3.2 (2.4)	3.2 (2.5)	3.2 (2.2)	0.981
Complication: Clavien
0	415 (70.8%)	287 (72.7%)	128 (67.0%)	0.149
1	126 (21.5%)	80 (20.3%)	46 (24.1%)
2	25 (4.3%)	17 (4.3%)	8 (4.2%)
3a	10 (1.7%)	7 (1.8%)	3 (1.6%)
3b	7 (1.2%)	4 (1.0%)	3 (1.6%)
4	3 (0.5%)	0 (0%)	3 (1.6%)
Nephrolithometry score (median and IQR)
Guy's	2 (1)	2 (2)	3 (2)	<0.001
S.T.O.N.E.	8 (2)	7 (3)	9 (3)	<0.001
CROES	207 (95)	220 (93)	183 (84)	<0.001

Significant results are bolded.

ASA = American Society of Anesthesiology; BMI = body mass index; IQR = interquartile range.

On multivariable logistic regression, accounting for stone size, tract length, stone location, mean stone density, hydronephrosis, staghorn morphology, age, BMI and ASA, the GSS, S.T.O.N.E. score, and CROES nomogram score were all independent predictors of residual fragments after PCNL (odds ratio [OR]: 1.398, 95% confidence interval [CI]: 1.056, 1.852, p = 0.019; OR: 1.417, 95% CI: 1.231, 1.631, p < 0.001; OR: 0.993, 95% CI: 0.988, 0.998, p = 0.004, respectively).
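Each OR above refers to a one-point change in the respective score, and the three scores run on very different scales (Guy's 1 to 4, S.T.O.N.E. 5 to 13, CROES roughly 80 to 340 in Table 4). The sketch below shows how a per-point OR translates to a larger score difference, assuming the log-odds are linear in the score as the logistic model itself assumes. The function name and the 50-point difference are illustrative choices, not part of the study analysis.

```python
import math


def odds_ratio_for_change(or_per_point: float, points: float) -> float:
    """Odds ratio implied by a `points`-sized change in score.

    Assumes log-odds are linear in the score, so ORs multiply per point.
    """
    return math.exp(points * math.log(or_per_point))


# CROES: reported OR of 0.993 per point, applied to a hypothetical 50-point higher score.
croes_50 = odds_ratio_for_change(0.993, 50)   # about 0.70

# S.T.O.N.E.: reported OR of 1.417 per point, applied to a hypothetical 2-point higher score.
stone_2 = odds_ratio_for_change(1.417, 2)     # about 2.01
```

Seen this way, the CROES OR of 0.993 is not negligible: a 50-point higher CROES score corresponds to roughly 30% lower odds of residual fragments under this assumption.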

The AUCs of the ROC curves of the three stone scoring systems were 0.629, 0.671, and 0.646 for Guy's, S.T.O.N.E., and CROES score, respectively, while stone size in mm² had an AUC of 0.652 (Table 3 and Fig. 1). When comparing the AUCs of the different scoring systems and stone size, there were no statistically significant differences.

FIG. 1.

Receiver operating characteristic curves for the three scoring systems in predicting stone-free status.

Table 3.

Area Under the Curve Results for the Three Scoring Systems in Predicting Stone Free Status

Variable	AUC	p	Asymptotic 95% CI
Guy's	0.629	<0.001	0.581, 0.677
S.T.O.N.E.	0.671	<0.001	0.624, 0.718
CROES	0.646	<0.001	0.598, 0.694
Stone size	0.652	<0.001	0.604, 0.700

AUC = area under the curve; CI = confidence interval.
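As a rough check on the intervals in Table 3, the standard error of a single AUC can be approximated with the Hanley and McNeil formula. The sketch below treats the 191 patients with residual stone as the positive class and the 395 stone-free patients as the negative class, which is an assumption made here for illustration. The interval reported by the statistical software need not be computed this way, so the result is expected to be close to, not identical with, Table 3.

```python
import math


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


# Guy's stone score: AUC 0.629 with 191 residual-stone and 395 stone-free patients.
se = hanley_mcneil_se(0.629, n_pos=191, n_neg=395)
ci_low, ci_high = 0.629 - 1.96 * se, 0.629 + 1.96 * se
```

With these inputs the approximate interval is about 0.58 to 0.68, in line with the asymptotic interval listed for Guy's in Table 3.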

After stratifying both the S.T.O.N.E. and CROES scores in four groups, we calculated the relative risk of residual stone after one PCNL surgery. The relative risks reported in Table 4 demonstrate the increased risk of not achieving stone-free status compared to the low risk group (e.g., patients who have a preoperative Guy's Grade 3 have a twofold higher risk of not becoming stone free with one surgery compared to a patient with a Grade 1).

Table 4.

Stone Free Rates and Relative Risks of Residual Stone Disease After a Single PCNL Procedure with Risk Group Stratification Within Each of the Scoring Systems

	Stone-free rate	Relative risk for residual stone (95% CI)	Risk group
Guy's
Grade 1	102/127 (80.3%)		Low
Grade 2	135/183 (73.8%)	1.332 (0.869, 2.042)	Intermediate
Grade 3	101/167 (60.5%)	2.008 (1.348, 2.990)	High
Grade 4	57/109 (52.3%)	2.423 (1.620, 3.625)	Very High
S.T.O.N.E.
5–6	110/134 (82.1%)		Low
7–8	178/245 (72.7%)	1.527 (1.007, 2.315)	Intermediate
9–10	78/133 (58.6%)	2.309 (1.524, 3.497)	High
11–13	29/74 (39.2%)	3.395 (2.262, 5.096)	Very High
CROES
276–340	85/101 (84.2%)		Low
211–275	130/181 (71.8%)	1.779 (1.072, 2.951)	Intermediate
146–210	143/222 (64.4%)	2.246 (1.386, 3.641)	High
80–145	36/81 (44.4%)	3.507 (2.149, 5.724)	Very High
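The relative risks in Table 4 can be approximated directly from the stone-free counts. The sketch below computes an unadjusted relative risk with a log-scale Wald interval; it is not the modified Poisson model itself, although with a single categorical predictor the point estimate is the same and the interval is usually very close. The function name is illustrative.

```python
import math


def relative_risk(events_group, n_group, events_ref, n_ref, z=1.959964):
    """Unadjusted relative risk versus a reference group, with a Wald CI on the log scale."""
    rr = (events_group / n_group) / (events_ref / n_ref)
    se_log_rr = math.sqrt(
        1 / events_group - 1 / n_group + 1 / events_ref - 1 / n_ref
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Guy's Grade 3 vs Grade 1; residual-stone counts = total minus stone free (Table 4).
rr, lower, upper = relative_risk(167 - 101, 167, 127 - 102, 127)
```

For this comparison the sketch gives a relative risk of about 2.01 with an interval of about 1.35 to 2.99, consistent with the Grade 3 row of Table 4.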

We had a total complication rate of 29.2% with only 3.4% complications of Clavien grade 3 or higher. None of the scoring systems were strong predictors of complications. GSS was the only predictor on univariable analysis of a longer hospital stay (β = 0.221 days, 95% CI: 0.032, 0.409, p = 0.022) with an increase of ∼5.3 hours per increase in Guy's Grade. On multivariable analysis, however, controlling for ASA, urinary tract abnormality, age, total stone burden, mean stone density, and complication, the only independent predictors of hospital stay in the equation were ASA, urinary tract abnormality, and presence of complication. Multivariable linear regression analysis identified only the S.T.O.N.E. score as an independent predictor of OT (β = 14.556, 95% CI: 12.453, 16.660, p < 0.001).

Discussion

In recent years, the importance of systematic and standardized reporting of outcomes after various endourologic surgeries, including PCNL, has been emphasized.^8,9,14 Although Hyams et al. had previously highlighted a vast heterogeneity in reporting of preoperative variables in surgical management of kidney stones, consensus recommendations on standardized reporting of preoperative data have not yet been proposed.¹⁰ Preoperative prognostic tools can be useful not only to stratify patients in different risk groups but also as a means of standardized reporting of preoperative cohort data.

Currently, GSS, S.T.O.N.E. score, and the CROES nomogram score represent the three most commonly used prognostic tools for PCNL.^11–13 Although these systems were constructed independently and through different methodologies, they are all proposed to aid the surgeon in assessing case complexity to predict SFR and complication risks, while assisting in preoperative surgical planning.

There are several important differences between the currently analyzed scoring systems to be illustrated. Although the S.T.O.N.E. score is entirely based on data obtainable from the preoperative CT, both the GSS and CROES score include patient variables. The GSS, however, does not include stone size, a strong predictor of success. All the scoring systems include a measure of stone complexity. Staghorn or partial staghorn stone formation is most used to indicate the complexity of a stone and is included as a variable in the GSS and CROES nomogram.^11,13 The lack of consensus on definitions for these terms renders these scoring systems susceptible to score variations due to subjective interpretation of partial and full staghorn stone. This was identified by both Thomas and Ingimarsson.^11,18 The S.T.O.N.E. scoring systems tried to eliminate this ill-defined feature by using the number of calices involved by stone as a surrogate for stone complexity.¹² This more objective measure of stone complexity could reduce scoring variations. In comparison to the GSS and CROES nomogram, however, the S.T.O.N.E. score does not include the number of stones or stone location, which have been shown to influence treatment success.^13,19,20 Although the CROES nomogram includes most of the variables that appeared significantly different between the cohort that is stone free and the cohort with residual stone in our population and may therefore be a more complete scoring system, the large continuous scale and complexity in use limit its application in everyday practice.

External validation of these prediction models before widespread use in clinical practice is as important as their development.^21,22 The GSS has been validated on multiple occasions.^18,23–25 Ingimarsson et al. have shown a good interrater concordance for the GSS and interestingly pointed out that 56% of the discordant results were due to unclear definitions of abnormal renal anatomy and partial staghorn stone.¹⁸ Assessment of interobserver reliability for the S.T.O.N.E. score showed good concordance between urology residents, fellows, and staff.²⁶ Stone size and number of calices involved seemed to be the most challenging variables to measure with a slightly lower concordance. To date, the CROES nomogram has been externally validated on two occasions, showing fair predictive accuracy for SFR after PCNL.^25,27 Vergouwe et al. have demonstrated that a large enough sample size, including at least 100 events (in this case patients with residual stone), is required to adequately perform external validation.²⁸ This should be taken into account when interpreting results of validation studies performed on smaller patient cohorts. Our analysis represents the largest sample external validation analysis to date.

The staghorn stone morphometry classification by Mishra et al. and the Seoul National University Renal Stone Complexity (S-ReSC) score by Jeong et al. were not included in this comparison.^29,30 The staghorn morphometry score is a model that aims to predict the number of tracts and stages needed to clear the stone burden in patients with staghorn renal calculi. A contrast-enhanced CT with urography phase, which entails a higher radiation dose for the patient, and specific CT scan volumetric assessment software are necessary to classify a stone as type 1 (single tract single stage), type 2 (single stage multiple tract or multiple tract single stage), or type 3 (multiple tract multiple stage).²⁹ The S-ReSC score is based on only one parameter, the number of sites in the collecting system involved by stone and appears to have a good predictive accuracy.³⁰ We could, however, not evaluate this score as these data were not collected as outlined in the article.

In our current analysis, we establish that all the evaluated scoring systems are equally accurate in predicting SFR after single PCNL surgery. This finding corroborates previously reported similarities in predictive accuracy of scoring systems in smaller cohorts.^25,31,32 Interestingly, none of the systems have significant added predictive accuracy over stone size alone as a predictor of SFR. When assessing each of the stone scores against a set of variables on multivariable analysis (controlling for stone size, mean HU, tract length, hydronephrosis, number of calices, stone location, age, and ASA), it is notable that stone location is retained in the model with S.T.O.N.E., the number of calices involved is retained with CROES, and hydronephrosis is retained with GSS. Although each of the scoring systems is not more accurate in predicting SFR than stone size alone, stone size is not retained in a multivariable regression model containing any of the scoring systems, most likely due to collinearity.

By stratifying all patients in four groups within the Guy's, S.T.O.N.E., or CROES score, we could include a risk stratification with calculated relative risks for residual stone after the surgery compared to the reference, that is, lowest risk group (Table 4). Although the relative risk differences between the risk groups of both S.T.O.N.E. and CROES score are quite similar, those differences seem to be smaller for the GSS. The risk of residual stone is not significantly different between a Guy's Grade 1 and Grade 2 and there appears to be only a small increase in risk of having residual stones with a Guy's Grade 4 compared to a Grade 3. This may indicate that there is a more clear distinction between risk groups when using the S.T.O.N.E. score or CROES nomogram than with the GSS.

Although a higher S.T.O.N.E. score predicts a longer OT, none of the scoring systems can be considered a strong predictor for postoperative complications. Goyal et al. similarly reported that the GSS is not an independent predictor of postoperative complications in a pediatric population.³³ In the initial articles describing the GSS and S.T.O.N.E. score, the scores were not strong predictors of complications.^11,12 The CROES nomogram was initially not assessed for ability to predict postoperative complications.¹³ In contrast to our findings, others did identify a correlation between GSS and complications in a prospective validation cohort.^23,24,34

In a comparison of the GSS and S.T.O.N.E. score, Noureldin et al. demonstrated both scoring systems to be independent predictors of longer OT.³¹ Although the GSS was indeed a predictor of longer OT in our patient cohort on univariable analysis, adding ∼8 minutes per increase in score, it was not retained as an independent predictor in a multivariable linear regression model. Bozkurt et al., comparing the GSS to the CROES nomogram, reported that both scoring systems were predictive of estimated blood loss, OT, and overall complications.³²

In contrast to an earlier evaluation and comparison of the scoring systems on a smaller cohort, we could not identify any independent relation between the scoring systems and LOS in the current analysis.²⁵

We were unable to identify any score as being superior compared to the others with regard to predicting a stone-free outcome after PCNL. Although the S.T.O.N.E. score has no statistical benefit over the other scoring systems with regard to predicting SFR, it seems to be the easiest to obtain, relying purely on imaging characteristics without addition of other patient-related data and provides distinct risk stratification. This is, however, a subjective finding and should be more objectively substantiated, for instance by surveying a large group of practicing urologists about the use of the scoring systems. With the risk stratification as suggested in this article, patients can be classified as low risk, intermediate risk, high risk, and very high risk of residual stone disease after single PCNL for renal stone disease and counseled as such.
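For counseling purposes, the risk groups of Table 4 amount to a simple lookup from score to category. The sketch below encodes the cutoffs exactly as listed in Table 4; the dictionary keys, function name, and error handling are illustrative assumptions.

```python
RISK_LABELS = ("Low", "Intermediate", "High", "Very High")

# Inclusive (lower, upper) score bounds per risk group, in RISK_LABELS order (Table 4).
RISK_BANDS = {
    "guys": ((1, 1), (2, 2), (3, 3), (4, 4)),
    "stone": ((5, 6), (7, 8), (9, 10), (11, 13)),
    "croes": ((276, 340), (211, 275), (146, 210), (80, 145)),
}


def risk_group(system: str, score: float) -> str:
    """Map a score to its Table 4 risk group for residual stone after a single PCNL."""
    for label, (low, high) in zip(RISK_LABELS, RISK_BANDS[system]):
        if low <= score <= high:
            return label
    raise ValueError(f"score {score} is outside the Table 4 range for {system!r}")


# Hypothetical patient with a S.T.O.N.E. score of 9.
group = risk_group("stone", 9)  # "High"
```

Note that a higher CROES score indicates a lower risk, so its bands run in the opposite direction from those of the GSS and S.T.O.N.E. score.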

The main limitations of this study are its retrospective character and the data assessment by a single observer in each of the participating centers. We tried reducing these limitations of a multicenter retrospective study by communicating standardized outcome definitions and data collection methodology. In addition, interrater reliability has been shown to be fairly good for both the GSS and the S.T.O.N.E. score with κ = 0.72 and 0.75, respectively.^18,26 Although outcome definitions were standardized, postoperative imaging assessment of stone-free status was not. This indeed is a limitation of the study and is due to its retrospective character. When comparing the patients who had a CT postoperatively to the patients who did not, the three models still perform similarly and not significantly different from each other within and between the cohorts (AUC for GSS, S.T.O.N.E. score, and CROES nomogram were 0.637, 0.643, and 0.649, respectively, for the CT cohort and 0.614, 0.688, and 0.653 for the non-CT cohort). Although the accuracy of plain radiography may be lower than CT for residual fragments after PCNL,³⁵ it was not within the scope of this project to identify and compare diagnostic differences between different imaging modalities for postoperative SFR assessment. In a prospective study, this as well would need to be standardized in all participating centers. One could argue that with AUCs below 0.7, not one of the scoring systems performs well in predicting stone-free status after a single PCNL surgery for renal stone disease. They are for now, at least, a step in the right direction toward a more uniform way of reporting preoperative variables in PCNL-related research. However, as the systematic use of any of the five scoring systems is dependent on surgeon's preference, this only partially solves the problem. Further research is needed to identify if one is superior to the others with regard to clinical usefulness and predictive accuracy.

Conclusions

The three evaluated scoring systems for predicting outcomes after PCNL are all equally predictive of stone-free status after a single PCNL surgery. Patients can be stratified in a low, intermediate, high, or very high-risk group with their associated relative risk for residual stone. We have outlined the differences between the scoring systems and would argue that the S.T.O.N.E. score would be the easiest to obtain, while the CROES nomogram may be a more complete assessment of patient and stone complexity. All scores seem to be clinically useful. Further research is needed to identify whether or not any is superior to the others with regard to clinical usefulness and predictive accuracy.

Footnotes

Author Disclosure Statement

No competing financial interests exist.

References

1. Moe. Kidney stones: Pathophysiology and medical management. Lancet, 2006; 367:333–344.

2. Scales, Smith, Hanley, Saigal. Prevalence of kidney stones in the United States. Eur Urol, 2012; 62:160–165.

3. Ghani, Sammon, Karakiewicz, Sun, Bhojani, Sukumar, et al. Trends in surgery for upper urinary tract calculi in the USA using the Nationwide Inpatient Sample: 1999–2009. BJU Int, 2013; 112:224–230.

4. Oberlin, Flum, Bachrach, Matulewicz, Flury. Contemporary surgical trends in the management of upper tract calculi. J Urol, 2015; 193:880–884.

5. Jayram, Matlaga. Contemporary practice patterns associated with percutaneous nephrolithotomy among certifying urologists. J Endourol, 2014; 28:1304–1307.

6. Ghani, Sammon, Bhojani, Karakiewicz, Sun, Sukumar, et al. Trends in percutaneous nephrolithotomy use and outcomes in the United States. J Urol, 2013; 190:558–564.

7. Mirheydar, Palazzi, Derweesh, Chang, Sur. Percutaneous nephrolithotomy use is increasing in the United States: An analysis of trends and complications. J Endourol, 2013; 27:979–983.

8. Mitropoulos, Artibani, Graefen, Remzi, Rouprêt, Truss. Reporting and grading of complications after urologic surgical procedures: An ad hoc EAU guidelines panel assessment and recommendations. Eur Urol, 2012; 61:341–349.

9. Opondo, Gravas, Joyce, Pearle, Matsuda, Sun Y-H, et al. Standardization of patient outcomes reporting in percutaneous nephrolithotomy. J Endourol, 2014; 28:767–774.

10. Hyams, Bruhn, Lipkin, Shah. Heterogeneity in the reporting of disease characteristics and treatment outcomes in studies evaluating treatments for nephrolithiasis. J Endourol, 2010; 24:1411–1414.

11. Thomas, Smith, Hegarty, Glass. The Guy's stone score: Grading the complexity of percutaneous nephrolithotomy procedures. Urology, 2011; 78:277–281.

12. Okhunov, Friedlander, George, Duty, Moreira, Srinivasan, et al. S.T.O.N.E. nephrolithometry: Novel surgical classification system for kidney calculi. Urology, 2013; 81:1154–1159.

13. Smith, Averch, Shahrour, Opondo, Daels FPJ, Labate, et al. A nephrolithometric nomogram to predict treatment success of percutaneous nephrolithotomy. J Urol, 2013; 190:149–156.

14. De la Rosette JJMCH, Opondo, Daels FPJ, Giusti, Serrano, Kandasami, et al. Categorisation of complications and validation of the Clavien score for percutaneous nephrolithotomy. Eur Urol, 2012; 62:246–255.

15. Raman, Bagrodia, Gupta, Bensalah, Cadeddu, Lotan, et al. Natural history of residual fragments following percutaneous nephrostolithotomy. J Urol, 2009; 181:1163–1168.

16. Holm. A simple sequentially rejective multiple test procedure. Scand J Stat, 1979; 6:65–70.

17. Hanley, McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 1982; 143:29–36.

18. Ingimarsson, Dagrosa, Hyams, Pais. External validation of a preoperative renal stone grading system: Reproducibility and inter-rater concordance of the Guy's stone score using preoperative computed tomography and rigorous postoperative stone-free criteria. Urology, 2014; 83:45–49.

19. Tirapegui, González, Tobía González, Daels. Pyelocaliceal distribution of kidney stones used as an outcome predictor in percutaneous nephrolithotomy after being evaluated with preoperative and postoperative CT scan. J Endourol, 2015; 29:666–670.

20. Bayar, Kadihasanoglu, Aydin, Sariogullari, Tanriverdi, Kendirici. The effect of stone localization on the success and complication rates of percutaneous nephrolithotomy. Urol J, 2014; 11:1938–1942.

21. Bleeker, Moll, Steyerberg, Donders, Derksen-Lubsen, Grobbee, et al. External validation is necessary in prediction research. J Clin Epidemiol, 2003; 56:826–832.

22. Altman, Royston. What do we mean by validating a prognostic model? Stat Med, 2000; 19:453–473.

23. Vicentini, Marchini, Mazzucchi, Claro, Srougi. Utility of the Guy's stone score based on computed tomographic scan findings for predicting percutaneous nephrolithotomy outcomes. Urology, 2014; 83:1248–1253.

24. Mandal, Goel, Kathpalia, Sankhwar, Singh, Sinha, et al. Prospective evaluation of complications using the modified Clavien grading system, and of success rates of percutaneous nephrolithotomy using Guy's Stone Score: A single-center experience. Indian J Urol, 2012; 28:392–398.

25. Labadie, Okhunov, Akhavein, Moreira, Moreno-Palacios, Del Junco, et al. Evaluation and comparison of urolithiasis scoring systems used in percutaneous kidney stone surgery. J Urol, 2015; 193:154–159.

26. Okhunov, Helmy, Perez-Lansac, Menhadji, Bucur, Kolla, et al. Interobserver reliability and reproducibility of S.T.O.N.E. nephrolithometry for renal calculi. J Endourol, 2013; 27:1303–1306.

27. Sfoungaristos, Gofrit, Yutkin, Landau, Pode, Duvdevani. External validation of CROES nephrolithometry as a preoperative predictive system for percutaneous nephrolithotomy outcomes. J Urol, 2015. [Epub ahead of print]; DOI:10.1016/j.juro.2015.08.079.

28. Vergouwe, Steyerberg, Eijkemans MJC, Habbema JDF. Substantial effective sample sizes were required for external validation studies of predictive logistic regression models. J Clin Epidemiol, 2005; 58:475–483.

29. Mishra, Sabnis, Desai. Staghorn morphometry: A new tool for clinical classification and prediction model for percutaneous nephrolithotomy monotherapy. J Endourol, 2012; 26:6–14.

30. Jeong, Jung J-W, Cha, Lee, Jeong, et al. Seoul National University Renal Stone Complexity Score for predicting stone-free rate after percutaneous nephrolithotomy. PLoS One, 2013; 8:e65888.

31. Noureldin, Elkoushy, Andonian. Which is better? Guy's versus S.T.O.N.E. nephrolithometry scoring systems in predicting stone-free status post-percutaneous nephrolithotomy. World J Urol, 2015; 33:1821–1825.

32. Bozkurt, Aydogdu, Yonguc, Yarimoglu, Sen, Gunlusoy, et al. Comparison of Guy and Clinical Research Office of the Endourological Society Nephrolithometry Scoring Systems for predicting stone-free status and complication rates after percutaneous nephrolithotomy: A single center study with 437 cases. J Endourol, 2015; 29:1006–1010.

33. Goyal, Goel, Sankhwar, Singh, Sinha, et al. A critical appraisal of complications of percutaneous nephrolithotomy in paediatric patients using adult instruments. BJU Int, 2014; 113:801–810.

34. Sinha, Mukherjee, Jindal, Sharma, Saha, Mitra, et al. Evaluation of stone-free rate using Guy's Stone Score and assessment of complications using modified Clavien grading system for percutaneous nephro-lithotomy. Urolithiasis, 2015; 43:349–353.

35. Gokce, Ozden, Suer, Gulpinar, Gulpınar, Tangal. Comparison of imaging modalities for detection of residual fragments and prediction of stone related events following percutaneous nephrolitotomy. Int Braz J Urol, 2015; 41:86–90.