Abstract
Purpose:
R.E.N.A.L. Nephrometry Score (NS) is an imaging-based (CT/MRI) scoring system commonly used by urologists to standardize the reporting of renal masses by enabling quantification of anatomical characteristics. We sought to examine the inter-rater correlation of NS between urologists, radiologists, and tumor-board collaborators.
Methods:
We identified adult patients undergoing partial or radical nephrectomy over 10 years (n=2450). Patients with autosomal dominant polycystic kidney disease (ADPKD), metastatic disease, or masses >10 cm were excluded, as were patients in whose care the study urologists or radiologists had participated. Preoperative imaging was reviewed, and patients with available multiphasic CT were included. Scans were provided to the reviewers, who scored them using a R.E.N.A.L. nephrometry questionnaire. Results were analyzed using Cohen's and Fleiss' kappa coefficients.
Results:
One hundred twenty patients met inclusion criteria, with a mean age of 59.5 years. The majority of cases were partial nephrectomies (72%). Eighty-five percent of the tumors were malignant, and 26% had high-grade histology. The mean (standard deviation) overall NS was 6.8 (1.9), with fair correlation among reviewers (κ=0.222). Collaborators had the highest inter-rater correlation, ranging from 0.41 to 0.84 for NS component scores, compared with 0.42–0.85 for radiologists and 0.36–0.86 for urologists. “R” scores were best correlated (κ>0.8). Within the groups, correlation ranged from 0.16 to 0.31 for the overall NS and from 0.50 to 0.61 for the NS complexity category.
Conclusions:
Although the radiologists were naive to NS, inter-radiologist scoring patterns were better correlated than inter-urologist patterns. The urologist and radiologist who collaborate in tumor board showed the highest agreement, suggesting that a multidisciplinary approach to the characterization of renal masses may benefit patient management.
Introduction
Multidisciplinary collaboration in the management of urologic malignancies, including renal cell carcinoma, is increasingly common, and a rapidly expanding literature has demonstrated clinical benefits and improved patient satisfaction. Regular participants in a genitourinary (GU) tumor board vary but typically include urologists, medical oncologists, radiation oncologists, pathologists, and radiologists. Discussion of renal tumor characteristics, normal kidney anatomy, presence or absence of tumor thrombus, lymph node disease, and metastatic spread is commonplace in this setting. As such, tumor board participants may develop similar patterns and approaches in interpreting renal mass imaging, although this has not been previously evaluated. Currently, only urologists have incorporated nephrometry scoring into their armamentarium for the assessment of kidney tumors; however, extending the R.E.N.A.L. nephrometry score (NS) to radiology has the potential to improve communication and decision making in multidisciplinary settings. The usefulness of the nephrometry scoring system depends on high inter-rater correlation and reliability. Therefore, we sought to assess NS correlation among urologists, radiologists, and collaborators to identify any patterns or score distributions unique to particular groups.
Materials and Methods
After approval of the study by the Indiana University Institutional Review Board, we conducted a retrospective review of all adult patients undergoing partial or radical nephrectomy at our institution between June 2003 and June 2013 (n=2450). Patients were eligible for inclusion if they had a preoperative multiphasic (dual- or triphasic) CT scan in our system, including scans uploaded to our institution as temporary files. We excluded patients with metastatic or locally advanced disease, tumor diameter >10 cm, or previously diagnosed polycystic kidney disease. Two urologists and two radiologists were chosen to participate in the study as reviewers. Both urologists are fellowship trained, one in urologic oncology (Timothy A. Masterson) and one in robotics (Ronald S. Boris). Both radiologists focus on abdominal and GU radiology (Aashish A. Patel and Mark Tann). Any patient who had received care from a study reviewer (i.e., imaging initially read by a selected radiologist or surgery performed by a selected urologist) was excluded. One urologist and one radiologist regularly collaborate in multidisciplinary tumor boards and educational conferences and were defined as “collaborators” for subanalysis. The collaborators scored the cases independently, but we hypothesized that they would benefit from their previous collaborative efforts.
Multiphasic CT scans were de-identified and made available to reviewers on an external hard drive. A computerized R.E.N.A.L. nephrometry scoring questionnaire was generated. 1 R.E.N.A.L. nephrometry scores range from 4 to 12 and are used to predict case complexity, with 4–6 defined as low, 7–9 as moderate, and 10–12 as high complexity. 2 For the purpose of our study, we did not include the “A” variable (anterior vs posterior location), as it is a categorical descriptor rather than an ordinal score and does not contribute to the numeric total. Of note, no overview or introduction to nephrometry scoring was provided to the reviewers before this project, and neither radiologist was previously familiar with the scoring system. Study data were collected and managed using Research Electronic Data Capture (REDCap) tools hosted at Indiana University. 8 REDCap is a secure, Web-based application designed to support data capture for research studies, providing (1) an intuitive interface for validated data entry, (2) audit trails for tracking data manipulation and export procedures, (3) automated export procedures for seamless data downloads to common statistical packages, and (4) procedures for importing data from external sources.
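The scoring arithmetic described above can be sketched as follows. This is a minimal illustration, assuming each numeric component (R, E, N, L) has already been scored 1 to 3 and that the “A” descriptor is omitted, as in this study. The function names are hypothetical, and the sketch does not derive component scores from imaging.

```python
# Minimal sketch of R.E.N.A.L. totals as used in this study: the "A" descriptor
# is omitted, so only R, E, N, and L (each scored 1-3) contribute points.
# Function names are hypothetical, for illustration only.


def nephrometry_score(r: int, e: int, n: int, l: int) -> int:
    """Sum the four numeric components, each of which must be 1, 2, or 3."""
    for name, value in {"R": r, "E": e, "N": n, "L": l}.items():
        if value not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2, or 3, got {value}")
    return r + e + n + l


def complexity_category(score: int) -> str:
    """Map a total score (4-12) to the low/moderate/high complexity bands."""
    if not 4 <= score <= 12:
        raise ValueError(f"score must be between 4 and 12, got {score}")
    if score <= 6:
        return "low"
    if score <= 9:
        return "moderate"
    return "high"


if __name__ == "__main__":
    s = nephrometry_score(r=1, e=2, n=3, l=1)
    print(s, complexity_category(s))  # prints "7 moderate"
```

Keeping the total and the category as separate steps mirrors the two levels at which agreement is reported later (overall NS vs complexity category).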
Descriptive analysis was performed using Pearson's chi-squared test for categorical variables. Fleiss' kappa was used to assess overall correlation among all reviewers, and Cohen's kappa was used to assess correlation within the urologist, radiologist, and collaborator pairs. Kappa values were interpreted as follows: <0.01, poor correlation; 0.01–0.2, slight correlation; 0.21–0.4, fair correlation; 0.41–0.6, moderate correlation; 0.61–0.8, substantial correlation; and 0.81–1, almost perfect correlation. 9 Because reviewer 4 did not complete scoring for one patient, that patient was excluded from the relevant correlative analyses. A priori, p-values <0.05 were considered statistically significant. Stata version 12.1 (Stata Corp. LP, College Station, TX) was used for all statistical analyses.
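For readers who want to reproduce agreement statistics of this kind, the following is a minimal sketch of unweighted Cohen's kappa (two raters) and Fleiss' kappa (two or more raters), together with the interpretation bands listed above. It is not the Stata code used in this study; the function names are hypothetical, and the banding assumes that values falling between the rounded cutoffs are assigned to the higher band (for example, 0.205 is labeled fair).

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)     # chance agreement from each rater's marginals
    if p_e == 1:
        return math.nan  # undefined: both raters used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa; ratings[i] holds every rater's score for case i."""
    if not ratings:
        raise ValueError("no cases supplied")
    m = len(ratings[0])
    if m < 2 or any(len(r) != m for r in ratings):
        raise ValueError("each case needs the same number (>=2) of ratings")
    n_cases = len(ratings)
    totals: Counter = Counter()
    p_bar = 0.0
    for case in ratings:
        counts = Counter(case)
        totals.update(counts)
        # proportion of agreeing rater pairs for this case
        p_bar += (sum(c * c for c in counts.values()) - m) / (m * (m - 1))
    p_bar /= n_cases
    # chance agreement from pooled category proportions across all raters
    p_e = sum((t / (n_cases * m)) ** 2 for t in totals.values())
    if p_e == 1:
        return math.nan
    return (p_bar - p_e) / (1 - p_e)


def interpret_kappa(k: float) -> str:
    """Label a kappa value using the bands given in Materials and Methods."""
    if k < 0.01:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if k <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical complexity categories from two reviewers (not study data).
    rev_a = ["low", "low", "moderate", "high", "moderate"]
    rev_b = ["low", "moderate", "moderate", "high", "moderate"]
    k = cohen_kappa(rev_a, rev_b)
    print(f"Cohen's kappa = {k:.3f} ({interpret_kappa(k)})")
```

With two raters, Fleiss' kappa pools both raters' marginals and so generally differs slightly from Cohen's kappa on the same data; this is one reason the two statistics are assigned to different comparisons.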
Results
One hundred twenty cases were identified for inclusion in the study. The mean (standard deviation [SD]) age of included patients was 59.5 (13.2) years. The majority of these patients underwent partial nephrectomy for management of their mass (n=86). Tumor size ranged from 0.6 to 10.0 cm, with a mean (SD) diameter of 3.3 (1.9) cm on final pathology. Eighty-five percent of the patients had malignant pathology (n=102), with clear cell being the most common histology (n=74). High-grade disease, defined as Fuhrman grade III–IV, was identified in 31 (26%) patients.
NSs ranged from 4 to 11 for three of the reviewers and from 4 to 12 for the fourth. Mean (SD) NS for all reviewers was 6.8 (1.9), with fair correlation among the reviewers (κ=0.222) (Table 1). Correlation for the overall NS within the urologist, radiologist, and collaborator groups ranged from slight to fair, with the strongest correlation noted for the radiologists (κ=0.306) (Table 1). Moderate-to-substantial correlation was noted for the assigned complexity categories, with κ=0.530 among all reviewers and κ=0.610 for the collaborators, the latter corresponding to 77.5% agreement between the two reviewers. The distribution of cases across complexity categories was similar among the reviewers (p=0.092) (Table 2).
Table notes: κ represents Fleiss' kappa for all reviewers and Cohen's kappa for the two-reviewer comparisons (urologists, radiologists, and collaborators). NS=nephrometry score.
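The two collaborator figures for complexity category (κ=0.610 with 77.5% observed agreement) can be related through the definition of Cohen's kappa, κ = (p_o − p_e)/(1 − p_e), where p_o is observed agreement and p_e is agreement expected by chance. As a back-of-envelope illustration using the rounded published values (our arithmetic, not a result reported by the study), solving for p_e gives:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
\quad\Longrightarrow\quad
p_e = \frac{p_o - \kappa}{1 - \kappa}
    = \frac{0.775 - 0.610}{1 - 0.610}
    = \frac{0.165}{0.390}
    \approx 0.42
```

That is, roughly 42% agreement would be expected by chance alone given the collaborators' category distributions, which is why 77.5% raw agreement corresponds to a kappa of only about 0.6.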
Correlation varied across the individual components of the NS (e.g., “R,” “E”). The tumor diameter score (“R”) was the most highly correlated, with almost perfect correlation (κ=0.844), and percent agreement within each of the three groups was 93% (Table 1). Correlation for the “E,” “N,” and “L” scores was slight to moderate, with 56%–72% agreement within each group. Aside from the overall NS and the “R” score, the highest percentage agreement was found among the collaborators.
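Percent agreement, as reported for the component scores above, is the share of cases on which two reviewers assign the same score; unlike kappa, it makes no correction for chance. A minimal sketch follows, with hypothetical function names and made-up scores for four cases (not study data).

```python
from typing import Dict, List

# Numeric R.E.N.A.L. components compared in this study ("A" omitted).
COMPONENTS = ("R", "E", "N", "L")


def percent_agreement(a: List[int], b: List[int]) -> float:
    """Share of cases (0-100) on which two reviewers gave the same score."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def component_agreement(
    rev1: Dict[str, List[int]], rev2: Dict[str, List[int]]
) -> Dict[str, float]:
    """Percent agreement for each component scored by both reviewers."""
    return {c: percent_agreement(rev1[c], rev2[c]) for c in COMPONENTS}


if __name__ == "__main__":
    # Hypothetical component scores for four cases; not study data.
    reviewer_a = {"R": [1, 1, 2, 3], "E": [1, 2, 2, 3], "N": [3, 2, 1, 3], "L": [1, 1, 2, 3]}
    reviewer_b = {"R": [1, 1, 2, 3], "E": [1, 1, 2, 2], "N": [2, 2, 1, 3], "L": [1, 2, 2, 3]}
    print(component_agreement(reviewer_a, reviewer_b))
    # prints {'R': 100.0, 'E': 50.0, 'N': 75.0, 'L': 75.0}
```

Reporting percent agreement alongside kappa, as the study does, helps readers see when a modest kappa reflects high chance agreement rather than frequent disagreement.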
Discussion
The interpretation of uroradiologic imaging falls primarily within the scope of radiology residency training, but it is also part of urology residency, as evidenced by the American Board of Urology's continued inclusion of radiographic interpretation in the certification process for urologists. The specific role of each specialty is not well delineated; however, it has been argued that radiologists and urologists bring distinct interpretive and prognostic skills to a collaboration. 10 Collaboration in multidisciplinary conferences and tumor boards likely provides the greatest value to patients. Within the field of endourology and renal calculus disease, previous studies have found that radiologists and urologists identify ureteric stones with generally equivalent results. 11,12 A recent study by Cho and colleagues examined follow-up imaging when guided by the ordering urologist and the interpreting radiologist. 13 Their results showed that rates of follow-up imaging were reduced when urologists used both clinical correlation and radiology recommendations to decide whether a particular study was needed. These findings underscore the collaborative role of both disciplines in the optimal management of urology patients who depend on imaging for their care.
Since its inception, nephrometry scoring has improved urologists' ability to describe renal masses. 2 Although it was not initially intended to predict postoperative complications, multiple studies have investigated this potential and have generally found significant associations. 3–6,14–18 Both Hew and colleagues and Simhan and colleagues reported that NSs could be useful in predicting perioperative complications after partial nephrectomy. 4,6 Broughton and colleagues evaluated clinical T1a renal cortical tumors and found that renal mass complexity was independently associated with urologists' decision to perform partial or radical nephrectomy. 7 Implementing a system such as nephrometry scoring among radiologists has the potential to substantially influence urologists' surgical planning and perioperative management strategy. Familiarity with NS may improve communication and standardization between radiologists and urologists as they independently assess renal masses. Additionally, NS may assist with patient education regarding perioperative expectations and complication risks. Although the historical cohort used to evaluate NS included both open and minimally invasive radical and partial nephrectomies, recent studies have been equivocal as to whether NS predicts postoperative outcomes among patients undergoing minimally invasive surgery alone. 5,15–19 Regardless of these findings, the usefulness of nephrometry scoring depends critically on the assumption that the scores are reliable and reproducible, which justifies our investigation of correlation across disciplines.
Calculating inter-rater correlation and variability relies on both the observed agreement between raters and the probability of agreement between the raters by chance alone. 9 The kappa coefficient has limitations, including the fact that when inter-rater agreement is high, particularly when most ratings fall into a single category, kappa can remain comparatively low. 20 Because of these factors, alternative measures have been suggested, such as Gwet's AC1, which uses a potentially more robust estimate of chance agreement than kappa. 20 Regardless, the kappa coefficient remains the most widely accepted method of describing inter-rater variability, and understanding its limitations is critical to interpreting our findings and those of others. Notably, although the individual subjective scores had slight-to-moderate correlation, there was near-perfect correlation for the objective diameter score (“R”), a pattern identified in previous studies. 4,19,21–23 Weight and colleagues examined concordance for NSs and found substantial correlation; however, for tumors >7 cm, this dropped to fair correlation. 23 Similarly, Kolla and colleagues reported substantial correlation for the individual NSs. 21 Our study, in contrast, found only slight correlation for the individual NSs. We hypothesize that this may reflect our decision not to provide a group overview of NS before the study began; however, because we were attempting to replicate everyday practice, in which radiologists and urologists rely on their own interpretation of the scoring system, we believe our results are informative.
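The kappa limitation described above (often called the kappa paradox) can be illustrated with a small example. The following is a minimal sketch, assuming two raters and a fixed set of at least two categories, that contrasts unweighted Cohen's kappa with Gwet's AC1 on hypothetical binary ratings (not study data) in which 90% of ratings agree but most fall into one category. The function names are ours.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa: chance agreement from each rater's own marginals."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def gwet_ac1(a: Sequence[Hashable], b: Sequence[Hashable], categories: Sequence[Hashable]) -> float:
    """Gwet's AC1 for two raters over a fixed set of q >= 2 categories."""
    n, q = len(a), len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # pi_k: mean proportion of ratings (pooled over both raters) in category k
    pis = [(ca[k] + cb[k]) / (2 * n) for k in categories]
    p_e = sum(p * (1 - p) for p in pis) / (q - 1)
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical binary ratings for 100 cases (not study data):
    # 85 agree on 1, 5 agree on 0, and 10 disagreements split evenly.
    a = [1] * 85 + [0] * 5 + [1] * 5 + [0] * 5
    b = [1] * 85 + [0] * 5 + [0] * 5 + [1] * 5
    print(f"agreement={sum(x == y for x, y in zip(a, b)) / len(a):.2f}")  # 0.90
    print(f"kappa={cohen_kappa(a, b):.3f}")                             # 0.444
    print(f"AC1={gwet_ac1(a, b, categories=[0, 1]):.3f}")                # 0.878
```

With identical 90% raw agreement, kappa is about 0.44 while AC1 is about 0.88, because kappa's chance-agreement term grows as ratings concentrate in one category, whereas AC1's term shrinks.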
We did, however, find moderate-to-substantial correlation for the overall NS complexity category, which implies that, regardless of variation in the reported component scores, the overall complexity of the tumor as defined by NS is fairly well correlated among urologists, radiologists, and collaborators.
To our knowledge, no other study has attempted to examine correlation in NS between urologists and radiologists in the evaluation of renal masses. The radiologists, with no exposure to NS aside from the
Across our analyses, the collaborating urologist and radiologist consistently showed the highest percentage agreement of all compared groups. Correlation in scores was similar for the collaborators and the radiologists. Although this alone does not establish the benefit of a multidisciplinary approach to GU oncologic disease, it does suggest that collaboration may foster similar patterns of analysis and interpretation among participants, which could strengthen the reproducibility of interpretive systems such as nephrometry scoring. In other GU malignancies, such as prostate cancer, the potential benefit of multidisciplinary care has been frequently evaluated. 24–27 In a recent study examining 15 years of experience with a multidisciplinary clinic for the management of prostate cancer, Gomella and colleagues showed improved survival, particularly for patients with high-risk, locally advanced prostate cancer. 26 Although not examining prostate cancer patients exclusively, Acher and colleagues found a benefit to multidisciplinary evaluation of selected cases identified as “potential change cases” before the meeting, but little benefit for cases not identified as such. 28 The need for careful examination is further echoed in a SEER database study in which Bekelman and colleagues reported a significant increase in the use of intensity-modulated radiation therapy, and a decrease in the use of less-expensive androgen deprivation therapy, among patients with localized prostate cancer managed by integrated care groups. 25 Particularly with the increasingly widespread adoption of Accountable Care Organizations, multidisciplinary collaboration to provide appropriate, guideline-driven care for complex patients is paramount. 29
Although the role of tumor board collaboration in renal cell carcinoma has not specifically been examined, these prior studies suggest that the potential benefits may extend to other GU malignancies. Whether our results specifically support this theory is unclear, but they do suggest that collaboration may reduce some of the subjectivity in the interpretation of renal masses, allowing a more consistent, standardized approach to tumor management. Because the value of NS depends on its reproducibility, assessing its agreement among the physicians who would potentially use it is critically important.
There are several limitations to our study. First, we used only four reviewers: two urologists and two radiologists. We recognize that increasing the number of reviewers would strengthen our conclusions, and this is a direction for future work. Second, a reviewer-wide discussion of nephrometry scoring at the outset of the study may have increased the correlation of NSs, particularly for the more subjective components; however, we elected to evaluate correlation only among raters who were interpreting the scoring system on their own. Further, because of the exclusion criteria, only a limited number of scans (120) were included in our analysis, which could have affected our results. Despite these limitations, this study suggests a potential benefit of multidisciplinary collaboration in improving inter-rater description of renal masses and provides unique insight into inter-reviewer variability in nephrometry scoring between urologists familiar with NS and radiologists with minimal prior exposure to it.
Conclusions
Although nephrometry scoring has previously been reported to have high inter-rater correlation, we found substantial correlation only for the NS complexity categories. The highest agreement in scoring was seen between the multidisciplinary collaborators. The role of multidisciplinary collaboration in improving the description and characterization of renal masses should be further investigated and could potentially be incorporated into radiology and urology residency training to foster greater collaboration and understanding of the contributions each field can make to the management of patients with a renal mass.
Footnotes
Disclosure Statement
No competing financial interests exist for any of the authors.
