Abstract
Introduction:
Expert-reported outcomes and complications may not reflect the standardized coding that can be provided by independent, third-party evaluations. The goal of this article is to compare expert-reported complications with standardized coding by the National Surgical Quality Improvement Program (NSQIP). The procedures evaluated were laparoscopic radical nephrectomy (LRN), robot-assisted radical prostatectomy (RARP), and radical cystectomy (RC).
Methods:
The 10 largest LRN, RARP, and RC series were reviewed for reported complications. An index patient was derived from each series using patient demographic data. Index patients were entered into the NSQIP surgical risk calculator (SRC), which provides 11 predicted outcomes based on inputted data. SRC-predicted outcomes were compared with available complication rates in each series.
Results:
Across the 30 studies, 172 out of 330 (52%) of NSQIP-provided outcome types were presented within expert manuscripts. Death and venous thromboembolism (VTE) were the most commonly reported (27 and 23 studies, respectively), whereas urinary tract infection (UTI) (9) and pneumonia (10) were the least commonly presented. Comorbidities and follow-up duration were reported in 8 out of 30 and 17 out of 30 studies, respectively. For LRN, the median number of reported outcomes was 3 (range 1–5). LRN experts demonstrated a shorter mean length of stay (LOS) (2.5 days, SD=1.7) (p<0.001). In RARP studies, a median of 7.5 (3–11) outcomes was reported. Experts outperformed NSQIP RARP predictions in serious complications (p<0.001), any complication (p<0.001), surgical site infection (p=0.025), UTI (p<0.001), and VTE (p=0.002). RC manuscripts reported a median of 7 (2–11) outcomes. RC experts had higher rates of serious complications (p<0.001), reoperation (p<0.001), and death (p<0.001) than predicted by SRC.
Conclusion:
The level of standardization in reporting of outcomes differs between expert series and NSQIP, thus making comparisons difficult.
Introduction
The American College of Surgeons' (ACS) NSQIP is a national clinical registry specifically developed to help surgeons and hospitals improve surgical quality. Standardized coding is performed by trained individuals, is audited for accuracy, and is consistent across hospitals. Perioperative data on preoperative risk factors, intraoperative variables, and 30-day postoperative mortality and morbidity are prospectively collected. 3,4 Studies have shown that NSQIP more reliably captures surgical complications and mortality when compared with systems that use administrative and claims data. 5–8 Recently, the ACS has used the NSQIP database to develop the NSQIP surgical risk calculator (SRC), a web-based tool that allows users to select a Current Procedural Terminology (CPT) code for a particular surgery and to enter 21 preoperative patient factors (e.g., demographics, comorbidities). Using regression models, the SRC analyzes the patient factors and predicts 30-day postoperative outcomes. 4
Interest in the NSQIP database within urologic oncology has grown rapidly, with studies querying the database for surgeries such as nephrectomy, 9,10 radical prostatectomy, 11,12 and radical cystectomy (RC). 13–15 In this study, we sought to compare the postoperative complications reported in the urologic oncology expert literature with data from the NSQIP database by using the SRC to generate predicted 30-day complications.
Methods
Three major urologic procedures were selected for comparison with the NSQIP database: laparoscopic radical nephrectomy (LRN), robot-assisted radical prostatectomy (RARP), and RC. We searched PubMed for all English-language literature published from 1990 to 2014. Search terms for each procedure were combined with the keyword “complications.” A total of 214, 134, and 982 papers were initially returned for LRN, RARP, and RC, respectively. Articles were compiled into a database, and abstracts were then screened for relevance. Subsequently, the bibliographies of relevant studies, reviews, international guidelines, and the retrieved literature were hand-searched for additional publications. All available publications describing perioperative complications for LRN, RARP, or RC patients were evaluated. Randomized, prospective observational, and retrospective observational studies were included. Studies were excluded if they reported only subsets of larger cohorts (e.g., studies only reporting LRN outcomes for T2 or larger renal tumors). For RC, studies were selected regardless of type of urinary diversion. Initially, 29 LRN, 17 RARP, and 20 RC studies met the inclusion criteria. From these, the 10 largest papers for each of the three surgeries were selected. If an institution's patient cohort was described in multiple historical series, the largest (and usually most contemporary) publication was selected.
To compare expert-reported outcomes with NSQIP SRC-predicted outcomes, an index patient was created from each expert study. Index patients were defined by the average demographics reported in each study (e.g., age, body mass index [BMI], American Society of Anesthesiologists [ASA] score). For studies not reporting demographics, an index patient was created based on the averaged cumulative patient factors of the remaining studies within that surgery group. Each study's index patient preoperative factors were entered into the online SRC along with the appropriate CPT code, and predicted 30-day postoperative complication rates were generated for each study. For RC studies that included different types of urinary diversion, both orthotopic neobladder and ileal conduit index patients were generated separately. For each of the three surgeries, overall predicted outcomes were also obtained using a cumulative index patient calculated from averaged patient factors from all 10 studies; studies that did not specifically report complications through 30 days were not used to calculate these cumulative index patients.
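The index-patient construction described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the study values are hypothetical, the averaging is unweighted, and missing factors are filled one factor at a time from the studies that do report them. The text does not specify whether the averages were weighted by cohort size or how partially reported demographics were handled.

```python
from statistics import mean

# Hypothetical per-study average demographics (not data from the reviewed series).
# None marks a factor the study did not report.
studies = [
    {"age": 58.0, "bmi": 27.1, "asa": 2.0},
    {"age": 61.5, "bmi": None, "asa": 3.0},
    {"age": None, "bmi": None, "asa": None},  # a study reporting no demographics
]

FACTORS = ("age", "bmi", "asa")

def group_means(rows):
    """Unweighted mean of each factor over the studies that report it (assumption)."""
    return {f: mean(r[f] for r in rows if r[f] is not None) for f in FACTORS}

def index_patient(study, fallback):
    """Use the study's own averages; fall back to the group mean when a factor is missing."""
    return {f: study[f] if study[f] is not None else fallback[f] for f in FACTORS}

fallback = group_means(studies)
patients = [index_patient(s, fallback) for s in studies]
```

In this sketch the third study inherits every factor from the group means, which mirrors the fallback the Methods describe for studies without demographics. The resulting values would still need to be mapped onto the calculator's categorical inputs before entry.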
NSQIP SRC-predicted outcomes are reported in 11 categories as follows: 16 pneumonia; cardiac (cardiac arrest or myocardial infarction); surgical site infection (SSI) (superficial incisional, deep incisional, or organ space); renal failure; urinary tract infection (UTI); venous thromboembolism (VTE); and return to operating room. In addition, “any complications” include all of what has been mentioned earlier, plus wound disruption, unplanned intubation, ventilator use >48 hours, stroke, or systemic sepsis. “Serious complications” are similar to “any complications,” except that superficial incisional SSI, ventilator >48 hours, and stroke are not considered serious. Death is the 10th outcome, and according to SRC categorization, 16 is included under “serious complications” but not “any complications.” The 11th outcome is the predicted hospital length of stay (LOS).
Reported complications from each of the 30 studies were extracted and tabulated according to the SRC complication categories listed earlier. Only complications that were reported according to the NSQIP criteria were included for analysis. Each of the 30 studies was also assessed for risk of bias in postoperative outcomes reporting using the Cochrane Collaboration's tool for assessing risk of bias. 17 Cumulative expert-reported and SRC-predicted outcomes were compared using one-sample test of proportions for rates and one-sample t-test for LOS. Statistical significance was assumed when p<0.05. Statistical analysis was performed using R version 3.1.3.
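As a rough illustration of a one-sample test of proportions, the sketch below compares a pooled observed complication rate with a fixed predicted rate using a normal approximation. The function name and the numbers are hypothetical, and the uncorrected z-test is an assumption; the text states only that the analysis was run in R and does not say which variant of the test was used.

```python
import math

def one_sample_proportion_test(events, n, p0):
    """Two-sided z-test of an observed proportion against a fixed predicted rate p0."""
    observed = events / n
    se = math.sqrt(p0 * (1 - p0) / n)            # standard error under the null
    z = (observed - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail area
    return observed, z, p_value

# Hypothetical numbers: 12 events among 1000 pooled patients vs. a predicted rate of 2.5%.
observed, z, p_value = one_sample_proportion_test(events=12, n=1000, p0=0.025)
significant = p_value < 0.05
```

The normal approximation becomes unreliable when the expected number of events is small, as it is for rare outcomes such as death; an exact binomial test is the usual alternative in that setting.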
Results
The surgical outcomes in 30 studies, which included 2503 LRN, 18–27 8924 RARP, 28–37 and 4687 RC 38–47 patients, were analyzed. One study 31 was a prospective cohort trial comparing RARP with radical retropubic prostatectomy. The remaining 29 studies were retrospective in design, although 16 accrued their data prospectively. Risk of bias in outcome reporting is shown in Figure 1. Average age was 59.8, 60.2, and 66.1 years; average ASA score was 2.5, 2.2, and 2.4; and average BMI was 26.9, 27.3, and 26.9 kg/m² for LRN, RARP, and RC, respectively. Fifteen of 30 studies did not report ASA scores, while 13 of 30 studies did not include BMI. The average ASA score was 2.4 in the RC studies; therefore, SRC-predicted outcomes were calculated for both ASA 2 and 3. Comorbidity data (e.g., CCI, diabetes, cardiovascular disease) and duration of follow-up were reported in 8 out of 30 and 17 out of 30 studies, respectively.

Figure 1. Risk of bias in urological studies reporting complications as evaluated by Cochrane Collaboration's tool. *Attrition is assumed to be zero, as authors assess only complications occurring before discharge. †Primarily assesses those complications leading to death. ‡Only complications leading to hospital readmissions are captured.
Tables 1–3 outline the complication rates from each study in comparison with SRC-predicted outcomes for index patients. Fifty-two percent (172 of 330) of the 11 NSQIP SRC outcome categories were captured by the 30 expert studies. UTI (9 of 30 studies) and pneumonia (10 of 30 studies) were the least frequently reported complications, while death (27 of 30 studies) and VTE (23 of 30 studies) were the most regularly reported.
Table 1 (LRN) footnotes:
Highlighted cells contain outcomes rates for which a comparison was possible.
Bold text indicates a p-value<0.05.
Duration of follow-up not indicated.
Index patient: male, age <65, BMI normal, ASA 3, clean-contaminated.
Reports on all laparoscopic procedures performed from 1993 to 2005. Only reports gross number of complications for LRN by Clavien classification.
Median LOS.
Index patient: male, age <65, BMI overweight, ASA 2, clean-contaminated.
Index patient: male, age <65, BMI class 1 obese, ASA 2, clean-contaminated.
Index patient: male, age <65, BMI overweight, ASA 3, clean-contaminated.
Does not distinguish myocardial infarctions from nonspecific arrhythmias.
Reports pulmonary embolism but omits deep vein thrombosis.
Does not state if 6 days is mean or median LOS.
Only uses data from studies reporting 30-day postoperative complications.
SSI=surgical site infection; UTI=urinary tract infection; VTE=venous thromboembolism; ARF=acute renal failure; ROR=return to operating room; LOS=length of stay; SD=standard deviation; NR=not reported/not assessed; NSQIP=National Surgical Quality Improvement Program; BMI=body mass index; ASA=American Society of Anesthesiologists; LRN=laparoscopic radical nephrectomy.
Table 2 (RARP) footnotes:
Highlighted cells contain outcomes rates for which a comparison was possible.
Bold text indicates a p-value<0.05.
Index patient: male, age <65, BMI overweight, ASA 2, clean-contaminated.
Reports only complications occurring before initial discharge.
Complications listed based on unscheduled visits and hospital readmissions.
Duration of follow-up not indicated.
Reports complications till postoperative 90 days.
Median LOS.
Only includes data from studies reporting 30-day postoperative complications.
NSQIP predicted death rate set at 0.0001% for the null hypothesis value in test-for-one-proportion.
Table 3 (RC) footnotes:
Highlighted cells contain outcomes rates for which a comparison was possible.
Bold text indicates a p-value<0.05.
Primarily analyzes perioperative mortality.
Reports pulmonary embolism but omits deep vein thrombosis.
Index patient: male, age 65–74, BMI overweight, ASA 3, contaminated.
Reports complications till 90 days postoperative.
Median LOS.
Reports 176 “UTI/pyelonephritis” but does not specify UTIs alone. These data were not added to “Any” or “Serious” complications category.
Index patient: male, age <65, BMI overweight, ASA 3, contaminated.
Complications were captured based on hospital readmissions.
Acute respiratory distress syndrome was combined with pneumonia.
Reports “abscesses” under infectious complications but does not list any other type of SSI.
Only includes data from studies reporting 30-day postoperative complications.
Statistical analyses comparing cumulative expert rates with predicted NSQIP outcomes are shown in the last rows of Tables 1–3. For LRN, only mean LOS was found to be significantly different between that reported by experts and that predicted by SRC (2.5 days vs 3 days, p<0.001) (Table 1). Compared with the RARP SRC index patient, centers of excellence had significantly better rates for both overall and serious complications, including lower SSI, UTI, and VTE rates (Table 2). For RC, experts had statistically lower pneumonia, SSI, VTE, and acute renal failure (ARF) rates, as well as lower UTI and overall complication rates after adjusting for complexity of urinary diversion type (Table 3). However, rates of serious complications and death regardless of RC diversion type were cumulatively higher at centers of excellence when compared with NSQIP index patients. Experts had greater reoperation rates than the NSQIP neobladder index patient.
Discussion
A comparison of outcomes and complications allows for education and improvement in surgery. As a result, there is marked interest in quantifying associated risks. In addition, projecting risk to the prospective surgical patient facilitates informed consent. While reports from expert urologic oncologists have been enlightening, the rigorous assessment of outcomes data is hindered by lack of standardized reporting. There was a lack of standardization among the 30 expert studies included in this analysis, making a comparison with NSQIP-predicted outcomes challenging.
When compared with NSQIP outcomes categories, some complication data were not directly reported in the expert literature. In total, 52% (172/330) of the possible SRC outcome entries (11 categories across 30 studies) were found in the manuscripts assessed. This gap was the most pronounced for LRN, wherein only 28% (31/110) of outcomes were reported. Even for VTE (one of the most consistently reported complications [23/30]), four studies 23,24,38,45 listed pulmonary embolisms but did not record the number of deep vein thromboses. Thus, the expert VTE rates used in this analysis may be underreported.
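The coverage figures above follow from simple counting, sketched here for clarity. The overall and LRN counts come from the text; the combined RARP and RC figure is derived by subtraction and is not reported separately in the article.

```python
OUTCOMES_PER_STUDY = 11  # SRC outcome categories

def coverage(reported, n_studies):
    """Return (possible entries, percent of possible entries actually reported)."""
    possible = n_studies * OUTCOMES_PER_STUDY
    return possible, 100 * reported / possible

overall_possible, overall_pct = coverage(reported=172, n_studies=30)
lrn_possible, lrn_pct = coverage(reported=31, n_studies=10)

# Derived by subtraction (not stated in the text): RARP and RC series combined.
other_possible, other_pct = coverage(reported=172 - 31, n_studies=20)
```

Under these counts, the 20 RARP and RC series together would cover roughly two thirds of the possible entries, which is consistent with the statement that under-reporting was most pronounced for LRN.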
There were also differences in terminology employed in the expert literature, which limited robust comparisons. For example, some studies' complications were listed as “cardiac,” 20,41,45 “pulmonary,” 18,20,21,41 or “wound” 20,21 without further explanation. In the present analysis, these complications were not counted as myocardial infarction/cardiac arrest, pneumonia, and SSI or wound disruption, respectively. The inexplicit label “wound infection” was counted as an SSI but was not included as a deep incisional or organ space SSI in the “serious complication” category. “Respiratory distress” was not specific enough to qualify for “unplanned intubation,” whereas “acute respiratory distress syndrome” qualified as such. Finally, for studies that listed complications according to Clavien classification, solely reporting a complication as Grade 3 without further detail 40,43 was not considered specific enough to qualify as a return to the operating room.
On comparing LRN series with SRC-predicted outcomes, LOS was the only significant difference, favoring the expert literature (2.5 days vs 3 days, p<0.001). In the largest of the LRN series, the authors acknowledge lack of sufficient perioperative complication data from their multicenter cohort, stating that “complications [data] were not available from some institutions owing to a lack of a common protocol for collecting data at each institution.” 18 A movement toward standardized reporting may clarify any potential differences in complications after LRN.
For RARP, expert series had significantly better cumulative complication rates for a majority of outcomes categories, including SSI, VTE, and serious complications. While these lower rates likely reflect the inverse relationship between hospital volume and perioperative morbidity and mortality, 48–50 only half of the studies mention that effort was taken to obtain outpatient follow-up information, which may introduce attrition bias (Fig. 1). Therefore, complications that do not occur immediately postoperatively (e.g., SSI, VTE) may be under-captured. Pierorazio et al. reported on their series of 1422 RARP patients and presented immediate perioperative morbidity and LOS. 30 The lack of 30-day complication data limits any assessment of how perioperative events relate to subsequent morbidity. Two patients from the entire expert RARP cohort died, one from a suspected infection 23 days postoperatively 28 and one from an aspiration event on postoperative day 5 (Table 2). 30 The cumulative expert mortality rate of 0.02% was statistically greater than the SRC-predicted rate of 0%. However, it is likely that these deaths were anomalies, and the statistical difference is narrow. In addition, data from the entire NSQIP database indicate that national mortality rates for RARP are around 0.05%, 12 which is not significantly different from expert rates (p=0.3).
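The mortality comparison above can be checked approximately with the sketch below. It uses the 2 deaths among 8924 RARP patients, the 0.05% national rate, and the 0.0001% null value noted in the Table 2 footnotes. The exact binomial test is a substitution chosen here because the event count is tiny; the article says only that a one-sample test of proportions was used, so the p-values need not match the published ones exactly.

```python
import math

def binom_log_pmf(k, n, p):
    """Log of the binomial probability of k events in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def exact_two_sided_p(k, n, p0):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    observed = binom_log_pmf(k, n, p0)
    total = sum(math.exp(lp)
                for lp in (binom_log_pmf(i, n, p0) for i in range(n + 1))
                if lp <= observed + 1e-7)
    return min(1.0, total)

deaths, n_patients = 2, 8924
rate_pct = 100 * deaths / n_patients                          # about 0.02%

p_vs_national = exact_two_sided_p(deaths, n_patients, 0.0005)  # 0.05% national rate
p_vs_floor = exact_two_sided_p(deaths, n_patients, 0.000001)   # 0.0001% null value
```

With these inputs, the comparison against the national rate is not significant at the 0.05 level, while the comparison against the near-zero null value is. That pattern is consistent with the interpretation that the difference from the SRC prediction rests on two deaths and a predicted rate of effectively zero.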
In RC, the experts demonstrated higher rates of serious complications and death than were predicted by the SRC (Table 3). This was an unanticipated finding given that population-based studies of RC have consistently shown that high-volume institutions have lower morbidity and mortality than lower-volume institutions. 51–53 The most likely explanation for our findings is that the RC index patient used for SRC-predicted outcomes did not accurately represent the true comorbidity and disease-severity status of patients undergoing RC at centers of excellence. We attempted to adjust for this by rounding the cumulative expert-reported ASA score of 2.4 up to 3. However, lack of standardized reporting of comorbidities (e.g., cardiac disease or diabetes) by most of the RC studies prevented us from further risk-adjusting the index patients. A recent analysis of the NSQIP database for perioperative outcomes after RC showed that baseline comorbidity status was indeed associated with increased odds of complications, including a 2.4-fold increased risk of death among patients with cardiovascular disease. 14 We were not able to fully incorporate these risk factors into the index patients used in this study. Further, the SRC does not currently allow for the input of important preoperative risk factors, such as weight loss or serum albumin levels, which could clearly impact the surgical outcomes of the more comorbid RC patients seeking care at tertiary centers. The “surgeon adjustment of risks” option also was not employed in the SRC model for purposes of consistency.
NSQIP represents one possible vehicle for prospective collection of data of the perioperative outcomes and complications of a wide variety of surgeries. The NSQIP database is reproducible, standardized, and validated. The database is populated by persons specially trained and audited, who meticulously gather preoperative through 30-day postoperative data on surgical patients using strict comorbidity and adverse event definitions. Irrespective of discharge status, patients are followed for 30 days after surgery either by manual review of medical records or through personal communication with patients and outside physicians. 54 Furthermore, NSQIP may actually be improving care in the hospitals that currently employ it. Hall et al. examined trends over time in surgical mortality and morbidity from 118 hospitals participating in NSQIP and found 66% improved risk-adjusted mortality and 82% improved risk-adjusted complication rates. The authors estimate that, on average, participation in NSQIP may have resulted in each institution avoiding more than 200 complications and 12 to 36 deaths. 55
The NSQIP database is not without limitations. Currently, it lacks consideration of oncologic severity (i.e., grade and stage), as well as risk stratification for preoperative factors that are common in cancer patients (e.g., weight loss, serum albumin levels). NSQIP also does not currently subcategorize complications by severity grade, such as by the Clavien-Dindo scoring system. The database is also deficient in surgery-specific outcomes (e.g., impotence and incontinence for RARP), and therefore it is subject to much of the same reporting biases noted in several expert series (Fig. 1). In some surgeries, complications after 30 days are relatively common; these complications would not be captured by NSQIP. Further, the SRC is based on NSQIP data from 2009 to 2012; it does not yet incorporate newly accrued surgical data into its models. Finally, the SRC does not allow for surgeon- and institution-specific factors, such as surgical volume and provider experience. A recent study using the SRC in laparoscopic colectomy patients suggests that the calculator accurately predicts outcomes for average surgical risk patients but may not accurately predict outcomes for serious complications. 56 This highlights the need for further external validation of the SRC and improved risk-stratification models.
Our study is not without limitations. The creation of index patients was limited by the small number of risk factors reported in the expert series. Therefore, the index patients may not accurately represent the population of patients seen at tertiary care centers. In addition, we did not assume that studies that made no mention of certain adverse events had a complication rate of zero for that particular outcome. For example, if a study did not specify that data on renal failure were collected and postoperative kidney function was not listed, that study was recorded as having not assessed ARF. Thus, series relying on an assumption that reviewers would interpret no mention of a complication as absence of that complication were under-captured by this study. The Cochrane Collaboration's tool for assessing risk of bias was originally developed to evaluate the methodological quality of randomized controlled trials. Its use herein to evaluate institutions' surgical case series should not be viewed as a critique of these series' lack of randomization or blinding. Rather, its use is intended to draw attention to biases commonly found in observational studies. Finally, several of the expert series used in this study predate SRC data by several years. Improvements in surgical technique and surgeon experience make a comparison with more modern SRC outcomes difficult.
It is not necessarily the contention of the findings here that there are significant differences in outcomes between the experiences of experts in the literature and those surgeons participating in NSQIP. Rather, it is noted that clear differences exist in the way that outcomes and complications are reported and that standardizing this process may afford transparent comparisons.
Conclusion
Differences in the style and components of complication reporting within the expert urologic oncology literature exist, making a comparison with predicted outcomes from the highly standardized NSQIP database difficult. The need for standardization in the accrual and reporting of surgical complications may become more critical as healthcare moves toward outcome metrics as measures for quality of care.
Footnotes
Author Disclosure Statement
Sam B. Bhayani is a consultant for Intuitive Surgical, Inc. For the remaining authors, no competing financial interests exist.
