Abstract
Background:
Ventilator-associated pneumonia (VAP) is defined by the American College of Surgeons Trauma Quality Improvement Program (ACS TQIP) using laboratory findings, pathophysiologic signs/symptoms, and imaging criteria. However, many critically ill trauma patients meet the non-specific laboratory and sign/symptom thresholds for VAP, so the TQIP designation of VAP depends heavily upon imaging evidence. We hypothesized that physician opinions vary widely regarding chest radiograph findings significant for VAP.
Patients and Methods:
The TQIP Spring 2021 Benchmark Report (BR) was used to identify 14 patients with VAP at an academic Level 1 Trauma Center. Critically ill trauma patients (n = 7) who spent at least four days intubated and met TQIP's laboratory and sign/symptom thresholds for VAP but did not appear as VAPs on the BR comprised the control group. For each deidentified patient, four successive chest radiographic images were compiled and arranged chronologically. Cases and controls were randomly arranged in digital format. Blinded physicians (n = 27) were asked to identify patients with VAP based solely on imaging evidence.
Results:
Radiographic evidence of VAP was highly subjective (Krippendorff α = 0.134). Among physicians of the same job description, inter-rater reliability remained low (α = 0.137 for trauma attending physicians; α = 0.141 for trauma fellows; α = 0.271 for radiologists). When majority judgment was compared to the TQIP BR, there was disagreement between the two tests (Cohen κ = -0.071; sensitivity, 64.3%; specificity, 28.6%).
Conclusions:
Current definitions of VAP rely on subjective imaging interpretation and ignore the reality that there are numerous explanations for opacities on chest radiograph. The inconsistency of physicians' imaging interpretation and protean physiologic findings for VAP in trauma patients should preclude the current definition of VAP from being used as a quality improvement metric in TQIP.
Ventilator-associated pneumonia (VAP) is among the American College of Surgeons Trauma Quality Improvement Program's (ACS TQIP's) 21 hospital events. It is defined as “a pneumonia where the patient is on mechanical ventilation for >2 calendar days on the date of event, with a day of ventilator placement being Day 1 AND the ventilator was in place on the date for the event or the day before.” 1 This event must also include laboratory findings, clinical signs/symptoms, and serial chest radiographic evidence defined as “two or more serial chest imaging results with at least one of the following: new and persistent or progressive and persistent infiltrate, consolidation, or cavitation.” 1 This definition is complicated, and the inclusion of chest imaging interpretation introduces yet another layer of ambiguity.
Ventilator-associated pneumonia is a serious medical problem that contributes to increased ventilator time, mortality, hospital length of stay, intensive care unit (ICU) length of stay, and cost. Ventilator-associated pneumonia can complicate the course of up to 25% of patients receiving mechanical ventilation, and its mortality rate can range from 24% to 50%, even reaching 76% if infection is caused by high-risk pathogens. 2 However, many critically ill trauma patients share the same non-specific laboratory findings and clinical signs/symptoms, so the TQIP definition of VAP often hinges on what constitutes a radiographic density or change in density, both of which are subjective and difficult to measure uniformly within and between individuals. Therefore, VAP is difficult to define and practically impossible to measure uniformly. For this reason, major public reporting organizations, including the U.S. Centers for Disease Control and Prevention's (CDC's) National Healthcare Safety Network (NHSN), have attempted to streamline the definition of VAP, without success. 3 The NHSN consequently abandoned the use of VAP as a quality metric in 2013, opting instead to monitor ventilator-associated events. 4 Thus, it is reasonable to question the inclusion of VAP in the TQIP Benchmark Report (BR).
This study tests the hypothesis that interpretations of chest radiographic findings consistent with VAP in trauma patients vary widely among trauma and critical care specialists. In intubated trauma patients there are numerous conditions that may produce infiltrates or consolidation on chest radiograph, either of which may fulfill the TQIP diagnostic criteria for VAP. 1 These include pulmonary contusion, transfusion-associated circulatory overload (TACO), and transfusion-related acute lung injury (TRALI), among other conditions. Although inter-observer reliability in chest radiography has been studied previously, 5 to our knowledge, no previous study has examined our hypothesis, which relates specifically to VAP in trauma.
Patients and Methods
The TQIP Spring 2021 BR was used to identify 14 patients diagnosed with VAP at an academic level 1 trauma center. For each patient, four successive chest radiographic images were compiled and arranged chronologically. Chest radiographs included the two chest radiographs that were taken immediately before and immediately after each patient's positive bronchoalveolar lavage (BAL) result. These cases were matched with a control group that consisted of 14 critically ill trauma patients who spent at least four days intubated and met TQIP's laboratory and sign/symptom thresholds for VAP but did not appear as VAPs on the BR.
Chest radiographic images from cases and controls were randomly arranged in an electronic presentation (Supplementary Fig. S3). The presentation was delivered individually to prospective respondents via email. The corresponding data collection tool was an online survey using Google Forms survey administration software (Google LLC, Mountain View, CA) (Supplementary Figs. S1 and S2). Twenty-seven physicians from a single academic level 1 trauma center were asked to identify patients with VAP based solely on imaging evidence, as defined by TQIP. 1 These physicians included trauma and critical care specialists as well as attending radiologists. They were selected by convenience sampling. Voting was performed individually and anonymously, and results were compiled by the researchers.
Cases and controls were originally matched 1:1. However, it was noted that some control patients might appear as VAPs on subsequent TQIP BRs that were not available at the time our study was conducted. We planned to exclude these patients from our analysis post hoc.
Respondents' opinions were compared with those of their peers to assess the likelihood of internal agreement. Statistical analysis was performed using IBM SPSS Statistics, version 28 (IBM Corp, Armonk, NY). The Krippendorff α test was used to estimate the inter-coder reliability. 6 This test is used to assess inter-rater reliability for a nominal outcome with any number of judges. With respect to individual patient cases, the opinion of the majority (>50% of respondents) was compared to the TQIP BR designation of VAP. This was analyzed using Cohen κ, with the majority opinion representing the experimental instrument and the BR being the current instrument. The same comparison was used to calculate sensitivity and specificity. This study was conducted in accordance with Institutional Review Board guidelines.
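For readers unfamiliar with the statistic, the following is a minimal sketch of Krippendorff α for a nominal outcome, written in plain Python. It is not the SPSS procedure used in this study. The function name `krippendorff_alpha_nominal` and the two-rater toy ratings are illustrative assumptions and are not study data.

```python
from collections import Counter


def krippendorff_alpha_nominal(units):
    """Krippendorff alpha for nominal ratings.

    `units` holds one entry per rated item (e.g., per patient); each entry
    lists the ratings that item received, with None marking a missing
    rating. Hypothetical helper for illustration, not the SPSS routine.
    """
    observed = 0.0            # within-item disagreeing (ordered) pairs
    value_totals = Counter()  # how often each value occurs in pairable items

    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:             # a lone rating cannot be paired; skip the item
            continue
        counts = Counter(values)
        # Ordered pairs of ratings on this item that differ in value.
        disagreeing_pairs = m * m - sum(c * c for c in counts.values())
        observed += disagreeing_pairs / (m - 1)
        value_totals.update(counts)

    n = sum(value_totals.values())
    # Ordered pairs of differing values expected from the pooled ratings.
    expected = n * n - sum(c * c for c in value_totals.values())
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1.0 - (n - 1) * observed / expected


# Toy data (illustrative only): 10 items, two raters, 1 = "VAP", 0 = "no VAP".
rater_a = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
rater_b = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
toy_alpha = krippendorff_alpha_nominal(list(zip(rater_a, rater_b)))
print(round(toy_alpha, 3))  # prints 0.095
```

An α of 1 indicates perfect agreement, a value near 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.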
Results
Respondents analyzed 28 patients. Seven patients from the control group were excluded from the analysis post hoc, because these patients appeared as positive VAPs on subsequent TQIP BRs that were not available at the time our study was conducted. Therefore, the final match of cases and controls was 2:1.
The final study population comprised 21 patients: 14 (67%) with VAP based on the TQIP BR (cases) and 7 (33%) without VAP based on the TQIP BR (controls). A majority of respondents identified VAP in 14 of the 21 patients (67%). When majority judgment was compared to the TQIP BR using Cohen κ, we found disagreement between the two tests (κ = −0.071). Sensitivity was 64.3%, and specificity was 28.6%. This sensitivity indicates that most physicians agreed that there was radiographic evidence of VAP in only 64% (9/14) of TQIP-designated VAPs. Thus, the over-reporting of VAP to TQIP was 36%. Specificity of 29% signifies that 71% of control patients were identified as VAP by respondents. There were 27 survey respondents. These included 13 (48%) trauma/critical care attendings, seven (26%) trauma/critical care fellows, and seven (26%) radiologists. The inter-coder reliability was low (Krippendorff α = 0.134), indicating a lack of agreement among the 27 respondents. Subgroup analysis was performed among physicians of the same job description, and inter-rater reliability remained low (α = 0.137 for trauma attending physicians; α = 0.141 for trauma fellows; α = 0.271 for radiologists).
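The agreement statistics reported above can be checked from a 2 × 2 table. The sketch below assumes cell counts inferred from the reported figures rather than taken from a published table: the majority called VAP in 9 of 14 cases and in 5 of 7 controls, leaving 2 of 7 controls called non-VAP (the 28.6% specificity). Function and variable names are illustrative.

```python
def two_by_two_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, Cohen kappa) for a 2 x 2 table.

    The reference standard here is the TQIP BR; the test is the majority
    judgment of respondents.
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed_agreement = (tp + tn) / n
    # Chance agreement from the marginal totals of each instrument.
    chance_yes = ((tp + fn) / n) * ((tp + fp) / n)
    chance_no = ((fp + tn) / n) * ((fn + tn) / n)
    expected_agreement = chance_yes + chance_no
    kappa = (observed_agreement - expected_agreement) / (1 - expected_agreement)
    return sensitivity, specificity, kappa


# Counts inferred from the text (an assumption, as noted in the lead-in).
sens, spec, kappa = two_by_two_metrics(tp=9, fn=5, fp=5, tn=2)
print(f"sensitivity = {sens:.1%}")   # 64.3%
print(f"specificity = {spec:.1%}")   # 28.6%
print(f"kappa       = {kappa:.3f}")  # -0.071
```

Under these counts, the 36% over-reporting figure is the complement of sensitivity (5/14), and the negative κ reflects observed agreement (11/21) falling slightly below the agreement expected by chance.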
Radiology attending physicians were the most likely group to identify a VAP among all patients, whereas trauma/critical care attendings were the least likely to identify a VAP (Table 1). Trauma fellows were the most likely to agree with the TQIP designation of VAP (86%) or lack thereof (43%).
Table 1. Likelihood of Majority Judgment Identifying VAP by Job Description
VAP = ventilator-associated pneumonia; BR = Trauma Quality Improvement Program (TQIP) Spring 2021 Benchmark Report.
Example: the majority of trauma attendings identified VAP in 6 of 14 (43%) cases described as VAP by BR.
Discussion
Disagreement among physicians and TQIP
This study demonstrates discordance regarding what constitutes radiographic evidence of VAP. Agreement among the study respondents about which images represented VAP was poor, and inter-rater reliability remained low when physicians were compared with peers of the same job description. In addition, the physician consensus contrasts with nationally reported cases of VAP.
There was discordance between the majority position of clinicians and the TQIP BR with regard to VAP designation. Both sensitivity and specificity were poor. Overall, these statistical measures convey a lack of agreement on the radiographic findings that constitute VAP.
Chest radiographs in the diagnosis of pneumonia
The interpretation of chest imaging, particularly chest radiographs, holds considerable weight in the designation of VAP, as the TQIP sign/symptom and laboratory thresholds for VAP are easily satisfied by many trauma ICU patients regardless of their infectious status. Some studies have reported that lung ultrasound is superior to chest radiographs for the detection of VAP, but radiographs remain the most common form of imaging used.7,8 Studies do, however, note that chest radiographs are imperfect in the diagnosis of VAP, with reported sensitivity of 32% to 78%.7,9,10 Butler et al. 11 observed that chest radiographs failed to detect infiltrates in 40% of patients with VAP that had been confirmed by bronchoscopy with brush sampling. This illustrates that a normal radiograph does not exclude the presence of VAP. 11 Still, it is equally problematic that roentgenographic findings of VAP lack specificity.
Indeed, there are many non-infectious causes of infiltrates on chest radiograph, particularly in the critically ill trauma patient, such as pulmonary edema, atelectasis, alveolar hemorrhage, and acute respiratory distress syndrome.11−13 Up to 38% of trauma patients may have abnormal chest radiographs on ICU admission. 14 Haliloglu et al. 8 reported that the likelihood that any opacity observed on chest radiograph is due to pneumonia is low, ranging from 27% to 35%. The subjective interpretation of chest radiographs further complicates matters. In the evaluation of community-acquired pneumonia, studies have found agreement among radiologists to be “fair” or “fair-to-good.”15,16 However, to our knowledge, this is the first study to examine inter-rater agreement for portable chest radiographs in the diagnosis of VAP, and it was found to be poor. Although this level of agreement may be satisfactory for clinical use, it falls short of the necessary standard for benchmarking.
Recent studies have examined the use of artificial intelligence, machine learning, and deep learning systems in radiographic interpretation.17,18 Pham et al. 18 showed that deep learning systems can improve inter-rater reliability among radiologists in reviewing chest radiographs. This is a notable area of medical advancement. However, the use of machine learning is not currently widespread in clinical diagnosis or benchmarking. 17
The National Trauma Data Standard (NTDS) data dictionary used by the ACS TQIP requires laboratory findings, patient signs/symptoms, and serial chest imaging evidence to diagnose VAP, as summarized in Table 2. Novosel et al. 19 and Klompas et al. 20 noted that some discordance between reported rates of VAP may be attributed to the use of different diagnostic criteria. We have shown that chest radiography is an unreliable diagnostic tool in its current state with reference to VAP. Consequently, the current definition of VAP is unreliable for clinical benchmarking.
Table 2. TQIP VAP Algorithm
TQIP = Trauma Quality Improvement Program; VAP = ventilator-associated pneumonia; WBC = white blood cell count; LRT = lower respiratory tract; BAL = bronchoalveolar lavage; PMN = polymorphonuclear.
Study limitations
Our study used an online survey to compare VAPs from an academic level 1 trauma center's TQIP BR to the opinions of blinded trauma/critical care and radiology experts. Cases and controls were matched 2:1 rather than 1:1 because several controls were found to have VAP on subsequent TQIP BRs. There was bias in the selection of survey respondents, who were predominantly trauma surgeons with board certification in critical care. We did endeavor to include radiologists, who may review hundreds of chest radiographs in a single day, and trauma fellows who are in the final year of their two-year critical care fellowship.
Additionally, there was bias in the process of selecting chest radiographs for electronic presentation. We elected to display the two chest radiographs that were taken immediately before and immediately after each patient's positive BAL result, which would have satisfied TQIP's laboratory threshold for VAP. For the study to remain blinded, survey respondents did not have access to the full breadth of chest radiography that would have been available in each patient's unabbreviated medical record. It is a possible source of bias that study respondents may have seen the same images in the clinical setting prior to taking our survey. However, given the large patient volume at our center, it is unlikely that a provider would recall a given patient's chest radiographs without context.
Finally, it was necessary to summarize the opinion of the survey respondents to compare our experimental instrument to the current instrument (TQIP BR) using Cohen κ. For each case/control we chose to use the majority of responses to represent the consensus opinion. However, because agreement was so low (Krippendorff α = 0.134), a true consensus was seldom achieved. Additionally, the Cohen κ test presumes that the current instrument is highly dependable and valid, which we contend is a false premise in the case of TQIP's BR.
Conclusions
This study illustrates that there is ambiguity in the diagnosis of VAP despite robust efforts to use a single standard definition. This ambiguity complicates the practice of reporting VAP as a hospital-related event. In this study population, registrar interpretation of imaging reports resulted in 36% over-reporting of VAP to TQIP. The current definition of VAP ignores the biologic reality that there are multiple causes for opacities on chest radiograph. The subjectivity of imaging interpretation among physicians and the confounding physiologic parameters for VAP in trauma patients should preclude the current definition of VAP from being used as a quality improvement metric in TQIP. This proposal concurs with the CDC's NHSN, which abandoned the use of VAP in 2013.
Footnotes
Acknowledgments
The authors would like to acknowledge Olga Quintana and the team of registrars at the Ryder Trauma Center.
Authors' Contributions
Conceptualization: Ramsey, O'Neil, Proctor, Namias. Methodology: Ramsey, O'Neil, Saberi, Gilna, Kaufman, Lieberman, Lineen, Namias. Software: Ramsey, Meizoso, Pizano, Satahoo, Danton, Proctor. Formal analysis: Ramsey, O'Neil, Saberi, Gilna. Investigation: Ramsey, O'Neil, Saberi, Gilna, Kaufman, Lieberman, Lineen, Meizoso, Pizano, Satahoo, Danton, Proctor, Namias. Writing—original draft: Ramsey, Meece. Writing—review and editing: Ramsey, O'Neil, Saberi, Meece, Gilna, Kaufman, Lieberman, Lineen, Meizoso, Pizano, Satahoo, Danton, Proctor, Namias. Visualization: Ramsey, O'Neil, Saberi. Data curation: Danton. Project administration: Namias. Supervision: Danton, Proctor, Namias.
Funding Information
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Author Disclosure Statement
The authors have no conflicts of interest to declare.
References
