Abstract
Introduction:
Surgical site infections (SSIs) are an important quality measure. Identifying SSIs often relies upon a time-intensive manual review of a sample of common surgical cases. In this study, we sought to develop a predictive model for SSI identification using antibiotic pharmacy data extracted from the electronic medical record (EMR).
Methods:
A retrospective analysis was performed on all surgeries at a Veterans Affairs Medical Center between January 9, 2020 and January 9, 2022. Patients receiving outpatient antibiotics within 30 days of their surgery were identified, and chart review was performed to detect instances of SSI as defined by VA Surgery Quality Improvement Program criteria. Binomial logistic regression was used to select variables to include in the model, which was trained using k-fold cross validation.
Results:
Of the 8,253 surgeries performed during the study period, 793 (9.6%) involved an outpatient antibiotic prescription within 30 days of the procedure; SSI was diagnosed in 128 (1.6%) cases. Logistic regression identified time from surgery to antibiotic prescription, ordering location of the prescription, length of prescription, type of antibiotic, and operating service as important variables to include in the model. On testing, the final model demonstrated good predictive value with a c-statistic of 0.81 (confidence interval: 0.71–0.90). Hosmer–Lemeshow testing demonstrated good fit of the model with a p value of 0.97.
Conclusion:
We propose a model that uses readily attainable data from the EMR to identify SSI occurrences. In conjunction with local case-by-case reporting, this tool can improve the accuracy and efficiency of SSI identification.
Introduction
Surgical site infections (SSIs) occur in 0.5%–3% of patients undergoing surgery, increase hospital length of stay by an average of 7–11 days, and account for $3.5 billion to $10 billion in added US healthcare costs annually.1–4 As such, an institution’s SSI rate is an important surgical quality metric used to benchmark performance for accreditation and reimbursement.5 Surveillance and reporting of SSIs are important functions of surgical quality programs, and many institutions use dedicated staff to find and review cases of SSI. However, manual curation of these data is time consuming and costly; the Agency for Healthcare Research and Quality estimates that 33 hours of chart review are needed to identify one instance of SSI.6
The implementation of electronic medical records (EMRs) has allowed for automated data abstraction and utilization of computer algorithms in identifying SSIs. Much of the pre-existing research has focused on models trained on administrative claims data, namely International Classification of Diseases (ICD) codes, to capture instances of SSI.7–8 Identifying SSIs through ICD coding can be problematic, as it relies on accurate coding and documentation of a patient encounter and can be biased by its primary goal of hospital reimbursement. While modeling claims data can provide an accurate reflection of overall SSI rate and trends, such models are inaccurate when identifying complications in individual patients.9–10
Since chart review and claims data both rely on documentation, we sought a novel method to identify SSIs. We hypothesized that outpatient antibiotic prescription data from an EMR can be leveraged by a machine learning algorithm to accurately estimate the occurrence of an SSI.
Methods
Following institutional review board approval, we performed a retrospective analysis of all patients who received an outpatient prescription for an oral antibiotic within 30 days of a surgical procedure performed at a U.S. Veterans Affairs Medical Center between September 2020 and September 2022. We excluded patients who did not receive antibiotics within 30 days of surgery and those who had undergone an endoscopic (such as colonoscopy, esophagogastroduodenoscopy, hysteroscopy, or cystoscopy), dental, or isolated anesthesia pain procedure. For patients meeting inclusion criteria, data were abstracted from the Corporate Data Warehouse, including patient demographics, surgery date, operating service, wound classification, type of antibiotic, reason for treatment, date and length of prescription, and the type of encounter from which the prescription was ordered (hospital, clinic, emergency department, etc.).
The primary aim of the study was to develop a machine learning model to accurately identify the occurrence of an SSI using variables that can be readily extracted from surgery and antibiotic prescription data residing in the EMR without manual chart review. Identifying patient-level risk factors for SSI was beyond the scope of this study. Chart review was conducted on the included patients to determine the incidence and type of SSI as defined by the VA Surgical Quality and Improvement Program (VASQIP) wound definitions (see Supplementary Data S1). Logistic regression was performed to identify suitable variables for inclusion in the model, and we included variables with p values < 0.2. The regression model was trained with k-fold cross validation (k = 10 folds) on a random 80% subset of the data and then tested on the remaining 20%. The c-statistic, or the area under the receiver operating characteristic (ROC) curve, was calculated, as was the probability threshold giving the optimum combination of sensitivity and specificity for our model. Hosmer–Lemeshow testing was used to determine goodness of fit. Missing values were not replaced during analysis. All statistics and machine learning computations were performed using RStudio with the caret and pROC packages (Version 3.5.3, Posit, Boston, MA).11–12 The Transparent Reporting of a Multivariable Prediction model for Individual Prognosis or Diagnosis guidelines were followed for standard reporting of this study. The completed checklist for the article is available in Supplementary Table S1.
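To make the two ROC-based quantities above concrete, the following is a minimal Python sketch of how a c-statistic and an "optimal" probability threshold can be computed from predicted probabilities. It is not the authors' code (the analysis was run in R with caret and pROC). The function names, the toy labels, and the toy scores are hypothetical, and the use of Youden's J (sensitivity + specificity − 1) as the optimality criterion is an assumption, since the text does not name the criterion used.

```python
def c_statistic(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(labels, scores):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.
    A case is called positive when its score is >= threshold.
    Assumption: Youden's J stands in for the unspecified criterion."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    candidates = []
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        candidates.append((t, tp / n_pos, tn / n_neg))
    # max() keeps the first (lowest) threshold when several tie on J
    return max(candidates, key=lambda c: c[1] + c[2] - 1)


# Toy data (hypothetical): 1 = SSI confirmed on chart review, 0 = no SSI
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(c_statistic(labels, scores))     # 0.75
print(best_threshold(labels, scores))  # (0.35, 1.0, 0.5)
```

In the toy data two thresholds tie on Youden's J; the sketch keeps the lower one, which favors sensitivity over specificity, as is usually preferred for a screening tool.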
As a secondary aim, we tested our assumption that antibiotic data would capture most of our institution’s SSIs. We compared the collected antibiotic data with SSI reporting data collected by our institution’s quality and peer review teams to identify possible areas of deficiency in our model and opportunities for further quality improvement. The surgical quality team at our institution regularly monitors readmissions after surgery and encourages self-reporting of SSIs by providers.
Results
There were 8253 eligible cases over the study period, of which 793 (9.6%) involved an outpatient antibiotic prescription within 30 days. Chart review confirmed SSI in 128 (1.6%) cases, of which 101 (78.9%) were superficial, 15 (11.7%) were deep incisional, and 12 (9.4%) were organ space SSIs. In an additional 97 (1.2%) cases, antibiotics were prescribed because of a surgical wound concern that did not meet the VASQIP-defined SSI criteria (Fig. 1). The patient cohort receiving antibiotics within 30 days of surgery had a median age of 65 (interquartile range 52–72), was 90% male, and was 61% white. The majority of the involved procedures were inpatient and performed by general, orthopedic, urologic, or podiatric surgery teams (Table 1).

Fig. 1. Study CONSORT diagram.
Table 1. Cases Receiving Antibiotics Within 30 Days of Surgery
IQR, interquartile range; Rx, antibiotic prescription; Abx, antibiotic.
Using multivariable binomial logistic regression, six variables were selected for inclusion in our model: the time from surgery to the time of antibiotic prescription, the length and clinical setting of the prescription, the type of antibiotic, the wound class of the procedure, and the operating service. Increased time from surgery, a prescription from the operating service’s clinic, clean wound class, and the general surgery and orthopedics hand services were most predictive of the antibiotic being prescribed for an SSI. The final model after training and testing had a c-statistic of 0.81 (confidence interval: 0.71–0.90) (Table 2). A probability threshold of 11.9% yielded the optimum combination of sensitivity and specificity of 94.4% and 53.6%, respectively (Fig. 2), as well as a negative and positive predictive value of 98.7% and 20.7%, respectively. Hosmer–Lemeshow testing demonstrated good fit of the model with p value of 0.974.
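The relationship between the four reported performance figures can be illustrated with a short Python sketch. The confusion-matrix counts below are hypothetical: the source does not report them, and they were chosen only because they reproduce the reported percentages for a test set of roughly 20% of the 793 antibiotic cases. The function name is likewise an illustration, not part of the authors' analysis.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # SSIs correctly flagged
        "specificity": tn / (tn + fp),  # non-SSIs correctly cleared
        "ppv": tp / (tp + fp),          # flagged cases that are true SSIs
        "npv": tn / (tn + fn),          # cleared cases that are truly SSI-free
    }


# Hypothetical counts (NOT reported in the source), consistent with the
# published percentages for a held-out set of 158 cases containing 18 SSIs.
metrics = screening_metrics(tp=17, fp=65, tn=75, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
# sensitivity: 94.4%
# specificity: 53.6%
# ppv: 20.7%
# npv: 98.7%
```

Under these assumed counts, a low positive predictive value alongside a high negative predictive value is what one would expect from a sensitive screen applied to an uncommon outcome: most flagged cases still need chart review, but cases below the threshold can be set aside with little risk of missing an SSI.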

Fig. 2. Receiver operating characteristic curve for the model. The point illustrated indicates the probability threshold yielding the optimal combination of specificity and sensitivity. ROC, receiver operating characteristic; SSI, surgical site infection.
Table 2. Surgical Site Infection Model Final Parameters
Rx, antibiotic prescription; ED, Emergency Department; PCP, primary care provider; Abx, antibiotic.
The combination of SSIs identified by antibiotic prescription review and surgical quality reports represented all known cases of SSI at our institution during the study period. There were 146 occurrences of SSI, giving an overall incidence of 1.8%. Antibiotic data outperformed quality reporting alone, capturing 88% of SSIs compared with 33% (Table 3a). While antibiotic data were superior to quality reporting in detecting superficial (95% vs. 19%) and organ space infections (71% vs. 41%), they were inferior in detecting deep incisional infections (65% vs. 91%). There were 18 (12%) cases of SSI that were not captured by antibiotic data. Most of the missed cases were not detected because of the prescription of intravenous (IV) antibiotics, the prescription occurring more than 30 days after the procedure, treatment at an outside hospital, or treatment completed entirely in an inpatient setting prior to discharge (i.e., no outpatient antibiotic prescription) (Table 3b).
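The headline figures in this paragraph follow directly from the counts reported in the text, as the short Python check below shows. The variable names are illustrative; all numbers are taken from the Results.

```python
total_surgeries = 8253   # eligible cases over the study period
ssi_all_sources = 146    # SSIs found by antibiotic review or quality reports
ssi_from_abx = 128       # SSIs confirmed among cases with outpatient antibiotics

missed_by_abx = ssi_all_sources - ssi_from_abx

print(f"Overall SSI incidence: {ssi_all_sources / total_surgeries:.1%}")    # 1.8%
print(f"Captured by antibiotic data: {ssi_from_abx / ssi_all_sources:.0%}")  # 88%
print(f"Missed by antibiotic data: {missed_by_abx} "
      f"({missed_by_abx / ssi_all_sources:.0%})")                            # 18 (12%)
```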
Table 3. Performance of Outpatient Antibiotic Prescription Data in Detecting Surgical Site Infections
(a) Unique cases of SSIs identified on review of antibiotic data and quality team reports, representing all known SSIs during the study period.
(b) Eighteen cases of SSIs were captured by the quality team but were not captured by the 30-day antibiotic data.
SSI, surgical site infection; Abx, antibiotic; QI, quality improvement; IV, intravenous.
Discussion
We hypothesized that outpatient antibiotic data after surgery can be leveraged in a machine learning algorithm to retrospectively identify instances of SSI. We demonstrated that quality reporting alone is insufficient for capturing a true estimate of the SSI rate and that the addition of antibiotic data analysis improves SSI identification. Furthermore, among cases with an outpatient antibiotic prescription, our model identifies SSIs with high sensitivity and moderate specificity.
Previous studies on SSI models using structured data have relied on administrative claims data, which depend on documentation and accurate reporting. We have provided a novel alternative approach using operative and antibiotic prescription data, which can be automatically extracted from the EMR and are objective rather than documentation based. It provides a highly sensitive way to screen patients for SSIs and can decrease the burden of chart review. However, analyzing antibiotic data alone is not sufficient to capture all cases, particularly deep incisional and organ space infections. These types of SSI are likely to merit readmission, reoperation, and/or prolonged hospital length of stay and should trigger review by a dedicated surgical quality team.
Our study does have limitations. The demographics of the veteran surgical population differ from those of the general population, particularly in that it is predominantly male. This may limit the study’s generalizability to other patient groups. The VA medical system provides longitudinal and comprehensive care for veterans; such care may not be provided at other types of institutions, where postoperative patients may present for care of suspected SSI to a hospital system different from the one where they underwent their index operation. In addition, other systems may exhibit different prescribing practices in terms of the types of antibiotics prescribed and the locations in which patients are evaluated. Even within the VA health system, differences in local station characteristics, such as rurality, may limit follow-up. Thus, our results from an urban VA medical center may not be generalizable to other centers. The outpatient antibiotic data did not account for IV antibiotics or antibiotics prescribed outside of the 30-day time window, which contributes to missed cases, particularly deep incisional and organ space infections.
Using antibiotic prescription data in conjunction with a surgical quality team and self-reporting gives a more accurate picture of an institution’s SSI rate. Our model of prescription data offers a more efficient way to screen patients by decreasing the burden of manual chart review by quality and safety officers. Specificity in similar models can be improved by incorporating natural language processing of provider notes, which has been shown to increase the accuracy of SSI models and will be a target for future research.7,13,14
Conclusion
Postoperative antibiotic data can be leveraged in machine-learning algorithms to accurately predict occurrences of SSIs without manual chart review. The addition of such algorithms to pre-existing quality reporting systems can greatly enhance the identification of SSIs and support the development of quality and safety initiatives.
Footnotes
Acknowledgments
This material is the result of work supported with resources and the use of facilities at the Jennifer Moreno Department of Veterans Affairs Medical Center in San Diego, California.
The contents do not represent the views of the U.S. Department of Veterans Affairs or the United States Government.
Authors’ Contribution
L.P.: Conceptualization, methodology, software, formal analysis, data curation, writing—original draft, and writing—review and editing; T.O.: methodology, formal analysis, and writing—review and editing; W.A.: conceptualization, methodology, and writing—review and editing; B.P.: conceptualization, methodology, writing—review and editing, supervision.
Author Disclosure Statement
The authors have no conflicts of interest and no funding source to declare.
Funding Information
None.
References
