Abstract
Background:
Skin and soft tissue infection (SSTI) after hernia surgery is infrequent yet catastrophic and is associated with mesh infection, interventions, and hernia recurrence. Although hernia repair is one of the most common general surgery procedures, uncertainty persists regarding incidence of long-term infections. Our goal is to develop a machine learning regression model that detects the occurrence of long-term hernia-associated SSTI.
Patients and Methods:
The data set consisted of veterans receiving hernia repair with implanted synthetic mesh during 2008–2015. The outcome of interest was occurrence of SSTI related to the index hernia surgery over a five-year follow-up. A neural network regression model was fit to a sample of surgeries whose medical records had been reviewed and was then applied to the full study population.
Results:
The study population was 96,435 surgeries, of which 76,886 (79.7%) were inguinal, 11,177 (11.6%) were umbilical, and 8,372 (8.7%) were ventral. In the training set, 40 patients had SSTI probability ≥90%, of whom 38 (95%) had a true SSTI. In 249 patients with SSTI probability <10%, only five (2%) patients had a true SSTI. In the testing set, nine patients were assigned a probability >90% and all were true-positives. In 100 patients with probability <10%, only two (2%) patients had a true infection. C-statistics were 0.929 in the training set and 0.901 in the testing set.
Conclusions:
The model showed excellent discrimination between those with and without infection and had good calibration. The model could be used to reduce the cost of detecting long-term infections.
Infection of the skin and soft tissues at the site of a hernia repair, presenting as subclinical fluid collections, cellulitis, sudden drainage with wound dehiscence, or formation of a sinus tract, could indicate a festering infection of an implanted mesh or lead to infection of the mesh. Infection in a mesh-reinforced field is difficult to treat with intravenous antibiotic therapy because of the characteristics of certain meshes and the formation of biofilms by some micro-organisms [1]. These complications often lead to a cycle of re-operation, mesh explantation, and repair of a recurrent hernia [2]. Although hernia repair is one of the most common general surgery procedures, uncertainty persists regarding the incidence of long-term infections and the factors associated with this complication.
Previous studies have estimated infection incidence rates that vary substantially. A study of 1,346 veterans undergoing incisional hernia repair at 16 hospitals found a 30-day surgical site infection (SSI) rate of 5.6% and a mesh explantation rate of 5.0% [3]. A single-site study of veterans undergoing umbilical hernia surgery using either suture or implanted mesh found a long-term infection rate of 6.6% at a mean interval of 3.1 years after initial surgery [4]. In a Dutch study of patients undergoing incisional hernia repair, the rate of chronic infection of the surgical site was 2.1% at 5 years [5]. It has become increasingly apparent that the use of a short post-operative measurement interval fails to provide an adequate picture of complications after hernia repair.
The purpose of this study was to develop a machine learning regression model that detects the occurrence of long-term SSTI using routinely collected data. Machine learning has been used increasingly in recent years to detect a wide range of health-related adverse outcomes [6–9]. Such a model could be used by researchers to identify a cohort of high-risk patients in epidemiologic studies or clinical trials, or as a case-ascertainment tool to reduce the expense of medical record review and provide long-term surveillance for infection. Although it would be feasible to build a detection algorithm based on search terms indicative of SSTI in the patients' surgical and nursing notes, we believe such an approach would discriminate poorly between SSTI and non-SSTI outcomes, because hospitals and providers describe these findings in different ways, and would produce a high false-positive rate because of the ambiguity of suspected infections and rule-out diagnoses.
Patients and Methods
Study design and data sources
The study population consisted of patients undergoing hernia repair over an eight-year period. Data on index surgeries, patient demographic and clinical characteristics, and 30-day post-operative outcomes were obtained from the VA Surgical Quality Improvement Program (VASQIP) [10]. Data on long-term post-operative outcomes and other machine learning features were obtained from the VA Corporate Data Warehouse (CDW) [11]. The VA Boston Institutional Review Board approved this study and granted a waiver of informed consent because risk to participants was minimal.
Inclusion criteria
We initially obtained from VASQIP all surgeries during calendar years 2008–2015 in which the principal Current Procedural Terminology (CPT) code represented abdominal or groin hernia repair (Supplementary Appendix e1). We then identified in the CDW Surgical Package any surgical implant placed in these patients at the time of the index hernia surgery. A surgeon manually categorized the implant descriptions so that only those describing synthetic mesh were retained. We then excluded any surgeries with no record of synthetic mesh implantation.
Medical record review
We began case selection by identifying a group of patients likely to have had a long-term SSTI related to the index surgery. The criterion was the occurrence, within five years of index surgery, of a CPT code (Supplementary Appendix e1) possibly indicating mesh explantation. After complete review and confirmation of this high-risk group, we randomly selected an approximately equal number of patients who did not have an explantation code. Given the low baseline incidence of long-term SSTI, we expected this low-risk group to have few or no SSTI outcomes. To identify additional patients with SSTI, we also reviewed a group of patients who had at least three of the following four factors suggestive of infection: positive culture, relevant infection diagnosis consistent with an SSTI, relevant radiologic imaging, and antibiotic administration [12].
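The three-stage selection of charts for review can be sketched as follows. This is a minimal Python illustration, not the study's code: the `Surgery` record, its fields, and `select_for_review` are hypothetical names, and the sketch assumes the enrichment group is drawn only from patients without an explantation code who were not already in the random sample.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Surgery:
    """Hypothetical per-surgery record; field names are illustrative only."""
    surgery_id: int
    has_explant_code: bool  # CPT code possibly indicating mesh explantation within 5 years
    max_components: int     # highest 90-day count (0-4) of culture, diagnosis, imaging, antibiotic


def select_for_review(surgeries, n_low_risk, seed=0):
    """Return (high_risk, low_risk, enriched) groups for medical record review."""
    rng = random.Random(seed)
    # 1. High-risk group: every surgery with a possible explantation code is reviewed.
    high_risk = [s for s in surgeries if s.has_explant_code]
    # 2. Low-risk group: a simple random sample of surgeries without such a code.
    others = [s for s in surgeries if not s.has_explant_code]
    low_risk = rng.sample(others, min(n_low_risk, len(others)))
    # 3. Enrichment group: remaining surgeries with >= 3 of the 4 infection components.
    already = {s.surgery_id for s in low_risk}
    enriched = [s for s in others
                if s.surgery_id not in already and s.max_components >= 3]
    return high_risk, low_risk, enriched
```

Reviewing every explantation-code case while only sampling the much larger low-risk pool concentrates chart review where positives are likely, and the enrichment step adds infected patients who never had an explantation code.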
A trained surgeon conducted medical record review on all selected cases, recording information about the infection outcome on an electronic data entry form. SSTI was determined by a positive culture from the site of the index surgery, a positive diagnostic imaging study, or an SSTI diagnosis in a progress note or surgical procedure note. We did not count abdominal infections that were likely unrelated to the index surgery (e.g., SSI after cholecystectomy). A qualifying SSTI did not need to involve the implanted mesh itself.
Skin and soft tissue infection detection algorithm
The purpose of the machine learning regression model was to assign each participant a probability of long-term SSTI based on patient and surgery characteristics, as well as selected clinical occurrences within five years after surgery. For emphasis, this is a detection algorithm that assigns probabilities retrospectively based on events that have already occurred; it does not predict future infection based on current data. We chose not to use any VASQIP-assessed post-operative outcomes (e.g., 30-day SSI) as features, because this would reduce the generalizability to healthcare systems without access to surgical registry data.
We examined each 90-day post-operative interval separately (post-operative days 1–90, 91–180, etc.) and counted the occurrence of four components—positive culture, infection diagnosis, imaging study, and systemic antibiotic use—used in a previously published SSI algorithm [12]. For each patient, we retained the 90-day interval in which the maximum count (between 0 and 4) occurred, and in cases of ties retained the earliest interval. We then collected additional information from CDW on items related to possible infection during that 90-day interval, including fever (>100.4°F), outpatient visits with a general surgeon or infectious disease specialist, administration of antibiotic agents either through outpatient pharmacy or in the inpatient setting, positive culture from an abdominal wound, white blood cell count (WBC) <4 or >12, hospital admission, emergency department visits, ICD-9/-10 diagnosis of infection, and performance of an abdominal imaging study. All administrative codes are listed in Supplementary Appendix e1.
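The interval-selection rule (count the four yes/no components in each 90-day window, keep the window with the highest count, and break ties toward the earliest window) can be sketched in Python. This is an illustrative sketch only: the event representation as `(postop_day, component)` pairs, the component labels, and the approximation of five years as 1,825 days are assumptions, not details taken from the study.

```python
from collections import defaultdict

# Hypothetical labels for the four yes/no components of the published SSI score [12]
COMPONENTS = ("culture", "diagnosis", "imaging", "antibiotic")
HORIZON_DAYS = 5 * 365  # five-year follow-up, approximated as 1,825 days (assumption)


def interval_index(postop_day):
    """Map a post-operative day to its 90-day interval: days 1-90 -> 0, 91-180 -> 1, ..."""
    return (postop_day - 1) // 90


def interval_days(index):
    """First and last post-operative day of a 90-day interval."""
    return index * 90 + 1, index * 90 + 90


def peak_interval(events, horizon_days=HORIZON_DAYS):
    """Return (interval_index, count) for the interval with the most distinct
    components present (0-4); ties are resolved in favor of the earliest interval.

    events: iterable of (postop_day, component) pairs for one surgery.
    """
    present = defaultdict(set)
    for day, component in events:
        if component in COMPONENTS and 1 <= day <= horizon_days:
            present[interval_index(day)].add(component)  # yes/no per component
    best_index, best_count = 0, 0
    for index in range(interval_index(horizon_days) + 1):
        count = len(present.get(index, ()))
        if count > best_count:  # strict '>' keeps the earliest interval on ties
            best_index, best_count = index, count
    return best_index, best_count
```

The additional features (fever, specialist visits, antibiotics, WBC, admissions, and so on) would then be extracted only for the days returned by `interval_days(best_index)`.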
A neural network regression model was fit using the nnet package in R 3.6.3. Although many alternative algorithms could have been used, from classic logistic regression to newer methods such as XGBoost, the neural network has several advantages: it makes no assumptions about the statistical distributions of input features, models complex interactions and non-linearities, and generalizes well to new data while remaining computationally feasible and easy to implement in large datasets. We chose regression instead of classification to allow the evaluation of various probability thresholds that could be tailored to the needs of research studies or clinical projects. The model's label was the yes/no variable for occurrence of any SSTI from the day after surgery through five years; any SSI within 30 days of surgery was included in this definition. The features of the model included the baseline factors of gender, age, type of index surgery, and pre-operative diabetes, as well as the post-operative clinical features listed in the prior paragraph. We first divided the chart-reviewed cases into a training set (75%) and a testing set (25%) using stratified random sampling. The model was trained on the 75% sample using five-fold cross-validation, with all features centered and scaled. After training, predictions were made on the test set (the 25% sample), and the c-statistic was calculated for both sets. We also examined the true-positive SSTI rate stratified by prediction range (0%–10%, >10%–20%, etc.). We did not assess model performance separately for the three subsets (inguinal, umbilical, ventral): although ventral/incisional hernias are known to be at highest risk for SSTI, we wanted to assess outcomes for all VA enrollees with mesh-reinforced abdominal hernia repair.
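The study fit its model with R's nnet package, and the sketch below does not reproduce that fit. It only illustrates, in plain Python, three evaluation steps described above: a stratified 75%/25% split, the c-statistic (computed here as the concordance probability, which equals the area under the ROC curve), and the observed SSTI rate within 10-point prediction ranges. All function names are hypothetical, and the bins here are half-open ([0%, 10%), [10%, 20%), ...), which may differ from the boundary convention used in the paper.

```python
import random


def stratified_split(labels, test_frac=0.25, seed=0):
    """Split row indices into train/test sets, sampling within each label class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def c_statistic(y_true, y_prob):
    """Share of (SSTI, non-SSTI) pairs in which the SSTI case has the higher
    predicted probability; tied pairs count one half."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                score += 1.0
            elif pc == pn:
                score += 0.5
    return score / (len(cases) * len(controls))


def calibration_table(y_true, y_prob, n_bins=10):
    """Rows of (lower, upper, n, observed SSTI rate) for each non-empty bin."""
    counts = [0] * n_bins
    events = [0] * n_bins
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        counts[idx] += 1
        events[idx] += y
    return [(i / n_bins, (i + 1) / n_bins, counts[i], events[i] / counts[i])
            for i in range(n_bins) if counts[i]]
```

The pairwise loop in `c_statistic` is quadratic in the number of cases, which is acceptable for roughly a thousand reviewed charts; a rank-based formula would be preferable for much larger sets. Comparing each bin's observed rate with its probability range is the calibration check described in the text.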
After SSTI probabilities were assigned to every patient in the study, we performed two additional validation exercises. First, a random sample of 20 cases was taken from each of three groups of patients, those with probabilities of 70%–79%, 80%–89%, and ≥90%. The reviewer determined whether these high-probability cases represented true SSTI using the same criteria as above and noted the reasons for any false positives. Second, we compared the distribution of assigned SSTI probabilities in patients with a VASQIP-assessed 30-day SSI versus those without, expecting SSI cases to have higher mean probabilities.
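Both secondary validation exercises are simple to express in code. The following Python sketch is illustrative only: the dictionary inputs keyed by a hypothetical surgery ID, the function names, and the band boundaries (lower bound inclusive, upper bound exclusive) are assumptions rather than details reported by the study.

```python
import random
from statistics import mean

# Probability bands sampled for the secondary validation; upper bound exclusive,
# None = no upper bound. Boundary handling is an assumption.
BANDS = (("70%-79%", 0.70, 0.80), ("80%-89%", 0.80, 0.90), (">=90%", 0.90, None))


def sample_bands(probs, k=20, seed=0):
    """Draw up to k surgery IDs at random from each probability band.

    probs: dict mapping a (hypothetical) surgery ID to its assigned SSTI probability.
    """
    rng = random.Random(seed)
    samples = {}
    for label, lower, upper in BANDS:
        pool = sorted(sid for sid, p in probs.items()
                      if p >= lower and (upper is None or p < upper))
        samples[label] = rng.sample(pool, min(k, len(pool)))
    return samples


def mean_probability_by_ssi(probs, had_30day_ssi):
    """Mean assigned probability with vs. without a registry-recorded 30-day SSI."""
    with_ssi = [p for sid, p in probs.items() if had_30day_ssi.get(sid, False)]
    without = [p for sid, p in probs.items() if not had_30day_ssi.get(sid, False)]
    return mean(with_ssi), mean(without)
```

Sorting each pool before sampling makes the draw reproducible for a fixed seed regardless of dictionary order.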
Results
The final study population was 96,435 surgeries with synthetic mesh implantation, of which 76,886 (79.7%) were inguinal, 11,177 (11.6%) were umbilical, and 8,372 (8.7%) were ventral (Table 1). The mean (standard deviation [SD]) age of patients was 62.0 (12.1) years, and 98.8% were male. Mean (SD) body mass index (BMI) was 27.2 (5.08) overall and ranged from 26.08 (4.21) among inguinal repairs to 30.96 (5.96) among ventral repairs. The majority of patients were in American Society of Anesthesiologists (ASA) class 3–5, ranging from 55.6% of inguinal repairs to 73.8% of ventral repairs. Diabetes with or without complications was prevalent (12.9% of all participants). Emergency surgery was rare (1.3% of surgeries overall) and was most common (2.5%) in ventral repair. A non-clean wound was most common in the ventral cohort (7.7%) compared with the inguinal (3.2%) and umbilical (2.5%) cohorts. The overall mean (SD) work relative value units (RVU) was 8.3 (1.8) and was highest in the ventral cohort at 12.8 (1.8).
Table 1. Baseline Characteristics of Veterans Undergoing Hernia Repair
All statistical comparisons among the three surgery cohorts were significant at p < 0.001.
SD = standard deviation; BMI = body mass index; ASA = American Society of Anesthesiologists; CDC = U.S. Centers for Disease Control and Prevention; RVU = work relative value units.
An administrative procedure code possibly indicating mesh explantation was present in 611 of 96,435 (0.63%) patients, and medical record review was conducted on all 611 (Fig. 1). An SSTI related to the index surgery occurred in 318 of 611 (52.0%). We then reviewed a random sample of 300 patients without an administrative explantation code, none of whom were found to have a relevant SSTI. Identification of the group of patients with three or four SSI score components yielded an additional 119 reviewed cases, of which 61 were positive, for a total of 1,030 complete medical record reviews, of which 379 (36.8%) had an SSTI.
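The review counts in the preceding paragraph can be tallied directly. This short Python sketch uses only the counts stated above and recomputes the per-group yields and the overall proportion of reviewed charts with SSTI; the group labels are descriptive names chosen here.

```python
# Reviewed charts and SSTI-positive charts per selection group, as reported above
groups = {
    "possible explantation code": (611, 318),
    "random sample without explantation code": (300, 0),
    "three or four SSI score components": (119, 61),
}
n_population = 96_435

n_reviewed = sum(n for n, _ in groups.values())
n_positive = sum(pos for _, pos in groups.values())

print(f"explantation code: {100 * 611 / n_population:.2f}% of surgeries")
for name, (n, pos) in groups.items():
    print(f"{name}: {pos}/{n} = {100 * pos / n:.1f}%")
print(f"total: {n_positive}/{n_reviewed} = {100 * n_positive / n_reviewed:.1f}%")
```

The contrast between the explantation-code and enrichment groups (about half positive) and the random low-risk sample (none positive) is what makes targeted sampling efficient.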

Fig. 1. Medical record review case selection and model development.
In Table 2 we present the incidence of selected post-operative occurrences that were used as features in the neural network algorithm. The most common occurrence was administration of systemic antibiotic agents in the outpatient or inpatient settings, seen in 41.9% of patients overall. An infection diagnosis was coded in 17.4% of patients, and 19.5% had at least one hospital admission. Overall, patients had a mean (SD) of 0.68 (1.21) emergency department visits and 1.05 (2.35) outpatient visits with a general surgeon, whereas outpatient visits with an infectious disease specialist were relatively uncommon (0.09 [0.66]). In all 10 post-operative model features that we examined, incidence rates were highest in the ventral surgery cohort. All statistical comparisons among the three index surgery types were significant at p < 0.001.
Table 2. Selected Clinical Post-Operative Events Included as Features in Detection Model
Incidence rates represent occurrences during the 90-day post-operative interval in which the patient had the highest score, consisting of four yes/no components [12]: antibiotic administration, relevant infection diagnoses most consistent with SSTI, relevant radiologic studies, and positive cultures from relevant topography. All statistical comparisons among the three surgical cohorts were significant at p < 0.001. Abnormal WBC: <4 or >12.
Administrative codes are listed in Supplementary Appendix e1.
WBC = white blood cell count; ED = emergency department; SD = standard deviation; ID = infectious disease.
The performance of the neural network algorithm is illustrated in Figure 2. In the training set, 40 patients were assigned an SSTI predicted probability ≥90%, of whom 38 (95%) had a true SSTI as determined by medical record review. Among the 249 patients with SSTI probability <10%, only five (2%) patients had a true infection. In the testing set (i.e., those patients manually reviewed but not used for model training), nine patients were assigned a probability >90% and all of those were true-positives. In 100 patients with probability <10%, only two (2%) patients had a true infection. The c-statistic was 0.929 in the training set and 0.901 in the testing set.

Fig. 2. Performance and calibration of neural network model in training and testing sets. Note: skin and soft tissue infection (SSTI) within five years of surgery.
In the additional validation exercises, a true SSTI occurred in all 20 patients randomly sampled from the group with assigned probability ≥90%. In the 80%–89% group, 19 of 20 (95%) had a true infection; the remaining patient had a foot infection. In the 70%–79% group, 15 of 20 (75%) had a true SSTI related to the index hernia surgery. Of the five negative patients, three had an unrelated abdominal infection, one had a foot infection, and one had a lower extremity vascular graft infection. The mean SSTI probability in patients with any VASQIP-assessed 30-day SSI was 57.1%, compared with 12.4% in those without a 30-day SSI.
The distribution of predictions varied among the inguinal, umbilical, and ventral cohorts (Fig. 3). Among inguinal surgeries, nearly all patients were assigned an SSTI probability of <25%, and 83.0% were in the range of 0%–10%. Umbilical repair patients had higher probabilities overall: 75% had a probability <25%, and 64.2% were in the 10%–19% range. Ventral repairs had the highest infection probabilities: nearly half of patients had a probability >25%, and 42.7% were in the 20%–29% range. The vast majority of patients with a predicted SSTI probability >80% were in the ventral repair cohort.

Fig. 3. Distribution of predictions stratified by surgical cohort. Note: Numbers above bars denote number of surgeries (% within surgical type).
Discussion
We built a machine learning model to detect the occurrence of SSTI up to five years after index hernia surgery. The model showed excellent ability to discriminate between those with and without infection, as measured by the c-statistic. It was also well calibrated in the training and testing sets: the observed incidence of SSTI within each prediction range was reasonably close to the predicted probabilities.
There are several possible applications of the model we developed. First, it could be used as a case-ascertainment tool to detect patients with long-term SSTI after hernia surgery, and it could be validated in other types of surgery involving prostheses in proximity to the skin and soft tissues. Conventional measurement of SSI has focused on the 30- or 90-day post-operative window, which is already burdensome for the medical record reviewer and becomes even more so with a five-year follow-up. In our experience conducting nearly 1,100 medical record reviews for this study, we estimate that each case takes between 10 minutes and 1.5 hours to review. It is impractical to identify a meaningfully large number of positive cases by random sampling, because hernia repair is one of the most common procedures performed by general surgeons and adverse outcomes after inguinal and umbilical repair are relatively uncommon. Instead, sampling from high-probability patients above a clinically relevant threshold, perhaps 70% or 80%, would enable observational studies in which a large number of rare outcomes must be identified.
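The review burden of threshold-based sampling can be estimated with simple arithmetic. The Python sketch below is a rough planning calculation under stated assumptions: it treats the positive fractions seen in the 20-case validation samples (20/20, 19/20, and 15/20) as if they held for larger samples from the same bands, although 20 charts per band give imprecise estimates, and it uses the 10-minute to 1.5-hour review time reported above. The target of 100 cases is an arbitrary illustrative figure.

```python
import math

# Share of true SSTI in the 20-case validation samples reported in the Results
BAND_PPV = {">=90%": 20 / 20, "80%-89%": 19 / 20, "70%-79%": 15 / 20}
MINUTES_PER_REVIEW = (10, 90)  # per-chart review time range reported above


def reviews_needed(target_cases, ppv):
    """Charts to review to expect `target_cases` true SSTIs at a given PPV."""
    return math.ceil(target_cases / ppv)


def review_hours(n_reviews, minutes=MINUTES_PER_REVIEW):
    """(low, high) total reviewer hours for n_reviews charts."""
    return tuple(n_reviews * m / 60 for m in minutes)


for band, ppv in BAND_PPV.items():
    n = reviews_needed(100, ppv)
    low, high = review_hours(n)
    print(f"{band}: ~{n} charts, {low:.0f}-{high:.0f} reviewer hours for 100 cases")
```

Lowering the threshold from ≥90% to 70%–79% enlarges the eligible pool but, under these assumptions, requires about a third more charts per true case.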
Another application would be to assist healthcare systems in semi-automated surveillance of adverse outcomes, again by reducing the burden of manual review. Long-term surveillance in hernia surgery and in other surgeries involving prostheses has often been limited to single institutions [13] or to databases in which reporting by surgeons is voluntary [14]. As a result, our ability to evaluate risk factors, mesh and other prosthesis characteristics, and the epidemiology of long-term mesh infection has been hampered. With such a model, it becomes more feasible to study the effect of 30-day SSI after hernia surgery on long-term infection; risk factors for long-term skin and soft tissue infection at the site of hernia repair; progression to mesh infection; the effect of long-term antibiotic therapy, local debridement, and wound care on mesh preservation; and reasons for mesh explantation. The model should be generalizable to other surgical specialties and to any healthcare system in which the features (diagnosis and procedure billing codes, laboratory microbiology, vital signs, pharmacy, etc.) are collected and stored in a database. We believe the model has undergone sufficient validation to be valuable in tracking hernia mesh-related SSTI outcomes, and a similar validation can be undertaken in other surgeries involving prostheses. As with any detection model, its performance should be assessed periodically after implementation by manually reviewing a sample of cases.
There are several notable strengths to this study. First, we used highly reliable, validated data from a national surgical registry [15]. This allowed us to identify index hernia surgeries as well as patient demographics, pre-operative characteristics, and surgical factors accurately. Although we had access to selected post-operative outcomes assessed by VASQIP, we omitted these as features in the machine learning model so that it would be generalizable to other healthcare systems. Second, we obtained national data from 120 VA medical centers, which reduces the risk of bias resulting from hospital-specific practice and coding patterns. We also were able to follow patients over time because of the VA's integrated medical record and central store of administrative healthcare data. The study also has limitations. Our patient population was mostly male, which is typical of VA studies.
Retrospective assessment of surgical infection is often difficult, and given the size of the study, we were unable to perform dual review and estimate inter-rater agreement. We relied on administrative data for many of the machine learning model features, and these data can sometimes be inaccurate or ambiguous (e.g., a diagnosis code for abdominal infection may reflect an infection unrelated to the hernia surgery, or a culture may carry a nonspecific topography such as “wound”), possibly leading to false-positive hernia-related SSTIs. In any population-based study, there is a risk that data from outside healthcare services will be missing, and many VA enrollees are also Medicare beneficiaries. Although we validated the model predictions on the 25% test sample that was not used to train the model, external validation is advisable when implementing the model on new data.
There remains uncertainty about the true incidence and risk factors of infectious complications after hernia surgery. The model we developed would reduce the administrative burden to assess rates of infection over time and facilitate surveillance programs to study long-term infection after hernia surgery with implanted mesh. It would be feasible to implement in a healthcare system that routinely collects data on patient characteristics, procedure and diagnostic billing codes, and laboratory microbiology results. Investigators will be able to better understand the long-term epidemiology of mesh infections and develop tools to prevent this catastrophic complication that leads to patient pain, burden, and excessive healthcare utilization.
Footnotes
Authors' Contributions
Dr. Itani and Mr. O'Brien were responsible for study design. Mr. O'Brien performed data acquisition and statistics. Dr. Dipp Ramos performed medical record review. All authors contributed substantially to intellectual content and manuscript editing.
Funding Information
This work was funded by a Cooperative Research and Development Agreement (CRADA) between Pfizer Inc. and the authors' institution. The funder had no role in the design of the study, nor in preparation, review, or approval of the manuscript and decision to submit the manuscript for publication.
Author Disclosure Statement
Dr. Gupta is a consultant to Iterum, Paratek, Ocean Spray, and Tetraphase, and an author for UpToDate. Mr. O'Brien had full access to all the data in the study and takes full responsibility for the integrity of the data and the accuracy of the data analysis.
References
Supplementary Material
