Abstract
BACKGROUND:
Early identification of sepsis has been shown to significantly improve patient prognosis.
OBJECTIVE:
The aim of this meta-analysis was to systematically evaluate the diagnostic efficacy of machine-learning algorithms for sepsis prediction.
METHODS:
Systematic searches were conducted in PubMed, Embase and Cochrane databases, covering literature up to December 2023. The keywords included machine learning, sepsis and prediction. After screening, data were extracted and analysed from studies meeting the inclusion criteria. Key evaluation metrics included sensitivity, specificity and the area under the curve (AUC) for diagnostic accuracy.
RESULTS:
The meta-analysis included a total of 21 studies with a data sample size of 4,158,941. Overall, the pooled sensitivity was 0.82 (95% confidence interval [CI]
CONCLUSION:
Machine-learning algorithms have demonstrated excellent diagnostic accuracy in predicting the occurrence of sepsis, showing potential for clinical application.
Introduction
Sepsis, a severe infectious disease characterised by uncontrolled inflammation following infection, has shown an increasing incidence globally, posing a significant challenge to healthcare systems [1]. It is a major cause of mortality, particularly contributing to deaths in intensive care units (ICUs) [2]. Typically caused by bacteria, fungi or viruses, sepsis spreads through the bloodstream, leading to Systemic Inflammatory Response Syndrome [3]. Epidemiological studies indicate its prevalence across all age groups, with the elderly, immunocompromised individuals and those with hospital-acquired infections being high-risk populations. The inflammatory response triggered by sepsis not only harms the site of infection but may also result in Multiple Organ Dysfunction Syndrome, increasing the risk of patient mortality [4].
Early prediction of sepsis is crucial because the condition is acute and evolves rapidly; timely and effective intervention can reduce severity and improve patient survival [5]. Although treatment guidelines provide recommendations on the timing of intervention, early identification of sepsis remains difficult owing to its clinical complexity. There is currently no widely accepted gold standard for early sepsis diagnosis, and the high heterogeneity of patients’ underlying diseases further complicates accurate early identification [6].
With the rapid advancement of computational capabilities and the widespread application of big data, machine-learning (ML) and artificial intelligence (AI) technologies have made significant breakthroughs in recent years. Compared with traditional clinical judgment, ML algorithms exhibit unique advantages, particularly in processing the dynamic and complex clinical information obtained from patients with sepsis [7]. Clinicians may experience cognitive fatigue when confronted with large volumes of patient data, whereas ML algorithms efficiently process large-scale, high-dimensional data, uncovering potential patterns and correlations. One key feature of ML algorithms is their ability to automatically learn and optimise models, continuously updating predictive performance as the data change. Additionally, ML algorithms can uncover non-linear relationships in data, identifying latent features and patterns to enhance the accuracy of early sepsis prediction [8]. In the medical field, ML has already been successfully applied to areas such as tumour prediction and cardiovascular disease risk assessment. For sepsis prediction models, ML algorithms can integrate multiple dimensions of data, including clinical indicators, laboratory test results and medical imaging, to establish relatively comprehensive and accurate models [9].
In the past two years, numerous studies on ML-based prediction of sepsis have emerged. Researchers have built models with high predictive performance by integrating patient medical imaging, electronic health records and laboratory test data. Although previous systematic reviews have summarised the relevant research, their diagnostic data have not been updated to reflect the recent surge in studies and models, leaving an incomplete picture of current research progress. Therefore, the purpose of this study is to conduct a more detailed and quantitative meta-analysis, summarising the latest research and evaluating the diagnostic efficacy of ML algorithms in predicting sepsis.
Methods
This study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy Studies (PRISMA-DTA) guideline [10].
Search strategy and literature selection
PubMed, Embase and the Cochrane Central Register of Controlled Trials (CENTRAL) were selected for comprehensive searches in this study, covering literature from the inception of databases up to 11 December 2023. The search had no language restrictions and utilised a combination of controlled vocabulary terms (MeSH or Emtree) and free-text terms. The primary keywords included sepsis, ML and prediction. Additionally, manual searches were conducted to include relevant references from reviews or meta-analyses.
Two independent researchers conducted the search based on a predefined strategy. Electronic records were managed using EndNote software, which automatically removed duplicate entries, followed by manual verification. The researchers screened titles and abstracts to identify potential studies that met the inclusion criteria. Essential data were then extracted using a pre-established data collection form. Discrepancies between the two researchers’ results were resolved through consensus.
Inclusion and exclusion criteria
The inclusion and exclusion criteria in the present study were designed following the Population, Intervention, Comparison, Outcome, Study Design (PICOS) principle. Specifically, studies meeting the following conditions were included: (1) studies involving diseased or potentially diseased humans as the study participants; (2) studies using ML algorithms to model data for predicting the risk of sepsis or related conditions, such as severe sepsis or septic shock; (3) studies not mandating a specific diagnostic reference method; (4) studies reporting numerical values of diagnostic parameters, including false negatives, false positives, true negatives and true positives, or providing data to calculate a 2×2 table; and (5) studies designed as cohort studies or case-control studies.
Studies were excluded if they fell into the following categories: (1) redundant studies with identical data; (2) irrelevant reviews, letters, conference abstracts or non-peer-reviewed studies; and (3) studies with incomplete data or studies that did not report predefined outcomes.
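Inclusion criterion (4) asks for the four cells of a 2×2 table because the accuracy measures reported in this meta-analysis (sensitivity, specificity, likelihood ratios and the diagnostic odds ratio) all derive from them. The following is a minimal per-study sketch of those standard definitions; the class name `TwoByTwo` and the cell counts are hypothetical illustrations, not data from any included study, and the sketch does not reproduce the pooling performed in RevMan or Stata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cells of a 2x2 diagnostic table for a single study (hypothetical helper)."""

    tp: int  # true positives: model flags sepsis, reference standard agrees
    fp: int  # false positives: model flags sepsis, reference standard disagrees
    fn: int  # false negatives: model misses a reference-standard sepsis case
    tn: int  # true negatives: model and reference standard both negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def lr_positive(self) -> float:
        # Positive likelihood ratio: sensitivity / (1 - specificity)
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_negative(self) -> float:
        # Negative likelihood ratio: (1 - sensitivity) / specificity
        return (1 - self.sensitivity) / self.specificity

    @property
    def diagnostic_odds_ratio(self) -> float:
        # Equivalent to lr_positive / lr_negative
        return (self.tp * self.tn) / (self.fp * self.fn)


# Hypothetical counts chosen only to illustrate the arithmetic.
table = TwoByTwo(tp=80, fp=30, fn=20, tn=270)

print(f"sensitivity = {table.sensitivity:.2f}")
print(f"specificity = {table.specificity:.2f}")
print(f"LR+ = {table.lr_positive:.2f}")
print(f"LR- = {table.lr_negative:.2f}")
print(f"DOR = {table.diagnostic_odds_ratio:.1f}")
```

A study that reports only sensitivity, specificity and the numbers of cases and non-cases can still satisfy the criterion, since the four cells can be back-calculated from those values.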
Data extraction and risk of bias assessment
From each included study, two independent researchers extracted the following data: the first author and publication year of the study, study design and centre, geographical region, data year, data source, sample size, incidence, validation method, specific algorithm, number of included variables, duration between the prediction time point and the onset of sepsis, data types, predicted outcomes, outcome definitions, diagnostic parameters and area under the curve (AUC) data. If a study developed multiple algorithm models, priority was given to the model with the best performance (highest AUC value) and diagnostic parameters from the validation cohort. Two researchers independently assessed the methodological quality using the Quality Assessment of Diagnostic Accuracy Studies tool, which includes patient selection, index test, reference standard, flow and timing. Disagreements between the two researchers were resolved through discussion or consultation with a third researcher.
Statistical analysis
Data were analysed using RevMan 5.4 and Stata SE 15.0. Sensitivity and specificity were calculated from 2×2 diagnostic tables and presented as forest plots and summary receiver operating characteristic curves. Stata SE 15.0 was used to pool diagnostic values, namely sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, diagnostic odds ratio and overall AUC. The pooled values and their 95% confidence intervals (CIs) were reported. The heterogeneity of the included study results was assessed based on Cochran’s Q test and the
PRISMA flow diagram of study selection.
Literature search
The study retrieval and selection process is illustrated in Fig. 1. The initial search of the three electronic databases yielded 1,466 articles: 569 from PubMed, 861 from Embase and 36 from CENTRAL. After duplicate records were removed using both software and manual methods, 983 unique records remained. Screening of titles and abstracts excluded 938 articles unrelated to the research topic, leaving 45 articles for full-text review. At this stage, 24 articles were excluded for reasons such as incomplete data, reporting only prognostic outcomes, non-sepsis-related content or being abstracts or letters. Finally, 21 articles were included in the meta-analysis [11–31].
Basic characteristics of included studies
The characteristics of the studies meeting the inclusion criteria are detailed in Table 1. Among the 21 studies, the total sample size was 4,158,941, with 6 studies focusing on the emergency department, 2 on inpatient settings, 11 on the ICU and 2 on mixed scenarios. With the exception of 1 study that did not report information, 20 studies employed a retrospective analysis approach, with the majority (61.9%) being single-centre studies and 7 (33.3%) being multi-centre studies. Of the 21 studies, 14 used data from the United States, 2 from China, 2 from South Korea, 1 from Sweden and 1 from Thailand. Only 1 did not report the data source. The study data spanned from 2001 to 2021, with 6 studies utilising data from the MIMIC database. The sample size ranged from 142 to 2,759,529, and the event occurrence rate for the study outcomes ranged from 0.41% to 79.5%. Various validation methods were employed, with 52.4% using random splitting, 19.0% using temporal validation and 19.0% using cross-validation, among others.
Table 2 provides an overview of all the constructed ML models. In the 21 studies meeting the criteria, different algorithms were used for modelling, with common ML algorithms including support vector machines, random forests and naïve Bayes. Nearly all studies (95.2%) used vital signs for modelling, 8 studies used demographic data, 13 studies used laboratory tests, 3 studies used nursing documentation, and 2 studies used baseline characteristics. The number of variables included ranged from 2 to 41. A total of 13 studies solely predicted sepsis, 3 studies predicted septic shock, and 4 studies predicted multiple events. There were differences in the definition of sepsis among the included studies, with 4 studies using International Classification of Diseases definitions, 9 studies using Sepsis-3 definitions and 3 studies using a mixed definition. The reported types of diagnostic accuracy parameters varied, with 3 studies not reporting the AUC; the reported AUC ranged from 0.65 to 0.994. A total of 8 studies reported the time before the onset of sepsis, ranging from 0 to 48 hours.
Risk of bias assessment
The methodological quality of the 21 included studies is presented in Fig. 2. In the “Patient Selection” domain, 1 study was at unclear risk of bias owing to unclear descriptions, and 7 studies were judged to be at high risk due to factors such as continuous inclusion. In comparison, most studies (90.5%) had a low risk of bias in the “Index Test” domain. Given the lack of a clear gold standard for sepsis and the slightly different criteria used across studies, the risk of bias in the “Reference Standard” domain was deemed unclear for all studies. Moreover, only 3 studies had some risk of bias in the “Flow and Timing” domain. Notably, with regard to applicability to the meta-analysis question, 1 study may raise a relatively high concern in the “Index Test” domain.
Meta-analysis
The findings from the meta-analysis regarding the early prediction of sepsis risk using ML are depicted in Fig. 3. The collective sensitivity across the 21 studies was 0.82 (95% CI
Overview of the included studies
ED: emergency department; USA: the United States of America; ICU: intensive care unit; UCSF: University of California, San Francisco; MIMIC: Multiparameter Intelligent Monitoring in Intensive Care.
Characteristics of machine learning-based prediction models for sepsis
ML: machine learning; AUC: area under the curve; ED: emergency department; NB: naïve Bayes; ICD-9: International Classification of Diseases, Ninth Revision; PPV: positive predictive value; NPV: negative predictive value; FPR: false positive rate; GB: gradient boosting; RF: random forest; SVM: support vector machine; GBM: gradient boosting machine; MARS: multivariate adaptive regression splines; LASSO: least absolute shrinkage and selection operator; CART: classification and regression trees; EM: ensemble methods; SIRS: systemic inflammatory response syndrome; DT: decision tree; KNN: k-nearest neighbour; PHM: proportional hazards model; NNM: neural network model; LSTM: long short-term memory; GBDT: gradient boosting decision tree; GLM: generalized linear model; DOR: diagnostic odds ratio; LR+: positive likelihood ratio; LR-: negative likelihood ratio.
Methodological quality assessment of the included studies.
0.11–0.35). Moreover, the diagnostic odds ratio was 44 (95% CI
Forest plot of machine learning-based models for predicting sepsis.
SROC curve of machine learning-based models for predicting sepsis.
Funnel plot of machine learning-based models for predicting sepsis.
Publication bias analysis of the included studies, as shown in Fig. 5, demonstrated no apparent publication bias based on the asymmetry test of the funnel plot (
Clinical utility
According to the Fagan plot (Fig. 6), assuming a pre-test probability of 50%, a positive result from an ML-based sepsis prediction model raised the post-test probability to 90% (positive likelihood ratio of 9), whereas a negative result lowered it to 17% (negative likelihood ratio of 0.20).
Fagan plot of machine learning-based models for predicting sepsis.
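The post-test probabilities read from a Fagan plot follow from Bayes’ theorem in odds form: post-test odds equal pre-test odds multiplied by the likelihood ratio. As a minimal sketch, the function below (the name `post_test_probability` is a hypothetical helper, not part of any software used in this study) reproduces the figures reported above from the 50% pre-test probability and the likelihood ratios of 9 and 0.20.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability into a post-test probability.

    Uses the odds form of Bayes' theorem:
    post-test odds = pre-test odds * likelihood ratio.
    """
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")

    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Values taken from the Clinical utility paragraph above.
after_positive = post_test_probability(0.50, 9.0)
after_negative = post_test_probability(0.50, 0.20)

print(f"Post-test probability after a positive result: {after_positive:.0%}")
print(f"Post-test probability after a negative result: {after_negative:.0%}")
```

The 50% pre-test probability is a convention of the plot rather than the observed incidence; in a setting where sepsis is rarer, the same likelihood ratios would yield a lower post-test probability after a positive result.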
This study conducted a comprehensive analysis of the latest research using a quantitative approach, specifically a meta-analysis, to evaluate the accuracy of ML algorithms in predicting the risk of sepsis. The main findings are as follows: (1) In the 21 studies included, ML algorithms demonstrated excellent diagnostic performance for predicting sepsis, not only in the ICU but also in the emergency department. (2) The methodological assessment revealed that the data quality of current studies is not high, with many studies being excluded due to incomplete data reporting. This suggests the need for more high-quality studies following guidelines in the future. (3) Before clinical application, research on the algorithm’s impact on patient prognosis and clinical decision-making is necessary. This meta-analysis affirms the diagnostic potential of ML algorithms, providing direction for future research and potentially aiding clinical decision-making. Machine-learning algorithms can seamlessly integrate with clinical decision support systems, providing clinicians with actionable insights at the point of care. By embedding ML-based prediction models into electronic health record systems, clinicians can receive real-time alerts and recommendations based on individual patient data, facilitating early recognition of sepsis and guiding appropriate treatment strategies. The study’s meta-analysis underscores the potential of ML-based decision support systems to augment clinical decision-making and enhance patient safety.
The characteristics of sepsis, including its prolonged duration, rapid changes and extensive course data, pose significant challenges to clinical decision-making. Machine-learning algorithms show notable advantages in addressing these challenges. By extracting patterns of disease progression from rich data at different time points, ML provides clinicians with more comprehensive and in-depth information, potentially compensating for the limitations of traditional medical judgment when handling extensive course data [32]. The analysis revealed that most studies primarily utilised vital signs and laboratory results in the modelling process, indicating the crucial role of these physiological indicators in predicting sepsis [33]. Future research could further explore the weight and impact of these specific data in ML models to better understand their relationship with the progression of sepsis. The potential widespread application of ML, from emergency rooms to wearable devices in home settings, positions it as a powerful tool for assisting clinical decision-making [33, 34, 35]. Real-time monitoring of patients’ physiological data enables ML to provide timely risk assessments and predictions, facilitating more prompt therapeutic interventions and reducing treatment delays. The advantage of ML lies in its ability to detect trends in disease progression early, which is crucial for early sepsis prediction. Early identification of sepsis allows for targeted therapeutic interventions, improving patient outcomes before the onset of systemic inflammatory storms and reducing adverse events [34, 36].
In addition, recent studies indicate that ML techniques could inform the development and validation of algorithms for predicting sepsis risk in neonatal populations, potentially enhancing early detection and management of sepsis in newborns [37], and that leveraging diverse data sources could enhance diagnostic accuracy and enable early intervention for patients with sepsis combined with underlying cardiac conditions [38]. ML-based software suites for respiratory disease diagnosis may offer valuable lessons for developing similar tools for sepsis prediction, facilitating rapid risk assessment and treatment initiation in patients presenting with respiratory symptoms suggestive of sepsis [39]. Mustafic et al. focused on the diagnosis of severe aortic stenosis using an implemented expert system, which may incorporate ML algorithms to analyse clinical data and imaging findings. Together, these articles highlight the versatility of ML techniques in various domains of healthcare [40]. Drawing parallels between different applications of ML in medical diagnosis and prediction underscores the importance of leveraging advanced analytics to improve patient outcomes and enhance clinical decision-making across diverse clinical scenarios.
However, challenges and limitations exist in the future application of ML algorithms for clinical sepsis prediction or diagnosis. Currently, studies applying ML algorithms to sepsis prediction exhibit methodological issues. Variability in standards and methods across studies, coupled with a lack of a unified gold standard for sepsis diagnosis, may lead to unreliable research results, affecting the genuine performance assessment of algorithms. Therefore, some heterogeneity is observed in the meta-analysis, indicating differences between studies. This heterogeneity may stem from various factors, including the heterogeneity of study participants, different ML algorithms used, differences in sample size and variations in study design. Different clinical environments and disease backgrounds may also contribute to existing heterogeneity. To better understand this heterogeneity, future research can explore the relationships between these factors and attempt subgroup analyses to reveal potential reasons. In the process of applying ML algorithms to the medical field, regulation and standardisation are prominent issues. The lack of uniform standards and regulatory mechanisms may lead to the application of algorithms in clinical practice without sufficient validation [41]. This raises concerns about patient safety and the risks associated with clinical application. Additionally, the interpretability and explainability of algorithms pose challenges, especially in the medical field where clinicians need to understand the decision-making process of algorithms to accept and trust their results.
This study represents a recent and comprehensive meta-analysis of ML algorithms for sepsis prediction. Despite existing systematic reviews [42, 43], the rapid growth of research in the field, especially in the last two years, necessitates timely updates and synthesis of the latest research findings. The study’s meta-analysis not only provides a comprehensive overview of the current landscape of ML-based sepsis prediction but also identifies key areas for future research and clinical practice. By highlighting methodological gaps, such as variability in study design and validation methods, the meta-analysis informs the design of future studies aimed at improving the accuracy and generalisability of ML algorithms. Additionally, by emphasising the clinical implications of ML-based prediction models, the study encourages the integration of these tools into routine clinical practice, paving the way for more proactive and personalised sepsis management strategies. However, certain limitations should be acknowledged. The exclusion of grey literature and non-peer-reviewed literature, along with the relatively strict inclusion criteria, may result in a limited number of covered publications, especially considering that ML studies might be presented at various academic conferences. Therefore, the study results may not comprehensively reflect all relevant research in the field, introducing some selection bias. Additionally, the lack of unified standards for evaluating ML algorithm performance poses challenges. Focusing solely on a single parameter (e.g. AUC) may not comprehensively assess the algorithms’ merits. The selection of the model with the highest AUC during the combined analysis may lead to oversimplification, potentially neglecting the contributions of other performance indicators. Consequently, determining the optimal algorithm for real-world application remains uncertain. 
The study exhibits relatively high heterogeneity in the meta-analysis, potentially influenced by the limited number of studies with independent external validation. Considering the potential impact of heterogeneity on result consistency, careful interpretation and control of this heterogeneity are required. Increasing the number of studies with independent external validation could improve the methodological quality of the meta-analysis and the robustness of the results.
Conclusion
In conclusion, this meta-analysis demonstrates that ML algorithms exhibit excellent diagnostic accuracy in predicting the occurrence of sepsis. They outperform traditional risk scores in numerous studies, showing potential value in future clinical applications. Machine learning provides new insights and methods for the diagnosis and treatment of sepsis. Nevertheless, further high-quality studies are needed to validate these results and to advance the clinical application of ML algorithms in this field.
Footnotes
Conflict of interest
None to report.
