Abstract
Aim:
To develop and validate models that use electronic health record (EHR) data to predict diabetic ketoacidosis (DKA)-related hospitalizations over 90 and 180 days among adults with type 1 diabetes (T1D).
Methods:
We used EHR data from adults with T1D treated at an academic health system in the United States, between January 1, 2017, and April 30, 2023. Models were built to predict the 90- and 180-day DKA risk using EHR data from the 2 years preceding the index date. We constructed seven predictors: (1) prior DKA event, (2) number of prior DKA events, (3) average time between DKA events in years, (4) time since the most recent DKA event in years, (5) most recent HbA1c, (6) the absence of a HbA1c result in the past 2 years, and (7) insurance type. The dataset was split into discovery and prospective validation cohorts. Logistic regression models were built using the discovery cohort and validated using the prospective validation cohort.
Results:
Our dataset included 7798 adults with T1D, of whom 667 (8.6%) experienced ≥1 post-T1D diagnosis DKA event, totaling 1102 DKA events. The 90-day model achieved a mean area under the receiver operating characteristic curve (AUC) of 0.87 (standard deviation [SD] ± 0.02). The 180-day model achieved a mean AUC of 0.84 (SD ± 0.02). Among the 5% highest risk individuals, the 90-day model had a recall of 0.45, precision of 0.11, and specificity of 0.95, while the 180-day model had a recall of 0.42, precision of 0.17, and a specificity of 0.96.
Conclusion:
We developed EHR-based logistic regression models that effectively predict DKA-related hospitalizations in adults with T1D. Future work will enhance model performance by incorporating additional features and applying advanced machine learning methods.
Introduction
Despite advances in diabetes technology and federal initiatives to improve access to care, diabetic ketoacidosis (DKA) remains a challenge for persons living with type 1 diabetes (T1D).1–4 While DKA occurs more frequently in young adults (∼10% annually), it affects people with T1D of all ages.1,5–9 DKA is a leading cause of death in persons living with T1D who are <50 years of age, and recurrent DKA in particular is associated with increased mortality.10–16 Furthermore, DKA has a substantial economic burden, with DKA-related hospital charges exceeding $6.5 billion in the United States annually.10,17
A critical step in preventing DKA is identifying individuals at high risk to provide timely interventions. DKA prediction models have been successfully developed and implemented in clinical practice for youth with T1D, with predictive factors including prior DKA, average duration between DKA events, time since the most recent DKA event, HbA1c, and noncommercial insurance.18–27 However, intrinsic differences in DKA risk factors between children and adults with T1D, such as sociodemographic characteristics, limit the generalizability of pediatric DKA prediction models to adult populations. 28 To our knowledge, only one study by Li et al. has developed DKA prediction models specifically for adults with T1D, and these models have not been implemented clinically. 29
Our objective was to develop and validate models that use electronic health record (EHR) data to predict the 90- and 180-day DKA risk among adults with T1D. We aimed to create models that provide time-bound predictions and can be implemented in clinical practice.
Methods
Overview
We developed and validated seven-feature logistic regression models to predict the risk of DKA-related hospitalizations in adults with T1D. Our long-term goal is to integrate these models into our health system’s EHR to routinely (e.g., monthly) identify high-risk individuals who may benefit from an intervention to reduce DKA risk. To build and validate the models, we designated the first day of each month from January 1, 2017, to January 1, 2023, as prediction points (i.e., index dates). Using EHR data from the 2 years preceding each index date, the models forecast an individual’s DKA risk over 90- and 180-day prediction horizons.
Dataset
The study was reviewed and approved by the University of Minnesota Institutional Review Board (IRB), and it met the criteria for exemption (IRB ID: STUDY00019502, WORKSHEET: Exemption HRP-312). We established a dataset of 7798 adults with T1D who received care in our academic health system in the Upper Midwest, United States, between January 1, 2017, and April 30, 2023. All individuals eligible for inclusion in the dataset were retained unless they had opted out of research participation. Our dataset development followed a two-step process. First, we adapted the SUPREME-DM algorithm to our health system’s EHR (Epic). 30 This allowed us to identify all individuals (ages 18 years and older) with any type of diabetes, who had at least one encounter (defined as an ambulatory provider visit, emergency department encounter, or hospitalization) between January 1, 2017, and April 30, 2023. Next, we adapted the Klompas algorithm to identify which of those individuals had T1D.31,32 For each person in the dataset, we extracted EHR data elements (e.g., sociodemographics, medication data, laboratory values, and ambulatory and inpatient care records) from their first encounter within the health system (the earliest available diagnosis codes were from October 21, 2010) through April 30, 2023.
Classifying DKA events
We identified DKA events by adapting criteria from the American Diabetes Association (ADA) 2024 Consensus Report. 1 We defined a DKA event as an encounter meeting all four of the following criteria: (1) an ICD code for DKA (Supplementary Data S1—Classifying DKA Events), (2) a venous or capillary beta-hydroxybutyrate concentration ≥3.0 mmol/L or urine ketone strip result ≥2+, (3) arterial or venous blood gas pH <7.3 and/or a serum bicarbonate concentration <18 mmol/L, and (4) an emergency department encounter and/or hospital admission. Consistent with the ADA Consensus Report, we did not include glucose as a criterion for DKA because everyone in our dataset has established T1D. This approach ensures that we did not exclude cases of euglycemic DKA. In a manual review of 10 patients’ EHRs, two endocrinologists independently confirmed that our definition had a 100% positive predictive value for identifying true DKA events.
Encounters that met only a subset of the DKA criteria were classified as ambiguous and excluded from the analysis. In the 90-day discovery cohort (60 index dates), 2133 events were excluded; in the 90-day prospective validation cohort (10 index dates), 350 events were excluded. In the 180-day discovery cohort (60 index dates), 2823 events were excluded; in the 180-day prospective validation cohort (five index dates), 217 events were excluded. All the remaining encounters were categorized as negative for DKA. To ensure our models predicted DKA risk only in individuals with established T1D, we excluded DKA events that were the initial presentation of T1D. Only individuals with an ICD code for diabetes at an encounter preceding the index date were eligible for prediction on the index date.
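The four-part definition above can be sketched as a small classifier. This is a minimal illustration, not the study's code: the `Encounter` record and its field names are hypothetical, and the rule for what counts as "ambiguous" (any of the ICD, ketone, or acidosis criteria met without all four being met) is an assumption, since the text does not spell out which partial matches were excluded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Encounter:
    """Hypothetical, simplified record; field names are illustrative only."""
    has_dka_icd_code: bool
    is_ed_or_admission: bool
    bhb_mmol_l: Optional[float] = None          # venous/capillary beta-hydroxybutyrate
    urine_ketone_plus: Optional[int] = None     # urine strip result as a count of "+"
    blood_gas_ph: Optional[float] = None        # arterial or venous pH
    bicarbonate_mmol_l: Optional[float] = None  # serum bicarbonate


def classify_encounter(e: Encounter) -> str:
    """Return "dka", "ambiguous", or "negative" for one encounter."""
    ketosis = (
        (e.bhb_mmol_l is not None and e.bhb_mmol_l >= 3.0)
        or (e.urine_ketone_plus is not None and e.urine_ketone_plus >= 2)
    )
    acidosis = (
        (e.blood_gas_ph is not None and e.blood_gas_ph < 7.3)
        or (e.bicarbonate_mmol_l is not None and e.bicarbonate_mmol_l < 18.0)
    )
    clinical = [e.has_dka_icd_code, ketosis, acidosis]
    if all(clinical) and e.is_ed_or_admission:
        return "dka"
    # Assumption: a partial match on the ICD/ketone/acidosis criteria is
    # "ambiguous"; an encounter matching none of them is "negative".
    if any(clinical):
        return "ambiguous"
    return "negative"
```

Missing laboratory values are treated here as "criterion not met", which is also an assumption.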
Feature construction and handling of missing data
We constructed features for our models based on the established literature.18,22,33 We incorporated the three features from the Schwartz et al. model, which used EHR data to predict DKA events in youth with T1D: most recent HbA1c, prior DKA event, and type of health insurance (public vs. private). 22 We also incorporated three features from the Vandervelden et al. model, which also used EHR data to predict DKA events in youth with T1D: average duration between DKA events, time since the last DKA event, and the number of prior DKA events. 18
Our models use seven features: (1) prior DKA event (binary), (2) number of prior DKA events (continuous), (3) average time between DKA events in years (continuous), (4) time since the most recent DKA event in years (continuous), (5) most recent HbA1c value (continuous), (6) the absence of a HbA1c result in the past 2 years (binary), and (7) insurance type (categorical: private or public). Using EHR data from the 2 years preceding each index date, the models forecast DKA risk over 90- and 180-day prediction windows.
For features 1–4, DKA events included both post-T1D diagnosis DKA and DKA occurring at the time of initial diabetes diagnosis. For individuals with two or more DKA events, the average time in years between DKA events was calculated. The time since the most recent DKA event was calculated as the duration in years from the most recent prior DKA event to the index date. For individuals with no prior DKA events, we adopted the imputation strategy used by Vandervelden et al., where both the average time between DKA events and the time since the most recent DKA event were imputed as T1D duration plus 2 years. 18 For individuals with only one prior DKA event, we followed the same approach, imputing the average time between DKA events as the time since the most recent DKA event plus 2 years.
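As a rough sketch of how features 1 to 4 and their imputation rules fit together, the function below derives them from a list of prior DKA dates. The function name, the output keys, and the 365.25-day year are assumptions made for illustration; the caller is assumed to pass whichever prior DKA events are in scope.

```python
from datetime import date
from typing import Dict, List

DAYS_PER_YEAR = 365.25  # assumption: day counts are converted to years this way


def dka_history_features(prior_dka_dates: List[date], index_date: date,
                         t1d_duration_years: float) -> Dict[str, float]:
    """Build features 1-4 for one person at one index date."""
    events = sorted(d for d in prior_dka_dates if d < index_date)
    n = len(events)
    if n == 0:
        # No prior DKA: both time features are imputed as T1D duration + 2 years.
        since_last = avg_gap = t1d_duration_years + 2.0
    else:
        since_last = (index_date - events[-1]).days / DAYS_PER_YEAR
        if n == 1:
            # One prior DKA: average gap is imputed as time since that event + 2 years.
            avg_gap = since_last + 2.0
        else:
            # Mean of consecutive gaps equals (last - first) / (n - 1).
            avg_gap = (events[-1] - events[0]).days / DAYS_PER_YEAR / (n - 1)
    return {
        "prior_dka": 1 if n > 0 else 0,
        "n_prior_dka": n,
        "avg_years_between_dka": avg_gap,
        "years_since_last_dka": since_last,
    }
```

For example, a person with no prior DKA and a T1D duration of 5 years receives a value of 7 years for both time-based features.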
T1D duration was defined as follows. If a person’s first three EHR encounters (defined as an ambulatory provider visit, emergency department encounter, or hospitalization) did not contain an ICD code for diabetes, but a subsequent encounter did, the diagnosis date for T1D was defined as the date of the first encounter that included an ICD code for diabetes. In such cases, T1D duration was calculated as the time span between the index date and the date of the first encounter with a diabetes ICD code.
If any of a person’s first three encounters in the EHR included an ICD code for diabetes, the T1D diagnosis date was considered unknown, as the individual may have had an established T1D diagnosis before initiating care within our health system. In these cases, we applied a set of rules to assign an estimated age at T1D diagnosis. These rules were based on the individual’s age at the time of their first encounter with a diabetes-related ICD code. The imputation of T1D diagnosis age was informed by existing epidemiological data, specifically reflecting the bimodal age distribution of childhood-onset T1D and the median age at diagnosis (24 years; interquartile range, 12–40 years) observed among adults with T1D.34–36
Specifically, if the individual was younger than 10 years at the time of the first encounter with a diabetes-related ICD code, the T1D diagnosis age was imputed as 4 years. For individuals aged 10–15 years at the time of their first encounter in the EHR that included a diabetes-related ICD code, the diagnosis age was imputed as 10 years; for those aged 15–19 years, as 15 years; for those aged 20–24 years, as 20 years; for those aged 25–29 years, as 22 years; for those aged 30–39 years, as 24 years; for those aged 40–49 years, as 27 years; for those aged 50–59 years, as 30 years; and for those aged 60 years or older, as 33 years.
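The rules in the preceding paragraphs can be written as a lookup, shown here as a hedged sketch. Two points are assumptions rather than statements from the text: the stated brackets "10–15" and "15–19" overlap at age 15, which this sketch assigns to the 15–19 bracket, and for people with an unknown diagnosis date the T1D duration is taken to be age at the index date minus the imputed diagnosis age. Function and parameter names are hypothetical.

```python
def imputed_diagnosis_age(age_at_first_diabetes_code: float) -> int:
    """Map age at the first diabetes-coded encounter to an imputed diagnosis age."""
    # (exclusive upper bound on age in years, imputed diagnosis age)
    brackets = [
        (10, 4), (15, 10), (20, 15), (25, 20),
        (30, 22), (40, 24), (50, 27), (60, 30),
    ]
    for upper_bound, imputed_age in brackets:
        if age_at_first_diabetes_code < upper_bound:
            return imputed_age
    return 33  # 60 years or older


def t1d_duration_years(code_in_first_three_encounters: bool,
                       years_since_first_diabetes_code: float,
                       age_at_first_diabetes_code: float) -> float:
    """Estimate T1D duration at the index date."""
    if not code_in_first_three_encounters:
        # Diagnosis date is taken as the first diabetes-coded encounter.
        return years_since_first_diabetes_code
    # Assumption: duration = age at index date - imputed diagnosis age.
    age_at_index = age_at_first_diabetes_code + years_since_first_diabetes_code
    return age_at_index - imputed_diagnosis_age(age_at_first_diabetes_code)
```

Under these assumptions, a person first seen with a diabetes code at age 40, evaluated 3 years later, would be assigned a T1D duration of 43 − 27 = 16 years.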
For individuals without a HbA1c measurement in the 2 years preceding the index date, imputation was performed using the mean HbA1c value of the entire cohort, and we used a separate binary variable to represent the missing HbA1c measurement (1 = present, 0 = absent). Insurance type was categorized as private or public (e.g., Medicare or Medicaid). To represent insurance status, we created two binary variables: private insurance (1 = present, 0 = absent) and public insurance (1 = present, 0 = absent). This approach allowed for classification of all individuals in the dataset, including those without insurance (who were coded as zero for both variables).
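A minimal sketch of this encoding follows. The function and key names are hypothetical, and the direction of the missingness indicator (1 when no HbA1c result is available) is an assumption, because the text does not state which state is coded as 1.

```python
from typing import Dict, Optional


def encode_hba1c_and_insurance(hba1c: Optional[float],
                               cohort_mean_hba1c: float,
                               insurance: Optional[str]) -> Dict[str, float]:
    """Encode features 5-7 for one person at one index date."""
    missing = hba1c is None
    return {
        # Mean imputation when no HbA1c exists in the 2-year lookback.
        "hba1c": cohort_mean_hba1c if missing else hba1c,
        # Assumption: 1 flags a missing HbA1c result, 0 an available one.
        "hba1c_missing": 1 if missing else 0,
        # Two indicators, so uninsured people are coded 0 on both.
        "insurance_private": 1 if insurance == "private" else 0,
        "insurance_public": 1 if insurance == "public" else 0,
    }
```

Keeping the indicator alongside the imputed value lets the model treat "no recent HbA1c" as information in its own right rather than as an average HbA1c.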
Model development and validation
Two approaches were used to partition the dataset into discovery and prospective validation cohorts. The primary approach, which serves as the focus of this study and is described below, involved splitting the dataset temporally (Fig. 1). An alternative approach partitioned the dataset based on medical record number (Supplementary Data S2—Dataset Partitioning Based on Medical Record Number—Supplementary Table S1).

Fig. 1. Overall study design.
With the temporal dataset split, the discovery cohort spanned from January 1, 2017, to December 1, 2021, with 60 index dates for both the 90- and 180-day prediction windows. To ensure that model evaluation was conducted solely on unseen data, we implemented a buffer period between the discovery and prospective validation cohorts. Thus, the prospective validation dataset spanned from April 1, 2022, to January 1, 2023, with 10 index dates for the 90-day prediction window, and from June 1, 2022, to October 1, 2022, with five index dates for the 180-day prediction window.
Because we used rolling monthly index dates with 90- or 180-day prediction windows, the same DKA event could appear in multiple prediction windows. For example, a DKA event on February 15, 2017, would be included in both the prediction window beginning January 1, 2017, and the one beginning February 1, 2017. Of note, the values of the predictors may differ between these two index dates. This approach aligns with our goal of developing a model for clinical implementation that can be deployed at regular intervals (e.g., monthly).
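The index-date scheme and the overlap between windows can be checked with a short sketch. The helper names are hypothetical, and the window is assumed to be the half-open interval from the index date up to (but not including) the index date plus the horizon.

```python
from datetime import date, timedelta
from typing import List


def monthly_index_dates(first: date, last: date) -> List[date]:
    """First day of every month from `first` through `last`, inclusive."""
    dates = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        dates.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return dates


def windows_containing(event: date, index_dates: List[date],
                       horizon_days: int) -> List[date]:
    """Index dates whose prediction window contains the event."""
    # Assumption: the window is [index_date, index_date + horizon_days).
    return [d for d in index_dates
            if d <= event < d + timedelta(days=horizon_days)]


discovery = monthly_index_dates(date(2017, 1, 1), date(2021, 12, 1))
validation_90 = monthly_index_dates(date(2022, 4, 1), date(2023, 1, 1))
validation_180 = monthly_index_dates(date(2022, 6, 1), date(2022, 10, 1))
print(len(discovery), len(validation_90), len(validation_180))
print(windows_containing(date(2017, 2, 15), discovery, 90))
```

Under this window definition, the last 90-day discovery window (starting December 1, 2021) ends on March 1, 2022, and the last 180-day discovery window ends on May 30, 2022, so both close before the corresponding first validation index dates.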
In addition, at each index date, we only made a prediction for individuals if they had at least one encounter (defined as an ambulatory provider visit, emergency department encounter, or hospitalization) in our health system within the 2 years preceding the index date, as well as at least one encounter during the subsequent prediction window. Thus, the number of individuals for whom predictions were made varied per index date and ranged from 3046 to 4539 in the 90-day cohort and from 3046 to 4966 in the 180-day cohort.
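This eligibility rule can be expressed as a simple filter. The sketch below is illustrative only: the function name is hypothetical, "2 years" is approximated as 730 days, and the interval boundaries are assumptions.

```python
from datetime import date, timedelta
from typing import Iterable


def is_eligible(encounter_dates: Iterable[date], index_date: date,
                horizon_days: int, lookback_days: int = 730) -> bool:
    """True if the person has an encounter in the lookback AND in the window."""
    encounters = list(encounter_dates)
    lookback_start = index_date - timedelta(days=lookback_days)
    window_end = index_date + timedelta(days=horizon_days)
    # Assumption: lookback is [index - 730 d, index); window is [index, index + horizon).
    seen_before = any(lookback_start <= d < index_date for d in encounters)
    seen_during = any(index_date <= d < window_end for d in encounters)
    return seen_before and seen_during
```

Because the second condition depends on encounters after the index date, it can be evaluated only retrospectively; a live deployment would need a different rule for deciding who is scored.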
Two separate logistic regression models were developed: one for 90-day DKA prediction and one for 180-day DKA prediction. In the 90-day model discovery cohort, 0.9% of encounters were DKA events; in the prospective validation cohort, 1.2% were DKA events (Table 1). In the 180-day model discovery cohort, 1.6% of encounters were DKA events; in the prospective validation cohort, 2.0% were DKA events. The proportion of DKA events (i.e., the number of DKA events divided by the total number of encounters) was similar from 2017 to 2023 (Supplementary Data S3—Summary of Encounters and DKA Events Over Time—Supplementary Tables S2 and S3).
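For readers less familiar with logistic regression, the sketch below shows how a fitted model turns one feature vector into a predicted risk. It is a generic illustration, not the study's implementation (the analyses were run in R): the feature names are hypothetical, insurance is shown as the two binary indicators described earlier, and no coefficient values are given here because the fitted coefficients are reported only in the supplementary material.

```python
import math
from typing import Dict

# Hypothetical column names for the seven features (insurance uses two indicators).
FEATURE_NAMES = [
    "prior_dka", "n_prior_dka", "avg_years_between_dka", "years_since_last_dka",
    "hba1c", "hba1c_missing", "insurance_private", "insurance_public",
]


def predict_risk(x: Dict[str, float], coef: Dict[str, float],
                 intercept: float) -> float:
    """Logistic model: risk = 1 / (1 + exp(-(intercept + sum(coef_i * x_i))))."""
    z = intercept + sum(coef[name] * x[name] for name in FEATURE_NAMES)
    # Numerically stable form of the logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)
```

One such function, with its own coefficients, would be used for each horizon (90 and 180 days), and the resulting risks can then be ranked across individuals at each index date.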
Table 1. Dataset Characteristics for the 90-Day and 180-Day Diabetic Ketoacidosis Prediction Models
DKA, diabetic ketoacidosis.
Performance metrics
We assessed model performance using a set of performance metrics computed in the prospective validation cohort. To assess the models’ predictive performance, we examined the area under the receiver operating characteristic curve (AUC), both stratified by index date and cumulatively. 37 Statistical comparisons of AUCs over time were conducted using DeLong’s test. We also assessed the Brier score to evaluate model calibration. We used the Student’s t-test to compare mean differences between groups for continuous variables, and the chi-squared test to assess proportional differences between groups for binary variables. To identify high-risk subgroups, such as the top 2%, 5%, and 10%, individuals were ranked by their predicted risk of experiencing DKA. For these subgroups, we evaluated recall, precision, and enrichment.
True positives were defined as individuals in the high-risk subgroup who experienced a DKA event within the prediction window. False positives were defined as individuals in the high-risk subgroup who did not experience a DKA event within the prediction window. False negatives were defined as individuals not in the high-risk subgroup who did experience a DKA event within the prediction window.
Precision represents the proportion of predicted high-risk individuals who truly experienced a DKA event, indicating the positive predictive value of the model. Recall reflects the proportion of all actual DKA cases that were successfully identified as high risk, measuring the model’s sensitivity.
Enrichment quantifies how much the DKA rate in the model’s highest risk subgroup exceeds the baseline rate in the full cohort, reflecting the model’s ability to stratify risk. Enrichment was examined for each index date, with the mean reported. This approach allowed us to determine how effectively the model identified risk-enriched subgroups with significantly higher probabilities of developing DKA. Mortality was measured through April 30, 2023.
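The subgroup metrics defined above can be sketched for a single index date as follows. The function name is hypothetical; how the subgroup size is rounded and how ties in predicted risk are broken are assumptions, as the text does not specify them.

```python
from typing import Dict, Sequence


def top_fraction_metrics(risks: Sequence[float], had_dka: Sequence[int],
                         fraction: float) -> Dict[str, float]:
    """Precision, recall, and enrichment for the top `fraction` by predicted risk."""
    n = len(risks)
    k = max(1, round(n * fraction))  # assumption: subgroup size is rounded
    # Rank by predicted risk, highest first (ties keep their input order).
    order = sorted(range(n), key=lambda i: risks[i], reverse=True)
    flagged = order[:k]
    tp = sum(1 for i in flagged if had_dka[i])   # flagged and had DKA
    total_dka = sum(1 for y in had_dka if y)
    precision = tp / k                           # TP / (TP + FP)
    recall = tp / total_dka if total_dka else float("nan")  # TP / (TP + FN)
    baseline = total_dka / n                     # DKA rate in the full cohort
    enrichment = precision / baseline if baseline else float("nan")
    return {"n_flagged": k, "tp": tp, "precision": precision,
            "recall": recall, "enrichment": enrichment}
```

Enrichment is computed here as the DKA rate in the flagged subgroup divided by the cohort-wide rate, which matches the description of how much the subgroup rate exceeds the baseline; averaging the per-index-date values would then give the reported mean.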
All analyses were conducted in R (version 4.4.2) with statistical significance set at P ≤ 0.05.
Results
Cohort demographics and clinical characteristics
Table 2 presents the demographic and clinical characteristics of adults with T1D, stratified by the occurrence of post-T1D diagnosis DKA-related hospitalizations between January 1, 2017, and April 30, 2023. During this period, 7798 adults with T1D received care within our health system. Of those individuals, 667 (8.6%) experienced at least one postdiagnosis DKA-related hospitalization between January 1, 2017, and April 30, 2023, accounting for a total of 1102 events. Among those with DKA, 487 individuals (73.0%) had a single event, 91 (13.6%) had two events, and 89 (13.3%) experienced three or more events. The highest number of DKA events recorded for a single individual was 17 during the study period. Individuals who had DKA were significantly younger (42.6 ± 17.4 years vs. 48.0 ± 17.8 years, mean ± standard deviation [SD], P < 0.001) and had higher HbA1c levels (9.9% ± 2.1% vs. 8.1% ± 1.7%, mean ± SD, P < 0.001). Black or African American individuals had higher representation in the DKA group (13.2% vs. 7.2%, P < 0.001). While continuous glucose monitor use was similar between groups (44.4% vs. 45.1%, P = 0.945), insulin pump use was significantly higher among those with DKA events (32.1% vs. 24.7%, P < 0.001). All-cause mortality was significantly higher in those who experienced DKA (9.6% vs. 6.2%, P = 0.003). No significant differences were observed in sex distribution or ethnicity between the groups (P > 0.05).
Table 2. Characteristics of Individuals with Type 1 Diabetes Stratified by Post-Type 1 Diabetes Diagnosis Diabetic Ketoacidosis Status Between January 1, 2017, and April 30, 2023
Max, maximum; min, minimum; N/A, not applicable; SD, standard deviation. Percentages are rounded to one decimal place; thus, the total may not equal 100%.
Model features for DKA prediction
Tables 3 and 4 summarize the features included in the 90-day and 180-day DKA prediction models, respectively. The analyses were conducted at the encounter level rather than the patient level, allowing individuals to be represented more than once in the dataset if they had multiple encounters. The 90-day model was trained on 220,647 total encounters (2079 DKA positive; 218,568 DKA negative) across 60 index dates, while the 180-day model was trained on 250,260 total encounters (3946 DKA positive; 246,314 DKA negative) across 60 index dates. The 90-day model was tested on 42,097 total encounters (491 DKA positive; 41,606 DKA negative) across 10 index dates, while the 180-day model was tested on 23,641 total encounters (462 DKA positive; 23,179 DKA negative) across five index dates.
Table 3. Features Used in the 90-Day Diabetic Ketoacidosis Prediction Model
Statistically significant differences (P < 0.001) were observed for all features between DKA-positive and DKA-negative encounters across the discovery and prospective validation datasets. The average time between DKA events and the time since the most recent DKA event were imputed as needed, as described in the Methods (Feature construction and handling of missing data).
Table 4. Features Used in the 180-Day Diabetic Ketoacidosis Prediction Model
Statistically significant differences (P < 0.001) were observed for all features between DKA-positive and DKA-negative encounters across the discovery and prospective validation datasets. The average time between DKA events and the time since the most recent DKA event were imputed as needed, as described in the Methods (Feature construction and handling of missing data).
Statistically significant differences (P < 0.001) were observed for all features between DKA-positive and DKA-negative encounters across the discovery and prospective validation cohorts for both the 90- and 180-day prediction models. In both models, a prior DKA event was more frequent in DKA-positive encounters (45.5% vs. 8.1% for the 90-day model; 42.1% vs. 7.8% for the 180-day model). DKA-positive encounters had higher mean HbA1c values (9.5% ± 2.1% vs. 8.1% ± 1.5% for the 90-day model; 9.4% ± 2.0% vs. 8.1% ± 1.5% for the 180-day model) and more frequently lacked HbA1c measurements in the 2 years preceding the index date (30.4% vs. 14.8% for the 90-day model; 32.7% vs. 18.7% for the 180-day model). Public insurance was more common in DKA-positive encounters (49.9% vs. 34.3% for the 90-day model; 49.1% vs. 33.8% for the 180-day model). These patterns were consistently replicated in the prospective validation cohorts. The logistic regression coefficients are reported in Supplementary Data S4 (Feature Significance and Importance; Supplementary Tables S4 and S5).
Model performance
In the prospective validation dataset, the 90-day DKA prediction model achieved a mean AUC of 0.87 (SD ± 0.02) across 10 index dates, while the 180-day model achieved a mean AUC of 0.84 (SD ± 0.02) across five index dates (Fig. 2). Model performance across index dates did not differ significantly, suggesting the model generalized well over time. The Brier scores were 0.011 for the 90-day model and 0.018 for the 180-day model, indicating that the predicted probabilities aligned closely with the observed outcomes.
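For reference, the two summary metrics reported above can be computed as in the sketch below. These are generic textbook definitions written in plain Python, not the study's R code; the pairwise AUC shown here is quadratic in the number of observations and is meant only to make the definition concrete.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive is scored above a random negative."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)
```

Because the Brier score depends on the event rate, it is often read alongside the score of a constant prediction equal to that rate, which is p(1 − p) for an event rate p.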

Fig. 2. Receiver operating characteristic curves and performance over time for the 90-day and 180-day DKA prediction models. In the prospective validation dataset, the 90-day model had a mean AUC of 0.87
Tables 5 and 6 summarize the models’ performance metrics across index dates for the high-risk subgroups within the prospective validation cohort, which included an average of 4210 individuals per index date for the 90-day model and 4728 individuals per index date for the 180-day model. High-risk subgroups were defined based on predicted DKA risk. For example, in the 90-day model, among the 4210 individuals, the top 2% subgroup consisted of the 84 individuals with the highest predicted DKA risk. In the 90-day model (Table 5), the top 2% highest risk subgroup had an enrichment of 13.4, meaning these individuals were 13.4 times more likely to develop DKA compared with what would be expected by randomly selecting 2% of individuals from the entire cohort. The recall was 0.27, indicating that this group (n = 84) accounted for ∼27% (13/49) of all DKA-related hospitalizations in the health system during the 90-day prediction window. The precision was 0.16, meaning that ∼16% (13/84) of individuals in this subgroup experienced DKA within the 90-day prediction window. As the high-risk subgroup expanded to include more individuals, recall improved (reaching 0.74 for the top 20%), while both precision and enrichment declined (0.16 to 0.04 and 13.4 to 3.7, respectively). Similar performance metrics were observed in the 180-day prediction model (Table 6). Across both prediction time frames (90 and 180 days) and all high-risk subgroups (top 2%, 5%, 10%, and 20%), specificity (0.81–0.98) and negative predictive values (0.98–1.00) remained high.
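The worked example for the top 2% subgroup can be re-derived from the counts quoted in the text (4210 individuals, 84 flagged, 13 flagged individuals with DKA, 49 DKA cases overall). This is a back-of-the-envelope check using those rounded, per-index-date average counts; variable names are illustrative, and small differences from the reported values are expected because the reported metrics are means across index dates.

```python
n_total = 4210     # average individuals per index date (90-day model)
n_flagged = 84     # top 2% by predicted risk
tp = 13            # flagged individuals who had DKA
n_dka = 49         # all individuals with DKA in the window

fp = n_flagged - tp               # flagged, no DKA
fn = n_dka - tp                   # not flagged, had DKA
tn = n_total - n_flagged - fn     # not flagged, no DKA

precision = tp / (tp + fp)
recall = tp / (tp + fn)
baseline_rate = n_dka / n_total
enrichment = precision / baseline_rate
specificity = tn / (tn + fp)
npv = tn / (tn + fn)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"enrichment={enrichment:.1f} specificity={specificity:.3f} npv={npv:.3f}")
```

From these counts, precision is about 0.15, recall about 0.27, and enrichment about 13.3, close to the reported 0.16, 0.27, and 13.4; specificity and negative predictive value come out near 0.98 and 0.99, within the ranges stated above.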
Table 5. Performance Metrics by High-Risk Subgroup for the 90-Day Diabetic Ketoacidosis Prediction Model
N, number.
Table 6. Performance Metrics by High-Risk Subgroup for the 180-Day Diabetic Ketoacidosis Prediction Model
Discussion
We developed and validated logistic regression models to predict the 90- and 180-day DKA risk among adults with T1D. Risk predictions were generated using seven EHR-based features that were recorded within the 2 years preceding the index date. Our models demonstrated strong discriminatory ability, with an AUC of 0.87 (SD ± 0.02) for the 90-day DKA prediction window and 0.84 (SD ± 0.02) for the 180-day prediction window. Our models’ performance, as quantified by AUC, was comparable with previously published DKA prediction models in youth with T1D, which reported AUCs ranging from 0.72 to 0.91.18–23,27 One exception is a single study that evaluated nine prediction models, reporting a broader range of AUCs from 0.50 to 0.89. 24
Our models also performed similarly to the only other published models predicting DKA events in adults with T1D, developed by Li et al., which reported an AUC of 0.82. 29 However, the Optum dataset used by Li et al. limits their models’ generalizability and clinical value, and their models were not adopted in clinical practice. 38 For example, only 2.0% of individuals in their dataset, including both those who did and did not experience DKA, were using insulin pump therapy at the time of assessment. In contrast, 25.3% of individuals in our dataset used insulin pump therapy, which is more reflective of the increasing adoption of diabetes technology among adults with T1D. 5 In addition, Li et al.’s models estimated DKA risk without specifying a time frame for when it may occur. In contrast, our models forecast DKA risk over discrete intervals (i.e., 90 days or 180 days), providing risk estimates within potentially actionable time frames.
We developed our models with the ultimate goal of integrating them into our health system’s EHR to identify adults with T1D at the highest risk for DKA and prioritize them for targeted interventions. To assess their potential utility in this context, we ranked all adults with T1D in our health system by their predicted 90- and 180-day DKA risk and evaluated the models’ performance within these high-risk subgroups (e.g., the top 2%, 5%, 10%, and 20%). Previous DKA prediction models for youth with T1D also used rank-ordering to identify individuals at the highest risk for DKA.18,20,21,25 Vandervelden et al. predicted the 6-month DKA risk in youth with T1D. 18 In their model, for the top 2% highest risk individuals, the precision was 0.60 and the recall was 0.57. Williams et al. predicted the 180-day DKA risk in youth with T1D. 20 For the top 5% highest risk individuals in their cohort, the precision and recall were both 0.50. In our study, among the top 5% highest risk individuals, the 90-day model had a precision of 0.11 and a recall of 0.45, while the 180-day model had a precision of 0.17 and a recall of 0.42.
There are multiple factors that may explain the higher precision and recall in these models compared with our models. The Vandervelden et al. model used an ensemble of gradient-boosted decision trees and Williams et al. used a long short-term memory deep learning model, whereas we used a logistic regression model.18,20 Gradient-boosted decision tree models are well suited to high-dimensional data (e.g., EHR datasets), can handle missing data effectively, and can capture complex nonlinear interactions. 37 Long short-term memory models excel at learning temporal patterns and trajectories, making them well suited for time-series data (e.g., trends in laboratory values over time, such as HbA1c). 37 Consequently, these approaches may be better suited than logistic regression for developing an EHR-based DKA prediction model. Furthermore, our model used only seven features, compared with the 15 features in the Vandervelden et al. model and the >500 features in the Williams et al. model. The wide range of known DKA risk factors suggests that models that incorporate a broader set of features may outperform our approach.18,22,33 In addition, underlying differences in DKA risk factors between pediatric and adult populations with T1D may also contribute to differences in model performance. 28
Interestingly, in our dataset, insulin pump use was more common among individuals with DKA compared with those without DKA (32.1% vs. 24.7%, P < 0.001). This aligns with earlier studies reporting increased DKA risk among insulin pump users, particularly before automated insulin delivery systems became widely available.1,39 Because our dataset spans from January 1, 2017, to April 30, 2023, it includes individuals using nonautomated pump therapy, which may contribute to this finding. However, emerging data suggest that automated insulin delivery systems may reduce the risk of DKA, highlighting the need for further investigation into how evolving pump technologies impact DKA risk in real-world settings. 1 In contrast, continuous glucose monitor (CGM) use was similar between groups (44.4% vs. 45.1%, P = 0.945), which may reflect preferential prescribing of CGM to individuals perceived to be at higher risk for DKA.
Strengths and limitations
A key strength of our models is their focus on adults with T1D, a population for whom DKA prediction models are scarce and not currently used clinically. Designed for population health management, on each index date (i.e., the first day of every month), our models will rank all adults with T1D in the health system by their predicted risk of DKA. This approach facilitates the identification of high-risk individuals who may benefit from targeted prevention strategies. A risk-graded strategy could provide more intensive, multidisciplinary support, such as frequent follow-up, tailored education, mental health services, and social support, for those in the highest risk group (e.g., the top 5%), while offering enhanced but less intensive support for those at moderate risk. Further research is needed to determine the optimal interventions for each risk level.
Furthermore, our models were designed for clinical implementation, using a streamlined set of established, validated features that are routinely available in the EHR. They provide DKA risk predictions over clinically actionable time frames (i.e., 90 and 180 days). Given that the average cost of treating a single DKA event is estimated at $39,000 (based on 2017 estimates of ∼$30,000 per episode, adjusted for inflation to March 2025), this approach has the potential to substantially reduce health care costs by enabling timely, preventive interventions in high-risk individuals.10,40,41
Another strength of our models is their rigorous design. We constructed our dataset by adapting the validated SUPREME-DM and Klompas algorithms and used the criteria from the ADA 2024 Consensus Report to define DKA.1,30–32 To reduce the risk of overfitting and data leakage, we introduced a buffer period between the discovery and prospective validation cohorts, ensuring that model evaluation was performed only on temporally distinct and unseen data. However, temporally splitting the discovery and validation cohorts may introduce limitations on generalizability, such as (1) evolving therapies over time, (2) increasing diabetes technology use over time, (3) COVID-19’s impact on DKA rates, and (4) overlap of individuals across both cohorts if they received care during both periods. To address these concerns, we also tested the models by splitting the datasets by medical record number, which yielded performance identical to that of the temporal split (Supplementary Data S2, Dataset Partitioning Based on Medical Record Number; Supplementary Fig. S1).
There are inherent limitations in using EHR data to develop a prediction model, such as missing data. In our dataset, 20.4% of individuals lacked a HbA1c value within the 2 years preceding the index date. However, we included the absence of HbA1c as a feature in our model, and it emerged as the third most important predictor based on the scaled coefficient (see Supplementary Data S4, Feature Significance and Importance). Some individuals in our dataset may have been hospitalized for DKA or received care outside of our health system, limiting access to those records. However, many health systems in Minnesota and the Midwest also use Epic as their EHR, which facilitates data sharing and reduces the number of external DKA hospitalizations that are not captured in our dataset. In addition, as our model was developed using data from an academic health system in the Upper Midwest, United States, its generalizability to other health systems may be limited.
Future directions
Maximizing precision and recall is essential to enhancing a model’s clinical utility. DKA prediction models for youth with T1D developed by Vandervelden et al. (ensemble of gradient-boosted tree-based models using 15 features), Williams et al. (long short-term memory deep learning model using >500 features), and Subramanian et al. (gradient-boosted ensemble of decision trees with 44 features) all demonstrated strong precision and recall.18–20 As a next step, we aim to enhance our models’ precision and recall by incorporating additional features (e.g., sociodemographics, comorbidities, and laboratory data), and by applying advanced machine learning methods. We also plan to assess the value of incorporating CGM data and anticipate that continuous ketone monitoring data, once commercially available, could be leveraged as an additional predictor in our model. Recognizing the practicality of streamlined models, we will also explore whether one with fewer features can achieve comparable performance. This has important implications for implementation, as simpler models are generally easier for clinicians to interpret and easier to integrate into EHR systems. Notably, both Schwartz et al. and Mejia-Otero et al. developed strong DKA risk prediction models using only three features.22,23 Such models may be well-suited for integration into the EHR, for example, as embedded risk scores within patient dashboards or as clinical decision support tools.
Authors’ Contributions
J.K.: Conceptualization, methodology, validation, formal analysis, investigation, writing—original draft, writing—review and editing, visualization, project administration, and funding acquisition. M.X.: Methodology, software, validation, formal analysis, investigation, data curation, writing—original draft, writing—review and editing, and visualization. R.C.: Methodology, software, formal analysis, writing—review and editing, and visualization. E.S.H.: Conceptualization, methodology, and writing—review and editing. A.C.G.: Conceptualization and writing—review and editing. N.M.: Conceptualization, methodology, and writing—review and editing. C.V.: Methodology, resources, and writing—review and editing. M.C.: Conceptualization, methodology, resources, and writing—review and editing. L.S.C.: Conceptualization, methodology, and writing—review and editing. S.M.: Conceptualization, methodology, software, validation, formal analysis, investigation, writing—review and editing, visualization, and supervision.
Footnotes
Author Disclosure Statement
J.K. receives research support from Dexcom (CGM devices for an investigator-initiated clinical study).
Funding Information
J.K. was supported by the NIH NIDDK Diab-Docs K12DK133995. N.M. was supported by a grant from the NIDDK R01DK134955.