Abstract
Background:
Computer simulators of human metabolism are powerful tools to design and validate new diabetes treatments. However, these platforms are often limited in the diversity of behaviors and glycemic conditions they can reproduce. Replay methodologies leverage field-collected data to create ad hoc simulation environments representative of real-life conditions. After formal validations of our method in prior publications, we demonstrate its capacity to reproduce a recent clinical trial.
Methods:
Using the replay methodology, an ensemble of replay simulators was generated using data from a randomized crossover clinical trial comparing the hybrid closed loop (HCL) and fully closed loop (FCL) control modalities in automated insulin delivery (AID), creating 64 subject/modality pairs. Each virtual subject was exposed to the alternate AID modality to compare the simulated versus observed glycemic outcomes. Equivalence tests were performed for time in, below, and above range (TIR, TBR, and TAR) and high and low blood glucose indices (HBGI and LBGI) considering equivalence margins corresponding to clinical significance.
Results:
TIR, TAR, LBGI, and HBGI showed statistical and clinical equivalence between the original and the simulated data; TBR failed the equivalence test. For example, in the HCL mode, observed TIR was 84.89% versus a simulated 84.31% (P = 0.0170, confidence interval [CI] [−3.96, 2.79]), and in the FCL mode, observed TIR was 76.58% versus a simulated 77.41% (P = 0.0222, CI [−2.54, 4.20]).
Conclusion:
Clinical trial data confirm the prior in silico validation of the UVA replay method in predicting the glycemic impact of modified insulin treatments. This in vivo demonstration justifies the application of the replay method to the personalization and adaptation of treatment strategies in people with type 1 diabetes.
Introduction
People with type 1 diabetes (T1D) require a lifelong insulin replacement therapy to compensate for the insulin secretion deficiency due to the autoimmune destruction of pancreatic beta-cells. Over the past few decades, there has been a substantial investment in diabetes technology driven by the increased availability of precise and interconnected medical devices, such as continuous glucose monitoring (CGM) devices and continuous subcutaneous insulin infusion (CSII) pumps, and the advances in simulation platforms that leverage mathematical models of glucose-insulin dynamics. 1,2 These metabolic simulators are powerful tools to design and validate new treatment paradigms, greatly accelerating and de-risking the path to clinical validation. 1,3
Among the simulation platforms used in T1D treatment research, the FDA-accepted UVA/Padova simulator stands out. 4 The simulator has a large in silico cohort of 100 children, 100 adolescents, and 100 adults. Other well-known platforms include the Oregon Health and Science University simulator, 5 and the Cambridge simulator. 6 In addition, a minimal model describing glucose kinetics was developed by Grosman et al., 3 which also served as a test bed to predict the insulin therapy outcomes of different clinical trials. 7,8
These simulators rely on specific mathematical functions that map carbohydrate ingestion and insulin administration to blood glucose time series. 9 However, present physiological models do not encompass the entirety of human biological systems. They are limited in accounting for various phenomena affecting the glycemic profile of an individual, such as glycemic disturbances and patient behavior. 10,11
Recent works have focused on enhancing T1D simulators to address the lack of real-life scenarios and capture the variability observed in the field. Ahmad et al. 11 used clinical data from 14 subjects to generate a new virtual patient cohort by means of parameter optimization of the UVA/Padova model to capture day-to-day variations in glycemia and represent the glycemic outcomes of the data collected. However, this method did not fully capture intraday variability caused by factors unrelated to meals and exercise present in real data. In another recent study, a method based on Markov chain Monte Carlo was presented for determining the parameters of a simplified version of the UVA/Padova simulator. 12 This strategy seeks to mimic the glucose response to different changes in insulin therapy observed in real-world data, although it has only been evaluated within single-meal scenarios and for specific time frames without meals or boluses in the four hours preceding the period of interest.
Our approach aims to reproduce the variability observed in real-world blood glucose traces for daylong streams of data, regardless of meal intakes and/or the presence of unmodeled phenomena (e.g., mixed physical activity, stress, or illness). 13,14 This methodology allows for the creation of ad hoc simulators of 24 h of field-collected T1D data, enabling the quantification of the glycemic response to “what-if” scenarios. It is based on the identification of a simplified glucose-insulin model and the estimation of an additive disturbance signal, also known as the net-effect signal, that represents unmodeled phenomena. The domain of validity of this approach has been assessed with experimental scenarios generated with the UVA/Padova simulator by comparing the true and replayed glucose traces when changing meals and insulin therapy. 13 The present work is a step forward in the validation of the UVA replay simulator, reproducing real-life glucose traces under different insulin regimens from subjects undergoing a clinical trial.
Materials and Methods
The objective of this study is to showcase the potential of the UVA replay simulator as a benchmark for testing and validating treatment strategies for T1D within the context of daily scenarios experienced by real-life patients. To that end, data from the DCLP6 clinical trial (clinicaltrials.gov NCT04877730), which explored different automated insulin delivery (AID) modalities, hybrid closed loop (HCL) and fully closed loop (FCL), are used as a baseline for comparison. 15 The intra- and intersubject variability observed in the field is captured using the replay methodology, generating a cohort of representative virtual subjects per control modality. Subsequently, each virtual cohort is set in closed loop with the different control modes, simulating the glucose response to therapy variations. The resulting outcomes from these simulations are compared with those observed in real subjects.
Data set
Data from 32 people with T1D were collected during the DCLP6 clinical trial conducted by UVA and approved by the UVA Institutional Review Board for Health Sciences Research (210035; Charlottesville, VA). 15 The study aimed to assess the glycemic impact of different modalities of the UVA AID system for T1D treatment. Participants completed 3 study days, each on a different control modality randomly assigned at 16:00 h. 15 Three meals were provided every day: breakfast between 7:00 h and 9:00 h, lunch fixed at 13:00 h for all subjects, and dinner between 18:00 h and 21:00 h. Although the controllers did not use it under the FCL mode, the carbohydrate content and time of each meal were manually recorded by the study team.
For simplicity, data from the HCL and FCL modalities were selected for this analysis: CGM and insulin delivery (basal rate (BR) modulation, automatic bolus, and manual bolus) with the corresponding time stamps were downloaded from the DiAs Web Monitoring (DWM) tool used during the clinical trial. 16 In addition, time and value data pertaining to BR, carbohydrate ratio (CR), and correction factor (CF) profiles, hypoglycemia treatments (HTs), self-monitoring blood glucose (SMBG), insulin-on-board (IOB), estimated model states, and log messages were also downloaded from DWM for unit test purposes. Body weight (BW) was obtained from the participants’ medical history.
Details on data preprocessing can be found in the Supplementary Data S1, along with the description of the unit tests performed to identify and address potential discrepancies in control actions and glucose traces when applying the strategy to real-world data. Following preprocessing, the data were separated into 24-h segments, each corresponding to the use of a specific AID modality by an individual participant (a subject/modality pair).
Virtual diabetic subject
The methodology to generate virtual subjects is based on the replay simulation tool developed at UVA. 13,14 This is a data-driven tool that integrates a glucose–insulin minimal model personalized to represent each day of data for each subject, along with a net-effect signal obtained through regularized deconvolution intended to capture unmodeled phenomena.
Glucose-insulin minimal model
The metabolic model used to construct the virtual patients is built around a discrete-time linearized version of the subcutaneous oral minimal model 14,17 updated by Hughes et al. 13 The model can be described by the following:
Model identification
For each subject/modality pair (the 24 h of data where a study participant used a specific AID modality), model (1) is individualized on the patient's available data (CGM, insulin, meals, and HTs), while setting the net-effect to zero. Parameters related to insulin sensitivity and fractional glucose effectiveness are estimated on a daily basis, while parameters related to the carbohydrate absorption rate are fitted to each individual meal to account for intermeal absorption differences. Parameters are identified by maximizing the posterior probability of the set of parameters conditioned on the day’s data, with population averages used as priors. Population values are used for the remaining parameters. 18
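The role of the population prior in this kind of identification can be illustrated with a deliberately simplified sketch. It assumes a single parameter `theta` in a linear relation `y = theta * x + noise`, with Gaussian noise and a Gaussian prior centered on a population average. These distributional choices, the function name, and all numbers are illustrative assumptions, not the model or priors used in the study.

```python
def map_estimate(x, y, prior_mean, prior_sd, noise_sd):
    """MAP estimate of theta in y = theta * x + noise.

    Assumes (for illustration only) Gaussian noise with standard deviation
    noise_sd and a Gaussian prior N(prior_mean, prior_sd**2) on theta.
    Under these assumptions the posterior mode has a closed form.
    """
    data_precision = sum(xi * xi for xi in x) / noise_sd ** 2
    prior_precision = 1.0 / prior_sd ** 2
    data_term = sum(xi * yi for xi, yi in zip(x, y)) / noise_sd ** 2
    return (data_term + prior_precision * prior_mean) / (data_precision + prior_precision)


# Hypothetical data: the least-squares slope is exactly 2.0.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

# A prior centered at 1.0 pulls the estimate toward the population value.
theta_informative = map_estimate(x, y, prior_mean=1.0, prior_sd=1.0, noise_sd=1.0)

# A very wide prior leaves the estimate essentially at the data-driven value.
theta_vague = map_estimate(x, y, prior_mean=1.0, prior_sd=1e6, noise_sd=1.0)
```

With little or noisy data the estimate stays near the population average, and with informative data it moves toward the value that best explains the day's records, which is the behavior one would want from priors built on population parameters.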
Residual metabolic signal estimation
The net-effect signal is estimated following the procedure described by Patek et al. 14 In summary, the residual metabolic signal for the daily data is estimated through regularized deconvolution by inverting the model previously identified. Once the net-effect signal is obtained, it is fed back into model (1) with the insulin/meal/HT records to reproduce the data, encompassing all its inherent variability.
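A minimal sketch of regularized deconvolution is given here for intuition. It replaces the identified glucose-insulin model with a hypothetical scalar first-order system `x[k+1] = a*x[k] + w[k]`, treats `residual` as the part of the CGM trace not explained by the recorded inputs, and penalizes the first difference of the disturbance `w`. The model order, the penalty, the weight `lam`, and all function names are assumptions made for illustration. The procedure of Patek et al. may differ in each of these respects.

```python
def simulate_residual(w, a):
    """Response of x[k+1] = a*x[k] + w[k], starting from x = 0, to disturbance w."""
    x, out = 0.0, []
    for wk in w:
        x = a * x + wk
        out.append(x)
    return out


def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def estimate_net_effect(residual, a, lam):
    """Minimize ||residual - H w||^2 + lam * ||D w||^2 over w.

    H maps the disturbance to the model output (H[i][j] = a**(i-j), j <= i)
    and D is the first-difference operator, so lam favors smooth signals.
    """
    n = len(residual)
    H = [[a ** (i - j) if j <= i else 0.0 for j in range(n)] for i in range(n)]
    D = [[-1.0 if j == i else 1.0 if j == i + 1 else 0.0 for j in range(n)]
         for i in range(n - 1)]
    # Normal equations: (H^T H + lam * D^T D) w = H^T residual
    A = [[sum(H[k][i] * H[k][j] for k in range(n))
          + lam * sum(D[k][i] * D[k][j] for k in range(n - 1))
          for j in range(n)] for i in range(n)]
    b = [sum(H[k][i] * residual[k] for k in range(n)) for i in range(n)]
    return solve_linear(A, b)


# Hypothetical check: a constant disturbance is recovered, and feeding the
# estimate back into the model reproduces the residual it was derived from.
w_true = [2.0] * 8
residual = simulate_residual(w_true, a=0.9)
w_hat = estimate_net_effect(residual, a=0.9, lam=1.0)
replayed = simulate_residual(w_hat, a=0.9)
```

The last two lines mirror the feedback step described above: once the disturbance is estimated, driving the same model with it reconstructs the trace.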
Closed loop simulation
Once each virtual subject model is obtained, it is integrated with the control algorithm; that is, glucose and insulin signals are regenerated through replay simulation instead of using data retrieved from the DWM. As shown in Figure 1, the first simulation step, denoted replay, consists of simulating each virtual patient under the sequence of control modalities tested in the study. This step indicates how well the signal is reconstructed when factors occurring in real life are closely reproduced. The second step is resimulation, where the insulin therapy applied to the virtual subject is modified to predict the glucose impact. Here, to compare the predicted glucose with the available data, resimulation is performed with the control modality alternate to the one evaluated in the clinical trial for this 24-h data segment (see Fig. 1).

Figure 1. Process diagram of the replay validation methodology.
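The distinction between the replay and resimulation steps can be illustrated with a toy closed loop. Everything in the sketch is hypothetical: the one-state glucose model, its constants, the exact one-step inversion used to obtain the net effect (no regularization), and the two proportional controllers standing in for the recorded and alternate control modalities. None of it represents the study's model or the AID algorithms.

```python
# Hypothetical constants of a one-state toy glucose model.
A, G_BASAL, INSULIN_GAIN = 0.95, 110.0, 2.0


def step(g, u, w):
    """One step of the toy model: insulin u lowers glucose, w is the net effect."""
    return G_BASAL + A * (g - G_BASAL) - INSULIN_GAIN * u + w


def net_effect_from_data(glucose, doses):
    """Exact one-step inversion of the toy model (illustration only)."""
    return [glucose[k + 1] - step(glucose[k], doses[k], 0.0)
            for k in range(len(doses))]


def simulate(g0, net_effect, controller):
    """Close the loop: the controller sees simulated glucose, not recorded data."""
    g, trace, doses = g0, [g0], []
    for w in net_effect:
        u = controller(g)
        g = step(g, u, w)
        doses.append(u)
        trace.append(g)
    return trace, doses


def recorded_mode(g):
    """Stand-in for the modality used when the data were collected."""
    return max(0.0, 0.01 * (g - G_BASAL))


def alternate_mode(g):
    """Stand-in for the alternate modality (more aggressive dosing)."""
    return max(0.0, 0.03 * (g - G_BASAL))


# Hypothetical "field" record generated under the recorded modality.
data, data_doses = simulate(150.0, [5.0, 8.0, 3.0, -2.0, 6.0, 4.0], recorded_mode)

w_hat = net_effect_from_data(data, data_doses)
replay, _ = simulate(data[0], w_hat, recorded_mode)   # should match the data
resim, _ = simulate(data[0], w_hat, alternate_mode)   # "what-if" trace
```

In this toy setting the replay trace coincides with the record because the same controller and net effect are used, while the resimulated trace departs from it once the controller changes.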
Hypoglycemia treatments
Rescue carbohydrates are an important simulation factor to consider. Although virtual subjects are generated using data on HT timing and carbohydrate amount, HTs must be automated based on the glucose response in the replay and resimulation phases. This is because insulin dose changes may result in different glucose conditions that require new HTs or make previously recorded ones unnecessary. In this regard, conditions aiming to emulate each virtual subject’s glucose response to fast-absorbing carbohydrate ingestion are considered as in Diaz et al. 18 The glucose absorption submodels corresponding to HTs are averaged and used to simulate the response to the new rescues. In the absence of recorded HTs, an average submodel from meals is computed considering accelerated carbohydrate absorption rates. Furthermore, the subject’s behavior in treating hypoglycemic events is approximated to determine carbohydrate amount and timing. The carbohydrate amount is the average of the recorded HTs, and timing is determined by the average glucose value and glucose derivative when HTs were ingested, as well as the time between consecutive HTs.
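One possible reading of this behavioral rule is sketched here. The trigger logic (glucose at or below the average treatment glucose, slope at or below the average treatment slope, and a minimum spacing equal to the average gap between recorded HTs) is an assumption about how the averages could be combined. The function names, the default gap, and the data are hypothetical. In an actual closed-loop run the decision would be made step by step, because each HT changes the subsequent glucose trace.

```python
def average_ht_behavior(ht_events, default_gap_min=30.0):
    """Summarize recorded HTs.

    ht_events: list of (time_min, glucose_mgdl, slope_mgdl_per_min, carbs_g).
    default_gap_min is a hypothetical fallback when fewer than two HTs exist.
    """
    n = len(ht_events)
    times = sorted(e[0] for e in ht_events)
    gaps = [t2 - t1 for t1, t2 in zip(times, times[1:])]
    return {
        "glucose": sum(e[1] for e in ht_events) / n,
        "slope": sum(e[2] for e in ht_events) / n,
        "carbs": sum(e[3] for e in ht_events) / n,
        "gap_min": sum(gaps) / len(gaps) if gaps else default_gap_min,
    }


def automated_hts(glucose, dt_min, rule):
    """Return (time_min, carbs_g) for each HT triggered on a glucose trace."""
    treatments, last_time = [], None
    for k in range(1, len(glucose)):
        slope = (glucose[k] - glucose[k - 1]) / dt_min
        t = k * dt_min
        spaced = last_time is None or t - last_time >= rule["gap_min"]
        if glucose[k] <= rule["glucose"] and slope <= rule["slope"] and spaced:
            treatments.append((t, rule["carbs"]))
            last_time = t
    return treatments


# Hypothetical recorded HTs and a hypothetical simulated trace (5-min samples).
rule = average_ht_behavior([(100, 65.0, -1.0, 15.0), (160, 61.0, -0.5, 17.0)])
treatments = automated_hts([90.0, 80.0, 70.0, 62.0, 58.0, 56.0, 60.0], 5, rule)
```

Here a single rescue is triggered at minute 15, and the low readings that follow are not treated again because they fall within the averaged spacing between HTs.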
Outcomes and statistical analysis
Assessment of the replay phase is done by computing the root mean square error (RMSE) between CGM data and the simulated glucose traces for each subject/control-modality pair. Glucose outcome metrics are computed from both the clinical trial source data files and the simulated blood glucose values during both the replay phase and the resimulation phase when alternating the control mode. The glycemic outcomes considered are the percentages of time between 70 and 180 mg/dL (TIR), above 180 mg/dL (TAR), and below 70 mg/dL (TBR). In addition, we computed the high and low blood glucose indices (HBGI and LBGI) and reported the total daily insulin (TDI). An equivalence test is performed to assess whether the outcomes from data and replay, and from data and resimulation, are similar under predefined equivalence margins. 19,20
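As an illustration of how these glycemic outcomes can be computed from a glucose trace, the sketch below evaluates the time-in-range percentages and the risk indices. The thresholds follow the definitions above. The risk indices use the commonly published symmetrization of the glucose scale in mg/dL, and it is an assumption that the study used exactly this form. Function names and sample values are hypothetical.

```python
import math


def time_in_ranges(glucose, low=70.0, high=180.0):
    """Percent of samples below, within, and above the 70-180 mg/dL range."""
    n = len(glucose)
    tbr = 100.0 * sum(g < low for g in glucose) / n
    tar = 100.0 * sum(g > high for g in glucose) / n
    return {"TIR": 100.0 - tbr - tar, "TBR": tbr, "TAR": tar}


def risk_indices(glucose):
    """LBGI and HBGI from the symmetrized glucose scale (glucose in mg/dL)."""
    low_risk, high_risk = [], []
    for g in glucose:
        f = 1.509 * (math.log(g) ** 1.084 - 5.381)  # zero near 112.5 mg/dL
        r = 10.0 * f * f
        low_risk.append(r if f < 0 else 0.0)
        high_risk.append(r if f > 0 else 0.0)
    n = len(glucose)
    return {"LBGI": sum(low_risk) / n, "HBGI": sum(high_risk) / n}


# Hypothetical five-sample trace.
ranges = time_in_ranges([60.0, 70.0, 100.0, 180.0, 200.0])
hypo = risk_indices([60.0])
hyper = risk_indices([250.0])
```

Unlike TBR, the low blood glucose index weights each reading by how far it lies on the hypoglycemic side of the scale, so readings just below and just above 70 mg/dL contribute similar amounts.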
For each outcome metric, the margins are defined as follows:
Results
Replay of clinical trial results
Each day of data collected during the clinical trial is reproduced with the replay methodology to obtain an associated virtual subject representation and then set in closed loop with the corresponding simulated control modality. The cumulative distribution of the RMSE between CGM data and replayed glucose traces for the cohort of subjects using each of the control modalities is shown in Figure 2. An RMSE of less than 20 mg/dL was obtained for all but one subject/modality pair. In addition, Table 1 reports outcome metrics obtained from collected data and from the replay stage for both control modalities. Equivalence (P < 0.001) was shown for all metrics except TBR, for which equivalence could not be shown under the HCL mode. All 64 subject/modality pairs were selected for analysis of the resimulation phase.

Figure 2. Cumulative distribution of RMSE between CGM reported from the clinical trial and replay glucose with each control modality. RMSE, root mean square error; CGM, continuous glucose monitoring.
Table 1. Comparison of Data Versus Replay Overall Outcome Metrics When Using Each Control Mode During the Clinical Trial
Resimulation with different control modalities
Each virtual subject/day pair is simulated using a different control modality, so that subjects generated in HCL mode are simulated using FCL therapy, and subjects generated in FCL are simulated using HCL (as in Fig. 1). Individual subject glucose and insulin traces comparing collected data, replay phase, and resimulation phase under both control modes are presented in Supplementary Figure S1. The figure illustrates how the effects on glucose levels during the clinical trial are approximated when the therapy switch is applied to each virtual subject/day. Specifically, it shows faster drops in glucose levels during prandial periods due to manual boluses when using the HCL mode, and a corresponding increase in glucose levels that aligns with the original data collected during the FCL mode.
The distributions of TIR, TBR, and TAR for each control modality are depicted in Figure 3, comparing the outcomes computed from collected data and those from the resimulation phase. Population outcome metrics are also reported in Tables 2 and 3 for HCL and FCL, respectively. Equivalence was demonstrated for all glycemic metrics except TBR. The CIs for TIR, TAR, LBGI, HBGI, and TDI are completely contained within the defined equivalence margins. The simulated TBR was higher than the TBR obtained in the clinical trial in both the HCL (data: 2.42% vs. simulation: 2.94%) and FCL (data: 2.50% vs. simulation: 2.68%) modalities. In addition, outcome metrics during overnight and daytime periods are reported in Supplementary Table S1. In HCL mode, data versus simulation for daytime TIR is 81.93% versus 81.28%, TBR is 2.42% versus 3.15%, and delivered TDI is 41.36U versus 40.76U, while during the overnight period, TIR is 94.14% versus 93.41%, TBR is 2.14% versus 2.27%, and delivered TDI is 7.66U versus 6.86U. In the case of FCL, daytime TIR is 72.50% versus 72.70%, TBR is 2.47% versus 3.08%, and delivered TDI is 38.11U versus 37.66U, while during the overnight period, TIR is 88.91% versus 91.70%, TBR is 2.53% versus 1.54%, and delivered TDI is 6.53U versus 7.42U.
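The criterion that a CI lies entirely within an equivalence margin can be sketched as follows. This is a simplified illustration, not the study's analysis: it uses a normal approximation in place of the Student t distribution, a hypothetical margin and confidence level, a simulated-minus-observed sign convention chosen for the sketch, and made-up per-subject values.

```python
from statistics import NormalDist, mean, stdev


def paired_equivalence(observed, simulated, margin, alpha=0.05):
    """CI-inclusion equivalence check on paired differences.

    Equivalence is declared when the (1 - 2*alpha) confidence interval of the
    mean difference (simulated minus observed) lies inside (-margin, +margin).
    A normal approximation is used here for simplicity.
    """
    diffs = [s - o for o, s in zip(observed, simulated)]
    se = stdev(diffs) / len(diffs) ** 0.5
    z = NormalDist().inv_cdf(1.0 - alpha)
    lower, upper = mean(diffs) - z * se, mean(diffs) + z * se
    return {
        "mean_diff": mean(diffs),
        "ci": (lower, upper),
        "equivalent": -margin < lower and upper < margin,
    }


# Hypothetical per-subject TIR values (%), observed versus resimulated.
observed = [80.0, 85.0, 78.0, 90.0, 83.0, 76.0]
simulated = [81.0, 84.0, 79.0, 89.0, 84.0, 77.0]

wide = paired_equivalence(observed, simulated, margin=5.0)
narrow = paired_equivalence(observed, simulated, margin=0.5)
```

With the same data, the conclusion depends on the chosen margin, which is why margins tied to clinical significance need to be fixed before the comparison.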

Figure 3. Comparison of the distribution of glucose outcome metrics obtained from data and resimulation phase.
Table 2. Comparison of Glycemic Outcomes When Using HCL Mode During the Clinical Trial and When Resimulating the Virtual Patient Cohort Generated with FCL
Table 3. Comparison of Glycemic Outcomes When Using FCL Mode During the Clinical Trial and When Resimulating the Virtual Patient Cohort Generated with HCL
Discussion
There are at present numerous platforms for generating virtual cohorts of T1D subjects with the goal of evaluating new treatments and allowing rapid iteration and improvement of therapies in a safe, low-cost environment. Even though some platforms are based on maximal models, they still fail to capture all the glucose variability observed in data collected from real subjects. To address this limitation, a replay methodology that captures glucose variability as a net-effect signal was previously proposed and validated in silico. In this work, we detail the features to consider when applying the strategy to real-world data and showcase its domain of validity for the resimulation of daily glucose traces by comparing the glucose outcomes with those obtained during a clinical trial.
The validation process comprises two stages. The first is the creation of the cohort of virtual subjects to replay data by closing the loop with the control strategy used during data generation. Following that, a resimulation stage is performed in which insulin therapy is altered to assess its effect on glucose. Here, the control mode was switched from HCL to FCL or from FCL to HCL, and glycemic metrics were compared with those obtained during the clinical trial.
The replay stage is important because it indicates how well the method reproduces real-world data. Nonsimulated events that occur in real-life conditions, such as pump disconnections, pump occlusions, or bolus overrides, cause differences in the control actions that ultimately lead to a discrepancy in CGM traces when using simulation strategies. Available signals from the pump or monitoring devices can be used to mitigate these differences in control action. As an example, the time stamps of the CGM are used here to reproduce the asynchronism occurring in real data (see Supplementary Data S1). An equivalence test such as the one presented in Table 1 helps assess the ability to reproduce the cohort metrics obtained in the collected data and identify aspects to consider when drawing conclusions in the resimulation stage. In this study, equivalence was demonstrated for all metrics except TBR under the HCL mode.
Replay simulations have been used to suggest insulin dosing parameters in a personalized manner to individuals with T1D. Related work encompasses optimization of BR, CR, and CF in multiple daily insulin injection, CSII, and AID therapies. 18,22,23 It has been acknowledged that, like any simulation strategy, replay simulations have a domain of validity; thus, an RMSE-based threshold can be imposed to decide whether a subject/day is valid to move forward to the resimulation stage.
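A minimal sketch of such an RMSE-based screening step is shown here. The threshold value, the subject identifiers, and the traces are hypothetical. In the present study all 64 subject/modality pairs were retained for resimulation.

```python
import math


def rmse(cgm, replayed):
    """Root mean square error between recorded CGM and replayed glucose."""
    return math.sqrt(sum((c - r) ** 2 for c, r in zip(cgm, replayed)) / len(cgm))


def select_valid_pairs(pairs, max_rmse):
    """Keep the subject/modality pairs whose replay RMSE is within max_rmse.

    pairs: dict mapping (subject, modality) -> (cgm_trace, replayed_trace).
    """
    return [key for key, (cgm, replayed) in pairs.items()
            if rmse(cgm, replayed) <= max_rmse]


# Hypothetical traces (mg/dL) for one subject under two modalities.
pairs = {
    ("S01", "HCL"): ([100.0, 120.0, 140.0], [102.0, 118.0, 141.0]),
    ("S01", "FCL"): ([100.0, 120.0, 140.0], [130.0, 150.0, 170.0]),
}
valid = select_valid_pairs(pairs, max_rmse=20.0)  # hypothetical threshold
```

Only the pair whose replay stays close to the recorded CGM passes the screen and would be carried into the resimulation stage.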
The obtained results show TIR, TAR, LBGI, HBGI, and TDI equivalence between real and resimulated data with both control modes. This highlights the replay methodology’s potential to predict the performance of an untested control strategy in a clinical setting and, in turn, suggests a valid use as a new platform for the validation of new therapy designs in scenarios that include the glucose variability observed in real subjects. When performing the resimulation with both control modes, a TBR greater than that of the collected data was observed. We attribute this to two main factors. The first is the evaluation of TBR with the hard threshold imposed at 70 mg/dL: for some virtual subject/day pairs, despite similar responses, the glucose remained close to but below 70 mg/dL for longer periods, inflating the TBR. This possible explanation is supported by the equivalence of LBGI, a different hypoglycemia metric that is not affected by the threshold effect. The second is the change of HT inputs from those reported in the data to the automated version in the resimulation, as both the conditions to trigger an HT (CGM value and its derivative) and the glucose response to fast-absorbing carbohydrates were averaged from daily data. To counteract this second factor, better personalized rules for determining the timing, amount, and effect on glucose of HTs could be sought.
Additional outcome reports during overnight and daytime periods provide insights into how the resimulation stage adequately reflects glucose outcomes according to the effect each control mode had during the trial. In the case of HCL mode, insulin dosage increased during the day to amounts similar to those collected in the data (TDI data: 41.36U ± 19.93U, TDI resimulation: 40.76U ± 18.62U), which consequently dropped glucose to levels similar to those observed during the trial (TAR data: 15.55% ± 10.53%, TAR resimulation: 15.57% ± 10.48%). In the case of FCL, insulin dosing during the day was reduced (TDI data: 38.11U ± 17.22U, TDI resimulation: 37.66U ± 16.76U), increasing TAR and reducing TIR as observed in the data (TAR data: 25.03% ± 11.48%, TAR resimulation: 24.22% ± 10.56%, TIR data: 72.50% ± 11.14%, TIR resimulation: 72.70% ± 12.20%).
To further show the potential of the replay methodology, results were compared with those obtained with the UVA/Padova simulator, which was used to gain FDA approval for the DCLP6 clinical trial. The simulator was set to mimic the clinical trial conditions as follows: 24 h per control mode and 3 fixed meals at 7:00 h, 13:00 h, and 19:00 h, with intraday variability to account for the dawn phenomenon and variations of insulin sensitivity. The comparison is reported in Supplementary Table S2. The results suggest that the outcomes of the clinical trial are better reflected with the replay simulator. For example, in the case of HCL mode, the TIR observed in the trial data was 84.89%, which is closer to the replay simulator’s TIR of 84.31% than to the 79.04% obtained with the UVA/Padova simulator. Similarly, the replay simulator’s TAR was 12.75%, compared with 20.72% for the UVA/Padova simulator, which is again closer to the clinical trial data (12.68%).
This work represents an important advancement in the evaluation of the UVA replay methodology in particular, and the first clinical validation of a replay methodology in general, by reproducing the outcomes of a full day of a clinical study under controlled conditions. While this is an essential next step, we recognize that this singular case alone is not sufficient to confidently assert a broadened domain of validity. Rather, validation of in silico modeling methodologies requires an iterative approach as new real-world data sets become available. This work may also provide a blueprint for the validation of similar replay environments, leveraging controlled clinical trials to allow for the in silico treatment switch.
Future research should assess the strategy’s validity using data from free-living conditions, as hotel-based AID trials still impose restrictions on behaviors and therefore glycemic variability and allow for data quality that may be hard to achieve in real life. Furthermore, while comprehensive analyses of model identification and input independence were outside the scope of this validation study, we continue to work on improving the present method, particularly as it pertains to baseline accuracy of the core physiological model identification process and the independence of the additive signal to inputs that may be changed in the application of our simulation methodology. In addition, the effect of metabolic changes over longer time periods and data handling in an uncontrolled environment remains to be investigated.
Conclusions
The ability of the UVA replay methodology to generate realistic virtual T1D subjects and predict clinical outcomes is shown in this study. Resimulating cohorts with different therapies replicated key glycemic metrics from a closed-loop trial, establishing validity as a standard for in silico evaluation. The presented platform should allow for the testing of new treatments under realistic conditions, thereby accelerating therapeutic optimization and delivery to improve diabetes management.
Footnotes
Author Disclosure Statement
The authors declared the following potential conflicts of interest. M.F.V. and P.C. receive research support and royalties from Dexcom handled by the University of Virginia’s Licensing and Ventures Group. M.D.B. declares research support handled by the University of Virginia by Dexcom, Novo Nordisk, Tandem Diabetes Care; patent royalties handled by the University of Virginia by Dexcom, LifeScan, Novo Nordisk, and Sanofi. Honoraria: Tandem and Sanofi. Consulting: Roche, Portal Insulin LLC, and Dexcom.
Funding Information
This work was supported by an NIH grant 5R01DK129553.
Supplementary Material
Supplementary Data
Supplementary Figure S1
Supplementary Table S1
Supplementary Table S2
Supplementary Data S1
References
