Continuous glucose monitoring using machine learning models and IoT device data: A meta-analysis

Abstract

BACKGROUND:

Machine learning offers diverse options for effectively managing blood glucose levels in diabetes patients. Selecting the right ML algorithm is critical given the array of available choices. Integrating data from IoT devices presents promising opportunities to enhance real-time blood glucose management models.

OBJECTIVE:

This meta-analysis aims to evaluate the effectiveness of machine learning models utilizing IoT device data for predicting blood glucose levels.

METHODS:

We systematically searched electronic databases for studies published between 2019 and 2023. We excluded studies lacking ML model derivation or performance metrics. The Quality Assessment of Diagnostic Accuracy Studies tool assessed study quality. Our primary outcomes compared ML models for BG level prediction across different prediction horizons (PHs).

RESULTS:

We analyzed ten eligible studies across prediction horizons of 15, 30, 45, and 60 minutes. ML models exhibited mean RMSE values of 15.02 (SD 1.45), 21.488 (SD 2.92), 30.094 (SD 3.245), and 35.89 (SD 6.4) mg/dL, respectively. Random Forest demonstrated superior performance across these PHs.

CONCLUSION:

We observed substantial heterogeneity in most subgroups, indicating diverse sources of variability. As the PH lengthened, the RMSE of ML-based blood glucose predictions increased, with Random Forest showing the highest relative performance among the ML models.

Keywords

Machine learning; diabetes; CGM; Internet of Things; blood glucose; hyperglycemia

1. Introduction

Diabetes is a global health problem that is expected to worsen over the next decade [1, 2]. Uncontrolled diabetes has a significant impact on an individual's health and wellbeing, which underscores the urgent need for effective management strategies to prevent complications and improve outcomes [3, 4]. Effective diabetes management reduces the risk of the serious chronic conditions associated with the disease. Machine learning (ML) combined with Internet of Things (IoT) applications is expected to revolutionize diabetes management [5]. The IoT plays a significant role in the healthcare industry, both in application-oriented tasks and in maintaining patient health records. IoT enables automatic and continuous monitoring, which is particularly useful in mobile healthcare applications [6, 7, 8]. IoT devices such as continuous glucose monitors (CGMs) can monitor physiological data continuously and in real time, providing valuable information that ML algorithms can use for predictive modeling and personalized care. ML models are valuable tools for identifying and managing diabetes; they have shown strong performance in predicting the development of diabetes from a person's medical history, risk factors, and genetic makeup [9, 10]. ML algorithms can analyze complex data from CGMs, electronic health records (EHRs), and lifestyle factors to predict glycemic changes and improve glycemic control [11, 12]. Various ML algorithms, such as random forests (RF), support vector machines (SVM), neural networks, and autoregressive models, have been examined for their effectiveness in predicting blood glucose (BG) levels and diabetes complications. However, their effectiveness may vary between studies because of differences in data elements, study designs, and patient populations [13, 14, 15].

Glucose management is central to the care of people with diabetes. With the continuous improvement of ML models and the rapid development of IoT devices, this meta-analysis examines the trends and changes observed over the last five years. By reviewing studies published between 2019 and 2023, we aim to provide an updated assessment of the current state and future of ML in blood glucose monitoring with IoT technology. Specifically, this meta-analysis evaluates the effectiveness of ML models in predicting blood glucose outcomes and improving glycemic control in patients with diabetes. By synthesizing the existing literature, it also aims to identify advances, challenges, and trends in the integration of ML and IoT technologies for diabetes management, and to examine the specific ML and IoT architectures that have proven useful for diabetes prediction and management. The findings are intended to guide future research and inform ML-driven solutions in clinical practice, ultimately improving glycemic management and reducing the burden of diabetes complications.

2. Methods

This study followed the reporting guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Liberati et al., 2009) [16, 17]. PRISMA provides a transparent and reproducible framework for article selection, evaluation, and analysis. A predefined protocol was established to document the analysis methodology and the inclusion criteria.

2.1 Study design

This section outlines the research design and methodology used in this study. It covers the research questions, information sources and search strategy, study selection, eligibility criteria, data extraction, quality assessment, and statistical analysis.

2.2 Research questions

General questions (GQ):

1. What advancements and trends have emerged in diabetes management with the integration of machine learning and IoT technologies over the past five years?
2. What are the key challenges and gaps identified in the literature regarding the practical implementation of machine learning and IoT solutions in diabetes management during this period?

Specific questions (SQ):

These questions examine in more depth how specific health parameters and physiological data are monitored and analyzed using machine learning and IoT devices in diabetes management:

1. What outcomes and effectiveness have been observed from applying machine learning and IoT techniques to the management and control of diabetes, particularly in predicting blood glucose levels and optimizing glycemic control?
2. Which machine learning algorithms and IoT architectures are predominantly used to develop and deploy solutions for diabetes management?

The answers to these research questions will provide comprehensive insights into the current state of machine learning and IoT applications in diabetes management. They will help evaluate the impact, challenges, and opportunities associated with these technologies in improving diabetes care and patient outcomes.

2.3 Search strategy

Constructing the search string involved searching scientific databases and cross-referencing known terms, including synonyms, acronyms, and word combinations relevant to the study's context. We used the PICOS framework to refine our search strings [18]. This approach is recommended within PRISMA for defining objectives, research questions, and eligibility criteria. Each letter in PICOS denotes a domain: participants (P), interventions (I), comparisons (C), outcomes (O), and study design (S).

• Participants: Adults diagnosed with diabetes mellitus, including type 1 diabetes, type 2 diabetes, or gestational diabetes.
• Interventions: Use of machine learning algorithms and IoT technologies, including wearable devices and smart sensors, for monitoring, managing, and predicting blood glucose levels in patients with diabetes.
• Comparisons: The effectiveness and outcomes of integrated machine learning and IoT approaches compared with traditional methods of diabetes management.
• Outcomes: Glycemic control, blood glucose prediction accuracy, improvements in patient outcomes (such as quality of life, morbidity, and mortality), and challenges and gaps in implementing machine learning and IoT solutions in diabetes management.
• Study design: Research articles, clinical trials, observational studies, and feasibility studies investigating the integration of machine learning and IoT technologies in diabetes management, with emphasis on studies reporting outcomes of machine learning algorithms and IoT architectures for predicting blood glucose levels, optimizing glycemic control, and addressing challenges in diabetes management.

Based on this strategy, we defined the following search string for querying the databases:

2.4 Study selection

For article selection, we retrieved studies published within the last five years (2019–2023) from electronic databases using our predefined search string. The databases searched were Scopus, Springer, IEEE Xplore, PubMed, CINAHL, Embase, Web of Science, and Nature. These databases were selected for their comprehensive coverage of the field addressed in this paper; they also provide access to full-text journals and to proceedings of prominent health conferences on patient self-care, IoT, diabetes, wearable devices, and related topics. The last search was conducted on January 15, 2024.

2.5 Exclusion criteria

Articles focusing on pediatric populations (children and adolescents younger than 18 years) were excluded.

Our meta-analysis specifically focuses on Continuous Glucose Monitoring (CGM) technologies used in diabetes management.

Articles not reporting primary research, such as theses, dissertations, opinions, abstracts, criticisms, books, protocols, posters, reviews, and oral presentations, were excluded.

Articles that do not specifically discuss the utilization of IoT techniques, including wearable electronic devices, for monitoring, self-care, and management during the treatment phase of diabetes patients were excluded.

2.6 Inclusion criteria

Studies involving adult men and women diagnosed with diabetes mellitus, including type 1 diabetes, type 2 diabetes, or gestational diabetes.

Studies published within the last 5 years to capture recent advancements and trends in the field of diabetes management.

Articles written in English to ensure accessibility and comprehensibility for analysis and interpretation in the meta-analysis.

2.7 Data extraction and management

Two reviewers independently conducted data extraction and quality assessment, and any disagreements were resolved by an impartial third reviewer. When a study reported multiple test results for the same ML model, the most favorable result was extracted. When a study evaluated multiple ML models, performance metrics were extracted for each model individually. For studies predicting blood glucose levels, root mean square errors (RMSEs) were extracted for each prediction horizon (PH). For studies that did not specify PHs, performance metrics such as R² and accuracy were extracted.
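The extraction rules described above (keep the most favorable result per model, extract each model separately, and group RMSEs by PH) can be sketched as follows. This is a minimal illustration, assuming a hypothetical `Result` record and treating "most favorable" as the lowest RMSE; the study labels and values are made up and do not come from the included studies.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Result:
    """One extracted test result (hypothetical record layout)."""
    study: str
    model: str
    ph_min: Optional[int]  # prediction horizon in minutes; None if not reported
    rmse: float            # mg/dL; lower is better


def most_favorable(results: List[Result]) -> List[Result]:
    """Keep only the lowest-RMSE result for each (study, model, PH) triple."""
    best: Dict[Tuple[str, str, Optional[int]], Result] = {}
    for r in results:
        key = (r.study, r.model, r.ph_min)
        if key not in best or r.rmse < best[key].rmse:
            best[key] = r
    return list(best.values())


def group_by_horizon(results: List[Result]) -> Dict[Optional[int], List[Result]]:
    """Split results into PH subgroups; the None key collects studies without a PH."""
    groups: Dict[Optional[int], List[Result]] = defaultdict(list)
    for r in results:
        groups[r.ph_min].append(r)
    return dict(groups)


# Made-up example: study "A" reports two test runs for RF at 30 min.
records = [
    Result("A", "RF", 30, 20.1),
    Result("A", "RF", 30, 19.4),
    Result("A", "SVR", 30, 22.0),
    Result("B", "LSTM", 60, 33.0),
]
kept = most_favorable(records)
subgroups = group_by_horizon(kept)
print({ph: [(r.model, r.rmse) for r in rs] for ph, rs in subgroups.items()})
# prints {30: [('RF', 19.4), ('SVR', 22.0)], 60: [('LSTM', 33.0)]}
```

For studies without a PH, where metrics such as accuracy or R² are higher-is-better, the comparison in `most_favorable` would need to be reversed.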

Specifically, the following information was extracted:

• General characteristics: first author, publication year, country, data source, and study purpose (e.g., predicting blood glucose).
• Experimental information: participants (type of DM, type 1 or 2), sample size (patients, data points, and hypoglycemia), demographic information, models, study setting and time, model parameters (e.g., inputs and PHs), model performance metrics, and IoT applications used.

2.8 Methodological quality assessment of included studies

The quality of the included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. This tool evaluates studies across four domains: patient selection (5 items), index test (3 items), reference standard (4 items), and flow and timing (4 items). All four domains are used to assess risk of bias, and the first three are also used to assess concerns regarding applicability, giving seven domain-level judgments in total [19].

2.9 Data synthesis and statistical analysis

The performance metrics of models predicting blood glucose levels were evaluated separately for each specified prediction horizon; studies that did not specify a PH were analyzed separately. The primary performance metric was the RMSE of ML models in predicting BG levels. For each study, effect sizes (Cohen's d) and their standard errors were calculated. Heterogeneity was assessed using I² values obtained from multivariate random-effects meta-regression, which accounted for within- and between-study correlations. Heterogeneity was classified into four bands: 0% to $<$ 25% (low), 25% to $<$ 50% (low-to-moderate), 50% to $<$ 75% (moderate-to-high), and $\geq$ 75% (high) [20, 21]. Meta-regression was also used to explore sources of heterogeneity, and publication bias was evaluated with Egger's regression test for funnel plot asymmetry.
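To make the heterogeneity statistics concrete, the sketch below computes Cochran's Q, τ², I², and H² for a set of effect sizes and maps I² onto the four bands defined above. It is a minimal univariate illustration that swaps in the DerSimonian–Laird moment estimator; the paper's JASP analysis used multivariate random-effects meta-regression, so this sketch will not reproduce the τ² values reported in Section 3. The function names and example inputs are hypothetical.

```python
import math
from typing import Dict, Sequence


def heterogeneity(effects: Sequence[float], variances: Sequence[float]) -> Dict[str, float]:
    """DerSimonian-Laird random-effects summary with Q, tau^2, I^2 and H^2."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects with matching variances")
    w = [1.0 / v for v in variances]                        # inverse-variance weights
    sw = sum(w)
    mean_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - mean_fe) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                           # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0   # % of variability beyond sampling error
    h2 = q / df                                             # total / sampling variability
    w_re = [1.0 / (v + tau2) for v in variances]            # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"Q": q, "df": df, "tau2": tau2, "I2": i2, "H2": h2, "mu": mu, "se": se}


def i2_band(i2: float) -> str:
    """Map I^2 (%) to the four heterogeneity bands of Section 2.9."""
    if i2 < 25:
        return "low"
    if i2 < 50:
        return "low-to-moderate"
    if i2 < 75:
        return "moderate-to-high"
    return "high"


# Hypothetical effect sizes and sampling variances for three models.
res = heterogeneity([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
print(round(res["tau2"], 3), round(res["I2"], 1), round(res["H2"], 1), i2_band(res["I2"]))
# prints 0.9 90.0 10.0 high
```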

Studies predicting BG levels were divided into five subgroups by prediction horizon (15, 30, 45, 60, and 120 minutes). A two-sided $P$ value of less than 0.05 was considered statistically significant. All statistical analyses were conducted in Jeffreys's Amazing Statistics Program (JASP, version 0.18.3), following guidelines from Cochrane Review Manager.
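Egger's regression test, used in this section to assess funnel plot asymmetry, can be sketched in its classic form: regress each study's standardized effect (effect / SE) on its precision (1 / SE) by ordinary least squares and check whether the intercept differs from zero. This is a standard-library sketch with made-up inputs; JASP reports a z statistic from its own implementation, which may differ in detail from this OLS version.

```python
import math
from typing import Sequence, Tuple


def egger_test(effects: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, int]:
    """Classic Egger regression: standardized effect ~ precision (OLS).

    Returns (intercept, t statistic, degrees of freedom). An intercept far
    from zero indicates funnel plot asymmetry.
    """
    k = len(effects)
    if k < 3 or k != len(ses):
        raise ValueError("need at least three studies with matching SEs")
    x = [1.0 / s for s in ses]                   # precision
    z = [y / s for y, s in zip(effects, ses)]    # standardized effect
    xbar = sum(x) / k
    zbar = sum(z) / k
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = zbar - slope * xbar
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    df = k - 2
    s2 = rss / df                                # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / k + xbar ** 2 / sxx))
    return intercept, intercept / se_intercept, df


# Made-up effects and standard errors for four studies.
intercept, t_stat, df = egger_test([3.0, 2.5, 8.0 / 3.0, 2.25], [1.0, 0.5, 1.0 / 3.0, 0.25])
print(round(intercept, 3), round(t_stat, 3), df)
```

The two-sided p-value comes from a t distribution with k − 2 degrees of freedom; with SciPy available, `2 * scipy.stats.t.sf(abs(t_stat), df)` would compute it.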

3. Results

Out of 1,174 studies identified through systematic search of predefined electronic databases, 1,067 (91%) remained after removing duplicates. Following screening of titles and abstracts, 734 (68.79%) studies were excluded due to irrelevant topics or lack of predefined outcomes. The remaining 333 (31.2%) studies underwent full-text evaluation. Of these, 323 (97%) were excluded for various reasons, leaving 10 (3%) studies included in the final meta-analysis.
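The screening counts above are internally consistent, and the percentages can be checked with a few lines of Python. Only the numbers stated in this paragraph are used; the variable names are illustrative.

```python
# Record counts reported in the PRISMA flow (Section 3).
identified = 1174
after_dedup = 1067
excluded_title_abstract = 734
excluded_full_text = 323

full_text = after_dedup - excluded_title_abstract   # records assessed in full text
included = full_text - excluded_full_text           # studies in the meta-analysis


def pct(part: int, whole: int) -> float:
    """Percentage of `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


print(full_text, included)                        # 333 10
print(pct(after_dedup, identified))               # 90.89
print(pct(excluded_title_abstract, after_dedup))  # 68.79
print(pct(full_text, after_dedup))                # 31.21
print(pct(excluded_full_text, full_text))         # 97.0
```

The reported 91%, 31.2%, and 97% are these values rounded.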

Figure 1.

PRISMA flow diagram of study identification and inclusion.

3.1 Description of included studies

In total, the 10 included studies comprised 8,776 participants and evaluated more than 20 different ML models using a variety of IoT devices (Table 1).

3.2 Quality assessment of included studies

Assessment with the QUADAS-2 tool showed that 30% of the included studies did not report patient selection criteria in detail, which lowered their quality in the patient selection domain.

3.3 Statistical analysis

3.3.1 Machine learning models for predicting blood glucose levels

In our meta-analysis of ML model performance at a 15-minute prediction horizon, we observed significant heterogeneity across the included studies. This analysis incorporated data from 4 studies [22, 23, 25, 27] examining 5 distinct ML models. The mean RMSE was 15.02 (SD 1.45) mg/dL. The omnibus test of model coefficients was statistically significant ($Q = 15.651$, df $= 1$, $p < 0.001$), indicating that the choice of ML model significantly influenced the outcome variable. The test of residual heterogeneity also revealed substantial residual heterogeneity across studies ($Q = 191.880$, df $= 7$, $p < 0.001$), indicating variability in effect sizes not explained by the ML models alone. The estimated residual heterogeneity was $\tau^{2} = 4.435$ ($\tau \approx 2.106$), as shown in the forest plot (Fig. 3). The I² statistic (97.345%) indicated that most of the total variability in effect sizes was due to heterogeneity rather than sampling error, reflecting considerable differences in outcomes among the included ML models. Correspondingly, the H² value of 37.668 indicated that the total observed variability was roughly 38 times what sampling error alone would produce, highlighting the impact of heterogeneity on the meta-analysis results. Egger's regression test detected significant funnel plot asymmetry ($z = -5.707$, $p < 0.001$), suggesting the presence of publication bias. This finding calls for cautious interpretation of the meta-analytic results and consideration of potential bias in the synthesized evidence.

Table 1
Baseline characteristics of included studies predicting BG levels

Study	Sample size (N)	Outcome measure	ML model	IoT devices	Demographic information
[22]	40 (DM1)	RMSE, $R^{2}$	RF, SVR	Abbott Freestyle Libre CGM System, Fitbit Charge 5 smart band	Age, sex, BMI, duration of diabetes, HbA1c (%)
[23]	40 (DM1)	RMSE, $R^{2}$	RF, SVM, BRNN	Abbott Freestyle Libre CGM System, Fitbit Charge 5 smart band	Age, sex, BMI, duration of diabetes, HbA1c (%)
[24]	Three datasets: 12 DM1 (OhioT1DM data set); 25 DM1 (ABC4D data set); 12 DM1 (ARISES data set)	RMSE, MAE, gRMSE	E3NN, TCN [32], CRNN [33], LSTM [34], Bi-LSTM [35], SVR [36], ARIMA [37]	Medtronic Enlite CGM, Dexcom G5 CGM, Dexcom G6 CGM	–
[25]	40 (DM1)	RMSE	RF, SVM, BRNN	Abbott Freestyle Libre CGM sensor, Fitbit Charge 5 smart band	Sex, age, BMI, HbA1c (%), insulin units per day, duration of diabetes
[26]	Six from the OhioT1DM dataset and one participant who is also an author of the study	RMSE, MRE	Ridge regression	Dexcom G6 (CGM measurements to Apple Health), Empatica E4 wristband, Oura ring, Apple Watch	–
[27]	12 (T1DM)	RMSE, gRMSE, MAE, MAPE, time lag	Deep learning algorithm embedded within the ARISES platform	Clinically validated wearable sensor wristband	Age, gender, insulin regimen, HbA1c, glucose level, daily risk range
[28]	Dataset 1 (768, female); Dataset 2 (pediatric data excluded)	Precision, accuracy, specificity, sensitivity, F1-score, NPV, FNR, FPR, FDR, MCC	FMATSO-MDDTCN, TSO-MDDTCN, MAO-MDDTCN, CSO-MDDTCN, EOO-MDDTCN	IoT sensor-based diabetic data collection	Insulin level, body mass index (BMI), age
[29]	2217 (T2D)	RMSE, MAPE	CGP model (RNN-based)	Mobile app (January AI), CGM (Freestyle Libre, Abbott), HR monitor (Apple Watch or Fitbit)	BMI, weight, height, age
[30]	Dataset 1 (Pima Indians diabetes, 768); Dataset 2 (Hospital Frankfurt, Germany, diabetes dataset, 2000); Dataset 3 (merged dataset, 2768)	Confusion matrix, accuracy	Adaptive random forest	IoT-enabled blood pressure monitor, glucose monitor, sleep tracker, heart rate monitor, smart scale (weight)	Age, BMI, blood pressure, diabetes pedigree function, glucose, insulin, outcome, pregnancies, skin thickness
[31]	147 (74 of 93 in the waist-worn wearables arm; 73 of 93 in the wrist-worn wearables arm)	Area under the receiver operating characteristic (ROC) curve, $R^{2}$	LR, LSR, RR, CART, RF, GB, EML	Waist-worn (Fitbit Zip) or wrist-worn (Fitbit Charge HR 2) wearables	Age, gender, race/ethnicity, education, marital status, annual household income

RF, random forest; SVR, support vector regression; SVM, support vector machine; BRNN, Bayesian regularized neural network; RMSE, root mean square error; $R^{2}$, coefficient of determination; gRMSE, glucose-specific RMSE; MAE, mean absolute error; MAPE, mean absolute percentage error; 1DCNN, one-dimensional convolutional neural network; LSTM, long short-term memory; Bi-LSTM, bidirectional LSTM; MDDTCN, multi-scale dilated deep temporal convolutional network; CGP, continuous glucose prediction; DirecNet, Diabetes Research in Children Network; LR, linear regression; LSR, lasso regression; RR, ridge regression; CART, classification and regression trees; GB, gradient boosting; EML, ensemble machine learning; E3NN, embedded edge evidential neural network; TCN, temporal convolutional network; CRNN, convolutional recurrent neural network; ARIMA, autoregressive integrated moving average.

Figure 2.

Assessment of study quality: (A) graph of risk of bias and applicability concerns; (B) summary of risk of bias and applicability concerns.

Figure 3.

Forest plot comparing ML models at PH $= 15$ min.

For PH $= 30$ minutes, the analysis included 3 studies [24, 25, 27] evaluating 11 different ML models. The mean RMSE was 21.488 (SD 2.92) mg/dL. The omnibus test of model coefficients was statistically significant ($Q = 6.895$, df $= 1$, $p = 0.009$), indicating that the choice of ML model significantly influenced the outcome variable within the selected studies. The test of residual heterogeneity also showed substantial residual heterogeneity across studies ($Q = 306.266$, df $= 71$, $p < 0.001$), indicating variability in effect sizes not explained by the ML models alone. The estimated residual heterogeneity was $\tau^{2} = 0.384$ ($\tau \approx 0.620$), as shown in the forest plot (Supplementary Fig. 1). The I² statistic (75.595%) fell just above the 75% threshold for high heterogeneity, indicating considerable differences in effect sizes across ML models. Both model type and sample size were identified as sources of this heterogeneity. The corresponding H² value of 4.098 indicated that the total observed variability was about four times that expected from sampling error alone. Egger's regression test did not detect significant funnel plot asymmetry ($z = 0.427$, $p = 0.669$), suggesting no substantial publication bias among the included studies.
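Meta-regression, which Section 2.9 describes as the tool for exploring sources of heterogeneity such as the model type and sample size noted above, regresses effect sizes on a study-level moderator using random-effects weights. The sketch below handles a single numeric moderator and assumes τ² is already known; the paper's JASP analysis estimated it within a multivariate model, so this is a simplification, and all inputs are made up.

```python
import math
from typing import Dict, Sequence


def meta_regression(effects: Sequence[float], variances: Sequence[float],
                    moderator: Sequence[float], tau2: float) -> Dict[str, float]:
    """Weighted least-squares meta-regression on one moderator.

    Weights are 1 / (v_i + tau2), i.e. a mixed-effects model with tau2 held
    fixed. Returns intercept, slope, slope SE, z and two-sided normal p-value.
    """
    w = [1.0 / (v + tau2) for v in variances]
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, moderator))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, moderator))
    t0 = sum(wi * yi for wi, yi in zip(w, effects))
    t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, moderator, effects))
    det = s0 * s2 - s1 * s1           # determinant of the 2 x 2 matrix X'WX
    if det <= 0:
        raise ValueError("moderator has no variation")
    intercept = (s2 * t0 - s1 * t1) / det
    slope = (s0 * t1 - s1 * t0) / det
    se_slope = math.sqrt(s0 / det)    # slope entry of (X'WX)^-1, square-rooted
    z = slope / se_slope
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return {"intercept": intercept, "slope": slope, "se": se_slope, "z": z, "p": p}


# Hypothetical effects, variances, and a coded moderator (e.g., sample-size category).
res = meta_regression([0.75, 1.0, 1.25, 1.5], [0.04] * 4, [1.0, 2.0, 3.0, 4.0], tau2=0.01)
print(round(res["slope"], 3), round(res["se"], 3), round(res["z"], 2))
```

A slope whose p-value falls below 0.05 would flag the moderator as a likely source of heterogeneity under these assumptions.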

For PH $= 45$ minutes, the analysis included 5 studies [22, 23, 25, 26, 27] evaluating 7 different ML models. The mean RMSE was 30.094 (SD 3.245) mg/dL. The omnibus test of model coefficients was statistically significant ($Q = 5.580$, df $= 1$, $p = 0.018$), indicating that the choice of ML model significantly influenced the outcome variable within the selected studies. The test of residual heterogeneity also revealed substantial residual heterogeneity across studies ($Q = 153.332$, df $= 9$, $p < 0.001$), indicating variability in effect sizes not explained by the ML models alone. The estimated residual heterogeneity was $\tau^{2} = 2.505$ ($\tau \approx 1.583$), as shown in the forest plot (Fig. 4). The I² statistic (95.709%) indicated high heterogeneity among the included studies, and the corresponding H² value of 23.304 indicated that the total observed variability was about 23 times that expected from sampling error alone. Egger's regression test did not detect significant funnel plot asymmetry ($z = 1.700$, $p = 0.089$), suggesting no substantial publication bias at this prediction horizon. These findings highlight the difficulty of assessing ML model performance at the 45-minute prediction horizon, given the notable residual heterogeneity across studies. Future research should aim to reduce this heterogeneity and consider the implications of different ML model choices within this timeframe, improving the reliability and applicability of ML-based predictive modeling in healthcare.

Figure 4.

Forest plot comparing ML models at PH $= 45$ min.

For PH $= 60$ minutes, the analysis included 2 studies [24, 27] evaluating 9 different ML models. The mean RMSE was 35.89 (SD 6.4) mg/dL. The omnibus test of model coefficients was not statistically significant ($Q = 3.182$, df $= 1$, $p = 0.074$), suggesting that the choice of ML model may have had a relatively minor influence on the outcome variable within the selected studies. The test of residual heterogeneity was likewise not statistically significant ($Q = 83.888$, df $= 68$, $p = 0.093$), although some variability in effect sizes remained unexplained by the ML models. The estimated residual heterogeneity was $\tau^{2} = 0.044$ ($\tau \approx 0.210$), as shown in the forest plot (Supplementary Fig. 2). The I² statistic (25.830%) indicated low-to-moderate heterogeneity, and the corresponding H² value of 1.348 indicated that the total observed variability only modestly exceeded that expected from sampling error, so heterogeneity had less impact here than at the other prediction horizons. Egger's regression test did not detect significant funnel plot asymmetry ($z = -0.625$, $p = 0.532$), suggesting no substantial publication bias at this prediction horizon (Fig. 5). These findings suggest that ML model performance at the 60-minute prediction horizon may be relatively consistent and less influenced by model choice than at shorter horizons.

Figure 5.

Funnel plot of studies comparing ML models at PH $= 60$ min.

For PH $= 2$ hours, a single study [29] evaluated 3 different ML models. The omnibus test of model coefficients was statistically significant ($Q = 140.661$, df $= 1$, $p < 0.001$), indicating variability in model effects beyond chance. The test of residual heterogeneity was also significant ($Q = 61.527$, df $= 2$, $p < 0.001$), suggesting substantial inconsistency among the model outcomes. Although the estimated residual heterogeneity was small in absolute terms ($\tau^{2} = 0.008$, $\tau = 0.088$), it was large relative to sampling error, as shown in the forest plot (Fig. 6): I² was 96.501% and H² was 28.582, classifying the heterogeneity as high. Egger's regression test indicated significant funnel plot asymmetry ($z = -7.839$, $p < 0.001$), suggesting potential publication bias or other sources of bias affecting the meta-analysis results.

Figure 6.

Forest plot comparing ML models at PH $= 2$ h.

Studies without a specific prediction horizon were analyzed separately to assess the performance of ML models in diabetes management irrespective of time-based forecasting; this subgroup comprised 3 studies [28, 30, 31] with 13 different ML models. The omnibus test of model coefficients was statistically significant ($Q = 7.731$, df $= 1$, $p = 0.005$), suggesting that the choice of ML model significantly influenced the outcome variable within the selected studies. The test of residual heterogeneity also revealed substantial residual heterogeneity across studies ($Q = 898.036$, df $= 54$, $p < 0.001$), indicating variability in effect sizes not explained by the ML models. The estimated residual heterogeneity was $\tau^{2} = 0.102$ ($\tau \approx 0.320$), as shown in the forest plot (Supplementary Fig. 3). The I² statistic (93.118%) indicated high heterogeneity among the included studies, and the corresponding H² value of 14.530 indicated that the total observed variability was about 15 times that expected from sampling error alone. Egger's regression test did not detect significant funnel plot asymmetry ($z = -1.326$, $p = 0.185$), suggesting no substantial publication bias among these studies. These findings underscore the variability in ML model performance across studies without a defined prediction horizon and the need for further investigation into the model characteristics and contextual factors that influence performance.
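Because H² is a ratio, related to I² by H² = 1 / (1 − I²), and τ is the square root of τ², the statistics reported across the subgroups above can be cross-checked against each other. The values below are copied from the text; only the arithmetic is added, and the small discrepancies come from rounding of the published I² and τ² values.

```python
import math

# (I^2 in %, reported H^2, reported tau^2, reported tau) for each subgroup.
reported = {
    "PH 15 min": (97.345, 37.668, 4.435, 2.106),
    "PH 30 min": (75.595, 4.098, 0.384, 0.620),
    "PH 45 min": (95.709, 23.304, 2.505, 1.583),
    "PH 60 min": (25.830, 1.348, 0.044, 0.210),
    "PH 2 h": (96.501, 28.582, 0.008, 0.088),
    "No PH": (93.118, 14.530, 0.102, 0.320),
}


def h2_from_i2(i2_percent: float) -> float:
    """H^2 = 1 / (1 - I^2), with I^2 given as a percentage."""
    return 100.0 / (100.0 - i2_percent)


for name, (i2, h2, tau2, tau) in reported.items():
    print(f"{name}: H2 from I2 = {h2_from_i2(i2):.3f} (reported {h2}), "
          f"sqrt(tau2) = {math.sqrt(tau2):.3f} (reported {tau})")
```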

4. Discussion

4.1 Key findings

This meta-analysis evaluated the effectiveness of various ML models in improving blood glucose management among patients with diabetes mellitus (DM), drawing on 10 eligible studies. Through a comprehensive literature search, we assembled evidence to assess the collective predictive capacity of ML models for BG level prediction in diabetes management.

4.2 Included studies comparison

The RMSE of ML models in predicting blood glucose levels increased as the PH extended from 15 to 60 min, indicating that longer PHs are associated with greater prediction error. At the 15-minute prediction horizon, the Random Forest (RF) model consistently outperformed the other models (SVR, SVM, ARISES) across studies, with Cohen's d values ranging from $-2$ to $-2.7119$. These results indicate RF's superior ability to predict BG levels over a short time frame, and RF may therefore be considered the best-performing model at this horizon, based on the available data.
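The paper does not state exactly how its Cohen's d values were computed. One common approach, assumed here, is the standardized mean difference between two models' mean RMSEs using the pooled standard deviation, paired with the usual large-sample standard error and a normal-approximation 95% confidence interval. Under the sign convention d = (A − B) / s_pooled used in this sketch, a negative d means model A had the lower (better) RMSE. The function names and RMSE summaries below are hypothetical.

```python
import math
from typing import Tuple


def cohens_d(mean_a: float, sd_a: float, n_a: int,
             mean_b: float, sd_b: float, n_b: int) -> float:
    """Standardized mean difference (A - B) using the pooled SD."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


def se_of_d(d: float, n_a: int, n_b: int) -> float:
    """Large-sample standard error of Cohen's d."""
    return math.sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))


def ci95(d: float, se: float) -> Tuple[float, float]:
    """Normal-approximation 95% confidence interval."""
    return d - 1.96 * se, d + 1.96 * se


# Hypothetical RMSE summaries (mg/dL): model A vs model B, 40 participants each.
d = cohens_d(14.0, 2.5, 40, 17.0, 2.5, 40)
se = se_of_d(d, 40, 40)
print(round(d, 3), round(se, 3), tuple(round(x, 3) for x in ci95(d, se)))
```

Because the standard error shrinks as the sample sizes grow, studies with few participants yield wider intervals around the same d.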

For the 30-minute prediction horizon, the results from Rodríguez-Rodríguez et al. consistently showed that Random Forest outperformed the Support Vector Machine and Bayesian Regularized Neural Network (BRNN) models, with Cohen's d values ranging from $-2.6358$ to $-2.7226$, indicating RF's effectiveness in predicting BG levels within a 30-minute window. Zhu et al. [24] compared various models on the OhioT1DM dataset, including TCN, CRNN, LSTM, Bi-LSTM, SVR, and ARIMA, which showed varying performance. Overall, our meta-analysis identified Random Forest as the best ML model for blood glucose prediction at the 30-minute horizon, consistent with the findings of Rodríguez-Rodríguez et al. [22, 23, 25].

For the 45-minute prediction horizon, the meta-analysis revealed several important findings. Rodríguez-Rodríguez et al. showed that RF outperformed support vector regression (SVR) with a Cohen's d of $-2.0$, indicating strong predictive capability. RF was likewise reported to outperform the SVM and BRNN models, demonstrating its effectiveness in predicting BG within a 45-minute window. Zhu et al. [27] reported the performance of the ARISES model with RF, with a Cohen's d of 0.6691, indicating good BG prediction performance in this period, whereas their comparisons with the SVM and BRNN models showed contrasting results, underscoring the variability in model performance across studies and datasets. These findings highlight the nuanced effectiveness of ML models in BG management within a 45-minute prediction horizon. Based on the available data for this horizon, the model with the highest Cohen's d value, and thus the most favorable predictive capability among the models evaluated, was the ARISES model with RF from Zhu et al. (0.6691).

For PH $= 60$ min, E3NN (OhioT1DM) consistently outperformed the other models (TCN, CRNN, LSTM, Bi-LSTM, SVR, and ARIMA) across multiple comparisons, with negative Cohen's d values ranging from $-0.4562$ to $-0.8103$. Although moderate in magnitude, the effect sizes consistently favored E3NN (OhioT1DM). The 95% confidence intervals around the Cohen's d estimates indicate the precision of these effect sizes; some intervals were relatively wide because of the small sample size ($N = 12$), but they generally supported the superiority of E3NN (OhioT1DM).

In our comparative analysis of predictive models without a specific prediction horizon, the ensemble machine learning (EML) model consistently emerged as the most effective. It showed a clear advantage over Linear Regression (LR), Random Forest (RF), and Gradient Boosting (GB), with Cohen's d effect sizes ranging from −0.665 to −0.7335 in favor of EML. These results are significant, as evidenced by the non-overlapping confidence intervals. The superior performance of EML demonstrates its potential as a forecasting tool and highlights the importance of model selection for accurate predictions.
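The specific EML architecture is not described in this section. As a hypothetical illustration only, the sketch below assumes the simplest kind of ensemble, an unweighted average of base-model forecasts; the model names, forecasts, and glucose values are invented for the example:

```python
import math


def rmse(pred, obs):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def average_ensemble(*model_preds):
    """Averaging ensemble: element-wise mean of the base models' forecasts."""
    return [sum(vals) / len(vals) for vals in zip(*model_preds)]


# Hypothetical observed glucose (mg/dL) and forecasts from three base models.
observed = [110, 125, 140, 150, 145]
pred_lr = [100, 120, 150, 160, 140]
pred_rf = [118, 130, 132, 145, 152]
pred_gb = [105, 131, 146, 141, 150]

ensemble = average_ensemble(pred_lr, pred_rf, pred_gb)
base_rmses = [rmse(p, observed) for p in (pred_lr, pred_rf, pred_gb)]
print("base RMSEs:", [round(r, 2) for r in base_rmses])
print("ensemble RMSE:", round(rmse(ensemble, observed), 2))
```

By the triangle inequality, the RMSE of an averaged forecast can never exceed the mean RMSE of its base models, although it can still be worse than the single best model. Weighted or stacked ensembles, which the included studies may have used, can behave differently.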

In the study by Zahedani et al. [29], CGP outperformed the other evaluated models (XGBoost and RF), with the lowest RMSE (13.4), the highest correlation (0.71), and the lowest percent error (10.3%). These results suggest that CGP is well suited to accurate prediction on similar datasets and underscore the impact of advanced machine learning techniques on predictive accuracy.
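The three metrics reported by Zahedani et al. can be computed as sketched below. This assumes that "correlation" refers to the Pearson coefficient and "percent error" to the mean absolute percentage error; the study's exact definitions may differ, and the glucose values are hypothetical:

```python
import math


def rmse(pred, obs):
    """Root-mean-square error, in the units of the data (e.g. mg/dL)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def pearson_r(x, y):
    """Pearson correlation coefficient between predictions and observations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mape(pred, obs):
    """Mean absolute percentage error in percent (assumed meaning of 'percent error')."""
    return 100 * sum(abs(p - o) / o for p, o in zip(pred, obs)) / len(obs)


# Hypothetical observed and predicted glucose values (mg/dL).
observed = [100, 120, 140, 160]
predicted = [110, 115, 150, 150]
print(round(rmse(predicted, observed), 2),
      round(pearson_r(predicted, observed), 2),
      round(mape(predicted, observed), 1))
```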

4.3 Strengths and limitations

This study has several limitations. Despite a comprehensive search strategy, relevant studies may have been missed. To improve literature retrieval, we included major medical databases such as PubMed, CINAHL, and Embase, and screened the baseline models from relevant studies to minimize omissions. In addition, significant heterogeneity was observed across all subgroups, arising from factors including different types of diabetes mellitus, machine learning models, data sources, reference indices, and the timing and settings of data collection. To address this, we conducted meta-regression analyses within subgroups to explore potential sources of heterogeneity. Moreover, some studies lacked the required outcome measures or reported them inconsistently, so estimation methods were needed to calculate some indicators, which may have introduced estimation error. We considered this error acceptable because appropriate estimation methods were used and they allowed these studies to contribute to the findings. Nonetheless, future studies should report all relevant outcome measures to allow comprehensive evaluation.
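One common way to quantify the heterogeneity described above is Cochran's Q together with the I² statistic of Higgins et al. [20]. The sketch below pairs them with the DerSimonian-Laird estimate of between-study variance and a random-effects pooled mean. It illustrates the general technique and is not necessarily the estimator used in this meta-analysis; the function name and the study-level RMSEs and variances are hypothetical, and at least two studies are required:

```python
import math


def random_effects_summary(effects, variances):
    """Cochran's Q, I^2 (%), DerSimonian-Laird tau^2 and random-effects pooled mean."""
    w = [1.0 / v for v in variances]                        # inverse-variance weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                           # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]            # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {"Q": q, "I2": i2, "tau2": tau2, "pooled": pooled, "se": se_pooled}


# Hypothetical study-level mean RMSEs (mg/dL) and their variances within one subgroup.
summary = random_effects_summary([15.0, 21.0, 30.0], [1.0, 1.0, 1.0])
print({k: round(v, 2) for k, v in summary.items()})
```

For these inputs, I² is close to 100%; meta-regression then asks whether study-level covariates explain part of the between-study variance τ².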

4.4 Future directions

Improved ML models could enhance BG management for patients with DM in the future, reducing adverse BG events and improving quality of life. Future studies should prioritize improving ML model performance at longer prediction horizons (e.g., 60 minutes) and addressing imbalanced CGM data to improve model accuracy. Integrating factors such as meal intake and exercise into ML models, optimizing ensemble structures, and validating models in clinical settings are crucial steps toward BG management that supports real-time feedback and medical intervention. Leveraging IoT capabilities such as continuous monitoring and data integration could further enhance the effectiveness of these ML models in managing blood glucose levels.
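A prediction horizon determines how far ahead the training target sits relative to the input window, and the resulting targets are usually dominated by in-range values. The sketch below assumes a fixed 5-minute CGM sampling interval and a 70 mg/dL hypoglycemia threshold; the function names and the glucose trace are hypothetical:

```python
def make_horizon_samples(cgm, history, horizon_min, sample_min=5):
    """Build (input window, target) pairs for a given prediction horizon (PH).

    cgm: glucose readings (mg/dL) at a fixed interval of sample_min minutes.
    history: number of past readings used as model input.
    horizon_min: PH in minutes; the target is the reading PH minutes after
    the last reading in the window.
    """
    if horizon_min % sample_min != 0:
        raise ValueError("horizon must be a multiple of the sampling interval")
    steps = horizon_min // sample_min
    samples = []
    for end in range(history, len(cgm) - steps + 1):
        window = cgm[end - history:end]      # readings up to index end - 1
        target = cgm[end - 1 + steps]        # reading `steps` intervals later
        samples.append((window, target))
    return samples


def hypo_share(samples, threshold=70):
    """Fraction of targets in the hypoglycemic range (< threshold mg/dL)."""
    return sum(1 for _, target in samples if target < threshold) / len(samples)


# Hypothetical 5-minute CGM trace containing a brief hypoglycemic dip.
trace = [120, 115, 108, 100, 92, 85, 78, 72, 68, 66, 70, 80, 95, 110]
for ph in (15, 30, 60):
    s = make_horizon_samples(trace, history=3, horizon_min=ph)
    print(ph, len(s), round(hypo_share(s), 2) if s else None)
```

With this short trace, longer horizons leave fewer usable samples (none at 60 minutes), and only a minority of targets fall in the hypoglycemic range, which is one form of the imbalance that future models would need to handle.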

5. Conclusion

In summary, as the prediction horizon (PH) extends, the RMSE for blood glucose level prediction models increases, with Random Forest (RF) demonstrating the most robust performance among the ML models assessed. Future research should prioritize improving predictive accuracy and implementing ML models effectively in clinical settings. Additionally, exploring enhanced approaches for integrating data from IoT devices could further optimize glucose management strategies.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author contributions

All authors contributed to the study’s conception and design. Material preparation, data extraction and analysis were performed by YH and YK. The first draft of the manuscript was written by YK. All authors read and approved the final manuscript.

Data availability

The original contributions presented in the study are included in the article and its Supplementary materials; further inquiries can be directed to the corresponding author.

Supplementary data

The supplementary files are available to download from https://dx.doi.org/10.3233/THC-241403.

Footnotes

Acknowledgments

We would like to thank the senior management of Delhi Technological University for their constant support and guidance.

Conflict of interest

The authors declare no conflicts of interest or competing interests.

References

1. Saeedi, Petersohn, Salpea, Malanda, Karuranga, Unwin, et al. IDF Diabetes Atlas Committee. Global and regional diabetes prevalence estimates for 2019 and projections for 2030 and 2045: Results from the International Diabetes Federation Diabetes Atlas, 9th edition. Diabetes Res Clin Pract. 2019; 157: 107843.

2. Rowley, Bezold, Arikan, Byrne, Krohe. Diabetes 2030: Insights from yesterday, today, and future trends. Popul Health Manag. 2017; 20(1): 6-12. doi: 10.1089/pop.2015.0181.

3. Chen, Wang, Shang, Liu, et al. Development and validation of an incidence risk prediction model for early foot ulcer in diabetes based on a high evidence systematic review and meta-analysis. Diabetes Res Clin Pract. 2021; 180: 109040.

4. Guo, Guan, et al. The predictive value of diabetic retinopathy on subsequent diabetic nephropathy in patients with type 2 diabetes: A systematic review and meta-analysis of prospective studies. Ren Fail. 2021; 43(1): 231-240.

5. Farooq, Riaz, Tehseen, Farooq, Saleem. Role of Internet of things in diabetes healthcare: Network infrastructure, taxonomy, challenges, and security model. Digit Health. 2023; 9: 20552076231179056. doi: 10.1177/20552076231179056.

6. Zhuang, Jumani, Sbeih. Internet of things-assisted intelligent monitoring model to analyse the physical health condition. Technol Health Care. 2021; 29(6): 1277-1290.

7. Yang, Díaz, Kumar. Internet of things-based intelligent physical support framework using future internet of things. Technol Health Care. 2021; 29(6): 1187-1199.

8. Tang, Seetharam, Vignesh. Internet of Things-assisted intelligent monitoring model to analyze the physical health condition. Technol Health Care. 2021; 29(6): 1355-1369.

9. El-Attar, Moustafa, Awad. Deep learning model to detect diabetes mellitus based on DNA sequence. Intell Autom Soft Comput. 2022; 31: 325-338.

10. Iparraguirre-Villanueva, Espinola-Linares, Flores Castañeda, Cabanillas-Carbonell. Application of machine learning models for early detection and accurate classification of type 2 diabetes. Diagnostics (Basel). 2023; 13(14): 2383.

11. Bellemo, Lim, Rim, Tan GSW, Cheung, Sadda, et al. Artificial intelligence screening for diabetic retinopathy: The real-world emerging application. Curr Diab Rep. 2019 Jul 31; 19(9): 72.

12. Niu. Study on risk factors of peripheral neuropathy in type 2 diabetes mellitus and establishment of prediction model. Diabetes Metab J. 2021; 45(4): 526-538.

13. Afsaneh, Sharifdini, Ghazzaghi, et al. Recent applications of machine learning and deep learning models in the prediction, diagnosis, and management of diabetes: A comprehensive review. Diabetol Metab Syndr. 2022; 14: 196.

14. Contreras, Vehi. Artificial intelligence for diabetes management and decision support: Literature review. J Med Internet Res. 2018; 20(5): e10775. doi: 10.2196/10775.

15. Liu, et al. Machine learning models for blood glucose level prediction in patients with diabetes mellitus: Systematic review and network meta-analysis. JMIR Med Inform. 2023; 11: e47833. doi: 10.2196/47833.

16. Moher, Liberati, Tetzlaff, Altman, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. PLoS Med. 2009; 6(7): e1000097.

17. Liberati, Altman, Tetzlaff, Mulrow, Gøtzsche, Ioannidis JPA, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: Explanation and elaboration. BMJ. 2009; 339: b2700.

18. Huang, Lin, Demner-Fushman. Evaluation of PICO as a knowledge representation for clinical questions. AMIA Annu Symp Proc. 2006; 359-363.

19. Whiting, Rutjes AWS, Westwood, Mallett, Deeks, Reitsma, et al. QUADAS-2 Group. QUADAS-2: A revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011; 155(8): 529-536.

20. Higgins JPT, Thompson, Deeks, Altman. Measuring inconsistency in meta-analyses. BMJ. 2003; 327(7414): 557-560.

21. White. Multivariate random-effects meta-regression: Updates to Mvmeta. Stata J. 2011; 11(2): 255-270.

22. Rodríguez-Rodríguez, Campo-Valera, Rodríguez, Frisa-Rubio. Constrained IoT-Based Machine Learning for Accurate Glycemia Forecasting in Type 1 Diabetes Patients. Sensors (Basel). 2023; 23(7): 3665. doi: 10.3390/s23073665.

23. Rodríguez-Rodríguez, Campo-Valera, et al. IoMT innovations in diabetes management: Predictive models using wearable data. Expert Syst Appl. 2023; 238(Part C). doi: 10.1016/j.eswa.2023.121994.

24. Zhu, Kuang, Daniels, et al. IoMT-Enabled Real-Time Blood Glucose Prediction with Deep Learning and Edge Computing. IEEE Internet of Things Journal. 2023; 10(5): 3706-3719. doi: 10.1109/JIOT.2022.3143375.

25. Rodríguez-Rodríguez, Campo-Valera, et al. Forecasting glycaemia for type 1 diabetes mellitus patients by means of IoMT devices. Internet of Things. 2023; 24: 100945. doi: 10.1016/j.iot.2023.100945.

26. Wolff, Schaathun, et al. Mobile software development kit for real time multivariate blood glucose prediction. IEEE Access. 2024; 12: 5910-5919. doi: 10.1109/ACCESS.2024.3349496.

27. Zhu, Uduku, Herrero, Oliver, Georgiou. Enhancing self-management in type 1 diabetes with wearables and deep learning. NPJ Digit Med. 2022; 5(1): 78. doi: 10.1038/s41746-022-00626-5.

28. Tripathi, Mishra, Vasudevan. Smart diabetic prediction: An Intelligent IoT-Based Diabetic Monitoring System with Stacked Spatio Temporal Features-Based Multiscale Dilated Deep Temporal Convolutional Network. Sens Imaging. 2024; 25(2). doi: 10.1007/s11220-023-00446-1.

29. Zahedani, McLaughlin, Veluvali, et al. Digital health application integrating wearable data and behavioral patterns improves metabolic health. NPJ Digit Med. 2023; 216. doi: 10.1038/s41746-023-00956-y.

30. Azbeg, Boudhane, Ouchetto, et al. Diabetes emergency cases identification based on a statistical predictive model. J Big Data. 2022; 31(9). doi: 10.1186/s40537-022-00582-7.

31. Patel, Polsky, Small, et al. Predicting changes in glycemic control among adults with prediabetes from activity patterns collected by wearable devices. NPJ Digit Med. 2021; 4(1): 172. doi: 10.1038/s41746-021-00541-1.

32. Liu, Zhu, et al. GluNet: A deep learning framework for accurate glucose forecasting. IEEE J Biomed Health Inform. 2020; 24(2): 414-423.

33. Daniels, et al. Convolutional recurrent neural networks for glucose prediction. IEEE J Biomed Health Inform. 2020; 24(2): 603-613.

34. Martinsson, Schliep, Eliasson, Mogren. Blood glucose prediction with variance estimation using recurrent neural networks. J Healthc Inform Res. 2020; 4(1): 1-18.

35. Mohebbi, et al. Short term blood glucose prediction based on continuous glucose monitoring data. Proc 42nd Annu Int Conf IEEE Eng Med Biol Soc. 2020; 5140-5145.

36. Georga, et al. Multivariate prediction of subcutaneous glucose concentration in type 1 diabetes patients based on support vector regression. IEEE J Biomed Health Inform. 2013; 17(1): 71-81.

37. Plis, Bunescu, Marling, et al. A machine learning approach to predicting blood glucose levels for diabetes management. Proc Workshops 28th AAAI Conf Artif Intell. 2014; 35-39.

Table 1
Baseline characteristics from included studies of predicting BG levels