Identifying Continuous Glucose Monitoring Data Using Machine Learning

Abstract

Background and Aims:

The recent increase in wearable devices for diabetes care, and in particular the use of continuous glucose monitoring (CGM), generates large data sets and associated cybersecurity challenges. In this study, we demonstrate that it is possible to identify CGM data at an individual level by using standard machine learning techniques.

Methods:

The publicly available REPLACE-BG data set (NCT02258373) containing 226 adult participants with type 1 diabetes (T1D) wearing CGM over 6 months was used. A support vector machine (SVM) binary classifier aiming to determine if a CGM data stream belongs to an individual participant was trained and tested for each of the subjects in the data set. To generate the feature vector used for classification, 12 standard glycemic metrics were selected and evaluated at different time periods of the day (24 h, day, night, breakfast, lunch, and dinner). Different window lengths of CGM data (3, 7, 15, and 30 days) were chosen to evaluate their impact on the classification performance. A recursive feature selection method was employed to select the minimum subset of features that did not significantly degrade performance.

Results:

A total of 40 features were generated as a result of evaluating the glycemic metrics over the selected time periods (24 h, day, night, breakfast, lunch, and dinner). A window length of 15 days was found to perform the best in terms of accuracy (86.8% ± 12.8%) and F1 score (0.86 ± 0.16). The corresponding sensitivity and specificity were 85.7% ± 19.5% and 87.9% ± 17.5%, respectively. Through recursive feature selection, a subset of 9 features was shown to perform similarly to the 40 features.

Conclusion:

It is possible to determine with a relatively high accuracy if a CGM data stream belongs to an individual. The proposed approach can be used as a digital CGM “fingerprint” or for detecting glycemic changes within an individual, for example during intercurrent illness.

Introduction

Cybersecurity in digital health care is a growing concern.¹ In diabetes care, wearable devices, such as continuous glucose monitoring (CGM), connected insulin pumps, and smart insulin pens, are generating large data sets of frequently sampled patient-level data.² These data are held in cloud-based storage managed by multiple providers and the use of open-source platforms, such as NightScout and OpenAPS, may be associated with a lower security standard.³ In addition, some users share their de-identified data in publicly available data sets, such as the OpenAPS Data Commons.⁴

Unintended identification of CGM data in open data sets is a data privacy concern and may have consequences for people with diabetes in insurance-based health systems.

In this study, we aim to demonstrate that it is possible to identify CGM data at an individual level by standard machine learning techniques and discuss implications for reidentification of data and other potential applications of the proposed technique.

Methods

Data

The publicly available REPLACE-BG data set (NCT02258373) was used.⁵ The IRB approval for this study was granted by Jaeb Center for Health Research (JCHR). The study was conducted between May 2015 and March 2016 in adult participants with type 1 diabetes (T1D) of >1-year duration. All participants used an insulin pump and had a hemoglobin A1c (HbA1c) of 9.0% (75 mmol/mol) or lower, with intact awareness of hypoglycemia and no recent severe hypoglycemia. After a run-in phase during which participants wore blinded CGM (Dexcom G4 Platinum; Dexcom, San Diego, CA), participants were randomized to make insulin dosing decisions using either real-time CGM only, or real-time CGM and self-monitoring blood glucose (SMBG) for 26 weeks.

CGM and SMBG data were available for all participants. Both groups used SMBG measurements to calibrate their CGM devices, as per manufacturer specifications. A total of 226 participants were included in the data set. The median (interquartile range) age was 43.0 (31.0–55.0) years, with HbA1c of 53 (49–57) mmol/mol [7.0 (6.6–7.4) %]. Additional participant characteristics according to treatment group can be found in the original publication by Aleppo et al.⁵

In this study, an additional off-line exclusion criterion was applied to restrict the analysis to participants with at least 90 days of available CGM data, where a day of data was counted only if at least 70% of its CGM measurements were available.
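As a rough illustration of this inclusion rule, the following Python sketch counts valid days for one participant. It is not the code used in the study (which was run in Matlab); the function names are hypothetical, and the 5-minute sampling interval (288 expected readings per day) is an assumption about the sensor rather than something stated above.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumption: one CGM reading every 5 minutes, i.e. 288 expected readings per day.
EXPECTED_PER_DAY = 24 * 60 // 5
MIN_DAY_COVERAGE = 0.70  # a day counts only if >= 70% of its readings exist
MIN_VALID_DAYS = 90      # off-line inclusion threshold described in the text


def count_valid_days(timestamps):
    """Count calendar days holding at least 70% of the expected readings."""
    readings_per_day = Counter(ts.date() for ts in timestamps)
    needed = MIN_DAY_COVERAGE * EXPECTED_PER_DAY
    return sum(1 for n in readings_per_day.values() if n >= needed)


def is_eligible(timestamps):
    """Apply the 'at least 90 valid days' criterion to one participant."""
    return count_valid_days(timestamps) >= MIN_VALID_DAYS


# Hypothetical example: one complete day followed by one sparse day.
start = datetime(2015, 5, 1)
complete_day = [start + timedelta(minutes=5 * i) for i in range(288)]
sparse_day = [start + timedelta(days=1, minutes=5 * i) for i in range(100)]
valid_days = count_valid_days(complete_day + sparse_day)
```

Here only the complete day passes the 70% threshold, so the participant in this toy example would be far short of the 90 valid days required.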

Assessment method

A binary classifier, a type of machine learning algorithm that is employed for classifying the elements of a set into two groups based on a classification rule, was chosen to determine if a CGM data stream belongs to an individual with T1D or not. In particular, a support vector machine (SVM) was chosen because of its ability to work with relatively small data sets and its ability to minimize overfitting.⁶

The proposed method requires CGM data from the individuals to be identified (target individuals) as well as from other subjects. A binary classifier is then trained for each one of the target individuals in the data set. Once the classifiers have been trained, each classifier is able to infer if an unseen CGM data stream of a given length belongs to a target individual.

Glycemic metrics and feature vector

The following glycemic metrics were chosen to generate the feature vector used for classification: mean blood glucose (BG), percentage time in glucose range 70–180 mg/dL (%TIR70–180), percentage times <70 and <54 mg/dL (%TB70 and %TB54), percentage times >180 and >250 mg/dL (%TA180 and %TA250),⁷ BG coefficient of variation (CV),⁸ mean absolute glucose (MAG),⁹ high blood glucose index, low blood glucose index,¹⁰ Continuous Overlapping Net Glycemic Action over an 8-h period (CONGA), and percentage of available CGM data (%CGM).¹¹
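For readers unfamiliar with these metrics, the following Python sketch shows how most of them are commonly computed from a list of CGM readings in mg/dL. It is illustrative only and is not the authors' implementation: the function names are hypothetical, a regular 5-minute sampling interval without gaps is assumed, the sample standard deviation is an assumed choice, and the risk-index formula is the commonly cited one for glucose in mg/dL.

```python
import math
from statistics import mean, stdev

SAMPLE_MINUTES = 5  # assumption: one CGM reading every 5 minutes, no gaps


def percent(glucose, condition):
    """Percentage of readings that satisfy a condition."""
    return 100.0 * sum(1 for g in glucose if condition(g)) / len(glucose)


def basic_metrics(glucose):
    """Mean BG, times in/below/above range, and CV (readings in mg/dL)."""
    avg = mean(glucose)
    return {
        "mean_bg": avg,
        "tir_70_180": percent(glucose, lambda g: 70 <= g <= 180),
        "tb70": percent(glucose, lambda g: g < 70),
        "tb54": percent(glucose, lambda g: g < 54),
        "ta180": percent(glucose, lambda g: g > 180),
        "ta250": percent(glucose, lambda g: g > 250),
        "cv": 100.0 * stdev(glucose) / avg,  # sample SD is an assumption
    }


def mag(glucose):
    """Mean absolute glucose change, in mg/dL per hour."""
    total_change = sum(abs(b - a) for a, b in zip(glucose, glucose[1:]))
    hours = (len(glucose) - 1) * SAMPLE_MINUTES / 60
    return total_change / hours


def risk_indices(glucose):
    """Low and high blood glucose indices (LBGI, HBGI)."""
    low, high = [], []
    for g in glucose:
        f = 1.509 * (math.log(g) ** 1.084 - 5.381)  # symmetrized glucose scale
        risk = 10 * f * f
        low.append(risk if f < 0 else 0.0)
        high.append(risk if f > 0 else 0.0)
    return mean(low), mean(high)


def conga(glucose, hours=8):
    """SD of the differences between readings taken `hours` apart."""
    lag = hours * 60 // SAMPLE_MINUTES
    differences = [b - a for a, b in zip(glucose, glucose[lag:])]
    return stdev(differences)
```

Each function takes a plain list of glucose values, so the same code can be applied to the full 24-h stream or to a subset restricted to a given time period. `conga` needs more than 8 h of readings to produce at least two differences.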

With the exception of %CGM, which was only evaluated on the 24-h period, all the other metrics were evaluated for the 24-h period, the day period (8 am–midnight), and the night period (midnight–7 am). Finally, the mean postprandial glucose peak (MAXBG) and its corresponding mean time (TMAXBG) were evaluated for the following postprandial periods: breakfast (5 am–12 pm), lunch (12 pm–5 pm), and dinner (5 pm–midnight). A moving average filter was applied to smooth the MAXBG and TMAXBG metrics.
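The postprandial metrics can be sketched as follows. This is a hypothetical reading of MAXBG and TMAXBG, not the authors' code: it takes the highest reading inside each meal window on each day and averages the peak values and the peak times across days. Expressing the peak time in hours since midnight is an assumption, and the moving average smoothing mentioned above is omitted.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Meal windows from the text, as (start hour, end hour) with the end excluded.
MEAL_WINDOWS = {"breakfast": (5, 12), "lunch": (12, 17), "dinner": (17, 24)}


def postprandial_peak(readings, meal):
    """Return (mean peak glucose, mean hour of peak) for one meal window.

    `readings` is a list of (datetime, glucose in mg/dL) pairs.
    Returns None when no reading falls inside the window.
    """
    start_hour, end_hour = MEAL_WINDOWS[meal]
    by_day = defaultdict(list)
    for ts, glucose in readings:
        if start_hour <= ts.hour < end_hour:
            by_day[ts.date()].append((glucose, ts))
    if not by_day:
        return None

    peaks, peak_hours = [], []
    for day_readings in by_day.values():
        glucose, ts = max(day_readings, key=lambda r: r[0])  # daily peak
        peaks.append(glucose)
        peak_hours.append(ts.hour + ts.minute / 60)  # hours since midnight
    return mean(peaks), mean(peak_hours)


# Hypothetical two-day example.
example = [
    (datetime(2015, 5, 1, 7, 0), 150),
    (datetime(2015, 5, 1, 8, 30), 210),
    (datetime(2015, 5, 1, 13, 0), 250),
    (datetime(2015, 5, 2, 6, 0), 190),
    (datetime(2015, 5, 2, 9, 30), 170),
]
breakfast_peak = postprandial_peak(example, "breakfast")
```

In this toy example the breakfast peaks are 210 mg/dL at 8:30 and 190 mg/dL at 6:00, so the averaged peak is 200 mg/dL at 7.25 h.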

Training and testing data sets

To generate the training and testing data sets, each individual CGM data stream was partitioned into multiple smaller data packets using a nonoverlapping sliding window of a predefined length (e.g., 15 days). The use of a nonoverlapping sliding window ensures that no CGM measurements are shared between the resulting data packets.

CGM data packets containing <70% of the expected CGM measurements were excluded from the analysis. The feature vector was then evaluated on each of the resulting data packets to generate the datapoints used for training and testing the classifiers. Each one of the generated datapoints included the subject identifier that was later used to generate the labels for classification. Figure 1 displays in a graphical way the steps followed for obtaining a datapoint.

FIG. 1.

An illustration of the employed steps for the generation of a datapoint.
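The windowing step described above can be sketched in Python as follows. This is a minimal illustration rather than the study code: the function name is hypothetical, windows are counted from the first available reading (an assumption), and a 5-minute sampling interval is assumed when computing the expected number of measurements.

```python
from datetime import datetime, timedelta

SAMPLES_PER_DAY = 288  # assumption: one CGM reading every 5 minutes


def split_into_packets(readings, window_days, min_coverage=0.70):
    """Partition (datetime, glucose) readings into nonoverlapping windows.

    Windows start at the first reading. Packets holding fewer than
    `min_coverage` of the expected measurements are dropped.
    """
    if not readings:
        return []
    origin = readings[0][0]
    window = timedelta(days=window_days)

    packets = {}
    for ts, glucose in readings:
        index = (ts - origin) // window  # each reading lands in one window
        packets.setdefault(index, []).append((ts, glucose))

    expected = window_days * SAMPLES_PER_DAY
    return [
        packet
        for _, packet in sorted(packets.items())
        if len(packet) >= min_coverage * expected
    ]


# Hypothetical example: three complete days split into 1-day packets.
t0 = datetime(2015, 5, 1)
stream = [(t0 + timedelta(minutes=5 * i), 100) for i in range(3 * 288)]
packets = split_into_packets(stream, window_days=1)
```

Because every reading is assigned to exactly one window index, no measurement is shared between packets. The feature vector would then be evaluated once per surviving packet.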

For each subject, 70% of the resulting datapoints were randomly selected and introduced into an auxiliary training data set also containing data from other individuals. Similarly, the remaining 30% of datapoints were introduced into an auxiliary testing data set. Therefore, the resulting auxiliary training and testing data sets retain the 70%–30% proportion, respectively.

To train the individual model, all the datapoints from the auxiliary training data set that correspond to the target individual are retrieved and their label is set to “positive” to indicate that they belong to the positive class (a target individual). Then, we randomly pick an equal number of datapoints from the auxiliary training data set corresponding to other subjects, and we set their label (or class) to “negative” to indicate that they do not belong to the target individual. This process guarantees that the number of positive and negative datapoints in the individual training set are the same, hence avoiding problems due to class imbalance.¹²

The individual testing set was generated in the same way as the individual training set, with one exception: nontarget individuals whose datapoints were used in the individual training set were excluded from the individual testing set, so that the nontarget participants in the training and testing phases are different. This constraint emulates real-world conditions, where subjects in the training phase and the deployment phase might differ.
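The construction of the individual training and testing sets can be sketched in Python as follows. This is one possible reading of the procedure, not the authors' code: all function names are hypothetical, datapoints are represented as `(subject_id, features)` pairs, and labels are encoded as 1 (target) and 0 (nontarget).

```python
import random


def split_per_subject(datapoints, train_fraction=0.7, seed=0):
    """Randomly split each subject's datapoints into 70% train, 30% test."""
    rng = random.Random(seed)
    by_subject = {}
    for subject, features in datapoints:
        by_subject.setdefault(subject, []).append(features)

    train, test = {}, {}
    for subject, items in by_subject.items():
        items = items[:]
        rng.shuffle(items)
        cut = round(train_fraction * len(items))
        train[subject], test[subject] = items[:cut], items[cut:]
    return train, test


def balanced_set(pool, target, rng, excluded=()):
    """Pair the target's datapoints with an equal number of negatives.

    Returns the labeled set and the nontarget subjects that were drawn.
    Raises ValueError if too few nontarget datapoints are available.
    """
    positives = [(x, 1) for x in pool[target]]
    candidates = [
        (x, s)
        for s, xs in pool.items()
        if s != target and s not in excluded
        for x in xs
    ]
    chosen = rng.sample(candidates, len(positives))
    negatives = [(x, 0) for x, _ in chosen]
    return positives + negatives, {s for _, s in chosen}


def build_individual_sets(train, test, target, seed=0):
    """Build balanced train and test sets for one target individual."""
    rng = random.Random(seed)
    train_set, used_subjects = balanced_set(train, target, rng)
    # Nontarget subjects seen in training are excluded from testing.
    test_set, _ = balanced_set(test, target, rng, excluded=used_subjects)
    return train_set, test_set
```

Fixing `seed` makes the split and the negative sampling reproducible, which matters when comparing window lengths or feature subsets on the same partition.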

Figure 2 displays in a graphical way the proposed methodology for training and testing the individual binary classifiers.

FIG. 2.

Graphical representation of the methodology employed for training and testing an individual SVM binary classifier. SVM, support vector machine.

Window length selection

Durations of 3, 7, 15, and 30 days were selected to evaluate the impact of the length of the nonoverlapping time window. The lengths are based on clinical criteria, including the duration of CGM sensors (10–14 days), as well as machine learning criteria of ensuring sufficient data to train and test the individual classifiers. Furthermore, there is consensus about the number of days that are required for robust estimation of some of the chosen glycemic metrics.¹³–¹⁵ However, there is evidence that indicates that for other metrics, such as those used to report hypoglycemia, a longer duration is required.¹⁶

Feature selection and elimination

In binary classification, adding more features is not always associated with better performance due to overfitting.¹⁷ To assess the performance of the classifier when certain features are eliminated, we applied recursive feature elimination, dropping one feature at a time, and eliminating the one that has the least impact on classification performance. For this purpose, the F1 score metric was employed. This process was then recursively repeated until one feature was left.
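The elimination loop described above can be sketched in Python as follows. This is a generic backward-elimination sketch under stated assumptions, not the study implementation: `score_fn` is a hypothetical callback that would train and test the classifiers on a given feature subset and return the mean F1 score.

```python
def recursive_feature_elimination(features, score_fn):
    """Drop one feature at a time until a single feature is left.

    At each step, every remaining feature is tentatively removed and the
    one whose removal leaves the highest score (least impact) is dropped.
    Returns a list of (remaining features, score) pairs, one per step.
    """
    remaining = list(features)
    history = [(tuple(remaining), score_fn(remaining))]
    while len(remaining) > 1:
        best_subset, best_score = None, float("-inf")
        for candidate in remaining:
            subset = [f for f in remaining if f != candidate]
            score = score_fn(subset)
            if score > best_score:
                best_subset, best_score = subset, score
        remaining = best_subset
        history.append((tuple(remaining), best_score))
    return history


# Toy scoring function standing in for "train classifiers, return mean F1".
toy_weights = {"a": 3, "b": 2, "c": 1}
toy_history = recursive_feature_elimination(
    ["a", "b", "c"], lambda subset: sum(toy_weights[f] for f in subset)
)
```

With 40 starting features this loop evaluates the scoring function several hundred times, so in practice each evaluation has to be inexpensive or cached.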

For comparison of the impact of the different window lengths, and of the recursive feature elimination, the random seeds used to generate the training and testing sets were fixed.

Data analysis

To evaluate the performance of the classifiers, the following commonly employed statistical measures were selected: sensitivity, specificity, precision, accuracy, and F1 score. In addition, the average sizes of the individual training and testing sets were reported for the different time window lengths. Results are reported as mean ± standard deviation (SD). Experiments and data analysis were performed in Matlab 2020b (Mathworks, Inc., Natick, MA) including the Machine Learning and Statistical toolbox.
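For reference, the five measures follow directly from the counts of true and false positives and negatives of one classifier. The Python sketch below uses the standard definitions; the function name is hypothetical, and returning NaN for an undefined ratio is an assumed convention.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion counts."""

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "precision": ratio(tp, tp + fp),    # positive predictive value
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of P and R
    }


# Hypothetical counts for one individual classifier.
example_metrics = classification_metrics(tp=8, fp=1, tn=9, fn=2)
```

For these hypothetical counts, sensitivity is 0.80, specificity is 0.90, and accuracy is 0.85. The population-level results would then be the mean ± SD of each measure across the individual classifiers.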

Results

Out of 226 subjects in the REPLACE-BG data set, 205 subjects were left after applying the off-line exclusion criteria.

An SVM with a Radial Basis Function (RBF) kernel and standardization provided the optimal results when compared with the polynomial and linear kernels and was used throughout. The selected glycemic metrics evaluated for the different time periods (i.e., 24 h, day, night, breakfast, lunch, and dinner) generated a total of 40 features.
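The total of 40 features can be checked by enumerating the metric and period combinations described in the Methods: 11 metrics over three periods, %CGM over 24 h only, and two postprandial metrics over three meals. The short Python sketch below does this; the feature names are hypothetical labels chosen for illustration.

```python
# Metrics evaluated on the 24-h, day, and night periods (all except %CGM).
PERIOD_METRICS = [
    "mean_bg", "tir_70_180", "tb70", "tb54", "ta180", "ta250",
    "cv", "mag", "hbgi", "lbgi", "conga",
]
PERIODS = ["24h", "day", "night"]

# Postprandial metrics evaluated on each meal window.
MEAL_METRICS = ["maxbg", "tmaxbg"]
MEALS = ["breakfast", "lunch", "dinner"]


def feature_names():
    """List every feature as '<metric>_<period>'."""
    names = [f"{m}_{p}" for m in PERIOD_METRICS for p in PERIODS]  # 11 x 3
    names.append("pct_cgm_24h")                                    # + 1
    names += [f"{m}_{meal}" for m in MEAL_METRICS for meal in MEALS]  # 2 x 3
    return names


all_features = feature_names()
```

This gives 33 + 1 + 6 = 40 features, matching the count reported above.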

Table 1 reports the average performance (mean ± SD) of the binary classifiers for the 40 selected features and the window lengths of 3, 7, 15, and 30 days. It also includes the average size of training and testing sets for each one of the time window lengths.

Table 1.

Average Performance of the Classifiers When Accounting for the 40 Features for Different Time Window Lengths

Length (days)	Sensitivity (%)	Specificity (%)	Precision (%)	Accuracy (%)	F1 score	Training set size	Testing set size
3	75.3 ± 12.1	75.1 ± 12.9	75.8 ± 10.6	75.2 ± 9.9	0.75 ± 0.10	106.8 ± 11.1	45.8 ± 4.9
7	82.5 ± 15.6	80.8 ± 14.3	82.0 ± 12.4	81.7 ± 11.2	0.81 ± 0.12	45.6 ± 5.3	19.2 ± 2.2
15	85.7 ± 19.5	87.9 ± 17.5	89.9 ± 13.9	86.8 ± 12.8	0.86 ± 0.16	20.9 ± 2.7	8.8 ± 1.2
30	88.2 ± 26.4	78.7 ± 29.7	84.7 ± 21.7	83.5 ± 20.6	0.84 ± 0.22	10.3 ± 1.8	3.9 ± 0.6

Overall, the classifier using a window length of 15 days performs best in terms of accuracy and F1 score. Hence, the window length of 15 days was selected for the subsequent analysis. Another reason for choosing 15 days rather than 30 days is that the 30-day window yields considerably smaller training and testing sets (i.e., 10.3 ± 1.8 and 3.9 ± 0.6 datapoints, respectively), which increases the risk of overfitting.

Figure 3 shows the evolution of the F1 score metric (mean ± SD) when applying the recursive feature elimination method. Mean F1 score remains virtually constant (∼0.86) up to the elimination of 31 features. From 32 features onward the mean performance of the classifiers is significantly reduced. The features left after eliminating the 31 least important features are %TB54^24h, %TA180^night, CV^night, MAG^night, CV^day, MAG^day, MAXBG^breakfast, MAXBG^lunch, and MAXBG^dinner, where the superscripts “24h,” “day,” “night,” “breakfast,” “lunch,” and “dinner” indicate the time interval on which the metric is evaluated.

FIG. 3.

F1 score (mean ± SD) versus the number of features eliminated by the recursive feature elimination method. SD, standard deviation.

Discussion

This study demonstrates that by extracting nine commonly used CGM metrics it is possible to determine with a relatively high accuracy (>86%) if a CGM data set of 15 days belongs to an individual with T1D. This finding suggests that CGM data may be a personal “glucoprint,” with potential for future clinical implementation. However, there may also be significant cybersecurity implications as CGM data become identifiable, even without sex, date of birth, or other traceable identifiers. One potential way to protect individuals against reidentification from CGM data sets is traditional encryption, although this may be too computationally expensive for time series.

It may be possible to add noise, for example, by jumbling days, or applying a transformation such as a wavelet transform, as used for electrocardiogram (ECG) security.¹⁸ Multilevel clustering-based anonymization has been used for physical activity data.¹⁹ These techniques may preserve times in ranges while features identifying individuals are weakened. More complex methodologies for data security have been described, including generative adversarial networks.²⁰

It may also be that a glucoprint can itself be used to increase security, for example, by pairing an automated insulin delivery controller or an alternate controller-enabled pump with an individual glucoprint. It could also be feasible to link a glucoprint to an electronic health record to securely link individual data across platforms. Given the issues raised, it may be that further efforts are needed to break up publicly available data sets.

Some of the employed CGM metrics, and in particular the ones used to report hypoglycemia, may need CGM time windows of >30 days to report reliable results.¹⁶ Although a window length of 15 days has been chosen to carry out the feature selection process, a longer window might improve the machine learning if sufficient data are available. Of the CGM metrics identified, MAXBG and MAG are consistently in the nine best features, even when the experiments are repeated for different random seeds. However, there is some randomness in the selection of the other selected features and there may be some redundancy in the employed metrics.

MAG has been associated with important clinical outcomes⁹ and is a good discriminator of glucose variability.²¹ MAXBG as used in these data has not been associated with any clinical outcome but may be similar to any assessment of glucose response to a carbohydrate load, as used in the diagnosis of diabetes. It may also reflect individual meal behaviors; further data are required to discriminate this.

The reported results correspond to population averages so it is possible that the model for an individual subject would perform less well, making some individuals less readily identifiable.

One limitation of this study is that the individual data sets used for training and testing are relatively small. However, the data sets are balanced, which mitigates this issue, and the SVM approach is robust with smaller data sets.

The proposed approach can be used not only as a digital CGM “glucoprint” to distinguish an individual's data from other subjects' data, but also for detecting glycemic changes within the same individual, such as during acute illness or psychological stress. This information can then be used to alert the user and health care team to ensure proactive management.

This study has only considered clinical glycemic metrics to generate the feature vector used in the classification problem; however, there are other types of features, such as signal processing-based features,²² which can be employed. The analysis of other types of features will be the subject of future study, and further analyses in data sets from people at high risk of hypoglycemia and in type 2 diabetes will be important.

Finally, other biomedical signals (such as meals, insulin, and exercise data) can be considered in the classification and potentially improve the machine learning performance. Furthermore, the proposed methodology can be applied to other types of continuous medical data (e.g., ECG, EEG).

Footnotes

Disclaimer

The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care.

Acknowledgment

Infrastructure support was provided by the NIHR Imperial Biomedical Research Centre.

Author Disclosure Statement

N.S.O. has received research funding from Dexcom and Roche Diabetes, and is a member of advisory boards for Dexcom, Roche Diabetes, and Medtronic Diabetes.

Funding Information

No funding was received for this article.

References

1. Ghafur, Grass, Jennings, Darzi: The challenges of cybersecurity in health care: the UK National Health Service as a case study. Lancet Digit Health, 2019; 1:e10–e12.

2. Galindo, Aleppo: Continuous glucose monitoring: the achievement of 100 years of innovation in diabetes technology. Diabetes Res Clin Pract, 2020; 170:108502.

3. …, Borst, Garrity, et al.: Evolution of do-it-yourself remote monitoring technology for type 1 diabetes. J Diabetes Sci Technol, 2020; 14:854–859.

4. Kariotis, Ball, Greshake Tzovaras, et al.: Emerging health data platforms: from individual control to collective data governance. Data Policy, 2020; 2:e13.

5. Aleppo, Ruedy, Riddlesworth, et al.: REPLACE-BG: a randomized trial comparing continuous glucose monitoring with and without routine blood glucose monitoring in adults with well-controlled type 1 diabetes. Diabetes Care, 2017; 40:538–545.

6. Noble: What is a support vector machine? Nat Biotechnol, 2006; 24:1565–1567.

7. Maahs, Buckingham, Castle, et al.: Outcome measures for artificial pancreas clinical trials: a consensus report. Diabetes Care, 2016; 39:1175–1179.

8. Service: Glucose variability. Diabetes, 2013; 62:1398–1404.

9. Hermanides, Vriesendorp, Bosman, et al.: Glucose variability is associated with intensive care unit mortality. Crit Care Med, 2010; 38:838–842.

10. Kovatchev, Straume, Cox, Farhy: Risk analysis of blood glucose data: a quantitative approach to optimizing the control of Insulin Dependent Diabetes. J Theor Med, 2000; 3:1–10.

11. McDonnell, Donath, Vidmar, et al.: A novel approach to continuous glucose analysis utilizing glycemic variation. Diabetes Technol Ther, 2005; 7:253–263.

12. Krawczyk: Learning from imbalanced data: open challenges and future directions. Prog Artif Intell, 2016; 5:221–232.

13. Xing, Kollman, Beck, et al.: Optimal sampling intervals to assess long-term glycemic control using continuous glucose monitoring. Diabetes Technol Ther, 2011; 13:351–358.

14. Riddlesworth, Beck, Gal, et al.: Optimal sampling duration for continuous glucose monitoring to determine long-term glycemic control. Diabetes Technol Ther, 2018; 20:314–316.

15. Camerlingo, Vettoretti, Sparacino, et al.: Design of clinical trials to assess diabetes treatments: minimal duration of continuous glucose monitoring data to estimate time-in-ranges with a desired precision. Diabetes Obes Metab, 2021; 23:2446–2454.

16. Herrero, Alalitei, Reddy, et al.: Robust determination of the optimal continuous glucose monitoring length of intervention to evaluate long-term glycaemic control. Diabetes Technol Ther, 2020; 23:19–22.

17. Subramanian, Simon: Overfitting in prediction models—is it a problem only in high dimensions? Contemp Clin Trials, 2013; 36:636–641.

18. Kumar, Ranganatham, Singh, et al.: A robust digital ECG signal watermarking and compression using biorthogonal wavelet transform. Res Biomed Eng, 2021; 37:79–85.

19. Parameshwarappa, Chen, Koru: An effective and computationally efficient approach for anonymizing large-scale physical activity data. Int J Inf Secur Priv, 2020; 14:72–94.

20. Yoon, Drumright, Van Der Schaar: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J Biomed Health Inform, 2020; 24:2378–2388.

21. Moscardó, Herrero, Reddy, et al.: Assessment of glucose control metrics by discriminant ratio. Diabetes Technol Ther, 2020; 44:1–44.

22. Guemes, Cappon, Hernandez, et al.: Predicting quality of overnight glycaemic control in type 1 diabetes using binary classifiers. IEEE J Biomed Health Inform, 2020; 24:1439–1446.