Abstract
Sleep disturbance occurs early in Huntington’s disease (HD). Consumer- and research-grade activity monitors may enable routine assessment of sleep disturbances in HD. We compared Actiwatch Spectrum Pro, Jawbone UP2 and Fitbit One to the gold standard, polysomnography, in four late presymptomatic and three early HD participants. Compared to polysomnography, all ambulatory monitors overestimated total sleep time by >60 minutes and sleep efficiency by ∼15%. Thus, for assessment of specific sleep parameters in HD, none of the activity monitors are sufficiently accurate to replace polysomnography, although they may be sufficient for estimating overall sleep-wake patterns. Larger sample replication is required.
INTRODUCTION
Sleep disruption is one of the earliest symptoms of Huntington’s disease (HD), emerging up to 10 years prior to diagnosis [1]. Chronic insufficient sleep in this population may contribute to cognitive impairment or decline, some neuropsychiatric symptoms [2–4], and an increased rate of neurodegeneration [5]. Sleep problems are prevalent in people with HD, and perceived by them as contributing to the disease burden [6]. Sleep dysfunction is greater in more advanced disease [1]. Ongoing sleep monitoring using non-invasive techniques has the potential to clarify how sleep disturbances advance with the disease, and how they relate to disease progression and, in particular, to cognitive function.
At present, actigraphy and consumer wearable activity monitors, such as Fitbit or Jawbone, might be the most suitable assessment tools for this task. These devices estimate sleep-wake patterns by detecting wrist movements, and can be deceived by an absence of movement during wake, or by excessive movement during sleep. The precision of actigraphic sleep estimates varies between populations and between monitor models, so validation of each specific model of actigraph, for each population, has been recommended [7]. In healthy populations, actigraphy and consumer wearables have shown reasonable validity [8–13]. In clinical samples with disrupted sleep, validity is reduced depending on the level and characteristics of sleep disturbance [14–21]. No study has yet been published that reports comparable statistics allowing the validity of actigraphy and/or consumer wearables to be assessed in HD populations.
Poor sleep quality is often present in HD populations [22] and, together with the characteristic movement disorder, underscores the importance of validating actigraphy and consumer wearables before research and/or clinical use. Therefore, we evaluated the validity of Actiwatch Spectrum Pro, Jawbone UP2 and Fitbit One, in comparison to lab-based polysomnography as the gold standard for sleep measurement, in late premanifest and early stage HD.
MATERIALS AND METHODS
Participants
Seven Huntington’s gene carriers, all Caucasian (6 females, 1 male; mean age = 54.14±6.4; mean CAG = 42.6), were recruited from the Monash University HD research volunteer database. The participants’ disease severity ranged from presymptomatic (n = 4; mean Disease Burden Score = 333.4) to early symptomatic (n = 3; mean duration of illness = 3.2 years). The Unified Huntington’s Disease Rating Scale Total Functional Capacity score ranged from 9 to 13, and the Total Motor Score ranged from 0 to 19 (see Supplementary material 1 for more clinical characteristics and medication list). Exclusion criteria included: a) current participation in clinical drug trials; b) concomitant major neurological, psychiatric, or severe medical illness; c) a history of traumatic brain injury; d) drug or alcohol abuse; e) shift work; f) travel across time zones within the previous 3 months; and g) regular consumption of >300 mg caffeine per day, or ≥4 standard alcoholic drinks in one sitting or ≥2 a day. The study was approved by the Monash University Human Research Ethics Committee. All participants provided written informed consent.
Self-reported sleep quality
Participants completed two self-report measures: the Pittsburgh Sleep Quality Index (PSQI) [23], measuring global sleep quality, and the Insomnia Severity Index (ISI) [24], measuring severity of insomnia symptoms. On average, participants reported poor sleep quality (PSQI total = 10.6±6.1), and subthreshold insomnia (ISI = 10.4±10.5).
Polysomnography
We recorded standard clinical polysomnography using Compumedics Grael High Definition (Compumedics Limited, Australia). Sleep was scored according to the American Academy of Sleep Medicine criteria [25] in 1 min epochs.
Actigraphy, Jawbone UP2 and Fitbit One
The consumer wearable monitors Jawbone UP2 and Fitbit One were set to default settings, and their companies’ specialised software provided sleep scored in 1 min epochs. To keep data comparable, the Actiwatch Spectrum Pro (Philips/Respironics, Murrysville, PA) was also set to collect data in 1 min epochs using default settings: a medium sensitivity threshold (40 counts), and a 10 min immobility rule with ≤1 epoch scored as wake for automatic scoring of sleep onset and offset.
Procedure
During habitual sleep-wake times participants underwent overnight laboratory polysomnography while wearing the Actiwatch SP, Jawbone UP2 and Fitbit One on their non-dominant wrist (see Supplementary material 6).
Data processing and analysis
We aligned the monitors’ data with polysomnography and analysed the period from lights-out until lights-on. Based on sleep-wake activity recorded by polysomnography and all monitors, we calculated the outcome sleep parameters: total sleep time – time asleep; sleep latency – time until the first 10 min of inactivity with ≤1 epoch scored as wake; sleep efficiency – percentage of sleep epochs between lights-out and lights-on; and wake after sleep onset – time awake between initial falling asleep and final awakening.
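The four definitions above can be sketched in code. The following is a minimal illustration, not the authors' processing pipeline: the function name, the 1 = sleep / 0 = wake encoding, and the requirement that the 10 min onset window begins on a sleep epoch are all assumptions made for the example, and the sample night is invented.

```python
from typing import Dict, Optional, Sequence


def sleep_parameters(
    epochs: Sequence[int],
    epoch_min: float = 1.0,
    window: int = 10,
    max_wake: int = 1,
) -> Dict[str, Optional[float]]:
    """Summarise one night scored per epoch as 1 = sleep, 0 = wake.

    `epochs` is assumed to span lights-out to lights-on (hypothetical input).
    """
    n = len(epochs)
    if n == 0:
        raise ValueError("epochs must not be empty")

    # Total sleep time: every epoch scored as sleep.
    tst = sum(epochs) * epoch_min
    # Sleep efficiency: percentage of sleep epochs in the recording period.
    efficiency = 100.0 * sum(epochs) / n

    # Sleep onset: first run of `window` epochs with <= `max_wake` wake epochs.
    # Assumption for this sketch: the run must start on a sleep epoch.
    onset = None
    for i in range(n - window + 1):
        block = epochs[i:i + window]
        if block[0] == 1 and window - sum(block) <= max_wake:
            onset = i
            break

    latency: Optional[float] = None
    waso: Optional[float] = None
    if onset is not None:
        latency = onset * epoch_min
        # Final awakening is taken as the end of the last sleep epoch.
        last_sleep = max(i for i, e in enumerate(epochs) if e == 1)
        wake_epochs = sum(1 for e in epochs[onset:last_sleep + 1] if e == 0)
        waso = wake_epochs * epoch_min

    return {
        "tst_min": tst,
        "sleep_latency_min": latency,
        "sleep_efficiency_pct": efficiency,
        "waso_min": waso,
    }


# Invented 40-epoch night: 5 min awake, 20 asleep, 3 awake, 10 asleep, 2 awake.
night = [0] * 5 + [1] * 20 + [0] * 3 + [1] * 10 + [0] * 2
params = sleep_parameters(night)
```

Applying the same function to the polysomnography hypnogram and to each monitor's epoch series yields directly comparable parameters, provided all series are aligned to the same lights-out and lights-on times.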
Statistical analyses
Estimates of sleep parameters from the ‘gold standard’ polysomnography were compared to estimates from Actiwatch, Jawbone and Fitbit using paired t-tests. We used the Bland-Altman method [26] to assess agreement between monitoring methods. We set a priori clinical agreement limits to ±30 min for total sleep time, sleep latency, and wake after sleep onset, and to ±5% for sleep efficiency [21, 27].
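As a rough illustration of this agreement analysis, the sketch below computes the Bland-Altman bias, the conventional 95% limits of agreement (bias ± 1.96 SD of the differences), the paired t statistic, and whether the bias falls within an a priori clinical limit. The function name and the input values are hypothetical and are not study data; the p-value is omitted because it would require a t-distribution routine outside the standard library.

```python
import math
import statistics
from typing import Sequence


def bland_altman(
    reference: Sequence[float],
    device: Sequence[float],
    clinical_limit: float,
) -> dict:
    """Agreement summary for paired measurements (device minus reference)."""
    if len(reference) != len(device) or len(reference) < 2:
        raise ValueError("need at least two paired observations")

    diffs = [d - r for r, d in zip(reference, device)]
    bias = statistics.mean(diffs)   # mean estimation error
    sd = statistics.stdev(diffs)    # sample SD of the differences
    # Paired t statistic: mean difference over its standard error.
    t = bias / (sd / math.sqrt(len(diffs))) if sd > 0 else math.nan

    return {
        "bias": bias,
        "sd": sd,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
        "t": t,
        "bias_within_limit": abs(bias) <= clinical_limit,
    }


# Invented total sleep times in minutes (not taken from the study).
psg_tst = [400, 380, 420, 350]
monitor_tst = [460, 450, 470, 440]
result = bland_altman(psg_tst, monitor_tst, clinical_limit=30)
```

With these invented values the bias is 67.5 min, which exceeds the ±30 min limit, so `bias_within_limit` is `False`.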
We assessed epoch-by-epoch concordance between polysomnography and all monitors by determining sensitivity, specificity, accuracy, predictive value for sleep, predictive value for wake, and the Prevalence and Bias-Adjusted Kappa (PABAK). PABAK gives balanced weight to sleep and wake epochs [28], correcting for the overrepresentation of sleep epochs. PABAK’s strength of agreement was interpreted using Landis and Koch’s guidelines [29].
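These concordance measures can be sketched as follows, treating sleep as the positive class and polysomnography as the reference. This is an illustration under stated assumptions rather than the authors' code: the function names and example data are hypothetical, PABAK for two categories is computed as 2 × accuracy − 1, and the labels follow the commonly cited Landis and Koch bands.

```python
import math
from typing import Dict, Sequence


def _ratio(num: int, den: int) -> float:
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def epoch_agreement(psg: Sequence[int], device: Sequence[int]) -> Dict[str, float]:
    """Epoch-by-epoch agreement; 1 = sleep, 0 = wake, PSG is the reference."""
    if len(psg) != len(device) or not psg:
        raise ValueError("series must be non-empty and equally long")

    tp = sum(1 for p, d in zip(psg, device) if p == 1 and d == 1)  # sleep/sleep
    tn = sum(1 for p, d in zip(psg, device) if p == 0 and d == 0)  # wake/wake
    fp = sum(1 for p, d in zip(psg, device) if p == 0 and d == 1)  # wake scored as sleep
    fn = sum(1 for p, d in zip(psg, device) if p == 1 and d == 0)  # sleep scored as wake
    n = len(psg)

    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": (tp + tn) / n,
        "pvs": _ratio(tp, tp + fp),  # predictive value for sleep
        "pvw": _ratio(tn, tn + fn),  # predictive value for wake
        "pabak": (2 * (tp + tn) - n) / n,  # equals 2 * accuracy - 1
    }


def landis_koch(kappa: float) -> str:
    """Strength-of-agreement label for a kappa-type coefficient."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Invented 20-epoch night: the device misses 3 of the 5 PSG wake epochs.
psg_epochs = [1] * 15 + [0] * 5
device_epochs = [1] * 15 + [1, 1, 1, 0, 0]
agreement = epoch_agreement(psg_epochs, device_epochs)
strength = landis_koch(agreement["pabak"])
```

The invented example reproduces the typical pattern for movement-based monitors: sensitivity for sleep is perfect while specificity for wake is only 0.4, yet accuracy remains 0.85 because sleep epochs dominate the night.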
RESULTS
Estimates of the sleep parameters by the monitors differed significantly from polysomnography, with each monitor showing similar patterns (Table 1). Actiwatch significantly overestimated total sleep time by 74 min (t = 3.60, p = 0.011, d = 1.36) and sleep efficiency by 14.8% (t = 3.54, p = 0.012, d = 1.34). Jawbone overestimated total sleep time by 78.7 min (t = 4.07, p = 0.007, d = 1.54) and sleep efficiency by 16.3% (t = 4.25, p = 0.005, d = 1.6), and underestimated wake after sleep onset by 36 min (t = –3.38, p = 0.015, d = –1.28). Fitbit overestimated total sleep time by 88.1 min (t = 4.93, p = 0.003, d = 1.9) and sleep efficiency by 17.4% (t = 4.64, p = 0.004, d = 1.8), and underestimated wake after sleep onset by 39 min (t = –3.55, p = 0.012, d = –1.34).
Table 1. Results of parametric paired comparisons of sleep parameters between PSG and monitors
Note. Significantly different compared to polysomnography at *p < 0.025, **p < 0.01 (two-tailed). Estimation errors beyond a priori clinical agreement limits are in bold. PSG – polysomnography.
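The text does not state which effect-size formula was used. As a consistency check, the reported d values match the paired-samples relation d = t/√n with n = 7 participants; the sketch below verifies this for every t and d pair reported in the Results. The relation is an inference from the numbers, not a method confirmed by the authors, and the function name is hypothetical.

```python
import math


def paired_d_from_t(t: float, n: int) -> float:
    """Cohen's d for paired samples, recovered from the paired t statistic."""
    return t / math.sqrt(n)


N_PARTICIPANTS = 7

# (monitor, parameter): (reported t, reported d), copied from the Results text.
reported = {
    ("Actiwatch", "total sleep time"): (3.60, 1.36),
    ("Actiwatch", "sleep efficiency"): (3.54, 1.34),
    ("Jawbone", "total sleep time"): (4.07, 1.54),
    ("Jawbone", "sleep efficiency"): (4.25, 1.6),
    ("Jawbone", "wake after sleep onset"): (-3.38, -1.28),
    ("Fitbit", "total sleep time"): (4.93, 1.9),
    ("Fitbit", "sleep efficiency"): (4.64, 1.8),
    ("Fitbit", "wake after sleep onset"): (-3.55, -1.34),
}

# Absolute gap between the recomputed and the reported d. Some reported
# values are rounded to one decimal, so gaps below 0.05 count as a match.
gaps = {
    key: abs(paired_d_from_t(t, N_PARTICIPANTS) - d)
    for key, (t, d) in reported.items()
}
all_consistent = all(gap < 0.05 for gap in gaps.values())
```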
Bland-Altman analyses (Supplementary materials 2-4) showed that, compared to the polysomnography gold standard, the average estimation errors (bias) of all three monitors for total sleep time and sleep efficiency fell outside the clinical agreement limits. For wake after sleep onset, this was the case only for Jawbone and Fitbit. Total sleep time was overestimated by Actiwatch in 86% of cases, and by Jawbone and Fitbit in 71% of cases; sleep efficiency was overestimated by Actiwatch, Jawbone and Fitbit in 86% of cases; wake after sleep onset was underestimated by Actiwatch in 29% of cases, and by Jawbone and Fitbit in 43% of cases. All monitors showed a trend towards larger estimation errors of sleep parameters in participants with poorer sleep, as seen in the regression lines in Supplementary materials 2, 3. Descriptively, all monitors showed slightly smaller estimation errors in the early HD group (Supplementary material 5), although all comparisons between presymptomatic and symptomatic subgroups are exploratory only, owing to the very small sample sizes.
All monitors showed high sensitivity in determining sleep, low and substantially varied specificity in identifying wake, and a satisfactory level of accuracy. PABAK scores fell at the low end of substantial agreement between PSG and all monitors (Table 2).
Table 2. Epoch by epoch agreement analyses
Note. Sensitivity – the proportion of PSG-classified sleep epochs correctly identified by actigraphy; Specificity – the proportion of PSG-classified wake epochs correctly identified by actigraphy; Accuracy – the proportion of all epochs correctly identified by actigraphy; PVS – Predictive value for sleep; PVW – Predictive value for wake; PABAK – Prevalence and Bias Adjusted Kappa. The range represents the spread of sensitivity, specificity and accuracy values across participants.
DISCUSSION
Actiwatch SP, Jawbone UP2 and Fitbit One exhibited similar patterns across sleep parameters, significantly overestimating total sleep time by >60 minutes, and sleep efficiency by ∼15%, while underestimating wake after sleep onset by >30 minutes (Table 1), thus failing to meet our pre-specified threshold for acceptable levels of clinical agreement. The only exceptions were the Actiwatch showing acceptable agreement with polysomnography on wake after sleep onset, and all monitors showing acceptable agreement with polysomnography for sleep latency, although these were still notably underestimated. The unexpected observation of marginally more precise estimation of sleep parameters by the monitors in the symptomatic subgroup (Supplementary material 5) seems to contradict the trend towards reduced monitor validity with sleep deterioration (Supplementary materials 2, 3). This subgroup had worse subjective sleep as shown by the PSQI (Supplementary material 1); however, subjective ratings are rarely good predictors of objective sleep [30]. This observation is also only descriptive in nature, owing to the small size of the subgroups.
Overall, these results are similar to other studies [14, 31], in which epoch-by-epoch comparisons with polysomnography showed consistently high sensitivity for sleep detection, and varied and low specificity for wake identification (see Table 2). PABAK showed acceptable consistency between polysomnography and each of the monitors, but only just reached the acceptable level. Accuracy, which accounts for both sleep and wake, was acceptable for participants with better levels of sleep efficiency, but dropped below acceptable levels for participants with low sleep efficiency (<72%). This suggests a reduction in the monitors’ validity with deterioration of sleep (Supplementary materials 2-4), which could have important consequences for longitudinal studies in HD, where sleep is expected to deteriorate over time.
Our small sample might not reflect the true objective sleep of the population at this stage of HD. It is unlikely, though, that sleep quality in our sample was better than average, given the disruptive nature of the “first laboratory night effect” [32] and possible insomnia side effects from medication used by two of our participants. If we captured worse than average sleep quality, then it provided a greater test for the monitors. However, our observations are consistent with patterns shown in other studies in clinical populations of similar age and clinical characteristics (i.e., overestimating total sleep time and sleep efficiency, and underestimating wake after sleep onset) [14–19, 21]. One caveat is that our sample had very little chorea, so we were unable to ascertain the impact of chorea on the accuracy of the monitors, although it is reasonable to assume that chorea will adversely affect monitor accuracy. Accuracy could potentially be improved by utilising more sensitive activity thresholds for determining wake periods on the Fitbit One and Actiwatch SP (see Parkinson’s study [14]); however, Jawbone UP2 does not provide an option for changing activity thresholds. Another option is to change the sampling rate to 30 sec epochs on Actiwatch SP, which is not available for Jawbone or Fitbit.
Overall, we demonstrated that compared to polysomnography, Actiwatch SP, Fitbit One and Jawbone UP2 produce less accurate estimates of specific sleep parameters in HD. Nonetheless, in the absence of inexpensive alternatives to polysomnography that can be widely applied in patients’ homes, consumer-grade wearables may be sufficient for overall estimations of sleep-wake patterns, and/or to assess gross level changes over time.
CONFLICT OF INTEREST
The authors have no conflict of interest to report.
ACKNOWLEDGMENTS
The authors would like to thank all the participants for their contribution to this study. The authors would also like to acknowledge the team members of the Monash University Sleep and Circadian Medicine Laboratory who provided training, support and night shift supervision. Special thanks go to Christopher Andara and Parisa Vidafar.
This research was funded by Monash University.
