Abstract
Background:
The Huntington’s Disease Integrated Staging System (HD-ISS) has four stages that characterize disease progression. Classification is based on CAG length as a marker of Huntington’s disease (Stage 0), striatum atrophy as a biomarker of pathogenesis (Stage 1), motor or cognitive deficits as HD signs and symptoms (Stage 2), and functional decline (Stage 3). One issue for implementation is the possibility that not all variables are measured in every study, and another issue is that the stages are broad and may benefit from progression subgrouping.
Objective:
Impute stages of the HD-ISS for observational studies in which missing data precludes direct stage classification, and then define progression subgroups within stages.
Methods:
A machine learning algorithm was used to impute stages. Agreement of the imputed stages with the observed stages was evaluated using graphical methods and propensity score matching. Subgroups were defined based on descriptive statistics and optimal cut-point analysis.
Results:
There was good overall agreement between the observed stages and the imputed stages, but the algorithm tended to over-assign Stage 0 and under-assign Stage 1 for individuals who were early in progression.
Conclusion:
There is evidence that the imputed stages can be treated similarly to the observed stages for large-scale analyses. When imaging data are not available, imputation can be avoided by collapsing the first two stages using the categories of Stage≤1, Stage 2, and Stage 3. Progression subgroups defined within a stage can help to identify groups of more homogeneous individuals.
INTRODUCTION
The Huntington’s Disease Integrated Staging System (HD-ISS) [1] is an evidence-based framework intended to facilitate clinical research and interventional studies at points earlier in the disease course than previously considered. The HD-ISS characterizes disease progression from birth onward using four stages. In Stage 0, individuals have the Huntington’s disease genetic mutation (CAG≥40) without any detectable pathological alterations. Stage 1 is marked by measurable underlying pathophysiology as indicated by striatal atrophy. Stage 2 indicates the appearance of HD signs and symptoms (motor or cognitive), and Stage 3 reflects functional decline.
Stage classification depends on the pattern of meeting threshold criteria for landmark variables. Each variable has a cut-off threshold; if either measured landmark variable surpasses the established cut-off threshold, then an individual meets the criteria for a stage. Final classification is based on the highest stage criteria reached, provided this is achieved in the order consistent with the HD-ISS (otherwise, classification is undefined).
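The classification rule above can be sketched in a few lines. This is only an illustration, assuming the four stage criteria have already been evaluated against the published thresholds and coded as indicators (1 = met, 0 = not met, None = not evaluable); the function name and the coding are hypothetical, not part of the HD-ISS itself.

```python
from typing import Optional, Sequence


def classify_stage(criteria: Sequence[Optional[int]]) -> Optional[int]:
    """Return the stage (0-3) implied by indicators (C0, C1, C2, C3).

    Each indicator is 1 if the stage criteria are met, 0 if not, and None
    if a landmark needed to evaluate it is missing (assumed coding).
    Returns None when no stage can be assigned directly.
    """
    if len(criteria) != 4:
        raise ValueError("expected four indicators: C0, C1, C2, C3")
    if any(c is None for c in criteria):
        return None  # inconclusive: at least one criterion is not evaluable
    if criteria[0] != 1:
        return None  # Stage 0 criterion not met
    # A compatible pattern is a run of 1s followed only by 0s (e.g. 1, 1, 1, 0).
    stage = 0
    for c in criteria[1:]:
        if c != 1:
            break
        stage += 1
    if any(c == 1 for c in criteria[stage + 1:]):
        return None  # a later criterion is met out of order: undefined
    return stage
```

For instance, the pattern (1, 1, 1, 0) yields Stage 2, whereas (1, 0, 1, 0) is returned as undefined because the Stage 2 criteria are met without the Stage 1 criteria.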
The landmark for Stage 0 is CAG length, with a threshold of 40 or greater, which is based on current penetrance evidence [1]. The landmarks for Stage 1 are caudate and putamen volume, corrected for total intra-cranial volume. The landmarks for Stage 2 are the UHDRS [2] Total Motor Score (TMS) and Symbol Digit Modalities Test (SDMT; education is factored into the SDMT thresholds). Finally, the landmarks for Stage 3 are Total Functional Capacity (TFC) and the Independence Scale (IS).
For Stages 1–3, surpassing the threshold for either variable or both fulfills the criteria. These thresholds depend on age, but not on CAG length, having been derived from data of non-HD controls (CAG < 36). Tabled threshold values appear in the appendix of the original paper [1], and a web-based tool is available for staging based on user input (https://enroll-hd.org/HD-ISS-Calculator/).
The HD-ISS is intended to be applied cross-sectionally. The established landmark thresholds demarcate the extreme values derived from the non-HD control population, rather than signaling a within-person shift from previous visits. Therefore, stage criteria are based on the level of the landmark variables at a particular visit, and not based on change or the rate of change over visits. To aid the use of the HD-ISS in clinical research, we address two challenges of implementation. First, precise staging requires that all the landmark variables be available. The largest active natural history study of HD, Enroll-HD [3], does not collect imaging. Because Enroll-HD is widely used to study natural history and to plan clinical trials, it would be beneficial to apply the HD-ISS to Enroll-HD.
Second, the HD-ISS defines stage boundaries, but it does not provide specific information regarding subgroups of progression within a stage. Individuals who recently entered a stage might be changing at a slower rate compared to individuals who will soon exit the stage. While the stages broadly categorize the phase of disease progression, identifying a progression subgroup within a stage can help researchers to define a more homogeneous subpopulation. This information can be used as the basis of prognostic enrichment strategies for improving interventional studies.
Both challenges are addressed in this paper. We impute HD-ISS stages for study visits that have missing landmark variables (for example, imaging in Enroll-HD) or that have patterns of variables that are incongruent with the system. Then, we map the progression indices, such as the CAG-age product (CAP) [4] and the HD prognostic index normed (PIN) [5], to the HD-ISS and define subgroups of progression within stages. The results are discussed in the context of clinical trial planning with an emphasis on using Enroll-HD data.
MATERIALS AND METHODS
Participants
Four data sets were used in the analysis: Enroll-HD [3] (fifth periodic data set), IMAGE-HD [6], PREDICT-HD [7], and TRACK-HD/ON [8]. Study sites were required to obtain and uphold local ethics committee approvals and all participants gave written informed consent that included the distribution of coded data for research purposes. The Enroll-HD data set was obtained in December 2020, and the remaining data sets were obtained in February 2020. The data sets are publicly available (https://www.enroll-hd.org/).
Inclusion criteria for the analysis were CAG 40–50 and age 18 or older at the first visit. This resulted in a total sample size of N = 15338 with 41194 repeated visits. The number of participants (and visits) per study was 14190 (37513) for Enroll-HD, 70 (183) for IMAGE-HD, 915 (2353) for PREDICT-HD, and 273 (1145) for TRACK-HD/ON. The mean follow-up time was 3.62 years (SD = 1.68), with 31% of the overall sample having only one visit and 69% having 2–10 visits.
Measures
Imaging was performed in all the studies except Enroll-HD. Caudate and putamen volumes were obtained from segmentation using the recon-all pipeline of FreeSurfer version 6 (see the HD-ISS paper supplementary appendix [1] for more details). The volume of each structure was divided by total intra-cranial volume (ICV) to adjust for head size. In addition to the UHDRS variables of TMS, SDMT, TFC, and IS, the Diagnostic Confidence Level (DCL) and the Stroop Word Test (SWR) were used. DCL = 4 (the highest rating) is defined as clinical motor diagnosis in research contexts [9]. TFC and IS were not collected in IMAGE-HD. Education was treated as binary, with “low” being the UNESCO ISCED 1997 classification 0–3 and “high” being 4–6 [10].
Progression indices included the two versions of CAP from the HD literature. The first version [4] was computed as CAP = age × (CAG − 33.66). To provide context, CAP = 413 was associated with a 50% probability of clinical motor diagnosis (DCL = 4) for the data considered in this study (results not presented). The second version [11] was computed as CAP100 = age × (CAG − 30) ÷ 6.49. CAP100 is named as such because the expected age of clinical HD diagnosis as reported in Enroll-HD (hddiagn) is associated with a score of 100. We also considered PIN [5], computed as a weighted combination of TMS, SDMT, and CAP, PIN = (51 × TMS − 34 × SDMT + 7 × age × (CAG − 34) − 883) ÷ 1044. PIN = 0 indicates that if a hypothetical HD cohort started with this value at baseline, then 50% would be predicted to reach a rating of DCL = 4 within 10 years. PIN < 0 indicates it would take longer than 10 years (the cohort is farther from clinical motor diagnosis), and PIN > 0 indicates it would take less than 10 years (the cohort is closer to clinical motor diagnosis). For participants who had repeated visits, the progression scores were time-varying.
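As a convenience, the three formulas can be written as small functions. The sketch below simply transcribes the expressions given in the text; the function and argument names are illustrative, and the inputs are assumed to be age in years, CAG repeat length, and raw TMS and SDMT scores from the same visit.

```python
def cap(age: float, cag: float) -> float:
    """CAG-age product, first version: age x (CAG - 33.66)."""
    return age * (cag - 33.66)


def cap100(age: float, cag: float) -> float:
    """Scaled CAG-age product: age x (CAG - 30) / 6.49."""
    return age * (cag - 30) / 6.49


def pin(tms: float, sdmt: float, age: float, cag: float) -> float:
    """Normed prognostic index from TMS, SDMT, age, and CAG."""
    return (51 * tms - 34 * sdmt + 7 * age * (cag - 34) - 883) / 1044


# Hypothetical visit: age 40, CAG 42, TMS 5, SDMT 50.
example = {
    "CAP": cap(40, 42),
    "CAP100": cap100(40, 42),
    "PIN": pin(5, 50, 40, 42),
}
```

Because age changes between visits, recomputing these functions at each visit gives the time-varying scores described above.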
Statistical analysis
Missing data for each visit was singly imputed using the machine learning algorithm MissForest [12], which uses random forest [13] with chained equations [14]. Consistent with the cross-sectional nature of the HD-ISS, all visits were used to build the imputation model, and there was no explicit modeling of the repeated measurements nested within participants. MissForest imputes on a variable-by-variable basis with each incomplete variable acting as the outcome and using all other variables in the imputation model as predictors. Additional details are provided in the Supplementary Material.
The stages compatible with the HD-ISS in PREDICT-HD and TRACK-HD/ON were not imputed, but rather treated as “ground truth” for training of the algorithm. Imputation was based on the landmark variables and education, age, DCL, SWR, and sex (though the last three had negligible predictive power). Because MissForest cycles through each incomplete variable, imputation was performed for all variables with missing values, not just the HD-ISS stage.
Graphical procedures were used to examine the imputation results, with an emphasis on the extent of agreement among the observed and imputed values [15, 16]. In order to depict HD participants with a range of CAG lengths in the same graph, we mimicked previous approaches [7, 17] in using CAP100 as the time metric for most graphing, which can be thought of as age adjusted for CAG expansion.
To account for the pre-existing differences among the studies, a participant from PREDICT-HD or TRACK-HD/ON who had an observed stage was matched to a participant from Enroll-HD who had an imputed stage (1:1 matching). The goal was to balance several of the observed variables (with no missingness) common among the studies and then compare the observed and imputed stages among the matched samples. All the variables from the imputation analysis with complete data were used for the matching, which excluded the imaging variables. SWR was also excluded because of a relatively high rate of missingness. Exact matching was used for CAG length, and propensity score matching [18] was used for age, TMS, SDMT, TFC, DCL, education, and sex. A caliper was applied for age and TMS to help ensure similar means and variances among the groups. To evaluate the (dis)agreement of the observed and imputed stages, we arbitrarily created 100 bins of CAP100 and computed the proportion of individuals in a stage for each bin.
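The binning step at the end of this procedure can be illustrated as follows. This is a minimal sketch that assumes equal-width bins spanning the observed CAP100 range (the text does not specify how the bin edges were chosen); the function name and the input layout are hypothetical.

```python
from collections import Counter


def stage_proportions_by_bin(cap100_values, stages, n_bins=100):
    """Map each non-empty bin midpoint to {stage: proportion of visits}.

    Assumes equal-width bins over the observed CAP100 range.
    """
    lo, hi = min(cap100_values), max(cap100_values)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("CAP100 values must not all be identical")
    counts = {}
    for x, stage in zip(cap100_values, stages):
        # The maximum value is placed in the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts.setdefault(idx, Counter())[stage] += 1
    result = {}
    for idx, counter in sorted(counts.items()):
        total = sum(counter.values())
        midpoint = lo + (idx + 0.5) * width
        result[midpoint] = {s: n / total for s, n in counter.items()}
    return result
```

Computing these proportions separately for the observed and the matched imputed samples gives the two sets of points that can then be smoothed and compared.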
Finally, to define progression subgroups, the distributions of the progression indices (PIN, CAP100, CAP) were examined by stage for the combined sample (observed and imputed data). To address the overlap of these distributions among stages, optimal cut-point analysis was conducted. A non-parametric method [19] was used to determine the optimal cut-point of the progression index that best separated two adjacent stages in terms of maximizing the product of sensitivity and specificity [20]. The area under the receiver operating characteristic curve (AUC) was computed as an index of the optimal cut-point classification performance. The optimal cut-points computed here should not be confused with the landmark cut-off thresholds for the HD-ISS conditions.
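To make the criterion concrete, the following sketch searches the observed scores for the cut-point that maximizes sensitivity × specificity between two adjacent stages and computes an empirical AUC. It is an illustration only and is not the cutpointr implementation; the function names are hypothetical, and a score at or above the cut-point is assumed to predict the later stage.

```python
def optimal_cutpoint(lower_stage_scores, upper_stage_scores):
    """Return (cut-point, sensitivity, specificity) maximizing their product.

    Every observed score is tried as a candidate cut-point; a score >= the
    cut-point is classified as belonging to the later (upper) stage.
    """
    candidates = sorted(set(lower_stage_scores) | set(upper_stage_scores))
    best = None
    for c in candidates:
        sens = sum(x >= c for x in upper_stage_scores) / len(upper_stage_scores)
        spec = sum(x < c for x in lower_stage_scores) / len(lower_stage_scores)
        if best is None or sens * spec > best[1] * best[2]:
            best = (c, sens, spec)
    return best


def empirical_auc(lower_stage_scores, upper_stage_scores):
    """Probability that a random upper-stage score exceeds a random
    lower-stage score, counting ties as one half (Mann-Whitney form)."""
    wins = sum(
        (u > l) + 0.5 * (u == l)
        for u in upper_stage_scores
        for l in lower_stage_scores
    )
    return wins / (len(lower_stage_scores) * len(upper_stage_scores))
```

Applied to each pair of adjacent stages in turn, the resulting cut-points partition a progression index into non-overlapping segments, one per stage.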
The analysis was performed with the R computing platform [21] (version 4.1.3). The ggplot2 [22] package was used for graphing, missRanger [23] for MissForest data imputation, MatchIt [24] for propensity score matching, cutpointr [19] for cut-point estimation, and graph smoothers were generalized additive (mixed) models (GAMs or GAMMs) estimated with mgcv [25].
RESULTS
Table 1 shows baseline (first visit) descriptive statistics for key variables by study. The table indicates that on average, Enroll-HD had the oldest and most progressed HD participants, whereas PREDICT-HD had the youngest and least progressed. For example, Enroll-HD had
Table 1. Baseline (first-visit) descriptive statistics of key variables by study
SD, standard deviation; Min, minimum; Q1, first quartile (25th percentile); Q2, second quartile (50th percentile); Q3, third quartile (75th percentile); Max, maximum.
Table 2 shows the imputation results. Columns C0–C3 are indicators of whether the criteria for Stage 0–3 were met (1 if met, 0 otherwise). For example, the pattern 1, 1, 1, 0 indicates that the stage criteria up to and including Stage 2 were met (resulting in a Stage 2 classification). Counts in the table are for visits, and the patterns of the C0–C3 criteria indicators in each section are sorted by mean CAP100. Stage Count and Stage Proportion show how the algorithm assigned the stage for each Stage Criteria pattern, with the following exception: patterns 1–4 are the observed indicator patterns that were compatible with the HD-ISS and therefore not imputed.
Table 2. Results of the imputation. Patterns 1–4 have observed HD-ISS stages, whereas 5–25 have imputed HD-ISS stages. Missing data is indicated by a dot (.)
a) C0 = 1 if CAG≥40 and 0 otherwise (dot indicates missing; all participants had 40≤CAG≤50). b) C1 = 1 if either putamen or caudate or both are below threshold. c) C2 = 1 if either TMS or SDMT or both are beyond threshold. d) C3 = 1 if either TFC or IS or both are below threshold.
All the stages for patterns 5–25 were imputed. Patterns 5–8 were incompatible with the HD-ISS because one or more stage criteria were met out of order. Patterns 9–25 had inconclusive classification because one or more landmark variables was missing (indicated by a dot). Patterns 17–19 were the most frequent in Enroll-HD and could be consistent with staging (if the imaging criteria were met).
Figure 1 shows observed and imputed scores of four landmark variables as a function of CAP100 and imputation status (observed: 1882 visits; imputed: 39312 visits, with stage and caudate volume imputed for all visits but only missing values imputed for the other variables). Putamen and IS were omitted because their results were very similar to caudate and TFC, respectively. The configuration of the imputed stages was similar to that of the observed stages in the sense that the stages tended to occur at similar CAP100 (e.g., Stage 0 was associated with small CAP100, and Stage 3 was associated with large CAP100). The imputed database had a wide range of progression and the GAM curves for the imputed data tended to decelerate (or plateau) for larger CAP100. Overall, the imputed caudate volume (Fig. 1A) was larger than the observed volume, which is illustrated by the GAM curves having different initial values. Both groups showed relatively orderly transitions from Stage 0 (left-most) through Stage 3 (right-most).

Figure 1. Observed and imputed scores of key variables by CAP100, HD-ISS stage (color), and paneled by imputation status. Smooth curves are based on generalized additive models. Stage and caudate volume were fully imputed, but for the other variables only scores that were missing were imputed. ICV, total intracranial volume.
Propensity score matching resulted in the studies being much better balanced than without matching in terms of similarity of means and variances (see Supplementary Table 1). However, slight differences remained, the largest being a mean difference in TMS. Figure 2 shows stage proportion as a function of CAP100 bin midpoint for the observed data and the matched imputed data. The GAM curves indicate very similar proportions in the range of CAP100≥80 for all stages. However, there were sizable discrepancies in the range of CAP100 < 80 for Stages 0 and 1. Panels in the upper row show that the MissForest algorithm tended to over-assign Stage 0 and under-assign Stage 1 early in progression. The lower row indicates relatively minor under-assignment of Stage 2 early in progression and excellent correspondence with the observed data for all progression levels of Stage 3.

Figure 2. Propensity score matching results. Proportion of HD-ISS stage by CAP100 and imputation status (observed, imputed). CAP100 is the midpoint of a bin range (100 bins total), and the smooth curves are based on generalized additive models.
Figure 3 shows the longitudinal trends of key variables for Enroll-HD by baseline HD-ISS stage. The sample sizes (numbers of visits) for participants starting in Stages 0 to 3 were, respectively, 2199 (22652), 1013 (10776), 1678 (17700), and 9292 (98824). The first column indicates that when starting in Stage 0, there tended to be progression through the stages over time. The last column shows that when starting in Stage 3, some regression to earlier stages did occur, but the vast majority persisted in Stage 3. The middle two columns show there was stage progression over time for a large majority, but some did regress.

Figure 3. Enroll-HD longitudinal data of four variables (rows) for different HD-ISS starting stages (columns) with CAP100 as the time metric (CAG-adjusted age). Stage and caudate volume (ICV = intracranial volume) were completely imputed, whereas other variables had partial imputation. Repeated visits of the same participant are connected by a thin line, and the smooth curves are based on group-level generalized additive mixed models.
Figure 4 shows the distributions of the progression indices for the combined data (all observed and imputed data). For all indices, the median increased with stage, but there was extensive overlap of distributions between stages. The overlap indicates that visits could be differently ordered by the HD-ISS and the progression indices. For example, the PIN distribution for Stage 2 shows that the lowest quarter of scores (below the bottom box edge) were smaller than the highest quarter of Stage 1 (above the upper box edge).

Figure 4. Distributions of progression indices as a function of HD-ISS stage for the combined data (observed and imputed). Jittered values (points) are shown with boxplots wrapped by violin plots. Panel A is for PIN, B depicts CAP100, and C shows CAP.
Table 3 shows key descriptive statistics for the progression indices by stage for the combined data. The overlap of the distributions reflected in the descriptive statistics motivated the optimal cut-point analysis to demarcate non-overlapping segments for each stage based on the progression indices. Results of the optimal cut-point analysis are shown in the last three columns of Table 3 (visualization is provided in Supplementary Figure 1). By definition, Lwr and Upr provided limits that did not overlap among stages. The CAP100 cut-point was best at separating Stage 0 and 1 (AUC = 0.88), but PIN and CAP were close behind (AUCs = 0.86). The PIN cut-point was best at separating Stage 1 and 2 (AUC = 0.82), and the CAP scores had relatively poor performance (maximum AUC = 0.65). PIN performed well at separating Stage 2 and 3 (AUC = 0.88), and the CAPs performed less well (AUC ≈ 0.80).
Table 3. Descriptive statistics and optimal cut-point analysis results of the progression indices for the HD-ISS stages. Results are based on the combined data (observed and imputed)
SD, standard deviation; Min, minimum; Q1, first quartile (25th percentile); Q2, second quartile (50th percentile); Q3, third quartile (75th percentile); Max, maximum; Lwr, lower stage limit based on the cut-point analysis; Upr, upper stage limit based on the cut-point analysis; AUC, area under the receiver operating characteristic curve for the previous stage versus the current stage.
DISCUSSION
Our results show that by using a machine learning algorithm, the HD-ISS stage can be imputed when some of the landmark variables are missing. This finding is especially pertinent with Enroll-HD, for which imaging data are not currently collected. Imputation of all four stages is possible with Enroll-HD, but our results did show some discrepancies between imputed and observed values for early progression. The discrepancies might be accounted for by the pre-existing progression differences among the observed and imputed databases. There is evidence that the imputed stages can be treated similarly to the observed stages for the types of large-scale analysis supported by Enroll-HD, especially when the focus is on HD-ISS Stage > 1.
There are several practical implications for the application of the HD-ISS with the Enroll-HD database. First, the imputation results imply an alternative classification system that need not rely on imputation. Two of the most frequently occurring stage criteria indicator patterns of Enroll-HD had high consistency of stage assignment (Table 2, Patterns 18 and 19). When the criteria for Stages 0 and 2 were met, but not Stage 3, the algorithm regularly assigned Stage 2 (84% of the time). When the criteria for Stages 0, 2, and 3 were met, the algorithm always assigned Stage 3 (100% within rounding). Therefore, for these observed patterns, if we were to assume that the Stage 1 criteria for brain atrophy were also met, we would often agree with the algorithm assignment. On the other hand, when the criterion for Stage 0 was met but the criteria for Stages 2 and 3 were not (Pattern 17), the algorithm assigned Stage 0 or 1 97% of the time. These results suggest that if we are willing to collapse Stages 0 and 1, then the HD-ISS thresholds can be used directly to classify into the less precise categories of Stage≤1, Stage 2, and Stage 3 without imputation. More simply stated: when Stage 1 criteria are missing, if the criteria for Stages 0, 2, and 3 are met, we classify in Stage 3; if the criteria for only Stages 0 and 2 are met, we classify in Stage 2; if the criterion for only Stage 0 is met, we classify in Stage≤1. This approach grounds the classification in observed data, and it might suffice for much HD research that is currently focused on HD-ISS Stages 2 and 3. This also addresses the potential weakness of the imputation algorithm, which showed the greatest discrepancies in assigning Stages 0 and 1 for early progression.
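The three-category rule stated above is simple enough to express directly. The sketch below assumes the Stage 0, 2, and 3 criteria have been coded as indicators (1 = met, 0 = not met, None = missing) and that imaging is unavailable; the function name and the returned labels are illustrative.

```python
from typing import Optional


def classify_collapsed(c0: Optional[int], c2: Optional[int],
                       c3: Optional[int]) -> Optional[str]:
    """Three-category classification when the Stage 1 criteria are missing.

    Returns "Stage<=1", "Stage 2", "Stage 3", or None when the pattern is
    not covered by the rule (missing indicator, Stage 0 criterion not met,
    or Stage 3 criteria met without Stage 2 criteria).
    """
    if None in (c0, c2, c3) or c0 != 1:
        return None
    if c2 == 1 and c3 == 1:
        return "Stage 3"
    if c2 == 1 and c3 == 0:
        return "Stage 2"
    if c2 == 0 and c3 == 0:
        return "Stage<=1"
    return None  # C3 met without C2: out of order, left unclassified
```

Leaving the out-of-order and missing-indicator patterns unclassified is a choice made for this sketch; such visits would still need imputation or exclusion.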
A second practical advantage of our results is the definition of progression subgroups. The subgroups can help define treatment subpopulations to plan trials consistent with the HD-ISS. Subgroups based on PIN are most applicable to Stage 2, as the PIN combination contains the landmark variables for the stage. For Stage 3, subgroups might be defined by traditional TFC staging [26]. For example, early Stage 3 might be defined as TFC > 10, and early-to-mid Stage 3 defined as TFC > 6. Enrichment for Stage 1 (or Stage≤1) is particularly challenging, as it is best to define subgroups using a biomarker. Imaging for enrichment is resource-intensive, and fluid biomarkers, such as neurofilament light chain, could possibly provide an alternative in the future.
An approach to clinical trial planning might involve the following steps. First, based on scientific considerations, the HD-ISS stage for participant recruitment is identified. Second, enrichment is used to define a more homogeneous subgroup within a stage. Third, the database is interrogated to estimate the untreated rate of change for a continuous endpoint (e.g., TMS), or the rate for a time-to-event endpoint (e.g., transition to Stage 3).
As an example of clinical trial planning for a continuous outcome, let us say the goal of a trial is to examine the effect of a treatment on TMS, that is, to slow its progression. Assume the treatment population is defined to be in Stage 2 at the start of the study and the goal is to exclude individuals who have recently entered the stage or are soon to exit. In this case, enrichment might focus on the middle of the Stage 2 PIN distribution, 0.47 < PIN < 1.84. By applying the selection criteria to Enroll-HD and computing the relevant statistics, analytic formulas can be used to estimate the required sample size [27, 28].
A similar strategy can be used for a time-to-event analysis in which a treatment is expected to, for example, delay the transition into Stage 3. In this design, suppose we want to recruit individuals whose first visit is in Stage 2, and follow them until they transition into Stage 3 or to the end of the study. The endpoint is time to any drop in the TFC or IS or both (i.e., entry into Stage 3). If we want to ensure that there is time for a measurable drug effect, the first visit should not be too close to Stage 3, and therefore selection might be below the median PIN in Stage 2, PIN < 1.09. After applying the selection criteria and computing statistics, equations for time-to-event (or survival) analysis can be used to estimate the required sample size [29, 30].
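The selection step in both planning scenarios amounts to filtering visits by stage and by a PIN range. The sketch below is one way this might look, assuming each visit record carries a participant identifier, an HD-ISS stage, and a PIN value; the `Visit` record and its field names are hypothetical and do not correspond to Enroll-HD variable names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Visit:
    participant_id: str  # hypothetical identifier
    stage: int           # HD-ISS stage at the visit (observed or imputed)
    pin: float           # PIN score at the visit


def select_by_pin(visits: Iterable[Visit], stage: int,
                  pin_min: float = float("-inf"),
                  pin_max: float = float("inf")) -> List[Visit]:
    """Keep visits in the given stage with pin_min < PIN < pin_max."""
    return [v for v in visits
            if v.stage == stage and pin_min < v.pin < pin_max]


visits = [Visit("a", 2, 0.30), Visit("b", 2, 1.00),
          Visit("c", 2, 1.90), Visit("d", 3, 1.00)]

# Continuous-outcome scenario: middle of the Stage 2 PIN distribution.
middle_stage2 = select_by_pin(visits, stage=2, pin_min=0.47, pin_max=1.84)

# Time-to-event scenario: below the Stage 2 median PIN.
early_stage2 = select_by_pin(visits, stage=2, pin_max=1.09)
```

The summary statistics needed for the sample size calculations would then be computed on the selected subset only.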
A caveat for these trial planning scenarios is that Enroll-HD is not a treatment study, which means a placebo effect cannot be estimated. Placebo effects are caused by many factors [31], and there is evidence that they can be relatively strong in HD trials [32]. Data from completed HD trials can be used in planning to help account for placebo effects [33].
The above scenarios are just two examples, as enrichment can be used whenever there is a need to identify more homogeneous progression subgroups than are afforded by the HD-ISS stages themselves. Furthermore, subgroups need not be defined based on descriptive statistics as we have done in our analysis. Rather, different PIN or CAP100 ranges can be considered to optimize for the problem at hand.
It is best to use the progression indices within a single stage, as they may not properly indicate inter-stage progression. The HD-ISS stages provide classification into groups by disease status (no detectable pathology, displays of neurodegeneration, HD signs and symptoms, and functional deficits). Although the progression indices are powerful tools and are predictive of progression, they do not always agree with the HD-ISS regarding an individual’s global clinical status. This is illustrated by the extensive overlap of the between-stage distributions. A sizable proportion of participants in Stage 3 (for example) have CAP100 (and PIN) values that are smaller than participants in Stage 2. The scenario is not unexpected because while over 50% of the variation in disease progression is accounted for by age and CAG, these variables are not the only determinant factors in explaining disease severity. The landmarks for Stage 3 that signal functional loss are not accounted for by CAP100 (or PIN). Therefore, the power of using these methods is in their combination. The overlap indicates that mixing the progression indices with the HD-ISS could result in individuals not being properly ordered in one sense or the other and is why we recommend use of PIN (or CAP100) only within an individual stage. If compelled to consider the progression indices for use across stages, then proper ordering on both dimensions will be facilitated if we select the inner 50% of the distributions for each stage. This refinement will help to define progression groups that are ordered similarly for both the HD-ISS and the progression index at the group level.
There are a few caveats that deserve comment. The discrepancy between the imputed and observed brain volume is difficult to definitively resolve because of the progression differences among the databases. The training databases (PREDICT-HD, TRACK-HD/ON) for the imputation were different from the database on which the imputation was mainly applied (Enroll-HD for the most part). The study differences may not have been completely accounted for in the propensity score matching, which is never perfect. The slowing in the rate of loss for the imputed volume late in progression is biologically feasible (see Fig. 1). It is logical that at some point, striatal loss becomes exhausted, leading to a slowing down in the rate of deterioration. A more sophisticated image processing approach using higher quality scans from PREDICT-HD did reveal the late slowing that was not apparent from our FreeSurfer version 6 results (Liu et al., unpublished data). The Enroll-HD database had a much higher density of advanced progression visits than the observed database, which perhaps enables us to get a glimpse of the nature of striatal loss very late in the disease. However, the imputation is a type of extrapolation beyond the progression bounds of the observed volumes, thus the accuracy is unknown.
The MissForest algorithm is stochastic, meaning that when it is re-run with the same data the results will change. We did repeat the imputation (results not reported) and found that the overall stage classification proportions changed very little, by a maximum of approximately 0.1%. The imputed HD-ISS stage for each Enroll-HD visit is included in the recent Enroll-HD periodic data set (PDS6, released December 2022). Because of the stochastic stage assignment, each participant visit also has associated variables for the probability of assignment for each HD-ISS stage. This will help researchers understand the reliability of the imputed stage designations for proper applications.
The imputation algorithm treated visits from the same participant as independent, thus ignoring the nested nature of the data. Since the majority of participants in the analysis had repeated visits (69%), an argument can be made for using a longitudinal multiple imputation approach [34]. However, the HD-ISS is fundamentally a cross-sectional system and our intent was to be consistent with this design. If our approach needs to be defended, we would say that the HD-ISS stages will probably not be used as a primary outcome to be modeled longitudinally over time. Rather, we anticipate that the HD-ISS will be used as in our examples above, to anchor individuals within a stage based on a single visit, or to fix a stage transition event for a time-to-event analysis. The evidence from our analysis is that the stage imputation is adequate for these types of uses.
Single imputation was implemented rather than the multiple imputation that is recommended for general applied data analysis [35]. The MissForest algorithm does impute multiple values internally, but there is a final single value that is chosen by a type of popular vote among the random forests [12]. The primary reason for using single imputation in our study was to assess whether MissForest might be a successful approach. In the first trial analysis scenario discussed above, the single imputation might be sufficient because the HD-ISS was considered for selection of participants and not for use in the data to be collected over the trial. It is unclear if the added complexity of multiple imputation would improve the accuracy of the imputed HD-ISS stages for this preliminary step; additional research is needed. Should one want to use the HD-ISS in an analysis model, then the MissForest algorithm can be run several times to generate multiple imputed data sets. The analysis model can be fitted to the data sets and results combined using standard methods [35].
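As an illustration of the combining step, the sketch below pools estimates from several imputed data sets using Rubin's rules. It assumes that "standard methods" refers to Rubin's rules, which the text does not state explicitly, and the function name is hypothetical. Each imputed data set is assumed to contribute one point estimate and its squared standard error.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float]:
    """Pool m per-imputation estimates and their variances.

    Returns (pooled estimate, total variance), where the total variance is
    the mean within-imputation variance plus (1 + 1/m) times the
    between-imputation variance of the estimates.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching estimates and variances, m >= 2")
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total
```

The between-imputation term is what single imputation leaves out, which is why standard errors from a single imputed data set tend to be too small when the imputed variable enters the analysis model.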
Finally, there are many machine learning methods that can be used for imputation [36]. The chained equations approach used here has been shown to provide good performance in a wide variety of scenarios [14], and random forest is known to provide good all-around prediction performance [13]. Whether there are benefits of using alternative machine learning approaches with or without chained equations is a topic for future research.
In summary, we have shown that despite missing brain imaging variables, observational studies such as Enroll-HD can be staged according to the HD-ISS, perhaps most reliably with a three-category staging scenario. Progression subgroups within a stage can be defined to hone the definition of treatment populations. Our hope is that this information will facilitate the use of the HD-ISS to aid in the planning of interventional studies in HD.
ACKNOWLEDGMENTS
Data used in this work were generously provided by the research participants in Enroll-HD and made available by the CHDI Foundation Inc. IMAGE-HD data used in this work were generously provided by the participants in the IMAGE-HD study. TRACK-HD/ON data used in this work were generously provided by the participants in the TRACK-HD and TRACK-ON studies and made available by Dr. Sarah Tabrizi, principal investigator, University College London. PREDICT-HD data used in this work were generously provided by the participants in the PREDICT-HD study and made available by the PREDICT-HD investigators and coordinators of the Huntington study group, Jane S. Paulsen, principal investigator.
CONFLICT OF INTEREST
Jeffrey D. Long reports personal compensation from Alnylam, Annexon, AskBio, Prilenia, PTC, Remix, Roche, Spark, Triplet, uniQure, Vertex, and Wave. His funding comes from CHDI and NIH.
Emily C. Gantman is an employee and receives salary from CHDI Management.
James A. Mills reports personal compensation from PTC and Triplet. His funding comes from CHDI and NIH.
Jatin G. Vaidya’s funding comes from CHDI and NIH.
Alexandra Mansbach is a consultant to CHDI Management.
Sarah J. Tabrizi reports personal fees from F Hoffmann La Roche, Annexon, PTC Therapeutics, Takeda Pharmaceuticals, Vertex Pharmaceuticals, Alnylam Pharmaceuticals, Alphasights, Genentech, LoQus23 Therapeutics, Triplet Therapeutics, Novartis, Atalanta, Spark Therapeutics, Horama, University College Irvine, and Guidepoint; a patent application (2105484.6) and structural analogues licensed to Adrestia Therapeutics; funding from the CHDI Foundation, the UK Dementia Research Institute that receives its funding from DRI, the UK Medical Research Council, Alzheimer’s Society, and Alzheimer’s Research UK, and the Wellcome Trust (200181/Z/15/Z).
Cristina Sampaio is an employee and receives salary from CHDI Management, and has received consultancy honorariums (unrelated to Huntington's disease) from Pfizer, Kyowa Kirin, vTv Therapeutics, GW pharmaceuticals, Neuraly, Neuroderm, Green Valley Pharmaceuticals, and Pinteon Pharmaceuticals.
