Abstract
Background:
When developed properly, disease-specific patient reported outcome measures have the potential to measure relevant changes in how a patient feels and functions in the context of a therapeutic trial. The Huntington’s Disease Health Index (HD-HI) is a multifaceted disease-specific patient reported outcome measure (PROM) designed specifically to satisfy previously published FDA guidance for developing PROMs for product development and labeling claims.
Objective:
In preparation for clinical trials, we examine the validity, reliability, clinical relevance, and patient understanding of the Huntington’s Disease Health Index (HD-HI).
Methods:
We partnered with 389 people with Huntington’s disease (HD) and caregivers to identify the most relevant questions for the HD-HI. We subsequently used two rounds of factor analysis, cognitive interviews with 15 individuals with HD, and test-retest reliability assessments with 25 individuals with HD to refine, evaluate, and optimize the HD-HI. Lastly, we determined the capability of the HD-HI to differentiate between groups of HD participants with high versus low total functional capacity score, prodromal versus manifest HD, and normal ambulation versus mobility impairment.
Results:
HD participants identified 13 relevant and unique symptomatic domains to be included as subscales in the HD-HI. All HD-HI subscales had a high level of internal consistency and reliability and were found by participants to have acceptable content, relevance, and usability. The total HD-HI score and each subscale score statistically differentiated between groups of HD participants with high versus low disease burden.
Conclusion:
Initial evaluation of the HD-HI supports its validity and reliability as a PROM for assessing how individuals with HD feel and function.
INTRODUCTION
Huntington’s disease (HD) is a progressive neurodegenerative disorder caused by an expanded CAG trinucleotide repeat in the HTT gene on chromosome 4 [1]. Symptoms experienced by individuals with HD are diverse and can involve involuntary movements, emotional problems, and cognitive dysfunction [2]. In preparation for future HD clinical trials, it is important to have a mechanism to consistently measure how a patient feels and functions in response to a therapeutic intervention. Patient reported outcome measures (PROMs) represent a potential mechanism to monitor both benefit and risk during clinical trials [3]. The United States Food and Drug Administration (FDA) has provided guidance and a structure for creating and validating PROMs so that appropriately developed tools can be used in clinical trials and can support drug labeling claims [4]. To date, many advances in developing patient reported outcomes for HD have been made, including NeuroQOL, HDQLIFE, and HDQOL [5–7]. However, widespread use of these instruments in therapeutic trials is lacking, and none have been fully qualified through the FDA [8]. We therefore apply a participant-centered approach and FDA published guidance to develop and validate a multifaceted disease-specific patient reported outcome measure for use in clinical studies involving patients with HD.
We previously conducted qualitative interviews with individuals with HD and caregivers and completed a cross-sectional study of 389 individuals with HD and caregivers to identify the symptoms and issues that have the highest frequency and effect in people with HD [9]. In accordance with FDA guidelines, we utilized this data in conjunction with beta interviews, expert review, reliability assessment, and additional validity testing to develop the Huntington’s Disease Health Index (HD-HI). The HD-HI and its subscales were designed for use in clinical trials and as a tool to potentially support the merit of HD therapeutics. The present study details the creation of the HD-HI, its content validation, construct validity, test-retest reliability, and sensitivity in differentiating between known groups of HD participants with different levels of disease burden.
MATERIALS AND METHODS
Eligibility criteria
All participants were aged 18 years or older and were either individuals with manifest HD, individuals with a positive genetic test for the expanded CAG repeat who self-reported no clinical diagnosis or symptoms (prodromal HD), or caregivers of individuals with HD. Participants were recruited via social media advertising through the Huntington Study Group, or through their prior participation in our cross-sectional study, PRISM-HD [9]. Four unique HD participant groups were used in this study, each for a different aspect of the research. Their general characteristics and research roles are detailed in Table 1. Group 1 included all participants of the PRISM-HD study [9]. All research activities were approved by the Research Subjects Review Board at the University of Rochester.
Demographic and clinical characteristics of all participants
*Participants of the PRISM-HD study [9]. Individuals with manifest HD and prodromal HD were included in the sex, age, and CAG repeat demographic analysis. ±Each of the 15 participants completed the entire HD-HI and were recruited from prior participation in clinical studies at the University of Rochester. §Patient responses were used to determine the test-retest reliability of the HD-HI. Each of the 25 participants completed the entire HD-HI and were recruited from prior participation in clinical studies at the University of Rochester. ¥Two participants reported having prodromal HD while simultaneously reporting low CAG repeat lengths (18). Their data was not utilized during striatal dysfunction calculations.
Huntington’s Disease Health Index creation
Question selection and content validity
In a previous study, we used semi-structured qualitative interviews to identify the symptoms of HD that have the greatest impact on the lives of those with the disease. From the 2,082 direct quotes collected, we identified 216 potential symptoms of importance to be included in a questionnaire. This questionnaire was completed by 389 participants (Group 1) to quantify the frequency and life impact of each symptom and symptomatic theme [9]. The present study identified the most important and impactful symptoms to be used in the HD-HI. Symptomatic questions with a population impact score greater than 1.0 (ranges from 0–4 with 4 representing the greatest impact) were considered for inclusion in the instrument [9]. Symptomatic questions were excluded if they were: 1) redundant; 2) potentially offensive to future respondents; 3) had unclear wording; 4) had a lack of generalizability; or 5) were deemed to not be sufficiently responsive to future therapeutic intervention to be useful.
Internal consistency of the HD-HI subscales
Qualifying survey questions were grouped by content into subscales representing the most important HD symptomatic themes using a research team consensus approach. Internal consistency was evaluated using Cronbach’s alpha by subscale. Item placement was evaluated using corrected item-total correlations. Qualifying questions were moved to alternative subscales as needed to maximize the overall internal consistency of each subscale. Following this assessment, the first version of the HD-HI was generated.
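The two statistics named above can be illustrated with a minimal sketch. It assumes each question is stored as a list of numeric responses aligned by respondent; the function names and the sample data are hypothetical and are not taken from the study.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for one subscale.

    items: list of columns, one list of responses per question,
    all aligned by respondent (hypothetical layout).
    """
    k = len(items)
    totals = [sum(row) for row in zip(*items)]  # per-respondent subscale total
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(items, index):
    """Correlate one question with the sum of the *other* questions."""
    rest = [sum(row) - row[index] for row in zip(*items)]
    return pearson(items[index], rest)


# Hypothetical responses: 3 questions answered by 5 respondents.
subscale = [
    [0, 1, 2, 3, 4],
    [1, 1, 2, 4, 4],
    [0, 2, 1, 3, 4],
]
alpha = cronbach_alpha(subscale)
item_total = [corrected_item_total(subscale, i) for i in range(len(subscale))]
```

A question with a low corrected item-total correlation in one subscale would, under this reading, be a candidate for reassignment to another subscale.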
Cognitive interviews
The first version of the HD-HI was distributed to 15 HD participants (Group 2). Participants were asked to complete version one of the HD-HI, to be interviewed about the instrument, and to provide feedback regarding the content, relevance, and usability of the instrument and its subscales. Participants also described their perception of the themes behind each subscale, assessed whether the provided answer choices reflected their desired responses, and discussed the timeframe they used to recall each symptom. In addition, participants identified any symptoms not addressed in the instrument and provided feedback on the wording of questions and responses. A comprehensive interview guide was used for each participant. Interviews were audio recorded, transcribed, and analyzed, and the results were used to discard or reword questions deemed to be problematic or unclear. Individual participant responses to 20 interview questions regarding the HD-HI and its subscales were analyzed by our research group. We used participant responses and a consensus approach to decide which modifications were needed to further improve the instrument. Following this process, version two of the HD-HI was created.
Scaling and scoring of the HD-HI
All subscales were scored with a possible range of 0 to 100 (with 0 representing no disease burden and 100 representing the highest degree of disease burden). Questions within subscales were weighted based on their relative importance to the HD population [9] and similarly subscale scores were weighted to generate a global total HD-HI score (range 0–100).
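The scoring scheme can be sketched as a weighted average rescaled to 0–100. The actual item weights, subscale weights, and response scale are not given here, so the values below (a 0–5 response scale and the listed weights) are assumptions for illustration only.

```python
def weighted_score(values, weights, max_value):
    """Weighted average of `values`, rescaled so the result lies in 0..100.

    values:    responses (or subscale scores), each in 0..max_value
    weights:   non-negative importance weights (hypothetical values)
    max_value: largest possible value of a single entry
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    weighted_sum = sum(w * v for w, v in zip(weights, values))
    return 100.0 * weighted_sum / (max_value * sum(weights))


# Hypothetical subscale with three questions answered on a 0-5 scale.
mobility = weighted_score([5, 0, 2], weights=[1.0, 3.0, 1.0], max_value=5)

# Hypothetical total score built from two subscale scores (each 0-100).
total = weighted_score([mobility, 40.0], weights=[2.0, 1.0], max_value=100)
```

Because the weights are normalized by their sum, a respondent who selects the lowest option everywhere scores 0 and one who selects the highest option everywhere scores 100, regardless of the weights chosen.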
Test-retest reliability
Test-retest reliability of the HD-HI, version two, was assessed in a cohort of 25 participants (Group 3) who completed the HD-HI three times. The instrument was administered at baseline, two weeks, and four weeks. Participants completed a paper version of the HD-HI at baseline and at two weeks and an online version at four weeks. Reliability was assessed for the total instrument score, each subscale score, and each question. Reliability of the total HD-HI score and each subscale score was quantified using intraclass correlation coefficients (ICCs) [10]. Questions with poor reliability were considered for potential deletion.
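As an illustration of how an ICC can be computed from repeated administrations, the sketch below implements one common form, the two-way random-effects, absolute-agreement, single-measure coefficient often written ICC(2,1). Which ICC form was used in this study is not stated, so this choice is an assumption, and the function name is hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: one row per participant, one column per administration.
    """
    n = len(scores)        # participants
    k = len(scores[0])     # administrations
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical total scores for four participants at three time points.
example = [
    [32.0, 30.5, 33.0],
    [55.0, 57.5, 54.0],
    [12.0, 14.0, 11.5],
    [71.0, 69.0, 72.5],
]
reliability = icc_2_1(example)
```

The absolute-agreement form penalizes a systematic shift between administrations, for example if the online version produced consistently higher scores than the paper version.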
Known groups validity and final assessment of the internal consistency of the HD-HI
The average HD-HI total and subscale scores were determined for predefined subgroups hypothesized to have different disease severities. HD participants were grouped by employment (employed vs. unemployed), disease state (prodromal vs. manifest HD), Total Functional Capacity (TFC) score (low score of 2–10 vs. high score of 11–13), highest level of education obtained (no college degree vs. college degree or more), CAG repeat length (above vs. below the average of 42.87), age (above vs. below the average of 44.22 years), sex, duration of symptoms (above vs. below the average of 10 years), and ambulatory status (normal ambulation vs. mobility impairment). We also compared participants who identified as being prodromal to participants with manifest HD and a TFC score of 11 to 13, and related total HD-HI scores to striatal dysfunction as estimated by age and CAG repeat length [11]. An extended dataset from the PRISM-HD study (Group 1) was used for known groups analysis (Group 4) [9]. This extended dataset included data from all individuals with manifest or prodromal HD from the PRISM-HD study, together with data from individuals with manifest or prodromal HD who were excluded from the original PRISM-HD study because they only partially completed the original study survey. Re-worded questions were assigned values based on the responses given to the original question. Wilcoxon rank sum scores were used for group comparisons of the mean total scores. For comparisons between two groups, such as prodromal versus manifest HD, we used the Wilcoxon two-sample test with a t approximation for the two-sided p-value. For comparisons among more than two groups, we used the Kruskal-Wallis test to determine the p-value. Lastly, following the generation of the final version of the HD-HI, we assessed the internal consistency of each of the subscales in the instrument.
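A two-group rank-sum comparison of this kind can be sketched as follows. This is not the study's analysis code: it uses a normal approximation without continuity correction rather than the t approximation described above, and the function names and sample scores are hypothetical.

```python
import math


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test; returns (W, z, two-sided p).

    Uses a normal approximation with a tie-corrected variance.
    Undefined (division by zero) if every pooled value is identical.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first group
    expected = n1 * (n + 1) / 2
    centre = (n + 1) / 2
    var = n1 * n2 / (n * (n - 1)) * sum((r - centre) ** 2 for r in ranks)
    z = (w - expected) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal tail
    return w, z, p


# Hypothetical HD-HI total scores for two groups.
prodromal = [8.0, 12.5, 15.0, 9.5, 20.0]
manifest = [35.0, 42.5, 28.0, 51.0, 19.0, 38.5]
w, z, p = rank_sum_test(prodromal, manifest)
```

With small groups, an exact test or the t approximation would generally be preferred over the normal approximation shown here.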
RESULTS
Participants
Demographic characteristics for participants in Groups 1, 2, 3, and 4 are provided in Table 1.
Question selection
We began with 216 original symptom questions across 15 symptomatic themes. Figure 1 provides an overview of the process used to develop, select, and narrow questions for the HD-HI. The final version of the HD-HI contained 127 quick check box items representing 13 symptomatic themes (subscales). A 13-question short form was also created as a surrogate of the total instrument. This short form includes one representative question from each of the HD-HI’s subscales, so that each question represents a major symptomatic theme of importance in HD. These symptomatic themes were identified through participant interviews and selected by our research team. The symptomatic themes measured by the HD-HI and the internal consistency of each subscale are listed in Table 2.

Development of the HD-HI.
Test-retest reliability and internal consistency of final HD-HI subscales
Patient assessment with cognitive interviews
The fifteen participants in Group 2 provided feedback on the ease of completion and clarity of the instrument. One question was reworded based on participant feedback: “daytime somnolence” was changed to “daytime sleepiness” to increase participant understanding of the question.
Overall, participants provided positive feedback and stated that the content of the instrument was highly relevant and represented the symptoms that most affected their lives. Participants stated that the instructions were clear and that the survey was easy to complete. They also reported that there were “just the right number” of questions and that it was “pretty easy” to understand and complete. When asked to identify the theme represented by each subscale, correct answers were provided, and participants reported that there were adequate response choices to describe their symptoms. On average, the reported time to complete the instrument was 22 minutes on paper and 17 minutes in the online format among participants who anticipated being later interviewed regarding their experience with the instrument.
Test-retest reliability
Twenty-five participants (Group 3) completed the HD-HI three times over a 1-month period with an average of 16.6 days between each administration. No participant achieved the maximum score (ceiling effect) or the minimum score (floor effect) on the HD-HI. No questions or subscales were removed due to low reliability. Table 2 lists the ICC values for each of the HD-HI subscales. There was a high level of reliability between participants’ responses to the paper version of the HD-HI and their responses to the online version (paper vs. online).
Known groups validity
Data from 201 participants (Group 4) were used for known groups analysis. There were significant differences in HD-HI total and selected subscale scores between cohorts of HD participants suspected to have different levels of disease burden. Specifically, higher HD-HI scores were found in HD participant groups with manifest HD, low TFC scores (2–10), and no employment (Table 3A, 3B; Supplementary Table 1A, 1B). Most notably, the largest difference in mean HD-HI total score was present between high and low TFC participants (p < 0.0001), followed by prodromal versus manifest HD participants, and employed versus unemployed (both p < 0.0001). Participants who required assistive devices (non-ambulatory) had greater mean HD-HI total scores than those with normal ambulation (p < 0.01). In addition, individuals with manifest HD and high TFC scores (11–13) had greater HD-HI total scores than prodromal individuals (p < 0.01). There were no significant differences in mean HD-HI total scores across sex, education level, or age. Figure 2 outlines the association between individual TFC scores and HD-HI total scores across our sample. Figure 3 provides an area-under-the-curve (AUC) analysis of the discriminatory power of the HD-HI total score using TFC scores. Figure 4 plots HD-HI total scores versus estimated striatal dysfunction based on age and CAG repeat number (Age*(CAG − 35.5)) [11].
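Two of the quantities above lend themselves to a short worked sketch: the striatal dysfunction estimate Age*(CAG − 35.5) and the AUC for separating low-TFC from high-TFC participants. The AUC is computed here as the proportion of low-TFC/high-TFC pairs in which the low-TFC participant has the higher HD-HI total score, with ties counted as one half. Function names and sample values are hypothetical.

```python
def striatal_dysfunction_estimate(age, cag):
    """Age-by-CAG product used as an estimate of striatal dysfunction."""
    return age * (cag - 35.5)


def auc(high_burden_scores, low_burden_scores):
    """Probability that a randomly chosen high-burden participant scores
    higher than a randomly chosen low-burden participant (ties count 0.5)."""
    wins = 0.0
    for high in high_burden_scores:
        for low in low_burden_scores:
            if high > low:
                wins += 1.0
            elif high == low:
                wins += 0.5
    return wins / (len(high_burden_scores) * len(low_burden_scores))


# Hypothetical HD-HI total scores by TFC group.
low_tfc = [48.0, 55.5, 39.0, 62.0]    # TFC 2-10, expected higher burden
high_tfc = [12.0, 25.5, 41.0, 18.0]   # TFC 11-13, expected lower burden
discrimination = auc(low_tfc, high_tfc)

# Hypothetical participant: 40 years old with 43 CAG repeats.
estimate = striatal_dysfunction_estimate(40, 43)
```

An AUC of 0.5 corresponds to no discrimination between the two TFC groups, and an AUC of 1.0 to complete separation.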
Known Group Validity of the HD-HI
Average HD-HI subscale, short form, and total scores were compared across known groups of HD patients assumed to have different disease severities.

HD-HI total score association with TFC scores.

Area under the curve analysis of the HD-HI total score versus TFC groups (2–10 vs. 11–13).

Relationship of HD-HI total score to estimates of striatal dysfunction based on age and CAG repeat numbers.
Final internal consistency of the HD-HI subscales
We initially grouped questions into subscales using face validity and subsequently used factor analysis to confirm that the questions in each subscale were statistically measuring a similar concept. Upon finalization of the HD-HI (after beta and reliability testing), the internal consistency of the final 13 subscales was analyzed. Subscales had a high internal consistency, with an average Cronbach’s alpha of 0.92 and a range of 0.78 to 0.97 (Table 2).
DISCUSSION
The Huntington’s Disease Health Index is a multifactorial disease-specific PROM developed using published FDA guidance and designed for use in HD therapeutic trials. Comprising 13 subscales, the HD-HI measures how a patient feels and functions while assessing their perception of their total disease burden. In comparison to NeuroQOL, the HD-HI is a disease-specific instrument designed for and specifically validated using extensive input from HD participants [5]. In comparison to other outcome measures (such as the HDQLIFE and HDQOL), the HD-HI was specifically designed to measure clinical gains and/or disease progression during therapeutic trials, support FDA drug labeling claims, and quantify a patient’s perception of their total disease burden and their disease burden in 13 unique symptomatic areas identified as being highly relevant to individuals with HD [6, 7].
The HD-HI, and other disease-specific PROMs like it [12–14], may have advantages over more generic instruments [15]. Disease-specific instruments have been shown to have reduced burden on participants, increased precision, and increased sensitivity to clinically significant changes [3, 4]. In addition, disease-specific instruments developed using our methodology are recommended by the NIH as a measure of disease burden, are preferred by patients, and have a higher correlation to functional state and disability status compared to generic and semi-generic patient reported outcome measures [15–17]. The FDA has identified PROMs as a useful tool to support drug labeling claims and has created specific guidelines for their development and implementation [18]. The methodology of the HD-HI’s creation was implemented to be compliant with these original guidelines and support the use of the HD-HI in clinical trials.
The HD-HI total score differentiated between groups of HD participants with varying disease severity. Due to the lack of a universally accepted mechanism for defining disease severity, these groups were formed based on the suspicion of greater disease severity. The statistically significant differences in total scores between participants grouped by disease state (prodromal versus manifest HD), TFC score, and employment status suggest that the HD-HI can detect differences in HD health state. Future studies are necessary to test the HD-HI’s ability to detect clinically significant changes in response to both disease progression and potential therapeutic gain. In addition, it is of interest to further examine the relationship between HD-HI total scores and estimated proximity to disease onset, as calculated by published age of disease onset and CAG length formulas [11].
For the first time, this research evaluated the equivalency of using one of our PROMs in a paper versus online format. Test-retest reliability of the HD-HI indicated that responses did not significantly change based on the format of administration. This suggests that the format of the HD-HI is relatively interchangeable and can be selected based on future patient and clinical study preferences.
We recognize some inherent limitations in this research. While we have developed over 100 instruments using the above-described methodology, a PROM designed for HD patients potentially has limitations related to the cognitive symptoms and dysfunction that can be experienced by HD patients [19]. While it seems reasonable that prodromal individuals and manifest HD patients early in the course of the disease can report and quantify their own symptoms, it is unknown how reliable a patient reported outcome measure will be during the late stages of HD. In our previous cross-sectional study, however, we demonstrated that individuals with manifest HD were able to detect the prevalence of many aspects of their disease [9]. Furthermore, despite cognitive impairment in HD, HD patient determinations were found to be similar to caregiver assessments [9]. The ability of patients with cognitive issues to report their own symptoms is not limited to HD. We have previously found that patients with myotonic dystrophy type-1 (a group known to have cognitive issues) are also capable of using our disease-specific instruments to detect and serially measure their symptomatic health [12, 21]. Lastly, we suspect that the group of participants used to validate the HD-HI was not a perfect representation of the HD community. Group 1 participants were recruited through social media advertising. As the survey was only provided in English, participants whose primary language is not English are underrepresented in our sample. Another potential limitation is the time required to complete the HD-HI. Participants who knew that they were going to be interviewed regarding the instrument took 17 to 22 minutes to complete the full instrument. This time will likely be less with general use. Prior studies have shown that disease-specific instruments of similar length developed using similar methodology are preferred by patients over shorter generic instruments such as the SF-36 [15].
While this completion time is reasonable for clinical trials with periodic assessments, it is likely less appropriate as a daily assessment. In such instances, the use of the HD-HI short form, which takes ∼1 minute, may be preferred.
PROMs like the HD-HI are needed for upcoming clinical trials [22]. The HD-HI adds to existing outcome measure infrastructure in HD [5–7]. Specifically, the HD-HI was designed from the beginning to fully satisfy published FDA guidance and to be capable of using a patient’s perspective to quantify disease burden in 13 granular areas of relevant symptomatic health. There is merit in having measurements capable of serially detecting relevant changes in patient status. It is also important to have mechanisms that give patients a voice in determining the efficacy of a therapeutic agent in the context of a clinical trial. The HD-HI is potentially one such mechanism.
The methodology and results described above support the validity, usefulness, and relevance of the HD-HI. Next steps will include studies that document the longitudinal responsiveness of the instrument and its subscales during natural history studies and therapeutic trials. In the meantime, our results identify the HD-HI as a potentially useful tool for use in HD clinical trials.
CONFLICT OF INTEREST
Dr. Dorsey has received honoraria for speaking at American Neurological Association, Excellus BlueCross BlueShield, International Parkinson’s and Movement Disorders Society, National Multiple Sclerosis Society, Northwestern University, Stanford University, Texas Neurological Society, and Weill Cornell; received compensation for consulting services from Abbott, Abbvie, Acadia, Acorda, Alzheimer’s Drug Discovery Foundation, Ascension Health Alliance, Biogen, BluePrint Orphan, Clintrex, Curasen Therapeutics, DeciBio, Denali Therapeutics, Eli Lilly, Grand Rounds, Huntington Study Group, medical-legal services, Medical Communications Media, Mediflix, Medopad, Medrhythms, Michael J. Fox Foundation, MJH Holding LLC, NACCME, Olson Research Group, Origent Data Sciences, Otsuka, Pear Therapeutic, Praxis, Prilenia, Roche, Sanofi, Spark, Springer Healthcare, Sunovion Pharma, Sutter Bay Hospitals, Theravance, University of California Irvine, and WebMD; research support from Acadia Pharmaceuticals, Biogen, Biosensics, Burroughs Wellcome Fund, CuraSen, Greater Rochester Health Foundation, Huntington Study Group, Michael J. Fox Foundation, National Institutes of Health, Patient-Centered Outcomes Research Institute, Pfizer, PhotoPharmics, Safra Foundation, and Wave Life Sciences; editorial services for Karger Publications; and ownership interests with Grand Rounds (second opinion service).
Chad Heatwole receives royalties for the use of multiple disease specific instruments. He has provided consultation to Biogen Idec, Ionis Pharmaceuticals, aTyr Pharma, AMO Pharma, Acceleron Pharma, Cytokinetics, Expansion Therapeutics, Harmony Biosciences, Regeneron Pharmaceuticals, Astellas Pharmaceuticals, AveXis, Recursion Pharmaceuticals, IRIS Medicine, Inc., Takeda Pharmaceutical Company, Scholar Rock, Avidity Bioscience, INC., Novartis Pharmaceutical Corporation, Swan-Bio Therapeutics, Neurocrine Biosciences, and the Marigold Foundation. He receives grant support from Duchenne UK, Parent Project Muscular Dystrophy, Recursion Pharmaceuticals, Swan Bio Therapeutics, the National Institute of Neurological Disorders and Stroke, the Muscular Dystrophy Association, the Friedreich’s Ataxia Research Alliance, Cure Spinal Muscular Atrophy, and the Amyotrophic Lateral Sclerosis Association.
Christine Zizzi provides consultation to Recursion Pharmaceuticals.
All other authors have no disclosures to report.
