Abstract
Background:
Direct-to-participant online reporting facilitates the conduct of clinical research by increasing access and clinically meaningful patient engagement.
Objective:
We assessed feasibility of online data collection from adults with diagnosed Huntington’s disease (HD) who directly reported their problems and impact in their own words.
Methods:
Data were collected online from consenting United States residents who self-identified as 1) having been diagnosed with Huntington’s disease, 2) able to ambulate independently, and 3) self-sufficient for most daily needs. Data for this pilot study were collected using the Huntington Study Group myHDstory online research platform. The Huntington Disease Patient Report of Problems (HD-PROP), an open-ended questionnaire, was used to capture bothersome problems and their functional impact verbatim. Natural language processing, human-in-the-loop curation of verbatim reports by clinical and experience experts, and machine learning were used to classify verbatim reports into clinically meaningful symptoms.
Results:
All 8 questionnaires in the online pilot study were completed by 345 participants who were 60.9% men, 34.5±9.9 (mean±SD) years old, and 9.5±8.4 years since HD diagnosis. Racial self-identification was 46.4% Caucasian, 28.7% African American, 15.4% American Indian/Alaska Native, and 9.5% other. Accuracy of verbatim classification was 99%. Non-motor problems were the most frequently reported symptoms; depression and cognitive impairment were the most common.
Conclusions:
Online research participation was feasible for a diverse cohort of adults who self-reported an HD diagnosis and predominantly non-motor symptoms related to mood and cognition. Online research tools can help inform what bothers HD patients, identify clinically meaningful outcomes, and facilitate participation by diverse and under-represented populations.
INTRODUCTION
Online study participation is attractive for a rare disease such as Huntington’s disease (HD), where research participation is challenging for patients who live far from specialty centers; it can also reduce patient burden and research site costs [1]. However, there is little information about the willingness and ability of diagnosed HD patients to participate in online direct-to-participant research. The “Making HD Voices Heard” pilot study on the myHDstory platform was designed to examine the feasibility and informativeness of obtaining research data directly from participants in their own words, without interacting with clinicians.
Sponsored by the Huntington Study Group (HSG), myHDstory is the first large, fully online research platform to enable collection of symptoms reported directly by HD patients. Verbatim reporting can inform how patients with neurological diseases feel and experience their illness in terms of problems, functional consequences, and priorities, as demonstrated in Parkinson’s disease (PD) [2–4]. It can also help to inform patients, families, clinicians, researchers, and regulators about the patient experience, course of illness, and clinically meaningful outcomes. When reporting is in patients’ own words rather than in categorical scales or checklists, classifying and analyzing these data require novel methods.
MATERIALS AND METHODS
Online platform
This pilot study “Making HD Voices Heard” was sponsored by the Huntington Study Group (HSG) and hosted on its myHDstory online platform (https://huntingtonstudygroup.org/myhdstory/) designed with Neurotargeting (https://www.neurotargeting.com) and Grey Matter Technologies, a wholly owned subsidiary of Modality.AI. The HSG (huntingtonstudygroup.org) is a worldwide organization of HD professionals including clinical investigators & coordinators, neurologists, psychiatrists, nurses, therapists, dietitians, social workers, genetic counselors, and scientists, engaged in the discovery and development of treatments that make a difference for HD patients and families. The HSG has conducted more than 40 multi-center clinical studies and trials since it was founded in 1993. The study protocol and consent documents were reviewed and approved by the Advarra, Inc. institutional review board (Pro00059188). Data were de-identified and stored securely on Amazon Web Services (AWS).
Participants
Participants were recruited by advertisements from the HSG, partner groups (HELP4HD, HD REACH, Huntington’s Disease Society of America) and HSG research sites, and enrolled from March 23, 2022, through August 8, 2022. Eligible participants were asked to verify that they were adults aged 18 years or older, resided in the United States or its territories, had been diagnosed with HD by a doctor, were able to answer online questions or direct someone else to enter their answers, could ambulate independently, were largely self-sufficient in their personal care needs, and were willing and able to provide informed consent. Participants needed access to a web-enabled electronic device with secure internet connectivity. Individuals who did not meet all eligibility requirements were excluded from enrolling. Between June 7, 2022 and June 18, 2022, a $25 gift card was emailed, in order of completion, to each of the first 200 eligible participants who completed all assessments.
Assessments
Questionnaires were presented to participants in the order shown below. Participants were required to proceed through questionnaires in the presented order but could decline to complete any questionnaire. Enrolling participants were asked if another person assisted in navigating the online platform, but questionnaires were to be answered by the participant.
Table 1. Number of Participants Completing each Questionnaire
Table 2. Demographic, HD History, Genetic Test, HD Care, HD Research Participation, myHDstory Platform User Experience Results for those Completing all Study Questionnaires (n = 345)
Analyses
Statistical analysis focused on the feasibility of online participation and informativeness of questionnaires related to what participants reported about their HD. Descriptive statistics were used to analyze demographic data, self-reported Total Functional Capacity (TFC), and other categorical data; t-tests were used to compare means and Fisher’s exact tests to compare proportions.
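As an illustration of the proportion comparisons mentioned above, the following is a minimal sketch of a two-sided Fisher’s exact test for a 2×2 table, written with only the Python standard library. The function name and the example counts are hypothetical and are not study data; the statistical software actually used in the study is not stated in the source.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def weight(x: int) -> int:
        # Unnormalised hypergeometric probability of x in the top-left cell,
        # kept as an integer so comparisons are exact.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Sum every table (with the same margins) that is no more probable
    # than the observed one.
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / comb(total, col1)


# Hypothetical counts for illustration only (not taken from the study).
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"two-sided p = {p_value:.4f}")
```

Working with integer weights rather than floating-point probabilities avoids tie-breaking errors when deciding which tables are "as extreme" as the observed one.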
HD-PROP verbatims were categorized into domains and symptoms of HD using natural language processing (NLP) and human-in-the-loop curation, based on prior work in PD [2, 3], and modified for use in HD.
Symptom definition phase
The curators (N = 6) comprised four HD clinical experts and two “experience experts,” who were provided with sample verbatim participant replies (“verbatims”). (The clinical experts had all diagnosed and treated individuals with HD, participated in HD clinical research studies, and had neurological, neuropsychological, or psychiatric training. The experience experts were either at risk for HD or caregivers for family members with HD.) The curators were tasked with creating a symptom definition table that contained clinically recognizable categories (“domains”) and specific symptoms of HD. Formulation of HD Domains and Symptoms, and identification of 14 Domains, was based on prior curation approaches undertaken in PD, and modified for use in HD [2, 3].
The curators were also provided inclusion and exclusion boundaries for each specified HD symptom.
NLP classification phase
In this phase, the curators were divided into three groups, each with one clinician and an experience expert to label and classify symptoms in their domains. Verbatims were sampled from the dataset based on the symptom definition table for each symptom. For each symptom with five or more verbatims, approximately 25% of verbatims were retained for the validation phase (N = 128), and the remaining (N = 211) were sent to the curators for classification. Symptoms with insufficient or ambiguous samples were adjudicated by the study principal investigator (PI, KEA) and co-principal investigator (Co-PI, IS) for symptom classification.
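The hold-back step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the study’s sampling code: the function name, the fixed random seed, the rounding rule, and the example verbatims are all hypothetical. Symptoms with fewer than five verbatims are set aside for adjudication, and roughly 25% of the rest are retained for validation.

```python
import random
from collections import defaultdict


def split_for_validation(verbatims, holdout_fraction=0.25, min_per_symptom=5, seed=0):
    """Split (symptom, text) pairs into curation, validation, and adjudication sets."""
    by_symptom = defaultdict(list)
    for symptom, text in verbatims:
        by_symptom[symptom].append(text)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    curation, validation, adjudication = [], [], []
    for symptom, texts in by_symptom.items():
        if len(texts) < min_per_symptom:
            # Too few samples to hold any back: route to PI/Co-PI adjudication.
            adjudication.extend((symptom, t) for t in texts)
            continue
        shuffled = texts[:]
        rng.shuffle(shuffled)
        # Hold back about a quarter, and always at least one verbatim.
        n_holdout = max(1, round(len(shuffled) * holdout_fraction))
        validation.extend((symptom, t) for t in shuffled[:n_holdout])
        curation.extend((symptom, t) for t in shuffled[n_holdout:])
    return curation, validation, adjudication


# Hypothetical verbatims for illustration only.
sample = [("Chorea", f"chorea verbatim {i}") for i in range(8)]
sample += [("Fatigue", f"fatigue verbatim {i}") for i in range(3)]
curation, validation, adjudication = split_for_validation(sample)
print(len(curation), len(validation), len(adjudication))
```

Splitting within each symptom, rather than across the pooled dataset, keeps every sufficiently reported symptom represented in both the curation and validation sets.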
The curation process involved classifying each verbatim as fitting or not fitting a particular symptom category, based on the inclusion and exclusion boundaries in the symptom definition table, and identifying the specific terms within the verbatim that supported the classification. The curation process was administered using REDCap, a secure HIPAA-compliant survey tool. The responses from curators were combined, and the curation groups met with the study PI to further evaluate verbatims that did not have the same classification. Where the two curators disagreed, an adjudication session was conducted, results were reviewed, and a final determination was made after discussion with the PI. The complete classified list of HD-PROP Domains and Symptoms is included as Supplementary Table 1. The 14 Domains included 6 Motor groups (Chorea, Bradykinesia, Gait, Other Motor, Postural Instability, Rigidity) and 8 Non-Motor groups (Autonomic Dysfunction, Cognition, Fatigue, Genetic/Hereditary, Side Effects, Pain, Psychiatric, Sleep). The Psychiatric Domain encompasses affect, motivation, thought-perception, and other psychiatric problems.
The final classified set of verbatims and terms was used to build a preliminary algorithm that was further expanded using Unified Medical Language System (UMLS) ontologies and NLP techniques such as word vectorization. A database of the verbatims was then created using Neo4j, a graph database tool, which used phrase query extraction in conjunction with the preliminary algorithm to classify all verbatims. The algorithm was further subjected to fine-tuning and optimization through manual inspection, as described in detail in prior work [3].
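To make the idea of term-based, multi-label classification concrete, the sketch below matches curated phrases against a normalized verbatim. It is a deliberately simplified stand-in and not the study’s algorithm: it uses no UMLS ontologies, word vectors, or Neo4j queries, and the phrase lists, function name, and example sentences are hypothetical.

```python
import re

# Hypothetical phrase lists; the study's symptom definition table is not
# reproduced here.
SYMPTOM_PHRASES = {
    "Depressive Symptoms": ["depressed", "depression", "feeling down"],
    "Chorea": ["involuntary movements", "chorea", "twitching"],
    "Working Memory": ["forget", "memory"],
}


def classify_verbatim(text, phrases=SYMPTOM_PHRASES):
    """Return the sorted list of symptoms whose phrases occur in the text."""
    # Lowercase, replace punctuation with spaces, and collapse whitespace.
    normalized = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()
    matched = set()
    for symptom, terms in phrases.items():
        for term in terms:
            # \b anchors the phrase at a word start, so "forget" also
            # matches "forgetting" and "forgetful".
            if re.search(r"\b" + re.escape(term), normalized):
                matched.add(symptom)
                break
    return sorted(matched)


print(classify_verbatim("I feel depressed and keep forgetting things."))
print(classify_verbatim("No problem"))
```

A verbatim can match several symptoms at once, which mirrors the multi-label nature of the curation, and a reply such as “No problem” matches none.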
Data validation and optimization phase
The consolidated validation set of 128 retained (held-back) verbatims, regardless of their symptom classifications, was used by all curators to validate the machine-classified verbatims. As verbatims can be ambiguous and have multiple interpretations [7], this multi-label approach helped to determine the algorithm’s performance as well as establish the reliability of categorization by each of the six curators.
A validation sheet containing all 128 verbatims was created in MS Excel. Ten dropdown lists, containing all defined symptoms against each verbatim, were made available to the curators for selecting multiple symptom categories. A free-text field for additional symptoms and comments was made available, in case more verbatims could be classified into further symptoms.
All curators including the study PI participated in this validation exercise. If verbatims were classified by at least 3/6 (50%) curators and the PI, then the classification was considered valid. In certain situations, especially in the case of nuanced verbatims without 3/6 curators’ selection, the PI adjudicated the final classification. Of the 128 verbatims, 19 were marked by the curators as uninterpretable, and 109 verbatims were retained for validation. Metrics including accuracy, defined as (True Positive + True Negative) / (True Positive + False Positive + True Negative + False Negative), and F1 score, defined as (2 × precision × recall) / (precision + recall), were calculated to validate the performance of the algorithm. These methods are detailed in [3], and the results are shown in Table 3.
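The consensus rule above can be sketched as a simple vote count. This is one reading of the rule, offered as an assumption: a label is accepted when at least three of the six curators and the PI selected it, and when no label reaches that threshold the PI’s selection is used and flagged as adjudicated. The function name and example labels are hypothetical.

```python
from collections import Counter


def consensus_labels(curator_labels, pi_labels, min_votes=3):
    """Return (accepted_labels, adjudicated) for one verbatim.

    curator_labels: one set of symptom labels per curator.
    pi_labels: the set of symptom labels chosen by the PI.
    """
    # Count each label at most once per curator.
    votes = Counter(label for labels in curator_labels for label in set(labels))
    accepted = {
        label
        for label, count in votes.items()
        if count >= min_votes and label in pi_labels
    }
    if accepted:
        return accepted, False
    # No label reached the threshold: fall back to PI adjudication.
    return set(pi_labels), True


# Hypothetical selections from six curators for a single verbatim.
curators = [
    {"Depressive Symptoms", "Fatigue"},
    {"Depressive Symptoms"},
    {"Depressive Symptoms"},
    {"Fatigue"},
    set(),
    {"Anxiety-Worry"},
]
labels, adjudicated = consensus_labels(curators, {"Depressive Symptoms"})
print(labels, adjudicated)
```

Returning the adjudication flag alongside the labels keeps a record of which classifications rested on curator consensus and which on the PI alone.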
Analytic phase
The algorithm was further refined based on the validation metrics and feedback from the curators, after which the algorithm was operationalized for the programmed classification of verbatims.
RESULTS
Data were collected between March 23, 2022 and August 8, 2022 from 620 consenting participants; 345 (55.6%) completed all eight study questionnaires (see Fig. 1 for cumulative recruitment graph, and Table 1 for questionnaire completion numbers).

Fig. 1. Study enrollment based on date of consent.
Cohort characteristics
The 345 participants who completed all 8 study questionnaires were 34.5±9.9 (mean±SD) years old and 60.9% male. Reported US residence included 37 states, with most participation in California (78 participants), New York (48), Florida (41), Georgia (24), and Texas (24). They reported HD onset an average of 9.5±8.4 years earlier, which places the average age of onset in the mid-twenties. Demographic and other features of the cohort are summarized in Table 2.
HD care and research
215 (63.5%) of respondents reported receiving HD specialty care. Only 60% of all participants reported they had undergone genetic testing for HD, despite all reporting they were diagnosed with HD by a doctor. The majority of participants had not been in a clinical trial or an observational study. These and other results are summarized in Table 2.
HD-PROP verbatim reports were obtained from 416 participants, representing 345 participants who completed all questionnaires, and another 71 who completed the HD-PROP but did not complete all questionnaires. Among these 416 participants, the average number of reported HD-related problems was 1.4. Of these, 20 participants (4.8%) entered data that were judged to be uninformative or unanalyzable because they were not recognizable as words, such as strings of numbers or foreign language characters. Another 133 participants (32%) had responses indicating no problems due to HD (e.g. “Good”, “No Problem” or “None”), resulting in 263 participants with reported problems and functional consequences. In total, 339 verbatim reports were collected among the 416 participants. Of these, 211 were used for the curation and symptom classification process. Of the remaining 128, the 109 that curators judged interpretable were used to validate the algorithm.
In the algorithm validation phase, full (2/2 curators) agreement between curators for verbatims ranged from a low of 29% (for Mental Alertness Awareness) to 100% (for Loneliness-Isolation, Anger-Irritability, Concentration-Attention, Hallucinations, Apathy). For symptoms such as mental alertness and awareness, where there was more subjectivity in curation, differences were resolved through discussion with curators and review/updating of category boundaries. Accuracy was evaluated as the concordance between the machine-classified output and the curator-classified output. Accuracy (proportion of verbatims correctly classified) ranged from 98% (Pain/Discomfort and Balance) to 100% for several symptoms (Anxiety-Worry, Depressive Symptoms, Chorea, Executive Abilities, Working Memory, Language Word Finding, Fatigue, Gait NOS, Speech, Swallowing Problems; see Table 3).
Table 3. Validation metrics for symptoms reported by > 5% of participants
True positive (TP) = the number of cases correctly identified as symptom, False positive (FP) = the number of cases incorrectly identified as symptom, True negative (TN) = the number of cases correctly identified as not symptom, False negative (FN) = the number of cases incorrectly identified as not symptom, Accuracy = (TP + TN) / (TP + FP + TN + FN), Precision = TP / (TP + FP), Recall/Sensitivity = TP / (TP + FN), Specificity = TN / (TN + FP), F1 score = (2×precision×recall) / (precision + recall), where an F1 score of 1.0 indicates perfect precision and recall.
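The definitions above translate directly into code. The following minimal sketch computes all five metrics from confusion-matrix counts; the function name and the example counts are hypothetical and are not values from Table 3.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, specificity, and F1 from counts."""

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a denominator is zero,
        # e.g. precision for a symptom that was never predicted.
        return numerator / denominator if denominator else float("nan")

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical counts for one symptom (illustration only).
metrics = classification_metrics(tp=20, fp=1, tn=87, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because most verbatims do not carry any given symptom, true negatives dominate the counts and accuracy can be high even when a few positives are missed, which is why precision, recall, and F1 are reported alongside it.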
Among the 263 participants who completed the HD-PROP (Fig. 2), non-motor symptoms were reported by 167 (63.5%) and motor symptoms by 96 (36.5%). The most frequently reported symptoms in the non-motor domains were classified as related to depression, reported by 61 (23.2%) of respondents; Anxiety-Worry, Concentration-Attention, and Memory symptoms, reported by 45 (17.1%); Anger/Irritability symptoms, reported by 37 (14.1%); Pain-Discomfort, by 34 (12.9%); and Swallowing Problems in the Autonomic Dysfunction Domain, by 15 (5.7%). Within the motor domains, the most commonly reported symptoms were Chorea/Tremor/Restlessness, 38 (14.4%), and Slowness, 27 (10.3%) (Table 4 and Supplementary Tables 3 and 4). There were no significant age or sex differences between those reporting non-motor versus motor symptoms.
Table 4. Domain and symptom reporting by participants reporting problems due to HD on the HD-PROP, showing symptoms reported by ≥ 5%*
* Note that 416 Participants had PROP Data. Of these, 20 had data that were judged to be uninformative or unanalyzable, and 133 had responses indicating no problems due to HD, resulting in 263 participants with complaints.

Fig. 2. Distribution of Motor and Non-Motor Symptom Verbatims.
Compared with the 133 participants who did not report problems, the 263 reporting problems on the HD-PROP were more likely to be non-Hispanic (62% versus 51.9%, p = 0.0005), Caucasian (44.9% versus 33.1%, p < 0.0001), older (mean age 35.6 versus 32.7, p = 0.0054), and have a higher mean Self-Reported TFC (6.8 versus 6.1, p = 0.0233).

Self-Report TFC Scores.
DISCUSSION
The primary aim of the Making HD Voices Heard pilot study, the first on the myHDstory platform, was to examine the feasibility of online research data collection among participants who affirmed they had been diagnosed with HD and to interpret the problems that bothered participants due to their HD. The ability to recruit and consent 620 HD participants over 4.5 months was robust compared with the much slower and more deliberate pace of enrollment in clinical site-based studies [8–11]. Recruitment reached 37 U.S. states, representing a large geographic catchment area, and the 53% non-Caucasian online cohort was more diverse than the typical > 90% Caucasian participants in clinical site-based research [8–12]. Approximately 55% of those who signed consent completed all eight study questionnaires, suggesting that large numbers of participants are needed to obtain complete data sets with a completely virtual study. The direct-to-participant online approach is thus feasible as an operational research model for participants who self-identified as diagnosed with HD, at least in this cohort characterized overall by age of onset in their mid-twenties and current age in mid-thirties.
The second aim of the pilot study was to examine what HD participants reported in their own words as bothersome problems. Nearly two-thirds of participant-reported problems were non-motor, again affirming that HD from the patient’s point of view is much more than a movement disorder. This method for human-in-the-loop curation and machine learning interpretation of verbatim reports has been used previously with a large database of > 25,000 PD patients [3]. PD patients reported an average of 3.4 problems that were classified as predominantly motor symptoms of tremor (46% of respondents), gait-balance problems (> 39%), and pain-discomfort (33%). As in PD, classification agreement between HD curators was high for most symptoms, indicating that verbatim responses were interpreted similarly by both clinician and experience expert HD curators. The overall 0.99 accuracy and 0.96 precision of HD verbatim classification indicated the resulting machine learning model we developed on this curated data set was robust.
While motor symptoms referable to chorea were predominantly reported by HD participants, slowness (bradykinesia) and balance problems were also relatively common. The non-motor problems most commonly reported were classified as anxiety, depression, pain, impaired concentration, memory, and deficits in executive abilities. These symptoms are well recognized by clinicians but are also troubling problems for patients. This is clinically important since there are effective interventions for many non-motor HD symptoms, including dietary modifications to maintain weight, exercise to preserve balance and strength, and psychosocial support and pharmacotherapy for psychiatric symptoms [13].
Notably, about one-third of participants reported no problems due to their HD. This seemingly high proportion of participants who do not report problems may not be surprising to clinicians who may ascribe the lack of complaints to neglect, apathy, or anosognosia reflecting underlying neurodegeneration. But it seems likely that psychosocial factors such as compensatory behaviors, clinical context, personality, social milieu, fear of consequences or abandonment, or despair contribute to ‘no problems’ or lack of problems among HD patients. Reporting by family and care partners should help better inform problems bothering reticent HD patients and improve their care.
This pilot study was innovative in several ways. Direct-to-participant online research in HD patients, using keyboard and voice entry without clinician involvement, has not been reported. What bothers HD patients as reported online in their own words has not been previously analyzed in a systematic fashion. The TFC self-report has never been acquired online.
While online engagement of HD patients as research participants proved feasible in many respects, several shortcomings and lessons learned emerged from this pilot study experience. A major limitation of this direct-to-participant approach relates to authentication of research participants as diagnosed HD patients and documentation of participant-reported data and shared data such as CAG reports. It seems unlikely that enrolled participants would misrepresent themselves as having been diagnosed with HD, but imposter syndromes are recognized in clinical medicine. Providing gift card incentives for online enrollment and questionnaire completion may further challenge authentication and risk unintended consequences.
Online instructions and navigation could also be improved. A few participants reported their verbatim problem in the 3rd person, e.g., “HD is characterized by ...” instead of the 1st person “my most bothersome problem is ...” While 3rd person reporting may be a stylistic narrative for some HD patients, it invites copying and pasting text replies from outside sources. Better instructions and improved ease of navigation may discourage inauthentic reporting of others’ words instead of the participants’ own words.
Self-reported Total Functional Capacity was lower than expected given the eligibility criteria for online study participation. Half of participants reported a TFC score lower than 7, despite the consent form eligibility requiring some degree of independence. Seventy percent of participants reported working full or part time, which is inconsistent with the low levels of capacity. In a clinician-guided study of the self-report TFC where 486 participants were premanifest, early, or late stage HD, Carlozzi and others [5] found the premanifest and early patients rated themselves as more impaired than did clinicians, but this discrepancy was not seen with later stage patients. It may be that the online nature of this study allowed participation by much more impaired individuals with HD than are usually included in more clinically supervised settings, such as the Carlozzi study, where the self-report TFC was administered with instructions that could have been personally clarified for participants. In this online study, technical assistance was permitted for data entry and may have facilitated participation by more advanced patients. Nonetheless, our instructions to participants, particularly for the self-report TFC, may have also lacked precision due to inadequate description of the TFC anchors and the absence of basic clarifications, such as that higher ratings indicate better function.
Participant authentication could be improved by requiring independent review of medical records, but this would be a major and costly imposition for many individuals and not without concerns about privacy protections of individuals and their HD families. One hybrid solution would be to link this direct-to-participant research with a robust multi-site research study such as Enroll-HD, which combines annual in-person clinical visits with opportunities for interim or parallel surveys in the setting of a rich biological and genetic database [14, 15]. The racial, ethnic, and geographic diversity observed in our pilot study could be exploited to better connect these underrepresented and perhaps underserved individuals with HD research and clinical sites. The maturation of telemedicine as a clinical and research tool as well as the development of the virtual UHDRS (vUHDRS) to obtain clinical assessments remotely could further combine the advantages of bringing research to the patients who may contribute data in the comfort and privacy of their home environment. The participants in this study were much younger, with an average age of 35, than those in prior observational studies of HD, such as Enroll-HD [14]. This may be due to the online nature of the study, allowing for younger individuals, who require more flexible study schedules, to participate.
Other limitations
Access to technology may be a challenge, particularly for older individuals and rural populations, who may lack devices, internet access, or both. Self-report may introduce certain reporting biases, so that our population may not be fully representative of HD patients. We studied those identifying as having an HD diagnosis, so results cannot yet be generalized to at-risk or high-risk-diagnostic individuals who also have problems and symptoms to report. The lack of clinician-based verification means there was no one to follow up on ambiguous answers or clarify assessment questions if they were not clear to participants. Improved wording and clarity on assessments could help some of these issues, but others are difficult to address in a purely self-report format. Inconsistencies in data, such as a self-report TFC, cannot be corrected in real time. Transcription of voice answers may not be exact, especially for participants with dysarthria. Since voice prints were not obtained, we were not able to review recordings and verify voice transcription accuracy.
Classification of the verbatim reports relies on interpretation by curators of what research participants report in their own words. To help address this concern, we included ‘experience experts’ who were family members living with HD and engaged in a consensus approach as curators to interpret verbatim replies based on their clinical and personal experiences.
Notwithstanding the limitations of our pilot study, many insights emerged. The HD-PROP is an innovative research instrument that was first used in 2012 to capture the problems facing HD patients who were participating in the REACH2HD randomized clinical trial [11]. In this setting, the HD-PROP was deployed to enable participants to report their problems in their own words to the site coordinator. The verbatims were transcribed on paper and in turn analyzed by relatively primitive natural language processing tools and unwieldy cluster analyses. Nonetheless, the HD-PROP output yielded results that paralleled the cognitive outcomes of interest. The development and maturation of the PD-PROP in the past four years, using machine learning and advanced curation techniques derived from a dataset of > 25,000 PD patients [2–4], attest to the potential of this clinically meaningful tool in research settings. The PD-PROP can chart what bothers PD patients and the functional impact of their problems, provide a natural history of illness from the patient’s point of view, and serve as exploratory outcome measures in clinical trials. The HD-PROP has similar potential in observational and therapeutic research as the tool is applied to a larger sample size of diagnosed HD participants, calibrated along with categorical scales such as the HD-HI [16] and extended to the larger population of individuals who are known to carry the HD gene but have not been diagnosed.
Our human-in-the-loop curation methods enabled interpretation of verbatim reports obtained directly from people affected by HD. These clinically meaningful data enrich our understanding of how people with HD feel and experience their illness. Use of an online platform can be particularly powerful for a rare disease, allowing for rapid enrollment and data collection from a wider geography and including more diverse populations than typically found in clinical trials. Future online research should improve authenticity, documentation, instructions, and support, and be extended to how reported problems change over time and in the pre-diagnostic phases of HD [17].
Footnotes
ACKNOWLEDGMENTS
The authors have no acknowledgments to report.
FUNDING
This study was sponsored by the Huntington Study Group. Curation was funded by a pilot grant to Dr. Anderson from the Georgetown University Department of Psychiatry.
CONFLICT OF INTEREST
Karen E. Anderson was paid by the Huntington Study Group for her work as PI of this project. She is an Editorial Board Member of this journal, but was not involved in the peer-review process nor had access to any information regarding its peer-review.
Lakshmi Arbatti is an employee of Grey Matter Technologies, a wholly owned subsidiary of Modality.AI.
Abhishek Hosamath is an employee of Grey Matter Technologies, a wholly owned subsidiary of Modality.AI.
Andrew Feigin is the Chief Medical Officer for the Huntington Study Group.
Jody Goldstein is an employee of the Huntington Study Group.
Elise Kayson is the VP of Clinical Operations for the Huntington Study Group.
Brett L. Kinsler is an employee of the Huntington Study Group.
Lauren Falanga is an employee of the Huntington Study Group.
Lynn Denise is an employee of the Huntington Study Group.
Noelle E. Carlozzi was paid by Georgetown University grant funding (Karen Anderson, PI) for curation work on this project.
Samuel Frank was paid by Georgetown University grant funding (Karen Anderson, PI) for curation work on this project. He is Co-chair of the Huntington Study Group.
Katie Jackson was paid by Georgetown University grant funding (Karen Anderson, PI) for curation work on this project.
Sandra Kostyk was paid by Georgetown University grant funding (Karen Anderson, PI) for curation work on this project. She is Co-chair of the Huntington Study Group.
Jennifer L. Purks was paid by Georgetown University grant funding (Karen Anderson, PI) for curation work on this project.
Kenneth P. Serbin is also known by the pen name Gene Veritas, which he uses for his blog on Huntington’s Disease. He was paid by Georgetown University grant funding (Karen Anderson, PI) for curation work on this project.
Shari Kinel is the CEO of the Huntington Study Group.
Christopher A. Beck is a statistician for the Huntington Study Group.
Ira Shoulson is an employee of Grey Matter Technologies, a wholly owned subsidiary of Modality.AI.
The UHDRS®, myHDstory®, and vUHDRS® are properties of the Huntington Study Group.
The HD-PROP trademark is property of Modality.AI.
DATA AVAILABILITY
The data supporting the findings of this study are available on request from the Huntington Study Group. The data are not publicly available due to privacy, ethical restrictions, or other concerns.
