Abstract
Background:
Huntington’s disease can present at almost any age but traditionally, those with an onset ≤20 years are described as having juvenile onset Huntington’s disease (JOHD). They are more likely to have bradykinesia and dystonia earlier in the course of the disease. The Total Motor Score of the Unified Huntington’s Disease Rating Scale (UHDRS-TMS) is often used as the principal outcome measure in clinical trials.
Objective:
To identify a motor scale more suitable for JOHD patients.
Methods:
A working group reviewed the UHDRS-TMS and modified it by adding four further assessment items. Rasch analysis was used to study the performance of the modified scale in 95 patients with a mean age of 19.4 (SD 6.6) years.
Results:
The initial analysis showed a significant overall misfit to the Rasch model and a number of individual items displayed poor measurement properties: all items relating to chorea displayed significant misfit due to under-discrimination. Additionally, a number of items displayed disordered response category thresholds, and a large amount of dependency was present within the item set (96 of the 741 item pairs = 13%). An iterative process of scale re-structuring and evaluation was then undertaken, with a view to eliminating the largest sources of misfit and generating a set of items that would conform to Rasch model expectations.
Conclusion:
This post-hoc scale restructuring appears to provide a valid motor score that is psychometrically robust in a JOHD population. It offers a pragmatic solution to measuring motor function in this population, and it could provide the basis for the further iterative development of a more useful clinical rating scale for patients with JOHD.
INTRODUCTION
Huntington’s disease (HD) is a progressive neurodegenerative disorder inherited as an autosomal dominant condition. It is characterised by a movement disorder, selective cognitive impairment and disturbance of affect. The disease results from an expansion of a CAG repeated sequence in the first exon of the HTT gene. A CAG repeat length of 36–39 is associated with reduced penetrance whereas 40 or more is considered unequivocally abnormal. Onset may occur at almost any age but it is typically in midlife [1–4]. There is a negative correlation between age of onset and CAG repeat length; consequently, those with juvenile onset HD often have a higher CAG repeat length.
The term juvenile onset HD (JOHD) is used to describe a patient having an age of onset of ≤20 years: in developed countries, this represents approximately 5% of cases [5]. JOHD is not a distinct subcategory, but the distinction is retained because the phenotype is more likely to include bradykinesia and dystonia at an earlier stage of the illness, and little in the way of chorea. Clear problems arise with this definition because of the cut-off age of 20 years: firstly, it is arbitrary, as a patient with onset at 18 years may not be significantly different from one with an age of onset of 22 years; secondly, a patient with onset under 10 years may have very different needs from a patient with onset as a late teenager; thirdly, a patient now aged 30 years could still have had age of onset ≤20 years.
Currently, there is no treatment to alter the natural history of the disease, but treatment trials are under way for adult onset Huntington’s disease [6]. These trials use standardised rating scales, such as the Total Motor Score of the Unified Huntington’s Disease Rating Scale (UHDRS) [7], as outcome measures.
However, to ensure that the impact of any interventions being studied in clinical trials is assessed accurately, it is vital that the outcome measures adequately represent the constructs that the interventions are meant to treat or modify.
The UHDRS was developed and psychometrically validated using traditional approaches. However, modern psychometric assessment methodologies such as item response theory (IRT) [8–11] and Rasch Measurement Theory (RMT) [12] offer various advantages over classical approaches, and can also reveal potential restrictions that may be present within these existing measures [13, 14].
Where traits cannot be measured directly, these are known as latent traits (e.g., depression, or quality of life). Latent traits are measured by indirect means, which is usually through multi-item scales. Rasch Measurement Theory provides a way to assess multi-item latent scales (i.e., Patient or Clinician-Reported Outcome Measures), to ensure that it is valid to add the individual items together to form an overall total score. The Rasch model is a unidimensional measurement model that satisfies the assumptions of fundamental measurement [15, 16], meaning that it provides a measurement template that scales can be tested against. The assumption that all items contribute independently to the total score is formally tested against the Rasch model, and any measurement anomalies within the item set are highlighted. The application of Rasch Measurement Theory provides a unified framework for several aspects of internal construct validity to be assessed. This includes the assessment of scale unidimensionality (whether all items are contributing to the same underlying construct), response category functioning (whether item response categories are working as they were intended to be used), response dependence (whether the response to any item has a direct implication to the response to any other item), scale targeting (relative distribution of item and person locations), and item bias (whether an item is working in the same way for specific groups, e.g., males and females). When rating scale data conform to the Rasch model, the ordinal raw score can be transformed into a linear, interval scale, thus validating the use of parametric statistical procedures, although it should be noted that the raw score will remain ordinal.
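As a minimal illustration of the measurement template described above, the sketch below evaluates the simplest (dichotomous) form of the Rasch model, in which the probability of a positive response depends only on the difference between a person location and an item location on a shared logit scale. The function and variable names are illustrative assumptions, and the items analysed in this study are polytomous, so this is a simplification rather than the model as fitted.

```python
import math

def rasch_probability(person_location: float, item_location: float) -> float:
    """Probability of scoring 1 rather than 0 under the dichotomous Rasch model.

    Both locations are expressed in logits on the same scale.
    """
    return 1.0 / (1.0 + math.exp(-(person_location - item_location)))

# A person located exactly at the item location has a 50% chance of scoring 1.
p_equal = rasch_probability(0.5, 0.5)

# Only the person-item difference matters, not the absolute locations.
p_shift_a = rasch_probability(1.0, 0.0)
p_shift_b = rasch_probability(3.0, 2.0)

print(p_equal, p_shift_a, p_shift_b)
```

Because the two shifted cases share the same one-logit difference, they give the same probability, which is the property that allows item and person locations to be compared on one scale.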
The purpose of this study was to use a Rasch analytic framework to investigate the psychometric properties of a modified UHDRS motor scale on patients with JOHD. The UHDRS has already been assessed within an item response theory (IRT) framework [17], but the majority of subjects within that study had adult onset HD. Additionally, RMT offers a different approach to IRT. Whereas the IRT approach seeks to explain the variance in the data by adjusting the model to fit the data, in the RMT approach the model remains fixed, seeking to obtain invariant measurement by ensuring that items meet the requirements of the fixed model [18, 19].
Furthermore, given the motor phenotype of JOHD, four items that quantified motor features common in JOHD were added to the UHDRS-TMS with a view to seeing if this helped the performance of the rating scale within a JOHD sample.
METHODS
A working group of the European Huntington’s Disease Network met to consider the issue of assessing JOHD patients for intervention studies. Based on the opinion of an expert group, a strategic decision was made to retain the structure of the Total Motor Score scale, but an additional four items were added: 1) an overall chorea score: the current motor scale is heavily weighted towards chorea with an assessment of seven areas (face, buccal, oral, lingual, trunk, right and left upper extremities) and as this clinical sign is less prominent in JOHD, an overall chorea score may give a better assessment; 2) a repetitive hand-tapping task which quantifies bradykinesia; 3) a task timing how long it would take to drink 120 ml of water as another means of capturing slowing of movements; and 4) a maximal tremor score because this has been suggested as a common clinical sign in JOHD. The modified scale has 39 items, each with five response categories, where a higher score represents a higher level of impairment; the additional items and responses are listed in Table 1.
Summary of the four additional items used in this study
This was a sub-study of the European Huntington’s Disease Network REGISTRY project [20]: the ethical approval for REGISTRY included sub-studies. More information on the study can be found at http://www.euro-hd.net/html/registry. Data from this extended JOHD motor assessment rating scale were collected from patients in Europe, the United States and Australia, submitted in paper form to the European Huntington’s Disease Network REGISTRY site at Ulm, and entered into an Excel spreadsheet.
Participants
Data were collected from 27 sites using multiple raters. Initially, there were 111 participants (58 females, 53 males) with a mean age of 21.3 years (SD 8.05, Range 6–40). Although it is possible for a person to develop HD as a late teenager and now be aged 40, the object of this study was to focus on the performance of the rating scale among younger patients; therefore, the analysis was restricted to those patients ≤30 years at the time of the study. This reduced the sample size to n = 95.
Rasch analysis of the data
The JOHD motor scale data were analysed using RUMM2030 software to investigate whether the pattern of item responses observed in the data matched the expectations of the Rasch measurement model [22]. The following fundamental aspects of the scale were assessed using this approach: overall fit to the model, adequacy of the response categories, individual item and person fit, local dependency, unidimensionality, differential item functioning (DIF), targeting of the scale and person separation reliability index (PSI). All of these elements have been described elsewhere [23–25]. Briefly, the data are compared to the fit assumptions of the Rasch model, so the tests-of-fit should be non-significant for the model assumptions to be satisfied. Individual items should demonstrate chi-square and ANOVA fit statistic p-values >0.05 (Bonferroni adjusted), and the same criterion applies to any DIF tests. A residual correlation (Q3) value of 0.2 was used to indicate dependency. This is slightly more lenient than the value of 0.2 above the average correlation that is currently recommended [25], but it was felt that this was a reasonable compromise, given the low sample size involved. A series of t-tests was used to assess unidimensionality [26], where evidence of unidimensionality is apparent when the lower bound of the 95% CI for the percentage of significantly different t-tests is <5%.
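The dependency criterion above can be sketched as a pairwise check on residual correlations. The example below is an illustration only: it assumes standardised person-by-item residuals are already available from a fitted Rasch model (in this study they would come from RUMM2030), and the item names and residual values are made up.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_dependent_pairs(residuals, cutoff=0.2):
    """Return item pairs whose residual correlation (Q3) exceeds the cutoff.

    residuals maps an item name to its list of standardised residuals,
    one value per person, in the same person order for every item.
    """
    flagged = []
    for item_a, item_b in combinations(sorted(residuals), 2):
        q3 = pearson(residuals[item_a], residuals[item_b])
        if q3 > cutoff:
            flagged.append((item_a, item_b, round(q3, 3)))
    return flagged

# Made-up residuals for three hypothetical items and five persons.
toy_residuals = {
    "dystonia_left": [0.9, -0.4, 1.2, -1.0, 0.3],
    "dystonia_right": [1.0, -0.5, 1.1, -0.8, 0.2],
    "tandem_walk": [-0.7, 0.6, -0.2, 0.4, -0.1],
}
flagged = flag_dependent_pairs(toy_residuals)
print(flagged)
```

In this toy data only the left and right dystonia items are flagged, mirroring the kind of right-left clustering that a fixed 0.2 cutoff is intended to detect.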
RESULTS
The 95 patients in this study had a male:female ratio of 45:50. The mean age was 19.4 years (SD 6.6 years). The age range was 6–30 years: 13 were aged 6–10 years, 34 were aged 11–20 years and 48 were aged 21–30 years. The mean CAG repeat length was 67.6 (SD 15.6). The median CAG repeat length was 63 (inter-quartile range 56–77) with a range of 46–117.
Analysis
The initial analysis showed a significant overall misfit to the Rasch model (see Table 2, Analysis 1), and a number of individual items displayed poor measurement properties. At this point, the key observation was that all items relating to chorea (both the original 7-area assessment and the overall chorea score) displayed significant misfit due to under-discrimination. This suggests that the chorea items are not usefully contributing to the total score of the item set due to a lack of discrimination within this patient group. An example of this under-discrimination is presented in Fig. 1. Additionally, a number of items displayed disordered response category thresholds (see Fig. 2), and a large amount of dependency was present within the item set (96 of the 741 item pairs = 13%). It was noted that the majority of the dependency clustered into groups of items that were related to the same concept; for example, dystonia in the right and left upper limb.
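The denominator of 741 is consistent with the number of distinct pairs that can be formed from the 39 items of the modified scale. A short check of that arithmetic, using only the figures reported in the text:

```python
import math

n_items = 39                          # items in the modified scale
n_pairs = math.comb(n_items, 2)       # distinct item pairs: 39 * 38 / 2
flagged_pairs = 96                    # dependent pairs reported in the text
share_flagged = flagged_pairs / n_pairs

print(n_pairs, f"{share_flagged:.1%}")  # prints: 741 13.0%
```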
Analysis stage summary fit details

Item Characteristic Curve of the under-discriminating Chorea item. The grey line represents the expected response curve, and the black dots represent the observed data. The ‘flat’ nature of the curve suggests that respondents are obtaining an approximately equal value on the item, regardless of their underlying level of motor function (as represented by the x-axis), and is therefore not discriminating across the level of motor function.

Example of items with ordered thresholds (Max. Finger Taps – upper plot) and disordered thresholds (Bradykinesia Drinking – lower plot). Each curve represents the inferred probability distribution of persons responding in a particular response category, given their underlying level of motor function. Each response category should emerge at some point as the most likely response on the underlying scale. In turn, all response category thresholds should therefore be ordered (see upper plot). Where response categories do not function as intended, response category thresholds become disordered (see lower plot).
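The idea of ordered and disordered thresholds can be sketched numerically. The example below assumes a partial credit parameterisation of the polytomous Rasch model and uses made-up threshold values; it is not the output of the analysis reported here. It checks which response categories ever become the single most likely response along the underlying scale.

```python
import math

def category_probabilities(person_location, thresholds):
    """Category probabilities for one item under a partial credit formulation.

    thresholds holds the m threshold locations (logits) of an item with
    m + 1 response categories scored 0..m.
    """
    numerators = [1.0]          # category 0
    running = 0.0
    for tau in thresholds:
        running += person_location - tau
        numerators.append(math.exp(running))
    total = sum(numerators)
    return [value / total for value in numerators]

def thresholds_ordered(thresholds):
    """True when each threshold is located above the previous one."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))

def modal_categories(thresholds, grid):
    """Categories that are the most likely response somewhere on the grid."""
    modal = set()
    for theta in grid:
        probs = category_probabilities(theta, thresholds)
        modal.add(probs.index(max(probs)))
    return modal

grid = [x / 10 for x in range(-60, 61)]   # -6 to +6 logits
ordered = [-2.0, -0.5, 0.5, 2.0]          # hypothetical five-category item
disordered = [-2.0, 0.5, -0.5, 2.0]       # second and third thresholds reversed

print(thresholds_ordered(ordered), sorted(modal_categories(ordered, grid)))
print(thresholds_ordered(disordered), sorted(modal_categories(disordered, grid)))
```

With the ordered thresholds every category is the most likely response over some interval, whereas with the reversed pair the middle category never is, which is the pattern that motivates collapsing categories.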
An iterative process of scale re-structuring and evaluation was then undertaken, with a view to eliminating the largest sources of misfit and generating a set of items that would conform to Rasch model expectations. The summary results of each analysis stage are presented in Table 2.
Analysis 2: Restructuring of the data to reduce dependency
The first stage of data reconstruction was carried out in order to reduce the impact of item dependency. This was done by creating a number of composite items, which considered the maximum level of impairment shown within a dependent cluster of items. This approach conserves the informative clinical information, whilst taking account of the psychometric impact of inter-item dependency. Practically, this meant that the ‘right’ and ‘left’ elements of the items: pronate/supinate hands, rigidity of arms, finger taps, bradykinesia, hand-tapping, tremor upper extremity and tremor lower extremity were combined into single items, with the highest value of the right and left elements selected for the composite item. Similarly, the assessment of maximal dystonia was reduced from five elements down to three, with the retention of ‘trunk’ as a single item, but combining the ‘right’ and ‘left’ elements relating to the upper and lower extremity. Additionally, the assessments of eye movements were restructured to combine the ‘horizontal’ and ‘vertical’ elements of the ocular pursuit, saccade initiation, and saccade velocity items.
None of the chorea items appeared to discriminate across the level of motor function in the initial analysis, and all elements clustered in terms of the evidence of dependency. To investigate whether the under-discrimination remained present with a single maximal chorea value, all of the original 7 assessments of chorea were combined into a single maximal chorea item, selecting the highest observed value across all elements. The global chorea item was also retained individually at this stage.
These amendments resulted in the reduction of the item set from 39 to 22 items, each with five available response categories. These amendments are represented as ‘restructure run 1’ in Table 3, and the fit statistics are presented as Analysis 2 in Table 2.
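The composite scoring used in this restructuring, in which the highest value within a dependent cluster is carried forward, can be sketched as follows. The patient record, item names and scores are hypothetical and serve only to show the rule.

```python
def composite_max(record, item_names):
    """Return the highest score observed across a cluster of dependent items."""
    return max(record[name] for name in item_names)

# Hypothetical scores (0-4) for one patient; item names are illustrative only.
patient = {
    "finger_taps_right": 2,
    "finger_taps_left": 3,
    "dystonia_trunk": 1,
    "dystonia_upper_right": 0,
    "dystonia_upper_left": 2,
    "dystonia_lower_right": 1,
    "dystonia_lower_left": 1,
}

restructured = {
    # Right and left elements collapse into one composite item.
    "finger_taps": composite_max(patient, ["finger_taps_right", "finger_taps_left"]),
    # Trunk dystonia is retained as a single item.
    "dystonia_trunk": patient["dystonia_trunk"],
    "dystonia_upper": composite_max(patient, ["dystonia_upper_right", "dystonia_upper_left"]),
    "dystonia_lower": composite_max(patient, ["dystonia_lower_right", "dystonia_lower_left"]),
}

print(restructured)
```

Here seven recorded scores reduce to four scored items, while the original right and left values remain available in the record for clinical interpretation.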
Restructure framework
Analysis 3: Further restructuring of the data to remove dependency
A degree of response dependency was still present at this point, leading to further restructuring. This included the combination of all elements of saccade initiation and velocity into a single composite item, all elements of dystonia into a single composite item, and all elements of tremor into a single composite item. These further amendments resulted in the reduction of the item set to 18 items, each with five available response categories. These amendments are represented as ‘restructure run 2’ in Table 3, and the fit statistics are presented as Analysis 3 in Table 2.
Analysis 4: Removal of chorea due to under-discrimination
At this stage, both the newly created chorea composite item and the original global chorea item continued to display a large degree of misfit due to under-discrimination. Both of these items were duly removed from the item set, therefore reducing the scale to 16 items. The fit statistics at this stage are presented as Analysis 4 in Table 2.
Analysis 5: Collapsing of response categories due to disordered response
After accounting for the majority of the response dependencies and removing the clear anomalies of the chorea items, six items were still displaying disordered response categories, indicating that they did not seem to be functioning in the intended manner. With guidance from the patterns of disorder and expert input from an experienced HD clinician, the response categories of these six items were collapsed. In the case of the items relating to the retropulsion pull test, maximal dystonia and Luria tri-step score, the response categories were reduced from five to four. In the case of the items relating to tongue protrusion, maximal rigidity of arms and bradykinesia of drinking, the response categories were reduced from five to three. Although this is presented as a single analysis stage, all rescoring was carried out iteratively. The rescoring structure is presented in Table 4, with the pre and post-rescoring response category threshold plots presented in Fig. 3. The fit statistics at this stage are presented as Analysis 5 in Table 2.
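Collapsing response categories amounts to applying a many-to-one mapping at the scoring stage. The sketch below uses two hypothetical mappings purely to illustrate the mechanics; the rescoring actually applied to each item is the one given in Table 4 and is not reproduced here.

```python
# Hypothetical collapsing schemes, for illustration only.
FIVE_TO_FOUR = {0: 0, 1: 1, 2: 2, 3: 3, 4: 3}
FIVE_TO_THREE = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2}

def rescore(raw_scores, mapping):
    """Apply a collapsing scheme to raw 0-4 scores, leaving the raw data intact."""
    return [mapping[score] for score in raw_scores]

raw = [0, 1, 2, 3, 4, 2]              # made-up ratings on one item
four_category = rescore(raw, FIVE_TO_FOUR)
three_category = rescore(raw, FIVE_TO_THREE)

print(four_category)
print(three_category)
```

The rater still records one of the five original options; only the value carried into the total score changes.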
Rescoring structure

Item threshold map pre and post-rescoring. Plot shows the relative location of all response category thresholds across all items. Where no plot is shown for an item, the thresholds were disordered.
Analysis 6 and 7: Removal of tremor and dysarthria items
Following the rescore, the ‘max tremor’ composite item was still displaying misfit due to under-discrimination (similar to the chorea items). Additionally, the ‘dysarthria’ item was also displaying misfit, although this item was indicating an over-discriminating response pattern in addition to displaying a lower-level dependency across a number of items, which seemed to be adversely influencing the dimensionality of the item set. The ‘max tremor’ and ‘dysarthria’ items were iteratively removed from the item set, and the fit statistics are presented as Analysis 6 and 7 respectively, in Table 2.
Following the removal of these two items, the overall fit was good (chi-square p = 0.297), individual item fit was good, dimensionality was acceptable (7/95 = 7.37% [95% CI = 3–11.8%]), all response categories were ordered, and there was no DIF by gender, handedness, or the version of the scale that was administered (adult or JOHD-specific). The relative person-item distribution (targeting) plot also indicated that the remaining item set still captured the full range of impairment, meaning that the scale reliability statistics remained high (see Fig. 4). However, some dependency remained apparent between the two items measuring balance (retropulsion pull and tandem walk) (0.316), and a lower-level dependency between the ‘max rigidity arms’ and ‘max dystonia’ items (0.238).
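The dimensionality figures quoted above can be reproduced with a normal-approximation binomial interval whose standard error is based on the nominal 5% rate. That this is the formula behind the reported interval is an assumption; it is shown because it recovers the reported bounds from the reported counts.

```python
import math

def proportion_with_ci(n_significant, n_persons, nominal_rate=0.05, z=1.96):
    """Observed proportion of significant t-tests with an approximate 95% CI.

    The standard error uses the nominal rate rather than the observed
    proportion (an assumption made to match the reported interval).
    """
    observed = n_significant / n_persons
    half_width = z * math.sqrt(nominal_rate * (1 - nominal_rate) / n_persons)
    return observed, observed - half_width, observed + half_width

observed, lower, upper = proportion_with_ci(7, 95)
print(f"{observed:.2%} [{lower:.1%}, {upper:.1%}]")  # prints: 7.37% [3.0%, 11.8%]
```

Since the lower bound falls below 5%, the criterion for unidimensionality stated in the Methods is met even though the point estimate exceeds 5%.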

Relative person-item threshold distribution of final 14 items. Plot shows the relative distribution of logit locations of all item thresholds (below x-axis) and persons (above x-axis). These distributions should align where a scale is well-targeted to the population being measured.
DISCUSSION
To our knowledge, this is the first attempt to assess the motor scale of the UHDRS on a population of people with JOHD. The motor scale revealed areas of significant dependency between clusters of items relating to different elements of common concepts, including a consistent dependency between the right and left side of impairments to motor function. A post-hoc correction was applied to account for the apparent dependency, which also went some way to alleviating the apparent lack of unidimensionality within the item set. At a practical level, this reduces the number of items that are summed to form the total score, but the key clinical information relating to the location of impairment is retained. Given the nature of the observed dependencies within the item set, it is reasonable to expect a similar finding to be observed within the UHDRS motor scale when applied to patients with adult onset HD.
Additionally, it appears as though the five-category response options are not working as intended across all items. This can mean that respondents are not differentiating between certain responses, and therefore the response choice between these categories becomes arbitrary. This may be the case where too many response options are presented, or if response category labelling is unclear, inconsistent, or difficult to quantify. For example, this appeared to be the case with the ‘tongue protrusion’ item, where respondents appeared to have difficulty distinguishing between response categories with labels such as ‘cannot fully protrude tongue’ and ‘cannot protrude tongue beyond lips’. In this case, the analysis has highlighted an issue which is made clearer when put into context. As these two response categories are not mutually exclusive, it is perhaps not surprising that respondents are not clearly differentiating between the available response options. Again, a post-hoc adjustment was made to the scale by collapsing some of the response categories of the items displaying disordered thresholds. Although this improved scale fit, it should be noted that the actual response options (as presented) have not changed. The process of collapsing merely treats certain response options as equivalent; the practical application of this is described below.
Furthermore, it was recognised from the outset of the original scale development that the motor scale was heavily weighted towards chorea, and for this reason the global chorea score was introduced. Within the reference frame of the JOHD sample, it was shown that none of the items relating to chorea were usefully discriminating across different levels of the trait, including the new global chorea score. It is to be expected that a worsening chorea score would be associated with a worsening overall motor score, but this was not found to be the case in the JOHD sample. This has significant implications for clinical trials because the UHDRS-TMS is heavily weighted towards chorea. The objective of current research is to provide disease modifying treatments. The recent work by Schobel et al. to modify the principal outcome measure is helpful but still uses the UHDRS-TMS in the formula [27].
Although the amended (collapsed) bradykinesia drinking item appeared to work well within the final item set, the feedback from scale users has raised concerns regarding this item. These concerns relate to the practical aspect associated with drinking 120 ml of water, as this may create a potential choking hazard. This item should therefore be considered carefully within any future scale development.
Of the remaining new items, the assessment of tremor did not prove useful and was removed at the analysis stage; however, the bradykinesia hand tapping assessment was retained.
The major limitation of this study was the sample size that was available for analysis (n = 95). A sub-analysis to investigate the effect of CAG repeat length on the performance of the scale could not be undertaken. However, given the relative rarity of JOHD, it took an enormous amount of time and resource to collect a sample of this apparently modest size, so this sample size should be considered in context. It is acknowledged that this small sample size may result in more instability surrounding the analysis results. However, the magnitude of the results and the replication of the key findings suggest a larger sample would deliver largely equivalent results. A further limitation is that age of onset was not collected systematically, as data were collected separately from the main REGISTRY data, so we were not able to stratify the results into those with onset ≤10 years and those with onset in their late teens. We tried to mitigate this limitation by restricting the study to those ≤30 years of age.
CONCLUSION
This study was designed to assess the use of a modified motor scale in JOHD. The modifications from the adult-oriented UHDRS motor scale were modest, as it was thought that making additional assessments would result in the scale being more focused for the JOHD population. This study has revealed some significant implications for assessing future treatments in young people with HD. The analysis suggests that it is not valid to sum the 39 items (35 original, and 4 newly added) of the UHDRS-JOHD motor scale into a single total score.
On the basis of this result the HD research community could consider the following options: (1) Continue using the UHDRS-TMS and accept its limitations; (2) Retain the UHDRS-TMS, but additionally include the new hand tapping task; (3) Retain the UHDRS-TMS plus the hand tapping task so that it does not look any different to the practising clinician undertaking an intervention study. However, it will be necessary to pre-specify that any outcomes will be based on post-hoc changes to the scale, as described in this study; (4) Redesign the JOHD motor rating scale from the ground up, using qualitative and quantitative methods to iteratively develop a motor rating scale which is clinically meaningful.
Ideally, a rating scale should focus on functional assessments rather than an amalgam of clinical signs. Although option 4 would perhaps present the best solution, in the short term we would recommend option 3 because it has the practical merit of being easily adopted without significant further work. However, option 4 should be considered as a future direction in which to take this informative work, which is presented as part of an iterative development process.
CONFLICT OF INTEREST
The authors have no conflicts of interest to report in relation to this work.
Footnotes
REGISTRY Investigators of the European Huntington’s Disease Network
AUSTRALIA
AUSTRIA
BELGIUM
FRANCE
GERMANY
ITALY
POLAND
PORTUGAL
SPAIN
SWEDEN
SWITZERLAND
U.K.
