Abstract
Background:
Huntington’s disease (HD) is an autosomal dominant, neurological disease caused by an expanded CAG repeat near the N-terminus of the huntingtin (HTT) gene. A leading theory concerning the etiology of HD is that both onset and progression are driven by cumulative exposure to the effects of mutant (or CAG expanded) huntingtin (mHTT). The CAG-Age-Product (CAP) score (i.e., the product of excess CAG length and age) is a commonly used measure of this cumulative exposure. CAP score has been widely used as a predictor of a variety of disease state variables in HD. The utility of the CAP score has been somewhat diminished, however, by a lack of agreement on its precise definition. The most commonly used forms of the CAP score are highly correlated so that, for purposes of prediction, it makes little difference which is used. However, reported values of CAP scores, based on commonly used definitions, differ substantially in magnitude when applied to the same data. This complicates the process of inter-study comparison.
Objective:
In this paper, we propose a standardized definition for the CAP score which will resolve this difficulty. Our standardization is chosen so that CAP = 100 at the expected age of diagnosis.
Methods:
Statistical methods include novel survival analysis methodology applied to the 13 disease landmarks taken from the Enroll-HD database (PDS 5) and comparisons with the existing, gold standard, onset model.
Results:
Useful by-products of our work include up-to-date, age-at-onset (AO) results and a refined AO model suitable for use in other contexts, a discussion of several useful properties of the CAP score that have not previously been noted in the literature and the introduction of the concept of a toxicity onset model.
Conclusion:
We suggest that taking L = 30 and K = 6.49 provides a useful standardization of the CAP score, suitable for use in the routine modeling of clinical data in HD.
INTRODUCTION
Huntington’s disease (HD) is an autosomal dominant, neurological disease caused by an expanded CAG repeat near the N-terminus of the huntingtin (HTT) gene on chromosome 4. A leading theory concerning the etiology of HD is that both onset and progression are driven by cumulative exposure to the effects of mutant (or CAG expanded) huntingtin (mHTT) via the huntingtin protein (whole or toxic fragments) or RNA [1–4]. The CAP score (CAG-Age-Product, i.e., the product of excess CAG length and age) is a commonly used measure of this cumulative exposure. It has been widely used as a predictor of a variety of disease state variables in HD, including imaging measures (structural MRI [5, 6], diffusion tensor imaging [6, 7], and PDE10A PET imaging [8]), various empirical measures of HD signs and symptoms [9], wet biomarkers (neurofilament light [10]), the onset of motor symptoms [11], and disease stage [12].
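As defined in [9], CAP has the following general form:

$$\mathrm{CAP} = \mathrm{Age} \times \frac{\mathrm{CAG} - L}{K} \qquad (1)$$

where Age is the individual's current age, CAG is the CAG repeat length, and L and K are constants.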
Since mHTT is present from conception in HD gene expansion carriers (HDGECs), the CAP score can be thought of as the product of a measure of a toxic insult (excess CAG length) with the time over which the toxic insult is exercised (given approximately by age). In this respect, the CAP score is similar to the pack-years measure used in the study of the toxic effects of tobacco or the area under the curve (AUC) measure used in PK/PD analysis and toxicology. This equivalence assumes that CAG length is constant from conception, an assumption that is made in all analyses presented below. It is now known, however, that CAG length is unstable and tends to expand somatically, in a tissue-dependent manner [13], particularly in the striatum, a brain region where the most pronounced pathology is observed. CAG, as it appears in Equation 1, is measured (as is routinely done) in white blood cells, which are believed, on the whole, to retain their original baseline values from conception. Reliable measurements of CAG length in living human brains are not currently available and, as a result, the practical significance of somatic expansion, and the degree to which somatic expansion is reflected in the CAP score, remain unclear. We will return to these matters of interpretation in the discussion.
The utility of the CAP score has been somewhat diminished by a lack of agreement on the values of L and K across studies. The most common values for L that appear in the literature are L = 33.66 [11, 15]; L = 35.5 [16, 17]; and L = 30 [5, 9]. K has received less attention. Two papers have linked K with the expected age at motor onset [5, 9] such that CAP = 100 at this age. This linkage was carried out via a simple time-to-event model which models the time between birth and onset (i.e., age-at-onset, AO). A similar time-to-event model and value of K was presented in [11] but the starting time for this model was entry into the Predict-HD study [18] and K was chosen so that CAP = 1 when there is a 50% chance of a diagnosis in the next 5 years. More commonly the use of CAP with L = 33.66 or L = 35.5 is combined with K = 1 leaving the connection with AO undefined. The value of 35.5 is not supported by any model for AO but it coincides with the lower limit of CAG length at which HD diagnosis has been confirmed (36 repeats). CAP scores with all the above values of L and K are highly correlated (Table 1) so, for purposes of prediction, it makes little difference which is used. However, [19] reported values of CAP scores, computed on the same data, differ substantially when different values of L and K are used and CAP scores based on differing values of L, when evaluated at the age of onset, have substantially different correlations with CAG length. Finally, the relationship between summary statistics (i.e., means and standard deviations) based on CAP scores that are computed using differing values of L and K is dependent on the distribution of ages and CAG lengths in each study. All of this complicates the process of inter-study comparison. Standardization is clearly needed.
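To make the magnitude problem concrete, the following sketch evaluates the general form CAP = Age × (CAG − L)/K under the three (L, K) conventions mentioned above. The function names and the example individual (age 45, CAG length 43) are illustrative assumptions, not values taken from any study.

```python
def cap_score(age, cag, L=30.0, K=6.49):
    """CAP = Age * (CAG - L) / K, with L and K as free constants."""
    return age * (cag - L) / K


# (L, K) pairings mentioned in the text; the labels are illustrative.
CONVENTIONS = {
    "L=30, K=6.49": (30.0, 6.49),
    "L=33.66, K=1": (33.66, 1.0),
    "L=35.5, K=1": (35.5, 1.0),
}


def cap_under_conventions(age, cag):
    """Evaluate the same (age, CAG) pair under each convention."""
    return {name: cap_score(age, cag, L, K) for name, (L, K) in CONVENTIONS.items()}


if __name__ == "__main__":
    # Hypothetical individual: age 45 with 43 CAG repeats.
    for name, value in cap_under_conventions(45.0, 43.0).items():
        print(f"{name}: CAP = {value:.1f}")
```

The three results differ by a factor of three to five for the same person, even though each is a linear function of age for a fixed CAG length; this is the sense in which the scores are highly correlated yet not directly comparable in magnitude.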
Correlations of CAP Scores for L = 27 to 35.5
In this paper, we propose that L = 30 and K = 6.49 be the preferred values for these parameters in HD research, unless special circumstances dictate otherwise, or a more physiologically based model becomes available. As we show below, the justification for this choice is that these values ensure that CAP ∼ 100 at the expected age of onset (under a reasonable definition of onset and a reasonable onset model). In addition, this choice of L and K greatly reduces the correlation of CAP scores, evaluated at the age of onset, with CAG lengths. The most common use of the CAP score is in the prediction of continuous disease state variables in natural history studies, which raises the concern that use of CAP scores with values of L substantially different from 30 may induce spurious correlations with CAG length.
To justify the above choice for L and K, we:
- Define a novel AO model that extends previous well-accepted models
- Use this model to compute values of L and K that cause CAP ∼ 100 at the expected age-at-onset
- Validate the new model by showing that it produces AO results that are:
Definitions of onset variables
A non-parametric method for deriving the optimal value of L is also presented.
Useful by-products of the above program are:
- Up-to-date AO results for the publicly available periodic dataset (PDS5, Oct 31, 2020) release from Enroll-HD
- A refined AO model suitable for use in other contexts
- A discussion of several useful properties of the CAP score that have not previously been noted in the literature: these include an alternative parameterization of CAP and a demonstration that, properly defined, CAP at onset is independent of CAG length
- The introduction of the concept of a toxicity onset model.
METHODS AND MATERIALS
Data
All models were fit to data from the PDS 5 release of the Enroll-HD database [20, 21]. Separate analyses for a total of 13 AO variables are presented. The first 12 AO variables appear in the Enroll data set in the Profile data file and reflect retrospective assessments of time of onset from the rater, the participant, and the participant’s family. The final variable, DCL4, is defined prospectively in terms of the diagnostic confidence level (diagconf = 4) from the UHDRS motor assessment. Unlike the first 12 variables, DCL4 will always be left censored for participants that enter Enroll in the manifest state. That is, for participants that enter Enroll with diagconf = 4, age-at-onset according to DCL4 is known only to be less than or equal to age-at-study-entry. All 13 variables are defined in Table 2. The primary variable, hddiagn, provides the age of the participant’s medical diagnosis for HD. Like all of the first 12 onset variables, hddiagn differs from DCL4 in that it reflects retrospectively collected information on participants that enter the study in a manifest state. In this respect, the first 12 onset variables are similar to age-at-onset as defined in [19]. A second variable, sxrater, closely related to hddiagn, encodes the rater’s best estimate of the time of first occurrence of HD symptoms. Raters are trained but are not necessarily medical professionals. Thus sxrater and hddiagn target different landmarks in the course of the disease. Except for a few rare anomalies, hddiagn will be later than sxrater, particularly if the participant makes infrequent visits to the physician or if, at a given visit, the physician feels that it is in the best interest of the participant to delay a formal diagnosis. The variables sxsubj and sxfam provide retrospective assessments of the participant and the participant’s family as to the first occurrence of any symptom of HD.
Variables ccmtrage, cccogage, ccdepage, ccirbage, ccvabage, ccaptage, ccpobage, and ccpsyage record the rater’s assessment of the first time that various symptoms were noted. Symptoms include (in order) motor, cognitive, depression, irritability, violent or aggressive behavior, apathy, perseverative obsessive behavior, and psychosis. See Table 2 for complete definitions of these symptoms. Only the motor symptoms are defined in relation to HD. In all other cases the rater is asked to record the first time that the symptom occurred without reference to the cause.
Following [19] our analysis considered only HDGECs with CAG lengths between 40 and 56 (inclusive) and ages at entry into the Enroll study between 20 and 80 (inclusive). Event times for each of the variables in Table 2 are classified as “Uncensored” (time of event recorded), “Right Censored” (event has not yet occurred at time of last observation) and “Left Censored” (event only known to have occurred prior to the time of last observation) or “Unclassifiable”. Unclassifiable events were dropped from the analysis as were cases with very early onset to be described in the Supplementary Material. Table 3 tabulates the censoring data for each variable. Internal Enroll documents show that study retention is high, particularly among pre-manifest participants (80% over 7 years) so that right censoring is mostly determined by the age of the subject at entry. Left censoring is mostly determined by the ability of participants or their caregivers to recall onset times or by the quality of medical records. Given all of this, we chose to treat censoring as uninformative.
Censoring status by onset variable
1 See Supplementary Material.
CAP score onset models
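The CAP score onset models used here are instances of a general class of models (which we call toxicity onset models) that can be used whenever a toxic insult, suffered over time, is believed to cause the onset of an event. Suppose that we have a model for the time course of the toxic insult TOX(t, θ), where θ is a parameter to be estimated. We define

$$\mathrm{AUC}(t, \theta) = \int_0^t \mathrm{TOX}(s, \theta)\, ds$$

The toxicity onset model assumes that the event occurs when AUC exceeds a random limit. In mathematical symbols, if X denotes the random limit and F its cumulative distribution function,

$$P(T \le t) = P\big(X \le \mathrm{AUC}(t, \theta)\big) = F\big(\mathrm{AUC}(t, \theta)\big)$$

The CAP score onset model is a toxicity onset model with

$$\mathrm{TOX}(t, \theta) = \frac{\mathrm{CAG} - L}{K}, \qquad \mathrm{AUC}(t, \theta) = t \times \frac{\mathrm{CAG} - L}{K} = \mathrm{CAP}$$

The clock for the event time (T) starts at birth and stops when either an onset event or a censoring event takes place. Under the model, the cumulative probability of onset, as a function of CAP, has the following form.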
The number CAG0 in Equation 9 is a reference value (or centering constant). Model predictions are not affected by the choice of CAG0 provided that this parameter is chosen to be reasonably close to the population mean of CAG values in the study population. In what follows we always take CAG0 = 43.
The above parameterizations for CAP are equivalent if and only if L = CAG0 − 1/α and K = Kα/α.
By fixing the value of μ0 at 100 in Equation 7, one can find values of L and K (or α and Kα) which force CAP to be equal to 100 at the expected age-at-onset, a useful normalizing property. We retain the two definitions of CAP from Equations 8 and 9 because both have their advantages. In particular, the definition in Equation 8 captures the interpretation of CAP as a measure of cumulative toxicity. In contrast, the definition in Equation 9 produces models with parameters that are easier to distinguish (i.e., less highly correlated). More importantly, the special case where CAG length has no effect on AO occurs when α = 0 (equivalently, L = –∞), making it very awkward to test the hypothesis of CAG independence using the parameterization of Equation 8.
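For reference, the relationship between the two parameterizations can be sketched as follows. This is a reconstruction from the constraints stated in the text (CAG0 acts as a centering constant, α = 0 corresponds to L = –∞, and α = 1/13 corresponds to L = 30 when CAG0 = 43); the labels "toxicity form" and "centered form" are ours and the layout should not be read as a verbatim copy of Equations 8 to 10.

```latex
% Two ways of writing the same score (sketch, assuming the forms below).
\begin{align*}
\text{toxicity form:}\quad
  \mathrm{CAP} &= \mathrm{Age}\times\frac{\mathrm{CAG}-L}{K},\\
\text{centered form:}\quad
  \mathrm{CAP} &= \mathrm{Age}\times
     \frac{1+\alpha\,(\mathrm{CAG}-\mathrm{CAG}_0)}{K_\alpha}.
\end{align*}
% Write CAG - L = (CAG_0 - L) + (CAG - CAG_0) and factor out (CAG_0 - L):
\begin{align*}
\frac{\mathrm{CAG}-L}{K}
  &= \frac{\mathrm{CAG}_0-L}{K}
     \left[1+\frac{\mathrm{CAG}-\mathrm{CAG}_0}{\mathrm{CAG}_0-L}\right],\\
\text{so}\quad
\alpha &= \frac{1}{\mathrm{CAG}_0-L},
\qquad
K_\alpha = \frac{K}{\mathrm{CAG}_0-L} = \alpha K .
\end{align*}
```

Under these forms, as α tends to 0 the value of L = CAG0 − 1/α tends to –∞ while the centered form simply reduces to Age/Kα, which is why the hypothesis of CAG independence is easier to state in the centered form.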
Finally, we note that Equation 7 can be expressed directly in terms of the age-at-onset (as opposed to the CAP-score-at-onset), leading to an expression of the form
The parameterization using μ and σ is similar to that used in the model of [19] and is therefore useful when making comparisons with that model. The parameterization of Equation 11 is also useful in comparing models with differing values of L.
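The move from the CAP scale to the age scale can be illustrated with a short sketch. It assumes that Equation 7 is a standard location-scale logistic CDF in CAP with location μ0 and a scale parameter, here called `sigma0`; the default `sigma0 = 15` is a placeholder chosen for illustration and is not an estimate from this paper. Under that assumption, the location and scale on the age axis are μ0·K/(CAG − L) and sigma0·K/(CAG − L).

```python
import math


def logistic_cdf(x, loc, scale):
    """Standard location-scale logistic CDF."""
    return 1.0 / (1.0 + math.exp(-(x - loc) / scale))


def cap_score(age, cag, L=30.0, K=6.49):
    return age * (cag - L) / K


def age_scale_parameters(cag, L=30.0, K=6.49, mu0=100.0, sigma0=15.0):
    """Location and scale of age-at-onset implied by a logistic model in CAP."""
    factor = K / (cag - L)  # converts CAP units into years for this CAG length
    return mu0 * factor, sigma0 * factor


def onset_probability(age, cag, L=30.0, K=6.49, mu0=100.0, sigma0=15.0):
    """Cumulative probability of onset by a given age, on the age scale."""
    mu, sigma = age_scale_parameters(cag, L, K, mu0, sigma0)
    return logistic_cdf(age, mu, sigma)


if __name__ == "__main__":
    for cag in (40, 43, 50):
        mu, sigma = age_scale_parameters(cag)
        print(f"CAG {cag}: location {mu:.1f} years, scale {sigma:.1f} years")
```

Because the same factor K/(CAG − L) rescales both location and scale, longer repeats imply both earlier and more tightly clustered onset ages under this sketch.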
Four models (each based on Equation 7) were fit.
- Model 1: μ0 fixed at 100.
- Model 2: μ0 fixed at 100, α fixed at 1/13.
- Model 3: Kα fixed at 6.49/13 (Individually Optimized CAP Score Model).
- Model 4: α fixed at 1/13, Kα fixed at 6.49/13 (Standard CAP Score Model).
For purposes of estimation, all models are defined in terms of the parameterization of Equation 9. The parameters L and K are then calculated using Equation 10. We note that, when α and Kα are fixed by design, α= 1/13 implies L = 30 and Kα = 6.49/13 implies K = 6.49. Models 1 and 2 are only fit to the age-at-diagnosis variable (hddiagn) and are used to find a definition of CAP score that is 100 at the expected age of diagnosis. Models 3 and 4 are fit to all 13 onset variables. Model 3 allows the effect of CAG length to be modeled separately for each onset variable. In addition, Model 3 is used to test the hypothesis of CAG independence (α= 0). Model 4 represents the recommended standardization for CAP Score.
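The conversion between the two parameter sets can be sketched as below, assuming the relations L = CAG0 − 1/α and K = Kα/α, which are consistent with the values quoted in this paragraph. The function names are illustrative.

```python
CAG0 = 43.0  # centering constant used throughout the paper


def to_L_K(alpha, k_alpha, cag0=CAG0):
    """Convert (alpha, K_alpha) to (L, K)."""
    return cag0 - 1.0 / alpha, k_alpha / alpha


def to_alpha_kalpha(L, K, cag0=CAG0):
    """Convert (L, K) to (alpha, K_alpha)."""
    alpha = 1.0 / (cag0 - L)
    return alpha, alpha * K


def cap_toxicity_form(age, cag, L, K):
    return age * (cag - L) / K


def cap_centered_form(age, cag, alpha, k_alpha, cag0=CAG0):
    return age * (1.0 + alpha * (cag - cag0)) / k_alpha


if __name__ == "__main__":
    L, K = to_L_K(1.0 / 13.0, 6.49 / 13.0)
    print(f"L = {L:.2f}, K = {K:.2f}")
```

Both forms give the same score for any age and CAG length once the parameters are converted, so the choice between them affects estimation behavior rather than predictions.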
Model fitting procedures
All models were formulated using standard time-to-event methodology, e.g., [22]. In particular, an uncensored event contributes a factor of f(t1) to the likelihood function, a right censored event contributes a factor of 1 –F(t2) to the likelihood function, and a left censored event contributes a factor of F(t3) to the likelihood function. Here F(x) is as defined in Equation 7; and t1, t2 and t3 are uncensored, right censored and left censored event times (see Section 1 of the Supplementary Material for operational definitions of these event times). Note that derivatives are always taken with respect to time and censoring is always modeled on the time scale.
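The three likelihood contributions can be sketched in a few lines. This is not the authors' Stan code: it assumes a standard logistic CDF in CAP with location `mu0` and a scale called `sigma0` (a hypothetical name with a placeholder default), and it uses a made-up record format of (age, CAG, status). The density is obtained by differentiating with respect to age, so the chain rule brings in the factor (CAG − L)/K.

```python
import math


def cap_score(age, cag, L=30.0, K=6.49):
    return age * (cag - L) / K


def onset_cdf(age, cag, mu0=100.0, sigma0=15.0, L=30.0, K=6.49):
    """F evaluated at CAP(age): probability that onset has occurred by this age."""
    z = (cap_score(age, cag, L, K) - mu0) / sigma0
    return 1.0 / (1.0 + math.exp(-z))


def onset_density(age, cag, mu0=100.0, sigma0=15.0, L=30.0, K=6.49):
    """Derivative of onset_cdf with respect to age (not with respect to CAP)."""
    p = onset_cdf(age, cag, mu0, sigma0, L, K)
    dcap_dage = (cag - L) / K
    return p * (1.0 - p) / sigma0 * dcap_dage


def log_likelihood(records, mu0=100.0, sigma0=15.0, L=30.0, K=6.49):
    """Sum of log contributions: f(t), 1 - F(t), or F(t) by censoring status."""
    total = 0.0
    for age, cag, status in records:
        p = onset_cdf(age, cag, mu0, sigma0, L, K)
        if status == "uncensored":
            total += math.log(onset_density(age, cag, mu0, sigma0, L, K))
        elif status == "right":
            total += math.log(1.0 - p)
        elif status == "left":
            total += math.log(p)
        else:
            raise ValueError(f"unknown censoring status: {status!r}")
    return total


if __name__ == "__main__":
    # Hypothetical records: (age in years, CAG length, censoring status).
    example = [(48.0, 43, "uncensored"), (35.0, 42, "right"), (60.0, 45, "left")]
    print(log_likelihood(example))
```

In practice this function would be passed to an optimizer or sampler over the free parameters; which parameters are free depends on the model (for example, Model 4 fixes L and K and leaves the location and scale to be estimated).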
All parametric survival models were fit using the Stan Bayesian analysis software package [23], accessed through the RStan package [24]. Models were fit using both optimization (L-BFGS algorithm) and Hamiltonian Monte Carlo (NUTS algorithm). All reported modeling results were based on the former algorithm, except for the Bayesian confidence intervals that appear in the Supplementary Material and the results on posterior correlations, which were computed using the latter algorithm. When Hamiltonian Monte Carlo was used, four chains with 4000 iterations (1000 of which were warm-up) were generated. Model fits are compared with non-parametric estimates of the survival curves based on observed CAP scores. Non-parametric survival curves were estimated using the survival package in R [25]. In particular, non-parametric survival plots were obtained by applying the Surv function with interval censoring using CAP score as the time variable: this function implements the algorithm of Turnbull [26]. Data analysis and graphics were done in R version 3.6.1 [27].
Demonstration that CAP(AO, CAG) is independent of CAG
Under our model, it might be supposed that the CAP score captures all of the effects of CAG length on disease progression. This is a very strong statement and, while plausible in our view, very hard to justify in general. A weaker (but still very strong) form of the above statement is that CAP evaluated at the age-of-onset has a distribution in the population of HDGECs that is independent of CAG length. We present evidence in favor of this latter statement by showing that non-parametric estimates of CAP at onset agree with estimates based on our logistic model whenever sample sizes are large enough to support accurate nonparametric analysis. In addition, we show that the constant L can be estimated with some precision by finding values of L such that CAP at onset is uncorrelated with CAG length. Finally, we present graphs of the correlation of CAP(L) with CAG against L: these may be helpful in assessing the models based on our standardized value of L = 30 versus models based on outcome specific values of L or other values of L that have appeared in the literature.
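The idea of locating L by the zero of the correlation between CAP at onset and CAG length can be sketched as follows. The data here are synthetic and deliberately idealized: every CAG length from 40 to 56 is given the same set of CAP-at-onset values, generated with a "true" L of 30, so that CAP at onset is exactly uncorrelated with CAG by construction. Real data would be noisy and only uncensored onsets could be used. The function names, the grid of CAP values, and the bisection bounds are all illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient (no external libraries)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def corr_at_onset(L, cags, onset_ages):
    """Correlation of (CAG - L) * AO with CAG; K only rescales and drops out."""
    products = [(cag - L) * ao for cag, ao in zip(cags, onset_ages)]
    return pearson(products, cags)


def zero_crossing_L(cags, onset_ages, lo=25.0, hi=36.0, tol=1e-9):
    """Bisection for the value of L at which the correlation changes sign."""
    f_lo = corr_at_onset(lo, cags, onset_ages)
    f_hi = corr_at_onset(hi, cags, onset_ages)
    if f_lo * f_hi > 0:
        raise ValueError("correlation does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = corr_at_onset(mid, cags, onset_ages)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def synthetic_onsets(true_L=30.0, K=6.49):
    """Balanced synthetic design: same CAP-at-onset values for every CAG length."""
    cags, ages = [], []
    for cag in range(40, 57):
        for cap_at_onset in (70.0, 85.0, 100.0, 115.0, 130.0):
            cags.append(float(cag))
            ages.append(cap_at_onset * K / (cag - true_L))
    return cags, ages


if __name__ == "__main__":
    cags, ages = synthetic_onsets()
    print("estimated L:", round(zero_crossing_L(cags, ages), 4))
    for L in (30.0, 33.66, 35.5):
        print(f"L = {L}: correlation = {corr_at_onset(L, cags, ages):.3f}")
```

In this idealized setting the correlation is negative for L below the generating value and positive above it, so values such as 33.66 or 35.5 produce a CAP at onset that increases with CAG length.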
RESULTS
Determining L and K
Parameter estimates for the primary age-at-onset variable (hddiagn) for Model 1 appear in the first row of Table 4 and show that when μ0 is forced to take on the value 100, the parameters L and α take on the values 30.674 and 0.081, respectively. In light of this, it seemed reasonable to fix L at 30 for the standardized value of CAP and, applying Equation 10, α = 1/13 = 0.077. The second row of Table 4 now gives a value of K = 6.594 and Kα = 0.507. This value of K is very close to the value of 6.49 that was recommended in [28] and which, for reasons of historical continuity, we would like to retain. This is the basis for the recommendation of L = 30 and K = 6.49 (α = 1/13 and Kα = K/13). We note that the posterior correlation of K and L is –0.98 while the posterior correlation of Kα and α is 0.077: this justifies our use of the parameterization for CAP given in Equation 9. It also provides an explanation for the difference between the estimated parameter values given here and previously reported values, and for the very modest changes that these shifts appear to have on model predictions.
Parameter estimates: model 1 and model 2 for onset variable hddiagn
Parameter estimates in Table 6 correspond to Model 4 and reflect results that obtain if the standardized CAP score is used. Finally, Table 5 shows results based on Model 3, which represent fits that are optimal for each individual onset variable. The variability in values of Kα between Tables 4–6 is due to the different parameter restrictions placed on these models, as described above. Comparisons of the log likelihood statistics between Model 3 and Model 4 appear in Table 7. The differences are often statistically significant, sometimes markedly so. However, in light of the very large sample sizes in the current study, it is important to bear in mind that effects that are statistically significant may have little or no clinical significance. In the interest of providing a useful standardization of the CAP score, we adopt a policy of focusing primarily on clinical significance as defined in the graphical representations in Figs. 1–3 (which describe variables hddiagn, sxrater, and ccdepage). These figures make use of the parameterization of Equation 11 to compare predictions from the individually optimized fits (Model 3) with the standard model (Model 4). The Supplementary Material (Section 3) provides plots for each of the onset events in Table 2. In many cases, the model fits are seen to be so close that a small random jitter had to be introduced in order to visually distinguish Model 3 and Model 4 results. In other cases, some deviations between Models 3 and 4 are apparent but, at least in our view, these are small.
Parameter estimates: standard model (model 4) by onset variable
Parameter estimates: individually optimized model (model 3) by onset variable
Chi-square tests to compare the standard model (model 4) with the individually optimized model (model 3) by onset variable (df = 1)

Comparison of standard CAP score models with individually optimized models (models 4 and 3) by onset variable hddiagn.

Comparison of standard CAP score models with individually optimized models (models 4 and 3) by onset variable sxrater.

Comparison of standard CAP score models with individually optimized models (models 4 and 3) by onset variable ccdepage.
Validating the onset model
Figures 4 and 5 compare the standard CAP score AO model (for all 13 onset measures) with the model of [19]. Agreement is generally good, except for the psychiatric onset variables that would be expected to follow a different pattern. We note that our preferred onset variable (time-to-diagnosis or hddiagn) produces slightly later AOs than does the model of [19]. In contrast, the model for DCL4 shows a slightly earlier onset than does the model of [19]. It is important to note that the model of [19] was fit to data described in that publication, not to the Enroll data. This data was most similar to the variables sxsubj, sxfam, and sxrater in our data set. The model of [19] was never intended to be used in predicting psychiatric onset. As a result, the large discrepancies in Fig. 5 represent differences in the underlying distributions rather than differences in the modeling procedures.

Comparison of standard CAP score models with the model of [19] (onset variables 1–6).

Comparison of standard CAP score models with the model of [19] (onset variables 7–13).
Figures 6 through 8 compare survival plots for the standard CAP score model (Model 4) with non-parametric survival curves for each onset variable. The nonparametric estimates of the survival curves are presented in the form of probability distributions together with upper and lower bounds for the 95% confidence interval based on the cumulative hazard function. The figures show that survival plots from the CAP score survival models are closely matched by nonparametric survival plots for the AO variables of Table 2. This provides evidence that the assumed logistic form of Equation 7 is, to a very good approximation, correct. A more detailed treatment of these issues appears in Section 6 of the Supplementary Material, where the plots of Figs. 6–8 are presented separately for each value of CAG length between 40 and 56. In our view, all the above analyses produce plots that are remarkably close to the original logistic functional form. This is particularly true when the most common CAG lengths are considered (i.e., CAG lengths between 40 and 50). For CAG lengths larger than 50, the evident lack of fit may be ascribed to sample sizes that are too small for accurate non-parametric analysis.

Survival curves for the standard CAP score model compared with non-parametric survival curves onset variables 1–5.

Survival curves for the standard CAP score model compared with non-parametric survival curves onset variables 6–10.

Survival curves for the standard CAP score model compared with non-parametric survival curves onset variables 11–13.
Figures 9 to 11 present plots of the correlation of (CAG - L) AO with CAG length vs. L for each AO variable. These curves are seen to cross the x-axis at values of L which are remarkably close to the optimal values from Table 6: this suggests that imposing the condition that CAP at onset is uncorrelated with CAG length is sufficient to determine the value of the parameter L. In addition, these plots show that, when evaluated at AO, CAP scores with L = 33.66 and L = 35.5 have markedly higher correlations with CAG length than do CAP scores with L = 30. Only uncensored observations are used in these plots.

Correlation of (CAG –L) AO with CAG length for various values of L for AO Variables 1–5. Vertical lines are drawn at L = 30, L = 33.66, and L = 35.5. L1 is the value of L where the graph crosses the x-axis. L2 is taken from Table 5.

Correlation of (CAG –L) AO with CAG length for various values of L for AO Variables 6–10. Vertical lines are drawn at L = 30, L = 33.66, and L = 35.5. L1 is the value of L where the graph crosses the x-axis. L2 is taken from Table 5.

Correlation of (CAG –L) AO with CAG length for various values of L for AO Variables 11–13. Vertical lines are drawn at L = 30, L = 33.66, and L = 35.5. L1 is the value of L where the graph crosses the x-axis. L2 is taken from Table 5.
Section 4 of the Supplementary Material provides Bayesian confidence intervals for the parameter estimates of the individually optimized model (Model 3) for each onset variable. Of particular interest is the observation that the 95% confidence intervals for α are always bounded well away from 0: this removes any doubt that might remain regarding the effect of CAG length on each and all of the onset variables.
For completeness, Section 5 of the Supplementary Material provides Bayesian confidence intervals for the parameters of the Standard CAP score models.
DISCUSSION
The literature on AO models in HD has now become extensive (see, for example the reviews in [29] and [30]). We do not claim that the onset models presented here have any advantages over existing models except insofar as they isolate the effect of exposure to mHTT (as measured by CAP score) on the prediction of onset. Indeed, it has been shown in [31] and elsewhere that the use of dynamic measures of clinical status can improve the prediction of onset events, over and above what can be done using age and CAG length alone. It was not our intention to advance a particular onset model. Rather our goal is to use the above onset models to create a rational method for standardizing the CAP score.
Specifically, we seek to carry out this standardization at this time in order to:
- Avoid confusion when comparing CAP scores across studies
- Update previous justifications of the CAP score using new data
- Provide users with an operational definition of the toxicity that is measured by CAP
- Elucidate the close connection between AO and CAP
- Provide a baseline landmark against which models for the effect of somatic expansion on AO can be compared.
On balance, and in view of the above rationale, we feel that our recommendation for the use of L = 30 and K = 6.49 has held up well. At the same time, we realize that some reservations and objections may remain which we will address below.
To begin, we realize that onset (however it is defined) is not an event that occurs at a specific point in time. Onset differs in this respect from archetypal events (like death) that are the basis for time-to-event (or survival) analysis. Nonetheless, we feel that the literature on AO has made useful contributions to our understanding of the HD process and hope that the current work will extend this tradition. In our view, the most promising practical application of the current study lies in facilitating the use of the CAP score in the prediction of continuous disease state variables. Such predictions may be useful in the analysis of natural history studies and in the planning of clinical trials of potentially disease-modifying treatments. In the latter case, the CAP score’s connection with etiology of HD may make it a useful tool for quantifying a hypothesized drug effect; that is, the effect of a hypothetical drug may be likened to the reduction of CAG length in Equation 7 by a given number of repeats.
We would also like to draw attention to our demonstration that the distribution of CAP, evaluated at age-of-onset, is independent of CAG length and the related non-parametric method for determining L. While this observation has a somewhat technical sound to it, we believe that it is important in practice. In particular, we have shown that with values of L in common use (i.e., L = 33.66 or L = 35.5), CAP(AO) has a significant correlation with CAG length. This raises the possibility that the use of CAP scores in regression models with the above values of L could introduce spurious correlations with CAG length. In addition, CAP scores that have CAG dependent distributions when evaluated at AO are, by definition, of questionable validity as measures of the cumulative toxicity of mutant huntingtin.
We would also like to distinguish our work from [32], which elucidates the role of CAG repeats in long term progression of HD without attempting to incorporate this role into an exposure-response model as is done by the CAP score. Also of interest is [33], which demonstrates a role for the CAG dependence of disease progression after the onset event. This contradicts the time-to-event analysis of [34], which argued that the length of the interval between onset and death is independent of CAG length. In our view, the nature of CAG length dependence is altered, but not eliminated, by the onset event. In this respect we view the role of natural aging mechanisms, independent of but complementary to CAG induced toxicity, as a causal factor in the etiology of cognitive and motor decline in HD [35, 36]. It is also likely that disease stage will have an independent effect on toxicities leading to disease progression. While natural aging may have a causal effect on HD onset, it would be very difficult to estimate such an effect due to the sparsity of data on false “onset” in individuals who do not carry the HD mutation. The situation is quite different for continuous measures of motor and cognitive status which are routinely observed in both HDGECs and healthy controls making controlling for normal aging possible and, arguably, necessary.
The full promise of exposure-response models in HD, however, cannot be realized without addressing the role of somatic expansion in HD pathology [37–39]. Under somatic expansion, CAG length becomes a time varying quantity CAG(t). By analogy with Equation 7, we can define a CAP score onset model that takes somatic expansion into account. Such a model can be formulated as a toxicity onset model with
Some may be concerned by our treatment of CAP score as a modified survival time. One referee has pointed out that this practice is analogous to the way quality adjusted survival time (QAST) is handled in the oncology literature [42, 43], where it is acknowledged that the non-parametric analysis of such modified survival time variables requires special treatment to avoid the effects of induced informative censoring. While we acknowledge some similarities between QAST and CAP, we feel that the cases differ in some important respects. First, while QAST includes information gathered from each subject that is collected separately from time, the modifications of time implied by CAP include only information on CAG length, which is fixed from conception for each individual and is related to disease progression as a cause rather than an effect. Indeed, in the toxicity onset formulation (with and without somatic expansion), both TOX and AUC are parametric functions of time. In addition, when differentiation of F in Equation 4 is carried out, it is done with respect to T, not AUC, leading to models in which censoring is defined in terms of time, not CAP. As a second point, CAP lends itself more than QAST to a grouping strategy. In particular, we performed analyses separately for each of the 17 CAG lengths (40–56), as we report in Section 6 of the Supplementary Material. These analyses are relevant in that they argue not only for the correctness of the logistic functional form of the survival function but also for the proposition that this functional form applies regardless of CAG length. What is more, while we believe that the analyses of Figs. 6 through 8 are valid, there is no doubt that the analyses based on the above grouping strategy are valid, as the re-scaling factor for time which they employ reduces to a constant in each analysis.
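The invariance behind the grouping strategy can be illustrated with a small sketch. Within a single CAG length, CAP is age multiplied by a positive constant, so an order-based estimator gives the same survival probabilities on either scale. The sketch uses a plain Kaplan-Meier estimator with right censoring only, as a simplified stand-in for the Turnbull estimator used in the paper; the ages, event indicators, and CAG length are made up for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate; returns (time, survival) at each distinct event time.

    events[i] is 1 if onset was observed at times[i], 0 if right censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


def cap_score(age, cag, L=30.0, K=6.49):
    return age * (cag - L) / K


if __name__ == "__main__":
    # Hypothetical single-CAG group (CAG = 44): ages and event indicators.
    ages = [41, 45, 45, 50, 52, 58, 60]
    events = [1, 0, 1, 1, 0, 1, 1]
    caps = [cap_score(a, 44) for a in ages]
    for (age, s_age), (cap, s_cap) in zip(kaplan_meier(ages, events),
                                          kaplan_meier(caps, events)):
        print(f"age {age}: S = {s_age:.3f} | CAP {cap:.1f}: S = {s_cap:.3f}")
```

When several CAG lengths are pooled, the rescaling factor differs between individuals and this simple argument no longer applies directly, which is the situation the text addresses separately.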
In sum, for the reasons given above, we argue that the analytic procedures of [42] and [43] are not needed for the non-parametric analysis of CAP or (arguably) CAP se.
Some may also be uncomfortable with taking L = 30 as a lower limit for CAG induced toxicity. As has already been mentioned, CAG = 36 is the lowest value for which a definitive diagnosis of HD has been made. That said, some authors have suggested that some symptoms, similar to those of HD, have been observed for individuals with CAG lengths in the intermediate range of 27–35 repeats [44, 45]. It is interesting that the psychiatric symptoms so observed appear to be related to the psychiatric onsets which we found to follow a different distributional form compared to the more traditional motor symptoms. We note that our study differed from [44] in that cognitive onset also followed an altered pattern. It is also, of course, possible that some toxicity might be occurring in some individuals without producing any overt signs or symptoms.
Finally, we have observed from our analyses of both AO and continuous outcome measures, and from the forms of Equations 1 and 15, that:
- Ignoring the effect of normal aging, when it is present, is likely to decrease the estimated value of L
- Even modestly positive values of λ in Equation 15 will tend to increase the associated estimate of L.

The upshot of the above considerations is that, at this time, L = 30 should be treated as a convenient, conventional, standardizing value and not a firmly established, physiologically-based estimate.
ACKNOWLEDGMENTS
Data used in this work were generously provided by the participants in the Enroll-HD study
and made available by CHDI Foundation, Inc. Enroll-HD is a clinical research platform and
longitudinal observational study for Huntington’s disease families intended to accelerate
progress towards therapeutics; it is sponsored by CHDI Foundation, a nonprofit biomedical
research organization exclusively dedicated to collaboratively developing therapeutics for
HD. Enroll-HD would not be possible without the vital contribution of the research
participants and their families. The individuals who contributed to the collection of the
Enroll-HD data are also gratefully acknowledged.
CONFLICT OF INTEREST
John H. Warner, Jennifer Ware, and Cristina Sampaio are employed by CHDI Management as advisors to CHDI Foundation, as was Amrita Mohan during this analysis. Jeffrey D. Long, James A. Mills, and Douglas R. Langbehn receive research funding from CHDI Foundation. In addition, Dr. Langbehn reports personal consulting fees and non-financial support from Voyager Therapeutics, personal consulting fees from Novartis, personal consulting fees from uniQure, personal consulting fees from Takeda, and personal consulting fees from AskBio, all outside the submitted work. Dr. Long is a paid committee member for F. Hoffmann-La Roche Ltd and uniQure biopharma B.V., and he is a paid consultant for PTC Therapeutics Inc, Remix Therapeutics Inc, Spark Therapeutics Inc, Triplet Therapeutics Inc, and Wave Life Sciences USA Inc. James Mills is a paid consultant for PTC Therapeutics Inc. and Triplet Therapeutics Inc.
