Abstract
Objective:
Rett syndrome (RTT) is a severe neurodevelopmental disorder affecting predominantly females and associated with variants in the MECP2 gene. Recent successes in clinical trials have resulted in an expanded use of the Rett Syndrome Behaviour Questionnaire (RSBQ) for clinical and research purposes. Implementation of the RSBQ as a global clinical severity scale has raised concerns about its construct validity considering its content, structure, and psychometric features. To further understand RSBQ data, we analyzed RSBQ scores available in the literature with a focus on variability and influencing factors.
Methods:
We identified publications reporting RSBQ total and/or subscale scores and summarized relevant study information, such as type of investigation, administration method, and descriptive data. We then analyzed means and standard deviations, calculating variance-to-mean ratios (VMR), as a measure of variability, when raw score descriptive statistics were available. Where appropriate, we compared means and VMRs by Welch t-tests.
Results:
Of the 14 publications identified, raw total scores from 5 observational studies and 4 clinical trials (baseline) were available. Raw subscale scores from four of the five observational studies were also available. We found a wide but comparable range of mean total scores for observational studies and clinical trials. However, VMRs were significantly higher in observational studies. Subscale scores showed either high (i.e., General Mood, Breathing Problems) or low (e.g., Hand Behaviours, Body Rocking and Expressionless Face) variability. Available data demonstrated greater variability in pediatric than adult groups and less variability when using interviews or electronic RSBQ administration compared with paper forms. Total score changes over time did not affect variability. Although certain studies offered insight into the relationship between the RSBQ and other measures, overall, data were insufficient for characterizing how RSBQ variability relates to other factors.
Conclusions:
Our findings on score variability support the need for more comprehensive reporting of RSBQ data, cohort characterization, and methodology; and the deployment of standardized RSBQ administration methods, such as advanced data capture systems. There is potential for use of subscales as outcome measures, subject to further psychometric validation studies, including prospective investigations testing the stability of RSBQ scores and influencing factors. Further examining the relationship between RSBQ scores and other instruments will aid in its interpretation as a clinical outcome measure.
Introduction
Rett syndrome (RTT) is a severe neurodevelopmental disorder affecting approximately 1 in 9000 liveborn females, constituting the second most common cause of severe intellectual disability in females (Leonard et al., 1997). Over 95% of individuals with RTT have loss-of-function variants in the X-linked methyl-CpG-binding protein (MECP2) gene (Amir et al., 1999), which encodes a protein (MeCP2) involved in synaptic development and maintenance (Gemelli et al., 2006; Kaufmann et al., 2005). Although some males with MECP2 variants can present with RTT, the incidence in this group is unknown since their phenotype is still being characterized (Neul et al., 2019). Due to the lack of universal presence of MECP2 variants, the diagnosis of RTT continues to be clinical, based on regression of hand and spoken language functions, development of hand stereotypies, and impairment in ambulation (i.e., core features) (Neul et al., 2010). Additional neurological, behavioral, and systemic (e.g., gastrointestinal) manifestations, some of them considered supportive diagnostic criteria for atypical RTT (Neul et al., 2010; Fu et al., 2020a), highlight the complex neurological and systemic involvement in RTT.
The last two decades have seen remarkable progress in the treatment of RTT. In addition to management guidelines (e.g., Downs et al., 2009) derived from increasing knowledge on the disorder’s natural history (Fu et al., 2020a, 2020b), the use of animal models of MeCP2 deficit has been essential for the identification and testing of a broad range of potential therapeutic targets (Katz et al., 2016; Kaufmann et al., 2016; Palmieri et al., 2023; Panayotis et al., 2023). This work has led to the first drug approved for the treatment of RTT symptoms (i.e., trofinetide, a synthetic analogue of glypromate, the N-terminal peptide of insulin-like growth factor-1 [IGF-1]) (Keam, 2023; Parent et al., 2023) and to the development of gene therapies (Palmieri et al., 2023; Panayotis et al., 2023). The Rett Syndrome Behaviour Questionnaire (RSBQ) (Mount et al., 2002) has played a key role as a severity endpoint in most of these clinical trials (Khwaja et al., 2014; O’Leary et al., 2018; Glaze et al., 2019; Neul et al., 2023; Percy et al., 2024a, 2024b) due to its relatively broad coverage of RTT’s clinical manifestations (Percy et al., 2023) and the lack of alternative measures.
While they are not considered part of the main diagnostic criteria, a wide range of behavioral problems have been reported since the initial descriptions of RTT (Hagberg et al., 1983). Among them are social indifference and withdrawal, anxiety-like symptoms, mood instability, disruptive behavior, and repetitive and perseverative behaviors (Mount et al., 2003; Robertson et al., 2006; Barnes et al., 2015; Buchanan et al., 2019, 2022). Based on the notion that these abnormalities are common and distinctive to RTT, Mount and colleagues designed the RSBQ as a caregiver questionnaire for differentiating behaviors and behavior-related symptoms in children with RTT from those with other severe intellectual disability (Mount et al., 2002). At a time when genetic testing was largely unavailable to support the diagnosis of RTT, characterizing the behavioral phenotype of the disorder was seen as additional evidence of diagnostic specificity (Percy et al., 2023). Over the years, this original aim has been expanded to further characterizing atypical behaviors in RTT (e.g., social impairment, anxiety) (Kaufmann et al., 2012; Barnes et al., 2015), determining the influence of factors such as type of MECP2 variants on challenging behaviors (Robertson et al., 2006) or examining the relationship between abnormal behaviors and other features of the disorder (e.g., age, overall clinical severity, sleep) (Barnes et al., 2015; Oberman et al., 2023; Downs et al., 2024a; Zhang et al., 2024). Although the content validity of the RSBQ has been somewhat supported by its long-term application in research (Percy et al., 2023) and a recent qualitative study evaluating caregiver perspectives about clinical changes detected by the instrument (McGraw et al., 2023), and its sensitivity has been demonstrated by its successful use as an endpoint in drug trials (Abbas et al., 2024), concerns about its validity as an outcome measure remain.
Further to the original validation study (Mount et al., 2002), new psychometric studies have described different characteristics of the RSBQ, including internal consistency, factor structure, and intra-rater reliability. These analyses demonstrated strong psychometric properties for some features, such as total scores and General Mood subscale scores (Oberman et al., 2023). However, due to the narrow 0–2 item scoring range, several items show either floor or ceiling effects (Hou et al., 2020). Further, factor analyses of a large pediatric sample (i.e., 323 children) (Oberman et al., 2023) did not replicate the subscale structure reported in the original publication (Mount et al., 2002). In addition, intra-rater reliability and score stability over time have demonstrated suboptimal parameters (Hou et al., 2020).
In addition to the less favorable findings from the psychometric studies mentioned above, the expanded implementation of the RSBQ meant that it has been used in RTT populations outside the age range of the initial pediatric validation sample. Our recent psychometric study, which included a large sample of adults (i.e., 309), showed differences in score profiles and factor structure between children and adults (Oberman et al., 2023). The analyses also demonstrated differences in subscale score variability and other parameters that, in conjunction with the dissimilar number of items across the eight subscales, can affect accuracy in the measurement of RTT features and their changes. For example, the three subscales with larger numbers of items (General Mood, Hand Behaviours, and Body Rocking and Expressionless Face) represent, in combination, 21 items or ∼47% of the 45 items’ total score. As most of the items in these subscales pertain to externalizing behaviors (e.g., six out of eight in General Mood) and hand and body stereotypies, these features have a disproportionate weight in RSBQ total scores.
The challenges of employing a nonweighted scale could be exacerbated by a number of other difficulties encountered with RSBQ administration. Examples include the lack of clarity regarding the subscale allocation of one item (#44, “Appears isolated”) in the original study and the inclusion of an item representing a positive behavior that needs to be reverse scored (#31, “Uses eye gaze to convey feelings, needs and wishes”). Also, gross motor items (#23, “Although can stand independently tends to lean on objects or people” and #39, “Walks with stiff legs”) were conceived of as motor behaviors (Richard Hastings, personal communication) but are often interpreted as pertaining to motor function. Thus, individuals who walk well or stand without leaning will score the same as those who cannot walk or stand on these items. Lack of detailed guidelines for instrument administration (e.g., paper vs. electronic form, specific instructions for raters) (Lane et al., 2006; Gwaltney et al., 2008; Jibb et al., 2020) and less-than-lay-friendly wording (e.g., #24, “Restricted repertoire of hand movements”) (based on the authors’ experience) could also have substantial effects on scores, as suggested by our and others’ investigations (Oberman et al., 2023).
Despite the shortcomings mentioned in the preceding sections, the RSBQ will continue to play an important role in the assessment of clinical outcomes in RTT (Percy et al., 2023; Abbas et al., 2024). The present study aimed to expand the psychometric profiling of the RSBQ, with the goal of evaluating mean scores, their variability, and influences of potential contributing factors reported in the literature across different datasets.
Methods
Instrument
The RSBQ is a caregiver questionnaire covering a range of clinical manifestations representing or associated with behaviors in RTT (Mount et al., 2002; Percy et al., 2023). The RSBQ has 45 items, with 38 of them grouped into 8 subscales (number of items): General Mood (n = 8), Breathing Problems (n = 5), Hand Behaviours (n = 6), Repetitive Face Movements (n = 4), Body Rocking and Expressionless Face (n = 6), Night-Time Behaviours (n = 3), Fear/Anxiety (n = 4), and Walking/Standing (n = 2). Items are scored using a Likert-like scale as follows: 0 (item is not true), 1 (item is somewhat or sometimes true), or 2 (item is very true or often true). All but one item represent atypical behaviors; thus, higher scores reflect greater severity. Item #31, “eye-gaze as a means of communication,” in the Body Rocking and Expressionless Face subscale represents a skill; therefore, it is reverse scored. In the original study, item #44, “appears isolated,” loaded similarly to the General Mood and to the Body Rocking and Expressionless Face factors, ultimately not being included in either one (Mount et al., 2002). The questionnaire form contains a brief general guidance statement: “…We would like you to think just about the characteristics she shows now. For each characteristic, you have to think whether or not it accurately describes her. If the characteristic does describe her you are asked to rate it…” This is followed by the aforementioned three score options. No guidance on specific items is included.
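The scoring rule described above can be sketched in Python. This is a minimal illustration only, assuming that responses are stored as a mapping from item number (1 to 45) to the raw 0–2 rating. The function name `score_rsbq_total`, the data layout, and the choice to reject incomplete forms are hypothetical and are not part of the published instrument.

```python
# Hypothetical helper illustrating the scoring rule described in the text:
# 45 items rated 0, 1, or 2, with item #31 reverse scored.
N_ITEMS = 45
REVERSE_SCORED = {31}  # item #31 represents a skill, so 0 and 2 are swapped
MAX_ITEM_SCORE = 2


def score_rsbq_total(responses: dict[int, int]) -> int:
    """Return the total score from raw item ratings keyed by item number."""
    # Assumption: incomplete forms are rejected rather than imputed.
    if set(responses) != set(range(1, N_ITEMS + 1)):
        raise ValueError("expected ratings for items 1 through 45")
    total = 0
    for item, raw in responses.items():
        if raw not in (0, 1, 2):
            raise ValueError(f"item {item}: rating must be 0, 1, or 2")
        # Reverse scoring maps 0 -> 2, 1 -> 1, 2 -> 0.
        total += MAX_ITEM_SCORE - raw if item in REVERSE_SCORED else raw
    return total


all_not_true = {item: 0 for item in range(1, N_ITEMS + 1)}
print(score_rsbq_total(all_not_true))  # 2: only the reverse-scored item contributes
```

Under this sketch, a form with every item rated "not true" yields a total of 2 rather than 0, because the reverse-scored skill item is counted as absent.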
Recently, normative data and a revised factor structure, separate for children and adults with RTT, have been published (Oberman et al., 2023) and made available through the website of the International Rett Syndrome Foundation (https://www.rettsyndrome.org).
Data
Publications reporting RSBQ total and/or subscale scores and other variables related to scoring (e.g., method of administration) were identified through a search of the PubMed, Scopus, and Web of Science databases (from inception to October 23, 2024), using “Rett Syndrome Behaviour Questionnaire” or “RSBQ” as queries and following a selection of items from the 27-item checklist of the PRISMA 2020 guidelines (Page et al., 2021) adapted to the purposes of the study. The following data were extracted from the publications by two authors (W.E.K. and K.V.B.): study type (observational study or clinical trial), timing of data collection (cross-sectional or serial/longitudinal), country of origin, number of subjects, age range of participants (grouped as pediatric or adult), clinical diagnosis (i.e., classical or variant), MECP2 variant status, administration method (paper, interview, or electronic, including off-site and in-person), reported score type (total and/or subscales), RSBQ total and subscale means and standard deviations (SDs), presence of psychometric analyses, and analyses of relationships (correlations) with other measures. While all of these variables were used for data interpretation, analyses of RSBQ score variability were performed only on studies reporting descriptive statistics of raw values. No scores were calculated from differences or regression models.
Analyses
Our evaluation of RSBQ score variability is based on 9 publications, from the 14 originally identified as meeting our inclusion criteria, as these were the only publications reporting descriptive statistics of raw scores. Coefficients of dispersion, also termed variance-to-mean ratios (VMRs), were calculated based on means and standard deviations. Scores from studies reporting different age groups, countries of origin, or completion method were tabulated separately. The main parameters for comparison between studies or datasets were mean scores and their variability estimated as VMRs. A VMR >1.0 corresponds to less uniform, more dispersed value distribution; conversely, a VMR <1.0 is interpreted as a more uniform, less dispersed value distribution. Thus, the higher the VMR, the greater the score variability. Comparisons of total and subscale mean scores and VMRs were performed using the Welch t-test (two-tailed) for samples with unequal variance. Analyses were conducted using the Calculator.net and GraphPad online calculators.
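As an illustration of the VMR calculation described above, the following Python sketch derives a VMR from a reported mean and SD and applies the 1.0 interpretation cut-off. The function names and the example values are hypothetical; the analyses in this study were run with online calculators, not with this code.

```python
def vmr(mean: float, sd: float) -> float:
    """Variance-to-mean ratio: the SD squared, divided by the mean."""
    if mean <= 0:
        raise ValueError("VMR requires a positive mean")
    return sd ** 2 / mean


def describe_dispersion(value: float) -> str:
    """Apply the interpretation used in the text (cut-off at 1.0)."""
    if value > 1.0:
        return "more dispersed"
    if value < 1.0:
        return "more uniform"
    return "at the 1.0 cut-off"


# Illustrative (hypothetical) summary statistics, not taken from any study.
example = vmr(mean=40.0, sd=14.0)
print(round(example, 2), describe_dispersion(example))  # 4.9 more dispersed
```

Because the VMR depends only on the mean and SD, it can be computed from published descriptive statistics without access to individual-level data.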
Results
Overview of the literature
We identified 14 publications meeting the inclusion criteria. Of these, eight were observational studies, including the original publication on the RSBQ (Mount et al., 2002), with only one reporting longitudinal data (Hou et al., 2020). There were six publications describing clinical trials. There were two papers from the mecasermin (IGF-1) program, corresponding to one phase 1 open-label study and one phase 2 randomized controlled trial. There were four publications from the trofinetide program, representing one phase 2 and one phase 3 randomized controlled trial, and two open-label follow-up studies to the pivotal trial. While most of the studies included other instruments, only three reported analyses of the relationship between RSBQ total and/or subscale scores and measures of abnormal behavior, sleep or functioning (Kaufmann et al., 2012; Barnes et al., 2015; Downs et al., 2024a). An overview of the data reported in these publications is presented grouped by age: pediatric in Table 1 and adult in Table 2.
Table 1. Summary of Published Studies Reporting Pediatric Rett Syndrome Behaviour Questionnaire Scores
O, observational; CT, clinical trial; C, cross-sectional; L, longitudinal.
sample size of same cohort at three timepoints over 1–2 months.
method of electronic administration varied by cohort (off-site vs. in-person).
mean, SD or VMR of combined groups.
multiple countries.
cohort considered mostly pediatric based on demographics.
MECP2 variant.
included in Welch t-test comparing means of interventional versus observational studies.
Paper/Interview/Electronic/Online.
include investigations of content validity, reliability, and/or sensitivity.
partial cohort overlap.
all participants are from the same initial cohort (LAVENDER).
* p = 0.730; ** p = 0.002.
--, not reported. ABC, Aberrant Behavior Checklist; ADAMS, Anxiety, Depression and Mood Scale; CGI, Clinical Global Impression; CHQ, Child Health Questionnaire - Parent Form 50; CSBS-DP, Communication and Symbolic Behavior Scales Developmental Profile; KSS, Kerr Severity Scale; MBA, motor-behavioral assessment; PGI, Parental Global Impressions; PSG, polysomnography; PTSVAS (VAS), parent-targeted visual analog scale (visual analog scale); RSSS, Rett Syndrome Severity Score; RTT-COMC, RTT-clinician rating of ability to communicate choices; RTT-CSS (CSS), RTT Clinical Severity Score; RTT-DSC, RTT-Clinician Domain Specific Concerns; SDSC, Sleep Disturbance Scale for Children; SSI, screen for social interaction; VABS, Vineland Adaptive Behaviour Scales (II or III); VMR, variance-to-mean ratio.
Table 2. Summary of Published Studies Reporting Adult RSBQ Scores
O, observational; CT, clinical trial; C, cross-sectional; L, longitudinal.
multiple countries.
partial overlap of participants.
MECP2 variant.
Paper/Interview/Electronic/Online.
include formal investigations of content validity, reliability, and/or sensitivity.
--, not reported. SDSC, Sleep Disturbance Scale for Children.
The observational studies were variable in scope, methodology, and reported data. Some included total and subscale scores (Mount et al., 2002; Kaufmann et al., 2012; Oberman et al., 2023; Downs et al., 2024); others only total scores (Hou et al., 2020) or subscale scores (Robertson et al., 2006). One included only percentiles, without means and standard deviations (Barnes et al., 2015), which prevented the calculation of VMRs. One study grouped RSBQ items in ad hoc novel factors (Zhang et al., 2024). Four publications presented psychometric analyses evaluating structural features, content validity, reliability, and/or sensitivity, three on cross-sectional data (Mount et al., 2002; Oberman et al., 2023; Downs et al., 2024) and one focused on score changes over time (Hou et al., 2020). Two additional studies evaluated psychometric properties of subscales (Kaufmann et al., 2012; Barnes et al., 2015). Four observational investigations examined the influence of age (Oberman et al., 2023; Downs et al., 2024), MECP2 variant status (Robertson et al., 2006; Barnes et al., 2015; Oberman et al., 2023; Downs et al., 2024), sleep (Downs et al., 2024) and/or clinical severity (Barnes et al., 2015; Oberman et al., 2023; Downs et al., 2024) on scores. Four studies evaluated the relationship between scores on the RSBQ and other clinical outcomes (Kaufmann et al., 2012; Barnes et al., 2015; Downs et al., 2024; Zhang et al., 2024); however, the two that reported scores compatible with VMR calculations examined different independent variables (Kaufmann et al., 2012; Downs et al., 2024). One of the observational investigations was the largest study to date, including pediatric and adult data on more than 600 individuals and reporting multiple psychometric features (Oberman et al., 2023). There was a partial overlap between the sample in the Kaufmann et al. (2012) and one of the datasets in the Oberman et al. (2023) study and between two datasets in the Downs et al. (2024) and Oberman et al. (2023) studies.
The two mecasermin trial publications (Khwaja et al., 2014; O’Leary et al., 2018) reported only scores on the Fear/Anxiety subscale distributed by percentiles; therefore, their statistical parameters did not allow calculation of VMRs. All four trofinetide studies (Glaze et al., 2019; Neul et al., 2023; Percy et al., 2024a, 2024b) reported total but not raw subscale scores and only baseline values. Changes over time, involving several time points, were presented as least squares (LS) means or means of score changes (and their SEs). Therefore, only baseline data from the four trofinetide trials were available for VMR calculations. While the trofinetide studies corresponded to four different publications, there was overlap between the subject samples in the two open-label studies and the phase 3 trial, with the former representing subsets of the latter. Consequently, means and VMRs between three of the four datasets were interdependent.
In terms of scoring-related factors, a few studies, typically the most recent ones, included explicit information about administration method (Tables 1 and 2) and whether instructions beyond the general statement included in the RSBQ forms (see Methods) were employed.
Total and subscale mean scores and variability
Total scores from five observational studies and four clinical trials, involving exclusively (Mount et al., 2002; Kaufmann et al., 2012; Glaze et al., 2019; Oberman et al., 2023; Downs et al., 2024) or predominantly (Hou et al., 2020; Neul et al., 2023; Percy et al., 2024a, 2024b) pediatric datasets, were available for comparison. As mentioned above, there was some overlap in subject samples, in particular in the clinical trials, due to the nature of open-label extensions. Mean scores at baseline ranged from 34.10 (Hou et al., 2020) to 45.20 (Mount et al., 2002) and were distributed similarly in observational studies and clinical trials. Accordingly, a Welch t-test showed no differences in mean scores between observational studies and clinical trials (39.50 vs. 40.83, respectively, Welch p = 0.730; with overlapping datasets removed, 40.05 vs. 42.80, respectively, Welch p = 0.429). In contrast, VMRs, which ranged from 3.14 to 6.11, were higher in observational studies than in trials. These observational study versus clinical trial differences were significant (mean VMRs 5.52 and 3.68, respectively, Welch p = 0.002; with overlapping datasets removed, 5.24 and 3.23, respectively, Welch p = 0.028).
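The kind of group comparison reported above can be sketched as follows. This is a minimal standard-library Python illustration of the Welch t statistic and its Welch-Satterthwaite degrees of freedom; the function name `welch_t` and the VMR values are hypothetical and do not reproduce the data in Table 1. The p-values in this study were obtained with online calculators.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    # Squared standard error of each group mean (sample variance / n).
    sa, sb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df


# Hypothetical per-dataset VMRs, for illustration only.
observational = [5.0, 6.0, 5.5, 6.5, 5.0]
trials = [3.5, 4.0, 3.0, 4.5]
t_stat, dof = welch_t(observational, trials)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A two-tailed p-value then follows from the t distribution with the computed degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p-value directly.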
RSBQ subscale scores were available for four of the five pediatric observational studies (Mount et al., 2002; Kaufmann et al., 2012; Oberman et al., 2023; Downs et al., 2024) (Table 3). Considering the different number of items in each subscale, subscale score variability, but not means, was analyzed. Two subscale VMR patterns were identified: subscales with higher VMRs (defined empirically as >2.0), namely General Mood and Breathing Problems, and subscales with lower VMRs (slightly above the 1.0 homogeneity cut-off, defined as <1.5), which included the remaining subscales, apart from Night-Time Behaviours. The Hand Behaviours and Body Rocking and Expressionless Face subscales showed the lowest VMRs, with values ≤1.10 in three of the four studies. A Welch t-test comparing higher VMR and lower VMR subscale mean VMRs (2.32 vs. 1.20) was significant (p < 0.0001). Analyses removing overlapping datasets demonstrated similar findings (higher VMR vs. lower VMR subscales mean 2.31 vs. 1.17, Welch p = 0.005). VMRs were higher for the total compared with subscale scores, in part reflecting the larger number of items. However, number of subscale items and differences in subscale means among studies were not related to VMRs (Table 3). For instance, the two subscales with the lowest VMRs are among those with the largest number of items.
Table 3. RSBQ Subscale Scores
High VMR (HVMR): > 2.0 (red) and low VMR (LVMR): < 1.5 (green).
Item # 44 excluded.
partial overlap of pediatric participants.
partial overlap of pediatric and adult participants. RSBQ, Rett Syndrome Behaviour Questionnaire; VMR, variance-to-mean ratio.
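The empirical grouping of subscale VMRs used for Table 3 can be expressed as a small Python sketch. The cut-offs (>2.0 and <1.5) come from the text; the function name, the "intermediate" label for values falling between the two cut-offs, and the example values are assumptions made for illustration only.

```python
HIGH_CUTOFF = 2.0  # "higher VMR" threshold stated in the text
LOW_CUTOFF = 1.5   # "lower VMR" threshold stated in the text


def classify_subscale_vmr(value: float) -> str:
    """Label a subscale VMR as HVMR, LVMR, or (assumed label) intermediate."""
    if value > HIGH_CUTOFF:
        return "HVMR"
    if value < LOW_CUTOFF:
        return "LVMR"
    return "intermediate"  # hypothetical label for values in neither group


# Hypothetical subscale VMRs for illustration; not values from Table 3.
example_vmrs = {"Subscale A": 2.4, "Subscale B": 1.05, "Subscale C": 1.7}
for name, value in example_vmrs.items():
    print(name, classify_subscale_vmr(value))
```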
Overlapping datasets
As mentioned, analyses of RSBQ total and subscale scores excluding studies with partially overlapping datasets from observational studies and clinical trials (Kaufmann et al., 2012; Downs et al., 2024; Percy et al., 2024a, 2024b) demonstrated the same VMR profiles but decreased effect sizes.
Influence of age, RTT clinical type, and genotype on RSBQ scores
Our psychometric evaluations reported pediatric and adult groups (Oberman et al., 2023; Downs et al., 2024a). The former paper (Oberman et al., 2023) differentiated sites (Tables 1 and 2). Both studies, which had partially overlapping samples, reported higher mean scores in pediatric than adult groups. The publication by Downs and colleagues (2024) further differentiated pediatric and adult subgroups with higher mean scores for the youngest in each subgroup. In the present study, we expanded these analyses by characterizing score variability through VMR calculations. Total and combined subscale (i.e., combining the 38 items in the 8 subscales) scores were also more variable (higher VMRs) in pediatric than adult groups. Comparisons between the six pediatric datasets in Oberman et al. (2023) demonstrated highly variable mean scores ranging from 35.09 to 53.82, with the highest corresponding to individuals prescreened for a clinical trial. Similar differences were found among the five adult datasets in the same study, with mean scores between 30.86 and 45.81 and highest values corresponding to the prescreening subcohort. The pediatric and adult prescreening subcohorts also exhibited the lowest VMRs that, in addition to another pediatric and one other adult dataset, corresponded to VMRs in the range of those reported in the trofinetide studies (i.e., 3.14–4.42). Comparing the five studies of participants with classical RTT against the three studies that included subjects with either classical or atypical RTT (termed “mixed” in Tables 1 and 2) demonstrated significantly lower VMRs in those with classical RTT but no mean score differences (mean 39.48 vs. 40.87, p = 0.623; VMR 3.92 vs. 5.65, p = 0.020). Nonetheless, four of the five studies with classical RTT participants were clinical trials while all three with mixed RTT samples were observational.
Only the recently published study by Downs and colleagues (2024) examined in detail the relationship between genotype and RSBQ scores (pediatric and adult). The publication reports that in the overall sample, within a relatively narrow range of total scores, MECP2 variants involving the nuclear localization signal, known to be associated with a more severe phenotype (Cuddapah et al., 2014), showed the highest total scores. However, this was not the case with VMRs, which were consistently high even when comparing variants associated with high and low severity, such as p.Arg255* (7.39) and p.Arg133Cys (7.24), respectively. The one exception was p.Arg294* with a VMR of 3.86, which was well below the 5.23–7.39 range for all other genotypes. Additional analyses by age groups demonstrated that in children with RTT, despite a wider range of mean scores and less differentiation between MECP2 variants, VMRs were consistently within the 5.03–7.79 range. Adult subjects showed lower mean scores and more variable VMRs (2.24–8.93), with three genotypes including p.Arg294* below 4.0 (Table 4).
Table 4. Variability of RSBQ Total Scores by MECP2 Variant (Based on Downs et al. 2024)
VMRs < 4.0 in bold; representing genetic variants with the lowest reported VMRs. VMR, variance-to-mean ratio; RSBQ, Rett Syndrome Behaviour Questionnaire.
Influence of administration method on RSBQ scores
Tables 1 and 2 depict differences in RSBQ administration method across studies and by site and/or timepoint. While no clear relationship between completion method and mean scores can be inferred, studies using structured data collection via interviews or electronic administration, such as those used in the clinical trials, tended to have lower VMRs compared with those using paper administration. A comparison of the two types of electronic administration, in-person versus off-site, did not show any differences in mean total score or VMR.
RSBQ score variability over time
Because only one study reported scores over time (Hou et al., 2020), and the authors assessed score stability using a variety of statistical approaches discussed in detail in the publication, we restricted our analyses to level of variability (i.e., VMR) among the three reported time points spanning a 1–2-month period. The VMRs were 4.88, 5.45, and 4.77, respectively, which demonstrated little effect of data collection time on variability and VMRs comparable with the other observational studies (Table 1).
Discussion
The RSBQ is the most widely used behavioral instrument and one of the most commonly implemented clinical outcome assessments in RTT. The expanded application of the RSBQ, purportedly as a global clinical severity scale, has raised concerns about its adequacy for this purpose, considering its content, structure, and psychometric features. To address the impact that certain factors, such as study type and administration method, may have on RSBQ scores, we conducted an analysis of total and subscale scores available in the literature, focusing on their variability (VMR) and their relationships with relevant study parameters. Of the 14 publications meeting our inclusion criteria, raw total scores were reported from 5 observational studies and 4 clinical trials (baseline scores). Raw subscale scores from four of the five observational studies were also available. We found a wide range of mean total scores, which were comparable between observational studies and clinical trials. However, VMRs were significantly higher in observational studies than in the trials. General Mood and Breathing Problems showed higher variability (i.e., higher VMRs), while Hand Behaviours and Body Rocking and Expressionless Face were examples of subscales with low VMRs. Available data also demonstrated greater variability in pediatric than in adult groups and less variability when using interviews or electronic administration compared with paper administration. Variability of RSBQ scores over a period of up to 2 months appeared stable.
Total RSBQ score variability
Total scores for the RSBQ reported in the literature are variable. Although multiple factors (e.g., clinical severity) are also likely contributory, type of investigation seems to play a role, as clinical trials showed more homogeneous scores at baseline. Interestingly, mean scores per se were comparable among observational studies and clinical trials. On the basis of the two large-scale psychometric analyses (Oberman et al., 2023; Downs et al., 2024a), it can be concluded that younger individuals not only have higher mean RSBQ scores but their scores are also more variable than in adults. Based on the study of Downs and colleagues (2024) and the analyses presented here, MECP2 variants appear to have a modest role in determining mean scores and their variability.
Administration method and research type also appear to have influenced scores, as results from clinical trials were associated with lower variability. This may be because trials tend to deploy the most stringent processes for protecting data quality and integrity for a number of reasons, including increased regulatory scrutiny. Examples include stricter adherence to study visit scheduling (related to patient safety monitoring), verification that the same caregiver completes the form every time, more advanced electronic data capture systems preventing the respondent from advancing if an item is unanswered, and more frequent source data verification assessments. These findings are further supported by the literature comparing paper with electronic administration methods, which has concluded that the latter are equivalent or superior in terms of data completeness, ease of use, and efficiency (Lane et al., 2006; Gwaltney et al., 2008; Jibb et al., 2020). The results from our analyses emphasize the importance of implementing RSBQ administration protocols for reducing score dispersion (Downs et al., 2024b), particularly in studies involving children with RTT, considering that baseline variability is one of the factors decreasing the ability to detect changes after an intervention. Our data on RSBQ total score variability also indicate that, most likely because of their enrollment criteria (e.g., clinical severity), clinical trials tend to recruit more homogeneous groups of individuals who may not be completely representative of the population affected by RTT. Use of the RSBQ for monitoring clinical outcomes in real-world settings needs to take into consideration the differences between observational studies and clinical trials reported here.
Variability of RSBQ subscale scores
While the RSBQ includes items covering multiple features of the RTT phenotype (Mount et al., 2002; Percy et al., 2023), its application as a global severity measure is potentially problematic. The RSBQ does not cover several core features of RTT that represent top caregiver concerns because of their impact on quality of life; for example, impairments in communication and ambulation (Neul et al., 2023; Kaufmann et al., 2024). Furthermore, most RSBQ items are grouped into subscales, each representing a relatively well-defined set of clinical manifestations but with a highly variable number of items. Therefore, total RSBQ scores tend to reflect some symptoms more than others. The Aberrant Behavior Checklist (ABC) is an example of another behavioral measure with subscales containing different numbers of items (Aman and Singh, 2017; Sansone et al., 2012). The widespread application of the ABC, which includes its use as a primary endpoint in trials that led to regulatory approval of two drugs for autism spectrum disorder (McCracken et al., 2002; Owen et al., 2009), is based on subscale and not total scoring (Eckert et al., 2019; Berry-Kravis et al., 2021). If RSBQ subscale scores were to be employed as endpoints in RTT trials, as in the IGF-1 program (Khwaja et al., 2014; O’Leary et al., 2018), information about subscale score variability would be needed. The analyses reported here showed that, independent of number of items, some subscales tend to generate more homogeneous baseline scores (e.g., Hand Behaviours), making them more suitable as outcome measures.
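The effect of unequal subscale length on a summed total can be sketched as follows. The subscale names and item counts are hypothetical placeholders, not the actual RSBQ structure; only the 0 to 2 item rating range is taken from the instrument. The per-item rescaling shown is one possible way to compare subscales of different length, not a scoring method proposed in the cited studies.

```python
MAX_ITEM_SCORE = 2  # each RSBQ item is rated 0, 1, or 2

# Hypothetical item counts per subscale, for illustration only.
item_counts = {"Subscale A": 8, "Subscale B": 5, "Subscale C": 2}


def max_contribution(counts: dict[str, int]) -> dict[str, float]:
    """Share of the maximum summed score attributable to each subscale."""
    total_max = sum(counts.values()) * MAX_ITEM_SCORE
    return {name: n * MAX_ITEM_SCORE / total_max for name, n in counts.items()}


def mean_item_score(raw_score: float, n_items: int) -> float:
    """Rescale a raw subscale score to the 0-2 per-item range."""
    return raw_score / n_items


shares = max_contribution(item_counts)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of the maximum summed score")
```

In this hypothetical layout, the longest subscale can contribute four times as much to the total as the shortest, which is why a summed total weights some symptom domains more heavily than others.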
Limitations and future directions
The paucity of published data on methodology precluded solid conclusions regarding the effect of administration method on RSBQ scores and their variability. Stability and variability of RSBQ scores over time could not be assessed in our analyses because only one publication included longitudinal data (Hou et al., 2020). We could not determine the impact of RTT clinical type on RSBQ scores because clinical trials tend to recruit only individuals with classical RTT, while observational studies have recruited individuals with classical and atypical presentations. Although partially examined recently (Downs et al., 2024), the influence of other participant attributes, such as specific phenotypic parameters, on score distributions could not be evaluated with the methodology applied in this study but deserves further examination. Although content validity and RSBQ structural features, such as narrow rating options (0, 1, 2) limiting overall score range, are important factors, analysis of their impact was beyond the scope of this investigation. The practice of reporting changes or differences in scores in the context of regression models (e.g., LS means and their SEs), rather than raw scores, along with limited genetic and phenotypic information, restricted our analyses of RSBQ scores from published clinical trials to baseline data. The virtual lack of analyses of the relationship between RSBQ scores and other commonly used measures in RTT also makes RSBQ data interpretation difficult. The small number of published studies, the partial overlap of some of the datasets, and the evaluation of studies with variable sample sizes represented analytical shortcomings, although the exclusion of overlapping studies in their entirety did not affect the reported findings. Therefore, our results should be taken with caution and considered preliminary in nature.
Despite these limitations, we observed greater variability in RSBQ scores in observational studies versus clinical trials and in pediatric versus adult groups. These findings are aligned with our clinical experience and prior studies, showing high variability of clinical manifestations at earlier ages in RTT (Tarquinio et al., 2017, 2018; Buchanan et al., 2019; Fu et al., 2020b) as well as with the outcome of clinical trials (Glaze et al., 2019; Neul et al., 2023; Percy et al., 2024a, 2024b).
The data reported here support the development of protocols for RSBQ administration, such as expanding instructions for study personnel or standardizing how questions are presented (items displayed one at a time vs. all on one page, etc.). Moreover, the possibility of employing subscales as outcome measures should be explored. Our findings also emphasize the importance of reporting key demographic features of participants (e.g., MECP2 variant type) and data collection methodology (e.g., paper vs. electronic) in clinical trials in order to replicate results and learn from our collective experiences in a rare disease field. A better understanding of the meaning of RSBQ scores (total and subscales) and their changes is needed, as are systematic investigations of the relationship between the RSBQ and certain biomarkers (e.g., electroencephalography, electrodermal activity), as well as other measures of abnormal behaviors, functioning, quality of life and clinical severity as recently reported (Downs et al., 2024). We conclude that comprehensive prospective studies testing multiple factors potentially influencing RSBQ scores and their stability would be indispensable.
Conclusions
Our analyses of published RSBQ scores, undertaken to gain further insight into the use of this instrument as a global severity scale, demonstrated not only variable total and subscale mean scores but also different patterns of score variability. Greater variability in total RSBQ scores was found in observational studies than in clinical trials and in pediatric than in adult groups. Subscale RSBQ scores showed either higher (e.g., General Mood) or lower (e.g., Hand Behaviours) variability. Altogether, our data support a standardized administration of the RSBQ and the possibility of employing subscales as outcome measures. Ultimately, prospective studies testing multiple factors potentially influencing RSBQ scores, their stability, and their relationship with other measures are essential for the adequate use and interpretation of the RSBQ as a measure of clinical outcomes.
Footnotes
Disclaimer
L.M.O. is supported by the NIMH Intramural Research Program (ZIAMH002955). The opinions expressed in this article are the authors’ own and do not reflect the views of the National Institutes of Health, the Department of Health and Human Services, or the United States government.
Disclosures
W.E.K. was the Chief Scientific Officer of
