Measuring Methylphenidate Response in Attention-Deficit/Hyperactivity Disorder: How Are Laboratory Classroom-Based Measures Related to Parent Ratings?

Abstract

Background:

Methylphenidate (MPH) is an efficacious and normally well-tolerated treatment for attention-deficit/hyperactivity disorder (ADHD). Although treatment effects are usually assessed using parent-rating scales, these can be supplemented by more objective methods. Here we examine the associations between ratings and one such method, assessments made across the day in the laboratory classroom.

Method:

The Comparison of Methylphenidates in the Analog Classroom Setting (COMACS) study was a large (n = 184) placebo-controlled trial comparing Equasym XL^®/Metadate CD^®, Concerta^®, and placebo (PLA) using a Laboratory School protocol. Therapeutic effects were measured using direct observation, scores on a simple math productivity task, and parent ratings.

Results:

Treatment effects were observed on all measures. Laboratory measures were correlated with each other, most strongly between Swanson, Kotkin, Agler, M-Flynn and Pelham Scale (SKAMP) inattention and Permanent Product Measure of Performance (PERMP). Parental ratings were correlated with classroom measures during the main morning period (1.5–4.5 hours after dosing) and to a lesser extent in the afternoon (6.0–7.5 hours after dosing), but not, by and large, immediately after dosing or in the evening. The morning correlations seemed stronger for female than for male participants.

Discussion:

The results suggest that parental ratings and direct observations tap different aspects of MPH response and that both may be required for comprehensive assessment.

Introduction

Methylphenidate (MPH) is an efficacious treatment for attention-deficit/hyperactivity disorder (ADHD) (Taylor and Sonuga-Barke 2008). Evidence for this has come from many well-designed placebo (PLA)-controlled, randomized trials (Taylor et al. 2004). Although other assessment approaches have been tried with more or less success (Madaan et al. 2008), measurement of MPH response in such trials has typically relied on parent (and, to a lesser extent, teacher) ratings of symptoms (Banaschewski et al. 2006). When collected using standardized and empirically validated scales (a number of which exist) (Demaray et al. 2003; Snyder et al. 2006), these measures provide a convenient, low-cost, and ecologically valid method of assessment that has been repeatedly demonstrated to be sensitive to change following treatment (Madaan et al. 2008). When using these measures, parents are asked to give an overall summary assessment of symptom levels independent of time of day and setting (although specific questions may be asked about school) over a set period of time (e.g., the last week or month). Although the practical advantage of such an approach is obvious, there are several factors that might limit its value. First, parent ratings are known to be affected by systematic bias. They are influenced by characteristics of the parent (Simonoff et al. 1998), such as their mental health status (Briggs-Gowan et al. 1996; but see Faraone et al. 2003), and characteristics of the child (Abikoff et al. 1993), such as their gender (Hartung et al. 2006) and ethnicity (Sonuga-Barke et al. 1993). Second, as mentioned above, these measures are designed to give a summary score independent of time of day. However, treatment effects may differ in clinically important ways depending on which time of day the parent is focusing on.
For instance, some parents may focus on the best part of the day when medication effects are clear, whereas others may ignore these times and describe behaviors that occur either before medication has started to work or after the effects have worn off. It is also possible that time-based differences vary between different children or groups of children (Sonuga-Barke et al. 2007; Sonuga-Barke et al. 2008).

Measuring such time-based effects is likely to be particularly important for drugs, like the stimulant drugs used to treat ADHD, which have a rapid onset and offset of action (Swanson and Volkow 2002). For example, with MPH, which is available as either immediate release (IR) or extended release (ER) formulations (with each of the ER formulations having a different release profile), clinical benefits will wear off at different times during the day, depending on the particular formulation that is used (Wolraich and Doffing 2004). Comparison of these different formulations requires the collection of time-series data across the day. Standard rating scales, which use summary scores across time, are deficient in this regard. Furthermore, the extent to which parents can observe their children's behavior varies considerably across the day/week. When children are at school, there are clearly large parts of the typical day during which the parent can only make indirect assumptions regarding adequacy of treatment response based on feedback from the child, the teacher, and/or other observers. This may be particularly problematic in situations where the child is not treated during the evenings, weekends, and/or holidays, as is common practice (i.e., drug holidays) in some countries (Martins et al. 2004). As a consequence of these factors, parents' ratings may often be overreliant on the child's behavior in the early morning, which is typically one period of the day where drug control is unlikely to be maximal (Swanson et al. 2004).

An alternative approach to measuring treatment effects using standardized laboratory-school settings has been developed to address some of the limitations inherent in standard rating scales. Typically these protocols use trained observers who make an independent assessment of a child's behavior at the time of the first (or in the case of once-daily formulations often the only) dose of medication and then at set times after this in highly structured and standardized settings. The most commonly used outcome measure in the laboratory school context is the Swanson, Kotkin, Agler, M-Flynn, and Pelham Scale (SKAMP) (Wigal et al. 1998). This provides separate ratings of attention and deportment—constructs that were designed to map onto the symptom domains of ADHD and related disorders (Swanson et al. 1998). These SKAMP ratings are often supplemented by a measure of academic productivity. For instance, the Permanent Product Measure of Performance (PERMP) (Wigal and Wigal 2006) assesses the impact of medication on productivity during simple math problems (Swanson et al. 1998). These sorts of measures when assessed at different times across the treatment day (one commonly used protocol samples behavior and performance every 1.5 hours after the delivery of the initial dose) have the potential to provide both a more objective and more independent assessment of symptoms than traditional parent ratings, and will be unaffected by potential parental biases and freed from the restrictions relating to the times of day at which behavior can be observed, as found with parent ratings.

They also provide a way to compare the pharmacodynamics (PD) of different MPH formulations and have therefore also been particularly valuable in the development and evaluation of the various once-daily formulations of MPH, which were designed to deliver different patterns of medication across the day with the goal of providing effective and continual symptom cover for extended periods (Swanson and Volkow 2002). However, these laboratory classroom-based measures may also have their limitations. First, they are far more expensive and time consuming to implement than parent ratings and are therefore unlikely to be used in routine practice to assess drug effects, either in trials or in the clinic. Second, they may lack a certain degree of ecological validity given the artificial nature of the setting in which they are employed; even when considered against other classroom settings, the laboratory classroom setting is highly structured. In this sense, the laboratory classroom may alter the expression of the disorder one is attempting to assess. Finally, although not restricted to any particular time of day, their extremely tight focus on one setting (i.e., the laboratory school classroom) means that they do not allow other therapeutic aspects to be assessed outside of the school classroom (i.e., playtime etc.) and/or outside the school (i.e., at home).

Despite the clear differences between parent-completed rating scales and laboratory classroom measures and the obvious importance of understanding their relative roles in the assessment of medication effects, almost nothing is currently known about how these measures are related to each other. In this paper, we set out to address this question by exploiting the data from a large-scale PLA-controlled trial of two long-acting MPH formulations (Equasym XL^®/Metadate CD^® [EQXL-MCD] and Concerta^® [CON]): The Comparison of Methylphenidates in the Analog Classroom Setting study (COMACS) (Swanson et al. 2004). In this trial, treatment outcomes were measured using both parent ratings (the Swanson, Nolan, and Pelham [SNAP-IV] Rating Scale) (Swanson 1992) over the 3 weeks of the trial (one on each formulation and one on PLA) and laboratory school measures (SKAMP and PERMP; Wigal et al. 1998) on a single day at the end of each week on a large number of patients. This trial was particularly suited to this purpose because: (1) it was a crossover trial with a PLA condition and a measure of treatment response as measured by the rating scales and laboratory measures that could be directly compared; (2) it included a large sample so that it was powered statistically to identify associations of relatively small effect that might be expected when comparing two very different indices of treatment response using different approaches on different days; and (3) it included a relatively large number of girls, which allows associations to be compared between genders on measures of treatment response in order to explore the issue of parental bias.

In the analysis we addressed the following questions:

What was the relationship between the two laboratory measures (SKAMP and PERMP) of MPH response? If these measures are highly correlated, then it is questionable whether both are required for a proper evaluation of medication response. We predicted modest correlations between measures and expected these to be most marked for the attention element of the SKAMP given the important role played by attention in academic productivity.

To what degree are parent ratings using the SNAP-IV correlated with behavior ratings and performance output in the laboratory classroom? This question can be rephrased as: what do the SKAMP and the PERMP add to the standard ratings? Given their common rating elements and their focus on behavior rather than performance (SNAP-IV by parents and SKAMP by trained observers), we expected the correlations to be strongest between SNAP-IV and SKAMP. Furthermore, we expected a degree of specificity, with SNAP-IV attention scales predicting SKAMP attention scores and SNAP-IV hyperactivity/impulsivity scores predicting SKAMP deportment scores. We expected the size of the correlations to be moderate given the difference in measurement mode and time of rating.

Does the relationship between parent ratings and laboratory measures change over the day? Our prediction was that the associations between SKAMP and PERMP and SNAP-IV ratings would be stronger at the times of day when parents are most likely to be able to observe treatment response directly and on a daily basis. That is, we expected the correlations to be strongest in the early morning and late in the day.

Finally, in light of the suggested bias in parent ratings, we explored whether the association between more objective measures recorded in the classroom laboratory and the parent ratings was different for girls and boys.

In the COMACS study the two long-acting MPH formulations compared had distinctly different dosing profiles. EQXL-MCD is designed to have approximately an 8-hour action, whereas CON is designed to have a 12-hour action. Furthermore, with equivalent daily doses (and assuming simultaneous early morning dosing), one would expect EQXL-MCD to have a larger effect in the earlier part of the day (given its higher IR dose element [30% vs. 22%]) and CON to have a larger effect in the evening given its larger ER dose element and longer duration of effect. Four papers have so far been published using data from the COMACS study. The primary analysis (Swanson et al. 2004) confirmed the above PD predictions for effects across the day. A secondary analysis (Sonuga-Barke et al. 2004) found that when patients were compared across dosing strata to match the IR components, the EQXL-MCD advantage in the early part of the day over CON was no longer apparent. A third paper reported sex differences, with girls showing a faster onset of action in the early day followed by a shorter duration of action (Sonuga-Barke et al. 2007) when other factors were controlled and the effects of the two formulations were combined. A fourth paper explored individual differences in the PD of the formulations using growth mixture modeling and highlighted the existence of distinctive subgroups of patients with different PD patterns (Sonuga-Barke et al. 2008).

Method

Patients

Six- to 12-year-old children receiving treatment with doses of MPH between 10 and 60 mg/day (5–20 mg/administration, one to three times a day) were recruited for the multisite COMACS trial. The subjects were screened and enrolled by the principal investigator at each study site. Children were deemed otherwise healthy on the basis of an extensive medical history and physical examination; diagnosis of ADHD was confirmed by a clinical research interview carried out by a trained interviewer. Children were excluded if they had an intelligence quotient (IQ) below 80 or were unable to follow or understand study instructions. Other exclusion criteria included the presence of another severe mental disorder (e.g., psychosis, bipolar illness, pervasive developmental disorder, severe obsessive compulsive disorder, or severe depressive disorder), extreme aggressive behavior or destruction of property, or marked anxiety, tension, or agitation. Co-morbid psychiatric diagnoses were established at the screening visit by reference to Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) criteria (American Psychiatric Association 1994) (see Swanson et al. 2004 for details). A total of 184 patients (48 female) entered the trial. Eighty-two per cent of the patients met criteria for ADHD–combined type, 15% met criteria for inattentive type, and the remaining 3% met criteria for hyperactive/impulsive type. Approximately 25% of the children had a co-morbid condition (e.g., anxiety and oppositional defiant disorder). At prescreening, 54% of patients were on the three times a day (t.i.d.) equivalent formulation of CON and 23% on the twice a day (b.i.d.) equivalent formulation of EQXL-MCD. The remaining children were taking IR formulations. Children provided signed assent, and their legal guardians signed an institutional review board-approved consent form to participate in the study.

Design

COMACS was a 10-site, double-blind, placebo-controlled, crossover study comparing three treatment conditions: EQXL-MCD, CON, and PLA. Dose-level assignment was made according to the prestudy, clinically titrated daily dosing regimen for MPH and remained at that level for the study duration. Children treated with low doses (≤20 mg/day) of MPH were randomized to receive a daily dose of 20 mg of EQXL-MCD, 18 mg of CON, or PLA; those treated with medium doses (>20–40 mg/day) were randomized to receive 40 mg of EQXL-MCD, 36 mg of CON, or PLA; and children treated with high doses (>40 mg/day) were randomized to receive 60 mg of EQXL-MCD, 54 mg of CON, or PLA. Each of the three treatments was administered for 7 days (in the assigned sequence) without an intervening washout period, and the PD assessment was conducted on the seventh day of each treatment.
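The dose-level assignment described above can be written out as a small lookup. The following is only an illustrative sketch: the function and type names are hypothetical, and the thresholds and study doses are taken directly from the preceding paragraph.

```python
from typing import NamedTuple


class StudyDoses(NamedTuple):
    """Daily study doses (mg) for one dose level; PLA is given at every level."""
    stratum: str
    eqxl_mcd_mg: int
    con_mg: int


def assign_dose_level(prestudy_mg_per_day: float) -> StudyDoses:
    """Map a prestudy daily MPH dose to the dose level described in the Design text.

    Hypothetical helper: low is <=20 mg/day, medium is >20-40 mg/day,
    and high is >40 mg/day.
    """
    if prestudy_mg_per_day <= 0:
        raise ValueError("prestudy dose must be positive")
    if prestudy_mg_per_day <= 20:
        return StudyDoses("low", 20, 18)
    if prestudy_mg_per_day <= 40:
        return StudyDoses("medium", 40, 36)
    return StudyDoses("high", 60, 54)


# Example: a child titrated to 30 mg/day before the study falls in the medium level.
example = assign_dose_level(30)
print(example.stratum, example.eqxl_mcd_mg, example.con_mg)
```

Because the level is fixed by the prestudy regimen, each child keeps the same level across all three treatment weeks; only the formulation (or PLA) changes.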

Assessment

Laboratory classroom assessment

This took place in the laboratory school on days 7, 14, and 21 (for a detailed description of the laboratory classroom facility, see Wigal and Wigal 2006). Two trained observers assessed patients during each classroom session on the SKAMP scale on the basis of a 1.5-hour cycle of activities, with separate assessments of attention and deportment being made at 0, 1.5, 3.0, 4.5, 6.0, and 7.5 hours and then at 12 hours after drug administration. The laboratory classroom scores (SKAMP) were completed by dedicated raters at each study site. These raters were required to complete prestudy standardization training administered by University of California–Irvine (UCI) staff; however, the interrater reliability was not formally assessed. The SKAMP has six deportment items (e.g., staying seated, interacting with others) and seven attention items (e.g., getting started, sticking with tasks). In addition, during each classroom session, a written 10-minute simple math test was administered to provide an objective measure from its PERMP. In this task, the difficulty of the task is adjusted to remain at approximately 95% correct, so that it indexes math productivity rather than math ability (Swanson et al. 1998).

Parent rating of ADHD symptoms

Parents of the children completed the SNAP-IV (Swanson 1992), which has 39 items derived from the DSM criteria for ADHD and oppositional defiant disorder (ODD). Parents respond on a Likert scale rating regarding the presence of these symptoms. Making use of only the DSM-IV symptoms, the scale yields ADHD-related factor scores for inattention and hyperactivity-impulsivity. The SNAP-IV scales were administered twice during each treatment week on days 3 and 6. SNAP-IV ratings on day 6 were used in the current analysis as these were completed after a longer period of exposure to the drug.

Results

Initial data treatment

Missing data were handled on a case-by-case/analysis-by-analysis basis to maximize the power for each individual test. There was no replacement of missing data, but the missing values for SKAMP and PERMP for each case were accommodated where possible by excluding them from the calculations of mean scores for a particular variable (e.g., morning or evening response), so that only patients with missing data for a whole period or condition were excluded from the analysis. The reported analyses adopted a correlation-based approach to explore the relationship between drug effects as assessed by parent ratings and by laboratory classroom-based measures. For all analyses for each measure, our primary outcome is the treatment response in terms of the difference between PLA and active drug (either CON or EQXL-MCD) on the SNAP-IV, SKAMP and PERMP measures. For the purposes of the analysis the laboratory classroom day was partitioned into four segments in order to reduce the number of tests carried out: First thing (0 hours), morning (1.5–4.5 hours), afternoon (6.0–7.5 hours), and evening (12 hours). This approach was justified because preliminary analyses found no differences in patterns of correlations between different individual testing points within the morning and afternoon periods. It was expected that the properties of the different measurement approaches would be treatment independent—that the associations between parent ratings and laboratory classroom scores would be no different for CON and EQXL-MCD. Preliminary analysis supported this expectation. Therefore, correlation analyses were based on pooled data for the two formulations. Statistical significance was adjusted to control for testing of multiple correlations using the Bonferroni procedures. The adjusted p values resulting from this are reported for each analysis separately (see tables, below).
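The data treatment just described (period means that skip missing sessions, a placebo-minus-active response score, and Pearson correlations on complete pairs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: all function names and the example scores are hypothetical, and the direction of the difference follows the "Placebo-Active Drug" wording of the Table 1 title.

```python
from math import sqrt
from typing import Dict, Optional, Sequence

# Assessment times (hours after dosing) grouped into the four periods of the day.
PERIODS = {
    "first_thing": (0.0,),
    "morning": (1.5, 3.0, 4.5),
    "afternoon": (6.0, 7.5),
    "evening": (12.0,),
}

Scores = Dict[float, Optional[float]]  # session time -> score (None if missing)


def period_mean(scores: Scores, period: str) -> Optional[float]:
    """Mean of the non-missing sessions in a period; None if the whole period is missing."""
    values = [scores[t] for t in PERIODS[period] if scores.get(t) is not None]
    return sum(values) / len(values) if values else None


def treatment_response(placebo: Scores, active: Scores, period: str) -> Optional[float]:
    """Placebo minus active-drug period mean; None if either condition is missing."""
    pla, act = period_mean(placebo, period), period_mean(active, period)
    return None if pla is None or act is None else pla - act


def pearson_r(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    """Pearson r over the pairs in which both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical SKAMP scores for one child (not study data); None marks a missed session.
placebo_day = {0.0: 2.0, 1.5: 2.4, 3.0: None, 4.5: 2.8, 6.0: 2.5, 7.5: 2.7, 12.0: 2.1}
active_day = {0.0: 2.2, 1.5: 1.0, 3.0: 1.2, 4.5: None, 6.0: 1.5, 7.5: 1.9, 12.0: None}
for name in PERIODS:
    print(name, treatment_response(placebo_day, active_day, name))
```

In this sketch a child contributes a morning score as long as at least one morning session was recorded in each condition, which mirrors the case-by-case handling of missing values; the evening score is dropped here because the single 12-hour session is missing on active drug.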

Although the efficacy data for the SKAMP and the PERMP have been published previously (Swanson et al. 2004), this is not the case for the SNAP-IV. Therefore, to provide a context for the study of the three measures of interest, we first assessed treatment effects for CON and EQXL-MCD on SNAP-IV. There was no difference between the effects of the two formulations on either inattention symptoms (t[171] = 0.032; p > 0.97) or hyperactivity/impulsivity (t[171] = 0.167; p > 0.87). There were highly significant effects for both formulations compared with PLA on both subscales (inattention: EQXL-MCD, t[172] = 7.53; p < 0.001; CON, t[176] = 8.38; p < 0.001; hyperactivity/impulsivity: EQXL-MCD, t[172] = 8.42; p < 0.0001; CON, t[176] = 9.09; p < 0.001).

The relationship between SKAMP ratings and PERMP problems attempted and correct as measures of drug effects: Table 1 shows the correlations between treatment response as measured by the SKAMP and PERMP scores at different periods of the treatment day. Overall, the correlations were stronger between SKAMP attention and PERMP scores than for SKAMP deportment and PERMP. The mean correlation between SKAMP attention and PERMP attempted was 0.51 and with PERMP correct was 0.45. For deportment, the respective values were 0.29 and 0.31. There was no difference between the PERMP and SKAMP correlations for the number of problems attempted and the number correct. Across the day, the only significant change in correlation involved a substantial drop in the correlation between PERMP and SKAMP attention in the afternoon. During this time SKAMP deportment-based effect measures were more strongly associated with PERMP measures.

Table 1.

The Correlation Between Treatment Response (Placebo-Active Drug) Measured by SKAMP and PERMP at Different Times of the Day, Including Effects for Both EQXL-MCD and CON in the Same Analyses

	PERMP
	Attempted	Correct
First thing (0 hours)
SKAMP Attention	0.58	0.29
SKAMP Deportment	0.20	0.16*
Morning (1.5–4.5 hours)
SKAMP Attention	0.58	0.60
SKAMP Deportment	0.32	0.36
Afternoon (6.0–7.5 hours)
SKAMP Attention	0.29	0.27
SKAMP Deportment	0.39	0.40
Evening (12 hours)
SKAMP Attention	0.60	0.67
SKAMP Deportment	0.28	0.32

Note: Degrees of freedom vary between 310 and 341 as a function of missing data for the different measures. All correlations are Pearson r. All correlations were significant at p < 0.001 except * (p = 0.004), corrected for multiple tests.

Abbreviations: SKAMP = Swanson, Kotkin, Agler, M-Flynn and Pelham Scale; PERMP = Permanent Product Measure of Performance; EQXL-MCD = Equasym XL^®/Metadate CD^®; CON = Concerta^®.
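The period-averaged correlations quoted in the text can be approximated from the entries of Table 1. The sketch below assumes a simple unweighted mean of the four rounded table values; the source does not state how its averages were computed, so small differences from the quoted figures (0.51, 0.45, 0.29, and 0.31) are to be expected. The dictionary layout and names are hypothetical.

```python
from statistics import mean

# Pearson r values transcribed from Table 1, in the order:
# first thing, morning, afternoon, evening.
TABLE_1 = {
    ("attention", "attempted"): [0.58, 0.58, 0.29, 0.60],
    ("attention", "correct"): [0.29, 0.60, 0.27, 0.67],
    ("deportment", "attempted"): [0.20, 0.32, 0.39, 0.28],
    ("deportment", "correct"): [0.16, 0.36, 0.40, 0.32],
}

# Unweighted mean across the four periods (an assumption, see lead-in).
period_means = {key: mean(values) for key, values in TABLE_1.items()}

for (skamp_scale, permp_score), value in period_means.items():
    print(f"SKAMP {skamp_scale} vs PERMP {permp_score}: {value:.4f}")
```

Averaged this way, the rounded entries give roughly 0.51 and 0.46 for attention and 0.30 and 0.31 for deportment, which preserves the pattern described in the text: SKAMP attention tracks PERMP more closely than SKAMP deportment does.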

Do parent ratings correlate with laboratory classroom measures? Table 2 presents the correlations between the SKAMP and PERMP laboratory classroom measures and SNAP-IV ratings. There were significant correlations between SNAP-IV ratings and laboratory classroom measures. These varied little as a function of SNAP-IV dimension, with similar effects being seen for scores derived for the combined, inattentive, and hyperactive/impulsive dimensions. However, the significance and magnitude of the correlations varied markedly in terms of time of day. Measures taken immediately after dosing were by and large not significantly correlated with SNAP-IV scores after adjusting significance levels for multiple tests; the exception was SKAMP attention ratings, which showed highly significant but moderately sized correlations with both SNAP-IV dimensions and with the total SNAP-IV score. In contrast to the effects found with measures taken during other times of the day, these correlations were negative, suggesting an inverse relationship between measures of treatment effects as measured by SKAMP at time zero and those measured by SNAP-IV. The strongest and most consistent pattern of correlations was found for laboratory measures taken in the morning (1.5–4.5 hours). At this time point, all measures gave significant but moderately sized associations with SNAP-IV ratings. On this occasion, although the SKAMP attention and deportment effects were now much closer, the SKAMP deportment effects had increased considerably and now showed the largest effects. A far less consistent pattern was found with afternoon measures, with only SKAMP deportment showing consistent effects above r = 0.2. During the afternoon, PERMP measures were associated with treatment effects measured by the SNAP-IV inattention dimension. There were no significant associations between SNAP-IV ratings and the laboratory measures (SKAMP or PERMP) taken in the evening.

Table 2.

The Correlations Between SNAP Ratings and Laboratory Classroom Measures of Treatment Effect at Different Times of the Day

	SNAP-IV ratings
	Inattention	Overactivity	Combined
First thing (0 hours)
SKAMP Attention	−0.20	−0.18	−0.19
	p < 0.001	p = 0.001	p = 0.001
SKAMP Deportment	0.02	−0.02	−0.03
	p = 0.771	p = 0.775	p = 0.532
PERMP Attempt	0.13	−0.12	−0.14
	p = 0.016	p = 0.030	p = 0.010
PERMP Correct	−0.14	−0.12	−0.14
	p = 0.016	p = 0.033	p = 0.011
Morning (1.5–4.5 hours)
SKAMP Attention	0.21	0.22	0.21
	p < 0.001	p < 0.001	p < 0.001
SKAMP Deportment	0.25	0.25	0.25
	p < 0.001	p < 0.001	p < 0.001
PERMP Attempt	0.18	0.14	0.15
	p = 0.001	p = 0.009	p = 0.007
PERMP Correct	0.19	0.15	0.16
	p < 0.001	p = 0.005	p = 0.004
Afternoon (6.0–7.5 hours)
SKAMP Attention	0.14	0.12	0.10
	p = 0.015	p = 0.035	p = 0.073
SKAMP Deportment	0.23	0.21	0.21
	p < 0.001	p < 0.001	p < 0.001
PERMP Attempt	0.18	0.13	0.11
	p = 0.001	p = 0.039	p = 0.016
PERMP Correct	0.17	0.09	0.11
	p = 0.001	p = 0.100	p = 0.037
Evening (12 hours)
SKAMP Attention	−0.05	−0.09	−0.06
	p = 0.388	p = 0.119	p = 0.265
SKAMP Deportment	0.08	0.07	0.06
	p = 0.126	p = 0.188	p = 0.237
PERMP Attempt	0.03	−0.01	0.01
	p = 0.582	p = 0.794	p = 0.854
PERMP Correct	0.05	0.01	0.02
	p = 0.373	p = 0.774	p = 0.657

Note: Bold signifies significant results after correcting for multiple testing. Degrees of freedom vary as a function of missing data for the different measures between 309 and 341. All correlations are Pearson r; significance level adjusted for multiple testing is p = 0.001.

Abbreviations: SNAP = Swanson, Nolan, and Pelham Rating Scale; SKAMP = Swanson, Kotkin, Agler, M-Flynn and Pelham Scale; PERMP = Permanent Product Measure of Performance.
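The adjusted significance levels in the table notes are consistent with a Bonferroni division of a nominal level by the number of tests. The sketch below rests on two assumptions that the source does not state explicitly: a nominal alpha of 0.05, and a test count equal to the number of correlation cells in each table. The function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under the Bonferroni procedure."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value: float, threshold: float) -> bool:
    """True if a p value falls below the adjusted threshold."""
    return p_value < threshold


NOMINAL_ALPHA = 0.05  # assumption: not stated in the source

# Table 2: 4 periods x 4 laboratory measures x 3 SNAP-IV scores = 48 cells.
table_2_threshold = bonferroni_threshold(NOMINAL_ALPHA, 4 * 4 * 3)

# Table 3: 4 periods x 2 laboratory measures x 3 groups = 24 cells.
table_3_threshold = bonferroni_threshold(NOMINAL_ALPHA, 4 * 2 * 3)

print(f"Table 2: {table_2_threshold:.4f}")
print(f"Table 3: {table_3_threshold:.4f}")
```

Under these assumptions the thresholds round to the 0.001 and 0.002 given in the notes to Tables 2 and 3, so a cell such as r = 0.18 with p = 0.001 sits at the boundary and a cell with p = 0.016 does not survive the correction.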

Are there gender differences in the associations between parental ratings and laboratory measures? Table 3 shows the patterns of associations between treatment effects broken down by gender. To reduce the number of comparisons made for these analyses, we focused on the combined SNAP-IV measure and employed an aggregate measure of SKAMP attention and deportment. We also used only the PERMP correct score, given the great similarity of outcome for the two measures and the fact that the correct score tended to give the better predictions. In general, effects for PERMP-based measures were similar for males and females across the day, although there was a higher correlation for females than males at 0 hours. There was a strikingly large difference for SKAMP-based measures during the morning and afternoon, the periods that showed the biggest correlations for the whole sample. For females, we saw large correlations (≥0.4); for males, the correlations were only moderate (<0.2). Given this, it appeared that the association between ratings and laboratory measures seen in the previous analyses was driven by females.

Table 3.

A Comparison of the Correlations Between SNAP Ratings and Laboratory Measures for Males and Females

	SNAP overall rating
	All	Males	Females
First thing (0 hours)
SKAMP	−0.12	−0.10	−0.18
	p = 0.021	p = 0.118	p = 0.084
PERMP Correct	−0.14	−0.10	−0.27
	p = 0.011	p = 0.144	p = 0.016
Morning (1.5–4.5 hours)
SKAMP	0.26	0.10	0.46
	p < 0.001	p = 0.005	p < 0.001
PERMP Correct	0.16	0.16	0.18
	p = 0.004	p = 0.013	p = 0.095
Afternoon (6.0–7.5 hours)
SKAMP	0.21	0.11	0.40
	p < 0.001	p = 0.089	p < 0.001
PERMP Correct	0.17	0.09	0.12
	p = 0.037	p = 0.100	p = 0.270
Evening (12 hours)
SKAMP	0.01	−0.01	−0.02
	p = 0.966	p = 0.975	p = 0.865
PERMP Correct	0.02	0.05	−0.08
	p = 0.657	p = 0.466	p = 0.470

Note: Bold signifies significant results after correcting for multiple testing. SNAP-based scores were for combined inattention and hyperactivity/impulsivity. SKAMP-based scores are the mean of the deportment and attention indices. Adjusted significance level is p = 0.002.

Abbreviations: SNAP = Swanson, Nolan, and Pelham Rating Scale; SKAMP = Swanson, Kotkin, Agler, M-Flynn and Pelham Scale; PERMP = Permanent Product Measure of Performance.

Discussion

In COMACS, both parental ratings and laboratory classroom-based measures provided evidence for the efficacy of the long-acting MPH formulations CON and EQXL-MCD. However, the magnitude of the correlations between parent ratings and laboratory school-based measures was in general only small to moderate. Furthermore, the size of the association varied as a function of time of day, with the largest effects being found during the morning for both attention and deportment and, for deportment only, in the afternoon. This limited and inconsistent pattern of association between these different approaches to measuring treatment response could be explained partly by subtle differences in the content of SNAP-IV and SKAMP items, on the one hand, and also by the fact that parent ratings and laboratory observations were measured at different periods and on different time scales. Even so, the results suggest that each approach seems to be tapping into some unique and distinctive elements and that SNAP-IV parent ratings and more objective indicators may play complementary roles in the assessment of MPH response. In this regard, it is interesting to note that researchers have also found a lack of association between clinical effects and both neuropsychological effects (Coghill et al. 2007) and quality of life ratings (Rimmer et al. 2007).

Both ratings and laboratory measures (as well as neuropsychological and quality of life measures) have their own strengths and weaknesses and together provide a potentially more complete picture of therapeutic effects. The laboratory measures are more objective and therefore may be more independent of some of the biases that often affect more subjective ratings (although idiosyncratic biases of individual raters on the SKAMP might also affect the assessment of individual children). As a consequence, they have the potential to provide an estimate that gets closer to the real effects of drugs on circumscribed children's behavior and performance. However, the laboratory school setting is necessarily rather artificial given its highly structured nature and this will inevitably alter the way children respond. Thus, the findings from the laboratory classroom may not be representative of drug effects in the real-world classroom given the different sorts of challenges that might exist within these less controlled and structured settings.

This limitation in the ability to generalize would potentially apply as much to the normal classroom as to other settings such as the home, and, indeed, this is the reason that clinical guidelines suggest that the clinician should obtain feedback from both home and school when monitoring response to medication (Banaschewski et al. 2006; National Collaborating Centre for Mental Health 2008). To test whether this is actually the case, teacher ratings of treatment effects would need to be included in the study. The SNAP-IV and other rating scales have the potential to provide a broader notion of treatment effects than the SKAMP and PERMP. However, the scores provided by parents are likely to pass through an important set of perceptual and motivational filters (Reid and Maag 1994), inasmuch as parents have notions of what aspects of the disorder are most significant for them (Bussing et al. 2003). Thus, for instance, the issue of impairment, although in principle a different construct from that of symptoms, is likely to confound a parent's ratings of an individual symptom behavior, leading them to weight certain behaviors more than others in their overall judgement, perhaps through the operation of “halo effects” (Abikoff et al. 1993). In a similar way, raters may also give precedence to certain periods of the day when making their judgements, namely periods where they feel effective control is especially important.

In the current analyses, it is interesting that, contrary to predictions, there were no positive correlations between parent ratings and laboratory measures at either 0 hours or 12 hours, when one might have expected parents to have the most exposure to the effects of MPH in a normal setting. In fact, the significant correlations were strongest during the mid-morning period. Could this reflect the fact that mid-morning effects of MPH are regarded as more significant in the overall estimation of efficacy than are those seen at other times of the day? The importance of the morning effects could be due to a number of factors. First, it could be that parents feel that the morning is the most important period for control in terms of patients' functioning, perhaps because in many school systems it is in the morning that the key academic work is done. Another possibility is that there is a sense that symptoms have the most impact during the morning or that, in general, symptom levels increase most in the morning when children are off medication. The pattern of PLA response across the day in the COMACS study supports the notion of a rapid increase in symptoms over the morning period (Swanson et al. 2004). This morning imperative could be explored directly by asking parents which time of day they regard as most important in terms of control. To address this question from a different perspective, one could also compare objective laboratory measures of PD effects against more time-specific parental ratings made across the day.
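The per-session comparison discussed above can be illustrated with a minimal sketch. It assumes one summary parent score per child and one laboratory score per child at each session, and it computes a Pearson correlation at each time point. The function names, the data layout, and all numerical values are hypothetical placeholders chosen for illustration; they are not COMACS data and this is not the analysis code used in the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def correlations_by_session(parent_scores, lab_scores_by_session):
    """Correlate parent scores with laboratory scores at each session.

    parent_scores: one summary rating per child (hypothetical layout).
    lab_scores_by_session: dict mapping hours after dosing to a list of
        laboratory scores, in the same child order as parent_scores.
    """
    return {
        hours: pearson_r(parent_scores, scores)
        for hours, scores in sorted(lab_scores_by_session.items())
    }


# Invented placeholder values, NOT study data.
parent = [1.0, 2.0, 3.0, 4.0]
lab = {
    0.0: [2.0, 1.0, 2.0, 1.0],  # session immediately after dosing
    3.0: [2.0, 4.0, 6.0, 8.0],  # mid-morning session
}
result = correlations_by_session(parent, lab)
for hours, r in result.items():
    print(f"{hours:>4} h after dosing: r = {r:.3f}")
```

With time-specific parental ratings, the same function could be called once per session with the matching parent scores, so that each laboratory session is compared with the rating made for that period rather than with a single overall summary.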

There were a number of other points of significance in the current findings. First, the SKAMP attention scale measure was correlated with SNAP-IV measures at 0 hours, although the correlations at this time of day were negative: higher SKAMP inattention scores were associated with lower SNAP-IV scores. The time of day to which this pattern of correlation related coincided with a period of the laboratory school day when treatment was associated with more symptoms than was PLA (Swanson et al. 2004). This apparently negative effect has been discussed previously as being related to a possible early morning rebound effect associated with the previous day's MPH exposure. This negative effect of medication, as indexed by SKAMP attention, did not seem to be reflected in overall parent ratings, which may suggest that time-specific measures are required to pick up these potentially important effects of MPH.

Second, PERMP and SKAMP measures of treatment response were strongly correlated with each other at all points across the day. As predicted, the correlations were in general greater for SKAMP attention than for deportment, an effect possibly founded on the shared cognitive element underpinning math productivity and attentional control. This provides good validating evidence for the SKAMP ratings. It does, however, raise the question of whether an objective test-based assessment of deportment could be developed to complement the PERMP measure.

Third, the pattern of correlations in the morning and, to a lesser extent, in the afternoon appeared to be driven primarily by the girls in the sample. This is an intriguing finding, especially given the sex differences in the PD of MPH response found for the COMACS study sample and reported previously (Sonuga-Barke et al. 2007). Here we found that, for girls, parents' ratings seemed to reflect far more strongly the effects of MPH on behavior and performance in the laboratory classroom. What could this mean? From one perspective, it could be that parent ratings of male and female patients' symptoms are determined by different factors, with ratings of girls being far more influenced by the actual objective effects of the drugs. From a different perspective, it may be that the laboratory classroom is a more ecologically valid test setting for girls than for boys, perhaps being more representative of the normal classroom experiences and demands of females than males, at least as perceived by their parents.

The results have a number of practical implications for the selection of existing measurement tools, and the development of improved ones, for medication trials. One cannot say that either the laboratory-based measures or the parental ratings are the better measure of drug effects; each provides a different perspective on therapeutic effects. Both are sensitive measures of MPH-related change. In choosing which measures to use, it should be noted that parent ratings seemed especially insensitive to effects at the beginning and end of the day, and so if those periods were a particular focus of clinical interest, alternatives to SNAP-IV ratings would be required. However, it has to be recognized that this insensitivity to effects at particular times of the day may arise because these periods are not seen as important and so are discounted by parents in their overall judgment of treatment effects. Measures of efficacy should therefore include both time-specific ratings of symptom control and ratings of the clinical significance of control at those times, related to the impact of symptoms on functioning.

Turning to the laboratory measures, it appears that the SKAMP and PERMP provide complementary information, with PERMP overlapping with the more cognitively based SKAMP attention scale. The final issue of significance is the different pattern of effects for males and females. Could we say that parent ratings are more valid for girls than for boys in that they reflect objective change in symptoms as measured in the laboratory classroom? Future instrument development should address the sex-specific relevance of scores on these different measures.

The most significant limitation of the study was that a direct like-for-like comparison of parent and trained observer ratings was not possible; parents were not asked to rate their child on the basis of the child's behavior in the laboratory classroom. Had they been, the correlations might have been substantially higher than those found in the current analysis. A direct comparison would allow us to tease apart rater and method effects and provide a more definitive test of the relative value of ratings, observations, and objective tests. Similar issues are raised by the lack of a teacher-based rating. A second limitation was that no measures of parental attitudes to ADHD and its treatment were included in the study. These would have allowed a direct investigation of the processes underpinning the discrepancy between parent ratings and laboratory-based measures.

In summary, the current study provided evidence that while both parent ratings and laboratory classroom measures provide valuable indices of MPH response, the overlap between the two measures was rather limited, suggesting that each taps a different aspect of therapeutic change.

Disclosures

Edmund Sonuga-Barke is currently a consultant for Shire Pharmaceuticals and UCB-Pharma; has in the last 3 years received grant funding from Janssen Cilag, UCB-Pharma, and QB-tech; has recently served on the advisory boards for Shire Pharmaceuticals, UCB-Pharma, and Flynn Pharma; and has spoken at events sponsored by UCB-Pharma, Shire Pharmaceuticals, and Janssen Cilag. David Coghill is currently a consultant for Shire Pharmaceuticals; has in the last 3 years received grant funding from Shire Pharmaceuticals and Eli Lilly; has recently served on the advisory boards for Shire Pharmaceuticals, UCB-Pharma, Eli Lilly, Pfizer, and Flynn Pharma; and has spoken at events sponsored by UCB-Pharma, Eli Lilly, Flynn Pharma, and Janssen Cilag. Marc De Backer is a full-time employee of UCB-Pharma. Jim Swanson is or has been a consultant for Eli Lilly & Co, McNeil, Shire, Cephalon, Celltech, UCB, and Novartis; has received grant funding from McNeil, Shire, Cephalon, Celltech, UCB, and Novartis; and has received speaker's fees from McNeil, Shire, Cephalon, Celltech, UCB, and Novartis.

References

Abikoff, Courtney, Pelham, Koplewicz. Teachers' ratings of disruptive behaviors: The influence of halo effects. J Abn Child Psychol, 21:519–533. 1993.

American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, 4th ed. Washington (DC): American Psychiatric Association, 1994.

Banaschewski, Coghill, Santosh, Zuddas, Asherson, Buitelaar, Danckaerts, Döpfner, Faraone, Rothenberger, Sergeant, Steinhausen, Sonuga-Barke EJS, Taylor. Long-acting medications for the hyperkinetic disorders. A systematic review and European treatment guideline. Eur Child Adolesc Psychiatr, 15:476–495. 2006.

Briggs-Gowan, Carter, Schwab-Stone. Discrepancies among mother, child, and teacher reports: Examining the contributions of maternal depression and anxiety. J Abn Child Psychol, 24:749–765. 1996.

Bussing, Gary, Mills, Garvan. Parental explanatory models of ADHD: Gender and cultural variations. Soc Psychiatr Psychiatr Epidemiol, 38:563–575. 2003.

Coghill, Rhodes, Matthews. The neuropsychological effects of chronic methylphenidate on drug-naive boys with attention-deficit/hyperactivity disorder. Biol Psychiatry, 62:954–962. 2007.

Demaray, Elting, Schaefer. Assessment of attention-deficit/hyperactivity disorder (ADHD): A comparative evaluation of five, commonly used, published rating scales. Psychol Schools, 40:341–361. 2003.

Faraone, Monuteaux, Biederman, Cohan, Mick. Does parental ADHD bias maternal reports of ADHD symptoms in children? J Cons Clin Psychol, 71:168–175. 2003.

Hartung, Van Pelt, Armendariz, Knight. Biases in ratings of disruptive behavior in children: Effects of sex and negative halos. J Atten Dis, 9:620–630. 2006.

Madaan, Daughton, Lubberstedt, Mattai, Vaughan, Kratochvil. Assessing the efficacy of treatments for ADHD: Overview of methodological issues. CNS Drugs, 22:275–290. 2008.

Martins, Tramontina, Polanczyk, Eizirik, Swanson, Rohde. Weekend holidays during methylphenidate use in ADHD children: A randomized clinical trial. J Child Adolesc Psychopharmacol, 14:195–206. 2004.

National Collaborating Centre for Mental Health. Attention deficit hyperactivity disorder: Diagnosis and management of ADHD in children, young people, adults: NICE clinical guideline. London: National Institute for Health and Clinical Excellence. 72. 2008.

Reid, Maag. How many fidgets in a pretty much: A critique of behavior rating scales for identifying students with ADHD. J School Psychol, 32:339–354. 1994.

Rimmer, Campbell, Coghill. ADHD: The impact on parent's, children's quality of life. Am Acad Child Adoles Psychiatr Ann Meeting Boston, 2007.

Simonoff, Pickles, Hervas, Silberg, Rutter, Eaves. Genetic influences on childhood hyperactivity: Contrast effects imply parental rating bias, not sibling interaction. Psychol Med, 28:825–837. 1998.

Snyder, Hall, Cornwell. Review of clinical validation of ADHD behavior rating scales. Psychol Rep, 99:363–378. 2006.

Sonuga-Barke EJS, Minocha, Taylor, Sandberg. Inter ethnic bias in teachers' ratings of childhood hyperactivity. Br J Dev Psychol, 11:187–200. 1993.

Sonuga-Barke EJS, Swanson, Coghill, DeCory, Hatch. Efficacy of two once-daily methylphenidate formulations compared across dose levels at different times of the day: Preliminary indications from a secondary analysis of the COMACS study data. BMC Psychiatr, 4:28. 2004.

Sonuga-Barke, Coghill, Markowitz, Swanson, Vandenberghe, Hatch. Sex differences in the response of children with ADHD to once-daily formulations of methylphenidate. J Am Acad Child Adolesc Psychiatr, 46:701–710. 2007.

Sonuga-Barke EJS, Van Lier, Swanson, Coghill, Wigal, Vandenberghe, Hatch. Heterogeneity in the pharmacodynamics of two long-acting methylphenidate formulations for children with attention deficit/hyperactivity disorder: A growth mixture modelling analysis. Eur Child Adolesc Psychiatr, 17:245–254. 2008.

Swanson, Wigal, Greenhill, Browne, Waslick, Lerner, Williams, Flynn, Agler, Crowley, Fineberg, Regino, Baren, Cantwell. Objective and subjective measures of pharmacodynamic effects of Adderall in the treatment of children with ADHD in a controlled analog classroom setting. Psychopharmacol Bull, 34:55–60. 1998.

Swanson, Volkow. Pharmacokinetic and pharmacodynamic properties of stimulants: Implications for the design of new treatments for ADHD. Behav Brain Res, 130:73–78. 2002.

Swanson, Wigal, Sonuga-Barke EJS, Greenhill, Biederman, Kollins, Nguyen, DeCory, Hirshe Dirksen, Hatch; COMACS Study Group. A comparison of two extended release preparations of methylphenidate: Pharmacodynamic effects across the day. Pediatrics, 113:e205–e215. 2004.

Swanson. School-based assessments, interventions for ADD students. Irvine (California): KC Publishing, 1992.

Taylor, Sonuga-Barke EJS. Disorders of Attention and Activity. In: Rutter's Child & Adolescent Psychiatry, 5th ed. Edited by Rutter, Bishop, Pine, Scott, Stevenson, Taylor, Thapar. UK: Wiley-Blackwell, 2008; 521–542.

Taylor, Doepfner, Sergeant, Asherson, Banaschewski, Buitelaar, Coghill, Danckaerts, Rothenberger, Sonuga-Barke, Steinhausen, Zuddas. European Guidelines for Hyperkinetic Disorder: First upgrade. Eur Child Adolesc Psychiatr, 13:7–30. 2004.

Wigal, Gupta, Guinta, Swanson. Reliability and validity of the SKAMP rating scale in a laboratory school setting. Psychopharmacol Bull, 34:47–53. 1998.

Wigal, Wigal. The laboratory school protocol: Its origin, use, and new applications. J Atten Dis, 10:92–111. 2006.

Wolraich, Doffing. Pharmacokinetic considerations in the treatment of attention-deficit hyperactivity disorder with methylphenidate. CNS Drugs, 18:243–250. 2004.