Abstract
Objective:
We aimed to identify which of the 17 items comprising the Children's Depression Rating Scale-Revised (CDRS-R) can sensitively capture changes in depression severity.
Methods:
We used data from four studies involving two antidepressants. For each of the 17 CDRS-R items, we conducted item response analyses to identify and evaluate those that reflect changes in depression severity. We created plots of the item characteristic curves (ICCs) estimated by the graded response model, and option characteristic curves and ICCs using nonparametric item response theory. We then calculated the change from baseline in a CDRS-R subscale score comprising the reflective items identified by the item response analyses, together with the effect size between the treatment group and placebo group, and compared them with those of the CDRS-R total score.
Results:
CDRS-R items #2 (difficulty having fun), #3 (social withdrawal), #10 (low self-esteem), #11 (depressed feelings), and #15 (depressed facial expression) have favorable profiles that reflect disease severity. Changes from baseline in the CDRS-R total score (least square mean ± standard error) at week 8 were −22.3 ± 0.7 and −23.9 ± 0.7 in the placebo group and treatment group, respectively (difference, −1.5; estimated effect size, −0.113). Changes from baseline in the CDRS-R5 (CDRS-R subscale consisting of the specified reflective items [#2, #3, #10, #11, and #15]) score at week 8 were −8.4 ± 0.3 and −9.6 ± 0.3 in the placebo group and treatment group, respectively (difference, −1.2; effect size, −0.202).
Conclusions:
The item response analyses clarified the properties of 17 items of the CDRS-R for major depressive disorder in children and adolescents. The CDRS-R5 might optimize the assessment of changes in overall depression severity and differentiation of treatment responses.
Introduction
The Children's Depression Rating Scale-Revised (CDRS-R) (Poznanski et al. 1984) is an observer-rated 17-item scale that is completed by clinicians based on interviews (#1–#14) with the patients and parents and observations (#15–#17) during the interviews. This scale was modeled on the 17-item Hamilton Depression Rating Scale (HAM-D17; one of the most widely used outcome measures of major depressive disorder [MDD] for adults), and has been the most commonly used outcome measure in clinical trials for assessing the severity of major depression in children and adolescents. However, the properties of the CDRS-R as a measure of depression severity have not been well characterized. A recent systematic review suggested that it was unclear whether the CDRS-R can accurately or reliably measure depressive symptom severity in adolescents with MDD (Stallwood et al. 2021).
The HAM-D, which was first published in 1960 (Hamilton 1960), has attracted considerable criticism. Among the 17 items that constitute the HAM-D, 6 items representing the “core symptoms” of MDD have been selected to create a shorter version of the HAM-D (HAM-D6) (Bech et al. 1975, 1984; O'Sullivan et al. 1997). Although the validity of the psychometric properties of the HAM-D17 for measuring outcomes of antidepressive effects has been questioned, the HAM-D6 has been found to be psychometrically valid (Bagby et al. 2004; Licht et al. 2005).
In terms of inter-rater reliability, the HAM-D6 is superior to the Clinical Global Impression ratings (Ruhé et al. 2005) and is currently utilized as an outcome measure (Bech et al. 2006). Evans et al. (2004) evaluated the usefulness of each item of the HAM-D17 using item response analyses and showed that several items had a favorable relationship with the severity of depression, whereas many other items appeared not to be useful for discriminating depression severity. Therefore, evaluating the psychometric properties of the individual items of the CDRS-R is important. Such an evaluation is also useful for identifying which symptoms are relevant to depression, and it may help refine the measurement scale to enable a more appropriate assessment of MDD.
The difficulty of demonstrating the efficacy of antidepressants for patients with MDD in clinical trials, especially for children and adolescents, has been recognized and is attributed to high placebo responses and small effect sizes (Cipriani et al. 2016). Although the causes are not fully understood, the performance of the CDRS-R may be one potential factor. It is therefore worth examining the item construction of the outcome measure (CDRS-R) to identify “core depression symptoms” characterized as more sensitive items, which could lead to a shorter version derived from the 17-item CDRS-R.
The item response theory (IRT) method is useful for providing detailed information about item functioning and for investigating the usefulness of each item in health outcome measures (Embretson and Reise 2000; Hays et al. 2000). IRT is often utilized for developing and refining health outcome measures, including psychological measures (Reise and Waller 2009). This study attempted to characterize the performance of individual items of the CDRS-R as a depression measure by IRT analyses and to evaluate the usefulness of a shorter version of the CDRS-R. The data from four clinical studies of venlafaxine and sertraline in children and adolescents with major depression (Wagner et al. 2003; Emslie et al. 2007) were used in this investigation.
Methods
Data from four double-blind placebo-controlled clinical studies of the treatment of major depression in children and adolescents (0600B1-382-US, 0600B1-394-US, A0501001, and A0501017) were used for the analyses. Details of the clinical studies are shown in Table 1. The clinical studies were conducted in compliance with the Declaration of Helsinki and Good Clinical Practice. The institutional review board of each center approved the protocol. All patients provided written informed consent.
Table 1. Characteristics of the Included Clinical Studies
The 17 items comprising the CDRS-R are as follows: impaired schoolwork (#1), difficulty having fun (#2), social withdrawal (#3), sleep disturbance (#4), appetite disturbance (#5), excessive fatigue (#6), physical symptoms (#7), irritability (#8), excessive guilt (#9), low self-esteem (#10), depressed feelings (#11), morbid ideation (#12), suicidal ideation (#13), excessive weeping (#14), depressed facial expression (#15), listless speech (#16), and hypoactivity (#17) (Poznanski and Mokros 1996). The items #4, #5, and #16 are rated from 1 to 5 and all others are rated from 1 to 7 with higher scores indicating increased pathology.
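The rating ranges described above determine the possible range of the total score. The following is a minimal sketch, not part of the original analysis, that checks each rating against its allowed range and sums the 17 items; the names `MAX_RATING` and `cdrs_r_total` are hypothetical and introduced only for illustration.

```python
# Illustrative helper (an assumption, not the authors' code):
# validate CDRS-R item ratings and compute the total score.

# Items #4, #5, and #16 are rated 1-5; all other items are rated 1-7.
MAX_RATING = {item: 7 for item in range(1, 18)}
for item in (4, 5, 16):
    MAX_RATING[item] = 5


def cdrs_r_total(ratings):
    """Sum the 17 item ratings after checking each allowed range.

    `ratings` maps item number (1-17) to an integer rating.
    """
    if set(ratings) != set(MAX_RATING):
        raise ValueError("ratings must cover items 1-17 exactly")
    for item, value in ratings.items():
        if not 1 <= value <= MAX_RATING[item]:
            raise ValueError(
                f"item #{item}: rating {value} outside 1-{MAX_RATING[item]}"
            )
    return sum(ratings.values())


# Lowest and highest possible totals implied by the rating ranges.
lowest = cdrs_r_total({item: 1 for item in MAX_RATING})
highest = cdrs_r_total(dict(MAX_RATING))
print(lowest, highest)  # 17 113
```

Under these ranges, the total score can vary from 17 (all items rated 1) to 113 (14 items rated 7 and 3 items rated 5).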
IRT analysis
We examined the properties of each item of the CDRS-R as a measure of the severity of depression based on the IRT (Embretson and Reise 2000; Reise and Waller 2009).
The item characteristic curves (ICCs) for each CDRS-R item were estimated using the graded response model with the IRT procedure (PROC IRT) in SAS® (An and Yung 2014). We also investigated the usefulness of each item using nonparametric IRT (Testgraf software; Ramsay 1991, 2000). Option characteristic curves (OCCs), which show the relationship between the probability of endorsing a particular option of an item and the overall level of depressive severity, were developed. On these curves, the CDRS-R total score, which was used as an indicator of the severity of depression, was plotted on the x-axis, and the relative frequency of rating each option for each assessment item was plotted on the y-axis. ICCs, which provide graphical illustrations of the expected score with the estimated 95% confidence limits for a particular CDRS-R item as a function of overall depressive severity, were also plotted.
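To make the two kinds of curves concrete, the sketch below shows (1) option probabilities and the expected item score under a textbook parameterization of the graded response model (one discrimination parameter and ordered thresholds) and (2) a simple Gaussian kernel estimate of option characteristic curves against the total score. This is an illustration under stated assumptions: it is not the SAS or Testgraf implementation, the parameter values and toy data are hypothetical, and the kernel estimator is a simplified stand-in for the smoothing used in nonparametric IRT software.

```python
import numpy as np


def grm_category_probs(theta, a, thresholds):
    """Graded response model: P(rating = k | theta) for k = 1..K.

    `a` is the discrimination; `thresholds` are the K-1 increasing
    boundary locations (hypothetical values in the example below).
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(thresholds, dtype=float)
    # Cumulative probabilities P(rating >= k) for k = 2..K.
    cum = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    # Pad with P(rating >= 1) = 1 and P(rating >= K + 1) = 0,
    # then difference adjacent cumulative curves.
    upper = np.hstack([np.ones((theta.size, 1)), cum])
    lower = np.hstack([cum, np.zeros((theta.size, 1))])
    return upper - lower


def expected_item_score(theta, a, thresholds):
    """Expected rating as a function of theta (the parametric ICC)."""
    probs = grm_category_probs(theta, a, thresholds)
    ratings = np.arange(1, probs.shape[1] + 1)
    return probs @ ratings


def kernel_occ(total_scores, item_ratings, option, grid, bandwidth):
    """Kernel-smoothed P(rating == option | total score) on `grid`."""
    total_scores = np.asarray(total_scores, dtype=float)
    indicator = (np.asarray(item_ratings) == option).astype(float)
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - total_scores[None, :]) / bandwidth
    weights = np.exp(-0.5 * z ** 2)  # Gaussian kernel weights
    return (weights @ indicator) / weights.sum(axis=1)


# Hypothetical parameters for one 7-option item.
theta = np.array([-2.0, 0.0, 2.0])
a, b = 1.5, [-1.5, -0.5, 0.3, 1.0, 1.8, 2.5]
probs = grm_category_probs(theta, a, b)
expected = expected_item_score(theta, a, b)

# Toy data: total scores and one item's ratings for seven assessments.
totals = np.array([20, 25, 30, 40, 50, 60, 70])
ratings = np.array([1, 1, 2, 3, 4, 5, 6])
grid = np.array([20.0, 45.0, 70.0])
occ = np.vstack(
    [kernel_occ(totals, ratings, k, grid, bandwidth=5.0) for k in range(1, 8)]
)
print(probs.round(3))
print(expected.round(2))
print(occ.round(3))
```

An item with a favorable profile in this sense has an expected score that rises steadily across the severity range and option curves that peak in order, each option being the most likely rating over its own band of severity.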
Calculation of the effect size using the CDRS-R
Changes in the CDRS-R total score and CDRS-R subscale score from baseline were calculated over time using integrated data from four clinical trials. The least square means of the changes from baseline in the CDRS-R total score and CDRS-R subscale score for each timepoint were calculated in the placebo group and treatment group using an analysis of covariance, with the corresponding baseline value as the covariate. The effect size was calculated from the group differences and the standard deviations estimated from the model.
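The calculation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes a model with only treatment group and baseline score as terms, and it assumes that the "standard deviation estimated from the model" is the residual standard deviation (root mean square error). The function name `ancova_effect_size` and the toy data are hypothetical.

```python
import numpy as np


def ancova_effect_size(change, baseline, treated):
    """Fit change ~ intercept + treated + baseline by least squares.

    Returns (adjusted group difference, residual SD, effect size).
    With no group-by-baseline interaction, the coefficient of `treated`
    equals the difference in least squares means (treatment - placebo).
    """
    change = np.asarray(change, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    treated = np.asarray(treated, dtype=float)
    X = np.column_stack([np.ones_like(change), treated, baseline])
    coef, *_ = np.linalg.lstsq(X, change, rcond=None)
    residuals = change - X @ coef
    dof = len(change) - X.shape[1]
    resid_sd = np.sqrt(residuals @ residuals / dof)
    diff = coef[1]
    return diff, resid_sd, diff / resid_sd


# Toy data (hypothetical): 4 placebo and 4 treated patients.
treated = np.array([0, 0, 0, 0, 1, 1, 1, 1])
baseline = np.array([50, 50, 60, 60, 50, 50, 60, 60])
noise = np.array([1, -1, -1, 1, -1, 1, 1, -1])
change = -5.0 - 2.0 * treated - 0.3 * baseline + noise

diff, resid_sd, effect_size = ancova_effect_size(change, baseline, treated)
print(diff, resid_sd, effect_size)
```

A negative effect size here indicates a larger reduction from baseline in the treatment group than in the placebo group, matching the sign convention of the values reported in this article.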
Results
The clinical studies used for the investigation are shown in Table 1. All of the studies were double-blind placebo-controlled clinical studies of the treatment of major depression in children and adolescents. Of note, none of these studies demonstrated significant efficacy of the antidepressant compared with placebo. The demographic characteristics of the pooled data are shown in Table 2.
Table 2. Demographic and Baseline Characteristics of Patients Included in the Integrated Data from Four Studies
HAM-D and MADRS scores were collected in two studies.
CDRS-R, Children's Depression Rating Scale-Revised; CGI-S, Clinical Global Impressions-severity of illness; HAM-D, Hamilton Depression Rating Scale; MADRS, Montgomery Åsberg Depression Rating Scale; SD, standard deviation.
The profiles of the three graphs (ICC estimated using the graded response model, and OCC and ICC estimated using the nonparametric IRT) were generally consistent among the four studies and timepoints (data not shown); therefore, it was considered appropriate to integrate the four trials and all timepoints.
The ICCs estimated using the graded response model for each item are shown in Fig. 1. The OCCs and ICCs for each item estimated using the nonparametric IRT are shown in Figs. 2 and 3, respectively. A comprehensive evaluation of these three graph types indicated that, of the 17 items of the CDRS-R, items #2, #3, #10, #11, and #15 showed a more reflective profile of depression severity, and that items #4, #5, #7, #9, #12, #13, and #14 showed a less reflective profile of depression severity.

Item characteristic curves estimated using the graded response model for 17 items of the CDRS-R. CDRS-R total score (x-axis: expected score) and the relative frequency of rating each option for each assessment item (y-axis: probability) were plotted. Each line shows relative frequency of rating 1 in blue line, relative frequency of rating 2 in red line, relative frequency of rating 3 in green line, relative frequency of rating 4 in brown line, relative frequency of rating 5 in purple line, relative frequency of rating 6 in yellow green line, and relative frequency of rating 7 in light blue line. CDRS-R, Children's Depression Rating Scale-Revised.

Option characteristic curves estimated using the nonparametric item response theory for 17 items of the CDRS-R. CDRS-R total score (x-axis: expected score) and the relative frequency of rating each option for each assessment item (y-axis: probability) were plotted. Each line shows relative frequency of rating 1 in Line 1, relative frequency of rating 2 in Line 2, relative frequency of rating 3 in Line 3, relative frequency of rating 4 in Line 4, relative frequency of rating 5 in Line 5, relative frequency of rating 6 in Line 6, and relative frequency of rating 7 in Line 7. CDRS-R, Children's Depression Rating Scale-Revised.

Item characteristic curves estimated using the nonparametric item response theory for 17 items of the CDRS-R. CDRS-R total score (x-axis: expected score) and average rating (95% confidence interval) of each assessment item (y-axis: Probability) were plotted. Green vertical line shows 95% confidence interval of average rating. CDRS-R, Children's Depression Rating Scale-Revised.
We evaluated the change from baseline in the score of the CDRS-R5 [CDRS-R subscale consisting of the specified reflective items (#2, #3, #10, #11, and #15)] and the total score of the CDRS-R, using the integrated data from the four clinical trials (Table 3). We also evaluated the change from baseline in the score of the CDRS-R8 (CDRS-R subscale consisting of items #1, #2, #3, #4, #6, #8, #10, and #11), which have been suggested to be useful for detecting significant differences in improvements in depression (Bondar et al. 2020).
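The subscale scores are simple sums over the listed items. A minimal sketch is given below, assuming ratings are stored as a mapping from item number to rating; the helper names and the example ratings are hypothetical and are not patient data.

```python
# Item sets as defined in the text.
CDRS_R5_ITEMS = (2, 3, 10, 11, 15)
CDRS_R8_ITEMS = (1, 2, 3, 4, 6, 8, 10, 11)


def subscale_score(ratings, items):
    """Sum the ratings of the selected items."""
    return sum(ratings[item] for item in items)


def change_from_baseline(baseline, followup, items):
    """Follow-up subscale score minus baseline subscale score."""
    return subscale_score(followup, items) - subscale_score(baseline, items)


# Hypothetical ratings for one patient: every item rated 4 at baseline
# and 2 at week 8 (both values are valid for 1-5 and 1-7 items).
baseline = {item: 4 for item in range(1, 18)}
week8 = {item: 2 for item in range(1, 18)}

r5_change = change_from_baseline(baseline, week8, CDRS_R5_ITEMS)
r8_change = change_from_baseline(baseline, week8, CDRS_R8_ITEMS)
print(r5_change, r8_change)  # -10 -16
```

Because all five CDRS-R5 items are rated from 1 to 7, the CDRS-R5 score ranges from 5 to 35, whereas the CDRS-R8 includes item #4 (rated from 1 to 5) and ranges from 8 to 54.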
Table 3. Change from Baseline in the Children's Depression Rating Scale-Revised (CDRS-R) Total Score, CDRS-R5, and CDRS-R8 for Each Timepoint
The LS means of the changes from baseline in the CDRS-R total score (or CDRS-R subtotal score) were calculated using an analysis of covariance with the corresponding baseline value as the covariate for each timepoint. CDRS-R5: CDRS-R subscale consisting of the specified reflective items (#2, #3, #10, #11, and #15), CDRS-R8: CDRS-R subscale consisting of items #1, #2, #3, #4, #6, #8, #10, and #11.
CDRS-R, Children's Depression Rating Scale-Revised; CI, confidence interval; LOCF, last observation carried forward; LS, least square; SE, standard error.
The difference between the treatment and placebo groups and the effect sizes were calculated for each timepoint. The effect size observed for the CDRS-R5 (items #2, #3, #10, #11, and #15) was −0.202 at week 8 (last observation carried forward). In contrast, the effect sizes observed for the CDRS-R total score and CDRS-R8 (items #1, #2, #3, #4, #6, #8, #10, and #11) were −0.113 and −0.122 at week 8 (last observation carried forward), respectively. The same analyses were conducted in the child (<12 years old) and adolescent (≥12 years old) subgroups of the integrated data (Supplementary Table S1).
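As a rough check on how a smaller absolute difference can yield a larger effect size, the reported week 8 values can be used to back-calculate the standard deviation implied by each measure. The sketch below assumes that the effect size equals the group difference divided by the model-estimated standard deviation, as described in the Methods; because the reported differences are rounded to one decimal place, the implied values are only approximate.

```python
# Reported week 8 (LOCF) values from the text: (difference, effect size).
reported = {
    "CDRS-R total": (-1.5, -0.113),
    "CDRS-R5": (-1.2, -0.202),
}


def implied_sd(difference, effect_size):
    """Standard deviation implied by effect size = difference / SD."""
    return difference / effect_size


implied = {name: implied_sd(d, es) for name, (d, es) in reported.items()}
for name, sd in implied.items():
    print(f"{name}: implied SD is approximately {sd:.1f}")
```

Under this assumption, the implied standard deviation is about 13.3 points for the total score and about 5.9 points for the CDRS-R5, so the variability shrinks proportionally more than the group difference does when the scale is shortened.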
Discussion
Despite the wide use of the CDRS-R in clinical trials as a measure of depression severity, the properties of the CDRS-R have not been well characterized. This is the first study to explore the properties of the 17 items of the CDRS-R for MDD in children and adolescents using item response analyses. Of the 17 items of the CDRS-R, five items (#2, difficulty having fun; #3, social withdrawal; #10, low self-esteem; #11, depressed feelings; and #15, depressed facial expression) showed an especially good relationship between the scoring pattern and the range of the total CDRS-R score. With regard to the ability to discriminate depression severity, the remaining items of the CDRS-R were less useful than those five items, and items #5 (appetite disturbance), #7 (physical symptoms), #9 (excessive guilt), #12 (morbid ideation), #13 (suicidal ideation), and #14 (excessive weeping) were found to be especially insensitive.
The properties of the CDRS-R clarified in our study using item response analyses generally corresponded with a previous report of symptom clusters that responded to treatment (Bondar et al. 2020). Bondar et al. (2020) identified two symptom clusters for the CDRS-R and found that one of them, comprising eight items, could possibly detect significant differences in improvements in CDRS-R scores between the treatment group and placebo group. Among those eight items in the identified cluster, our investigation involving item response analyses revealed that four items (#2, difficulty having fun; #3, social withdrawal; #10, low self-esteem; and #11, depressed feelings) matched well with the severity of depression.
Considering these findings, these four items are useful for evaluating the severity of depression. According to our evaluation, the remaining four items (#1, impaired schoolwork; #4, sleep disturbance; #6, excessive fatigue; and #8, irritability) in the cluster were less useful than the aforementioned four items. Furthermore, the other symptom cluster, which consists of six items (#5, appetite disturbance; #7, physical symptoms; #9, excessive guilt; #12, morbid ideation; #13, suicidal ideation; and #14, excessive weeping), did not improve significantly with any active treatment compared with placebo (Bondar et al. 2020), and all of these items failed to show good profiles in our item response analyses. Considering these findings, those six items are less useful for evaluating the severity of depression. As for the remaining three items (#15, #16, and #17), which were not examined in the previous study (Bondar et al. 2020), this study suggested that item #15 (depressed facial expression) was useful.
In addition, Mayes et al. (2010) examined the correlation between each item of the CDRS-R and the total score when evaluating fluoxetine treatment for depression in adolescents to determine the reliability and validity of the CDRS-R. Five items had high item-total correlations (Mayes et al. 2010) and reflected the severity of depression well. These five items are consistent with the five items selected in this study.
The symptoms that were suggested to be useful for evaluating depression by our investigation of CDRS-R were consistent with the accumulated knowledge of the HAM-D17. The relationship between each score of the HAM-D17 and the severity of depression has been examined using item response analyses, and items #1 (depressed mood), #7 (work and activity), #2 (feeling of guilt), #10 (anxiety/psychic), #11 (anxiety/somatic), and #13 (somatic/general) were found to have better relationships with depression severity than other items (Evans et al. 2004).
The following items of the CDRS-R identified in this study and items of the HAM-D identified in another study (Evans et al. 2004) were considered to have similarities (item of the CDRS-R vs. item of the HAM-D): #2 (difficulty having fun) versus #7 (work and activity); #3 (social withdrawal) versus #7 (work and activity); #10 (low self-esteem) versus #2 (feeling of guilt); #11 (depressed feelings) versus #1 (depressed mood); and #15 (depressed facial expression) versus #1 (depressed mood). The CDRS-R and HAM-D items selected by item response analyses did not include any sleep-related items, suggesting that sleep disturbance may not be a sensitive indicator of the acute phase of MDD (Shelton et al. 2007).
To assess the clinical significance of the results of the item response analyses of the CDRS-R, we calculated the effect size in terms of the differences in score reductions from baseline between the placebo and treatment groups using data from clinical trials of children and adolescents with MDD. The effect size for the CDRS-R5 was larger than those for the 17-item CDRS-R total score and the CDRS-R8. Although the CDRS-R5 includes only 5 of the 17 items of the CDRS-R total score, the treatment differences observed with the CDRS-R5 were not obviously different from those observed with the CDRS-R total score.
As a result of shortening the scale, the variability was markedly smaller in the evaluation using the CDRS-R5 than in that using the CDRS-R total score. Greater improvement in the CDRS-R with antidepressant relative to placebo has been reported in adolescents than in children (Wagner et al. 2003; Emslie et al. 2007). This trend was also observed in both the CDRS-R5 and CDRS-R8. Furthermore, the greatest effect size was also observed with the CDRS-R5 in both the child and adolescent subgroups. Because all four studies used in this investigation failed to demonstrate the efficacy of antidepressants, the effect sizes were small, and no obvious clinically meaningful improvement was observed in the change from baseline for the CDRS-R5 or for the CDRS-R total score.
Large improvements were observed in the placebo group in these integrated data, and this high placebo response may contribute to the small effect size. Our findings suggest that the item construction of the CDRS-R may have contributed to the failure to detect the treatment difference between the antidepressant and placebo groups in the four studies. Considering that the effect size increased with the subscale, the CDRS-R5 may be an alternative measure of depression severity that can sensitively detect treatment differences in clinical trials.
It should be noted that there are some limitations to the conclusions that can be drawn from the current item response analyses. Although data from a relatively large number of patients (∼700 patients) were used in our investigation, they came from four clinical trials with two antidepressants as treatment groups (venlafaxine and sertraline), and all studies failed to demonstrate the efficacy of the antidepressant compared with placebo. In addition, the patients included in these trials were restricted by eligibility criteria. It is unclear whether our findings can be replicated in different populations, but we confirmed that the results of the item response analyses were generally consistent among the four studies, supporting the robustness of our findings. Whether the results of the present investigation can be generalized to the broader population with major depression remains to be ascertained.
Although our results were obtained from patients with limited characteristics rather than a wide range of patients with depression, this study used data from four antidepressant clinical trials involving children and adolescents to characterize each of the 17 items of the CDRS-R as a measure of depression.
Conclusion
This study revealed the properties and usefulness of the 17 items of the CDRS-R. The CDRS-R5 (#2, difficulty having fun; #3, social withdrawal; #10, low self-esteem; #11, depressed feelings; and #15, depressed facial expression) shows a good relationship with the severity of depression. The CDRS-R5 might have potential as a new useful measure not only for clinical trials but also for efficacy evaluations of antidepressants for major depression in children and adolescents in clinical practice.
Clinical Significance
Despite the wide use of the CDRS-R, the properties of individual items as a measure of depression severity have not been well characterized. This study revealed the usefulness of each item using item response analyses. Five items showed a good relationship with the severity of depression, whereas some items of the 17-item CDRS-R appeared to have less ability to discriminate depression severity. The results also suggested that a shorter version of the CDRS-R enhanced the detection of treatment differences. These findings contribute to developing a better outcome measure for efficacy evaluations of antidepressants, and the CDRS-R5 is suggested as an alternative concise and sensitive measure of depression severity.
Footnotes
Acknowledgments
Statistical analysis and medical writing support were provided by EPS Corporation and were funded by Viatris. The authors would like to thank them for their support.
Authors' Contributions
H.Y., T.I., and T.Y. contributed to conceptualizing and designing the study. All authors, including S.H. and K.N., contributed to interpreting the results, and drafting and revising the article. All authors approved the final article and agreed to be accountable for all aspects of the study.
Disclosures
H.Y., T.I., and T.Y. are full-time employees of Pfizer R&D Japan G.K. S.H. and K.N. are full-time employees of Viatris Pharmaceuticals Japan Inc.
Supplementary Material
Supplementary Table S1
References
