Abstract
Given the increasing interest in demonstrating the effectiveness of psychiatric treatment, the current paper seeks to advance outcome measurement in child psychiatry by showing how more informative analytic strategies can be used to evaluate treatment in a real world setting with a brief, standardized parent-report measure. Questionnaires were obtained at intake for 1294 patients. Of these, 695 patients entered treatment, and 531 (76%) of them had complete forms at intake and follow-up. Using this sample, we calculated effect sizes, rates of reliable improvement and deterioration, and rates of clinically significant improvement. Findings highlighted the utility of these approaches for evaluating treatment outcomes. Further suggestions for improving outcome measurement and evaluation are provided.
Introduction
The problems inherent in identifying and quantifying psychiatric symptoms have hampered outcomes research in the mental health field (Lundh, Kowalski, Sundberg, Gumpert & Landén, 2010; Shrout, 1998). For most disorders, psychiatry cannot use biological markers or physical indicators (like blood pressure or BMI) to assess treatment outcomes (Blais et al., 2011; Iyer, Rothmann, Vogler & Spaulding, 2005). Instead, mental health professionals usually rely on subjective judgments for monitoring patient progress. Facing increasing pressure from insurers and governmental agencies to evaluate treatment more rigorously, however, the mental health field has begun to develop methodologies for systematically assessing outcomes. The most common approach has been to use standardized questionnaires or rating scales (Gold et al., 2009; Murphy et al., 2011); this methodology is now sufficiently developed to permit tracking of outcomes over time and among various patient populations (Weisz et al., 2011; Wu, Snyder, Clancy & Steinwachs, 2010).
In child psychiatry, patient-report and parent-report measures have long been used as valid, reliable indicators of behavior problems, personality, and psychopathology (Smith, 2007). While some large-scale child psychiatry outcome measurement projects, like those now in place in Sweden and the United Kingdom (Lundh et al., 2010), have relied solely on clinician report (e.g., the Children’s Global Assessment Scale) (Shaffer et al., 1983), obtaining a parent’s assessment adds important information because it lessens the risk of clinician bias (Eisen, Wilcox, Leff, Schaefer & Culhane, 1999). In addition, using a parent-rated measure in clinical outcomes assessment invites parents to become partners in decision making and involves them more actively in the care of their children. Parent-rating scales can also provide clinicians with quantified information about the child’s functioning across a wide range of environments. For older children and teens, youth self-report data may serve similar functions. For many of the most widely used parent-report measures, like the Child Behavior Checklist (Rescorla et al., 2012) or Pediatric Symptom Checklist (Gall, Pagano, Desmond, Perrin & Murphy, 2000; Pagano, Cassidy, Little, Murphy & Jellinek, 2000), youth self-report versions of the form can be used as an adjunct to or instead of a parent-report form. In either case, the aim is to take into account the perspective of the parent, child, or another person who knows the child well in order to obtain a more comprehensive understanding of the child’s experiences and functioning.
To date, most activity in the area of outcome measurement in psychiatry has focused on adult patients. The few pediatric outcome research projects that have been conducted have focused mostly on specific diseases or chronic conditions (Forrest, Shipman, Dougherty & Miller, 2003), a focus which has been called ‘narrow-band’ (as opposed to ‘broad-band’ or global with regard to the types of problems assessed) (O’Connell, Boat & Warner, 2009). Relatively little research has been conducted on child and adolescent mental health outcomes or the change processes seen in routine care (Hamilton & Bickman, 2008; Warren, Nelson, Mondragon, Baldwin & Burlingame, 2010; Weisz et al., 2011) and even less on broad-band measures of functioning (Murphy et al., 2012). As health reform increasingly focuses on outcomes, there is a need for studies looking into the effectiveness of child mental health treatment and for data analytic strategies that allow for rigorous and relevant evaluations of pediatric mental health outcome measures, as well as of outcomes themselves (World Health Organization, 1992).
Weisz and colleagues have suggested that the most valid answers to questions about youth treatment outcomes and change processes are likely to come not from controlled laboratory studies but from examining these questions in real world service settings (Weisz, Doss & Hawley, 2005). Therefore, much outcome research is best performed in actual clinical practice, where researchers can observe diverse, heterogeneous patients being treated under usual conditions (Krumholz, 2008). Such research requires responsive, real-time data collection systems as well as valid and reliable outcome measures, and it is made easier by the availability of electronic data collection and storage technologies (Murphy et al., 2012).
An important consideration in evaluating outcome data is that change in group-level data may not translate into meaningful results at an individual patient level (Jones, 2002; World Health Organization, 1992). As such, proponents of evidence-based practice encourage defining change on the basis of individual criteria and classifying each patient in terms of success and failure (Dalton & Keating, 2000; Furukawa, Guyatt & Griffith, 2002; Walter, 2001). The use of such clinical thresholds has the benefit of evaluating whether a treatment has had a clinically worthwhile effect for a particular patient and whether it resulted in a greater improvement than no care at all (Jones, 2002).
The ‘reliable change index’ and ‘clinically significant change’ thresholds have been proposed as methods that are both relevant and rigorous for evaluating clinical effectiveness at the level of the individual case. According to Warren et al. (2010), evaluating youth outcomes with a focus on reliable change (RC) and clinically significant improvement (CSI) provides a useful supplement to traditional methods of examining average group response to treatment. Jacobson and colleagues defined the RC index as the minimal amount of change needed to exceed measurement error, so that pre–post differences of at least that size are unlikely to be due to chance. CSI then combines two criteria: the patient shows reliable improvement according to the RC index, and his or her end score falls outside the range of clinical severity (that is, it is more likely to belong to the non-disorder distribution than to the clinical one) (Jacobson, Follette & Revenstorf, 1984; Jacobson, Roberts, Berns & McGlinchey, 1999; Jacobson & Truax, 1991).
The present study applies RC and CSI, as well as the already widely used effect size metric, to evaluate outcomes among pediatric patients receiving treatment in a large outpatient child psychiatry clinic. In part, this study replicates with child patients the analytic methodology employed in a similar recent study of adult outpatients (Blais et al., 2011). To conduct these analyses, we calculated the RC index and the CSI threshold for a widely used psychosocial questionnaire, the Pediatric Symptom Checklist (PSC), which the clinic administers at three-month intervals as a routine part of care. We characterized changes in these outcome scores from intake to first follow-up by determining the rates of reliable improvement and reliable deterioration and the percentage of patients who achieved CSI. A detailed description of this hospital-wide outcome measurement program is reported in two previous papers (Murphy et al., 2011, 2012).
Methods
Data collection
The collection of outcome forms at intake and every three months thereafter has been a requirement for all outpatients receiving psychiatric services at Massachusetts General Hospital (MGH) since July 2005 (Gold et al., 2009). For children, the form consists of two clinician-report measures, the Children’s Global Assessment Scale (CGAS) (Shaffer et al., 1983) and the Brief Psychiatric Rating Scale for Children (BPRS-C) (Overall & Pfefferbaum, 1982), and one parent-report measure, the PSC. All three forms are in the public domain and can be used without charge. In August 2007, the administration of the forms became electronic when digital pens were first used to collect questionnaire data and, in 2009, form data began to flow automatically into the electronic medical record (Murphy et al., 2011). Institutional Review Board approval was obtained to create a de-identified data repository for the electronic outcomes data used in this study.
The PSC was the main outcome measure used for the current study. It is a 35-item parent-report questionnaire assessing child psychiatric symptoms, with each item rated on a three-point scale (0 = never, 1 = sometimes, 2 = often). Because the PSC is a broad-band measure of overall functioning, its total score (range 0–70) can be used to assess changes in functioning over time and, when recoded using established cutoffs, as a categorical measure of impairment versus non-impairment.
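As a minimal sketch of the scoring rule just described, the Python function below sums 35 item ratings coded 0–2 into a 0–70 total. The function name psc_total and the choice to reject incomplete forms (rather than prorate missing items) are illustrative assumptions, not taken from the PSC scoring instructions.

```python
# Hypothetical helper (not from the PSC manual): totals one parent-report form.
N_ITEMS = 35                 # number of PSC items
VALID_RATINGS = {0, 1, 2}    # 0 = never, 1 = sometimes, 2 = often


def psc_total(items: list[int]) -> int:
    """Return the 0-70 PSC total; raise ValueError for incomplete or invalid forms."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(items)}")
    invalid = [r for r in items if r not in VALID_RATINGS]
    if invalid:
        raise ValueError(f"ratings must be 0, 1, or 2; got {invalid}")
    return sum(items)


if __name__ == "__main__":
    print(psc_total([1] * 35))  # every item rated "sometimes"
```

Treating a short or out-of-range form as an error mirrors the study's use of complete forms only; a clinic that allows a few missing items would need its own documented rule.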
The PSC has been used extensively in surveys of child psychosocial functioning in settings ranging from general pediatrics to child psychiatry clinics to schools. Its sensitivity to change has been established in randomized controlled trials (RCTs) of psychiatric interventions (Stein et al., 2003), naturalistic follow-up studies of psychiatric treatment over three and six months (Murphy et al., 2011, 2012), and educational and public health programs (Hacker, Williams, Myagmarjav, Cabral & Murphy, 2009; Kleinman et al., 2002; Murphy et al., 1998). In addition to its global or broad-band uses, the PSC contains three validated subscales for attention, externalizing, and internalizing problems, which allow it to be used as a narrow-band measure for these areas as well (Gardner, Lucas, Kolko & Campo, 2007; Gardner et al., 1999).
Participants
For the current study, from the start of the program on 1 August 2007 until the end of data collection on 30 June 2012, 1910 patients younger than 18 years arrived for intake appointments in the MGH Child and Adolescent Psychiatry Department. Although the PSC can be used with children as young as three, the cutoff score is lower for preschoolers (Jellinek et al., 1999; Murphy et al., 2012), so we decided to focus exclusively on patients of primary and secondary school age (6–17 years at intake); 218 patients (11%) younger than six were therefore excluded, leaving 1692 patients. Of these, 1602 (95%) had a complete clinician form and 1294 (76% of the 1692 eligible patients and 81% of those with a complete clinician form) also had a complete parent form. This sample of 1294 child outpatients aged 6–17 years presenting for psychiatric intake constituted the full sample for this paper. Because fewer than half of all patients seen for intake entered treatment and had complete forms at three-month follow-up, Table 1 contrasts the 763 patients who had data at intake only with the 531 patients with intake and follow-up data in our primary analytic sample.
Table 1. Demographics and baseline data for patients with and without three-month follow-up data.
Groups are mutually exclusive to allow statistical comparisons between samples. Independent t-tests were used to evaluate continuous variables and chi-square tests were used to evaluate dichotomous variables.
Significance levels marked in the table: p < 0.05 and p < 0.01.
N: number of participants; PSC: Pediatric Symptom Checklist; BPRS-C: Brief Psychiatric Rating Scale for Children; CGAS: Children’s Global Assessment Scale.
The diagnoses used in the study were clinically generated: treating clinicians were required to enter at least one and no more than two DSM-IV diagnostic codes on each outcome form. We recoded the diagnoses from patients’ intake forms into five general categories: depressive disorders, anxiety disorders, attention disorders, bipolar disorders, and other disorders (all remaining conditions). Because having multiple diagnoses has been used as an indicator of greater psychiatric severity, we treated the presence of two diagnostic codes on the intake outcome form as indicating comorbid disorders, which allowed us to control for case complexity. The clinician also indicated on the intake form the type of treatment (psychotherapy, pharmacotherapy, or both) that the patient was receiving at the beginning of care.
Data reduction
As noted above, of the 1294 child outpatients with initial ratings, 763 (59%) had data at intake only and 531 (41%) had at least one follow-up rating. Many children seen for intake in this system do not actually enter treatment, since many are seen for consultation only or drop out after one or two sessions. By policy, follow-up forms are not required for patients who come for fewer than four visits. Defining entry into treatment as attending four or more visits, only 695 (54%) of the 1294 patients entered treatment, and the 531 follow-up forms obtained represent 76% of all expected forms.
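The participant-flow percentages reported here and in the Participants subsection follow directly from the counts given in the text. The short Python check below recomputes them; the step labels are descriptive names chosen for this sketch, not variables from the study database.

```python
# Counts as reported in the text; labels are descriptive only.
def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


FLOW = {
    "excluded as younger than 6 (of 1910 intakes)": (218, 1910),
    "clinician form complete (of 1692 eligible)": (1602, 1692),
    "parent form complete (of 1692 eligible)": (1294, 1692),
    "parent form complete (of 1602 with clinician form)": (1294, 1602),
    "intake data only (of 1294)": (763, 1294),
    "at least one follow-up (of 1294)": (531, 1294),
    "entered treatment (of 1294)": (695, 1294),
    "follow-up obtained (of 695 expected)": (531, 695),
}

if __name__ == "__main__":
    for label, (part, whole) in FLOW.items():
        print(f"{label}: {pct(part, whole)}%")
```

The last line is the figure that matters for judging attrition: the follow-up rate among patients expected to have a follow-up form, rather than among all intakes.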
To identify patients eligible for the CSI analyses, we used the established cutoff score of 28 or higher to indicate categorical impairment on the PSC broad-band (Global) scale (Jellinek, Murphy & Burns, 1986; Kelleher et al., 1997). On the PSC narrow-band scales, we used the cutoff scores established in a large, nationally representative sample of pediatric outpatients: scores of 7 or higher indicated impairment on the attention and externalizing subscales, and scores of 5 or higher indicated impairment on the internalizing subscale (Gardner et al., 1999). Restricting analyses in this way is important because excluding patients below a predetermined minimum level of symptom severity is common practice in RCTs, so applying the same restriction to the current data allows us to compare our results with those of RCTs. In addition, initial symptom severity has been shown to be highly correlated with the magnitude of treatment benefit achieved in RCTs, with more severely ill patients showing greater benefit (Fournier et al., 2010). Therefore, only patients with three-month follow-up data and PSC Global scores above the cutoff for positive case status (>27) at their intake appointment were included in the CSI analyses.
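The case definitions above reduce to simple threshold checks. This sketch encodes the cutoffs stated in the text (Global ≥ 28; attention and externalizing ≥ 7; internalizing ≥ 5). The dictionary keys and function names are hypothetical, and computing the subscale scores themselves (which items belong to each subscale) is outside its scope.

```python
# Cutoffs as stated in the text; the scale names used as keys are illustrative.
CUTOFFS = {"global": 28, "attention": 7, "externalizing": 7, "internalizing": 5}
SUBSCALES = ("attention", "externalizing", "internalizing")


def case_status(scores: dict[str, int]) -> dict[str, bool]:
    """Flag each scale whose score meets or exceeds its impairment cutoff."""
    return {scale: scores[scale] >= cutoff for scale, cutoff in CUTOFFS.items()}


def eligible_for_csi(global_score: int) -> bool:
    """CSI analyses were limited to PSC Global cases at intake (score > 27)."""
    return global_score >= CUTOFFS["global"]


def subscale_positive(scores: dict[str, int]) -> bool:
    """True if the child is a case on at least one PSC subscale."""
    status = case_status(scores)
    return any(status[s] for s in SUBSCALES)
```

Note that a child can be subscale positive without being a Global case (and vice versa), which is why the Results report the ‘PSC Global positive’ and ‘PSC subscale positive’ groups separately.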
As noted earlier, in this system follow-up ratings are collected at three-month intervals after the initial rating for as long as the patient remains in outpatient treatment. The number of follow-up ratings per patient ranged from 1 to 15, with a mean of 3.21 (SD = 2.59; mode = 1; median = 2) in our primary analytic sample. Based on the mean number of follow-up ratings and the 13-week interval between ratings, we estimated that patients in this ‘entered treatment sample’ received approximately 41 weeks of active treatment on average, with a range of three months to almost four years. The average number of follow-up appointments did not differ significantly among diagnostic categories, between patients with and without comorbid conditions, or among patients receiving psychotherapy, pharmacotherapy, or combined treatment.
For pre-treatment assessment, only patients whose first form was listed as an initial rating were included in the outcomes analysis. For the post-treatment assessment, we used patients’ first follow-up rating from approximately three months into treatment. Descriptive statistics were used to summarize demographic information as well as baseline and follow-up scores for each measure.
Outcomes effect size calculation
Treatment effect size (ES) estimates for PSC scores were calculated in units of standardized pre- and post-mean change following the basic meta-analytic procedures (Becker, 1988; Minami, Serlin, Wampold, Kircher & Brown, 2008; Morris, 2000). As suggested by Minami et al. (2008), the initial SD was used for standardization rather than a pooled SD. ESs are presented as Cohen’s d (Cohen, 1988) computed for repeated measures and adjusted for the pre–post measure correlation.
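A minimal sketch of a standardized mean change statistic of the kind cited above (Becker, 1988; Morris, 2000): the intake-to-follow-up mean difference divided by the intake SD, multiplied by the usual small-sample bias correction. In this sketch, the pre–post correlation r enters through Becker's large-sample approximation to the sampling variance. The exact correlation adjustment behind the values reported in Table 2 is not specified in this section, so the function is not meant to reproduce them; its name and the example value of r are assumptions.

```python
import math


def standardized_mean_change(pre_mean: float, post_mean: float, pre_sd: float,
                             n: int, r: float) -> tuple[float, float]:
    """Return (g, var_g) for a single-group pre-post design.

    g is positive when scores decrease (improve) from intake to follow-up.
    Standardizing by the intake SD follows Becker (1988) and Minami et al. (2008).
    """
    d = (pre_mean - post_mean) / pre_sd
    c = 1 - 3 / (4 * (n - 1) - 1)  # small-sample bias correction, df = n - 1
    g = c * d
    # Large-sample variance approximation (Becker, 1988): a higher pre-post
    # correlation r yields a more precise estimate.
    var_g = 2 * (1 - r) / n + g ** 2 / (2 * n)
    return g, var_g


if __name__ == "__main__":
    # r = 0.5 is an arbitrary illustrative value, not an estimate from the study.
    g, var_g = standardized_mean_change(25.70, 21.92, 11.37, n=531, r=0.5)
    print(f"g = {g:.3f}, 95% CI half-width = {1.96 * math.sqrt(var_g):.3f}")
```

Dividing by the intake SD rather than a pooled SD keeps the scale anchored to untreated (baseline) variability, which is the rationale Minami et al. (2008) give for this choice.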
RC index and CSI
In addition to ES, we calculated rates of reliable change using the RC index, along with rates of CSI, for the PSC data. The RC index represents the minimal amount of change unlikely to be accounted for by statistical fluctuation or chance (Jacobson & Truax, 1991; Jacobson et al., 1984, 1999). We used the intake total scores of patients in our clinical sample who were PSC cases at intake to obtain the standard deviation of the initial PSC score (6.59); applying the Jacobson and Truax (1991) formula, the RC index for the PSC Global total score is six points. Therefore, a pre–post change in PSC scores of six points or more (in either direction) reflects real (psychometrically reliable) change (p < 0.05, two-tailed).
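The Jacobson and Truax (1991) computation can be sketched as follows: the standard error of measurement is SD × sqrt(1 − r_xx), the standard error of a difference score is sqrt(2) times that value, and the reliable change threshold is 1.96 times the latter. The reliability coefficient r_xx used by the authors is not reported in this section, so rather than assume one, the sketch also inverts the formula: a threshold of exactly six points with SD = 6.59 implies a reliability of roughly 0.89. Function names are hypothetical.

```python
import math

Z_95 = 1.96  # two-tailed p < 0.05


def s_diff(sd_pre: float, reliability: float) -> float:
    """Standard error of a difference score between two administrations."""
    se_measurement = sd_pre * math.sqrt(1 - reliability)
    return math.sqrt(2 * se_measurement ** 2)


def rc_threshold(sd_pre: float, reliability: float) -> float:
    """Smallest raw-score change that counts as reliable (Jacobson & Truax, 1991)."""
    return Z_95 * s_diff(sd_pre, reliability)


def rc_index(pre: float, post: float, sd_pre: float, reliability: float) -> float:
    """Per-patient RC index; |RC| > 1.96 indicates reliable change."""
    return (post - pre) / s_diff(sd_pre, reliability)


def implied_reliability(sd_pre: float, threshold: float) -> float:
    """Invert rc_threshold: the reliability that yields a given threshold."""
    return 1 - (threshold / (Z_95 * math.sqrt(2) * sd_pre)) ** 2


if __name__ == "__main__":
    r_xx = implied_reliability(6.59, 6.0)
    print(f"implied reliability: {r_xx:.3f}")
```

Because the threshold scales with the SD of the reference sample, computing it from PSC cases only (as described above) gives a smaller, more sensitive threshold than an SD from the full clinic population would.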
CSI is the most rigorous measure of treatment improvement used in this study. To achieve CSI, a patient must have: 1) an initial PSC Global score in the clinical range (> 27); 2) reliable improvement (a decrease of six or more points); and 3) a follow-up PSC Global score below the clinical cutoff (< 28). Patients who achieve CSI have responded both positively and reliably to treatment and have also reached a level of psychological health similar to that of non-patients.
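Using the six-point RC threshold and the PSC Global cutoff given above, each patient's intake and three-month scores can be sorted into the categories reported in Table 3. This is a hypothetical sketch: function names are illustrative, CSI is treated as a subset of reliable improvement (consistent with the three criteria above), and percentages are computed over whatever list of score pairs is supplied.

```python
from collections import Counter

RC_POINTS = 6         # reliable change threshold for the PSC Global score
CLINICAL_CUTOFF = 28  # Global scores of 28 or higher (> 27) are in the clinical range


def reliable_change_category(pre: int, post: int) -> str:
    change = post - pre  # negative change = fewer reported symptoms
    if change <= -RC_POINTS:
        return "reliable improvement"
    if change >= RC_POINTS:
        return "reliable deterioration"
    return "no reliable change"


def achieved_csi(pre: int, post: int) -> bool:
    """Clinical at intake, reliably improved, and below the cutoff at follow-up."""
    return (pre >= CLINICAL_CUTOFF
            and reliable_change_category(pre, post) == "reliable improvement"
            and post < CLINICAL_CUTOFF)


def summarize(pairs: list[tuple[int, int]]) -> dict[str, float]:
    """Percentage of (intake, follow-up) pairs in each category, plus CSI."""
    n = len(pairs)
    counts = Counter(reliable_change_category(pre, post) for pre, post in pairs)
    rates = {category: 100 * k / n for category, k in counts.items()}
    rates["CSI"] = 100 * sum(achieved_csi(pre, post) for pre, post in pairs) / n
    return rates
```

For example, a patient who moves from 40 to 33 counts as reliably improved but not as CSI, because the follow-up score is still in the clinical range.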
Results
Table 1 provides background information for the 1294 patients in the study with valid data: the 763 patients who had only an initial rating form and the 531 patients who had an initial form plus at least one valid longitudinal follow-up form. As Table 1 shows, the only background characteristic on which the two subsamples differed significantly was age: patients with only intake data were somewhat older (12.11 vs. 11.44 years) than those with longitudinal data; F(1, 1293) = 11.38, p < 0.05. The longitudinal sample was 55% male and 45% female. A large majority of patients had commercial insurance, while 15% had Medicaid. Pharmacotherapy alone was the most common treatment type (48%), followed by psychotherapy alone (37%) and then combined pharmacotherapy and psychotherapy (14%). About one-fourth of patients had a significant medical condition, and 30% had two reported psychiatric diagnoses while 70% had a single diagnosis.
As shown in the final three rows of Table 1, on two of the three measures of overall functioning, patients with follow-up data were functioning more poorly at intake than patients with intake data only. At intake, patients with follow-up data had significantly lower scores on the clinician-completed CGAS (57.02 vs. 58.38; F(1, 1236) = 8.87, p < 0.01) and significantly higher scores on the clinician-completed BPRS-C (19.75 vs. 18.19; F(1, 1293) = 5.25, p < 0.05). These differences were, however, small in magnitude, with effect sizes of d = 0.17 and d = 0.13, respectively. Parent-rated PSC scores did not differ significantly between the two groups. Therefore, despite some statistically significant differences, the subsamples with and without follow-up data appeared relatively similar across demographic and functioning variables.
Table 2 presents the intake and follow-up PSC scores and effect sizes for subsamples of cases. In the full longitudinal sample, PSC Global total scores improved significantly from intake (M = 25.70; SD = 11.37) to the three-month follow-up (M = 21.92; SD = 11.90); t(530) = 5.98, p < 0.001. The magnitude of the improvement corresponded to a small effect size (Cohen’s d = 0.26).
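The intake versus follow-up comparisons in Table 2 are described as paired t-tests. A minimal pure-Python sketch of that statistic follows; pre and post are hypothetical lists holding the same patients' intake and follow-up scores in matching order, and the function name is illustrative.

```python
import math
from statistics import mean, stdev


def paired_t(pre: list[float], post: list[float]) -> tuple[float, int]:
    """Return (t, df) for paired scores; t > 0 means scores fell from pre to post."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [a - b for a, b in zip(pre, post)]  # per-patient change scores
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # mean change / its standard error
    return t, n - 1
```

With 531 patients, df = 530, matching the t(530) reported above.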
Table 2. Initial and three-month follow-up PSC scores by diagnosis for patients with pre/post data.
All analyses were paired t-tests; all t values are absolute values.
Significance level marked in the table: p < 0.001.
N: number of participants; PSC: Pediatric Symptom Checklist.
We then looked at the data for two higher-risk subsamples. As the second row in Table 2 shows, 231 patients (44% of the longitudinal treatment sample) had PSC scores above the cutoff score of 27 at intake and for these patients the improvement was greater than for those in the full sample, decreasing from a mean score of 36.28 (SD = 6.35) at intake to 29.48 (SD = 10.22) at follow-up; t(230) = 11.26, p < 0.001 with a large effect size indicated by Cohen’s d = 0.80. The third row in Table 2 shows that 356 patients (67% of the longitudinal treatment sample) had been categorized as at high risk because they had scores above the cutoff on at least one of the PSC subscales. The improvement for these patients was also greater than for those in the full sample but less than for patients in the PSC Global risk group, decreasing from a mean score of 31.01 (SD = 9.08) at intake to 26.12 (SD = 10.83) at follow-up; t(355) = 9.82, p < 0.001, with a moderate effect size indicated by Cohen’s d = 0.53. Table 2 also shows that improvements on individual subscale scores were large for the externalizing (d = 0.85) and attention (d = 0.82) subscales, moderate for the internalizing subscale (d = 0.77), and statistically significant (p < 0.001) for all three.
To ascertain whether there were differential rates of improvement based on diagnosis, disorder comorbidity, or type of care received, we compared improvement from intake to three-month follow-up between patients with comorbid versus single disorders, among diagnostic categories, and among treatment types for PSC Global cases at intake. Results (not shown in table) indicated that patients in all of these subgroups improved significantly (p < 0.001) and with large effect sizes, but that change scores did not differ significantly among the groups for each variable.
Table 3 shows rates of reliable deterioration, reliable improvement, and CSI on the PSC Global scale at the three-month follow-up. Once again, the first row of Table 3 presents data on the full longitudinal sample of patients treated in child psychiatry and shows a reliable deterioration rate of 15.6% compared with a reliable improvement rate of 32.8%. Row two shows that among patients who scored positive on the PSC Global scale at intake, the reliable deterioration rate was 6.1% and the reliable improvement rate was 51.1% (118 patients). Because CSI additionally requires a follow-up score below the clinical cutoff, it was achieved by a subset of these 118 patients, amounting to 35.9% of all PSC Global cases after three months of treatment. The last row of Table 3 presents data for the subsample of patients who scored positive on at least one PSC subscale at intake. Of these patients, 50.3% reliably improved while 11.8% reliably deteriorated on the PSC Global after three months. Rates of reliable improvement, reliable deterioration, and CSI did not differ significantly within the longitudinal treatment sample based on disorder comorbidity, diagnostic category, or treatment type (not shown in table).
Table 3. Percentage of patients with each level of reliable change on the PSC at three-month follow-up, by disorder.
CSI: patients with an initial PSC score in the clinical range (> 27), reliable improvement (a decrease of six or more points on the PSC), and a follow-up PSC score below 28.
N: number of participants; CSI: clinically significant improvement; PSC: Pediatric Symptom Checklist.
Lastly, we assessed criterion validity by comparing mean change scores on the BPRS-C and CGAS across the PSC reliable deterioration, ‘no reliable change,’ reliable improvement, and CSI subgroups. The top half of Table 4 shows three-month outcomes for patients who scored at risk on the PSC Global scale at intake. On average, patients who reliably deteriorated on the PSC over three months also worsened according to the BPRS-C (their scores increased). Patients with no reliable change on the PSC in either direction had slightly decreased BPRS-C scores, while those who reliably improved on the PSC also showed the greatest improvement (largest score decreases) on the BPRS-C. Post hoc analyses showed that these change scores differed significantly among all subgroups; F(2, 223) = 13.35, p < 0.001. CGAS scores improved in all subgroups of PSC Global cases at intake, increasing (improved functioning) least among patients who did not reliably change on the PSC, slightly more among patients who reliably deteriorated, and most among patients who reliably improved. Post hoc analyses showed a significant difference in improvement only between the reliable improvement and ‘no reliable change’ subgroups; F(1, 192) = 14.52, p < 0.001.
Table 4. Comparison of change scores on the BPRS-C and CGAS with levels of reliable change on the PSC at three-month follow-up.
Significance levels marked in the table: p < 0.001 and p < 0.01.
CSI: patients with an initial PSC score in the clinical range (> 27), reliable improvement (a decrease of six or more points on the PSC), and a follow-up PSC score below 28.
N: number of participants; PSC: Pediatric Symptom Checklist; CSI: clinically significant improvement; BPRS-C: Brief Psychiatric Rating Scale for Children; CGAS: Children’s Global Assessment Scale.
As shown in the bottom half of Table 4, among patients who were cases on at least one PSC subscale at intake, BPRS-C scores increased (functioning worsened) among patients who reliably deteriorated on the PSC, decreased (functioning improved) slightly among patients who did not reliably change, and decreased most among patients who reliably improved on the PSC. Change scores differed significantly among all subgroups; F(2, 343) = 13.74, p < 0.001. On the CGAS, post hoc analyses indicated significantly greater improvement for ‘subscale positive’ patients who reliably improved on the PSC than for patients who reliably deteriorated, F(1, 197) = 5.66, p < 0.05, or did not reliably change; F(1, 286) = 11.42, p = 0.001.
The final column of Table 4 shows that, as hypothesized, patients who achieved CSI on the PSC made the most progress on both clinician-rated scales, with an average change of −8.97 (SD = 10.72) on the BPRS-C and +5.66 (SD = 8.64) on the CGAS. BPRS-C and CGAS outcomes for the CSI subgroup were identical in the ‘PSC Global positive’ and ‘PSC subscale positive’ groups because CSI requires an initial PSC Global score in the clinical range (>27); the same patients were therefore analyzed within both groups.
Discussion
The present study uses a common metric (effect size) and more recently recommended metrics (RC and CSI) to examine outcomes of psychotherapy and psychopharmacology provided to a broad spectrum of child psychiatry outpatients under real world clinical conditions, as measured by a widely used parent-report measure. In terms of effect size, over the first three months of care we found a large positive impact (d = 0.80) of treatment on PSC scores for patients who were classified as the most highly impaired at intake. In terms of the newer metrics, over the same period 51.1% of the most highly impaired patients had reliably improved, compared with 6.1% who had reliably deteriorated. About one-third of patients who were PSC positive at intake (35.9%), roughly two-thirds of those who reliably improved, had achieved CSI. These rates did not differ significantly across diagnostic groups and treatment modalities.
This naturalistic study of child psychiatry outcomes using a brief standardized parent-report measure produced results that were similar to results reported with adult outpatients in the same hospital system using a brief adult instrument (Blais et al., 2011). While not directly comparable due to different measurement tools and populations, the Blais et al. (2011) study does offer some perspective on the current findings. That study found that usual care in that program resulted in a moderate amount of improvement in psychological functioning with an effect size of d = 0.50, a reliable improvement rate of 38%, a reliable deterioration rate of 12%, and a clinically significant improvement rate of 19%.
Because the current findings parallel those of the adult study and are more favorable on all three metrics, they suggest that the RC index and CSI may provide a valuable and viable supplement to effect size calculations in child as well as adult psychiatry samples. The current study also suggests that although effect size continues to be a useful metric for assessing the impact of child psychiatric treatment in populations, the RC and CSI metrics provide more specific information about the impact of usual care on individual patients by showing the actual percentage who reliably and significantly improved or deteriorated.
Results from the RC and CSI analyses generally aligned with those from our effect size calculations: a majority of high-risk patients showed reliable improvement, and a large percentage of them had scores no longer indicating psychosocial risk three months into treatment. Within the effectiveness framework, more than one-third of the patients in this child psychiatry clinic who were PSC Global cases at intake could be classified as having recovered from their current episode of psychiatric illness within the first three months of treatment.
This proposed use of RC and CSI may be even more valuable in its ability to identify patients who did not improve meaningfully or who worsened considerably during the first three months of treatment. Since it is generally accepted that some patients get worse while in treatment and that others may require treatment for many months or even years before their illnesses remit, being able to identify these groups of patients by tracking outcomes routinely could become an important avenue for future research (Harmon et al., 2007; Lambert, 2007; World Health Organization, 1992).
The present study has a number of limitations that may impact the value and generalizability of the findings. Among these is the percentage of cases (59%) with missing data at follow-up. As noted earlier, although we had follow-up data on only 41% of patients with intake forms, the true rate of follow-up was 76% since only 54% of all patients actually started treatment and only these patients were expected to have follow-up forms.
Relatively low rates of follow-up are the rule rather than the exception in real world samples and the rates obtained here actually compare favorably. For example, Minami et al. (2008) reported a follow-up assessment rate of 44% in a study exploring the effectiveness of depression treatment for a large managed care sample. Brown, Burlingame, Lambert, Jones and Vaccaro (2001) reported a lower follow-up rate (approximately 27%) in an outcomes study conducted with data from a large managed care company data repository. Castonguay et al. (2010) achieved only a 31% follow-up rate in a study that was specifically designed to anticipate post-treatment data collection problems. Most recently, in their study of adult patients at our hospital, Blais et al. (2011) reported a follow-up rate of 28%.
One factor that clearly lowered our follow-up rates in this and the earlier adult outpatient study (Blais et al., 2011) was the rather long (13-week) interval between the initial assessment and follow-up in our system. This interval was chosen by departmental leaders when the outcome rating system was created based on administrative and insurance company requirements and concerns about clinician burden, without regard to research design. In retrospect, it seems likely that the choice of this interval made obtaining higher follow-up rates more difficult. Early withdrawal from mental health treatment is a common problem and it has been estimated that as many as 65% of patients leave psychotherapy before the 10th visit (Barrett, Chua, Crits-Christoph, Gibbons & Thompson, 2008). Therefore, our follow-up interval may have been ill-matched to known dropout patterns. As noted above, in addition to attrition, many patients are seen at our clinic specifically for treatment consultation with no expectation of continuing on with care. Follow-up data on these patients would not be captured regardless of the time interval employed. Future treatment effectiveness studies could consider employing briefer follow-up intervals to improve capture rates.
Children who left treatment before three months may also have differed in impairment from those who continued with care. On the clinician-rated measures, however, patients without follow-up data were, if anything, slightly less impaired at intake (lower BPRS-C and higher CGAS scores; Table 1), and intake PSC scores did not differ significantly between the two groups. These differences were small (d = 0.13 to 0.17), which suggests that differences in initial functioning between patients who left treatment and those who continued were not large either way and thus were likely not a major factor in explaining the observed outcomes.
Furthermore, it is important to keep in mind that while our study was conducted in a naturalistic setting, this setting was an academic medical center. Academic medical centers may vary in unknown but meaningful ways from other more typical mental health treatment facilities, thereby limiting the generalizability of these findings.
Finally, we lacked specific information about how patients in our sample were treated for various disorders (e.g., types of medications and dosages, augmentation, or type of individual therapy), which may also affect the generalizability of our results to other child practices.
Summary
The present study provides further evidence that outcome measurement is feasible in routine clinical practice using brief, standardized instruments that are available in the public domain. Further, the findings show that applying both common effectiveness metrics and more recently recommended analyses can enhance the value of data obtained from such routine outcome measurement. For example, the current study suggests that although treatment as usual appears, on average, to be beneficial for patients who were PSC Global cases at intake (d = 0.80), only 51.1% of these patients showed reliable improvement at three months, and 6.1% actually reliably deteriorated. Together, the present findings and those of Blais et al. (2011) begin to provide a model for analyzing and reporting usual care data in a manner that is consistent and comparable. We believe that the accumulation of similar data from other real world treatment centers will eventually allow us to compare the impact of treatments over time and across systems. We encourage other clinical researchers to adopt this methodology for evaluating real world treatments and to help build the comparative empirical database needed to achieve these goals.
Acknowledgements
The continuing support of the Department of Psychiatry at Massachusetts General Hospital made the current data analysis and paper writing possible.
Funding
This work was supported by the Fuss Family Fund (JMM; to Newton Wellesley Hospital).
