Abstract
Keywords
Though a significant number of graduate trainees in clinical social work and professional psychology offer outpatient psychotherapy in interdisciplinary settings such as community agencies, university counseling centers and training clinics, less is known about the effectiveness of the services those in training offer. Among studies examining outcomes, researchers have pointed to the relative effectiveness of graduate trainees as providers (Dyason, Shanley, Hawkins, Morrissey, & Lambert, 2019; Whipple et al., 2003; Öst, Karlstedt, & Widen, 2012). Related studies have focused on trainee development, or have assessed success from the perspective of trainees, using and encouraging the use of student self-assessment measures (Ducharme, Rober, & Wharff, 2015). The majority of existing empirical studies focusing on client outcomes have relied on relatively small samples, with Dyason et al.’s 2019 systematic review noting that 50% of published studies had samples of fewer than 99 client participants. Only a minority of these studies have used client-administered measures to evaluate the extent to which clients in these clinical relationships improve, making statistically significant and clinically meaningful change.
This study sought to address this gap: to explore outcomes for the clients of interdisciplinary trainees (including both psychologists and social workers in training), using a larger than average sample size (N = 421), an empirical measure (the OQ 45.2), and a repeated measures design, following change on a weekly session-by-session basis, something similarly rare in the literature. We chose an interdisciplinary setting as one closely approximating the kind of outpatient settings in which many to most outpatient mental health providers go on to practice.
Literature Review
Psychotherapy’s Effectiveness
At least 60 years of research support the conclusion that psychotherapy is an effective and often powerful intervention (Asay & Lambert, 1999; Layard & Clark, 2015). Hubble et al.’s (1999, 2013) The heart and soul of change and Wampold and Imel’s (2015) The great psychotherapy debate speak to the broad research consensus that psychotherapy is an effective intervention for a wide variety of mental health conditions and problems in living. Asay and Lambert (1999) described this in detail when they summarized over six decades of research leading to this conclusion. Those who participate and persist in psychotherapy, on average, benefit and improve (in relation to those not undertaking psychotherapy and in relation to their own baselines), with effect sizes often greater than 1.0 for manualized treatments delivered by expert clinicians, and with moderate to strong effects for clients in community settings with providers trained as generalists (Forand, Evans, Haglin, & Fishman, 2011; Shedler, 2010). Asay and Lambert (1999) noted that 50% of adults experience clinically significant change and symptom relief within 8–10 sessions and 75% of participants benefit after 26 or more sessions. Lambert went on to note that 22% of clients in outpatient settings recover (vs. improve) within 15–19 sessions, with significantly more making clinically significant improvement. Shedler (2010) found the associated effect sizes to be higher at a population level than many widely used medications. For conditions such as depressive disorders with more severity, combining psychotherapy with medication offers a particularly effective standard of care (Black & Andreasen, 2021; Cuijpers et al., 2020).
This conclusion is similarly supported by a large number of studies evaluating the specific outcomes associated with particular treatments for specific conditions (Levy et al., 2020). These studies often evaluate empirically supported treatments for specific mental health conditions. Representing the American Psychological Association’s Division 12, Chambliss et al. (2001) published a summary of specific therapeutic approaches matched to particular diagnoses, with attention to the strength of research evidence supporting each (https://div12.org/psychological-treatments/). The American Psychiatric Association has followed and both provide comprehensive lists on their respective websites. The empirically supported treatment movement with its attention to diagnostic specificity is not without controversy and has been critiqued by authors such as Tolin et al. (2015), who have proposed alternate models.
Systematic reviews and meta-analyses similarly exist for specific theoretical approaches to psychotherapy. As examples, Butler, Chapman, Forman, and Beck (2006) and Hoffman, Asnaani, Vonk, Sawyer, and Fang (2012) independently reviewed 16 and over a hundred meta-analyses, respectively, pointing to the effectiveness of cognitive behavioral therapy (CBT) for a variety of psychiatric conditions, with Hoffman et al. identifying support as “very strong” (p. 427). Behavioral and exposure-based interventions appear to be particularly powerful and important in the treatment of anxiety disorders and in conditions such as obsessive compulsive disorder (OCD). Shedler (2010) has similarly provided effect sizes (ES = .97 for general symptom improvement) for psychodynamic therapies applied to a broad variety of specific diagnoses. These psychotherapies have similarly been studied and have been found to be effective when offered in both individual and in group formats. As one example, a group intervention for social anxiety was studied by Kawaguchi et al. (2013) and the treatment was associated with an effect size of .77 among completers.
In contrast to some common misperceptions, improvement is generally shown to be durable. Follow-up studies have documented the maintenance of and even improvement in initial gains (sometimes referred to as an incubation effect) from 4 to 14 years post-treatment in the case of particular psychotherapies in the treatment of post-traumatic stress disorder (PTSD). In the study of social anxiety noted above, results were maintained at 1-year follow-up. However, some chronic conditions, such as substance use disorders (where effect sizes associated with cognitive behavior therapy have been found to be smaller; Hoffman et al., 2012) and recurrent major depressive episodes, carry a risk of relapse.
Though estimates vary, the path of recovery within psychotherapy appears to be characterized by the majority of improvement occurring in the first 13–18 sessions, followed by tapering improvement in subsequent sessions. Levy, Worden, and Davies (2020) note that while dose-response curves and recovery rates can vary by diagnosis, recovery rates in cognitive behavioral therapy can be even stronger and improvement earlier (i.e., a steeper slope) for clients with anxiety disorders.
Trainees’ Effectiveness in Providing Psychotherapy
A number of studies have addressed the question of whether graduate trainees are effective in reducing client distress and whether this effectiveness differs when compared to more experienced clinicians (Forand, Evans, Haglin, & Fishman, 2011; Hiltunen, Kocys, & Perrin-Wallqvist, 2013; James, Blackburn, Milne, & Reichfelt, 2001; Owen, Wampold, Kopta, Rousmaniere, & Miller, 2019; Paine et al., 2019).
The findings have been encouraging. Dyason et al. (2019) authored a systematic review, summarizing 257 studies of trainee-provided psychotherapy provided specifically in psychology training clinics across 12 countries and concluded that across these studies, graduate-level trainees often provided effective psychotherapies on par with those of more experienced clinicians. These authors nuanced their findings by noting that while trainees were often effective, offering a near-equivalent performance, these treatments could take longer to produce symptom improvement. Lambert (2005) has noted, in particular, the risk of these therapies going off track (that is, of losing focus and/or resulting in deterioration or premature attrition). Both studies point to the promise and potential of trainees as providers and yet also identify the need for careful monitoring and support of those in training, both by way of (1) supervision, and (2) the use of feedback measures to help identify clients who might be becoming more symptomatic and/or at risk of ending therapy prematurely. Duncan (2010), in his own research using one such measure (the Partners for Change Outcome Management System or PCOMS, partly derived from the OQ 45.2), along with Budge et al. (2013), has similarly concluded that recent graduates sometimes outperform more experienced clinicians and that greater longevity in practice is not, on its own, synonymous with increased effectiveness. Duncan (2010), Lambert (2007), and Miller and Moyers (2021) have all pointed to the range of effectiveness (i.e., clinical competence) across samples of community clinicians, when measured by client improvement, and when using formal measures of symptom improvement.
Additionally, students completing graduate training have been shown to be effective in helping clients experience symptom reduction and improvement in functioning utilizing specific therapy models including cognitive behavioral therapy (Forand et al., 2011; Hiltunen et al., 2013; James et al., 2001) as well as when using psychodynamic approaches (Paine et al., 2019). Some of these studies have identified improved outcomes specifically for doctoral-level psychology trainees (Budge, Owen, & Kopta, 2013; Owen et al., 2019), particularly when applying a phase model of supervision. One representative meta-analysis by Stein and Lambert (1995) found a “modest but fairly consistent treatment effect size associated with training level” related to patient outcomes and symptom reduction (p. 192). This finding is also consistent with Forand et al.’s (2011) research. In contrast to Stein and Lambert’s (1995) findings, and although focusing on a different clinical modality, Paine et al. (2019) found the training level of the clinician was not significant in relation to client outcomes when offering psychodynamic psychotherapy in a community mental health clinic.
To date, the literature suggests clients can improve with trainees as primary providers, regardless of initial symptom severity (Budge et al., 2013; Hiltunen et al., 2013; James et al., 2001). Forand et al. (2011) serves as a particularly good point of comparison with the present study, in that their study followed the improvement of 247 clients being treated by a sample comprised entirely of graduate trainees. Forand et al. similarly examined practice in a community setting, with trainees practicing open-ended and non-manualized psychotherapy. Their findings suggested that while clients with greater initial symptom severity were more likely to reliably improve than those with less, those with more initial symptom severity were less likely to recover. The distinction between improvement and recovery is one the OQ 45.2 makes as well. In the case of depressive disorders, more experienced clinicians outperformed trainees in relation to facilitating recovery. Despite this potential limitation, adult prevalence rates suggest that the largest number of depressive episodes at a population level are mild rather than severe (Layard & Clark, 2015).
Trainees’ Effectiveness in Delivering Dialectical Behavior Therapy
Dialectical behavior therapy (DBT) is a unique therapeutic intervention developed by Linehan in the 1970s and 1980s to address the need for a rigorous treatment approach to treat clients with particular symptom presentations, specifically (1) chronic suicidality and then (2) borderline personality disorder (Linehan, 1993). Adherent DBT practices are implemented through group skills therapy, individual DBT therapy, and access to a DBT therapist via phone coaching.
Dialectical behavior therapy is considered a comprehensive treatment approach and has been shown to be effective in treating participants with a variety of diagnoses since its origins in treating suicidality and people with borderline personality disorder (Neacsiu, Ward-Ciesielski, & Linehan, 2012). Examples of such expansion include using DBT to treat substance use disorders, PTSD, non-suicidal self-injurious behaviors, bipolar and eating disorders among adolescents (Goldstein, Axelson, Birmaher, & Brent, 2007; Linehan, 2015; Neacsiu et al., 2012; Salbach-Andrae, Bohnekamp, Pfeiffer, & Lehmkuhl, 2008).
Most peer-reviewed, published research evaluating DBT utilizes highly trained clinicians, and authors such as Stein and Lambert (1995) have called for the evaluation of these treatments in community practice. While less research examines trainees’ ability to conduct and implement DBT, one recent study provides insight into the potential effectiveness of DBT with graduate trainees. Specifically, when following 15 doctoral students in a graduate school setting implementing DBT services for 50 participants (final n = 34) over 6 months, Rizvi, Hughes, Hittman, and Vieira Oliveira (2017) found that clients improved with a statistically significant decrease in suicidal ideation and non-suicidal self-injurious behaviors. Additionally, outcomes were compared with those from providers in a randomized controlled trial for DBT effectiveness and were found to be “comparable in effect size” (p. 1608).
The Risk of Client Attrition
Several studies have looked at the role of attrition as a risk in psychotherapy. Attrition, in the literature, is generally defined as a client prematurely ending psychotherapy, or terminating before achieving an optimal or desirable response (Cooper, 2008). Researchers have calculated estimates of a necessary number of sessions broadly for a therapeutic response where a majority of clients in outpatient psychotherapy will likely make clinically meaningful change, when using measures such as the Outcome Questionnaire (OQ 45.2). Though estimates vary, the number is typically in the range of 13–18, with 12.7 sessions serving as a common mean (Hansen, Lambert, & Forman, 2002). In contrast to this, other studies suggest that in many settings, clients receive far fewer sessions. In 2008, Cully, Tolpin, Henderson et al. reviewed Medicaid data for more than 410,000 veterans diagnosed with depression, anxiety or PTSD and who were receiving outpatient mental health services through the Veterans Administration (VA). The authors found that the mean number of sessions attended for these adults was five. Authors such as Duncan (2010) and Miller, Southam-Gerow, and Allin (2008), the co-creators of the Session Rating Scale (SRS) and the Outcome Rating Scale (ORS), have offered individual clinicians ways of calculating their individual mean number of sessions by which most of their own clients have improved or recovered as a way of anticipating and of benchmarking improvement with the goal of reducing client attrition. This can serve as a personal metric and as a way of helping clinicians gauge the likelihood of their clients improving by way of further sessions.
Among studies that have explicitly examined attrition, several variables have been identified as increasing its risk. These include, but are not limited to age, education and income, including whether or not a client is insured (Roseborough, McLeod, & Wright, 2016). In these examples, younger age, less formal education, and lower incomes (including lacking medical insurance) were all associated with a greater likelihood of ending services prematurely, as were specific diagnoses, including obsessive compulsive (OCD) and substance use disorders, when conducting a survival analysis (Roseborough et al., 2016). This is consistent with larger, multi-site studies such as the federally funded Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study on the pharmacologic treatment of depression, which found clients with greater social and economic challenges demonstrated an increased risk of ending psychotherapy (Rush, Trivedi, & Wisniewski, 2006). This is important in that the same authors noted that earlier terminations were associated with less likelihood of symptom remission.
Finally, clinicians can decrease the risk of attrition in ways other than tracking their own client improvement rates, noted above. Miller and Moyers (2021) share research findings from the national Project MATCH study that suggest clients who experience their clinicians as lacking accurate empathy are at increased risk of “poor outcomes” (p. 30), including attrition and deterioration. The authors note that such experiences correlate with relapse for clients with substance use disorders. Cooper (2008) identifies negative countertransference and excessive interpretation as additional risks for client attrition. A study by Roseborough et al. (2016) gave attention to process variables in relation to attrition, noting that clients who had recently improved or worsened were more likely to remain in psychotherapy, whereas those who had plateaued as a subgroup were more likely to attrit. While this may sound like a client variable per se, Miller and Moyers remind readers that “counseling and psychotherapy are inseparable from the people who provide them” (p. 3) and that, when studied, outcomes vary widely by clinician.
Trainees and the Risk of Client Attrition
Overall, estimates of attrition among community clinicians providing outpatient psychotherapy tend to be about 20% (Swift & Greenberg, 2012). While these rates tend to be lower in randomized clinical trials (RCTs) when treatments are delivered by expert clinicians, Lambert (2012) and others have noted that attrition rates are even higher for community clinicians treating children and adolescents (e.g., when measured using the Y-OQ) and are often in the 20–25% range. Lambert has noted that graduate-level mental health trainees (e.g., psychology practicum and social work interns) show a greater variability in outcomes than more experienced staff clinicians. Lambert similarly found that while interns can provide effective services, they tend to do so with more variability, can take longer, and are more at risk of having psychotherapies that go “off course” than more experienced, often staff clinicians. Miller and Moyers (2021) offer hope, noting that the skills associated with reducing attrition can be fostered and taught to clinicians in training.
Trainees and the Risk of Client Deterioration
Lambert and Shimokawa (2011) identify the somewhat paradoxical finding that (1) a predictable percentage of clients in psychotherapy will, during the course of treatment, become more symptomatic with clinical significance and (2) mental health practitioners tend to be poor at predicting who among their clients will deteriorate. Lambert and Shimokawa noted that even when a client was deteriorating in their study (becoming more symptomatic in a statistically and clinically significant way), clinicians failed to identify this in the majority of cases. Lambert (2012) concluded that while deterioration is a risk for all clinicians, for graduate trainees, approximately 20% of cases, when studied, “go off-track” at some point, and pointed to the importance of supervision and of routine monitoring of client progress, in real time, as part of trainees providing psychotherapy.
How many clients predictably deteriorate? Among expert clinicians in psychotherapy trial studies, Lambert (2007) notes that this number is in the range of 10%. Among community providers, studies such as Swift and Greenberg (2012) have found that a larger percentage of clients deteriorate, often in the range of 18%. This percentage is even higher for children and adolescents (where deterioration rates range from 15 to 24%), when using assessment measures such as the Youth Outcome Questionnaire (Y-OQ), normed for children ages 4–17 (Rousmaniere, 2013).
Lambert (2012) attributes this failure of clinicians to identify clients who are worsening to variables such as an investment in and sometimes unfounded optimism about clients improving that can serve as a sort of cognitive filter (preventing clinicians from seeing the reality of some clients deteriorating as a form of confirmation bias), to a sometimes too optimistic belief in one’s ability to help, and also to the failure of clinicians to use standardized assessment measures such as the OQ 45.2 to track client change in a more systematic and empirical way. Trainees may similarly not yet have a large number of clients as points of comparison. Assessment and feedback tools such as Duncan et al.’s (2007) PCOMS include measures of clinically significant change (including deterioration) and can relay these to clinicians in real-time practice. Lambert reports both that assessment tools such as the OQ 45.2 can assess more items to help hold a larger perspective than an individual clinician can (without time to ask 45 individual questions in-person at each session) and that using such measures can reduce deterioration with a mean effect size of .33 (Lambert & Shimokawa, 2011).
Both Lambert and Duncan (2010) speak to the importance of graduate trainees learning to actively look at what is working, and as or more importantly, at what is not working in their clinical approach. Therapists’ perceptions of progress often differ from evidence of improvement using supporting assessment tools, and clients getting this kind of feedback have a 3.9 times greater chance of clinically significant improvement than those not (Lambert & Shimokawa, 2011, p. 77). Such feedback effects can produce treatment effects even more powerful than differences in theoretical or treatment approaches used (Lambert, 2013; Wampold & Imel, 2015). Routine assessment and monitoring of treatment response (e.g., the OQ 45.2) more than doubles the chances of a positive client outcome and “substantially reduces deterioration” (Lambert, 2012, p. 109). Lambert noted that in his university setting using such measures decreased client deterioration from 20% to nearly 5% (Lambert, 2017). Such measures can signal sudden improvements (to capitalize on) as well as sudden worsening. Finally, when studied, both adolescents and adults who are early responders (i.e., those with evidence of symptom improvement in the initial sessions) tend to have better ultimate progress and a greater likelihood of a positive outcome.
Method
Setting and Intervention
Founded in 2003, the host clinic is interdisciplinary and is co-sponsored by the university’s law school, school of professional psychology and school of social work at a large, private Midwestern university. The center offers legal, counseling and case management services without cost to participants. Counseling services include outpatient psychotherapy and dialectical behavior therapy (DBT) delivered exclusively by supervised graduate trainees in counseling psychology. The setting similarly offers both practicums and APA-approved internships. While DBT is supervised and delivered exclusively by psychologists, psychotherapy (individual and group) is delivered by both graduate social work internship and psychology practicum students. The center offers limited specialized services such as testing for learning disorders, neuropsychological testing, and specialized interventions such as exposure and response prevention for the treatment of obsessive compulsive disorder (OCD). Student trainees receive regular individual and group supervision. Group supervision is interdisciplinary and takes the form of case conferences. The clinic’s theoretical orientation is integrative, giving attention to the role of evidence in approaching treatment. The clinic is led by three co-directors: a psychologist, social worker, and an attorney, all of whom serve as university faculty. While the clinic does not offer on-site medical or psychiatric consultation, interns consult with clients’ medical and psychiatric providers as part of an interdisciplinary team.
The clinic’s mission is to serve those who are unserved or underserved in more common fee-for-service mental health settings. Clients are primarily urban, adult residents and approximately one-quarter identify as people of color. Clients present with both acute and persistent mental health conditions. The clinic serves unemployed, uninsured, and underinsured clients. In concert with its legal and case management services, the clinic similarly serves a large number of clients seeking asylum and similar legal protection. Graduate students have a typical caseload of five to seven clients, who are triaged and assigned by clinic directors; the directors also offer both individual and group supervision. Approximately 12–19 trainees provide direct services annually at the center. These include social work interns, and both practicum and internship students in professional psychology. The data set used reflects trainees and their clients receiving services between 2008 and 2013. The study was reviewed by and received IRB approval (# 1388399) from the university hosting this practice setting.
Measurement and Operationalization
The Outcome Questionnaire (OQ 45.2) is a 45-item, self-administered assessment of psychiatric functioning. The OQ provides an overall score between 0 and 180, which aggregates three more specific sub-scores embedded within it. The three subscales assess clients’ symptom distress (SD), social role functioning (SR), and perceived strength of interpersonal relationships (IR), where higher scores represent greater psychiatric distress and more impaired functioning. The measure’s overall score has established psychometric properties (Lambert, Gregersen, & Burlingame, 2004) with support for the concurrent validity of its overall and symptom distress scales in particular (Boswell, White, Sims, Harrist, & Romans, 2013) and less established support for the validity of its other subscales when using factor analysis (Arader, 2021). It has been normed on over 100,000 adult clients across settings (e.g., inpatient, outpatient, EAP and community samples) internationally. This has enabled the OQ to establish benchmarks including a clinical cut-off (a score of 63 or higher as a measure of caseness), measures of clinically reliable change (identifying improvement or deterioration, involving a change of approximately 14 points in either direction in relation to the total score), and recovery (a change of 14 points and a final score below 63). The OQ is available in several languages in both a paper and in an electronic, cloud-based tablet version, called OQ Analyst. This electronic version uses an algorithm to help clinicians identify clients who are at risk of deteriorating and/or attriting. Both versions flag endorsements of individual high-risk items (for instance, questions asking about substance use, suicidality, and the risk of workplace violence). This clinic setting used the paper version at the time these data were collected (2008–2013) and has since moved to the online, cloud-based version.
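The benchmarks described above can be read as a simple decision rule. The following is a minimal sketch of that rule, not the OQ Analyst algorithm: the function name `classify_change` and the category labels are hypothetical, and the constants are the clinical cut-off (63) and the approximate reliable-change value (14 points) stated in the text. Recovery is coded exactly as worded above (a drop of at least 14 points and a final score below 63).

```python
CLINICAL_CUTOFF = 63   # total score at or above this meets caseness (from the text)
RELIABLE_CHANGE = 14   # approximate reliable change in total-score points (from the text)


def classify_change(intake: int, final: int) -> str:
    """Label pre/post change in OQ total score (hypothetical helper).

    Higher scores mean more distress, so improvement is a drop in score.
    """
    for score in (intake, final):
        if not 0 <= score <= 180:
            raise ValueError("OQ total scores range from 0 to 180")
    change = final - intake
    if change <= -RELIABLE_CHANGE:
        # Recovery additionally requires ending below the clinical cut-off.
        return "recovered" if final < CLINICAL_CUTOFF else "improved"
    if change >= RELIABLE_CHANGE:
        return "deteriorated"
    return "no reliable change"


if __name__ == "__main__":
    # Illustrative (made-up) intake/final pairs.
    for intake, final in [(80, 60), (100, 80), (60, 75), (70, 65)]:
        print(intake, final, classify_change(intake, final))
```

Under this rule a client can improve reliably without recovering, for example by moving from 100 to 80, which is the improvement versus recovery distinction the measure draws.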
Caseness
Sixty-eight percent (n = 217 of 317) of the sample, those 18 and older, with both a baseline and at least one subsequent OQ score, met the criterion for caseness at intake (that is, had a total OQ score at or above 63). A small percentage of participants had intake OQ-45 scores of 100 or greater, scores that are more typical of clients presenting with serious mental illness (SMI). A subset of the participants took part in DBT group (n = 52, or 16% of the total sample). Across services, participants received a range of 1–122 counseling sessions. Cases included for the purpose of this study included n = 317 clients with more than one session, in that this study focused on change over time, using a repeated measures design.
Effectiveness Research
This study’s design is consistent with and serves as an example of effectiveness research. A randomized experiment with participants assigned to treatments measures treatment efficacy. In efficacy studies, the objective is to ascertain whether a specific intervention caused an observed effect. An effectiveness study does not claim to establish causal inference. The objective of this study was to demonstrate how much change occurs in a natural clinical setting, using open-ended psychotherapies where the nature of the intervention is not necessarily prescribed in advance. Studies like this allow variability in treatment length, client presenting problems and treatment approaches in a way consistent with “intent to treat” designs (see Dang & Kaur, 2016; Rush et al., 2006).
Data Analysis
This study utilized a treatment trajectory model. A trajectory is the path a client takes from intake to termination. In the case of the OQ-45, the best trajectory is a “negative slope” because such a slope indicates an improvement and/or remission of symptoms. Treatment trajectories allow researchers to compare individual clients using a common metric. Authors such as Field (2008) have pointed to the value of linear mixed model analysis for repeated measures in that it is particularly well suited to data sets characterized by what he describes as unbalanced designs. These include data sets with a large amount of expected attrition and where there are not equal numbers of people represented at each time point. Field points to its benefit in longitudinal research and in research where participants may follow different trajectories (that is, may have different and distinct slopes, allowing for attention to subgroups or to populations within a larger treatment population). Models such as this can often more accurately map distinct trajectories in the form of unique intercepts (in this case starting points in the form of initial baseline OQ scores) and subsequent slopes (i.e., treatment trajectories).
The Linear Mixed Model
Variable length treatment presents a problem of missing data. However, the linear mixed model for repeated measures is well equipped to accommodate missing data. Because the mixed model is, in part, a unique form of regression (combining elements of ANOVA and regression), it can simultaneously estimate the effects of brief and extended treatment trajectories, paying particular attention to the proximity and direction of recent observations. The authors used the linear mixed model procedure offered in the Statistical Package for the Social Sciences software (SPSS, version 26). We chose this procedure to explore changes in Outcome Questionnaire (OQ 45.2) scores over time in a way that (1) accounts well for the kind of missing data and attrition that characterize these data, and (2) allows exploration of potential interaction effects in the form of demographic variables that might affect clients' initial intercept or subsequent slope (that is, where they began treatment and how they progressed).
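For readers less familiar with this approach, a generic random-intercept, random-slope growth model can be written as follows. This is a textbook form offered only as an illustration. The symbols and the random-effects structure are assumptions, not a specification of the SPSS model fit in this study.

```latex
\begin{aligned}
\mathrm{OQ}_{ij} &= (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\,\mathrm{session}_{ij} + \varepsilon_{ij},\\
(u_{0i},\, u_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0},\, \boldsymbol{\Sigma}), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0,\, \sigma^{2}).
\end{aligned}
```

Here $i$ indexes clients and $j$ indexes each client's sessions, $\beta_0$ is the average intake score, and $\beta_1$ is the average change per session, so a negative $\beta_1$ corresponds to improvement on the OQ. The terms $u_{0i}$ and $u_{1i}$ let each client depart from the average intercept and slope. Because each client contributes only the sessions actually attended, unequal treatment lengths do not require discarding cases. A moderator such as DBT participation would enter as a main effect (shifting the intercept) and as an interaction with session (shifting the slope).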
H1. The primary hypothesis predicted that clients would tend to improve over time. This was based on findings from previous studies pointing to graduate trainees providing effective services noted by Dyason et al. (2019) and others.
Beyond H1 as our hypothesis, the subgroup analysis that followed was exploratory. We examined the extent to which participants improved using a “slope as outcome” approach, including for those in DBT. We also looked descriptively at trends in relation to attrition and deterioration. We did not make a priori hypotheses regarding these, but instead explored them as areas of interest and as points of comparison with the existing literature. The mixed model was used to address H1. A descriptive analysis followed and was used to explore specific treatment trajectories.
Results
The data set contained treatment histories for 421 adult psychotherapy participants. Cases were removed if they met any of three conditions: the participant was younger than 18, the participant’s age was missing, or only the initial (intake) psychotherapy session was recorded. The age restriction was made, in part, as a protection built into the IRB design. The number of sessions filter was imposed because statistical analysis of trajectories requires time-series data with at least a pre- and post-data point. This three-step filter resulted in 317 usable records on which the subsequent analyses were performed.
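The three exclusion rules can be sketched in code. This is an illustrative sketch only: the `Client` record, its field names, and the sample values are hypothetical and do not reflect the structure of the study's data file.

```python
from typing import List, NamedTuple, Optional


class Client(NamedTuple):
    """Hypothetical record layout for one treatment history."""
    client_id: str
    age: Optional[int]       # None when age was not recorded
    oq_scores: List[int]     # one OQ total score per recorded session, intake first


def is_usable(client: Client) -> bool:
    """Apply the three exclusion rules described in the text."""
    if client.age is None:           # age missing
        return False
    if client.age < 18:              # younger than 18
        return False
    return len(client.oq_scores) >= 2   # needs intake plus at least one later score


# Made-up records, one per exclusion rule plus one that passes.
records = [
    Client("c1", 34, [82, 75, 70]),
    Client("c2", 17, [90, 88]),
    Client("c3", None, [66, 60]),
    Client("c4", 41, [71]),
]
usable = [c for c in records if is_usable(c)]
print([c.client_id for c in usable])
```

Only the first made-up record passes all three checks, since the others fail on age, missing age, and number of sessions respectively.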
Sample and Participant Characteristics
While the OQ 45.2 offers three component subscales, only the total OQ-45 scores were used, in part due to their more established validity. Total scores at or above 63 on the OQ-45 were interpreted as clinically elevated. The intake OQ-45 scores in this sample were on average above that cut-off (M = 74.78, SD = 26.20, 95% CI: [71.89, 77.68]). The intake scores serve as the baseline of symptomatology and distress level, and subsequent changes were measured in relation to each person’s intake (first session) score.
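The reported confidence interval can be roughly reproduced from the summary statistics. The sketch below assumes that the interval was computed on the 317 retained clients and uses a normal approximation; neither assumption is stated in the text, so small differences from the published bounds are expected.

```python
from math import sqrt
from statistics import NormalDist

CLINICAL_CUTOFF = 63  # OQ-45 total score treated as clinically elevated in the text


def mean_ci(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


# n = 317 is an assumption (the filtered sample size reported above).
lower, upper = mean_ci(74.78, 26.20, 317)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
print("entire interval above cut-off:", lower > CLINICAL_CUTOFF)
```

The interval comes out at roughly 71.9 to 77.7, close to the reported bounds and entirely above the cut-off of 63.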
OQ-45 Intake.
This sample included 52 clients (16%) whose treatment mode was dialectical behavior therapy (DBT). The intake scores for DBT clients as a group (M = 85.04, SD = 24.30) were higher than for non-DBT participants (M = 72.62, SD = 25.92). This difference in intercepts was statistically significant (t(314) = −3.18, p < .002).
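As a rough check on the reported test statistic, a pooled-variance independent-samples t can be computed from the group means and standard deviations. The non-DBT group size of 264 is an assumption inferred from the reported degrees of freedom (314 = 52 + 264 − 2); the text does not state it, and it does not say which form of the t test was used.

```python
from math import sqrt


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic with pooled variance."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Non-DBT group first, DBT group second; n = 264 is inferred, not reported.
t, df = pooled_t(72.62, 25.92, 264, 85.04, 24.30, 52)
print(f"t({df}) = {t:.2f}")
```

Under these assumptions the statistic lands within rounding distance of the reported t(314) = −3.18.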
OQ 45.2 Intake Total Scores.
Consistent with the hypothesis (H1) that clients would tend to make positive change over time, the mean of the intercepts
Moderator variables
Several moderator variables were examined for treatment-by-factor interactions over time, to assess whether treatment slope differed by demographic category. While intercepts varied, none of the tested moderators (e.g., gender, age, ethnicity, or participation in DBT) produced a statistically significant difference in slope (that is, an interaction effect between the variable and change over time).
Slopes as outcomes
While the moderator effects were not statistically significant, the researchers performed a descriptive analysis of the kinds of treatment trajectories characterizing clinic clients. The reason for this follow-up to our earlier use of the mixed model was twofold. The first was to demonstrate visually the diversity of treatment trajectories in light of the large standard deviation noted above; the second was to look more closely at whether the absence of interaction effects in the mixed model might lie in the way the categories were defined. The raw session-level data were entered into SPSS, using the OMS system to create a file of person-level regression slope models. The results shown here are intended merely to represent the variety of individual treatment trajectories, as a reminder that individual slopes are not homogeneous.
Each client had a succession of session-level OQ-45 scores. The slopes-as-outcomes approach fits a regression line to each person's scores and then compares the resulting coefficients to learn which clients show the steepest change.
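The idea of a person-level slope can be sketched as an ordinary least-squares fit of score on session number for each client. The authors did this in SPSS; the Python below is a stand-in for illustration, the trajectories are hypothetical, and it assumes consecutive sessions with no gaps.

```python
from statistics import mean


def ols_slope(scores):
    """Least-squares slope of OQ-45 score on session number (1, 2, 3, ...)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two sessions to fit a slope")
    sessions = list(range(1, n + 1))
    x_bar, y_bar = mean(sessions), mean(scores)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sessions, scores))
    sxx = sum((x - x_bar) ** 2 for x in sessions)
    return sxy / sxx


# Hypothetical session-by-session total scores.
trajectories = {
    "client_1": [80, 76, 74, 70],  # scores falling: improving
    "client_2": [65, 70, 72, 79],  # scores rising: worsening
}
slopes = {cid: ols_slope(s) for cid, s in trajectories.items()}
for cid, slope in slopes.items():
    print(cid, round(slope, 2))
```

A negative slope means scores fell by that many points per session on average. Group-level figures such as those reported by diagnosis would then summarize these per-client coefficients within each category.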
Slopes by Diagnosis.
Slopes were particularly pronounced for clients with V-code diagnoses (situational problems or problems in living). Treatment was associated with improvement for the 23 clients diagnosed with PTSD (slope = −2.42). Treatment was not associated with improvement for clients with bipolar disorder, alcohol and substance use disorders, or psychosis, as suggested by slopes that were positive or near zero in each category.
Slope by DBT Participation.
Note. The smaller non-DBT comparison group represents clients specifically labeled as individual clients and reflects missing designations in other cases.
Slopes by Age.
Finally, treatment slopes were explored by race and ethnicity. Categories with fewer than 10–20 clients were interpreted with caution. For those groups with sufficient representation in the sample, it appears that psychotherapy was associated with improvement across ethnic and racial categories represented in the sample. This was particularly true for Latino/Latina clients, who had a group-level slope of −2.52.
Deterioration
An analysis was conducted to explore the incidence and trajectories of clients who deteriorated over the course of treatment. The first 16 sessions were chosen because 16 was the mean number of sessions for clients across the entire data set. Session scores were analyzed to distinguish clients whose OQ-45 total scores had increased by 14 points or more from those whose scores had not. A 14-point increase in total score is the definition of deterioration in the guidelines for interpreting the OQ-45. There were 77 clients for whom this calculation could be made at the 16th session. Of those 77 participants, 14 (18%) qualified as having deteriorated in relation to their own initial baseline score.
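The 14-point rule can be expressed as a small classification function. This is a minimal sketch: the score pairs are hypothetical, and the label "no reliable change" for the middle category is wording introduced here rather than a term from the study.

```python
THRESHOLD = 14  # OQ-45 points, the change criterion cited in the text


def classify_change(baseline, later, threshold=THRESHOLD):
    """Label a client by change from baseline to a later session."""
    change = later - baseline
    if change >= threshold:
        return "deteriorated"        # score rose by 14 or more
    if change <= -threshold:
        return "improved"            # score fell by 14 or more
    return "no reliable change"      # hypothetical label for the remainder


# Hypothetical (baseline, session 16) score pairs.
pairs = {"a": (73, 90), "b": (75, 58), "c": (70, 66)}
labels = {cid: classify_change(b, s16) for cid, (b, s16) in pairs.items()}
print(labels)

# Proportion reported in the text: 14 of 77 clients.
rate_pct = round(100 * 14 / 77)
print(f"{rate_pct}% deteriorated")
```

The final lines confirm that 14 of 77 rounds to the 18% reported above.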
The authors noted the comparative trajectories of deteriorators versus non-deteriorators. While the two groups began with nearly equivalent baseline scores on average (M = 73; SD = 23 and 26, respectively), they diverged in opposite directions soon after the first session. In Figure 1, the top line shows the deteriorating clients (those with worsening overall scores), whose OQ-45 scores became more elevated over time. The bottom line shows a less varied and more consistent pattern of improvement. Using Cohen’s effect size, the deteriorating group had a large effect size of d = 0.86 in the positive (deteriorating) direction. The effect size for those who did not deteriorate was d = −0.43, a moderate effect in the negative (improving) direction. Those clients who had clinically significant improvement at the 16th session, evidenced by a decrease in total OQ score of 14 or more points, had an effect size of −1.22.
Deterioration. Trellis plot of treatment trajectories.
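The text does not state which variant of Cohen's d was used. One common choice for pre/post data, shown here purely as an assumption, divides the mean change by the baseline standard deviation. The scores below are hypothetical.

```python
from statistics import mean, stdev


def standardized_change(baseline, later):
    """Mean change divided by the baseline SD (one pre/post variant of d)."""
    return (mean(later) - mean(baseline)) / stdev(baseline)


# Hypothetical group scores at intake and at a later session.
baseline_scores = [60, 70, 80, 90]
later_scores = [50, 65, 70, 85]
d = standardized_change(baseline_scores, later_scores)
print(round(d, 2))
```

With this sign convention a negative value indicates improvement (lower OQ-45 scores) and a positive value indicates deterioration, matching the direction of the effect sizes reported above.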

Discussion and Applications to Practice
Consistent with existing literature (Öst et al., 2012), the authors found evidence that psychotherapy provided by graduate trainees was generally associated with client improvement, as evidenced in the linear mixed model by improving scores on the Outcome Questionnaire (OQ 45.2) on a session-by-session basis, at a training clinic where weekly sessions constituted the general standard of care. The lack of variables moderating outcomes in the form of interaction effects was striking. Consistent with Whipple and Lambert (2011) and with Cooper (2008), we did not find an interaction effect between gender and improvement over time, but noted that women in this sample began with a higher (more distressed) mean baseline OQ score than men. Variables associated with steeper slopes as outcomes included V-code diagnoses (“problems in living”), a diagnosis of post-traumatic stress disorder (PTSD), and participation in dialectical behavior therapy (DBT).
The exploratory portion of this study's finding that trainees' clients did particularly well in DBT supports the findings in the study by Rizvi et al. (2017) at another university in a large urban center, where clients in trainee-led DBT reported similarly positive outcomes. This may reflect these clients beginning as more symptomatic than the rest of the clinic sample (with a first session mean score of 85 vs. 74 for those beginning DBT vs. those beginning more general individual outpatient therapy) and thus having more “room for movement” (i.e., a greater slope). Clients in DBT tended to present with more initial severity and, consistent with Forand et al. (2011), were therefore less likely, as a group, to recover (in the form of later scores under the clinical cut-off of 63 on the OQ 45.2). It may also point to the larger mean number of sessions, the benefits of group treatment approaches, and the benefits of manualized, defined approaches to treatment for trainees beginning their professional practice. In light of Duncan (2010) and Nyman, Nafziger, and Smith’s (2010) emphasis on the importance of supervision in preventing therapies from going off course, we think here similarly of the ways in which team supervision and oversight are built into the treatment approach DBT utilizes. It could similarly reflect the intensity of the intervention. Consistent with the literature (Hoffman et al., 2012), clients in this data set presenting with substance use disorders tended to have poorer outcomes. The smaller and positive slope in this data set for clients with a bipolar or bipolar-related diagnosis offers an opportunity for future research in that diagnosis-specific and empirically supported approaches are available, such as interpersonal and social rhythm therapy (Frank, 2008). Such approaches are often not widely practiced and may hold promise when offered in these more generalist, trainee settings.
We also acknowledge that there was a significant amount of missing data and small cell sizes in relation to these diagnostic categories and that this should temper any strong interpretations. However, this finding in relation to substance use disorders is consistent with findings from other published research on attrition in community clinics, such as Roseborough et al.’s (2016) study, where the diagnosis of a substance use disorder was associated with a three-fold elevation in the risk of premature attrition (a hazard ratio) in outpatient psychotherapy. It is also consistent with the findings of DiClemente (2018), in which the author acknowledges that while attempting change takes commitment and heightened motivation, recovery from addiction often requires even more heightened levels of motivation and commitment to change. Because this and other studies we reviewed point to increased effect sizes when examining the outcomes of particular disorder-specific treatments, these data may point to the potential value of substance-specific intervention approaches such as motivational interviewing and CRAFT (Meyers & Woulfe, 2003) that could be utilized by graduate trainees. Forand et al. (2011) similarly noted that effect sizes tend to strengthen among trainees when looking at specific diagnostic categories versus at treatment more broadly.
We regard the mean number of sessions at this clinic (M = 16 for N = 421 and M = 12.8 for n = 317) as a positive finding in that it is consistent with the range suggested by Kopta, Howard, Lowry, and Beutler (1994) of 11 or more sessions, which serves as a sort of critical point at which to expect clinically meaningful change (this is the session by which a significant portion of their sample had achieved clinically reliable change). The mean of 12.8 sessions for the 317 clients with more than one OQ score on record is also consistent with the number of sessions that Hansen, Lambert, and Forman (2002) observed as necessary for a client to make clinically significant change on the OQ-45.2.
The degree of attrition we found supports the findings of other studies identifying this risk among community clinicians and trainees (Duncan, 2010; Swift & Greenberg, 2012). We noted that among the 317 clients with a baseline score and at least one additional session, 102 had an OQ score at session 12. This rate is in keeping with other studies of ours of community mental healthcare clinics and may also reflect some of the unique characteristics of this clinic setting, including the role of cost-free services and the number of chronic stressors the clinic sample encounters. Some recent research is giving attention to the prevalence of adverse childhood experiences (ACEs) among outpatient medical and mental health clients (Barnett, Sheldrick, Liu, Kia-Keating, & Negriff, 2021). How to decrease attrition among trainees’ clients is a topic worth further exploration and is offered here as a recommendation for future research. We also acknowledge that the effectiveness design used in this study, evaluating open-ended treatments without definitive endings, did not allow us to look at a uniform posttest score, for instance, for those participating in DBT. This is a suggestion for future research as well in that it could offer a more precise effect size calculation. The degree of attrition we noted among DBT participants was comparable to the rate observed by Rizvi et al. (2017).
Perhaps the most significant finding concerns the immediate and often dramatic way in which clients who would go on to deteriorate (to worsen symptomatically in a clinically significant way, as measured on the OQ 45.2) did so early in these clinical relationships. The extent of these two divergent paths was a surprise finding. Both groups (those who would deteriorate vs. those who would go on to improve with clinical significance, signaled by an OQ 45.2 total score decreasing by 14 or more points) began with a similar intercept (baseline score at the first session) and yet two distinct pathways emerged early in these treatments. This divergence emerged both visually and statistically. The effect of this difference was pronounced. While the mean effect size for those who did not deteriorate was ES = −.43 (with lower scores representing less impairment), the mean effect size for those who went on to deteriorate was even stronger, at ES = .86: roughly twice as strong an effect. Clients who made clinically significant change had an effect size of −1.22. While clients’ starting score (their statistical intercept) did not predict the direction of their recovery, whether they worsened or improved in the first few sessions was potentially prognostic (others improve and leave). This all adds evidence to the assertion that clients benefit from close attention to these early trends.
This finding similarly lends support to the practice implication that Lambert (2012) and Duncan (2010) derived from similar findings. Both point to the importance of clinicians closely monitoring the trajectory their clients take early in treatment (improving, worsening or plateauing). This can take the form of asking about the clients’ subjective sense of their improvement or worsening and again makes the case for the value of using formal client feedback measures such as the OQ, OQ Analyst, and the PCOMS, all of which can be reviewed with clients throughout treatment and which, as sources of in vivo feedback, can lead to adjustments in the approaches used. We know of training clinics that not only recommend and use these measures but require interns to use them and to review them with both clients and supervisors in order to decrease the risk of early attrition or preventable deterioration. A broader, national openness to such measures is supported by Mours, Clark, Gathercoal, and Peterson (2009), who surveyed 407 American Psychological Association (APA)-approved training programs and found that while the minority (47%) of responding internship sites reported using these, the majority (79%) of clinic supervisors were in favor of their regular use as part of assessing client progress. The same study noted that survey respondents often overestimated the cost and accessibility of such measures.
This finding also reminds clinicians that not all clients will improve. Missing client deterioration can lead to serious consequences, including missed suicide risk. Jobes (2000) and others have noted that a significant percentage of outpatient mental health clinicians fail to actively monitor for changes in suicidal risk during the course of their treatment. The failure to note that a client is increasingly symptomatic could similarly prevent clinicians from noting risks as serious as suicide that emerge or worsen later in the course of a clinical relationship, particularly when clients are not improving and may become demoralized (Taylor, 2006).
Other client-administered measures such as the Cooper Norcross Inventory of Client Preferences (C-NIP) by Cooper and Norcross (2016) are available to clinicians. The C-NIP asks clients explicitly about their preferences for the nature and focus of their psychotherapy as a service, with the goal of improving the clinician’s attunement to things like how directive a therapy is, how much it focuses on past versus present, how much advice a clinician offers, and whether or not it includes assignments outside of sessions. Such measures can positively impact an initial alliance, one of the common factors most associated with a positive outcome in psychotherapy (Hubble et al., 1999).
This study raises some questions that can serve as topics for future research. While we were not able to address differences between individual clinicians as providers, Miller and Moyers (2021) and others have noted the significant variability in outcomes among individual providers. Miller et al. (2007) refer to the phenomenon of exceptional providers in their paper on pathways to clinical excellence, describing some clinicians as ‘super’ (p. 3). The authors of the present study partnered with a training clinic and made the collective decision not to compare disciplines (i.e., psychology vs. social work interns) in the spirit of supporting interdisciplinary collaboration versus fostering competition. However, authors such as Miller and Moyers (2021) are inviting clinicians and researchers to give attention to what have historically been referred to as therapist effects, not as something interfering with studying a treatment in its pure form, but as traits of effective clinicians that clinicians, supervisors and educators can understand as skills or stances to be taught, encouraged, and developed over time. Miller and Moyers call on those educating graduate clinicians to help trainees convey Rogerian virtues of acceptance, empathy, and congruence and have pointed to correlations with client outcome. Future studies could consider examining therapist effects within their larger data sets, and could look at the impact of levels of training (such as master-level versus doctoral trainees and length of practice experience) and of supervisory models on trainee competence and outcomes. This is a topic Nyman et al. (2010) have noted is beginning to receive attention in the broader literature.
We are particularly interested in and hopeful about future research on the impact of trainees (1) making regular and explicit use of feedback measures and (2) following the early treatment directions their clients take, examining the effect of each on both the risk of client attrition and the likelihood of eventual improvement. We see this as worthwhile in that existing research (Lambert, Hansen, & Finch, 2001) supports these as effective strategies (i.e., ways of routinely seeking feedback), broadly, in community practice and for trainees. We similarly observed that many studies in the literature examining trainee competence are discipline-specific. In light of how many practitioners go on to practice in interdisciplinary settings, we recommend sampling from interdisciplinary training clinics as this study did.
Finally, the authors offer this study as an example of some of the meaningful ways in which university researchers can partner with university and community clinicians to ask and begin to answer questions that are useful to clinicians and that can be leveraged to improve practice in some concrete and immediate ways. The use of the mixed model analysis as a statistical approach is one we recommend others consider in that, as Field (2008) notes, it is particularly well suited for the unbalanced designs that often characterize psychotherapy data (i.e., those with significant attrition over time and those with potential nested subgroups or populations within a population). The use of effect sizes is also valuable in that their computation is relatively straightforward and they allow comparisons across studies where we noted authors using quite varied behavioral health questionnaires and measures.
While clinicians can find it anxiety-producing to evaluate their own practice, clinicians and trainees are often reassured by and find it rewarding to witness the positive impact of their own practice in such tangible ways. It is not only clients who benefit from the use of feedback measures. Trainees can increase their sense of competence and confidence. Tools such as the OQ 45.2 are often brief and relatively reasonable to administer, and offer test-retest reliability (.84 in the case of the OQ) (Lambert, Gregersen, & Burlingame, 2004). Feedback measures such as the OQ 45.2 can be incorporated into trainees’ practice, offering real-time feedback about things as important as suicidal risk, violence, and substance harms, and the opportunity for client and clinician to “course correct” or to adjust their approach in response (Duncan, 2010; Lambert, 2007; Lambert, Whipple, & Kleinstauber, 2018; Whipple & Lambert, 2011). These tools can also inform and become a part of trainee supervision. Finally, we concur with Dyason et al. (2019), who, in their systematic review, call for more research from and coordination among independent counseling training centers, noting these community and university settings offer potentially rich sources of data when combined, and sources for potential collaboration. They can model research as a part of ongoing practice evaluation. All of these offer opportunities for training and community clinics to deepen and to enrich efforts at practice evaluation many have already begun.
Acknowledgments
The authors want to thank Dr. McLeod for sharing statistical knowledge and support and the training clinic for hosting this study and for sharing their work.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
