Abstract
The therapeutic alliance has been reliably associated with outcome across psychotherapies. We investigated the alliance–outcome relationship in the early sessions of cognitive behavioral therapy of depression using a model that disaggregates within- and between-persons variance while estimating the reciprocal relation between variables. We used this model in a combined data set from two studies totaling 191 patients. In our primary model, we found evidence for a predictive within-patients relationship between alliance and symptoms such that symptoms predicted regressed change in alliance and alliance predicted regressed change in symptoms. In a more conservative detrended model, these relationships were not significant. Given that (a) most of the variability in alliance scores is between patients, (b) the size of the alliance–outcome relationship is modest, and (c) the alliance–outcome relationship is not robust to detrending, our findings suggest that alliance plays at most a small role in improving patient outcomes in cognitive behavioral therapy of depression.
The therapeutic alliance is likely the most extensively studied psychotherapy process variable. Several meta-analyses have aggregated this body of research to generate estimates of the alliance–outcome relationship (Flückiger, Del Re, Wampold, & Horvath, 2018; Martin, Garske, & Davis, 2000), and evidence suggests significant heterogeneity in these estimates. However, insofar as many constituent studies have methodological limitations, the resulting estimates may be much less informative than desired. Recently developed analytic approaches to modeling reciprocal relations in panel data have several advantages for testing potential causal relations. Given the heterogeneity of estimates observed and the view that the role of the alliance may vary across treatments, it is important to test these models in a variety of samples (Muran & Barber, 2011). To date, only a few studies have examined the alliance and outcome using such a modeling approach, and these studies have tended to examine heterogeneous treatments for diagnostically diverse patient samples. Testing these models within specific treatments is particularly important given that some research has suggested that the alliance may play a less important role in cognitive behavioral therapy (CBT) than in treatment approaches in which the relationship is posited to play a more central role (Zilcha-Mano, Eubanks, & Muran, 2019). In this article, we report on our effort to model the reciprocal relation of the alliance and outcome in CBT specifically.
Desirable Methodological Features
Several important features are desirable when examining the relation of the alliance and outcome. Ideally, testing the impact of the alliance on outcome would include measurements that precede the symptom improvement these measures may predict (Feeley, DeRubeis, & Gelfand, 1999). If the predictor variable temporally precedes the dependent variable, any such relation is not likely to be accounted for by reverse causality. In some studies, the outcome variable has been the difference between a posttreatment measurement and a measurement at intake, whereas the alliance variable is measured midtreatment. In this case, a relation between the alliance and change in symptoms could be accounted for in several ways (i.e., the alliance causing outcome, a reverse-causal relation, or a third variable introducing a spurious relation).
Repeated assessments of the alliance and outcome offer multiple advantages. They increase statistical power and, with appropriately spaced assessments, allow researchers to examine the lag of interest. Although the optimal lag for these relationships is often not known, methodologists recommend choosing an assessment frequency that captures the lag the researchers hypothesize is operating (Ebner-Priemer & Trull, 2009). In the context of psychotherapy, session-to-session assessments are likely to provide a reasonably fine-grained characterization of the alliance over intervals that correspond to periods in which one might see symptom improvement (but see Eubanks, Muran, & Safran, 2018).
Repeated assessment also allows researchers to disaggregate within- and between-persons effects (Wang & Maxwell, 2015). When repeated measures are collected, any association identified may be driven by between-persons variation, within-persons variation, or both. Although between-persons associations might be due to a variety of stable personal characteristics or third variables, within-persons associations cannot be attributed to such stable characteristics. This approach was taken in some more recent studies assessing alliance and outcome (e.g., Fisher, Atzil-Slonim, Bar-Kalifa, Rafaeli, & Peri, 2016; Rubel, Rosenbaum, & Lutz, 2017), but the vast majority of the alliance–outcome literature consists of studies that did not disaggregate within- and between-persons variability. Such studies leave open the possibility that results are confounded by a stable patient characteristic that may influence both the predictor and the dependent variable (Allison, 2014; Curran & Bauer, 2011; Wang & Maxwell, 2015).
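The confound described above can be illustrated with a small simulation. The sketch below is hypothetical (the function names, effect sizes, and sample sizes are ours, not the study's): a stable person-level trait raises both a predictor and an outcome, while session-to-session fluctuations in the two are unrelated. Person-mean centering is used here as a simpler stand-in for the latent decomposition in the RI-CLPM; it is not the model used in this study.

```python
import numpy as np

def simulate_stable_confound(n_persons=2000, n_sessions=5, seed=0):
    """Hypothetical data: a stable trait drives both x and y; no within-person link."""
    rng = np.random.default_rng(seed)
    trait = rng.normal(0.0, 1.0, (n_persons, 1))               # stable patient characteristic
    x = trait + rng.normal(0.0, 1.0, (n_persons, n_sessions))  # e.g., a process score
    y = trait + rng.normal(0.0, 1.0, (n_persons, n_sessions))  # e.g., a symptom score
    return x, y

def raw_and_within_correlations(x, y):
    """Correlation of raw scores vs. correlation of person-mean-centered scores."""
    raw = np.corrcoef(x.ravel(), y.ravel())[0, 1]
    x_within = x - x.mean(axis=1, keepdims=True)  # remove each person's own mean
    y_within = y - y.mean(axis=1, keepdims=True)
    within = np.corrcoef(x_within.ravel(), y_within.ravel())[0, 1]
    return raw, within

x, y = simulate_stable_confound()
raw_r, within_r = raw_and_within_correlations(x, y)
print(f"raw r = {raw_r:.2f}, within-person r = {within_r:.2f}")
```

By construction, the raw correlation is expected to be near .50 even though the within-person correlation is expected to be near zero: the raw association reflects only the stable trait.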
Another desirable feature in research testing the alliance–outcome relationship with repeated measures is control for a lagged dependent variable. Insofar as there is either a true causal effect of a variable on itself at a later time point or a correlation between errors in the measurement of a variable across time, models need to account for a lagged dependent variable (Allison, 1990). Unfortunately, common approaches to including a lagged dependent variable, such as adding it as a predictor in a conventional mixed (multilevel) model, can be problematic. Allison (2015) recommended avoiding such an approach because it introduces “severe bias” into model estimates. Specifically, the coefficient for the lagged dependent variable is usually too large, and the other coefficients in the model are usually too small (Allison, 2015; Falkenström, Finkel, Sandell, Rubel, & Holmqvist, 2017).
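The direction of this bias can be reproduced in a hypothetical simulation (every parameter value below is an illustrative assumption). Scores depend on their own previous value, on a persistent time-varying predictor, and on a stable person effect; a pooled regression that includes the lagged outcome but ignores the stable effect overstates the autoregressive coefficient and understates the predictor's coefficient. The sketch only demonstrates the problem; it does not implement the RI-CLPM.

```python
import numpy as np

def simulate_panel(n_persons=2000, n_waves=10, burn_in=50,
                   rho=0.3, beta=0.5, phi=0.7, seed=1):
    """Hypothetical panel: y depends on its lag, a persistent x, and a stable person effect."""
    rng = np.random.default_rng(seed)
    total = burn_in + n_waves
    person_effect = rng.normal(0.0, 1.0, n_persons)
    x = np.zeros((n_persons, total))
    y = np.zeros((n_persons, total))
    for t in range(1, total):
        x[:, t] = phi * x[:, t - 1] + rng.normal(0.0, np.sqrt(1 - phi**2), n_persons)
        y[:, t] = (rho * y[:, t - 1] + beta * x[:, t] + person_effect
                   + rng.normal(0.0, 1.0, n_persons))
    # Drop the burn-in so both processes are close to their steady state.
    return x[:, burn_in:], y[:, burn_in:]

def pooled_lagged_regression(x, y):
    """OLS of y_t on an intercept, y_(t-1), and x_t, ignoring the person effect."""
    y_now = y[:, 1:].ravel()
    y_lag = y[:, :-1].ravel()
    x_now = x[:, 1:].ravel()
    design = np.column_stack([np.ones_like(y_lag), y_lag, x_now])
    coef, *_ = np.linalg.lstsq(design, y_now, rcond=None)
    return coef[1], coef[2]

x, y = simulate_panel()
rho_hat, beta_hat = pooled_lagged_regression(x, y)
print(f"autoregressive: true 0.30, estimated {rho_hat:.2f}")
print(f"predictor:      true 0.50, estimated {beta_hat:.2f}")
```

Under these settings the lagged coefficient should be substantially inflated and the predictor's coefficient attenuated; the exact sizes depend on the assumed parameters, especially the variance of the stable person effect and the persistence of x.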
Fortunately, some recent modeling approaches do allow one to include a lagged dependent variable without introducing bias. One such approach that has each of the desirable features we have described is Hamaker, Kuiper, and Grasman’s (2015) random intercept cross-lagged panel model (RI-CLPM). The RI-CLPM allows for examination of within-persons relationships among repeated measurements of variables. Like a traditional cross-lagged panel model, it allows for modeling the effect of X on Y while also modeling the effect of Y on X. Such a reciprocal model is appropriate for studying the alliance and outcome because experts have suggested that the alliance may contribute to outcome and that outcome may also contribute to the alliance. In fact, empirical findings of both of these relationships have often been reported (Flückiger et al., 2018; Webb, Beard, Auerbach, Menninger, & Björgvinsson, 2014). The RI-CLPM also separates between-persons from within-persons variation, which allows testing whether a within-persons relationship is evident. The RI-CLPM is useful for testing whether and how two variables are related and for determining the relative magnitude of each variable’s predictive relation with the other (Hamaker et al., 2015). Given these substantial advantages, we used the RI-CLPM to examine the reciprocal relationship between alliance and symptoms in CBT of depression.
Reciprocal alliance–outcome models were tested in several recent articles. The samples used in these articles often had a diverse range of presenting problems and included a variety of treatments. In three articles, researchers examined these models in samples of patients who varied widely in their presenting problems (including depression, anxiety, substance use disorders, and psychosis) and in the treatments they received (Falkenström, Granström, & Holmqvist, 2014; Falkenström, Kuria, Othieno, & Kumar, 2019; Xu & Tracey, 2015). We are aware of only one study that focused on depression treatments. In that study, Zilcha-Mano, Dinger, McCarthy, and Barber (2014) found support for a reciprocal alliance–outcome relationship in a sample of depressed patients who received supportive-expressive psychotherapy, clinical management combined with pharmacotherapy, or clinical management combined with placebo. However, it is important to investigate the alliance–outcome relationship in specific treatments and patient populations. Several researchers have suggested that the importance of the alliance may vary across treatments depending on the extent to which the therapist–client relationship is central in each treatment (e.g., Siev, Huppert, & Chambless, 2009). There is also evidence that the alliance–outcome relationship varies as a function of patient characteristics (Lorenzo-Luaces, DeRubeis, & Webb, 2014; Zilcha-Mano & Errázuriz, 2015). For example, this relationship appears stronger among depressed patients with fewer prior episodes (Lorenzo-Luaces et al., 2014). In a study that separated within- and between-patients variation among patients receiving psychotherapy in a general mental health clinic in Chile, Zilcha-Mano and Errázuriz (2015) found that the within-patients alliance–outcome relationship was stronger among patients with greater symptom severity.
Such findings suggest it will be important to apply analytic approaches with desirable features to patients participating in different treatments and presenting with different clinical problems.
To our knowledge, this article is the first to report an examination of a reciprocal alliance–outcome model for CBT of depression. We hypothesized that there would be a significant reciprocal relationship between alliance and symptoms in which alliance significantly predicts subsequent regressed change in symptoms and symptoms significantly predict subsequent regressed change in alliance when these relations are modeled together.
Method
Participants
We drew from two samples of patients participating in CBT of depression. Both samples were of patients with a primary diagnosis of major depressive disorder (MDD) as assessed by the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders–IV (First, Spitzer, Gibbon, & Williams, 2002). The first sample was composed of 66 patients (see Adler, Strunk, & Fazio, 2015), and the second sample consisted of 125 patients (see Schmidt, Pfeifer, & Strunk, 2019). For each data set, one additional participant could not be included because they discontinued after their intake and before participating in any sessions. For both samples, inclusion criteria were having current MDD, being 18 years of age or older, and providing informed consent. Exclusion criteria were having a diagnosis of bipolar I disorder, history of psychosis, presence of a primary diagnosis other than MDD that necessitated treatment other than that being offered, current suicide or self-harm risk precluding outpatient treatment, substance dependence in the past 6 months, or clear indication of secondary gain. The first study also planned to exclude patients with an IQ below 80 (with testing occurring only when clinically indicated), but no patients were excluded on this basis. The second study also excluded patients with a diagnosis of bipolar II disorder. Combining these samples yielded a total of 191 patients.
In the combined sample, most participants were White (82%), and the largest remaining groups were African American (8%) and Asian American (7%); the modal patient had completed some college (39%). The majority of participants were women (58%), and the mean age of our sample was 33.5 years (SD = 13.3, range = 18–70 years). All participants provided consent, and study procedures were approved by our institutional review board.
Therapists
Therapists were nine graduate students (five men and four women) under the supervision of a licensed clinical psychologist (D. R. Strunk). Therapists were pseudorandomly assigned to patients on the basis of openings in each therapist’s caseload, and the patient’s intake assessor was not permitted to serve as the patient’s therapist. The study protocol called for twice-weekly therapy sessions for the first 4 weeks; after that, the interval between sessions was collaboratively set as weekly or biweekly. In the final 4 weeks of treatment, sessions were held once weekly.
Depression severity
The Beck Depression Inventory–II (BDI-II; Beck, Steer, & Brown, 1996) is a 21-item self-report measure of depressive symptom severity that participants completed before each session. The BDI-II is a widely used measure with excellent psychometric properties (Beck et al., 1996).
Working alliance
The short form of the Working Alliance Inventory–Client Version (WAI; Horvath & Greenberg, 1989; Tracey & Kokotovic, 1989) consists of 12 items rated on a 7-point Likert-type scale. The WAI shows excellent and robust psychometric properties (Horvath & Greenberg, 1989). Participants filled out this measure after each of their first four sessions.
Analytic strategy
We used an RI-CLPM implemented in Mplus (Version 8.3; Muthén & Muthén, 2017). Although BDI data were available at every session, we collected WAI data only at the end of the first four sessions; therefore, our model contains four observations of WAI and five observations of BDI, with WAI at Session 4 predicting BDI at Session 5. In the RI-CLPM, observed variables are parsed into within- and between-persons components. This is illustrated in Figure 1: The top of the figure shows the observed BDI scores, each of which was predicted by the latent random intercept of BDI (the between-persons component). Each observation of BDI was also predicted by its own latent within-persons variable. Observed WAI scores were treated similarly. At the within-persons level, we modeled the autoregressive effects of within-persons BDI and within-persons WAI. In addition, the within-persons variables were correlated at the first session, whereas their error terms were correlated at Sessions 2 through 4. The primary effects of interest are the cross-lagged associations between within-persons BDI and within-persons WAI. Within-persons BDI at a given session predicts within-persons WAI at the end of the same session, whereas within-persons WAI predicts within-persons BDI at the start of the next session. We refer to these cross-lagged parameters as predicting regressed change in BDI and WAI because we controlled for the previous values of these variables (see Cohen, Cohen, West, & Aiken, 2003; Hamaker et al., 2015). We did not assess change using difference scores.
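For readers who prefer equations, the structure described above can be written roughly as follows. The notation is ours, and the timing conventions (a path from WBDI to WWAI within Sessions 2 through 4, with the Session 1 within-persons components allowed to covary instead) are our reading of the description and of Figure 1, not a transcription of the authors' Mplus input. Coefficients without a session subscript reflect the equality constraints; in the initial model the BDI grand mean was also held equal across sessions, whereas in the primary model it was estimated freely at each session.

```latex
\begin{aligned}
\mathrm{BDI}_{it} &= \mu^{B}_{t} + \mathrm{RI}^{B}_{i} + \mathrm{WBDI}_{it}, && t = 1,\dots,5,\\
\mathrm{WAI}_{it} &= \mu^{W} + \mathrm{RI}^{W}_{i} + \mathrm{WWAI}_{it}, && t = 1,\dots,4,\\
\mathrm{WBDI}_{i,t+1} &= \alpha_{B}\,\mathrm{WBDI}_{it} + \beta\,\mathrm{WWAI}_{it} + u_{i,t+1}, && t = 1,\dots,4,\\
\mathrm{WWAI}_{it} &= \alpha_{W}\,\mathrm{WWAI}_{i,t-1} + \gamma\,\mathrm{WBDI}_{it} + v_{it}, && t = 2,3,4,\\
\operatorname{Cov}(u_{it}, v_{it}) &= \psi, && t = 2,3,4.
\end{aligned}
```

Here the random intercepts RI^B and RI^W carry the stable between-persons differences and are allowed to covary, WBDI_{i1} and WWAI_{i1} covary freely, β is the effect of within-persons WAI on regressed change in BDI, and γ is the effect of within-persons BDI on regressed change in WAI.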

Figure 1. The random intercept cross-lagged panel model (RI-CLPM). BDI = Beck Depression Inventory–II; WAI = Working Alliance Inventory; RI = random intercept. “W” before a variable name denotes the within-persons component of that variable. Error terms are represented by u and v. This diagram shows the decomposition of observed BDI and WAI scores into within- and between-persons components in the RI-CLPM and the standardized estimates of the effect of WBDI on WWAI and WWAI on WBDI at each time point.
The initial model was run with the following parameters constrained to be equal: the grand means of BDI and WAI, the factor loadings of the latent variables (which were constrained to 1), the autoregressive parameters for within-persons WAI and for within-persons BDI, the parameters predicting within-persons BDI from within-persons WAI, the parameters predicting within-persons WAI from within-persons BDI, and the covariances between the residuals of the within-persons variables.
The estimated relation between two variables may be inflated if both change linearly over time (Wang & Maxwell, 2015). Because we observed linear trends in BDI and WAI over time, we also evaluated a model that accounted for these trends (i.e., a detrended model). Detrended models may increase the risk of Type II error when removing variability related to the linear trends also removes some of the variability in the relationship the model is intended to estimate. Therefore, we present the results of both our primary model without detrending and a detrended model (see Falkenström, Solomonov, & Rubel, 2020). The detrended model followed the modeling approach described by Curran, Howard, Bainter, Lane, and McGinley (2014). This model, the latent curve model with structured residuals, is equivalent to our RI-CLPM but adds a latent slope factor predicting WAI and a latent slope factor predicting BDI. The parameters that were constrained to be equal in the initial RI-CLPM were also constrained to be equal in the detrended model, with the exception of the grand means, which must be allowed to vary freely for the model to be detrended (for example code, see the Supplemental Material available online).
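Why shared linear trends can inflate a within-person association is easiest to see in a hypothetical simulation (all numbers below are illustrative assumptions). Symptoms tend to fall and alliance tends to rise across five sessions, and patients differ in how steep these trajectories are, but session-level fluctuations in the two scores are independent. Person-mean centering leaves the trends in place, whereas removing each person's own linear trend removes the spurious association. This per-person least-squares detrending is a simplified stand-in for the latent slope factors of the latent curve model with structured residuals, not that model itself.

```python
import numpy as np

def simulate_trending_scores(n_persons=3000, n_sessions=5, seed=2):
    """Hypothetical scores with correlated individual trends and independent noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_sessions, dtype=float)
    slope_factor = rng.normal(0.0, 1.0, (n_persons, 1))   # shared individual difference
    bdi = (rng.normal(25.0, 5.0, (n_persons, 1))          # person-specific level
           + (-1.0 + 0.5 * slope_factor) * t              # symptoms tend to decline
           + rng.normal(0.0, 1.0, (n_persons, n_sessions)))
    wai = (rng.normal(60.0, 8.0, (n_persons, 1))
           + (0.6 - 0.3 * slope_factor) * t               # alliance tends to rise
           + rng.normal(0.0, 1.0, (n_persons, n_sessions)))
    return t, bdi, wai

def person_center(scores):
    """Subtract each person's mean across sessions."""
    return scores - scores.mean(axis=1, keepdims=True)

def person_detrend(scores, t):
    """Residuals from each person's own least-squares line over sessions."""
    design = np.column_stack([np.ones_like(t), t])
    hat = design @ np.linalg.pinv(design)                 # sessions x sessions projection
    return scores - scores @ hat.T

def pooled_r(a, b):
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

t, bdi, wai = simulate_trending_scores()
print("centered r:", round(pooled_r(person_center(bdi), person_center(wai)), 2))
print("detrended r:", round(pooled_r(person_detrend(bdi, t), person_detrend(wai, t)), 2))
```

The centered correlation should be clearly negative even though no session-level link exists, while the detrended correlation should be near zero. If a true session-level link had been simulated, detrending could also remove part of it, which is the Type II error risk noted above.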
Results
Correlations among variables at each session are provided in Table 1. These are raw correlations, reflecting the combination of within- and between-patients variability. BDI and WAI were not significantly related to each other at Sessions 1 and 2 but showed a small yet significant correlation at Sessions 3 and 4. Means and standard deviations for BDI and WAI at each session are also reported in Table 1. We also calculated intraclass correlation coefficients (ICCs) to determine the level of between-patients variability in scores. The remaining variation is either within patients or error. Most of the variance for both WAI total scores and BDI-II scores was between persons (ICCs = .79 and .75, respectively).
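An ICC of this kind is the share of total variance that lies between persons. The estimator used for the ICCs reported here is not stated; the sketch below assumes a complete, balanced persons × sessions array and uses the classic one-way random-effects ANOVA estimator, ICC(1). Real session data with missing observations would call for a multilevel model instead. The simulated check uses made-up variance components chosen so that the population ICC is .79.

```python
import numpy as np

def icc1(scores):
    """One-way random-effects ICC(1) for a complete persons x sessions array."""
    n_persons, k = scores.shape
    person_means = scores.mean(axis=1)
    grand_mean = scores.mean()
    ms_between = k * np.sum((person_means - grand_mean) ** 2) / (n_persons - 1)
    ms_within = np.sum((scores - person_means[:, None]) ** 2) / (n_persons * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical check: between-person variance .79, within-person variance .21.
rng = np.random.default_rng(3)
person_level = rng.normal(0.0, np.sqrt(0.79), (2000, 1))
scores = person_level + rng.normal(0.0, np.sqrt(0.21), (2000, 4))
print(round(icc1(scores), 2))  # expected to be near 0.79 by construction
```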
Table 1. Means, Standard Deviations, and Correlations Among Beck Depression Inventory and Working Alliance Inventory Scores
Note: Sample sizes range from 172 to 190. BDI = Beck Depression Inventory; WAI = Working Alliance Inventory.
On average, patients reported an improvement in BDI-II scores of 5.37 points (SD = 8.82) between Sessions 1 and 5, t(170) = 7.96, p < .001, d = −0.6. Patients also reported an increase in WAI scores of 2.1 points (SD = 7.76) from Session 1 to Session 4, t(168) = −3.5, p < .001, d = 0.28.
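The reported t statistics can be checked from the summary statistics alone. The sketch assumes that d was computed as the mean change divided by the standard deviation of the change scores (d_z) and infers each n from the reported degrees of freedom (n = df + 1); the signs of t and d depend only on which session is subtracted from which.

```python
import math

def paired_t_from_summary(mean_change, sd_change, n):
    """Paired-samples t, its df, and d_z (mean change / SD of change)."""
    se = sd_change / math.sqrt(n)
    t = mean_change / se
    d_z = mean_change / sd_change
    return t, n - 1, d_z

# BDI-II: mean drop of 5.37 (SD 8.82); n = 171 inferred from the reported df of 170.
t_bdi, df_bdi, d_bdi = paired_t_from_summary(5.37, 8.82, 171)
print(f"BDI-II: t({df_bdi}) = {t_bdi:.2f}, d_z = {d_bdi:.2f}")
# WAI: mean gain of 2.1 (SD 7.76); n = 169 inferred from the reported df of 168.
t_wai, df_wai, d_wai = paired_t_from_summary(2.1, 7.76, 169)
print(f"WAI: t({df_wai}) = {t_wai:.2f}, d_z = {d_wai:.2f}")
```

The BDI-II values reproduce the reported t and d to rounding. The WAI values give t ≈ 3.52 and d_z ≈ 0.27, close to the reported figures; the small gap in d may reflect rounding of the reported mean change.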
RI-CLPM: modifications
The initial model did not fit well: χ²(34, N = 191) = 128.23, p < .001, root mean square error of approximation (RMSEA) = 0.12, 90% confidence interval (CI) = [0.10, 0.14], probability RMSEA ≤ 0.05 = 0.00, comparative fit index (CFI) = .94, standardized root mean square residual (SRMR) = .10, Akaike information criterion (AIC) = 10,897.27. Therefore, as suggested by modification indices (MIs; with the minimum MI value set to 5), all of the grand means of BDI were freed (MI value for BDI1 = 24.76; MI value for BDI5 = 6.37). This modification is consistent with our finding that BDI scores changed over time. Freeing the means of the observed variables is one of the changes required for detrending a model; it allows the model to separate fluctuations in the group means from the cross-lagged parameters. This model accounts for changes in the overall means of the variables, whereas a detrended model adjusts for individual-specific trajectories (Falkenström et al., 2019). The resulting model fit better: χ²(30, N = 191) = 73.01, p < .001, RMSEA = 0.09, 90% CI = [0.06, 0.11], probability RMSEA ≤ 0.05 = 0.01, CFI = .97, SRMR = .07, AIC = 10,850.06. A χ² difference test also indicated that this model fit the data better: Δχ²(4, N = 191) = 55.22, p < .001. In this test, the null hypothesis is that the simpler model (the model with more degrees of freedom) fits as well as the more complex model; a low p value suggests that the simpler model should be rejected in favor of the more complex one. Therefore, we rejected the initial model in favor of the model with the grand means of BDI estimated freely.
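The difference test and the RMSEA values can be recomputed from the reported χ² statistics. The sketch assumes plain maximum likelihood χ² values; under a robust estimator such as Mplus's MLR, a scaled (Satorra–Bentler-type) difference test would be needed instead of a simple subtraction. To stay within the standard library, the p value uses the closed-form upper tail of a χ² distribution, which holds only for an even number of degrees of freedom. The RMSEA formula below is one common formulation; some programs use N rather than N − 1, which makes no difference at this sample size after rounding.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even number of df")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j  # accumulates (x/2)^j / j!
        total += term
    return math.exp(-half) * total

def chi_square_difference(chisq_restricted, df_restricted, chisq_free, df_free):
    """Chi-square difference test between nested models (ML chi-squares assumed)."""
    delta = chisq_restricted - chisq_free
    delta_df = df_restricted - df_free
    return delta, delta_df, chi2_sf_even_df(delta, delta_df)

def rmsea(chisq, df, n):
    """RMSEA point estimate using n - 1 in the denominator."""
    return math.sqrt(max(chisq - df, 0.0) / (df * (n - 1)))

delta, delta_df, p = chi_square_difference(128.23, 34, 73.01, 30)
print(f"delta chi2({delta_df}) = {delta:.2f}, p = {p:.1e}")
models = [("initial", 128.23, 34), ("freed BDI means", 73.01, 30), ("detrended", 40.23, 20)]
for label, chisq, df in models:
    print(label, round(rmsea(chisq, df, 191), 2))
```

This reproduces the reported Δχ²(4) = 55.22 and, after rounding, the reported RMSEA values of 0.12, 0.09, and 0.07.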
RI-CLPM: results
Within-persons WAI significantly predicted within-persons BDI, with an increase of 1 point in within-persons WAI predicting a 0.31 decrease in within-persons BDI (SE = 0.05, 95% CI = [−0.40, −0.22], p < .001). The standardized estimates ranged from −0.28 to −0.34; these estimates vary slightly because each standardized estimate is calculated using the standard deviations of the predictor and dependent variable, which change across sessions. Regressed change in BDI also predicted regressed change in WAI, with a 1-point increase in within-persons BDI predicting a 0.62 decrease in within-persons WAI (SE = 0.19, 95% CI = [−0.94, −0.30], p = .001). Standardized estimates ranged from −0.54 to −0.64.
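The reported p values are consistent with Wald z tests (estimate divided by its standard error), and the session-specific standardized estimates follow from rescaling a single constrained coefficient by session-specific standard deviations. The sketch assumes normal-theory Wald tests and uses the rounded estimates and standard errors from the text, so its p values match the reported ones only approximately. The standard deviations in the last line are made-up placeholders, not values from this study.

```python
import math

def wald_test(estimate, se):
    """Wald z statistic and two-sided normal p value."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

def standardize(b, sd_predictor, sd_outcome):
    """Rescale an unstandardized slope into SD units of predictor and outcome."""
    return b * sd_predictor / sd_outcome

for label, est, se in [("WAI -> BDI", -0.31, 0.05), ("BDI -> WAI", -0.62, 0.19)]:
    z, p = wald_test(est, se)
    print(f"{label}: z = {z:.2f}, p = {p:.4f}")

# Hypothetical SDs, for illustration only: the same b yields different
# standardized values when the outcome SD differs across sessions.
print([round(standardize(-0.31, 2.0, sd_y), 3) for sd_y in (3.0, 4.0)])
```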
Detrending: model details and results
The detrended model had acceptable fit: χ²(20, N = 191) = 40.23, p = .005, RMSEA = 0.07, 90% CI = [0.04, 0.11], probability RMSEA ≤ 0.05 = 0.12, CFI = .99, SRMR = .07, AIC = 10,837.28. Regressed change in within-persons WAI did not significantly predict regressed change in within-persons BDI (estimate = −0.00, SE = 0.08, 95% CI = [−0.14, 0.13], p = .96). Regressed change in BDI also did not predict regressed change in WAI (estimate = 0.38, SE = 0.43, 95% CI = [−0.32, 1.08], p = .37).
Discussion
When we tested a reciprocal model of the alliance–outcome relationship without detrending, we found that the alliance predicted subsequent change in symptoms and that symptoms predicted subsequent change in the alliance. The model we used allowed us not only to examine reciprocal relationships between our variables but also to estimate these relations using within-persons variability specifically. Furthermore, our model allowed us to control for prior assessments of depressive symptoms. However, in a detrended model, we did not observe significant relationships between the alliance and symptom change. The alliance–outcome relationship we observed was modest, and much of the variance in alliance scores appeared attributable to patient factors. These findings are consistent with the view that, on average, the alliance has a limited impact on outcomes in CBT for depression. That the relationship we identified was not robust to detrending is further reason for caution.
At first glance, it may appear that similar estimates to ours are available in the meta-analytic literature. When one looks at all available studies of the alliance–outcome relationship without regard for their methodological features, the average relationship appears to be about .28 (Flückiger et al., 2018). However, model differences are substantial enough to raise questions about the extent to which these estimates can be meaningfully compared. Our estimate reflects the portion of the alliance that is not explained by stable between-persons differences. Meta-analytic estimates have been based largely on raw scores, driven by a combination of within- and between-persons variability, and thus are fundamentally different from estimates that use only within-persons variability.
This makes it especially important to consider how our RI-CLPM estimate of the alliance–outcome relationship compares with other findings using the same analytic approach. When estimating the relation of alliance and subsequent symptoms using the RI-CLPM, Falkenström et al. (2019) reported standardized estimates ranging from 0.16 to 0.21. Our estimates of a similar relationship ranged from 0.27 to 0.32, reflecting a slightly stronger alliance–outcome relationship. Our sample was composed exclusively of patients with depression, and our outcome measure was focused on depressive symptoms specifically. In contrast, Falkenström and colleagues’ sample was composed of a wider range of disorders and included a variety of psychotherapies. Despite these differences, the results of their detrended model mirrored ours: They found that alliance did not predict symptom change and that symptom change did not predict alliance. This raises the question of whether the alliance and outcome may appear related only because both variables display linear change and not because of a causal relationship between the two. An alternative possibility is that detrending our data was an overly conservative approach that may have resulted in Type II error. The results of our detrended model, although not conclusive, cast doubt on the robustness of the within-persons alliance–outcome relationship in the context of CBT of depression. When considering the magnitude of the effects we identified and the degree to which the alliance appeared to be attributable to patient factors, our findings leave us skeptical that variability in the alliance contributes substantially to the therapeutic benefits of CBT of depression.
The amount of within-persons variability in the alliance was considerably smaller than the portion of between-patients variability (79% of variance was between-patients). The variability in the therapeutic alliance was largely accounted for by differences between patients, who had stable tendencies in their alliance scores. This finding is consistent with previous research finding that 67% to 69% of the variance in alliance scores on two subscales of the WAI was attributable to between-patients differences in a trial of CBT of depression (Sasso, Strunk, Braun, DeRubeis, & Brotman, 2016). Our estimates are somewhat higher than those of Falkenström, Granström, and Holmqvist (2013), who found that 54% of the variance observed in the patient-reported WAI was between-patients variability among patients participating in psychotherapy in a primary care setting in Sweden. Within- and between-patients variability in alliance scores is not frequently reported in the alliance research, limiting our ability to assess how our estimates compare with other studies. Regardless, in the studies we identified that did report it, the majority of variability in alliance was between-patients. Barber and colleagues (2014) provided a particularly powerful demonstration of the patient’s role in determining the alliance. They found that patients’ pretreatment expectations for the alliance strongly predicted alliance ratings after patients met their therapists and began treatment (rs ≈ .50). Thus, at least in some samples, patient differences appear to account for a considerable portion of the variation in alliance scores.
Our findings suggest that the alliance is determined by patients to a large degree, that the magnitude of the alliance–outcome relationship is only modest, and that reciprocal modeling of the alliance and outcome supports the alliance as a predictor of outcome only when the model does not include detrending. Efforts to foster the alliance remain quite popular (see Muran & Barber, 2011). Although we are aware of no arguments against fostering a positive alliance, our findings lead us to conclude that other variables, or a more complex mix of factors, are likely needed to provide a more satisfactory account of the determinants of patients’ therapeutic outcomes. However, these inferences are based on an analysis of the early sessions of a naturalistic study of CBT for depression. It may be that the alliance plays a more important role among some patients or when a more diverse set of intervention approaches is used (Hofmann & Hayes, 2019; Zilcha-Mano, 2018).
Limitations
There are several limitations worthy of note. First, although our model has a number of positive methodological features, even such complex models do not establish a causal relationship with the certainty of experimental designs. Second, the time between measurements varied; the time from a symptom assessment to the next alliance assessment was about 1 hr (i.e., from the beginning to the end of a session), whereas the time from an alliance assessment to the next symptom assessment was several days (i.e., at the following session). Although we think our time lags are reasonable, it is often difficult to know which time lags are most appropriate for the effects of interest. Third, we modified our model to improve fit. Such modifications come with a risk of capitalizing on chance associations and generating sample-specific findings. Although we tried to be thoughtful in our model modifications, we cannot rule out this possibility. Finally, it is important for future research to explore whether relations of the alliance and outcome vary across different cultural contexts.
Conclusion
We found support for a reciprocal model of the alliance–outcome relationship in which alliance change predicts symptom change and symptom change predicts alliance change during early sessions of CBT of depression. Our findings were obtained in a model with a number of methodologically desirable features. Nonetheless, the alliance is to a large degree determined by patients, the magnitude of the alliance–outcome relationship is only modest, and support for the reciprocal alliance–outcome relationships was limited to the model that did not account for linear change in these variables. We encourage further research examining whether there are particular patients for whom the alliance may be more consequential.
Supplemental Material
Supplemental material, sj-pdf-1-cpx-10.1177_2167702620959352 for Reevaluating the Alliance–Outcome Relationship in the Early Sessions of Cognitive Behavioral Therapy of Depression by Megan L. Whelen, Samuel T. Murphy and Daniel R. Strunk in Clinical Psychological Science
Footnotes
Transparency
Action Editor: Stefan G. Hofmann
Editor: Kenneth J. Sher
Author Contributions
M. L. Whelen, S. T. Murphy, and D. R. Strunk developed the study concept. Data collection was completed by D. R. Strunk. M. L. Whelen performed the data analysis and interpretation with S. T. Murphy both under the supervision of D. R. Strunk. M. L. Whelen and S. T. Murphy drafted the manuscript, and D. R. Strunk provided critical revisions. All of the authors approved the final manuscript for submission.
References
