Abstract
The feasibility of using a randomized design in a psychoanalytic outcome study was evaluated. Our hypothesis was that it would be feasible to randomize patients to psychoanalysis three or four times weekly on the couch for five years, supportive expressive therapy once or twice weekly for up to forty sessions, and cognitive behavior therapy once or twice weekly for up to forty sessions. Successful randomization was defined as a 30% recruitment rate among eligible patients. Recruitment began in September 2009 and closed in April 2010. A total of 132 subjects responded to study advertisements, 107 of whom (81%) were triaged out. The remaining 25 were scheduled for the first of two clinical interviews, and 21 of 25 (84%) completed the interview. Eleven of the 25 (44%) were determined to be eligible based on inclusion and exclusion criteria. Eight of the 11 accepted the idea of randomization and completed the diagnostic assessment phase. Calculated on the basis of 8 of 11 eligible patients accepting randomization, the 95% confidence interval was that 39% to 92% of eligible subjects would participate in a larger study of this design. Our findings support the feasibility of implementing an RCT comparing psychoanalysis as defined by the American Psychoanalytic Association (three or four times weekly on the couch for approximately five years) with shorter-term dynamic or cognitive behavioral therapy once or twice a week. Pre-treatment characteristics of these eight patients are presented, as are initial reliability data for the treatment adherence scales used in this trial.
In the past decade there has been considerable interest in comparing the effectiveness of psychoanalysis with that of shorter-term dynamic and cognitive behavioral treatments of lower frequency, often evidence-based. The execution of a number of methodologically rigorous studies (Grande et al. 2006; Huber, Klug, and von Rad 2002; Knekt et al. 2011; Sandell et al. 2000) has stimulated active debate within the scientific community about how best to design such studies (Fonagy et al. 2002; Gabbard, Gunderson, and Fonagy 2002; Leichsenring 2004). When designing an outcome study, one has to make a series of choices about study design. Each design is different, bringing different advantages and also potential disadvantages. When choosing a design, investigators take into account three questions: level of evidence afforded by the design (How persuasive are the results?); degree of generalizability of the findings (Are the patients and the treatment comparable to the actual clinical situation?); and feasibility of implementation (Can patients be recruited and the study executed?).
Randomization in Psychoanalytic Outcome Studies
Ideally one would like to design a study with a high level of evidence that is both highly feasible and highly generalizable. However, it is often maintained (see, e.g., Barber 2009; Leichsenring 2004; Sandell, Blomberg, and Lazar 1997; Westen, Novotny, and Thompson-Brenner 2004) that a high level of evidence in an outcome study is achieved at the expense of generalizability and feasibility. These trade-offs are crucial in deciding whether to use randomization of treatment assignment in a psychoanalytic outcome study.
Despite the high value placed on randomized designs by the larger scientific community, analytic researchers have raised concerns about them, some scientific and others pragmatic (Fonagy et al. 2002; Grande et al. 2006; Green 1996; Leichsenring 2004; Sandell 2001). Consequently, most psychoanalytic researchers have not made use of randomization in designing outcome studies, instead choosing naturalistic study designs in which patients either are assigned to a treatment on the basis of clinical evaluation or select their treatment themselves.
A concern raised by some analytic researchers about randomization stems from the view that analysis is not simply a treatment, analogous to CBT or medication or even dynamic therapy, but rather is a unique experience embedded in a unique relationship. Self-selection, analyst-patient match, and a high level of patient motivation and analyst conviction, it is argued, are intrinsic to analytic treatment. Randomization, because it disrupts or prevents development of these essential components, is considered incompatible with conducting a psychoanalytic treatment. As a result, these researchers conclude, a priori, that RCTs can tell us little about psychoanalysis.
Even researchers who favor randomization recognize that RCTs may have limitations with respect to generalizability and feasibility. With respect to the first, the question is whether patients who are willing to participate in an RCT are representative of patients being treated in clinical practice. If the two patient groups are significantly different with respect to unmeasured or perhaps even unknown variables, an RCT will tell us little about the effectiveness of a treatment in patients outside the research setting. In the case of analysis, patient factors such as motivation for analysis and the expectation of benefiting from the treatment, along with therapist-patient match and therapist conviction that analysis is the best treatment for this patient, are thought to characterize analytic treatments outside the research setting, and whether the effectiveness of analysis depends to any significant degree on these attitudinal factors is unknown. A randomized clinical trial in which patients seeking one therapy are randomly assigned to one of several treatments and one of several study therapists, with no attention to motivation or match, is clearly different from the conditions of psychoanalysis as practiced in the ordinary clinical situation.
As for feasibility, some researchers contend that a study that randomizes patients to analysis and shorter-term, less intensive therapies is infeasible because of patient preference and the refusal of too many patients to accept assignment to analysis. The concern is that patients who want analysis want only analysis, and that those who do not expressly want it will not accept assignment to a treatment conducted three or four times weekly for four or more years.
Thus, many psychoanalytic researchers believe that RCTs, the study design generally agreed to provide the highest level of evidence when comparing active treatments, are difficult to implement at best, and may produce findings that are poorly generalizable to clinical practice in the community. It should be noted, however, that while these beliefs about RCTs may be strongly held, they have yet to be tested; there is little empirical evidence to back up these assumptions. For example, while it is often stated that patients who do not expressly want psychoanalysis will not accept randomization to this long and intensive treatment, this belief has not been studied empirically. Moreover, some of the beliefs commonly held by analytic researchers regarding randomization are difficult to test. For example, given that there are no consensually held clinical criteria for recommending psychoanalysis, and very limited data on patients who enter psychoanalysis in clinical practice, it is not possible to determine whether the patient sample entering an RCT of psychoanalysis is representative of the actual clinical population.
Review of Prospective Psychoanalytic Outcome Studies
Four prospective outcome studies have been completed that compare analysis to psychotherapy: the Stockholm Outcome of Psychoanalysis and Psychotherapy Project (STOPPP: Sandell et al. 2000), the Heidelberg-Berlin Study (Grande et al. 2006), the Helsinki Psychotherapy Outcome Study (Knekt et al. 2011), and the Munich Psychotherapy Study (MPS; Huber, Klug, and von Rad 2002). Only one of these, the MPS, employed a randomized design. These prospective studies paid close attention to the challenges of study design, with the goal of obtaining an acceptably high level of evidence with regard to differential outcome while balancing the demands of feasibility and generalizability. STOPPP made treatment assignment by patient preference and made no attempt to ensure relative equivalence between the psychotherapy and psychoanalytic cohorts. In the Heidelberg-Berlin Study, investigators attempted to match patients in the analytic and therapy cells with respect to age, sex, education, and personality organization. In the Helsinki study, the analytic and therapy patients met clearly specified inclusion and exclusion criteria; group differences were taken into account in a statistical model to enhance comparison of outcomes.
STOPPP was a large, naturalistic outcome study comparing psychoanalysis three to five times weekly on the couch with dynamic therapy once or twice a week. The sample consisted initially of 756 people in subsidized treatment or on waiting lists for treatment; data collection was completed and analyzed for 331 patients in psychotherapy and 74 in psychoanalysis. Treatment assignment was made on the basis of patient preference (i.e., patients selected whether they wanted to be on a waiting list for analysis or on one for therapy), and no attempt was made to match patients in the two cells. The most striking finding was that after a decrease in symptomatology, measured during treatment by the Symptom Check List–90 (SCL-90), among patients in both psychoanalysis and psychotherapy, only those receiving psychoanalysis continued to improve after termination. The investigators maintain that one benefit of this design is that self-selection provides a more “ecologically valid” (i.e., generalizable) comparison than would a different approach to treatment assignment. However, patient preference may be linked to treatment outcome, either directly or indirectly via other moderators (e.g., years of education, number of previous treatments), and therefore also limits the validity of comparing treatment outcomes (Sandell, Blomberg, and Lazar 1997).
The Heidelberg-Berlin study, a naturalistic, quasi-experimental comparative outcome study, compared analysis (at least three times weekly on the couch, minimum 150 sessions) with dynamic psychotherapy conducted weekly for 25 to 100 sessions. Psychoanalysts in private practice in Heidelberg and Berlin invited patients to participate in the study; after the patient’s clinical evaluation, analyst and patient agreed on a course of treatment, either therapy or analysis. A total of 76 patients agreed to enroll, and 59 were included in the final data analysis. Patients in therapy and analysis were matched by age, sex, and education and, for many in the sample, by personality organization as evaluated by independent raters viewing videotaped interviews. In addition, the investigators made use of a relatively extensive pre-treatment assessment battery that allowed for group comparison. Grande et al. (2006) report relatively high comparability between the therapy and the analysis patients across a number of dimensions including nature and severity of symptoms, severity of impairment, and diagnosis. However, there were significant differences with regard to level of education, sick days from work, and scores on a self-report measure of interpersonal problems. In addition, the analytic patients fell within a more narrow range of structural integration, though average scores of severity were the same. As with the STOPPP study, the finding that the treatments were effective for the patients who selected them cannot be translated into a comparison of the relative effectiveness of the two treatment conditions (Grande et al. 2006).
The Helsinki study (Knekt et al. 2011) was a large, quasi-experimental comparative outcome study conducted in Finland. The study compared three kinds of psychotherapy using a randomized design and included a group of self-selected patients who were treated in analysis. Patients were recruited from a variety of psychiatric services in the Helsinki region and were characterized at baseline using an extensive battery of structured interviews and self-report measures. Patients were randomly assigned to one of three psychotherapeutic modalities: (1) solution-focused therapy (SFT), a brief goal- and solution-focused treatment of twelve sessions over no more than eight months; (2) short-term psychodynamic psychotherapy (SDP) for twenty weekly sessions; (3) long-term psychodynamic psychotherapy (LTDP) conducted two or three times a week for approximately three years. Additionally, as noted, some patients meeting eligibility requirements were allowed to self-select treatment in psychoanalysis, four or five sessions weekly on the couch for approximately five years. For the psychotherapy conditions, 506 outpatients were recruited and assessed as eligible, of whom 27% refused participation. Of the remaining 367 patients, 326 were randomly assigned to one of the three psychotherapies in a 1 (SFT): 1 (SDP): 1.3 (LTDP) ratio; 41 self-selected analysis. Of the 326 patients assigned to psychotherapy, 36 declined to participate based on their treatment assignment. Overall, then, 34% of the 506 eligible patients refused randomization to psychotherapy. Because the investigators believed patients would not accept randomization to analysis, patients from a group meeting eligibility requirements both for the study at large and for suitability for analysis could self-select that treatment.
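The refusal figures in this paragraph can be checked directly from the reported counts. The worked arithmetic below uses only numbers given in the text; the second result corresponds to the overall refusal rate that the text reports as 34%.

```latex
\frac{506 - 367}{506} = \frac{139}{506} \approx 27.5\%,
\qquad
\frac{139 + 36}{506} = \frac{175}{506} \approx 34.6\%
```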
Though the analytic patients were in many ways comparable to the therapy patients, they were better educated, more self-critical, and less likely to be on psychotropic medication. They also had higher ratings of symptoms of anxiety, lower ratings on sense of coherence and well-being, better reflective ability in response to trial interpretations, and greater motivation for treatment (the last two items were inclusion criteria for the analytic treatment cell). The investigators used a statistical analysis to adjust for known differences between the analytic and the therapy patients, and on this basis felt it reasonable to compare outcomes between the groups. The authors commented on the need for randomized controlled trials for patients suitable for psychoanalytic treatment (Knekt et al. 2008).
The Munich study (Huber, Klug, and von Rad 2002) is the most recent analytic outcome study to be completed and the only randomized study of analytic treatment in the literature. Patients presenting with ICD- and DSM-IV-defined major depressive disorder and a minimum score on the Beck Depression Inventory were recruited among patients presenting to the outpatient department of the Institute of Psychodynamic Medicine, Psychotherapy, and Medical Psychology of the Technical University of Munich. Each patient received an extensive intake assessment, including an audiotaped interview. Three experienced psychoanalysts reviewed the tape and decided whether the patient could ethically be randomized. The investigators report that 100 patients were successfully randomized to analytic treatment on the couch two or three times a week for three years, weekly dynamic therapy with an average duration of 2.8 years, or long-term CBT once or twice weekly for an average duration of 2.2 years. The authors have yet to publish a description of the randomization process that would shed light on issues of feasibility and generalizability; it remains unclear how many eligible patients were screened, how many were rejected by the analysts reviewing the audiotapes for suitability for randomization, and how many patients themselves rejected participation or randomization.
In combination, the Helsinki and MPS studies suggest that it may be feasible to randomize patients to analysis of relatively low session frequency and other less intensive dynamic and cognitive behavioral treatments. However, it remains untested whether patients can be successfully randomized between analysis as currently defined in the United States (three or four sessions weekly on the couch for four or five years) and less intensive and shorter dynamic and cognitive behavioral treatments. In this context “successful” means that at least 30% of eligible patients will accept randomization and that there are no significant differences in the assessed variables (e.g., age, sex, diagnosis, personality dimensions, quality of object relations) between patients who participate and those who decline.
The Comparative Outcomes in Psychotherapy and Psychoanalysis Study (COPPS)
As described in the accompanying article (Roose et al. 2012), the goal of COPPS is to compare the outcome of psychoanalytic treatment to the outcome of other commonly practiced treatments of lesser intensity and shorter duration. Outcome will be measured at the level of personality structure and personality functioning. The study is designed to answer the question “Does psychoanalysis result in clinical improvement that is greater than or different from gains resulting from supportive expressive or cognitive behavioral therapy?” To the degree that patients in analysis and therapy cells are not equivalent, differences in outcome between cells may reflect patient, in contrast to treatment, differences. Thus, the investigators responsible for the design of COPPS thought it scientifically crucial to ensure, as much as possible, the equivalence of the patients assigned to each treatment condition by means of random assignment.
But is randomization feasible? The pilot study reported here was designed to answer this question. We evaluated a series of patients responding to advertisements recruiting subjects to participate in a study of long-term psychotherapy. We hypothesized that at least 30% of eligible subjects would agree to participate, give informed consent, and begin a randomly assigned treatment.
Method
An advertisement was posted online and placed in local newspapers offering free long-term psychotherapy or psychoanalytic treatment for eligible participants struggling with depression and interpersonal problems. The project research coordinator (MT) screened calls in response to the advertisement to ensure that callers understood the nature of the study. Subjects who were clearly ineligible (e.g., seeking treatment for substance abuse) or who failed to respond to two follow-up calls were triaged out. Remaining subjects were then scheduled for a ninety-minute standard clinical evaluation with one of the project’s principal investigators (SR), at which time they were asked to give informed consent for the evaluation (New York State Psychiatric Institute IRB #5280R, Evaluation at the Adult and Late Life Depression Center). Subjects also completed the Beck Depression Inventory (BDI) and the Inventory of Interpersonal Problems (IIP-32). Their eligibility for the study was determined based on the inclusion and exclusion criteria presented in Table 1. If at the end of the evaluation subjects met those criteria, the study was described to them. They were also given written descriptions of the three treatment modalities (dynamic psychotherapy, cognitive-behavioral therapy, and psychoanalysis) to read and discuss with the evaluator.
Table 1. Inclusion and exclusion criteria
At the end of the initial evaluation, eligible subjects were asked if they wished to continue the evaluation process or to decline further participation. Subjects who opted to continue then participated in a diagnostic assessment (with MT) that included the Structured Clinical Interview for DSM-IV-TR for Axis I disorders (SCID), the Longitudinal Interval Follow-up Evaluation: The Range of Impaired Functioning Tool (LIFE-RIFT), the Hamilton Anxiety Rating Scale (HAM-A), and the Hamilton Rating Scale for Depression (HRS-D). In addition, subjects completed the Social Adjustment Scale Self-Report (SAS-SR), the Quick Inventory of Depressive Symptomatology Self-Report (QIDS-SR), the Beck Anxiety Inventory (BAI), and the Schedule for Nonadaptive and Adaptive Personality (SNAP) during the diagnostic assessment. This assessment generally took two hours.
After the diagnostic assessment, subjects were scheduled for a final clinical meeting (with SR), during which the results of the diagnostic assessment were discussed. If after this evaluation subjects still met the inclusion criteria, the study was explained again and they were asked if they wished to participate. Once informed consent was obtained, subjects were randomized to a treatment cell. This study was approved by the New York State Psychiatric Institute IRB (#5954, Comparative Outcomes of Psychotherapy and Psychoanalysis).
The Psychoanalytic Outcome Research Committee agreed that a 30% rate of recruitment of eligible patients would be necessary for the results of an outcome study to be generalizable. Confidence intervals were used to establish projected recruitment rates: If 13 patients were offered randomization and all 13 refused, then the 95% confidence interval (CI) for the rate of recruitment would be between 7% and 28%, or below the minimum recruitment rate of 30% needed to make the study feasible. In contrast, if after the first 14 patients the recruitment rate was 50%, then the 95% CI for the rate of recruitment would be between 24% and 76%, approaching the threshold of feasibility. After each eligible subject either accepted or rejected randomization, the 95% CI for rate of recruitment was recalculated to determine whether it would be possible to draw a conclusion regarding the study’s feasibility. Thus, there was no predetermined N for this pilot study. Rather, the study would continue until it was determined whether the top of the CI was below 30% (meaning we could not expect 30% of eligible subjects to accept randomization) or whether the bottom of the CI was above 30% (meaning we could expect to enroll at least 30% of eligible subjects).
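The stopping rule described in this paragraph can be sketched in a few lines. This is an illustration only: the text does not state which interval method was used, so the exact (Clopper-Pearson) binomial interval is assumed here, and the function names are hypothetical. Bounds computed this way may differ by a few percentage points from the figures quoted in the text.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided CI for a proportion (assumed method, not from the source)."""

    def bisect(f, target, increasing):
        # Find p in [0, 1] with f(p) == target for a monotone function f.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if (f(mid) < target) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: p at which P(X >= x) equals alpha / 2 (increasing in p).
    lower = 0.0 if x == 0 else bisect(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2, True
    )
    # Upper bound: p at which P(X <= x) equals alpha / 2 (decreasing in p).
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p), alpha / 2, False
    )
    return lower, upper


def feasibility_decision(accepted, offered, threshold=0.30):
    """Apply the rule: stop once the whole CI lies on one side of the threshold."""
    lower, upper = clopper_pearson(accepted, offered)
    if upper < threshold:
        return "stop: infeasible"
    if lower > threshold:
        return "stop: feasible"
    return "continue"


print(feasibility_decision(8, 11))  # lower bound exceeds 30%: "stop: feasible"
print(feasibility_decision(7, 14))  # interval straddles 30%: "continue"
```

Because the interval is recomputed after every eligible subject, the sample size is an outcome of the rule rather than an input, which is why no N was fixed in advance.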
The Shedler-Westen Assessment Procedure–II Q-sort (SWAP-II)
The SWAP-II Q-sort (Westen et al. 2011; Westen et al. in press) is a set of 200 cards containing statements that describe different aspects of personality and psychological functioning. A clinician or interviewer with a thorough knowledge of a patient sorts the cards into categories, from those that are inapplicable or not descriptive of the patient to those that are highly descriptive. The rater arranges the set of 200 personality descriptions into eight categories ranging from 0 (irrelevant or inapplicable) to 7 (highly descriptive). The Q-sort creates a fixed distribution of SWAP-200 items that resembles the right half of a normal distribution. The fixed distribution requires the rater to assign a specified number of items to each score category (8 in pile 7; 10 in pile 6; 12 in pile 5, etc.). This process protects against measurement error and heteroscedasticity by requiring all raters to use each value the same number of times.
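The fixed-distribution constraint can be illustrated with a small check that a completed sort places the required number of cards in each pile. The sketch below is hypothetical throughout: the function name, the card labels, and the toy pile sizes are assumptions for illustration, and the toy distribution is deliberately not the SWAP-II distribution, which the text gives only in part.

```python
from collections import Counter


def distribution_violations(pile_by_item, required_counts):
    """Return {pile: (observed, required)} for every pile whose count is off."""
    observed = Counter(pile_by_item.values())
    piles = set(required_counts) | set(observed)
    return {
        pile: (observed.get(pile, 0), required_counts.get(pile, 0))
        for pile in sorted(piles)
        if observed.get(pile, 0) != required_counts.get(pile, 0)
    }


# Toy deck of 6 cards sorted into 3 piles (hypothetical, NOT the SWAP-II layout).
toy_required = {0: 3, 1: 2, 2: 1}
toy_sort = {
    "card_a": 0, "card_b": 0, "card_c": 0,
    "card_d": 1, "card_e": 1,
    "card_f": 2,
}
print(distribution_violations(toy_sort, toy_required))  # {} means the sort conforms
```

Forcing every rater onto the same distribution means that two sorts can differ only in which items land in which pile, not in how freely a rater uses the high or low end of the scale.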
The SWAP-II provides two kinds of scales. The first are descriptions of prototypical patients illustrating a given personality disorder. These hypothetical patient descriptions are referred to as prototype scales, as they accurately reflect the clinical and theoretical understanding of many practicing clinicians (Westen et al. 2011). The second, the SWAP-II personality trait scales, are descriptions of patients derived from traditional factor analysis (Westen et al. in press). The studies by Westen and colleagues support the reliability and validity of the SWAP-II in the assessment of personality pathology.
An advanced graduate student in an APA-approved clinical psychology doctoral program, with extensive clinical and research experience with the SWAP, conducted a Clinical Diagnostic Interview (CDI; Westen and Muderrisoglu 2003, 2006) with all eight of the eligible patients who agreed to participate. The interview was conducted before they began treatment and took approximately two hours. This clinician was blind to any other clinical data collected in the study at that point. The CDI differs from structured PD interviews in that it does not ask patients primarily to describe themselves (although it does not avoid face-valid questions about behaviors, intentions, or phenomenology, e.g., whether the patient has self-mutilated or thought about suicide). Instead it asks patients to provide detailed narratives about their symptoms, their education and work history, and their relationship history, focusing on specific examples of emotionally salient experiences. From the information generated in the CDI, a clinician can then make judgments about how the patient characteristically thinks, feels, regulates impulses and emotions, views the self and others, and behaves in significant relationships, as reflected in the ranking of the items. These CDI interviews were digitally recorded and then transcribed verbatim. A second advanced graduate student in an APA-approved clinical psychology doctoral program, also with extensive clinical and research experience with the SWAP, then independently reviewed each transcript and completed the SWAP Q-sort.
The COPPS Adherence Scale (COPPS-AS)
As described by Roose et al. (2012), the COPPS adherence scale is based on (1) relevant items from existing measures of treatment fidelity with demonstrated reliability and validity (Ablon and Jones 2005; Hilsenroth et al. 2005; Hollon et al. 1988; Strunk et al. 2007); (2) empirical review of the research process literature to develop related items; and (3) consultation with the treatment cell clinicians and supervisors. The COPPS-AS is a brief measure of the significant features of psychoanalysis (PSA) and supportive-expressive (SE) and cognitive-behavioral (CB) psychotherapy designed to assess therapist activity occurring during the therapeutic hour, as well as the techniques employed. The measure consists of randomly ordered items rated on a 7-point Likert scale ranging from 0 (“not at all characteristic”) through 2 (“somewhat characteristic”) and 4 (“characteristic”) to 6 (“extremely characteristic”). The COPPS-AS is completed by independent raters and constructed to contain three subscales, one each measuring PSA features (COPPS-PSA), SE features (COPPS-SE), and CB features (COPPS-CB). Each of the subscales is designed to measure therapist activities and techniques emphasized significantly more in its treatment modality than in the others.
In completing the COPPS-AS, the rater’s task is to search for evidence that a therapist activity has occurred in the session. A general principle for each item is that scores of 1 or 2 suggest some attempt by the therapist to engage in the behavior or action delineated by the item, with limited or no follow-up exploration. Ratings of 4 indicate that the technique tapped by the item is addressed on separate occasions by the therapist with some follow-up exploration. Scores of 5 or 6 suggest continued efforts by the therapist to exhibit the behavior or action indicated by the item with sustained follow-up.
Four advanced clinical psychology graduate students enrolled in an APA-approved clinical psychology doctoral program provided the ratings on the COPPS-AS. Before rating any COPPS sessions included in the present study, the four coders underwent supervised training in the evaluation of various treatment techniques and in rating video recordings for a range of different treatment sessions. Each of the judges independently rated at least thirty sessions during the training phase of the project; their ratings were used in a preliminary computation of interrater reliability (intraclass correlation coefficient), on which all of them showed a “good” level of initial interrater agreement (> .60; Fleiss 1981). The videos used in these training sessions included (1) training tapes from the APA Psychotherapy Videotape Series; (2) other available training tapes from expert therapists; and (3) videotaped sessions of patients included in other programmatic psychotherapy research. Following the preliminary interrater reliability analysis, these coders began rating videotaped sessions of patients in the COPPS study.
For the eight patients receiving the three different treatments during the pilot study, digital audiotapes of one session from each of the first, second, third, sixth, and ninth months of treatment were used if available. In all, 43 sessions from the pilot treatment sample were viewed and rated (PSA = 6, SE = 19, CB = 18). All sessions were rated independently by two of the four judges, sessions were arranged in random order, and entire sessions were evaluated. To prevent rater drift, reliability meetings were held regularly during the coding process.
Results
Subject Recruitment
Recruitment began in September 2009 and closed in April 2010. A total of 132 subjects responded to the study advertisement, of whom 107 (81%) were triaged out (see Figure 1).

Figure 1. Subject recruitment
Clinical Evaluations
The remaining 25 subjects were scheduled for a first clinical interview, of whom 21 (84%) completed it. Fourteen of these 21 subjects (67%) were female, their mean age was 40 (SD = 8.5), their mean BDI score 26 (SD = 11), and their mean IIP score 57 (SD = 22).
Eleven of 25 subjects (44%) were determined to be eligible based on inclusion and exclusion criteria. At this point two of the eligible patients refused to participate in this randomized study; one wanted only CBT, and the other specifically refused psychoanalysis.
Diagnostic Assessments
Of the remaining nine eligible patients scheduled for diagnostic assessment, one failed to show up. Thus, eight of eleven potentially eligible subjects accepted the idea of randomization and completed the diagnostic assessment phase.
Second Clinical Interview and Randomization
All of the eight subjects who completed the diagnostic assessment showed up for the final interview. The results were discussed, and all agreed to participate. After signing informed consent statements for the study, they were randomized to treatment, three to cognitive behavioral therapy, three to dynamic psychotherapy, and two to psychoanalysis.
Seven of the eight subjects (88%) were female, mean age 38 (SD = 5.9, Range = 32–49); five (63%) were Caucasian, two (25%) were African American, and one (13%) was Asian. Regarding work status, four (50%) were unemployed, while the other four were employed on a full- or part-time basis. Six of these subjects (75%) were single and had never been married, one was divorced, and one was married; none of them were parents.
By DSM-IV criteria, seven (88%) met diagnostic criteria for a current mood disorder, five (63%) for a current anxiety disorder, and five for a personality disorder. Subjects in this group suffered from moderately severe depression and mild to moderate anxiety, and had moderate to severely impaired social functioning (see Table 2).
Table 2. Structured assessment data (N = 8)
Termination of the Study
Calculated on the basis of eight of eleven eligible patients accepting randomization, the 95% confidence interval was that 39% to 92% of eligible subjects would participate if we proceeded to the full 360-patient study.
SWAP-II Prototype and Personality Trait Scales
Table 3 presents the mean T scores for the SWAP-II prototype and personality trait scales of the eight patients based on the independent ratings of the pre-treatment CDI transcripts. These scores reveal sample characteristics of depression, internalizing symptoms, emotional avoidance, anxious-somatization, and unstable commitments. These ratings also demonstrated that those in the sample possessed the psychological resources and health necessary to make use of the different treatments.
Table 3. SWAP-II prototype and personality trait subscales (N = 8)
Note: T = standardized T score, where T = 50 is the nonclinical mean and 1 standard deviation unit, positive or negative, is equivalent to 10 T score points.
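The scaling described in the note can be written as a single conversion. The symbols below are notation introduced here for illustration: x is a raw scale score, and the mean and standard deviation are those of the nonclinical reference sample.

```latex
T = 50 + 10\,z,
\qquad
z = \frac{x - \mu_{\text{nonclinical}}}{\sigma_{\text{nonclinical}}}
```

For example, a score one standard deviation above the nonclinical mean (z = 1) gives T = 60, and a score half a standard deviation below it (z = -0.5) gives T = 45.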
Reliability of COPPS-AS
Our original version of the COPPS-AS contained 21 items independently rated by the four judges for all 43 sessions. However, during the initial organization of the COPPS-AS ratings data, two of the original items—the PSA item “The therapist draws connections between the therapeutic relationship and significant patient relationships (i.e. care-takers) from the past” and the SE item “The therapist notes the appearance of symptoms in the context of the session and links these to the patient’s CCRT or relational patterns”—were found to have occurred at an extremely low base rate (3 instances each). Because the reliability of these items could as a result not be evaluated, they were removed from any further analyses. In addition, a third item—the CB item “The therapist explores the underlying assumptions that develop or maintain symptoms”—was rated with good-to-excellent reliability (ICC > .70), yet provided relatively poor differentiation among the three treatment groups. This item too was removed from further analyses. This led to a Beta version of the COPPS-AS, with 18 total items, 6 items per subscale (PSA, SE, and CB), presented in the Appendix. The Beta version will be the focus of the analyses to follow.
Based on 43 sessions, each rated by two of the four judges, the interrater reliability of the 18-item COPPS-AS was examined using the one-way random effect model intraclass correlation coefficient (ICC [1,1]; Shrout and Fleiss 1979), as well as the Spearman-Brown correction for the one-way random effect model ICC representing the mean reliability across two raters (ICC [1,2]; Shrout and Fleiss 1979). As shown in Table 4, all the COPPS-PSA, COPPS-SE, and COPPS-CB items achieved ICC (1,1) values in the fair (.40–.59; Fleiss 1981) to excellent range (≥ .75; Fleiss 1981), and all ICC (1,2) values were in the good (.60–.74; Fleiss 1981) to excellent range (≥ .75). In addition, ICCs for all three of the COPPS-AS subscale scores (PSA, SE, CB) were in the excellent range. Further statistical analyses of COPPS items reported in the psychometric characteristics and initial validation stage of this study used the average rating of the two coders represented by the ICC (1,2) values.
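For readers unfamiliar with these coefficients, the following is a minimal sketch of how a one-way random effects ICC (1,1) and its Spearman-Brown step-up to ICC (1,k) can be computed from a sessions-by-raters table, following the Shrout and Fleiss (1979) mean-squares formulation. The function names and the example ratings are hypothetical and are not the study's data or the authors' analysis code.

```python
def icc_one_way(ratings):
    """One-way random effects ICC(1,1).

    ratings: list of rows, one row per rated session, each row holding
    the k ratings that session received (which raters supplied them is
    ignored in the one-way model).
    """
    n = len(ratings)       # number of sessions
    k = len(ratings[0])    # ratings per session
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-session and within-session mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def spearman_brown(icc_single, k):
    """Reliability of the mean of k ratings, given single-rating reliability."""
    return k * icc_single / (1 + (k - 1) * icc_single)


# Hypothetical example: five sessions, each rated twice on one item.
example = [[1, 2], [3, 3], [5, 4], [2, 2], [4, 5]]
icc_1_1 = icc_one_way(example)
icc_1_2 = spearman_brown(icc_1_1, k=2)
print(round(icc_1_1, 3), round(icc_1_2, 3))
```

Because ICC (1,2) describes the reliability of the two-rater average, it is never lower than ICC (1,1) when the single-rating coefficient is positive, which is why the averaged ratings are the natural choice for later analyses.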
Interrater reliability for COPPS-AS items and subscales (N = 43)
COPPS-AS Treatment Session Differentiation
An analysis of variance (ANOVA) was used to determine if differences existed for the three subscale scores across the three treatments (PSA, SE, CB). As hypothesized, each of these three analyses demonstrated that significant differences in these techniques were present across the three types of treatment sessions (df = 2, 40, F > 6.0, p < .005). Examination of post-hoc group contrasts revealed that both the PSA and the SE sessions were rated significantly higher (p < .001) than the CB sessions on the COPPS-PSA subscale. Conversely, CB sessions were found to be rated significantly higher (p < .001) than both PSA and SE sessions on the COPPS-CB subscale. For the COPPS-SE subscale, SE sessions were found to be rated significantly higher (p < .001) than CB sessions, and demonstrated a trend toward significance in the differentiation of PSA sessions (p = .09), representing a moderate effect (d = .64) favoring SE treatment. The lack of significant differences between the SE and PSA sessions on the COPPS-PSA subscale, and to a lesser extent the COPPS-SE, could be due to both conceptual and technical overlap of these treatments. However, it is also quite possible that this may represent an issue of statistical power, given the limited number of overall sessions (43) and the very small number of rated PSA sessions (6). As a result, we consider this entire set of analyses quite preliminary and in need of further replication. Given this current trend toward significance and moderate effect, it seems quite likely that by substantially increasing the number of rated sessions this difference will reach conventional levels of significance, allowing the COPPS-SE to differentiate analysis and dynamic therapy even more. In addition, we believe that adding sessions, especially PSA sessions, will allow a fairer evaluation of the ability of COPPS-PSA scale items to differentiate analysis and dynamic therapy.
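As an illustration of the omnibus test described above, the sketch below computes a one-way ANOVA F statistic and its degrees of freedom from per-treatment lists of subscale scores. The function name and the scores are hypothetical; only the structure (three treatment groups, between-groups df = groups − 1, within-groups df = sessions − groups) mirrors the analysis, so 43 sessions in 3 groups give the reported df of 2 and 40.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)
    n_groups = len(groups)
    grand_mean = sum(all_scores) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Sum of squares between groups, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Sum of squares within groups.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = n_groups - 1
    df_within = n_total - n_groups
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical subscale scores for three small groups of sessions.
psa_scores = [1.0, 2.0, 3.0]
se_scores = [2.0, 3.0, 4.0]
cb_scores = [5.0, 6.0, 7.0]
f_stat, df_b, df_w = one_way_anova([psa_scores, se_scores, cb_scores])
print(f_stat, df_b, df_w)
```

With unequal group sizes, as in the present sample where only 6 of the 43 rated sessions were PSA sessions, the same formulas apply, but contrasts involving the smallest group carry the least precision.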
Discussion
The present pilot study evaluated the feasibility of randomizing patients between PSA, SE, and CBT. We found that eight of eleven eligible subjects (73%) agreed to be randomized, enabling us to predict with a 95% confidence interval that between 39% and 92% of eligible subjects would participate in the full study. One eligible patient refused to participate, wanting only CBT, another specifically refused analysis, and a third simply did not show up for diagnostic assessment. Our findings extend the results of the Munich study and establish that it is possible to randomize patients to high-intensity, open-ended psychoanalytic treatment, three to four times weekly on the couch, and psychodynamic and cognitive behavioral therapies of limited duration. This finding supports the feasibility of implementing a large-scale, multi-site RCT of psychoanalysis in the United States. Moreover, it strongly argues against rejecting randomization in comparative studies of psychoanalysis on the grounds of feasibility. The results of the Munich study and this pilot study serve as a reminder that a belief that a condition is true (e.g., that patients will not accept randomization to psychoanalysis), no matter how strongly asserted or widely held, is only a belief until experimentally tested.
Having addressed the feasibility of using a randomized design in a psychoanalytic outcome study, we are left with the question, Even if feasible, is randomization desirable? The major advantages of randomization are a high level of evidence and a high level of acceptance by the broader scientific community. In the absence of a randomized design, there is a strong likelihood of between-group differences on clinical and demographic characteristics and thus a very limited basis on which to draw conclusions about the differential effects of treatments. While it may be possible to take these differences into account by statistical methods, nested within natural group formation are inevitable differences in unmeasured, indeed perhaps unknown, variables that may significantly affect response to treatment. A further benefit of randomized designs is that, with adequate sample size, randomization makes it possible to look at subgroups of patients within the cohort to identify which groups of patients are likely to do better in which treatments. Randomization of the entire sample makes it possible to assume randomization within any subgroup within the sample and to make comparisons between outcome in therapy and analysis. On the other hand, many questions can be raised about the generalizability of findings of an RCT comparing analysis to other treatments of shorter duration and lesser intensity. These concerns have been cogently articulated by Barber (2009), Leichsenring (2004), Sandell (2001), and Westen, Novotny, and Thompson-Brenner (2004), among others. While it is possible that, as has been suggested, randomization of patients to analysis may result in a poorly generalizable study population, diminished treatment alliance and therapist/patient match, and decreased patient motivation, there are few empirical data confirming these possibilities. 
Even commonsense expectations may be found invalid when subjected to empirical study, and some data suggest this may be the case for these beliefs about randomization. For example, several studies have reported finding few differences between depressed patients who are willing to be randomized and those expressing a strong preference for one treatment over another. The largest such study, conducted by Chilvers et al. (2001), randomized 103 general practice patients with major depressive disorder to enhanced treatment with antidepressant medication or six psychotherapy sessions, while 220 additional patients were assigned to their treatment preference. With the exception that patients preferring antidepressant medication were more severely depressed, no significant differences in baseline clinical or demographic characteristics were found between the randomized sample and the patients assigned to their preference. Patients receiving their preferred treatment did not improve significantly more at eight weeks than did patients randomized between the treatments. Further, there were no differences in dropout rates or patient satisfaction between the randomized sample and the preference groups. These data suggest that assumptions made about the poor generalizability of RCTs of psychoanalysis, though apparently commonsensical, may be incorrect, and that rejecting the high evidentiary value of an RCT on this basis therefore has no merit. The only way to resolve this issue is to study it empirically.
The present study was also an initial investigation of the development, reliability, and validity of a new measure assessing psychotherapy techniques used in treatment sessions, the COPPS-AS. In particular, this is one of the first attempts to provide a functional adherence scale for the practice of psychoanalysis. Some may have the opinion that evaluating adherence to psychoanalytic principles is impractical or impossible, if not also downright undesirable. However, for the first time, through both “bottom-up” examination of the psychotherapy process research and input from close collaboration with psychoanalysts, we have developed a short scale of psychoanalytic practice that has demonstrated excellent interrater reliability and promising validity. These results from all three subscales suggest that the COPPS-AS possesses excellent interrater reliability, with items and subscales comparing favorably to (or even exceeding) the reliability of other therapist activity measures reported in the literature, and demonstrate that the therapist activity covered in the COPPS-AS can be reliably coded with adequate training. Thus, it seems that the COPPS-AS items are written in clear, descriptive, experience-near terms that allow raters to assess the use of a technique or intervention being employed in a session and do not require inferences about internal mental processes or highly specialized clinical and theoretical knowledge.
In sum, there is a compelling need for methodologically rigorous outcome research comparing psychoanalysis to treatments of lesser intensity and duration and of different orientation. In developing a comparative outcome study it is important to select a design that optimizes both the scientific validity and the clinical relevance of potential findings. Until recently it has been assumed that it is not feasible to use a randomized design to study psychoanalytic outcome. A variety of alternative approaches have been employed, each with limitations. The findings we report here support the feasibility of implementing an RCT comparing psychoanalysis as defined by the American Psychoanalytic Association (three or four sessions a week on the couch for approximately five years) with shorter-term dynamic or cognitive-behavioral therapy once or twice a week. Our findings are consistent with and extend recent experience in Europe.
Appendix: COPPS Adherence Scale
Acknowledgements
The authors thank Kristen Capps, Anthony Mullin, Frank Pesale, and Jenelle Slavin-Mulford for COPPS-AS ratings and Jason Mayotte-Blum and Meaghan Leahmann for CDI interviews and SWAP-II scoring.
