Abstract
Emotional reactions are a vital part of the therapeutic relationship. The Feeling Word Checklist–24 (FWC-24) is an instrument asking the clinician (or the patient) to report to what degree he or she has experienced various feelings during a therapeutic interaction. The aim of this study was to assess the factor structure of the clinician-rated FWC-24 when taking dependencies in the data into account. The sample was deliberately heterogeneous and consisted of 4,443 ratings made by 101 psychotherapists working with different psychotherapy methods in relation to 191 patients of different ages, genders, and with different primary diagnoses. A random intercept-only model revealed large intraclass correlation coefficients at the therapist level, indicating that a multilevel analysis was warranted. A two-level exploratory factor analysis with therapists as the between level and patients plus sessions as the within level was conducted. The items from FWC-24 were found to be best represented by four factors on the between level and four factors on the within level. The factor structures were largely similar on the two levels and were labeled Engaged, Inadequate, Relaxed, and Moved. The different factors explained different amounts of variance on different levels, indicating that some factors are more therapist dependent and some more patient dependent.
A consistent finding from psychotherapy research is that aspects of the therapeutic relationship are robust predictors of outcome (e.g., Bhatia & Gelso, 2013; Horvath, Del Re, Flückiger, & Symonds, 2011; Lambert & Barley, 2002; Safran, Muran, Demaria, Eubanks-Carter, & Winston, 2014). Like all social contacts, the therapeutic relationship is to a large extent formed and regulated by emotional processes (Black, Hardy, Turpin, & Parry, 2005). Therapists’ emotional reactions have the potential to interfere with, or facilitate, the treatment process and to be used as material for reflection in the therapeutic work. Empathic interest, genuineness, and tolerance from the therapist toward the patient have been shown to be associated with a strong alliance (Ackerman & Hilsenroth, 2003; Baldwin, Wampold, & Imel, 2007; Røssberg, Karterud, Pedersen, & Friis, 2007; Summers & Barber, 2003). Furthermore, the therapist’s own emotional reactions may be considered as important information guiding clinical assessment as well as therapeutic interventions. It is essential for the therapist to be attentive to and make constructive use of feelings evoked during sessions (Safran et al., 2014). Unexamined or unrecognized emotional reactions in the therapist, either positive or negative, may contribute to alliance ruptures, patient dropout, and attenuated treatment effect. The therapist’s management of own feelings has been shown to be strongly associated with the outcome of therapy (Hayes, Gelso, & Hummel, 2011).
Findings like these have led to attempts to systematize emotional aspects of the therapeutic relationship as well as characteristics of the therapist’s reactions that might affect the psychotherapy process. Instruments like the Psychotherapy Process Q-Set (Jones, 1985) and the Countertransference Factors Inventory (Van Wagoner, Gelso, Hayes, & Diemer, 1991) have been developed for assessing therapists’ actions and emotional involvement in recorded psychotherapy sessions. Another way to capture therapists’ emotional reactions has been to ask therapists to self-report their emotional reactions during or after sessions (e.g., Betan, Heim, Conklin, & Westen, 2005; Meyer, 1988). The Feeling Word Checklist (FWC; Whyte, Constantopoulos, & Bevans, 1982) is a self-report questionnaire in which clinicians or patients are asked to report the occurrence and/or degree of various feelings in psychotherapy sessions or inpatient care settings. The FWC exists in several versions with various numbers of items answered either dichotomously (yes/no) or on rating scales.
Development of the FWC
The FWC was developed as a 30-item scale with yes or no options. Almost all studies on the FWC have been made with clinician ratings. The first study on the FWC showed typical response patterns related to the clinician, the clinician’s professional role, the patient, as well as the unique match between clinician and patient (Whyte et al., 1982). Holmqvist and Armelius (1996) replicated these findings with the exception that they did not find a response pattern related to the professional role of the clinician. Taking this research further, they discovered that aspects of the therapist’s self-image influenced their response on the FWC (Holmqvist & Armelius, 2000). The FWC has also been used to investigate associations between the therapist’s feelings and other aspects of the psychotherapy process, such as patient characteristics or outcome (e.g., Dahl, Røssberg, Bøgwald, Gabbard, & Høglend, 2012).
The ideal number of FWC items is debated, with different versions ranging from 24 to 58 items. More items may generate more nuanced ratings. On the other hand, the instrument then becomes more time consuming for clinicians, causing possible data loss and risk of less reliable ratings (Dahl et al., 2012).
Previous Factor Analyses of the FWC
The diverse versions of the clinician-rated FWC have yielded differing factor solutions in previous studies on the instrument (see Table 1). Data have been collected in different clinical settings, such as inpatient/institutional care (Hoffart & Friis, 2000; Holmqvist & Armelius, 1994; Katsuki, Goto, Takagi, Ozdemir, & Someya, 2006; Røssberg, Hoffart, & Friis, 2003) or individual psychotherapies (Dahl et al., 2012; Holmqvist, Hansjons-Gustafsson, & Gustafsson, 2002; Ulberg, Nærdal, Olsen, & Eide, 2013). Analyses have been made on Swedish (Holmqvist & Armelius, 1994; Holmqvist et al., 2002), Norwegian (Dahl et al., 2012; Hoffart & Friis, 2000; Røssberg et al., 2003; Ulberg et al., 2013), and Japanese (Katsuki et al., 2006) versions of the scale. Furthermore, different authors have chosen different statistical methods for eliciting factors. Most studies have used principal component analysis, but a few have used proper factor analysis. The number of factors reported varies between three (Hoffart & Friis, 2000) and seven (Holmqvist & Armelius, 1994; Røssberg et al., 2003), and explained variance varies between 41% (Dahl et al., 2012) and 69% (Holmqvist et al., 2002). Generally, all studies have found at least one factor reflecting positive feelings of engagement and interest, and at least one reflecting negative feelings of disengagement and/or frustration. Furthermore, some studies have also found a factor reflecting feelings of neutrality or relaxation (Holmqvist & Armelius, 1994; Ulberg et al., 2013).
Previous Factor Solutions on the FWC.
Note. FWC = Feeling Word Checklist; PCA = principal component analysis; FA = factor analysis.
Methodological Issues
The ambiguity regarding the factor structure of the FWC in different studies is partly due to the different versions of the instrument. Other reasons may be too little variability among items, low variability among clinical settings and patients within previous studies combined with high variability among clinical settings and patients between studies, as well as the fact that the items sometimes have been rated dichotomously (yes/no) and sometimes on rating scales (Røssberg et al., 2003).
Furthermore, all previous studies on FWC factors share the limitation that interdependencies due to nesting in the data have not been taken into account. This is a statistical issue especially relevant in analyses of clinician-rated instruments. Typically, each therapist has rated his or her reactions toward each of a number of patients. This creates dependencies among patients within therapists. The probably incorrect assumption has typically been made that the variance in ratings is solely at the patient level. This leads to bias due to ignored systematic variance at the therapist level (Little, 2013). Since the FWC is not only therapist rated but actually focuses on the therapist, albeit in relation to different patients, the therapist level becomes especially relevant. Several studies (e.g., Holmqvist, 2001; Holmqvist & Armelius, 2000; Whyte et al., 1982) have shown that therapists’ responses to the FWC tend to be relatively consistent over patients and time. This supports the notion that variability in the FWC to some degree reflects systematic and consistent differences between therapists. This component of variability may stem from different factors, one of which is individual emotional response styles among therapists. Different response patterns could also be due to raters’ apprehension of social desirability and to idiosyncratic interpretations of both the feeling words presented and the associations between them (Little, 2013).
When multiple ratings are made by the same rater, nonrandom sources of error contaminate the data, leading to several problems. First, it may lead to bias in parameter estimates, standard errors, and indices of fit. Second, what is assumed to be random errors in the model may in fact be nonrandom, since they may be due to dependency on a higher level. Finally, the factor structure obtained may in fact be, at least in part, based on spurious correlations due to nonrandom variances in the raters. The covariances on the within and between levels will therefore best be represented in different models (Little, 2013). A more appropriate way of investigating the factor structure of the FWC would thus be using a multilevel exploratory factor analysis (MEFA). Although multilevel analysis is more complicated than regular factor analysis, a distinct advantage of this method is that it makes it possible to disentangle therapist response sets from emotional responses attributable to specific patients, or to specific phases of therapy. The aim of the present study was to examine the underlying factor structure of the FWC-24 while taking the relevant levels of nesting in the data into consideration.
Method
Instruments
The FWC-24 is a 24-item self-report questionnaire for use by clinicians and/or patients. In this study, we used the Swedish clinician version. The instruction asks the clinicians to report to what degree they have experienced various feelings when interacting with a patient, in this case after each session in individual psychotherapies. The 24 feeling words are rated on 4-step scales ranging from 0 (not at all) to 3 (very much).
Sample Description
With the aim of getting a large and heterogeneous sample, FWC-24 data were retrieved from four different data sets, stemming from separate psychotherapy projects carried out in Sweden: the Cognitive Behavioral and Interpersonal Psychotherapy Project in Sundsvall (CIPPS), the Linköping University Relational and Interpersonal Psychotherapy Project (LURIPP), the Erica Process and Outcome Study (EPOS), and the Erica Short-Time Psychotherapy Project (ESTPP). Hence, the data comprise ratings from diverse psychotherapy settings (short-term, time-limited, and open-ended long-term), methods (cognitive behavioral therapy, psychodynamic psychotherapy, relational psychotherapy, and interpersonal psychotherapy), and patient age groups (children, adolescents/young adults, and adults; see Table 2). In total, the sample consisted of 4,443 FWC-24 reports, provided by 101 therapists in relation to 191 patients. Sixty-three of the psychotherapists saw only one patient.
Description of Data Samples.
Note. FWC = Feeling Word Checklist; CIPPS = Cognitive Behavioral and Interpersonal Psychotherapy Project in Sundsvall; LURIPP = the Linköping University Relational and Interpersonal Psychotherapy Project; EPOS = the Erica Process and Outcome Study; ESTPP = the Erica Short-Time Psychotherapy Project; PDT = psychodynamic therapy; CBT = cognitive behavioral therapy; IPT = interpersonal psychotherapy; BRT = brief relational therapy.
The LURIPP has therapists working with both IPT and BRT.
The CIPPS (Ekeblad, Falkenström, Vestberg, Andersson, & Holmqvist, 2015) is an ongoing randomized study on the effects of interpersonal psychotherapy (IPT) and cognitive behavioral therapy (CBT) for major depressive disorder. Therapy length is 10 to 14 sessions. The sample in this study consists of 61 adult patients (69% female, 31% male) included at the time of the present analysis. Twenty-six of the patients had concurrent personality disorders. Of the 30 therapists, 21 delivered CBT and 9 delivered IPT. The therapists were psychologists, psychiatric nurses, social workers, nurse-assistants, and occupational therapists, all with at least basic training in psychotherapy. All therapists were trained in either CBT or IPT, and had regular supervision as well as regular training seminars.
The LURIPP (Swedish Research Council for Working Life and Welfare, 2008) is an ongoing randomized outcome study investigating the effects of IPT and brief relational therapy (BRT) for adult patients with major depression. Therapy length in both conditions is 16 sessions. The sample in this study consists of the first 19 consecutive patients included at the time of this analysis. Seven therapists delivered the treatments. All had completed at least basic psychotherapy training and had in addition undergone 2 years of specific training in BRT and IPT. All therapists in LURIPP deliver both IPT and BRT and receive continuous supervision in both modalities.
The EPOS (Carlberg, Thorén, Billström, & Odhammar, 2009; Odhammar & Carlberg, 2015; Odhammar, Sundin, Jonson, & Carlberg, 2011) is a multicenter effectiveness study on psychodynamic child psychotherapy. Thirty-two children aged 5 to 10 years with mixed diagnoses were included. Parallel sessions were held with the parents. A subsample of 28 children and their parents was available at the time of this study, with one therapy having two parallel parental contacts, one therapy missing FWC data for the child sessions, and one therapy missing data for the parental sessions, leading to a total of 55 patients. Session frequency was weekly or biweekly, and the therapies comprised a total of 20 to 152 sessions. Fifty-three psychologists/social workers in Sweden delivered the therapies. The majority had specialist training in psychodynamic child psychotherapy and those who did not received regular supervision. Experience ranged between 1 and 30 years.
The ESTPP (Thorén, 2015) is an outcome study of short-term psychodynamic psychotherapy for children and adolescents/young adults. Thirty-nine children with mixed disorders were included in the present data set. The most prevalent disorders in the younger age group (4-11 years) were anxiety and attention/disruptive behavior disorders, and in the older age group (16-23 years) anxiety and depressive disorders. Therapy length was 12 sessions. For the 17 children in the younger age group, parallel work with their parents (12 sessions) was provided, leading to a total of 56 patients being included in this data set. Eleven fully qualified psychotherapists with long experience of child and adolescent psychotherapy delivered the therapies. Their mean age was 54 (range: 45-63 years) and their occupations were psychologists, social workers, and psychiatrists.
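The per-project counts given above can be tallied against the overall sample size. The following is a minimal sketch of that arithmetic; the breakdown of the EPOS and ESTPP patient counts follows our reading of the text (children and parents counted as separate patients), and the variable names are illustrative only.

```python
# Per-project counts as reported in the Sample Description.
projects = {
    "CIPPS": {"therapists": 30, "patients": 61},
    "LURIPP": {"therapists": 7, "patients": 19},
    "EPOS": {"therapists": 53, "patients": 55},
    "ESTPP": {"therapists": 11, "patients": 56},
}

# EPOS (assumed reading): 28 child series plus 28 parental series,
# one extra parental contact, minus one missing child series
# and one missing parental series.
epos_patients = 28 + 28 + 1 - 1 - 1

# ESTPP (assumed reading): 39 children/adolescents plus the parents
# of the 17 children in the younger age group.
estpp_patients = 39 + 17

total_therapists = sum(p["therapists"] for p in projects.values())
total_patients = sum(p["patients"] for p in projects.values())

print("EPOS patients:", epos_patients)
print("ESTPP patients:", estpp_patients)
print("Therapists:", total_therapists, "Patients:", total_patients)
```

Under this reading, the project-level figures add up to the 101 therapists and 191 patients reported for the full sample.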
Statistical Analysis
Multilevel Exploratory Factor Analysis
First, it was considered important to investigate how much of the variance in FWC item scores could be accounted for by therapist effects. If there is little variance at the therapist level, a MEFA is not warranted. According to Muthén (1997), a multilevel structure of the data should be modeled when intraclass correlation coefficients (ICCs) are .10 or higher in a sample larger than 15. To calculate ICCs, a null (i.e., random intercept only) three-level model was run, with sessions (repeated measurements) as Level 1, patients as Level 2, and therapists as Level 3. On the patient level, ICCs (shown in Table 3) varied between .08 and .14, with .11 as the median. On the therapist level, ICCs varied between .11 and .37, with .22 as the median. Thus, for all items, at least 49% of the variance was due to variance among repeated measurements (i.e., 100% − 14% − 37% = 49%). This indicated that multilevel modelling was clearly warranted.
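The variance partitioning behind these figures can be sketched as follows. The sketch assumes that each ICC is defined as the share of total variance located at that level of the three-level null model; the function name and the variance components passed to it are hypothetical and chosen only to reproduce the largest reported ICCs.

```python
def variance_shares(var_session, var_patient, var_therapist):
    """Share of total variance at each level of a three-level null model."""
    total = var_session + var_patient + var_therapist
    return {
        "session": var_session / total,
        "patient": var_patient / total,      # patient-level ICC
        "therapist": var_therapist / total,  # therapist-level ICC
    }

# Hypothetical variance components matching the largest reported ICCs.
shares = variance_shares(var_session=0.49, var_patient=0.14, var_therapist=0.37)

# Lower bound on the session-level share: subtract the largest
# patient-level and therapist-level ICCs from the total.
min_session_share = 1.0 - 0.14 - 0.37

print({level: round(share, 2) for level, share in shares.items()})
print(round(min_session_share, 2))
```

Because the largest patient-level and therapist-level ICCs need not belong to the same item, the 49% figure is a lower bound rather than the share for any particular item.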
Descriptive Statistics.
Note. ICC = intraclass correlation coefficient.
a Patients plus repeated measurements level. b Therapists level.
With the data structure of the present sample, the best option would probably have been to conduct a three-level EFA. However, we are unaware of any statistical software that can estimate a three-level EFA. As the ICCs indicated that the therapist level contributed essential variance, more so than the patient level, we chose to merge Levels 1 and 2 (repeated measures and patients) and estimate a two-level model with therapists at the between level (Level 2) and patients plus repeated measurements (sessions) at the within level (Level 1). The factor structure was thus explored by conducting repeated two-level EFAs on the 24 items with combinations of increasing numbers of factors up to 5 on the two levels (with more than 5 factors the following ones had eigenvalues < 1). Factors were rotated to the Promax (Geomin) criterion. The MEFAs were run in Mplus version 7.1 (Muthén & Muthén, 2012). Due to the few response categories (four) and the relatively high skewness and kurtosis of many items (see Table 3), data were analyzed as ordered categorical (ordinal), using the robust weighted least squares estimator (Muthén, du Toit, & Spisic, 1997). This estimator assumes a normally distributed latent variable for each item, and thresholds for the point where the likelihood of endorsing adjacent response categories is equal (50/50) are estimated. This way, equal intervals between response categories do not need to be assumed, and deviations from normality of observed variables are not as much of a problem as with continuous data (Kline, 2011). Sessions where all data were missing were excluded by default, while sessions with occasional missing data were included, assuming them to be missing at random.
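The role of the thresholds in the ordinal model can be illustrated with a small sketch. It assumes a unit-variance normal latent response and uses hypothetical threshold values, not estimates from this study; it illustrates the general latent-response idea and is not the Mplus estimation routine.

```python
from statistics import NormalDist

def category_probabilities(thresholds, latent_mean=0.0):
    """Probabilities of ordered categories under a normal latent response.

    thresholds: increasing cut points on the latent scale; k thresholds
    define k + 1 response categories.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    dist = NormalDist(mu=latent_mean, sigma=1.0)
    cumulative = [0.0] + [dist.cdf(t) for t in thresholds] + [1.0]
    return [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]

# Hypothetical thresholds for a 4-step item scored 0 to 3.
thresholds = [0.0, 1.0, 2.0]

probs_at_zero = category_probabilities(thresholds, latent_mean=0.0)
probs_at_one = category_probabilities(thresholds, latent_mean=1.0)

print([round(p, 3) for p in probs_at_zero])
print([round(p, 3) for p in probs_at_one])
```

When the latent mean sits exactly on a threshold, the probability of responding below that threshold equals the probability of responding at or above it, which is the 50/50 point referred to above. No equal spacing between categories is imposed, since the thresholds are free to fall anywhere on the latent scale.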
Model Fit Evaluation
When conducting multilevel analysis, evaluating model fit for the entire model may cause several problems. When sample sizes are larger on the lower level (patients, in this case) than on the higher (therapists) level, model fit evaluation obtained by the standard approach is likely to be dominated by the model fit of the lower level. Therefore it may not be sensitive to a lack of fit at the higher level. Furthermore, a model fit evaluation of the entire model resulting in poor fit does not indicate whether the model is poor at the higher, lower, or both levels (Yuan & Bentler, 2007). For the purpose of evaluating level-specific sources of misspecification, levels should therefore be evaluated separately (Ryu, 2014). This was done by estimating a full multilevel model with a saturated therapist level model (i.e., degrees of freedom = 0; Ryu & West, 2009). Since a saturated model perfectly fits the observed covariance matrix, any lack of fit will then be due to misspecification on the patient level model. To evaluate model fit of the therapist level model, the same procedure was followed with a saturated patient level model.
Model fit was tested using the chi-square test of exact fit, whereas approximate model fit was assessed with the root mean square error of approximation (RMSEA), the comparative fit index (CFI), and the standardized root mean square residual (SRMR). For RMSEA, a close model fit is considered for values of .05 or less, whereas point estimates of .05 to .08 are considered an acceptable level of approximation. Values of .10 and above indicate bad fit (Byrne, 2006). CFI values of 0.96 or higher combined with SRMR levels of .09 or lower are considered acceptable (Hu & Bentler, 1999). Table 4 displays fit indices for the repeated MEFAs with various combinations of numbers of factors on the two levels. In general, fit was excellent. The solution settled for was chosen primarily on the grounds of simple structure, indicated by smallest number of items with more than one loading > .40 or < − .40 and smallest number of items with no loading > .40 or < − .40.
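The simple-structure criterion described above amounts to two counts per candidate solution. The following is a minimal sketch of that counting rule; the item names and loadings are hypothetical and serve only to illustrate the rule.

```python
def simple_structure_counts(loadings, cutoff=0.40):
    """Count items with cross-loadings and items with no salient loading.

    loadings: mapping from item name to its loadings across factors.
    A loading is salient if it is > cutoff or < -cutoff.
    """
    cross_loading_items = 0
    no_loading_items = 0
    for item_loadings in loadings.values():
        salient = sum(1 for value in item_loadings if abs(value) > cutoff)
        if salient > 1:
            cross_loading_items += 1
        elif salient == 0:
            no_loading_items += 1
    return cross_loading_items, no_loading_items

# Hypothetical four-factor pattern matrix for four items.
example_loadings = {
    "item_a": [0.62, 0.05, -0.10, 0.08],   # one salient loading
    "item_b": [0.45, 0.48, 0.02, 0.00],    # cross-loading
    "item_c": [0.30, 0.12, -0.25, 0.35],   # no salient loading
    "item_d": [0.03, -0.71, 0.11, 0.09],   # one salient (negative) loading
}

cross, none = simple_structure_counts(example_loadings)
print("cross-loading items:", cross, "items without salient loading:", none)
```

Among solutions with comparable fit, the one with the smallest counts on both measures would be preferred under this rule.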
Fit Statistics for the Exploratory Factor Analysis.
Note. WLSMV χ2 = chi-square statistic from the robust weighted least squares estimator; df = degrees of freedom; CFI = comparative fit index; RMSEA = root mean square error of approximation; 90% CI = 90% confidence interval for the RMSEA; SRMR = standardized root mean square residual (within and between level).
Results
Exploratory Factor Analysis
As shown in Table 4, model fit was excellent for several of the factor solutions investigated. However, considering model fit, eigenvalues, simple structure, interpretability, and explained variance, four factors on the between level and four factors on the within level appeared to offer the best fit to the data. CFI and RMSEA were recalculated according to formulas proposed by Ryu and West (2009). Since this did not change the results in any relevant way (the results were the same up to the third decimal), the results presented stem from the original calculations. The CFI (0.996) as well as the RMSEA (.009) and SRMR (.034 and .051 for the within and between level, respectively) indicated close model fit.
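The cutoffs described under Model Fit Evaluation can be applied to the reported indices as in the sketch below. The function names are illustrative, and the label for RMSEA values between .08 and .10 is our own placeholder, since the text does not classify that range.

```python
def rmsea_label(rmsea):
    """Classify an RMSEA point estimate using the cutoffs cited in the text."""
    if rmsea <= 0.05:
        return "close"
    if rmsea <= 0.08:
        return "acceptable"
    if rmsea >= 0.10:
        return "bad"
    return "not classified in the text"  # placeholder for .08 < RMSEA < .10

def cfi_srmr_acceptable(cfi, srmr):
    """Combined CFI/SRMR rule: CFI of .96 or higher and SRMR of .09 or lower."""
    return cfi >= 0.96 and srmr <= 0.09

# Indices reported for the solution with four factors on each level.
reported = {"cfi": 0.996, "rmsea": 0.009, "srmr_within": 0.034, "srmr_between": 0.051}

rmsea_result = rmsea_label(reported["rmsea"])
within_ok = cfi_srmr_acceptable(reported["cfi"], reported["srmr_within"])
between_ok = cfi_srmr_acceptable(reported["cfi"], reported["srmr_between"])

print("RMSEA:", rmsea_result)
print("CFI/SRMR acceptable (within, between):", within_ok, between_ok)
```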
Within-Level Exploratory Factor Analysis
The four factors explained 64% of the variance on the within level. Item loadings are presented in Table 5. The first factor, which was named Engaged, explained 40% of the variance and contained feeling words that represent therapist emotions such as happy, playful, and energetic. The second factor, labeled Moved, explained 12% of the variance and included words representative of the therapist’s touched emotions such as surprised, moved, overwhelmed, and warm. The third factor, labeled Inadequate, explained 8% of the variance and consisted of feelings of a clearly negative character representing the therapist feeling of little help, annoyed, and bored. The last extracted factor, labeled Relaxed, explained 5% of the variance and subsumed words indicating feeling calm rather than tense and nervous.
Feeling Word Checklist–24: Pattern Matrices for Within- and Between-Therapist Levels, Obtained via Promax Rotation.
Note. Loadings > .40 or < − .40 are bolded in the table.
The within-level EFA, carried out with a saturated between-level structure, showed close model fit as indicated by the CFI (0.995) as well as the RMSEA (.014) and the SRMR (.034).
Between-Level Exploratory Factor Analysis
The four factors explained 74% of the variance on the between level. To a large extent, the factors were similar to the ones achieved in the within-level EFA and can be seen as representing the same constructs. However, the order among the factors, based on the variances accounted for, was clearly different (see Table 5). The first factor, Inadequate, explained 40% of the variance. The second factor, Relaxed, explained 19% of the variance. The third factor, Moved, explained 8% of the variance. The fourth factor, Engaged, explained 7% of the variance. The between-level EFA, carried out with a saturated within-level structure, showed excellent model fit, as indicated by the CFI (1.00) as well as the RMSEA (.000) and the SRMR (.051).
Comparing Within- and Between-Level Factor Structures
Table 5 shows within- and between-level factor loadings. Inspecting Table 5, it is clear that although most of the items for each factor have similar loadings at the within and between levels, the patterns of loadings as a whole are not very similar. For example, the Engaged factor has relatively strong loadings on the feelings content (.59), free (.50), and open (.48) at the within level, while all of these loadings are small at the between level (content = .22, free = − .01, open = .18). On the other hand, on the between level, the word nervous loads quite strongly on the Engaged factor (.45), while on the within level, this loading is almost zero (.05). There are examples of differential patterns of loadings on all factors. Since formal measurement invariance testing is not possible in EFA, we instead explored similarity across levels by simply correlating factor loadings on the within and between levels. These analyses showed statistically significant and relatively strong correlations across levels (Engaged: r = .63, p = .001; Moved: r = .82, p < .001; Inadequate: r = .94, p < .001; Relaxed: r = .88, p < .001). This means that between 40% (Engaged) and 88% (Inadequate) of the variance in factor loadings is shared across levels. The variance that is not explained by similarity across levels would be “cluster bias,” that is, factor structures varying among therapists (Jak, Oort, & Dolan, 2013), plus measurement error for the correlations. From this, it would seem that the amount of cluster bias probably is quite large for the Engaged factor, while the other three factors may not be as sensitive to this issue.
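The shared-variance percentages follow from squaring the reported correlations between within-level and between-level loadings. A minimal sketch of that step, using only the correlations given above (the variable names are illustrative):

```python
# Correlations between within- and between-level loadings, as reported.
loading_correlations = {
    "Engaged": 0.63,
    "Moved": 0.82,
    "Inadequate": 0.94,
    "Relaxed": 0.88,
}

# Shared variance is the squared correlation (coefficient of determination).
shared_variance = {factor: r ** 2 for factor, r in loading_correlations.items()}

for factor, r_squared in shared_variance.items():
    print(f"{factor}: r = {loading_correlations[factor]:.2f}, shared variance = {r_squared:.0%}")

lowest = min(shared_variance, key=shared_variance.get)
highest = max(shared_variance, key=shared_variance.get)
print("lowest:", lowest, "highest:", highest)
```

The remainder (one minus the squared correlation) is the portion attributed above to cluster bias plus measurement error, which is largest for the Engaged factor.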
Discussion
The aim of the present study was to investigate the factor structure of the FWC-24 when taking the relevant levels of nesting into account. Data were collected from four psychotherapy projects in Sweden with the deliberate aim of getting a rich and heterogeneous sample. One hundred and one therapists provided 4,443 FWC-24 reports in relation to 191 patients. The multilevel structure of the data demanded use of MEFA. The items from FWC-24 were best represented by four factors on the within level and four factors on the between level. The model explained a large share of the variance, and model fit was excellent.
The factor solutions on the within and between levels were similar in terms of number of factors, and the factors could easily be conceptualized as representing the same constructs, all conceptually coherent and clinically recognizable. However, different factors explained different amounts of variance on different levels, indicating that the different factors are of different importance on the patient/session and the therapist levels, respectively. Specifically, it seems that there is more heterogeneity among therapists regarding their negative feelings (Inadequate) or more neutral ones (Relaxed) than regarding their plainly positive feelings toward patients. Conversely, when controlling for therapists’ personal styles, it seems that differences within therapists, in relation to different patients and time points during therapy, are largest regarding positively tinged feelings of involvement (Engaged and Moved). In other words, negative and relaxed feelings seem to vary more between therapists, while positive feelings seem to vary more between patients within therapists. Thus, feelings of inadequacy and relaxation seem to be more therapist dependent, whereas feelings of engagement and warmth seem to be patient dependent to a larger degree.
Although factors were similar across levels of analysis, there were differences in factor loadings across levels. Formal measurement invariance testing was not possible, but a tentative conclusion from our findings would be that the FWC-24 shows configural invariance (i.e., factors are the same) across therapists, but not invariance of factor loadings (weak or metric invariance), at least not for the Engaged factor. However, even for the Engaged factor, there may be partial invariance of loadings, since most loadings were similar in magnitude, but a few differed considerably. The therapist level concerns how average differences in feeling-word endorsement among therapists cluster together. For instance, if Therapist A tends to report higher degrees of feeling annoyed compared with other therapists, does Therapist A also report higher average degrees of feeling cold, bored, and indifferent? The patient level, on the other hand, concerns how therapist feelings with different patients and/or at different sessions, after average differences among therapists have been removed, cluster together. As an example, if therapists feel more energetic with a particular patient, or at a particular session, do they also tend to feel more playful, free, and happy? Since it is the same person (i.e., the therapist) doing ratings for both levels, it would seem natural that item clustering would be the same across levels. However, apparently, this is not so for all items, and if item clustering differs across levels, the meaning of raw scores will be ambiguous. For example, if we have a high rating for the feeling “free,” should this feeling be counted together with the “Engaged” (using the within-level factor structure) or the “Relaxed” subscale (using the between-level structure)?
This issue warrants further investigation in future research, since measurement noninvariance across clusters is a limitation of a measure in the sense that each time it is used, the therapist level needs to be differentiated from patient- and repeated-measures levels. To enable differentiation of levels, there must be data on a sufficient number of patients for each therapist so that the general response styles of therapists can be separated from feelings felt with a particular patient in a particular session. However, if a measure exhibits measurement invariance across therapists, raw scores can be used without disaggregation.
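The idea of disaggregating levels can be illustrated with a simple sketch that splits each rating into a therapist mean (between component) and a deviation from that mean (within component). This is an observed-mean approximation with hypothetical ratings; it is not the latent decomposition used in the multilevel factor model, and it treats the ordinal scores as numeric for simplicity.

```python
from collections import defaultdict
from statistics import mean

def disaggregate(ratings):
    """Split ratings into between-therapist and within-therapist parts.

    ratings: list of (therapist_id, score) pairs, one per rated session.
    Returns a list of (therapist_id, therapist_mean, deviation) tuples.
    """
    scores_by_therapist = defaultdict(list)
    for therapist_id, score in ratings:
        scores_by_therapist[therapist_id].append(score)
    therapist_means = {tid: mean(scores) for tid, scores in scores_by_therapist.items()}
    return [
        (tid, therapist_means[tid], score - therapist_means[tid])
        for tid, score in ratings
    ]

# Hypothetical ratings of one feeling word (0 to 3) by two therapists.
ratings = [("A", 3), ("A", 2), ("A", 1), ("B", 1), ("B", 0), ("B", 2)]

parts = disaggregate(ratings)
for therapist_id, between_part, within_part in parts:
    print(therapist_id, between_part, within_part)
```

With only one or two ratings per therapist, the therapist mean is estimated poorly, which is why a sufficient number of patients and sessions per therapist is needed for the two components to be separated.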
One possibility is that it is the tendency to report, rather than the actual experience of, negative feelings that varies between therapists. Some therapists may feel more comfortable being aware of this type of feeling, while others may be reluctant to recognize or admit them. Less controversial feelings such as happiness and playfulness may, in turn, be reported in a more open and honest way, generating larger variances in these items, as they follow the actual fluctuations in the relationship more closely. This idea is supported by the fact that the mean values are substantially lower for most of the negative feeling words compared with the positive feeling words. This could be related to a general idea that psychotherapists should hold positive feelings toward their patients, whereas negative feelings are considered less professional, making these feelings less readily recognized and admitted. Dahl et al. (2012) point to the possibility that some therapists may take a disengaged stance as a defensive act against aggressive feelings. This could explain why, in our factor solutions, the word “indifferent” belongs to the same subscale as words such as “annoyed” and “bored.” Notably, “indifferent” may have a slightly different meaning in this context than in an everyday context. Being indifferent toward a patient as a therapist can in fact be seen as a rather aggressive stance, since this is a deviation from the “therapeutic ideal” where involvement and engagement are expected. This highlights the issue of some items being more difficult to interpret than others, which both allows them to be understood differently by different therapists and creates uncertainty regarding the interpretation of the results. One other such item is “overwhelmed,” belonging to the “Moved” scale in the analyses at both levels.
The other words related to that scale are more clearly positively tinged, while one can argue that “overwhelmed” may not be a desirable feeling for a psychotherapist. In fact, the item “overwhelmed” belongs to more negatively tinged factors in several other studies, for example, Overwhelmed (Røssberg et al., 2003), Inadequate (Dahl et al., 2012; Ulberg et al., 2013), or Dejected (Holmqvist et al., 2002). Betan et al. (2005) found that the statement “I feel overwhelmed by his or her needs” belongs in a factor named Helpless/Inadequate, together with other feelings of anxiousness and being unhelpful. However, being overwhelmed by someone’s needs is a very specific way of being overwhelmed, which may be easier to interpret as negative. Betan et al. (2005) found that feeling inadequate and/or overwhelmed was particularly associated with Axis II pathology in patients, and Colli, Tanzilli, Dimaggio, and Lingiardi (2014) found the same, particularly for borderline personality disorder. Røssberg et al. (2013) found that therapist ratings on the “Overwhelmed” scale were negatively related to outcome on global assessment of functioning in a patient sample with personality disorders. The meaning and implications of the “Moved” scales found in this analysis, therefore, need to be further evaluated in future studies.
It is clear that social desirability is an important perspective to take into account when discussing the FWC and therapists’ emotional reactions in general. On the other hand, it is possible that positive feelings are indeed more prominent for most therapist encounters with patients, and that positive feelings and expectations are more of a “baseline state” with negative feelings as deviations in certain situations. Furthermore, it is not possible to say, on the basis of the present study, whether identified differences between therapists regarding the reporting of negative feelings reflect a defensive stance, an unwillingness to report, or truly “genuine” emotions.
The issues described above are in no way unique to the FWC. On the contrary, response error is a widely discussed problem in research based on self-reported data. Bradburn, Sudman, and Wansink (2004) identify four basic factors related to response error: memory, motivation, communication, and knowledge. Translated to the present context, respondents may have forgotten what they felt, they may have been unmotivated to give honest accounts (due to social desirability), they may have interpreted the feeling words and the rating scales differently, or they may not have known to what extent they felt a certain feeling. Further research on the relationship between FWC reports and psychotherapy process and outcome variables is needed to investigate these matters more closely.
There is some conceptual overlap but also clear differences between the factors in this study and factors obtained in previous studies on the FWC. Several studies on the FWC have found more than four factors. However, all previous studies that we know of that were specifically conducted on data from individual therapies (i.e., Dahl et al., 2012; Holmqvist et al., 2002; Ulberg et al., 2013) have, like us, identified four factors. These studies, however, found more negative factors than we did. Dahl et al. (2012) found two negative and two positive factors, Ulberg et al. (2013) two negative, one positive, and one neutral factor, and Holmqvist et al. (2002) three negative and one positive factor. Regarding the studies from Dahl et al. (2012) and Holmqvist et al. (2002), one explanation could be that they used longer versions of the scale, containing more negative feeling words. The study by Ulberg et al. (2013) was conducted on adolescents only, a patient group in which negative feelings may be more likely to be evoked. The Engaged factor in our study corresponds largely to the Confident factor found by Ulberg et al. (2013) and by Dahl et al. (2012), as well as the Positive factor found by Holmqvist et al. (2002). The Inadequate factor in our study can be represented as a combination of the Inadequate and the Disengaged factors in the studies conducted by Ulberg et al. (2013) and Dahl et al. (2012). It is also similar to the Distant factor found by Holmqvist et al. (2002). The Relaxed and Moved factors do not show clear similarities to any of the factors found in previous studies. Differences may reflect the multilevel nature of the data and the consequences of taking it into account or not.
The mean scores in the present study are more in line with the ones found by Holmqvist et al. (2002) and Ulberg et al. (2013) than the ones found by Dahl et al. (2012). One possibility is that the shorter length of the checklist allows for more nuanced ratings, whereas the longer one is rated with more zeroes (or missing values imputed as zeroes). Dahl et al. (2012) also discuss the low mean scores found in their study as a possible consequence of a different heading of the questionnaire. While the present study as well as Holmqvist et al. (2002) used a questionnaire with the heading “Together with the patient during this session, I have felt,” Dahl et al. (2012) used a questionnaire with the heading “Countertransference.” This might have affected the feelings reported, since the word “countertransference” is a theoretical concept that may have specific (e.g., negative) connotations for some therapists.
The FWC-24 is the shortest available version of the FWC. An advantage of a shorter version is that it is less time consuming and therefore more clinically usable. Also, the longer versions increase the risk of data loss and less conscientious ratings by the therapists (Dahl et al., 2012). However, an arguable disadvantage of the short version is that it is not representative enough of the spectrum of feelings that therapists may experience with patients and hence does not allow nuanced ratings. For example, the 24-item version lacks more intense or nuanced feelings of anger, having “annoyed” as the only option. Dahl (2012) pointed out that Pope, Keith-Spiegel, and Tabachnick (1986) found that 86% of psychotherapists reported having been sexually attracted to their patient and 63% experienced feelings of guilt or confusion about this. Despite this, all versions of the FWC lack sexual or romantic feelings such as “aroused” and “attracted.” This can be seen as a limitation of the instrument. On the other hand, it is possible that an even shorter version of the checklist would be sufficient to capture the most important feelings, or groups of feelings, and that the psychometric properties of such a checklist would be as good as those of this one. This is a possible area for future research.
Strengths and Limitations
The most important limitation of this study is the fact that we could not conduct a MEFA separating patients from repeated measurements within patients. However, ICCs for a separate patient level were relatively low (range: .078–.136), especially compared with ICCs for the other levels, indicating that the most important levels were indeed separated in the analysis.
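The role of the ICC in this argument can be illustrated with a minimal sketch. It uses a one-way ANOVA estimator of the ICC on made-up, balanced toy ratings; this is an assumption chosen for brevity and not the random intercept-only multilevel model used in the present study. The function name `icc_oneway` and the data are hypothetical.

```python
def icc_oneway(groups):
    """One-way random-effects ICC for balanced groups (ANOVA estimator).

    `groups` is a list of equally sized lists of ratings, one list per
    cluster (e.g., per patient). Illustrative only; not the study's model.
    """
    n = len(groups)        # number of clusters
    k = len(groups[0])     # ratings per cluster (assumed equal for all)
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / k for g in groups]

    # Mean square between clusters and mean square within clusters.
    msb = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n * (k - 1))

    # Share of the total variance attributable to cluster membership.
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical ratings: three clusters with two ratings each.
toy_ratings = [[1, 2], [2, 3], [3, 5]]
print(round(icc_oneway(toy_ratings), 3))
```

A value near 0 means that ratings within a cluster are hardly more alike than ratings from different clusters, so little is lost by not modeling that level separately. A value near 1 means that most of the variance lies between clusters.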
One of the strengths of the study is the large and heterogeneous sample. This study investigates a sample of 101 therapists, which is considerably more than in previous studies on the FWC, which included only between 6 (Dahl et al., 2012) and 41 (Ulberg et al., 2013) therapists. Also, the therapists in the present study worked with different psychotherapy methods and the patients consisted of children, adolescents, and parents as well as adults. The fact that previous study samples have been homogeneous regarding psychotherapy method and patient sample has been discussed as a limitation and a possible explanation for the divergent results between studies. The heterogeneous sample of therapists as well as clinical settings and patients enhances the generalizability of the results from the present study. Furthermore, it can be argued that the fact that the MEFA comprises ratings from different time points, rather than only one, enhances the likelihood that the factor solutions found are valid and stable over time. On the other hand, the very heterogeneity of the sample may conceal the existence of moderators that are reflected in the divergent results of different studies.
Conclusions and Potential Implications
When analyzing data with multiple levels, nesting needs to be taken into account. In the present study, a two-level EFA revealed four factors on the therapist level as well as on the patient/repeated-measurements level. The factor solutions on the two levels were similar and seen as representing the same constructs: Engaged, Inadequate, Moved, and Relaxed. The factor solutions had excellent model fit. Furthermore, they described meaningful constructs to some extent overlapping with factors found in previous studies. The factors may be useful in future research regarding therapists’ feelings in relation to the process and outcome of psychotherapy. The fact that we now have a basis for subscales on the therapist level as well as the patient-session level gives opportunities for more precise studies on this important subject.
Footnotes
Acknowledgements
The authors wish to thank Gunnar Carlberg, PhD, professor and Fredrik Odhammar, MSc, for generously providing data from the EPOS project.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
