Abstract
The Brief Experiential Avoidance Questionnaire (BEAQ) is a 15-item short form of the Multidimensional Experiential Avoidance Questionnaire. This study aimed to investigate psychometric properties of a German translation of the BEAQ in a student and a clinical population. The BEAQ showed high internal reliability and overall acceptable convergent and discriminant validity. The BEAQ displayed adequate 7- to 13-day test–retest reliability and captured changes in experiential avoidance when experiential avoidance was targeted in treatment. Confirmatory factor analyses indicated that a bifactor structure, in which the BEAQ is modeled as one general and five specific factors that correspond to the Multidimensional Experiential Avoidance Questionnaire subscales, fit the data adequately. All items (except Item 1 in the clinical population) loaded on the general factor and common variance was approximately equally spread across the general and specific factors. The Distress Endurance subscale was not included in this model, since it is represented by only one item, which performed poorly and showed low associations with the BEAQ’s total score in both samples. We recommend further research into the BEAQ’s factor structure to substantiate our preliminary findings.
With comorbidities between disorders so high (Jacobi et al., 2014), research interests have begun to shift toward investigating shared transdiagnostic mechanisms across disorders (e.g., Mansell et al., 2008). Instead of categorical diagnoses, transdiagnostic constructs are increasingly emphasized as vulnerability and maintenance factors of psychopathology and suitable assessment instruments are needed to capture these constructs.
One example of a transdiagnostic mechanism discussed in the onset and maintenance of several disorders is experiential avoidance (Harvey et al., 2004; Hayes et al., 1996). Experiential avoidance (EA) emphasizes how people relate to their inner experiences, including emotions, thoughts, and physical sensations (Hayes et al., 1996). EA spans both the unwillingness to be in contact with these adverse inner experiences and the measures taken to avoid the cause of these experiences and lessen the adverse experiences themselves (Hayes et al., 1999). EA thus is a broad concept that includes overt avoidance behaviors as well as more subtle avoidance tendencies. Conceptualized as a transdiagnostic vulnerability, EA shows correlations with negative affect (Gámez et al., 2011) and symptoms of depression and anxiety (Chawla & Ostafin, 2007). High associations with other mechanisms discussed in the onset and maintenance of psychopathology like neuroticism (Latzman & Masuda, 2013), anxiety sensitivity (Spinhoven et al., 2017), emotion and cognition suppression, lack of distress tolerance, and intolerance of uncertainty (Hayes et al., 2013) have been reported in the literature. There is emerging evidence that the transdiagnostic vulnerabilities overlap (e.g., Spinhoven et al., 2016; Spinhoven et al., 2017) and may share a common underlying basis (e.g., Mansell & McEvoy, 2017), which makes the assessment and discrimination of EA (and other vulnerabilities) challenging. The relationship between EA and mindfulness highlights this: EA and mindfulness show conceptual overlap and have been portrayed as beneficial/dysfunctional counterparts of each other (Mitmansgruber et al., 2009), they are correlated (Baer et al., 2006; Masuda & Tully, 2012; Moore et al., 2009; Thompson & Waltz, 2010) and seem to be intertwined in therapy (Hooper et al., 2010; Kearney et al., 2012; Parsons et al., 2019; Rezaei & Hosseini Ramaghani, 2018).
For EA, the most widely used instrument is the Acceptance and Action Questionnaire (AAQ) and its revised version, the Acceptance and Action Questionnaire II (AAQ-II; Bond et al., 2011; Hayes et al., 2004). However, the AAQ-II is a measure of psychological inflexibility—entailing both EA and acceptance—making the AAQ-II unsuitable to capture EA in isolation (Bond et al., 2011). In addition, several studies question the AAQ-II’s discriminant validity because the AAQ-II exhibits higher correlations with measures of negative affect/symptoms than with other measures of avoidance (Gámez et al., 2011; Rochefort et al., 2018; Tyndall et al., 2019; Vaughan-Johnston et al., 2017; Wolgast, 2014).
The Multidimensional Experiential Avoidance Questionnaire (MEAQ) addresses this limitation and was developed to assess the full range of EA with six scales: Behavioral Avoidance, Distress Aversion, Procrastination, Distraction/Suppression, Repression/Denial, and Distress Endurance (Gámez et al., 2011). However, the MEAQ’s length of 62 items is a barrier for psychotherapy research where brief measures can facilitate repeated assessment of constructs. For the Brief Experiential Avoidance Questionnaire (BEAQ), 15 items of the MEAQ were selected based on high loadings on a general factor as well as by considering content from all of the six subscales to guarantee a brief, but broad assessment of EA (Gámez et al., 2014). The resulting measure shows high internal consistency and the expected correlations with the long form, psychopathology, and other measures of avoidance (Gámez et al., 2014). It also can be discriminated from negative affectivity and neuroticism (Gámez et al., 2014). Tyndall et al. (2019) compared the AAQ-II and BEAQ and found that the BEAQ exhibited favorable discriminant validity over the AAQ-II. With regard to the BEAQ’s factor structure, the authors concluded based on scree plot examination that one general factor best described the data across three study phases (Gámez et al., 2014). However, a recent investigation of the BEAQ questioned the BEAQ’s unidimensionality (Byllesby et al., 2020). In this study, the BEAQ’s factor structure was explored in two treatment-seeking veteran samples (N = 179 and 257). The veterans in this study seemed considerably more avoidant (M = 66.7–68.8) than the samples in the original validation study (M = 52.02–56.41 for the patient samples and M = 43.33–49.37 for the nonclinical samples; Byllesby et al., 2020; Gámez et al., 2014). In the confirmatory factor analyses (CFA), the one-factor model fit was poor and three items per sample did not load significantly on the factor (Items 4, 6, and 7 and Items 1, 3, and 6, respectively). Modification indices seemed to be high and suggested that the inclusion of several residual covariances between item pairs would improve fit.
Overall, the BEAQ seems to be a short and feasible measure of EA. Its brief measurement of the broad construct and its distinction from negative emotionality make it appealing for use in psychotherapy research.
Objective
The overall aim of the present study was to translate the BEAQ into German and to examine its validity and reliability in a student and clinical population. Additionally, we aimed to fill gaps in the original validation study by investigating the construct’s stability over time and sensitivity to change. Based on the literature, we hypothesized that the German translation would show good internal consistency and higher correlations to other measures of avoidance than to measures of pathology, positive/negative affect, personality, and mindfulness. As to the BEAQ’s stability over time, we expected that a 7- to 13-day retest-reliability in the range between .80 and .90 would be adequate, based on results of other measures of transdiagnostic processes/emotion regulation (see Method section). We hypothesized that EA would be malleable by a treatment targeting EA and that the BEAQ would depict these changes. Based on the original validation study, a one-factor structure should best describe the BEAQ. However, the results of a recent study and the fact that the BEAQ was constructed to cover all six subscales of the MEAQ question the unidimensionality of the scale. The conceptualization of EA as a broad general construct that entails different facets like overt avoidance behaviors and more subtle avoidance tendencies suggests that a bifactor structure that includes a general factor and a set of specific factors may represent an appropriate modeling approach for the BEAQ. Following these considerations, we aimed to explore the BEAQ’s factor structure and examine whether the BEAQ retains the MEAQ’s subscales. For this purpose, we aimed to examine a one-factorial, multifactorial, and bifactor model.
Study 1: Reliability and Validity in a Nonclinical Student Sample
Method
The aim of Study 1 was to explore the BEAQ’s internal and test–retest reliability as well as its construct validity in a nonclinical student sample. In order to assess test–retest reliability, we invited participants to two assessments spaced 7 days apart. Besides the BEAQ, the first assessment included demographic as well as convergent and discriminant measures. In the second assessment, we solely assessed the BEAQ.
Recruitment and Procedure
We posted an invitation to participate in the study in 73 university or study program groups on social media platforms. The invitation included information on the background of the study as well as a link to the 20-minute survey that was hosted on https://www.soscisurvey.de/. Of the 1,080 students who gave online written consent to participate, we included 596 students in Study 1. We excluded participants if they (a) failed to complete the majority of assessments (n = 451), (b) filled in the assessment more than three times faster than the average (n = 21), or (c) stated that they did not fill in the assessment conscientiously (n = 12). Of the 596 included students, 72.1% were female and 27.9% male. The mean age was 21 years (SD = 2.93); 74% of participants were single, 25% were in a relationship, and 1% were married. To increase participation in the first assessment, participants had the chance to win one of three 50 Euro coupons for an online retailer.
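Exclusion rule (b) can be read in more than one way. The following minimal sketch assumes it means a completion time shorter than one third of the sample's mean completion time; the function name, the `factor` parameter, and the example durations are hypothetical and are not taken from the study's code. The last line simply re-derives the reported participant flow.

```python
from statistics import mean


def flag_speeders(durations_sec, factor=3.0):
    """Return indices of respondents whose completion time is below mean / factor.

    Assumption: "more than three times faster than the average" is read as
    duration < mean duration / 3. The study's actual rule may differ.
    """
    threshold = mean(durations_sec) / factor
    return [i for i, d in enumerate(durations_sec) if d < threshold]


# Hypothetical completion times in seconds (not study data).
durations = [1200, 1100, 1300, 250, 1250, 1150]
speeders = flag_speeders(durations)

# Participant flow as reported in the text: 1,080 consented, three exclusion groups.
included = 1080 - (451 + 21 + 12)
```

With the hypothetical durations, only the fourth respondent falls below the threshold, and the flow arithmetic returns the 596 students analyzed in Study 1.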
Seven days after the first assessment of the BEAQ, we invited participants to complete the second assessment of the BEAQ within three days (7- to 13-day test–retest reliability). Of the 596 students who were eligible, 356 participated in the second assessment (59.73% response rate). For the second assessment, we applied the same exclusion procedure. Following this procedure, we included 319 participants to calculate test–retest reliability. Of the 319 included participants, 71.2% were female and 28.8% male. The mean age was 21 years (SD = 2.74). Of these participants, 73.7% were single, 25.4% were in a relationship, and 0.9% were married. To increase participation in the second assessment, participants had the chance to win one of 20 fountain pens. Participants who participated in the second assessment did not differ significantly from participants who did not participate in the second assessment in age, t(594) = 0.73, p = .47; gender, χ²(1) = 0.25, p = .62; relationship status, χ²(2) = 0.13, p = .94; or initial BEAQ scores, t(594) = 0.07, p = .94.
Measures
Brief Experiential Avoidance Questionnaire
The BEAQ (Gámez et al., 2014) is a 15-item short form of the MEAQ (Gámez et al., 2011). The short form covers content from all six domains of the MEAQ reflecting the broad concept of EA. Example items include “I go out of my way to avoid uncomfortable situations” and “When something upsetting comes up, I try very hard to stop thinking about it.” The BEAQ is internally reliable (mean Cronbach’s α = .84) and can be distinguished from negative emotionality (Gámez et al., 2014). Items are rated on a 6-point scale from 1 (strongly disagree) to 6 (strongly agree). For the purpose of this study, the English original was translated to German and back-translated by two independent bilingual native speakers of English and German. The back translation was equivalent to the original scale with only negligible deviations in language syntax.
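As an illustration of the response format, the sketch below assumes that the BEAQ total score is the plain sum of the 15 item ratings, which would give a possible range of 15 to 90. This section does not state which items, if any, are reverse-keyed, so `reverse_items` is a hypothetical parameter that defaults to none; the function name is also illustrative.

```python
def score_beaq(responses, reverse_items=()):
    """Sum 15 item ratings (1-6) into a total score.

    Assumptions: the total is a simple sum, and `reverse_items` holds
    1-based item numbers to reverse-key as 7 - rating. Which items are
    reverse-keyed is not stated in this section; consult the scoring
    instructions of the instrument before using this.
    """
    if len(responses) != 15:
        raise ValueError("expected 15 item responses")
    if any(r not in range(1, 7) for r in responses):
        raise ValueError("ratings must be integers from 1 to 6")
    total = 0
    for item_no, rating in enumerate(responses, start=1):
        total += (7 - rating) if item_no in reverse_items else rating
    return total


lowest = score_beaq([1] * 15)   # every item rated 1
highest = score_beaq([6] * 15)  # every item rated 6
```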
Convergent validity
To investigate the BEAQ’s convergent validity, we included the following self-report measures in the first assessment:
Acceptance and Action Questionnaire II (AAQ-II). The AAQ-II (Bond et al., 2011; Hoyer & Gloster, 2013) is a unidimensional measure of psychological inflexibility and EA. The original scale, the AAQ-I, was developed within the Acceptance and Commitment Therapy framework and was revised for its inconsistent psychometric properties (Bond et al., 2011). The revised AAQ-II comprises seven items of the original measure and is rated on a 7-point scale from 1 (never true) to 7 (always true). Example items are “My painful memories prevent me from having a fulfilling life.” or “My thoughts and feelings do not get in the way how I want to live my life.” The AAQ-II shows high internal reliability as well as high test–retest reliability.
Cognitive–Behavioral Avoidance Scale (CBAS). The CBAS (Ottenbreit & Dobson, 2004; Röthlin et al., 2010) is a multidimensional measure of avoidance with four subscales: Behavioral Social, Cognitive Nonsocial, Cognitive Social, and Behavioral Nonsocial avoidance. The measure was originally developed to assess avoidance in depressed patients. The 31 items are rated on a 5-point scale from 1 (not at all true for me) to 5 (extremely true for me). Example items include “When uncertain about my future, I fail to sit down and think about what I really want” or “Rather than getting out and doing things, I just sit at home and watch TV.” Both the subscales and the total scale showed adequate internal consistency and 3-week test–retest reliability (Ottenbreit & Dobson, 2004). The measure also displayed good convergent and discriminant validity (Ottenbreit & Dobson, 2004).
Discriminant validity
To investigate the BEAQ’s discriminant validity, we included the following self-report measures in the first assessment:
Short Form of the Symptom Checklist (SCL-K-9). The SCL-K-9 (Klaghofer & Brähler, 2001) is a German short form of the Symptom-Checklist 90 (Derogatis, 1977). The SCL-K-9 assesses general pathology with nine items on a 5-point Likert-type scale from 0 (not at all) to 4 (extremely). The SCL-K-9 was derived from the SCL-90 by including the item of each of the nine subscales with the highest correlation to the Global Severity Index. The SCL-K-9 shows high correlations with the SCL-90 (.93) and high internal consistency (.87), making it a brief, but valid alternative to the long form (Klaghofer & Brähler, 2001).
Generalized Anxiety Disorder Screener (GAD-7). The GAD-7 (Löwe et al., 2008; Spitzer et al., 2006) is a seven-item screening instrument for Generalized Anxiety Disorder. Participants rate the occurrence of symptoms of anxiety in the past 2 weeks on a 4-point scale ranging from 0 (not at all) to 3 (nearly every day). The GAD-7 is unidimensional and internally reliable (.89). Although originally developed to screen for Generalized Anxiety Disorder, the GAD-7 can be used as an indicator of elevated anxiety across all anxiety disorders (Kroenke et al., 2007).
WHO Well-being Index (WHO-5). The WHO-5 (Bech et al., 2003; Brähler et al., 2007) is a five-item unidimensional measure of well-being. Items are rated on a 6-point scale from 5 (all of the time) to 0 (at no time). It can also be used to screen for depressive symptoms (Krieger et al., 2014). The WHO-5’s internal consistency is high (Brähler et al., 2007) and the measure shows strong associations with other measures of well-being (Brähler et al., 2007) and depression (Krieger et al., 2014).
Positive and Negative Affect Schedule (PANAS). The PANAS (Krohne et al., 1996; Watson et al., 1988) measures positive and negative affect with two 10-item scales. Participants rate their affect on a 5-point scale from 1 (very slightly or not at all) to 5 (extremely). Both scales show high internal consistency (between .86 and .90 for the positive affect and between .84 and .87 for the negative affect scale) and are quasi-independent (Watson et al., 1988).
Satisfaction With Life Scale (SWLS). The SWLS (Diener et al., 1985; Glaesmer et al., 2011) is the most widely used measure of life satisfaction. It assesses life satisfaction one-dimensionally with five items applying a 7-point scale from 1 (strongly disagree) to 7 (strongly agree). Internal consistency (.87) and 2-month retest reliability (.82) are high (Diener et al., 1985).
Short Form of the Big Five Inventory (BFI-10). The BFI-10 (Rammstedt & John, 2007) is a 10-item short form of the 44-item Big Five Inventory (John et al., 1991). The BFI-10 contains two items for each of the Big Five scales, which are rated on a 5-point scale from 1 (disagree strongly) to 5 (agree strongly). Overall, the short form shows acceptable psychometric properties and is a good representation of the scales of Extraversion, Neuroticism, and Conscientiousness (Rammstedt & John, 2007). Agreeableness and Openness, however, are not as well represented by the short form as the other domains (Rammstedt & John, 2007). Despite this, the BFI-10 seemed to be a suitable measure for our purpose since our main hypotheses concern extraversion and neuroticism.
Southampton Mindfulness Questionnaire (SMQ). The SMQ (Chadwick et al., 2008) assesses a relevant domain of mindfulness, the reaction to distressing thoughts and images. The 16 items are rated on a 7-point scale from 0 (strongly disagree) to 6 (strongly agree). The SMQ is unidimensional and internally reliable (.89; Chadwick et al., 2008). For this study, we used our own translation of the SMQ, as no official German translation was available at this point. This German version of the SMQ showed high internal reliability of .83 in the student and .90 in the clinical population (described further in Study 2) as well as 7-day test–retest reliability of .79 in the student and .81 in the clinical population. After our assessment period, a German version of the SMQ was published (Böge et al., 2020). The published German version is very similar to the translated version used in the current study.
Mindful Attention and Awareness Scale (MAAS). The MAAS (K. W. Brown & Ryan, 2003; Michalak et al., 2008) measures a different facet of mindfulness: acting with awareness. Example items are “I break or spill things because of carelessness, not paying attention, or thinking of something else” or “It seems I am ‘running on automatic’ without much awareness of what I’m doing.” The 15 items are rated on a 6-point Likert-type scale from 1 (almost always) to 6 (almost never). The internal consistency (.82) and test–retest reliability (.81) are high.
Kentucky Inventory of Mindfulness Skills (KIMS). The KIMS (Baer et al., 2004; Ströhle et al., 2010) is a 39-item multidimensional measure of mindfulness with the four subscales Observing, Describing, Acting with Awareness, and Accepting without Judgement. Items are rated on a 5-point Likert-type scale from 1 (never or very rarely true) to 5 (always or almost always true). Internal consistency (.83 to .91) and test–retest reliability (.65 to .86) are acceptable.
Sample Size
Sample size was not calculated a priori. Post hoc power analyses were conducted with R package semPower (Moshagen & Erdfelder, 2016). The results indicated that the sample size (n = 596) in the student population was associated with a power >.99 to reject a wrong model of exact fit, given the misfit in our sample of root mean square error of approximation (RMSEA) = .099 (degrees of freedom [df] = 77) for the one-factor model, .054 (df = 67) for the five-factor model and .06 (df = 66) for the bifactor model.
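The degrees of freedom quoted for the one-factor and five-factor models can be reproduced by counting moments and free parameters. The sketch below assumes 14 indicators (the 15 BEAQ items minus Item 6, which is excluded from the factor models), simple structure, and factor variances fixed to 1; it uses continuous-indicator bookkeeping as a simplification of the ordinal estimator used in the study. The function names are illustrative. The bifactor count is omitted because it depends on how many specific loadings are equality-constrained, which this section does not enumerate.

```python
def n_moments(p):
    """Number of unique variances and covariances among p indicators."""
    return p * (p + 1) // 2


def df_correlated_factors(p, k):
    """Degrees of freedom for a simple-structure CFA with k correlated factors.

    Assumes factor variances are fixed to 1 for scaling, so the free
    parameters are p loadings, p residual variances, and k*(k-1)/2
    factor covariances.
    """
    free_parameters = p + p + k * (k - 1) // 2
    return n_moments(p) - free_parameters


df_one_factor = df_correlated_factors(14, 1)   # 105 moments - 28 parameters
df_five_factor = df_correlated_factors(14, 5)  # 105 moments - 38 parameters
```

Under these assumptions the counts agree with the reported df of 77 and 67.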
Statistical Analysis
Statistical analyses were run in RStudio (R Core Team, 2013; RStudio Team, 2019). Data and R code are available at doi: 10.17605/OSF.IO/CP26U. We explored the BEAQ items’ means, standard deviations, distributions, difficulties, and item-total correlations as well as internal consistency (Cronbach’s α) with the R package sjPlot (Lüdecke, 2018). Pearson’s correlations were calculated between the two assessment points of the BEAQ to assess test–retest reliability and between the BEAQ, its subscales, and measures of avoidance, pathology, affect, personality, and mindfulness to explore the BEAQ’s subscales and convergent and discriminant validity. We ran CFAs with the cfa function of the R package lavaan (Rosseel, 2012) using the robust weighted least squares estimator (weighted least squares mean and variance adjusted), which is recommended for ordinal data and for violations of multivariate normality (T. A. Brown, 2015). We tested (a) a unidimensional model, (b) a multifactorial model, in which items corresponded to the MEAQ’s original subscales, and (c) a bifactor model that included a general factor and specific factors that correspond to the MEAQ’s subscales. In the bifactor model, factors were uncorrelated and, to ensure model identification, factor loadings were constrained to be equal when a factor was represented by only two indicators. Model fit was evaluated using the comparative fit index (CFI), Tucker–Lewis index (TLI), RMSEA, and standardized root mean square residual (SRMR). Fit was judged based on the following recommendations (see, e.g., T. A. Brown, 2015; Hu & Bentler, 1999): CFI: acceptable fit .90 to .94, good fit ≥ .95; TLI: acceptable fit .90 to .94, good fit ≥ .95; RMSEA: acceptable fit .06 to .08, good fit < .06; SRMR: acceptable fit .06 to .08, good fit < .06.
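The cutoffs listed above can be written as a small lookup. This is a sketch only: the function name is hypothetical, and treating .95 as an inclusive lower bound for CFI/TLI and .08 as an inclusive upper bound for RMSEA/SRMR is an assumption about how the boundary values are handled.

```python
def classify_fit(index, value):
    """Label a fit index value as 'good', 'acceptable', or 'poor'.

    Cutoffs follow the ranges given in the text. Boundary handling
    (>= .95, <= .08) is an assumption made for this sketch.
    """
    index = index.upper()
    if index in ("CFI", "TLI"):  # higher values indicate better fit
        if value >= 0.95:
            return "good"
        if value >= 0.90:
            return "acceptable"
        return "poor"
    if index in ("RMSEA", "SRMR"):  # lower values indicate better fit
        if value < 0.06:
            return "good"
        if value <= 0.08:
            return "acceptable"
        return "poor"
    raise ValueError(f"unknown fit index: {index}")


# Illustrative calls with made-up values.
labels = [classify_fit("CFI", 0.93), classify_fit("RMSEA", 0.05), classify_fit("SRMR", 0.09)]
```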
For the bifactor model, Omega Hierarchical (ωH) was calculated as a reliability measure, and explained common variance (ECV) and percentage of uncontaminated correlations (PUC) were calculated to evaluate the unidimensionality of the scale (Dueber, 2017; Reise, 2012; Rodriguez et al., 2016).
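To make these three indices concrete, the sketch below computes them from standardized loadings of an orthogonal bifactor model using their standard definitions. The loadings and the split of 14 items into groups of 4, 4, 2, 2, and 2 are hypothetical values chosen for illustration, not the study's estimates, and the function name is an assumption.

```python
def bifactor_indices(groups):
    """Compute ECV, omega hierarchical, and PUC for an orthogonal bifactor model.

    `groups` is a list of specific factors; each one is a list of
    (general_loading, specific_loading) pairs of standardized loadings.
    Residual variances are taken as 1 - general^2 - specific^2.
    """
    items = [pair for group in groups for pair in group]

    # ECV: share of common variance attributable to the general factor.
    general_sq = sum(g * g for g, _ in items)
    specific_sq = sum(s * s for _, s in items)
    ecv = general_sq / (general_sq + specific_sq)

    # Omega hierarchical: share of total-score variance due to the general factor.
    general_part = sum(g for g, _ in items) ** 2
    specific_part = sum(sum(s for _, s in group) ** 2 for group in groups)
    residual_part = sum(1 - g * g - s * s for g, s in items)
    omega_h = general_part / (general_part + specific_part + residual_part)

    # PUC: share of item pairs that do not share a specific factor.
    n_items = len(items)
    total_pairs = n_items * (n_items - 1) // 2
    within_pairs = sum(len(group) * (len(group) - 1) // 2 for group in groups)
    puc = (total_pairs - within_pairs) / total_pairs
    return ecv, omega_h, puc


# Hypothetical loadings (not study estimates): general = .50, specific = .40.
example_groups = [[(0.5, 0.4)] * size for size in (4, 4, 2, 2, 2)]
ecv, omega_h, puc = bifactor_indices(example_groups)
```

Higher ECV and PUC values both point toward a scale that can be treated as essentially unidimensional; an ECV near .50 means the general and specific factors account for similar shares of the common variance.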
Results
Item Analysis and Reliability
The BEAQ’s mean, standard deviation, distribution, difficulty, item-total correlation (discrimination), and α if deleted in the student population are displayed in Table 1. The BEAQ total score showed normal distribution as assessed by the Shapiro–Wilk test (p > .05). The BEAQ’s internal consistency (Cronbach’s α) was .81 and test–retest reliability was r = .86 which can be considered good. On an item level, skew, kurtosis and difficulty were acceptable for all items. Item 6 demonstrated poor discrimination, while all other items showed reasonably good to good discrimination. Item 6 also showed the highest α if deleted value.
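For readers less familiar with the reliability statistics reported here, the following sketch shows how Cronbach's α and "α if deleted" are computed from a respondents-by-items matrix. The data are hypothetical, and the function names are illustrative; the study itself used the R package sjPlot.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for `rows`, a list of respondents with k item scores each."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_deleted(rows, j):
    """Alpha of the scale after removing the item at 0-based position j."""
    reduced = [row[:j] + row[j + 1:] for row in rows]
    return cronbach_alpha(reduced)


# Hypothetical ratings for five respondents on three items (not study data).
ratings = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 5, 6], [3, 3, 3]]
alpha = cronbach_alpha(ratings)
alpha_without_first = alpha_if_deleted(ratings, 0)
```

An item whose removal raises α, as reported for Item 6, contributes little shared variance to the total score.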
Table 1. BEAQ Total Score and Item Analysis in the Student Sample (N = 596).
Note. BEAQ = Brief Experiential Avoidance Questionnaire.
Convergent and Discriminant Validity
The correlation matrix between the BEAQ, its subscales and convergent and discriminant measures is displayed in Table 2. The BEAQ showed significant correlations with both convergent and discriminant measures but higher correlations were observed between the BEAQ and other measures of avoidance than with the discriminant measures.
Table 2. BEAQ’s Correlations With Convergent and Discriminant Measures in the Student Sample (N = 596).
Note. BEAQ = Brief Experiential Avoidance Questionnaire; BEAQ subscales: BA = Behavioral Avoidance, D/S = Distraction/Suppression, Pro = Procrastination, DA = Distress Aversion, R/D = Repression/Denial, DE = Distress Endurance. DE is only measured by one item (Item 6). PANAS scales: PA = positive affect, NA = negative affect. BFI-10 scales: Extra = Extraversion, Neuro = Neuroticism.
*p < .05. **p < .01.
Factor Structure
The Kaiser–Meyer–Olkin (KMO) measure indicated sampling adequacy for the analysis, KMO = .83, which can be considered good (Kaiser & Rice, 1974). On an item level, all KMO values were > .70, except for Item 6, which showed KMO = .51. We consequently dropped Item 6 from the subsequent analyses. Because Item 6 was the only item representing the Distress Endurance subscale, this subscale was not included in the multifactorial and bifactor models. Factor loadings of all three models (the one-factor model, the multifactorial model in which the items correspond to the MEAQ subscales, and the bifactor model) are displayed in Figure 1. The one-factor CFA indicated a poor fit to the data (CFI = 0.74, TLI = 0.70, RMSEA = 0.10 [0.09, 0.11], SRMR = 0.09). The modification indices suggested that adding residual covariances between items would improve model fit. The five highest modification indices were found between item pairs 5 and 10, 4 and 9, 1 and 12, 12 and 13, and 1 and 15. These item pairs coincide with the MEAQ’s subscales, suggesting that the MEAQ’s subscales are retained in the BEAQ. The multifactorial model (CFI = 0.93, TLI = 0.91, RMSEA = 0.05 [0.05, 0.06], SRMR = 0.05) and the bifactor model (CFI = 0.92, TLI = 0.89, RMSEA = 0.06 [0.05, 0.07], SRMR = 0.05) showed more adequate fit. While the item loadings were higher in the five-factor model than in the one-factor model, the bifactor model showed that several items did not have significant loadings on their specific factors when taking into account the general factor: Item 7 (Distress Aversion subscale), Items 3 and 13 (Distraction/Suppression subscale), and Items 2, 11, and 14 (Behavioral Avoidance subscale). In the bifactor model, all items loaded significantly on the general factor (i.e., factor loadings > .30) and ωH was .76, indicating internal reliability. ECV was .55, indicating that the common variance is approximately equally spread across general and specific factors, and PUC was .84.

Figure 1. Factor loadings of the one-factor, five-factor, and bifactor models in the student sample (N = 596).
Study 2: Reliability and Validity in a Clinical Sample
Method
Psychometric properties for the BEAQ were assessed in two clinical samples. The first sample was a sample of outpatients currently undergoing psychotherapy in a university-based outpatient clinic. The second sample was a sample of participants enrolled in a study on an internet-based transdiagnostic intervention (registered as DRKS00014820 at www.drks.de). Participants in this study were randomized to receive a 10-week guided internet-based intervention based on the Unified Protocol (Barlow, 2011) or to waitlist.
We combined data from the outpatient and online therapy samples to conduct the item analysis and investigate the internal reliability as well as the factor structure of the BEAQ. Test–retest reliability and construct validity were solely assessed in the outpatient sample. The BEAQ’s sensitivity to change was investigated in the online therapy sample by comparing changes in EA in the treatment and waitlist group following all active modules.
Recruitment and Procedure
Sample 1: Outpatient sample
We recruited patients currently in psychotherapy in a university-based outpatient clinic. We invited all patients of the outpatient clinic (n = 115) by personal email to participate in the survey. The invitation included information on the background of the study as well as a link to the 20-minute survey that was hosted on https://www.soscisurvey.de. Of the 56 participants who completed the assessment (48.7% response rate), we excluded 3 participants who had completed the assessment twice. Of the 53 participants whom we included in the study, 41 were diagnosed with a depressive disorder, 22 with an anxiety disorder, 9 with an eating disorder, and 1 with a bipolar disorder; diagnoses were made pretreatment with a structured clinical interview (Wittchen et al., 1997). For three participants, data from a structured clinical interview were not available, but all three participants were diagnosed by their clinician with an anxiety disorder. Participants were 64.2% female and 35.8% male. The mean age was 36.26 years (SD = 11.85). Regarding their relationship status, 58.5% of participants were single, 13.2% were in a relationship, and 5.7% were married. However, 22.6% did not give information on their relationship status.
In order to assess test–retest reliability, we invited participants of the outpatient clinic to two assessments spaced 7 days apart. We included the BEAQ, demographic as well as discriminant and convergent measures in the first assessment and solely assessed the BEAQ in the second assessment. Of the 53 participants, 50 filled out the second assessment point. To increase participation, we compensated all participants who filled in both assessments with a 10 Euro coupon for an online retailer.
Sample 2: Online therapy sample
We recruited participants in mental health forums as well as on social media platforms. In total, 129 participants filled in the preassessment of the online therapy study and were included in the factor analysis. Of these, 59 had a principal diagnosis of a depressive disorder, 59 of an anxiety disorder, and 11 of a somatic symptom disorder, diagnosed with a structured clinical interview via telephone (Margraf et al., 2017). Participants were 68.2% female and 31.8% male. The mean age was 37.31 years (SD = 12.47). Of the participants, 45.7% were single and 54.3% were in a relationship.
Measures
In addition to the measures described in Study 1, we assessed the following measure in the outpatient clinical sample:
Behavioral Activation for Depression Scale (BADS)
The BADS (Fuhr et al., 2016; Kanter et al., 2007; Manos et al., 2011) was developed to assess changes in avoidance and activation during behavioral activation. The short form comprises 8 of the original items of the BADS that were rationally and statistically selected plus one additional item (“I did things that were enjoyable”; Manos et al., 2011). The 9 items are rated on a 7-point scale ranging from 0 (not at all) to 6 (completely). The measure is internally reliable and shows satisfactory construct validity. While two distinct factors of activation and avoidance were found for the English original, the validation of the German translation favored a one-factor solution (Fuhr et al., 2016).
Sample Size
Sample size was not calculated a priori, but was limited to participants in the outpatient clinical and online therapy study. Post hoc power analyses were conducted with R package semPower (Moshagen & Erdfelder, 2016). The results indicated that the sample size (n = 182) in the clinical population was associated with a power >.99 to reject a wrong model of exact fit, given the misfit in our sample of RMSEA = 0.116 for the one-factor model. For the five-factor model, power to reject a wrong model was .72, given the misfit of RMSEA = .05 (df = 67). For the bifactor model, power was .72, given the misfit of RMSEA = .05 (df = 66).
Statistical Analysis
Statistical analyses were run in RStudio (R Core Team, 2013; RStudio Team, 2019). Data and R code are available at doi: 10.17605/OSF.IO/CP26U. We followed the same procedure as described in Study 1 for the item analyses, internal and test–retest reliability, evaluation of convergent and discriminant validity, and CFAs. Sensitivity to change was investigated with a mixed analysis of variance with one between-subjects factor (treatment group) and one within-subjects factor (time) using the R package afex (Singmann et al., 2020).
Results
Item Analysis and Reliability
The BEAQ’s mean, standard deviation, distribution, difficulty, item-total correlation (discrimination), and α if deleted in the combined clinical population (outpatient and online therapy sample) are displayed in Table 3. The BEAQ’s total score showed normal distribution as assessed by the Shapiro–Wilk test (p > .05). On an item level, skew, kurtosis, and difficulty were acceptable for all items. Item 6 demonstrated poor discrimination, while all other items showed reasonably good to good discrimination. Item 6 also showed the highest α if deleted value. The BEAQ’s internal consistency (Cronbach’s α) in the combined clinical sample was .80 and test–retest reliability in the outpatient sample was r = .77.
Table 3. BEAQ Total Score and Item Analysis in the Clinical Sample (Outpatient and Online Therapy Sample Combined; N = 182).
Note. BEAQ = Brief Experiential Avoidance Questionnaire.
Convergent and Discriminant Validity
The correlation matrix between the BEAQ’s total scale, subscales, and convergent and discriminant measures in the clinical outpatient sample is displayed in Table 4. The BEAQ showed significant correlations with both convergent and discriminant measures. The highest correlations were observed between the BEAQ and the convergent measures, the AAQ-II and CBAS. A similarly high correlation was observed between the BEAQ and positive affect.
BEAQ’s Correlations With Convergent and Discriminant Measures in the Clinical Outpatient Sample (n = 53).
Note. BEAQ = Brief Experiential Avoidance Questionnaire; BEAQ subscales: BA = Behavioral Avoidance, D/S = Distraction/Suppression, Pro = Procrastination, DA = Distress Aversion, R/D = Repression/Denial. DE = Distress Endurance. DE is only measured by one item (Item 6). PANAS scales: PA = positive affect, NA = negative affect. BFI-10 scales: Extra = Extraversion, Neuro = Neuroticism.
*p < .05. **p < .01.
Factor Structure
The Kaiser–Meyer–Olkin (KMO) measure indicated sampling adequacy for the analysis, KMO = .77, which can be considered adequate (Kaiser & Rice, 1974). On an item level, all KMO values were > .60, except for Item 6 (KMO = .57). We consequently dropped this item from the subsequent analyses. Since Item 6 is the only item of the Distress Endurance scale, this subscale was not included in the multifactorial and bifactor models. Factor loadings of all three models, the one-factor, five-factor, and bifactor model, are displayed in Figure 2. The one-factor model showed a poor fit (CFI = 0.66, TLI = 0.60, RMSEA = 0.12 [0.10, 0.13], SRMR = 0.11). Item 1 showed a low factor loading and the inspection of modification indices indicated that adding several residual covariances between items would improve model fit. The five highest modification indices were found for item pairs 4 and 9, 1 and 15, 12 and 15, 1 and 12, and 5 and 10. The highest modification indices also coincided with the MEAQ’s subscales in the clinical population. The multifactorial model (CFI = 0.94, TLI = 0.92, RMSEA = 0.05 [0.03, 0.07], SRMR = 0.06) and the bifactor model (CFI = 0.95, TLI = 0.92, RMSEA = 0.05 [0.03, 0.07], SRMR = 0.06) showed more adequate fit. Overall, the items showed higher loadings on the facets than on the general factor when comparing the multifactorial with the one-factor model. However, in the bifactor model, Item 7 (Distress Aversion subscale) and Items 2, 8, and 14 (Behavioral Avoidance subscale) did not load significantly on their specific factors when taking into account the general factor. All items (except Item 1) loaded significantly on the general factor in the bifactor model (i.e., factor loadings >.30). ωH was .72, indicating internal reliability. ECV was .47, indicating that the common variance is approximately equally spread across general and specific factors, and PUC was .84.

Factor loadings of one-factor, five-factor, and bifactor model in the clinical sample (outpatient and online therapy sample combined; n = 182).
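To make the bifactor indices reported above more concrete, the following Python sketch computes the explained common variance (ECV), omega hierarchical (ωH), and the percentage of uncontaminated correlations (PUC) from standardized bifactor loadings, using the standard formulas for an orthogonal bifactor model. All loadings are invented for illustration, and the allocation of the 14 items to specific factors of sizes 4, 3, 3, 2, and 2 is an assumption; neither is taken from Figure 2. The function name is hypothetical.

```python
def bifactor_indices(general, specific, groups):
    """ECV, omega hierarchical, and PUC from standardized bifactor loadings.

    general, specific: one loading per item; groups: specific-factor label per item.
    Assumes orthogonal factors and residual variance 1 - general^2 - specific^2.
    """
    p = len(general)
    g2 = sum(lam * lam for lam in general)
    s2 = sum(lam * lam for lam in specific)
    ecv = g2 / (g2 + s2)  # share of common variance due to the general factor

    # Sum the specific loadings and count the items within each specific factor
    group_sums, group_sizes = {}, {}
    for lam, grp in zip(specific, groups):
        group_sums[grp] = group_sums.get(grp, 0.0) + lam
        group_sizes[grp] = group_sizes.get(grp, 0) + 1

    general_var = sum(general) ** 2
    specific_var = sum(total * total for total in group_sums.values())
    residual_var = sum(1 - g * g - s * s for g, s in zip(general, specific))
    omega_h = general_var / (general_var + specific_var + residual_var)

    # PUC: share of item pairs whose items belong to different specific factors
    within_pairs = sum(n * (n - 1) // 2 for n in group_sizes.values())
    puc = 1 - within_pairs / (p * (p - 1) / 2)
    return ecv, omega_h, puc

# Hypothetical loadings for 14 items on one general and five specific factors
groups = ["A"] * 4 + ["B"] * 3 + ["C"] * 3 + ["D"] * 2 + ["E"] * 2
general = [0.55, 0.50, 0.60, 0.45, 0.40, 0.50, 0.45,
           0.35, 0.40, 0.30, 0.45, 0.50, 0.40, 0.35]
specific = [0.20, 0.25, 0.15, 0.30, 0.50, 0.45, 0.40,
            0.55, 0.50, 0.60, 0.55, 0.50, 0.60, 0.55]

ecv, omega_h, puc = bifactor_indices(general, specific, groups)
print(f"ECV = {ecv:.2f}, omega_h = {omega_h:.2f}, PUC = {puc:.2f}")
```

PUC depends only on how the items are distributed across the specific factors, whereas ECV and ωH depend on the size of the loadings, so the three indices capture different aspects of how strongly the general factor dominates.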
Sensitivity to Change
In the online therapy sample, we explored whether the treatment group showed greater changes in EA following all active modules of an online adaptation of the Unified Protocol in comparison with the waitlist group. Of the 65 participants randomized to treatment, 30 (46.15%) filled out the BEAQ assessment following the modules that targeted EA. Of the 64 randomized to waitlist, 51 (79.69%) filled out the BEAQ assessment after 9 weeks (corresponding to the average duration participants in the treatment group needed to complete the modules that targeted EA). EA decreased more in the group receiving the online transdiagnostic treatment than in the waitlist group, F(1, 79) = 4.28, p = .042. Means, standard deviations, and within-group effect sizes for the BEAQ for both groups are displayed in Table 5. The between-group effect size for treatment versus waitlist (after 9 modules/9 weeks) suggests a moderate effect, Cohen’s d = .61, 95% confidence interval [0.15, 1.07].
Sensitivity to Change of the BEAQ in Online Therapy Sample: Comparison Between Treatment and Waitlist Group.
Note. Means and standard deviations for the BEAQ for the treatment and waitlist group for the baseline and module/week 9 assessment including within effect sizes (Cohen’s d) from baseline to post. BEAQ = Brief Experiential Avoidance Questionnaire; CI = confidence interval.
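As a worked illustration of the figures reported in this section, the following Python sketch recomputes the completion rates and a 95% confidence interval for the between-group Cohen’s d from the group sizes given above (30 and 51). The standard error formula is a common large-sample approximation and is an assumption here; the text does not state how the interval was obtained. The function name is hypothetical.

```python
import math

def cohens_d_ci(d, n1, n2, z=1.959964):
    """Approximate 95% CI for Cohen's d using a large-sample standard error."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2)))
    return d - z * se, d + z * se

n_treatment, n_waitlist = 30, 51  # completers reported in the text

print(f"completion: {n_treatment / 65:.2%} (treatment), {n_waitlist / 64:.2%} (waitlist)")
print(f"error df for the group x time test: {n_treatment + n_waitlist - 2}")

lower, upper = cohens_d_ci(0.61, n_treatment, n_waitlist)
print(f"d = 0.61, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Under this approximation, the interval rounds to [0.15, 1.07], which agrees with the reported interval, and the error degrees of freedom (79) agree with the reported F test.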
Discussion
We investigated psychometric properties of the German translation of the BEAQ in a student sample and a combined clinical sample of patients enrolled in an outpatient clinic or online therapy program. As expected, the combined clinical sample showed higher BEAQ scores (51.74) than the student population (44.77), with means very comparable to those of the original validation study (Gámez et al., 2014). Cronbach’s α in the student and clinical populations indicated good internal consistency. The 7- to 13-day retest reliability was comparable to retest reliabilities reported for other transdiagnostic processes, indicating that the BEAQ measures EA in a relatively stable manner. Longer time intervals are needed to investigate whether EA constitutes a more trait-like quality. Retest reliability was higher in the student sample (.86) than in the clinical sample (.77). We cannot exclude that participants in therapy experienced intended changes in EA, even in this narrow period of 7 to 13 days, leading to lower associations between measurement points.
The BEAQ subscales of Behavioral Avoidance and Distress Aversion showed the highest associations with the BEAQ’s total score in the student sample (.76–.81), while the Behavioral Avoidance, Distraction/Suppression, Procrastination, and Distress Aversion subscales showed similarly high correlations with the BEAQ total score in the clinical sample (.75–.81). As hypothesized, the BEAQ total scale was more strongly correlated with measures of avoidance than with psychopathology and mindfulness. One exception to that pattern was the high correlation between the BEAQ and positive affect (PANAS) in the clinical outpatient sample. While the focus on avoidance in emotional disorders has primarily been on the avoidance of negative emotions, researchers are now also highlighting the role of avoidance and deficits in positive emotion regulation in the emotional disorders (e.g., Carl et al., 2013). The majority of our clinical outpatient sample suffered from a primary or comorbid depressive disorder. Depressed patients in particular avoid not only negative but also positive emotions and exhibit decreased positive affect, which might have contributed to this finding. As for the BEAQ’s subscales, they showed lower correlations with measures of psychopathology, negative affect, and neuroticism than with the other avoidance measures. However, certain aspects of EA, as measured by the subscales of the BEAQ, seem to be more strongly associated with experiential or behavioral avoidance, while others are more strongly associated with mindfulness aspects.
How do the BEAQ’s correlation patterns compare with those of the other avoidance measures in the present sample? Both the AAQ-II and CBAS showed less clear patterns of association with measures of convergent and discriminant validity. The AAQ-II showed higher correlations with symptom distress, anxiety, and negative affect than with the avoidance measures in the student sample. The AAQ-II’s unspecific correlations were even more pronounced in the clinical group: Correlations of similar magnitude were observed with the BEAQ and CBAS as well as with symptom distress, positive and negative affect, depression/well-being, and mindfulness. The CBAS showed the highest correlations with the AAQ-II and BEAQ, but its associations with other measures were higher than those of the BEAQ. The AAQ-II and CBAS also correlated more highly with mindfulness measures than the BEAQ did. One limitation needs to be pointed out: We hypothesized that the BADS would serve as another convergent avoidance measure, but the lower correlations between the BADS and the other avoidance measures indicate that the BADS, with its focus on behavioral activation and avoidance, differs conceptually from the broader construct of EA.
Overall, these findings speak in favor of the convergent and discriminant validity of the BEAQ in comparison with the other avoidance measures. Our study on the convergent and discriminant validity was restricted to the use of self-report measures, so correlations between different constructs can be expected to some extent (e.g., Campbell & Fiske, 1959). Future studies should investigate convergent and discriminant validity using different methods (Campbell & Fiske, 1959), for example, by using implicit measures, ecological momentary assessments, interviews, or third-party assessments.
For the MEAQ, the Behavioral Avoidance and Distress Aversion subscales seemed to represent the core construct of EA best and the individual scales were not overly related (Gámez et al., 2011). Despite this, the BEAQ was constructed to cover content of all six scales. Thus, it may be that the variety and breadth of item content came at the expense of unidimensionality, which was corroborated by the poor fit of the one-factor CFAs in both populations. While the use of rules of thumb for model fit in CFAs has been increasingly criticized in recent years (e.g., McNeish et al., 2018; Sellbom & Tellegen, 2019), the poor fit indicated that the original unidimensional model was misspecified. The fact that the highest modification indices in both populations coincided with the original scales of the MEAQ supported the notion that the BEAQ retains the original facets of the long form to some extent. The better fit of the multifactorial model would indeed indicate that the BEAQ retains the original facets of the MEAQ. However, the bifactor models showed that the common variance was approximately equally split between the general and specific factors in both samples, suggesting that the BEAQ has a strong general factor. In line with this, a proportion of the items, for example, those of the Behavioral Avoidance subscale, do not load on their specific factor when the general factor is taken into account. In light of the high PUC (>.80) and relatively high omega hierarchical (>.70), the BEAQ seems to be “unidimensional enough” to warrant the use of the total score (Reise et al., 2013). A bifactor approach that depicts the general EA dimension and includes specific facets of EA therefore seems appropriate to describe the BEAQ. When using and interpreting the BEAQ’s total score, it may still be useful to additionally examine the subscales to determine in which areas EA is particularly pronounced.
The Distress Endurance scale showed the lowest associations with the BEAQ’s total scale and subscales and is represented by only one item (Item 6, “Fear/anxiety won’t stop me from doing important things”). Item 6 showed overall poor performance and was thus not included in the factor analyses. Our findings, alongside those of Byllesby et al. (2020), who also found that Item 6 showed the lowest loadings in the one-factor CFAs in both clinical populations, imply that Item 6 could potentially be removed from the BEAQ scoring. Its reverse item formulation may be problematic (Weijters et al., 2013) and future studies should investigate whether Item 6 should be retained in the BEAQ.
We also explored the BEAQ’s sensitivity to change. Participants who received the treatment targeting EA exhibited a greater decrease in EA as measured by the BEAQ over time than participants in the waitlist group. The BEAQ thus seems suitable to depict changes during treatment. Changes in EA were moderate in magnitude, which might be related to the trait-like assessment of EA. Additionally, EA can only be adequately captured when participants have an awareness of their avoidance tendency. This makes high introspection and reflection an important requirement. The intervention that patients underwent in our clinical online therapy sample focused on avoidance, emphasizing aspects of behavioral avoidance as well as more subtle avoidance tendencies like distraction or safety behaviors. It is possible that the program actually raised participants’ awareness of the extent of their avoidance, which might have led to higher ratings of avoidance following the avoidance modules in some participants. A combination of higher awareness and actual decreases might explain the only moderate change in EA. A state or implicit measure of EA, as for example proposed by Hooper et al. (2010), may give more insight into more implicit and subtle avoidance tendencies.
Limitations of our study include the high dropout for the sensitivity to change analysis as well as the limited generalizability of the student and clinical populations. Another limitation is the relatively small sample size in both clinical samples. For the bifactor and multifactorial models, the power to reject a wrong model, given the misfit we observed, was lower in the clinical population than in the student population. While the results of the five-factor and bifactor CFAs in both populations were very comparable, future studies with larger clinical samples should substantiate these preliminary findings on the BEAQ’s factor structure.
The broad conceptualization of EA poses a challenge for the assessment of EA. Overall, the psychometric properties and the BEAQ’s favorable pattern of convergent and discriminant validity compared with other measures of (experiential) avoidance support its utility in capturing EA as a broad construct.
Footnotes
Acknowledgements
We thank Anastasiya Zhukova, Anna Batsch, and Nino Inauri for their help with data collection.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
