Abstract
The present study reports on the initial validation of the eight-item version of the Avoidance and Fusion Questionnaire for Youth (AFQ-Y8) as a school mental health screener for identifying clinical-level depression and anxiety caseness within a sample of urban high school students (N = 219). Results indicated that responses to the AFQ-Y8 yielded better data–model fit and comparable internal consistency and convergent validity in relation to responses to the longer, 17-item version of the measure. Findings from receiver operating characteristic (ROC) curve analyses showed that scores derived from the AFQ-Y8 had excellent discrimination ability for correctly classifying students with and without clinical-level depression (area under the curve [AUC] = .91) and anxiety (AUC = .92), and that a cutoff score of ≥15 yielded optimal sensitivity (.86, .92) and specificity (.88, .87) for accomplishing these purposes. Taken together, findings suggest the AFQ-Y8 is a technically adequate instrument for both measuring psychological inflexibility and classifying students with clinical-level internalizing problems. Implications for future research and practice are discussed.
The psychological inflexibility model is a transdiagnostic approach for understanding the development, maintenance, and treatment of mental health problems (Levin et al., 2014). As an empirically derived process, psychological inflexibility is grounded in Relational Frame Theory (RFT), which is a behavioral analytic theory of language and cognition that explains how private events (i.e., thoughts and feelings) come to transform the stimulus functions of overt behavior (i.e., bodily actions) through a generalized form of operant behavior known as arbitrary applicable relational responding (Hayes, Barnes-Holmes, & Roche, 2001). RFT, in turn, serves as the core therapeutic process underlying Acceptance and Commitment Therapy (ACT), which is a behavioral approach to intervening with mental health problems through targeting the problematic relational response repertoires that develop between private events and overt behaviors (Hayes, Strosahl, & Wilson, 2011). Psychological inflexibility is posited to contribute to mental health problems “when language and cognition interact with direct contingencies to produce an inability to persist or change behavior in the service of long-term valued ends” (Hayes, Luoma, Bond, Masuda, & Lillis, 2006, p. 6). Although this problematic process has been theoretically implicated in most diagnosable mental health problems, research shows it is particularly relevant to depression and anxiety disorders (Levin et al., 2014).
Within the realm of RFT and ACT, the psychological inflexibility model is typically operationalized via six subcomponents, which are understood to be distinct yet interrelated functional behavioral processes: (a) cognitive fusion, (b) experiential avoidance, (c) dominance of the conceptualized past or feared future, (d) attachment to the conceptualized self, (e) lacking clarity and contact with values, and (f) unworkable action toward valued ends (Hayes et al., 2011). According to Hayes et al. (2006), cognitive fusion is characterized by “excessive or improper regulation of behavior by verbal processes, such as rules and derived relational networks” (p. 6), while experiential avoidance is understood as actions taken “to alter the form, frequency, or situational sensitivity of private events even when doing so causes behavioral harm” (p. 7). These two processes can be cyclically related in mental health problems, as cognitive fusion facilitates experiential avoidance, which, in turn, reinforces cognitive fusion, ad infinitum. That said, two of the other processes within the psychological inflexibility model are purported to facilitate cognitive fusion or experiential avoidance. Specifically, dominance of the conceptualized past or feared future, which refers to one’s inability to get in contact with one’s present moment experience, and attachment to the conceptualized self, which references one’s preoccupation with narratives about oneself at the expense of learning from present moment experience, both contribute to increased fusion and avoidance. Furthermore, the last two processes within the model describe the problematic behavioral results of the previously mentioned processes. 
Lacking clarity and contact with values refers to one’s inability to put oneself in contact with freely chosen and desirable long-term outcomes, while unworkable action references one’s ineffective efforts toward maintaining fused and avoidant behavioral habits (Hayes et al., 2006; Hayes et al., 2011).
Meta-analysis of laboratory-based studies indicates that interventions targeting each of the six components of the psychological inflexibility model have independent therapeutic effects (Levin, Hildebrandt, Lillis, & Hayes, 2012), while meta-analysis of treatment efficacy studies indicates that interventions based on combinations of these six components are generally effective for improving a variety of mental health problems (Öst, 2014). Although each of the subcomponents of psychological inflexibility can be assessed in ACT using interview and experiential methods (Hayes et al., 2011), empirical work has also undertaken to develop and validate self-report behavior rating scales that can function as formal clinical measures of these processes (see Association for Contextual Behavioral Science, 2015). To date, the bulk of research in this area has been devoted to measuring psychological inflexibility in adults, while much less empirical attention has been given to measurement issues with youth. So far, the only formal measure of psychological inflexibility that has been developed and validated with youth is the Avoidance and Fusion Questionnaire for Youth (AFQ-Y; Greco, Lambert, & Baer, 2008). Instead of targeting all six subcomponents of the psychological inflexibility model, the AFQ-Y intentionally targets only the first two components—cognitive fusion and experiential avoidance—which, as mentioned above, might be considered the most pivotal processes for therapeutic purposes.
The initial development and validation study of the AFQ-Y produced both 17-item and eight-item versions of the measure, with the longer version (AFQ-Y17) intended as a robust clinical instrument and the shorter version (AFQ-Y8) intended as a screening instrument for population-based work (Greco et al., 2008). Results from this development study indicated that scores derived from both versions of the measure had strong internal consistency (AFQ-Y17: α = .90; AFQ-Y8: α = .83) and at least adequate data–model fit to a unidimensional measurement model of psychological inflexibility (AFQ-Y17: comparative fit index [CFI] = .90, root mean square error of approximation [RMSEA] = .06; AFQ-Y8: CFI = .99, RMSEA = .03). Convergent validity analyses with two large community samples of youth (n = 513, n = 675) indicated that scores derived from both versions of the measure had large positive correlations with scores from measures of overall mental health problems (AFQ-Y17: r = .64; AFQ-Y8: r = .63) as well as with self-reported anxiety scores (AFQ-Y17: r = .68; AFQ-Y8: r = .56); moderate-to-large positive relationships with scores from a self-reported measure of thought suppression (AFQ-Y17: r = .53; AFQ-Y8: r = .48); moderate positive associations with scores from a measure of self-reported somatization (AFQ-Y17: r = .37, .45; AFQ-Y8: r = .39, .42); and negligible-to-small positive relationships with scores derived from teacher-reported measures of behavior problems (AFQ-Y17: r = .11; AFQ-Y8: r = .08, .14; Greco et al., 2008). 
Moreover, divergent validity analyses with these same samples indicated scores from both versions of the measure had moderate-to-large negative correlations with scores of self-reported mindfulness (AFQ-Y17: r = −.53; AFQ-Y8: r = −.54), small-to-moderate negative associations with scores of self-reported quality of life (AFQ-Y17: r = −.30, −.39; AFQ-Y8: r = −.29, −.43), and negligible-to-small negative relationships with teacher-reported social skills (AFQ-Y17: r = −.08, −.13; AFQ-Y8: r = −.04, −.18) and academic competence (AFQ-Y17: r = −.15, −.19; AFQ-Y8: r = −.11, −.17; Greco et al., 2008). Taken together, findings from the original series of development studies suggest both the AFQ-Y17 and AFQ-Y8 are technically adequate instruments for measuring the purported construct of psychological inflexibility with youth.
Since Greco et al.’s (2008) development study, the technical adequacy of the AFQ-Y17 has been confirmed with a general sample of college students (N = 387; Fergus et al., 2012), an inpatient sample of adults with anxiety disorders (N = 115; Fergus et al., 2012), and an inpatient sample of youth (N = 111; Ventra, Sharp, & Hart, 2012). Findings from these studies have indicated that responses to the longer version of the measure yielded at least adequate data–model fit with a unidimensional measurement model of psychological inflexibility (college sample: CFI = .95, standardized root mean square residual [SRMR] = .06; inpatient adult sample: CFI = .96, SRMR = .07; no data–model fit analyses were conducted with the inpatient youth sample) and were characterized by strong internal consistency (college sample: α = .93; inpatient adult sample: α = .90; inpatient youth sample: α = .89). Convergent validity analyses conducted with these samples further indicated positive relationships between scores derived from the AFQ-Y17 and scores from self-reported measures of internalizing problems (inpatient adult sample: r range = .31-.47; youth inpatient sample: r = .63; no data available for the college student sample) and externalizing problems (youth inpatient sample: r = .40, .53; no data available for the college student or adult inpatient samples). Results from the adult inpatient sample also demonstrated that scores from the AFQ-Y17 shared approximately 50% of their variance with scores from the Acceptance and Action Questionnaire II (Bond et al., 2011), which is the most commonly used measure of psychological flexibility (targeting only cognitive fusion and experiential avoidance) for adults. 
Furthermore, findings from the study with inpatient youth, using receiver operating characteristic (ROC) curve analyses, suggested the AFQ-Y17 might function as an effective screening instrument for identifying youth with clinical-level anxiety disorders (area under the curve [AUC] = .78, sensitivity = .73, specificity = .71; Ventra et al., 2012). That said, it is noteworthy that, to date, no research has been conducted to validate the use of scores derived from the shorter version of the measure for its intended purpose as a population-based measure of youth’s psychological inflexibility.
The purpose of the present study, then, was to progress the line of empirical work regarding the AFQ-Y8 by investigating the technical adequacy of this measure as a population-based school mental health screener. Universal mental health screening is an increasingly recommended practice for clinicians working in school settings, as research has repeatedly demonstrated that (a) youths’ mental health problems are associated with poor educational and life outcomes, (b) the majority of youth experiencing such problems will go unidentified and untreated in clinical settings, and, therefore, (c) schools are likely to serve as de facto mental health care systems for most youth (Dowdy, Furlong, Eklund, Saeki, & Ritchey, 2010). Considering that psychometric findings regarding common self-report mental health screeners used in secondary schools, such as the Behavioral and Emotional Screening System (Kamphaus & Reynolds, 2007) and the Strengths and Difficulties Questionnaire (Goodman, 2001), generally show poor sensitivity (values < .60), there is an empirical warrant for exploring the classification utility and diagnostic accuracy of alternative self-report screening instruments. Given the transdiagnostic theory underlying the AFQ-Y8 as well as its brevity, this measure appears to be both a contextually appropriate and feasible instrument for school-based mental health screening purposes (see Glover & Albers, 2007, for a review of screener selection criteria). However, there is currently no evidence to indicate the AFQ-Y8 is actually a technically adequate instrument for this purpose, and thus the present study was undertaken toward this end. 
In light of the context sketched above, it was hypothesized that (a) responses to the AFQ-Y8 would yield at least adequate data–model fit and indicate strong internal reliability with the present sample, (b) scores derived from the AFQ-Y8 would demonstrate moderate-to-large positive correlations with self-reported measures of depression, anxiety, and academic problems, and (c) scores from the measure would indicate at least adequate discrimination ability for classifying students with self-report scores suggesting clinical-level depression and anxiety caseness.
Method
Participants
Participants were 219 adolescent students, who were attending a small, public high school located in a midsized urban city within the southern region of the United States. The sample was 54.8% female, 96.3% Black or African American, and ranged in age from 14 to 19 years (M = 16.30, SD = 1.29). The proportion of participants from each grade level was as follows: Grade 9 = 23.7%, Grade 10 = 31.1%, Grade 11 = 16%, and Grade 12 = 29.2%. District data indicated all students were eligible for free or reduced lunch. At the time of the study, approximately 420 students were enrolled in the school and invited to participate; however, parental consent, student assent, and useable surveys were received from only approximately 52% of the sampling pool. This participation rate, although sub-optimal, is consistent with other applied research studies that lack funding resources to incentivize participation. Participating students completed the school mental health survey, which consisted of demographic items, the AFQ-Y, and criterion measures (described below), using a secure online server, which was accessed in homeroom classes using a mobile laptop computer lab provided by the school. On average, participants completed the survey in approximately 20 min.
Measures
The primary measure of interest in the present study was the AFQ-Y8; however, the AFQ-Y17, which includes all of the items in the AFQ-Y8 and more, was also used for the purposes of establishing the comparative validity of the shorter version of the measure with the present sample. All AFQ-Y items are directly phrased (e.g., “The bad things I think about myself must be true” and “My life won’t be good until I feel happy”; see Table 1 for a complete listing of all items), requiring no reverse scoring, and are arranged along a 5-point, agreement-based response scale (0 = not at all true, 1 = a little true, 2 = pretty true, 3 = true, 4 = very true; Greco et al., 2008). Composite scores for both versions of the measure are created by summing the respective item scores. The psychometric properties of both the AFQ-Y8 and AFQ-Y17 were reviewed above (see the Introduction; Fergus et al., 2012; Greco et al., 2008; Ventra et al., 2012), and responses to both versions of the measure yielded strong internal consistency with the present sample (α = .84, .89). Although the transdiagnostic psychological inflexibility model has been implicated for most mental health problems, the present study investigated the utility of the AFQ-Y8 as a screener for clinical-level depression and anxiety only, as evidence suggests that psychological inflexibility is most strongly related to these target disorders (Levin et al., 2014) and because unlike externalizing problems (e.g., aggression and hyperactivity/impulsivity), these internalizing problems are much less likely to be identified by educators and caregivers (Kauffman, 1999).
AFQ-Y Items and CFA Loadings.
Note. AFQ-Y = Avoidance and Fusion Questionnaire for Youth; CFA = confirmatory factor analysis.
The Center for Epidemiological Studies Depression Scale for Children (CESD-C; Faulstich, Carey, Ruggiero, Enyart, & Gresham, 1986) was selected as a convergent criterion measure for the AFQ-Y8, as it is a widely used and validated self-report measure of depression for adolescents (e.g., Radloff, 1991; Roberts, Andrews, Lewinsohn, & Hops, 1990; Rushton, Forcier, & Schectman, 2002). The CESD-C is comprised of 20 items that can be summed to create an overall depression composite score. All items are arranged along a 4-point response scale (1 = not at all, 2 = a little, 3 = some, 4 = a lot) that asks respondents to report their experience of symptoms during the past week. Most items are directly phrased (e.g., “I felt like I was too tired to do things”), but four items are indirectly phrased (e.g., “I felt like I was just as good as other kids”) and therefore require reverse scoring. Previous research has indicated that scores from the CESD-C have adequate test–retest reliability (r > .50) and strong internal consistency (α > .80) in administrations with adolescents (e.g., Radloff, 1991; Roberts et al., 1990). Strong internal consistency was also observed with the present sample (α = .85). Adopting the classification strategy commonly used in research when information regarding actual clinical diagnosis is unavailable, a normative cutoff score of 2 SD above the mean was designated as the proxy threshold for clinical-level depression caseness in the present study. Previous normative research with high school students has indicated that the 2 SD cut-point is marked by a CESD-C score ≥39 (Radloff, 1991), and descriptive findings from the present sample indicate that the distribution of depression scores closely resembled that observed in previous normative research (see Table 2).
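The scoring and proxy-caseness rule described above can be sketched in a few lines of Python. This is an illustration only, not the study's scoring procedure: the function name, the example response patterns, and the positions of the reverse-scored items are hypothetical placeholders, because the text does not state which four items are indirectly phrased. Only the 1 to 4 response scale, the summing rule, and the ≥39 cutoff are taken from the description above.

```python
def score_cesdc(responses, reverse_positions, cutoff=39):
    """Return (composite score, proxy caseness flag) for 20 responses on a 1-4 scale.

    reverse_positions: 0-based positions of the indirectly phrased items.
    The positions used in the example below are hypothetical placeholders.
    """
    if len(responses) != 20:
        raise ValueError("expected 20 item responses")
    if any(r not in (1, 2, 3, 4) for r in responses):
        raise ValueError("responses must be integers from 1 to 4")
    total = 0
    for position, response in enumerate(responses):
        # On a 1-4 scale, reverse scoring maps 1<->4 and 2<->3.
        total += (5 - response) if position in reverse_positions else response
    return total, total >= cutoff


# Hypothetical reverse-scored positions and made-up response patterns.
reverse_positions = {3, 7, 11, 15}
low_symptoms = [4 if i in reverse_positions else 1 for i in range(20)]
high_symptoms = [1 if i in reverse_positions else 3 for i in range(20)]

low_total, low_case = score_cesdc(low_symptoms, reverse_positions)
high_total, high_case = score_cesdc(high_symptoms, reverse_positions)
```

With this scoring, a respondent who endorses no symptoms receives the scale minimum of 20, so the ≥39 threshold sits 19 points above the floor rather than 39.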
Descriptive Statistics and Bivariate Correlations for All Study Measures.
Note. AFQ-Y = Avoidance and Fusion Questionnaire for Youth; CESD-C = Center for Epidemiological Studies Depression Scale for Children; BAI = Beck Anxiety Inventory; SAPS = Subjective Academic Problems Scale. Effect size (r) interpretation: .00–.09 = negligible, .10–.29 = small, .30–.49 = medium, ≥ .50 = large.
*p < .05. **p < .01.
The Beck Anxiety Inventory (BAI; Beck, Epstein, Brown, & Steer, 1988) was also selected as a convergent criterion measure for the AFQ-Y, given its wide use and previous validation as a self-report measure of anxiety for adolescents (Jolly, Aruffo, Wherry, & Livingston, 1993; Osman et al., 2002; Steer, Kumar, Ranieri, & Beck, 1995). The BAI consists of 21 items that can be summed to create an overall anxiety composite score. All items are brief and directly phrased symptom statements (e.g., “Nervous,” “Fear of losing control,” and “Scared”), requiring no reverse scoring, and are arranged along a 4-point response scale that asks respondents to rate levels of symptom severity during the past month (0 = not at all, 1 = mildly—but it didn’t bother me much, 2 = moderately—it wasn’t pleasant at times, 3 = severely—it bothered me a lot). Previous research has demonstrated that scores from the BAI have adequate test–retest reliability (r > .70) and strong internal consistency (α > .80) in administrations with adolescents (e.g., Osman et al., 2002; Steer et al., 1995). Strong internal consistency was also observed with the present sample (α = .92). Again adopting the classification strategy commonly used in research when information regarding actual clinical diagnosis is unavailable, a normative cutoff score of 2 SD above the mean was designated as the proxy threshold for clinical-level anxiety caseness in the present study. Previous normative research has indicated that the 2 SD cut-point is marked by a BAI score ≥36 (Beck & Steer, 1993), and descriptive findings from the present sample indicate that the distribution of anxiety scores closely resembled that observed in previous normative research (see Table 2).
The Subjective Academic Problems Scale (SAPS; Renshaw, 2015) was selected as a school-specific convergent criterion measure for the AFQ-Y, as it assesses self-perceptions of academic performance problems that are commonly associated with youth mental health problems (see Dowdy et al., 2010, for a brief review of such problems). The SAPS consists of seven items that can be summed to create a general academic performance problems score. All items are directly phrased (e.g., “I get bad grades on tests and exams”), requiring no reverse scoring, and are arranged along a 4-point response scale (1 = almost never, 2 = sometimes, 3 = often, 4 = almost always). Previous research demonstrates that scores derived from the SAPS have strong internal consistency (α = .80) as well as convergent validity with scores from several measures of self-reported mental health problems (r range = .38-.51; Renshaw, 2015). Adequate internal reliability was observed with the present sample (α = .78). Unlike the other criterion measures (CESD-C and BAI), the SAPS was used solely as a continuous measure in this study. Given that the relationship between academic problems and psychological inflexibility was hypothesized to be mediated by internalizing symptoms characteristic of depression and anxiety, it was expected that the association of scores derived from the AFQ-Y with scores from this criterion measure would be somewhat attenuated compared with its associations with scores resulting from the other criterion measures.
Data Analyses
Two phases of data analyses were conducted to validate the AFQ-Y8 as a potential school mental health screener for clinical-level depression and anxiety. First, comparative psychometric analyses were conducted to investigate the latent structure and convergent validity of both the AFQ-Y8 and AFQ-Y17. Specifically, confirmatory factor analysis (CFA) was conducted on both measurement models, following the recommendations of Bandalos and Finney (2010) and Mueller and Hancock (2008). To determine the goodness of data–model fit, a combination of absolute and incremental fit indices was used. CFI and Tucker–Lewis index (TLI) values between .90 and .95 and RMSEA values (with an accompanying 90% confidence interval [CI]) and SRMR values between .05 and .08 were considered to indicate adequate data–model fit, while CFI and TLI values > .95 and RMSEA and SRMR values < .05 were considered indicative of good data–model fit (Kenny, 2014). Regarding factor loadings, values of λ ≥ .50 were taken to be strong loadings, as they accounted for ≥ 25% of the variance extracted from each item by the latent factor, while values of λ ≥ .33 were considered to be adequate loadings, as they accounted for ≥ 10% of the variance. For latent construct reliability, which is an indicator of internal consistency at the factor level (as opposed to the observed score level), values of H ≥ .70 were considered desirable, as they indicate a strong intrafactor correlation over repeated administrations (Mueller & Hancock, 2008). Following comparison of the structural validity of the two measurement models, bivariate correlations were conducted to investigate the association of scores derived from both versions of the AFQ-Y with scores derived from the CESD-C, BAI, and SAPS, to determine the relative convergent validity of scores resulting from both versions of the measure.
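The interpretive criteria above can be expressed as a short Python sketch. It is offered as an illustration under stated assumptions rather than as the analysis actually run in SPSS and Amos: the function names and all input values are made up, and `coefficient_h` assumes the commonly cited formula for coefficient H based on standardized loadings, H = 1 / (1 + 1 / Σ[λ² / (1 − λ²)]), which the text does not spell out.

```python
def classify_fit(cfi, tli, rmsea, srmr):
    """Label each fit index 'good', 'adequate', or 'poor' using the cutoffs in the text."""
    def higher_is_better(value):   # CFI, TLI
        if value > 0.95:
            return "good"
        return "adequate" if value >= 0.90 else "poor"

    def lower_is_better(value):    # RMSEA, SRMR
        if value < 0.05:
            return "good"
        return "adequate" if value <= 0.08 else "poor"

    return {
        "CFI": higher_is_better(cfi),
        "TLI": higher_is_better(tli),
        "RMSEA": lower_is_better(rmsea),
        "SRMR": lower_is_better(srmr),
    }


def loading_strength(loading):
    """Classify a standardized loading; the squared loading is the item variance explained."""
    if loading >= 0.50:
        return "strong"
    return "adequate" if loading >= 0.33 else "weak"


def coefficient_h(loadings):
    """Latent construct reliability from standardized loadings (assumed formula, see lead-in)."""
    ratio_sum = sum(l ** 2 / (1 - l ** 2) for l in loadings)
    return 1 / (1 + 1 / ratio_sum)


# Made-up inputs for illustration only.
fit_labels = classify_fit(cfi=0.96, tli=0.93, rmsea=0.06, srmr=0.04)
h_value = coefficient_h([0.60] * 8)   # eight hypothetical loadings of .60
```

The variance figures quoted above follow from squaring the loading: .50² = .25 and .33² ≈ .11.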
After demonstrating that responses to the AFQ-Y8 evidenced comparable structural and convergent validity to responses to the AFQ-Y17, the next phase of data analyses focused on evaluating the classification utility and diagnostic accuracy of scores derived from the shorter version of the measure, for the purposes of evaluating the AFQ-Y8’s functionality as a potential school mental health screening instrument. Specifically, a pair of ROC curve analyses was conducted following the recommendations of Streiner and Cairney (2007), with scores from the AFQ-Y8 serving as the predictor and proxy caseness variables for clinical-level depression and anxiety (1 = case, 0 = non-case; derived from CESD-C and BAI scores using the cutoff scores outlined above) serving as the anchor measures. AUC values were interpreted to examine the overall classification utility of the AFQ-Y8, with values closer to 1 indicating greater power to discriminate between youth with clinical versus non-clinical scores on the proxy diagnostic criterion measures (.50-.70 = low discrimination ability, .70-.90 = moderate discrimination ability, > .90 = high discrimination ability; Streiner & Cairney, 2007). Diagnostic accuracy was then evaluated by examining the sensitivity and specificity values associated with various AFQ-Y8 cutoff scores for both depression and anxiety caseness (≥ .90 = excellent, ≥ .80 = good, ≥ .70 = adequate). Once an optimal cutoff score was established (optimizing sensitivity and specificity values), this value was then used to create a caseness variable for psychological inflexibility (1 = “case,” 0 = “non-case”), which was further used to calculate the caseness prevalence rate of psychological inflexibility as well as the positive and negative predictive values for the optimal cutoff score. 
Finally, the AFQ-Y8’s caseness variable was used as the grouping factor in a series of independent samples t tests examining between-group differences across scores obtained on the aforementioned three criterion measures (CESD-C, BAI, and SAPS). All data analyses were conducted using SPSS and Amos version 22.
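The classification analyses described above can be illustrated with a minimal, dependency-free Python sketch. It is not the SPSS procedure used in the study. The scores and caseness labels are made up, the function names are hypothetical, and selecting the cutoff by Youden's J (sensitivity + specificity − 1) is an assumption introduced here, since the text says only that sensitivity and specificity were optimized.

```python
def auc(scores, labels):
    """AUC as the probability that a random case outscores a random non-case (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    noncases = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in noncases)
    return wins / (len(cases) * len(noncases))


def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when scores >= cutoff are flagged as screen-positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, labels):
    """Pick the observed score that maximizes Youden's J (an assumed criterion)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: sum(sens_spec(scores, labels, c)) - 1)


# Made-up screener scores and proxy caseness labels (1 = case, 0 = non-case).
toy_scores = [3, 5, 8, 10, 12, 14, 16, 18, 20, 25]
toy_labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

toy_auc = auc(toy_scores, toy_labels)
toy_cutoff = best_cutoff(toy_scores, toy_labels)
toy_sens, toy_spec = sens_spec(toy_scores, toy_labels, toy_cutoff)
```

In practice, a screening context may weight sensitivity more heavily than specificity, in which case an unweighted criterion such as Youden's J would be only a starting point.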
Results
Given that responses to the AFQ-Y were relatively normally distributed, CFAs were conducted using the Maximum Likelihood estimator. Moreover, as a result of the electronic administration of the screening survey, which prompted students to complete all items prior to submitting their responses, no missing data were observed and thus no procedures were warranted to account for missingness. CFA findings indicated that responses to the AFQ-Y8 yielded good data–model fit to the unidimensional psychological inflexibility measurement model (χ2 = 40.92, df = 20, p < .001, SRMR = .045, CFI = .960, TLI = .944, RMSEA [90% CI] = .069 [.038, .100]), that responses were characterized by robust factor loadings (see Table 1), and that the underlying psychological inflexibility factor was characterized by strong latent construct reliability (H = .85). CFA results for responses to the AFQ-Y17 indicated a poor data–model fit to the unidimensional psychological inflexibility measurement model (χ2 = 342.16, df = 119, p < .001, SRMR = .078, CFI = .820, TLI = .794, RMSEA [90% CI] = .093 [.081, .104]), that responses were characterized by adequate-to-robust factor loadings (λ range = .39-.71), and that the single factor had strong latent construct reliability (H = .90). Given the poor observed fit for the latter model, modification indices were examined, with results suggesting that covarying the error terms for several pairs of items (i.e., 1-6, 4-9, 7-9, 9-11, and 9-17; see Table 1 for item content) would improve data–model fit. These additional parameters were added to the model and the CFA was rerun, this time yielding improved and adequate (by all indices except TLI) data–model fit (χ2 = 238.13, df = 114, p < .001, SRMR = .068, CFI = .900, TLI = .880, RMSEA [90% CI] = .071 [.058, .083]) that was again characterized by adequate-to-robust factor loadings (see Table 1) and strong latent construct reliability (H = .90). 
Furthermore, findings from bivariate correlations yielded an extremely strong positive association between the composite scores derived from both versions of the AFQ-Y (81% shared variance), strong positive associations between scores from each version of the AFQ-Y and scores from the criterion measures of depression and anxiety, and moderate positive associations between scores from both versions of the AFQ-Y and scores from the measure of subjective academic problems (see Table 2).
Given that the first phase of data analyses showed that responses to the AFQ-Y8 yielded a sound measurement model and that scores derived from the measure indicated adequate convergent validity with scores from the criterion measures, the follow-up classification utility and diagnostic accuracy analyses were deemed appropriate. Results from the ROC curve analyses indicated scores from the AFQ-Y8 had excellent discrimination ability for correctly classifying students with and without clinical-level depression caseness (AUC [95% CI] = .91 [.84, .99], SE = .04, p < .001) and anxiety caseness (AUC [95% CI] = .92 [.88, .97], SE = .02, p < .001). Moreover, consideration of the sensitivity and specificity indices associated with various AFQ-Y8 cutoff values suggested a score of 15 was optimal for identifying both clinical-level depression and anxiety caseness (see Table 3). Using this cutoff score (AFQ-Y8 ≥ 15) to derive case and non-case psychological inflexibility groups within the present sample, a 16.9% prevalence rate (n = 37) was observed for students classified with psychological inflexibility caseness. Prevalence rates for depression and anxiety caseness were 6.4% (n = 14) and 5.5% (n = 12), respectively. These prevalence rates were used in follow-up calculations, evidencing moderate positive predictive values (.33, .30) and high negative predictive values (.99, .99) for this particular cutoff score. In addition, t-test findings confirmed substantive between-group differences for psychological inflexibility caseness across scores for the three criterion measures, yielding mean differences characterized by large effect sizes (Hedges’ g > .90) suggesting more optimal mental health and academic performance outcomes for the non-case group (see Table 4).
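As a rough consistency check on the predictive values reported above, the following Python sketch applies Bayes' rule to the rounded sensitivity, specificity, and prevalence figures. The function name is hypothetical, and working from rounded published values rather than the raw classification counts is an assumption of this sketch, so small discrepancies from the reported figures are to be expected.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded values taken from the text for the AFQ-Y8 cutoff of >= 15.
ppv_dep, npv_dep = predictive_values(0.86, 0.88, 0.064)   # depression caseness
ppv_anx, npv_anx = predictive_values(0.92, 0.87, 0.055)   # anxiety caseness
```

With these rounded inputs the sketch gives positive predictive values of about .33 and .29 and negative predictive values of about .99 for both outcomes. The small gap from the reported .30 for anxiety presumably reflects rounding in the published inputs.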
Sensitivity and Specificity Values for AFQ-Y8 Cutoff Scores.
Note. AFQ-Y = Avoidance and Fusion Questionnaire for Youth; CESD-C = Center for Epidemiological Studies Depression Scale for Children; BAI = Beck Anxiety Inventory.
Optimal AFQ-Y cutoff score and associated sensitivity and specificity values.
Between-Group Differences for Psychological Inflexibility Cases vs. Non-Cases.
Note. Cases = AFQ-Y8 score ≥ 15; CI = confidence interval; CESD-C = Center for Epidemiological Studies Depression Scale for Children; BAI = Beck Anxiety Inventory; SAPS = Subjective Academic Problems Scale.
Effect size (g) interpretation guide: .00-.19 = negligible, .20-.49 = small, .50-.79 = medium, ≥ .80 = large.
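For readers unfamiliar with the effect size used in these comparisons, the following Python sketch computes Hedges' g from group summary statistics and labels it with the interpretation guide above. It assumes the usual pooled-standard-deviation form with the approximate small-sample correction 1 − 3 / (4[n1 + n2] − 9), which the text does not specify. The group sizes (37 cases, 182 non-cases) follow from the sample described in the Results, whereas the means and standard deviations are made up for illustration.

```python
import math


def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference with the approximate small-sample correction."""
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    cohens_d = (mean1 - mean2) / pooled_sd
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return cohens_d * correction


def interpret_g(g):
    """Apply the interpretation guide above to the absolute effect size."""
    size = abs(g)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    return "small" if size >= 0.20 else "negligible"


# Made-up means and SDs; only the group sizes come from the text.
g_example = hedges_g(mean1=30.0, sd1=10.0, n1=37, mean2=20.0, sd2=10.0, n2=182)
label_example = interpret_g(g_example)
```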
Discussion
The present study aimed to progress the line of empirical work regarding the usefulness of the AFQ-Y8 as a population-based measure of youths’ psychological inflexibility by investigating its technical adequacy as a school mental health screener for adolescents with clinical-level depression and anxiety caseness. Findings from the first phase of data analyses indicated that, compared with responses to the longer, 17-item version of the measure, responses to the AFQ-Y8 indicated a more robust measurement model and were characterized by slightly stronger convergent validity coefficients with scores from the criterion measures of depression, anxiety, and academic performance problems. A similar pattern of comparative convergent validity coefficients was observed in the AFQ-Y’s original development study (see Greco et al., 2008), suggesting that scores from the shorter version of the measure have slightly stronger relationships with scores derived from measures of other self-reported mental health problems than do scores from the longer version. For the purposes of the present study, the upshot of these structural and convergent validity findings was that the AFQ-Y8 was deemed a viable measure of psychological inflexibility, which could then be further examined regarding its classification utility and diagnostic accuracy. Results from the second phase of data analyses indicated that scores derived from the AFQ-Y8 functioned effectively to screen for clinical-level depression and anxiety, indicating robust classification utility (depression: AUC = .91, anxiety: AUC = .92) and diagnostic accuracy. Regarding diagnostic accuracy, findings indicated that the selected AFQ-Y8 cutoff score of ≥ 15 was useful for identifying 86% of secondary students with depression caseness and 92% of those with anxiety caseness (sensitivity). 
Of those positively identified, results indicated that approximately 1 in 3 cases was likely to be a true positive (positive predictive value), suggesting a moderate false positive rate. That said, findings further showed that the AFQ-Y8 cutoff score of ≥ 15 was useful for identifying 88% and 87% of secondary students with non-caseness for depression and anxiety, respectively (specificity). Of those negatively identified, results indicated that approximately 10 in 10 non-cases were likely to be true negatives (negative predictive value), suggesting an inconsequential false negative rate.
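The contrast between high sensitivity and specificity and a modest positive predictive value follows from how predictive values depend on the base rate of caseness. The minimal Python sketch below illustrates that dependence using Bayes' rule. The function name `predictive_values` and the prevalence value are assumptions introduced for illustration only; in particular, the prevalence is hypothetical and is not a figure reported in this section.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) for a screener, computed with Bayes' rule.

    All three inputs are proportions in [0, 1].
    """
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be proportions between 0 and 1")

    # Expected proportions of the screened population in each outcome cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    # Guard against empty cells (e.g., nobody screens positive).
    ppv = true_pos / (true_pos + false_pos) if (true_pos + false_pos) > 0 else float("nan")
    npv = true_neg / (true_neg + false_neg) if (true_neg + false_neg) > 0 else float("nan")
    return ppv, npv


# Sensitivity and specificity for depression are taken from the text above.
# ASSUMPTION: the prevalence of 0.065 is a hypothetical value for illustration.
ppv, npv = predictive_values(sensitivity=0.86, specificity=0.88, prevalence=0.065)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under this assumed base rate, roughly one in three positive screens would be a true case while nearly all negative screens would be true non-cases, which is the general pattern described above. A different prevalence would shift both values, so the sketch should be read as an illustration of the relationship rather than a reproduction of the study's analysis.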
Taken together, findings from the present study suggest that scores derived from the AFQ-Y8 have promising construct validity, classification utility, and diagnostic accuracy as a school mental health screener for secondary students’ depression and anxiety. Although it appears that the classification utility and diagnostic accuracy of scores from the AFQ-Y8 observed in the present study were more promising than those observed in previous research using the AFQ-Y17 as a mental health screener for inpatient youth (see the review provided in the Introduction; Ventra et al., 2012), it is noteworthy that the criterion for caseness used in the earlier research was actual clinician-made diagnoses of an anxiety disorder, as opposed to the proxy clinical-level scores from self-reported diagnostic measures used in the present study. Furthermore, considering that the longer version of the measure was used and a different population was targeted (i.e., inpatient adolescents vs. general high school sample), direct comparison of results from the present study to those of the previous study seems inappropriate. Thus, further research is warranted to investigate the classification utility and diagnostic accuracy of scores derived from the AFQ-Y8 as a mental health screener with both varying samples of youth and varying caseness criteria, as it seems plausible that results might vary as a function of these aspects of research design. That said, within the context of school-based mental health screening conducted with adolescents, other caseness criteria (e.g., clinician-reported or parent-reported diagnoses) are likely to be more difficult to obtain and may be further removed from current functioning than are concurrent self-reports of mental health problems.
Although findings from the present study are promising, these results warrant contextualization within the scope of a few key methodological limitations. First, participants were demographically homogeneous (i.e., majority Black or African American, eligible for free-or-reduced lunch, attending the same public high school) and were obtained via convenience sampling. Generalization studies are therefore warranted with larger and more diverse samples prior to drawing broad conclusions regarding the technical adequacy of the AFQ-Y8 for mental health screening purposes in secondary schools. Second, considering that all convergent criterion measures were collected concurrently using self-report behavior rating scales, findings may be biased by common-method variance (see Podsakoff, MacKenzie, Lee, & Podsakoff, 2003). To investigate this possibility, a post hoc CFA was conducted with all of the items used in the present study regressed onto their respective latent factors, which were covaried, while also controlling for the effects of an unmeasured latent methods factor (see Podsakoff et al., 2003, for more on this analytic approach). Findings from this analysis indicated that approximately 12% of the variance in item responses across the latent factors could be attributable to a common-methods factor, suggesting that some, but far from the majority, of response variance was accounted for by common-method bias. Given that this shared variance estimate may fluctuate as a function of the particular self-report measures used in the research design, it is recommended that future research in this area of screening continue to explore the possibility of such bias. Finally, given the relatively small sample size in the present study, advanced statistical analyses of the AFQ-Y8’s measurement invariance across key demographics (e.g., gender and grade level) could not be conducted.
Therefore, future research with larger samples is warranted to investigate the technical adequacy of the AFQ-Y8 as an equitable screener for effectively identifying youth with clinical-level mental health problems in schools.
Notwithstanding these limitations, results from the present study have potential implications for the practice of universal mental health screening in schools. Using the criteria put forth by Glover and Albers (2007) for selecting universal screening instruments, the AFQ-Y8 shows promise as a contextually appropriate, usable, and technically adequate measure for identifying clinical-level depression and anxiety in secondary students. To begin, the AFQ-Y8 measures a construct (psychological inflexibility) that has theoretical and empirical relevance to youths’ mental health functioning, and it does so using an instrumentation method (brief self-report behavior rating scale) that is acceptable within school settings. Next, the AFQ-Y8 is low-cost, as it is a non-commercial measure that can be feasibly completed in a few minutes within a classroom setting. Scoring the AFQ-Y8 also requires no advanced training or technology, as items are simply summed to create a composite score, which is then judged to be either below or at or above the clinical-level cutoff value (≥ 15). Finally, evidence indicated that scores derived from the AFQ-Y8 functioned as a highly effective indicator for discriminating between students who were and were not classified with depression and anxiety caseness, suggesting that the classification utility and diagnostic accuracy of this measure are more promising than those observed for other common self-report mental health screeners used in schools (see Goodman, 2001; Kamphaus & Reynolds, 2007). That said, results also indicated that an AFQ-Y8 cutoff score of ≥ 15 yielded a higher proportion of false positives among students who screened positive (approximately 2/3) than other common self-report screeners (again, see Goodman, 2001; Kamphaus & Reynolds, 2007), yet this trade-off seems preferable within the context of school mental health screening, as false positives can be detected using a multi-gated process while false negatives cannot (see Glover & Albers, 2007).
Given its moderate likelihood of producing false positives, it is suggested that applied usage of the AFQ-Y8 as a screening instrument in schools be followed up by a planned second-gate screening, such as a brief semi-structured interview or administration of a more robust self-report measure of mental health problems, to rule out potential non-caseness among identified students. This procedure is likely to help school-based practitioners triage mental health screening results for the purposes of providing treatment to those who are most in need, not simply those who initially screen positive.
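The scoring rule and first-gate decision described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a validated scoring tool: the function names, the roster structure, and the 0 to 4 item response range are assumptions that are not specified in this section; only the eight-item sum and the cutoff of ≥ 15 come from the text.

```python
from typing import Dict, List, Sequence

AFQ_Y8_CUTOFF = 15  # clinical-level cutoff reported in the present study (score >= 15)
N_ITEMS = 8         # number of items on the AFQ-Y8

# ASSUMPTION: items are rated on a 0-4 scale; this range is not stated in this section.
ITEM_MIN, ITEM_MAX = 0, 4


def afq_y8_total(responses: Sequence[int]) -> int:
    """Sum the eight item responses into a composite score."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item responses, got {len(responses)}")
    for r in responses:
        if not ITEM_MIN <= r <= ITEM_MAX:
            raise ValueError(f"item response {r} is outside {ITEM_MIN}-{ITEM_MAX}")
    return sum(responses)


def first_gate_positive(responses: Sequence[int]) -> bool:
    """Return True when the composite score is at or above the cutoff."""
    return afq_y8_total(responses) >= AFQ_Y8_CUTOFF


def students_for_second_gate(roster: Dict[str, Sequence[int]]) -> List[str]:
    """List the (hypothetical) student IDs whose first-gate screen is positive.

    These students would be referred for second-gate follow-up rather than
    being treated as confirmed cases.
    """
    return [student for student, items in roster.items() if first_gate_positive(items)]


# Hypothetical roster for illustration only.
roster = {
    "student_a": [2, 2, 2, 2, 2, 2, 2, 2],  # total = 16
    "student_b": [1, 1, 1, 1, 1, 1, 1, 1],  # total = 8
}
print(students_for_second_gate(roster))
```

Keeping the first-gate flag separate from any later confirmation step mirrors the multi-gated logic: a positive screen only queues a student for follow-up assessment.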
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
