Abstract
The present studies report on the initial development and validation of the Youth Internalizing Problems Screener (YIPS), a 10-item self-report rating scale for assessing general internalizing problems and identifying depression and anxiety caseness within the context of school mental health screening. Results from Study 1 (N = 177) demonstrated that responses to the YIPS yielded a single-factor latent structure, that scores derived from the scale had concurrent validity with scores from measures of student subjective well-being and problem behavior, and that YIPS scores had incremental validity over scores from another common internalizing problems screener for predicting self-reports of broad student functioning. Findings from Study 2 (N = 219) confirmed the latent structure and internal reliability of responses to the YIPS, demonstrated that scores derived from this scale had strong associations with scores from criterion measures of depression and anxiety, and showed that YIPS scores had good-to-excellent power for accurately discriminating youth scoring at or above the clinical caseness thresholds on criterion measures of depression and anxiety from those scoring below them. Taken together, results suggest the YIPS shows promise as a technically adequate instrument for measuring general internalizing problems and identifying depression and anxiety caseness among secondary students. Implications for future research and practice are discussed.
Youths’ mental health problems have been traditionally classified into two general categories: internalizing and externalizing problems. Internalizing problems are characterized by patterns of excessive and aversive private behaviors (i.e., thoughts and feelings) that are directed toward the self, whereas externalizing problems are characterized by patterns of excessive and disruptive public behaviors (i.e., physical and verbal actions) that are directed toward the social environment (Forms, Abad, & Kirchner, 2011). Research has demonstrated substantial comorbidity between internalizing and externalizing problems in youth, with estimates typically ranging from 20% to 50% (Zahn-Waxler, Klimes-Dougan, & Slatterly, 2000). Yet research has also shown that the developmental course of internalizing problems is less stable than that of externalizing problems, that internalizing problems are far less likely than externalizing problems to be identified by caregivers and educators, and that youth with identified internalizing problems are far less likely to receive treatment than youth with identified externalizing problems (e.g., Merikangas et al., 2011). Considering these findings, the overarching purpose of the present work was to advance the identification of youth with internalizing problems in school settings.
As a general category, internalizing problems comprise two primary subtypes of mental health disorders: depression and anxiety (Forms et al., 2011). Empirical findings have consistently shown that both subtypes of internalizing problems are associated with similarly poor youth outcomes across social, academic, physical health, and other life domains, leading the World Health Organization (2014) to declare that internalizing problems are the greatest contributor to illness and disability in adolescents worldwide. Given that mental health problems in youth are associated with such poor concurrent and future educational outcomes (McLeod & Kaiser, 2004), many researchers have recommended that schools sponsor universal mental health screening initiatives to identify at-risk youth and then match them with appropriate treatment and educational supports (e.g., Dowdy, Furlong, Eklund, Saeki, & Ritchey, 2010). Providing such services in school settings allows mental health professionals to offer more services to more youth experiencing internalizing problems, many of whom would be neither identified nor treated in traditional clinic-based settings. Despite these potential benefits, universal mental health screening has yet to become common practice, with estimates indicating that only approximately 13% of schools or districts engage in screening for this purpose (Bruhn, Woods-Groves, & Huddle, 2014). Although there are several barriers preventing schools from sponsoring mental health screening initiatives, one of the most common is likely the lack of contextually appropriate, technically sound, and usable instruments for accomplishing this purpose (Glover & Albers, 2007).
This is especially true when it comes to screening for internalizing problems in secondary schools, as the majority of scholarly work in this area has focused on validating the latent structure and classification utility of informant-report behavior rating scales for use in primary schools (e.g., Cook et al., 2011; Eklund & Dowdy, 2014).
Within the context of secondary schools, we suggest that brief self-report behavior rating scales are preferable to informant-report measures for screening youths’ internalizing problems for at least two reasons. First, self-report measures are more feasible for data collection, analysis, and decision-making purposes in this setting, as they require a single administration that yields a single data point per student. This avoids the practical pitfalls associated with aggregating and analyzing informant reports from multiple teachers as well as the classical conceptual conundrum associated with the lack of cross-informant agreement among caregivers (e.g., Achenbach, McConaughy, & Howell, 1987). In addition, considering that internalizing problems are characterized by patterns of excessive and aversive private behaviors, which may or may not be associated with overt behavioral problems observed by teachers (e.g., social withdrawal or avoidance), we suggest that self-reports are more functionally valid for measuring this primarily subjective class of mental health problems. Although there is often concern regarding young children’s ability to accurately discriminate their private behavior, such concerns are typically alleviated by the time youth reach adolescence, where self-reporting via interviews and rating scales is considered an essential component of clinical and psychoeducational assessment (see Whitcomb & Merrell, 2012).
Currently, one of the most common self-report instruments used for screening internalizing problems in secondary schools is the Behavioral and Emotional Screening System (BESS), which is part of the Behavioral Assessment System for Children, Second Edition (BASC-2; Kamphaus & Reynolds, 2007). The BASC-2 BESS is a 30-item instrument intended to measure broad mental health problems, and thus, its results are interpreted using a single scale score that represents higher or lower levels of broad impairment. The BASC-2 BESS has been shown to have excellent internal and test–retest reliability, high specificity (.95) and negative predictive power (.95), low sensitivity (.59), and moderate positive predictive power (.68; Kamphaus & Reynolds, 2007; see Glover & Albers, 2007, for guidance in interpreting conditional probability values). The primary drawbacks to using the BASC-2 BESS as a screener for internalizing problems are that it has poor sensitivity and that its diagnostic accuracy estimates apply to mental health problems in general and not to internalizing problems in particular. In an attempt to remedy this issue, a new version of the BESS, which is part of the third edition of the Behavioral Assessment System for Children (BASC-3), offers several subscale scores, including a stand-alone measure of internalizing problems (Kamphaus & Reynolds, 2015). However, this newer version has yet to be as thoroughly researched or as widely adopted in practice as the BASC-2 BESS. Furthermore, considering the BASC-2 BESS is a commercial instrument, the financial costs alone of this measure may make its use prohibitive for secondary schools with budgetary constraints.
Another of the most common self-report instruments used for screening internalizing problems in secondary schools is the Strengths and Difficulties Questionnaire (SDQ; Goodman, 2001). Unlike the BASC-2 BESS, the SDQ is a noncommercial measure (see www.sdqinfo.com), which removes the potential financial barrier associated with adopting the measure for use in school mental health practice. Also unlike the BASC-2 BESS, the SDQ offers subscales for measuring multiple types of mental health problems: emotional problems, conduct problems, hyperactivity, and peer problems (Goodman, 2001). Research has shown that the Emotional Symptoms Scale (ESS) of the SDQ, which is the five-item subscale targeting internalizing problems, has adequate test–retest and internal reliability, high specificity (.96) and negative predictive power (.96), yet low sensitivity (.29) and positive predictive power (.29; Goodman, 2001). Thus, the primary drawback of using the ESS–SDQ as a screener for internalizing problems in secondary schools is that, given a sensitivity of .29, it may fail to identify roughly 70% of youth with depression or anxiety caseness, suggesting the measure is largely ineffective for its intended purpose.
Given the criteria established by Glover and Albers (2007) for evaluating universal screening instruments, we suggest that the available empirical evidence indicates that neither the BASC-2 BESS nor the ESS–SDQ is optimal for screening youths’ internalizing problems in secondary schools. Specifically, although the ESS–SDQ appears to be contextually appropriate, the BASC-2 BESS targets a broader construct than internalizing problems and therefore measures more than is necessary for identifying depression and anxiety caseness. Next, although the length and cost of the ESS–SDQ suggest it is usable and affordable, the length and cost of the BASC-2 BESS suggest that it is much less so. Finally, and probably most importantly, previous research indicates that both the BASC-2 BESS and ESS–SDQ have suboptimal diagnostic accuracy as screeners, with reported sensitivity values indicating they fail to identify 40% to 70% of youth who are actually experiencing clinical-level mental health problems (according to criterion measures). Considering the suboptimal classification utility of these two common self-report screeners, the specific purpose of the present studies was to initiate the development and validation of a new self-report behavior rating scale, the Youth Internalizing Problems Screener (YIPS), that might function more effectively as a universal mental health screener for assessing general internalizing problems and identifying depression and anxiety caseness among secondary students.
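The 40% to 70% range follows from the sensitivity values cited above, because the share of true cases that a screener misses (its false negative rate) is one minus its sensitivity. A minimal sketch of that arithmetic is shown here; the function name is illustrative and is not part of either instrument.

```python
def miss_rate(sensitivity: float) -> float:
    """Proportion of true cases a screener fails to flag (false negative rate)."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie in [0, 1]")
    return 1.0 - sensitivity


# Sensitivity values as reported in the text for the two screeners.
reported = {"BASC-2 BESS": 0.59, "ESS-SDQ": 0.29}

for name, sens in reported.items():
    print(f"{name}: misses about {miss_rate(sens):.0%} of true cases")
```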
Study 1
Study 1 initiated the development and validation of the YIPS using a multiphase strategy that was patterned after best-practice recommendations for the construction of behavior rating scales (see Joint Committee on Standards for Educational and Psychological Testing, 2014). To begin, we outlined a rationale for creating a new measure (see the justification provided in the Introduction) and operationalized the scope and nature of the proposed instrument (i.e., a brief self-report behavior rating scale for assessing internalizing problems that could function as a mental health screener in secondary schools). Next, we drafted pilot items for the new measure, selected convergent validity measures to examine associations with the pilot measure, and identified an appropriate target sample (i.e., secondary students attending a local public school) to pilot the measure with. Finally, we administered the survey, processed the data, and analyzed the structural and convergent validity of responses to the pilot measure.
Method
Participants
Participants were adolescents in Grades 9 to 12 attending a small, public high school in a midsized urban city located in the southern region of the United States. The sample consisted of 177 students, who were 52.4% female and 97.3% Black or African American, and ranged in age from 14 to 19 years old (M = 16.53, SD = 1.46). The proportion of participants from each grade level was as follows: Grade 9 = 28.1%, Grade 10 = 16.2%, Grade 11 = 28.1%, and Grade 12 = 27.5%. At the time of the study, the school’s enrollment was approximately 350 students, and district data indicated all students were eligible for free or reduced lunch. Although all students at the school were invited to participate in the study, parental consent, student assent, and usable survey responses (characterized by ≤10% missing data and plausible response patterns) were obtained for only approximately 50% of the total sampling pool. Although suboptimal, this participation rate is consistent with other applied research studies that lack funding resources to incentivize participation. Participants completed a paper-and-pencil survey that was administered by their homeroom teachers, who used a standardized administration protocol provided by the authors.
Measures
The YIPS pilot measure was developed using a multistep scale construction process. First, Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5; American Psychiatric Association, 2013) criteria were reviewed for Depressive Disorders and Anxiety Disorders to identify core symptom themes for internalizing problems. Next, common self-report behavior rating scales that have demonstrated technical adequacy for measuring adolescents’ internalizing problems were reviewed to identify item content themes: the Center for Epidemiological Studies Depression Scale for Children (CESDC; e.g., Radloff, 1991), the Beck Depression Inventory–2 (e.g., Steer, Kumar, Ranieri, & Beck, 1998), the Beck Anxiety Inventory (BAI; e.g., Beck & Steer, 1993), the Screen for Child Anxiety Related Emotional Disorders (e.g., Birmaher et al., 1997), the ESS–SDQ (e.g., Goodman, 2001), and the BESS (e.g., Kamphaus & Reynolds, 2007). Based on these reviews, seven original pilot items were generated to represent key thematic symptom domains for depression, and seven additional pilot items were generated to represent key thematic symptom domains for anxiety (see Table 1). All items were directly phrased, making reverse scoring unnecessary, and were arranged along a 4-point, relative frequency-based response scale (1 = almost never, 2 = sometimes, 3 = often, 4 = almost always). Item wording and response-scale options were kept as brief as possible to enhance the measure’s feasibility as a potential school-wide screening instrument. Readability analyses of the 14-item YIPS pilot measure indicated grade-level estimates ranging from a lower bound of 2.5 (Flesch–Kincaid Grade Level) to an upper bound of 5.9 (Coleman–Liau Index), with an average grade-level estimate of 3.8.
Table 1. YIPS Pilot Items, Targeted Internalizing Domains, and Factor Loadings.
Note. EFA = exploratory factor analysis; CFA = confirmatory factor analysis.
Item not included in final 10-item measure.
The SDQ (Goodman, 2001) was selected as a convergent criterion measure for the YIPS pilot instrument, as it is a commonly used and technically adequate self-report measure for screening youths’ mental health problems in schools. The SDQ is comprised of five, five-item subscales: ESS, Hyperactivity Scale (HS), Conduct Problems Scale (CPS), Peer Problems Scale (PPS), and Prosocial Behavior Scale (PBS). The first four subscales can be combined to create a Total Difficulties Scale (TDS), whereas the last subscale is considered an indicator of youth well-being. The majority of SDQ items are directly phrased (e.g., “I worry a lot”), yet some items targeting externalizing behaviors are indirectly phrased (e.g., “I think before I do things”), requiring reverse scoring. All items are arranged along a 3-point agreement-based response scale (1 = not true, 2 = somewhat true, 3 = certainly true). Previous research demonstrates that responses to most SDQ subscales are characterized by adequate test–retest reliability, borderline to adequate internal reliability, and concurrent validity with diagnosed internalizing problems—with the exception of the PPS, which has generally shown weaker psychometrics than the other subscales (Goodman, 2001).
The Student Subjective Wellbeing Questionnaire (SSWQ; Renshaw, Long, & Cook, 2015) was selected as another convergent criterion measure for the YIPS pilot instrument, as it measures school-specific subjective well-being, which is considered antithetical to mental health problems. The SSWQ is comprised of four, four-item subscales: Joy of Learning Scale (JLS), School Connectedness Scale (SCS), Educational Purpose Scale (EPS), and Academic Efficacy Scale (AES). These four subscales can be combined to create a Total Student Wellbeing Scale (TSWS). All items are directly phrased (e.g., “I feel like I belong at my school”), requiring no reverse scoring, and each item is arranged along a 4-point, relative frequency-based response scale (1 = almost never, 2 = sometimes, 3 = often, 4 = almost always). Previous research shows that responses to the SSWQ subscales and composite scale are characterized by at least adequate internal reliability, and that scores derived from the subscales and composite scale have convergent validity with academic achievement and with various health-related and risk-related behaviors (Renshaw, 2015; Renshaw & Arslan, 2016; Renshaw & Chenier, 2018).
Students’ self-reported academic achievement (SRAA) was also used as a convergent validity measure and was assessed using a single item adapted from the California Healthy Kids Survey (WestEd, 2014): “During the past 12 months, how would you describe the grades you received in school?” This item was arranged along a 9-point grade-range response scale, with higher responses indicating higher academic achievement (1 = mostly F’s, 2 = mostly D’s and F’s, 3 = mostly D’s, 4 = mostly C’s and D’s, 5 = mostly C’s, 6 = mostly B’s and C’s, 7 = mostly B’s, 8 = mostly A’s and B’s, 9 = mostly A’s). Previous research using this item has conceptualized SRAA as a proxy for students’ grade point average (GPA), operationalizing a response of “9” as analogous to a 4.0 GPA, “8” as analogous to a 3.5 GPA, “7” as analogous to a 3.0 GPA, and so on (e.g., O’Malley, Voight, Renshaw, & Eklund, 2014).
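The GPA proxy described above can be written as a simple linear mapping. The sketch below assumes that the half-point steps given for responses 9, 8, and 7 continue down to response 1; the text states only the first three anchors and "and so on," so the lower values are an extrapolation, and the function name is hypothetical.

```python
def sraa_to_gpa(response: int) -> float:
    """Map a 1-9 SRAA response to an approximate GPA on a 4.0 scale.

    Assumption: the half-point steps stated for responses 9, 8, and 7
    continue linearly down to response 1.
    """
    if response not in range(1, 10):
        raise ValueError("SRAA responses range from 1 to 9")
    return (response - 1) * 0.5


for r in (9, 8, 7):
    print(f"SRAA response {r} -> GPA {sraa_to_gpa(r):.1f}")
```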
Data analyses
Several phases of data analyses were conducted to investigate the structural and convergent validity of responses to the 14-item YIPS pilot measure. First, exploratory factor analyses (EFA) were conducted to investigate and refine the latent structure of responses to the pilot measure. Next, internal reliability and other observed scale characteristics were calculated to provide initial psychometric information for responses to the refined measure. Then, the concurrent validity of scores derived from the YIPS was explored using a series of bivariate correlations with scores from the SDQ scales, SSWQ scales, and SRAA. Finally, the incremental validity of YIPS scores was explored in relation to scores derived from a criterion screener for internalizing problems (the ESS–SDQ) using a series of hierarchical regressions to predict self-reported scores from three key measures of broad student functioning: SRAA, TDS–SDQ, and TSWS–SSWQ. All analyses were conducted using SPSS Version 22.
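The logic of the hierarchical regressions can be illustrated with a short sketch: the criterion screener is entered alone in Step 1, the new screener is added in Step 2, and the change in R² indexes incremental validity. The analyses reported here were run in SPSS; the Python code below is only an illustration on simulated data, and the variable names and effect sizes are assumptions rather than study values.

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 from an ordinary least squares fit that includes an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid.var() / y.var()


rng = np.random.default_rng(0)
n = 177  # matches the Study 1 sample size; the data themselves are simulated
ess = rng.normal(size=n)                # stand-in for ESS-SDQ scores
yips = 0.6 * ess + rng.normal(size=n)   # stand-in for YIPS scores
outcome = 0.3 * ess + 0.4 * yips + rng.normal(size=n)

r2_step1 = r_squared(ess.reshape(-1, 1), outcome)            # Step 1: ESS only
r2_step2 = r_squared(np.column_stack([ess, yips]), outcome)  # Step 2: ESS + YIPS
delta_r2 = r2_step2 - r2_step1

print(f"Step 1 R2 = {r2_step1:.3f}; Step 2 R2 = {r2_step2:.3f}; "
      f"delta R2 = {delta_r2:.3f}")
```

Because the Step 1 model is nested within the Step 2 model, the change in R² cannot be negative; whether it is statistically significant would require the accompanying F test, which is omitted from this sketch.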
Results
Preliminary analyses indicated that responses to several YIPS pilot items were substantially nonnormally distributed (absolute skewness and kurtosis values > 2), characterized by a positive skew, and thus, EFA was conducted using the principal-axis factoring method with a promax rotation. Results from the original EFA yielded three factors with eigenvalues > 1, which accounted for approximately 51% of the variance and were characterized by an adequate sample size (Kaiser–Meyer–Olkin measure of sampling adequacy = .86), lack of singularity (Bartlett’s test of sphericity χ2 = 586.01, df = 91, p < .001), lack of multicollinearity (Determinant = .02), moderate item communalities (range = .22–.49), and moderate-to-strong interfactor correlations (φ range = .40–.65). However, visual inspection of the scree plot and findings from a parallel analysis both suggested a one-factor solution would provide a better fit to the data. Analysis of the pattern matrix loadings also indicated five cross-loading items (λ > .30 across more than one factor) and a lack of conceptual coherence for item–factor content, further suggesting the original three-factor solution was nonoptimal. Thus, EFA was rerun with the number of factors constrained to extract simpler two-factor and one-factor solutions.
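Parallel analysis compares the eigenvalues observed in the data with those expected from random data of the same dimensions and retains only the factors that exceed chance. The sketch below implements the classic variant based on the eigenvalues of the full correlation matrix and the mean random eigenvalue; the specific parallel analysis procedure used in the study is not described, so this variant, the simulated data, and all names are assumptions.

```python
import numpy as np


def parallel_analysis(data: np.ndarray, n_sims: int = 200, seed: int = 0) -> int:
    """Number of leading eigenvalues that exceed the mean random eigenvalue."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # eigvalsh returns ascending eigenvalues; reverse to descending order.
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    random_eigs = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.normal(size=(n, p))
        random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    exceeds = observed > random_eigs.mean(axis=0)
    # Retain factors up to the first eigenvalue that fails to beat chance.
    return int(p if exceeds.all() else np.argmin(exceeds))


# Simulated one-factor data: 177 respondents, 14 items (not study data).
rng = np.random.default_rng(1)
factor = rng.normal(size=(177, 1))
items = factor @ np.full((1, 14), 0.6) + 0.8 * rng.normal(size=(177, 14))

print("Factors suggested by parallel analysis:", parallel_analysis(items))
```

This criterion is typically more conservative than the eigenvalue > 1 rule, which is consistent with the pattern described above, where the eigenvalue rule suggested three factors and parallel analysis suggested one.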
Results from the constrained two-factor solution indicated a strong interfactor correlation (φ = .69), one nonloading item (λ < .30), and two cross-loading items, with no clear conceptual coherence for the pattern of item loadings across the two factors. Findings from the constrained one-factor solution indicated a range of loadings from .41 to .68, with similar levels of item communalities. Given the rationale underlying the development of the YIPS as a screener for general internalizing problems as well as the lack of conceptual coherence among the pattern of item loadings observed in the three-factor and two-factor solutions, the one-factor solution was ultimately selected as the preferred measurement model. Subsequently, the measure was shortened to 10 items to improve its feasibility as a potential universal screening instrument in secondary schools, while still maintaining adequate content validity. To truncate the measure, the four lowest loading items (λ range = .41–.46) were removed, one by one, and each time EFA was rerun constraining a one-factor solution. The internal reliability of the original 14-item scale was strong (α = .84) and only slightly attenuated when removing each of the four lowest loading items (Δα = −.02). EFA factor loadings for the finalized, 10-item YIPS measurement model are presented in Table 1.
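Cronbach's α, used above to track internal reliability as items were removed, can be computed directly from an item-score matrix. The sketch below uses the standard formula on simulated 4-point ratings; the data are not study data, and dropping the last four columns merely stands in for removing the four lowest loading items.

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    sum_item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - sum_item_vars / total_var)


# Simulated 1-4 ratings driven by a single latent trait (not study data).
rng = np.random.default_rng(42)
trait = rng.normal(size=(177, 1))
raw = 0.7 * trait + rng.normal(size=(177, 14))
ratings = np.clip(np.round(raw + 2.5), 1, 4)

alpha_14 = cronbach_alpha(ratings)
alpha_10 = cronbach_alpha(ratings[:, :10])  # arbitrary 10-item subset
print(f"alpha (14 items) = {alpha_14:.2f}; alpha (10 items) = {alpha_10:.2f}")
```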
Observed scale characteristics for the final 10-item YIPS indicated that responses to the measure were relatively normally distributed and had strong internal consistency reliability (see Table 2). Bivariate correlations between YIPS scores and the scores derived from the SDQ scales indicated large positive correlations with the ESS and TDS, moderate positive correlations with the other problem behavior scales, and a negligible correlation with the PBS. Correlations between the YIPS scores and scores derived from the SSWQ scales indicated a negligible association with the EPS and small negative correlations with all other subjective well-being scales, including the composite scale. A small negative correlation was also observed between YIPS scores and SRAA scores (see Table 3).
Table 2. Observed Scale Characteristics for the YIPS and Concurrent Validity Scales: Study 1.
Note. YIPS = Youth Internalizing Problems Screener; ESS = Emotional Symptoms Scale; SDQ = Strengths and Difficulties Questionnaire; HS = Hyperactivity Scale; CPS = Conduct Problems Scale; PPS = Peer Problems Scale; PBS = Prosocial Behavior Scale; TDS = Total Difficulties Scale; JLS = Joy of Learning Scale; SSWQ = Student Subjective Wellbeing Questionnaire; SCS = School Connectedness Scale; AES = Academic Efficacy Scale; TSWS = Total Student Wellbeing Scale; SRAA = self-reported academic achievement.
Table 3. Bivariate Correlations Between YIPS and Concurrent Validity Scales: Study 1.
Note. YIPS = Youth Internalizing Problems Screener; ESS = Emotional Symptoms Scale; SDQ = Strengths and Difficulties Questionnaire; HS = Hyperactivity Scale; CPS = Conduct Problems Scale; PPS = Peer Problems Scale; PBS = Prosocial Behavior Scale; TDS = Total Difficulties Scale; JLS = Joy of Learning Scale; SSWQ = Student Subjective Wellbeing Questionnaire; SCS = School Connectedness Scale; AES = Academic Efficacy Scale; TSWS = Total Student Wellbeing Scale; SRAA = self-reported academic achievement.
*p < .05. **p < .01.
Compared with the correlations between scores from the concurrent validity measures and scores from the ESS–SDQ, which was selected as the criterion screener of internalizing problems, correlations with the YIPS scores were generally characterized by slightly stronger effect sizes in the expected directions (see Table 3). Furthermore, results from the hierarchical linear regressions demonstrated that YIPS scores contributed predictive power above and beyond that provided by scores from the ESS–SDQ alone for all three general student functioning outcomes: SRAA, TDS–SDQ, and TSWS–SSWQ. Specifically, when entered into the second step of the regression model, YIPS scores accounted for 4% to 8% additional variance for each concurrent outcome variable (a small-to-moderate change in effect size) and substantially reduced the standardized regression coefficients of the ESS–SDQ scores relative to the first step of the model, in which ESS–SDQ scores were the sole predictor (see Table 4).
Table 4. Hierarchical Linear Regressions for Concurrent Student Outcomes: Study 1.
Note. SEE = standard error of the estimate; SRAA = self-reported academic achievement; ESS = Emotional Symptoms Scale; SDQ = Strengths and Difficulties Questionnaire; YIPS = Youth Internalizing Problems Screener; TDS = Total Difficulties Scale; TSWS = Total Student Wellbeing Scale; SSWQ = Student Subjective Wellbeing Questionnaire.
*p < .05. **p < .01. ***p < .001.
Study 2
The purpose of Study 2 was to build upon the findings of Study 1 by further examining the structural and convergent validity of responses to the YIPS as a school-based screener for youths’ internalizing problems. To begin, we sought to replicate the latent factor structure of the 10-item YIPS by testing the one-factor measurement model with confirmatory factor analysis. Next, we further investigated the convergent validity of scores derived from the YIPS with scores from more robust criterion measures of depression and anxiety. Finally, we examined the diagnostic accuracy of responses to the YIPS for correctly discriminating between students classified with and without clinical-level caseness on the criterion measures of depression and anxiety.
Method
Participants
The sample for Study 2 was selected to be demographically similar to the sample from Study 1 for replication purposes. Participants were 219 adolescent students in Grades 9 to 12, who were attending a small, public high school located in the same midsized urban city. The sample was 54.8% female and 96.3% Black or African American, and ranged in age from 14 to 19 years (M = 16.30, SD = 1.29). The proportion of participants from each grade level was as follows: Grade 9 = 23.7%, Grade 10 = 31.1%, Grade 11 = 16.0%, and Grade 12 = 29.2%. All students at the school were eligible for free or reduced lunch. At the time of the study, approximately 420 students were enrolled in the school and invited to participate; however, parental consent, student assent, and usable surveys were received from only approximately 52% of the total sampling pool. Participating students completed the survey using a secure online server, which was accessed in homeroom classes using a mobile laptop computer lab.
Measures
The 10-item YIPS developed in Study 1 was the target measure in the present study. The CESDC was selected as a convergent criterion measure for the YIPS, as it is a widely used and technically adequate self-report instrument for measuring depression among adolescents (e.g., Radloff, 1991). The CESDC is comprised of 20 items that can be summed to create an overall depression composite score. All items are arranged along a 4-point, relative frequency-based response scale (1 = not at all, 2 = a little, 3 = some, 4 = a lot) that asks respondents to report their experience of symptoms during the past week. Most items are directly phrased (e.g., “I felt like I was too tired to do things”), but four items are indirectly phrased (e.g., “I felt like I was just as good as other kids”) and therefore require reverse scoring. Previous research shows that adolescents’ responses to the CESDC have adequate test–retest reliability and strong internal reliability. Adopting the classification strategy commonly used in research when information regarding actual clinical diagnosis is unavailable, a normative cutoff score of 2 SD above the mean (≥ 39; Radloff, 1991) was designated as the threshold for depression caseness in the present study.
The BAI was also selected as a convergent criterion measure for the YIPS, as it is a widely used and technically adequate self-report instrument for measuring anxiety among adolescents (e.g., Beck & Steer, 1993). The BAI consists of 21 items that can be summed to create an overall anxiety composite score. All items are brief and directly phrased symptom statements (e.g., “Nervous” and “Scared”), requiring no reverse scoring, that are arranged along a 4-point response scale that asks respondents to rate levels of symptom severity during the past month (0 = not at all; 1 = mildly, but it didn’t bother me much; 2 = moderately, it wasn’t pleasant at times; 3 = severely, it bothered me a lot). Previous research demonstrates that adolescents’ responses to the BAI have adequate test–retest reliability and strong internal reliability. Again adopting the classification strategy used in research when information regarding actual clinical diagnosis is unavailable, a normative cutoff score of 2 SD above the mean (≥ 36; Beck & Steer, 1993) was designated as the threshold for anxiety caseness in the present study.
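The scoring and classification rules described for the two criterion measures (sum the items, reverse-score where needed, and flag totals at or above the cutoff) can be sketched as follows. The cutoffs of 39 and 36 come from the text; the function names, the example response vector, and the positions of the four reverse-keyed CESDC items are hypothetical, because the text does not state which items are reversed.

```python
from typing import Iterable, Sequence


def total_score(responses: Sequence[int], low: int, high: int,
                reverse_items: Iterable[int] = ()) -> int:
    """Sum item responses, flipping reverse-keyed items within [low, high]."""
    reverse = set(reverse_items)
    total = 0
    for idx, value in enumerate(responses):
        if not low <= value <= high:
            raise ValueError(f"item {idx}: response {value} outside {low}-{high}")
        total += (low + high - value) if idx in reverse else value
    return total


def is_case(score: int, cutoff: int) -> int:
    """Return 1 (case) if the score is at or above the cutoff, else 0 (noncase)."""
    return int(score >= cutoff)


CESDC_CUTOFF = 39  # threshold stated in the text
BAI_CUTOFF = 36    # threshold stated in the text

# Hypothetical CESDC responses on the 1-4 scale; zero-based positions
# 3, 7, 11, and 15 are ASSUMED to be the reverse-keyed items.
cesdc_responses = [2] * 20
cesdc_total = total_score(cesdc_responses, low=1, high=4,
                          reverse_items=(3, 7, 11, 15))
print("CESDC total:", cesdc_total, "case:", is_case(cesdc_total, CESDC_CUTOFF))

# Hypothetical BAI responses on the 0-3 scale; no reverse scoring is needed.
bai_total = total_score([1] * 21, low=0, high=3)
print("BAI total:", bai_total, "case:", is_case(bai_total, BAI_CUTOFF))
```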
Data analyses
Several phases of data analyses were conducted to further validate scores derived from the YIPS. First, confirmatory factor analysis (CFA), using the maximum likelihood (ML) estimation method, was conducted to affirm the latent structure of responses to the 10-item measure. Although the item-level data were nonnormally distributed, departures from normality were not severe, and therefore, the ML estimator was considered to be robust in this circumstance. To determine the goodness of data-model fit yielded by CFA, a combination of absolute and incremental fit indices was evaluated. Comparative fit index (CFI) values between .90 and .95 and root mean square error of approximation (RMSEA) and standardized root mean square residual (SRMR) values between .05 and .08 were considered to indicate adequate data-model fit, whereas CFI values >.95 and RMSEA and SRMR values <.05 were considered indicative of good data-model fit (Kenny, 2015; Kline, 2010). Next, latent construct reliability (coefficient H; see Mueller & Hancock, 2008), observed internal consistency reliability (Cronbach’s α), and other observed scale characteristics were calculated to provide additional psychometric information regarding responses to the measure. Following, the convergent validity of YIPS scores was explored using bivariate correlations (Pearson’s r) with observed scores from the CESDC and BAI.
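Coefficient H, the latent construct reliability index named above, is a function of the standardized factor loadings alone. The sketch below uses the usual closed form, H = S / (1 + S) with S = Σ λ² / (1 − λ²); the loadings passed in are hypothetical placeholders, since the Table 1 values are not reproduced here.

```python
from typing import Sequence


def coefficient_h(loadings: Sequence[float]) -> float:
    """Coefficient H (construct reliability) from standardized loadings."""
    if any(not -1.0 < lam < 1.0 for lam in loadings):
        raise ValueError("standardized loadings must lie strictly within (-1, 1)")
    s = sum(lam ** 2 / (1.0 - lam ** 2) for lam in loadings)
    return s / (1.0 + s)


# Hypothetical loadings for a 10-item, one-factor model (not the Table 1 values).
example_loadings = [0.6] * 10
print(f"H = {coefficient_h(example_loadings):.2f}")
```

With a single indicator, H reduces to the squared loading, and each additional indicator can only raise it, which is why H is never lower than the reliability of the best single item.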
After confirming initial convergent validity with the criterion measures, the diagnostic accuracy of responses to the YIPS was investigated by conducting a pair of receiver operating characteristic (ROC) curve analyses, wherein YIPS scores functioned as the test variable and the depression and anxiety caseness classifications (derived using the CESDC and BAI cutoff scores noted above; 1 = case, 0 = noncase) served as the anchor variables. Area under the curve (AUC) values were examined and interpreted, with values closer to 1 indicating greater power to discriminate between youth with caseness versus noncaseness scores on the criterion measures. Once an optimal cutoff score was established, this value was then used to create an internalizing problems caseness variable (1 = case, 0 = noncase), which allowed for the calculation of a screening identification rate in the present sample. These values were subsequently used to calculate positive and negative predictive values for the YIPS cutoff score. All latent-level analyses were conducted using Amos Version 22, whereas all observed-level analyses were conducted using SPSS Version 22.
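The AUC has a direct probabilistic reading: it is the probability that a randomly chosen case has a higher screener score than a randomly chosen noncase, with ties counted as one half. The sketch below computes it that way on a handful of made-up scores; it illustrates the statistic only and does not reproduce the SPSS ROC procedure used in the study.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(case score > noncase score) + 0.5 * P(tie)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    noncases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one noncase")
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in noncases)
    return wins / (len(cases) * len(noncases))


# Made-up screener totals and caseness labels (1 = case, 0 = noncase).
example_scores = [10, 15, 20, 25, 30]
example_labels = [0, 0, 1, 0, 1]
print(f"AUC = {auc(example_scores, example_labels):.3f}")
```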
Results
Findings from the CFA yielded an adequate data-model fit to the one-factor measurement model—χ2 = 86.20, df = 35, p < .001, CFI = .907, SRMR = .054, RMSEA [95% confidence interval (CI)] = .082 [.060, .104]—that was characterized by strong latent construct reliability (H = .84) and robust item loadings (see Table 1). After confirming the measurement model, observed scale characteristics for responses to the YIPS were calculated, indicating strong internal consistency reliability and a relatively normal distribution (see Table 5). Moreover, results from the bivariate correlations yielded strong positive correlations between scores derived from the YIPS and those from the CESDC (r = .71, p < .01) and BAI (r = .72, p < .01).
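The reported RMSEA point estimate can be recovered from the χ2 value, its degrees of freedom, and the sample size. The sketch below uses the common formula with N − 1 in the denominator; some software uses N instead, which gives the same value to three decimals in this case.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate from a model chi-square, its df, and sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Values reported for the Study 2 one-factor CFA.
estimate = rmsea(chi2=86.20, df=35, n=219)
print(f"RMSEA = {estimate:.3f}")
```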
Table 5. Observed Scale Characteristics of the YIPS and Concurrent Validity Scales: Study 2.
Note. YIPS = Youth Internalizing Problems Screener; CESDC = Center for Epidemiological Studies Depression Scale for Children; BAI = Beck Anxiety Inventory.
Findings from the ROC curve analyses indicated that responses to the YIPS had excellent discrimination ability for correctly classifying students with depression caseness (AUC [95% CI] = .93 [.85, 1.00], SE = .04, p < .001) and good discrimination ability for correctly classifying students with anxiety caseness (AUC [95% CI] = .89 [.78, 1.00], SE = .05, p < .001). Moreover, consideration of the sensitivity and specificity indices associated with various YIPS values indicated a cutoff score of 21 was optimal for identifying both depression and anxiety caseness, as it optimized the balance between sensitivity and specificity values across both criterion measures (see Table 6). Using this cutoff score to derive screening caseness groups within the present sample, a 20% prevalence rate (n = 44) was observed for those identified with clinical-level internalizing problems. Using the criterion measures’ clinical-level cutoff scores, prevalence rates for depression and anxiety caseness were 6.4% (n = 14) and 5.5% (n = 12), respectively. These caseness prevalence rates were used in follow-up calculations, yielding low-to-moderate positive predictive values (.30, .25) and high negative predictive values (.99, .99) for the designated YIPS cutoff score (21).
Table 6. Sensitivity and Specificity Values for YIPS Cutoff Scores: Study 2.
Note. YIPS = Youth Internalizing Problems Screener; CESDC = Center for Epidemiological Studies Depression Scale for Children; BAI = Beck Anxiety Inventory.
Optimal YIPS cutoff score and associated sensitivity and specificity values.
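The dependence of predictive values on prevalence, which underlies the low-to-moderate positive predictive values reported above, can be illustrated with Bayes' rule. The sketch below is not the authors' calculation: the function name is hypothetical, and the sensitivity and specificity values are placeholders chosen for illustration, because the tabled values are not reproduced here. Only the 6.4% prevalence figure is taken from the text.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("Predictive values are undefined for these inputs.")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Placeholder sensitivity and specificity (hypothetical, not from the source);
# prevalence is the depression caseness rate reported in the text.
ppv, npv = predictive_values(sensitivity=0.90, specificity=0.85, prevalence=0.064)
print(round(ppv, 2), round(npv, 2))
```

With a low base rate, even a sensitive and reasonably specific screener flags more noncases than cases in absolute terms, which lowers the positive predictive value while leaving the negative predictive value high.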
Discussion
The overarching purpose of the present studies was to initiate the development and validation of a new, brief measure of youths’ internalizing problems, the YIPS, that might function more effectively as a screener for depression and anxiety in secondary schools. To this end, Study 1 established that responses to the YIPS yielded an internally reliable 10-item measure that appears, at least at face value, to provide equal representation to items targeting key symptom themes characteristic of both depression and anxiety disorders. Study 1 also demonstrated that scores derived from the YIPS were associated with scores from several measures of student subjective well-being and problem behavior, providing initial evidence in favor of convergent validity. Another key finding from the first study was that YIPS scores demonstrated incremental validity in comparison with scores from the ESS–SDQ for predicting self-reported scores from measures of academic achievement, total mental health problems, and total student well-being. Building upon these promising results, Study 2 confirmed the latent structure and internal reliability of responses to the 10-item YIPS and demonstrated that its observed scale scores had strong associations with criterion measures of depression and anxiety, providing further evidence in favor of convergent validity. Study 2 also probed the diagnostic accuracy of the YIPS, demonstrating that responses to the measure had good-to-excellent power for discriminating between students scoring at or above the caseness thresholds on criterion measures of both anxiety and depression. Taken together, the evidence generated by these initial studies suggests that the YIPS shows promise as a technically adequate instrument for both measuring general internalizing problems and identifying depression and anxiety caseness among adolescents in secondary schools.
Although the results of these studies are encouraging, it is imperative that they be understood within the scope of their methodological limitations. Specifically, it is noteworthy that participants in both studies were demographically homogeneous and were obtained via convenience sampling that resulted in a suboptimal participation rate. Thus, generalization studies are needed with more diverse samples prior to drawing broader conclusions regarding the technical adequacy and diagnostic accuracy of the YIPS. Furthermore, given that the RMSEA index is positively biased with small sample sizes and models with lower degrees of freedom (Kenny, Kaniskan, & McCoach, 2015), and that it was the only index in the present study that indicated less than adequate data-model fit among the CFA findings, future research with larger sample sizes is also recommended. In addition, considering all concurrent criterion measures were self-reported, findings may be influenced by common-method variance, suggesting future research is warranted using alternative criterion measures to establish internalizing problems caseness. Finally, given the relatively small and homogeneous sample used in the present study, advanced statistical analyses investigating differential item functioning as well as measurement invariance across key demographic groups could not be conducted. Further research with larger and more diverse samples is therefore warranted to investigate additional psychometrics of the YIPS.
Despite these limitations, findings from the present studies have potential implications for research and practice related to school mental health assessment. Considering the criteria set forth by Glover and Albers (2007) for evaluating universal screeners, we suggest that the YIPS shows promise as a contextually appropriate, usable, and technically adequate screener for assessing and identifying internalizing problems among adolescents in secondary schools. Regarding contextual appropriateness, the YIPS was designed to measure a construct that has demonstrated relevance to youths’ educational and other valued life outcomes, and it does so using an instrumentation method that is acceptable within secondary school settings. Regarding usability, the YIPS appears to be low cost, both financially and administratively, as it is a noncommercial measure that can be completed in a few minutes within a classroom setting, using traditional paper-and-pencil or online survey protocols. Scoring the YIPS also requires little training and no advanced technology, as the 10 items are simply summed to create a composite score, which is then judged in relation to the caseness cutoff score (21). Last, regarding technical adequacy, evidence from the present studies suggests that scores derived from the measure are internally reliable, characterized by construct validity, and have effective classification utility. Most importantly, the YIPS appears to be a more sensitive screener for identifying youth who are actually experiencing internalizing problems compared with other measures that are commonly used for this purpose (i.e., the BASC-2 BESS and the EPS–SDQ). And although the YIPS has lower positive predictive power compared with other common measures, we suggest this trade-off is preferable within the context of school mental health screening, as false positives can be further detected using a multigated screening process, whereas false negatives cannot.
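The scoring procedure described above (summing the 10 items and comparing the composite with the cutoff of 21) can be sketched as follows. The function name and the example responses are hypothetical, and treating a composite at or above the cutoff as a positive screen is an assumption, as is the numeric coding of the item responses, which is not described in this section.

```python
from typing import Sequence, Tuple

YIPS_ITEM_COUNT = 10   # number of items, as stated in the text
YIPS_CUTOFF = 21       # caseness cutoff score, as stated in the text


def score_yips(item_responses: Sequence[int]) -> Tuple[int, bool]:
    """Return (composite score, screens positive) for one respondent.

    Assumption: a composite at or above the cutoff counts as a positive screen.
    """
    if len(item_responses) != YIPS_ITEM_COUNT:
        raise ValueError(f"Expected {YIPS_ITEM_COUNT} item responses.")
    composite = sum(item_responses)
    return composite, composite >= YIPS_CUTOFF


# Hypothetical responses for one student (response coding is assumed).
composite, flagged = score_yips([2, 3, 2, 2, 3, 2, 2, 3, 2, 2])
print(composite, flagged)
```

Within a multigated process, a positive screen of this kind would prompt follow-up assessment rather than serve as a diagnosis.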
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
