Equivalence Testing of a Youth Risk and Needs Assessment

Abstract

Truancy can have a detrimental impact on student outcomes. Risk assessments are used to identify behavioral and emotional problems associated with school truancy. Although imperative for informing decisions about students’ welfare, risk assessments generally lack substantial validity evidence. Specifically, such assessments need measurement invariance (MI) evidence to support score interpretation across cultural groups. This study examined MI, specifically factor invariance (FI), of the Washington Assessment of the Risks and Needs of Students (WARNS) across African American, Latinx, and Caucasian students using confirmatory factor analysis with traditional FI criteria and the equivalence testing (ET) procedure. Results from traditional criteria suggested that the factor structure is similar across groups. The ET procedure demonstrated minor model misspecifications.

Keywords

risk assessments, invariance, ethnicity, equivalence testing

Truancy, defined as unexcused absences from school, is linked to risky behaviors, including substance abuse, poor emotional wellbeing, and criminal activity. The U.S. juvenile court system had approximately 50,000 petitioned truancy cases in 2016 (Hockenberry & Puzzanchera, 2018). In response, efforts have increased to identify factors that predict risks and needs among students (Case & Haines, 2010), a task that continues to be a formidable challenge for researchers, school counselors, and youth service providers. Identifying these factors often involves risk assessments, which help map trajectories of offending behaviors.

School administrators and juvenile justice practitioners have adopted various standardized student risk assessments. The assessment scores help guide the type and level of intervention needed for youth who are chronically absent from school. However widely an assessment is used, its scores must have evidence to support their intended uses, and many risk assessments lack information supporting validity inferences. Existing evidence has focused on scoring inferences via factor analysis (e.g., Olver et al., 2009) and extrapolation inferences via correlations of scores with theoretically related variables (e.g., criminal outcomes; Schwalbe, 2007). Even within this evidence, the assumption is that a risk assessment performs similarly across groups, that is, that it has the property of measurement invariance (MI; Millsap, 2011). Without evidence of MI, risk scores may lack precision, resulting in imprecise decisions for individuals in different groups.

Present Study

The Washington Assessment of the Risks and Needs of Students (WARNS; George et al., 2015) is used by school administrators, courts, and youth service providers to understand behaviors related to truancy. For the WARNS, evidence exists for a scoring inference (e.g., French & Vo, 2019) and an extrapolation inference (e.g., Iverson et al., 2016). However, the WARNS lacks evidence of factor invariance (FI), a form of MI, across diverse groups. We evaluate an extrapolation inference (e.g., Kane, 2013) via FI at the internal factor structure level among African American, Latinx, and Caucasian students. We employ multiple group confirmatory factor analysis with traditional dichotomous FI criteria (e.g., Δχ², ΔCFI), as well as the equivalence testing (ET) procedure criteria, which treat fit as a degree of model misspecification. This dual-criteria reporting provides more information about model misfit than traditional indices alone (Marcoulides & Yuan, 2017).

Method

Participants

High school participants (N = 1,468) aged 13–20 years in Grades 8–12 from Washington (83.6%; n_districts = 41) and Georgia (16.4%; n_districts = 1) schools completed the WARNS in 2018–2019. Schools engaged with the WARNS system (a) as an information-gathering tool or (b) as required by school policy when students surpassed an unexcused absence threshold. Thus, our sample reflects the students who would be assessed given the purpose of the WARNS. The ethnic groups examined included African American (n = 335, 46.6% female), Latinx (n = 588, 51.3% female), and Caucasian (n = 524, 47.2% female) students. The average completion time was 13 min.

Instrument

The WARNS is a 40-item self-report instrument with six specific factors (i.e., aggression-defiance, depression-anxiety, substance abuse, peer deviance, family environment, and school engagement) and a general factor supported via a bifactor model (Strand et al., 2019). Response options range from 0 = Never or hardly ever to 3 = Always or almost always.

Analysis

First, a bifactor model, supported by evidence and use (Strand et al., 2019), was fit for each group using diagonally weighted least squares estimation in the lavaan package in R. Model fit was assessed via (a) fit criteria, namely the χ² test, comparative fit index (CFI ≥ .90), and root mean squared error of approximation (RMSEA ≤ .08; Brown, 2015); (b) out-of-bounds parameter values; and (c) theory for interpretation and use. Internal consistency reliability was estimated via omega (ω; McDonald, 1999), ω hierarchical (ω_H), and ω hierarchical subscale (ω_HS; Rodriguez et al., 2016) to highlight reliability differences when accounting for the general factor.
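To make the three reliability coefficients concrete, the sketch below computes ω, ω_H, and ω_HS from standardized bifactor loadings, following the general form of the indices described by Rodriguez et al. (2016). It is an illustration only and not the code used in this study. The function name `bifactor_omegas`, the toy loadings, and the simplifying assumptions (orthogonal factors, one specific factor per item, residual variance equal to one minus the squared loadings) are all hypothetical choices made for the example.

```python
from typing import Dict, Tuple

# item -> (general loading, specific loading); hypothetical toy values
Loadings = Dict[str, Tuple[float, float]]


def bifactor_omegas(subscales: Dict[str, Loadings]) -> Dict[str, float]:
    """Return omega, omega_H, and per-subscale omega and omega_HS.

    Assumes standardized loadings, orthogonal factors, and a residual
    variance of 1 - general^2 - specific^2 for every item.
    """
    gen_total = 0.0      # sum of general loadings over all items
    spec_sq_total = 0.0  # sum over subscales of (sum of specific loadings)^2
    resid_total = 0.0    # sum of residual variances over all items
    out: Dict[str, float] = {}

    for name, items in subscales.items():
        gen_k = sum(g for g, _ in items.values())
        spec_k = sum(s for _, s in items.values())
        resid_k = sum(1.0 - g**2 - s**2 for g, s in items.values())
        var_k = gen_k**2 + spec_k**2 + resid_k  # model-implied subscale variance
        out[f"omega_{name}"] = (gen_k**2 + spec_k**2) / var_k
        out[f"omega_HS_{name}"] = spec_k**2 / var_k
        gen_total += gen_k
        spec_sq_total += spec_k**2
        resid_total += resid_k

    total_var = gen_total**2 + spec_sq_total + resid_total
    out["omega"] = (gen_total**2 + spec_sq_total) / total_var
    out["omega_H"] = gen_total**2 / total_var
    return out


# Two hypothetical subscales with two items each
toy = {
    "F1": {"a": (0.6, 0.5), "b": (0.6, 0.5)},
    "F2": {"c": (0.6, 0.5), "d": (0.6, 0.5)},
}
result = bifactor_omegas(toy)
```

With these toy loadings, ω is about .83 and ω_H about .62, while each subscale has ω of about .76 but ω_HS of only about .31. The gap between ω and ω_HS is what the analysis uses to show how much reliable subscale variance remains once the general factor is accounted for.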

Second, we examined FI via multigroup confirmatory factor analysis (CFA) with pairwise comparisons to allow direct comparison across criteria. Configural invariance (i.e., free parameters and equal factor form), metric invariance (i.e., equal pattern coefficients), and scalar invariance (i.e., equal loadings and intercepts) were examined. Changes (Δ) in χ², CFI, and RMSEA were used to compare models, where a non-significant Δχ², ΔCFI < .01, and ΔRMSEA < .01 supported metric invariance, and a non-significant Δχ², ΔCFI < .01, and ΔRMSEA < .015 supported scalar invariance (Chen, 2007).
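The traditional decision rule just described can be sketched as a small function. This is a hedged illustration, not the authors' code: the function name `supports_invariance` and its argument names are hypothetical, the three criteria are combined jointly as the sentence above reads, and the deltas are signed so that a positive value means the more constrained model fits worse (an assumption, since the sign convention is not stated). The significance level is also not stated in the text, so `alpha` must be supplied by the caller.

```python
# Cutoffs as stated in the text (attributed there to Chen, 2007)
CFI_CUTOFF = 0.010
RMSEA_CUTOFF = {"metric": 0.010, "scalar": 0.015}


def supports_invariance(level, p_value, cfi_base, cfi_constrained,
                        rmsea_base, rmsea_constrained, alpha):
    """Apply the three traditional criteria jointly.

    Assumption: deltas are signed so that positive values mean the more
    constrained model fits worse (CFI drops, RMSEA rises).
    """
    delta_cfi = cfi_base - cfi_constrained        # drop in CFI
    delta_rmsea = rmsea_constrained - rmsea_base  # rise in RMSEA
    return (p_value >= alpha                      # non-significant delta chi-square
            and delta_cfi < CFI_CUTOFF
            and delta_rmsea < RMSEA_CUTOFF[level])


# Metric step with CFI .906 -> .915, RMSEA .061 -> .056, p = .429;
# alpha = .01 is an assumed significance level for illustration.
metric_ok = supports_invariance("metric", 0.429, 0.906, 0.915,
                                0.061, 0.056, alpha=0.01)
```

Because every criterion yields a yes-or-no answer, the rule is dichotomous: a model either passes or fails, with no indication of how large any misfit is. That limitation is what motivates pairing it with the ET procedure.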

Third, we applied the ET procedure to the same models. The procedure estimates adjusted RMSEA values conditioned on N, the number of groups, and df. RMSEA values were obtained for the configural, metric, and scalar models, and for the model comparisons. The RMSEA criteria were: RMSEA < .01 = Excellent; .01–.05 = Close; .05–.08 = Fair; .08–.10 = Mediocre; and > .10 = Poor (Yuan & Chan, 2016).
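The labeling step of the ET criteria can be sketched as follows. The sketch covers only the mapping from an RMSEA value to a descriptive label. It does not compute the adjusted cutoffs, which come from the ET procedure itself (Yuan & Chan, 2016) and are passed in here as plain numbers. The function name `et_fit_label` is hypothetical, and the handling of values that fall exactly on an interior cutoff (assigned to the better label) is an assumption, because the ranges above share their endpoints.

```python
LABELS = ("Excellent", "Close", "Fair", "Mediocre", "Poor")
CONVENTIONAL_CUTOFFS = (0.01, 0.05, 0.08, 0.10)


def et_fit_label(rmsea, cutoffs=CONVENTIONAL_CUTOFFS):
    """Map an RMSEA value to a descriptive fit label.

    Follows the text for the outer bounds (< first cutoff is Excellent,
    > last cutoff is Poor). Assumption: a value equal to an interior
    cutoff takes the better of the two adjacent labels.
    """
    c1, c2, c3, c4 = cutoffs
    if rmsea < c1:
        return LABELS[0]
    if rmsea <= c2:
        return LABELS[1]
    if rmsea <= c3:
        return LABELS[2]
    if rmsea <= c4:
        return LABELS[3]
    return LABELS[4]


# An RMSEA of .063 judged against one hypothetical set of adjusted cutoffs
label_adjusted = et_fit_label(0.063, (0.017, 0.058, 0.097, 0.123))
# The same value judged against the conventional cutoffs
label_conventional = et_fit_label(0.063)
```

In this example, .063 is labeled Fair under both sets of cutoffs. Unlike a pass-or-fail rule, the label conveys the degree of misspecification, so two models that both pass a traditional criterion can still be distinguished.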

Results and Discussion

Table 1 contains the model fit for each group. The bifactor model met fit criteria, except for the CFI for the African American sample. However, we continued with the bifactor model, as it was consistent with WARNS use and practice (Strand et al., 2019). Table 2 provides reliability estimates by group, which varied considerably. For instance, the ω_HS for the subscale scores ranged from .06 to .67, .33 to .52, and .27 to .54 for African American, Caucasian, and Latinx students, respectively. The standardized pattern coefficients for the general factor ranged from .18 to .86 for African American, .35 to .78 for Caucasian, and .36 to .83 for Latinx students. Results support the WARNS’ bifactor structure across groups, with a caution about the unique information in subscale scores, given low reliability. However, specific factors with an ω_HS of at least .50 may support score use (Reise et al., 2013).

Table 1.

WARNS Bifactor Model Fit for All Groups.

Model	χ²	df	p	CFI	RMSEA	RMSEA CI
African American	1,762.32	700	<.001	.880	.067	[.063, .070]
Caucasian	2,131.25	700	<.001	.915	.063	[.060, .066]
Latinx	2,125.42	700	<.001	.912	.059	[.056, .062]

Note. WARNS = Washington Assessment of the Risks and Needs of Students; CFI = comparative fit index; RMSEA = root mean squared error of approximation; CI = confidence interval.

Table 2.

Internal Consistency Reliability Estimates of the Bifactor Model for WARNS Scores.

Scores	African American	Caucasian	Latinx
Total WARNS score	.97/.84	.98/.85	.97/.85
Subscale scores
Aggression-defiance	.91/.06	.91/.33	.89/.31
Depression-anxiety	.90/.55	.94/.52	.93/.47
Substance abuse	.91/.13	.95/.27	.95/.34
Peer deviance	.90/.41	.93/.40	.91/.48
Family environment	.86/.50	.87/.41	.87/.27
School engagement	.89/.67	.89/.46	.90/.54

Note. Total score = ω/ω_H; subscale scores = ω/ω_HS. WARNS = Washington Assessment of the Risks and Needs of Students.

Table 3 contains the FI results for both the traditional and ET criteria. Configural, metric, and scalar invariance were fully supported with the traditional criteria, indicating that the factor structure was the same across groups. For the ET procedure, the configural models had Fair fit for all comparisons. For metric invariance, Excellent fit was observed for all comparisons, whereas for scalar invariance, Excellent fit was observed only for the Caucasian versus Latinx comparison. For the remaining comparisons, Close fit was observed. These results support the usefulness of the dual criteria because, for example, the ET results may point to the small amounts of differential item functioning on the WARNS across these groups (French & Vo, 2019). Given the results of the two procedures, the use of the WARNS total score for decisions for these groups is likely not influenced by the degree of model misfit or FI.

Table 3.

Factor Invariance (FI) Indices by Method for African American (A), Latinx (L), and Caucasian (C) Students.

Group	χ²	df	CFI	RMSEA	RMSEA CI	Δχ²	Δdf	p	ΔCFI	ΔRMSEA	Traditional MI decision	ET RMSEA .01	ET RMSEA .05	ET RMSEA .08	ET RMSEA .10	ET decision
C_r versus A
C	3,788.67*	1,400	.907	.063	[.061, .066]							.017	.058	.097	.123	Fair
M	3,594.52*	1,473	.918	.058	[.057, .061]				.011	.005	Invariant	.016	.058	.098	.123	Close
S	3,941.67*	1,546	.907	.057	[.058, .063]				.011	.001	Invariant	.016	.059	.098	.124	Close
C versus M				.026		71.10	73	.521			Invariant	.030	.061	.090	.110	Excellent
C versus S				.036		186.11	146	.014			Invariant	.025	.058	.088	.109	Close
L_r versus A
C	3,783.82*	1,400	.906	.061	[.058, .063]							.016	.058	.098	.124	Fair
M	3,611.12*	1,473	.915	.056	[.054, .058]				.009	.005	Invariant	.016	.059	.098	.124	Close
S	3,877.17*	1,546	.908	.057	[.055, .059]				.007	.001	Invariant	.016	.059	.099	.125	Close
C versus M				.028		74.50	73	.429			Invariant	.029	.060	.090	.110	Excellent
C versus S				.026		157.91	146	.237			Invariant	.025	.057	.088	.109	Close
C_r versus L
C	4,256.66*	1,400	.914	.061	[.059, .063]							.015	.059	.100	.126	Fair
M	3,812.98*	1,473	.929	.054	[.051, .056]				.015	.007	Invariant	.015	.059	.100	.127	Close
S	4,242.86*	1,546	.918	.056	[.054, .058]				.011	.002	Invariant	.015	.059	.101	.128	Close
C versus M				.001		42.75	73	.999			Invariant	.027	.059	.089	.110	Excellent
C versus S				.011		126.54	146	.876			Invariant	.023	.057	.088	.109	Excellent

Note: The Δχ² between S and M was not estimated given the scaling factor was negative and results with an adjustment (Satorra & Bentler, 2010) would not change conclusions. r subscript = reference group; C = configural; M = metric; S = scalar. MI = measurement invariance; ET = equivalence testing; RMSEA = root mean squared error of approximation; CFI = comparative fit index; CI = confidence interval.

*p < .01.

The results are limited by a few factors. First, the generalizability of results is limited given that the sample is mainly from Washington State, and the African American sample is primarily from Georgia. Replication with a more heterogeneous sample is warranted, as the student context could influence responses. Second, existing general ET guidelines for misfit were employed. Additional work is needed to understand how sensitive the ET criteria are to meaningful departures from factor invariance across groups. Third, the use of dual criteria is a relatively new approach, and its application is not fully integrated into MI studies. Additional methodological work may help to understand how the criteria can identify meaningful model misfit.

The study supports a bifactor model for the WARNS across three high school ethnic groups, which represent the intended population assessed by the instrument. The combined use of traditional FI and ET criteria can benefit such work by allowing a fine-grained examination of FI to understand, in this case, where the content may need to be reviewed for certain groups. For example, where Fair fit is indicated, cognitive think-aloud protocols with those groups can probe whether the content of the assessment is understood as intended. That said, our results provided evidence to support an extrapolation inference for the WARNS scores for these groups. Such evidence can aid practitioners in their work to support youth.

Footnotes

Acknowledgements

We acknowledge the assistance of Cihan Demir in the production of this work.

Author’s Note

B.F.F. was the lead researcher of the project including all aspects from the idea to the final manuscript. D.A. was the lead analyst and contributed to the writing of the manuscript. T.T.V. assisted with analyses and writing of the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Brian F. French

References

Brown, T. A. (2015). Confirmatory factor analysis for applied research (2nd ed.). The Guilford Press.

Case, S., & Haines, K. (2010). Risky business? The risk in risk factor research. Criminal Justice Matters, 80(1), 20–22. https://doi.org/10.1080/09627251.2010.482234

Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling, 14, 464–504. https://doi.org/10.1080/10705510701301834

French, B. F., & Vo, T. T. (2019). Differential item functioning of a truancy assessment. Journal of Psychoeducational Assessment. https://doi.org/10.1177/0734282919863215

George, T., Coker, E., French, B., Strand, P., Gotch, C., McBride, C., & McCurley, C. (2015). Washington Assessment of the Risks and Needs of Students (WARNS) user manual. Center for Court Research.

Hockenberry, S., & Puzzanchera, C. (2018). Juvenile court statistics 2016. National Center for Juvenile Justice. https://www.ojjdp.gov/ojstatbb/njcda/pdf/jcs2016.pdf

Iverson, A., French, B. F., Strand, P. S., Gotch, C. M., & McCurley, C. (2016). Understanding school truancy: Risk–need latent profiles of adolescents. Assessment, 25, 978–987. https://doi.org/10.1177/1073191116672329

Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000

Marcoulides, K. M., & Yuan, K.-H. (2017). New ways to evaluate goodness of fit: A note on using equivalence testing to assess structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 24(1), 148–153. https://doi.org/10.1080/10705511.2016.1225260

McDonald, R. P. (1999). Test theory: A unified treatment. Lawrence Erlbaum.

Millsap, R. E. (2011). Statistical approaches to measurement invariance. Routledge.

Olver, M. E., Stockdale, K. C., & Wormith, J. S. (2009). Risk assessment with young offenders: A meta-analysis of three assessment measures. Criminal Justice and Behavior, 36(4), 329–353. https://doi.org/10.1177/0093854809331457

Reise, S. P., Bonifay, W. E., & Haviland, M. G. (2013). Scoring and modeling psychological measures in the presence of multidimensionality. Journal of Personality Assessment, 95(2), 129–140. https://doi.org/10.1080/00223891.2012.725437

Rodriguez, A., Reise, S. P., & Haviland, M. G. (2016). Applying bifactor statistical indices in the evaluation of psychological measures. Journal of Personality Assessment, 98, 223–237. https://doi.org/10.1080/00223891.2015.1089249

Satorra, A., & Bentler, P. M. (2010). Ensuring positiveness of the scaled difference chi-square test statistic. Psychometrika, 75(2), 243–248. https://doi.org/10.1007/s11336-009-9135-y

Schwalbe, C. S. (2007). Risk assessment for juvenile justice: A meta-analysis. Law and Human Behavior, 31(5), 449–462. https://doi.org/10.1007/s10979-006-9071-7

Strand, P. S., Gotch, C. M., French, B. F., & Beaver, J. L. (2019). Factor structure and invariance of an adolescent risks and needs assessment. Assessment, 26, 1105–1116. https://doi.org/10.1177/1073191117706021

Yuan, K.-H., & Chan, W. (2016). Measurement invariance via multigroup SEM: Issues and solutions with chi-square-difference tests. Psychological Methods, 21(3), 405–426. https://doi.org/10.1037/met0000080