Differential Item Functioning of a Truancy Assessment

Abstract

The Washington Assessment of Risk and Needs of Students (WARNS) is a brief self-report measure designed for schools, courts, and youth service providers to identify student behaviors and contexts related to school truancy. Empirical support for WARNS item invariance between ethnic groups is lacking. This study examined differential item functioning (DIF) to ensure that items on the WARNS function similarly across groups, especially for groups where truancy rates are highest. The item response theory graded response model was used to examine DIF between Caucasian, African American, and Latinx students. DIF was identified in six items across WARNS domains. The DIF amount and magnitude likely will not influence decisions based on total scores. Implications for practice and suggestions for an ecological framework to explain the DIF results are discussed.

Keywords

risk and needs, differential item functioning, high school students, validity

Truancy is defined as the number of unexcused absences from school within a designated time period (Sutphen, Ford, & Flaherty, 2010). Truancy is linked to negative social, financial, and psychological outcomes (e.g., Rocque, Jennings, Piquero, Ozkan, & Farrington, 2017). Particularly concerning is the association between truancy and the criminal justice system, with at least 64% of state prison inmates lacking a high school diploma (Curley, 2016) and approximately 92,000 youth arrested on school property between 2011 and 2012 (U.S. Department of Education, Office for Civil Rights, 2014). Reducing truancy rates, interrupting negative trajectories, and implementing interventions and services to support student outcomes are formidable challenges for researchers, educators, and policymakers (Dembo & Gulledge, 2009).

The rates at which truancy affects student outcomes vary by social and cultural group, with low socioeconomic status (SES) and minority-status students overrepresented among those receiving disciplinary consequences that keep students out of classrooms (Fenning & Rose, 2007; Gardner & Martin, 2018). Of particular concern is the disparity for African American students in rates of exclusionary and punitive consequences (Skiba et al., 2011) and office referrals (Skiba, Michael, Nardo, & Peterson, 2002), both of which are positively related to truancy rates. Washington, the context of this study, has the second highest rate of chronic truancy in the nation (Rowe, 2017). Thus, state policymakers and practitioners have focused on early-warning indicators such as truancy and associated behaviors to support student outcomes. Such indicators may be informed by evidence-based resources including risk assessments (Mendoza, Rose, Geiger, & Cash, 2016).

The Washington Assessment of Risk and Needs of Students (WARNS) is an instrument used by state service providers, courts, and schools for information about maladaptive behaviors and needs of youth aged 13 to 18 years (George et al., 2015). Scores are used to target appropriate interventions for students who indicate risk for behaviors linked to truancy and delinquency. Given score use and race/ethnic differences in truancy and related behaviors, providing score validation evidence is essential for measures such as WARNS (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014; Kane, 2013). Some validity evidence exists for the WARNS, including internal factor structure (e.g., Strand, Gotch, French, & Beaver, 2017) and latent profiles of risk and needs (e.g., Iverson, French, Strand, Gotch, & McCurley, 2016), yet evidence of item invariance across the groups for which the instrument is widely used is lacking. Differential item functioning (DIF) occurs when the probability of endorsing a certain item response is unequal for different groups who possess the same ability level (Zumbo, 1999). DIF can introduce systematic bias and reduce test score validity (Kane, 2013). To safeguard against biased decision-making from WARNS scores, DIF was evaluated within a scoring inference using Kane’s (2013) validity argument framework, as “scoring procedures are appropriate, are applied as intended, and are free of overt bias” (p. 25) are assumptions for this inference. This validity lens is used to evaluate DIF evidence, which can support a fairness aspect of a scoring inference in building a validity argument. However, we remind the reader that the scoring inference examined across three groups alone cannot support all fairness claims about WARNS scores.

Method

Participants

Washington and Georgia state students (N = 1,468, 52% male), aged 13 to 18 years, completed the WARNS as standard practice in their schools during the 2017-2018 academic year. Data were collected via the online WARNS System with a counselor or school administrator present; administration averaged 13 minutes. The ethnic background of students was Latinx (40.7%), followed by Caucasian (36.4%) and African American (22.9%). Of the students, 83.6% were from Washington and 16.4% were from Georgia. Washington schools used WARNS routinely, and the Georgia district was in its first year of implementing the assessment. In both cases, WARNS results are used for conversations with the student and parents and as an information point in offering services and interventions.

Instrument

The WARNS (George et al., 2015) is a self-report, 40-item measure consisting of six subdomains. Each subdomain measures a student’s reflection on a thought, behavior, or emotion in the past 2 months using a rating scale of 0 = Never or Almost Never, 1 = Sometimes, 2 = Often, and 3 = Always or Almost Always. See Table 1 for subdomain descriptions and internal consistency reliability estimates for this sample, which ranged from .78 to .87. The manual (George et al., 2015) provides internal structure evidence for a bifactor model with a general risk and needs factor and six specific factors, as well as high correlations (>.70) with similar measures such as the Problem Oriented Screening Instrument for Teenagers (POSIT; Rahdert, 1991).

Table 1.

WARNS Domain Descriptions and Internal Consistency Reliability Estimates (α) From Study Data.

Subdomain	Description	α
Aggression-Defiance (n_items = 8)	Youth’s externalizing behavior, destruction behavior, and defiance.	.87
Depression-Anxiety (n_items = 8)	Youth’s internalizing experience with distress and feelings of depression and/or anxiety.	.82
Family Environment (n_items = 5)	Quality of the parent–child relationship and environment including parental support, conflict, and feelings of closeness.	.79
Peer Deviance (n_items = 5)	Youth’s peer deviant activities including truancy, delinquency, trouble in school.	.83
School Engagement (n_items = 9)	Youth’s feelings of connection to the school environment and the task of learning.	.84
Substance Abuse (n_items = 5)	Frequency and severity that the youth uses alcohol, marijuana, and hard drugs, how use interferes with school.	.78

Note. WARNS = Washington Assessment of Risk and Needs of Students.
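For readers less familiar with the α values in Table 1, the following is a minimal sketch of how coefficient alpha is commonly computed from item-level responses. The function name `cronbach_alpha` and the small response matrix are hypothetical and serve only to illustrate the formula; they are not study data, and the software used to obtain the reported estimates is not stated in the text.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Coefficient alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items
    # Variance of each item across respondents.
    item_vars = [pvariance([person[i] for person in responses]) for i in range(k)]
    # Variance of the summed score across respondents.
    total_var = pvariance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 0-3 ratings for five respondents on a five-item subdomain (not study data).
toy = [
    [0, 1, 0, 1, 0],
    [1, 1, 2, 1, 1],
    [2, 2, 2, 3, 2],
    [3, 2, 3, 3, 3],
    [1, 0, 1, 1, 0],
]
print(round(cronbach_alpha(toy), 2))
```

Population variances are used for both the items and the total score; because the same denominator appears in both, sample variances would give the same ratio.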

Analysis

An item response theory (IRT) graded response model (Samejima, 1969) was employed for each subdomain. Step 1 involved an iterative process to identify a purified anchor set of items for accurate matching of ability level (e.g., French & Maller, 2007). In Step 2, DIF was identified using the χ² test of significance for total DIF, a-parameter (i.e., slope) DIF, and b-parameter (i.e., k − 1 thresholds for k response categories) DIF at the significance level of p < .01. In Step 3, items displaying DIF were categorized based on effect sizes and statistical significance, where areas between the item characteristic curves (ICCs; Raju, 1988) of .40 = small, .60 = moderate, and .80 = large represent the magnitude of DIF (Steinberg & Thissen, 2006).
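To make the model concrete, the sketch below computes graded response model category probabilities for one item from a slope (a) and three ordered thresholds (b₁ to b₃). It is an illustration under stated assumptions rather than the estimation procedure used in the study: the function names are hypothetical, the logistic form without a scaling constant is assumed, the trait level is arbitrary, and the parameter values are the Table 2 estimates for the Fights item, treated here as lying on a common scale.

```python
import math

def boundary_prob(theta, a, b):
    """P(response >= category k) for one threshold b under a logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def category_probs(theta, a, thresholds):
    """Graded response model probabilities for categories 0..K given ordered thresholds."""
    # Cumulative probabilities P(X >= k), with P(X >= 0) = 1 and P(X >= K + 1) = 0.
    cum = [1.0] + [boundary_prob(theta, a, b) for b in thresholds] + [0.0]
    # Each category probability is the difference between adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Estimates for the Fights item from Table 2 (African American vs. Caucasian columns).
fights_aa = {"a": 1.15, "b": [0.43, 2.59, 4.41]}
fights_c = {"a": 1.35, "b": [1.84, 3.52, 4.48]}

theta = 1.0  # hypothetical trait level, chosen only for illustration
p_aa = category_probs(theta, fights_aa["a"], fights_aa["b"])
p_c = category_probs(theta, fights_c["a"], fights_c["b"])

# Probability of responding "Sometimes" or higher (category >= 1) at the same theta.
print(round(1 - p_aa[0], 3), round(1 - p_c[0], 3))
```

Comparing the two printed values at a fixed trait level shows what DIF means in this model: respondents matched on the latent trait have different probabilities of endorsing the same response category.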

Results and Discussion

Across all WARNS items, six (15%) items showed a large DIF effect size. Tables 2 to 4 contain the parameter estimates for the DIF items by group and by subdomain. For the Aggression-Defiance subdomain (Table 2), three DIF items were identified. When examining b-parameters, large effect sizes across groups ranged from 0.93 to 1.41, where African American students were more likely to endorse the Fights item compared to their counterparts (see Figure 1 for ICCs). In the Family Environment subdomain, the item Argue showed differences between the African American and Caucasian groups, with the latter showing greater item discrimination than the former and a medium effect size (0.72; see Table 3; Figure 1 for ICCs). Differences across groups on the b-parameters ranged from moderate (0.48) to large (2.61) effects. A single large b-DIF item was identified in the Peer Deviance subdomain (i.e., FrFights; Table 3), where African American students were more likely to endorse the item compared to Caucasian students, with moderate (0.60) to large (0.85) effects. In the School Engagement subdomain, one DIF item was identified (i.e., HWComplete; Table 4), where b-parameter comparisons revealed effect sizes ranging from small (0.38) to large (1.03). The Substance Abuse and Depression-Anxiety subdomains indicated no DIF items.

Table 2.

DIF Parameter Estimates for the Aggression-Defiance Subdomain by Group.

Item	African American				Caucasian
	a	b₁	b₂	b₃	a	b₁	b₂	b₃
Fights^a	1.15 (0.16)	0.43 (0.12)	2.59 (0.27)	4.41 (0.57)	1.35 (0.20)	1.84 (0.20)	3.52 (0.47)	4.48 (0.71)
ES	NA	−1.41	−0.93	−0.07
Lied^b	1.68 (0.21)	0.16 (0.10)	2.33 (0.19)	3.81 (0.42)	1.77 (0.18)	−0.29 (0.07)	1.44 (0.11)	2.48 (0.21)
ES	NA	0.45	0.89	1.33
Item	African American				Latinx
	a	b₁	b₂	b₃	a	b₁	b₂	b₃
Fights^a	1.13 (0.15)	0.58 (0.12)	2.77 (0.28)	4.62 (0.58)	1.24 (0.19)	1.95 (0.22)	3.70 (0.48)	4.87 (0.76)
ES	NA	−1.37	−0.93	−0.25
Item	Caucasian				Latinx
	a	b₁	b₂	b₃	a	b₁	b₂	b₃
Threatened^c	2.18 (0.27)	1.42 (0.10)	2.43 (0.19)	3.31 (0.34)	2.01 (0.30)	1.87 (0.16)	2.71 (0.26)	4.13 (0.64)
ES	NA	−0.45	−0.28	−0.82

Note. Standard errors are in parentheses. ES = effect size, ES >0.80. DIF = differential item functioning.

^a I got into physical fights.

^b I lied, disobeyed, or talked back to adults.

^c I threatened to hurt someone.
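The b-parameter ES values in Table 2 appear to equal the difference between the two groups' threshold estimates (first-listed group minus second-listed group; e.g., 0.43 − 1.84 = −1.41 for the first Fights threshold). The sketch below reproduces that arithmetic and labels each difference against the .40, .60, and .80 benchmarks from the Analysis section. The function names and the treatment of values that fall between benchmarks are assumptions made for illustration, not the authors' scoring rules.

```python
def threshold_differences(b_first, b_second):
    """Signed difference in each threshold (first-listed group minus second-listed group)."""
    return [round(f - s, 2) for f, s in zip(b_first, b_second)]

def label_effect(es, small=0.40, moderate=0.60, large=0.80):
    """Label |ES| against the benchmarks; handling of in-between values is an assumption."""
    size = abs(es)
    if size >= large:
        return "large"
    if size >= moderate:
        return "moderate"
    if size >= small:
        return "small"
    return "below small"

# Threshold estimates (b1, b2, b3) copied from Table 2.
comparisons = {
    "Fights (African American vs. Caucasian)": ([0.43, 2.59, 4.41], [1.84, 3.52, 4.48]),
    "Lied (African American vs. Caucasian)": ([0.16, 2.33, 3.81], [-0.29, 1.44, 2.48]),
    "Threatened (Caucasian vs. Latinx)": ([1.42, 2.43, 3.31], [1.87, 2.71, 4.13]),
}

for name, (b_first, b_second) in comparisons.items():
    diffs = threshold_differences(b_first, b_second)
    print(name, [(d, label_effect(d)) for d in diffs])
```

The sign indicates direction: a negative difference means the first-listed group reaches that response category at a lower trait level than the second-listed group.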

Figure 1.

Item characteristic curves for African American (Panel A), Caucasian (Panel B), and Latinx (Panel C) groups for item “I got into physical fights” of the Aggression-Defiance subdomain (top row) and item “I got into arguments with my parents” of the Family Environment subdomain (bottom row).

Table 3.

DIF Parameter Estimates for the Family Environment and Peer Deviance Subdomains by Group.

Item	African American				Caucasian
	a	b₁	b₂	b₃	a	b₁	b₂	b₃
Family environment
Argue^a	0.63 (0.12)	−0.30 (0.19)	2.76 (0.54)	4.94 (0.95)	1.40 (0.14)	−1.02 (0.11)	0.97 (0.11)	2.33 (0.2)
ES	0.71	0.72	1.79	2.61
Peer deviance
FrFights^b	2.42 (0.28)	0.08 (0.07)	1.22 (0.09)	2.5 (0.21)	1.99 (0.21)	0.68 (0.08)	2.03 (0.16)	3.36 (0.34)
ES	NA	−0.60	−0.81	−0.85

Note. Standard errors are in parentheses. ES = effect size; ES >0.80. DIF = differential item functioning.

^a I got into arguments with my parents.

^b My friends got into physical fights.

Table 4.

DIF Parameter Estimates for the School Engagement Subdomain by Group.

Item	African American				Caucasian
	a	b₁	b₂	b₃	a	b₁	b₂	b₃
HWComplete^a	1.44 (0.17)	−1.42 (0.14)	−0.41 (0.10)	1.33 (0.18)	1.15 (0.12)	−2.45 (0.24)	−0.93 (0.12)	1.13 (0.14)
ES	NA	1.03	0.52	0.20
Item	African American				Latinx
	a	b₁	b₂	b₃	a	b₁	b₂	b₃
HWComplete^a	1.44 (0.16)	−1.17 (0.16)	−0.16 (0.11)	1.58 (0.16)	1.69 (0.15)	−2.16 (0.17)	−0.85 (0.09)	1.20 (0.10)
ES	NA	0.99	0.69	0.38
Item	Caucasian				Latinx
	a	b₁	b₂	b₃	a	b₁	b₂	b₃
HWComplete^a	1.16 (0.12)	−2.17 (0.25)	−0.66 (0.12)	1.38 (0.13)	1.69 (0.15)	−2.16 (0.17)	−0.85 (0.09)	1.20 (0.10)
ES	NA	−0.01	0.19	0.18

Note. Standard errors are in parentheses. ES = effect size; ES >0.80. DIF = differential item functioning; HW = homework.

^a I got my homework completed and turned in on time.

Results indicated that 6% of comparisons showed large DIF effects between groups. These DIF rates fall within the 5% to 10% range commonly found with race and gender (Miller & Oshima, 1992). The main implication for practice for these groups is that the minor level of DIF spread across subdomains is unlikely to change individual decisions made with WARNS scores, although additional empirical evidence is required to confirm this. Nevertheless, these results generally support a scoring inference (Kane, 2013) where overt bias is minimized across the 120 group comparisons.

That said, the differences found with respect to the African American sample push DIF analyses in a different direction. This study’s convenience sample of African American students came from a charter school in Georgia that serves students who do not fit into the regular school system (e.g., credit deficient, high truancy problems). Thus, differences between the Georgia and Washington school settings may contribute to why 87% of DIF items impacted the African American group. The context in which WARNS subdomains are measured may have been overlooked in this study (Anastasi, 1993). Future research should explore how the sociocultural environment of an individual can help uncover how behaviors manifest differently depending on the student’s context (Bronfenbrenner, 1997; John-Steiner & Mahn, 1996). Pursuing an ecological DIF framework may help explain how context contributes to DIF across groups (Zumbo et al., 2015). It may also be informative to adopt a multilevel modeling DIF framework (Shear, 2018). The use of school or contextual level variables (e.g., school truancy and discipline rates) may assist with exploring whether differences are related to a simple dichotomous category classification (e.g., race/ethnicity) or to more complex sociocultural or contextual variables.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Brian F. French

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). The standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Anastasi (1993). A century of psychological testing: Origins, problems, and progress. In Exploring applied psychology: Origins and critical analyses (pp. 9-36). doi:10.1037/11104-001

Bronfenbrenner (1997). The ecology of cognitive development: Research models and fugitive findings. In Wozniak & Fischer (Eds.), College student development and academic life: Psychological, intellectual, social and moral issues (pp. 3-44). Hillsdale, NJ: Lawrence Erlbaum.

Curley (2016, November 18). How education deficiency drives mass incarceration. Retrieved from http://www.genfkd.org/education-deficiency-drives-mass-incarceration

Dembo & Gulledge, L. M. (2009). Truancy intervention programs: Challenges and innovations to implementation. Criminal Justice Policy Review, 20, 437-456. doi:10.1177/0887403408327923

Fenning & Rose (2007). Overrepresentation of African American students in exclusionary discipline: The role of school policy. Urban Education, 42, 536-559. doi:10.1177/0042085907305039

French, B. F., & Maller, S. J. (2007). Iterative purification and effect size use with logistic regression for differential item functioning detection. Educational and Psychological Measurement, 67, 373-393. doi:10.1177/0013164406294781

Gardner & Martin (2018). Zero dropouts, zero arrests: Achieving systemic reform through collaboration. Retrieved from https://ccyj.org/wp-content/uploads/2019/05/Zero-Dropout-Zero-Arrests-2.pdf

George, Coker, French, Strand, Gotch, McBride, & McCurley (2015). Washington Assessment of the Risks and Needs of Students (WARNS) user manual. Olympia, WA: Center for Court Research.

Iverson, French, B. F., Strand, P. S., Gotch, C. M., & McCurley (2016). Understanding school truancy: Risk-need latent profiles of adolescents. Assessment, 25, 978-987. doi:10.1177/1073191116672329

John-Steiner & Mahn (1996). Sociocultural approaches to learning and development: A Vygotskian framework. Educational Psychologist, 31(3-4), 191-206. doi:10.1207/s15326985ep3103&4_4

Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1-73. doi:10.1111/jedm.12000

Mendoza, N. S., Rose, R. A., Geiger, J. M., & Cash, S. J. (2016). Risk assessment with actuarial and clinical methods: Measurement and evidence-based practice. Child Abuse & Neglect, 61, 1-12. doi:10.1016/j.chiabu.2016.09.004

Miller, M. D., & Oshima, T. C. (1992). Effect of sample size, number of biased items, and magnitude of bias on a two-stage item bias estimation method. Applied Psychological Measurement, 16, 381-388. doi:10.1177/014662169201600410

Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53(4), 495-502.

Rahdert, E. R. (1991). The adolescent assessment/referral system. Rockville, MD: National Institute on Drug Abuse.

Rocque, Jennings, W. G., Piquero, A. R., Ozkan, & Farrington, D. P. (2017). The importance of school attendance: Findings from the Cambridge study in delinquent development on the life-course effects of truancy. Crime & Delinquency, 63, 592-612. doi:10.1177/0011128716660520

Rowe (2017, September 3). Schools in Washington rank second-worst in nation for chronic absenteeism. Retrieved from https://www.seattletimes.com/education-lab/high-numbers-of-washington-students-miss-school-which-hinders-learning-for-all/

Samejima (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika, 34(S1), 1-97. doi:10.1007/bf03372160

Shear, B. R. (2018). Using hierarchical logistic regression to study DIF and DIF variance in multilevel data. Journal of Educational Measurement, 55(4), 513-542. doi:10.1111/jedm.12190

Skiba, R. J., Horner, R. H., Chung, C. G., Rausch, M. K., May, S. L., & Tobin (2011). Race is not neutral: A national investigation of African American and Latino disproportionality in school discipline. School Psychology Review, 40, 85-108.

Skiba, R. J., Michael, R. S., Nardo, A. C., & Peterson, R. L. (2002). The color of discipline: Sources of racial and gender disproportionality in school punishment. The Urban Review, 34, 317-342. doi:10.1023/a:1021320817372

Steinberg & Thissen (2006). Using effect sizes for research reporting: Examples using item response theory to analyze differential item functioning. Psychological Methods, 11, 402-415. doi:10.1037/1082-989x.11.4.402

Strand, P. S., Gotch, C. M., French, B. F., & Beaver, J. L. (2017). Factor structure and invariance of an adolescent risks and needs assessment. Assessment. Advance online publication. doi:10.1177/1073191117706021

Sutphen, R. D., Ford, J. P., & Flaherty (2010). Truancy interventions: A review of the research literature. Research on Social Work Practice, 20, 161-171. doi:10.1177/1049731509347861

U.S. Department of Education, Office for Civil Rights. (2014). Civil rights data collection: Data snapshot: School discipline. Retrieved from http://ocrdata.ed.gov/Downloads/CRDC-School-Discipline-Snapshot.pdf

Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF). Ottawa, Ontario, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense.

Zumbo, B. D., Liu, Wu, A. D., Shear, B. R., Olvera Astivia, O. L., & Ark, T. K. (2015). A methodology for Zumbo’s third generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12, 136-151. doi:10.1080/15434303.2014.972559