Abstract
Factor analytic procedures were used to gather validity evidence on the internal structure of Skinner, Kindermann, and Furrer’s Engagement versus Disaffection With Learning measure, which operationalizes students’ classroom engagement. Independent samples of middle school students (Sample 1: N = 618; Sample 2: N = 493) were used to test the scale’s factor structure and the invariance of measurement model parameters. Factor analytic results supported a three-factor model as an alternative to the scale’s previously reported four-factor solution. Cross-validation was based on multisample confirmatory factor analysis, which supported strict factorial invariance of the scale’s measurement model parameters (e.g., factor loadings) across samples. Subscale reliability estimates, based on coefficient omega, exceeded 0.74. Implications of findings for research and practice are discussed.
Theoretical Framework
Student engagement is considered a key determinant of students’ academic and behavioral outcomes (Appleton, Christenson, & Furlong, 2008; Fredricks, Blumenfeld, & Paris, 2004). Broadly, it is the “quality of a student’s connection or involvement with the endeavor of schooling and hence with the people, activities, goals, values, and place that compose it” (Skinner, Kindermann, & Furrer, 2009, p. 494). It is posited to influence classroom participation and to be affected by the learning environment, and it has been found to decline as schooling progresses (Hughes & Kwok, 2007; Skinner, Kindermann, Connell, & Wellborn, 2009). At the middle school level, research has focused on student engagement as a key factor in student learning (Abbott, 2017; Orthner, Jones-Sanpei, Akos, & Rose, 2013; Wang & Holcombe, 2010). This includes the development of classroom-based surveys to operationalize student engagement (Fredricks & McColskey, 2012; Fredricks et al., 2011). The aim of this study was to expand the validity evidence for scores on the classroom-administered Engagement versus Disaffection With Learning (EvD; Skinner, Kindermann, & Furrer, 2009) survey, which assesses students’ classroom engagement (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014).
The conceptualization and measurement of student engagement vary considerably in the literature (Appleton et al., 2008). Although conceptualized as a multidimensional construct, it has been operationalized in terms of a two-factor model with behavioral and affective components (Willms, 2003) and, more broadly, as a four-factor model with additional cognitive and psychological components (Reschly & Christenson, 2006). Within academic settings, engagement behaviors include persistence and classroom participation, whereas emotional engagement includes interest and enjoyment. Within a motivational framework, Skinner, Kindermann, & Furrer (2009) advanced a four-factor model of engagement with distinct engagement and disaffection, as well as behavioral and emotional, dimensions. Uniquely, their engagement model also considered its opposite, disaffection, derived from behaviors, mental states, and emotions that stem from disengagement. Disaffected behaviors include passivity and a lack of attention, whereas disaffected emotions include frustration and tiredness. The model posits that engagement and disaffection are observable behaviors indicative of classroom behavior and related to various student and environmental factors.
The EvD (Skinner, Kindermann, & Furrer, 2009) was developed as a multidimensional measure of students’ classroom engagement with four subscales: Behavioral Engagement (BE), Behavioral Disaffection (BD), Emotional Engagement (EE), and Emotional Disaffection (ED). It was developed and validated based on data obtained from a large sample of third to sixth graders. Reported internal consistency estimates of subscale scores ranged from 0.61 to 0.85 and test–retest estimates ranged from 0.53 to 0.68. Factor analysis supported the unidimensionality of subscales, and alternative two-factor models supported distinct behavioral, emotional, engagement, and disaffection factors. A four-factor solution showed model-data fit similar to that of the two-factor models. Student and teacher reports were moderately correlated, establishing preliminary evidence for interrater reliability. Convergent validity was based on the correlation of scores with theoretically related variables (e.g., perceived control). Teacher reports were also correlated with these measures but to a weaker extent. Classroom observations of student engagement were found to correlate with teacher reports but not with student reports. Skinner, Kindermann, and Furrer (2009) reported that subscales can be used separately or aggregated to report students’ engagement versus disaffection. Although the psychometric properties of the EvD are comparable to those of other engagement measures (Fredricks et al., 2011), further research is needed on the measurement of student engagement to support the use of engagement scale scores for decision-making purposes (Appleton et al., 2008).
In response, the purpose of this study was to gather additional validity evidence on the internal structure of the EvD among middle school students. Although Skinner, Kindermann, & Furrer (2009) also developed a teacher version of the instrument, this study focused on the student version. Situated in the Standards (e.g., Standard 1.13; AERA et al., 2014), this study contributes to ongoing validity studies to substantiate score interpretation and use to guide decisions pertaining to middle school students’ classroom engagement.
Method
Participants
Data were based on independent samples (Sample 1: N = 618; Sample 2: N = 493) of Grade 5 to 8 middle school students from 17 middle schools, with 48 teachers participating in regional professional development on the implementation of project-based mathematics learning. Sample 1 grade levels included 22.3% Grade 6, 49.1% Grade 7, and 28.6% Grade 8. Sample 2 grade levels included 4.2% Grade 5, 39% Grade 6, 16.2% Grade 7, and 21.6% Grade 8 (19.1% missing). Due to confidentiality, individual student demographics were not gathered; however, the average school enrollment was 563.59 (SD = 215.12), with 87% of students White and 50.37% (SD = 13.57) qualifying for free/reduced lunch. On average, the percentages of students scoring at or above proficiency in reading and mathematics were 57.53% (SD = 11.14) and 45.66% (SD = 16.55), respectively. The surveys were administered by the researchers in individual classrooms under passive consent and took approximately 30 minutes per classroom. The average number of students within classrooms was 23 (SD = 8.68).
Instrumentation
The 24-item EvD measures students’ behavioral and emotional classroom engagement and disaffection, with responses recorded on a 4-point scale (1 = not at all true, 2 = rarely true, 3 = somewhat true, and 4 = very true). The four subscales are as follows: BE (five items), BD (five items), EE (five items), and ED (nine items). Notably, the ED subscale includes an additional three items to “tap the more differentiated disaffected emotions” (Skinner, Kindermann, & Furrer, 2009, p. 520), which were included in the present study. Scale validation was based on data obtained from 1,018 students in Grades 3 to 6 from a 4-year longitudinal study. Correlations among components indicated that engagement scores were strongly correlated, whereas disaffection scores were moderately to strongly correlated. With the exception of the ED subscale, subscales were found to be unidimensional. Competing confirmatory factor analysis (CFA) models supported the fit of a four-factor solution for the entire instrument with distinct BE, EE, BD, and ED factors. Notably, an alternative two-factor solution was also found to be acceptable. Subscale score internal consistency estimates ranged from 0.61 (BE) to 0.85 (ED), whereas combined scale scores (e.g., BE, EE) exceeded 0.73. Test–retest reliabilities over the academic year ranged from 0.53 to 0.73. Additional information is provided in the Skinner, Kindermann, and Furrer (2009) study.
Data Analysis
Item analyses were used to examine the functioning of scale items, with designated items reverse scored. Alternative CFA models were used to test the scale’s factor structure based on Sample 1 data, with multisample CFA (Cheung & Rensvold, 2002) used to cross-validate its factor structure with Sample 2 data. Sample 1 data were randomly divided in half (Sample 1A, Sample 1B): CFA models were fit to Sample 1A data and, if model-data fit was poor, an exploratory factor analysis (EFA) based on Sample 1B data was used to identify the empirical factors. Factor retention was based upon eigenvalues >1.0, parallel analysis (PA), the scree plot, and factor interpretability (Henson & Roberts, 2006), with loadings above 0.40 considered salient.
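As an illustration of the parallel analysis criterion named above, the following is a minimal sketch in Python. It assumes Pearson correlations, eigenvalues of the unreduced correlation matrix, and a 95th-percentile threshold from normally distributed random data; the study does not report these settings, so the function name `parallel_analysis` and all of its parameters are illustrative assumptions rather than the procedure actually used. The demonstration data are simulated.

```python
import numpy as np

def parallel_analysis(data, n_iter=200, percentile=95, seed=0):
    """Horn's parallel analysis (sketch; settings are assumptions).

    Compares eigenvalues of the observed correlation matrix with the
    chosen percentile of eigenvalues from random normal data of the
    same shape, and counts the leading factors that exceed it.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Observed eigenvalues, sorted from largest to smallest.
    real = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        noise = rng.standard_normal((n, p))
        eigs = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        random_eigs[i] = np.sort(eigs)[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)
    # Retain leading factors until an observed eigenvalue falls below
    # its random-data counterpart.
    n_retain = 0
    for observed, random_cut in zip(real, threshold):
        if observed <= random_cut:
            break
        n_retain += 1
    return real, threshold, n_retain

# Simulated demonstration (hypothetical data, not the study's):
# 9 items driven by 3 uncorrelated factors with loadings of 0.7.
rng = np.random.default_rng(1)
factors = rng.standard_normal((300, 3))
loadings = np.kron(np.eye(3), np.full((1, 3), 0.7))  # shape (3, 9)
items = factors @ loadings + rng.standard_normal((300, 9)) * np.sqrt(0.51)
real, threshold, n_retain = parallel_analysis(items)
print(n_retain)
```

For ordinal 4-point items such as those on the EvD, polychoric correlations could be substituted for the Pearson correlations used in this sketch.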
CFA was based upon robust weighted least squares (WLSMV; Muthén et al., 1997) estimation in Mplus 8.0 (Muthén & Muthén, 1998-2017), which has been reported to yield accurate parameter estimates for ordinal variables (Flora & Curran, 2004). Evaluation of model-data fit included the chi-square statistic, root mean square error of approximation (RMSEA), comparative fit index (CFI), and Tucker–Lewis Index (TLI). Due to the absence of cutoff criteria for fit statistics with the WLSMV estimator, Hu and Bentler’s (1999) empirically based cutoffs (derived under maximum likelihood) were used to evaluate model-data fit: RMSEA values less than 0.05 were used to indicate good model fit and those less than 0.08 suggested reasonable fit. CFI and TLI values above 0.95 were deemed acceptable. The COMPLEX command was used to compute standard errors and model chi-square statistics that account for nonindependence of observations (students nested in classrooms). Measurement model parameters of interest included factor loadings, thresholds, and residuals to determine the level of invariance (Meredith, 1993).
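To make the fit criteria concrete, the sketch below computes a single-group RMSEA point estimate from a model chi-square and applies the cutoffs stated above. It assumes the textbook formula with N - 1 in the denominator; software packages differ on N versus N - 1 and on multigroup corrections, so the result may not match Mplus output exactly. The function names and the label strings are illustrative assumptions.

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """Single-group RMSEA point estimate (textbook formula; an assumption,
    not necessarily the exact computation used by Mplus with WLSMV)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def judge_fit(rmsea_value: float, cfi: float, tli: float) -> dict:
    """Apply the cutoffs described in the text to each fit index."""
    if rmsea_value < 0.05:
        rmsea_label = "good"
    elif rmsea_value < 0.08:
        rmsea_label = "reasonable"
    else:
        rmsea_label = "above cutoff"
    return {
        "RMSEA": rmsea_label,
        "CFI": "acceptable" if cfi > 0.95 else "below cutoff",
        "TLI": "acceptable" if tli > 0.95 else "below cutoff",
    }

# Chi-square, df, and N as reported for the Sample 2 CFA in this article.
print(round(rmsea(736.09, 206, 493), 3))  # 0.072

# Fit indices as reported for the Engagement/Disaffection two-factor model.
print(judge_fit(0.083, 0.854, 0.841))
```

With the Sample 2 values, the formula reproduces the reported RMSEA of 0.072 to three decimals.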
Because the chi-square difference statistic (χ²Difference) is known to reject the null hypothesis of equivalent model parameters on the basis of trivial differences when samples are large, incremental changes in the CFI and RMSEA values (i.e., changes <0.01; Cheung & Rensvold, 2002) were also used in the tests of measurement invariance. Furthermore, the WLSMV difference test (DIFFTEST) and the theta parameterization option in Mplus were used to test error variance equality (Muthén & Muthén, 1998-2017). As a factor analytic estimate of reliability, coefficient omega (ω; Revelle & Zinbarg, 2009) was used to estimate the internal consistency of scores.
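Coefficient omega for a single subscale can be sketched as follows, assuming a unidimensional factor with unit variance and uncorrelated residuals, so that ω = (Σλ)² / ((Σλ)² + Σθ), where λ are loadings and θ are residual variances. The loadings in the example are hypothetical, and the function name `coefficient_omega` is an assumption; the article does not state which variant of omega was computed or how it was obtained from the WLSMV solution.

```python
def coefficient_omega(loadings, residual_variances=None):
    """Coefficient omega for one factor (sketch under stated assumptions).

    If residual variances are omitted, loadings are treated as fully
    standardized, so each residual variance is 1 - loading**2.
    """
    if residual_variances is None:
        residual_variances = [1.0 - lam ** 2 for lam in loadings]
    true_variance = sum(loadings) ** 2
    return true_variance / (true_variance + sum(residual_variances))

# Hypothetical standardized loadings for a seven-item subscale.
example_loadings = [0.55, 0.60, 0.62, 0.48, 0.70, 0.66, 0.58]
print(round(coefficient_omega(example_loadings), 2))
```

Unlike coefficient alpha, this estimate does not require equal loadings across items, which is one reason it is often preferred alongside factor analytic models.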
Results and Discussion
Descriptive Statistics
Table 1 reports item means ranging from 2.33 (Item 27) to 3.70 (Item 1) and standard deviations ranging from 0.51 (Item 1) to 1.14 (Item 27). Across items, the range was 3, indicating use of the entire response continuum. The average item-total correlation for Sample 1 was 0.58 (SD = 0.10; range = 0.29 [Item 3] to 0.74 [Item 12]) and for Sample 2 was 0.60 (SD = 0.07; range = 0.35 [Item 3] to 0.72 [Item 7]), indicating moderate relationships between items and the total score.
Table 1. Item Descriptive Statistics of the EvD Instrument Across Samples 1 and 2.
Note. Values in parenthesis. Minimum and maximum values were 1 and 4 across samples. EvD = Engagement Versus Disaffection With Learning.
Based on Sample 1A data, model-data fit was not acceptable across fit statistics for either alternative two-factor CFA model: one with distinct Engagement and Disaffection factors, χ²(323) = 989.51, RMSEA = 0.083 (90% confidence interval [CI] = [0.077, 0.089]), CFI = 0.854, TLI = 0.841, and one with distinct Behavioral and Emotional factors, χ²(323) = 971.53, RMSEA = 0.082 (90% CI = [0.076, 0.088]), CFI = 0.857, TLI = 0.845. These findings contrast with Skinner, Kindermann, and Furrer’s (2009) findings of acceptable model-data fit for two-factor models with distinct Engagement and Disaffection factors and with distinct Behavioral and Emotional factors. Subsequently, four- and six-factor models with separate BE, BD, and alternative emotional factors were fit to the data, resulting in improper solutions (i.e., a latent variable covariance [correlation] matrix that was not positive definite). Inspection of parameter estimates indicated that the correlation between the BE and BD factors was close to unity, indicating indistinguishable factors. Instead of imposing parameter constraints, an EFA was conducted to further examine the scale’s factor structure.
Based on Sample 1B data, a principal axis EFA with promax rotation (kappa = 4) supported a three-factor solution, with real-data eigenvalues of 9.03, 3.17, and 1.66 (PA = 1.66, 1.55, and 1.48). The factors resembled those of the original scale, with the emotional engagement and disaffection items generally falling on distinct factors and behavioral items loading onto Factor 3. Table 2 reports factor loadings, structure coefficients, and communalities. Factor correlations were moderate (range = 0.36 [BE and ED] to 0.68 [EE and ED]). Factor 1 included eight items and was labeled Emotional Engagement (EE), Factor 2 included seven items and was labeled Emotional Disaffection (ED), and Factor 3 included seven items and was labeled Behavioral Engagement (BE). Five items were eliminated due to a lack of a salient loading (<0.40). The lack of distinct behavioral engagement and disaffection factors suggested that students did not differentiate between these engagement types. Therefore, EFA findings generally support Skinner, Kindermann, & Furrer’s (2009) proposition of distinct emotional engagement and disaffection dimensions but differ in terms of a single unified behavioral dimension.
Table 2. Exploratory Factor Analysis Pattern Coefficients, Structure Coefficients, and Communality Estimates.
Note. Bold values indicate salient factor loadings greater than |0.40|. P = Pattern coefficients; S = Structure coefficients.
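Because an oblique (promax) rotation was used, pattern and structure coefficients differ. The relationship can be sketched as S = PΦ, where P is the pattern matrix and Φ is the factor correlation matrix; communalities are then the diagonal of PΦPᵀ. The matrices below are hypothetical and chosen only to show the arithmetic; they are not values from Table 2, and the function names are assumptions.

```python
import numpy as np

def structure_coefficients(pattern, factor_corr):
    """Structure matrix for an oblique solution: S = P @ Phi."""
    return np.asarray(pattern) @ np.asarray(factor_corr)

def communalities(pattern, factor_corr):
    """Communalities for an oblique solution: diag(P @ Phi @ P.T)."""
    pattern = np.asarray(pattern)
    structure = structure_coefficients(pattern, factor_corr)
    # Row-wise sum of P * S equals the diagonal of P @ Phi @ P.T.
    return np.sum(pattern * structure, axis=1)

# Hypothetical two-factor example with a factor correlation of 0.50.
P = np.array([[0.70, 0.00],
              [0.00, 0.60],
              [0.45, 0.30]])
Phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])

print(structure_coefficients(P, Phi))
print(communalities(P, Phi))
```

When factors are uncorrelated (Φ is the identity matrix), the two sets of coefficients coincide, which is why both are typically reported only for oblique solutions.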
Based on Sample 2 data, model-data fit of the EFA-based factor structure was acceptable: χ²(206) = 736.09, RMSEA = 0.072 (90% CI = [0.067, 0.078]), CFI = 0.953, TLI = 0.947. Table 3 reports pattern and structure coefficients and error variances. Factor correlations were low to moderate (range = 0.34 [EE and ED] to 0.69 [BE and EE]). Configural invariance was supported by acceptable model-data fit of the three-factor solution across Samples 1 and 2: χ²(412) = 1,383.863, RMSEA = 0.065 (90% CI = [0.061, 0.069]), CFI = 0.958, TLI = 0.953. Although a subsequent test of invariant factor loadings yielded a statistically significant chi-square difference statistic, χ²Difference(19) = 29.09, p < .05, changes in RMSEA and CFI were less than 0.01, thus supporting loading invariance. Next, thresholds were tested and deemed equal across groups because changes in the RMSEA and CFI values were less than 0.01, despite a significant chi-square difference test, χ²Difference(41) = 75.12, p < .01. Last, strict factorial invariance was met with the finding of invariant error variances (χ²Difference(22) = 95.07, p < .01; ΔRMSEA and ΔCFI < 0.01), suggesting similar reliability across distinct samples. Coefficient omega estimates for the BE, EE, and ED subscales were 0.74, 0.81, and 0.77, respectively.
Table 3. Factor Loadings, Structure Coefficients, and Error Variance Across Samples 1 and 2.
Note. Parameter estimates in parenthesis. Standardized solution reported. P = Pattern coefficients; S = Structure coefficients.
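The sequence of nested invariance tests (configural, loadings, thresholds, error variances) and the change-in-fit rule of less than 0.01 can be sketched as a simple comparison of each model with the less constrained model before it. In the example, only the configural CFI and RMSEA are taken from this article; the fit values for the three constrained models are hypothetical placeholders, since the article reports only that the changes were below 0.01. The class and function names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ModelFit:
    name: str
    cfi: float
    rmsea: float

def invariance_steps(fits, tol=0.01):
    """Compare each model with the preceding, less constrained model.

    A step is treated as supported when the absolute changes in both
    CFI and RMSEA are smaller than `tol`.
    """
    results = []
    for previous, current in zip(fits, fits[1:]):
        delta_cfi = current.cfi - previous.cfi
        delta_rmsea = current.rmsea - previous.rmsea
        supported = abs(delta_cfi) < tol and abs(delta_rmsea) < tol
        results.append((current.name, round(delta_cfi, 3),
                        round(delta_rmsea, 3), supported))
    return results

models = [
    ModelFit("configural", 0.958, 0.065),          # reported in the text
    ModelFit("equal loadings", 0.957, 0.064),      # hypothetical values
    ModelFit("equal thresholds", 0.956, 0.063),    # hypothetical values
    ModelFit("equal error variances", 0.952, 0.066),  # hypothetical values
]

for step in invariance_steps(models):
    print(step)
```

This rule supplements rather than replaces the chi-square difference test, which tends to flag trivial differences in large samples.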
Table 4 reports descriptive statistics of scale scores. On average, students rated their behavioral engagement slightly higher than somewhat true and their emotional engagement and disaffection slightly lower than somewhat true. Subscale score coefficient alpha estimates ranged from .79 to .90 (EE), higher than previously reported estimates. BE scores were moderately correlated with EE scores (r = .59, p < .01) and weakly correlated with ED scores (r = .29, p < .01), whereas EE and ED scores were moderately correlated (r = .40, p < .01). Although Skinner, Kindermann, and Furrer (2009) suggested that subscales could be used as distinct measures or combined to create aggregate scores, continued research is needed to further substantiate the value added by using scores in this manner. Nonetheless, given its acceptable internal consistency reliability estimates, the scale seems suitable for use in research and educational settings. These findings further support the multidimensional structure of student engagement and the need for continued research to examine the structure of student engagement scores across diverse settings and students, as well as their relationships to external variables.
Table 4. Descriptive Information for Samples 1 and 2.
Note. Values in parenthesis unless otherwise specified. α = Cronbach’s coefficient alpha; CI = confidence interval.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
