Abstract
Self-assessment is fundamental to self-regulated learning; however, instruments to measure self-assessment practices have been developed and validated in only a few developed educational systems. This study examined the psychometric properties of the Self-assessment Practice Scale (SaPS) in the English language subject using data from 778 secondary school students in the Philippines. We used confirmatory factor analysis (CFA) and Rasch analysis to test the SaPS’ within-network validity, and bivariate correlations and structural equation modeling (SEM) to test its between-network validity. The CFA supported the scale’s four-factor structure, and the Rasch analysis supported the scale’s dimensionality, rating scale functioning, and item fit. The four SaPS subscales were positively correlated with agentic, cognitive, and metacognitive engagement. SEM results showed that all SaPS factors except seeking external feedback through monitoring were significantly associated with the engagement outcomes. This study highlights the sound psychometric properties of the SaPS in a new educational context and its applicability as a subject-specific measure of assessment-as-learning strategies.
Keywords
Self-assessment is generally conceptualized as a process through which students judge their performance based on information and evidence collected from different sources (Boud, 1995; McMillan & Hearn, 2008; Yan & Brown, 2017). Broadly speaking, self-assessment occurs when “students reflect on the quality of their work, judge the degree to which it reflects explicitly stated goals or criteria, and revise accordingly” (Andrade & Valtcheva, 2009, p. 13). Self-assessment is a core skill for self-regulated learning (Panadero & Alonso-Tapia, 2013; Yan, Brubacher, Boud, & Powell, 2020) and lifelong learning (Papanthymou, 2018; Siegesmund, 2017; Yan & Brown, 2017). It is linked with key learning outcomes, such as school achievement (McDonald & Boud, 2003; Yan, Chiu, & Ko, 2020), student motivation (McMillan & Hearn, 2008), and self-regulated learning (Panadero, Jonsson, & Botella, 2017), among others (see Andrade, 2019; Brown & Harris, 2013, for reviews). Because of these positive associations, researchers have recognized the need to understand the self-assessment process and to develop instruments that measure it (Panadero, Brown, & Strijbos, 2016; Yan & Brown, 2017). Such instruments can inform programs and interventions designed to implement self-assessment and maximize its impact.
Yan (2018a) adopted a theory-driven approach to develop the Self-assessment Practice Scale (SaPS) in the Hong Kong school context, and the scale has since been applied successfully in other contexts, such as higher education (Yan, 2020b) and professional training in a Western cultural context (Yan, Brubacher, Boud, & Powell, 2020). The instrument evaluates students’ use of two core self-assessment processes: seeking self-directed feedback and self-reflection. However, the SaPS’ validity for evaluating learner-centered formative assessment has not yet been investigated in developing countries with different educational curricula (e.g., the Philippines). Validating the SaPS in the Philippines is especially important given the country’s poor performance in its first participation in the Programme for International Student Assessment (PISA) in 2018 (OECD, 2019).
This study aims to evaluate the psychometric properties of the SaPS (Yan, 2018a) in the Philippines. It is a first attempt to investigate the validity of the SaPS in the English language subject at the secondary school level. The findings could extend the use of the SaPS across cultures and support efforts to strengthen 21st-century skills such as student self-assessment.
SaPS
Assessing self-assessment practices is different from evaluating self-assessment accuracy. Although there is a consensus that self-assessment is a complex process incorporating multiple steps (see Andrade, Du, & Wang, 2008; Boud, 1995), understanding of self-assessment practices (i.e., the actions students take during the self-assessment process) is surprisingly limited. Yan and Brown (2017) proposed a cyclical process model of self-assessment that explicitly outlines three sequential actions: (1) determining assessment criteria, (2) seeking self-directed feedback, and (3) engaging in self-reflection. These actions were incorporated into self-report measures based on the cyclical process of self-assessment (Yan, 2016, 2018b), which evolved into the SaPS (Yan, 2018a). To our knowledge, the SaPS is the only available theory-driven instrument specifically designed to assess self-assessment practices.
The SaPS focuses on two self-assessment processes: self-directed feedback seeking and self-reflection. The scale opens with the stem “When I study…,” followed by 20 items grouped into four subscales corresponding to four self-assessment practices: seeking external feedback through monitoring (SEFM; 5 items), seeking external feedback through inquiry (SEFI; 4 items), seeking internal feedback (SIF; 4 items), and engaging in self-reflection (SR; 7 items). The scale was validated with a sample of 2,906 primary and secondary school students in Hong Kong, and the results of factor analysis and Rasch analysis supported its psychometric properties and structural validity (Yan, 2018a).
The SaPS, its short form, and adapted versions have been used with graduate students (Yan, 2020b) and professional trainees in a Western cultural context (Yan, Brubacher, Boud, & Powell, 2020). The scale can track the development of self-assessment skills and inform teaching strategies that promote self-assessment to optimize learning outcomes. Evidence from recent interventions (e.g., randomized controlled trials and self-assessment diaries) points to the effectiveness of self-assessment practices in increasing student achievement, self-regulation, and motivation (Meusen-Beekman, Joosten-ten Brinke, & Boshuizen, 2016; Yan, Chiu, & Ko, 2020).
The Current Study
This study extends the use of the SaPS as a tool for evaluating students’ self-assessment practices in a specific subject domain, the English language. We adopted the network construct validation approach (see Martin, 2007; Martin & Marsh, 2006), which involves examining the scale’s within-network construct validity (i.e., factor structure and internal reliability) and between-network construct validity (i.e., the associations of the SaPS dimensions with criterion-related constructs).
We used agentic, cognitive, and metacognitive engagement as correlates for testing the SaPS’ between-network validity because, like self-assessment (see Brown & Harris, 2013; Panadero et al., 2017), these outcomes are viewed as self-regulated learning (SRL) strategies (Fredricks, Blumenfeld, & Paris, 2004; Zimmerman & Schunk, 2004). Agentic engagement is defined as “students’ constructive contribution into the flow of the instruction they receive [in school]” (e.g., asking questions, communicating their thoughts and needs; Reeve & Tseng, 2011, p. 258) and is conceptualized as a key component of overall student engagement (Reeve, Cheon, & Jang, 2020). Cognitive engagement comprises internal indicators of SRL, such as students’ striving and effort to understand complex ideas and master difficult skills (Fredricks et al., 2004; Fredricks & McColskey, 2012). Metacognitive engagement consists of SRL strategies such as planning, monitoring, and revising one’s schoolwork (Wolters, 2004). It has been posited as integrating motivation and SRL (see Zimmerman & Moylan, 2009) and is associated with self-assessment (Siegesmund, 2017). These engagement outcomes are therefore expected to be related to self-assessment.
Further, we aim to extend the utility of the SaPS as a subject-specific instrument, since students might employ different learning strategies (e.g., self-assessment practices) and show different levels of engagement in different school subjects (e.g., Wigfield, Eccles, Mac Iver, Reuman, & Midgley, 1991; Wigfield, Guthrie, Tonks, & Perencevich, 2004). Therefore, we modified the SaPS stem from “When I study…” to “When I learn English…” to frame subject specificity. Such a procedure has been used previously with closely related constructs (e.g., self-concept and self-efficacy; Lent, Brown, & Gore, 1997). Contextualizing self-assessment practices has also been recommended in a separate validation study (Yan, Brubacher, Boud, & Powell, 2020). The engagement measures were likewise framed in the context of the English language subject.
Method
Participants
Participants were 778 secondary school students from the Philippines¹. Participants with more than 5% missing data and multivariate outliers on each measure were removed (see the Data Analysis section for details), leaving 673 students for the final analysis: 186, 158, 157, and 172 students from Grades 7, 8, 9, and 10, respectively. Students’ ages ranged from 11 to 19 years (M = 14.14, SD = 1.51), although only one student was 11 years old and only 12 were 18 or 19 years old. More than half of the participants were female (n = 376, 55.87%). A balanced number of participants across grade levels was important because SRL strategies can be developmentally influenced (see Paris & Newman, 1990).
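The sample figures reported here and in the Data Analysis section can be cross-checked with simple arithmetic. The Python sketch below only re-derives numbers stated in this article; the variable names are ours.

```python
# Grade-level counts of the final analytic sample, as reported above.
grade_counts = {7: 186, 8: 158, 9: 157, 10: 172}
n_final = sum(grade_counts.values())      # students retained for the SEM
n_female = 376

# Percentage of female participants, rounded to two decimals.
pct_female = round(100 * n_female / n_final, 2)

# SaPS screening: 778 recruited, minus 19 excluded for >5% missing data,
# minus 57 multivariate outliers (Data Analysis section).
n_saps = 778 - 19 - 57

print(n_final, pct_female, n_saps)        # 673 55.87 702
```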
Procedures
Data were collected through a paper-and-pen survey method. Procedures for this study were approved by the Human Research Ethics Committee of both authors’ affiliated university. A research assistant from the Philippines administered the questionnaires during the students’ class hours and was present during the data collection to respond to potential student inquiries. The questionnaires were in the English language as English is the medium of instruction in the Philippines’ K-12 schools. The students took about 10 minutes to complete the questionnaire.
Measures
SaPS
Table 1. Bivariate correlations and descriptive statistics (n = 673).
Note. SEFM: seeking external feedback through monitoring; SEFI: seeking external feedback through inquiry; SIF: seeking internal feedback; SR: self-reflection. *p < .05. **p < .01. ***p < .001. Values in parentheses on the diagonal are internal consistency reliabilities (Cronbach’s alpha).
Table 2. Item difficulties, standard errors, and item fit statistics for the SaPS items (n = 768).
Note. *All measures and SEs are in logits. **To aid readability, this word is followed by a translation to the Philippine local language.
Engagement outcomes
We used the 5-item Agentic Engagement Scale (Reeve & Tseng, 2011), whose items describe students’ constructive contributions to the flow of the instruction and feedback they receive in school (e.g., “During English class, I ask questions to help me learn.”). We used Wolters’s (2004) learning strategies questionnaire, derived from the Motivated Strategies for Learning Questionnaire (Pintrich, Smith, Garcia, & McKeachie, 1993), to assess cognitive engagement (4 items) and metacognitive engagement (4 items). Cognitive engagement items describe elaboration-based learning strategies (e.g., “I adjust whatever we are learning so I can learn as much as possible.”), whereas metacognitive engagement items describe self-regulation strategies (e.g., “Before I begin to study, I think about what I want to get done.”). In the present study, the internal consistencies (Cronbach’s α) of the agentic, cognitive, and metacognitive engagement scales were 0.87, 0.84, and 0.73, respectively.
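Cronbach’s alpha compares the sum of the item variances with the variance of the total score: α = k/(k − 1) × (1 − Σσ²ᵢ / σ²ₓ), where k is the number of items. The Python sketch below is a minimal illustration, assuming complete data with one row per respondent; it is not the software used in this study.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Population variances are used throughout; the ratio is the same with
    sample variances as long as the choice is consistent.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))                     # one tuple of scores per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(r) for r in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)
```

Items that rise and fall together push α toward 1, while unrelated items push it toward 0.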
Data Analysis
For within-network validity, we used confirmatory factor analysis (CFA) and Rasch analysis. Numerous empirical studies have combined these complementary approaches (e.g., Chang & Engelhard, 2016; Testa et al., 2019; West et al., 2020; Yan, 2018a; Yan, Brubacher, Boud, & Powell, 2020) because each provides unique psychometric information about scale items and subscales. The structural validity of the four-factor model of the SaPS was assessed using CFA after excluding 19 participants with more than 5% item-level missing data (Tabachnick, Fidell, & Ullman, 2007). The remaining item-level missing responses were imputed using multiple imputation by chained equations (Azur, Stuart, Frangakis, & Leaf, 2011). Participants with extreme scores on multiple items or variables, which might bias the parameter estimates (i.e., multivariate outliers; Kline, 2015), were identified using the Mahalanobis distance rule (see Bedrick, Lapidus, & Powell, 2000). This rule evaluates how far each person’s vector of item scores lies from the centroid of the sample (i.e., the vector of item means), taking the correlations among items into account. Fifty-seven participants were identified as outliers and excluded from the CFA. Next, two SaPS structural models were tested: a one-factor model and a four-factor model (SEFM, SEFI, SIF, and SR). All items were treated as continuous, and the following goodness-of-fit indices were used to evaluate and compare the models: Comparative Fit Index (CFI), Tucker–Lewis Index (TLI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR). Following Hu and Bentler (1995), CFI and TLI values greater than 0.90 and RMSEA values less than .08 indicate good model fit; SRMR values less than .08 also indicate good fit (Hu & Bentler, 1999).
We employed a multidimensional Rasch model (Adams, Wilson, & Wang, 1997), estimated in ConQuest 2.0 (Wu, Adams, Wilson, & Haldane, 2007), because self-assessment practice is conceptualized as a multidimensional construct (Yan & Brown, 2017). All subscales were calibrated simultaneously, so measurement precision on each subscale can be enhanced by taking into account the intercorrelations between subscales (Bond, Yan, & Heene, 2020). The indicators used to examine instrument quality in the Rasch analysis included Rasch reliability, response category functioning, and item mean-square (MNSQ) fit statistics (i.e., infit and outfit MNSQ).
For between-network validity, we used bivariate correlations followed by structural equation modeling (SEM). Specifically, we first evaluated how each hypothesized SaPS subscale correlated with agentic, cognitive, and metacognitive engagement. A full SEM was then needed to examine the associations between the constructs while controlling for item-level measurement error (Yu & Hsu, 2013; Zumbo, 2014). Within the SEM, we tested whether the three engagement outcomes fit a three-factor model and then regressed agentic, cognitive, and metacognitive engagement on the four SaPS factors. The same data screening procedures used for the SaPS CFA were applied to the engagement outcomes. The screened engagement data (n = 706) were then merged with the screened SaPS data (n = 702), yielding a final analytic dataset of 673 students for the SEM. Except for the Rasch analysis, all analyses were conducted in R (R Core Team, 2016) using the lavaan package (Rosseel, 2012).
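The screening steps described above can be sketched as follows. This is a minimal Python illustration, not the authors’ code. It assumes that missing responses are coded as `None`, that Mahalanobis distances are computed on complete (e.g., already imputed) item scores, and that the analyst supplies the cutoff. The article does not report the cutoff it used. A common convention, assumed here rather than confirmed by the source, flags a squared distance above the χ² critical value at p < .001, with degrees of freedom equal to the number of items (about 45.31 for 20 items). The helper names are hypothetical.

```python
def missing_rate(row):
    """Proportion of item responses coded as missing (None)."""
    return sum(v is None for v in row) / len(row)


def drop_high_missing(rows, max_rate=0.05):
    """Keep respondents with no more than max_rate missing responses."""
    return [r for r in rows if missing_rate(r) <= max_rate]


def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator)."""
    n, mu = len(rows), mean_vector(rows)
    p = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented copy
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * p
    for i in reversed(range(p)):                        # back substitution
        x[i] = (m[i][p] - sum(m[i][j] * x[j] for j in range(i + 1, p))) / m[i][i]
    return x


def mahalanobis_sq(rows):
    """Squared distance of each row from the centroid (vector of item means)."""
    mu, cov = mean_vector(rows), covariance_matrix(rows)
    out = []
    for r in rows:
        d = [x - m for x, m in zip(r, mu)]
        y = solve(cov, d)                               # y = S^-1 d, no explicit inverse
        out.append(sum(di * yi for di, yi in zip(d, y)))
    return out


def flag_outliers(rows, critical_value):
    """Indices of rows whose squared distance exceeds the supplied cutoff."""
    return [i for i, d2 in enumerate(mahalanobis_sq(rows)) if d2 > critical_value]
```

Solving the linear system avoids forming the inverse covariance matrix explicitly. Note that the order of steps in this sketch (screen, impute, then flag) follows the description above only loosely, because imputation is not shown.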
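The Python sketch below makes these Rasch quantities concrete. It implements the rating scale model for a single person-item pair, together with the usual infit and outfit mean-square formulas. It is a simplified illustration, not ConQuest’s multidimensional estimation routine, and it rests on three assumptions:

- Each item belongs to exactly one subscale, so only that subscale’s θ enters the item’s probabilities.
- The six response categories are scored 0 to 5.
- Person and item parameters have already been estimated.

The function names are hypothetical.

```python
import math


def category_probs(theta, delta, taus):
    """Rating scale model probabilities for categories 0..m.

    theta: person location on the item's subscale (logits)
    delta: item difficulty (logits)
    taus:  the m step calibrations shared by all items (logits)
    """
    logits = [0.0]                                   # category 0: empty sum
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    top = max(logits)                                # subtract max for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_and_variance(theta, delta, taus):
    """Model-expected score and its variance for one person-item pair."""
    probs = category_probs(theta, delta, taus)
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


def item_fit(observations):
    """Infit and outfit MNSQ for one item.

    observations: list of (score, theta, delta, taus), one per respondent.
    """
    sq_resid_sum, var_sum, z_sq = 0.0, 0.0, []
    for score, theta, delta, taus in observations:
        e, w = expected_and_variance(theta, delta, taus)
        sq_resid_sum += (score - e) ** 2
        var_sum += w
        z_sq.append((score - e) ** 2 / w)           # squared standardized residual
    outfit = sum(z_sq) / len(z_sq)                  # unweighted mean
    infit = sq_resid_sum / var_sum                  # information-weighted
    return infit, outfit
```

Adjacent categories k − 1 and k are equally likely when θ − δ equals the k-th step calibration. Outfit is more sensitive to unexpected responses from persons far from an item’s difficulty, while infit weights residuals by their model variance.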
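Only students retained after screening on both measures enter the SEM, so the merge behaves like an inner join. This explains why the analytic sample (n = 673) is smaller than either screened dataset. The Python sketch below illustrates the merge using a hypothetical student-ID key and a plain Pearson correlation for the bivariate step. The data layout is an assumption; the study itself used R and lavaan.

```python
import math


def merge_screened(saps, engagement):
    """Inner join of two screened datasets keyed by a (hypothetical) student ID."""
    common = sorted(saps.keys() & engagement.keys())
    return {sid: {**saps[sid], **engagement[sid]} for sid in common}


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def subscale_correlation(merged, a, b):
    """Correlate two scale scores across students in the merged dataset."""
    rows = list(merged.values())
    return pearson_r([r[a] for r in rows], [r[b] for r in rows])
```

The latent-variable SEM, with its control of item-level measurement error, is beyond this sketch; it covers only data preparation and observed-score correlations.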
Results
Within-Network Construct Validity
Confirmatory factor analysis and structural equation modeling fit indices
Note. CFA: confirmatory factor analysis; CFI: comparative fit index; RMSEA: root mean square error of approximation; SaPS: Self-assessment Practice Scale; SEM: structural equation modeling; SRMR: standardized root mean square residual; TLI: Tucker–Lewis index. *p < .05. **p < .01. ***p < .001. Superscripts indicate sample size: a = 702, b = 706, c = 673.
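The model-fit decision rule stated in the Data Analysis section can be expressed as a small helper. This Python sketch is illustrative only. The dictionary and function names are hypothetical, and no fit values from this study are reproduced here.

```python
# Cutoffs stated in the Data Analysis section (Hu & Bentler, 1995, 1999).
CUTOFFS = {
    "CFI": (">", 0.90),
    "TLI": (">", 0.90),
    "RMSEA": ("<", 0.08),
    "SRMR": ("<", 0.08),
}


def check_fit(indices):
    """Map each index name to True if its value meets the stated cutoff."""
    results = {}
    for name, (direction, cutoff) in CUTOFFS.items():
        value = indices[name]
        results[name] = value > cutoff if direction == ">" else value < cutoff
    return results


def acceptable_fit(indices):
    """True only if all four indices meet their cutoffs."""
    return all(check_fit(indices).values())
```

Reporting each index separately, as `check_fit` does, shows which criterion a borderline model misses rather than only whether it passes overall.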

The four-factor structure of the Self-assessment Practice Scale.
The results of the multidimensional rating scale Rasch model support the good psychometric properties of the SaPS. The step calibrations, that is, the locations of the transition points between adjacent response categories, increased monotonically (−1.31, −0.85, −0.77, 0.62, and 2.31 logits), suggesting that the six-point response scale functioned well. All items showed adequate fit to the Rasch model, with infit and outfit MNSQ values within the acceptable range of 0.75 to 1.33 (Wilson, 2005), indicating that each item measured its hypothesized latent trait. The Rasch reliabilities for SEFM, SEFI, SIF, and SR were 0.84, 0.79, 0.75, and 0.87, respectively. Table 2 presents the item difficulties, standard errors, and item fit statistics.
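Two of the Rasch criteria in this paragraph are easy to check programmatically: ordered step calibrations, and MNSQ values inside 0.75 to 1.33. The Python sketch below applies the first check to the reported step calibrations; the helper names are hypothetical. The five values also sum to zero, consistent with the common practice of centering step parameters for identification.

```python
# Step calibrations (logits) reported for the six-point SaPS response scale.
STEP_CALIBRATIONS = [-1.31, -0.85, -0.77, 0.62, 2.31]


def is_strictly_increasing(steps):
    """True if every step calibration exceeds the one before it."""
    return all(a < b for a, b in zip(steps, steps[1:]))


def mnsq_acceptable(mnsq, low=0.75, high=1.33):
    """True if an infit or outfit MNSQ lies in the range used here (Wilson, 2005)."""
    return low <= mnsq <= high


print(is_strictly_increasing(STEP_CALIBRATIONS))   # True
print(len(STEP_CALIBRATIONS) + 1)                  # 6 response categories
```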
Between-Network Construct Validity
Structural equation model where SaPS factors were entered as predictors of engagement outcomes.
Note. SEFM: seeking external feedback through monitoring; SEFI: seeking external feedback through inquiry; SIF: seeking internal feedback; SR: self-reflection.

Structural equation model with SaPS subscales predicting agentic, cognitive, and metacognitive learning strategies.
Discussion
This study aimed to validate the SaPS in the English language subject domain with a Philippine sample. Overall, the findings support the SaPS’ reliability and validity for use in a specific subject domain. The CFA supported the hypothesized four-factor structure, and the multidimensional Rasch analysis showed that all 20 SaPS items demonstrated satisfactory item and subscale fit, providing evidence of within-network validity. As in previous validation studies of the SaPS (Yan, 2018a, 2020a; Yan, Brubacher, Boud, & Powell, 2020), the four-factor model fit the data better than the one-factor model. This result supports the theoretical grounding of the SaPS in the cyclical process model of self-assessment (Yan & Brown, 2017).
The relationships between the SaPS factors and the engagement outcomes provided evidence of between-network validity. Because self-assessment practices (Brown & Harris, 2013; Panadero, Jonsson et al., 2016; Yan, Brown, Lee, & Qiu, 2020) and student engagement (Fredricks et al., 2004; Zimmerman & Schunk, 2004) are both related to SRL, the relationships between the SaPS factors and the engagement outcomes were expected to be positive. Indeed, these correlations were positive and moderate. They were strong enough to indicate related constructs, but not so strong as to suggest that the subscales tap the same underlying construct as the engagement measures.
Controlling for item-level measurement error, SR predicted all engagement outcomes, especially cognitive engagement. The strong association between SR and cognitive engagement may reflect practices the two share (e.g., relating previous to new knowledge and connecting learning to personal experiences; Fredricks & McColskey, 2012; Wolters, 2004). SEFI was primarily associated with agentic engagement, plausibly because both reflect students’ intentional desire to seek feedback (see Reeve & Tseng, 2011). Both involve asking questions of members of the learning environment (Reeve & Tseng, 2011; Yan, 2018a; Yan & Brown, 2017). The link between SIF and metacognitive engagement may reflect the importance of internal, psychological mechanisms for both self-assessment and engagement (Wolters, 2004; Yan, 2018a). For example, how students feel in school can influence both their engagement (Pekrun & Linnenbrink-Garcia, 2012) and their self-assessment practices (Yan, 2018a). SEFM was not associated with the engagement outcomes. This lack of association may stem from how SEFM is used in practice. While learning in school, students may attend more to inquiry, internal feedback, and self-reflection and less to monitoring practices (e.g., checking content mastery by redoing past exam papers, or reviewing test results against answers in the textbook or on a website), which are more applicable before or after school. Hence, SEFM practices may be less salient in settings where the other self-assessment practices are more readily applicable. The statistically significant and theoretically consistent associations between the SaPS factors and the engagement outcomes further support the scale’s validity.
The validation of the subject-specific SaPS has two potential practical implications. First, SaPS scores can be used to identify self-assessment practices that students need to improve, and interventions can then be designed to target those practices. Second, the subject-specific SaPS can be used to evaluate the effectiveness of scalable interventions such as monitoring logs (Zimmerman & Kitsantas, 1997), self-assessment diaries (Yan, Chiu, & Ko, 2020), or self-assessment checklists (see Meusen-Beekman et al., 2016, for self-assessment strategies used in randomized controlled interventions). In turn, teachers can use this information to encourage students to seek feedback and reflect on their English learning tasks.
Limitations and Directions for Future Research
Despite the notable strengths of the present study, some limitations should be noted. First, the study used cross-sectional data; future studies could examine test–retest reliability and predictive validity using a longitudinal design. Second, our sample comprised only students from Grades 7 to 10, which limits the generalizability of our findings. Extending the sample to students in primary and higher education would be a meaningful direction for future research. Finally, consistent with previous studies, SIF showed relatively lower reliability. Yan and Brown (2017) showed that internal feedback is more salient in subjects involving performance-related activities (e.g., sports, music, and arts) and may be less noticeable in academic, less performance-oriented subjects such as English. Hence, future investigations could (a) refine the SIF items to suit less performance-oriented subjects, or (b) identify alternatives to self-report for capturing students’ SIF practices. From a substantive and theoretical standpoint, exploring social and psychological predictors of self-assessment practice is another meaningful direction for future research.
Conclusion
Schools in countries whose educational reforms have only recently incorporated formative assessment strategies into the curriculum, with the aim of improving students’ performance on international assessments (e.g., PISA), could benefit from instruments that gauge how students engage in self-assessment practices. Self-assessment is an invaluable 21st-century skill that students can use for lifelong learning. This study supports the psychometric properties of the original SaPS and extends its applicability across cultures and to the English language subject domain.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
