Measurement Invariance Across Gender for the Metacognitive Self-Regulation Revised Scale

Abstract

The Metacognitive Self-Regulation scale (MSR) was recently subjected to a comprehensive reanalysis and replaced with the Metacognitive Self-Regulation Revised scale (MSR-R). However, to date, no study has tested whether the MSR or the MSR-R performs equivalently for males and females. The first goal of the current study was to examine the MSR-R measurement model for invariance across gender. Second, we examined structural invariance by regressing grade performance on the MSR-R latent factor. The results supported invariance of the MSR-R measurement model across groups, but revealed a predictive validity issue: structural invariance was untenable.

Keywords

Metacognitive Self-Regulation Revised, measurement invariance, metacognition, self-regulation, grade performance

In recent decades, there has been a movement to develop measures of metacognition that fit within the framework of self-regulated learning. In 1979, Flavell originally conceptualized metacognition as a purely cognitive method of examining one’s own mental processes. Since then, educational researchers have extended the definition of metacognition to include the interaction between one’s cognitive processes and behaviors aimed at completing a learning task (Dinsmore, Alexander, & Loughlin, 2008). For example, students may optimize their learning by monitoring progress during a learning task and respond to task-related problems by adapting their learning approach to better address task demands (Brown, Bransford, Ferrara, & Campione, 1982).

In 1991, Pintrich, Smith, Garcia, and McKeachie created the Metacognitive Self-Regulation scale (MSR) to efficiently measure metacognition from a self-regulated learning perspective. The scale featured 12 items specified on a one-factor congeneric model aimed at measuring students’ decisions to select or adapt a cognitive strategy during a learning task. However, in a recent reanalysis of the MSR, a confirmatory factor analysis of the 12-item, one-factor model resulted in very poor model fit (Tock & Moxley, 2016). An examination of the individual items revealed that three items had a negative impact on reliability for the one-factor model and correlated with outcome measures differently than the remaining nine items. After removal of these three items, Tock and Moxley (2016) specified a revised nine-item, one-factor congeneric model of the MSR scale (Metacognitive Self-Regulation Revised [MSR-R]). The results of the MSR-R specification indicated improved reliability and clearer criterion validity with outcome measures while maintaining the original theoretical construct of metacognitive self-regulation.

Despite the improvement to the MSR with recent revisions, the assumption of validity for a scale remains tenuous until there is evidence to demonstrate the model performs equivalently across groups (Sass, 2011). Although examinations of the MSR are limited, a recent study indicated no mean gender differences on the MSR score, but a statistically significant correlation between the MSR and grade performance for males but not for females (Bembenutty, 2007). The goals of the current study were to follow best practice in measurement invariance (MI) to examine the revised version of the MSR (MSR-R) across gender. The specific procedure performed was aimed at testing the assumption of MI for the MSR-R model, the equality of latent factor means for females and males, and structural invariance for the MSR-R regressed on grade performance across groups.

Method

During the spring and summer 2012 semesters, 356 participants were recruited and given course research credits in exchange for completing a brief questionnaire. Participants were removed if grade performance could not be acquired (five participants) or if they failed to properly respond to all items on the questionnaire (i.e., left answers blank; four participants). The final sample included 347 participants (111 males, 236 females) ranging in age from 17 to 27 (M = 18.97, SD = 1.44).

The MSR includes 12 items involving the control and regulation aspects of metacognition (Pintrich et al., 1991). However, only nine of the items were used in accordance with the scale revisions made by Tock and Moxley (2016) for the MSR-R.¹ Each item is scored on a 7-point Likert-type scale (1 = not at all true of me, 7 = very true of me) regarding the degree to which students engage in the use of metacognitive strategies. Although the item responses for the MSR-R are categorical, the seven response options are treated as continuous variables (Pintrich et al., 1991). The MSR-R was validated with its statistically significant correlation with total study time and the Time and Study Environment subscale (Tock & Moxley, 2016). For the current sample, data from the MSR-R had a coefficient alpha of .78.
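For readers who want to reproduce the reliability estimate from the posted dataset, coefficient alpha can be computed from item-level responses as sketched below. This is a minimal illustration, not the authors' procedure: the function name `cronbach_alpha` and the demonstration responses are hypothetical, and listwise-complete data are assumed.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses (4 respondents x 3 items on a 1-7 scale), not study data.
demo = [[2, 3, 2], [4, 4, 5], [5, 6, 5], [7, 6, 7]]
alpha = cronbach_alpha(demo)
```

Applied to the nine MSR-R items for all 347 participants, the same calculation would be expected to return a value close to the reported .78.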

Registered participants met in a computer lab to complete a 30-min online questionnaire that included a section both for the MSR and for demographic information. Official grade transcripts were obtained via the University Registrar for data regarding cumulative grade performance average (cGPA).

Statistical Analysis

We estimated all models in the current study with Mplus software (Muthén & Muthén, 2008), using a maximum likelihood estimator for continuous, normally distributed variables.

Model specification

We completed the procedure for MI in two stages. The first stage was performed to demonstrate acceptable model fit for separate male and female models of the MSR-R. Given acceptable fit, MI for multiple group confirmatory factor analysis (MGCFA) models could be tested in the second stage. In the second stage, we tested MI for the measurement model of MSR-R, followed by MI for the MSR-R model with grade performance regressed on the MSR-R latent factor. For both stages, a nine-item, one-factor congeneric model was specified for the MSR-R.

During the second stage, we developed MGCFA models for the MSR-R and MSR-R with grades, using Vandenberg and Lance’s (2000) MI procedure. Their procedure progresses through a pre-determined series of CFA models to determine if a model remains invariant between groups as additional model parameters are constrained to equality. Constraints are imposed until complete invariance is demonstrated or the hypothesis that the model is invariant becomes untenable. Models that reveal statistically significant decreases in model fit by going to the more constrained model indicate key discrepancies between the groups being compared. However, partial MI may be tested by freeing a limited number of constrained parameters.

The first model in Vandenberg and Lance’s procedure is a baseline model that reproduces the originally specified model in each group with no cross-group constraints (configural invariance).² If configural invariance is found tenable, equality constraints are imposed on the model parameters in the following ordered steps: factor loadings (metric invariance), intercepts (scalar invariance), error variances (invariant uniqueness), and factor variances (invariant factor variance). If the model remains invariant through each step, tests of the equivalence of factor means (invariant factor means) can be performed. Models with structural parameters require an additional step where path coefficients are constrained to equality (invariant path coefficients).
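The ordered steps above can be made concrete by counting the degrees of freedom each constraint set adds. The sketch below is illustrative only. It assumes marker-variable identification (one loading fixed to 1 in each group), a mean structure, and a latent mean fixed to 0 in the reference group and freed in the other group once intercepts are constrained. These identification choices are assumptions and are not stated in the text. The function name `invariance_dfs` is hypothetical.

```python
def invariance_dfs(items, groups=2, freed_intercepts=0):
    """Degrees of freedom at each invariance step for a one-factor model.

    Assumes marker-variable identification and a mean structure
    (an assumption of this sketch, not a detail reported in the article).
    """
    extra = groups - 1  # number of non-reference groups
    # Observed variances, covariances, and means across all groups.
    moments = groups * (items * (items + 1) // 2 + items)
    # Configural: free loadings, intercepts, uniquenesses, factor variance per group.
    free = groups * ((items - 1) + items + items + 1)
    df = moments - free
    dfs = {"configural": df}
    df += (items - 1) * extra  # equal factor loadings
    dfs["metric"] = df
    # Equal intercepts (minus any freed ones); latent means are freed
    # in the non-reference groups at this step.
    df += (items - freed_intercepts) * extra - extra
    dfs["scalar"] = df
    df += items * extra  # equal error variances
    dfs["uniqueness"] = df
    df += extra  # equal factor variances
    dfs["factor_variance"] = df
    df += extra  # equal factor means
    dfs["factor_means"] = df
    return dfs


full = invariance_dfs(9)                         # full scalar invariance
partial = invariance_dfs(9, freed_intercepts=2)  # two intercepts freed
```

Under these assumptions, a nine-item model in two groups yields 54, 62, and 70 degrees of freedom for the configural, metric, and scalar models, and 68, 77, 78, and 79 for the subsequent models when two intercepts are freed, which agrees with the df column of Table 2.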

Model fit criteria

Decisions on model fit for all models were based on a non-significant χ² and Hu and Bentler’s (1999) suggestions for model fit. While a non-significant χ² is indicative of a good-fitting model, this standard is rarely achieved in practice, particularly as sample size becomes large, and multiple additional criteria have been established to evaluate model fit (Bollen, 1989). Hu and Bentler established criteria for good fit (standardized root mean square residual [SRMR] < .08; root mean square error of approximation [RMSEA] < .06; comparative fit index [CFI] > .95), in addition to suggestions for adequate fit (CFI > .90) and mediocre fit (SRMR < .10; RMSEA < .08). In the current study, model fit was designated as meeting good, adequate, or mediocre fit for each of the respective fit indices.
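The cutoffs above amount to a simple lookup, sketched below for illustration. The function names are hypothetical, and the label "poor" for values that miss every cutoff is an assumption of this sketch, as the text does not name that category.

```python
def classify_srmr(value):
    """SRMR: good below .08, mediocre below .10."""
    if value < 0.08:
        return "good"
    return "mediocre" if value < 0.10 else "poor"  # "poor" is an assumed label


def classify_rmsea(value):
    """RMSEA: good below .06, mediocre below .08."""
    if value < 0.06:
        return "good"
    return "mediocre" if value < 0.08 else "poor"


def classify_cfi(value):
    """CFI: good above .95, adequate above .90."""
    if value > 0.95:
        return "good"
    return "adequate" if value > 0.90 else "poor"


def classify_fit(srmr, rmsea, cfi):
    """Return the fit label for each index separately."""
    return {
        "SRMR": classify_srmr(srmr),
        "RMSEA": classify_rmsea(rmsea),
        "CFI": classify_cfi(cfi),
    }


# Single-group values as reported in Table 2.
male_fit = classify_fit(srmr=0.044, rmsea=0.033, cfi=0.986)
female_fit = classify_fit(srmr=0.045, rmsea=0.066, cfi=0.923)
```

Each index is labeled separately because the criteria do not define a single overall verdict when the indices disagree.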

To determine the tenability of MI for each model, the overall model fit indices were examined to determine if each model met criteria for acceptable fit. In addition, we examined the change in fit from the less constrained to the more constrained model to determine if the imposed constraints produced a decrease in model fit. Criteria for practically significant (ΔCFI ≤ −.02) and statistically significant (a statistically significant Δχ²) decreases in model fit were utilized (Vandenberg & Lance, 2000).
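The two change-in-fit criteria can be sketched as follows. This is an illustration under stated assumptions rather than the software's own routine: it takes the simple difference of χ² values, which is appropriate for the unscaled maximum likelihood χ² but not for robust (scaled) statistics, and it computes the χ² tail probability for integer df with a standard recurrence so that only the Python standard library is needed. The function names `chi2_sf` and `compare_nested` are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Starting values: Q(1/2, y) = erfc(sqrt(y)) for odd df, Q(1, y) = exp(-y) for even df.
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(y)) if df % 2 else math.exp(-y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def compare_nested(free, constrained, alpha=0.05, cfi_cut=-0.02):
    """Compare a less constrained and a more constrained model.

    Each model is a dict with keys 'chi2', 'df', and 'cfi'.
    """
    d_chi2 = constrained["chi2"] - free["chi2"]
    d_df = constrained["df"] - free["df"]
    d_cfi = round(constrained["cfi"] - free["cfi"], 3)
    p = chi2_sf(d_chi2, d_df)
    return {
        "d_chi2": round(d_chi2, 3),
        "d_df": d_df,
        "p": p,
        "d_cfi": d_cfi,
        "statistical": p < alpha,      # significant loss of fit
        "practical": d_cfi <= cfi_cut,  # practically meaningful loss of fit
    }


# Values as reported in Table 2.
metric = {"chi2": 89.163, "df": 62, "cfi": 0.954}
scalar = {"chi2": 115.868, "df": 70, "cfi": 0.923}
partial_scalar = {"chi2": 98.708, "df": 68, "cfi": 0.948}

full_step = compare_nested(metric, scalar)
partial_step = compare_nested(metric, partial_scalar)
```

With the Table 2 values, the metric-to-scalar comparison is flagged on both criteria, whereas the metric-to-partial-scalar comparison is flagged on neither.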

Results

Descriptive Statistics for MSR-R and cGPA by Gender

Descriptive statistics for the MSR-R and cGPA are listed in Table 1. Each of the MSR-R (mean scale item and individual items) and cGPA variables had values considerably lower than the standards for substantial departures from normality for skewness (≤ ±2) and kurtosis (≤ ±7) identified by Curran, West, and Finch (1996).
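A screening check of this kind can be sketched as below. The sketch uses population-moment formulas and treats kurtosis as excess kurtosis (zero for a normal distribution). Both choices are assumptions, since the text does not state which estimators were used. The function names and the demonstration values are hypothetical.

```python
def skew_kurtosis(values):
    """Moment-based skewness and excess kurtosis of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def within_guidelines(values, max_skew=2.0, max_kurtosis=7.0):
    """True when both statistics fall inside the cutoffs quoted above."""
    skew, kurt = skew_kurtosis(values)
    return abs(skew) < max_skew and abs(kurt) < max_kurtosis


# Hypothetical item responses on a 1-7 scale, not study data.
demo = [1, 2, 3, 4, 5, 6, 7]
demo_skew, demo_kurtosis = skew_kurtosis(demo)
demo_ok = within_guidelines(demo)
```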

Table 1.

Descriptive Statistics for MSR-R and cGPA for Females and Males.

	Female		Male		Combined
	M	SD	M	SD	M	SD
cGPA	3.25	0.66	3.22	0.65	3.24	0.66
MSR-R	4.62	0.97	4.56	1.05	4.60	0.99
Individual MSR-R items
41	5.66	1.20	5.56	1.43	5.63	1.28
44	4.41	1.63	4.23	1.51	4.35	1.59
54	4.08	1.93	3.79	1.81	4.00	1.89
55	4.25	1.68	4.47	1.67	4.32	1.68
56	4.34	1.79	4.54	1.72	4.40	1.77
61	4.11*	1.66	4.56*	1.56	4.25	1.64
76	5.30	1.31	5.14	1.38	5.25	1.33
78	4.87*	1.68	4.37*	1.73	4.71	1.71
79	4.54	1.84	4.36	1.77	4.48	1.82

Note. MSR-R = Metacognitive Self-Regulation Revised; cGPA = cumulative grade performance average.

*Significant difference between female and male means (p < .05).

Single Group Male and Female Factor Analytic Models for MSR-R

Separate nine-item, one-factor, congeneric models were specified to determine the adequacy of model fit for males and females on the MSR-R.³ The results of the single group models are listed in Table 2 and indicated better model fit for males on the MSR-R. Each of the fit statistics for the male model met Hu and Bentler’s standards for good model fit, and the χ² was non-significant. For the female model, SRMR met the criteria for good fit, CFI met the criteria for adequate fit, and RMSEA met the criteria for mediocre fit. Given that each model met the established criteria, both models met the standard to proceed to MGCFA testing.
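As a consistency check, the RMSEA values in Table 2 can be reproduced from χ², df, and sample size. The sketch below uses a common formulation with N in the denominator and a √G correction for multigroup models. That this is the exact formula applied by the software is an assumption, although it reproduces the reported values to three decimals. The function name `rmsea` is hypothetical.

```python
import math


def rmsea(chi2, df, n, groups=1):
    """RMSEA from chi-square, degrees of freedom, total N, and number of groups."""
    # Excess misfit per degree of freedom and per observation, floored at zero.
    per_df = max((chi2 - df) / (n * df), 0.0)
    # Multigroup models are scaled by the square root of the number of groups.
    return math.sqrt(groups) * math.sqrt(per_df)


# Chi-square, df, and sample sizes as reported in the text and Table 2.
male = round(rmsea(30.218, 27, n=111), 3)
female = round(rmsea(54.408, 27, n=236), 3)
configural = round(rmsea(84.626, 54, n=347, groups=2), 3)
```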

Table 2.

Measurement Invariance With Single Group and Multi-Group Confirmatory Factor Analysis for MSR-R.

Model	df	χ²	SRMR	RMSEA	CFI	Δdf	Δχ²	ΔCFI
Single group factor analysis
Male model	27	30.218	.044	.033	.986	—	—	—
Female model	27	54.408**	.045	.066	.923	—	—	—
Multigroup confirmatory factor analysis
1. Configural invariance	54	84.626**	.045	.057	.948	—	—	—
1 vs. 2	—	—	—	—	—	8	4.537	.006
2. Metric invariance	62	89.163*	.053	.050	.954	—	—	—
2 vs. 3a	—	—	—	—	—	8	26.705***	−.031^a
2 vs. 3b	—	—	—	—	—	6	9.545	−.006
3a. Scalar invariance	70	115.868***	.065	.061	.923	—	—	—
3b. PSI	68	98.708**	.054	.051	.948	—	—	—
3b vs. 4	—	—	—	—	—	9	15.875	−.011
4. Invariant uniqueness with PSI	77	114.583**	.081	.053	.937	—	—	—
4 vs. 5	—	—	—	—	—	1	1.081	−.001
5. Invariant factor variances with PSI	78	115.664**	.086	.053	.936	—	—	—
5 vs. 6	—	—	—	—	—	1	0.390	.001
6. Invariant factor means with PSI	79	116.054	.088	.054	.937	—	—	—

Note. MSR-R = Metacognitive Self-Regulation Revised; SRMR = standardized root mean square residual; RMSEA = root mean square error of approximation; CFI = comparative fit index; Δdf = change in df; Δχ² = change in chi squared; ΔCFI = change in CFI; PSI = partial scalar invariance.

^a ΔCFI ≤ −.02.

*p < .05. **p < .01. ***p < .001.

MI for MSR-R

The first goal of MGCFA testing was to establish MI for the MSR-R measurement model. The results are listed in Table 2 and indicated partial MI. Configural and metric invariance were both found to be tenable, with model fit that met established criteria and no practically or statistically significant decrease in model fit. Tests of scalar invariance resulted in a statistically and practically significant decrease in model fit, Δχ²(8) = 26.705, p < .001; ΔCFI = −.031, which prompted us to free individual intercepts that differed between groups. After inspecting the single group models and descriptive statistics, two items with large discrepancies between males and females were identified, with males scoring statistically significantly higher on the mean of Item 61, and females scoring statistically significantly higher on the mean of Item 78 (Table 1). The intercepts for these two items were freed, and this resulted in a tenable model for partial scalar invariance (PSI).⁴ Models of invariant uniqueness with PSI and factor variance invariance with PSI were also tenable. Finally, factor mean invariance with PSI was tenable.

MI for MSR-R With Grades

With MI established for the measurement model, the second goal was to test structural invariance when cGPA was regressed on the MSR-R latent factor. The first step was to establish the invariant factor variance model with PSI, with cGPA regressed on the MSR-R latent factor, as a baseline. The fit indices for this model, χ²(94) = 146.816, p < .001; SRMR = .080; RMSEA = .057; CFI = .914, met established fit criteria. Next, the invariant path coefficient model was examined, χ²(95) = 152.430, p < .001; SRMR = .088; RMSEA = .059; Tucker–Lewis index [TLI] = .912; CFI = .907, and indicated a statistically significant decrease in model fit, Δχ²(1) = 5.61, p < .05, but not a practically significant decrease, ΔCFI = −.007. Although the statistical and practical criteria pointed to different conclusions, Sass (2011) suggested additional criteria for making decisions about MI. First, a researcher should be reluctant to declare MI if the Δχ² is statistically significant, even if practical change indices point to other conclusions. Second, the magnitude of difference for parameter estimates can be examined. An inspection of the unconstrained path coefficients in the factor variance invariance model revealed a statistically significant path between MSR-R and cGPA for males (β = .34, SE = .09, p < .001) but not for females (β = .05, SE = .07, p = .51). The preponderance of evidence indicated non-invariance for the structural parameter of the MSR-R across groups.

Discussion

The measurement portion of the MSR-R model appears to measure the same construct for males and females. However, structural invariance was not supported: the scale was clearly related to grade performance for males but unrelated to grade performance for females. This result should be strongly considered by researchers who are concerned with the validity of the MSR-R scale.

In the case where a measure is used as a selection criterion, this bias could have major implications for the validity of the measure. For example, among college admission tests, predictive bias has been found whereby women are consistently underselected when admissions decisions are based on the SAT (Leonard & Jiang, 1999). For the SAT, the demonstration of predictive invariance across gender is crucial. This type of bias has led researchers to question the use of standardized tests across gender and to call for test reform. Similarly, the MSR-R should not be used as a selection criterion given the predictive bias found in the current study.

Given that the MSR-R is not typically used as a selection criterion, the predictive bias may not be as immediately concerning as it is for standardized tests. According to Millsap (2011), predictive bias can arise from sources unrelated to MI and may or may not be a major threat to the validity of the measure. However, given the finding of MI in the MSR-R latent construct, it is now important for researchers to focus on understanding the mechanisms that underlie the difference between males and females in the relationship of the MSR-R with grades. For instance, the extent to which grades are measurement invariant across groups has received little attention, and violations of MI in grades could cause predictive bias (Millsap, 2011). Future investigations should examine moderators of the predictive bias identified by this study by using more online measures (e.g., verbal protocols) to validate the findings of the self-report MSR-R scale.

Footnotes

Acknowledgements

The authors give special thanks to Kristen Gomez and Sarah Ahmed for reviewing and editing this article.

Authors’ Note

The dataset examined for the current study is posted on the Open Science Framework website (osf.io) and can be found listed under the first author’s name and the title of the current article. The dataset was used for a previous publication regarding the Metacognitive Self-Regulation Revised scale (MSR-R; Tock & Moxley, 2016, included in references). The MSR-R scale was established in the previous article. The present article examined the measurement invariance of the scale. The arguments do not overlap. No attempts were made to examine measurement invariance previously.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Bembenutty, H. (2007). Self-regulation of learning and academic delay of gratification: Gender and ethnic differences among college students. Journal of Advanced Academics, 18, 586-616. doi:10.4219/jaa-2007-553

Bollen, K. A. (1989). Structural equations with latent variables. New York, NY: John Wiley.

Brown, A. L., Bransford, J. D., Ferrara, R. A., & Campione, J. C. (1982). Learning, remembering and understanding (Technical Report No. 244). Bethesda, MD: National Institute of Child Health and Human Development.

Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16-29. doi:10.1037/1082-989X.1.1.16

Dinsmore, D. L., Alexander, P. A., & Loughlin, S. M. (2008). Focusing the conceptual lens on metacognition, self-regulation, and self-regulated learning. Educational Psychology Review, 20, 391-409. doi:10.1007/s10648-008-9083-6

Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive–developmental inquiry. American Psychologist, 34, 906-911. doi:10.1037/0003-066X.34.10.906

Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6, 1-55. doi:10.1080/10705519909540118

Leonard, D. K., & Jiang, J. (1999). Gender bias and the college predictions of the SATs: A cry of despair. Research in Higher Education, 40, 375-407. doi:10.1023/A:1018759308259

Millsap, R. E. (2011). Statistical approaches to measurement invariance. New York, NY: Routledge.

Muthén, L. K., & Muthén, B. O. (2008). Mplus (Version 5.1) [Computer software]. Los Angeles, CA: Author.

Pintrich, P. R., Smith, D. A. F., Garcia, T., & McKeachie, W. J. (1991). A manual for the use of the Motivated Strategies for Learning Questionnaire (MSLQ). Ann Arbor, MI: National Center for Research to Improve Postsecondary Teaching and Learning.

Sass, D. A. (2011). Testing measurement invariance and comparing latent factor means within a confirmatory factor analysis framework. Journal of Psychoeducational Assessment, 29, 347-363. doi:10.1177/0734282911406661

Tock, J. L., & Moxley, J. H. (2016). A comprehensive reanalysis of the metacognitive self-regulation scale from the MSLQ. Metacognition and Learning, 12, 79-111. doi:10.1007/s11409-016-9161

Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3, 4-70. doi:10.1177/109442810031002