Longitudinal Measurement Invariance of Likert-Type Learning Strategy Scales

Abstract

Whether learning strategies change during the course of higher education is an important topic in the Student Approaches to Learning field. However, empirical evaluations of whether the instruments used in this research domain measure equivalently over time are scarce. Therefore, this study details the procedure of longitudinal measurement invariance testing of self-report Likert-type scales, using the case of learning strategies. The sample consists of 245 University College students who filled out the Inventory of Learning Styles–Short Version three times. Using the WLSMV estimator to take into account the ordinal nature of the data, a series of models with progressively more stringent constraints was estimated using Mplus 6.1. The results indicate that longitudinal measurement invariance holds for all but two learning strategy scales. The implications for longitudinal analysis using scales with varying degrees of measurement invariance are discussed.

Keywords

measurement invariance, longitudinal analysis, ordinal data, learning strategies

Educational researchers have long been interested in how students learn in higher education. One perspective on this issue is offered by the Students’ Approaches to Learning tradition (SAL), examining learners’ general preferences when it comes to learning (Biggs, Kember, & Leung, 2001). Researchers in the SAL field distinguish several dimensions of these preferences, such as processing and regulation strategies (Vermunt, 1996). The former are the cognitive activities that students apply when studying. The latter capture the different ways in which students regulate their learning. In assessing these learning strategies, self-report Likert-type questionnaires are mostly relied on (e.g., Study Process Questionnaire [SPQ]: Biggs et al., 2001; Inventory of Learning Styles [ILS]: Vermunt, 1996).

Research in the SAL field focuses increasingly on whether and how learning strategies change during the course of higher education (Vanthournout, Donche, Gijbels, & Van Petegem, 2011). An examination of how these studies are undertaken statistically reveals a strong reliance on comparisons of manifest scale scores over time. For each student, the scores on the items of each scale are averaged at each wave. Subsequently, in studies with two measurement waves, paired-samples t-tests are used to compare the means. When more than two measurement waves are involved, repeated measures ANOVA is used.
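As a minimal sketch of the manifest-score workflow just described, the following Python fragment averages item responses into one scale score per student and wave and then computes a paired-samples t statistic between two waves. The response values are invented for illustration, and the function names are hypothetical rather than taken from any of the cited studies.

```python
from math import sqrt
from statistics import mean, stdev

def scale_score(item_responses):
    """Manifest scale score: the mean of a student's item responses at one wave."""
    return mean(item_responses)

def paired_t(scores_a, scores_b):
    """Paired-samples t statistic and degrees of freedom for two waves."""
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Invented responses of 5 students to a 4-item scale at two waves.
wave1 = [[4, 4, 3, 5], [3, 3, 4, 2], [5, 4, 4, 4], [2, 3, 3, 2], [4, 5, 4, 3]]
wave2 = [[3, 4, 3, 4], [3, 2, 3, 2], [4, 4, 4, 3], [2, 2, 3, 2], [4, 4, 3, 3]]

scores1 = [scale_score(r) for r in wave1]
scores2 = [scale_score(r) for r in wave2]
t_stat, df = paired_t(scores1, scores2)
print(f"t({df}) = {t_stat:.2f}")
```

Such a test treats the averaged scores as if they were measured on the same ruler at both waves, which is exactly the assumption examined in the remainder of this article.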

However, such a straightforward comparison of manifest scale scores over time may be inappropriate when the measurement of the underlying constructs is not equivalent over time: the manifest mean (e.g., the manifest scale score for the Memorizing scale) depends not only on the latent mean (e.g., the true Memorizing score at each wave) but on the whole underlying measurement model (Steinmetz, Schmidt, Tina-Booth, Wieczorek, & Schwartz, 2009). Therefore, a longitudinal comparison always hinges on the assumption of longitudinal measurement invariance (Marsh & Grayson, 1994; Wu, Liu, Gadermann, & Zumbo, 2010). If the ruler does not measure equivalently over time, it is a daunting task to decide whether a change in the manifest scale scores is due to actual alterations in learning strategies over time (changes in the latent mean) or to changes in the measurement over time (Vaillancourt, Brengden, Boivin, & Tremblay, 2003). A measurement can, for example, be age- and treatment-sensitive: students with more experience of studying in higher education could interpret learning strategy items differently from novices. Thus, if the assumption of longitudinal measurement invariance is not confirmed, the validity of conclusions stemming from comparisons of manifest scale scores over time could be compromised (Shadish, Cook, & Campbell, 2002).

Nevertheless, an examination of the measurement model is generally neglected prior to the assessment of change over time (Li, Harmer, & Acock, 1996), perhaps due to a lack of familiarity with the assumption or with the method of analysis required to verify it. Yet, “[. . .] whereas it may be reasonable to assume the invariance of these properties over short intervals, this assumption becomes more problematic as time intervals become longer” (Marsh & Grayson, 1994, p. 334). Recently, research into changes in learning strategies has increasingly covered such longer time intervals (e.g., Donche & Van Petegem, 2009). Thus, the untested assumption of longitudinal measurement invariance is an evidential gap in the learning strategies literature that is becoming increasingly problematic.

In the methodological literature, testing for measurement invariance across samples (e.g., gender or cross-culturally) is well described (Byrne, 2010). Moreover, though rare in the student approaches to learning field, numerous applications of multi-sample invariance testing can be found in other social science domains (e.g., Petscher & Huijun, 2008). A large number of these studies rely on data gathered using self-report Likert-type questionnaires. However, the ordinal nature of these data is usually ignored by applying a maximum likelihood estimation procedure (Steinmetz et al., 2009). Studies showcasing measurement invariance testing with a distribution-free estimation procedure are scarce. In addition, measurement invariance testing in longitudinal designs differs from its multi-sample counterpart. Due to the repeated measurements, the responses at different time points are non-independent, which, when neglected, can lead to model misspecification (Wu et al., 2010). Moreover, since the number of parameters to be estimated increases rapidly with the number of time points, examining the measurement invariance of all scales together is computationally difficult. Each scale is therefore investigated separately (Vandenberg & Lance, 2000). In sum, the requirements placed on the error terms and the testing procedure differ for longitudinal data compared with multi-sample designs.

In this study, we aim to illustrate longitudinal measurement invariance testing in the SAL domain. By detailing each step in verifying whether or not learning strategy scales measure equivalently over time, we offer a practical guide to longitudinal measurement invariance testing using ordinal data. Moreover, the consequences for the analysis of longitudinal change using scales with varying degrees of measurement invariance are discussed. Therefore, regardless of the research domain tackled here, this study may also be of interest to researchers in other social science fields investigating longitudinal change with self-report Likert-type questionnaires.

Method

Instrument and Sample

As a learning strategy questionnaire, we chose the Inventory of Learning Styles–Short Version (ILS-SV; Donche & Van Petegem, 2008). This instrument is based on Vermunt’s Inventory of Learning Styles (Vermunt, 1996), which was tested cross-culturally (Boyle, Duffy, & Dunleavy, 2003) and is frequently used in longitudinal research (Vanthournout et al., 2011). The ILS-SV has been validated for 1st-year University College students, demonstrating the dimensionality of the Vermunt theory, good reliabilities, and theoretically sound construct validity (Donche & Van Petegem, 2008).

The ILS-SV questionnaire measures learning strategies consisting of processing and regulation strategies (see Table 1). The former are mapped using four scales: Memorizing, Analysing, Critical processing, and Relating and structuring. Three scales map regulation strategies: External regulation, Self-regulation, and Lack of regulation. All items are scored on a 5-point Likert-type scale, ranging from 1 (I never or hardly ever do this), 2 (I sometimes do this), 3 (Neutral), 4 (I often do this) to 5 (I (almost) always do this).

Table 1.

Learning Strategies of the ILS-SV Questionnaire, Scales, Number of Items, Item Examples and Range of Scale Reliability

Scales	Number of items	Item example	Mean inter-item correlation (range across waves)
Processing strategies
Memorizing	4	I learn definitions by heart and as literally as possible.	.34-.39
Analysing	4	I study each course book chapter point by point and look into each piece separately.	.33-.36
Critical processing	4	I try to understand the interpretations of experts in a critical way.	.32-.39
Relating and structuring	4	I compare conclusions from different teaching modules with each other.	.35-.46
Regulation strategies
External regulation	5	I study according to the instructions given in the course material.	.20-.27
Self-regulation	4	I use other sources to complement study materials.	.28-.35
Lack of regulation	4	I confirm that I find it difficult to establish whether or not I have sufficiently mastered the course material.	.31-.38

One cohort of students entering a Flemish University College was followed during their 3 years of higher education. In March of the 1st academic year (from September to June), all 1st-year students were administered the ILS-SV during scheduled lecture slots. The same cohort had the questionnaire administered again in May of the 2nd and the 3rd year. Though students were not rewarded or given feedback, adequate response rates were obtained each time (73.6%, 67%, and 69.8%, respectively). In total, 245 students participated in all three waves. Reliability analysis was conducted using the mean inter-item correlation, since Cronbach’s alpha values are very sensitive to the number of items (Palant, 2007). At each wave, all scales, each containing four or five items, met the .2 cut-off for good reliability (see Table 1).
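To make the reliability index concrete, the sketch below computes a mean inter-item correlation for one scale at one wave. It assumes Pearson correlations between item columns, which the text does not specify, and uses invented responses; the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long lists of item responses."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_inter_item_correlation(items):
    """Average correlation over all distinct item pairs (items = list of columns)."""
    pairs = list(combinations(items, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)

# Invented responses of five students to three items of one scale.
item_a = [1, 2, 3, 4, 5]
item_b = [2, 1, 4, 3, 5]
item_c = [1, 3, 2, 5, 4]

miic = mean_inter_item_correlation([item_a, item_b, item_c])
meets_cutoff = miic >= 0.2  # the .2 cut-off mentioned in the text
print(f"mean inter-item correlation = {miic:.2f}, meets cut-off: {meets_cutoff}")
```

Unlike Cronbach’s alpha, this index does not grow mechanically with the number of items, which is why it suits short scales of four or five items.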

Before detailing the measurement invariance testing procedure, we briefly explain the elements in play when assessing the change in learning strategy scales over time. A factor (e.g., the latent concept of Memorizing) is measured at three moments, each time using the same four items (see Figure 1; Y₁−Y₄).¹ The model attempts to predict an individual’s score on an item at a certain time (Y_ijt).

Figure 1.

Longitudinal measurement model

Y_{ijt} = τ_{jt} + λ_{jt} F_{it} + e_{ijt} where i = individual, j = item, t = time

In this prediction, three regression-like elements are key: the intercept (τ_jt), the factor loading (λ_jt) and the error (e_ijt) (Byrne, 2010; Wu et al., 2010; see Figure 1). The factor loading (λ_jt) represents the expected increase in Y for a one-unit increase in the factor (F_it). The intercept (τ_jt) can be understood as the value of Y when the latent variable (F_it) is zero. Therefore, it reflects the difficulty level or “[. . .] the ease in getting high manifest scores for a particular measured variable” (Marsh & Grayson, 1994, p. 336).

However, in our case, the items are ordinal. Therefore, there is not one intercept, but several thresholds. With a 5-point Likert-type scale, there are four thresholds (the number of scale points minus 1; Metha, Neale, & Flay, 2004). For example, τ_{3; time 2; threshold 1} expresses for Item 3 at Time 2 the difficulty level of scoring I sometimes do this (Likert point 2) compared to I never or hardly ever do this (Likert point 1) when the latent variable (F_it) is zero.
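The role of the thresholds can be illustrated with a small sketch. It assumes a probit-type model in which a continuous latent response, equal to the loading times the factor score plus a standard normal error, is cut into Likert categories at the thresholds. The threshold and loading values are hypothetical and are not estimates from this study.

```python
from statistics import NormalDist

def category_probabilities(thresholds, loading, factor_score):
    """Probability of each Likert category for one item at one wave.

    Assumes a latent response y* = loading * factor_score + e, e ~ N(0, 1);
    category k is observed when y* falls between thresholds k-1 and k.
    """
    phi = NormalDist().cdf
    # Cumulative probabilities P(Y <= k) for k = 1..K-1, then 1.0 for the top category.
    cumulative = [phi(tau - loading * factor_score) for tau in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

# Hypothetical values: four thresholds for a 5-point item and a loading of 0.7.
taus = [-1.5, -0.5, 0.5, 1.5]
at_zero = category_probabilities(taus, loading=0.7, factor_score=0.0)
at_one = category_probabilities(taus, loading=0.7, factor_score=1.0)

for k, (p0, p1) in enumerate(zip(at_zero, at_one), start=1):
    print(f"Likert point {k}: P(F=0) = {p0:.3f}, P(F=1) = {p1:.3f}")
```

In this reading, a lower threshold at a later wave means that the corresponding higher category is reached more easily at the same level of the latent variable.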

The third element in the equation is the measurement error (e_ijt). Due to the data’s longitudinal nature, it is plausible that errors pertaining to the same item (e.g., e₁₁, e₁₂ and e₁₃, see Figure 1) correlate over time (Vaillancourt et al., 2003). To prevent model misspecification, three error covariances are estimated per item (e.g., for Y₁: e₁₁-e₁₂, e₁₁-e₁₃ and e₁₂-e₁₃)² (Wu et al., 2010).

To assess change, the scores on the four items (Y’s) are usually averaged for each student per wave. Subsequently, manifest scale scores are compared over time. Conclusions are then drawn in terms of the underlying latent factors (F’s) (e.g., Memorizing decreases during higher education). Yet change in item scores over time (ΔY) can only be attributed to change in this latent factor (ΔF_it) when the other elements in the equation remain invariant over time (Byrne, 2010; Marsh & Grayson, 1994). However, due to the correlation between errors over time, and contrary to multi-group comparisons, error invariance is not expected in longitudinal measurement invariance testing (Wu et al., 2010). The longitudinal measurement invariance analysis of ordinal data thus consists of two elements: the invariance of factor loadings (λ’s) and of thresholds (τ’s).
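The argument above can be written out as a short derivation. It uses the continuous form of the measurement equation, assumes that the errors have zero mean, and introduces κ_t as notation (not used elsewhere in this article) for the latent mean at wave t.

```latex
\begin{aligned}
E(Y_{ijt}) &= \tau_{jt} + \lambda_{jt}\,\kappa_t, \qquad \kappa_t = E(F_{it}),\\
E(Y_{ij2}) - E(Y_{ij1}) &= (\tau_{j2} - \tau_{j1}) + (\lambda_{j2}\,\kappa_2 - \lambda_{j1}\,\kappa_1),\\
\text{if } \tau_{j1} = \tau_{j2} \text{ and } \lambda_{j1} = \lambda_{j2} = \lambda_j:\quad
E(Y_{ij2}) - E(Y_{ij1}) &= \lambda_j\,(\kappa_2 - \kappa_1).
\end{aligned}
```

Only in the last line does the change in the expected item score reflect the change in the latent mean alone. For ordinal items the same logic applies, with the set of thresholds taking the place of the single intercept.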

Procedure for Longitudinal Measurement Invariance Testing

In testing whether the measurement invariance hypothesis holds, successively more constrained models are estimated for each scale (see Figure 2; Muthén & Muthén, 2010). Because the data are ordinal, the use of the maximum likelihood estimation procedure could not be justified. Therefore, a distribution-free estimation procedure, the mean- and variance-adjusted weighted least squares (WLSMV) estimator, was employed (Metha et al., 2004; Muthén & Muthén, 2009) in Mplus 6.1.³

Figure 2.

Flowchart longitudinal measurement invariance testing

First, a baseline model is estimated, testing whether for each scale a unidimensional model holds at each measurement point (Vandenberg & Lance, 2000). To evaluate this, neither factor loadings nor thresholds are constrained to be equal over time, while the error covariances are included. Subsequently, an adequate fit is suggested by a CFI close to .95 (Hu & Bentler, 1999) and an RMSEA up to .08 (Byrne, 2010).

In the second model, for each item, the factor loadings (λ’s) are constrained to be equal over time (e.g., λ_{2 at time 1} = λ_{2 at time 2} = λ_{2 at time 3}; Wu et al., 2010). Subsequently, the hypothesis of invariance is evaluated by comparing the model fit of the more restricted invariant factor loadings model to the less restricted baseline model. To test this, the chi-square difference test (Δχ²) and the change in Comparative Fit Index criterion (ΔCFI) are relied on (Byrne, 2010; Vandenberg & Lance, 2000). For the former, the hypothesis of equal factor loadings over time is rejected when the chi-square difference test (Δχ²) has a probability lower than .05.⁴ For the latter, a decrease in CFI of .01 or more suggests that the invariance hypothesis should be rejected⁵ (Chueng & Rensvold, 2002). Failure to reject the hypothesis is interpreted as evidence that an increase of 1 in the factor score (F_it) produces the same increase (λ₂) in the item (Y₂) at each wave. If the hypothesis of equal factor loadings is rejected, this signifies that (at least) one of the items is more or less closely related to the underlying construct at one time point than at another (Cooke, Kosson, & Michie, 2001).
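The two decision criteria can be sketched in a few lines of Python. The p value helper uses the closed-form upper tail of the chi-square distribution, which exists only for even degrees of freedom. The decision function encodes one reading of the text, namely that invariance is retained only when both criteria are met. Function names are hypothetical. The Δχ² values in Table 2 are not the arithmetic differences of the reported χ² values, which is consistent with an adjusted difference test under WLSMV, so the sketch starts from the reported Δχ² rather than recomputing it.

```python
from math import exp, factorial

def chi2_sf_even_df(x, df):
    """Upper-tail p value of a chi-square statistic (closed form, even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))

def invariance_retained(p_value, delta_cfi, alpha=0.05, cfi_cutoff=-0.01):
    """Retain invariance only if neither criterion signals a loss of fit."""
    return p_value > alpha and delta_cfi > cfi_cutoff

# Memorizing scale, invariant loadings versus baseline (values from Table 2).
p_memorizing = chi2_sf_even_df(2.277, 6)
retained = invariance_retained(p_memorizing, 0.008)
print(f"p = {p_memorizing:.3f}, invariance retained: {retained}")
```

Under this reading, a comparison with p = .054 and ΔCFI = –.010 is still treated as a rejection, because the CFI decrease reaches the .01 cut-off.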

In this situation, additional models are warranted to identify the source(s) of the lack of equivalence. A high modification index (MI) and a large expected parameter change (EPC) for a factor loading suggest that the constraint on that loading needs to be freed (Muthén & Muthén, 2009). If such a partial factor loadings invariance model produces a non-significant loss of fit compared to the baseline model (p of Δχ²>.05; ΔCFI>–.01), all factor loadings can be assumed to be equal except the one freely estimated. If the model fit is still worse than that of the baseline model, the above procedure is repeated (see Figure 2).

Next, equality constraints on the thresholds (τ’s) are added. For each item, it is verified whether the difficulty level of going, for example, from “I often do this” to “I (almost) always do this,” remains constant over time (e.g., τ_{2 time 1; threshold 4} = τ_{2 time 2; threshold 4} = τ_{2 time 3; threshold 4}). A non-significant loss of fit of the invariant thresholds model compared to the (partially) invariant factor loadings model (p of Δχ²>.05; ΔCFI>–.01) suggests that the thresholds can be assumed to be equally difficult over time. Rejection of the equal thresholds hypothesis indicates that the difficulty level for (at least) one threshold varies over time (Metha et al., 2004). By freeing the constraint on the threshold that contributes most to the misfit according to the MI and EPC, a partial threshold invariance model is estimated.
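The stepwise search for a partial invariance model can be summarised as a loop. The sketch below is not the Mplus procedure itself. It assumes a user-supplied, hypothetical function `fit_and_compare` that re-estimates the constrained model with the listed constraints freed and reports the comparison with the reference model together with the next candidate suggested by the MI and EPC. The usage example replays the values reported for the Analysing scale in Table 2, with made-up parameter labels and 0.0009 standing in for the reported p < .001.

```python
def search_partial_invariance(fit_and_compare, max_freed):
    """Free one constraint at a time until both fit criteria are met.

    fit_and_compare(freed) must return (p_value, delta_cfi, candidate), where
    candidate labels the still-constrained parameter with the largest
    modification index, or None if no candidate is left.
    """
    freed = []
    while True:
        p_value, delta_cfi, candidate = fit_and_compare(tuple(freed))
        if p_value > 0.05 and delta_cfi > -0.01:
            return freed, True   # (partial) invariance retained
        if len(freed) >= max_freed or candidate is None:
            return freed, False  # give up: nothing left or too many freed
        freed.append(candidate)

# Replay of the Analysing scale comparisons, keyed by number of freed thresholds.
replay = {
    0: (0.0009, -0.015, "thr2_wave3"),  # hypothetical label
    1: (0.054, -0.010, "thr4_wave1"),   # hypothetical label
    2: (0.179, -0.006, None),
}

def replay_fit(freed):
    return replay[len(freed)]

freed_thresholds, retained = search_partial_invariance(replay_fit, max_freed=4)
print(freed_thresholds, retained)
```

The `max_freed` argument makes explicit that the search needs a stopping rule, since freeing ever more constraints eventually undermines the comparability the procedure is meant to establish.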

How many factor loadings and thresholds can be freed without jeopardising future longitudinal analysis constitutes a debate in the literature (Byrne, 2010; Marsh & Grayson, 1994). Differences in factor loadings are, however, perceived to be more serious in relation to bias than differences in thresholds (Cooke et al., 2001). Therefore, we judge complete invariance of factor loadings as a necessary condition for longitudinal analysis. Concerning the number of unequal thresholds that are tolerable, a minimum of two items for which all thresholds are invariant is suggested (Steinmetz et al., 2009).
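The acceptance rule adopted here can be expressed as a simple check: no freed factor loadings, and at least two items whose thresholds are all invariant. The function name and the tuple layout are hypothetical. The example for the Analysing scale uses a made-up item label, since only the fact that both freed thresholds belong to the same item matters.

```python
def acceptable_for_longitudinal_analysis(n_items, freed_loadings,
                                         freed_thresholds, min_items=2):
    """Check the minimal degree of partial invariance used in this study.

    freed_loadings: iterable of (item, wave) tuples.
    freed_thresholds: iterable of (item, threshold, wave) tuples.
    """
    if list(freed_loadings):
        return False  # complete loading invariance is treated as necessary
    items_with_freed = {item for item, _threshold, _wave in freed_thresholds}
    fully_invariant_items = n_items - len(items_with_freed)
    return fully_invariant_items >= min_items

# Analysing scale: 4 items, two thresholds freed on the same (here unnamed) item.
analysing_ok = acceptable_for_longitudinal_analysis(
    n_items=4,
    freed_loadings=[],
    freed_thresholds=[("item_x", 2, 3), ("item_x", 4, 1)],
)
print(analysing_ok)
```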

Results

Processing Strategies

The baseline model of the Memorizing scale showed adequate fit (see Table 2), indicating that the Memorizing scale is unidimensional at each measurement wave. In testing the invariance of the factor loadings, a non-significant loss of fit with respect to the unconstrained baseline model was obtained (Δχ² = 2.277, Δdf = 6, p = .89; ΔCFI = .008). The discrepancy between the invariant thresholds model and the invariant factor loadings model also satisfied the minimum criteria for invariance (Δχ² = 13.378, Δdf = 22, p = .92; ΔCFI = .003). Complete longitudinal measurement invariance can thus be assumed for the Memorizing scale.

Table 2.

Results From Measurement Invariance Tests for Processing and Regulation Strategy Scales

	Model description	χ²	df	CFI	RMSEA	Δχ²	Δdf	p	ΔCFI
Memorizing	Baseline	54.670	39	.989	.040
	Invariant loadings	49.493	45	.997	.020	2.277	6	.893	.008
	Invariant thresholds	66.127	67	1.000	.000	13.378	22	.922	.003
Analysing	Baseline	82.231	39	.967	.067
	Invariant loadings	77.376	45	.975	.054	4.115	6	.661	.008
	Invariant thresholds	118.445	67	.960	.056	40.574	22	***	–.015
	Partial threshold invariance	111.011	66	.965	.053	32.320	21	.054	–.010
	Partial threshold invariance	105.520	65	.969	.050	25.599	20	.179	–.006
Critical processing	Baseline	47.445	39	.994	.030
	Invariant loadings	56.875	45	.992	.033	9.278	6	.158	–.002
	Invariant thresholds	82.600	67	.989	.031	22.637	22	.422	–.003
Relating and structuring	Baseline	64.925	39	.986	.052
	Invariant loadings	67.655	45	.988	.045	8.912	6	.179	.002
	Invariant thresholds	94.712	67	.985	.041	22.429	22	.435	–.003
External regulation	Baseline	126.532	72	.950	.056
	Invariant loadings	122.224	80	.961	.046	5.342	8	.72	.011
	Invariant thresholds	168.162	108	.945	.048	47.944	28	*	–.016
	Partial threshold invariance	158.145	107	.953	.044	34.424	27	.154	–.008
Self-regulation	Baseline	50.903	39	.991	.035
	Invariant loadings	74.066	45	.998	.014	1.809	6	.936	.007
	Invariant thresholds	67.851	65	.998	.013	19.990	20	.459	.000
Lack of regulation	Baseline	90.467	39	.969	.073
	Invariant loadings	73.890	45	.983	.051	2.685	6	.847	.014
	Invariant thresholds	103.456	67	.978	.047	26.346	22	.237	–.005

*p < .05. **p < .01. ***p < .001.

For the Analysing scale, the baseline model also shows adequate model fit and constraining the factor loadings does not alter the model fit significantly (Δχ² = 4.115, Δdf = 6, p = .66; ΔCFI = .008). However, the invariant thresholds hypothesis is rejected (Δχ² = 40.574, Δdf = 22, p < .001; ΔCFI = –.015). The second threshold (going from I sometimes do this to Neutral) of the item “I study each course book chapter point by point and look into each piece separately” is less difficult at the third wave (MI = 6.836, EPC = –.180). Relaxing the constraint on this threshold did not improve model fit sufficiently (Δχ² = 32.320, Δdf = 21, p = .054; ΔCFI = –.010). A re-examination of the modification indices pointed anew to the same item: the difficulty of answering I (almost) always do this is higher at the first wave (MI = 5.732, EPC = .180). Allowing this threshold to be freely estimated provided a model that was statistically indistinguishable from the equal factor loadings model (Δχ² = 25.599, Δdf = 20, p = .18; ΔCFI = –.006). The results for the Analysing scale thus suggested factor loadings invariance and the equality of all but two thresholds pertaining to the same item.

Concerning Critical processing, the baseline model suggests an adequate model fit. The hypothesis of invariant factor loadings was not rejected (Δχ² = 9.278, Δdf = 6, p = .16; ΔCFI = –.002) and constraining the thresholds did not decrease model fit (Δχ² = 22.637, Δdf = 22, p = .42; ΔCFI = –.003). The results for the Relating and structuring scale paint a similar picture. Both the factor loadings and the thresholds can be presumed to be equal over time (respectively, Δχ² = 8.912, Δdf = 6, p = .18; ΔCFI = .002 and Δχ² = 22.429, Δdf = 22, p = .44; ΔCFI = –.003). Consequently, for the scales Critical processing and Relating and structuring, the results indicate complete longitudinal invariance of factor loadings and thresholds.

Regulation Strategies

The fit of the baseline model of the External regulation strategy scale suggests that the unidimensionality of the scale holds over the three waves. Constraining factor loadings did not produce a significant worsening of fit (Δχ² = 5.342, Δdf = 8, p = .72; ΔCFI = .011), while the invariant thresholds model did (Δχ² = 47.944, Δdf = 28, p < .05; ΔCFI = –.016). The item “I study according to the instructions given in the course material” failed to reveal invariance at the second measurement wave for the fourth threshold (MI = 10.008, EPC = –.536). It was less difficult to answer I (almost) always do this in the 2nd year. Freeing this threshold resulted in a partial threshold invariance model that did not fit significantly worse than the invariant factor loadings model (Δχ² = 34.424, Δdf = 27, p = .15; ΔCFI = –.008; see Table 2). Results for the External regulation scale thus suggest invariance over time of the factor loadings and all but one threshold.

Concerning the second scale, Self-regulation, the baseline model shows adequate fit and the hypothesis of factor loading invariance is not rejected (Δχ² = 1.809, Δdf = 6, p = .94; ΔCFI = .007). Constraining thresholds over time, however, proves problematic for the item “I use other sources to complement study materials.” At both the second and the third wave, no students answered I (almost) always do this. At the first measurement wave, this answer was chosen by less than 1% of the students. The invariant thresholds model (not estimating the two absent thresholds) fitted the data as well as the invariant factor loadings model (Δχ² = 19.990, Δdf = 20, p = .46; ΔCFI = .000), indicating that the measurement of Self-regulation can be assumed equivalent over time.

Lastly, for the Lack of regulation scale, the model fit for the baseline model suggests unidimensionality, and the discrepancy between the invariant factor loadings model and the baseline model satisfied the minimum criteria for invariance (Δχ² = 2.685, Δdf = 6, p = .85; ΔCFI = .014). Moreover, constraining the thresholds over time produces a non-significant loss of fit (Δχ² = 26.346, Δdf = 22, p = .24; ΔCFI = –.005). It is therefore concluded that the Lack of regulation scale measures equivalently over time.

Discussion

In the student approaches to learning (SAL) field, a growing number of studies have examined whether and how learning strategies evolve over the course of higher education. To assess this, comparisons of manifest scale scores over time by means of t-tests and repeated measures ANOVA are used. An often overlooked assumption of these techniques is that the ruler needs to measure equivalently at each wave. Taking the case of the learning strategies scales, the current study therefore illustrates the longitudinal measurement invariance testing procedure with ordinal Likert-type data.

The results confirm at least partial measurement invariance for the four processing and the three regulation scales of the ILS-SV. All factor loadings of the scales proved invariant across measurement waves, as did all thresholds of at least two items per scale. This is a promising result, since significance testing of observed mean differences is only permitted if this minimal degree of partial invariance is confirmed (Steinmetz et al., 2009). However, which analytical technique is most appropriate depends on the degree of invariance of a scale.

For five learning strategy scales, complete measurement invariance was confirmed, ensuring a comparable definition of the latent construct over time. In this situation, traditional statistical comparison procedures such as t-tests or repeated measures ANOVA on manifest scale scores are non-problematic (Steinmetz et al., 2009; Vandenberg & Lance, 2000). For the External regulation and Analysing scales, one and two thresholds, respectively, failed to show equivalence across measurement moments. Such non-invariant thresholds can seriously hamper the comparison of manifest scale scores, since it is difficult to disentangle genuine changes in the underlying latent variable from artefacts of shifts in the difficulty level (Steinmetz et al., 2009). Therefore, it is suggested that researchers refrain from traditional statistical procedures and explicitly model the small number of non-invariant parameters via a structural equation modelling procedure such as a multiple indicator latent growth model (Marsh & Grayson, 1994; Vandenberg & Lance, 2000).

Limitations and Future Studies

Certain limitations of the current study suggest additional avenues for future research. First, there are different techniques to assess measurement invariance. Here, the approach based on confirmatory factor analysis was used, while Fidalgo and Scalon (2010), for example, relied upon an IRT-based differential item functioning technique. It would be interesting to assess the impact of these different techniques for longitudinal measurement invariance testing. Second, when equivalence of the measurement model is established, the structural invariance can be assessed. For example, the evolution of the correlation between scales can be substantively relevant (Chueng & Rensvold, 2002). In the SAL field it is, for example, theoretically viable for scales to be differently related over time. Third, the results from this study cannot be generalized to other educational contexts, cultures, learning strategy questionnaires, or samples. Comparable to reliability, longitudinal measurement invariance should be assessed anew in each specific sample (Guttmannova, Szanyi, & Cali, 2008).

The limitations of the present study notwithstanding, the results provide apparent support for the need for longitudinal measurement equivalence testing. As was succinctly stated by Wu and colleagues (2010), “[. . .] establishing temporal measurement invariance is the prerequisite for analyzing change” (p. 126). We therefore hope to have provided a clear illustration of the longitudinal measurement invariance testing procedure in the case of ordinal data stemming from Likert-type questionnaires.

Footnotes

Acknowledgements

The authors would like to thank Huub van den Bergh (University of Utrecht) for his thorough comments on earlier drafts of this article. They would also like to express their gratitude to Linda Muthén for her advice and swift replies on data modeling issues.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research reported in this article was supported by a grant from the “Special Research Fund: New Research Initiatives” from the Research Board of the University of Antwerp. Opinions reflect those of the authors and do not necessarily reflect those of the granting agency.

Notes

References

Biggs, Kember, & Leung (2001). The revised two-factor Study Process Questionnaire: R-SPQ-2F. British Journal of Educational Psychology, 71(1), 133-149.

Boyle, Duffy, & Dunleavy (2003). Learning styles and academic outcome: The validity and utility of Vermunt’s Inventory of Learning Styles in a British higher education setting. British Journal of Educational Psychology, 73, 267-290.

Byrne, B. M. (2010). Structural Equation Modeling with AMOS. New York, NY: Routledge.

Chueng, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 9, 233-255.

Cooke, D. J., Kosson, D. S., & Michie (2001). Psychopathy and ethnicity: Structural, item, and test generalizability of the Psychopathy Checklist–Revised (PCL-R) in Caucasian and African American participants. Psychological Assessment, 13, 531-542.

Donche & Van Petegem (2008). The validity and reliability of the Short Inventory of Learning Patterns. In Cools (Ed.), Style and cultural differences: How can organisations, regions and countries take advantage of style differences (pp. 49-59). Gent, Belgium: Vlerick Leuven Gent Management School.

Donche & Van Petegem (2009). The development of learning patterns of student-teachers: A cross-sectional and longitudinal study. Higher Education, 57, 463-475.

Fidalgo, A. M., & Scalon, J. D. (2010). Using generalized Mantel-Haenszel statistics to assess DIF among multiple groups. Journal of Psychoeducational Assessment, 28(1), 60-69.

Guttmannova, Szanyi, J. M., & Cali, P. W. (2008). Internalizing and externalizing behavior problem scores: Cross-ethnic and longitudinal measurement invariance of the Behavior Problem Index. Educational and Psychological Measurement, 68, 676-694.

Hu & Bentler, P. M. (1999). Cut-off criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1-55.

Li, Harmer, & Acock (1996). The task and ego orientation in sport questionnaire: Construct equivalence and mean differences across gender. Research Quarterly for Exercise and Sport, 67, 228-238.

Marsh, H. W., & Grayson (1994). Longitudinal stability of latent means and individual differences: A unified approach. Structural Equation Modeling, 1, 317-359.

Meade, A. W., Johnson, E. C., & Braddy, P. W. (2008). Power and sensitivity of alternative fit indices in tests of measurement invariance. Journal of Applied Psychology, 93, 568-592.

Metha, P. D., Neale, M. C., & Flay, B. R. (2004). Squeezing interval change from ordinal panel data: Latent growth curves with ordinal outcomes. Psychological Methods, 9, 301-333.

Muthén, L. K., & Muthén, B. O. (2009). Mplus User’s Guide (5th ed.). Los Angeles, CA: Author.

Muthén, L. K., & Muthén, B. O. (2010). Growth modeling with latent variable using Mplus: Advanced growth models, survival analysis and missing data (Mplus Short Courses). Los Angeles, CA: Author.

Palant (2007). SPSS survival manual: A step by step guide to data analysis using SPSS for Windows (3rd ed.). New York, NY: Open University Press.

Petscher & Huijun (2008). Measurement invariance of the Chinese Gifted Rating Scales: Teacher and parent forms. Journal of Psychoeducational Assessment, 26, 274-286.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

Steinmetz, Schmidt, Tina-Booth, Wieczorek, & Schwartz, S. H. (2009). Testing measurement invariance using multigroup CFA: Differences between educational groups in human values measurement. Quality & Quantity, 43, 599-616.

Vaillancourt, Brengden, Boivin, & Tremblay, R. E. (2003). A longitudinal confirmatory factor analysis of indirect and physical aggression: Evidence of two factors over time. Child Development, 74, 1628-1638.

Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3(1), 4-69.

Vanthournout, Donche, Gijbels, & Van Petegem (2011). Further understanding learning in higher education: A systematic review on longitudinal research using Vermunt’s learning pattern model. In Rayner & Cools (Eds.), Style differences in cognition, learning and management: Theory, research and practice (pp. 78-96). London, UK: Routledge.

Vermunt (1996). Metacognitive, cognitive and affective aspects of learning styles and strategies: A phenomenographic analysis. Higher Education, 31(1), 25-50.

Wu, Liu, Gadermann, A. M., & Zumbo, B. D. (2010). Multiple-indicator multilevel growth model: A solution to multiple methodological challenges in longitudinal studies. Social Indicators Research, 97(2), 123-142.