Measurement Invariance in Cross-National Studies

Abstract

The increasing availability of large international surveys with repeated cross sections or panel data has greatly expanded the opportunities for social researchers to perform cross-national and/or longitudinal comparisons. Prominent examples are the European Social Survey (ESS), the International Social Survey Programme (ISSP), the European Values Study (EVS), the World Values Survey (WVS), the European Household Panel Study (EHPS), the Programme for International Student Assessment (PISA), and the Global Entrepreneurship Monitor (GEM). All of these studies include scores on latent factors (such as human values, attitudes, opinions, or behavioral patterns). These scores entail measurements of, for example, attitudes toward immigration or minorities, national identity, basic human values, gender roles, social and political trust, or well-being, to name just a few. All of these studies allow comparisons of countries at one or multiple points in time (i.e., a cross-sectional or longitudinal comparison).

If latent variable scores are to be meaningfully compared across countries and/or over time, the measurement structures underlying these latent factors should be stable, that is, “invariant” (Davidov et al. 2014; Davidov et al. 2018; Millsap 2011). Most studies to date have not tested for invariance. Those studies that did examine the “measurement invariance” (MI) of their instruments have shown that the requirement of invariance across groups such as cultures, countries, or data collection modes is very hard to meet. In particular, strict forms of MI such as scalar invariance, which imposes identical factor loadings and indicator intercepts across the groups to be compared, often do not hold. Even partial scalar invariance, which requires equality of the parameters of only at least two items across groups, is seldom achieved. Indeed, failure to achieve at least partial metric and scalar invariance is common when large-scale international comparative or longitudinal (survey) data cover a large number of countries or time points (Marsh et al. 2018; Sokolov 2018). When partial scalar invariance does not hold, it may indicate that individuals in different groups responded differently to the survey items and that latent factor means cannot be compared in a meaningful and valid way. If even partial metric invariance is not found, the comparison of regression coefficients is hindered as well.
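The invariance levels referred to above can be stated compactly for a single-factor model. The following is a sketch in common multiple-group factor analysis notation; the symbols (x, ν, λ, η, ε, α, ψ) are our notational assumptions and are not taken from the studies in this issue.

```latex
% Response of person i in group g to item j (single-factor model),
% assuming E(\varepsilon_{ijg}) = 0
x_{ijg} = \nu_{jg} + \lambda_{jg}\,\eta_{ig} + \varepsilon_{ijg},
\qquad \mathrm{E}(\eta_{ig}) = \alpha_g, \quad \mathrm{Var}(\eta_{ig}) = \psi_g

% Configural invariance: same pattern of loadings; \nu_{jg}, \lambda_{jg} free in every group
% Metric invariance:     \lambda_{jg} = \lambda_j for all g
% Scalar invariance:     \lambda_{jg} = \lambda_j and \nu_{jg} = \nu_j for all g

% Under scalar invariance the expected item score is
\mathrm{E}(x_{ijg}) = \nu_j + \lambda_j\,\alpha_g
```

Under scalar invariance, differences in expected item scores across groups therefore reflect only differences in the factor means α_g. Partial invariance retains these equalities for a subset of items (at least two per factor) and frees them for the remaining items.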

Are the methods applied thus far too strict? Recent developments in the field of multiple-group alignment optimization (Asparouhov and Muthén 2013) using a component loss function (Jennrich 2006) and Bayesian statistics (Muthén and Asparouhov 2012; van de Schoot et al. 2013) provide new tools for assessing MI and imposing exact or more relaxed (approximate) forms of MI.

One of these developments is the alignment optimization procedure. It first estimates a configural model with free loadings and intercepts across groups and fixed factor means and variances. It then chooses factor means and variances that optimize a simplicity function. This function favors solutions with a few large noninvariant parameters and many approximately invariant parameters over solutions with many medium-sized noninvariant parameters. Simulation studies show that the alignment method works very well unless a majority of the parameters are significantly noninvariant or group sizes are small. The alignment method has the advantage of being easy to use and suitable for the comparison of many groups.
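The logic of the simplicity function can be sketched in a few lines of Python. This is a minimal illustration, not the Mplus implementation. The component loss f(x) = (x² + ε)^(1/4), the weights sqrt(N_g1 · N_g2), the value ε = 0.01, and all function names and toy numbers are our assumptions, based on our reading of the alignment literature. The sketch only evaluates the function at two candidate solutions and does not perform the optimization.

```python
import math
from itertools import combinations


def component_loss(x, eps=0.01):
    """Component loss (x^2 + eps)^(1/4); eps is an assumed small constant."""
    return math.sqrt(math.sqrt(x * x + eps))


def align(lam0, nu0, alpha, psi):
    """Rescale one group's configural loadings and intercepts for a
    candidate factor mean (alpha) and factor variance (psi)."""
    sd = math.sqrt(psi)
    lam1 = [l / sd for l in lam0]
    nu1 = [n - alpha * l / sd for n, l in zip(nu0, lam0)]
    return lam1, nu1


def simplicity(lams, nus, alphas, psis, sizes):
    """Weighted sum of component losses over all group pairs and items."""
    aligned = [align(l, n, a, p) for l, n, a, p in zip(lams, nus, alphas, psis)]
    total = 0.0
    for g1, g2 in combinations(range(len(aligned)), 2):
        weight = math.sqrt(sizes[g1] * sizes[g2])
        (l1, n1), (l2, n2) = aligned[g1], aligned[g2]
        for j in range(len(l1)):
            total += weight * component_loss(l1[j] - l2[j])
            total += weight * component_loss(n1[j] - n2[j])
    return total


# Hypothetical toy data: three items whose true parameters are invariant.
# Group 2 truly has factor mean 0.5 and variance 4, so its configural
# estimates (obtained with mean 0, variance 1) look noninvariant.
true_lam = [1.0, 0.8, 1.2]
true_nu = [0.0, 0.5, -0.3]
lams = [true_lam, [2.0 * l for l in true_lam]]
nus = [true_nu, [n + 0.5 * l for n, l in zip(true_nu, true_lam)]]
sizes = [1000, 1000]

configural_loss = simplicity(lams, nus, [0.0, 0.0], [1.0, 1.0], sizes)
aligned_loss = simplicity(lams, nus, [0.0, 0.5], [1.0, 4.0], sizes)
print(configural_loss, aligned_loss)
```

In this toy case the simplicity function is smaller at the true factor mean and variance of the second group than at the configural starting values, because the rescaled loadings and intercepts then coincide across groups. Because the loss grows roughly with the square root of the absolute difference, it penalizes many medium-sized differences more heavily than a few large ones.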

Another development is the estimation of approximate rather than exact MI. By using Bayesian structural equation modeling (Lee 2007; Muthén and Asparouhov 2012), the strictness of ideal forms of MI may be relaxed. In particular, exact zero constraints on the cross-group differences between all relevant measurement parameters (e.g., factor loadings and/or indicator intercepts) are replaced by “approximate” zero constraints (Muthén and Asparouhov 2012, 2013; van de Schoot et al. 2013). Instead of forcing intercepts, for example, to be exactly equal across groups, an informative prior distribution centered on zero with a small variance is placed on their differences. This brings the parameters closer to one another while allowing for some “wiggle room” (i.e., some deviation from zero is allowed). Because approximate MI was introduced rather recently, a substantial amount of applied research and Monte Carlo studies is needed to explore its possibilities and limitations in the application to survey data. This is relevant both for the choice of adequate fit measures (Asparouhov and Muthén 2017; Hoijtink and van de Schoot 2018) and for the choice of the prior values for the parameters.
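The effect of such a prior can be illustrated with a deliberately simplified, one-parameter example. The sketch below is not Bayesian structural equation modeling as implemented in any software. It assumes a single cross-group difference with a normally distributed estimate and a normal prior, so that the posterior is available in closed form. The function names, the prior variance of 0.01, and the example numbers are hypothetical choices made for illustration.

```python
import math


def prior_interval(prior_var, z=1.96):
    """Central 95% interval of a zero-mean normal prior (the 'wiggle room')."""
    half_width = z * math.sqrt(prior_var)
    return (-half_width, half_width)


def shrunken_difference(observed_diff, se, prior_var):
    """Posterior mean and sd of a cross-group difference under a
    N(0, prior_var) prior and a normal likelihood with standard error se."""
    prior_precision = 1.0 / prior_var
    data_precision = 1.0 / se ** 2
    post_precision = prior_precision + data_precision
    post_mean = observed_diff * data_precision / post_precision
    post_sd = math.sqrt(1.0 / post_precision)
    return post_mean, post_sd


# Hypothetical intercept difference of 0.30 estimated with standard error 0.10.
approx_mean, approx_sd = shrunken_difference(0.30, 0.10, prior_var=0.01)
# A near-zero prior variance approaches an exact equality constraint.
exact_mean, _ = shrunken_difference(0.30, 0.10, prior_var=1e-8)
# A very large prior variance leaves the difference essentially free.
free_mean, _ = shrunken_difference(0.30, 0.10, prior_var=1e6)

print(prior_interval(0.01))
print(approx_mean, exact_mean, free_mean)
```

With a prior variance as large as the squared standard error, the difference is pulled halfway toward zero. A vanishing prior variance reproduces exact invariance, and a very large one leaves the parameter free. The choice of the prior variance therefore determines how much noninvariance is tolerated.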

Finally, research has begun to investigate the noninvariance pattern of factor loadings and intercepts systematically from a sociological perspective rather than treating invariance only as a methodological prerequisite for meaningful cross-country comparisons of latent means and regression coefficients. From the perspective of cross-cultural research, it can be very instructive to study which contextual factors (e.g., ethnic or religious diversity in a country, policies, media coverage, or economic conditions) explain why the loadings or intercepts of items differ across groups (Davidov et al. 2012; Jak, Oort, and Dolan 2013).
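One way to formalize this idea, given here only as a sketch, is to let a group-specific intercept depend on a country-level variable. The notation (ν, γ, Z, u) is our assumption and is not taken from the cited studies.

```latex
% Intercept of item j in country g as a function of a contextual variable Z_g
\nu_{jg} = \nu_j + \gamma_j\,Z_g + u_{jg}

% \gamma_j = 0 and \mathrm{Var}(u_{jg}) = 0: the intercept of item j is invariant
% \gamma_j \neq 0: part of the intercept noninvariance of item j is associated with Z_g
```

A nonzero γ_j then indicates that the contextual variable Z_g accounts for part of the cross-country variation in the intercept of item j, while the residual u_{jg} captures the noninvariance that remains unexplained.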

This special issue is dedicated to exploring developments in the field of MI testing. The four studies in the volume present newer approaches for addressing MI testing, and they demonstrate their application in testing for MI in cross-national survey data. Below, we briefly present each of the studies.

The first study, “Recent methods for the study of measurement invariance with many groups: Alignment and random effects,” by Bengt Muthén and Tihomir Asparouhov, reviews and compares recently proposed factor analytic and item response theory approaches to the study of invariance across groups. The first is the alignment procedure and the second is two-level modeling with random item parameters. While the former considers the groups as fixed, the latter samples the groups from a population. Both approaches are proposed as a means to deal with the complex situation when many groups are included in the analysis. The authors illustrate their use on a large sample with 26 groups and about 50,000 respondents from the ESS. A simulation study compares the two models, and the final section lists the pros and cons of each method.

The second study, “Testing for approximate MI of human values in the ESS,” by Jan Cieciuch, Eldad Davidov, René Algesheimer, and Peter Schmidt, conducts an approximate MI test of human values across ESS countries. It uses the 21-Item Portrait Value Questionnaire and a sample of about 275,000 respondents from 15 countries participating in all six ESS rounds. It compares the findings to those using the exact MI test published in previous studies. Whereas previous studies could not establish exact scalar invariance across countries, this study established cross-country approximate MI in each ESS round for two higher-order values: openness to change and self-enhancement. In the case of the two other higher-order values, self-transcendence and conservation, approximate MI was established across a subset of countries.

The third study, “Measurement invariance in comparing attitudes towards immigrants among youth across Europe in 1999 and 2009: The alignment method applied to International Association for the Evaluation of Educational Achievement (IEA) CIVED and ICCS,” by Ingrid Munck, Carolyn Barber, and Judith Torney-Purta, applies the alignment method to the analysis of adolescents’ support of immigrants’ rights. It uses a pooled data set from the 1999 IEA Civic Education Study and the 2009 IEA International Civics and Citizenship Education Study. The data include 92 groups (country by cohort by gender). It shows that students’ attitudes toward immigrants’ rights in 2009 were more positive than in 1999, and it compares the groups according to their factor means, a comparison that is easy to perform when using the alignment procedure, even when there are many groups in the analysis.

Finally, the fourth study, “Explaining measurement nonequivalence using multilevel structural equation modeling: The case of attitudes toward citizenship rights,” by Eldad Davidov, Hermann Dülmer, Jan Cieciuch, Anabel Kuntz, Daniel Seddig, and Peter Schmidt, addresses a problem faced by many researchers testing for MI in survey data, namely, that many scales are noninvariant across groups such as countries or cultures. Rather than treating nonequivalence only as an obstacle, it demonstrates how nonequivalence may serve as a useful source of information as to why equivalence is not obtained. Multilevel structural equation modeling is used to explain the absence of MI by introducing a contextual variable, the percentage of foreigners in the country, to explain items’ nonequivalence. The study uses ISSP data from the national identity module (2003) on attitudes toward granting citizenship rights to immigrants. Thus, the study shows that the method does not necessarily rectify nonequivalence, but it can help to explain why equivalence is absent.

We hope the special issue and its contributions will help researchers in conducting meaningful sociological comparative research, in their endeavors to examine MI across different groups (e.g., countries, cultures, or time points), and in their efforts to understand sources of differences in measurements across groups.

Acknowledgments

The authors would like to thank Christopher Winship for his enthusiasm for the topic, his willingness to dedicate a special issue to it, and his continuous support. The authors thank the reviewers who provided insightful comments on the studies in the volume as well as Genevieve Butler for her continuous support in the production of this volume. Two of the guest editors are indebted to the co-guest editor, Bengt Muthén, as well as to Tihomir Asparouhov and Linda Muthén for developing and integrating the large selection of methods to test for MI in the software package Mplus. Eldad Davidov would like to thank the University of Zurich Research Priority Program for its support during work on this special issue. Peter Schmidt would like to thank the Humboldt fellowship of the Polish Foundation for basic research. The guest editors are also thankful for the suggestions provided by, cooperation with, support of, and fruitful and exciting discussions on the topic with many colleagues and fellow researchers including René Algesheimer, Constanze Beierlein, Jaak Billiet, Michael Braun, Jan Cieciuch, Alain de Beuckelaer, Hermann Dülmer, Remco Feskens, Joop Hox, Timothy Johnson, Anabel Kuntz, Katharina Meitinger, Bart Meuleman, Ingrid Munck, Daniel Oberski, Artur Pokropek, Rebeca Raijman, Jost Reinecke, Maxim Rudnev, Willem Saris, Elmar Schlüter, Shalom Schwartz, Daniel Seddig, Moshe Semyonov, Holger Steinmetz, Rens van de Schoot, Fons van de Vijver, William van der Veld, and Florian Zercher, to name just a few. We would also like to thank the participants of the recent conferences where we presented these ideas, including the European Survey Research Association (ESRA), the 3MC conference, and the Comparative Survey Design and Implementation (CSDI) Working Group organized by Peter Mohler and by the late Janet Harkness, among others. Finally, the guest editors would like to thank Lisa Trierweiler for the English proofreading of much of the material included in this volume.

References

Asparouhov, Tihomir, and Bengt Muthén. 2013. “Multiple Group Factor Analysis Alignment.” Structural Equation Modeling 21:495–508. doi: 10.1080/10705511.2014.919210.

Asparouhov, Tihomir, and Bengt Muthén. 2017, April 27. “Prior-Posterior Predictive P-Values.” Mplus Web Notes: No. 22, Version 2. Retrieved May 26, 2017 (http://www.statmodel.com/examples/webnotes/webnote22.pdf).

Davidov, Eldad, Jan Cieciuch, Bart Meuleman, Peter Schmidt, and Jaak Billiet. 2014. “Measurement Equivalence in Cross-National Research.” Annual Review of Sociology 40:55–75. doi: 10.1146/annurev-soc-071913-043137.

Davidov, Eldad, Hermann Dülmer, Elmar Schlüter, Peter Schmidt, and Bart Meuleman. 2012. “Using a Multilevel Structural Equation Modeling Approach to Explain Cross-Cultural Measurement Noninvariance.” Journal of Cross-Cultural Psychology 43(4):558–75. doi: 10.1177/0022022112438397.

Davidov, Eldad, Peter Schmidt, Jaak Billiet, and Bart Meuleman. 2018. Cross-Cultural Analysis: Methods and Applications. 2nd ed. New York: Routledge.

Gibson, Rachel K., Max Haller, Markus Hadler, Franz Hoellinger, Lilia Dimova, Heather Pyman, and Jon H. Pammett. 2012. International Social Survey Programme: National Identity II - ISSP 2003.

Hoijtink, Herbert, and Rens van de Schoot. 2018. “Testing Small Variance Priors Using Prior-Posterior Predictive P-Values.” Psychological Methods 23:561–69. doi: 10.1037/met0000131.

Jak, Suzanne, Frans J. Oort, and Conor V. Dolan. 2013. “A Test for Cluster Bias: Detecting Violations of Measurement Invariance across Clusters in Multilevel Data.” Structural Equation Modeling 20:265–82. doi: 10.1080/10705511.2013.769392.

Jennrich, Robert I. 2006. “Rotation to Simple Loadings Using Component Loss Functions: The Oblique Case.” Psychometrika 71:173–91. doi: 10.1007/s11336-003-1136-B.

Lee, Sik-Yum. 2007. Structural Equation Modeling: A Bayesian Approach. Chichester, England: Wiley.

Marsh, Herbert W., Jiesi Guo, Philip D. Parker, Benjamin Nagengast, Tihomir Asparouhov, Bengt Muthén, and Theresa Dicke. 2018. “What to Do When Scalar Invariance Fails: The Extended Alignment Method for Multi-Group Factor Analysis Comparison of Latent Means across Many Groups.” Psychological Methods 23:524–45. doi: 10.1037/met0000113.

Millsap, Roger E. 2011. Statistical Approaches to Measurement Invariance. New York: Taylor and Francis Group.

Muthén, Bengt, and Tihomir Asparouhov. 2012. “Bayesian SEM: A More Flexible Representation of Substantive Theory.” Psychological Methods 17:313–35. doi: 10.1037/a0026802.

Muthén, Bengt, and Tihomir Asparouhov. 2013, January 11. “BSEM Measurement Invariance Analysis.” Mplus Web Notes: No. 17. Retrieved May 26, 2017 (http://www.statmodel.com/examples/webnotes/webnote17.pdf).

Sokolov, Boris. 2018. “The Index of Emancipative Values: Measurement Model Misspecifications.” American Political Science Review 112(2):395–408. doi: 10.1017/S0003055417000624.

van de Schoot, Rens, Anouck Kluytmans, Lars G. Tummers, Peter Lugtig, Joop Hox, and Bengt Muthén. 2013. “Facing Off with Scylla and Charybdis: A Comparison of Scalar, Partial, and the Novel Possibility of Approximate Measurement Invariance.” Frontiers in Psychology 4(770):1–15. doi: 10.3389/fpsyg.2013.00770.