The Necessity of Testing Measurement Invariance in Cross-Cultural Research: Potential Bias in Cross-Cultural Comparisons With Individualism

Abstract

Individualism and collectivism are some of the most widely applied concepts in cultural and cross-cultural research. They are commonly applied by scholars who use arithmetic means or sum indexes of items on a scale to examine the potential similarities and differences in samples from various countries. For many reasons, cross-cultural research involves numerous methodological and statistical pitfalls. The aim of this article is to summarize some of those pitfalls, particularly the problem of measurement non-invariance, which stems from different understandings of questionnaire items, or even a different character of the constructs, between countries. This potential bias is reduced by latent mean comparisons performed with Multigroup Confirmatory Factor Analysis and the Measurement Invariance procedure within a Structural Equation Modeling framework. These procedures, however, have been neglected by many researchers in the field of cross-cultural psychology. In this article, we compare ‘traditional’ (comparison of arithmetic means) and ‘invariant’ (latent mean comparison) approaches and provide the R source code needed to replicate measurement invariance testing and latent mean comparisons with other scales. Both approaches are demonstrated with real data gathered on an Independent and Interdependent Self-Scale from 1386 participants across six countries (Slovenia, Croatia, Bosnia and Herzegovina, Serbia, Macedonia and Albania). Our results revealed considerable differences between the ‘invariant’ and ‘traditional’ approaches, especially in post-hoc analyses. Since ‘invariant’ results can be considered less biased, this finding suggests that the currently prevalent method of comparing the arithmetic means of cross-cultural scales of individualism and collectivism can potentially cause biased results.

Keywords

Cross-cultural research, measurement invariance, equivalence, individualism and collectivism, scalar invariance

Introduction

Cultural context has a crucial and inherent effect on human psychological processes, shaping perception, cognition, emotion and behaviour. Many influential studies have repeatedly demonstrated the effect of culture, often calling (culturally) universalistic psychological theories into question (for review, see Nisbett, 2003; Berry et al., 2011; Keith, 2013; Matsumoto & Hwang, 2019; Cohen & Kitayama, 2019; Shiraev & Levy, 2020). From this perspective, it is evident that credible and valid psychological science should take into consideration the influence of culture on essential psychological processes and other phenomena which stem from them. Research which ignores or overlooks the effects of cultural environments can therefore be less than rigorous, can be biased and can offer only limited scope for interpretation. Consequently, some scholars even claim that the methodological and statistical principles (including the formulation of theories and hypotheses) used in the cross-cultural field must be incorporated and integrated into the underlying research aspects of mainstream psychological science (e.g. Cheung, 2012; Cohen, 2009; Hardin et al., 2014; Brady et al., 2018; Henrich et al., 2010a; Kashima, 2015; Matsumoto, 2001; Sternberg, 2014; Wang, 2016).

This claim may be perceived by other scholars as bold but appears to be justified in respect to the history of psychological science, which was, in some parts of its history, riddled with stereotypes and racial and ethnic biases (i.e. ‘scientific racism’, see Thomas & Sillen, 1972; Leong et al., 2012; Richards, 2012), with a tendency towards being ethnocentric and extremely universalistic (for review, see Jahoda & Krewer, 1997; Triandis, 2007; Keith, 2013; Klineberg, 1980; Lonner, 2013). Consequently, psychology, in general, was criticized for its ignorance of the socio-cultural background of psychological processes and for its lack of culturally and ethnically diverse research samples. Since the bulk of psychological theories has been defined by North American authors and verified on respective samples in the past (e.g. Adair et al., 2002; Bauserman, 1997; De Barona, 1993; Iwamasa & Smith, 1996; Loo et al., 1988; May, 1997; Graham, 1992; Guthrie, 1976; Ponterotto, 1988; Sue, 1999), and quite recently (e.g. Adair & Huynh, 2012; Arnett, 2008; Cheon et al., 2020; Henrich et al., 2010; Nielsen et al., 2017; Rad et al., 2018), the generalizability and validity of proposed psychological constructs are, without further evidence, very limited beyond the ‘Western, Educated, Industrialized, Rich and Democratic’ (WEIRD; Henrich et al., 2010) societies, which constitute just a minority of the world’s population.

Even though some psychological research often ignores cultural influences (Brady et al., 2018; Kashima, 2015), a strictly universalist approach (‘culture doesn’t matter’) is nowadays relatively scarce (Wang, 2016). Since an accepted consensus now exists among scholars on the effect of culture on individuals, ‘culture’ is increasingly taken into account in psychological research. Nevertheless, it is necessary to point out that the mere involvement of ‘culture’ in research is insufficient. Cross-cultural research requires the employment of relatively sophisticated methodological and statistical procedures to reduce the various biases which are unique to such research and to establish comparability between countries (Berry et al., 2011). Without these procedures, cross-cultural studies suffer a lack of validity. The aim of the present article is therefore to introduce one of the most challenging issues in cross-cultural research, that is, measurement invariance testing, and demonstrate it through data gathered from several European countries with an individualism/collectivism scale. To be as illustrative as possible, we chose to present two approaches to analyzing data (one approach takes into account measurement invariance, the other does not) and compare their results.

Cross-Cultural Comparability

Cross-cultural research is struggling with various methodological and statistical pitfalls (for review, see Buil et al., 2012; Fischer & Poortinga, 2018; Matsumoto & Yoo, 2006; Van de Vijver, 1998; Van de Vijver & Leung, 2000; Van de Vijver & Tanzer, 2004). These pitfalls need to be addressed in order to achieve a desirable comparability of results obtained from different cultures. A sufficient level of comparability in measurement outcomes is ensured by various types (Van de Vijver & Leung, 2011), or, if they possess an internal hierarchical structure, by levels (Poortinga, 1989) of equivalence. Even though over 50 types of equivalence have been defined in the literature (Johnson, 1998), the most commonly used taxonomy distinguishes construct equivalence, method equivalence, item equivalence and invariance verified at the measurement level (Van de Vijver, 1998). These are discussed in more detail in the following sections.

Since the terminology of cross-cultural methodology can sometimes be misleading, in this text we explicitly distinguish between ‘cross-cultural equivalence’ and ‘measurement invariance’ (see also Welzel et al., 2021). Even though these two terms overlap and are often used interchangeably, cross-cultural equivalence refers to a more general concept of methodological and theoretical equivalence which can be jeopardized by various ‘biases’ (bias means the lack of equivalence). Measurement invariance refers to the purely statistical and psychometric assessment of such cross-cultural equivalence (Matsumoto & Yoo, 2006; Putnick & Bornstein, 2016). Invariance, which can be assessed only during data analysis (a posteriori), naturally stems from equivalence, which should be assessed during each stage of the research (mostly a priori; for review, see Davidov et al., 2014; He & Van de Vijver, 2012; Hui & Triandis, 1985; Millsap & Meredith, 2007; Poortinga, 1989; Van de Vijver, 1998; Van de Vijver & Tanzer, 2004). Both terms are composed of similar hierarchically sorted levels: for cross-cultural equivalence they are construct equivalence, method equivalence and item equivalence; for measurement invariance, they are configural invariance, metric invariance and scalar invariance (Van de Vijver, 1998). These two phenomena combined should provide the necessary evidence for cross-cultural comparability of constructs and their measurement (Čeněk & Urbánek, 2019).

Construct Equivalence

Construct equivalence assumes that constructs exist and have the same meaning across various cultural groups (Hui & Triandis, 1985; Van de Vijver, 1998). This cannot be accomplished without an adequate and clear definition of culture and cultural groups as objects of analyses (Levine et al., 2007). Even though this requirement might appear relatively simple, it is routinely very problematic in the reality of cross-cultural research, because culture is often operationalized with respect to the birthplace of participants, in most cases defined as an individual country or region. Yet other facets of cultural background exist (often understood as fuzzy sets), such as religion, social class, socio-economic status, ethnicity, nationality or identity, which should also be taken into account in any clarification of conceptual and operational definitions (Cohen, 2009).

Furthermore, since the experimental manipulation of ‘culture’ is, for obvious reasons, not possible in cross-cultural research, scientists must rely on quasi-experimental and correlational research designs with no ambition of making decisions about causality. In other words, the mere observed difference between participants from two cultures in a psychological process does not imply that the difference is caused by cultural influences (i.e. cultural attribution fallacy; Matsumoto & Yoo, 2006). The solution of this fallacy lies in the so-called unpacking studies, that is, complex research designs with an enormous number of potentially related variables collected, and with careful examination of all theoretically plausible relationships between them. Such methodology should be able to reduce the risk that the differences are caused by phenomena not related to culture (Matsumoto & Juang, 2012; Matsumoto & Yoo, 2006). This procedure also increases the demands on the equivalence of concepts because of the greater number of variables in these models and more complex relationships between them.

Another way to achieve a higher level of construct equivalence is to involve scholars from the cultures investigated and to include informants with expertise in local cultures and languages (Davidov et al., 2014; Van de Vijver, 1998). This can take the form of simultaneous development of instruments across the cultures involved in the study and selection of the most appropriate items (see Leong et al., 2010; Werner & Campbell, 1970), or, by contrast, independent development of instruments and selection of all items generated by the research team (see Campbell, 1986).

Method Equivalence

Method equivalence assumes that the instruments used in research and the sampling and administration procedures are similar across cultures and therefore comparable (He & Van de Vijver, 2012). Regarding sample equivalence, scientists should make well-informed sample and sampling decisions about the construct under scrutiny, because various cultural groups often differ in their level of education, socio-economic status, religion, etc., which can result in confounding effects. Such variables should therefore be controlled in the analysis (Van de Vijver, 1998). Furthermore, samples and sampling should be adapted to the research goals: generally speaking, homogeneous samples are adequate in the examination of cross-cultural differences, while heterogeneous samples should be used for the examination of cross-cultural similarities (Boehnke et al., 2011).

Another type of equivalence is instrument equivalence, which addresses the fact that participants from different cultures do not always consistently react to the instrument and its characteristics (Van de Vijver, 1998). An instrument and its items/stimuli should not vary across cultures in the level of their appropriateness or familiarity. Some other impediments to instrument equivalence are inherent to self-report scales, in which the results might be systematically biased by differences in response styles (for review, see Van Vaerenbergh & Thomas, 2012). A large amount of evidence is available which suggests that the differences in responses of participants from various cultures can be systematically affected by extreme, midpoint, acquiescence and social desirability response styles (e.g. Batchelor & Miao, 2016; Baumgartner & Weijters, 2015; Harzing, 2006; Johnson & Van de Vijver, 2003; Smith, 2004). The effects of response biases can lead to erroneous interpretations of scores obtained for various cultural groups. It is therefore desirable to assess these effects and evaluate their potential confounding effect on results (Van de Vijver & Tanzer, 2004).

Many a posteriori methods have been defined to resolve response biases. For instance, acquiescence bias can be controlled in analyses through an ipsatization in reliability estimation (Fischer, 2004; Fischer & Milfont, 2010); extreme or midpoint bias can be investigated and controlled using the sum-score calculated only from extreme (mid-point) answers (Peterson et al., 2014); social desirability bias is usually assessed and controlled through a score obtained from an additional social desirability scale (Larson, 2018). Furthermore, acquiescence, extreme and midpoint biases are often modelled (and controlled) within a multi-group confirmatory factor analysis (MG-CFA) as a common method variance (e.g. Morren et al., 2011; Welkenhuysen-Gybels et al., 2003). Other authors suggest controlling them, for instance, within restricted latent class factor analysis (Morren et al., 2011) or item response theory (Zhang & Wang, 2020). From these four response styles, we can further derive a general response style, which can be modelled and controlled in a manner similar to controlling each response style (He & Van de Vijver, 2015). In addition, some a priori steps in the construction and administration of an instrument might result in a decrease in or prevention of a response bias (e.g. item randomization, Uskul & Oyserman, 2006; or inclusion of reversed items, Paulhus, 1991).
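Two of the a posteriori corrections mentioned above can be illustrated with a short sketch. It is written in Python purely for illustration and is not the procedure of any of the cited authors: the function names, the 7-point response format and the toy responses are all assumptions. Ipsatization is shown as simple within-person centering, and the extreme-response index as the share of endpoint answers, which is a simplification of a sum-score built from extreme answers.

```python
from statistics import mean, pstdev

def ipsatize(responses, standardize=False):
    """Center each respondent's item scores on that respondent's own mean.

    Optionally divide by the respondent's own standard deviation as well.
    """
    result = []
    for row in responses:
        person_mean = mean(row)
        centered = [score - person_mean for score in row]
        if standardize:
            person_sd = pstdev(row)
            centered = [c / person_sd if person_sd > 0 else 0.0 for c in centered]
        result.append(centered)
    return result

def extreme_share(row, low=1, high=7):
    """Share of one respondent's answers that fall on the scale endpoints."""
    return sum(1 for score in row if score in (low, high)) / len(row)

# Hypothetical answers on an assumed 7-point scale.
data = [
    [6, 7, 6, 7],  # respondent who tends to agree with everything
    [2, 3, 2, 3],  # same item pattern at a lower overall level
]
ipsatized = ipsatize(data)
share_first = extreme_share(data[0])
```

After centering, both hypothetical respondents show the same item profile, which is the sense in which ipsatization removes a uniform tendency to agree. It also removes any genuine difference in overall level, so it should be applied with that cost in mind.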

The final, crucial type of method equivalence, that is, administration equivalence, addresses the confounding effects which occur during administrative procedures. Examples of these effects are inconsistencies in social, physical or technical conditions, unexpected events, the effects of administrators, ambiguous instructions, communication problems, different modes of data collection, etc. (see Van de Vijver & Tanzer, 2004). Non-equivalence of administration may be significantly reduced with the aid of test–retest, training and intervention studies (Van de Vijver & Tanzer, 2004).

Item Equivalence

Item equivalence assumes that the measurement on the item level is equal across various countries (i.e. the item has the same psychological meaning across cultures; Van de Vijver, 1998). This is influenced mainly by the translations and adaptations of items, with the aim of obtaining ‘culture-free’ and ‘culture-fair’ items. Such items, however, must not only be translated accurately without any shift in meaning or connotation (Poortinga, 1989) but should also be appropriate to the specific culture and consider certain effects of culturally specific nuisance factors or connotations associated with item wording (Harkness et al., 2010; The Council of the International Test Commission, 2018). Furthermore, item wording should eliminate the potential reference-group effect, that is, the situation when participants from different cultures use different reference groups (often their own cultural group) while answering on self-report scales (see Heine et al., 2002).

Bias at the item level can be prevented by collecting pilot data and performing an item analysis (The Council of the International Test Commission, 2018), incorporating a larger number of translators from various cultural backgrounds (Byrne et al., 2009), using back-translation (Brislin, 1970; Werner & Campbell, 1970) or multi-step committee-based translation methodologies such as ‘Translation, Review, Adjudication, Pre-testing and Documentation’ (TRAPD; Harkness et al., 2010), detecting item bias using judgmental methods (e.g. linguistic and psychological analysis) or differential item functioning (DIF; Van de Vijver & Tanzer, 2004), and identifying errors with the aid of cognitive interviewing (Fitzgerald et al., 2011).

Measurement Invariance

Measurement invariance refers to the degree to which the instruments used in research are equally reliable in two or more cultures (or other groups). Measurement invariance can only be assessed by psychometric means (Matsumoto & Yoo, 2006; Putnick & Bornstein, 2016) and naturally stems from the cross-cultural equivalence described above, simultaneously providing statistical evidence for the degree of equivalence. Measurement invariance and its types are described in great detail elsewhere (e.g. Boer et al., 2018; Brown, 2015; Byrne et al., 1989; Chen, 2007; Cheung & Rensvold, 2002; Davidov et al., 2014; Fischer & Karl, 2019; Hoyle, 2012; Kline, 2016; Meredith, 1993; Milfont & Fischer, 2010; Millsap, 2011; Putnick & Bornstein, 2016; Vandenberg & Lance, 2000) and therefore we only provide a brief summary herein.

As mentioned, measurement invariance assessment is most commonly performed through a multi-group confirmatory factor analysis (MG-CFA), which is a special case of structural equation modeling (SEM). The general assumption is that psychological traits are latent constructs measured indirectly by observed indicators (e.g. questionnaire items; see Byrne, 2010; Bollen, 1989; Brown, 2015; Hoyle, 2012; Kline, 2016). The SEM framework is a versatile tool for controlling any potential measurement error and therefore also a useful tool for cross-culturally equivalent research (e.g. Boer et al., 2018; Chen, 2008; Fischer & Karl, 2019; Milfont & Fischer, 2010). To illustrate the difference of analysis performed on a construct (latent) level and on an observed (manifest) level, imagine that scholars are interested in the real differences between people with respect to their intelligence. Scholars who compare the difference on a construct level would compare estimated (indirectly measured) levels of general intelligence (g factor), whereas scholars who compare the difference on an observed level would compare the directly measured observed scores obtained from tests of intelligence that are imperfect indicators of general intelligence.

The measurement invariance procedure is based on sequentially constraining parameters to equality across groups in three phases: the establishment of configural, metric and scalar invariance. The fourth phase, the establishment of strict invariance, is usually not performed in cross-cultural research since it is not necessarily required for the purposes of group comparisons (Meredith, 1993). The increasingly restricted nested models are compared (i.e. metric invariance is contrasted with configural invariance; scalar invariance is contrasted with metric invariance), and if the model fit is not significantly worsened by adding new parameter restrictions, the more restricted model is preferred due to better interpretability of the results.

In the first phase, configural (otherwise known as structural or factorial) invariance is assessed. In this step, the baseline measurement model which allows all parameters to be freely estimated across groups is tested. A satisfactory configural model requires an equal latent factor structure across groups (i.e. the number of factors and the pattern of relationships between factors and items; Thurstone, 1947). The configural model is evaluated by overall fit. The following indicators and thresholds are recommended (good/acceptable fit): Tucker–Lewis Index (TLI) ≥ .95/.90, Comparative Fit Index (CFI) ≥ .95/.90, Root Mean Square Error of Approximation (RMSEA) ≤ .06/.08 and Standardized Root Mean Square Residual (SRMR) ≤ .08 (Hu & Bentler, 1999). Values outside these boundaries suggest that the baseline measurement model is noninvariant, that is, its factor structure differs in the cultural groups.
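The cut-offs quoted above can be written down as a small decision rule. The following Python sketch is only an illustration: the function name, the three labels and the example fit values are assumptions, and requiring all four indices to pass at once is a simplifying choice rather than a rule stated in the text.

```python
def classify_fit(tli, cfi, rmsea, srmr):
    """Label overall model fit using the cut-offs quoted in the text.

    Good fit: TLI and CFI >= .95, RMSEA <= .06, SRMR <= .08.
    Acceptable fit: TLI and CFI >= .90, RMSEA <= .08, SRMR <= .08.
    Assumption: every index has to meet its cut-off.
    """
    if tli >= 0.95 and cfi >= 0.95 and rmsea <= 0.06 and srmr <= 0.08:
        return "good"
    if tli >= 0.90 and cfi >= 0.90 and rmsea <= 0.08 and srmr <= 0.08:
        return "acceptable"
    return "poor"

# Hypothetical fit indices for three configural models.
label_a = classify_fit(tli=0.96, cfi=0.97, rmsea=0.05, srmr=0.04)
label_b = classify_fit(tli=0.91, cfi=0.92, rmsea=0.07, srmr=0.06)
label_c = classify_fit(tli=0.85, cfi=0.88, rmsea=0.10, srmr=0.09)
```

In practice the indices would come from the fitted multi-group model. Such a rule is a screening aid and does not replace inspection of the model itself.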

When configural invariance is satisfactorily established, metric (otherwise known as weak or pattern) invariance can be evaluated. In a metric model, the factor loadings are constrained to be equal across groups (Horn & McArdle, 1992; Millsap, 2011; Thurstone, 1947). If the fit of this model is not significantly worse than that of the configural model, it is considered tenable. Even though such worsening was assessed with a chi-square difference test in the past, this approach is currently not recommended because the test is sensitive to sample size (Saris et al., 2009). Hence, a significant worsening of the model is nowadays usually assessed by inspecting the change (delta) in the CFI, RMSEA and SRMR. The most used criteria are those proposed by Chen (2007): ΔCFI ≤ .01, ΔRMSEA ≤ .015 and ΔSRMR ≤ .030 (in the case of scalar level, ΔSRMR ≤ .015), where each delta denotes the amount by which fit deteriorates in the more constrained model. Sufficient metric invariance means that each item loads on the latent construct to a similar degree across groups, and therefore a comparison of factor variances and covariances is possible.
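The comparison of nested models described above can be sketched as follows. This is a minimal Python illustration using the cut-offs exactly as quoted in the text. The function name, the dictionary layout and the fit values are assumptions, and the sketch assumes that all three indices must stay within their limits for a level of invariance to be considered tenable.

```python
# Cut-offs as quoted in the text (attributed there to Chen, 2007).
MAX_CFI_DROP = 0.01
MAX_RMSEA_RISE = 0.015
MAX_SRMR_RISE = {"metric": 0.030, "scalar": 0.015}

def invariance_tenable(less_constrained, more_constrained, level):
    """Return True if fit does not deteriorate beyond the quoted cut-offs.

    Both models are dicts with keys 'cfi', 'rmsea' and 'srmr'.
    `level` is 'metric' or 'scalar' and selects the SRMR cut-off.
    """
    cfi_drop = less_constrained["cfi"] - more_constrained["cfi"]
    rmsea_rise = more_constrained["rmsea"] - less_constrained["rmsea"]
    srmr_rise = more_constrained["srmr"] - less_constrained["srmr"]
    return (
        cfi_drop <= MAX_CFI_DROP
        and rmsea_rise <= MAX_RMSEA_RISE
        and srmr_rise <= MAX_SRMR_RISE[level]
    )

# Hypothetical fit indices for three nested models.
configural = {"cfi": 0.950, "rmsea": 0.050, "srmr": 0.040}
metric = {"cfi": 0.945, "rmsea": 0.052, "srmr": 0.050}
scalar = {"cfi": 0.920, "rmsea": 0.070, "srmr": 0.060}

metric_ok = invariance_tenable(configural, metric, "metric")
scalar_ok = invariance_tenable(metric, scalar, "scalar")
```

With these made-up values, metric invariance would be retained, while the drop in CFI of .025 at the scalar step would signal noninvariant intercepts.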

If the metric invariance is established, the scalar (otherwise known as strong) invariance can be assessed. The scalar model constrains item intercepts (or thresholds in the case of discrete variables) for metric invariant items to be equal across groups, which means that differences in the latent construct capture all the mean differences in the observed scores (Meredith, 1993; Steenkamp & Baumgartner, 1998). If the scalar invariance is supported, which is to say that the model is not significantly worse compared to the metric invariance model (the same criteria for delta of fit indices), it indicates that constraining the item intercepts across groups does not significantly affect the model fit.
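The three levels can also be stated compactly in equations. The notation below is a conventional one and is an assumption of this sketch, since the text itself introduces no symbols: x is the vector of observed item scores of person i in group g, τ the item intercepts, Λ the factor loadings, ξ the latent factor scores, ε the residuals and κ the latent means.

```latex
% Measurement model in group g
\mathbf{x}_{ig} = \boldsymbol{\tau}_g + \boldsymbol{\Lambda}_g \boldsymbol{\xi}_{ig} + \boldsymbol{\varepsilon}_{ig},
\qquad E(\boldsymbol{\xi}_{ig}) = \boldsymbol{\kappa}_g, \quad E(\boldsymbol{\varepsilon}_{ig}) = \mathbf{0}

% Configural: same pattern of zero and non-zero loadings in every \boldsymbol{\Lambda}_g
% Metric:     \boldsymbol{\Lambda}_g = \boldsymbol{\Lambda} \quad \text{for all } g
% Scalar:     \boldsymbol{\Lambda}_g = \boldsymbol{\Lambda} \ \text{and} \ \boldsymbol{\tau}_g = \boldsymbol{\tau} \quad \text{for all } g

% Expected item scores in group g, in general and under scalar invariance
E(\mathbf{x}_{ig}) = \boldsymbol{\tau}_g + \boldsymbol{\Lambda}_g \boldsymbol{\kappa}_g
\;\;\xrightarrow{\ \text{scalar}\ }\;\;
E(\mathbf{x}_{ig}) = \boldsymbol{\tau} + \boldsymbol{\Lambda} \boldsymbol{\kappa}_g

% Hence, for two groups g and h under scalar invariance
E(\mathbf{x}_{ig}) - E(\mathbf{x}_{ih}) = \boldsymbol{\Lambda} \left( \boldsymbol{\kappa}_g - \boldsymbol{\kappa}_h \right)
```

The last line shows why the scalar level matters: only when both loadings and intercepts are equal do differences in observed item means reflect nothing but differences in latent means.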

Once configural, metric and scalar measurement invariance is established, the mean values of latent factors can be meaningfully compared across groups (or across different time points). This is usually conducted by setting the latent mean in one group to 0 while allowing it to vary in the remainder of the groups. This procedure lets us interpret latent means in terms of standardized effect sizes (i.e. reference group method, see Steinmetz, 2011). The estimated mean parameters in the remainder of the groups therefore represent a difference in the latent means between groups (Putnick & Bornstein, 2016). The procedure of testing differences among groups is called latent mean comparison (a.k.a. structured means modeling, SMM; Sörbom, 1974), and it represents an alternative to the t-test, ANOVA, ANCOVA, MANOVA, MANCOVA, etc. Its main benefit lies in the fact that means are compared within SEM on a construct (latent) level instead of on an observed level. This means that the results are free of measurement error, the procedure has a lower number of assumptions and it can deal with multicollinearity or violations of homogeneity of variances (for more details, see Aiken et al., 1994; Breitsohl, 2019; Cole et al., 1993; Hancock, 1997, 2001; Hancock et al., 2000; Thompson & Green, 2013; Whittaker, 2013). The scalar level of invariance is thus crucial in any cross-cultural comparison, because only this degree of measurement invariance allows groups to be compared in terms of mean-level differences.

If noninvariance is detected in any phase of measurement invariance testing, the researcher should stop the analysis and determine the issues of noninvariance or accept that the constructs are noninvariant and discontinue the analysis of measurement invariance and also abandon the interpretation of group differences (Putnick & Bornstein, 2016). Another possible step may lie in an analysis of the practical effect size of such noninvariance (e.g. with d_MACS effect size, see Nye et al., 2019; or with MIVI effect size, see Groskurth et al., 2021). Unfortunately, in the literature there is no generally accepted consensus on what to do next when measurement invariance fails (Millsap, 2011).

Regarding configural noninvariance, researchers might redefine the constructs of interest (e.g. allow correlated residuals, omit variables, change the number of factors, use hierarchical or bifactor structural models) and retest the model (Chen, 2008; Putnick & Bornstein, 2016), or use exploratory procedures to identify the origin of the misfit and DIF (Fischer & Karl, 2019). However, once these modifications and alterations of the baseline model have been conducted, the analysis can no longer be understood as confirmatory and should be considered and reported as an exploratory analysis (Bollen, 1989; Byrne, 2010). Furthermore, even the exploratory approach within the MG-CFA framework still needs to be well-grounded theoretically (i.e. theory-driven rather than purely data-driven, cf. Brown, 2015). Another possible solution to configural noninvariance might lie in detecting subsets of countries where measurement invariance holds and analyzing them separately (Davidov et al., 2014). Furthermore, noninvariance at the metric and scalar levels can be resolved by establishing partial invariance based on at least two invariant items (yet ideally on at least half of all subscale items) and consequently retesting the given level of measurement invariance. In this case, only invariant items are constrained in terms of factor loadings or intercepts, while the rest of the items remain unconstrained, which allows valid comparisons across groups (Byrne et al., 1989; Steenkamp & Baumgartner, 1998). However, since this procedure relies on modification indices, its disadvantage lies in the assumption that all other loadings (i.e. except those which were marked as noninvariant) are invariant. If this assumption is not true, the results of the modification indices will be inaccurate (Cheung & Rensvold, 2002).

Nevertheless, even partial scalar measurement invariance might be too restrictive and hard to achieve within a traditional MG-CFA, especially in cross-cultural studies which involve many cultural groups (e.g. Boer et al., 2018; Byrne & Van de Vijver, 2017; Davidov et al., 2014; Rutkowski & Svetina, 2014). Furthermore, the MG-CFA is not very effective for verifying invariance in groups with extreme differences (which, nevertheless, might be legitimate in cross-cultural research). Such models will almost always signal noninvariance, which does not necessarily mean a lack of comparability caused by different item understanding (Welzel et al., 2021). Despite this, measurement invariance still needs to be addressed in research. Therefore, in the case of noninvariance, researchers might consider using recently developed alternative methods that might be more flexible for assessing measurement invariance (Fischer et al., 2021). For example, multiple-indicators multiple-causes modeling (MIMIC; Kim et al., 2011), alignment-within-CFA (AwC; Marsh et al., 2018), item response theory for DIF identification (IRT; Kim & Yoon, 2011), Bayesian structural equation modeling (BSEM; Muthén & Asparouhov, 2012), exploratory structural equation modeling (ESEM; Asparouhov & Muthén, 2009), multi-group factor analysis alignment (Asparouhov & Muthén, 2014), clusterwise simultaneous component analysis (De Roover et al., 2014), multilevel confirmatory factor analysis (Davidov et al., 2016), mixture multigroup factor analysis (MMG-FA; De Roover et al., 2020), exploratory-based multigroup factor rotation (MGFR; De Roover & Vermunt, 2019) or the data-driven tool called SEM trees (Brandmaier et al., 2013) might serve in this regard.

In spite of many urgent calls for measurement invariance testing in research, the practice is unfortunately still not as common as it should be, as shown in recent reviews. For example, only 4% of studies using social and personality psychological instruments yielded satisfactory measurement invariance for gender or age groups (Hussey & Hughes, 2020). Regarding cultural groups, only 17% of cross-cultural comparative quantitative studies in the Journal of Cross-Cultural Psychology (see Boer et al., 2018) or the Journal of Personality and Social Psychology (Chen, 2008) verified measurement equivalence. Similarly, scalar measurement invariance has not been achieved in a single study on child and adolescent psychopathology (Stevanovic et al., 2017), nor in studies using already validated measurements of personality psychology (Dong & Dumas, 2020). The unsatisfactory lack of invariance testing is also observed in counselling research (Chen et al., 2020). The situation is similar in research on individualism and collectivism (Chen & West, 2008; Cozma, 2011; Lacko et al., 2021), which has been a long-term ‘flagship’ construct in cross-cultural research, and which is therefore applied in the present article to demonstrate how essential measurement invariance testing is in cross-cultural research.

Individualism and Collectivism: An Overview

The concepts of individualism (sometimes referred to as independent self-construal) and collectivism (or interdependent self-construal) are some of the most investigated constructs in cross-cultural research and are widely used as predictors of many other psychological phenomena (for review, see Oyserman et al., 2002; Markus & Kitayama, 1991; Singelis et al., 1995). The prevalent theory postulates that while an independent social orientation and an emphasis on self-direction, self-expression and autonomy are typical for individualistic cultures, collectivistic cultures embrace an interdependent social orientation and place emphasis on harmony, relatedness and connection with others. It is also assumed that Western cultures such as the USA or Great Britain are individualistic, while Eastern cultures, for example, China or Japan, are collectivistic (Markus & Kitayama, 1991).

Despite the fact that individualism and collectivism (I/C) are referred to in almost all psychology textbooks, and that every research psychologist is probably aware of this dimension, the validity of I/C research has often been criticized over the past two decades (e.g. Bresnahan et al., 2005; Brewer & Chen, 2007; Chen & West, 2008; Cozma, 2011; Heine et al., 2002; Lacko et al., 2021; Levine, Bresnahan, Park, Lapinsky, et al., 2003a, 2003b; Matsumoto, 1999; Oyserman et al., 2002; Oyserman & Lee, 2008; Schimmack et al., 2005; Takano & Osaka, 1999, 2018; Voronov & Singer, 2002). Critics highlight that I/C research lacks concurrent and discriminant validity, clarity of conceptualization and, most importantly, that the scholars of I/C research do not use validated methods or adequate statistical procedures. As Lacko et al. (2021) pointed out, not a single validated instrument is currently available which would repeatedly satisfy the demanding criteria of scalar measurement invariance across various cultures and would simultaneously remain confirmatory.

As was already mentioned, measurement invariance is usually not verified in I/C research. For instance, even very recently published articles in prestigious journals indexed in the Web of Science tend to ignore metric measurement invariance when using I/C as a predictor or a correlate in cross-country studies (e.g. Burton et al., 2019; Galang et al., 2021; Krys et al., 2019), and tend to ignore scalar measurement invariance in mean comparisons across cultures (e.g. Anakwah et al., 2020; Benavides & Hur, 2020; Gomez & Taylor, 2018). Since such articles generally overlook the potential noninvariance issue, their results might be biased. Furthermore, results of t-tests or ANOVAs on observed scores might exaggerate group differences compared to latent mean comparisons, which are based on the scalar measurement invariance (Cole et al., 1993). Therefore, the observed differences in such articles might not correspond to the real level of I/C in the examined populations.
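A small numerical illustration may help to show how ignoring scalar noninvariance can distort observed means. The Python sketch below uses the standard linear factor model, in which the expected item score equals the item intercept plus the loading times the latent mean. All numbers, names and the four-item scale are hypothetical and are not taken from the present data.

```python
def expected_scale_mean(intercepts, loadings, latent_mean):
    """Expected arithmetic mean of the items under a linear factor model.

    Each expected item score is intercept + loading * latent mean.
    """
    item_means = [t + l * latent_mean for t, l in zip(intercepts, loadings)]
    return sum(item_means) / len(item_means)

# Hypothetical four-item scale with equal loadings in both groups.
loadings = [0.8, 0.7, 0.9, 0.6]
intercepts_group_a = [4.0, 4.0, 4.0, 4.0]
# In group B the last item is easier to endorse (noninvariant intercept).
intercepts_group_b = [4.0, 4.0, 4.0, 4.8]

# Both groups have the same latent mean, so there is no true difference.
mean_a = expected_scale_mean(intercepts_group_a, loadings, latent_mean=0.0)
mean_b = expected_scale_mean(intercepts_group_b, loadings, latent_mean=0.0)
observed_gap = mean_b - mean_a
```

Here the two groups differ by about 0.2 scale points in their expected observed means although their latent means are identical. A comparison of arithmetic means would attribute this gap to the construct, whereas it stems entirely from one noninvariant intercept.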

Current Study: An Illustrative Example of Comparison of ‘Traditional’ and ‘Invariant’ Approaches

The aim of the present article is not only to introduce measurement invariance testing and emphasize its necessity in cross-cultural psychology. The article also illustrates the difference between the results obtained from adequate statistical analysis and the results obtained from analysis which ignores the above-mentioned criteria of measurement invariance and potentially produces systematically biased findings. In other words, we are comparing two approaches to analysis of cross-cultural, self-report data. We labelled the first mentioned approach ‘invariant’ and the second approach ‘traditional’ (‘traditional’ because it currently represents the vast majority of research which uses self-report subjective scales in the I/C field; Chen & West, 2008; Lacko et al., 2021). In the ‘invariant’ approach, we established a partial scalar measurement invariance with MG-CFA and consequently compared cultures using latent means, whereas in the ‘traditional’ approach we simply calculated the observed arithmetic means of subscales and analyzed the differences using a one-way analysis of variance.

In order to achieve this aim, we used a real and unpublished cross-cultural dataset. The main reason for not publishing the original data was the inability to establish even configural measurement invariance across countries, which is unfortunately quite common in I/C research (for review, see Lacko et al., 2021). We applied various methods to address the issue of noninvariance. Our approach, however, evolved from a theory-driven to a purely data-driven assessment of measurement invariance without any theoretical background (for a more detailed description of our previous, theoretically grounded analyses, see Supplementary Material). Hence, we would like to highlight that the following results of the ‘invariant’ approach are not confirmatory and must not be interpreted as such. It is impossible to say whether the proposed model is valid from a theoretical perspective, and we therefore cannot say much about the real differences between the selected countries in their level of I/C. Since the lack of valid instruments in the I/C field leaves us without any other objective criterion (e.g. known validity), it is also impossible to say which of the two approaches produces more valid results in our specific case (i.e. whether the invariant results correspond to reality more closely than the traditional ones), even though the invariant approach is generally recommended. Hence, both procedures serve purely as a demonstration. We believe that demonstrating measurement invariance on a real dataset might be more useful than simulation studies, especially for readers unfamiliar with such statistical procedures, despite the fact that this approach allows us neither to know the true model nor to manipulate the level of noninvariance.

The full dataset contained data from 1927 participants in seven European countries. One country represents Central Europe (Czechia) and the rest are Balkan countries (Slovenia, Croatia, Bosnia and Herzegovina, Serbia, Macedonia and Albania, see Table 1). Participants were college students (students of social or human sciences) recruited from Mendel University (Czechia), University of Ljubljana (Slovenia), University of Rijeka (Croatia), University of Banja Luka (Bosnia and Herzegovina), University of Novi Sad (Serbia), University of Tetova (Macedonia) and University of Tirana (Albania). The universities were mostly located in the largest or second largest city of their country (with the exception of Rijeka, which is the third largest, and Tetovo, which is the fifth largest), and some of these cities were also national capitals. The participants were therefore recruited from highly developed regions. Data were collected via university mailing lists and related student groups on social networks.

Table 1.

Basic descriptive statistics of participants.

Country	N	Gender (female)	Age range	Age M (SD)
Total	1927	1387 (72%)	18–67	23.2 (6.1)
Slovenia	200	150 (75%)	19–50	24.3 (5.2)
Croatia	207	159 (77%)	18–42	22.1 (3.0)
Bosnia and Herzegovina (B&H)	346	191 (55%)	18–67	28.7 (11.5)
Serbia	219	190 (87%)	18–37	20.9 (2.2)
Macedonia	264	156 (59%)	18–32	20.8 (1.5)
Albania	150	138 (92%)	20–47	22.5 (3.7)
Czechia	541	403 (75%)	18–25	22.0 (1.8)

N = number of participants, M = Mean, SD = Standard Deviation.

It is important to examine these countries because the level of I/C in Balkan countries is largely unknown and studies from Czechia report mixed results (cf. Bašnáková et al., 2016; Čeněk, 2015; Dumetz & Gáboríková, 2017; Kolman et al., 2003; Lacko et al., 2020). Furthermore, Balkan states and societies are multi-ethnic, with great religious, cultural and linguistic diversity. Efforts by peoples to preserve the cultural identities of their groups have also frequently resulted in conflicts among them. Therefore, one of the major challenges for the European Union concerns perceptions of the individual or group orientation of the peoples of the Balkans.

The level of I/C across cultures was measured according to the Independent and Interdependent Self Scale (IISS; Lu and Gilmour, 2007). The scale consisted of two dimensions (an independent-self and an interdependent-self subscale) and forty-two (21 for independent and 21 for interdependent self-construal) 7-point Likert-type numerical items (1 = strongly disagree, 7 = strongly agree). The IISS items were derived from older individualism–collectivism scales such as the Self-Construal Scale (Singelis, 1994), the Individualism–Collectivism Scale (Triandis & Gelfand, 1998) and the concept of independent/interdependent self-construal (Markus & Kitayama, 1991). Two versions of the scale were used in previous research: a full version (e.g. Dixon, 2007; Lacko et al., 2020; Marquez & Ellwanger, 2014), and a shortened version (e.g. Siu & Lo, 2013). Although the IISS showed satisfactory reliability (independent subscale: α = .86; interdependent subscale: α = .89), the authors did not use a confirmatory factor analysis (CFA) for verification of its factor structure.

All statistical analyses were performed in R (v4.0.3; R Core Team, 2020), using the packages lavaan (v0.6-7; Rosseel, 2012) and semTools (v0.5-4; Jorgensen et al., 2018), and in JASP (v0.12.2). The data and the R syntax for the ‘invariant’ approach are available online (see https://osf.io/g5z32/?view_only=ae12132150cd4e2cbdcd3ed47654b637 – anonymized link).

Results

In the following section, we compare the results obtained from the ‘invariant’ and ‘traditional’ approaches. We focused on the most often reported procedures in I/C research, namely: reliability estimation, descriptive statistics, cross-cultural differences and post-hoc tests. Please note that the first section of the following results, measurement invariance, was produced only in accordance with the ‘invariant’ approach. All subsequent results were then computed for both methods in a parallel manner.

Measurement Invariance

For model estimation, we used a robust weighted least squares estimator with mean and variance adjustment (WLSMV; for results of an alternative maximum likelihood estimation with robust standard errors [MLR] see Supplementary Material), which is suitable for ordinal and non-normally distributed data from Likert-type scales (Finney & DiStefano, 2006). Besides the WLSMV estimator, we applied delta parameterization, the Wu and Estabrook (2016) model identification methodology and a pairwise method to address missing data. This ordinal procedure has been demonstrated in detail by Svetina et al. (2019). Since our data were skewed and the sample sizes per group were only medium, we observed some empty cells at one end of the ordered categorical (ordinal) scales. This posed a problem for the computation of polychoric correlations. Two possible solutions to this problem are suggested in the literature: a) add values to the empty cells (Savalei, 2011), and b) collapse multiple categories into a single category (Rutkowski et al., 2019). To perform the subsequent analyses, we selected the first option and filled the empty cells with non-missing values. This procedure led to twelve newly created fictional respondents, each with one manually added value (the remainder of their values were missing).
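To make the empty-cell problem concrete, the following minimal Python sketch (it is not the authors' R syntax) flags response categories that are never observed for an item within a country, and illustrates option (b), collapsing sparse categories. The function names, the row layout and the toy data are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def find_empty_cells(rows, items, categories=range(1, 8)):
    """Map (group, item) -> sorted list of response categories never observed."""
    seen = defaultdict(set)
    for row in rows:
        for item in items:
            value = row.get(item)
            if value is not None:              # None marks a missing response
                seen[(row["country"], item)].add(value)
    empty = {}
    for group in sorted({row["country"] for row in rows}):
        for item in items:
            missing = sorted(set(categories) - seen[(group, item)])
            if missing:
                empty[(group, item)] = missing
    return empty

def collapse_lowest(value, floor):
    """Option (b): merge every category below `floor` into `floor`."""
    return None if value is None else max(value, floor)

# Toy data (hypothetical, not from the study): a 3-point item in two groups.
toy = [
    {"country": "A", "item2": 1}, {"country": "A", "item2": 2},
    {"country": "A", "item2": 3}, {"country": "B", "item2": 2},
    {"country": "B", "item2": 3}, {"country": "B", "item2": None},
]
empty = find_empty_cells(toy, ["item2"], categories=range(1, 4))
print(empty)  # group B never uses category 1

# After collapsing categories 1 and 2, no cell is empty in either group.
collapsed = [dict(r, item2=collapse_lowest(r["item2"], 2)) for r in toy]
empty_after = find_empty_cells(collapsed, ["item2"], categories=range(2, 4))
print(empty_after)
```

Option (a), which was used in this study, keeps all seven categories and instead adds a small number of artificial observations to the cells that such a check reports as empty.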

To establish a configural invariance between countries, we had to apply two additional steps (see also Supplementary Material):

1) The Czech sample was removed from further analysis because of its entirely different pattern of covariances and factor loadings (e.g. opposite valency) to the other countries. Hence, only 1386 participants were incorporated into the MG-CFA (the same data were used with the ‘traditional’ approach, see below).

2) Three quarters of items were removed based on the purely exploratory data-driven inspection of the data (more or less ‘trial and error method’), and only items that allowed establishing a configural invariance across countries were kept. This resulted in a two-dimensional, 10-item scale (Individualism = item 2, item 3, item 4, item 5 and item 6; Collectivism = item 26, item 28, item 30, item 33 and item 37; for the item wording, see Lu & Gilmour, 2007) with two correlated residuals (item 33 with item 37, and item 2 with item 3). Since we wanted to compare both approaches in the way they are usually used in real practice, we compare the 10-item version in the invariant approach with the 42-item version in the traditional approach. For a direct comparison on the same 10-item version of the scale in both approaches, see Supplementary Material.

Measurement invariance testing was performed in several steps (see the results in Table 2). First, we verified the configural measurement model of the original 42-item, two-dimensional factor model (i.e. original configural). However, this model yielded an unsatisfactory fit (RMSEA = .071 [90% CI: .070, .073], CFI = .616, TLI = .596, SRMR = .089). Second, we verified the configural measurement model with the 10 items on which we were able to establish configural invariance. The proposed model yielded satisfactory fit indices and described the data well (i.e. configural).

Table 2.

Measurement invariance results.

Measurement model	χ² (df)	RMSEA [90% CI]	CFI	TLI	SRMR	ΔCFI	ΔRMSEA
Original configural	χ² (5166) = 19967.688 ***	.071 [.070, .073]	.616	.596	.089	-	-
Configural	χ² (192) = 337.260 ***	.057 [.047, .067]	.981	.973	.053	-	-
Threshold	χ² (392) = 572.734 ***	.045 [.036, .052]	.976	.983	.053	.005	−.012
Metric	χ² (432) = 667.099 ***	.048 [.041, .056]	.968	.980	.060	.008	.003
Scalar	χ² (472) = 1213.262 ***	.082 [.077, .088]	.900	.943	.062	.068	.034
Partial scalar	χ² (442) = 734.046 ***	.053 [.047, .060]	.961	.976	.059	.007	.005

***p < .001; χ² = Chi-square; df = degrees of freedom; RMSEA = Root mean square error of approximation; CI = Confidence intervals; CFI = Comparative fit index; TLI = Tucker–Lewis index; SRMR = Standardized root mean square residual; Δ = delta (change).

Third, the threshold¹ and metric invariance models were tested by constraining the items’ thresholds and loadings, which resulted in acceptable changes of the model fit (ΔCFI = .005, ΔRMSEA = −.012 for threshold; ΔCFI = .008, ΔRMSEA = .003 for metric; see Table 2). The results suggest that the metric invariance of the proposed model holds well, and therefore it is possible to examine the relationships between variables, for example, through correlations or regressions (Milfont & Fischer, 2010).

Fourth, we tested scalar invariance. However, constraining the items’ intercepts resulted in an unacceptable change in model fit (ΔCFI = .068; ΔRMSEA = .034; see Table 2). Therefore, in the next step, we specified a partial scalar invariance model in which the intercepts of six items (items 2, 4, 6, 28, 30 and 33) were left unconstrained; constraints were released iteratively according to the statistical significance and χ² values obtained from Lagrange multiplier tests (for all groups at once). This left us with two fully invariant items per I/C dimension (i.e. two items with constrained intercepts, loadings and thresholds; Byrne et al., 1989; Steenkamp & Baumgartner, 1998) and led to acceptable fit indices for the partial scalar model (ΔCFI = .007, ΔRMSEA = .005; see Table 2). A comparison of the latent means between countries was therefore possible.
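The sequence of model comparisons described above can be retraced with a short Python sketch that recomputes ΔCFI and ΔRMSEA from the fit indices in Table 2. The cutoffs used here (a CFI drop of at most .010 and an RMSEA increase of at most .015) are an assumption based on commonly used guidelines; this section does not state which criteria were applied. The comparison of the partial scalar model with the metric model is inferred from the Δ values in Table 2.

```python
# (CFI, RMSEA) copied from Table 2
fit = {
    "configural": (0.981, 0.057),
    "threshold": (0.976, 0.045),
    "metric": (0.968, 0.048),
    "scalar": (0.900, 0.082),
    "partial scalar": (0.961, 0.053),
}
# Each model is compared with the less constrained model it is nested in
baseline = {
    "threshold": "configural",
    "metric": "threshold",
    "scalar": "metric",
    "partial scalar": "metric",
}
CFI_CUTOFF, RMSEA_CUTOFF = 0.010, 0.015  # assumed cutoffs, see lead-in

def compare(model):
    """Return (drop in CFI, increase in RMSEA, acceptable?) for one step."""
    cfi, rmsea = fit[model]
    cfi0, rmsea0 = fit[baseline[model]]
    d_cfi = round(cfi0 - cfi, 3)        # positive = fit got worse
    d_rmsea = round(rmsea - rmsea0, 3)  # positive = fit got worse
    return d_cfi, d_rmsea, d_cfi <= CFI_CUTOFF and d_rmsea <= RMSEA_CUTOFF

for model in baseline:
    print(model, compare(model))
```

Under these assumed cutoffs, only the full scalar model is flagged as an unacceptable deterioration, which is consistent with the decision to release intercept constraints.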

Reliability Estimation

In this section, we verify the internal consistency reliability of the scales using Cronbach’s alpha and McDonald’s omega coefficients for each approach (cf. Viladrich et al., 2017). The presentation of reliability coefficients serves here purely as a description of the correlations between items and not as evidence of the psychometric quality of the scales. We report these coefficients mainly for the reason that the internal consistency estimates in cross-cultural research, which uses self-report scales, are often the only (or main) indicators of the psychometric quality of these scales, and then are subsequently used as a justification of their quality. These coefficients, however, tend to overestimate the reliability of scales with a high number of items, which is often the case of I/C self-report scales. A comparison of the differences between each approach (see Table 3) showed that differences were not large and that the coefficients obtained from the ‘invariant’ approach (5 items per subscale) were still sufficient for research purposes. This suggests two crucial points: 1) the ‘traditional’ approach (21 items per subscale) used an unnecessarily large number of similar items; and 2) because I/C scales usually use a large number of items, the internal consistency coefficient, even with a satisfactory value, is not sufficient evidence of psychometric quality.
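The dependence of internal consistency on scale length can be illustrated with the standardized form of Cronbach's alpha, α = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ the mean inter-item correlation. The Python sketch below holds r̄ fixed at .20, a purely hypothetical value that is not estimated from our data, and varies only k.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

MEAN_R = 0.20  # hypothetical mean inter-item correlation

for k in (5, 10, 21):
    print(k, round(standardized_alpha(k, MEAN_R), 3))
# 5 items  -> 0.556
# 10 items -> 0.714
# 21 items -> 0.84
```

With identical and rather weak inter-item correlations, a 21-item subscale reaches a coefficient that would usually be labelled good, whereas a 5-item subscale does not. A high alpha on a long scale therefore says little about psychometric quality.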

Table 3.

Cronbach’s α and McDonald’s ω internal consistency estimation.

	Individualism		Collectivism
	Invariant α/ω	Traditional α/ω	Invariant α/ω	Traditional α/ω
Slovenia	.851/.805	.866/.862	.702/.617	.829/.832
Croatia	.881/.815	.812/.789	.684/.640	.853/.857
B&H	.837/.772	.849/.843	.770/.735	.865/.863
Serbia	.781/.693	.883/.884	.696/.679	.844/.846
Macedonia	.785/.721	.843/.842	.828/.781	.888/.885
Albania	.807/.703	.791/.779	.758/.700	.823/.826

α = Cronbach’s alpha; ω = McDonald’s omega.

Descriptive Statistics

In this section, we report descriptive statistics for both approaches. We estimated the latent means for the ‘invariant’ approach and the arithmetic means for the ‘traditional’ approach; standard deviations were calculated in both cases (see Table 4). Even though a mutual comparison was neither intuitive nor direct, we could compare their order. In both approaches, Albania reached the highest scores in both subscales, followed by Macedonia. Furthermore, both approaches revealed that Slovenia was the least individualistic and Serbia was the least collectivistic. However, the order of the remaining countries differed.
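The comparison of country order can be retraced with a few lines of Python. The sketch below uses the individualism means from Table 4; the variable and function names are illustrative only.

```python
# Individualism means copied from Table 4
latent = {"Slovenia": -0.310, "Croatia": 0.024, "B&H": -0.267,
          "Serbia": -0.132, "Macedonia": 0.330, "Albania": 0.470}
observed = {"Slovenia": 5.155, "Croatia": 5.563, "B&H": 5.465,
            "Serbia": 5.379, "Macedonia": 5.644, "Albania": 5.723}

def ranking(means):
    """Countries ordered from the lowest to the highest mean."""
    return sorted(means, key=means.get)

latent_order = ranking(latent)
observed_order = ranking(observed)

# Countries whose rank position differs between the two approaches
moved = [c for c in latent
         if latent_order.index(c) != observed_order.index(c)]

print(latent_order)
print(observed_order)
print(moved)
```

For individualism, the lowest and the three highest positions coincide across approaches, while Bosnia and Herzegovina and Serbia swap places.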

Table 4.

Estimated latent means (SD/SE) and arithmetic means (SD/SE).

	Individualism		Collectivism
	Invariant	Traditional	Invariant	Traditional
Slovenia	−.310 (.699/.069)	5.155 (.750/.053)	−.185 (.484/.052)	4.858 (.676/.048)
Croatia	.024 (.719/.096)	5.563 (.563/.039)	−.175 (.465/.069)	4.734 (.686/.048)
B&H	−.267 (.762/.084)	5.465 (.729/.039)	.007 (.695/.075)	4.784 (.837/.045)
Serbia	−.132 (.570/.085)	5.379 (.706/.048)	−.239 (.476/.067)	4.555 (.679/.046)
Macedonia	.330 (.887/.121)	5.644 (.773/.048)	.222 (.885/.093)	5.051 (.968/.060)
Albania	.470 (.873/.151)	5.723 (.607/.050)	.478 (.673/.114)	5.236 (.730/.060)

SD = standard deviation, SE = Standard error.

To provide a more convenient comparison of descriptive statistics, the data were clustered into two dual Y-axis line charts (see Figures 1 and 2). The latent and arithmetic means as well as their standard errors were transformed via min–max normalization into a particular range. From the graphs and descriptive statistics, it is clear that the latent means in some countries obtained from the ‘invariant’ approach differed from the arithmetic means obtained from the ‘traditional’ approach. We observed the greatest differences between the approaches for Bosnia-Herzegovina and Croatia in individualism and for Slovenia and Croatia in collectivism.
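The rescaling behind the figures can be sketched in Python as follows. The target range of the normalization is not specified above, so the sketch assumes the interval [0, 1]; the helper names are hypothetical, and the means are copied from Table 4.

```python
def min_max(means):
    """Rescale a dict of means to the interval [0, 1] (assumed target range)."""
    lo, hi = min(means.values()), max(means.values())
    return {k: (v - lo) / (hi - lo) for k, v in means.items()}

def largest_gaps(latent, observed, n=2):
    """Countries with the n largest absolute gaps between the rescaled means."""
    a, b = min_max(latent), min_max(observed)
    gaps = {k: abs(a[k] - b[k]) for k in latent}
    return sorted(gaps, key=gaps.get, reverse=True)[:n]

ind_latent = {"Slovenia": -0.310, "Croatia": 0.024, "B&H": -0.267,
              "Serbia": -0.132, "Macedonia": 0.330, "Albania": 0.470}
ind_observed = {"Slovenia": 5.155, "Croatia": 5.563, "B&H": 5.465,
                "Serbia": 5.379, "Macedonia": 5.644, "Albania": 5.723}
col_latent = {"Slovenia": -0.185, "Croatia": -0.175, "B&H": 0.007,
              "Serbia": -0.239, "Macedonia": 0.222, "Albania": 0.478}
col_observed = {"Slovenia": 4.858, "Croatia": 4.734, "B&H": 4.784,
                "Serbia": 4.555, "Macedonia": 5.051, "Albania": 5.236}

print(largest_gaps(ind_latent, ind_observed))  # individualism
print(largest_gaps(col_latent, col_observed))  # collectivism
```

Under this assumed range, the two largest gaps fall on Bosnia and Herzegovina and Croatia for individualism and on Slovenia and Croatia for collectivism.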

Figure 1.

Comparison of latent and arithmetic means in individualism subscale.

Figure 2.

Comparison of latent and arithmetic means in collectivism subscale.

Between-Country Differences

In this section, we analyze the differences between the countries. For the ‘invariant’ approach, we applied the Chi-Squared difference test (Satorra, 2000) to the entire two-dimensional measurement model to identify the main effect between nested models (i.e. latent mean comparison) and found a statistically significant result: Δχ² = 280.52, Δdf = 42, p < .001. This suggests that significant differences exist between the countries in both individualism and collectivism.

For the ‘traditional’ approach, we first applied Levene’s test for equality of variances. The test indicated that the assumption of homoscedasticity was violated for both individualism, F_{(5, 1380)} = 4.219, p < .001, and collectivism, F_{(5, 1380)} = 6.725, p < .001. Therefore, we applied Welch’s correction. Then, two one-way ANOVAs revealed significant differences between the countries in both individualism, F_{(5, 593.921)} = 16.236, p < .001, ω² = .053, and collectivism, F_{(5, 856.069)} = 20.203, p < .001, ω² = .058. In summary, omnibus analysis showed, in both approaches, statistically significant differences between countries in their level of individualism and collectivism.
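For readers who want to see what Welch’s correction does, the following Python sketch computes Welch’s one-way ANOVA from summary statistics alone. It uses the group sizes from Table 1 and the rounded arithmetic means and standard deviations of the individualism subscale from Table 4, so it can only approximate the statistic reported above; the function name is illustrative, and this is not the code used for the reported analysis.

```python
def welch_anova(groups):
    """Welch's one-way ANOVA from (n, mean, sd) triples; returns (F, df1, df2)."""
    k = len(groups)
    w = [n / sd ** 2 for n, _, sd in groups]   # precision weights n / s^2
    total_w = sum(w)
    grand = sum(wi * m for wi, (_, m, _) in zip(w, groups)) / total_w
    between = sum(wi * (m - grand) ** 2
                  for wi, (_, m, _) in zip(w, groups)) / (k - 1)
    lam = sum((1 - wi / total_w) ** 2 / (n - 1)
              for wi, (n, _, _) in zip(w, groups))
    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2

# (n, arithmetic mean, SD) for individualism, from Tables 1 and 4
individualism = [
    (200, 5.155, 0.750),  # Slovenia
    (207, 5.563, 0.563),  # Croatia
    (346, 5.465, 0.729),  # B&H
    (219, 5.379, 0.706),  # Serbia
    (264, 5.644, 0.773),  # Macedonia
    (150, 5.723, 0.607),  # Albania
]
f_stat, df1, df2 = welch_anova(individualism)
print(round(f_stat, 2), df1, round(df2, 1))
```

Because each group is weighted by n/s², groups with small variances carry more weight, and the fractional denominator degrees of freedom reflect how unequal these weights are.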

Post-hoc Tests

In this section, the results for each approach in the post-hoc tests are reported. The ‘traditional’ approach used post-hoc pairwise t-tests, whereas the ‘invariant’ approach used latent mean comparisons. In order to reduce potential type I errors, we corrected all the following post-hoc pairwise comparisons using the Holm–Bonferroni correction for multiple comparisons. Additionally, for each pairwise comparison, the Cohen’s d effect size was computed (for SEM: the standard deviation was computed as a square root of the variance; from these standard deviations, the pooled standard deviations were calculated, which were subsequently used for calculation of effect size; see Breitsohl, 2019).
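Both post-hoc ingredients can be sketched in a few lines of Python. The pooled standard deviation below is weighted by n − 1; this weighting is an assumption, although with the group sizes from Table 1 and the latent standard deviations from Table 4 it reproduces the tabled ‘invariant’ effect sizes for the rows checked here. The p-values passed to the Holm function are hypothetical, and the function names are illustrative.

```python
from math import sqrt

def cohens_d(mean_diff, sd1, n1, sd2, n2):
    """Cohen's d with a pooled SD weighted by n - 1 (assumed weighting)."""
    pooled = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    return mean_diff / pooled

def holm(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted p monotone
        adjusted[i] = running_max
    return adjusted

# Latent individualism, Slovenia versus Albania (Tables 1, 4 and 5)
d_slo_alb = cohens_d(-0.780, 0.699, 200, 0.873, 150)
# Latent individualism, Slovenia versus Croatia
d_slo_cro = cohens_d(-0.334, 0.699, 200, 0.719, 207)
print(round(d_slo_alb, 3), round(d_slo_cro, 3))

# Hypothetical raw p-values for three pairwise comparisons
print(holm([0.01, 0.04, 0.03]))
```

The Holm procedure multiplies the smallest p-value by the number of comparisons, the next smallest by one less, and so on, which makes it less conservative than a plain Bonferroni correction while still controlling the family-wise error rate.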

First, we analyzed the differences between each approach in the level of individualism. The ‘invariant’ approach revealed ten statistically significant results, and the ‘traditional’ approach revealed nine (see Table 5). However, only six pairwise comparisons were statistically significant in both approaches; the remainder of the results were inconsistent (e.g. statistically significant in the first approach, but not in the second). Two out of the six shared comparisons, and seven comparisons in total, differed in their effect sizes (i.e. difference greater than .250). We also identified one statistically insignificant pairwise comparison which showed an opposite valence (i.e. negative for ‘invariant’, but positive for ‘traditional’). Hence, we can conclude that both approaches yielded different post-hoc analysis results.

Table 5.

Post-hoc analysis for individualism subscale.

	Mean differences (SE) [95% CI]		Cohen’s d		Δ
	Invariant	Traditional	Invariant	Traditional	Δ
Slovenia versus Croatia	−0.334 (0.092) ** [−0.515, −0.153]	−0.435 (0.075) *** [−0.650, −0.220]	−0.471	−0.591	−0.120
Slovenia versus B&H	−0.043 (0.084) [−0.208, 0.122]	−0.332 (0.067) *** [−0.525, −0.140]	−0.058	−0.424	−0.366
Slovenia versus Serbia	−0.178 (0.084) [−0.342, −0.014]	−0.235 (0.074) * [−0.447, −0.023]	−0.280	−0.296	−0.016
Slovenia versus Macedonia	−0.640 (0.113) *** [−0.862, −0.419]	−0.495 (0.071) *** [−0.698, −0.292]	−0.789	−0.612	0.177
Slovenia versus Albania	−0.780 (0.143) *** [−1.060, −0.500]	−0.597 (0.082) *** [−0.831, −0.363]	−1.002	−0.807	0.195
Croatia versus B&H	0.290 (0.086) ** [0.122, 0.459]	0.103 (0.067) [−0.088, 0.293]	0.390	0.140	0.250
Croatia versus Serbia	0.156 (0.084) [−0.009, 0.321]	0.200 (0.074) * [−0.010, 0.410]	0.241	0.274	−0.033
Croatia versus Macedonia	−0.306 (0.110) * [−0.523, −0.090]	−0.060 (0.071) [−0.261, 0.141]	−0.374	−0.080	0.294
Croatia versus Albania	−0.446 (0.141) * [−0.723, −0.170]	−0.161 (0.081) [−0.394, 0.071]	−0.556	−0.246	0.310
B&H versus Serbia	−0.135 (0.078) [−0.287, 0.018]	0.097 (0.066) [−0.090, 0.284]	−0.195	0.125	0.320
B&H versus Macedonia	−0.597 (0.107) *** [−0.806, −0.388]	−0.163 (0.062) [−0.340, 0.015]	−0.730	−0.206	0.524
B&H versus Albania	−0.737 (0.138) *** [−1.007, −0.467]	−0.264 (0.074) ** [−0.476, −0.052]	−0.924	−0.359	0.565
Serbia versus Macedonia	−0.462 (0.105) *** [−0.668, −0.257]	−0.260 (0.069) ** [−0.458, −0.062]	−0.608	−0.324	0.284
Serbia versus Albania	−0.602 (0.136) *** [−0.869, −0.335]	−0.361 (0.080) *** [−0.591, −0.132]	−0.849	−0.494	0.355
Macedonia versus Albania	−0.140 (0.136) [−0.869, 0.335]	−0.102 (0.078) [−0.323, 0.120]	−0.159	−0.134	0.025

*p_holm < .05, ** p_holm < .01; *** p_holm < .001; CI = Confidence intervals; SE = standard error; Δ = delta (change).

Second, we analyzed the differences in the level of collectivism between each approach. The ‘invariant’ approach revealed eight statistically significant results, and the ‘traditional’ approach revealed nine (see Table 6). Seven pairwise comparisons were statistically significant in both approaches, while three comparisons were significant in only one of them. Even though the seven shared differences showed similar effect sizes (i.e. a change less than .250), we identified four other comparisons which differed in their effect sizes (i.e. a change greater than .250). More importantly, three comparisons showed opposite valency. In summary, both approaches revealed different patterns of results for both the individualism and collectivism subscales. These findings suggest that the results obtained from individualism–collectivism self-report scales might differ depending on the statistical approach employed. Since the ‘traditional’ approach lacks measurement invariance testing, these results suggest the necessity of measurement invariance testing in cross-cultural, self-report research.

Table 6.

Post-hoc analysis for collectivism subscale.

	Mean differences (SE) [95% CI]		Cohen’s d		Δ
	Invariant	Traditional	Invariant	Traditional	Δ
Slovenia versus Croatia	−0.010 (0.072) [−0.151, 0.131]	0.124 (0.078) [−0.099, 0.347]	−0.021	0.182	−0.203
Slovenia versus B&H	−0.192 (0.075) [−0.340, −0.045]	0.074 (0.070) [−0.126, 0.273]	−0.307	0.094	−0.401
Slovenia versus Serbia	−0.054 (0.071) [−0.085, 0.193]	0.303 (0.077) ** [0.083, 0.523]	−0.113	0.447	−0.560
Slovenia versus Macedonia	−0.407 (0.090) *** [−0.584, −0.229]	−0.194 (0.074) [−0.404, 0.017]	−0.551	−0.227	−0.324
Slovenia versus Albania	−0.663 (0.110) *** [−0.879, −0.447]	−0.378 (0.085) *** [−0.621, −0.135]	−1.158	−0.540	−0.618
Croatia versus B&H	−0.183 (0.071) [−0.323, −0.043]	−0.050 (0.069) [−0.248, 0.147]	−0.294	−0.064	−0.230
Croatia versus Serbia	0.064 (0.068) [−0.069, 0.198]	0.179 (0.076) [−0.039, 0.397]	0.136	0.262	−0.126
Croatia versus Macedonia	−0.397 (0.086) *** [−0.566, −0.228]	−0.318 (0.073) *** [−0.526, −0.109]	−0.544	−0.371	−0.173
Croatia versus Albania	−0.654 (0.106) *** [−0.861, −0.446]	−0.502 (0.084) *** [−0.743, −0.261]	−1.162	−0.712	0.450
B&H versus Serbia	0.247 (0.071) ** [0.108, 0.386]	0.229 (0.068) * [0.035, 0.423]	0.397	0.294	0.103
B&H versus Macedonia	−0.214 (0.087) [−0.385, −0.043]	−0.267 (0.046) *** [−0.451, −0.084]	−0.270	−0.299	0.029
B&H versus Albania	−0.471 (0.106) *** [−0.679, −0.263]	−0.452 (0.077) *** [−0.671, −0.232]	−0.684	−0.560	−0.114
Serbia versus Macedonia	−0.461 (0.086) *** [−0.630, −0.292]	−0.497 (0.072) *** [−0.702, −0.291]	−0.633	−0.585	−0.048
Serbia versus Albania	−0.718 (0.106) *** [−0.925, −0.510]	−0.681 (0.083) *** [−0.919, −0.442]	−1.270	−0.972	0.298
Macedonia versus Albania	−0.257 (0.115) [−0.483, −0.031]	−0.184 (0.081) [−0.414, 0.046]	−0.314	−0.207	−0.107

*p_holm < .05, ** p_holm < .01; *** p_holm < .001; CI = Confidence intervals; SE = standard error; Δ = delta (change).

Discussion

In the paper, we described cross-cultural equivalence and measurement invariance, and we emphasized their importance in cross-cultural research. In particular, we illustrated two different approaches to the analysis of data obtained from individualism/collectivism self-report scales, that is, ‘invariant’ and ‘traditional’ approaches, and compared their results. The comparisons revealed that each approach yields different statistical results for post-hoc tests, meaning that some pairwise comparisons are statistically significant and have high effect sizes inconsistently across approaches.

To provide a more illustrative interpretation of the effect of noninvariant items on the group comparison results, we discuss in more detail the largest discrepancy between the two approaches, which can be observed in the individualism score of Bosnia and Herzegovina. According to Hofstede’s (Hofstede et al., 2010) individualism index, all Balkan countries are strongly collectivistic (Croatia [33], Slovenia [27] and Serbia [25] are slightly more individualistic than Macedonia [22], Bosnia and Herzegovina [22] and Albania [20]). Bosnia and Herzegovina was indeed identified as the second least individualistic country in the invariant approach (latent mean = −0.267), and the difference from the least individualistic country, Slovenia, was statistically insignificant and showed a very weak effect size (d = −0.058). Furthermore, it showed high effect sizes in comparisons with Macedonia (d = −0.730) and Albania (d = −0.924), which were significantly more individualistic.

These results were, however, entirely different when analyzed on the observed level within the traditional approach. In the traditional approach, Bosnia and Herzegovina could be characterized as a rather individualistic country (arithmetic mean = 5.465). The difference from Slovenia, which remained the least individualistic country, was significant with a moderate effect size (d = −0.424). Even though Bosnia and Herzegovina was still significantly less individualistic than Albania, the difference showed a much lower effect size (d = −0.359), and the difference from Macedonia was not statistically significant (d = −0.206). The differences between the effect sizes of the invariant and traditional approaches in these two comparisons were higher than .50. A similarly high difference was also observed when arithmetic means were calculated from the same 10-item version of the scale (see Supplementary Material). Furthermore, the insignificant difference between Bosnia and Herzegovina and Serbia pointed in opposite directions in the two approaches. In the invariant approach, Serbia was more individualistic (d = −0.195), whereas, in the traditional approach, Serbia was less individualistic (d = 0.125) than Bosnia and Herzegovina.

These findings clearly demonstrate that the presence of differential item functioning (bias in the intercepts) can lead to different conclusions when comparing the same construct between countries. This finding convincingly shows that establishing scalar invariance is an important step in the analysis of this type of cross-cultural data before any actual comparison of mean scores because it improves the credibility of the results and their interpretation. Our results agree with other previous studies conducted in related research fields, which used both real and simulated data and which illustrated that ignoring measurement (non-)invariance might lead to biased results (Chen, 2008; Guenole & Brown, 2014; Hsiao & Lai, 2018; Jeong & Lee, 2019; Oberski, 2014; Schmitt et al., 2011; Steinmetz, 2013; Widaman & Reise, 1997).

Since the comparison of observed means by t-tests or ANOVAs assumes a potentially wrong model of the data, namely, factor loadings and intercepts that are invariant between groups when this is often not the case, results from these types of analyses may often not be valid. Comparison of construct-level latent means in an SEM framework offers a more valid approach. Because the comparison of potentially biased observed means without assessment of measurement invariance is still the dominant approach in cross-cultural research, doubts about the validity of cross-cultural differences in diverse psychological phenomena should be raised. The consequences of ignoring measurement noninvariance in comparing results from self-report scales across cultures might be far reaching and may be contributing to the current situation of many replication failures and the inability to adapt the scales to countries beyond the most commonly investigated (WEIRD) countries.

We deliberately illustrated both approaches with a single I/C scale example because I/C as a construct represents one of the ‘flagships’ of cross-cultural research. Pioneers of I/C research have usually ignored measurement invariance testing in the construction and adaptation of I/C scales (which is understandable since the development of psychometric methods for verifying the scalar level of measurement invariance testing is relatively recent; Hoyle, 2012). However, the situation has not improved because researchers have subsequently either used these scales with untested/unknown measurement invariance (and possibly with an excellent Cronbach’s alpha) for comparisons based on observed means or sum-scores, or they were able to establish only the configural invariance through substantial changes of exploratory character and not the scalar level of measurement invariance (Chen & West, 2008; Lacko et al., 2021). Even though the goal of I/C research in the majority of cases lies in the cross-cultural comparisons, many of these comparisons of mean scores are, paradoxically, for the reasons mentioned above, most likely biased and actually incomparable from a psychometric point of view.

Apart from methodological implications, we must emphasize that our results do not provide evidence of validity for the I/C construct, or the IISS scale, for two main reasons. First, we had to omit the Czech participants entirely from the analysis to establish configural invariance, which is the lowest level of measurement invariance. This suggests that the IISS lacks construct equivalence across dissimilar groups of countries, because I/C appears to manifest differently in Czechia (Central Europe) than in Balkan countries. Second, the final version of the questionnaire used in the ‘invariant’ approach does not represent a validated shortened IISS scale with satisfactory psychometric properties for Balkan countries, because it was established purely through a data-driven exploratory approach regardless of any theoretical rationale. This approach led to the removal of the majority of IISS items, which jeopardizes content validity, and the shortened scale might not reflect the proposed theoretical construct. In addition, even though the invariant approach is statistically sound and often recommended, we cannot say that it provided more valid results than the traditional approach in our specific case due to the unknown validity of the instrument. Hence, we do not interpret the obtained results in terms of differences between countries, and this part of our demonstration should not be understood as a guide for future research. We believe that a confirmatory, theory-driven approach is necessary for the validation of this type of scale and its usefulness in cross-cultural research. Another limitation of our demonstration lies in the lack of knowledge of the real (non-)invariance across countries and, consequently, in the inability to control and manipulate its level, which is understandable because we used a real dataset instead of a simulated one. The sample size of some groups was also rather small, which forced us to fill a negligible number of empty cells with non-missing values.

Given the different results obtained from the two approaches, we join other scholars in calling for the ‘traditional’ approach to be abandoned in favour of measurement invariance testing in cross-cultural research in general (e.g. Boer et al., 2018; Chen, 2008; Jeong & Lee, 2019; Fischer & Karl, 2019; Milfont & Fischer, 2010) as well as in self-report I/C research (e.g. Chen & West, 2008; Cozma, 2011; Lacko et al., 2021; Levine, Bresnahan, Park, Lapinsky, et al., 2003; 2003b; Oyserman et al., 2002; Schimmack et al., 2005). Incorporating measurement invariance testing in I/C research therefore has a great potential to increase the validity and reliability of cross-cultural comparisons with self-report scales and to provide the necessary fundamentals for an examination of the real differences in the levels of psychological phenomena across cultures.

Supplemental Material

Supplemental Material, sj-pdf-1-ccr-10.1177_10693971211068971 for The Necessity of Testing Measurement Invariance in Cross-Cultural Research: Potential Bias in Cross-Cultural Comparisons With Individualism–Collectivism Self-Report Scales by David Lacko, Jiří Čeněk, Jaroslav Točík, Andreja Avsec, Vladimir Đorđević, Ana Genc, Fatjona Haka, Jelena Šakotić-Kurbalija, Tamara Mohorić, Ibrahim Neziri and Siniša Subotić in Cross-Cultural Research

Footnotes

Acknowledgements

This work was supported by the Czech Science Foundation (GA20-01214S: ‘Vzájemná percepce akulturačních preferencí u společenské většiny a přistěhovalců v meziskupinové perspektiv’). The publication fees were co-financed by Mendel University in Brno. We would like to thank Miha Hribernik for the translation into Slovenian and Ivana Didak for the translation into Croatian.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Grantová Agentura České Republiky (GA20-01214S).

ORCID iD

David Lacko

Supplemental Material

Supplemental material for this article is available online.

References

Adair, J. G., Coělho, A. E. L., & Luna, J. R. (2002). How international is Psychology? International Journal of Psychology, 37(3), 160–170. https://doi.org/10.1080/00207590143000351

Adair, J. G., & Huynh, C.-L. (2012). Internationalization of psychological research: Publications and collaborations of the United States and other leading countries. International Perspectives in Psychology: Research, Practice, Consultation, 1(4), 252–267. https://doi.org/10.1037/a0030395

Aiken, L. S., Stein, J. A., & Bentler, P. M. (1994). Structural equation analyses of clinical subpopulation differences and comparative treatment outcomes: Characterizing the daily lives of drug addicts. Journal of Consulting and Clinical Psychology, 62(3), 488–499. https://doi.org/10.1037//0022-006x.62.3.488

Anakwah, Horselenberg, Hope, Amankwah‐Poku, & Koppen (2020). Cross‐cultural differences in eyewitness memory reports. Applied Cognitive Psychology, 34(2), 504–515. https://doi.org/10.1002/acp.3637

Arnett, J. J. (2008). The neglected 95%: Why American psychology needs to become less American. The American Psychologist, 63(7), 602–614. https://doi.org/10.1037/0003-066X.63.7.602

Asparouhov & Muthén (2009). Exploratory structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 16(3), 397–438. https://doi.org/10.1080/10705510903008204

Asparouhov & Muthén (2014). Multiple-group factor analysis alignment. Structural Equation Modeling: A Multidisciplinary Journal, 21(4), 495–508. https://doi.org/10.1080/10705511.2014.919210

Bašnáková, Brezina, & Masaryk (2016). Dimensions of culture: The case of Slovakia as an outlier in Hofstede’s research. Cesk. Psychol, 60(1), 13–25.

Batchelor, & Miao, Ch. (2016). Extreme response style: A meta-analysis. Journal of Organizational Psychology, 16(2), 51–62.

Baumgartner & Weijters (2015). Response biases in cross-cultural measurement. In Ng, & Lee, A. Y. (Eds.), Frontiers of culture and psychology. Handbook of culture and consumer behavior (pp. 150–180). Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199388516.003.0008

Bauserman (1997). International representation in the psychological literature. International Journal of Psychology, 32(2), 107–112. https://doi.org/10.1080/002075997400908

Benavides & Hur (2020). Self-construal differences in Chile and South Korea: A brief report. Psychological Reports, 123(6), 2410–2417. https://doi.org/10.1177/0033294119868786

Berry, Poortinga, Breugelmans, Chasiotis, & Sam (2011). Cross-cultural psychology: Research and applications (3rd ed.). Cambridge University Press. https://doi.org/10.1017/CBO9780511974274

Boehnke, Lietz, Schreier, & Wilhelm (2011). Sampling: The selection of cases for culturally comparative psychological research. In Matsumoto, & Van de Vijver, F. J. R. (Eds.), Cross-cultural research methods in psychology (pp. 101–129). Cambridge University Press.

Boer, Hanke, & He (2018). On detecting systematic measurement error in cross-cultural research: A review and critical reflection on equivalence and invariance tests. Journal of Cross-Cultural Psychology, 49(5), 713–734. https://doi.org/10.1177/0022022117749042

Bollen, K. A. (1989). Structural equations with latent variables. John Wiley & Sons.

Brady, L. M., Fryberg, S. A., & Shoda (2018). Expanding the interpretive power of psychological science by attending to culture. Proceedings of the National Academy of Sciences, 115(45), 11406–11413. https://doi.org/10.1073/pnas.1803526115

Brandmaier, A. M., von Oertzen, McArdle, J. J., & Lindenberger (2013). Structural equation model trees. Psychological Methods, 18(1), 71–86. https://doi.org/10.1037/a0030001

Breitsohl (2019). Beyond ANOVA: An introduction to structural equation models for experimental designs. Organizational Research Methods, 22(3), 649–677. https://doi.org/10.1177/1094428118754988

Bresnahan, M. J., Levine, T. R., Shearman, S. M., Lee, S. Y., Park, C.-Y., & Kiyomiya (2005). A multimethod multitrait validity assessment of self-construal in Japan, Korea, and the United States. Human Communication Research, 31(1), 33–59. https://doi.org/10.1111/j.1468-2958.2005.tb00864.x

Brewer, M. B., & Chen, Y. R. (2007). Where (who) are collectives in collectivism? Toward conceptual clarification of individualism and collectivism. Psychological Review, 114(1), 133–151. https://doi.org/10.1037/0033-295X.114.1.133

Brislin, R. W. (1970). Back-translation for cross-cultural research. Journal of Cross-Cultural Psychology, 1(3), 187–216. https://doi.org/10.1177/135910457000100301

Brown, T. A. (2015). Confirmatory factor analysis for applied research (2nd ed.). The Guilford Press.

Buil, de Chernatony, & Martínez (2012). Methodological issues in cross-cultural research: An overview and recommendations. Journal of Targeting, Measurement and Analysis for Marketing, 20(3-4), 223–234. https://doi.org/10.1057/jt.2012.18

Burton, Delvecchio, Germani, & Mazzeschi (2019). Individualism/collectivism and personality in Italian and American groups. Current Psychology: A Journal for Diverse Perspectives on Diverse Psychological Issues, 40, 29–34. https://doi.org/10.1007/s12144-019-00584-4

Byrne, B. M. (2010). Structural equation modeling with AMOS: Basic concepts, applications and programming. Routledge/Taylor & Francis Group.

Byrne, B. M., Oakland, Leong, F. T. L., van de Vijver, F. J. R., Hambleton, R. K., Cheung, F. M., & Bartram (2009). A critical analysis of cross-cultural research and testing practices: Implications for improved education and training in psychology. Training and Education in Professional Psychology, 3(2), 94–105. https://doi.org/10.1037/a0014516

Byrne, B. M., Shavelson, R. J., & Muthén (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105(3), 456–466. https://doi.org/10.1037/0033-2909.105.3.456

Byrne, B. M., & van de Vijver (2017). The maximum likelihood alignment approach to testing for approximate measurement invariance: A paradigmatic cross-cultural application. Psicothema, 29(4), 539–551. https://doi.org/10.7334/psicothema2017.178

Campbell, D. T. (1986). Science’s social system of validity-enhancing collective belief change and the problems of the social sciences. In Fiske, D. W., & Shweder, R. A. (Eds.), Metatheory in social science (pp. 108–113). University of Chicago Press.

Čeněk (2015). Cultural dimension of individualism and collectivism and its perceptual and cognitive correlates in cross-cultural research. The Journal of Education, Culture and Society, 2015(2), 210–225.

Čeněk & Urbánek (2019). Adaptace a ekvivalence testových metod: Inspirace pro psychologické testování minorit v ČR [The adaptation and equivalence of test methods: An inspiration for psychological assessment of minorities in the Czech Republic]. Czechoslovak Psychology, 63(1), 42–54.

Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 14(3), 464–504. https://doi.org/10.1080/10705510701301834

Chen, F. F. (2008). What happens if we compare chopsticks with forks? The impact of making inappropriate comparisons in cross-cultural research. Journal of Personality and Social Psychology, 95(5), 1005–1018. https://doi.org/10.1037/a0013193

Chen, C.-C., Lau, J. M., Richardson, G. B., & Dai, C.-L. (2020). Measurement invariance testing in counseling. Journal of Professional Counseling: Practice, Theory & Research, 47(2), 89–104. https://doi.org/10.1080/15566382.2020.1795806

Chen, F. F., & West, S. G. (2008). Measuring individualism and collectivism: The importance of considering differential components, reference groups, and measurement invariance. Journal of Research in Personality, 42(2), 259–294. https://doi.org/10.1016/j.jrp.2007.05.006

Cheon, B. K., Melani, & Hong (2020). How USA-centric is psychology? An archival study of implicit assumptions of generalizability of findings to human nature based on origins of study samples. Social Psychological and Personality Science, 11(7), 928–937. https://doi.org/10.1177/1948550620927269

Cheung, F. M. (2012). Mainstreaming culture in psychology. American Psychologist, 67(8), 721–730. https://doi.org/10.1037/a0029876

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9(2), 233–255. https://doi.org/10.1207/S15328007SEM0902_5

Cohen, A. B. (2009). Many forms of culture. American Psychologist, 64(3), 194–204. https://doi.org/10.1037/a0015308

Cohen & Kitayama (2019). Handbook of cultural psychology (2nd ed.). Guilford Publications.

Cole, D. A., Maxwell, S. E., Arvey, & Salas (1993). Multivariate group comparisons of variable systems: MANOVA and structural equation modeling. Psychological Bulletin, 114(1), 174–184. https://doi.org/10.1037/0033-2909.114.1.174

Cozma (2011). How are individualism and collectivism measured? Romanian Journal of Applied Psychology, 13(1), 11–17.

Davidov, Dülmer, Cieciuch, Kuntz, Seddig, & Schmidt (2016). Explaining measurement nonequivalence using multilevel structural equation modeling. Sociological Methods & Research, 47(4), 729–760. https://doi.org/10.1177/0049124116672678

Davidov, Meuleman, Cieciuch, Schmidt, & Billiet (2014). Measurement equivalence in cross-national research. Annual Review of Sociology, 40(1), 55–75. https://doi.org/10.1146/annurev-soc-071913-043137

De Barona, M. S. (1993). The availability of ethnic materials in psychology journals: A review of 20 years of journal publication. Contemporary Educational Psychology, 18(4), 391–400. https://doi.org/10.1006/ceps.1993.1029

De Roover, Timmerman, M. E., De Leersnyder, Mesquita, & Ceulemans (2014). What’s hampering measurement invariance: Detecting non-invariant items using clusterwise simultaneous component analysis. Frontiers in Psychology, 5(604), 1–14. https://doi.org/10.3389/fpsyg.2014.00604

De Roover, Vermunt, J. K., & Ceulemans (2020). Mixture multigroup factor analysis for unraveling factor loading noninvariance across many groups. Psychological Methods. https://doi.org/10.1037/met0000355

De Roover, & Vermunt, J. K. (2019). On the exploratory road to unraveling factor loading non-invariance: A new multigroup rotation approach. Structural Equation Modeling: A Multidisciplinary Journal, 26(6), 905–923. https://doi.org/10.1080/10705511.2019.1590778

Dixon, D. J. (2007). The effects of language priming on independent and interdependent self-construal among Chinese university students currently studying English. Current Research in Social Psychology, 13(1), 1–9.

Dong & Dumas (2020). Are personality measures valid for different populations? A systematic review of measurement invariance across cultures, gender, and age. Personality and Individual Differences, 160(1), 109956. https://doi.org/10.1016/j.paid.2020.109956

Dumetz & Gáboríková (2017). The Czech and Slovak Republics: A cross cultural comparison. Marketing Science and Inspirations, 11(4), 2–13.

Finney, S. J., & DiStefano (2006). Non-normal and categorical data in structural equation modeling. In Hancock & Mueller (Eds.), Structural equation modeling: A second course (pp. 269–314). Information Age Publishing.

Fischer (2004). Standardization to account for cross-cultural response bias: A classification of score adjustment procedures and review of research in JCCP. Journal of Cross-Cultural Psychology, 35(3), 263–282. https://doi.org/10.1177/0022022104264122

Fischer, & Karl, J. A. (2019). A primer to (cross-cultural) multi-group invariance testing possibilities in R. Frontiers in Psychology, 10, 1507. https://doi.org/10.3389/fpsyg.2019.01507

Fischer, Karl, J. A., Fontaine, & Poortinga (2021, June 16). Evidence of validity does NOT rule out systematic bias: A commentary on nomological noise and cross-cultural invariance. PsyArXiv. https://doi.org/10.31234/osf.io/k9wbj

Fischer & Milfont (2010). Standardization in psychological research. International Journal of Psychological Research, 3(1), 88–96. https://doi.org/10.21500/20112084.852

Fischer, & Poortinga, Y. H. (2018). Addressing methodological challenges in culture-comparative research. Journal of Cross-Cultural Psychology, 49(5), 691–712. https://doi.org/10.1177/0022022117738086

Fitzgerald, Widdop, Gray, & Collins (2011). Identifying sources of error in cross-national questionnaires: Application of an error source typology to cognitive interview data. Journal of Official Statistics, 27(4), 569–599.

Galang, C. M., Johnson, & Obhi, S. S. (2021). Exploring the relationship between empathy, self-construal style, and self-reported social distancing tendencies during the COVID-19 pandemic. Frontiers in Psychology, 12, 588934. https://doi.org/10.3389/fpsyg.2021.588934

Gomez, & Taylor, K. A. (2018). Cultural differences in conflict resolution strategies: A US–Mexico comparison. International Journal of Cross Cultural Management, 18(1), 33–51. https://doi.org/10.1177/1470595817747638

Graham (1992). “Most of the subjects were White and middle class”: Trends in published research on African Americans in selected APA journals, 1970–1989. American Psychologist, 47(5), 629–639. https://doi.org/10.1037/0003-066X.47.5.629

Groskurth, Bluemke, & Lechner, C. M. (2021, September 21). Are my measures comparable across groups? A unified item- and scale-score-level approach to quantifying scalar non-invariance bias. PsyArXiv. https://doi.org/10.31234/osf.io/fbshu

Guenole & Brown (2014). The consequences of ignoring measurement invariance for path coefficients in structural equation models. Frontiers in Psychology, 5, 980. https://doi.org/10.3389/fpsyg.2014.00980

Guthrie, R. V. (1976). Even the rat was white: A historical view of psychology. Harper & Row.

Hancock, G. R. (1997). Structural equation modeling methods of hypothesis testing of latent variable means. Measurement and Evaluation in Counseling and Development, 30(2), 91–105. https://doi.org/10.1080/07481756.1997.12068926

Hancock, G. R. (2001). Effect size, power, and sample size determination for structured means modeling and mimic approaches to between-groups hypothesis testing of means on a single latent construct. Psychometrika, 66(3), 373–388. https://doi.org/10.1007/BF02294440

Hancock, G. R., Lawrence, F. R., & Nevitt (2000). Type I error and power of latent mean methods and MANOVA in factorially invariant and noninvariant latent variable systems. Structural Equation Modeling: A Multidisciplinary Journal, 7(4), 534–556. https://doi.org/10.1207/s15328007sem0704_2

Hardin, E. E., Robitschek, Flores, L. Y., Navarro, R. L., & Ashton, M. W. (2014). The cultural lens approach to evaluating cultural validity of psychological theory. American Psychologist, 69(7), 656–668. https://doi.org/10.1037/a0036532

Harkness, J. A., Villar, & Edwards (2010). Translation, adaptation, and design. In Harkness, J. A., Braun, Edwards, Johnson, T. P., Lyberg, Mohler, P. P., Pennell, B.-E., & Smith, T. W. (Eds.), Wiley series in survey methodology. Survey methods in multinational, multiregional, and multicultural contexts (pp. 117–140). John Wiley & Sons. https://doi.org/10.1002/9780470609927.ch7

Harzing, A.-W. (2006). Response styles in cross-national survey research. International Journal of Cross Cultural Management, 6(2), 243–266. https://doi.org/10.1177/1470595806066332

Heine, S. J., Lehman, D. R., Peng, & Greenholtz (2002). What’s wrong with cross-cultural comparisons of subjective Likert scales? The reference-group effect. Journal of Personality and Social Psychology, 82(6), 903–918. https://doi.org/10.1037/0022-3514.82.6.903

Henrich, Heine, S. J., & Norenzayan (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2-3), 61–135. https://doi.org/10.1017/S0140525X0999152X

He & Van de Vijver (2012). Bias and equivalence in cross-cultural research. Online Readings in Psychology and Culture, 2(2), 1–19. https://doi.org/10.9707/2307-0919.1111

He & Van de Vijver (2015). Effects of a general response style on cross-cultural comparisons: Evidence from the teaching and learning international survey. Public Opinion Quarterly, 79(S1), 267–290. https://doi.org/10.1093/poq/nfv006

Hofstede, G. J., & Minkov (2010). Cultures and organizations: Software of the mind (3rd ed.). McGraw-Hill.

Horn, J. L., & McArdle, J. J. (1992). A practical guide to measurement invariance in research on aging. Experimental Aging Research, 18(3-4), 117–144. https://doi.org/10.1080/03610739208253916

Hoyle, R. H. (2012). Handbook of structural equation modeling. The Guilford Press.

Hsiao, Y. Y., & Lai (2018). The impact of partial measurement invariance on testing moderation for single and multi-level data. Frontiers in Psychology, 9, 740. https://doi.org/10.3389/fpsyg.2018.00740

Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118

Hui, C. H., & Triandis, H. C. (1985). Measurement in cross-cultural psychology. Journal of Cross-Cultural Psychology, 16(2), 131–152. https://doi.org/10.1177/0022002185016002001

Hussey & Hughes (2020). Hidden invalidity among 15 commonly used measures in social and personality psychology. Advances in Methods and Practices in Psychological Science, 3(2), 166–184. https://doi.org/10.1177/2515245919882903

Iwamasa, G. Y., & Smith, S. K. (1996). Ethnic diversity in behavioral psychology. Behavior Modification, 20(1), 45–59. https://doi.org/10.1177/01454455960201002

Jahoda & Krewer (1997). History of cross-cultural and cultural psychology. In Berry, J. W., Poortinga, Y. H., & Pandey (Eds.), Handbook of cross-cultural psychology (2nd ed., pp. 1–42). Allyn & Bacon.

Jeong & Lee (2019). Consequences of not conducting measurement invariance tests in cross-cultural studies: A review of current research practices and recommendations. Advances in Developing Human Resources, 21(4), 466–483. https://doi.org/10.1177/1523422319870726

Johnson, T. P. (1998). Approaches to equivalence in cross-cultural and cross-national survey research. In Harkness (Ed.), Cross-cultural survey equivalence (pp. 1–40). Zentrum für Umfragen, Methoden und Analysen - ZUMA. https://nbn-resolving.org/urn:nbn:de:0168-ssoar-49730-6

Johnson, T. P., & Van de Vijver, F. J. R. (2003). Social desirability in cross-cultural research. In Harkness, J. A., Van de Vijver, F. J. R., & Mohler, P. P. (Eds.), Cross-cultural survey methods (pp. 193–202). John Wiley.

Jorgensen, T. D., Pornprasertmanit, Schoemann, A. M., & Rosseel (2018). semTools: Useful tools for structural equation modeling. R package version 0.5-3. https://CRAN.R-project.org/package=semTools

Kashima (2015). Culture and psychology in the 21st century. Journal of Cross-Cultural Psychology, 47(1), 4–20. https://doi.org/10.1177/0022022115599445

Keith, K. D. (2013). Encyclopedia of cross-cultural psychology. Wiley-Blackwell.

Kim, E. S., & Yoon (2011). Testing measurement invariance: A comparison of multiple-group categorical CFA and IRT. Structural Equation Modeling: A Multidisciplinary Journal, 18(2), 212–228. https://doi.org/10.1080/10705511.2011.557337

Kim, E. S., Yoon, & Lee (2011). Testing measurement invariance using MIMIC. Educational and Psychological Measurement, 72(3), 469–492. https://doi.org/10.1177/0013164411427395

Kline, R. B. (2016). Principles and practice of structural equation modeling (4th ed.). The Guilford Press.

Klineberg (1980). Historical perspectives: Cross-cultural psychology before 1960. In Triandis, H. C., & Lambert, W. W. (Eds.), Handbook of cross-cultural psychology (pp. 1–14). Allyn & Bacon.

Kolman, Noorderhaven, N. G., Hofstede, & Dienes (2003). Cross‐cultural differences in central Europe. Journal of Managerial Psychology, 18(1), 76–88. https://doi.org/10.1108/02683940310459600

Krys, Zelenski, J. M., Capaldi, C. A., Park, Tilburg, Osch, & Uchida (2019). Putting the “we” into well‐being: Using collectivism‐themed measures of well‐being attenuates well‐being’s association with individualism. Asian Journal of Social Psychology, 22(3), 256–267. https://doi.org/10.1111/ajsp.12364

Lacko, Čeněk, & Urbánek (2021). Psychometric properties of the independent and interdependent self-construal questionnaire: Evidence from the Czech Republic. Frontiers in Psychology, 12, 564011. https://doi.org/10.3389/fpsyg.2021.564011

Lacko, Šašinka, Č., Čeněk, & Stachoň (2020). Cross-cultural differences in cognitive style, individualism/collectivism and map reading between Central European and East Asian university students. Studia Psychologica, 62(1), 23–42. https://doi.org/10.31577/sp.2020.01.789

Larson, R. B. (2018). Controlling social desirability bias. International Journal of Market Research, 61(5), 534–547. https://doi.org/10.1177/1470785318805305

Leong, Leung, & Cheung (2010). Integrating cross-cultural psychology research methods into ethnic minority psychology. Cultural Diversity & Ethnic Minority Psychology, 16(4), 590–597. https://doi.org/10.1037/a0020127

Leong, Pickren, & Tang (2012). A history of cross-cultural clinical psychology, and its importance to mental health today. In Chang, & Downey, Ch. (Eds.), Handbook of race and development in mental health (pp. 11–26). Springer.

Levine, T. R., Bresnahan, M. J., Park, H. S., Lapinski, M. K., Lee, T. S., & Lee, D. W. (2003b). The (In)validity of self-construal scales revisited. Human Communication Research, 29(2), 291–308. https://doi.org/10.1111/j.1468-2958.2003.tb00840.x

Levine, T. R., Bresnahan, M. J., Park, H. S., Lapinski, M. K., Wittenbaum, G. M., Shearman, S. M., Lee, S. Y., Chung, & Ohashi (2003a). Self-construal scales lack validity. Human Communication Research, 29(2), 210–252. https://doi.org/10.1111/j.1468-2958.2003.tb00837.x

Levine, T. R., Park, H. S., & Kim, R. K. (2007). Some conceptual and theoretical challenges for cross-cultural communication research in the 21st century. Journal of Intercultural Communication Research, 36(3), 205–221. https://doi.org/10.1080/17475750701737140

Lonner (2013). Chronological benchmarks in cross-cultural psychology. Foreword to the encyclopedia of cross-cultural psychology. Online Readings in Psychology and Culture, 1(2), 1–13. https://doi.org/10.9707/2307-0919.1124

Loo, Fong, K. T., & Iwamasa (1988). Ethnicity and cultural diversity: An analysis of work published in community psychology journals, 1965–1985. Journal of Community Psychology, 16(3), 332–349. https://doi.org/10.1002/1520-6629(198807)16:3<332:aid-jcop2290160308>3.0.co;2-8

Lu & Gilmour (2007). Developing a new measure of independent and interdependent views of the self. Journal of Research in Personality, 41(1), 249–257. https://doi.org/10.1016/j.jrp.2006.09.005

Markus & Kitayama (1991). Culture and the self: Implications for cognition, emotion, and motivation. Psychological Review, 98(2), 224–253. https://doi.org/10.1037/0033-295X.98.2.224

Marquez, R. C., & Ellwanger (2014). Independent and interdependent self-construals do not predict analytic or holistic reasoning. Psychological Reports, 115(1), 326–338. https://doi.org/10.2466/17.07.PR0.115c16z8

Marsh, H. W., Guo, Parker, P. D., Nagengast, Asparouhov, Muthén, & Dicke (2018). What to do when scalar invariance fails: The extended alignment method for multi-group factor analysis comparison of latent means across many groups. Psychological Methods, 23(3), 524–545. https://doi.org/10.1037/met0000113

Matsumoto (1999). Culture and self: An empirical assessment of Markus and Kitayama’s theory of independent and interdependent self-construal. Asian Journal of Social Psychology, 2(3), 289–310. https://doi.org/10.1111/1467-839X.00042

Matsumoto (2001). Cross-cultural psychology in the 21st century. In Halonen, J. S., & Davis, S. F. (Eds.), The many faces of psychological research in the 21st century (pp. 98–115). Society for the Teaching of Psychology.

Matsumoto, & Hwang, H. C. (2019). The handbook of culture and psychology (2nd ed.). Oxford University Press.

Matsumoto & Juang (2012). Culture & psychology (5th ed.). Wadsworth Publishing.

Matsumoto, & Yoo, S. H. (2006). Toward a new generation of cross-cultural research. Perspectives on Psychological Science, 1(3), 234–250. https://doi.org/10.1111/j.1745-6916.2006.00014.x

May, R. M. (1997). The scientific wealth of nations. Science, 275(5301), 793. https://doi.org/10.1126/science.275.5301.793

Meredith (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525–543. https://doi.org/10.1007/bf02294825

Milfont, T. L., & Fischer (2010). Testing measurement invariance across groups: Applications in cross-cultural research. International Journal of Psychological Research, 3(1), 111–130. https://doi.org/10.21500/20112084.857

Millsap, R. E. (2011). Statistical approaches to measurement invariance. Routledge/Taylor & Francis Group.

Millsap, R. E., & Meredith (2007). Factorial invariance: Historical perspectives and new problems. In Cudeck, & MacCallum, R. C. (Eds.), Factor analysis at 100: Historical developments and future directions (pp. 131–152). Lawrence Erlbaum Associates Publishers.

Morren, Gelissen, J. P. T. M., & Vermunt, J. K. (2011). Dealing with extreme response style in cross-cultural research: A restricted latent class factor analysis approach. Sociological Methodology, 41(1), 13–47. https://doi.org/10.1111/j.1467-9531.2011.01238.x

Muthén & Asparouhov (2012). Bayesian structural equation modeling: A more flexible representation of substantive theory. Psychological Methods, 17(3), 313–335. https://doi.org/10.1037/a0026802

Nielsen, Haun, Kärtner, & Legare, C. H. (2017). The persistent sampling bias in developmental psychology: A call to action. Journal of Experimental Child Psychology, 162, 31–38. https://doi.org/10.1016/j.jecp.2017.04.017

Nisbett, R. E. (2003). The geography of thought: How Asians and Westerners think differently ... and why. Free Press.

Nye, C. D., Bradburn, Olenick, Bialko, & Drasgow (2019). How big are my effects? Examining the magnitude of effect sizes in studies of measurement equivalence. Organizational Research Methods, 22(3), 678–709. https://doi.org/10.1177/1094428118761122

Oberski, D. L. (2014). Evaluating sensitivity of parameters of interest to measurement invariance in latent variable models. Political Analysis, 22(01), 45–60. https://doi.org/10.1093/pan/mpt014

Oyserman, Coon, & Kemmelmeier (2002). Rethinking individualism and collectivism: Evaluation of theoretical assumptions and meta-analyses. Psychological Bulletin, 128(1), 3–72. https://doi.org/10.1037/0033-2909.128.1.3

Oyserman & Lee (2008). Does culture influence what and how we think? Effects of priming individualism and collectivism. Psychological Bulletin, 134(2), 311–342. https://doi.org/10.1037/0033-2909.134.2.311

Paulhus, D. L. (1991). Measurement and control of response bias. In Robinson, J. P., Shaver, P. R., & Wrightsman, L. S. (Eds.), Measures of personality and social psychological attitudes (pp. 17–59). Academic Press.

Peterson, Rhi-Perez, & Albaum (2014). A cross-national comparison of extreme response style measures. International Journal of Market Research, 56(1), 89–110. https://doi.org/10.2501/ijmr-2014-005

Ponterotto, J. G. (1988). Racial/ethnic minority research in the Journal of Counseling Psychology: A content analysis and methodological critique. Journal of Counseling Psychology, 35(4), 410–418. https://doi.org/10.1037/0022-0167.35.4.410

Poortinga, Y. H. (1989). Equivalence of cross-cultural data: An overview of basic issues. International Journal of Psychology, 24(6), 737–756. https://doi.org/10.1080/00207598908247842

Putnick, D. L., & Bornstein, M. H. (2016). Measurement invariance conventions and reporting: The state of the art and future directions for psychological research. Developmental Review, 41, 71–90. https://doi.org/10.1016/j.dr.2016.06.004

R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Rad, M. S., Martingano, A. J., & Ginges (2018). Toward a psychology of Homo sapiens: Making psychological science more representative of the human population. Proceedings of the National Academy of Sciences of the United States of America, 115(45), 11401–11405. https://doi.org/10.1073/pnas.1721165115

Richards (2012). ‘Race’, racism and psychology: Towards a reflexive history (2nd ed.). Routledge/Taylor & Francis Group.

Rosseel (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. http://www.jstatsoft.org/v48/i02/

Rutkowski & Svetina (2014). Assessing the hypothesis of measurement invariance in the context of large-scale international surveys. Educational and Psychological Measurement, 74(1), 31–57. https://doi.org/10.1177/0013164413498257

Rutkowski, Svetina, & Liaw, Y-L. (2019). Collapsing categorical variables and measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 26(5), 790–802. https://doi.org/10.1080/10705511.2018.1547640

140.

Saris

W. E.

Satorra

van der Veld

W. M.

(2009). Testing structural equation models or detection of misspecifications? Structural Equation Modeling: A Multidisciplinary Journal, 16(4), 561–582. doi:10.1080/10705510903203433

141.

Satorra

(2000). Scaled and adjusted restricted tests in multi-sample analysis of moment structures. In Heijmans

R. D. H.

Pollock

D. S. G.

Satorra

(Eds.), Innovations in multivariate statistical analysis. A Festschrift for Heinz Neudecker (pp. 233–247). Kluwer Academic Publishers.

142.

Savalei

(2011). What to do about zero frequency cells when estimating polychoric correlations. Structural Equation Modeling: A Multidisciplinary Journal, 18(2), 253–273. https://doi.org/10.1080/10705511.2011.557339

143.

Schimmack

Oishi

Diener

(2005). Individualism: A valid and important dimension of cultural differences between nations. Personality and Social Psychology Review, 9(1), 17–31. https://doi.org/10.1207/s15327957pspr0901_2

144.

Schmitt

Golubovich

Leong

F. T.

(2011). Impact of measurement invariance on construct correlations, mean differences, and relations with external correlates: An illustrative example using Big Five and RIASEC measures. Assessment, 18(4), 412–427. https://doi.org/10.1177/1073191110373223

145.

Shiraev

E. B.

Levy

D. A.

(2020). Cross-cultural psychology: Critical thinking and contemporary applications (7th ed.). Routledge.

146.

Singelis

(1994). The measurement of independent and interdependent self-construal. Personality & Social Psychology Bulletin, 20(5), 580–591. https://doi.org/10.1177/0146167294205014

147.

Singelis

Triandis

Bhawuk

Gelfand

(1995). Horizontal and vertical dimensions of individualism and collectivism: A theoretical and measurement refinement. Cross-Cultural Research, 29(3), 240–275. https://doi.org/10.1177/106939719502900302

148.

Siu

E. S.

(2013). Cultural contingency in the cognitive model of entrepreneurial intention. Entrepreneurship Theory and Practice, 37(2), 147–173. https://doi.org/10.1111/j.1540-6520.2011.00462.x

149.

Smith

P. B.

(2004). Acquiescent response bias as an aspect of cultural communication style. Journal of Cross-Cultural Psychology, 35(1), 50–61. https://doi.org/10.1177/0022022103260380

150.

Sörbom

(1974). A general method for studying differences in factor means and factor structure between groups. British Journal of Mathematical and Statistical Psychology, 27, 229–239. https://doi.org/10.1111/j.2044-8317.1974.tb00543

151.

Steenkamp

J.-B. E. M.

Baumgartner

(1998). Assessing measurement invariance in cross-national consumer research. Journal of Consumer Research, 25(1), 78–90. https://doi.org/10.1086/209528

152.

Steinmetz

(2011). Estimation and comparison of latent means across cultures. In Davidov

Schmidt

Billiet

(Eds.), European association for methodology series. Cross-cultural analysis: Methods and applications (pp. 85–116). Routledge/Taylor & Francis Group.

153.

Steinmetz

(2013). Analyzing observed composite differences across groups: Is partial measurement invariance enough? Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 9(1), 1–12. https://doi.org/10.1027/1614-2241/a000049

154.

Sternberg

R. J.

(2014). The development of adaptive competence: Why cultural psychology is necessary and not just nice. Developmental Review, 34(3), 208–224. https://doi.org/10.1016/j.dr.2014.05.004

155.

Stevanovic

Jafari

Knez

Franic

Atilola

Davidovic

Bagheri

Lakic

(2017). Can we really use available scales for child and adolescent psychopathology across cultures? A systematic review of cross-cultural measurement invariance data. Transcultural Psychiatry, 54(1), 125–152. https://doi.org/10.1177/1363461516689215

156.

Sue

(1999). Science, ethnicity, and bias: Where have we gone wrong? American Psychologist, 54(12), 1070–1077. https://doi.org/10.1037/0003-066x.54.12.1070

157.

Svetina

Rutkowski

(2019). Multiple-group invariance with categorical outcomes using updated guidelines: An illustration using Mplus and the lavaan/semTools packages. Structural Equation Modeling: A Multidisciplinary Journal, 27(1), 111–130. https://doi.org/10.1080/10705511.2019.1602776

158.

Takano

Osaka

(1999). An unsupported common view: Comparing Japan and the U.S. on individualism/collectivism. Asian Journal of Social Psychology, 2(3), 311–341. https://doi.org/10.1111/1467-839X.00043

159.

Takano

Osaka

(2018). Comparing Japan and the United States on individualism/collectivism: A follow-up review. Asian Journal of Social Psychology, 21(4), 301–316. https://doi.org/10.1111/ajsp.12322

160.

The Council of the International Test Commission (2018). ITC Guidelines for Translating and Adapting Tests (Second Edition), International Journal of Testing, 18(2), 101–134. https://doi.org/10.1080/15305058.2017.1398166

161.

Thomas

Sillen

(1972). Racism and psychiatry. Brunner/Mazel.

162.

Thompson

M. S.

Green

S. B.

(2013). Evaluating between-group differences in latent variable means. In Hancock

G. R.

Mueller

R. O.

(Eds.), Quantitative methods in education and the behavioral sciences: Issues, research, and teaching. Structural equation modeling: A second course (pp. 163–218). IAP Information Age Publishing.

163.

Thurstone

L. L.

(1947). Multiple factor analysis. University of Chicago Press.

164.

Triandis

H. C.

(2007). Culture and psychology: A history of the study of their relationship. In Kitayama

Cohen

(Eds.), Handbook of cultural psychology (pp. 59–76). The Guilford Press.

165.

Triandis

Gelfand

(1998). Converging measurement of horizontal and vertical individualism and collectivism. Journal of Personality and Social Psychology, 74(1), 118–128. https://doi.org/10.1037/0022-3514.74.1.118

166.

Uskul

A. K.

Oyserman

(2006). Question comprehension and response: Implications of individualism and collectivism. In Chen

Y.-R.

(Ed.), National Culture and Groups: Research on Managing Groups and Teams (Vol. 9, pp. 173–201). Emerald Publishing. https://doi.org/10.1016/S1534-0856(06)09008-6

167.

Van de Vijver

(1998). Towards a theory of bias and equivalence. In Harkness

(Ed.), Cross-cultural survey equivalence (pp. 41–65). Zentrum für Umfragen, Methoden und Analysen (ZUMA). https://nbn-resolving.org/urn:nbn:de:0168-ssoar-49731-1

168.

Van de Vijver

F. J. R.

Leung

(2000). Methodological issues in psychological research on culture. Journal of Cross-Cultural Psychology, 31(1), 33–51. https://doi.org/10.1177/0022022100031001004

169.

Van de Vijver

F. J. R.

Leung

(2011). Equivalence and bias: A review of concepts, models, and data analytic procedures. In Matsumoto

van de Vijver

F. J. R.

(Eds.), Cross-cultural research methods in psychology (pp. 17–45). Cambridge University Press.

170.

Van de Vijver

Tanzer

N. K.

(2004). Bias and equivalence in cross-cultural assessment: An overview. Revue Européenne de Psychologie Appliquée/European Review of Applied Psychology, 54(2), 119–135. https://doi.org/10.1016/j.erap.2003.12.004

171.

Van Vaerenbergh

Thomas

T. D.

(2012). Response styles in survey research: A literature review of antecedents, consequences, and remedies. International Journal of Public Opinion Research, 25(2), 195–217. https://doi.org/10.1093/ijpor/eds021

172.

Vandenberg

R. J.

Lance

C. E.

(2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3(1), 4–69. https://doi.org/10.1177/109442810031002

173.

Viladrich

Angulo-Brunet

Doval

(2017). A journey around alpha and omega to estimate internal consistency reliability. Annals of Psychology, 33(3), 755–782. https://doi.org/10.6018/analesps.33.3.268401

174.

Voronov

Singer

J. A.

(2002). The myth of individualism-collectivism: A critical review. The Journal of Social Psychology, 142(4), 461–480. https://doi.org/10.1080/00224540209603912

175.

Wang

(2016). Why should we all be cultural psychologists? Lessons from the study of social cognition. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 11(5), 583–596. https://doi.org/10.1177/1745691616645552

176.

Welkenhuysen-Gybels

Billiet

Cambré

(2003). Adjustment for acquiescence in the assessment of the construct equivalence of Likert type score items. Journal of Cross-Cultural Psychology, 34(6), 702–722. https://doi.org/10.1177/0022022103257070

177.

Welzel

Brunkert

Kruse

Inglehart

R. F.

(2021). Non-invariance? An overstated problem with misconceived causes. Sociological Methods & Research. https://doi.org/10.1177/0049124121995521

178.

Werner

Campbell

D. T.

(1970). Translating, working through interpreters, and the problem of decentering. In Naroll

Cohen

(Eds.), A handbook of method in cultural anthropology (pp. 398–419). American Museum of Natural History.

179.

Whittaker

T. A.

(2013). The impact of noninvariant intercepts in latent means models. Structural Equation Modeling: A Multidisciplinary Journal, 20(1), 108–130. https://doi.org/10.1080/10705511.2013.742397

180.

Widaman

K. F.

Reise

S. P.

(1997). Exploring the measurement invariance of psychological instruments: Applications in the substance use domain. In Bryant

K. J.

Windle

West

S. G.

(Eds.), The science of prevention: Methodological advances from alcohol and substance abuse research (pp. 281–324). American Psychological Association.

181.

Wu

Estabrook

(2016). Identification of confirmatory factor analysis models of different levels of invariance for ordered categorical outcomes. Psychometrika, 81(4), 1014–1045. https://doi.org/10.1007/s11336-016-9506-0

182.

Zhang

Wang

(2020). Validity of three IRT models for measuring and controlling extreme and midpoint response styles. Frontiers in Psychology, 11, 271. https://doi.org/10.3389/fpsyg.2020.00271

Supplementary Material

Please find the following supplemental material available below.

