Abstract
Several studies that measured basic human values across countries with the Portrait Values Questionnaire (PVQ-21) reported violations of measurement invariance. Such violations may hinder meaningful cross-cultural research on human values because value scores may not be comparable. Schwartz et al. proposed a refined value theory and a new instrument (PVQ-5X) to measure 19 more narrowly defined values. We tested the measurement invariance of this instrument across eight countries. Configural and metric invariance were established for all values across almost all countries. Scalar invariance was supported across nearly all countries for 10 values. The analyses revealed that the cross-country invariance properties of the values measured with the PVQ-5X are substantially better than those measured with the PVQ-21.
Schwartz (1992, 1994) defined values as broad, trans-situational goals that vary in importance and serve as guiding principles in the life of a person or a group. The central proposition of the Schwartz value theory is that values form a circular motivational continuum. Values located in adjacent regions (wedges) of the circle have a similar motivational content (e.g., conformity and tradition). Hence, any behavior that promotes, maintains, or defends one value (e.g., following family customs) is likely to serve the adjacent values at the same time. Values located in opposing wedges of the circle express conflicting motivations (e.g., security and stimulation). Hence, any behavior that serves one (e.g., bungee jumping, serving stimulation) is likely simultaneously to come at the expense of the opposing value (security). The assumption that values form a continuum implies that the circle of values can be partitioned for scientific convenience in many different ways. Depending on the aims of a study, one might distinguish fewer broadly defined values or more narrowly defined values.
The classic version of the theory (Schwartz, 1992) proposed partitioning the circular continuum into 10 basic human values or 4 higher order values. Recently, Schwartz et al. (2012) proposed a refined theory of human values, which distinguished 19 more narrowly defined values. Some of these values derive from a finer partitioning of previous broad values (e.g., two subtypes of security: personal and societal). Others are newly discriminated values that capture a part of the motivational continuum that is situated between two of the previous broad values (e.g., “face,” between power and security in the original version). Figure 1 presents the circle of 19 narrowly defined values of the refined theory. Table 1 defines each of these values, the 10 values of the classic theory, and four higher order classifications of the values.

Figure 1. Circular motivational continuum of 19 values in the refined value theory.
Table 1. The 4 Higher Order Values, the 10 Basic Values, and 19 More Narrowly Defined Values in the Refined Theory of Values with the Number of Items (in Parentheses) Included in the Current Analysis.
Source. Schwartz et al. (2012).
Hedonism is located on the border of openness and self-enhancement values. We included hedonism in the model for openness.
Face is located on the border of self-enhancement and conservation values. We included face in the model for conservation.
Humility is located on the border of conservation and self-transcendence values. We included humility in the model for conservation.
To measure the original 10 values, Schwartz developed the Schwartz Value Survey (SVS; Schwartz, 1992) and both the 40-item and 21-item versions of the Portrait Values Questionnaire (PVQ; Schwartz, 2003; Schwartz, Melech, Lehmann, Burgess, & Harris, 2001). The first instrument for measuring the 19 values in the refined theory uses the PVQ format and contains 57 items (Schwartz et al., 2012). This instrument is called the PVQ-5X, for fifth, experimental version.
Studies of the original 10 values, with various PVQ versions, have been carried out in more than 50 countries (e.g., Bilsky, Janik, & Schwartz, 2011; Schwartz, 2006). However, the full measurement invariance needed for meaningful cross-cultural comparison (Chen, 2008; Steenkamp & Baumgartner, 1998; Vandenberg, 2002; Vandenberg & Lance, 2000) has rarely been established. Several studies with the PVQ-21 have revealed severe violations of cross-country measurement invariance (e.g., Davidov, 2008, 2010; Davidov, Schmidt, & Schwartz, 2008). No study has examined the measurement invariance properties of the new scale to measure 19 values.
Schwartz et al. (2012) reported that both confirmatory factor analysis (CFA) and multidimensional scaling (MDS) performed on the pooled sample of data gathered across countries supported the discrimination of the 19 values in the refined theory. They also found that the 19 values exhibited a greater predictive and explanatory power than the original 10 basic values. These findings point to gains from adopting the refined theory. However, nothing is known of the measurement invariance of this instrument that measures the 19 values. Consequently, we do not know if it has better properties for cross-cultural comparisons than previous scales. The current study examines the measurement invariance properties of the first instrument developed to test the refined values theory, the PVQ-5X.
Evidence of measurement invariance would encourage researchers to use the PVQ of the refined theory to measure values in various contexts and to collect data in different countries. We tested the measurement invariance of the PVQ-5X across eight countries. First, we briefly present the topic of measurement invariance and discuss some previous results of tests of measurement invariance with the PVQ. We then present our results and discuss their implications.
Measurement Invariance
Horn and McArdle (1992) defined measurement invariance as “whether or not, under different conditions of observing and studying phenomena, measurement operations yield measures of the same attribute” (p. 117). The most widely used method to investigate measurement invariance is multigroup confirmatory factor analysis (MGCFA; Bollen, 1989; Jöreskog, 1971). This method involves setting cross-group constraints and comparing more restricted models with less restricted models (Byrne, Shavelson, & Muthén, 1989; Steenkamp & Baumgartner, 1998). According to this method, there are three levels of measurement invariance (Vandenberg & Lance, 2000): (a) configural (all groups have the same pattern of factor loadings), (b) metric (the factor loadings are constrained to be equal across the compared groups), and (c) scalar (in addition, the indicator intercepts are constrained to be equal across groups). Metric invariance is required to compare factor covariances or unstandardized regression coefficients across groups; its presence indicates that a construct has the same metric and the same meaning across groups. Scalar invariance is required to compare construct means across groups; its presence indicates that the scales are used in a similar way in each group.
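The three levels can also be written as constraints on the measurement model for each group g. The notation below is the conventional MGCFA notation and is an assumption of this sketch; it does not appear in the article itself.

```latex
\begin{align*}
x^{(g)} &= \tau^{(g)} + \Lambda^{(g)} \xi^{(g)} + \delta^{(g)}, \qquad g = 1, \dots, G \\
\text{configural:} &\quad \Lambda^{(g)} \text{ has the same pattern of fixed and free loadings in every group} \\
\text{metric:} &\quad \Lambda^{(1)} = \Lambda^{(2)} = \dots = \Lambda^{(G)} \\
\text{scalar:} &\quad \Lambda^{(1)} = \dots = \Lambda^{(G)} \ \text{and} \ \tau^{(1)} = \dots = \tau^{(G)}
\end{align*}
```

Here x is the vector of observed item responses, τ the item intercepts, Λ the factor loadings, ξ the latent values, and δ the measurement errors. Under scalar invariance the expected item responses are E(x^{(g)}) = τ + Λκ^{(g)}, where κ^{(g)} is the latent mean in group g, so differences in observed means can be attributed to differences in latent means.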
If a model that constrains the parameters of all appropriate indicators to be equal across groups (loadings at the metric and loadings plus intercepts at the scalar level of measurement) is acceptable, then full metric and scalar invariances are supported. Some researchers have argued, however, that partial invariance (metric or scalar) is sufficient for meaningful comparisons (Byrne et al., 1989; Steenkamp & Baumgartner, 1998). Partial invariance is supported when the parameters of at least two indicators per construct (loadings at the metric and loadings plus intercepts at the scalar level of measurement) are constrained to be equal across groups.
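The two-indicator rule for partial invariance can be expressed as a small check. This is a minimal sketch under stated assumptions: the function name, the data layout, and the item and country labels are hypothetical, and only the threshold of two indicators comes from the text.

```python
from typing import Dict, Set

# Threshold stated in the text (Byrne et al., 1989): at least two indicators
# must keep their parameters constrained to be equal across groups.
MIN_INVARIANT_INDICATORS = 2


def supports_partial_invariance(released: Dict[str, Set[str]]) -> bool:
    """Check the two-indicator rule for the indicators of one construct.

    `released` maps each indicator to the set of groups in which its
    parameter was freed. An empty set means the parameter is constrained
    to be equal in every group.
    """
    fully_constrained = [item for item, groups in released.items() if not groups]
    return len(fully_constrained) >= MIN_INVARIANT_INDICATORS


# Hypothetical construct with three indicators, one of them freed in one group.
example = {"item_a": set(), "item_b": set(), "item_c": {"Country X"}}
print(supports_partial_invariance(example))
```

For a construct measured by only two indicators, this rule leaves no room for released parameters, so partial and full invariance coincide.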
Findings of Previous Cross-National Measurement Invariance Tests of Values
The measurement invariance across countries of the PVQ-21 has been tested in several studies (e.g., Davidov, 2008, 2010; Davidov, Datler, Schmidt, & Schwartz, 2011; Davidov et al., 2008). Tests of this instrument have been conducted on data from the European Social Survey (ESS). Davidov and colleagues found two types of problems with this scale. First, only seven values were identified at the configural level in most of the countries. It was necessary to unify some pairs of adjacent values, specifically, power with achievement, benevolence with universalism, and conformity with tradition (Davidov et al., 2008). Unifying these pairs of values did not contradict the theory because adjacent values express similar motivations and each partitioning of the continuum is arbitrary to some extent. However, it would be preferable to choose how to partition the continuum based on theoretical considerations and on the purpose of a study rather than to be constrained by the limitations of the instrument. The second problem was that metric, but not scalar, invariance was established for the seven values. Although this is sufficient for comparing associations across samples, the lack of scalar invariance implies that value means are not strictly comparable using the PVQ-21.
Knoppen and Saris (2009) challenged the idea of unifying adjacent values. They suggested that the need to unify values was not due to limitations of the theory but to the strategy used to choose the items of the PVQ-21. That strategy sought to cover the conceptual content of the entire circle of values with as few items as possible. Consequently, the items chosen to represent each specific value were not sufficiently homogeneous (see also Saris, Knoppen, & Schwartz, 2013). This suggests that the need to unify values can be eliminated by choosing more homogeneous items to measure each value. Cieciuch and Davidov (2012) followed the suggestion of Knoppen and Saris (2009) not to test all of the values in one model. Instead, they created separate models for each higher order value, applying the so-called magnifying glass strategy (Cieciuch & Schwartz, 2012). We too adopt this approach.
The Current Study
The current study is the first to test the measurement invariance across countries of the PVQ-5X. In doing so, it is also the first to test the measurement invariance of the 19 values in the refined values theory. Three features of the PVQ-5X, made possible by the refinement of the values theory, lead us to expect a higher level of measurement invariance than found with the PVQ-21: (a) The items chosen to measure each value are more homogeneous because the values are defined more narrowly. (b) There are more indicators per value (three) than in the PVQ-21. (c) The narrower values can be combined to measure the original 10 values with even more indicators per value (6-9; Cieciuch, Davidov, Vecchione, & Schwartz, 2014). We therefore formulate the following two expectations:
Configural and metric measurement invariance will be established for all values.
Scalar measurement invariance will be supported for at least several values.
Method
Samples and Procedure
We collected data during 2010 in the following countries: Finland (N = 334, 65% female, Mage = 42.3, SDage = 6.1), Germany (N = 325, 77% female, Mage = 23.4, SDage = 5.0), Israel (N = 394, 65% female, Mage = 25.7, SDage = 6.2), Italy (N = 388, 59% female, Mage = 35.6, SDage = 14.5), New Zealand (N = 527, 68% female, Mage = 19.5, SDage = 4.2), Poland (N = 547, 66% female, Mage = 27.0, SDage = 10.0), Portugal (N = 295, 58% female, Mage = 27.0, SDage = 10.4), and Switzerland (N = 201, 70% female, Mage = 28.8, SDage = 7.7). Researchers (or instructed assistants) gathered data through self-report questionnaires. Participation was voluntary and respondents were assured that their responses would be kept anonymous. In New Zealand, Israel, Switzerland, and partially in Portugal, the data were gathered online, whereas a written format was used in the other countries (further details are available from the first author).
Questionnaire
The PVQ-5X (Schwartz et al., 2012) contains three items to measure each of the 19 values. Like previous versions of the PVQ, each item describes a person in terms of his or her values and respondents are asked to rate “How much is this person like you” on a scale ranging from 1 (not like me at all) to 6 (very much like me). Unlike previous PVQ versions, each item contains only one sentence. Multilanguage versions of the PVQ-5X were prepared using an iterative process of translation and back-translation until the author of the survey and native speakers agreed that the translation optimally captured the nuances of each item.
Schwartz et al. (2012) excluded 9 items based on MDS and CFAs performed on the pooled within-sample covariance matrix. We included in our analyses only the 48 items which they retained. Most values were measured by three indicators and the rest with two. The items included in the current study as well as the scale we recommend are available from the fifth author.
Statistical Analyses
The analyses consisted of three steps:
1. Following Byrne (2004), we performed a CFA separately in each country prior to testing measurement invariance. We applied maximum likelihood estimation.
We used three global fit measures to determine whether the model was acceptable. Root mean square error of approximation (RMSEA) reflects the degree to which a researcher’s model reasonably fits the population covariance matrix, while taking into account the degrees of freedom and sample size (Brown, 2006). It is a parsimony-adjusted index that favors simpler models. When the RMSEA value is smaller than 0.05, the model can be assumed to perform very well (Browne & Cudeck, 1993). When the RMSEA value is 0.08 or less, the model can be assumed to perform reasonably well (Hu & Bentler, 1999; Marsh, Hau, & Wen, 2004). The comparative fit index (CFI) compares the fit of a researcher’s model with a more restricted baseline model. CFI values that are larger than 0.90 indicate an acceptable model fit (Hu & Bentler, 1999). The standardized root mean square residual (SRMR) compares the sample variances and covariances with the estimated variances and covariances. When the SRMR value is smaller than 0.05, the model can be assumed to perform very well, and when it is lower than 0.08, the model can be assumed to perform reasonably well (Hu & Bentler, 1999; Marsh et al., 2004). Because the p value of the chi-square test is sensitive to sample size, we did not rely on it (Cheung & Rensvold, 2002; Saris, Satorra, & van der Veld, 2009).
2. We ran the MGCFA without any constraints to assess configural invariance. In subsequent MGCFAs, we added the restrictions necessary to test each more stringent level of measurement invariance. If measurement invariance was not established at a given level, we searched for and released the constraints of the parameters that caused the misspecification.
To determine whether the fit of more restrictive models deteriorated significantly, we relied on the cutoff criteria suggested by Chen (2007). The criteria for identifying a lack of metric invariance compared with the configural invariance model, in a sample larger than 300, were a change larger than 0.01 in CFI, supplemented by a change larger than 0.015 in RMSEA or a change larger than 0.03 in SRMR. The criteria for identifying a lack of scalar invariance compared with the metric invariance model were a change larger than 0.01 in CFI, supplemented by a change larger than 0.015 in RMSEA or a change larger than 0.01 in SRMR. As an overall criterion, we used changes in CFI larger than 0.01 as indicating the absence of invariance (Byrne & Stewart, 2006). The analyses were performed in Mplus 7.11 (Muthén & Muthén, 1998-2012).
3. We used the Jrule program (Oberski, 2009; Saris et al., 2009) to detect local misspecifications of the parameters in the model.
For metric invariance, we determined which item loadings in which country caused the largest misspecification. For scalar invariance, we determined which item intercept in which country caused the largest misspecification. Next, we released only the misspecified items in these particular countries and repeated the analysis. However, if there was a need to release a parameter in more than half of the countries, we released it in all countries. After detecting the largest misspecification and releasing noninvariant parameters, we relied on the global fit measures of the final models.
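The cutoff criteria described in step 2 can be sketched as a decision rule. This is a minimal illustration and not the authors' Mplus workflow: the function names, the dictionary keys, and the fit values in the example are hypothetical, and only the thresholds are taken from the text.

```python
# Cutoffs as reported in the text for samples larger than 300 (Chen, 2007).
CFI_CUTOFF = 0.01
RMSEA_CUTOFF = 0.015
SRMR_CUTOFF = {"metric": 0.03, "scalar": 0.01}


def fit_change(restricted, baseline):
    """Return the deterioration in fit when moving to the more restricted model."""
    return {
        "cfi_drop": baseline["cfi"] - restricted["cfi"],
        "rmsea_rise": restricted["rmsea"] - baseline["rmsea"],
        "srmr_rise": restricted["srmr"] - baseline["srmr"],
    }


def lacks_invariance_chen(restricted, baseline, level):
    """CFI criterion supplemented by RMSEA or SRMR; level is 'metric' or 'scalar'."""
    change = fit_change(restricted, baseline)
    supplementary = (
        change["rmsea_rise"] > RMSEA_CUTOFF
        or change["srmr_rise"] > SRMR_CUTOFF[level]
    )
    return change["cfi_drop"] > CFI_CUTOFF and supplementary


def lacks_invariance_overall(restricted, baseline):
    """Overall criterion: a CFI change larger than 0.01 on its own."""
    return fit_change(restricted, baseline)["cfi_drop"] > CFI_CUTOFF


# Made-up fit values for illustration only (not taken from Table 2).
configural = {"cfi": 0.950, "rmsea": 0.050, "srmr": 0.040}
metric = {"cfi": 0.945, "rmsea": 0.052, "srmr": 0.045}
print(lacks_invariance_chen(metric, configural, "metric"))
print(lacks_invariance_overall(metric, configural))
```

In this made-up example the CFI drops by only 0.005, so neither criterion signals a lack of metric invariance. The sketch reads "supplemented by" as a logical AND between the CFI criterion and the RMSEA-or-SRMR criterion, which is one plausible reading of the text.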
Results
We created a separate model for the first-order factors of each higher order value (Cieciuch & Davidov, 2012; Cieciuch & Schwartz, 2012). For example, Figure 2 illustrates the model for openness to change values. Separate analyses of the results for each higher order value in each country led to the conclusion that all models in all countries reached acceptable model fit (the global fit indices for each sample are reported in the appendix). Next, we ran multigroup analyses for each model of higher order value. Table 2 presents the global fit measures for each model.

Figure 2. A CFA model for the openness values.
Table 2. Global Fit Measures for the Multigroup Confirmatory Factor Analysis.
Note. df = degrees of freedom; RMSEA = root mean square error of approximation; PCLOSE = probability of close fit; SRMR = standardized root mean square residual; CFI = comparative fit index.
Based on the cutoff criteria of Chen (2007) and Byrne and Stewart (2006), we draw the following conclusion: 16 of the 19 values demonstrated full metric invariance across all countries (self-direction-thought, stimulation, power-dominance, power-resources, face, security-personal, security-societal, conformity-rules, conformity-interpersonal, tradition, humility, benevolence-dependability, benevolence-caring, universalism-concern, universalism-nature, universalism-tolerance). There was full metric invariance for hedonism in all countries but Switzerland and Poland. There was full metric invariance for self-direction-action in all countries but Finland, Portugal (where partial metric invariance was established), and Italy (where there was lack of metric invariance). There was full metric invariance for achievement in all countries but Finland and Poland (where partial metric invariance was established).
Full or partial scalar invariance was supported for the following 10 values across nearly all countries (with a few exceptions for single countries): benevolence-caring, universalism-tolerance, universalism-concern, universalism-nature, hedonism, power-dominance, power-resources, security-personal, security-societal, and self-direction-thought.
Discussion
The current study examined the invariance properties of the PVQ-5X for measuring 19 values across eight countries. The results demonstrated that the PVQ-5X has better invariance properties than the PVQ-21. It was possible to differentiate all 19 values in each country in single CFA and MGCFA analyses at the configural level. Thus, configural invariance for 19 values was supported across all countries. Metric or partial metric invariance was also supported for almost all countries. In addition, scalar or partial scalar invariance was supported for approximately half of the values. Therefore, the refinements of the values theory not only improved its heuristic power (Schwartz et al., 2012) but also made it possible to develop a measurement instrument that is more appropriate for cross-national comparisons.
This is a substantial benefit because the limited scalar invariance of values measured by the PVQ-21 that is part of the ESS precluded comparisons of means for some of the 10 values across larger sets of countries. Thus, the basic values data in the ESS could only be used for other purposes (e.g., comparison of unstandardized regression coefficients or covariances between values and other theoretical constructs of interest).
A major concern of researchers who collect large-scale survey data is confidence that their instruments will yield comparable measurements across samples. This is particularly relevant for new instruments such as the value scale examined in this study. Given the theoretical importance of the value theory for describing individuals and societies and for explaining attitudes and behavior, the current study delivers encouraging results. The refined values theory and the new measurement instrument overcame several problems of noncomparability identified with previous instruments. Not only does the PVQ-5X measure the refined set of 19 values, it can also be used to measure the original 10 values (Cieciuch et al., 2014), and it exhibits invariance across a large number of varied countries. Furthermore, it takes only 1 to 2 min longer to complete than the PVQ-40, and only 2 to 3 min longer to complete than the PVQ-21.
Three factors contributed to the improved measurement properties of the PVQ-5X. First, the items that measure each value are more homogeneous. This was possible because the refined theory makes 19 distinctions in the value circle rather than 10. Some of the original 10 values encompassed disparate contents. By splitting them into conceptually narrower values (e.g., three subtypes of universalism), it was possible to measure their contents with more homogeneous items. This made it possible to identify all 19 values without having to unify any values (cf. Davidov et al., 2008). Second, each narrow value was measured by three items. Although we excluded nine items based on the CFA and MDS analyses of the pooled sample of data (Schwartz et al., 2012), three items remained for 10 of the values. Third, each item contained only one sentence rather than the two sentences in earlier PVQ instruments. This simplification eliminated the possibility of confusion due to perceiving some items as double-barreled.
Compared with studies that used the PVQ-21, the current study was limited in two important ways. First, we analyzed data from only eight countries. Second, our data were from convenience samples rather than representative population samples. It is therefore important to test for invariance with the new value scale using representative samples in a variety of countries. Our findings suggest that studies of the measurement invariance of the 19 refined values will provide stronger evidence than previous work for the validity of cross-national comparisons. Moreover, successful measurement of the more refined values makes possible more finely calibrated explanations of attitudes and behavior and more detailed analyses of differences between cultural and other groups than the original 10 values did.
Appendix
Global Fit Measures of Each Higher Order Value Model for the Single Sample Confirmatory Factor Analyses
| | Self-transcendence (df = 55) | | | | Conservation (df = 98) | | | | Self-enhancement (df = 11) | | | | Openness (df = 38) | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Country | χ2 | CFI | RMSEA | SRMR | χ2 | CFI | RMSEA | SRMR | χ2 | CFI | RMSEA | SRMR | χ2 | CFI | RMSEA | SRMR |
| Switzerland | 89.6 | .963 | .056 [.034, .076] | .044 | 177.2 | .945 | .063 [.048, .078] | .056 | 31.9 | .950 | .097 [.059, .137] | .059 | 67.2 | .959 | .062 [.037, .086] | .045 |
| Germany | 78.0 | .983 | .036 [.014, .053] | .037 | 193.4 | .950 | .055 [.043, .066] | .046 | 41.8 | .936 | .093 [.064, .124] | .051 | 61.1 | .968 | .043 [.021, .063] | .039 |
| Finland | 89.8 | .977 | .043 [.026, .059] | .031 | 164.9 | .963 | .045 [.033, .057] | .041 | 35.5 | .976 | .082 [.053, .112] | .030 | 114.8 | .919 | .078 [.062, .094] | .048 |
| Israel | 153.1 | .953 | .067 [.055, .080] | .043 | 273.1 | .924 | .067 [.058, .077] | .056 | 14.8 | .996 | .030 [.000, .064] | .022 | 153.6 | .932 | .088 [.074, .103] | .042 |
| Italy | 121.6 | .956 | .056 [.042, .069] | .044 | 159.5 | .967 | .040 [.028, .051] | .034 | 36.4 | .968 | .077 [.050, .106] | .030 | 129.0 | .914 | .079 [.064, .094] | .056 |
| New Zealand | 216.8 | .939 | .075 [.064, .085] | .046 | 261.2 | .934 | .056 [.048, .065] | .044 | 37.2 | .976 | .067 [.044, .092] | .031 | 172.1 | .922 | .082 [.070, .094] | .044 |
| Poland | 137.8 | .964 | .053 [.042, .064] | .036 | 260.0 | .934 | .055 [.047, .063] | .047 | 30.3 | .982 | .057 [.033, .081] | .028 | 211.6 | .884 | .092 [.080, .104] | .056 |
| Portugal | 85.9 | .971 | .044 [.024, .061] | .037 | 153.9 | .957 | .043 [.029, .056] | .048 | 23.1 | .981 | .061 [.025, .096] | .029 | 92.4 | .931 | .070 [.052, .088] | .048 |
Note. Abbreviations of value labels are presented in Table 1. df = degrees of freedom; CFI = comparative fit index; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual; χ2 = chi-square.
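As a rough cross-check of the table, the RMSEA point estimate can be approximated from the chi-square value, the degrees of freedom, and the sample size. The sketch below uses the common single-group formula with N − 1 in the denominator. This choice is an assumption (some programs use N instead), and the function name is hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Approximate RMSEA point estimate from chi-square, df, and sample size.

    Uses sqrt(max(chi2 - df, 0) / (df * (n - 1))); the N - 1 convention is
    an assumption and may differ from the software used in the article.
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Self-transcendence model (df = 55), with N taken from the sample description.
print(round(rmsea(89.6, 55, 201), 3))  # Switzerland: about 0.056
print(round(rmsea(78.0, 55, 325), 3))  # Germany: about 0.036
```

Both values agree with the tabled RMSEA for these two rows. Small discrepancies for other rows are possible, for example if the effective sample size in a model differs from the reported N or because the chi-square values are rounded.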
Acknowledgements
The second author would like to thank the EUROLAB, GESIS, Cologne, for their hospitality during work on this article. The authors would like to thank Lisa Trierweiler for the English proof of the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work of the first and second authors on this article was supported by the Scientific Exchange Program (Switzerland) and the Research Priority Program “Social Networks,” University of Zurich. The work of the first author was partially supported by Grant DEC-2011/01/D/HS6/04077 from the Polish National Science Centre. The work of the fifth author on this article was partly supported by the Higher School of Economics (HSE) Basic Research Program (International Laboratory of Socio-Cultural Research).
