Abstract
Current-day cross-cultural psychology typically attends to a linguistic equivalence requirement for the assessment of one and the same psychological construct in different cultures. The present article suggests loosening the requirement of using identically worded items in all cultures included in a cross-cultural comparison in favor of following a more emic methodology in instrument development. Using purely illustrative material on the relationship of paternal warmth and trust in five cultures (Germany, Moldova–Russian, Togo–French, Zambia–English, and Zimbabwe–Shona), an approach is suggested that develops items autonomously within the cultures included in a comparison, subsequently ascertains structural and measurement equivalence of covariance matrices obtained on the basis of items differently worded in different cultures, and finally validates the measurement by showing the equality of the relationship of the differentially measured latent construct under scrutiny (here, paternal warmth) with the comparison variable (here, trust) in all cultures. The authors hope to offer first steps toward a quantitative emic comparative psychology and discuss further research needs.
When reading state-of-the-art contemporary methods textbooks (e.g., van de Vijver & Leung, 2011) in cross-cultural psychology, the impression prevails that the highest possible degree of equivalence in measurement across cultures is an irrevocable requirement for valid cross-cultural comparisons. In the literature, we find typologies and definitions of equivalence in abundance; this is not the place to review in detail the differentiations that are made. Relying on one of the earliest articles in the field by Berry (1969), one is well advised to distinguish at least three types or levels of equivalence.
First, there is functional equivalence. In the words of Berry, who cites Goldschmidt (1966) as well as Frijda and Jahoda (1966) on this,
Functional equivalence of behaviour exists when the behaviour in question has developed in response to a problem shared by two or more social/cultural groups, even though the behaviour in one society does not appear to be related to its counterpart in another society. These functional equivalences must preexist as naturally occurring phenomena; they are discovered and cannot be created or manipulated by the cross-cultural psychologist. (p. 122)
To give an example, “family cohesion” and related constructs like a “positive family climate” or “parental warmth” can be compared across cultures that form families of whatever size around co-residing mothers and fathers of a given child. They cannot be compared across cultures when a culture is included in which biological mothers and fathers do not co-reside (like the Nayars in Old India). Or, to take the matter to the everyday world, an umbrella and an anorak are functionally equivalent; an umbrella and a plastic bag are not, although the latter can also be used for protecting one’s head against rain.
The second type of equivalence that Berry sees as needed is conceptual equivalence. Conceptual equivalence has no simple or widely accepted definition. In principle, conceptual equivalence stands for equivalence in (semantic) meaning. Do members of different cultural groups prima facie conceive the same “thing” when they are requested to assess, for example, the cohesion of their family, the positivity of the climate in their family, or the warmth of their parents? Or, to once again blatantly take the matter to the real world, a bottle of Coke and a can of Pepsi are conceptually equivalent; a can or bottle of Seven Up is not, although all three beverages can be classified as soda pop, thereby being functionally equivalent.
The third aspect of equivalence that Berry (also 1980) speaks of is metric equivalence. For this aspect of equivalence, too, no simple and widely accepted definition exists (Vandenberg & Lance, 2000). What is addressed, in principle, is the requirement that when respondents react to an item by marking a number (or a response category later converted to a number), the marked number expresses an equivalent degree of agreement or disagreement with the item in every culture.
While there seems to be consensus in the literature that functional equivalence can only be plausibilized—more convincingly or less convincingly—by the researcher, there is considerably less agreement about how to test conceptual equivalence and metric equivalence or even the appropriateness of Berry’s terminological distinction. Van de Vijver and Leung (2011) slice up the equivalence “cake” somewhat differently by addressing “structural or functional equivalence” (p. 20) as the first aspect, “metric or measurement unit equivalence” (p. 21) as the second, and “scalar or full score equivalence” (p. 21) as the third aspect. Often conceptual equivalence, as distinguished by Berry, is seen as supported when the number and relational pattern of factors is equal across the cultures included in a study. Metric equivalence is accepted as given for a psychological construct under scrutiny, when the relative weight of items used to measure it (loading pattern), and the size and relationship pattern of the measurement errors are equal. For psychological tests, the mean structure should additionally be the same (scalar equivalence, in van de Vijver and Leung’s terms). Such an approach generally rules out that one and the same construct can be measured by items with a different semantic content in different cultures.
The present article sets out to offer a few ideas to pave first steps on the way toward a quantitative emic cross-cultural psychology, that is, a culturally comparative psychology that operationalizes some of its quantitative measures differentially from within the cultures included in a comparison.
Since its professional inception in the early 1970s, 1 cross-cultural psychology has discussed, sometimes creatively, sometimes in fiercely antagonistic disputes, the ways and means of how to compare psychological phenomena across cultures. Debates about the comparability of test scores across cultures are almost as old as psychometrics itself; we just have to remember the concepts of a “culture-free” or a “culture-fair” test as introduced by Cattell (1949).
Within cross-cultural psychology and also in dissociation from it, discourse on the comparability of psychological phenomena across cultures was always marred by epistemological controversies if not even ideologies. When the most prominent arguments are subsequently sketched (or at least what the present authors see as those arguments), single-source references are deliberately omitted, as the present article is not a review article. Readers are referred to the archive of the International Association for Cross-Cultural Psychology (IACCP; Berry & Lonner, 2006) as one source for detailed references. Other more accessible sources obviously are the various hand- and textbooks that have meanwhile appeared (e.g., Berry, Poortinga, & Pandey, 1997; Kitayama & Cohen, 2007; Matsumoto, 2001; Poortinga, 1977; Triandis, 1980).
The intention of the founders of cross-cultural psychology as a distinct subdiscipline of psychology was to check the validity of psychological research findings beyond Euro-American cultures (Jahoda, 1973). This credo quickly led to controversies about the circumstances under which one could (and even should) compare. The movement for a culturally informed psychology commenced as a rebellion against the psychological mainstream of the time, which almost exclusively studied White middle-class U.S. students. Quickly, however, the new approach drew the criticism of being a particularly perfidious attempt to now study the whole world on the basis of North American theories. Allegations of an enlightened cultural imperialism were voiced at least in undertones of methodological debates, because the founders of cross-cultural psychology adhered to an epistemologically universalist top–down approach to psychological research: Contemporary psychological theories were typically chosen for a validity check outside of their cultural origination context. It was, however, the case some 40 years ago (and is to a high degree still today) that an overwhelming majority of (published) psychological theories originated from a Euro-American cultural context. Thus, quasi-automatically, a Euro-American understanding of psychological phenomena prevailed.
The world at that time was, however, still in the phase of decolonization (this was the 1970s), and among and beyond cross-culturalists, a movement of indigenization had a strong impact (Sinha, 1984). It pleaded for an attempt to always understand psychological phenomena from within a given culture; something that Europeans and North Americans had done all along, without, of course, explicitly stating it. The unequivocal plea for understanding psychological phenomena from within a given culture did then put new constraints on cross-cultural comparisons. How can one compare across cultures if the phenomena to be compared are defined from within cultures? How can you even be sure that you have the same “thing” under scrutiny? Epistemological extremists argued that you cannot, but at least among cross-culturalists, there was a strong sentiment that comparison is important and possible. Controversies typically ran along the lines of epistemological and methodological camps: Qualitative versus quantitative research, bottom–up versus top–down strategies, the Verstehen principle originating from German Geisteswissenschaften against a (neo-)positivist explanation approach, and an indigenous versus a universalist scientific principle were some of the camp-building buzzwords. The controversy between an indigenous and a universalist approach to answering the question of what impact culture has on psychological phenomena and their interrelation has been discussed most vividly under the heading “emic” versus “etic,” a terminology borrowed from linguistics (Pike, 1954) and subsequently from anthropology (Goodenough, 1970).
All in all, it can safely be summarized that a deep rift between a qualitative, interpretive, indigenous, emic, and a quantitative, positivist, universalist, etic approach to a culturally informed comparative psychology has characterized the scene for several decades. Recently a mixed-methods approach has been proposed as a possible bridge across the rift (e.g., Roer-Strier & Kurman, 2009). Such an approach essentially advocates the credo of an “okay let’s do it both, and then let’s try to make sense of what we find” by relating results from both approaches to each other (“triangulation”). The current article goes a step further by suggesting an actual “marriage” of the two camps by offering a suggestion for a quantitative emic approach to a culturally informed comparative psychology.
In current etically minded social research, there seems to be a trend toward ever stricter requirements for equivalence (Seipel & Rippl, 2013). A few decades ago, it was enough to achieve a comparable factor structure of the items used to measure a construct in different cultures. When there were an equal number of factors and an equal pattern of substantial loadings, equivalence was accepted as given once there was a consensus on functional equivalence, that is, an acceptance that the construct was sufficiently similar across culture-specific nomological networks. Today, however, equality of loadings and of measurement errors and error correlations are often being required to allow a comparison across cultures (for an application, see Byrne & van de Vijver, 2010). In the testing literature, an equivalence of the mean structure is additionally required in the form of demanding item intercept equivalence.
At the same time, it has become customary to simultaneously secure functional and conceptual equivalence (i.e., the requirement that concepts have the same meaning) by only utilizing equally worded items in all cultures. Thus, linguistically identical items that produce identical mathematical relations when used in studies in different cultures are often seen as the silver bullet of etic cross-cultural psychology. Identical mathematical relations, however, do not in and by themselves prove that identical psychological content is assessed. If covariance matrices are identical across cultures, this solely suggests that numbers and their relationships are equal. It is utterly irrelevant for the math what the numbers mean: “The numbers don’t remember where they came from” (Lord, 1953, p. 751). As a thought experiment, there can certainly be structural equivalence, loading equivalence, and metric equivalence in matrices that are based on something different in each and every cell of a matrix. Equivalence of meaning cannot be created by mathematical equivalence. In current-day etic quantitative cross-cultural psychology, it is secured by accepting identically formulated items as proof that numbers mean the same thing. Lest the current authors be misunderstood: This approach is by all means the “reigning” approach to comparing across cultures, and it will deservedly remain the most prevalent approach to instrument development in cross-cultural psychology. One should in our opinion, however, be aware that the requirement of linguistic equality risks coming at the expense of measuring constructs in a way that secures culture-specific ecological validity.
When one takes into consideration that formulating identical items for cross-cultural research typically means that you have to include different languages (or in certain cases different variants of a language, like French from France and from Quebec, or British and American English), it quickly becomes evident that semantic identity (of items) is discretionary, as linguists will likely point out. Even words that would generally be accepted as equal across languages, like Haus, house, maison, casa, дом, 房子 (to just use the first author’s native language and five of the six official languages of the United Nations as the basis for an example), often if not always have culture-specific side connotations. In etic cross-cultural psychology, we see an inclination to prove equivalence of meaning by equivalence of covariance matrices and vice versa. This is not meant to say that such a conclusion is improper, but it is likely to have the consequence that a fairly narrow kernel of a psychological construct will be included in cross-cultural research, namely, the one for which semantic identity can be proven via showing the equivalence of covariances.
Mathematical equivalence cannot create semantic equivalence but can only obviate it. This insight opens the door for a culturally informed quantitative emic comparative psychology, or so, at least, is the thrust of the current article. With the same rigor as in quantitative etic cross-cultural psychology, one could test mathematical equivalence for matrices that are based on instruments that differ in wording between cultures. Certainly, when taking that approach as well, the mathematical proof of equivalent covariance matrices does not imply that the meaning of whatever is assessed by differently worded instruments is the same merely because there is numeric equivalence. Meaning equivalence would obviously have to be secured in a way that differs from the approach that uses (linguistically) “equal” items.
Method
How might this be done? The remainder of the article sets out to illustrate a possible way. The illustrative character of the empirical material gathered in the present study must strongly be emphasized.
Developing Emic Instruments
The first step is that a culturally diverse research team finds a consensus on the psychological phenomenon they want to study. For such an agreement, it is, of course, necessary in the first place that all cultures under scrutiny in a given research project are represented in the—ideally multilingual—research team. The team must then agree that for the phenomenon at stake, there is functional equivalence. For the current illustration case, a team of four Germans joined forces with citizens of Moldova, Togo, Zambia, and Zimbabwe. 2
The research team (seven students from diverse social science disciplines, none of them a “genuine” psychology major, and the first author) decided to work on the emic assessment of paternal warmth. Why this choice? In principle, a construct was sought that is of substantial importance for the well-being of humans, which at the same time is likely to be understood in different ways in different cultures. Paternal warmth seems a good candidate for being such a construct for several reasons. Using data from the Standard Cross-Cultural Sample of the Human Relations Area Files (HRAF), Veneziano (2003) impressively showed that—on the level of culture—low paternal warmth is a very strong (and stronger than low maternal warmth) predictor of aggressive behavior of adolescents. He, thereby, corroborated earlier—individual level—findings by Chorost (1962) that were based on self-report data obtained from U.S. adolescents.
At the same time, both research and folk wisdom tell us that what indicates warmth of a father may differ substantially across (historic) time and across cultures (Perälä-Littunen, 2004). Work on ethnotheories has greatly enhanced our knowledge about such differences (Amorim & Rossetti-Ferreira, 2004; H. Keller et al., 2006). Harkness, Super, and Mavridis (2011) express the need of “understanding the actual content of parental ethnotheories in their own culturally ‘emic’ terms” (p. 73). The current authors, thus, see the construct of paternal warmth as a prototypical candidate to illustrate options for a quantitative emic cross-cultural assessment.
What were the steps to arrive at emic instruments for the assessment of paternal warmth? The research group first consulted about a number of formal points. It was decided that instruments are to be developed in five languages (English–Zambia; French–Togo; German–Germany; Russian–Moldova; and Shona–Zimbabwe). English is the official language of Zambia. French is the official language of Togo with Ewe and Kabiyé being additional local languages. German is the official language of Germany. In Moldova, Moldovan, a variant of Romanian, is the most widely spoken language, with Russian, Ukrainian, and Gagauz other officially recognized languages. Russian was chosen for the to-be-developed instrument to increase comprehension in the research group: As it stood, Shona was the only language comprehended by just one member of the research group; unlike Russian, Moldovan/Romanian would have been a second such language. Shona is one of three official languages of Zimbabwe; the other two are English and Ndebele. All members of the research team were fluent in English.
It was then decided that the introductory phrase of the instrument should read,
Dear Participant: We would be happy if you could take the time to briefly answer a few questions on your experience with your father. Please answer the subsequent questions on a scale from 1 to 7, where 1 stands for “I totally disagree with the statement” and 7 stands for “I totally agree with the statement.” Please think of the time when you were still a child.
Radically speaking, the research team deviated from its emic thrust by using a linguistically “identical” introductory phrase in every culture, but longer discussions in the team came to the conclusion that there is no culture specificity in the introductory phrase that merited an emic approach to intro-formulation.
Further formal decisions were that every item would best commence with the phrase, “My father . . .” and that every item should be affixed with a 7-point Likert-type scale ranging (as stated) from “totally disagree” to “totally agree.” For the Shona version, using, “My father . . . /Baba vangu . . .” as the introductory two words for every item was not implemented for a combination of grammatical and stylistic reasons, but an analogous phrase also appears in every item (see the appendix). The intro and the response scale were translated to the four non-English languages. The translations were checked by the entire team (as far as possible) as to their appropriateness. The translation to French proved to be slightly more difficult than the translation to the other languages. It was finally decided to use “Pas du tout d’accord” and “Tout à fait d’accord” as response scale anchors. Also, the decision to use a uniform response format across all cultures violates a pure emic research orientation, but again the research team saw no reason to assume that this inserted bias into the study. 3
After these formal decisions, research group members from the five language communities/cultures independently developed 10 items each that were in their subjective understanding measures of paternal warmth, thereby making subjective face validity the primary criterion for item quality. The Germans in the research group were allowed to collaborate on this task; the other members of the research group worked independently on the task and were allowed to only consult with non-researchers from their own cultural background, but not with each other. All non-German research group members reported having consulted with fellow students from their own culture of upbringing before formulating the items that went into their emic instrument. This consultation helped to reduce the possibility of individual meaning foci on paternal warmth to a certain degree, but would certainly have to be systematized in “real” research.
Further discussions in the research group focused on possibilities to facilitate a validity check for the items developed separately for the five languages/cultures. Readers should once again note that the purpose of the present study was not to study paternal warmth in its own right, nor to develop a new scale on that construct, but to offer empirical material to illustrate that cross-cultural comparisons are potentially possible when using differently worded “emic” instruments in all cultures. For that purpose it is, however, not enough to show that covariance matrices of the 10 items are equivalent in the five cultures. Unlike in traditional etic approaches, this equivalence cannot be taken as quasi-proof that meaning is equivalent across cultures, because here the semantic material, on which the matrices are based, differs a priori.
In this situation, it is necessary to relate paternal warmth to some other variable and show that the relationship with that variable is identical across cultures. At the latest here, it becomes evident that the approach suggested in the present paper is not an emic one in purist terms. What is actually suggested is to formulate an instrument strictly from within a culture (emically), but then validate it through linking it to another variable that is etically derived (in the sense of accepted by the scientific community as measuring the same “thing” around the globe, independent of culture). Candidates for such variables could be other established scales measuring paternal warmth, variables that have consensually been shown in the literature to be related to paternal warmth, observational data, and biological or neurophysiological/neuropsychological variables. As another established paternal warmth scale available in the five languages/cultures included here seemingly does not exist and as it was out of the question to obtain any observational or bio or neurodata for a preliminary methods study, it was decided to look out for a variable for which a universal relationship with paternal warmth can plausibly be assumed. Again the exemplary illustrative purpose of taking this road to validation should be kept in mind. An accepted measure for this variable would have to be available in all five languages of the current research project, that is, English, French, German, Russian, and Shona. In substantive terms—as pointed out by Veneziano (2003)—a hostility scale would be a good candidate for becoming an external validation measure. The so-called Cook–Medley Scale, a hostility scale constructed on the basis of the Minnesota Multiphasic Personality Inventory (MMPI; Cook & Medley, 1954) could have been such a scale due to the near ubiquitous availability of translations of the MMPI, but a Shona version of the MMPI could not be obtained.
The research group eventually settled for including a measure of trust as the external validation variable. A relationship between parental warmth and trust has frequently been reported by psychiatrists (e.g., J. G. Johnson, Bromley, & McGeoch, 2005). A relationship between trust and paternal warmth, in particular, can be assumed on the grounds of aggregate level analyses of sociologists. They report (Delhey, Newton, & Welzel, 2011) highest levels of general trust in Scandinavia. At the same time, we know from other publications that paternal involvement in child-rearing is also highest in Scandinavia and does have an impact on child outcome variables (Sarkadi, Kristiansson, Oberklaid, & Bremberg, 2008). It, thus, seems plausible to infer that the fact of paternal warmth being available to more children than elsewhere is one of the reasons why trust levels are higher in Scandinavia than elsewhere, suggesting in more general terms that paternal warmth “breeds” trust.
Of course, at this point, the authors have to concede once more that by resorting to assessing the relationship between an emically defined paternal warmth and trust measured in a linguistically identical manner in all cultures, the propositions made here for the emic construction of instruments for cross-cultural comparisons leave the narrow confines of the emic approach to empirical research: An emically based comparison is not possible without a, so-to-speak, etic anchor.
To measure trust (as our etic anchor), the research group used the general trust item from the World Values Survey (WVS), which is available in all languages included in the present study. The item was, however, affixed with an 11-point response scale (as used for this item in the European Social Survey). 4
In English it reads,
Generally speaking, would you say that most people can be trusted or that you need to be very careful in dealing with people? Please tell me on a score of 0 to 10, where 0 means “you can’t be too careful” and 10 means that “most people can be trusted.”
In addition, respondents were requested to indicate their gender and their age in years.
Sample
Convenience samples were analyzed for the current illustration study—note once again that this is not a study on the relationship of paternal warmth and trust, but a study meant to illustrate options for emic measurement that nevertheless allows comparison across cultures. 5 Research group members of the five culture/language groups solicited participation of 30 or more individuals from their own cultural background. Table 1 offers information on sample sizes, gender distribution, and average age.
Table 1. Sample Characteristics.
Results
Analyses to illustrate how the research group arrived at valid emic paternal warmth scales were performed step-by-step. Step 1 was to conduct within-language/culture exploratory factor analyses of the items independently developed in the different language groups. Because unidimensional scales were sought, a forced one-factor solution was chosen, using principal component analysis. Also, here one has to concede that this analytic step violates a pure emic approach. It could certainly be that an emic understanding of paternal warmth is unifactorial in Culture A, but bi- or multifactorial in other cultures. Forcing all items on one factor in every culture once more adds an ingredient from etic instrument construction in that it implies that there is something like a universal “g” factor (Spearman, 1904) in paternal warmth. Results of the five exploratory factor analyses are documented in the appendix.
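The forced one-factor step can be illustrated with a minimal sketch. It assumes that item responses are held as one list of scores per respondent, and it extracts the first principal component of the item correlation matrix by power iteration. The function names are hypothetical, and the sketch is not the software the research group used for this step, which the text does not name.

```python
import math
from statistics import mean, pstdev


def correlation_matrix(rows):
    """Pearson correlations between items; `rows` holds one list of scores per respondent.

    Assumes every item varies in the sample (nonzero standard deviation).
    """
    z_scores = []
    for column in zip(*rows):
        m, s = mean(column), pstdev(column)
        z_scores.append([(value - m) / s for value in column])
    n = len(rows)
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z_scores]
            for zi in z_scores]


def first_component_loadings(corr, iterations=500):
    """Loadings of each item on the first principal component (forced one-factor solution)."""
    k = len(corr)
    vector = [1.0 / math.sqrt(k)] * k  # arbitrary unit-length starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(corr[i][j] * vector[j] for j in range(k)) for i in range(k)]
        eigenvalue = math.sqrt(sum(v * v for v in product))
        vector = [v / eigenvalue for v in product]
    if sum(vector) < 0:  # sign is arbitrary; orient the component positively
        vector = [-v for v in vector]
    # A loading is the eigenvector entry scaled by the square root of the eigenvalue.
    return [v * math.sqrt(eigenvalue) for v in vector]
```

Calling `first_component_loadings(correlation_matrix(rows))` once per culture would yield one loading per item, which is the quantity the subsequent item selection relies on.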
Using the most common threshold (Peterson, 2000), it was decided to discard items that did not have a minimal loading of .40 in their country. Two German items, one English item, and one Russian item did not fulfill the .40 criterion and were thus discarded. As scales for the five cultures/languages were planned to have an equal number of items, this meant that the two lowest loading items in the other countries had to be discarded as well. Further tests were performed on the eight highest loading items from all five cultures/languages. Internal consistencies for the eight-item scales per culture/language varied between α = .79 (French/Togo) and α = .94 (Russian/Moldova).
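The selection rule and the reliability check described above can be sketched as follows. The .40 threshold and the eight retained items come from the text; the function names and the data layout (a dictionary of loadings, one list of item scores per respondent) are assumptions made for illustration.

```python
from statistics import variance


def select_items(loadings, min_loading=0.40, keep=8):
    """Drop items loading below `min_loading`, then keep the `keep` highest-loading ones."""
    eligible = {name: value for name, value in loadings.items() if value >= min_loading}
    ranked = sorted(eligible, key=eligible.get, reverse=True)
    return ranked[:keep]


def cronbach_alpha(rows):
    """Cronbach's alpha for a scale; `rows` holds one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Applied per culture, `select_items` would return the eight items that enter the scale, and `cronbach_alpha` on the responses to those items would give the internal consistency reported for that culture.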
After within-culture item selection, files were merged for all five cultures/languages. This was done by first assigning technical names to the items within country. The names ITEM1 to ITEM8 were assigned within country to items in accordance with the sequence of their loading sizes. For the English (Zambia) subsample, this meant that the item “My father spent as much time as he could with us” became ITEM1, “My father gave the best example to me and my siblings” became ITEM2, down to the item “My father walked me to school every morning,” which became ITEM8. The same item name assignment was undertaken in the other four languages/cultures.
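A minimal sketch of the renaming and merging step is given below. It assumes that each culture's loadings are stored in a dictionary keyed by item wording and that each respondent's answers are stored in a dictionary keyed the same way; both layouts and the function names are hypothetical.

```python
def assign_technical_names(loadings):
    """Map each item wording to ITEM1, ITEM2, ... in descending order of its loading."""
    ranked = sorted(loadings, key=loadings.get, reverse=True)
    return {wording: f"ITEM{rank}" for rank, wording in enumerate(ranked, start=1)}


def merge_samples(samples, name_maps):
    """Stack respondents from all cultures under the shared technical item names.

    `samples` maps a culture label to a list of respondents (wording -> score);
    `name_maps` maps a culture label to that culture's wording -> ITEMj mapping.
    """
    merged = []
    for culture, respondents in samples.items():
        mapping = name_maps[culture]
        for answers in respondents:
            row = {"culture": culture}
            row.update({mapping[wording]: score for wording, score in answers.items()})
            merged.append(row)
    return merged
```

After merging, a column such as ITEM1 holds responses to differently worded items, one wording per culture, that share only their rank in the within-culture loading order.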
The first step to show that covariance matrices in the five cultures/languages are equivalent was to perform a five-sample confirmatory factor analysis using AMOS 21 (Arbuckle, 2012). Table 2 documents the pertinent fit indices for all subsequently reported analytic steps.
Table 2. Summary Table of Model-Fit Indices.
Note. A nonsignificant PCLOSE value suggests that RMSEA does not significantly exceed .05. CFI = comparative fit index; RMSEA = root mean square error of approximation; Δχ2 = difference in the chi-square values between the model and the preceding model; pΔχ2 = probability of Δχ2 under H0.
Whereas paths were fixed to equality for the three African samples and for the two European samples, different values were allowed across continents.
A comparison of Models G and I also shows a nonsignificant decrease in fit: Δχ2 = 6.74, p = .346.
Typically, the minimal requirement of equivalence is that of corroborating that all eight items load significantly on one factor in all countries. That model reached an acceptable goodness of fit (see Table 2, Model A), suggesting that the one-factor model is a well-fitting model. Further tests then checked how far equality constraints could be “tightened up” without losing significant amounts of model fit. The first step of “tightening” was to fix the loadings of same items, that is, items with the same technical names ITEMj to identity. For this model (B), the loss of fit as expressed in χ2 was insignificant; other fit indices improved slightly. The next step was to fix the variance of the latent construct to equality across all cultures/languages. 6 For this model (C), the loss of fit as expressed in χ2 was once again insignificant (on the p ≤ .25 level); other fit indices stayed almost unchanged. In a third step, equality of the measurement error of the individual items was checked. That analysis was performed for all items one-by-one. The first test pertained to ITEM1, the highest loading item in all five samples. For this model (D), the loss of fit as expressed in χ2 was once again insignificant (on the p ≤ .25 level); other fit indices stayed almost unchanged. For ITEM2 to ITEM4, equality of item measurement errors could not be shown; in these cases, there was a significant loss of fit, if their error variances were fixed to equality. For ITEM5, however, equality could once again be corroborated. For this model (E), the loss of fit as expressed in χ2 was insignificant (p = .19); other fit indices stayed almost unchanged. For none of the remaining three items could equality of measurement error be shown. There always was either a moderate (ITEM8) or a substantial loss in fit (ITEM6, ITEM7) when measurement errors were fixed to equality.
The final step of “internal” validity checks was to test whether item intercepts could be fixed to equality for any of the items. This once again was the case for two items, ITEM2 and ITEM7, which are not the items for which measurement error equality could be shown. The overall fit of the model that fixed the intercepts of these two items to equality was acceptable (see Table 2, Model F).
As an interim summary, the analyses showed that the 5 × 8 items from the five cultures/languages all load on one factor within their respective cultures. It could further be shown that all eight items have identical loadings on the latent variable “paternal warmth” across the five cultures. Third, the latent variable “paternal warmth” had the same variance in all cultures. Furthermore, for ITEM1 and ITEM5, error variances were also equal across cultures (which was not the case for the other six items). Item intercept equality could additionally be shown for ITEM2 and ITEM7. The existence of a unidimensional construct with eight indicators measuring it in a structurally equal manner (loading equivalence), producing equally dispersed distributions of latent scores, could thus be shown. Higher demands of item equivalence could be fulfilled only to a limited extent: Sameness of measurement error could be shown for two of the eight items. Intercept identity (i.e., an item score contributing identically to the overall score of the latent variable in all cultures) could also be shown for two items; scalar equivalence in the sense of van de Vijver and Leung (2011) was clearly missed.
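The constraints summarized above can be written compactly in standard one-factor notation. The symbols below (τ for intercepts, λ for loadings, ξ for the latent variable, δ for measurement error, φ and θ for their variances, j for items, g for cultures) are an assumption made for illustration; the article itself does not introduce a formal notation.

```latex
% One-factor measurement model for item j = 1, ..., 8 in culture g = 1, ..., 5
x_{jg} = \tau_{jg} + \lambda_{jg}\,\xi_{g} + \delta_{jg},
\qquad \operatorname{Var}(\xi_{g}) = \phi_{g},
\qquad \operatorname{Var}(\delta_{jg}) = \theta_{jg}

% Equality constraints that could be retained without significant loss of fit
\lambda_{jg} = \lambda_{j} \quad \text{for all } j      % Model B: equal loadings
\phi_{g} = \phi                                          % Model C: equal latent variance
\theta_{jg} = \theta_{j} \quad \text{for } j \in \{1, 5\} % Models D, E: equal error variances
\tau_{jg} = \tau_{j} \quad \text{for } j \in \{2, 7\}     % Model F: equal intercepts
```

Read this way, full scalar equivalence would require the last two lines to hold for all eight items rather than for two items each.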
The model with the above-described constraints was in a next step taken to test how the latent variable (as an exogenous variable) predicted trust (as an endogenous variable) in the five cultures/languages. The starting point once again was a model that allowed the path from paternal warmth to trust to vary freely across the five cultures/languages. This “free” model had a good fit (see Table 2, Model G). Subsequently, a comprehensive search for the best possible constrained model was performed. The search commenced with fixing all paths from paternal warmth to trust and the variances of trust to be equal. That model fit the data significantly less well than the free model. In subsequent tests, essentially all possible combinations of equality constraints for the paternal warmth to trust path and for the variances of trust were checked. It turned out that a model that constrained the paths from paternal warmth to trust to be equal across the three African samples and equal across the two European samples (but numerically different from the African samples, Model H) fit the data as well as the unconstrained model. The same was true for a final model that also constrained variances of trust to be equal in the African samples and in the European samples (but not numerically equal between the two continents, Model I). The final model also had a better parsimony-adjusted fit than the fully free model (PCFI free = .930; PCFI constrained = .959).
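As a plausibility check on the comparison of Models G and I (reported as Δχ² = 6.74, p = .346), the sketch below counts the equality constraints that Model I adds and computes the corresponding p value. The degrees of freedom are not stated in the text; the value of 6 is an inference from the description of the constraints (a parameter shared by k groups costs k − 1 constraints), and the helper names are hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form that is valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))


def equality_constraints(blocks: dict) -> int:
    """Number of constraints when a parameter is set equal within each block of groups."""
    return sum(len(members) - 1 for members in blocks.values())


# Grouping of the five samples as described for Models H and I.
continents = {
    "Africa": ["Togo", "Zambia", "Zimbabwe"],
    "Europe": ["Germany", "Moldova"],
}

path_constraints = equality_constraints(continents)      # warmth -> trust paths
variance_constraints = equality_constraints(continents)  # variances of trust
delta_df = path_constraints + variance_constraints       # ASSUMED df of the G vs. I test

p_value = chi2_sf_even_df(6.74, delta_df)
print(f"delta df = {delta_df}, p = {p_value:.3f}")
```

Under this assumed df, the computed p value agrees with the reported p = .346 to three decimals, which supports reading the G versus I comparison as a test of three path constraints plus three variance constraints.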
Judged on the grounds of AMOS’ modification indices (MIs), this picture did not change when gender and age were additionally entered into the model as further exogenous controls: For none of the constrained parameters did a significant MI emerge, suggesting that equality is not a consequence of gender- and/or age-related sampling peculiarities of the given convenience samples. Table 3 documents the standardized coefficients for factor loadings of ITEM1 to ITEM8 and the standardized path from the latent paternal warmth variable to trust.
Table 3. Item Loadings and Path Coefficients for Five Cultural/Language Groups.
This column documents standardized path coefficients β from the latent paternal warmth variable as an exogenous variable to trust as an endogenous variable. For β = .40, coefficients are significant (p ≤ .001).
Table 3 first of all shows that the relationship between the latent paternal warmth variable and trust is equal in the three African cultures/language communities and in the two European cultures/language communities. It is worth noting that the path for the African samples is highly significant (p < .001), whereas for the two European samples, the path coefficient is insignificant and also has the conceptually “wrong” sign. For none of the items is there full scalar equivalence (equal item errors and equal item intercepts), but two items (ITEM1 and ITEM5) have identical error terms in addition to identical loadings. In rough-and-ready translations to English, these two items read, “My father spent as much time as he could with us” and “My father helped me with my homework” (Zambia), “My father was very engaged in my academic education” and “My father occasionally offered me gifts” (Togo), “My father has always supported me in that I pursue my interests” and “My father wished me a good night every evening when I went to sleep” (Germany), “My father always praised me when I did something well” and “My father used positive physical contact in order to express his love” (Moldova–Russian), and “My father often hugged me as an expression of closeness and coziness between us” and “My father constantly told me that he is proud of me when I did something good” (Zimbabwe–Shona). Two other items (ITEM2 and ITEM7) have identical intercepts (see the appendix).
Discussion
To commence with an aside, those readers who were brought up in traditional psychometrics and those who work closely together with practitioners will be familiar with the fact that many psychological tests have parallel forms, sometimes several of them. For these parallel test forms, the scientific community generally accepts that results obtained with one form of a test can directly be compared with those obtained with another form of the test. The current authors point to this fact not because they think that allowing emically developed instruments to assess a certain psychological construct, and nevertheless comparing across cultures later, is the very same thing as assessing one and the same construct with parallel test forms and comparing the scores that different groups of people obtained with the different forms. We do acknowledge that a lot of careful, often multi-year work is invested into the development of contemporary parallel forms of tests. The intention in pointing to parallel test forms rather is to relativize the strong conviction in contemporary cross-cultural psychology that one and the same phenomenon can typically only be assessed by identical items, that is, items that have been proven to be linguistically identical through careful translation endeavors supported by evidence for mathematical matrix identity. The reasonable use of state-of-the-art parallel test forms to assess one and the same psychological construct suggests that linguistic identity is not a “born” requirement when assessing different groups of people and later comparing their results.
The crux of the suggestion made here is that it shifts the obligation to ascertain equivalence away from the instrument itself (here the paternal warmth scale) to the relation of its scores with another measure (here the WVS trust item). For this other measure, equivalence has to be given if one wants to discard the requirement of linguistic equivalence for the newly developed, emically derived measure. Single self-report items such as the one chosen here are not really optimal candidates for constituting the standard of comparison, even in cases like the trust item, where non-equivalence concerns have not typically been raised. One suggestion to overcome the problem of having to prove the equivalence of the standard of comparison before relating it to the emically derived instrument could be to resort to unobtrusive observational, physiological, or neurophysiological measures to avoid something like an infinite regress (i.e., having to prove equivalence of the anchor variable with another and yet another such variable). If one can show that an emically derived measure of paternal warmth is identically related to physiological measures (like hormone levels, muscle contractions, skin resistance, electroencephalography [EEG] or magnetoencephalography [MEG] forms, or yet more sophisticated neurophysiological measures), this could be seen as support for the conviction that the same “thing” can be measured via differently worded items in all cultures under scrutiny. For the time being, validating emic measures by relating them to “hard science” measures may not (yet) be an option, though, as neuropsychological measurement, in particular, is still in its infancy. An example from very recent genetic research might, however, serve as a valid illustration of what is suggested here: In their article, “The Genetic Correlation Between Height and IQ: Shared Genes or Assortative Mating,” M. C. Keller et al. (2013) report from a twin study that the “additive genetic correlation between height and IQ is .13 in males . . . and .22 in females” (The PLOS Genetics Staff, 2014, para. 1). They measured height in centimeters and IQ via the Wechsler Adult Intelligence Scale/Wechsler Intelligence Scale for Children (WAIS/WISC). In our quantitative emic approach, we would have measured IQ by emic instruments (not only with two parallel test forms) and would then have fixed the relationship between the intelligence measures and height to the coefficients published by Keller et al. in all included cultures, thereby fixing the relationship between an emically validated variable and an etically defined variable to the same coefficient in all cultures. Here, too, one has to acknowledge that the relationship between intelligence and height may indeed be a universal one, but that it is likely to be affected by within-culture variability, so that one may additionally have to include individual-level socio-economic status as a control. This proviso does not, however, invalidate the general thrust of the approach: Variables for which a strong genetic component for their relationship with another variable (the emically measured variable, in our case) has convincingly been shown are good candidates to externally validate emic measurement.
But where do we now stand with regard to the illustrative material we offered in the current article? We have corroborated that there are eight items each from five cultures/language groups that can be shown to have equal loadings on their culture-specific latent variable paternal warmth and that this latent variable has equal variability across cultures. It also exhibited an equal covariation with an external variable in the three African cultures included in our illustration study. Stricter equivalence demands were met only in rudimentary form: Two items could be shown to have equal measurement error across cultures, and for two further (unfortunately different) items, equal intercepts could be shown. Had the authors been “lucky” in that the two items with equal measurement errors and the two items with equal intercepts were the very same items, partial scalar equivalence as required in etic cross-cultural work (see, for example, Davidov, Schmidt, & Billiet, 2011) would already have been achieved.
What can one draw from the illustration study presented here? First, it seems possible to develop items that fulfill the requirement of partial scalar if not even full scalar invariance. Subsequent studies need a much larger item pool per culture coupled with a less ad hoc item formulation process. Second, it also seems possible to validate emic scales by relating them to an external variable. Harder validity criteria are needed in subsequent studies for the external variable to be used for validation purposes. In the future, such external validation variables might come from the sphere of biological or neuro(psycho)logical measures so that covariation “created” by the fact that both variables are self-report variables is ruled out. Third, larger non-convenience samples are needed for further tests of the suggested approach to developing instruments for a quantitative emic comparative psychology. Convenience samples of students and their personal networks bear the problem that artificial covariation within culture is created. Small samples as presented for the current illustration study bear the further problem that it is easier to corroborate invariance in smaller than in larger samples under the significance testing logic.
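The last point can be illustrated numerically. Under the common approximation that a χ² statistic equals the sample size minus one, multiplied by the minimized discrepancy per case, the same amount of misfit yields a larger χ² in a larger sample. The sketch below holds misfit constant and varies N. All numbers are hypothetical, and the single-sample approximation is an assumption that simplifies the multigroup case.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form that is valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))


def p_value_for_fixed_misfit(misfit_per_case: float, n: int, delta_df: int) -> float:
    """p value of a chi-square difference test when misfit per case is held constant."""
    delta_chi2 = (n - 1) * misfit_per_case  # ASSUMED approximation: chi2 = (N - 1) * F
    return chi2_sf_even_df(delta_chi2, delta_df)


MISFIT_PER_CASE = 0.02  # HYPOTHETICAL additional discrepancy caused by the constraints
DELTA_DF = 6            # HYPOTHETICAL number of equality constraints tested

results = {
    n: p_value_for_fixed_misfit(MISFIT_PER_CASE, n, DELTA_DF)
    for n in (200, 500, 1000, 2000)
}
for n, p in results.items():
    print(f"N = {n:5d}  p = {p:.4f}")
```

With identical misfit, the equality constraints would be retained in the smallest sample and rejected in the largest, which is why invariance findings from small samples call for replication with larger ones.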
What the current article calls for, in our view, is a larger, more systematic study with a greater item pool for the construct under scrutiny and bio- or neurodata to measure the external criterion variable. Ideally, such a study should be undertaken with larger population-representative samples. What the current article also calls for, in the eyes of its authors, is giving up any dogmatic prerogative to undertake cross-cultural comparisons with “identically” worded items only. Conversely, the authors have to concede that their approach to some degree advocates empiricism. Item adequacy is not deduced from theoretical considerations but is based on consensual face validity within a culture, supported by cross-cultural mathematical “sameness.” Theory actually only enters the stage when a relationship with an “anchor variable” is postulated and tested. The question may, however, be allowed whether exactly this is not, after all, at the core of emic thinking.
What the article does not advocate is to use an emic approach to cross-cultural measurement for all possible psychological constructs. First and foremost, the approach advocated here seems to be an alternative to etic instrument construction for psychological phenomena that initially have a culture-specific definition (like guanxi, wasta, jeitinho, and “pulling strings”; see Smith, Huang, Harb, & Torres, 2012) but can and should, nevertheless, fruitfully be compared across cultures.
Appendix
Within-Culture/Language Item Loadings.
| Language | Item wording | Item loading (a) |
|---|---|---|
| English (Zambia) | My father . . . | |
| | . . . spent as much time as he could with us | .90 |
| | . . . gave the best example to me and my siblings | .82 |
| | . . . provided for me and the family | .82 |
| | . . . was loving and caring | .77 |
| | . . . helped me with my homework | .68 |
| | . . . did not do any chores around the house | .59 |
| | . . . did not raise his voice to my mother in front of me and my siblings | .51 |
| | . . . walked me to school every morning | .50 |
| | . . . was very concerned about my health and well-being | .42 |
| | . . . did not hit me and my siblings | .12 |
| French (Togo) | Mon père . . . | |
| | . . . était très engagé dans mon instruction scolaire | .84 |
| | . . . me souhaitait bonne nuit avant que je n’aille au lit | .70 |
| | . . . m’aidait à faire mes devoirs quand il avait du temps libre | .66 |
| | . . . semblait toujours intéressé par ce que j’avais à dire | .62 |
| | . . . m’offrait des petits cadeaux de temps en temps | .58 |
| | . . . faisait du vélo avec moi (ou d’autres activités sportives) | .57 |
| | . . . mangeait avec moi à l’heure du diner | .57 |
| | . . . me punissait lorsque je faisais des bêtises | .56 |
| | . . . me lisait des histoires avant que je n’aille au lit | .56 |
| | . . . me demandait toujours comment ma journée à l’école s’était passée lorsqu’il rentrait du boulot | .55 |
| German (Germany) | Mein Vater . . . | |
| | . . . hat mich immer darin unterstützt meinen Interessen nachzugehen. | .91 |
| | . . . hörte mir immer zu, wenn ich über etwas reden wollte. | .90 |
| | . . . hat sich regelmäßig nach meinen Problemen und meinem Befinden erkundigt. | .89 |
| | . . . gab mir das Gefühl, beschützt zu sein, wenn ich Angst hatte. | .87 |
| | . . . hat mir jeden Abend, bevor ich schlafen gegangen bin, eine gute Nacht gewünscht. | .85 |
| | . . . stand immer hinter mir, auch wenn ich einen Fehler gemacht hatte. | .82 |
| | . . . hat mich auf den Arm genommen um mich zu trösten. | .78 |
| | . . . hat gerne viele Geschichten über mich vor Anderen erzählt. | .52 |
| | . . . hat mich oft körperlich bestraft. | −.36 |
| | . . . hat mich vor allem mit “Sohn”/“Tochter” angesprochen. | −.27 |
| Russian (Moldova) | Мой отец . . . | |
| | . . . хвалил меня, когда я делал/a что-то хорошо | .92 |
| | . . . был очень заботливый со мной | .91 |
| | . . . поддерживал меня, когда у меня была проблема | .89 |
| | . . . обычно показывал мне, что он гордился мной | .85 |
| | . . . использовал положительный физический контакт, чтобы показать свою любовь | .77 |
| | . . . проводил выходные со мной в большинстве случаев | .77 |
| | . . . был очень понимающим, когда я делал/a что-то неправильно | .77 |
| | . . . обычно говорил мне, что он гордился мной | .76 |
| | . . . был всегда доступен, когда я нуждался/нуждалась в совете | .72 |
| | . . . помогал мне делать домашнее задание по крайней мере раз в неделю. | .39 |
| Shona (Zimbabwe) | Wakasunununguka nababa vako zvekuvambundira kazhinji. | .93 |
| | Baba vako vanombokupawo zvipo here apo ne apo. | .78 |
| | Kana ukawirwa nedambudziko kuchikoro, baba vako vanotaurika navo. | .77 |
| | Unotaurawo nababa vako kazhinji. | .73 |
| | Baba vako vanombokutaurirawo kuti vanodada newe kana uchinge wagona. | .63 |
| | Kana ukatadza, baba vako havangokurova asi vanotaura newe kuti unzwisise kutadza kwako. | .62 |
| | Unombofambawo nababa vako uchivhakacha murivaviri. | .61 |
| | Kana ukaita shamwari yawanetsana nayo ringave dambudziko raunotaurira baba vako. | .56 |
| | Baba vako vakanzwa kuti pane umwe mwana arikukunetsa kana kukudenha vanopindirawo. | .50 |
| | Baba vako vanorangarirawo bhavhudhe rako. | .48 |
(a) Principal component analysis; items are ordered by loading size.
Authors’ Note
This article draws on a prior publication by
in the introductory sections, but then, much in contrast to the prior article, offers exemplary data illustrating the suggestions made for the implementation of a quantitative emic cross-cultural psychology. Raw data can be obtained from the first author.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
