Abstract
Several sources of bias can plague research data and individual assessment. When cultural groups are considered, across or even within countries, it is essential that the constructs assessed and evaluated are as free as possible from any source of bias, and specifically from bias caused by culturally specific characteristics. Employing the Explanations of Unemployment Scale (revised form) for a sample of 1,894 employed and unemployed adults across eight countries (the United States, the United Kingdom, Turkey, Spain, Romania, Poland, Greece, and Brazil), we applied a method based on individual differences multidimensional scaling and principal component analysis to detect item bias in terms of culture and to remove this bias variance from the overall item variance, so as to (a) avoid jeopardizing validity levels and (b) arrive at clearer and more meaningful dimensions after adjusting the raw scores by removing the bias part. The results supported our statistical–psychometric intervention, as the structure computed for the unadjusted data was enhanced and clarified when the data were adjusted for bias in terms of culture. Finally, implications for individual assessment procedures are discussed, and a method for evaluating the relative impact of bias in terms of culture on the raw assessment scores is also presented.
Introduction
Testing processes are often plagued by serious sources of bias, such as administration errors, evaluation misjudgments, and incompatible or inappropriate norms. Other psychometric characteristics of the test employed, such as liability to response styles, structural deficiencies, and more (Byrne, 2008; Nunnally & Bernstein, 1994; van de Vijver & Leung, 1997), can also be a potential threat. One of these serious methodological and psychometric disadvantages is related to a person’s special cultural characteristics and to the impact which construct nonequivalence levels (possibly induced by these characteristics) have on the assessment outcomes (van de Vijver, 2011). This fact was completely disregarded in early cross-cultural research (Xu & Barnes, 2011). Sireci (2005, 2011) has argued that linguistic diversity poses a threat and has suggested bilingual designs for identifying items that do not function differently across languages. Such bias is now well known to methodologists and psychometricians working with cross-cultural data, as biased information is misleading and wrong (Hambleton & de Jong, 2003). Thus, to rule out such systematic bias, construct equivalence studies should be conducted before differential item functioning studies.
When dealing with bias detection and elimination, several terms require attention. With respect to the aims of the current study, attention was first directed to construct equivalence. Differences in scores between cultural groups can reflect valid differences in the construct measured or they can reflect—at least partially—measurement artifacts or bias. One major cause of bias in cross-cultural research is culture itself. Poortinga (1989), assuming “it is meaningful to postulate the identity of psychological constructs cross-culturally” (p. 738), has defined “comparison scale” as the identical scale cross-culturally formed by any hypothetical construct in terms of which a comparison is made; following this, he has discussed several ways of dealing with the artifacts in measurement caused by “bias in terms of culture,” as “Which psychometric properties of data can be validly compared depends on which parameters of measurement scales can be taken as invariant across cultures” (p. 740). In a satisfactory cross-cultural study, there is no variance left to be explained in terms of culture (Poortinga & van de Vijver, 1987). Thus, construct nonequivalence in cross-cultural studies can be mainly attributed to cultural variance, which paradoxically has to be reduced to null to derive cross-culturally comparable and meaningful structures.
Cultural characteristics, a basic notion in cross-cultural research, are difficult to define, but they can be regarded as cultural identities by which a member of a culture abides. These may correlate with ecocultural indexes such as affluence and religion, education and population statistics, and even ecological facts for a country, as these can characterize an underlying cluster structure across seemingly different nations (Georgas & Berry, 1995). Characteristics may therefore be studied and may seem different across countries, but if the characteristics studied have common grounds across some of these units, then the number of cultures studied is less than the number of countries involved. Cultural characteristics thus become even more difficult to define, as the “cluster of nations” method factor forming a homogeneous “culture” in a cross-cultural analysis is itself a part of the definition. Moreover, such cultural characteristics can produce construct nonequivalence both across countries and across clusters of countries under study. Finally, different cultures may exist within the same country: a culture may be a subset of people within a country, since homogeneous subsets of people may possess specific characteristics that distinctly differentiate their way of cultural thinking from other homogeneous subsets of people within the same country (Mylonas, 2009a). For example, males and females within the same nation may possess and exhibit culturally different ways of interpersonal communication. Similar “differences” may exist under any kind of subset formation within a country-nation, such as occupation, place of residence, age groups, educational levels, and so on, with behaviors each time reflecting different cultural identities.
Although this argument may initially sound contradictory to the Georgas and Berry (1995) arguments on clusters of nations, in fact it is a complementary one, as for the current argument, nations may “conceal” different cultures depending on financial, educational, religious, occupational characteristics, and even gender; in short, clusters of cultures within the same nation may exist.
Finally, the impact which cultural variations and characteristics of any type may have on a measure-item and on the assessed construct causes the item to function differentially; this can be briefly named “bias in terms of culture.” If bias in terms of culture can be efficiently treated (Poortinga, 1989; van de Vijver & Leung, 1997; van de Vijver & Poortinga, 2002), then construct equivalence can be sought, as variance is set free to accommodate the true structure of the test under consideration without the side effects of bias, so structural equivalence can be much more easily studied.
Dealing With Bias in Terms of Culture
Several ways of dealing with bias in terms of culture have been proposed. These have mainly focused on item deletion or, in the case of cross-cultural research, even on the deletion of whole countries from the overall sample. Removing bias at the item level does not necessarily lead to equivalence, as it does not necessarily remove construct bias and bias in general. However, the question of eliminating bias in terms of culture can, under certain conditions, be reduced to item bias, with the biased item being treated as a disturbance at the item level that has to be removed (van de Vijver & Leung, 1997). Such removal of items, though, conceals potential danger, as validity levels may be threatened (even content validity can be at stake; Byrne & van de Vijver, 2010).
Statistical ways have also been proposed to detect and eliminate bias so as to achieve invariance across cultures. Some of these methods attempt to account for cultural variance by introducing confounding variables in the study design and then excluding variance/items by means of hierarchical regression models. Alternative ways include covariance structure analysis (CSA; Poortinga & van de Vijver, 1987) or structural equation modeling (SEM; Byrne, 2008; Byrne, Shavelson & Muthén, 1989; Byrne & van de Vijver, 2010). A separate note should be made with respect to SEM, as it allows for equivalence testing and also encompasses flexible techniques (even for large-scale samples) that retain nonequivalent items in the analysis in the form of culture-specific indicators through the partial measurement condition (Byrne et al., 1989; van de Vijver, 2011).
Another method is to circumvent the cultural bias effect by controlling for external criteria, such as gender, age, ability, and other confounding variables specific to each culture, such as ecological indexes (Georgas & Berry, 1995); for intelligence testing, it has been shown, through partial correlation coefficients, that more than 50% of the items are biased (Valencia, Rankin, & Livingston, 1995). However, such an attempt may detect the source of bias but does not partial out unwanted variance. Other methods of bias detection have been extensively discussed by Sireci (2011). Van Hemert, Baerveldt, and Vermande (2001) have proposed that “researchers who want to compare ethnic groups or groups with various levels of acculturation should carry out a study on the cultural bias of their items” (p. 394), signifying the importance of detecting and possibly eliminating item bias that may be or contain cross-cultural bias or bias in terms of culture. However, their contribution to bias detection and elimination suggested replacing items that prove to be highly biased (following the traditional approach of item deletion) and did not allow for retaining useful parts of the biased items’ variance. Still, the authors suggested that validity should be protected and that replacement of biased items should take place during a pilot study or by administering twice as many items (two versions per item) in any empirical-questionnaire study. Finally, a method of adjusting identified intercept differences when estimating latent means has been proposed (Scholderer, Grunert, & Brunsø, 2005), a method similar in principle to our own.
In general, most of the aforementioned methods do not deal with individual assessment, as they are applied to samples. Moreover, these methods are employed under the assumption that deleting or replacing items to achieve invariance remedies bias overall, and that this overall remedy is then transferred to the individual scores. The latter expectation may be challenged, though, as deleting items may have a serious impact on a scale’s validity: if we decide to delete items in cross-cultural (or cross-group) research so as to achieve construct equivalence, we may end up with a scale that assesses a different construct than the original or intended one. This may have serious repercussions for individual psychological assessment as well as for empirical research. Byrne and van de Vijver (2010) have exemplified the process by referring to van de Vijver, Mylonas, Pavlopoulos, and Georgas (2006), where 7 of the 18 items were deleted either beforehand or during equivalence testing. Although Byrne and van de Vijver conclude that the scale’s validity was not harmed, this may not always be the case.
An alternative way of dealing with such bias might be to work as much as possible within the variance of each item. That is, instead of deleting unwanted items, it would be much better to retain all items with their variance as “free” as possible of unwanted bias in terms of culture. So, we might try to intervene at the item variance level, instead of totally discarding the item. Through empirical studies this would be expected to reveal correlations closer to the true value and thus achieve better or at least clearer factor structures than the ones achieved before the intervention. Even more, with respect to individual assessment and scores, the scale’s validity—as initially described and supported in a standardization study or a similar project of test construction—would not be threatened at all. This way, the individual scores would be more meaningful with respect to the original theoretical framework, still free of as much bias in terms of culture as possible.
Aim of the Current Study
The aim of the present study is to further explore and support an existing method for detecting and possibly eliminating bias in terms of culture (Mylonas, 2009a, 2009b). This bias may initially appear in the form of item bias with respect to the items of a scale that has been applied cross-culturally, but through the method it should be shown that most, if not all, of this item bias is in reality bias in terms of culture. Having shown this, we may proceed to the second stage of the method, that is, elimination or at least reduction of these bias levels, so as to discard as much unwanted variance as possible without eliminating any of the original scale items. To test the method, in the current study we employed Furnham’s Explanations of Unemployment Scale (EoU-Revised) as administered to eight country samples of employed and unemployed adults. For the modeling of the data before and after the intervention through our method, principal component analysis (PCA) designs were employed in our attempt to explain, under both conditions, the largest possible variance through the extracted dimensions (Merenda, 1997). Real factors (Kline, 1993), and not factors estimated from the infinity of possible solutions for the data, should be compared across the two conditions at this stage; common factor models (e.g., via maximum likelihood methods) should be employed in future attempts when the structure of the scale per se is under consideration. However, indicative maximum likelihood solutions were also computed as a preliminary common factor approach using the same bias-reduction method. Finally, we combined the PCA solutions with CSA indexes to be able to monitor construct equivalence at all stages.
Materials and Method
The Explanations of Unemployment Scale
Furnham (1982) introduced the issue of reasons for unemployment and the respective assessment scale. Furnham and Lewis (1986) suggested that unemployment has several implications for the unemployed and that the way people perceive the causes of unemployment may have an effect as well. Furnham and Lewis (1986) also addressed several of the related facets, devoting a large part to unemployment and its psychological consequences for the individual and society. Unemployment and psychological health are related in terms of the psychological reactions and their stages and cycles, with immobilization and shock appearing at the first stages and with internalization and inertia being typically observed as long-term reactions to unemployment.
Studies on unemployment and psychological health (e.g., Feather, 1990; Furnham, 1983; Goldman-Mellor, Saxton, & Catalano, 2010; Jackson & Warr, 1984; Lewis, Webley, & Furnham, 1995) have addressed psychological and social adjustment and effects of unemployment. Furthermore, fatalistic explanations as given by the long-term unemployed have been further explored (Hayes & Nutman, 1981). Other research (Turner, Kessler, & House, 1991; Waters & Moore, 2001, 2002) has connected explanations of unemployment with self-esteem, and with locus of control as yet another related factor (Cvetanovski & Jex, 1994). In a more general sense, explanations for unemployment are considered attributions related to psychological processes and are linked to expectations, which then, in turn, affect beliefs about the causes of success and failure (Furnham & Lewis, 1986).
The explanations of unemployment themselves have been studied through the Explanations of Unemployment Scale for British samples of employed and unemployed individuals (Furnham, 1982; Furnham & Lewis, 1986). Three main axes of explanations were initially described: (a) individualistic reasons, expected to be favored by the employed participants attributing unemployment to personal disposition; (b) societal reasons; and (c) fatalistic reasons. Both (b) and (c) were expected to be provided as explanations mainly by the unemployed, as they would be attributing their unemployment more to external (societal, chance) than to internal (individualistic) reasons. Apart from verifying the hypotheses, Furnham’s study offered the 20-item EoU scale with eight items assessing individualistic explanations, eight items assessing societal explanations, and four items assessing fatalistic explanations. Examples of items in this scale, each presented as an explanation of unemployment, are the following: “Unemployed people are too fussy and proud to accept some jobs,” “Inefficient and less competitive industries that go bankrupt,” and “The introduction of widespread automation.” All items are scored on a 7-point Likert-type scale.
Factor structure of Furnham’s EoU scale
A New Zealand study (Lewis et al., 1995) supported the 1982 findings and the original scale structure. However, cross-cultural studies that followed the 1982 study (e.g., Feather, 1985; Ward, 1991) only partly replicated factor patterns. Not only the identity but even the number of factors were different across countries, as Feather (1985), using 27 items, described six factors, namely, Lack of Motivation, Recession and Social Change, Competence Deficiency, Defective Job Creation, Personal Handicap, and Specific Discrimination. In addition, Ward (1991) described seven factors, and although differences have been individually described for various countries, these differences have not yet been summarized (e.g., meta-analysis or culture invariance modeling or multilevel covariance structure modeling). However, Furnham (1988) argues that “although different studies have empirically derived rather different factors, it seems quite possible to categorize these into one or other theoretical framework: i.e. individualistic (internal, voluntary, effort, ability), societal (external structural, task difficulty) and fatalistic (cyclical, luck, chance, uncontrollable)” (p. 133). This introduced a cross-cultural issue of explanations for unemployment across national groups as a function of the prevailing economic conditions in each country (Furnham & Hesketh, 1988). Whatever the outcomes though, these and other studies (e.g., Payne & Furnham, 1987) indicated the need for cross-cultural testing of the EoU Scale both in terms of factor structure and of differences in the explanations used across cultures. For possible differences across cultures to be revealed though, the factors assessed through the scale should be comparable and methodologically and statistically equivalent across these nations (Poortinga, 1989; Poortinga & van de Vijver, 1987; van de Vijver & Leung, 1997; van de Vijver & Poortinga, 2002; van de Vijver & Tanzer, 1997). 
Another issue is the original EoU scale’s ability to assess contemporary facets of unemployment; although comparability with previous findings should be preserved for cross-cultural studies, it is common practice to devise and test new items to bring scales up to date. Toward this end, in a pilot attempt, a study was conducted by the first author as described below.
Pilot Study Stage and the Revised EoU Scale (2007)
We initially set out to elaborate on new revised items and test their metric properties along with the 20 original EoU items. This attempt aimed at the Scale’s reexamination to arrive at a new set of items, possibly including a number of the original ones, but with a main prerequisite: the factor structure for the final set of items would be at least similar, if not identical, to the theoretically proposed one (original EoU scale). In this sense, the items would serve the update need and at the same time the theoretical structure would still be testable. Thus, if not identical, quite similar factors should arise with the new set of items (expected to contain items from the original scale as well) for the adapted version of the scale to be provisionally accepted and further employed.
As mentioned earlier, the original scale comprised 20 items, structured in three units (Individualistic, Societal, and Fatalistic reasons). This scale was adapted using a Greek population (EAPA 9; Mylonas, 2007). 1 This adaptation study was carried out on employed and unemployed Greeks (N = 250); 44 items in all (scored on the same 7-point Likert-type scale) were employed in an attempt to identify new items and a structure that might depict people’s perception of the reasons for unemployment in an up-to-date fashion. The main outcomes supported a three-factor structure consisting of 19 items, of which 8 were original EoU items. The statistical and psychometric analysis involved a series of confirmatory factor analysis models along with internal consistency estimates and CSA models (Muthén, 1994, 2000) across the two groups (employed and unemployed), so as to arrive at a common, factor-equivalent structure for these two groups. The final selection of the 19 items was made from a pool of 33 items that was available after the above-mentioned procedure was applied. The outcomes closely resembled the theoretically proposed structure: The individualistic factor was clearly the same as the original EoU one. The second factor was a Societal dimension, as it refers to lack of provision by the State, closely resembling the societal reasons in the original EoU scale, and the third factor was a Fatalistic one (involving uncontrollable socioeconomic and technological changes), also closely resembling the original EoU dimension. Overall, the statistical and psychometric outcomes for these 19 items and the three factors were satisfactory; thus, this scale (EoU-R) was from this point onward employed in the eight-country main study to test the suggested method.
Intervention Method to Detect and Reduce Bias in Terms of Culture
If a factor-equivalent structure for a set of countries is the target, then the items to be factor analyzed could themselves serve as a source for estimating bias. That is, for a set of items across countries, indexes can be computed that may contain information about the variance explained only in terms of culture. Thus, accounting for cultural variance in a set of items can be accomplished if we can estimate the amount of variance caused by “culture” using the information provided by these same items. Previous research has been carried out along a similar rationale (Gari, Panagiotopoulou, & Mylonas, 2008; Mylonas et al., 2011). However, using a pool of items to measure their own levels of bias in terms of culture is not a very common research endeavor, and no specific method is readily available. Moreover, popular statistical software does not include routines that can provide answers to the questions raised so far. Still, many statistical tools are available to aid variance estimation.
The method described from here on is based on a combination of PCA models and a special application of multidimensional scaling models. PCA is employed only as a starting “baseline” point (PCA modeling on common factor CSA-estimated correlation matrices), providing the initial solution for a pool of items to be processed through the multidimensional scaling methods, and is then employed again at the final stage to compare the “corrected” structure to the “baseline” one. The corrected structure is expected to be as “clear” as possible, having accounted for bias in terms of culture. The analysis was carried out on the revised 19-item EoU scale data collected from eight countries around the world (Brazil, Greece, Poland, Romania, Spain, Turkey, the United Kingdom, and the United States) 2 with a randomly selected N of 1,894 employed and unemployed adults (aged 18 to 67 years; each country’s N is provided in the Results section).
The method is applied at this stage to cross-country data, but we should stress that it applies to cross-group data as well, since groups can be considered different cultures even within the same country, as supported earlier (Mylonas, 2009a). The main aim is to check for possible inconsistencies in the “baseline” structure and, having corrected for them through the proposed method, to recheck for the desired improvement in this structure. If this can be achieved, the corrected scores do not apply only to research samples; they can also better describe the individual during individual assessment via the instrument and its described structure, after having corrected for bias in terms of culture.
The initial aim was to arrive at an overall structure for a number of countries or cultures. Ideally, this structure would have been a universal one across cultures. Still, discrepant items are most of the time present in these initial structure solutions. With respect to the universality issue, CSA methods have to be applied to all structures, and intraclass correlations have to be evaluated before and after the intervention; however, the interpretation of universality levels was not the main point of interest in this study.
For the set of items across countries, one can compute indexes that may contain information about the variance explained only in terms of culture. Thus, accounting for cultural variance is a procedure that can be achieved by estimating for a set of items the amount of variance caused by “culture,” using the information provided by these same items. Such estimates can be computed in the way described below, through Multidimensional Scaling models. Multidimensional scaling has been widely used to model cross-cultural similarities and differences (as in the Schwartz studies on values through Smallest Space Analysis, a variant of multidimensional scaling; see, e.g., Schwartz, 1992; Schwartz & Sagie, 2000). Furthermore, sophisticated ways to model similarities and differences simultaneously are available, such as the individual differences Euclidean distance model (via ALSCAL), through which we can compute the underlying dimensions for a set of countries and at the same time compute the relative importance of these dimensions for each country, in terms of dimension weights.
A “weirdness index” computed for each country is also available, which corresponds to the proportionality of the individual dimension weights to the overall average weights, thus depicting the eccentricity of each country’s similarity matrix with respect to the overall dimensions in the data. This index can be considered an r² index, since it accounts for variance explained by the eccentricity of the similarity matrix, thus depicting the covariance of the cultural elements with the measures of interest. Following the computation of this r² estimate for each of the countries involved, we can adjust the raw scores for the bias estimates in this index through the procedure described by Formulae (1) to (3) in Table 1 (Mylonas, 2009a, 2009b). The main rationale behind our method is that through the computed solutions we take the multivariate dissimilarity into account, having produced the index for each country’s dissimilarity; this way we refer to the multidimensional system of all countries and measures with which a country assimilates or not. Taking advantage of this information, we can adjust the standard deviations for each and every measure within each country and then, through z scores, adjust the initial raw scores, for which we recompute structures. Thus, the “weirdness index” effect can be removed from the original raw scores by adjusting the standard deviation of each item within each country, taking this “cultural effect size” out. The adjustment stage is initiated by computing the z scores for the raw data (Table 1) within each separate group, with the final aim being the recalculation of raw scores for each item based on adjusted standard deviations. All computations are performed within each country separately and for each item separately. All participants’ scores are thus adjusted, having removed some of the bias in terms of culture as depicted by the “cultural effect size” through the weirdness index.
We have to stress “some variance,” following Ype Poortinga’s discussion (6th IACCP European Regional Congress, July 2003, Budapest) stating that any method employed to reduce bias in terms of culture (Mylonas, 2003) should not discard too much of the error variance because there would be no variance left to explain. Proceeding with minor adjustments to cautiously account for bias, at least to some extent, sounds wiser; however, the outcomes themselves can show whether such a cautious approach needs to be bolder or not.
Formulae for Adjusting Each Item’s Standard Deviation and Raw Score.
Note. Weirdness index expresses the amount of variance common with dissimilarity across countries, that is, the higher it gets for a country the more the dissimilarity with the average multidimensional weights for all other countries in the solution. In such a sense, this r² index does not imply common variance across measures, but common variance with multidimensional dissimilarity across cultures, or “bias in terms of culture.”
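The adjustment sequence described above (z scores within country, an adjusted standard deviation, then recomputed raw scores) can be illustrated with a minimal sketch. Formulae (1) to (3) of Table 1 are not reproduced in this text, so the shrinkage rule used here is an assumption: the weirdness index is treated as the proportion of item variance to remove, giving an adjusted standard deviation of SD × sqrt(1 − WI). The function name `adjust_item_scores` is hypothetical, and the published formulae may differ.

```python
import numpy as np


def adjust_item_scores(raw, weirdness_index):
    """Adjust one item's raw scores within one country (illustrative only).

    ASSUMPTION: WI is treated as the share of variance to remove, so the
    adjusted SD is sd * sqrt(1 - WI). Table 1 may define this differently.
    """
    if not 0.0 <= weirdness_index < 1.0:
        raise ValueError("weirdness_index must lie in [0, 1)")
    raw = np.asarray(raw, dtype=float)
    mean = raw.mean()
    sd = raw.std(ddof=1)
    if sd == 0:
        # A constant item has no variance to adjust.
        return raw.copy()
    z = (raw - mean) / sd                            # within-country z scores
    sd_adj = sd * np.sqrt(1.0 - weirdness_index)     # assumed shrinkage rule
    return mean + z * sd_adj                         # recomputed raw scores


# Hypothetical responses to one 7-point item in one country,
# using the index reported for Greece (.4916) as the example value.
scores = [1, 4, 7, 5, 3]
adjusted = adjust_item_scores(scores, weirdness_index=0.4916)
```

Under this assumed rule the item mean within the country is unchanged and the item variance is reduced by the factor (1 − WI), so countries with higher weirdness indexes are adjusted more strongly.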
Results
The loadings for the overall sample (19 items, eight countries, N = 1,894), as computed through PCA followed by orthogonal rotation (Stage 1), are presented in Table 2. This is the solution for the raw, unadjusted scores, and was computed on the actual correlations as estimated through CSA among all 19 items.
Principal Component Solution for the Initial Eight-Country Data.
Note. KMO = .94; |D| = .00023; Bartlett’s test of sphericity statistically significant; cutoff loading = .50. By definition, all absolute loadings of .50 and above are statistically significant, and these are indicated in boldface.
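For readers who wish to reproduce the sampling adequacy figures in the table note, the following is a minimal sketch of how an overall KMO value and the determinant |D| of a correlation matrix are commonly computed. It uses plain Pearson correlations on simulated data, whereas the solution reported here was computed on CSA-estimated correlations, so the function `kmo_and_determinant` and the simulated items are illustrative assumptions only.

```python
import numpy as np


def kmo_and_determinant(data):
    """Return (overall KMO, determinant) for a cases-by-items data matrix."""
    data = np.asarray(data, dtype=float)
    corr = np.corrcoef(data, rowvar=False)       # item-by-item correlations
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)      # partial (anti-image) correlations
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + p2), np.linalg.det(corr)


# Simulated data: six items driven by one common factor (hypothetical).
rng = np.random.default_rng(0)
factor = rng.normal(size=(500, 1))
items = factor + 0.8 * rng.normal(size=(500, 6))
kmo, det = kmo_and_determinant(items)
```

A KMO close to 1 indicates that partial correlations are small relative to zero-order correlations, and a determinant close to (but above) zero indicates substantial shared variance without exact linear dependence among the items.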
It is clear in this solution (Stage 1) that only the “Unemployed people lack self-knowledge and pursue jobs not corresponding to their qualifications” item cross-loads on two dimensions. This may seem a minor discrepancy, but it might be due to some obscurity in the cultural “dynamics,” and it might even be possible to avoid it if some bias variance (unwanted metric or construct inequivalence) could be detected and eliminated. The average estimated intraclass correlation for these 19 items, estimated through CSA methods and indicating the probability of a universal structure, was .122, exceeding the acceptable limit of .06 for assuming universality of the factor structure across cultures.
At the intermediate stage, we computed the weirdness indexes (WI) for the countries in our sample through the individual differences Euclidean distance multidimensional scaling model (six-dimensional solution, R² = .73, Kruskal’s Stress = .13). This resulted in the following indexes: Brazil (n = 411), WI_Br = .2733; Greece (n = 250), WI_Gr = .4916; Poland (n = 156), WI_Pl = .3156; Romania (n = 204), WI_Ro = .1921; Spain (n = 238), WI_Es = .2839; Turkey (n = 200), WI_Tr = .2677; the United Kingdom (n = 199), WI_UK = .2854; and the United States (n = 236), WI_USA = .1025. These WI scores were employed as r² indexes to correct the standard deviation of each item within each country, following Formulae (1) and (2) (Table 1). The raw scores adjusted for this correction were then computed (Formula 3) and reanalyzed in Stage 2 (PCA solution for CSA matrices; Table 3).
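As an illustration of what an estimated intraclass correlation expresses for a single item, the sketch below uses the one-way ANOVA estimator with a correction for unequal group sizes: the between-country variance component divided by the sum of the between- and within-country components. This is a simplified stand-in under stated assumptions, not the CSA software estimate used in the study, and the function name `estimated_icc` and the toy data are hypothetical.

```python
import numpy as np


def estimated_icc(scores, groups):
    """One-way ANOVA estimate of the intraclass correlation for one item."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n_total, n_groups = scores.size, labels.size
    sizes = np.array([np.sum(groups == g) for g in labels])
    means = np.array([scores[groups == g].mean() for g in labels])
    ss_between = np.sum(sizes * (means - scores.mean()) ** 2)
    ss_within = sum(
        np.sum((scores[groups == g] - m) ** 2) for g, m in zip(labels, means)
    )
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    # Average group size adjusted for unbalanced samples.
    c = (n_total - np.sum(sizes ** 2) / n_total) / (n_groups - 1)
    var_between = (ms_between - ms_within) / c
    # Can be negative when groups differ less than expected by chance.
    return var_between / (var_between + ms_within)


# Toy data: three "countries" with clearly different item means.
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
toy_groups = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
icc = estimated_icc(toy_scores, toy_groups)
```

Averaging such item-level estimates over all items gives a single figure that can be compared with the .06 limit cited in the text; higher values mean that more of the item variance lies between countries.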
Principal Component Solution for the Adjusted for “Bias in Terms of Culture” Eight-Country Data.
Note. KMO = .94; |D| = .00012; Bartlett’s test of sphericity statistically significant; cutoff loading = .50. By definition, all absolute loadings of .50 and above are statistically significant, and these are indicated in boldface.
What is of specific importance is that for this adjusted solution, the item “Unemployed people lack self-knowledge and pursue jobs not corresponding to their qualifications” now loaded only, and more clearly, on the third dimension, whereas it was cross-loading on two dimensions before the adjustment. Although the gain may initially seem small, we need to keep in mind that cross-loadings often lead to erroneous aggregate score computation, leading in turn to erroneous conclusions even during individual assessment, so avoiding this possibility is indeed a step forward. The gain in the adjusted-scores solution is exactly this: it is easier to decide that Item “5_N” loads on Dimension 3 and not on Dimension 2, as the difference between the two loadings is now .08, whereas before adjustment the difference was .05, with both loadings being .50 or above. All other loadings did not change or changed slightly, making no difference to the overall structure. A final note on the results is an intriguing one for further research: the average estimated intraclass correlation for the adjusted-scores solution did not improve and actually departed a little more from universality (.16), so a secondary expectation, that is, enhanced levels of equivalence, was not met. According to Byrne and van de Vijver (2010), “the mechanical use of fit statistics can lead to erroneous conclusions; knowledge of the cultures studied is also important in reaching conclusions” (p. 128). This outcome may be attributable to the relatively small number of countries in the data set, not allowing for subsets of cultures to diminish the nonuniversality strength, or it might be that the score adjustment has revealed other parts of the variance initially masked by the bias in terms of culture, showing a greater but more realistic fluctuation of constructs across countries-cultures and possibly requiring multilevel modeling.
However, these interpretations are speculative for the time being and they obviously require further research. The overall outcomes though are in accordance with previous attempts (Mylonas, 2009b), where for other sets of information and a simulation-like approach the interpretative power of the adjusted solutions was shown to be consistently better.
Discussion and Conclusions
We should evaluate the outcomes of our methods from two perspectives: a research perspective, which mainly refers to the use of the scale in empirical studies during the computation and use of dimension composite scores; and a psychometric perspective, which refers to individual assessment and to the evaluation of the relative position of the person with respect to the constructs assessed and with respect to the bias impact on the individual scores.
Starting with the research perspective, our method seems to produce the desired outcomes, as the structure after score adjustment does not change dramatically but “minor” irregularities seem to have been taken care of. The initial second dimension (before adjustment) consisted of five items, including Item “5_N,” which could not be ignored as we could not decide, even after the rotation stage, which dimension it belonged to (second, third, or both). Indeed, the same item was also part of the third dimension, “lack of provision by the State,” a societal facet consisting of six items. At this stage (before adjustment), one might argue that Item 5_N (Unemployed people lack self-knowledge and pursue jobs not corresponding to their qualifications) is not in accordance with the main identity of the third dimension, as it sounds like an inability that the unemployed may suffer from, and this is closer to the identity of the second dimension (“unemployed too fussy and proud,” they “lack intelligence and ability,” etc.). So, the specific item would seem to fit the second dimension much better than the third one, yet its loading on the second dimension is lower, although still too high to be ignored. Abiding by common practice, we might then include the item’s information in both dimensions when computing the composite scores. However, after correcting for bias in terms of culture, we arrive at a solution that is more informative (most of the loadings were also somewhat higher for the adjusted-scores solution). The same item becomes clearer in its identity by correlating more obviously with the third dimension and by departing from the second one.
It then becomes clear that “Unemployed people lack self-knowledge and pursue jobs not corresponding to their qualifications” is a reason that respondents attribute less to personal inability and much more to poor State and Society provision in terms of poor or bad molding of personality and values through the educational system. So, there seems to be a responsibility shift when one considers this item as a part of the third dimension, a shift already apparent for another third-dimension item, namely, “Unemployed people do not qualify for contemporary market needs,” which could also be misinterpreted as a personal inability but which the respondents consider a responsibility of the State. Apart from clarifying the item’s identity after adjusting for bias in terms of culture, we are now also able to avoid including this item in the second dimension, because its loading on that dimension is lower than before the adjustment and departs further from its loading on the third dimension. Thus, the third dimension’s composite score would still be computed using six items, but the second dimension’s composite score would now be computed on four, not five, items, and this obviously would alter each final individual score for this dimension (as one item in five is 20% of the information). Finally, it should also be noted that, for the adjusted-scores solution, most of the loadings were now somewhat higher than for the nonadjusted-scores Stage 1 solution.
With respect to the psychometric perspective, it is important to remember that the gain, if any, would appear at the individual assessment level and that it is directly linked to the impact of bias in terms of culture on personal scores, which are of course computed as factor composite scores supported by previous empirical research. We should then attempt to evaluate this bias impact and take it into consideration while interpreting a specific dimension score. This applies under the following assessment situation: the assessment expert has employed a specific test for which the structure for the unadjusted item scores is available, so he/she can compute the composite factor scores for the individual once all item scores have been assessed. The expert has access to the mean and standard deviation for the dimensions of the test that hold for the specific population corresponding to the individual assessed. Standardization details may also be available but are not necessary if the mean and the standard deviations for specific groups are. Finally, the expert needs the weirdness index corresponding to possible bias across nationalities-countries or, more often, across groups (e.g., sex), and this index should be available through empirical research with respect to the test employed and with respect to the specific population and/or specific groups considered to be producing the “cultural bias.” Naturally, we do not refer only to the EoU test at this point but to any test, especially those tests that have to do with abilities, vocational interests, personality and psychopathologic properties of the individual, and so on. Having these at hand, the expert can, through simple calculations, compute a corrected score for each of the dimensions, taking bias in terms of culture group into account, and then evaluate this corrected score with regard to the strength of the bias impact so as to further understand the nature of the score.
Below, we will present an illustration of this individual score correction method accompanied by the method outcomes computed for our overall EoU eight-country sample.
For the illustration, we will consider only the third dimension composite score (societal-poor educational system, according to the Stage 1 structure) as computed for a Greek participant. In most cases, the structure available will of course be the unadjusted scores one. The data for this individual are as follows: X = 5.17, mean = 2.6980, and standard deviation = 0.9136. The weirdness index for this cultural group (country in this case, namely, Greece) is .4916 (as shown through the current study). With this information, the expert can compute a z score by subtracting the mean from X and dividing the outcome by the standard deviation (Formula 1). This would result in a z score of 2.70578. Then, using Formula (2), the corrected standard deviation can be computed, and this would be 0.6514. Then, using Formula (3), the expert can compute the corrected, free of “bias in terms of culture” score for the individual, and this would be approximately 4.46. If we compare the unadjusted score (5.17) to the one corrected for bias (4.46), we find that there is an algebraic difference of 0.71, corresponding to 11.8% of the total possible range of the scores (possible composite scores of 1 to 7, Range = 6) for the dimension. In other words, the impact of bias in terms of culture in this case is around 12%, which has to be taken into consideration when qualitatively interpreting the individual’s score.
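The arithmetic of this illustration can be retraced with a short script. This is only a sketch: Formulas 1 to 3 are not reproduced in this section, so the form used here for Formula 2 (the standard deviation multiplied by the square root of one minus the weirdness index) is an assumption, chosen because it reproduces the reported corrected standard deviation of 0.6514. The function names are hypothetical.

```python
import math


def z_score(x, mean, sd):
    # Formula 1 as described in the text: deviation from the mean in SD units.
    return (x - mean) / sd


def corrected_sd(sd, weirdness):
    # ASSUMED form of Formula 2 (not shown in this section): shrink the SD
    # by the share of variance not attributed to bias in terms of culture.
    return sd * math.sqrt(1.0 - weirdness)


def corrected_score(x, mean, sd, weirdness):
    # Formula 3 as implied by the text: same z score, corrected SD.
    return mean + z_score(x, mean, sd) * corrected_sd(sd, weirdness)


def bias_impact(x, x_corrected, score_min=1.0, score_max=7.0):
    # Difference between raw and corrected score as a share of the score range.
    return abs(x - x_corrected) / (score_max - score_min)


# Values reported for the Greek participant (third dimension).
x, mean, sd, weirdness = 5.17, 2.6980, 0.9136, 0.4916

z = z_score(x, mean, sd)
sd_c = corrected_sd(sd, weirdness)
x_c = corrected_score(x, mean, sd, weirdness)
impact = bias_impact(x, x_c)

print(f"z = {z:.5f}, corrected SD = {sd_c:.4f}, "
      f"corrected score = {x_c:.2f}, impact = {impact:.1%}")
```

Because the z score is held fixed, the correction only rescales the individual's distance from the group mean; the larger the weirdness index, the closer the corrected score is pulled toward that mean.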
A final but very important note is that the 4.46 score (corrected for bias) corresponds to the same relative position of the individual with respect to the normative sample, as by definition and by method assumptions the z score remains the same. Thus, it is not the relative position of the individual that changes after the correction method but the hermeneutical importance of this relative position with respect to the impact of cultural bias on it. In the above-mentioned example, one might conclude that, for the specific individual, the impact of bias in terms of culture is largely active and that his/her extreme (z = 2.70578) score may be considered at least partially an artifact. To eliminate the artifact, norms for the scores adjusted for bias would be required, but this is far from feasible yet. For now, we may settle for being aware of the size of the bias impact so as to avoid arriving at erroneous conclusions, which sometimes lead to redundant or mistaken interventions or, even worse, to no action at all, having “missed the signal.” To show graphically how the impact of bias in terms of culture (in terms of country in our study) affects the dimension scores, we have computed the probability mass functions for the z scores per dimension for each of the two conditions (scores unadjusted and adjusted for bias in terms of culture), and we have plotted the function outcomes on the y-axis, with the x-axis being the raw scores as derived directly through the revised EoU scale. These are presented in Figure 1, where the impact of bias in terms of culture is clearly apparent: the separate distribution-like outcomes for each country reduce their kurtosis after adjustment, showing that even with initial scores closer to the mean we may be dealing with extreme responses or behaviors, which may require further psychological attention by the expert.

Figure 1. Composite scores for each dimension before and after adjustment (x-axes) and probability mass functions for z scores; eight countries, N = 1,894.
The methods proposed are obviously not a “panacea” to the problem; they simply introduce just one other way of dealing with bias in terms of culture across countries or across cultures (even within nations) and a way to take it down to the individual and interpret its properties in a better and safer way. If this last part is to be accepted, it should be clear that when culture’s effect is eliminated for a person, then this effect has to be neutralized for all members of the same cultural group, if we want the corrected scores and their interpretation to make sense in relation to each other.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
