Abstract
This study examined the latent factor structure of the General Health Questionnaire–28 (GHQ-28) in a Black South African sample (N = 523). Results of the single-group confirmatory factor analysis support the universal four-factor structure of general psychological health observed in Western samples. However, multigroup confirmatory factor analyses (i.e. split-sample cross-validation approach, conducted with invariance analyses) for a three-factor structure suggest that psychological health could have a less differentiated dimensional structure in some African populations. Theoretical and practical implications of the study results are discussed.
Background
Mental disorders are highly prevalent in less-developed countries but often go untreated (Consortium, 2004), partly because these disorders are seldom rigorously assessed with validated measures. Westernized tests are often used more frequently in developing world societies than locally constructed measures (Oakland, 2004). As a consequence, cross-cultural psychological assessment remains a salient issue in developing world societies, especially in those that are diverse (Casillas and Robbins, 2005; Van de Vijver, 2002).
From a psychometric perspective, the comparability of psychological constructs in cross-cultural assessment poses significant methodological issues, since cultural norms, values, attitudes, and experiences influence the construction of culture-specific constructs (Harpaz, 1996). Therefore, it is recommended that the psychometric characteristics of transported psychological measures—in the form of measurement validity, reliability, and translation, conceptual, and metric equivalence—should routinely be investigated and established.
On a theoretical level, it has been suggested that dimensions of psychological health are universal across cultures (Goldberg and Hillier, 1979). This universality hypothesis has been extensively tested in subsequent empirical research (e.g. Werneke et al., 2000). Although there has been substantial research on the generalizability of measures of psychological health from the Western contexts within which they were developed to non-Western settings, very little has been published on test transportability to developing countries in Africa. Despite this paucity of cross-cultural measurement validity and reliability evidence, measures such as the General Health Questionnaire (GHQ)–28 (Goldberg and Hillier, 1979) are widely used in applied practice and research in African settings to screen for and diagnose general psychological health.
As the experience of general psychological health may be culture specific (Matsumoto, 2000), it is important to establish how well Western psychological measures provide valid and reliable measures of psychological health in other cultural contexts. Also, at a broader theoretical level, questions remain about the appropriateness of Western typologies of psychological health in developing countries until these are confirmed with empirical research. For these reasons, the present study sought to determine the factor structure of the GHQ-28 in a largely unexplored population within the African developing world, that is, adults from the Black ethnic group in South Africa.
Reliability and measurement validity evidence of the GHQ-28
Even though it was developed as a self-administered screening instrument to diagnose early psychological distress in primary care settings (Werneke et al., 2000), the GHQ-28 has been widely used for other purposes, such as measuring change in symptomatology over time (Ormel et al., 1989). The GHQ-28 comprises 28 items that form four subscales, namely, Somatic Symptoms, Anxiety and Insomnia, Social Dysfunction, and Severe Depression (Goldberg and Hillier, 1979). Compared with longer versions of the GHQ, the GHQ-28 shows comparable indices of validity and discriminative power, with the added advantage of being shorter and easier to fill out (Lobo et al., 1986).
The GHQ was developed in the United Kingdom in English, but there have been many applications and translations into other languages (e.g. Lobo et al., 1986; Nagyova et al., 2000). Various studies have sought evidence of reliability and measurement validity of the GHQ, most of which confirmed the GHQ-28 as a reliable and valid measure of psychological distress within various cultural contexts (e.g. Boardman, 1987; Goldberg et al., 1997; Kalliath et al., 2004).
Most research on the factorial structure of the GHQ-28 in other national samples has confirmed the original four-factor solution (Werneke et al., 2000), with the Severe Depression and Social Dysfunction subscales showing the greatest stability, although significant overlap seems to exist between the Anxiety and Insomnia and Somatic Symptoms subscales (Werneke et al., 2000). Taken together, these results suggest a degree of cross-cultural transportability of the measure’s factor structure and, by implication, also the underlying theoretical framework.
However, out of over 50 validity studies published, nearly all were conducted in Europe and the United States. As culture exerts influence on the creation, maintenance, and definition of abnormal behaviors, concerns could arise about the cross-cultural reliability and validity of psychological distress diagnoses, and even about the diagnostic categories used (Matsumoto, 2000). Traditional tools of clinical psychological assessment are generally based on a standard definition of abnormal behavior and may fail to capture culturally specific expressions of disorders (Marsella, 1979). For example, standard diagnostic instruments to measure depressive disorder may miss important cultural expressions of depression in Africans, among others (Beiser, 1985), which brings into question not only the universality of the dimensions of psychological health but also their behavioral manifestation in different cultures.
Cross-cultural consistency of dimensions of general psychological health
General psychological health could be experienced in different ways across national, cultural, and language contexts. For example, the presence of depression is characterized by physical, emotional, and behavioral changes (Berry et al., 1992) involving symptoms of “intense sadness, feelings of futility and worthlessness, and withdrawal from others” (Sue et al., 1990: 325). However, there is inconclusive evidence for the notion that expressions of symptomatology of depression are cross-culturally constant, as the literature points to both universal and culturally specific ways in which depression may occur and be experienced (Matsumoto, 2000). Cultures appear to locate feeling states not only in the mind but also in different parts of the body, which may explain why somatic complaints in the manifestation of depression are emphasized in some cultures (Matsumoto, 2000). For example, Marsella (1979) has argued that depression mainly takes an affective form in individualistic cultures, with typical symptoms of feelings of loneliness and isolation dominating the clinical picture. In more collectivistic (communal) cultures, which would include most African cultures (Hofstede, 2001), somatic symptoms such as headaches are dominant. Similar findings regarding somatization, or bodily complaints as an expression of psychological distress, have been reported elsewhere. For instance, Hispanic, Japanese, and Chinese people are said to somatize more than Europeans or Americans, although more recent cross-cultural studies challenge this view (Matsumoto, 2000). As the manifestation of psychological distress is likely to be culture specific, it could be expected that a measurement model of psychological distress would vary considerably across cultures.
Ongoing research into the cross-cultural generalizability of the GHQ-28 to the African context is relevant for various reasons. First, such research would provide further support for the use of the measure in clinical practice and applied research in Africa. Second, at a practical level, insight into factor structures in African samples might help to design future measures for local use. Third, to test Goldberg’s hypothesis of a common language of psychological distress between cultures, research in African samples could determine the appropriateness of the four-factor theoretical framework used to classify psychological health within the African context.
Approach of the present study
The present study investigated the psychometric properties and factor structure of the GHQ-28 within a nonclinical African sample of adults employed in the South African military. Three alternative factor structures (models M1, M2, and M3) of the GHQ-28 were examined with confirmatory factor analysis (CFA) via LISREL 8.80 (Jöreskog and Sörbom, 2004). The models comprised a four-factor model (M1) (Goldberg and Hillier, 1979), a five-factor model (M2) suggested by earlier exploratory factor analysis on a larger sample from which the current data were drawn (Dhladhla and De Kock, 2008), and an empirically derived three-factor model (M3) (i.e. based on the high intercorrelations observed between the Somatic Symptoms and Anxiety and Insomnia subscales in the current data). We utilized a split-sample approach and used the results of the single-group CFAs to determine the best-fitting baseline model for conducting cross-validation.
Cross-validation measurement invariance (MI) tests were conducted with multigroup CFA (MGCFA) to provide a second confirmation of the model with the best generalization potential (Hair et al., 2010). The following series of tests was conducted with MGCFA: loose cross-validation (single-group CFAs on both samples), factor structure equivalence (i.e. factor form/configural invariance; Vandenberg and Lance, 2000), metric invariance (i.e. factor loading equivalence, Λg = Λg′), scalar invariance (τg = τg′), error variance invariance, and complete invariance.
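The nested invariance hypotheses can be written compactly as follows. This is only a sketch in conventional LISREL-style notation: the loading and intercept symbols (Λ, τ) follow the text, whereas the symbols for the error variance matrix (Θδ) and the latent variable variance–covariance matrix (Φ), as well as the group superscripts g and g′, are notational assumptions made here for illustration. Each step is assumed to retain the constraints of the preceding steps.

```latex
\begin{align*}
\text{Measurement model:} \quad & x^{(g)} = \tau^{(g)} + \Lambda^{(g)} \xi^{(g)} + \delta^{(g)} \\
\text{Metric invariance:} \quad & \Lambda^{(g)} = \Lambda^{(g')} \\
\text{Scalar invariance:} \quad & \tau^{(g)} = \tau^{(g')} \\
\text{Error variance invariance:} \quad & \Theta_{\delta}^{(g)} = \Theta_{\delta}^{(g')} \\
\text{Complete invariance:} \quad & \Phi^{(g)} = \Phi^{(g')}
\end{align*}
```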
Methods
Participants and procedure
Participants were 523 uniformed military employees of the South African National Defence Force, all from the broad ethnic African population group. Of these, 437 (83.6%) were male. Most of this nonprobability sample, 317 (60.6%), was in the rank group Private to Corporal, followed by junior officers (Candidate Officer to Lieutenant), totaling 146 (27.9%). About 61.8 percent of the participants had a senior secondary school qualification, and the remainder had either postschool qualifications (25.4%) or less than a secondary school qualification (12.8%). More than half (57%) of the participants were single, and 40.2 percent were married. The sample showed reasonable dispersion across age groups, although the largest group (42.1%) was aged 18–25 years.
Study participation was voluntary. Data were gathered in a single-administration session supervised by a registered intern psychologist. After the aim of the research was explained to the participants, informed consent was obtained. All study participants were proficient in English, the official workplace medium, despite it being most subjects’ second language. During the administration session, no participants indicated difficulties in understanding any words or expressions in the test items.
Following the administration of the measure, the original sample (N = 523) was randomly split into two comparable samples—calibration (n = 262) and cross-validation (n = 261).
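A random split of this kind can be sketched in a few lines of Python. The example below is illustrative only: the case identifiers, the fixed seed, and the function name `split_sample` are assumptions, and it does not reproduce the authors' actual procedure or software.

```python
import random

def split_sample(case_ids, seed=2024):
    """Randomly split case IDs into a calibration and a cross-validation half."""
    ids = list(case_ids)
    rng = random.Random(seed)   # fixed seed is an assumption, used only for reproducibility
    rng.shuffle(ids)
    half = (len(ids) + 1) // 2  # with an odd N, the larger half goes to calibration
    return ids[:half], ids[half:]

# Hypothetical case IDs 1..523 standing in for the participants.
calibration, cross_validation = split_sample(range(1, 524))
print(len(calibration), len(cross_validation))  # 262 261
```

Because every case is shuffled before the cut, the two halves are disjoint and differ only by chance, which is what makes them "comparable" for cross-validation purposes.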
Measures
General psychological health
We used the English version of Goldberg and Hillier’s (1979) GHQ-28, which assesses the four factors of general psychological health (Somatic Symptoms, Anxiety and Insomnia, Social Dysfunction, and Severe Depression) with 28 items. The GHQ requests participants to indicate how their health in general has been over the past few weeks, using behavioral items with a 4-point scale indicating frequency of experience (ranging from 1 = not at all to 4 = much more than usual).
Data analytic procedure
A series of CFA models (single- and multigroup-item level analyses) were specified and estimated with LISREL 8.80 (Jöreskog and Sörbom, 2004). We compared a four-factor model (M1; Goldberg and Hillier, 1979), a five-factor model (M2) suggested by earlier exploratory factor analysis on a larger sample (Dhladhla and De Kock, 2008), and an empirically derived three-factor model (M3). Based on the normality assessment results, robust maximum likelihood (RML) was specified as the estimation technique (Tabachnick and Fidell, 2001), producing the Satorra–Bentler chi-square statistic (S-Bχ2). The goodness of fit (GOF) for each model was assessed by reviewing various fit indices and their respective confidence intervals (Bentler, 1990; Cheung and Rensvold, 2002; Vandenberg and Lance, 2000).
Results
Missing values and normality
Missing values were imputed with PRELIS 2.80 (Jöreskog and Sörbom, 2002) via the multiple imputation procedure. Sample sizes were slightly reduced (calibration sample: n = 260; cross-validation sample: n = 258). In both samples, the null hypotheses of univariate and multivariate normality were rejected (skewness and kurtosis in calibration sample: χ2 = 8520.29, p < .001; cross-validation sample: χ2 = 9341.42, p < .001), and hence, RML estimation was employed to derive model parameter estimates.
Descriptive statistics and correlation between study variables
Table 1 presents the means, standard deviations, and the latent variable intercorrelations between the GHQ-28 dimension scores. As expected, positive correlations (uncorrected for measurement error) between the dimensions of psychological distress were observed, ranging from .22 (Personal Dysfunction and Social Dysfunction) to .77 (Somatic Symptoms and Anxiety and Insomnia). The internal consistency reliability estimates (Cronbach’s alpha) ranged from .70 to .83.
Table 1. Descriptive statistics and intercorrelations (corrected and uncorrected for unreliability) between GHQ-28 subscales for M1 and M2 (calibration sample).
GHQ: General Health Questionnaire; SD: standard deviation; M2: five-factor model; M3: three-factor model.
All significant at p < .05; M2 below the diagonal; M1 above the diagonal.
Latent variable correlation is 1.0 due to disattenuation for measurement error; correlations in parentheses are uncorrected for unreliability.
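For readers unfamiliar with the two reliability-related quantities mentioned above, the following sketch shows how Cronbach's alpha and the classical correction for attenuation are computed. It is an illustration under stated assumptions, not the authors' procedure: the study obtained its corrected correlations from the CFA latent variable model, whereas the classical formula below is a simpler stand-in, and the function names are hypothetical.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of scores per item (same respondents)."""
    k = len(item_scores)
    totals = [sum(resp) for resp in zip(*item_scores)]       # total score per respondent
    item_var_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / (rel_x * rel_y) ** 0.5

# Toy data: three identical items, so the scale is perfectly consistent.
toy_items = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
alpha = cronbach_alpha(toy_items)
corrected = disattenuate(0.5, 0.8, 0.8)
```

The correction always increases the observed correlation, and with high observed correlations and modest reliabilities it can reach or exceed 1.0, which is one way to read a corrected correlation of 1.0 between two subscales.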
Alternative models for the GHQ-28: CFA results
Table 2 reports the fit indices for the three models (M1, M2, and M3). Based on the root mean square error of approximation (RMSEA), Nonnormed Fit Index (NNFI), Comparative Fit Index (CFI), and standardized root mean square residual (SRMR) values, all three models exhibited very good fit. All of the cutoff criteria were comfortably met (i.e. RMSEA < .07, CFI/NNFI > .92, SRMR < .08). In addition, the test of close fit (in contrast to exact fit), which LISREL performs by testing H0: RMSEA ≤ .05 against Ha: RMSEA > .05, revealed close fit in all the models (p > .05). Intercorrelations between the subscales for M1 and M2 are reported in Table 1, above and below the diagonal, respectively. In M3, Somatic Symptoms correlated .50 and .88 with Social Dysfunction and Severe Depression, respectively, while Social Dysfunction and Severe Depression correlated .58 (all significant). M3 obtained an Expected Cross-Validation Index (ECVI) value of 2.22 (M1: ECVI = 2.24; M2: ECVI = 2.23). Based on the similar GOF results for all models and the lowest ECVI value, M3 (i.e. the three-factor model) was identified as the best-fitting baseline model.
Table 2. Fit indices for alternative factor structure models of the GHQ-28.
GHQ: General Health Questionnaire; M1: original four-factor structure; M2: five-factor structure; M3: three-factor structure; χ2: normal theory weighted least squares chi-square; S-Bχ2: Satorra–Bentler scaled chi-square; df: degrees of freedom; NNFI: Nonnormed Fit Index; CFI: Comparative Fit Index; SRMR: standardized root mean square residual; pclose fit: p value for close fit; RMSEA (CI): root mean square error of approximation with 90 percent confidence interval.
n = 260; *p < .05.
For the M3 model, all factor loadings but one (item Som 1, “Been feeling perfectly well and in good health?”) were statistically significant. This item consistently obtained the lowest (nonsignificant) loading in all the models. A similar pattern of factor loadings was obtained over the three models. The four lowest completely standardized factor loadings were item Som 1 (Somatic Symptoms, .13), item Som 2 (Somatic Symptoms, .37), item SDys 1 (Social Dysfunction, .26), and item SDys 2 (Social Dysfunction, .33). The remaining completely standardized factor loadings ranged from .49 to .73 (over all models).
Cross-validation results
The first step in the cross-validation was to fit M3 (i.e. the three-factor model, with Somatic Symptoms and Anxiety and Insomnia collapsed into one scale) to the cross-validation sample in a single-group CFA (Meade et al., 2008). The model fit results were comparable to, and slightly better than, the M3 results for the calibration sample. A normal theory weighted least squares chi-square of 735.83 (p < .05) and an S-Bχ2 of 384.97 (degrees of freedom (df) = 347; p > .05) were obtained. The RMSEA of .021 (90 percent confidence interval: .000; .032), CFI of .99, NNFI of .99, and SRMR of .069 all provide evidence of very good model fit. In addition, evidence for close fit also emerged (pclose fit = 1.00).
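As a rough plausibility check, the degrees of freedom and the RMSEA point estimate reported above can be recovered from the model size and the scaled chi-square. The sketch below is illustrative and rests on assumptions: it uses the common RMSEA formula with N − 1 in the denominator, counts 28 loadings, 28 error variances, and 3 factor correlations as free parameters, and uses hypothetical function names. LISREL's exact computation under robust estimation may differ in detail.

```python
import math

def cfa_df(n_items, n_free_params):
    """Degrees of freedom: unique variances/covariances minus free parameters."""
    return n_items * (n_items + 1) // 2 - n_free_params

def rmsea(chi_square, df, n):
    """RMSEA point estimate (common formula, N - 1 in the denominator; an assumption)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

# M3: 28 loadings + 28 error variances + 3 factor correlations
# (factor variances fixed to 1 for identification; assumed parameterization).
free_params = 28 + 28 + 3
df_m3 = cfa_df(28, free_params)
rmsea_m3 = round(rmsea(384.97, df_m3, 258), 3)
print(df_m3, rmsea_m3)  # 347 0.021
```

Both values agree with the figures reported for the cross-validation sample (df = 347; RMSEA = .021).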
The results for the completely standardized factor loadings showed a similar trend to the calibration sample results. All loadings were significant, except for item Som 1 (in the Somatic Symptoms subscale). With the exception of the four lowest loadings, the other loadings ranged from .43 to .68. The Anxiety and Insomnia subscale items loaded satisfactorily (ranging from .43 to .64) when specified to load on the Somatic Symptoms subscale.
Based on these results, we continued with the second step in the cross-validation procedure. To conduct the MGCFA cross-validation, the item with the strongest loading on every scale (reference variable) was set to 1.0, specifying the scale of each latent variable—item 6 of the Somatic Symptoms subscale, item 6 of the Social Dysfunction subscale, and item 2 of the Severe Depression subscale.
The results of the cross-validation are presented in Table 3. Excellent configural invariance was achieved, with all the cutoff criteria comfortably met (RMSEA ≤ .07; CFI/NNFI > .92), suggesting that respondents from the two samples used the same conceptual framework in responding to items in the GHQ. Strong statistical (ΔS-Bχ2(25) = 12.08, p > .05) and practical (ΔCFI = .001; ΔMc = .01; ΔΓ1 = .001) evidence was obtained to support metric invariance (constraining the factor loadings of like items to be equal across groups) across the samples, meaning that the regressions of the indicator variables (items) on the latent traits have parallel slopes across groups. We also found strong statistical and practical evidence for scalar invariance (i.e. equivalence in the vector of regression intercepts; ΔS-Bχ2(53) = 34.18, p > .05; ΔCFI = .001; ΔMc = .01; ΔΓ1 = .001) and error variance–covariance invariance (ΔS-Bχ2(81) = 60.26, p > .05; ΔCFI = .002; ΔMc = .02; ΔΓ1 = .001). Last, complete invariance (i.e. no statistical or practical difference between the fully constrained and the fully unconstrained model) was evident (ΔS-Bχ2(87) = 69.41, p > .05; ΔCFI = .002; ΔMc = .02; ΔΓ1 = .001). This implies that the revised three-factor measurement model of the GHQ-28 successfully cross-validated from the calibration to the cross-validation sample.
Table 3. Results of the cross-validation invariance tests (i.e. configural, metric, scalar, error variance, and complete/full invariance analyses).
M0: fully unconstrained model / configural invariance model; M1: invariance of factor loadings; M2: invariance of factor intercepts; M3: invariance of error variance-covariance matrix; M4: invariance of latent variable variance-covariance matrix; df: degrees of freedom; χ2: normal theory weighted least squares chi-square; S-Bχ2: Satorra–Bentler scaled chi-square; NNFI: Nonnormed Fit Index; CFI: Comparative Fit Index; pclose fit: p value for close fit; RMSEA (CI): root mean square error of approximation with 90 percent confidence interval; 1: p > .05 (p = .97); 2: p > .05 (p = .97); 3: p > .05 (p = .95); 4: p > .05 (p = .91).
p < .05.
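The difference-test degrees of freedom reported for the invariance steps (25, 53, 81, and 87) follow from counting the equality constraints added at each step for a 28-item, three-factor model. The sketch below reproduces that arithmetic and adds a simple check of the practical-significance indices. The cutoff values (ΔCFI ≤ .01, ΔMc ≤ .02, ΔΓ1 ≤ .001) are assumed here as the thresholds commonly associated with Cheung and Rensvold (2002); they are not stated in the text, and the variable and function names are hypothetical.

```python
N_ITEMS, N_FACTORS = 28, 3

# Equality constraints added at each nested step, relative to the configural model.
loadings = N_ITEMS - N_FACTORS                 # reference loadings are already fixed to 1.0
intercepts = N_ITEMS
error_variances = N_ITEMS
factor_cov = N_FACTORS * (N_FACTORS + 1) // 2  # 3 variances + 3 covariances

cumulative_df = {
    "metric": loadings,
    "scalar": loadings + intercepts,
    "error variance": loadings + intercepts + error_variances,
    "complete": loadings + intercepts + error_variances + factor_cov,
}
print(cumulative_df)

# Assumed practical-significance cutoffs (not given in the text).
CUTOFFS = {"cfi": 0.01, "mc": 0.02, "gamma1": 0.001}

def practically_invariant(delta_cfi, delta_mc, delta_gamma1):
    """True when all three index changes stay within the assumed cutoffs."""
    return (abs(delta_cfi) <= CUTOFFS["cfi"]
            and abs(delta_mc) <= CUTOFFS["mc"]
            and abs(delta_gamma1) <= CUTOFFS["gamma1"])

# Largest changes reported in the text (error variance and complete invariance steps).
print(practically_invariant(0.002, 0.02, 0.001))
```

Under these assumed cutoffs, even the most constrained models stay within the practical-invariance bounds, although the ΔMc and ΔΓ1 values sit exactly at the thresholds.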
Discussion
In earlier research, very good transferability of the four-factor structure of the GHQ-28 across cultures (Werneke et al., 2000) has been reported. The results of the current study support the universality of this factor structure, by finding good fit of the original four-factor solution in a Black South African sample.
However, any given model—even with good fit—is only one potential explanation, and other model specifications can work equally well (Hair et al., 2010). Therefore, we tested alternative factor solutions by fitting competing models (i.e. those with three and five factors) to the data. Our results suggest that a three-factor structure provided a slightly better representation of the broad psychological health construct in our data. The high intercorrelation between the Somatic Symptoms and Anxiety and Insomnia subscales suggested that these dimensions should be merged into a broader dimension of psychological distress. This inability of Black South African participants to differentiate between the two subscales concurs with the results of Werneke et al. (2000), who reported similar findings for samples drawn from Bangalore (India) and Ibadan (Nigeria). In their study, they concluded that “items of the original Somatic Symptoms, Anxiety and Insomnia scales loaded nearly equally on two factors” (p. 826). The underlying theoretical structure of general psychological health among Africans could, therefore, consist of fewer, but broader, dimensions of general psychological health than those reported in the literature from respondents in Western settings.
Our results also showed that certain items had unsatisfactory factor loadings, especially on the Somatic Symptoms and Personal Dysfunction scales. Cultural relativism holds that psychopathology and culture are intertwined because culture plays a role in determining the exact behavioral manifestations of abnormal behavior (Matsumoto, 2000). By implication, it would not make sense to expect behavioral indicators of latent constructs such as psychological distress to load similarly on these variables when applied in different cultural contexts. Using societal norms as a criterion for assessing symptoms of abnormality is problematic because norms change over time, and they are culturally subjective. What members of a particular society or culture consider deviant may be accepted as normal by others (Matsumoto, 2000: 253). When transferring measures across cultures, it is recommended that intensive content validation of the measure’s items should precede data collection in order to capture possible culturally unique expressions of the latent variable constructs.
Another explanation for the possible item bias in the poorly performing items may reside in the language of items when transferred across cultures. Cultural nuances may be encoded in language (Matsumoto, 2000), and therefore, it is possible that—even though the standard English GHQ-28 version was used in our study—the contextualized meaning of these items in the local culture’s adoption of English may differ.
Despite these provisos, our findings indicate that the four-factor model of psychological distress is fairly robust when applied within a Black South African sample. In this way, our results agree with earlier findings of cross-cultural transferability of the GHQ-28 to non-Western countries (Werneke et al., 2000) but add to the literature by extending this evidence to a developing African country. More specifically, revisiting the exact structure of two GHQ-28 subscales (Somatic Symptoms and Personal Dysfunction) before further application in South African samples—and perhaps other African samples—is warranted.
Some limitations of this study need to be addressed. Our data were sourced from a single sample that was more homogeneous (ethnically, but also in terms of gender) than the broader ethnically diverse African population, implying a limited generalizability of our findings. We attempted to address this limitation by conducting cross-validation. Although the model was successfully cross-validated, it must be acknowledged that the split-sample approach to cross-validation does not sufficiently overcome sampling error (Raju et al., 1999). Future cross-validation using independent samples should yield more robust estimates of MI across samples in the broader African population. Moreover, the invariance of the instrument over gender groups should also be addressed in future studies.
An interesting avenue for future research would be exploring the possible culturally specific manifestations of psychological distress in various African samples. Prior research (Beiser, 1985) has suggested that depression, for instance, can be experienced by Africans in behaviors not captured in the standard GHQ-28, which suggests that an adapted version of the GHQ-28 could offer an improved way to assess psychological distress in Africa. Studies are needed that examine how the existing measure of psychological distress taps the latent construct in African samples and also how—at a theoretical level—the nature of the construct of psychological distress presents in an African setting. There is ample evidence that forms of abnormal behavior are sometimes observed in only certain sociocultural milieus—the so-called culture-bound syndromes (Matsumoto, 2000). Considering the vast differences in psychosocial contexts, nature of stressors, and supportive mechanisms and resources between developed and developing countries, a deeper investigation of the subfacets of psychological distress in Africa is warranted.
Conclusion
This study assessed the factor structure of psychological health (as measured by the GHQ-28) in a Black South African sample. Our results showed that, despite the good fit of the commonly observed four-factor solution, a three-factor structure fit slightly better in the present Black South African sample. It appears that a less differentiated experience of psychological health may be evident in certain African populations than that reported in the earlier cited literature. These results have interesting implications for the general international consensus about the theoretical structure of psychological health. We encourage further research on the factor structure of psychological measures in Africa and other developing countries.
Footnotes
Funding
The International Test Commission (ITC) provided a travel grant for one of the authors to present the findings of an earlier version of the study.
