Abstract
Brief measures that are comparable across disparate groups are particularly likely to be useful in primary care settings. Prior research has supported a six-item short form of the Whiteley Index (WI), a commonly used measure of health anxiety, among English-speaking respondents. This study examined the measurement invariance of the WI-6 among Black (n = 183), Latino (n = 173), and White (n = 177) respondents seeking treatment at a U.S. community health center. Results supported a bifactor model of the WI-6 among the composite sample (N = 533), suggesting the presence of a general factor and two domain-specific factors. Results supported the incremental validity of one of the domain-specific factors in accounting for unique variance in somatic symptom severity scores beyond the general factor. Multiple-groups confirmatory factor analysis supported the configural, metric, and scalar invariance of the bifactor WI-6 model across the three groups of respondents. Results provide support for the measurement invariance of the WI-6 among Black, Latino, and White respondents. The potential use of the WI-6 in primary care and broader settings is discussed.
Almost everyone experiences health anxiety—defined by cognitive–behavioral researchers as the wide range of worry individuals can have about their health (Asmundson & Taylor, 2005)—at times in their lives (Taylor & Asmundson, 2004). Health anxiety is dimensional, ranging from mild concern about one’s health to excessive preoccupation with one’s health (Ferguson, 2009; Longley et al., 2010), and severe health anxiety is related to poor patient outcomes within primary care. For example, Fink, Ørnbøl, and Christensen (2010) found that individuals with severe health anxiety seeking treatment in primary care had heightened physical dysfunction, greater mental health complaints, and used between 41% and 78% more health care services per year than individuals with a well-defined medical condition without elevated health anxiety. The poorer functioning of individuals with severe health anxiety was unaccounted for by depression, well-defined medical conditions, or other forms of anxiety. Such findings support recommendations for screening patients presenting for treatment in primary care for health anxiety (Fink et al., 1999).
To better understand the impact of health anxiety within primary care, and elsewhere, it is of course necessary to develop valid assessment tools. Practicality is an important quality of assessment tools within primary care, with Kroenke, Spitzer, and Williams (2003) opining that “the busy nature and competing demands of primary care practice make efficiency a particularly important attribute of any new measure” (p. 1285). To date, the Whiteley Index (WI; Pilowsky, 1967) and the Illness Anxiety Scale (Kellner, 1986) are the most commonly used measures of health anxiety (Sirri, 2014). The Illness Anxiety Scale consists of 29 items (27 main and 2 supplemental items), and thus, its length is prohibitive for routine use within primary care. Alternatively, among various proposed factor structures, studies suggest that a six-item short form of the WI is optimal among English-speaking respondents (Asmundson, Carleton, Bovell, & Taylor, 2008; Welch, Carleton, & Asmundson, 2009). Because of its brevity, the WI seems particularly well-suited for use within primary care and has been used to assess health anxiety in such settings in prior investigations (Fink et al., 1999; Speckens, Spinhoven, Sloekers, Bolk, & Van Hemert, 1996).
Factorial Validity
As has been reviewed elsewhere, the adequacy of the WI’s factorial validity has been a long-standing source of debate (Welch et al., 2009). Pilowsky (1967) proposed a three-factor solution of the original 14-item dichotomously-scored (true/false) WI. Subsequent studies found divergent factor structures (ranging from one to three factors) that contained differing numbers of items (ranging from 6 items to all 14 items) of the WI (Asmundson et al., 2008; Fink et al., 1999; Hiller, Rief, & Fichter, 2002; Hinz, Reif, & Brähler, 2003; Rief, Hiller, Geissner, & Fichter, 1994; Schwarz, Witthöft, & Bailer, 2007; Speckens et al., 1996). Welch et al. (2009) proposed that discrepant WI factor structures could be the result of its dichotomous response option. Heeding Barsky et al.’s (1992) call to use a version of the WI that has a 5-point item response option, Welch et al. examined competing WI factor structures when its items were rated using a 5-point scale. Welch et al. found that Asmundson et al.’s (2008) six-item WI provided the best fit to the data among competing solutions in an English-speaking sample. A more recent study examining competing WI factor structures using a Norwegian translated version of the measure found that a different version of the WI provided a better fit to their data than did Asmundson et al.’s version (Veddegjærde, Sivertsen, Wilhelmsen, & Skogen, 2014). Such findings indicate that the WI items may function differently when translated into languages other than English. Pursuant to this study, Asmundson et al.’s version has been the most consistently supported version of the WI for use with English-speaking respondents (Asmundson et al., 2008; Welch et al., 2009). Because we were specifically interested in examining the WI among English-speaking respondents, we used Asmundson et al.’s version of the WI in the current study. For ease of interpretation, Asmundson et al.’s version of the WI is referred to as the WI-6.
The WI-6 consists of two, three-item scales that assess health worry (e.g., Do you worry a lot about your health?) and bodily/somatic preoccupation (e.g., Do you find that you are bothered by many different symptoms?), respectively. Studies using the WI-6 have found that (a) scores on the total scale and subscales show good internal consistency (Cronbach’s αs ranging from .84 to .91), (b) the total scale score correlates strongly with scores on other health anxiety scales (rs of .63 and .80), and (c) the total scale score saliently loads on a higher-order factor with scores of other health anxiety scales (Bardeen & Fergus, 2014; Fergus, 2013, 2014). A notable limitation of prior evaluations of the WI-6 is that the adequacy of its two-factor solution using a 5-point item response option has yet to be replicated since Welch et al. (2009). Other notable limitations are that Welch et al. used the full-length WI when supporting the use of the WI-6 and the factor structure of the WI-6 has only been examined among college students (Asmundson et al., 2008; Welch et al., 2009). The factor structure of the WI-6 has thus not yet been examined when administered as a standalone measure or when completed by individuals presenting for treatment within primary care. Examining short-form item pools independently of their full-length counterparts is an essential aspect of validating short-form measures (Smith, McCarthy, & Anderson, 2000). Additionally, the high personal and economic costs associated with health anxiety in primary care settings (Fink et al., 2010) highlight the importance of examining the factor structure of a practical measure of health anxiety, such as the WI-6, among respondents presenting for treatment in primary care.
Whereas preliminary research suggests the WI-6 scales have distinct correlates (Bardeen & Fergus, 2014), the two WI-6 latent factors share a large correlation (r = .70; Asmundson et al., 2008). Such findings highlight the possibility that a bifactor measurement model of the WI-6 should be considered. Researchers suggest that bifactor models are most appropriate for constructs or measures that are posited to be principally unidimensional, but there also exist smaller latent dimensions (subdomains) that have substantive value and must be specified to achieve a good-fitting solution (Brown, 2015, p. 301).
In bifactor models, there exists a general factor (e.g., health anxiety) that accounts for the covariation among all items of a measure. Yet there also exists more than one domain-specific factor (e.g., health worry, bodily/somatic preoccupation) that accounts for unique variance in the covariation among the indicators of the respective domain-specific factor beyond the general factor (Brown, 2015; Reise, 2012; Reise, Bonifay, & Haviland, 2013; Reise, Moore, & Haviland, 2010). An accumulating body of research has found that bifactor models tend to provide a good fit to the data when examining measures of anxiety-related constructs (e.g., Ebesutani, McLeish, Luberto, Young, & Maack, 2014; Norr, Allan, Boffa, Raines, & Schmidt, 2015; Olatunji, Ebesutani, & Abramowitz, 2015). In the context of the WI-6, a potential advantage to considering a bifactor model is the ability to examine whether the highly correlated domain-specific factors matter when accounting for the general factor. For example, researchers can consider whether the domain-specific factors relate to a criterion variable while holding the general factor constant (Brown, 2015).
Measurement Invariance
Another remaining question surrounding the WI-6, and health anxiety measures more broadly, relates to whether its items function similarly in diverse respondent groups. More than a decade ago, Sue (1999) argued that our discipline has “not followed good scientific principles in assuming that findings from research on one population can be generalized to other populations” (p. 1073) and that it is simply good practice to study any groups to which we intend to apply our results. Prior research has examined the applicability of symptom measures (e.g., depression) among diverse groups of respondents in primary care (e.g., Huang, Chung, Kroenke, Delucchi, & Spitzer, 2006). Such studies align with McDaniel et al.’s (2014) call for the use of culturally sensitive measures within primary care settings.
Despite such calls, and interest in screening for health anxiety in primary care settings (Fink et al., 1999), no known published study has yet examined the invariance of a measure of health anxiety among racial/ethnic minorities. The lack of published data surrounding the invariance of the WI-6 items among racial/ethnic minorities represents a gap in the literature, both in terms of the potential use of the WI-6 in primary care settings and the broader use of the measure. The WI-6 was developed and validated using samples of predominantly (>85%) self-identifying White respondents (Asmundson et al., 2008; Welch et al., 2009). The ability of a measure to be comparable across disparate groups is imperative for its use as a screening instrument in primary care (Huang et al., 2006) and, as noted, researchers should not assume that research findings from one population generalize to another (Sue, 1999).
Potential racial/ethnic differences in health anxiety remain unexamined in the literature; however, there is reason to believe that the WI-6 may function differently among racial/ethnic minorities. For example, among self-identifying Latino respondents, Hirai, Stanley, and Novy (2006) found that a measure containing health-related worry items loaded on the same factor as measures that had items assessing somatic preoccupation. Hirai et al.’s findings raise the possibility that the separate health worry and bodily/somatic preoccupation factors found by Asmundson et al. (2008) may be better represented as a single factor among Latino respondents. Another potential limitation of the WI-6 is that its items may not be as representative of health anxiety among Black respondents. For example, Hunter and Schmidt (2010) proposed that there is generally an underreporting of cognitive symptoms of anxiety (e.g., worry) and increased prominence of catastrophic interpretations of specific somatic symptoms among individuals who self-identify as Black. As discussed, the WI-6 items pertain to health worry and bodily/somatic preoccupation, respectively. Following from Hunter and Schmidt, the WI-6 items may thus function differently among Black respondents relative to, at least, White respondents.
The Current Study
Using treatment-seeking respondents from a primary care clinic in the United States, we sought to provide the first known examination of the factor structure of the WI-6 when it was independently administered from the full-length WI. As noted, such an examination is an essential aspect of validating short-form measures (Smith et al., 2000). Using a composite sample of respondents self-identifying as Black, Latino, and White, we predicted that a bifactor model of the WI-6 would provide the best fit to the data among competing models. This prediction was based on the prior noted research suggesting the good fit of bifactor models of measures of anxiety-related constructs relative to correlated trait models (e.g., Ebesutani et al., 2014; Norr et al., 2015; Olatunji et al., 2015). To further examine the bifactor model, exploratory analyses were used to test if the domain-specific factors evidenced incremental validity in accounting for unique variance in scores on a criterion measure (in this case, a measure of somatic symptom severity) after accounting for the general factor. Prior studies examining health anxiety measures have used measures of somatic symptom severity as a criterion variable (Fergus & Valentiner, 2011; Longley, Watson, & Noyes, 2005).
We expected that using the composite sample would mask any potential structural differences in the WI-6 seen in group analyses. We thus sought to next provide the first known investigation of potential racial/ethnic differences of the WI-6 through examining its measurement invariance among self-identifying Black, Latino, and White respondents. The examination of the WI-6 among these specific groups of respondents was informed by groups of primary care respondents examined in prior studies examining the measurement invariance of symptom measures (e.g., Huang et al., 2006) as well as the practical consideration of the demographic composition of patients presenting to the primary care clinic used in this study.
Hirai et al.’s (2006) findings raise the possibility that a one-factor model of the WI-6 items may provide the best fit to the data among Latino respondents. Hunter and Schmidt’s (2010) proposals about the relative infrequency of cognitive anxiety and relative prominence of somatic symptoms among Black respondents raise the possibility that the WI-6 items could be biased indicators of health anxiety among this group of respondents. However, given the lack of prior data speaking to racial/ethnic differences in health anxiety or the measurement invariance of the WI more specifically, the examination of these potential racial/ethnic differences was considered largely exploratory. Based on prior studies examining the measurement invariance of bifactor measurement models (e.g., Ebesutani et al., 2014; Olatunji et al., 2015), we were specifically interested in the configural, metric, and scalar invariance of the WI-6.
Method
Participants
The initial sample consisted of 538 adults presenting for treatment at a community health center in a moderately sized southern United States city. Among the sample, 183 (34.0%) self-identified as Black, 177 (32.9%) as non-Hispanic White, and 173 (32.2%) as Latino. Among the remaining participants, three (0.5%) self-identified as Asian and two (0.4%) as “other.” The demographic composition of the sample was consistent with the demographic composition of patients typically presenting for treatment at the health center. About half of the sample was insured through a state/federal insurance program (50.4%), 23.3% were uninsured, 12.8% were insured through a private insurance program, and 13.5% were insured through another insurance program.
There was heterogeneity in the primary presenting problem of participants, which was determined by physician diagnoses using the clinical modification of the ninth revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-9-CM; Medicode, 1996). For descriptive purposes, we report the percentage of participants having a primary presenting problem within classes of presenting problems as defined in the ICD-9-CM. The percentage of participants having a primary presenting problem within a specific class was the following: 15.4% endocrine, nutritional, and metabolic disease; 15.2% mental disorder; 12.3% disease of circulatory system; 10.9% symptom, signs, and ill-defined condition; 9.9% external cause of injury and supplemental classification; 6.5% disease of respiratory system; 5.0% disease of genitourinary system; 3.5% infectious and parasitic disease; 3.5% disease of the musculoskeletal system and connective tissue; 3.2% disease of the skin and subcutaneous tissue; 3.0% neoplasm; 2.6% disease of the nervous system and sense organs; 2.4% disease of digestive system; 2.4% complication of pregnancy, childbirth, and the puerperium; 1.9% disease of blood and blood forming organs; 1.9% injury and poisoning; and 0.4% congenital anomalies.
The self-identifying Black, Latino, and White participants (N = 533) had a mean age of 45.5 years (SD = 17.4) and consisted of 76.2% females. Latino (M = 39.1, SD = 16.8) participants were significantly younger than Black (M = 47.1, SD = 16.7) or White (M = 49.9, SD = 17.0) participants (F[2, 530] = 19.6, p < .001). There were no gender differences among the three groups of participants (χ2[2] = 2.7, p = .26). In addition to the WI-6 (Asmundson et al., 2008; Pilowsky, 1967), participants completed the following measure.
Measure
Patient Health Questionnaire–15 (PHQ-15; Kroenke, Spitzer, & Williams, 2002)
The PHQ-15 is a 15-item self-report measure that assesses the severity of somatic symptoms. Respondents are presented with a somatic symptom (e.g., stomach pain, chest pain, dizziness) and are asked to indicate the severity of that symptom over the past month using a 3-point scale (ranging from 0 to 2). The PHQ-15 was chosen because it is considered a preferred measure to assess for the severity of somatic symptoms (Kroenke, 2007). Although all 15 items were administered, there is one item (i.e., menstrual cramps or other problems with your periods) that is completed by only female respondents. Given that our data analytic strategy involved using the PHQ-15 items to develop a single latent construct of somatic symptom severity applicable for all participants (see below), this item was excluded from analysis.
Procedure
The present research was approved by an institutional review board serving the local medical community. Prospective participants were approached by a trained research assistant in waiting rooms, where the study purpose was briefly described. Participants seeking treatment at the community health center were recruited consecutively into the study. Interested prospective participants met with a research assistant individually. After obtaining written informed consent that stressed participation was voluntary and confidential, participants completed the WI-6, PHQ-15, and other measures not related to the present study aims individually in the waiting room. A more private location was made available to participants. All participants stated that English was their preferred language for communication. Participants consented to have a research team member complete a medical record review, which was used to determine primary presenting problems. Participants were entered into a raffle to have a chance (approximately 10%) of winning a $20 gift card.
Data Analytic Strategy
Because the adequacy of Asmundson et al.’s (2008) two-factor solution of the WI-6 had yet to be replicated when its items were administered independently of the full-length WI, we initially examined the measurement models using the composite sample. We examined three measurement models. The first model was a one-factor model that consisted of each of the WI-6 items loading on the factor. The second model was a correlated two-factor (trait) model that consisted of three items (1, 2, 6) with primary loadings on Factor I and three items (3, 4, 5) with primary loadings on Factor II. No secondary loadings were modeled, but the factors were allowed to intercorrelate. The third model was a bifactor model that consisted of all six items loading on a general factor (i.e., health anxiety). Additionally, three items (1, 2, 6) loaded onto the domain-specific Factor I and three items (3, 4, 5) loaded onto the domain-specific Factor II. The covariances of all factors were fixed to zero in the bifactor model (following Brown, 2015). Graphical depictions of one-factor, correlated trait, and bifactor models are presented elsewhere (e.g., Brown, 2015; Ebesutani et al., 2014; Reise et al., 2010). To further examine the bifactor model, we used omegaH to examine the proportion of reliable variance in WI-6 scores attributable to the general factor. We used omegaS to examine the proportion of reliable variance in WI-6 scores after controlling for the general factor. Both omegaH and omegaS were calculated following Reise et al.’s (2013) recommendations.
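To illustrate how omegaH and omegaS follow from a bifactor solution, the sketch below computes both coefficients from completely standardized loadings, assuming orthogonal factors and the standard model-based formulas for omega hierarchical and omega subscale. The loading values are hypothetical placeholders rather than the estimates reported in this study, and the function name omega_coefficients is illustrative only.

```python
from typing import Dict, List, Sequence, Tuple


def omega_coefficients(
    general: Sequence[float],
    specific: Sequence[float],
    membership: Dict[str, List[int]],
) -> Tuple[float, Dict[str, float]]:
    """Return (omegaH, {factor: omegaS}) for an orthogonal bifactor model.

    general[i] and specific[i] are the completely standardized loadings of
    item i on the general factor and on its single domain-specific factor.
    membership maps each domain-specific factor to its item indices.
    """
    # With standardized items and orthogonal factors, the residual variance
    # of an item is 1 minus its squared loadings.
    unique = [1.0 - g ** 2 - s ** 2 for g, s in zip(general, specific)]

    general_var = sum(general) ** 2
    specific_var = {
        name: sum(specific[i] for i in items) ** 2
        for name, items in membership.items()
    }
    total_var = general_var + sum(specific_var.values()) + sum(unique)

    # omegaH: share of total score variance due to the general factor.
    omega_h = general_var / total_var

    # omegaS: share of subscale score variance due to the domain-specific
    # factor once the general factor is accounted for.
    omega_s = {}
    for name, items in membership.items():
        sub_general_var = sum(general[i] for i in items) ** 2
        sub_unique = sum(unique[i] for i in items)
        omega_s[name] = specific_var[name] / (
            sub_general_var + specific_var[name] + sub_unique
        )
    return omega_h, omega_s


if __name__ == "__main__":
    # Hypothetical loadings (NOT the Table 2 estimates), in item order 1..6.
    general = [0.70, 0.70, 0.70, 0.70, 0.70, 0.70]
    specific = [0.10, 0.30, 0.40, 0.40, 0.40, 0.30]
    # Items 1, 2, 6 define Factor I; items 3, 4, 5 define Factor II.
    membership = {"Factor I": [0, 1, 5], "Factor II": [2, 3, 4]}
    omega_h, omega_s = omega_coefficients(general, specific, membership)
    print(f"omegaH = {omega_h:.2f}")
    for name, value in omega_s.items():
        print(f"omegaS ({name}) = {value:.2f}")
```

Substituting the completely standardized loadings from a fitted bifactor model for the placeholder values would yield the corresponding coefficients for that model.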
A structural regression model was used next to examine whether the domain-specific factors of the WI-6 related to a latent somatic symptom severity construct when holding the general factor constant. The structural regression model consisted of simultaneously modeling the bifactor model of the WI-6 described above and a latent construct of somatic symptom severity. The latent construct of somatic symptom severity consisted of the PHQ-15 items, with the exception of the one noted item that was relevant to only women, loading on a single factor. Pathway coefficients were freely estimated from the general factor to the somatic symptom severity latent construct and from the two domain-specific factors to the somatic symptom severity latent construct. The structural regression model was only tested using the composite sample because the separate groups would yield N:q ratios well below 10:1 (Kline, 2011) in the context of the structural regression model.
Tests of measurement invariance were examined through a series of measurement models, which was conducted using a multiple-groups confirmatory factor analysis framework. The aforementioned three measurement models in the composite sample were first tested separately among Black, Latino, and White participants to ensure the same measurement model was supported in each group. When considering the number of parameters estimated, the sample size in each group generally met the minimum ratio of cases to parameters (N:q) recommended by Kline (2011) of 10:1 for each measurement model. The N:q ratio for the bifactor model was slightly below Kline’s recommendation for White (9.8:1) and Latino (9.6:1) groups. Overall, though, the current sample sizes were deemed adequate, particularly given the lack of extant data speaking to the current study aims.
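The N:q ratios reported above can be reproduced with simple arithmetic. The sketch below assumes that the bifactor model estimates q = 18 free parameters (12 factor loadings plus 6 residual variances, with factor variances fixed for identification); this count is an inference that happens to match the reported ratios, not a figure stated in the text.

```python
# Group sizes as reported for the three respondent groups.
GROUP_SIZES = {"Black": 183, "Latino": 173, "White": 177}

# Assumption: 12 loadings + 6 residual variances in the bifactor model.
BIFACTOR_FREE_PARAMETERS = 18

# Minimum cases-per-parameter ratio recommended by Kline (2011).
MINIMUM_RATIO = 10.0


def cases_per_parameter(n_cases: int, n_parameters: int) -> float:
    """Return the N:q ratio for a model with n_parameters free parameters."""
    return n_cases / n_parameters


if __name__ == "__main__":
    for group, n in GROUP_SIZES.items():
        ratio = cases_per_parameter(n, BIFACTOR_FREE_PARAMETERS)
        status = "meets" if ratio >= MINIMUM_RATIO else "falls below"
        print(f"{group}: {ratio:.1f}:1 ({status} the 10:1 guideline)")
```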
Next, a series of increasingly restrictive models was examined. These models tested (a) configural invariance (equal form); (b) metric invariance (equal factor loadings); and (c) scalar invariance (equal indicator intercepts; Vandenberg & Lance, 2000). When testing (a), we simultaneously examined the adequacy of the WI-6 factor structure in the three groups; when testing (b), we constrained factor loadings to equality; when testing (c), we constrained indicator intercepts to equality, as well as fixed the latent factor mean among White respondents to zero and freely estimated the latent factor mean among Black and Latino respondents. These three tests comprised the tests of measurement invariance (Brown, 2015).
All models were tested using LISREL 8.80 (Jöreskog & Sörbom, 2007). Tests for multivariate skewness and kurtosis were significant for some WI-6 item scores, suggesting the possibility of multivariate nonnormality. Multivariate nonnormality can negatively affect results obtained when using maximum likelihood (ML) estimation. Robust ML estimation (Satorra & Bentler, 1994) was therefore used for all reported analyses, as this estimation procedure provides parameter estimates with standard errors that are robust to nonnormality (Brown, 2015). Five commonly recommended (Brown, 2015; Hu & Bentler, 1999; Kline, 2011) fit statistics were used to evaluate the models: comparative fit index (CFI), nonnormed fit index (NNFI), root mean square error of approximation (RMSEA), standardized root mean square residual (SRMR), and the expected cross-validation index (ECVI). Hu and Bentler’s (1999) guidelines were used to evaluate fit: CFI and NNFI should be close to .95, RMSEA should be close to .06, and SRMR should be close to .08. Furthermore, the upper limit of the 90% RMSEA confidence interval should not exceed .10 and lower ECVI values indicate better model fit (Kline, 2011).
In addition to the fit statistics, model comparisons were evaluated as follows. First, the Satorra–Bentler scaled difference chi-square test was used (i.e., SDCS; following Brown, 2015). A significant SDCS test between two comparable models indicates a significant decrement in model fit. However, because the SDCS test is affected by sample size, model testing might result in significant SDCS tests when differences in parameter estimates are trivial in magnitude (Brown, 2015; Kline, 2011). As such, and following the recommendations of Brown (2015) and Kline (2011), we also used alternative tests for comparing models. One alternative test included examining the change in CFI (ΔCFI). Meade, Johnson, and Braddy (2008) identified a ΔCFI value of less than or equal to .002, and Cheung and Rensvold (2002) identified a ΔCFI value of less than or equal to .01 as representing functionally trivial differences in parameter estimates among models. The other model comparison test was examining RMSEA 90% confidence intervals (CIs). Differences in model fit are considered nonsignificant if models have overlapping 90% RMSEA CIs (Wang & Russell, 2005).
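The three comparison criteria described above can be sketched as follows. The scaled difference function uses the commonly cited Satorra–Bentler formula based on scaling correction factors; the text does not report the computational details used, so this is an assumption, and all numeric inputs in the demonstration other than the ΔCFI values of .031 and .009 are hypothetical.

```python
from typing import Dict, Tuple


def scaled_difference_chi_square(
    ml_chisq_0: float, sb_chisq_0: float, df_0: int,
    ml_chisq_1: float, sb_chisq_1: float, df_1: int,
) -> Tuple[float, int]:
    """Scaled difference test for nested models.

    Model 0 is the more restricted model (larger df); model 1 is the
    comparison model. Returns (scaled difference statistic, difference df).
    """
    c0 = ml_chisq_0 / sb_chisq_0  # scaling correction factor, model 0
    c1 = ml_chisq_1 / sb_chisq_1  # scaling correction factor, model 1
    df_diff = df_0 - df_1
    # Correction factor for the difference test; it can be non-positive in
    # small samples, in which case the statistic is not interpretable.
    cd = (df_0 * c0 - df_1 * c1) / df_diff
    return (ml_chisq_0 - ml_chisq_1) / cd, df_diff


def delta_cfi_exceeds(delta_cfi: float) -> Dict[str, bool]:
    """Flag a CFI change as nontrivial under each cited cutoff."""
    change = abs(delta_cfi)
    return {
        "Meade et al. (2008), .002": change > 0.002,
        "Cheung & Rensvold (2002), .01": change > 0.01,
    }


def intervals_overlap(ci_a: Tuple[float, float], ci_b: Tuple[float, float]) -> bool:
    """True when two RMSEA 90% confidence intervals share any value."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


if __name__ == "__main__":
    # Hypothetical chi-square values for two nested models.
    stat, df = scaled_difference_chi_square(60.0, 50.0, 9, 33.0, 30.0, 8)
    print(f"SDCS = {stat:.2f} on {df} df")
    print(delta_cfi_exceeds(0.031))
    print(delta_cfi_exceeds(0.009))
    # Hypothetical RMSEA confidence intervals.
    print(intervals_overlap((0.05, 0.09), (0.08, 0.12)))
```

A ΔCFI between the two cutoffs is flagged by the stricter criterion only, which is why the verdicts are best considered alongside the other comparison criteria rather than in isolation.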
Results
WI-6 Measurement Models
Goodness-of-fit statistics from the WI-6 measurement model based on Asmundson et al. (2008) are presented in Table 1. This two-factor correlated (trait) model generally provided an adequate fit to the data in the composite sample. With the exception of the RMSEA being greater than .06 and the upper limit of the 90% RMSEA CI exceeding .10, each fit statistic met the specified guidelines. A one-factor model of the WI-6 generally provided a poor fit to the data. As further presented in Table 1, the fit statistics were worse for the one-factor model relative to the correlated two-factor model. Moreover, the SDCS indicated that the one-factor model provided a significantly poorer fit to the data than the correlated two-factor model. When comparing the correlated two-factor and one-factor models, the ΔCFI (.031) was larger than the specified range (i.e., .002-.01) and the RMSEA 90% CIs were not overlapping. As such, the correlated two-factor model was retained as the better fitting measurement model relative to the one-factor model. The latent correlation in the correlated two-factor model was large (r = .79, p < .001). Next, a bifactor model was fit to the data and it provided good model fit. All of its fit statistics met the specified guidelines. The bifactor model was compared with the correlated trait model (in this case, the correlated two-factor model) following prior research (e.g., Ebesutani et al., 2014). All the fit statistics were more favorable for the bifactor model relative to the correlated two-factor model. In addition, although the RMSEA 90% CIs were overlapping, the SDCS was significant and the ΔCFI (.012) was larger than the specified range (i.e., .002-.01). As such, the bifactor model was deemed to provide a better fit to the data than the correlated two-factor model in the composite sample.
Goodness-of-Fit Statistics for Tested Models.
Note. SDCS = scaled-difference chi-square test (*p < .05); CI = confidence interval.
a = compares 1-factor and correlated 2-factor models. b = compares correlated 2-factor and bifactor models. c = compares modified bifactor and bifactor models. d = compares configural and metric invariance models. e = compares metric and scalar invariance models.
The WI-6 items and factor loadings from the bifactor model are presented in Table 2. With the exception of Item 1, the WI-6 items continued to significantly load on the domain-specific factors after controlling for the general factor. However, the factor loadings on the domain-specific factors were almost all lower relative to the factor loadings on the general factor. The omegaH estimate for the general factor was .74 (error variance is 12%), indicating that 84% of the reliable variance in WI-6 scores was attributable to the general factor. The omegaS estimate for Factor I was .16 (error variance 21%) and the omegaS estimate for Factor II was .28 (error variance 17%), indicating that 20% and 34% of reliable variance in Factor I and Factor II scores, respectively, was independent of the general factor.
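The percentages of reliable variance reported above are consistent with dividing each omega estimate by the proportion of variance that is not error. The sketch below reproduces that arithmetic; the function name is illustrative, and the assumption that the percentages were derived in exactly this way rests on the agreement of the numbers rather than on an explicit statement in the text.

```python
def share_of_reliable_variance(omega: float, error_variance: float) -> float:
    """Proportion of reliable (non-error) variance captured by a factor."""
    return omega / (1.0 - error_variance)


if __name__ == "__main__":
    # (omega estimate, error variance) pairs as reported in the text.
    estimates = {
        "General factor (omegaH)": (0.74, 0.12),
        "Factor I (omegaS)": (0.16, 0.21),
        "Factor II (omegaS)": (0.28, 0.17),
    }
    for label, (omega, error) in estimates.items():
        share = share_of_reliable_variance(omega, error)
        print(f"{label}: {share:.0%} of reliable variance")
```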
Completely Standardized Factor Loadings From the Bifactor Model.
Note. G = general factor; Factor I = health worry; Factor II = bodily/somatic preoccupation. Loadings in boldface significant at p < .05.
As discussed by Brown (2015), bifactor modeling will sometimes indicate the psychometric irrelevance of a domain-specific factor. When considering the bifactor model of the WI-6, it seemed possible that domain-specific Factor I may be irrelevant when accounting for the general factor. Following from Brown, we respecified the original bifactor model such that domain-specific Factor I was removed and its items were modeled to load only on the general factor. In this modified bifactor model, the specification of domain-specific Factor II was unchanged from the original bifactor model. Results from the modified bifactor model are presented in Table 1. With the exception of the RMSEA and the RMSEA 90% CI, each fit statistic met the specified guidelines. However, all the fit statistics of the modified bifactor model indicated poorer fit relative to the original bifactor model. Although the RMSEA 90% CIs were overlapping, the SDCS was significant and the ΔCFI (.015) was larger than the specified range (i.e., .002-.01) when comparing those two models. As such, the original bifactor model was retained as the better fitting model relative to the modified bifactor model. To fully represent domain-specific Factor I, and to avoid the potential underidentification of this factor in subsequent analyses, Item 1 of the WI-6 was modeled as loading on both the domain-specific Factor I and the general factor even though this item evidenced a nonsignificant loading on the domain-specific factor in the bifactor model.
Structural Regression Model
We examined the incremental validity of the domain-specific factors above the general factor in the composite sample. As noted, the structural regression model simultaneously modeled the bifactor model of the WI-6 and a latent construct of somatic symptom severity using 14 items from the PHQ-15. Pathway coefficients were freely estimated from the general factor to the somatic symptom severity latent construct and from the two domain-specific factors to the somatic symptom severity latent construct. The structural regression model generally provided an adequate fit to the data (χ2 = 602.83, SB χ2 = 574.05, df = 161; RMSEA = .069 [90% CI = .063-.076]; CFI = .95; NNFI = .94; SRMR = .055; ECVI = 1.26). The RMSEA was slightly above specified guidelines and the NNFI was slightly below specified guidelines. The remaining fit indices all met specified guidelines. Pathway coefficients indicated that Factor II (β = .50, p < .001), but not Factor I (β = .15, ns), related to the latent somatic symptom severity construct when holding the effect of the general factor (β = .51, p < .001) constant.
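The evaluation of these fit statistics against the stated guidelines can be expressed as a simple check. The sketch below treats the "close to" guidelines as hard cutoffs, which is a simplifying assumption; the fit values are those reported for the structural regression model, and the function name check_fit is illustrative.

```python
from typing import Dict


def check_fit(cfi: float, nnfi: float, rmsea: float,
              rmsea_upper: float, srmr: float) -> Dict[str, bool]:
    """Return whether each fit statistic meets its guideline.

    Assumption: the "close to" guidelines are applied as strict cutoffs.
    """
    return {
        "CFI >= .95": cfi >= 0.95,
        "NNFI >= .95": nnfi >= 0.95,
        "RMSEA <= .06": rmsea <= 0.06,
        "RMSEA 90% CI upper limit <= .10": rmsea_upper <= 0.10,
        "SRMR <= .08": srmr <= 0.08,
    }


if __name__ == "__main__":
    # Values reported for the structural regression model.
    verdict = check_fit(cfi=0.95, nnfi=0.94, rmsea=0.069,
                        rmsea_upper=0.076, srmr=0.055)
    for guideline, met in verdict.items():
        print(f"{guideline}: {'met' if met else 'not met'}")
```

Under these cutoffs only the RMSEA and NNFI fall outside the guidelines, matching the description of the model as providing a generally adequate fit.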
Measurement Invariance of WI-6
As a precursor to tests of measurement invariance, we initially examined the fit of the above examined measurement models separately in each group. As presented in Table 1, the correlated two-factor model seemed to be a better fitting model relative to the one-factor model across Black, Latino, and White respondents. With the exception of the RMSEA among Black respondents and the upper limit of the RMSEA 90% CI among both Black and White respondents, each fit statistic met the specified guidelines for the correlated two-factor model across all three groups. As further presented in Table 1, the fit statistics were worse for the one-factor model relative to the correlated two-factor model. Moreover, the SDCS indicated that the one-factor model provided a significantly poorer fit to the data than the correlated two-factor model among each group. Although there were overlapping RMSEA 90% CIs across groups, the ΔCFI was generally larger than the specified range (i.e., .002-.01) when comparing the one-factor and correlated two-factor models across groups. Of note, the ΔCFI in the Latino group was .009 when comparing the correlated two-factor and one-factor models. This value would suggest a significant decrement in model fit according to Meade et al. (2008), but not according to Cheung and Rensvold (2002). Overall, the correlated two-factor model was retained as the better fitting measurement model relative to the one-factor model in each group because of the collective consideration of the SDCS, magnitude of fit statistics, and ΔCFI. The latent correlation in the correlated two-factor model was large in each group (Black: r = .84; Latino: r = .85; White: r = .70; ps < .001).
Next, a bifactor model was fit to the data and it provided good model fit across groups. All the fit statistics met the specified guidelines. The improvement in model fit between the bifactor and correlated two-factor model was most evident among Black respondents. Although the RMSEA 90% CIs were overlapping, the SDCS was significant and the ΔCFI (.025) was larger than the specified range (i.e., .002-.01) when comparing the models. The improvement in model fit between the bifactor and correlated two-factor model was generally equivocal among Latino and White respondents. In both groups the SDCS was nonsignificant, the RMSEA 90% CIs were overlapping, and the ΔCFI was not larger than the specified range (i.e., .002-.01). Of note, the ΔCFI in the White group was .006 when comparing the bifactor and correlated two-factor models. This value would suggest a significant decrement in model fit according to Meade et al. (2008), but not according to Cheung and Rensvold (2002).
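The ΔCFI comparisons reported above can be summarized as a simple decision rule. The sketch below is illustrative only: it assumes that the lower bound of the specified range (.002) corresponds to Meade et al. (2008), that the upper bound (.01) corresponds to Cheung and Rensvold (2002), and that a ΔCFI strictly greater than a cutoff counts as exceeding it. The function name and labels are hypothetical.

```python
MEADE_CUTOFF = 0.002             # assumed lower bound of the specified range
CHEUNG_RENSVOLD_CUTOFF = 0.01    # assumed upper bound of the specified range


def classify_delta_cfi(delta_cfi: float) -> str:
    """Label a ΔCFI value relative to the two cutoffs (strict inequality assumed)."""
    if delta_cfi > CHEUNG_RENSVOLD_CUTOFF:
        return "exceeds both cutoffs"
    if delta_cfi > MEADE_CUTOFF:
        return "exceeds the .002 cutoff only"
    return "within both cutoffs"


# ΔCFI values reported in the text for selected model comparisons.
reported = {
    "Composite: modified vs. original bifactor": 0.015,
    "Latino: one-factor vs. correlated two-factor": 0.009,
    "Black: bifactor vs. correlated two-factor": 0.025,
    "White: bifactor vs. correlated two-factor": 0.006,
}

for comparison, delta_cfi in reported.items():
    print(f"{comparison}: {classify_delta_cfi(delta_cfi)}")
```

Values such as .009 and .006 fall between the two cutoffs, which is why the text describes them as meaningful under one guideline but not the other.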
The modified bifactor model was fit to the data next in each group. This model received the most support among Latino respondents; however, it provided a poorer fit to the data compared with the original bifactor model among both Black and White respondents. Among those two groups, the modified bifactor model evidenced a decrement in each fit index relative to the original bifactor model, the SDCS was significant, and the ΔCFI was larger than the specified range (i.e., .002-.01) relative to the original bifactor model.
Overall, we decided to retain the original bifactor model as the best fitting model across all three groups for the following reasons. First, the bifactor and correlated trait models both specify the presence of health worry and bodily/somatic preoccupation factors. The plausibility of subscales can be more directly examined in a bifactor model (e.g., through use of omegaH and omegaS values) relative to a correlated trait model (Reise et al., 2010). Second, as further noted by Reise et al. (2010), the bifactor model allows for tests of measurement invariance at both the general and domain-specific factor levels. The original bifactor model provided good model fit among Latino respondents, although so did the modified bifactor model. Retaining the original bifactor model among Latino respondents allowed for a comparison of measurement invariance across all three groups of respondents. The factor loadings from the bifactor model are presented in Table 2. With the exception of Item 1 across all three groups and Item 2 in the Latino group, the WI-6 items continued to significantly load on the domain-specific factors after controlling for the general factor. However, the factor loadings on the domain-specific factors were almost all lower relative to the factor loadings on the general factor. The omegaH estimate for the general factor was .75 (error variance is 12%), .77 (error variance is 12%), and .71 (error variance is 10%) across Black, Latino, and White respondents, respectively. The omegaH values indicate that between 79% and 88% of the reliable variance in WI-6 scores is attributable to the general factor. The omegaS estimate for Factor I was .19 (error variance 18%), .11 (error variance 18%), and .25 (error variance 18%) across Black, Latino, and White respondents, respectively. The omegaS estimate for Factor II was .23 (error variance 21%), .23 (error variance 17%), and .32 (error variance 16%) across Black, Latino, and White respondents, respectively. The omegaS scores indicate that between 13% and 38% of reliable variance in the domain-specific factor scores is independent of the general factor.
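The reported percentage ranges appear to follow from dividing each omega estimate by the reliable share of variance (1 minus the error variance). The sketch below works through that arithmetic with the values reported above. The formula is an assumption about how the percentages were derived, not a procedure stated in the text, and the function and variable names are illustrative.

```python
def share_of_reliable_variance(omega: float, error_variance: float) -> float:
    """Proportion of reliable variance attributable to a factor (assumed formula)."""
    return omega / (1.0 - error_variance)


# (omega estimate, error variance) pairs as reported for each group.
omega_h_general = {"Black": (0.75, 0.12), "Latino": (0.77, 0.12), "White": (0.71, 0.10)}
omega_s_factor_1 = {"Black": (0.19, 0.18), "Latino": (0.11, 0.18), "White": (0.25, 0.18)}
omega_s_factor_2 = {"Black": (0.23, 0.21), "Latino": (0.23, 0.17), "White": (0.32, 0.16)}

# Shares for the general factor in each group.
general_shares = [share_of_reliable_variance(o, e) for o, e in omega_h_general.values()]

# Shares for both domain-specific factors, pooled across groups.
specific_shares = [
    share_of_reliable_variance(o, e)
    for factor in (omega_s_factor_1, omega_s_factor_2)
    for o, e in factor.values()
]

print(f"General factor: {min(general_shares):.3f} to {max(general_shares):.3f}")
print(f"Domain-specific factors: {min(specific_shares):.3f} to {max(specific_shares):.3f}")
```

Under this assumption, the general-factor shares span roughly 79% to 88% and the domain-specific shares roughly 13% to 38%, consistent with the ranges given in the text.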
Results from tests of measurement invariance of the bifactor model are presented in Table 1. The model with constraints testing for equal form (configural invariance) suggested equivalence between the three groups. All the fit indices met specified guidelines. The models with constraints testing for equal factor loadings (metric invariance) and equal indicator intercepts (scalar invariance) each suggested equivalence between the three groups. All the fit indices met specified guidelines in both models. Of note, the RMSEA of the equal intercepts model was lower than the RMSEA of the equal factor loadings model. Brown (2015) described how a more parsimonious model that constrains previously freed parameters to equality, coupled with trivial changes in χ2, can lead parsimony fit indices, such as the RMSEA, to show improvements. Overall, the fit statistics for each model met specified guidelines. The nonsignificant SDCS, the ΔCFIs of .000, and to a lesser degree the overlapping RMSEA 90% CIs (given that nearly all measurement models had overlapping RMSEA 90% CIs in the current study), further supported the presence of configural, metric, and scalar invariance.
Discussion
We examined the factor structure and measurement invariance of Asmundson et al.’s (2008) WI-6 among Black, Latino, and White respondents presenting for treatment in a primary care clinic. The adequacy of the correlated two-factor (trait) solution found by Asmundson et al. was generally replicated in this primary care sample. This finding is notable, as the current study was the first to examine the factor structure of the WI-6 items when they were administered independently of the full-length item pool. Extending prior studies examining the factor structure of the WI-6, we provide the first known examination of a bifactor model of the measure. Results supported a bifactor model of the WI-6, consisting of a general factor (health anxiety) and two domain-specific factors (health worry and bodily/somatic preoccupation). The general factor accounted for the overwhelming majority of variance in WI-6 scores (84%), suggesting that the domain-specific factors may have limited added value beyond the general factor. A modified bifactor model consisting of a general factor and only a bodily/somatic preoccupation domain-specific factor generally did not provide as good a model fit as the original bifactor model. As such, the health worry domain-specific factor seems psychometrically relevant beyond the general factor. Importantly, though, tests of incremental validity found that the bodily/somatic preoccupation factor, but not the health worry factor, accounted for unique variance in somatic symptom severity scores beyond the general factor.
The incremental validity results indicate that potentially important information may be lost by using the total scale instead of the subscale scores of the WI-6. Future research should continue to examine the incremental explanatory power of the domain-specific factors, especially seeking to examine whether the health worry factor contributes to our understanding of certain criterion variables beyond the general factor. For example, reassurance seeking, which is commonly seen among individuals with elevated health anxiety (Taylor & Asmundson, 2004) and is highly relevant to worry (Beesdo-Baum et al., 2012), may be one such criterion variable. It is important to note that the health worry items generally loaded only modestly, or not significantly, on the domain-specific factor after accounting for the general factor. In contrast, the bodily/somatic preoccupation items continued to significantly load on the domain-specific factor after accounting for the general factor. As such, the bodily/somatic preoccupation factor may be the only domain-specific factor meaningfully distinct from the general factor.
Results demonstrated that a bifactor model of the WI-6 provided good model fit across Black, Latino, and White respondents. Although support for the bifactor model, relative to the correlated two-factor (trait) model, was greatest among Black respondents, the bifactor model was ultimately retained across all three groups. It is important to note that the modified bifactor model appeared viable among Latino respondents. As such, it is possible that the psychometric relevance of the health worry domain-specific factor may differ across groups of respondents. Future research should continue to examine this possibility. One reason for retaining the bifactor model across all three groups was that the bifactor model can more directly speak to the plausibility of subscales, relative to the correlated two-factor model, by including a general factor in addition to domain-specific factors (Reise et al., 2010). Another advantage of examining the bifactor model compared with the correlated two-factor model of the WI-6 is that the bifactor model allowed for tests of measurement invariance at both general and domain-specific factor levels (Reise et al., 2010). Retaining the original bifactor model relative to the modified bifactor model among Latino respondents allowed for an examination of the measurement invariance of that model across all three groups of respondents.
The present results are the first known examination of potential racial/ethnic differences on a measure of health anxiety. Measurement invariance of the bifactor model was supported. Those findings indicate that the WI-6 has the same meaning and structure across Black, Latino, and White respondents. In examining the bifactor model across all three groups, the general factor accounted for an overwhelming amount of variance (between 79% and 88%) in WI-6 scores. These results parallel those found with the composite sample. Unfortunately, the sample sizes across the three groups were not large enough to separately examine the incremental validity of the domain-specific factors relative to the general factor in each group. Whereas the incremental validity analyses in the composite sample support the use of WI-6 subscales, future research is needed to examine whether that same pattern of findings holds when data from Black, Latino, and White respondents are separately analyzed.
These findings have important implications, as the ability for a measure to be comparable across disparate groups is imperative for its use as a potential screening instrument in primary care (Huang et al., 2006). As discussed, severe health anxiety poses personal and economic challenges for patients within primary care settings (Fink et al., 2010) and therefore screening for health anxiety in such settings could be beneficial (Fink et al., 1999). The brevity of the WI-6 compared with other versions of the measure should not be overlooked, as efficient assessment tools are highly valued in primary care settings (Kroenke et al., 2003). The invariance of the WI-6 indicates that higher or lower scores on the measure can be considered to reflect true group differences in the construct rather than the respective measure having items that preclude individuals from specific groups responding in comparable ways (e.g., Baas et al., 2011).
To date, a seven-item version of the WI (i.e., WI-7) has been most commonly used in primary care settings (Fink et al., 1999; Fink et al., 2010). Although the WI-7 was not examined in this study, the extant literature points to at least two potential advantages of using the WI-6 relative to the WI-7. First, the WI-7 is a unifactorial measure of health anxiety. If future research replicates and extends findings that the domain-specific factors of the WI-6 add unique information beyond the general factor, the WI-6 may provide a more fine-grained assessment of health anxiety in primary care settings. Second, research comparing the WI-6 and WI-7 has found that the WI-6 produces a better fit to the data when the items are responded to using either a dichotomous (Asmundson et al., 2008) or 5-point (Welch et al., 2009) scale among English-speaking respondents. As noted, a recent study found that the WI-6 did not provide the best fit to the data using a Norwegian translated version of the full-length WI (Veddegjærde et al., 2014). Such findings raise the possibility that different versions of the WI may be preferred depending on respondents’ preferred language for communication. Users of the WI should keep this possibility in mind when considering the preferred version of the measure for their purposes.
Study limitations must be acknowledged. Although we had nearly equal sample sizes, which are desirable when conducting multiple-groups confirmatory factor analyses, the samples were modest in size. It will be important for future studies to investigate the WI-6 among larger samples of the groups examined in this study, as well as other racial groups. Our study is further limited because we did not assess other aspects of cultural identification and, thus, the degree to which respondents were representative of their self-identified group is unknown. Self-identified race is a broad variable that offers only a starting point; our grouping of respondents did not allow us to examine potentially important within-group differences. It is important to note that Latino respondents were younger than respondents in the other two groups, although this age difference did not appear to lead to structural differences or differential item functioning, in terms of indicator intercepts, of the WI-6 items. MIMIC models, which can allow for an examination of covariates within a multiple-groups confirmatory factor analysis framework, only provide a limited examination of measurement invariance (i.e., only equal indicator intercepts; Brown, 2015) and, thus, were not used. Finally, considering possible idiosyncrasies of the sample (e.g., geographic location, diverse presenting problems, high representation of females), the generalizability of findings would be supported by replication in other primary care settings.
Limitations notwithstanding, the present results provide preliminary support for the measurement invariance of the WI-6 among Black, Latino, and White respondents. The WI-6 is brief and allows for the operationalization of health anxiety as a multidimensional construct. Use of the WI-6 in primary care settings may lead to improvements in how health anxiety is assessed, and ultimately treated, within such settings. More broadly, the invariance of the WI-6 may allow future research to examine potential racial/ethnic differences in health anxiety.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
