Abstract

Recently, bifactor modeling applications in clinical measurement have proliferated (e.g., Caspi et al., 2014; Lahey et al., 2012; Simms, Grös, Watson, & O’Hara, 2008; Vanheule, Desmet, Groenvynck, Rosseel, & Fontaine, 2008). It is critical, however, to distinguish between two types of applications. The first uses the bifactor model as a tool for understanding the psychometrics of an assessment scale (see Rodriguez, Reise, & Haviland, 2016a, 2016b). This type has proven invaluable for informing the degree to which a measure yields a univocal total score (Reise, Moore, & Haviland, 2010) and, relatedly, the extent to which subscales representing theoretically distinct constructs (i.e., group factors) yield reliable scores after accounting for the general factor (Reise, Bonifay, & Haviland, 2013).
The second is far more ambitious and leverages a bifactor model to represent the general and group factor structure of an entire domain of psychological functioning. This is the type in question in the target article (Snyder, Young, & Hankin, 2016, this issue), wherein correlations among psychopathology items are modeled as reflecting a single psychopathology dimension, p, as well as orthogonal internalizing/externalizing group factors. In this and related studies, it is the structure, rather than the psychometric properties, that is of paramount theoretical interest. We raise three issues with bifactor modeling as it is applied to the “structure of psychopathology”—interpretability, model fit, and validation—and we point to recent psychometric tools for bifactor model evaluation.
In the initial development of the bifactor method, Holzinger and Swineford (1937) stated that group factors are derived from the residual correlations that remain after extracting the general factor. Although estimation methods have changed radically since 1937, interpreting group factors that are orthogonal to a general factor remains challenging. In certain applications (e.g., Cho, Cohen, & Kim, 2014), group factors are not viewed as meaningful subconstructs of a test but rather as methodological “nuisances” that impede measurement of the primary construct of interest. Assuming, however, that group factors are meaningful, how should they be interpreted? They must be construed as substantively unique, measuring subconstructs that are independent of the general factor. Although Snyder et al. (2016, this issue) and others conclude that the structure of psychopathology includes internalizing/externalizing group factors, it is unclear whether, or to what degree, these factors can be interpreted as traits orthogonal to the p factor.
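To make the Holzinger and Swineford logic concrete, the short Python sketch below uses hypothetical standardized loadings (not estimates from Snyder et al. or any other study) for six items, three assigned to an INT and three to an EXT group factor. Under an orthogonal bifactor model, the implied correlation between two items is the product of their general loadings plus, if they share a group factor, the product of their group loadings. Removing the general-factor contribution therefore leaves residual correlations only within clusters, and these residuals are all the group factors can represent. The variable names and helper functions are illustrative assumptions, and the sketch works with population-implied correlations rather than reproducing the original 1937 estimation procedure.

```python
# Hypothetical standardized bifactor loadings (illustrative values only,
# not estimates from any study): six items, two orthogonal group factors.
general = [0.6, 0.6, 0.6, 0.5, 0.5, 0.5]
group = [0.4, 0.4, 0.4, 0.3, 0.3, 0.3]
cluster = ["INT", "INT", "INT", "EXT", "EXT", "EXT"]  # hypothetical labels


def implied_corr(i, j):
    """Model-implied correlation between distinct items i and j:
    the general-factor part plus, if both items share a group factor,
    the group-factor part."""
    r = general[i] * general[j]
    if cluster[i] == cluster[j]:
        r += group[i] * group[j]
    return r


def residual_after_general(i, j):
    """Correlation left over once the general factor's contribution is removed."""
    return implied_corr(i, j) - general[i] * general[j]


n = len(general)
# Diagonal entries are left as None because only inter-item correlations matter here.
residuals = [
    [round(residual_after_general(i, j), 3) if i != j else None for j in range(n)]
    for i in range(n)
]

for row in residuals:
    print(row)
```

In this example, the within-INT residuals equal 0.4 × 0.4, the within-EXT residuals equal 0.3 × 0.3, and every cross-cluster residual is zero, so whatever the group factors mean, it must be read off variance the general factor has already left behind.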
It may also be difficult to interpret the general dimension in a bifactor model. Previous research has noted that a positive manifold does not imply a single general causal structure (e.g., a single neuropsychobiological structure that causes variation across content-diverse indicators; van der Maas et al., 2006). Although strong correlations among measures may suggest a bifactor structure, that does not imply that such a structure exists at the genotypic level (e.g., Cohen, Cohen, Teresi, Marchi, & Velez, 1990). Thus, it could be that the internalizing/externalizing factors identified by Snyder et al. (2016, this issue) are interpreted correctly but that the emergence of a general p factor, rather than being generated by a single general latent trait, is the result of some different process altogether. Researchers must carefully investigate such issues before considering the bifactor model seriously as a foundational structure for clinical research.
Of particular concern is the bifactor model’s tendency to show superior goodness of fit in model comparison studies. In Snyder et al. (2016, this issue), the bifactor structure outperformed both the unidimensional and the two (correlated) factor alternatives in goodness of fit and was thereby selected as the best representation of psychopathology. However, the superior performance of the bifactor model may be a symptom of “overfitting,” that is, capturing not only the important trends in the data but also unwanted noise. Bonifay and Cai (under review) demonstrated that the bifactor model has a high propensity to fit any possible data, even when the data follow random patterns. Reise, Kim, Mansolf, and Widaman (under review) also revealed that the bifactor model may appear to fit better not because it better accounts for valid response variability but because it accommodates nonsense response patterns. What Snyder et al. admire as the “robustness” of the bifactor model may in fact reflect an inbuilt tendency to overfit data. Moreover, Murray and Johnson (2013) showed that standard statistical fit indices can be biased in favor of the bifactor model, even if the true population model follows a different structure. Thus, we echo the advice of Murray and Johnson: “Decisions as to which model to adopt . . . should not rely on which is better fitting” (p. 407).
A final concern involves the validity of the model. Structural models of a particular form are specified, estimated, and argued for in order to create (not discover) latent variables that serve as proxies for individual differences on psychological traits. Demonstrating the fit of a bifactor model, relative or absolute, is therefore not sufficient evidence to validate the latent variable. Regarding Snyder et al. (2016, this issue), good fit is not sufficient to advance claims that the structure of psychopathology, as it works in the brain, follows a bifactor structure. To validate the model, additional evidence, beyond “construct validation approaches and associations in . . . nomological networks” (Snyder et al., 2016, this issue, p. 4), is required. We propose that structural models for psychopathology must be validated by demonstrating that the psychobiological structure exists and that changes in the functioning of that structure lead to changes in the latent variable (Borsboom, Mellenbergh, & van Heerden, 2003, 2004). In this regard, cognitive neuroscience, behavioral biology, and similar fields are likely to play a more important role than statistics.
In sum, bifactor models are methodologically controversial because they can be difficult to interpret and tend to overfit data. However, that is not to say they should never be used. Rather, researchers who consider a bifactor model should proceed with caution, not only by avoiding tenuous interpretations of the latent dimensions and resisting overemphasis on good fit but also by exploring various statistics for evaluating their model. Rodriguez et al. (2016a) present a practical guide to several psychometrically informative bifactor-derived statistics that are easy to calculate (see also Rodriguez et al., 2016b). We encourage Snyder et al. (2016, this issue) and other proponents of the p factor to explore these indices, as they may provide support for (or against) the appropriateness of the bifactor structure of psychopathology.
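Several of the indices discussed by Rodriguez et al. (2016a) can be computed directly from standardized loadings. The Python sketch below is a minimal illustration, assuming an orthogonal bifactor model with unit-variance factors in which every item loads on the general factor and on exactly one group factor. It computes explained common variance (ECV), omega, omega hierarchical (omegaH), and omega hierarchical for each subscale (omegaHS) with the commonly used formulas. The function name `bifactor_indices` and the loadings are hypothetical; they are not taken from Snyder et al. (2016, this issue) or from Rodriguez et al.’s own materials.

```python
from collections import defaultdict


def bifactor_indices(general, group, cluster):
    """Compute ECV, omega, omegaH, and per-subscale omegaHS from standardized
    loadings of an orthogonal bifactor model (hypothetical helper). Each item
    loads on the general factor and on exactly one group factor, named in
    `cluster`."""
    sum_g_by = defaultdict(float)  # sum of general loadings within each subscale
    sum_s_by = defaultdict(float)  # sum of group loadings within each subscale
    uniq_by = defaultdict(float)   # sum of item uniquenesses (1 - h^2) per subscale
    for g, s, k in zip(general, group, cluster):
        sum_g_by[k] += g
        sum_s_by[k] += s
        uniq_by[k] += 1.0 - (g ** 2 + s ** 2)

    gen_var = sum(general) ** 2                       # (sum of general loadings)^2
    grp_var = sum(v ** 2 for v in sum_s_by.values())  # sum over subscales of (sum s_k)^2
    total_var = gen_var + grp_var + sum(uniq_by.values())  # implied total-score variance

    sq_general = sum(g ** 2 for g in general)
    sq_group = sum(s ** 2 for s in group)
    ecv = sq_general / (sq_general + sq_group)  # share of common variance due to general factor
    omega = (gen_var + grp_var) / total_var     # reliable variance from all common factors
    omega_h = gen_var / total_var               # reliable variance from the general factor alone
    omega_hs = {
        k: sum_s_by[k] ** 2 / (sum_g_by[k] ** 2 + sum_s_by[k] ** 2 + uniq_by[k])
        for k in sum_s_by
    }
    return {"ECV": ecv, "omega": omega, "omegaH": omega_h, "omegaHS": omega_hs}


# Hypothetical loadings for illustration only.
general = [0.6, 0.6, 0.6, 0.5, 0.5, 0.5]
group = [0.4, 0.4, 0.4, 0.3, 0.3, 0.3]
cluster = ["INT", "INT", "INT", "EXT", "EXT", "EXT"]

result = bifactor_indices(general, group, cluster)
print(result)
```

With these made-up loadings, omegaH is roughly .66 against an omega of roughly .79, and each omegaHS falls below .25, a pattern that would suggest the subscale scores carry little reliable variance beyond the general factor. Indices like these speak to the psychometric properties of a measure; as argued above, they do not by themselves validate a bifactor structure of psychopathology.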
Footnotes
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
