Abstract
We advanced several “riskier tests” of the validity of bifactor models of psychopathology, which included that the general and specific psychopathology factors should be reliable and well represented by their respective indicators and that including a general factor should improve on the correlated factor model’s external validity. We compared bifactor and correlated factors models of psychopathology using data from a community sample of youth (N = 2,498) whose parents provided ratings on psychopathology and theoretically relevant external criteria (i.e., personality, aggression, antisociality). Bifactor models tended to yield either general or specific factors that were unstable and difficult to interpret. The general factor appeared to reflect a differentially weighted amalgam of psychopathology rather than a liability for psychopathology broadly construed. With rare exceptions, bifactor models did not explain additional variance in first-order psychopathology symptom dimensions or external criteria compared with correlated factors models. Together, our findings call into question the validity of bifactor models of psychopathology and the p factor more broadly.
The lack of an empirically supported structure of psychopathology is a persistent challenge to the study and treatment of psychopathological conditions (Cuthbert, 2005; Krueger & Markon, 2006; Widiger, 2005). Contemporary taxonomies that enlist quantitative methods to yield empirically based structures of psychopathology tend to be hierarchically organized (e.g., Kotov et al., 2017; Krueger et al., 2018; Wright et al., 2013). These models begin with signs and symptoms at the bottommost levels of the hierarchy, which are then grouped into broader dimensions on the basis of their covariation at increasing levels of the hierarchy. In recent years, many researchers have turned their focus to the ostensible uppermost level of the hierarchy, the general factor of psychopathology (Caspi et al., 2014; Lahey et al., 2012). Hereafter, we refer to the general factor of psychopathology as the p factor when discussing it as a substantive construct and as the general factor when discussing it as a methodological construct within the context of bifactor models so as not to conflate bifactor models with theory surrounding the p factor and empirical investigations of the p factor using other approaches (e.g., Forbes et al., 2017; Kim & Eaton, 2015). 1
Although the precise nature of the p factor is not yet understood, it is thought to reflect a broad liability toward psychopathology. Since the “discovery” of the p factor in 2011–2012, the p factor literature has proliferated (see Caspi & Moffitt, 2018, for a review), and the p factor almost always emerges from bifactor models of psychopathology (cf. Forbes et al., 2017; Kim & Eaton, 2015). Notwithstanding warnings from quantitative methodologists and social scientists alike (Bagby, Taylor, Quilty, & Parker, 2007; Bonifay, Lane, & Reise, 2017; Morgan, Hodge, Wells, & Watkins, 2015; Murray & Johnson, 2013; Vanheule, Desmet, Groenvynck, Rosseel, & Fontaine, 2008), this near exclusive reliance on bifactor models persists. Herein, we argue that much of this literature has developed incautiously in that it tends to assert the validity of bifactor models of psychopathology, and the p factor that emerges from them, on the basis of preferential model fit. In this study, we offer a set of criteria that better assess the validity of bifactor models of psychopathology, in turn placing these models and the p factor at stronger theoretical risk (Meehl, 1978; Popper, 1959). Ultimately, we hope that these “riskier tests” will aid researchers in adjudicating structural models of psychopathology more broadly.
Hierarchical Models of Psychopathology
Dating back to at least the 1960s, research has organized child and adult psychopathology hierarchically, with the chief levels of the hierarchy reflecting at least two broad spectra termed externalizing and internalizing (Achenbach, 1966; Achenbach & Edelbrock, 1984; Krueger & Markon, 2006). 2 Externalizing represents conditions characterized by poor behavioral and emotional control, including but not limited to antisocial behavior and substance use in adults and conduct disorder, attention-deficit/hyperactivity disorder, and oppositional defiant disorder in youth. Internalizing represents conditions characterized by elevated levels of negative emotionality, including anxiety, depression, and phobias. Subsequent research suggests that internalizing can be parsed into distress, which includes depression, anxiety, and posttraumatic stress disorder, and fears, which includes phobias and panic disorder (Clark & Watson, 2006; Krueger, 1999). Cross-sectional and longitudinal efforts to disentangle externalizing and internalizing have revealed that the two are separable but substantially overlapping (meta-analytic r = .50; Krueger & Markon, 2006).
Many authors have adopted the p factor as the explanation for this covariation. To their credit, this explanation is relatively intuitive because the shared variance among externalizing and internalizing spectra ought to reflect what is common to psychopathology. Thus, the recent psychopathology literature has proceeded by including the p factor in structural models of psychopathology, where it has now been studied across most developmental periods, including childhood (e.g., Lahey et al., 2015; Olino, Dougherty, Bufferd, Carlson, & Klein, 2014; Waldman, Poore, van Hulle, Rathouz, & Lahey, 2016), adolescence (e.g., Carragher et al., 2016; Laceulle, Vollebergh, & Ormel, 2015; Murray, Eisner, & Ribeaud, 2016; Patalay et al., 2015), and adulthood (e.g., Caspi et al., 2014; Lahey et al., 2012).
Nevertheless, there are a number of plausible broad interpretations of the p factor, including that it reflects the broad constructs of maladaptivity, distress, or impairment or is an artifact of social desirability (Caspi & Moffitt, 2018; Widiger & Oltmanns, 2017a). Tacitly implied in much of the existing research is that the p factor reflects a singular causal mechanism that underlies all forms of psychopathology. Researchers have posited a number of candidates for this mechanism, including poor emotional and behavioral control (Carver, Johnson, & Timpano, 2017), the tendency to experience negative emotionality (Widiger & Oltmanns, 2017b), and poor cognitive ability or disordered thought (Caspi & Moffitt, 2018). Although research has yet to converge on any one of these explanations, the majority of validation research has proceeded presupposing that the p factor is substantive and meaningful given that it has been established as a correlate of “real-world life outcomes” (Caspi & Moffitt, 2018, p. 835).
The p factor has been linked to a broad swath of ostensible transdiagnostic risk factors, downstream sequelae, and other correlates, including increased family history of psychiatric conditions (Caspi et al., 2014; Lahey et al., 2012; Martel, Pan, et al., 2017); maternal smoking during pregnancy, harsh parental discipline, and peer delinquency (Waldman et al., 2016); self-reported physical and sexual abuse (Lahey et al., 2012); impairment (Caspi et al., 2014); neuroticism (Caspi et al., 2014; see also Widiger & Oltmanns, 2017b); impulsivity, sensation seeking, and hopelessness (Castellanos-Ryan et al., 2016); decreased intelligence, executive functioning, and early life brain integrity (Caspi et al., 2014; Martel, Pan, et al., 2017; Snyder, Young, & Hankin, 2017); academic performance (Lahey et al., 2015); and conscientiousness and agreeableness (Caspi et al., 2014). Other research has demonstrated that the p factor is moderately heritable (Neumann et al., 2016; Waldman et al., 2016) and that it exhibits strong homotypic continuity across time (Snyder et al., 2017).
Bifactor Models and Their Applications to Psychopathology
Just as there are several interpretations of the p factor, there are at least two possible structural representations of psychopathology that accommodate a p factor: the higher-order factor model and the pervasively used bifactor model (Brunner, Nagy, & Wilhelm, 2012; Mansolf & Reise, 2017; see Markon, 2019, for a thorough review). Both of these models construe psychopathology as comprising multifaceted, hierarchically organized constructs. In addition, these models are more comprehensive than correlated factors models insofar as they consider multiple levels of the psychopathology hierarchy.
In higher-order psychopathology models, the covariation among first-order symptom dimensions is influenced and explained by second-order latent factors, such as externalizing and internalizing, whose covariation is in turn influenced by a superordinate third-order latent factor, such as the p factor. In this model, the effects of the p factor on individual psychopathology dimensions (e.g., generalized anxiety) are completely mediated by second-order, more specific factors (in this example, distress and internalizing).
In contrast, the bifactor model decomposes covariation among psychopathology into two types of factors, general and specific (Holzinger & Swineford, 1937). The general factor reflects a single source of common variance among all indicators in the model, meaning that the general factor directly influences all individual psychopathology symptom dimensions. Any remaining covariation among indicators is decomposed into residual specific factors that influence and explain the covariance among subsets of psychopathology that are similar in content, such as externalizing and internalizing. In this model, the general factor is orthogonal to the specific factors, which are themselves held to be orthogonal given that all covariation among the psychopathology indicators is assumed to be captured by the general factor. In most applications, the general factor represents the central construct of interest, whereas the specific factors represent more conceptually related subdomains or even nuisance method factors (Markon, 2019).
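The orthogonality constraints described above have a simple consequence for the model-implied correlations, which the following minimal sketch illustrates. It assumes standardized indicators and fully orthogonal factors; the loadings are hypothetical values chosen for illustration and are not estimates from this study.

```python
# Hypothetical standardized loadings; none of these values come from the study.
# Columns: general factor, specific factor A, specific factor B.
LOADINGS = [
    [0.5, 0.6, 0.0],
    [0.4, 0.5, 0.0],
    [0.6, 0.4, 0.0],
    [0.5, 0.0, 0.6],
    [0.3, 0.0, 0.7],
    [0.4, 0.0, 0.5],
]


def implied_correlations(loadings):
    """Model-implied correlation matrix when all factors are orthogonal."""
    n = len(loadings)
    matrix = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                # Standardized indicator: common plus unique variance equals 1.
                row.append(1.0)
            else:
                # With orthogonal factors, the correlation is the sum of
                # products of the two indicators' loadings on each factor.
                row.append(sum(a * b for a, b in zip(loadings[i], loadings[j])))
        matrix.append(row)
    return matrix


R = implied_correlations(LOADINGS)
print(round(R[0][1], 2))  # same cluster: general and specific factors contribute
print(round(R[0][3], 2))  # different clusters: only the general factor contributes
```

In this sketch, indicators from different clusters correlate only through the general factor, which is why the general factor's loadings carry all of the cross-domain covariation in an orthogonal bifactor model.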
The lion’s share of the existing literature has relied on bifactor models of psychopathology when detecting and validating the p factor (cf. Forbes et al., 2017; Kim & Eaton, 2015), which may be due to several factors. Compared with the hierarchical model, bifactor models are generally easier to specify in latent variable models and are not burdened by the same proportionality constraints required of hierarchical models (Gignac, 2016; Mansolf & Reise, 2017). Nevertheless, there are several reasons that bifactor models of psychopathology are problematic.
First, bifactor models are vulnerable to overfitting data, rendering their parameter estimates unstable (e.g., Bonifay, 2015; Bonifay et al., 2017; Reise, Kim, Mansolf, & Widaman, 2016). Second, various simulation studies have demonstrated that bifactor models are prone to fitting any possible data, including potentially invalid (Reise et al., 2016) or random response patterns (Bonifay & Cai, 2017). Third, numerous simulation studies have demonstrated that bifactor models show preferential fit compared with higher-order and correlated factors models even when the population (“true”) model does not follow a bifactor structure (Greene et al., in press; Maydeu-Olivares & Coffman, 2006; Morgan et al., 2015; Murray & Johnson, 2013). Together, mounting evidence from these and other studies suggests that model fit statistics are unreliable indicators of the validity of bifactor models.
Riskier Tests of the Validity of the Bifactor Model of Psychopathology
As testament to its popularity, the seminal articles on the p factor and bifactor models of psychopathology (Caspi et al., 2014; Lahey et al., 2012) together have been cited more than 1,100 times. Others have integrated the p factor into a contemporary hierarchical taxonomy of psychopathology (HiTOP; Conway et al., 2019) that was previously agnostic with regard to the nature and number of highest order psychopathology dimensions (Kotov et al., 2017). Still others have integrated the p factor into structural models of allied constructs in subdisciplines of clinical psychology, including personality and personality disorders (Oltmanns, Smith, Oltmanns, & Widiger, 2018) and executive functioning (Snyder, Miyake, & Hankin, 2015).
Emerging from this literature are two general findings, namely, that (a) bifactor models of psychopathology are better fitting than the widely studied correlated factors models and (b) the general factor in a bifactor model of psychopathology is significantly associated with a broad swath of theoretically relevant correlates, presumably substantiating it as a valid psychopathology construct (i.e., the p factor; Caspi & Moffitt, 2018). At the same time, the field’s near exclusive reliance on bifactor model examinations of the p factor warrants scrutiny in light of the aforementioned criticisms and limitations associated with such models. In this article, we propose a set of riskier tests that more closely evaluate the structural properties and validity of bifactor models of psychopathology.
The present study serves as a demonstration of these riskier tests, using data from a large community-based sample of children and adolescents whose parents completed ratings of diagnostic criteria for common childhood disorders from the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM–IV; American Psychiatric Association, 1994). Parents also provided ratings on theoretically relevant external criteria, including temperament, aggression, and antisociality. After evaluating a set of alternative structural models of psychopathology using confirmatory factor analysis (CFA) and conventional fit indices, we consider each riskier test in turn, contrasting each model using the following set of criteria.
Test 1: If the general factor in a bifactor model reflects broad liability for psychopathology, it should be relatively equally represented by its constituent indicators. Presuming the p factor reflects a unitary process general to all forms of psychopathology, the general factor in a bifactor model should be well represented by its indicators, and its influence on these indicators should be relatively uniform (i.e., tau equivalent; Graham, 2006). In contrast with this principle, the general factor within a bifactor model tends to be underrepresented by some forms of psychopathology (i.e., fears) and overrepresented by others (e.g., distress, thought problems, autism spectrum disorders; Caspi & Moffitt, 2018).
Hypothesis 1: We hypothesize that the general factor within a bifactor model will not be influenced equally by all psychopathology dimensions and instead will be defined primarily by distress (Oltmanns et al., 2018) and antagonism-imbued forms of externalizing (i.e., conduct disorder, oppositional defiant disorder). We predict that antagonism-imbued forms of externalizing will load highly onto the general factor given that our demonstration relies on youth data and antagonism and neuroticism tend to be more highly correlated and thus more difficult to disentangle in youth compared with adults (Tackett, Kushner, De Fruyt, & Mervielde, 2013; Waldman et al., 2018).
Test 2: A bifactor model should produce reliable specific factors that are well represented by their constituent indicators. After accounting for the shared variance among psychopathology symptom dimensions captured by the general factor, a bifactor model should produce reliable specific factors that are well represented by their constituent indicators. By reliable, we mean that a specific factor should show high convergent validity and thus be characterized by appreciable and statistically significant factor loadings, thereby increasing the likelihood that the specific factor will be replicable across studies (Hancock & Mueller, 2001; Rodriguez, Reise, & Haviland, 2016).
Rodriguez and colleagues (2016) described a number of indices available for quantifying latent factor reliability. H, the most appropriate index for bifactor model factor reliability, conceptually reflects the extent to which a latent variable is represented by its indicators and thus how likely it is to be replicated across studies. More specifically, H is computed from the ratio of the variance in each indicator that is explained by the latent factor to the variance left unexplained by it, summed across indicators. It ranges from 0 to 1 and increases as a function of the magnitude of factor loadings and the number of indicators. Compared with coefficient omega (McDonald, 1999), another popular index of latent factor reliability, H reflects the correlation between a factor and an optimally weighted, as opposed to unit-weighted, item composite, making it better suited for latent variable modeling (see Rodriguez et al., 2016, for a more thorough discussion and formula).
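For readers who want to see how H behaves, the following minimal sketch implements the Hancock and Mueller (2001) formula, H = 1 / (1 + 1 / Σ[λ² / (1 − λ²)]), for a single factor's standardized loadings. The function name and the example loadings are hypothetical and are not taken from this study.

```python
def construct_replicability_h(loadings):
    """Construct replicability (H) from one factor's standardized loadings.

    Each loading must satisfy |loading| < 1 so that the unexplained
    variance (1 - loading**2) is positive.
    """
    ratio_sum = sum(l ** 2 / (1 - l ** 2) for l in loadings)
    # Algebraically identical to 1 / (1 + 1 / ratio_sum), but defined at zero.
    return ratio_sum / (1 + ratio_sum)


# Hypothetical loadings for illustration only.
strong = construct_replicability_h([0.7, 0.7, 0.7])
weak = construct_replicability_h([0.3, 0.3, 0.3])
more_indicators = construct_replicability_h([0.7, 0.7, 0.7, 0.7])
print(round(strong, 2), round(weak, 2), round(more_indicators, 2))
```

Under these illustrative values, three loadings of .70 clear the .70 benchmark whereas three loadings of .30 fall well below it, and adding a fourth indicator of the same strength raises H further.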
Few studies have reported reliability indices for factors derived from bifactor models of psychopathology. In one exception reported by Martel, Pan, and colleagues (2017), only the general factor in a bifactor model with three correlated specific factors (fears, distress, and externalizing) was characterized by highly reliable variance on the basis of coefficient omega. Stated differently, there was little reliable variance in psychopathology indicators after taking into account the general factor. Although overlooked in the literature, these findings raise the possibility that bifactor models produce unreliable specific factors that are not likely to be replicated across studies (see Rodriguez et al., 2016, for a broader demonstration), which is a worthy concern given that researchers are interested in the nature and correlates of a bifactor model’s specific factors in addition to the general factor.
Hypothesis 2: We hypothesize that bifactor models will yield specific factors of decreased reliability relative to both the general factor in the model and the factors yielded in a correlated factors model. Even further, we expect that specific factors’ reliability will fall below the benchmark level of acceptable reliability (Rodriguez et al., 2016).
Test 3: Including a general factor should improve on the external validity of the correlated factors model. As we noted earlier, much research has proceeded with validating the general factor of psychopathology in a bifactor model by examining its relations with a number of theoretically relevant external criteria (e.g., personality, intelligence). We agree in principle that examining a bifactor model of psychopathology’s correlates is a useful first step in validating the general factor, but we emphasize the virtue of comparing the external validity of the bifactor model of psychopathology with alternative structural models of psychopathology. Although it is common practice to report model fit for multiple alternative structural models of psychopathology, it is uncommon to report the external validity of multiple alternative models within the same article (cf. Caspi et al., 2014; Martel, Pan, et al., 2017).
This practice overlooks two important possibilities, namely, that a bifactor model might not explain more variance in (a) its indicators and (b) external criteria than a correlated factors model. Should bifactor and correlated factors models of psychopathology explain an equivalent amount of the variance in their constituent psychopathology indicators, it would indicate that bifactor models merely redistribute aspects of psychopathology into a greater or different number of factors. Simply redistributing variance in psychopathology renders it unlikely that bifactor models outperform correlated factors models in terms of explained variance in external criteria.
Hypothesis 3: Compared with correlated factors models, we hypothesize that bifactor models of psychopathology will neither explain more variance in psychopathology indicators nor explain more variance in external criteria.
Method
Participants and procedure
The present sample comprises 2,498 individuals (51% female) aged 4 to 17 (mean age = 8.6 years, SD = 2.9) whose families participated in the Georgia Twin Registry and provided ratings on a battery of psychopathology and relevant external criteria for each child in the household (Waldman, Rhee, Feigon, & Bar, 1998). Mothers typically completed the questionnaires (53%), and the remaining questionnaires were completed either by fathers (1%) or by mothers and fathers together (46%). Seventy-one percent of the sample were twin pairs (45% MZ, 55% DZ), and 29% were siblings of twins. Eighty-two percent of the participants were White, 11% were African American, 1% were Hispanic, and 6% were other or mixed ethnicity.
Measures
Psychopathology
The Emory Combined Rating Scale (Waldman et al., 1998) assesses DSM–IV symptoms of common youth psychiatric disorders by means of ratings of how much a statement is representative of the child, using a scale ranging from 0 (not at all) to 4 (very much). Items were summed into continuous symptom dimension composites. Included in this measure were anxiety disorders (i.e., agoraphobia, generalized anxiety disorder, obsessions, compulsions, panic disorder, social phobia, specific phobia), mood disorders (i.e., major depression), attention-deficit and disruptive disorders (i.e., attention-deficit hyperactivity disorder, conduct disorder, oppositional defiant disorder), separation anxiety disorder, and vocal and motor tics. Cronbach’s αs for all symptom dimensions ranged from .54 (agoraphobia) to .98 (panic disorder).
External criteria
The Antisocial Process Screening Device (APSD; Frick & Hare, 2001), the EAS Temperament Survey for Children (EAS; Buss & Plomin, 1984), and the Reactive-Proactive Aggression Scale (RPA; Dodge & Coie, 1987) were used as external criteria given that they are established correlates of childhood psychopathology (e.g., Nigg, 2006). Items were rated on a scale from 0 to 4 identical to that described for the psychopathology measure.
The APSD comprises 20 items designed to assess youth psychopathic traits and other antisocial characteristics. The APSD yields scores on three subscales: narcissism, which reflects grandiosity, manipulativeness, and superficial charm; callous-unemotional, which reflects lack of empathy, remorse, and guilt as well as shallow and constricted affect; and impulsivity, which reflects engagement in rash behavior and boredom susceptibility (αs = .84, .61, and .78, respectively).
The EAS comprises 20 items designed to assess four dimensions of temperament. Emotionality reflects a general tendency toward psychological distress, Activity reflects preferred levels of activity and speed of action, Sociability reflects a preference for the company of others as opposed to isolation, and Shyness reflects feelings of inhibition and awkwardness in novel social situations (αs = .84, .64, .58, and .75, respectively). Temperament data used in external validity analyses were available for 913 participants. 3
The RPA comprises 12 items designed to assess proactive (instrumental, “cold-blooded”) and reactive (defensive, “hot-blooded”) forms of childhood aggression (αs = .71 and .75, respectively).
Data analysis
We conducted all analyses using Mplus (Version 7.2; Muthén & Muthén, 2014). Continuous symptom dimension scores were analyzed using the MLR estimator (maximum likelihood with robust standard errors; Yuan & Bentler, 2000) to account for their nonnormality and using the “cluster” option to account for the nonindependence of twins and siblings nested within families. Factor analyses were fit to the raw data using full-information maximum likelihood, given that this is an optimal way of handling missing data that yields less biased parameter estimates than do listwise or pairwise deletion (Enders & Bandalos, 2001).
Goodness of fit
We reported the χ2 test statistic with its associated degrees of freedom (df) and p value, the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root-mean-square error of approximation (RMSEA), the Bayesian information criterion (BIC), and the standardized root-mean-square residual (SRMR). We did not rely on χ2 statistics and their p values to determine adequacy of model fit because χ2 significance tests are highly sensitive to sample size, and all models would be virtually certain to be rejected given our large sample. Adequacy of model fit was determined on the basis of guidelines suggested in the literature and the comparison of TLI, RMSEA, SRMR, and BIC (see Loehlin, 2004; Markon & Krueger, 2006), although we also present the χ2, CFI, and the Akaike information criterion (AIC) in the spirit of full disclosure.
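As a rough illustration of how several of these indices are derived from χ2 values, the sketch below uses the standard textbook formulas. It is not the Mplus implementation: software packages differ in details such as using N or N − 1 in the RMSEA and in how robust (MLR) corrections enter, and every input value shown is hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """Root-mean-square error of approximation (textbook form with N - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index relative to a baseline (independence) model."""
    numerator = max(chi2_model - df_model, 0.0)
    denominator = max(chi2_base - df_base, chi2_model - df_model, 0.0)
    return 1.0 if denominator == 0 else 1.0 - numerator / denominator


def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis index relative to a baseline (independence) model."""
    base_ratio = chi2_base / df_base
    return (base_ratio - chi2_model / df_model) / (base_ratio - 1.0)


def bic(log_likelihood, n_params, n):
    """Bayesian information criterion; lower values indicate better fit."""
    return -2.0 * log_likelihood + n_params * math.log(n)


# Hypothetical values for illustration only (not results from this study).
example_rmsea = rmsea(chi2=300.0, df=87, n=2498)
example_cfi = cfi(300.0, 87, 5000.0, 105)
example_tli = tli(300.0, 87, 5000.0, 105)
print(round(example_rmsea, 3), round(example_cfi, 3), round(example_tli, 3))
```

Because BIC penalizes each free parameter by ln(N), it is the index in this set that most directly offsets the extra parameters a bifactor model estimates.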
Latent variable reliability
We reported H, which assesses construct replicability (reliability) of latent factors in a structural equation modeling context (Hancock & Mueller, 2001; Rodriguez et al., 2016). Hancock and Mueller (2001) suggested .70 as a useful benchmark for acceptable reliability. Although H is more informative than omega and its progeny for structural equation models (Rodriguez et al., 2016), we present ωS, ωH, and relative ω for each factor from all psychopathology models (see Table 3; for details on these reliability indices, see Rodriguez et al., 2016).
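The omega-based indices can be computed directly from standardized bifactor loadings. The sketch below follows the definitions summarized by Rodriguez et al. (2016) and assumes orthogonal general and specific factors and standardized indicators, so that each indicator's unique variance is 1 minus its squared loadings. The function name, dictionary keys, and loadings are hypothetical; the labels `omega_s` and `omega_hs` denote omega for a subscale and its hierarchical counterpart.

```python
def omega_indices(groups):
    """Omega indices for a bifactor model with orthogonal factors.

    groups maps each specific factor's name to a list of
    (general_loading, specific_loading) pairs, one per standardized indicator.
    """
    def unique_var(members):
        # Residual variance of standardized indicators under orthogonal factors.
        return sum(1 - g ** 2 - s ** 2 for g, s in members)

    all_pairs = [pair for members in groups.values() for pair in members]
    general_part = sum(g for g, _ in all_pairs) ** 2
    specific_parts = {name: sum(s for _, s in members) ** 2
                      for name, members in groups.items()}
    common = general_part + sum(specific_parts.values())
    total_var = common + unique_var(all_pairs)

    omega = common / total_var
    omega_h = general_part / total_var
    result = {"general": {"omega": omega, "omega_h": omega_h,
                          "relative_omega": omega_h / omega}}
    for name, members in groups.items():
        # Variance of the subscale composite due to the general factor.
        general_sub = sum(g for g, _ in members) ** 2
        sub_var = general_sub + specific_parts[name] + unique_var(members)
        omega_s = (general_sub + specific_parts[name]) / sub_var
        omega_hs = specific_parts[name] / sub_var
        result[name] = {"omega_s": omega_s, "omega_hs": omega_hs,
                        "relative_omega": omega_hs / omega_s}
    return result


# Hypothetical loadings for illustration only.
HYPOTHETICAL = {
    "specific_a": [(0.5, 0.6), (0.4, 0.5), (0.6, 0.4)],
    "specific_b": [(0.5, 0.6), (0.3, 0.7), (0.4, 0.5)],
}
indices = omega_indices(HYPOTHETICAL)
print({k: {m: round(v, 2) for m, v in d.items()} for k, d in indices.items()})
```

In this illustration, a subscale composite can look reliable on `omega_s` while relatively little of that reliable variance is attributable to the specific factor itself (`omega_hs`), which is the distinction at stake in Test 2.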
External validity
We examined psychopathology models’ relations with external criteria by regressing each criterion onto each psychopathology factor while simultaneously estimating the psychopathology structural model, obviating the need to save factor scores. In addition to examining zero-order correlations, we conducted regressions in which subscales within measures were entered as simultaneous predictors of psychopathology dimensions to (a) account for the significant overlap among within-measure constructs and (b) yield relations with psychopathology unique to each external criterion.
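One way to place correlated factors and bifactor models on a common footing is to compare the total variance each explains in a criterion. As a minimal sketch, assuming a standardized criterion and standardized regression coefficients, the explained variance is the quadratic form β′Φβ, where Φ is the factor correlation matrix. This is a generic identity rather than a description of the exact computation used here, and all values below are hypothetical.

```python
def explained_variance(betas, factor_corr):
    """R-squared for a standardized criterion regressed on latent factors.

    betas: standardized regression coefficients, one per factor.
    factor_corr: factor correlation matrix (identity if factors are orthogonal).
    """
    k = len(betas)
    return sum(betas[i] * factor_corr[i][j] * betas[j]
               for i in range(k) for j in range(k))


# Hypothetical correlated factors model: two factors correlated .50.
r2_correlated = explained_variance([0.3, 0.2], [[1.0, 0.5], [0.5, 1.0]])

# Hypothetical bifactor model: general plus two orthogonal specific factors.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
r2_bifactor = explained_variance([0.35, 0.2, 0.15], identity)

print(round(r2_correlated, 3), round(r2_bifactor, 3))
```

With orthogonal factors the expression reduces to the sum of squared coefficients, so a bifactor model can report a sizable coefficient for the general factor without explaining more total variance than its correlated factors counterpart.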
Results
Alternative models of psychopathology
We first conducted eight tests of alternative CFA models for the underlying structure of psychopathology. The models tested were as follows:
1 and 2: Models with correlated and uncorrelated externalizing and internalizing factors (Krueger & Markon, 2006);
3 and 4: Bifactor counterparts to the two correlated factors models, with either correlated or uncorrelated externalizing and internalizing specific factors (Laceulle et al., 2015; Olino et al., 2014);
5 and 6: Models with correlated and uncorrelated externalizing, distress, and fears factors (Krueger, 1999); and
7 and 8: Bifactor counterparts to the three correlated factors models, with either correlated or uncorrelated specific factors (Martel, Pan, et al., 2017; for all model-fit statistics, see Table 1).
It is important to note that we tested bifactor models that allowed for correlated specific factors on the basis of practices observed in the psychopathology literature (Carragher et al., 2016; Caspi et al., 2014; Laceulle et al., 2015; Martel, Pan, et al., 2017; Olino et al., 2014), although bifactor-model theories argue that specific factors in a bifactor model should be orthogonal (Holzinger & Swineford, 1937).
Table 1. Fit Statistics for Confirmatory Models
Note: RMSEA = root-mean-square error of approximation; CI = confidence interval; TLI = Tucker-Lewis index; CFI = comparative fit index; BIC = Bayesian information criterion; AIC = Akaike information criterion; SRMR = standardized root-mean-square residual; N params = number of free parameters.
p < .001.
For both correlated factors and bifactor models, model fit indicated that a model that parsed internalizing into fears and distress was preferable to a model that did not, although fears and distress were highly correlated in both the correlated factors model (r = .82, 95% CI = [.74, .90]) and the bifactor model with correlated specific factors (r = .75, 95% CI = [.64, .87]). Results of Satorra-Bentler χ2 difference tests were significant between (a) the two- and three-factor oblique models, Δχ2(2) = 36, p < .001, and (b) the bifactor models with two and three oblique specific factors, Δχ2(2) = 10, p = .007. In addition, we could not equate the relations of fears and distress with external criteria for the three correlated factors model and its bifactor counterparts without a significant decrement in model fit (Δχ2 all significant at p < .001; see the Supplemental Material available online), indicating that fears and distress related differentially to external criteria. Given these characteristics, we proceeded with the three correlated factors model and its bifactor counterparts, with either three correlated or three uncorrelated specific factors, for our remaining riskier tests. It is worth noting, however, that all conclusions drawn from the three-factor models hold for the two-factor models. 4 Results from each three-factor model are presented with their respective factor loadings and factor correlations in Table 3 (see the Supplemental Material for complete results for the two-factor models).
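Because χ2 values from robust estimators such as MLR cannot be subtracted directly, nested models are compared with the Satorra-Bentler scaled difference test. The sketch below shows the standard hand computation from each model's scaled χ2, scaling correction factor, and degrees of freedom. The function name and all input values are hypothetical, and the resulting statistic would still need to be referred to a χ2 distribution with the returned degrees of freedom.

```python
def scaled_chi2_difference(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler scaled chi-square difference for nested models.

    t0, c0, d0: scaled chi-square, scaling correction factor, and df of the
                nested (more restricted) model.
    t1, c1, d1: the same quantities for the comparison (less restricted) model.
    Returns the scaled difference statistic and its degrees of freedom.
    """
    df_diff = d0 - d1
    # Scaling correction for the difference test.
    cd = (d0 * c0 - d1 * c1) / df_diff
    # Multiplying by c recovers each model's uncorrected ML chi-square.
    statistic = (t0 * c0 - t1 * c1) / cd
    return statistic, df_diff


# Hypothetical values for illustration only (not results from this study).
statistic, df_diff = scaled_chi2_difference(
    t0=350.0, c0=1.50, d0=89,
    t1=300.0, c1=1.45, d1=87,
)
print(round(statistic, 2), df_diff)
```

In applied work the difference correction `cd` can occasionally come out negative, in which case this simple version is not interpretable and a strictly positive variant of the test is needed.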
Table 2. Fit Statistics for Bifactor Models That Equated Psychopathology Symptom Dimensions’ Loadings on General and Specific Factors
Note: Satorra-Bentler χ2 difference tests were conducted on full bifactor models and their constrained counterparts. Constrained models are indented below the full model. RMSEA = root-mean-square error of approximation; CI = confidence interval; TLI = Tucker-Lewis index; CFI = comparative fit index; BIC = Bayesian information criterion; AIC = Akaike information criterion; SRMR = standardized root-mean-square residual; N params = number of free parameters.
p < .001.
Table 3. Factor Loadings, Intercorrelations, and Reliability Indices for Confirmatory Factor Analytic Models
Note: CSFs = correlated specific factors; UNCSFs = uncorrelated specific factors; DIST = distress; EXT = externalizing; FEAR = fears; GEN = general.
**p < .01. ***p < .001.
Riskier tests
Test 1: if the general factor in a bifactor model reflects broad liability for psychopathology, it should be relatively equally represented by its constituent indicators
In the bifactor model with three correlated specific factors, psychopathology symptom dimensions’ loadings on the general factor were quite variable, ranging from .06 (specific phobia) to .68 (major depression), with a median value of .22. On average, the general factor explained 9% of the variance in the first-order symptom dimensions. The general factor was characterized by small loadings from impulsivity and from fear-related internalizing dimensions other than obsessions (λs ranged from .06 for specific phobia to .22 for impulsivity); moderate loadings from compulsions, generalized anxiety, inattention, and hyperactivity (λs ranged from .25 for hyperactivity to .40 for inattention); and large loadings from conduct disorder, oppositional defiant disorder, and major depression (λs ranged from .55 for oppositional defiant disorder to .68 for major depression).
Relative to the bifactor model with three correlated specific factors, a model equating all symptom dimensions’ loadings on the general factor fit significantly more poorly, Δχ2(14) = 679, p < .001 (see Table 2 for all tests that equate loadings on the general and specific factors). In light of the general factor being relatively well represented by distress and externalizing but not fear symptom dimensions, we also tested a post hoc model equating externalizing and distress but not fears symptom dimensions’ loadings on the general factor, which also fit the data significantly more poorly, Δχ2(6) = 31, p < .001. Together, these findings indicate that the general factor was not equally represented by the first-order symptom dimensions.
In the bifactor model with three uncorrelated specific factors, psychopathology symptom dimensions’ loadings on the general factor were quite variable, ranging from .25 (tics) to .84 (generalized anxiety), with a median value of .36 (see Table 3 for all factor loadings). On average, the general factor explained 18% of the variance in the first-order symptom dimensions. Most notably, the loadings on the general factor were appreciably larger in magnitude than those in the bifactor model with three correlated specific factors. The general factor was characterized by moderate loadings from conduct disorder, hyperactivity, impulsivity, tics, specific phobia, compulsions, panic disorder, and social phobia (λs ranged from .25 for tics to .41 for inattention) and moderate to large loadings from the remaining symptom dimensions (λs ranged from .44 for obsessions to .84 for generalized anxiety), with loadings being most pronounced for major depression and generalized anxiety. Relative to the bifactor model with three uncorrelated specific factors, a model equating all symptom dimensions’ loadings on the general factor fit more poorly, Δχ2(14) = 86, p < .001. In general, however, this model’s general factor tended to be more equally represented by its indicators than that of the bifactor model with three correlated specific factors.
To evaluate the robustness of the general factor to excluding any one psychopathology indicator, we next conducted a series of analyses in which we excluded each psychopathology dimension from the general factor in turn. This resulted in 15 models tested, with 14 loadings on the general factor freely estimated per model, for both bifactor models. Results from these analyses are displayed in two box plots. Figure 1a displays the variability of each symptom dimension’s loadings on the general factor as a function of excluding each other psychopathology dimension in turn. For instance, in Figure 1a, we show the variability of inattention’s loadings on the general factor when each other symptom dimension is dropped in turn. Figure 1b reorganizes these data to display variability of loadings on the general factor as a function of the symptom dimension dropped from the general factor (see the Supplemental Material for all models’ factor loadings). For instance, in Figure 1b, we show the variability of the other symptom dimensions’ loadings on the general factor when inattention is dropped.

Fig. 1. Variability in loadings on the general factor. The box-and-whiskers plots display (a) the variability of each symptom dimension’s (x-axis) loading on the general factor (y-axis) when each other symptom dimension is dropped from the general factor, one at a time across the 15 models tested, and (b) loadings on the general factor (y-axis) when each symptom dimension is dropped from the general factor one at a time (x-axis). The lines in the center of the boxes represent the medians. The lower and upper edges of the boxes represent the first and third quartiles, respectively. The upper and lower ends of the whiskers represent values 1.5 times the interquartile range above and below the third and first quartiles, respectively, and the circles represent outliers. Bi3CSF = bifactor model with three correlated specific factors; Bi3UNCSF = bifactor model with three uncorrelated specific factors; INA = inattention; HYP = hyperactivity; IMP = impulsivity; ODD = oppositional defiant disorder; CND = conduct disorder; DEP = major depression; GAD = generalized anxiety; PAN = panic disorder; SOC = social anxiety; SAD = separation anxiety; AGO = agoraphobia; OBS = obsessions; COM = compulsions; TIC = tics; SPH = specific phobia; none = full model with no symptom dimensions dropped.
We then extracted general factor scores from all models: the full model and the 15 models in which each symptom dimension was dropped one at a time. We correlated the factor scores from the full model with those from each reduced model to gauge the sensitivity of the general factor to the exclusion of a symptom dimension. Lower correlations indicate greater sensitivity to that symptom dimension being dropped; we viewed correlations lower than .90 as problematic (see the Supplemental Material for all convergent correlations).
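The convergent-correlation check described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function names are hypothetical, the factor scores are made-up toy values (not study data), and only the .90 cutoff is taken from the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (n >= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary to compute a correlation")
    return sxy / math.sqrt(sxx * syy)


def flag_sensitive(full_scores, reduced_scores, threshold=0.90):
    """Return {dropped dimension: r} for reduced models whose general factor
    scores correlate below `threshold` with those from the full model."""
    flagged = {}
    for dropped, scores in reduced_scores.items():
        r = pearson_r(full_scores, scores)
        if r < threshold:
            flagged[dropped] = r
    return flagged


# Hypothetical general factor scores for five children (illustration only).
full = [0.10, -0.50, 1.20, 0.30, -1.10]
reduced = {
    "tics dropped": [0.12, -0.48, 1.15, 0.33, -1.05],        # nearly unchanged
    "obsessions dropped": [-0.10, 0.50, -1.20, -0.30, 1.10],  # sign reversed
}

for dropped, r in flag_sensitive(full, reduced).items():
    print(f"{dropped}: r = {r:.2f} (below .90)")
```

In this toy case only the reduced model whose scores reverse sign is flagged, which mirrors the pattern in which a general factor changes meaning once a defining indicator is removed.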
Several trends can be gleaned from the box plots. First, symptom dimensions’ loadings on the general factor were typically more variable in the bifactor model with correlated specific factors compared with the bifactor model with uncorrelated specific factors (Fig. 1a). In addition, in the bifactor models with correlated and uncorrelated specific factors, some symptom dimensions’ loadings on the general factor varied relatively little as a function of dropping each other symptom dimension from the model; this was especially true of symptom dimensions characterized by high levels of fear. For instance, tics’ loadings on the general factor ranged from –.03 (when dropping obsessions) to .14 (major depression) and from .07 (oppositional defiant disorder) to .28 (major depression) in the bifactor models with correlated and uncorrelated specific factors, respectively (see the Supplemental Material). Other symptom dimensions’ loadings on the general factor varied considerably. For instance, conduct disorder’s loadings on the general factor ranged from –.48 (obsessions) to .63 (tics) and –.45 (oppositional defiant disorder) to .63 (generalized anxiety) in the bifactor models with correlated and uncorrelated specific factors, respectively; the same was true of psychopathology symptom dimensions characterized by distress and externalizing tendencies.
Finally, for the bifactor model with correlated specific factors but not the bifactor model with uncorrelated specific factors, many symptom dimensions’ loadings on the general factor were either near zero or negative once symptom dimensions that essentially defined the general factor in the full model were dropped. For instance, in the bifactor model with correlated specific factors, when oppositional defiant disorder was dropped from the general factor, conduct disorder’s and obsessions’ loadings on the general factor were –.45 and –.23, respectively. In general, in the bifactor model with correlated specific factors, when obsessions or oppositional defiant disorder was excluded, the general factor reflected psychological adaptivity, with negative loadings from most of the indicators that define the general factor in the full model. In addition, most of the fear-related symptom dimensions could be dropped from the general factor with little to no effect on its contents (see the Supplemental Material).
These trends were also evident in the convergent correlations between the general factor in the full model and models in which one symptom dimension was dropped. For the bifactor model with correlated specific factors, convergent correlations were high (> .90) when the following symptom dimensions were excluded from the general factor: agoraphobia, compulsions, generalized anxiety, hyperactivity, impulsivity, panic disorder, separation anxiety, social anxiety, specific phobia, and tics. Convergent correlations were slightly lower for inattention (.84). In contrast, convergent correlations were essentially zero for conduct disorder (.01) and major depression (–.09) and strongly negative for obsessions (–.89) and oppositional defiant disorder (–.65).
Together, these findings indicate that the general factor in a bifactor model with correlated specific factors was highly sensitive to the inclusion of certain symptom dimensions, namely, conduct disorder, depression, obsessions, and oppositional defiant disorder. In the case of the latter two symptom dimensions (i.e., obsessions, oppositional defiant disorder), the general factor was so sensitive to their inclusion that its contents after dropping these symptom dimensions appeared to reflect psychological adaptivity, broadly construed, as opposed to maladaptivity. Consistent with the data presented in Figure 1, the general factor in a bifactor model with uncorrelated specific factors was relatively robust, with convergent correlations dropping below .90 when only two symptom dimensions were excluded: major depression (.87) and generalized anxiety (.84).
Test 2: a bifactor model should produce reliable specific factors that are well represented by their constituent indicators
For the three correlated factors model, H statistics were above the .70 benchmark, indicating sufficient construct replicability (.74 for fears and distress, .84 for externalizing). Along these lines, these factors were generally well represented by their psychopathology indicators. As shown in Table 3, standardized factor loadings on the fears factor ranged from .26 (tics) to .63 (separation anxiety), with a median value of .49. Loadings on the distress factor ranged from .64 (major depression) to .83 (generalized anxiety), with a median value of .74. Loadings on the externalizing factor ranged from .39 (conduct disorder) to .80 (hyperactivity), with a median value of .71.
For the bifactor model with three correlated specific factors, H statistics were above the benchmark for the fear and externalizing specific factors (.72 and .80, respectively) and slightly below the benchmark for distress (.67). Compared with the three correlated factors model, the fears factor was equally well represented by its indicators, with loadings ranging from .22 (tics) to .64 (separation anxiety) and a median value of .46. Distress tended to be less well represented by its indicators, with loadings ranging from .35 (major depression) to .81 (generalized anxiety) and a median value of .58, but this was primarily because major depression was not a strong indicator of distress after accounting for the general factor (see also Tackett, Kushner, et al., 2013). Externalizing was similarly less well represented by its indicators, with loadings ranging from .10 (conduct disorder) to .81 (hyperactivity) and a median loading of .56, but this was primarily because antagonism-related externalizing loaded less strongly on externalizing after the general factor was taken into account. At the same time, specific factors’ reliabilities were generally adequate, with the exception of distress, and a number of loadings on the specific factors were comparable with those from the three correlated factors model.
For the bifactor model with three uncorrelated specific factors, H statistics were sufficient for the externalizing specific factor (.77) but poor for the fear and distress specific factors (.55, .22). Compared with both the three correlated factors model and the bifactor model with correlated specific factors, specific factors were generally less well represented by their indicators. Loadings on the fear-specific factor ranged from .07 (tics) to .54 (separation anxiety), with a median loading of .33. Loadings on the distress-specific factor ranged from –.34 (generalized anxiety) to .36 (major depression), with a median loading of .01. Note that generalized anxiety’s loading on distress became moderately negative after accounting for the general factor. Loadings on the externalizing-specific factor ranged from .20 (conduct disorder) to .79 (hyperactivity), with a median loading of .54.
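For readers less familiar with the H statistic, the sketch below computes construct replicability from standardized loadings using the usual formula, H = S / (1 + S) with S equal to the sum of λ² / (1 − λ²) across a factor's indicators. The function name is hypothetical. The example assumes that distress has exactly two indicators (major depression and generalized anxiety), an inference from the reported minimum, maximum, and median loadings rather than something stated outright in this section.

```python
def construct_replicability_h(loadings):
    """H index for one factor from its standardized loadings.

    H = S / (1 + S), where S = sum(l^2 / (1 - l^2)) over indicators.
    Because loadings enter squared, their sign does not affect H.
    """
    if not loadings:
        raise ValueError("need at least one loading")
    s = 0.0
    for loading in loadings:
        if abs(loading) >= 1:
            raise ValueError("standardized loadings must lie strictly between -1 and 1")
        s += loading ** 2 / (1 - loading ** 2)
    return s / (1 + s)


# Distress loadings as reported in the text (major depression, generalized anxiety),
# under the two-indicator assumption described above.
distress_loadings = {
    "three correlated factors": [0.64, 0.83],
    "bifactor, correlated specific factors": [0.35, 0.81],
    "bifactor, uncorrelated specific factors": [0.36, -0.34],
}

for model, loadings in distress_loadings.items():
    h = construct_replicability_h(loadings)
    status = "meets" if h >= 0.70 else "falls below"
    print(f"{model}: H = {h:.2f} ({status} the .70 benchmark)")
```

Under that assumption the computed values round to the H statistics reported for distress in the three models (.74, .67, and .22), which also shows how a single weak or negative loading pulls H down sharply when a factor has few indicators.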
Test 3: including a general factor should improve on the external validity of the correlated factors model
Figure 2 displays the variance explained in psychopathology symptom dimensions broken down by model. Readily apparent from inspection of this figure is that the three correlated factors model and its bifactor counterparts explained a comparable and, in most cases, equivalent amount of the variance in the psychopathology symptom dimensions or external criteria. There were two exceptions to this rule. The first was that bifactor models explained more of the variance in major depression (72% and 61% for the bifactor models with uncorrelated and correlated specific factors, respectively; there was no difference in explained variance between these two models) than the correlated factors model (28%). The second was that bifactor models explained more of the variance in proactive aggression (36% for the bifactor models with uncorrelated and correlated specific factors) than the correlated factors model (26%).
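The logic of this comparison is simple arithmetic: a general factor adds external validity only to the extent that the bifactor model's explained variance exceeds that of the correlated factors model for the same outcome. A minimal sketch, using the proactive aggression percentages reported above and a hypothetical helper name, follows.

```python
def incremental_variance(pct_bifactor: float, pct_correlated: float) -> float:
    """Percentage points of explained variance gained by the bifactor model
    over the correlated factors model for a single outcome."""
    for value in (pct_bifactor, pct_correlated):
        if not 0.0 <= value <= 100.0:
            raise ValueError("explained variance must be a percentage in [0, 100]")
    return pct_bifactor - pct_correlated


# Proactive aggression, as reported in the text: 36% (bifactor) vs. 26% (correlated factors).
gain = incremental_variance(36.0, 26.0)
print(f"Proactive aggression: {gain:.0f} percentage points of additional variance")
```

A gain near zero, as described for most outcomes, indicates that the bifactor model redistributes rather than adds explained variance; whether a nonzero gain is meaningful depends on the confidence intervals, which this sketch does not model.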

Fig. 2. Explained variance in psychopathology and external criteria by psychopathology model. Explained variance = R2. All error bars reflect 95% confidence intervals. 3CF = three correlated factors; Bi3CSF = bifactor model with three correlated specific factors; Bi3UNCSF = bifactor model with three uncorrelated specific factors; INA = inattention; HYP = hyperactivity; IMP = impulsivity; ODD = oppositional defiant disorder; CND = conduct disorder; DEP = major depression; GAD = generalized anxiety; PAN = panic disorder; SOC = social anxiety; SAD = separation anxiety; AGO = agoraphobia; OBS = obsessions; COM = compulsions; TIC = tics; SPH = specific phobia; EAS EMO = EAS Temperament Survey for Children emotionality; EAS ACT = EAS Temperament Survey for Children activity; EAS SOC = EAS Temperament Survey for Children sociability; EAS SHY = EAS Temperament Survey for Children shyness; APSD CU = Antisocial Process Screening Device callous-unemotional; APSD NARC = Antisocial Process Screening Device narcissism; APSD IMP = Antisocial Process Screening Device impulsivity; RPA PROACT = Reactive-Proactive Aggression Scale proactive aggression; RPA REACT = Reactive-Proactive Aggression Scale reactive aggression.
Discussion
Bifactor model representations of psychopathology have arguably defined the contemporary zeitgeist of clinical psychology (Caspi & Moffitt, 2018). This now considerable body of research has raised the provocative notion that one’s broad liability for psychopathology can be quantified by a single dimension referred to as the p factor. Most of this literature relies on bifactor models and privileges model fit statistics over other criteria in adjudicating among structural models of psychopathology and establishing their validity. Such a practice is increasingly questionable in light of evidence that model fit statistics are biased in favor of detecting bifactor models even when the true population model follows a more parsimonious correlated factors structure (e.g., Murray & Johnson, 2013).
A simple replication and extension of the structure of youth psychopathology using these practices would have reported the following: A bifactor model of psychopathology fit better than a correlated factors model; the p factor in children appears to reflect antagonism and distress; the p factor, externalizing, fears, and distress factors displayed divergent relations with external criteria; and the p factor was related to increased emotionality, antisociality, and proactive aggression. As we noted earlier, several others have reported findings to this effect, and the analyses we used to generate these findings were largely identical to those used in the existing literature. Nonetheless, on the basis of the findings from our riskier tests, we worry that such conclusions are misguided.
Summary and implications
We advanced several riskier tests to better adjudicate among structural models of psychopathology, including that (a) the general factor in a bifactor model should be well represented by its constituent indicators, (b) bifactor models of psychopathology should produce reliable specific factors that are well represented by their constituent indicators, and (c) including a general factor should improve on the external validity of a correlated factors model. Using our riskier tests, we compared the widely adopted correlated factors model with two bifactor models increasingly used in the literature, ones with either three uncorrelated or three correlated specific factors. In turn, we discuss each of our central findings and their implications for the psychopathology literature.
Test 1
Consistent with our hypotheses and the existing literature, we found that the general factor in both bifactor models was not well represented by psychopathology symptom dimensions and was instead overrepresented by distress and externalizing psychopathology and underrepresented by fears. This is broadly consistent with a handful of studies in the adult literature, including those demonstrating that depression and generalized anxiety are strong indicators of a general factor of psychopathology (e.g., Lahey et al., 2012) and others establishing neuroticism as a robust correlate of the general factor (e.g., Caspi et al., 2014). That oppositional defiant and conduct disorders also loaded substantially on our general factors developmentally extends these adult findings (see also Tackett, Lahey, et al., 2013; Waldman et al., 2011; Waldman, Rowe, Boylan, & Burke, 2018; but see Lahey et al., 2015, for inconsistent evidence) and corroborates the developmental literature in which negative affectivity and antagonism tend to intersect substantially in children (Tackett, Kushner, et al., 2013).
Moreover, our attempts to equate the contributions of all or some of the psychopathology symptom dimensions to the general factor were unsuccessful, calling into question the p factor as a single, general liability for psychopathology. Such findings are more consistent, rather, with the p factor reflecting an amalgam of differentially weighted and psychologically distinct sources of variance (Krueger et al., 2018; Reise, 2012). The existing literature suggests not only that the p factor differentially weights psychopathology but also that its defining contents vary considerably across studies (Levin-Aspenson, Watson, Clark, & Zimmerman, 2018). Even within our own data, wherein we tested two different bifactor models of psychopathology, the contents of the general factor varied considerably.
Some studies do find that the general factor in a bifactor model is defined by distress (Lahey et al., 2012, 2015; Olino et al., 2014; Waldman et al., 2016), but others find that the general factor is defined by externalizing (Castellanos-Ryan et al., 2016), thought disorder (Caspi et al., 2014), or even autism spectrum disorders (Martel, Pan, et al., 2017; Noordhof, Krueger, Ormel, Oldehinkel, & Hartman, 2015). These cross-study discrepancies have given rise to a dizzying array of interpretations of the p factor (Caspi & Moffitt, 2018) but are also potentially consistent with the bifactor model’s tendency to fit any data (Reise et al., 2016).
If our assertion is correct, cross-study comparisons of p factors appear prima facie specious. These comparisons are especially dubious when two studies do not include identical psychopathology measurement. Although it is unlikely that depression and anxiety will be omitted from studies of the structure of psychopathology, other forms of psychopathology (e.g., bipolar disorder, schizophrenia, autism spectrum disorders) are not assessed consistently in these studies because of their relatively low base rates in the population. One study that does not assess schizophrenia or bipolar disorder, for instance, might find that the general factor is saturated with distress or negative emotionality (Lahey et al., 2012), whereas a study that assesses thought problems might find that the general factor is saturated with thought problems in addition to distress (Caspi et al., 2014). We suspect that the general factor in a bifactor model pulls for pathology (i.e., maladaptivity) and thus will be most strongly represented by indicators that are ostensibly “most severe.” In addition, comparisons between youth and adult p factors are even more challenging given that diagnoses of certain forms of psychopathology are limited to childhood (e.g., conduct disorder, oppositional defiant disorder, autism spectrum disorders) and others to adulthood (e.g., personality disorders). We suspect, moreover, that the content of our general factors would differ had we included other forms of psychopathology given that general factors within bifactor models appear highly sensitive to their contents.
Test 2
We observed a general tendency for the bifactor models to yield less reliable specific factors. This was especially the case for the bifactor model with uncorrelated specific factors, in which reliability indices were often well below the threshold of acceptable reliability. In the bifactor model with correlated specific factors, reliability indices were generally still above the threshold for acceptable reliability but lower than those for the correlated factors model. These findings accord with the scant reliability indices reported for specific factors within bifactor models of psychopathology (Martel, Pan, et al., 2017). Moreover, they accord with reliability of specific factors from bifactor models of narrower clinical constructs, including those of attention-deficit/hyperactivity disorder (Arias, Ponce, & Núñez, 2018; Willoughby, Fabiano, Schatz, Vujnovic, & Morris, 2017), disgust (Olatunji, Ebesutani, & Reise, 2015), and obsessive-compulsive disorder (Olatunji, Ebesutani, & Abramowitz, 2017).
In addition, in our own data, specific factors tended to be underrepresented by psychopathology symptom dimensions that were strong indicators of the general factor, raising interpretability challenges in several cases. In the bifactor model with three correlated specific factors, for example, conduct disorder was not a robust indicator of externalizing, nor was major depressive disorder a robust indicator of distress (see also Lahey et al., 2012; Waldman et al., 2016). In the absence of these symptom dimensions, externalizing and distress in this model appear to reflect narrower forms of their constructs, namely, poor behavioral control and anxiety, respectively. In the bifactor model with uncorrelated specific factors, moreover, neither major depression nor generalized anxiety was a robust indicator of the distress factor, and the sign of the latter was negative. Such idiosyncrasies have been observed for specific factors in other bifactor models of psychopathology (e.g., Caspi et al., 2014; Martel, Pan, et al., 2017). Martel, Pan, and colleagues (2017) reported, for instance, that psychosis loaded slightly negatively (λ = –.10) onto the thought disorders factor in a bifactor model with orthogonal specific factors. At minimum, these findings indicate that specific factors should be interpreted cautiously given that their contents might not be consistent with their conceptualization and that researchers should pay close attention to what variance is extracted from a specific factor and allotted to the general factor.
Test 3
We found that our correlated factors and bifactor models of psychopathology explained a nearly identical amount of the variance in the first-order psychopathology dimensions and external criteria. The exception to this rule was that the bifactor models explained a significantly larger amount of the variance in depression and proactive aggression than did the correlated factors model, which is again consistent with our observation that our general factors were disproportionately saturated with antagonism, antisocial behavior, and distress. These findings bear important implications for the way the p factor’s validity is instantiated in much of the existing literature. Thus far, the psychopathology literature has proceeded by merely correlating the general factor in a bifactor model with theoretically relevant external criteria, with the following justification: “evidence that [the p factor] predicts objective, real-world life outcomes (e.g., suicide) suggests that it may be indexing something substantive, not merely something about how people behave while data are being collected” (Caspi & Moffitt, 2018, p. 835). In our view, simply demonstrating that the general factor is related to some external criterion is an insufficient test of the validity of the general factor, or the p factor for that matter, without also demonstrating that the same correlates are not observed for one or more of the factors within a more parsimonious correlated factors model. It appears from these data that there is little added value in introducing the general factor into structural models of psychopathology, at least when generated by means of bifactor models, inasmuch as bifactor models do not incrementally predict relevant external criteria over and above correlated factors models.
We argue for cross-model validity comparisons given that bifactor models of psychopathology appear in large part to redistribute variance described by a correlated factors model into a different number and set of components.
A cautionary note
An admittedly alluring quality of bifactor models is that we can study multiple levels of the psychopathology hierarchy in tandem. To wit, bifactor models of psychopathology might be viewed as conciliatory, striking a balance between lumping and splitting approaches in psychopathology classification efforts (Greven, 2005) given that their purpose is to identify the sources, correlates, and sequelae of both the general and specific factors of psychopathology. Broadening our focus to multiple levels of the hierarchy presumes, however, that our model produces reliable and interpretable factors at each level. We suspect that this assumption is not met consistently for bifactor models of psychopathology.
Somewhat unexpectedly, there was a general tendency for bifactor models to yield either a reliable general factor or reliable specific factors but not both. The bifactor model with correlated specific factors yielded a less reliable and interpretable general factor but more reliable and well-defined specific factors, whereas the opposite was true for the bifactor model with uncorrelated specific factors. In this way, bifactor models of psychopathology may be placing undue burden on the data (Bonifay, 2015; Bonifay et al., 2017; Reise et al., 2016), in turn compromising the integrity of either the general or specific factors.
All told, our results indicate that a bifactor model of psychopathology is difficult to interpret because the general factor is not equally represented by all forms of psychopathology (Krueger et al., 2018) and is instead defined by different disorders depending on what forms of psychopathology are included in the model (Levin-Aspenson et al., 2018). Moreover, bifactor models of psychopathology tend to yield specific factors of decreased reliability and interpretability. These specific factors often exhibited decreased reliability relative to the correlated factors model and often exhibited concerningly low reliability. Lastly, introducing a general factor into a correlated factors model explains little, if any, additional variance in its constituent indicators or its correlates but rather appears to redistribute the variance explained by the correlated factors model into a more complex and more heavily parameterized model. These difficulties should not arise if the p factor truly reflects a unitary mechanism common to all forms of psychopathology or if researchers wish to use bifactor models to probe multiple levels of the psychopathology hierarchy in their study (Caspi & Moffitt, 2018).
Limitations
For the purpose of demonstrating the implications of our riskier tests, we adopted an analytic approach that is consistent with those used in the existing literature. There are a number of additional limitations with our approach that were not addressed by our riskier tests. First, the present demonstration relied on single-rater psychopathology data, which are vulnerable to the influence of social desirability or other response sets. In this way, it is possible that socially desirable responding inflates covariation among externalizing, fear, and distress factors. We encourage future research to adopt a multirater approach to modeling the structure of psychopathology but acknowledge that attempts to do so will likely be met with challenges given modest convergence between self- and other-reported psychopathology, perhaps especially in youth (Achenbach, Krukowski, Dumenci, & Ivanova, 2005; Martel, Markon, & Smith, 2017). Second, we relied on data from a representative community sample of youth in the southeastern United States, in which there were relatively low rates of endorsement for psychopathology. This sample was also predominantly Caucasian. Thus, it is possible that these findings might not generalize to those of more clinically severe or culturally diverse samples. Third, we did not assess ostensibly more severe forms of psychopathology, including bipolar disorder, thought disorder, and autism spectrum disorder, given their low base rates of diagnosis in the population writ large. Although this exclusion is common practice in the psychopathology literature, especially the child psychopathology literature, studies excluding these forms of psychopathology from their structural models likely paint an incomplete picture.
Future directions
The riskier tests we generated are a starting place meant to generate increased scrutiny and discussion surrounding bifactor models of psychopathology. Moreover, although we emphasize the utility of these riskier tests for evaluating the validity of bifactor models of psychopathology, they can be applied to bifactor models of other constructs and used to adjudicate among structural models more generally. By no means are our riskier tests exhaustive. We encourage continued research comparing the performance of bifactor models with alternative structures of psychopathology.
Tested models could include but are not limited to correlated factors models and higher-order models. Other promising research has adopted alternative methods to yield hierarchical models of psychopathology. These include exploratory top-down approaches, such as “bass-ackwards” (or sequential principal components analysis; Forbes et al., 2017; Kim & Eaton, 2015), and bottom-up approaches, such as hierarchical agglomerative cluster analysis (Forbes et al., 2017). In addition, further research might extend existing simulation studies (Gignac, 2016; Greene et al., in press; Maydeu-Olivares & Coffman, 2006; Murray & Johnson, 2013) by examining the conditions under which we artificially identify bifactor models on the basis of preferential model fit as well as conditions under which we may report a more parsimonious correlated factors model when the population model indeed follows a bifactor structure (Greene et al., in press). Lastly, given that bifactor models in many cases appear to compromise the integrity of either the general factor or the specific factors in the model, researchers might seek simpler alternatives to generating patterns of correlates with psychopathology broadly construed or more specific aspects of psychopathology, such as externalizing and internalizing spectra (Kotov et al., 2017). Researchers might extract a single psychopathology factor if they are interested in correlates common to all forms of psychopathology (but see Smith, McCarthy, & Zapolski, 2009). In contrast, researchers might enter psychopathology factors derived by means of correlated factors models simultaneously into regressions if they are interested in correlates that are unique to narrower psychopathology spectra.
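To make the final suggestion concrete, the sketch below shows how simultaneous entry isolates unique associations in the simplest case of two correlated predictors. It uses the standard closed form for standardized regression weights, β1 = (r_y1 − r_y2 · r_12) / (1 − r_12²). The function name and the correlations are hypothetical values chosen for illustration, not estimates from this study.

```python
def standardized_betas(r_y1: float, r_y2: float, r_12: float):
    """Standardized weights for regressing an outcome on two correlated predictors.

    r_y1, r_y2: zero-order correlations of the outcome with predictors 1 and 2.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


# Hypothetical example: an external criterion correlates .50 with externalizing
# and .40 with distress, and the two factors correlate .60 with each other.
beta_ext, beta_dis = standardized_betas(0.50, 0.40, 0.60)
print(f"unique association with externalizing: {beta_ext:.2f}")
print(f"unique association with distress: {beta_dis:.2f}")
```

In this made-up case the criterion's association with distress shrinks considerably once externalizing is controlled, which is the kind of specificity information that simultaneous regression on correlated factors can provide without a general factor.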
Conclusion
A clear and accurate empirical structure of psychopathology is critical for scientific progress on the classification of and ultimately the etiology and treatment of mental illness. The prospect of rapid progress in psychiatric classification in a field otherwise marked by slow and steady progress is an exciting one. We must not, however, be fooled into scientific progress (Zachar, 2015) by pursuing the p factor without bearing in mind the limitations associated with models used to generate it, lest we pursue a p factor mirage.
To be clear, our results are not necessarily definitive, nor are our findings dispositive of an absence of the p factor. Indeed, there is considerable phenotypic covariation among forms of psychopathology (Markon & Krueger, 2006), and existing evidence further suggests that there are genetic, neuroanatomical, and otherwise biological underpinnings common to many forms of psychopathology (e.g., Beauchaine & Thayer, 2015; Neumann et al., 2016; Pettersson, Larsson, & Lichtenstein, 2016; Selzam, Coleman, Caspi, Moffitt, & Plomin, 2018). Moving forward, we encourage researchers to evaluate their alternative structural models of psychopathology using these riskier tests, especially before adopting bifactor models on the basis of preferential model fit and significant associations with external criteria. More broadly, on the basis of our findings and observations from the existing literature, adopting the p factor as the uppermost level of the psychopathology hierarchy (Conway et al., 2019), integrating it into structural models of allied constructs (i.e., personality, personality disorders, executive functioning; Oltmanns et al., 2018; Snyder et al., 2015), and focusing on its clinical implications (Conway et al., 2019; Meier & Meier, 2018) may be putting the proverbial cart before the horse.
Supplemental Material
Watts_Open_Practices_Disclosure – Supplemental material for Riskier Tests of the Validity of the Bifactor Model of Psychopathology
Supplemental material, Watts_Open_Practices_Disclosure for Riskier Tests of the Validity of the Bifactor Model of Psychopathology by Ashley L. Watts, Holly E. Poore and Irwin D. Waldman in Clinical Psychological Science
Supplemental material, Watts_Supplemental_Material for Riskier Tests of the Validity of the Bifactor Model of Psychopathology by Ashley L. Watts, Holly E. Poore and Irwin D. Waldman in Clinical Psychological Science
Footnotes
Acknowledgements
An earlier and limited version of this article was presented at the 2017 annual meeting of the Behavior Genetics Association and the 2018 annual meeting of the Society for Research in Psychopathology.
Action Editor
Christopher G. Beevers served as action editor for this article.
Author Contributions
A. L. Watts developed the study concept. All the authors contributed to study design. Data collection was performed by the laboratory of I. D. Waldman. A. L. Watts performed the data analysis. All of the authors contributed to interpretation. A. L. Watts drafted the manuscript, and H. E. Poore and I. D. Waldman provided critical revisions. All of the authors approved the final manuscript for submission.
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.
Open Practices
The data used in this study were collected from 1993 to 1997. Consequently, the data were not preregistered, and participants did not consent to sharing of individual-level data with the public. Queries regarding the data-analytic procedure and results can be directed to A. L. Watts. The complete Open Practices Disclosure for this article is available as supplemental material (Watts_Open_Practices_Disclosure).
