Abstract
Recent research has suggested that a range of psychological disorders may stem from a single underlying common factor, which has been dubbed the p-factor. This finding may spur a line of research in psychopathology very similar to the history of factor modeling in intelligence and, more recently, personality research, in which similar general factors have been proposed. We point out some of the risks of modeling and interpreting general factors, derived from the fields of intelligence and personality research. We argue that: (a) factor-analytic resolution, i.e., convergence of the literature on a particular factor structure, should not be expected in the presence of multiple highly similar models; and (b) the true underlying model may not be a factor model at all, because alternative explanations can account for the correlational structure of psychopathology.
In recent years, general factor models have been proposed to explain the correlational structure of psychopathological data. This line of research advocates the hypothesis that a wide range of symptoms that arise in psychopathology and personality disorders are determined by the pervasive influence of a single latent variable. In particular, Caspi et al. (2014; and replicated by Laceulle, Vollebergh, & Ormel, 2015) proposed that the correlational structure of psychopathology may be explained by a general factor that underlies a wide variety of disorders. Caspi et al. refer to this general factor as the p-factor or p. Sharp et al. (2015) found similar results in a wide set of personality disorders, and argued in a parallel fashion that a general factor may underlie personality pathology; in fact, Sharp et al. suggested that their general factor might be the same as the p identified by Caspi et al.
These research efforts are based on a methodology and line of reasoning that will sound familiar to researchers in the domains of intelligence and personality research, where models involving a general factor have long been proposed to explain the correlational structure of cognitive and personality tests, respectively (Musek, 2007; Rushton & Irwing, 2008; Spearman, 1904). In both fields, these factors (“g” and the “Big One” or General Factor of Personality [GFP], respectively) have been interpreted as reified causal entities that exist independent of the data and, by causing variation in their indicators, give rise to an observed correlational structure (Gould, 1996). This interpretation of general factors has led researchers to search for genetic or biological properties that may play such a causal role (Detterman, 2002; Musek, 2007), a search that is now beginning to take hold in psychopathology (Pettersson, Larsson, & Lichtenstein, 2016). So far, this search has yielded no clear candidates that might instantiate these general factors in the brain.
The goal of research investigating the structure of psychopathology is to come to understand why disorders arise, and thus how to prevent and cure them. Charting the structure of psychopathology is thus a vitally important endeavor, so it is crucial to point out misunderstandings that arise when general factors are over-interpreted. Although the present paper mainly focuses on the analyses performed by Caspi et al. (2014), the arguments apply to a broader line of research that started with Simms, Grös, Watson, and O’Hara (2008) and has included multiple research groups since then (Laceulle et al., 2015; Lahey et al., 2012; Patalay et al., 2015; Snyder, Young, & Hankin, 2016). All these researchers fit bi-factor models to psychopathology data to demonstrate support for a general factor. Here, we argue that: (a) factor-analytic resolution, that is, convergence of the literature on a particular factor structure, is unlikely to arise in the presence of multiple nearly-equivalent models; and (b) the true underlying model may not be a factor model at all, because alternative explanations can account for the correlational structure of psychopathology. We follow this argument with several suggestions for future research on the structure of psychopathology.
The problem of nearly-equivalent models
In advocating for a general factor, whether of psychopathology, personality, or intelligence, most research relies on comparing models that are nearly indistinguishable at a statistical level but radically different in terms of interpretation. As a result, small differences in data due to sampling variability can lead to very different theories on the structure of psychological traits and abilities.
Near-equivalence of bi-factor and higher-order models
The bi-factor model of psychopathology is a special case of a hierarchical factor model, in which all indicators load on a general factor, and several specific factors account for the remaining shared variance among subsets of items. Caspi et al. found that a bi-factor model and a correlated three-factor model “fit [their] data similarly well, with [the bi-factor model] offering a slightly more parsimonious solution” (Caspi et al., 2014, p. 126), leading them to conclude that the general factor, p, is a “single dimension that represents the tendency to experience psychiatric problems as persistent and comorbid” (Caspi et al., 2014, p. 131). The correlated three-factor model is equivalent to a higher-order factor model in which subsets of indicators load onto lower-order factors that, in turn, load onto a general factor. As such, the model comparisons reported in Caspi et al. are equivalent to those that would have resulted from comparing their bi-factor model with a higher-order model.
Model fit in structural equation modeling (SEM) is measured by assessing how similar the observed variance–covariance matrix is to the one implied by a model. Exactly equivalent models lead to the exact same variance–covariance matrix, so fitting two equivalent models to any sample dataset will result in the same fit. Equivalent models cannot be distinguished on statistical grounds, even though they may imply very different interpretations. Nearly equivalent models imply very similar variance–covariance matrices. While, in theory, these models can be distinguished statistically, the outcome is highly subject to sampling variability. For some nearly equivalent models, standard statistical fit indices such as the Akaike information criterion (AIC), Bayesian information criterion (BIC), and root mean square error of approximation (RMSEA) frequently fail to identify the true data-generating model. As we will show, the higher-order factor model and bi-factor model are nearly equivalent. Statistical comparisons of these two models lead to biased conclusions (Morgan, Hodge, Wells, & Watkins, 2015; Murray & Johnson, 2013).
Higher-order and hierarchical factor models imply very similar covariance matrices because they are mathematically closely related to each other. In fact, the higher-order factor model is equivalent to a special case of the hierarchical factor model (i.e., the class of higher-order factor models is nested in the class of hierarchical factor models, see Yung et al., 1999). Morgan et al. (2015) demonstrated an extensive overlap of fit values when comparing bi-factor models to higher-order factor models and correlated factor models. They even showed that when cases are selected from a population that was generated by a higher-order structure, approximate fit indices tend to incorrectly identify the bi-factor model as the best fitting model. Likewise, Murray and Johnson (2013) noted that studies of cognitive ability tend to find that bi-factor models fit better. They conducted a simulation study to evaluate that tendency, and found a substantial bias favoring the bi-factor model, even when the true structure underlying the data was a higher-order factor model. These and similar simulations have led multiple researchers to argue that, when comparing higher-order factor models and bi-factor models, the decision about which model to adopt should not rely on model fit (Bonifay, Lane, & Reise, 2016; Murray & Johnson, 2013).
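The nesting relation described above can be made concrete with a minimal sketch. The loadings below are hypothetical values chosen for illustration only, and the re-expression used is the Schmid-Leiman style transformation, which we assume here as one standard way to write a higher-order model in bi-factor form. The sketch shows that a bi-factor model whose general and specific loadings are proportional within each group of indicators implies exactly the same covariance matrix as the higher-order model.

```python
import math

# Hypothetical standardized parameters for a higher-order model:
# 3 lower-order factors with 3 indicators each.
factor_of = [0, 0, 0, 1, 1, 1, 2, 2, 2]                   # factor each indicator loads on
lam = [0.8, 0.7, 0.6, 0.75, 0.65, 0.7, 0.6, 0.8, 0.7]     # first-order loadings
gamma = [0.9, 0.7, 0.8]                                   # loadings of the factors on g


def higher_order_cov(i, j):
    """Implied covariance of indicators i and j under the higher-order model."""
    if i == j:
        return 1.0
    k, l = factor_of[i], factor_of[j]
    # Lower-order factors covary only through g; each has residual variance 1 - gamma^2.
    factor_cov = gamma[k] * gamma[l] + ((1 - gamma[k] ** 2) if k == l else 0.0)
    return lam[i] * lam[j] * factor_cov


# Re-expression as a bi-factor model with orthogonal general and specific factors.
general = [lam[i] * gamma[factor_of[i]] for i in range(9)]
specific = [lam[i] * math.sqrt(1 - gamma[factor_of[i]] ** 2) for i in range(9)]


def bifactor_cov(i, j):
    """Implied covariance of indicators i and j under the bi-factor model."""
    if i == j:
        return 1.0
    same_group = factor_of[i] == factor_of[j]
    return general[i] * general[j] + (specific[i] * specific[j] if same_group else 0.0)


max_diff = max(
    abs(higher_order_cov(i, j) - bifactor_cov(i, j))
    for i in range(9)
    for j in range(9)
)
print(max_diff)  # difference between the two implied covariance matrices
```

A freely estimated bi-factor model drops the proportionality between general and specific loadings, so it has more free parameters than the higher-order model while remaining able to reproduce any covariance matrix the higher-order model implies.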
Theoretical interpretation of bi-factor and higher-order models
While the higher-order and bi-factor model differ only slightly in fit, their theoretical implications are very different. Figure 1(a) shows a higher-order model, in which some general factor, “g” (e.g., the p-factor), explains the shared variance of the lower-order factors (e.g., Internalizing, Externalizing, and Thought disorder). The lower-order factors, in turn, represent the entirety of the variance that is shared among their indicators. Figure 1(b) shows the bi-factor model, in which the specific factors, Internalizing, Externalizing, and Thought disorder, are no longer caused by the general factor but in contrast are orthogonal to the general factor. In this model, the specific factors explain the shared variance among items that remains after the effect of the general factor is partialled out. Thus, the specific factors in the two models account for very different parts of the variance shared among indicators.

Figure 1. (a) Higher-order factor model, in which g represents a general factor; (b) bi-factor model, a specific kind of hierarchical factor model (Yung et al., 1999), in which g represents a general factor.
Despite this difference in the reference of the specific factors, researchers tend to give them the same labels in both models (e.g., “Internalizing,” “Externalizing,” and “Thought”), suggesting that these specific factors are mapped onto the same psychological constructs. When specific factors are given the same meaning, the general factor means something different as a result. In the bi-factor model, variance in p is hypothesized to be independent of Internalizing, Externalizing, and Thought disorders, whereas in the higher-order factor model, p is hypothesized to explain the correlations between Internalizing, Externalizing, and Thought disorders. Moreover, interpreting specific factors as referring to the same constructs in both models leads to different theories of how these constructs relate to each other, as well as to external variables. Thus, although both models contain a “general factor,” the posited mechanisms by which these respective factors relate to other factors in the model (and to external variables) are radically different.
Implications for factor-analytic resolution
In the preceding sections, we argued that: (a) the bi-factor and higher-order factor models cannot be reliably distinguished on the basis of statistical fit due to their near-equivalence; and (b) despite their statistical near-equivalence, their interpretations diverge strongly. Thus, small differences due to sampling variation can lead to completely different theories of what the general factor is. Because different models will fit best in different studies, it seems unlikely that the field will arrive at a consensus on which model best reflects the structure of psychopathology.
This lack of consensus is evident in the fields of intelligence and personality research. In the intelligence literature, many studies have been dedicated to finding the “true” structure of general intelligence. Various models have been proposed to explain the structure of correlations found in cognitive ability data. Some of these modeling efforts result in a single general intelligence factor (“g”; Gustafsson, 1984), others have several correlated higher-order factors (Flanagan & McGrew, 1998; Lansman, Donaldson, Hunt, & Yantis, 1982), and some feature a bi-factor model structure (Gignac & Watkins, 2013). Although all of these purport to have found the best model of intelligence, the literature has not converged upon a single best model. When the structure of intelligence is modeled as a higher-order model, g is conceptualized as a factor that is superordinate to the lower-order factors. By contrast, proponents of the bi-factor model for the structure of intelligence argue that g should be conceptualized as a breadth factor that is defined by a larger number of observed variables than the narrow group factors (Gignac, 2008).
In personality psychology, a similar line of research has been pursued; competing factor models ranging from five correlated factors to one general personality factor (“the Big One” or “GFP”), as well as variations on these models, have been purported to explain the correlations between personality test responses (e.g., Digman, 1997; McCrae & Costa, 1987; Musek, 2007; Rushton & Irwing, 2008). This literature has also not converged upon a single best model. Taken together, one should not expect factor-analytic resolution to arise when multiple similar models are fitted to correlational data, due to the highly similar covariance matrices such models imply.
Although the issue of equivalent and nearly equivalent models in latent variable models has been recognized for decades (Duncan, 1975; Raykov & Marcoulides, 2001), it is still not widely understood. As we noted above, every paper that has suggested a general factor of psychopathology has based its conclusions on a statistical comparison of nearly equivalent models. While the correlated factor and bi-factor models reflect very different theories on the structure of psychopathology, the models are statistically too similar to rely on model fit to decide which model to adopt.
The risk of affirming the consequent
Factor modeling allows researchers to test causal assumptions in the model based on the logic that if the model does not fit the data, some causal assumptions implied by the model must not hold (Bollen & Pearl, 2013). Including a general factor in the model implies the causal assumption that the shared variance among the items is due to a common cause. The logic of model testing has led many researchers to reify a general factor upon finding that a general factor model fits the data. In the following section, we argue that the existence of a general factor is not tested against the data because any dataset that features a positive manifold will necessarily support a general factor model, whether or not a general factor underlies the data.
The search for general factors
After the g-factor was proposed, many researchers tried to find a neural basis for general intelligence based on the belief that g is real and that, if it is real, it must have a biological basis (e.g., Detterman, 2002; Garlick, 2002; Gray & Thompson, 2004). Many cognitive abilities have been proposed to explain individual differences in g, including speed and efficiency of processing, working memory, and the capacity to deduce relationships (Detterman, 2002), as well as biological variables such as cortical thickness, the size of specific brain regions, or the overall grey matter in these brain regions (Deary, Penke, & Johnson, 2010). Data from twin and family studies have consistently provided support for the hypothesis that general intelligence is heritable and therefore must have a genetic component (Devlin, Daniels, & Roeder, 1997; Haworth et al., 2010; Jacobs et al., 2001). Alternative models that do not introduce a general factor to explain the covariation among cognitive abilities, however, do not preclude the finding that the shared variance among these cognitive abilities is correlated within twins. After all, as long as the underlying structure giving rise to these cognitive abilities is to some extent heritable, one will find that a general factor that comprises the shared variance among these cognitive abilities correlates within twins. Despite the consistent support for the heritability of general intelligence in twin studies, not a single gene has been identified to be reliably associated with general intelligence (Payton, 2009). In personality research, Musek (2007) introduced the Big One and proposed that genes responsible for biological systems may give rise to this general factor of personality. 
Since then, the Big One has been re-labeled as the General Factor of Personality (GFP) by Rushton, Bons, and Hur (2008), and a search for the genetic underpinnings of the GFP has followed, similar to the one in the g-factor literature (Loehlin & Martin, 2011; Rushton et al., 2008; Veselka, Schermer, Petrides, & Vernon, 2009). In neither field have researchers succeeded in empirically identifying the abstract factors of factor analysis anywhere other than in factor analysis itself.
As in the g-factor and GFP literatures, the nascent p-factor literature exhibits a similar tendency to reify the general factor. For example, Sharp et al. (2015) found support for a general factor underlying a small set of personality disorders, and they suggested this factor might be the same factor as the broader p-factor identified by Caspi et al. (2014). This suggestion only makes sense if one interprets p as an entity that is external to the statistical model: If these factors do not refer to entities outside the statistical model, then they cannot refer to the same entity. Caspi et al. conclude that p “reflects meaningful differences between persons on a single dimension that represents the tendency to experience psychiatric problems as persistent and comorbid” (Caspi et al., 2014, p. 131). They finish their paper by stating that “at a minimum, researchers should no longer assume a specific relation between the disorder they study and a biomarker/cause/consequence/treatment without empirical verification. Rather, [their] finding suggests the default assumption must be that biomarkers/causes/consequences/treatments relate first to p” (Caspi et al., 2014, p. 134). Clearly, the suggestion that p mediates effects of biological variables on specific disorders does not make sense if p is just the shared variance among the disorders.
As with g and the GFP, a search for a neural basis for p has begun in psychopathology (Lahey, Van Hulle, Singh, Waldman, & Rathouz, 2011; Pettersson, Anckarsäter, Gillberg, & Lichtenstein, 2013; Pettersson et al., 2016). As with g, twin studies suggest that p is heritable, but the argument made above for g applies here as well: whatever structure explains the correlations between disorders in psychopathology, to the extent that this structure is heritable, it will manifest as a correlation across twins.
General factor model relies on positive manifold
The previous section showed that the statistical support for a p-factor has resulted in speculation about what the external referent of p would be, and the search for a genetic basis of p. Finding a fitting general factor model, however, does not provide support for the existence of a general factor that goes beyond the observation of a positive manifold in the data. To see why this is the case, it is important to observe that any general factor model relies statistically on the existence of a positive manifold, and that there are alternative explanations for a positive manifold that do not rely on the logic of factor modeling. A positive manifold simply means that all variables are positively correlated with each other (or become positively correlated after variables are appropriately re-coded). Given any such positive correlation matrix, all factor loadings and all covariances between factors in a simple structure factor analysis model (i.e., one in which indicators only load onto one latent factor) will be positive (Krijnen, 2004). Because all correlations between factors are positive, the same step can be repeated to model these factors with higher-order factors, again resulting in only positive factor loadings and factor correlations; eventually, therefore, a single general factor will always be found at some “stratum” of the factor hierarchy (Carroll, 1993). In sum, it is a mathematical necessity that whenever there is a positive manifold, factor analysis will result in a general factor.
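A minimal numerical sketch can illustrate the spirit of this argument. It is not a factor analysis and does not reproduce the result of Krijnen (2004). As a stand-in for a general factor, it extracts the first principal component of a hypothetical correlation matrix with a positive manifold; by the Perron-Frobenius theorem, the dominant eigenvector of a matrix with strictly positive entries can be chosen with all weights positive, so every variable contributes in the same direction to this general dimension.

```python
import math

# Hypothetical correlation matrix with a positive manifold (all entries > 0).
R = [
    [1.00, 0.30, 0.20, 0.10],
    [0.30, 1.00, 0.25, 0.15],
    [0.20, 0.25, 1.00, 0.30],
    [0.10, 0.15, 0.30, 1.00],
]


def matvec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dominant_component(matrix, iterations=500):
    """Power iteration: returns the largest eigenvalue and its unit eigenvector."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(matrix, v)))
    return eigenvalue, v


eigenvalue, weights = dominant_component(R)
print(round(eigenvalue, 3), [round(w, 3) for w in weights])
```

The all-positive weights follow from the positivity of the correlations alone; nothing in the computation depends on how the correlations were generated.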
Explanations for the positive manifold
Now, if the data are caused at some level by a single underlying common cause, the variance–covariance matrix will feature a positive manifold. But the reverse argument is not valid: finding a positive manifold does not entail a common cause. There are many alternative explanations for the existence of a positive manifold, several of which have been proposed in the literature on general intelligence (e.g., Van der Maas et al., 2006). Because alternative explanations exist, finding that a general factor model fits does not provide additional evidence that a general factor exists, over and above the initial observation of a positive manifold.
One such alternative explanation for the positive intercorrelations between cognitive tests is sampling theory (Bartholomew, Deary, & Lawn, 2009; Thomson, 1950; Thorndike, 1927). In sampling theory it is hypothesized that cognitive tasks require the use of multiple independent components or elements of the mind (so-called bonds), such that each task is a multidimensional measure. Each cognitive test measures a group of these independent bonds, but the groups of bonds will overlap across tests. As a result, the cognitive tasks that are used to measure different cognitive abilities will be positively correlated because they draw on overlapping bonds, resulting in a positive manifold. Sampling theory is likely to explain at least some of the positive correlations among mental disorders, which show a clear pattern of overlap in the symptoms that are used to diagnose them. For example, Major Depressive Episode and Generalized Anxiety Disorder each feature insomnia, fatigue, concentration problems, and psychomotor agitation as diagnostic criteria (American Psychiatric Association, 2013). Such patterns of overlap are present throughout the realm of psychopathology. Unsurprisingly, these patterns of overlap explain part of the correlation structure between disorders; for example, Borsboom (2002) reported a correlation of .62 between the number of overlapping symptoms for any two disorders and the correlation found between them in empirical studies of comorbidity.
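The logic of sampling theory can be sketched in a few lines. The assignment of bonds to tests below is entirely hypothetical, and we assume for simplicity that each test score is the unweighted sum of independent, unit-variance bonds. Under that assumption the covariance between two tests equals the number of bonds they share, so overlapping tests correlate positively even though no common cause is present.

```python
import math
from itertools import combinations

# Hypothetical assignment of independent, unit-variance bonds to tests.
bonds = {
    "vocabulary": {0, 1, 2, 3, 4},
    "arithmetic": {3, 4, 5, 6, 7},
    "spatial": {0, 6, 7, 8, 9},
}


def correlation(test_a, test_b):
    """Correlation of two sum scores built from independent unit-variance bonds."""
    shared = len(bonds[test_a] & bonds[test_b])  # covariance = number of shared bonds
    return shared / math.sqrt(len(bonds[test_a]) * len(bonds[test_b]))


correlations = {pair: correlation(*pair) for pair in combinations(bonds, 2)}
for pair, r in correlations.items():
    print(pair, round(r, 2))
```

The same arithmetic applies if the sets are read as diagnostic criteria shared between disorders rather than bonds shared between tests.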
Van der Maas et al. (2006) offer a second alternative explanation to the positive manifold in cognitive test scores, based on the biological concept of mutualism. In this perspective, cognitive ability is modeled analogously to the way ecosystems of lakes are modeled in aquatic ecology. Here, researchers seek to explain why some lakes flourish better than others, with a wider variety of life and higher water quality. Measuring variables such as the variety of life and quality of water in multiple lakes will likely lead to a collection of positive correlations—a positive manifold; lakes that have better water quality will have a wider variety of life. Biologists do not model such systems as a common cause structure in which a single general factor of “lake health” is used to explain positive correlations, even though, as explained above, a general factor would no doubt emerge from such data. Instead, biologists rely on a more plausible explanation of such data: having high water quality allows for more variety of life, and more life will improve the quality of the water (Scheffer et al., 2009; Scheffer, Carpenter, Foley, Folke, & Walker, 2001). This concept is termed mutualism because it describes a system of mutually beneficial relationships between the causal factors that determine its dynamics. Cognitive systems could be viewed in a similar way; strength in one cognitive system (e.g., reading comprehension) improves one’s ability in another cognitive system (e.g., reasoning). Van der Maas et al. (2006) showed analytically as well as in a simulation study that such a system can lead to a positive manifold, even though no general factor is present.
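As a rough illustration of this idea, consider a simplified linear equilibrium sketch. This is our own assumption for illustration, not the model or simulation reported by Van der Maas et al. (2006). Each process i settles at a level x_i = K_i + sum_j M_ij x_j, where K_i is a person-specific resource that is independent across processes and M_ij is a hypothetical positive weight describing how much process i benefits from process j. Although the resources are uncorrelated and no general factor appears anywhere, the equilibrium levels are all positively correlated.

```python
import math

# Hypothetical positive interaction weights: M[i][j] is the benefit that
# process i receives from process j (row sums below 1 keep the system stable).
M = [
    [0.00, 0.20, 0.10, 0.05],
    [0.15, 0.00, 0.20, 0.10],
    [0.10, 0.15, 0.00, 0.20],
    [0.05, 0.10, 0.25, 0.00],
]
n = len(M)


def matmul(a, b):
    """Multiply two n-by-n matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


# Equilibrium x = K + M x implies x = A K, with A = I + M + M^2 + ... (Neumann series).
identity = [[float(i == j) for j in range(n)] for i in range(n)]
A = [row[:] for row in identity]
term = [row[:] for row in identity]
for _ in range(200):
    term = matmul(term, M)
    A = [[A[i][j] + term[i][j] for j in range(n)] for i in range(n)]

# The resources K are independent with unit variance, so Cov(x) = A A^T.
cov = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
corr = [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)] for i in range(n)]

for row in corr:
    print([round(r, 2) for r in row])
```

Because every entry of A is a sum of non-negative terms, every covariance is positive whenever the interaction weights are positive, which is the positive manifold arising from direct interactions alone.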
The analogy to cognitive systems can be extended to systems of psychopathology as well (Borsboom, 2008; Cramer, Waldorp, van der Maas, & Borsboom, 2010; Schmittmann et al., 2013). Rather than modeling symptoms as consequences of a latent common disease, symptoms can be seen to influence each other; someone might not be able to concentrate as well because he or she is worrying a lot, and that person might worry a lot because poor concentration led to problems at work (Borsboom & Cramer, 2013). The term mutualism may be less suitable for psychopathology, because many causal relationships might only be one-directional (insomnia leads to fatigue, but fatigue might not lead to insomnia), but the overall interpretation is that problems lead to more problems (e.g., insomnia → concentration problems), and problems hardly ever solve other problems (e.g., feeling depressed ↛ fewer panic attacks). These positive causal relations between symptoms may give rise to a positive manifold. Following this line of thought, psychopathological symptoms are modeled as active agents in networks of interacting components rather than passive indicators of latent variables.
In these networks, observed variables are typically represented by nodes that are connected by edges (links) when two variables are conditionally dependent given other nodes in the network. When the variables are normally distributed, these conditional dependencies are reflected in the partial correlations between the variables. A network of such partial correlation coefficients is termed a Gaussian graphical model (GGM), and it has been shown that any SEM can be characterized by an equivalent GGM (Epskamp, Rhemtulla, & Borsboom, 2017). Thus, for any bi-factor model (e.g., Figure 2(a)) there will be a statistically equivalent network model (e.g., Figure 2(b)) that explains the covariance matrix with direct relations rather than with a general factor. It is also possible to explain the positive manifold in psychopathology data with a model that includes both latent variables and direct relations between symptoms, but does not include a general factor. The Residual Network Model (RNM; Epskamp et al., 2017) is a model in which the residuals of a factor model form a GGM. Figure 2(c) shows an example of such an RNM. When the residual network is properly constrained (in this case, it would be expected to be low-rank), the RNM in Figure 2(c) can also be equivalent to the bi-factor model in Figure 2(a). As a result, all three figures can, under certain constraints of the parameter space, be completely equivalent while featuring strikingly different causal interpretations. Network theory therefore, just like sampling theory, provides an alternative explanation for the positive manifold without a general factor.
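A small sketch may help to show how a covariance matrix implied by a latent variable model can be re-expressed as a network. The loadings are hypothetical, and for brevity we use a single-factor model rather than the bi-factor model of Figure 2(a). The partial correlations are obtained from the inverse covariance (precision) matrix P as -P_ij / sqrt(P_ii P_jj), which is the standard GGM parameterization.

```python
import math

# Hypothetical one-factor model with standardized loadings;
# unique variances are 1 - loading^2, so every variance equals 1.
loadings = [0.8, 0.7, 0.6, 0.5]
n = len(loadings)
sigma = [[loadings[i] * loadings[j] if i != j else 1.0 for j in range(n)] for i in range(n)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    size = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(size)] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [value / scale for value in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


precision = invert(sigma)

# Partial correlation of variables i and j given all other variables.
partial = [
    [-precision[i][j] / math.sqrt(precision[i][i] * precision[j][j]) if i != j else 1.0
     for j in range(n)]
    for i in range(n)
]
for row in partial:
    print([round(p, 3) for p in row])
```

In this sketch every pair of variables has a positive partial correlation, so the same covariance matrix that was generated by a common factor can equally be drawn as a fully connected network of direct, positive relations.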

Figure 2. Three different models that can be equivalent to one another. (a) Bi-factor model; (b) network model; (c) Residual Network Model.
In sum, extracting a general factor does not adduce novel evidence for the existence of a general factor, if one already knows that the positive manifold holds—a finding that can be explained in multiple different ways. In the cases of Caspi et al. (2014) and Sharp et al. (2015), therefore, their model fitting exercises do not produce evidence for general factors that goes beyond the fact that we see a positive manifold in data on DSM disorders; and this fact has been known for many years (Krueger, 1999).
Conclusions
We discussed a recent line of research in which bi-factor models are fitted to psychopathology data to demonstrate support for p, the general factor of psychopathology (Caspi et al., 2014; Laceulle et al., 2015; Lahey et al., 2012). Because analogous modeling strategies have been followed for over a century in the context of general intelligence research, we discussed the g-factor literature as well as some of the literature on general factors in personality to point out some of the risks in interpreting a general factor model as identifying a common cause that explains the positive manifold. We focused our discussion not on the specific models found by the abovementioned authors, but rather on their more general—and arguably more important—finding: psychopathology, like cognitive ability and personality, features a positive manifold; correlations between symptoms are almost invariably positive.
As we have argued in this paper, such a positive manifold will allow some form of general factor model to fit the data by mathematical necessity. We emphasize that we do not regard the positive manifold as something trivial; the observation of a positive manifold is certainly a remarkable observation that has proven to be a robust finding in many areas within psychology. Not only the positive manifold, but also the observation that some variables within this positive manifold form clusters of more strongly correlated variables, are robust findings in many fields (e.g., responses on mathematical items correlate more strongly with each other than with responses on language items). Such phenomena mandate the search for an explanatory model, and in many cases the factor model is a very fruitful candidate. There are, however, alternative explanations for the emergence of a positive manifold, and factor analysis does not choose between these alternatives. Instead, factor analysis will always explain the positive manifold with a general factor. Therefore, we already knew, before Caspi and colleagues even started their research, that they would come up with a general factor as a matter of mathematical necessity. We suggest that scientists in the field of psychopathology do not rush into the p-factor but also carefully explore alternatives for the structure of psychopathology that do not rely on the logic of factor modeling.
Factor modeling is a powerful method for measuring constructs that are not directly observable; for example, factor analysis enables researchers to create a measurement model of constructs like working memory by observing the effects of working memory, such as performance on memory tasks. The problem we addressed in this paper is not the use of factor analysis for psychological data, but the risks of interpreting general factors whose existence as causal entities is more dubious. In such cases factor analysis is not used as a tool to measure unobserved constructs; instead, it is used as a method to discover some inscrutable variable that explains as much as possible. Our aim is to show that the same care should be taken in interpreting the general p-factor as is now being taken in intelligence (and, to some extent, personality) research.
To do so, model comparison based on fit measures can be supplemented with theory on how the general factor relates to external factors or specific factors within the model, and preferably experimental or quasi-experimental interventions to test such relations. Additionally, insight into the structure of psychopathology might benefit from the comparison between factor models and models different from factor modeling, instead of only comparing factor models within the paradigm of factor analysis. More generally, we propose to use all of the insights that can be gained with these different models (factor models, network models, or other models) and try to find ways to unify these different methods, instead of constraining ourselves to merely one subpopulation of models.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was partially funded by ERC Career Integration Grant no. 631145 held by Mijke Rhemtulla, European Research Council Consolidator Grant no. 647209 held by Denny Borsboom and NWO “research talent” Grant Number 406-11-066 held by Sacha Epskamp.
