Abstract
To date, no studies have examined a range of structural models of the interpersonally aversive traits tapped by the Short Dark Tetrad (SD4; narcissism, Machiavellianism, psychopathy, sadism), in conjunction with their measurement invariance (males vs. females) and how the models each predict external correlates. Using a large sample of young adults (N = 3,975), four latent variable models were compared in terms of fit, measurement invariance, and prediction of intrapersonal and interpersonal functioning. The models tested were as follows: (Model A) confirmatory factor analytic, (Model B) bifactor, (Model C) exploratory structural equation model, and (Model D) a reduced-item confirmatory factor analytic that maximized item information. All models accounted for item covariance with good precision, although they differed in incremental fit. Strong invariance held for all models, and each accounted similarly for the external correlates, highlighting differential predictive effects of the SD4 factors. The results provide support for four theoretically distinct but overlapping dark personality domains.
So-called dark trait research has focused on measures of Machiavellianism (MAC), subclinical narcissism (NAR), psychopathy (PSY), and more recently, sadism (SAD). This constellation of theoretically distinct, but empirically overlapping, trait domains continues to grow in popularity (for reviews, see Furnham et al., 2013; Muris et al., 2017; Schreiber & Marcus, 2020). The “Dark Triad” notion originated from Paulhus and Williams’s (2002) research on the most studied offensive traits in the subclinical literature (MAC, NAR, and PSY), and in part, on the longstanding idea that humans have a dark side. Two brief measures of this triad, the Short Dark Triad (SD3; Jones & Paulhus, 2014) and the Dirty Dozen (Jonason & Webster, 2010) have enjoyed widespread use, but neither includes a measure of SAD.
All four domains are captured by the Short Dark Tetrad (SD4)—which has already shown promise (Paulhus et al., 2020). Although there may be other dark components (Marcus & Zeigler-Hill, 2015), we limit our modeling to MAC, NAR, PSY, and SAD domains as defined by the SD4. All four satisfy the requirement of callous affect (Paulhus, 2014) and capture aversive interpersonal behavior (Neumann et al., 2020). 1
The current study sought to provide further validation of the SD4. First, we address the structure of the SD4 by examining a range of model types (confirmatory, bifactor, and exploratory). Next, we test for gender invariance to ensure that the SD4 subscales are meaningful for both men and women. Finally, we provide validity evidence for the SD4 factors in terms of external correlates that have relevance for understanding the four dark domains.
Modeling Dark Domains and Beyond
Structural models offer statistical representations of constructs but are only a starting point for construct validation: The models should also have utility for exploring both the constructs they represent and their nomological network. Latent variable models provide precise parameter estimates with common variance estimated separately from error variance (Brown & Moore, 2012). One limitation of most trait modeling is that the analyses are based on manifest variables (e.g., Muris et al., 2017; Vize, Collison, et al., 2018) and thus common variance is contaminated with method and other sources of error variance. This limitation often applies to dark trait research, where key model analyses focus on scale scores (e.g., Vize et al., 2020a; Watts et al., 2017). As with any measure, self-reports of dark traits reflect a mixture of “signal” (common trait variance) and “noise” (error). Hence, item-level latent models are advantageous because they provide sophisticated statistical information to untangle the sources of participants’ responses.
Theoretical Models
Latent variable models are used to test whether participant responses can be accounted for in terms of a given model structure (e.g., correlated dimensions), and thus the nature of a model may shed light on the larger construct. Item-level modeling has recently provided evidence for the original three dark factors, and now a fourth tapping SAD (Johnson et al., 2019; Paulhus et al., 2020; Plouffe et al., 2019). The theoretical question is how these four domains should be represented. A strict confirmatory factor analytic (CFA) model of the four dark domains involves specific item-to-factor relations representing each domain along with their factor correlations. This model proposes that the domains are interrelated but distinct.
On the other hand, meta-analytic research has led some investigators to highlight the overlap among the dark domains (Muris et al., 2017; Vize, Collison, et al., 2018). This overlap may call for a theoretical model structure which represents shared item variance across dark domains, and also, perhaps to a lesser extent, item variance specific to each domain. In this case, a bifactor model is useful (Reise et al., 2010): It partitions item variance in terms of a general factor (on which all items load) and residual specific factors (whose items also load on their respective content factor). The general factor represents dark trait overlap, and the specific factors represent separate dark factors devoid of shared variance. The different theoretical implications for these two models are considerable. The CFA model proposes four distinct but correlated domains, with each having unique etiologies and correlates, whereas the bifactor model proposes that the four domains are alternative indicators of a single dark factor (Moshagen et al., 2018).
Exploratory Models
Another potential theoretical structure is akin to viewing the dark domains as multidimensional alloys, with bits and pieces of a particular aversive trait mixing in with one another (e.g., some SAD items cross-loading onto a PSY domain and vice versa). Each dark domain is primarily represented by its specific items, but some traits from one domain can “blend” in with traits from another domain. In this case, to statistically represent this alloy structure, a hybrid model has been offered by Marsh et al. (2014): It offers the flexibility of exploratory factor analysis (items freely loading across factors) along with the model testing feature of CFA, referred to as an exploratory structural equation model (ESEM). This approach was used to represent SD3 structure (Jones & Paulhus, 2014). The ESEM is based on the idea that items are “fallible indicators” of constructs and therefore tend to have residual associations with other constructs (Marsh et al., 2014, p. 87). Examples of the three model types discussed are displayed in Figure 1.

Figure 1. Simplified examples of the strict CFA correlated factors Models A and D, the bifactor Model B, and the exploratory structural equation (ESEM) Model C.
Comparing the Models
Each of these models (CFA, bifactor, ESEM) has advantages and disadvantages. The CFA model provides evidence of unambiguous unidimensional factors and often yields stronger loadings compared with other models (Marsh et al., 2010; Roy et al., 2020), and thus better discrimination parameters (Reise, 1999). But to the extent that there are nontrivial item cross-loadings not represented in the CFA model, the factor correlations can be inflated (Marsh et al., 2014). As noted, bifactor models have the advantage of representing items in terms of a broad general factor and potentially meaningful specific factors. They are also relatively easy to fit, given they require many parameters. Yet recent research suggests a number of disadvantages of the bifactor model. In particular, there are suboptimal factor loadings (Roy et al., 2020), questionable incorporation of implausible response patterns (Reise et al., 2016), doubts about accurate representation of underlying construct processes (Bonifay et al., 2017), and the inability to compute orthogonal manifest variable scale composites. Moreover, the bifactor model can produce peculiar results (Moshagen et al., 2018; Watts et al., 2019).
The ESEM has the advantage of testing a specific factor structure (e.g., four correlated factors), while also allowing item cross-loadings which can reduce factor correlations (Marsh et al., 2014). At the same time, item cross-loadings create ambiguity regarding factor interpretation, as well as a quandary as to how to form manifest variable composite scores (e.g., shared items across scales). Clearly, the CFA model is the most straightforward model for representing the structure of dark traits, and one that corresponds to how investigators employ manifest variable composite scores. The CFA model also uses the fewest parameters to account for the trait data (i.e., a risky test), whereas the bifactor and ESEM models require far more parameters.
In Dark Triad research, variants of a bifactor model have shown acceptable fit (McLarnon & Tarraf, 2017; Moshagen et al., 2018; Persson et al., 2019); however, the general factor loadings were rather low (e.g., mean loadings .24-.54), thus providing limited discrimination (Roy et al., 2020). One recent study found support for a strict correlated factors (CFA) model and noted problems with the bifactor model (Chiorri et al., 2019). Other research has also found acceptable fit for the strict CFA model (Moshagen et al., 2018). Most relevant, development of the SD3 was supported by a good fitting ESEM (Jones & Paulhus, 2014).
External Correlates
Beyond fit, a network of correlates can also shed light on the nature of dark domains. If each of the four domains predicts correlates similarly, then the broad dark factor from the bifactor model is sufficient. On the other hand, if each domain predicts correlates in a unique fashion, then it may be wise to employ intercorrelated but distinct dark variables.
To date, dark trait modeling studies have been limited in the prediction of external correlates. Investigators have either (a) used manifest variables to study how dark domains are associated with various correlates after the latent variable modeling was conducted (Persson et al., 2019), or (b) if latent variable approaches were used to examine regression effects between the dark factors and an external correlate (McLarnon & Tarraf, 2017), investigators have not assessed the bivariate relations between the dark traits and the correlates to check for potential “perils of partialling” (Vize, Collison, et al., 2018; Vize et al., 2020b). Thus, it is critical to check the concordance between bivariate correlations and the regression parameters.
Studies have examined links between dark traits and self-report correlates (Muris et al., 2017; Vize, Collison, et al., 2018), primarily those reflecting pathological correlates such as aggression and substance use (Vize, Lynam, et al., 2018). However, to understand more fully how dark traits impact self and others, a broader scope of external correlates would be helpful (Neumann et al., 2020). Hence, we assessed an array of correlates to capture how dark traits are linked with intrapersonal and interpersonal functioning. They involved aspects of evolutionary concerns (sex, family), as well as indicators of intrapersonal (self-esteem, depression, self-harm) and interpersonal (likeability, tattoos) adjustment. We employed single items for these relevant external correlates to minimize the volume of items participants were required to complete, given our goal to collect a very large sample. Of course, reliance on single items can affect the magnitude of any associations, but associations that prove significant may encourage future research.
Sexuality
All of the Dark Triad traits have been associated with frequent sexual encounters (Jonason et al., 2009; Jones & de Roos, 2017), and each is associated with a higher sex drive (Baughman et al., 2014). Yet the nature of the reproductive advantage differs across the dark traits. Recent research also suggests SAD is associated with hypersexuality and increased sexual desire (Castellini et al., 2018). In fact, among the tetrad traits, SAD may have the strongest links with sexual desire (Paulhus et al., 2020).
Family Closeness
A human motivation often contrasted with sex drive is family focus (Ko et al., 2020). An individual difference variable that contrasts these motivations is labeled life history strategy (LHS), and a central indicator of LHS is closeness with family members (Figueredo et al., 2005). There is evidence that PSY is associated with fast LHS (Jonason et al., 2010; Neumann et al., 2012). By contrast, aspects of MAC are associated with a slow LHS (Jones & de Roos, 2017), as are grandiose NAR traits (McDonald et al., 2012). Thus, PSY (−), MAC (+), and NAR (+) should have distinctive links with closeness to family.
Intrapersonal Adjustment
Research suggests subclinical dark traits may not be robustly linked to intrapersonal maladjustment, such as depression or poor self-esteem (Jonason & Webster, 2010; Paulhus et al., 2020). Yet PSY in nonclinical samples is linked to suicide (Coid et al., 2009) and negative affect (Garofalo et al., 2019). Self-harm is linked to both depression and poor self-esteem (Klonsky et al., 2018). Hence, we included items tapping depression, self-esteem, and self-harm as indicators of intrapersonal adjustment.
Self-Esteem
Grandiose NAR is associated with positive self-esteem (Miller et al., 2017). Thus, SD4 NAR should show a positive association with self-esteem. Other research shows minimal associations of self-esteem with PSY, MAC, and SAD (Paulhus et al., 2020).
Depression
Emerging research finds PSY has positive links with negative affect (Colins et al., 2017; Garofalo et al., 2019, though see Paulhus et al., 2020). In contrast, grandiose NAR is negatively associated with depression (Miller et al., 2017).
Self-Harm
Deliberate self-harm takes a number of forms but cutting and burning are most prominent in the literature. Some theorists assume self-harm is an indicator of depression and a precursor to suicide, but that is not always the case (Hawton et al., 2003). In subclinical samples, self-harm has recently been construed as a (temporary) coping mechanism (Klonsky et al., 2018). Given that NAR is inversely associated with depression, it is reasonable to expect a negative association with self-harm, but PSY should be positively associated with self-harm. To date, little can be said regarding MAC, SAD, and self-harm (Lämmle et al., 2014).
Interpersonal Adjustment
Difficulty relating to others is indicative of interpersonal maladjustment, and studies show that those with dark traits are disliked by their peers (Rauthmann & Kolar, 2013). A defining feature of the four dark domains is callousness, which may contribute to manipulation of others. Narcissistic individuals make good first impressions, but this fades over time (Paulhus, 1998). Still, the rosy self-view of narcissists should lead them to report that others like and admire them. Given that PSY and SAD are more malevolent than NAR and MAC, we predicted the former pair would be robustly associated with antipathy. Research comparing all four dark traits is limited, yet it appears those high on SAD are disliked the most (Rogers et al., 2018). Thus, we employed a “People don’t like me” item as an interpersonal correlate. We also employed a “tattoo/piercings” item, since extensive body markings have been linked with dark traits (Nathanson et al., 2006), and can have an effect on how one presents to others.
Invariance Across Gender
Finally, we believe that there is insufficient attention to measurement invariance across gender. Different predictive effects as a function of gender could provide theoretical insights into the nature of dark personality. However, for this to be meaningful, it is critical to show that dark traits reflect the same thing for males and females (i.e., strong measurement invariance).
There is considerable interest in understanding why men and women differ in terms of aggregate dark trait scales (Jonason et al., 2009; Jonason et al., 2013; Jones & Olderbak, 2014; Muris et al., 2017; Szabó & Jones, 2019). Research has found that there are gender differences in how a Dark Triad trait is expressed with outcomes such as aggression (Dinić & Wertag, 2018) and infidelity and impulsivity (Szabó & Jones, 2019). Although there are theoretical reasons for these differences, in order to make such comparisons with confidence, it is necessary to establish item-level measurement invariance (Chiorri et al., 2019). Recent dark trait research indicates item invariance across gender, at least for the Dirty Dozen (Chiorri et al., 2019), MAC (Collison et al., 2020), and PSY traits (Neumann et al., 2012; Walsh et al., 2019).
Summary of Our Goals
A central aim was to examine fit for different SD4 structural models. Of equal importance was to understand the utility a given model had in accounting for correlates of dark traits. Dark trait modeling studies have not compared how different latent models perform in accounting for external correlates. Modeling research in other areas suggests this may be a productive means for comparing models (Watts et al., 2019), and may ultimately facilitate a better understanding of dark traits. Finally, evidence of measurement invariance is necessary for viable comparisons between men and women. Thus, we evaluated measurement invariance as well as possible differences in the predictive effects of dark traits across gender (Walsh et al., 2018).
Method
Participants
A large sample (N = 3,975) of undergraduates (women = 65%; men = 32%; reported as other = 3%) volunteered for the current study. Mean age was 20.2 (SD = 2.95; Range = 18-61). The most common ethnicities were European heritage (46%), East Asian (40%), South Asian (11%), and other (5%). Data collection was done online via Qualtrics with individuals receiving a .5 bonus on their final grade.
Measures
Items were based on the recently developed SD4 (Paulhus et al., 2020). There are seven items per SD4 subscale to capture MAC and subclinical NAR, PSY, and SAD. All items were formatted as 5-point Likert-type scales with anchors 1 (not at all) and 5 (very much). To preclude keying factors, no reversals were included. The potential threat of acquiescence was addressed by showing that controlling for acquiescence made no difference in the factor structure (Paulhus et al., 2020). See Table 2 for brief item descriptors.
External Correlates
Seven questions were used as external correlates to assess how well they were predicted by SD4 factors from each model. They were chosen to reflect a wide variety of lifestyle issues with implications for everyday adjustment. The items were as follows: (a) I am close to my family, (b) I have high self-esteem, (c) Many people dislike me, (d) My sex drive is pretty high, (e) I have purposely cut or burned myself, (f) I sometimes get depressed, and (g) I have tattoos/piercings. All items were endorsed on a 5-point scale (1 = strongly disagree to 5 = strongly agree). The proportion of participants with elevated responses (agree/strongly agree) differed across males and females: close to family (M = 72%, F = 77%), self-esteem (M = 43%, F = 26%), people dislike me (M = 7%, F = 4%), sex drive (M = 43%, F = 27%), cut/burn (M = 5%, F = 12%), depressed (M = 47%, F = 55%), and tattoos/piercings (M = 12%, F = 54%), with significant chi-squares for each comparison (ps < .001).
Data Analytic Plan
All modeling was carried out via Mplus (Muthén & Muthén, 2012), using robust weighted least squares estimation. Three model types (strict CFA, bifactor, ESEM) were tested to examine how well they accounted for the 28 SD4 items. First, we tested a strict CFA Model A with the SD4 items set to load only on their respective factors. This model requires 62 estimated parameters (28 loadings, 28 error variances, 6 factor correlations; factor variances fixed to 1 for identification). Next, we tested the bifactor Model B, which requires 84 estimated parameters (28 loadings on a general factor, a combined 28 loadings for the seven items on each of the four specific factors, 28 error variances, zero factor correlations). The ESEM Model C requires 146 parameters (i.e., 28 loadings × four ESEM factors, 28 error variances, six factor correlations). Clearly, the models differ in the number of estimated parameters required to account for the 406 SD4 data points (p[p + 1]/2 variances/covariances of 28 items).
We also developed a mini-model (Model D), also entailing a four correlated factors CFA, but reduced to 12 items: It exploited the parametric information provided by the three previous models, acknowledging that each one has specific advantages. We selected three items per SD4 domain (for identification purposes): Each trio had demonstrated stronger loadings relative to other items within its domain, relatively lower general factor loadings, and limited cross-loadings. The items that met these specifications are shown for Model D in Table 2. This model requires 30 estimated parameters to account for 78 variances/covariances.
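The parameter tallies reported for Models A through D can be checked with simple arithmetic. The following is a minimal sketch that mirrors the counting scheme described above (loadings, error variances, factor correlations); the function names are illustrative and are not part of Mplus or any other package.

```python
def data_points(n_items: int) -> int:
    """Unique variances/covariances among n_items: p(p + 1) / 2."""
    return n_items * (n_items + 1) // 2


def factor_correlations(n_factors: int) -> int:
    """Number of distinct correlations among n_factors."""
    return n_factors * (n_factors - 1) // 2


def cfa_params(n_items: int, n_factors: int) -> int:
    # One loading and one error variance per item, plus factor correlations
    # (factor variances are fixed to 1 for identification).
    return n_items + n_items + factor_correlations(n_factors)


def bifactor_params(n_items: int) -> int:
    # General-factor loading, specific-factor loading, and error variance
    # per item; all factors are orthogonal, so no factor correlations.
    return 3 * n_items


def esem_params(n_items: int, n_factors: int) -> int:
    # Every item loads on every factor, plus error variances and
    # factor correlations.
    return n_items * n_factors + n_items + factor_correlations(n_factors)


counts = {
    "A": cfa_params(28, 4),
    "B": bifactor_params(28),
    "C": esem_params(28, 4),
    "D": cfa_params(12, 4),
}
print(counts, data_points(28), data_points(12))
```

Under this counting scheme the tallies agree with the text: 62, 84, 146, and 30 parameters for Models A, B, C, and D, against 406 data points for the 28-item models and 78 for the 12-item model.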
To test for measurement invariance, a series of multiple-group (MG) model analyses were conducted with loadings and threshold parameters constrained across gender (i.e., scalar invariance). The invariance model was statistically compared with a configural model (free loadings, thresholds). Based on previous research (Chiorri et al., 2019; Neumann et al., 2012), we expected evidence of measurement invariance of dark traits across men and women.
The same invariance approach (SD4 and correlate items) was employed to run a series of MG-SEMs, regressing the external correlates onto the dark factors for each model. To address potential partialling issues (Sleep et al., 2017), we also generated latent bivariate correlations of the SD4 factors with the external correlates and compared these with the structural beta parameters from the structural equation models (SEMs). In line with Sleep et al. (2017), we used intraclass correlations (ICCs) to quantify the concordance between the bivariate and SEM parameters.
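To make the concordance check concrete, the sketch below computes a two-way, single-measure, absolute-agreement ICC for paired values, treating the bivariate correlation and the SEM beta as two "raters" of the same factor-correlate pairing. This particular ICC form is an assumption made for illustration (the text does not specify which variant was used), and the paired values are hypothetical rather than taken from the study's tables.

```python
def icc_absolute_agreement(pairs):
    """Two-way, single-measure, absolute-agreement ICC for paired values.

    pairs: list of (bivariate_r, sem_beta) tuples, one per factor-correlate
    pairing. Assumed ICC form; not necessarily the one used in the study.
    """
    n = len(pairs)   # number of pairings (rows)
    k = 2            # two "raters": bivariate r and SEM beta
    grand = sum(a + b for a, b in pairs) / (n * k)
    row_means = [(a + b) / k for a, b in pairs]
    col_means = [sum(p[j] for p in pairs) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for p in pairs for v in p)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical (r, beta) pairs, for illustration only.
example_pairs = [(0.30, 0.25), (-0.20, -0.28), (0.10, 0.05), (0.45, 0.40)]
print(icc_absolute_agreement(example_pairs))
```

Because this form penalizes systematic shifts as well as rank-order disagreement, a constant offset between correlations and betas lowers the coefficient even when the two sets of values are perfectly correlated.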
To assess model fit, we followed a standard two-index strategy (Hu & Bentler, 1999), using the incremental comparative fit index (CFI) and an absolute fit index, the root mean square error of approximation (RMSEA). The traditional CFI ≥ .90 and RMSEA ≤ .08 were considered indicative of acceptable model fit to avoid falsely rejecting viable latent variable models. 2 In terms of comparing the invariance and configural models, we did not rely on chi-square difference tests since large samples can produce significant values even when the discrepancies between two models are trivial: West et al. (2012) suggest using guidelines laid out by Cheung and Rensvold (2002) to assess statistical differences in model fit. If the change in the CFI (ΔCFI) between one model and a nested, more constrained, model is ≤ .01, then the two models do not differ in statistical fit. A ΔRMSEA of .015 or less has also been used.
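The decision rules above can be written as two small checks. This is a sketch only: the function names and the fit values passed to them are hypothetical, and requiring both the CFI and RMSEA change criteria to hold is an assumption of this example, since the text treats the change in CFI as the primary guideline.

```python
def acceptable_fit(cfi, rmsea, cfi_min=0.90, rmsea_max=0.08):
    """Two-index rule: CFI at or above .90 and RMSEA at or below .08."""
    return cfi >= cfi_min and rmsea <= rmsea_max


def invariance_supported(cfi_free, cfi_constrained,
                         rmsea_free, rmsea_constrained,
                         max_cfi_drop=0.01, max_rmsea_rise=0.015):
    """Compare a configural (free) model with a constrained model.

    Assumption: both change criteria must hold. A small tolerance guards
    against floating-point noise at the cutoffs.
    """
    eps = 1e-9
    cfi_drop = cfi_free - cfi_constrained
    rmsea_rise = rmsea_constrained - rmsea_free
    return cfi_drop <= max_cfi_drop + eps and rmsea_rise <= max_rmsea_rise + eps


# Hypothetical fit values, for illustration only.
print(acceptable_fit(cfi=0.92, rmsea=0.06))
print(invariance_supported(0.950, 0.945, 0.050, 0.052))
```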
Finally, multivariate analyses of variance were conducted to examine possible sex differences in levels of SD4 traits. Muris et al. (2017) found that men and women did not differ in MAC or NAR after controlling for their shared variance with PSY. Following suit, we evaluated gender differences in terms of raw SD4 trait scale scores and also whether the genders continued to differ on a particular trait scale when using the other three scales as covariates.
Results
Descriptives and Gender Effects
The SD4 means and standard deviations are graphically presented in Figure 2 (Panel A) for the total sample, and men and women separately. Supplementary Table 1 (available online) also provides descriptive information. Men reported significantly higher dark traits for all SD4 domains, with the largest effect sizes evident for the PSY and SAD domains, MAC: F(1, 3918) = 67.59, p < .001, η² = .02; NAR: F(1, 3913) = 90.19, p < .001, η² = .02; PSY: F(1, 3907) = 183.98, p < .001, η² = .05; SAD: F(1, 3913) = 969.44, p < .001, η² = .20. The multivariate analysis of variance results with covariates revealed men continued to score higher than women for all dark traits, except MAC.
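For readers who wish to relate the reported F statistics to the effect sizes, eta squared for a single-factor comparison can be recovered from F and its degrees of freedom. The sketch below assumes a simple one-way design, so it only approximates values that were estimated within a multivariate analysis; the function name is illustrative.

```python
def eta_squared_from_f(f_value, df_effect, df_error):
    """Eta squared for a one-way design: (df1 * F) / (df1 * F + df2)."""
    return (df_effect * f_value) / (df_effect * f_value + df_error)


# F statistics and degrees of freedom as reported in the text.
reported = {
    "MAC": (67.59, 1, 3918),
    "NAR": (90.19, 1, 3913),
    "PSY": (183.98, 1, 3907),
    "SAD": (969.44, 1, 3913),
}
for trait, (f_value, df1, df2) in reported.items():
    print(trait, eta_squared_from_f(f_value, df1, df2))
```

Under this assumption the recovered values sit close to the reported effect sizes (about .02 for MAC and NAR and about .20 for SAD), with PSY working out to roughly .045.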

Figure 2. (A) Total and subscale scores with standard deviations for men, women, and total sample.
In terms of the number of cases with elevated SD4 scores, 11.2% had a mean item total score of 3.1 or higher, indicating some positive endorsement of aversive personality features. Also, as would be expected, Figure 2 (Panel B) shows there were proportionally more men (25%) compared with women (6%) with elevated SD4 endorsement, χ²(1) = 298.94, p < .001.
Modeling Results
As shown in Table 1, all models (Models A, B, C, and D) were able to reproduce the observed data with acceptable levels of precision (RMSEAs = .04-.08). In terms of incremental fit, the 12-item correlated factors (mini) model (Model D) performed best, and the ESEM (Model C) also showed adequate fit. 3 The bifactor model (Model B) fell just short of acceptable incremental fit. The 28-item correlated factors model (Model A) had the least acceptable incremental model fit, though this is the riskiest model to test statistically, given that it uses the fewest estimated parameters.
Table 1. Model Fit Results for Confirmatory Factor Analytic (CFA), Bifactor, and Exploratory Structural Equation Models (ESEM): Overall Fit and Invariance Across Gender.
Note. Data points refer to p(p + 1)/2 item variances/covariances (28 × [28 + 1]/2) to be modeled. CFI = comparative fit index; RMSEA = root mean square error of approximation.
As shown in the bottom half of Table 1, the invariance analyses indicated that the parameters (loadings, thresholds) could be held equivalent across men and women without any substantive change in model fit, compared with the configural model. Thus, irrespective of model type, the SD4 items evidenced measurement invariance across gender.
Table 2 displays the standardized factor loadings by model type. At the bottom of Table 2 are overall mean factor loadings. The strict CFA correlated factors Models A and D had the highest loadings (discrimination parameters). Similarly, when looking at average loadings within each factor, Models A and D also had the highest averages. The pattern of factor loadings for the bifactor model indicated that the general factor strongly reflected PSY and SAD item content. In contrast, the loadings for the NAR and MAC items were the strongest on their respective specific factor in the bifactor model, though two MAC items (M2, M7) and one NAR item (N11) had modest representation on the general factor. With respect to the ESEM model, the strength of loadings largely followed the strict CFA models. In addition, the vast majority of item cross-loadings (~80%) were trivial. There were, however, a few cross-loadings that may have some substantive interpretation. In particular, two SAD items (S27, S33) and one NAR item (N12) cross-loaded on the PSY factor at values at or above .30.
Table 2. Standardized Factor Loadings by Model Type.
Note. All loadings greater than or equal to .35 are highlighted in bold font. Loadings greater than or equal to .04 are statistically significant, ps < .05 to .001.
For the ESEM model, within factor averages pertain only to the items loading prominently on a select factor. ESEM = exploratory structural equation model; CFA = confirmatory factor analytic.
In terms of latent factor associations (see Table 3), the CFA (Models A, D) and ESEM (Model C) models had moderate-to-strong correlations between the PSY and SAD factors. 4 Note that these values correspond to disattenuated versions of the raw subscale intercorrelations. The 28-item Model A also showed a moderate association between the MAC and PSY factors. Still, for the most part, across model types, most of the factor associations were relatively modest or low-moderate in strength, thus limiting potential multicollinearity. Also, intercorrelations among the raw subscales are always lower than those among latent factors.
Table 3. Latent Factor Correlations by Model Type.
Note. These latent correlations correspond to disattenuated correlations among the subscales. The Bifactor model by definition specifies orthogonal factor associations. CFA = confirmatory factor analytic.
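The link between raw subscale correlations and their disattenuated counterparts can be illustrated with the classical correction for attenuation. This is a simplified analogue, not the procedure used by the latent variable models (which estimate factor correlations directly), and the correlation and reliability values below are hypothetical.

```python
import math


def disattenuate(r_observed, reliability_x, reliability_y):
    """Classical correction for attenuation: r / sqrt(rxx * ryy)."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical raw subscale correlation and scale reliabilities.
print(disattenuate(r_observed=0.40, reliability_x=0.75, reliability_y=0.80))
```

Because reliabilities fall below 1, the corrected value exceeds the raw correlation in absolute size, which is the same direction of difference noted for latent versus raw subscale intercorrelations.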
The SEM results by model type and gender are displayed in Table 4; those for variance accounted for are reported in Table 5. The pattern of results revealed considerable uniformity in the regression effects across model type. There were clear differential effects for the NAR (−) and PSY (+) factors in predicting several external correlates (likability, self-harm, and depression), and opposite ± differential effects for family closeness. Moreover, these results held up across gender. Also significant were NAR and SAD in positively predicting “high sex drive” similarly across gender. The SEM results also identified some differences across gender, and these are covered in the discussion. The results also revealed that the general factor from the bifactor model was a significant predictor in most SEMs, and also that the reduced 12-item Model D frequently had the strongest factor predictors compared with the other models; these results held up across gender.
Table 4. Structural Equation Modeling Results by Model Type for Males and Females.
Note. Structural equation model beta parameters ≥.20 are in bold. All ps < .05 to .001 unless otherwise indicated. MAC = Machiavellianism; NAR = narcissism; PSY = psychopathy; SAD = sadism.
Table 5. External Correlate Variance Accounted for by Model Type.
In dark trait modeling research by Sleep et al. (2017), suppression effects between a dark trait and an external correlate were considered substantive when there was a difference between the raw bivariate and regression parameters of .30 or greater, or there was a substantial change in direction of association. Table 6 provides the raw bivariate latent associations between the dark factors and the criterion variables, which can be compared with the SEM results (Table 4). There were few instances of such suppression effects. The ICCs used to compare absolute agreement between the latent correlations and regression parameters were high across models for both men (ICCs = .93-.98) and women (ICCs = .92-.96), indicative of strong correspondence.
Table 6. Correlations Between Dark Factors and External Correlates by Model Type for Males and Females.
Note. Bold correlations reflect substantive differences between structural equation model beta and bivariate parameters. All correlations of .08 or greater are significant, ps < .05 to .001.
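The suppression criterion attributed to Sleep et al. (2017) lends itself to a simple screening rule. The sketch below flags a pairing when the bivariate and regression parameters differ by .30 or more, or when they have opposite signs; treating any sign reversal as a change in direction is an assumption of this example, and the values shown are hypothetical.

```python
def is_suppression(bivariate_r, sem_beta, min_gap=0.30):
    """Flag a substantive discrepancy between a bivariate r and an SEM beta.

    Assumption: any sign reversal counts as a change in direction.
    """
    sign_flip = bivariate_r * sem_beta < 0
    large_gap = abs(bivariate_r - sem_beta) >= min_gap
    return large_gap or sign_flip


# Hypothetical (r, beta) pairs, for illustration only.
for r, beta in [(0.30, 0.25), (0.10, 0.45), (0.10, -0.12)]:
    print(r, beta, is_suppression(r, beta))
```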
Discussion
Although designed as a brief screening device, we found support for the SD4 in terms of structure, generalizability, and external validity. Modeling the 28 items required accounting for 406 data points across nearly 4,000 participants. Despite such complexity, each model was able to distinguish the dark trait domains, and account for the data with precision. The 12-item four-factor CFA mini-model (Model D) showed the best incremental (CFI) and absolute (RMSEA) fit of all models, followed by the ESEM correlated factors Model C. The bifactor Model B fell slightly below conventional incremental fit, as did the 28-item correlated four factor CFA Model A, though both accounted for observed item covariance acceptably. Model A poses the riskiest model test since it uses the fewest parameters.
Measurement Invariance
The multiple group model analyses indicated strong measurement invariance across gender, irrespective of model type. The invariance results indicate that the SD4 items discriminated equally well across gender in identifying persons who vary in dark features and that subscale scores reflect the same latent level of dark traits between men and women. Thus, investigators can confidently employ both the SD4 total score and separate scale scores across gender knowing that statistical differences between genders on the manifest variable SD4 trait scores reflect true differences in the average level of dark traits. The results confirmed pronounced gender differences for psychopathic and sadistic features (men > women).
Model Parameter Comparisons
The more complex a model is, the more parameters it requires, and the greater the likelihood that it will fit. Because research is often focused on fit, model complexity has increased in representing the structure of dark traits (e.g., Rogoza & Cieciuch, 2020). However, models that become nearly as complex as the data they attempt to account for are of limited value (Vitacco et al., 2005), as are models that are theoretically questionable, such as the bifactor model (Bonifay et al., 2017).
The 12-item CFA “mini-model” (Model D) was by design a strong model in terms of (a) discrimination parameters, (b) minimal item cross-loadings, and (c) generally low factor correlations. As it turned out, this model also performed well in accounting for the external correlates, signifying the value of models with strong parametric profiles. Also, this shows that a small set of dark traits can have meaningful links with external correlates, signifying the value of isolating dark traits which may differ from general personality traits (Neumann et al., 2020).
At the same time, there was a fair degree of uniformity in model parameters across the correlated factors Models A, C, and D. Each model had moderately strong mean factor loadings (.47-.63), and thus the factors were accounting for 22% to 40% of item variance (i.e., factor loadings squared). They all had generally similar patterns of factor correlations, although Model A had higher correlations among MAC, PSY, and SAD domains. Finally, the three correlated factors models showed similar SEM results in predicting the external correlates. The strong uniformity of predictive effects across correlated factors models is arguably an important finding. As such, an ESEM approach did not appear to provide any benefit over and above the CFA Models A and D. These results are also consistent with related research by Watts et al. (2019), who found few meaningful differences in predictive effects between correlated factors and bifactor models in representing symptoms of psychopathology. In this sense, the results indicate that the choice of analytic model (among those that fit the data reasonably well) may not matter substantially when studying external correlates of dark traits.
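The conversion from mean loadings to explained item variance is simply the square of the loading. The sketch below reproduces the 22% to 40% range, assuming standardized loadings and items that load on a single factor; the function name is hypothetical.

```python
def variance_explained(loading):
    """Proportion of item variance attributable to the factor, given a standardized loading."""
    return loading ** 2


low = variance_explained(0.47)   # lower bound of the mean loadings reported
high = variance_explained(0.63)  # upper bound of the mean loadings reported

print(f"{low:.0%} to {high:.0%}")  # 22% to 40%
```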
Taken together, our results provide support for employing unidimensional scales to represent the SD4 domains. Given that factor loadings are essentially item discrimination parameters (Reise, 1999), and the strict CFA Models A and D had the strongest loadings, investigators can feel confident that manifest variable scale score composites based on unidimensional SD4 factors can distinguish individuals who vary on dark trait domains. Arguably, the correlated factors model might be best for uncovering how etiological factors are linked to the expression of each dark domain. Already, behavior genetic research indicates unique genetic and environmental variation among dark trait domains (Veselka et al., 2012).
Predicting External Correlates and Moderation by Gender
The SEM results were largely in line with expectations and previous findings and revealed a reassuring convergence across model types and gender. Especially notable was the consistent negative association between PSY and likability. In addition, the differential SD4 factor associations with the external correlates have theoretical significance: Note, for example, the negative association of self-harm with NAR (and to a lesser extent, MAC) as opposed to the positive association with PSY. Differences in impulsive recklessness may play a role (Paulhus, 2014).
Of special interest was the consistent pattern of positive associations of sex drive with SAD and NAR across gender, over and above PSY and MAC. Given the low correlation of SAD with NAR, their links with sex drive may involve different intra- and interpersonal processes that warrant further examination (Paulhus et al., 2020).
While there was considerable similarity in the SEM results across gender, consistent with previous research (Miller et al., 2011), there were also several differences. SAD predicted (+) “cut or burn” for women but not men and performed better in predicting “tattoos/piercings” for women than for men. In contrast, MAC and NAR had stronger (−) effects on “tattoos/piercings” and “cut or burn” variables, respectively, for men compared with women. Also, PSY was a moderate-to-strong positive predictor of these correlates for men. These results suggest gender plays a role in the link between self-harm, body modifications, and the dark domains.
Theoretical and Practical Model Implications
Even with relatively similar fit and predictive effects, the three model types (CFA, bifactor, and ESEM) provide different interpretations of the larger dark personality construct(s). The CFA-based Models A and D suggest that the dark domains are interrelated, but nonetheless separable constructs. In contrast, the bifactor model locates all dark trait domains on a single broad (general) dark factor. Moshagen et al. (2018; Hilbig et al., 2020) suggest that expression of different dark traits is due to a common core (i.e., the D-factor of personality), represented by the general factor in the bifactor model. If so, then one would expect the general factor to carry most of the predictive effects (Figueredo et al., 2015). However, our results indicated that the individual SD4 factors from the correlated factors Models A, C, and D had stronger regression parameters than did the general factor and accounted for more variance for several external correlates.
The PSY and SAD items were well represented in the bifactor model’s general factor, perhaps because they involve the greatest malevolence. Nonetheless, our results and related bifactor research highlight that the PSY, and perhaps SAD, domains should not be equated with the general dark factor (Moshagen et al., 2018). Also, the MAC and NAR domains were not well represented on the general factor: Hence, our bifactor results fail Test 1 of this model, as framed by Watts et al. (2019). A caveat with our bifactor results was the low overall general factor loading (.39), indicating a limited ability to discriminate individuals on this broad dark domain. Finally, limited representation of items on specific factors, particularly for PSY, translates into poor reliability and replicability, a failure of Test 2 for the bifactor model (Watts et al., 2019).
The precise meaning of the specific factors in the bifactor model is unclear, particularly the PSY factor where few items have strong loadings. Whereas item loadings on the general factor have a clear interpretation (common variance across dark items), what exactly do the (orthogonal) specific factors reflect? For instance, the SAD item “torture scenes” loads similarly on both the general factor and the specific SAD factor. How should we understand the “enjoyment” people are reporting on this item with respect to the general and specific factors? Are there different (orthogonal) “variants” of sadistic enjoyment? Relatedly, Moshagen et al. (2018) found peculiar effects of the specific factors that conflicted with prevailing theory. There is also a practical limitation of the bifactor model. To our knowledge, it is not possible to create orthogonal manifest variable general and specific composite scores.
The ESEM results support the “alloy” perspective, where aspects (items) of dark traits run through one another (factors). What does this mean theoretically? If we stick with the technical point that item cross-loadings are due to small residual item covariances, then it may be best to refine (or drop) such items to produce clear unidimensional factors. Alternatively, one might be willing to take the theoretical step of arguing that diverse personality features run through one another. For instance, can investigators live with a PSY construct that includes bits and pieces of SAD, NAR, and MAC? This option raises a challenge in identifying the underlying causes of the dark personality domains if they essentially represent trait mixtures.
Perils of Partialling
The latent correlations among factors were mostly in the low-to-moderate range (mean r = .25). Our SEM and ICC results indicated that issues regarding “perils of partialling” (Sleep et al., 2017) were not pervasive in our SEMs. Only a small percentage (6/238 = 2.5%) of latent correlations differed from the SEM results (see Table 5). This consistency may be due to our large sample size: Vize et al. (2020b) found that partialling was more problematic for smaller sample sizes. Technically, partialling perils could also be due to measurement error (i.e., manifest vs. latent variable analyses).
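The agreement check between latent correlations and regression parameters can be illustrated with an absolute-agreement ICC. The text does not state which ICC form was used, so the sketch below assumes the two-way, single-measure, absolute-agreement form, often written ICC(A,1). The function name and the example values are hypothetical.

```python
def icc_absolute_agreement(rows):
    """Two-way, single-measure, absolute-agreement ICC for an n x k table.

    Each row is one parameter (e.g., one factor-correlate pair) and each
    column is one estimate of it (e.g., latent correlation, SEM beta).
    The choice of this ICC form is an assumption, not taken from the source.
    """
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


# Hypothetical (correlation, beta) pairs, not values from the study.
identical = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3)]
shifted = [(0.1, 0.2), (0.2, 0.3), (0.3, 0.4)]  # same ordering, constant offset

print(round(icc_absolute_agreement(identical), 3))
print(round(icc_absolute_agreement(shifted), 3))
```

Because this form penalizes systematic offsets as well as differences in rank order, a constant shift between the two sets of estimates lowers the coefficient even when their ordering is identical. That is the sense in which high values indicate close correspondence in magnitude, not only in pattern.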
Subclinical, But Dark
There was a sizable percentage of participants with elevated dark traits (11%), indicating that they chose to endorse items reflecting manipulative, self-absorbed, callous, reckless, and/or sadistic tendencies. 5 The finding of elevated dark traits is consistent with research using general population samples from the U.S. (Neumann & Hare, 2008) and Sweden (Colins et al., 2017), as well as research with global samples (Neumann et al., 2012, 2020). Since dark trait research usually involves subclinical samples, there is no guarantee that dark traits will show the same patterns in clinical or forensic samples. Nonetheless, recent studies document that a percentage of so-called “high functioning” individuals (U.S. senators, investment fund managers, corporate executives, those with graduate degrees) also display dark traits that are linked with poorer job performance (Babiak et al., 2010; ten Brinke et al., 2016, 2018) and poor attachment, low empathy, and aggression (Neumann et al., 2020).
Adjustment
Replicating previous subclinical research, the SD4 traits tended to be less related to intrapersonal than to interpersonal maladjustment (Jonason et al., 2015; Paulhus et al., 2020); one example is the positive link between NAR and self-esteem. Of course, for any positive attribute, narcissists often claim superior levels of adjustment. In contrast, PSY and SAD traits had low-moderate to strong associations with self-harm and depression. The key to understanding this result may lie in recent research on the association between dark traits and emotion dysregulation in offender and community samples (Garofalo et al., 2020).
Replicating previous work (e.g., Ali & Chamorro-Premuzic, 2010; Paulhus et al., 2020), dark trait links to interpersonal maladjustment were more evident. PSY and SAD were negative predictors of family closeness and likability, highlighting the interpersonally aversive nature of these dark domains, even at subclinical levels. Moreover, the fact that this maladjustment is self-reported is consistent with research suggesting that individuals with these traits have some insight into the aversiveness of their interpersonal style (Carlson et al., 2011; Sleep et al., 2019).
Limitations
Although the current study employed a large sample and a sophisticated modeling approach, it is nonetheless limited in terms of generalizability, given the use of self-report methodology. Also, a more fine-grained analysis is possible, given that each of the four dark domains is itself multidimensional. 6 With seven items per subscale, however, the SD4 was intended only as a screening device: To render clinical or forensic decisions, follow-up with more granular assessments is advisable. Also, psychometric analyses of instruments with more subscales (items) will require larger sample sizes, thereby constraining the modeling enterprise (e.g., Watts et al., 2017). The SD4 provides a balance between distinguishing four dark domains, while acknowledging their statistical overlap, along with a fairly low number of required items.
Our single-item correlates were not in alignment with those regularly used in dark trait studies. As such, some key outcomes (e.g., drug use, aggression) were not assessed. Of course, there have been extensive meta-analyses with these regular correlates (e.g., Vize, Lynam, et al., 2018). Future research will also have to establish convergence with alternative modes of measurement (e.g., peer-ratings, laboratory behavior). With increased reliability, such links are certain to be stronger than those found with the single-item self-reports that we employed. Although convergence with clinician ratings is on solid ground (Samuel & Widiger, 2004), the perspective of peers is a different matter. For example, peers may not agree with a narcissist’s claims of superiority.
Finally, we did not include higher level personality traits (e.g., Big Five or HEXACO factors), which some suggest account for dark traits (see Footnote 1). However, meta-analytic modeling research highlights that general personality is related to but by no means isomorphic with the dark domains (Schreiber & Marcus, 2020). Also, compared with general personality factors, dark traits are more efficient predictors of interpersonal aversiveness because their content is devoted more specifically to that domain (e.g., Dinić & Wertag, 2018; Veselka et al., 2012). This specificity advantage is not unique to dark traits; it has long been accepted as a general rule in personality measurement (see Paunonen & Ashton, 2001).
Conclusions
Using a large sample of young adults, we analyzed the SD4 structure by comparing a range of latent variable models. All models were able to account for the data with good precision. Irrespective of model type, each model showed strong measurement invariance across gender, and each one accounted for the external correlates in a generally similar fashion. Overall, the results highlight four distinct but overlapping aspects of dark personality.
Supplemental Material
Supplemental material, sj-pdf-1-asm-10.1177_1073191120986624 for Examining the Short Dark Tetrad (SD4) Across Models, Correlates, and Gender by Craig S. Neumann, Daniel N. Jones and Delroy L. Paulhus in Assessment
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.