Abstract
Personality is of great lay, clinical, and research interest with important functional implications. The field has largely settled on five- or six-factor models as being sufficient for descriptive purposes, at least in WEIRD settings and, as such, numerous measures have been created of varying length and breadth. For a number of reasons, however, super-short forms have come to be quite popular in research endeavors, with a number created in the past 20 years. The goal of the present study was to compare the time to completion and general psychometric properties of these measures, as well as examine their convergence with one another and with longer measures in an online community sample (N = 494). Generally, the psychometric properties of the measures varied considerably in terms of internal consistency and convergence with one another. The brief measures demonstrated mostly adequate convergence with longer measures. Despite this convergence, longer measures were found to contain considerably more variance that was not accounted for by brief measures. We consider the advantages and disadvantages of these measures and suggest that longer measures be prioritized whenever possible.
Personality has long been a construct of significant interest among researchers, clinicians, and laypersons alike (Digman, 1990). Indeed, across multiple research domains, personality has been used to examine gender differences (Feingold, 1994), temperament (Caspi et al., 2005), education (Poropat, 2009) and in relation to various important life outcomes (e.g., happiness, physical and mental health, longevity, romance, and occupational success; Bouchard et al., 1999; Lahey, 2009; Marshall et al., 1994; Ozer & Benet-Martinez, 2006). Given the substantial interest in personality, a wealth of personality measures from an array of theoretical orientations have been developed. While this diversity in personality measurement can be beneficial, there are concerns with the extent to which they capture the same personality content.
Current Models and Measures of General Personality
Although personality has been conceptualized using a variety of theoretical frameworks, to date, the most commonly used personality taxonomy is the Big Five and/or the five-factor model (FFM). The origins of the FFM are typically credited to the work of Costa and McCrae (1992). However, it is noteworthy that McCrae and Costa (1983) began with a three-factor model (neuroticism, extraversion, and openness) based on their review of the existing trait literature. Within a year of its release they were introduced to the Big Five (emotional instability, surgency, intellect, agreeableness, and conscientiousness) that was based on factor analyses of the trait terms within the English language (Goldberg, 1992). Costa and McCrae then expanded their model to include agreeableness and conscientiousness. There are some differences in how measures of the FFM define these constructs (see John & Srivastava, 1999). For instance, measures of agreeableness from the Big Five tradition do not appear to include markers of honesty and humility, whereas FFM-based measures do via the agreeableness facets of straightforwardness and modesty (Miller et al., 2011). However, Costa and McCrae did not revise their conceptualization or assessment of neuroticism, extraversion, and openness. This was not a significant problem for neuroticism and extraversion as these were already very closely aligned with emotional instability and surgency, but it has been somewhat of a problem for the alignment of openness with intellect (Widiger & Crego, 2019). For a period of time, there was some opposition to referring to the FFM and Big Five interchangeably, but the models are so closely congruent that this objection is no longer typically expressed.
Personality Assessment and General Considerations
Given the substantial interest in personality across research domains, the validity and reliability of its measurement is of chief importance. In addition to considering a measure’s reliability and validity, there are numerous factors that researchers must consider when choosing an appropriate measure. For instance, depending on available resources, a researcher may choose to incorporate brief measures to decrease the likelihood of participant fatigue, or in instances when researchers have little time with participants or have numerous constructs of interest (Credé et al., 2012). Given the widespread use of personality assessment tools in a variety of research settings, assessment of multiple constructs in a given assessment battery, and limited resources, numerous measures of personality have been developed (Smith et al., 2012). In what follows, we briefly review measures of the FFM/Big Five.
Measures of the Big Five
The Big Five is most commonly assessed using the Big Five Inventory (BFI; John et al., 1991), and its various derivatives (e.g., the Big Five Inventory–2 [BFI-2]; Soto & John, 2017). The original BFI was developed by way of expert ratings and subsequent structural verification using factor analysis. The developers of the BFI used prototypical brief phrases thought to assess the core of each trait. For instance, to assess Extraversion, items such as “is talkative” and “is full of energy” are used. Given the popularity of the Big Five, several brief measures have been created as a means for researchers and clinicians to assess these dimensions efficiently and flexibly with limited resources. For instance, the Big Five Inventory–10 (BFI-10; Rammstedt & John, 2007) measures each of the domains with two items per scale and was created for use in contexts in which time was extremely limited. Similar to the full BFI, the BFI-10 uses brief phrases to assess traits.
The Ten-Item Personality Inventory (TIPI; Gosling et al., 2003) is another brief, 10-item measure of the Big Five. Although the creators clearly noted its psychometric limitations compared with longer measures of the Big Five, Gosling et al. (2003) found that the TIPI demonstrated adequate convergence with longer measures, test–retest reliability, patterns of relations with theoretically relevant outcomes, and convergence between self and other ratings. Rather than using brief prototypical phrases, like its longer parent measure, the TIPI uses adjectives to assess traits (e.g., “Extraverted, enthusiastic” to assess Extraversion).
The Mini-International Personality Item Pool (Mini-IPIP; Donnellan et al., 2006) is a 20-item measure of the Big Five derived from the 50-item International Personality Item Pool (IPIP) measure developed by Goldberg (1999). Although still a brief measure, the Mini-IPIP uses four items per Big Five domain and has evidenced adequate psychometric properties (i.e., acceptable internal consistency and test–retest reliability), as well as trait coverage and convergent, discriminant, and criterion validity similar to those of longer measures of the Big Five.
Measures of the Five-Factor Model
The FFM is most commonly assessed using the Revised NEO Personality Inventory (NEO PI-R; Costa & McCrae, 1992), which measures each of the five broad domains as well as six, more specific, facets which underlie each domain (24 items per domain). Although commonly used, the NEO PI-R is a copyrighted, proprietary measure, and thus, several freely available measures of the FFM have been created. One such measure is the International Personality Item Pool NEO–120 (IPIP-NEO-120; Maples et al., 2014). The IPIP-NEO-120 was developed from the comprehensive pool of freely available personality items created by Goldberg and colleagues (Goldberg, 1999, 2001; Goldberg et al., 2006). Unlike the developers of other FFM measures derived from the IPIP items, Maples et al. (2014) used item response theory to choose items, which allows for an in-depth look at the psychometric properties of the items as well as their relations to the broader, latent constructs. Although much shorter than other versions of the FFM (e.g., the 300-item IPIP-NEO; Goldberg, 2001), the IPIP-NEO-120 still necessitates considerable time and resources, and thus, other briefer measures of the FFM have been created.
The Five-Factor Model Rating Form (FFMRF; Mullins-Sweatt et al., 2006) is a 30-item rating form, which uses two to four adjectives to describe each pole of the FFM facets. Individuals then rate the extent to which they are high or low on each of the facets. For instance, the anxiousness facet of Neuroticism is described by “fearful, apprehensive versus relaxed, unconcerned, cool.” Research on the FFMRF suggests that the measure exhibits adequate psychometric properties (i.e., internal consistency, convergent validity, and discriminant validity) and that it exhibits relatively robust relations with theoretically relevant constructs (Mullins-Sweatt et al., 2006).
Costs of Brief Measures
Although brief measures enjoy certain benefits related to their efficiency, the brevity of short measures can come with costs to reliability and validity (Credé et al., 2012; Nunnally & Bernstein, 1994; Soto & John, 2019). To date, some of these limitations have been examined using measures of the BFI (Credé et al., 2012; Soto & John, 2019). For instance, Credé et al. (2012) examined various brief measures of the BFI and found that very short measures were associated with elevated risk for Type I and Type II errors as well as reduced criterion validity. Nevertheless, more work is needed to understand the extent to which reductions in scale length can affect the reliability and validity of personality measures, and to quantify the actual time savings provided by the use of hyperbrief measures in comparison with their longer counterparts. In what follows, we briefly review several potential costs of the use of brief measures.
Loss of Reliability and Content Validity
Longer scales generally have higher coefficient alphas, an index of reliability, than shorter ones. This is because alpha is a function of three parameters: the number of item covariances in the scale, the average covariance between pairs of items, and the variance of the total score. The number of covariances increases nonlinearly as the number of items increases. As such, the loss of reliability, which is typically observed with shorter scales, results in attenuated effect sizes. Longer scales can also provide greater coverage of a construct. This is especially important when the construct being assessed is itself rather broad, as is the case for broader personality constructs like extraversion. According to the FFM, neuroticism as a domain has six different aspects: anxiety, angry hostility, depression, self-consciousness, impulsiveness, and vulnerability. Neuroticism within the BFI-2 has three aspects: anxiety, depression, and emotional volatility. As the number of items decreases, coverage of these domains may shrink; a two-item measure of neuroticism may only be able to inquire about anxiety and depression, leaving other aspects without explicit coverage. Therefore, the resulting variance obtained is representative of the two facets, rather than the entire breadth of the construct. Conversely, for scales defined by several distinct facets, there will be a larger proportion of common variance accounted for (see McCrae, 2015). For instance, in their validation paper, Rammstedt and John (2007) noted that although the BFI-10 was able to capture 70% of the full variance in the BFI, the loss of information was not small. Because the loss was most noticeable and substantial for Agreeableness, the authors suggested supplementing its assessment with an additional item (i.e., “Is considerate and kind to almost everyone”).
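The dependence of alpha on scale length can be illustrated with a minimal sketch. It assumes the standard identity that coefficient alpha equals k² times the average inter-item covariance divided by the variance of the total score, and it uses the standardized form of alpha to show the effect of adding items while holding the average inter-item correlation constant. The function names and the toy data are hypothetical and are not drawn from the present study.

```python
from itertools import combinations
from statistics import mean, variance


def covariance(x, y):
    """Sample covariance between two equally long score lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def cronbach_alpha(items):
    """Alpha from k item-score lists (one list per item, one entry per respondent)."""
    k = len(items)
    # Total score per respondent; its variance is the denominator of alpha.
    totals = [sum(person) for person in zip(*items)]
    # Average covariance over the k * (k - 1) / 2 distinct item pairs.
    mean_cov = mean(covariance(a, b) for a, b in combinations(items, 2))
    return k ** 2 * mean_cov / variance(totals)


def standardized_alpha(k, mean_r):
    """Alpha implied by k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Toy data (hypothetical): three items answered by five respondents.
toy_items = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5], [1, 3, 3, 4, 4]]
print(round(cronbach_alpha(toy_items), 3))

# Same average inter-item correlation (.40), different scale lengths.
for k in (2, 4, 12):
    print(k, round(standardized_alpha(k, 0.40), 2))
```

With an average inter-item correlation of .40, the standardized formula gives roughly .57 for a two-item scale, .73 for a four-item scale, and .89 for a 12-item scale, which is the pattern the paragraph above describes.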
Scale Homogeneity and Loss of Subscales
Although scale homogeneity, which is often observed with brief measures, may not be problematic for unidimensional, homogeneous constructs, personality is multifaceted in nature, and is thus not well-suited to be assessed in this manner (DeYoung et al., 2007; Digman, 1997; Smith et al., 2009). For instance, due to personality domain heterogeneity, in which various facets of the domain may function differently, interpretation at the domain level can be problematic or unclear (Smith et al., 2009). To illustrate this point, Smith et al. (2009) highlighted how two individuals could each be similarly high on domain-level neuroticism, but one may be highly self-conscious, whereas the other may be particularly high on angry hostility. Additionally, this loss of the ability to work at the lower order level is significant given that the most predictive power is afforded at this level (Paunonen & Ashton, 2001). For instance, at the broadest level, neuroticism may not bear strong relations to antisocial behavior; however, when considering these relations at the lower order level, significant relations between antisocial behavior and the neuroticism facets of angry hostility and impulsiveness are observed (Vize et al., 2018). Given the potential for interpretive challenges at the domain level, there is considerable value in gaining a more nuanced understanding of the lower order facets, which is not typically afforded by brief measures.
The Current Study
Given the increased popularity of brief measures, the goals of the present study were to quantify the time to completion of various brief and long measures of the Big Five/FFM and to compare their reliabilities and validities. This study builds on previous work (e.g., Credé et al., 2012) by directly comparing the performance of brief measures to their longer, parent measure. Although the NEO PI-R remains the most popular assessment for the FFM, its proprietary and copyrighted nature represents a substantial cost and barrier to its use in research and clinical settings. As such, we chose to use the IPIP-NEO-120 as the longer measure representative of the FFM given that it is free to use (and to modify), 50% shorter, and demonstrates close convergence with the NEO PI-R (Maples et al., 2014). To measure the Big Five, we chose to utilize the BFI-2, as it was recently developed and represents an improvement over the original BFI, due to its greater breadth and increased distinction between facets (Soto & John, 2017). Specifically, we examine (a) the time to completion of each measure; (b) the internal consistencies of domains from brief and longer measures; (c) the bivariate relations among the various short measures (e.g., TIPI), as well as their relations with domains and facets from two relatively well-validated, longer measures of the FFM and Big Five (the IPIP-NEO-120 and the BFI-2, respectively); and (d) the extent to which brief measures are able to capture the full range of the domains and facets operationalized in longer measures of general traits (i.e., IPIP-NEO-120; BFI-2). The present investigation was designed to be largely exploratory in nature; therefore, we do not present any specific a priori hypotheses other than a general hypothesis that there would be a significant cost to the use of brief measures in terms of their reduced reliability and validity.
Method
Participants and Procedure
Participants were 494 adults from the United States recruited from Amazon’s Mechanical Turk (MTurk) who were compensated $3.00 for completion of the online study. A two-step recruitment process was used to screen out individuals who may have taken steps to bypass MTurk’s inclusion/exclusion rules regarding IP addresses (N = 700). Individuals were compensated $0.25 for their initial responses. Participants who passed the initial validity check, which was based on content coding of the open-ended questions, were invited to participate in the full study; of the 669 participants invited, 496 agreed to participate.
To rule out invalid responses, participant data were excluded if they exhibited an invalid response style on the basis of elevated scores on the Infrequency and/or Virtue scales of the Elemental Psychopathy Assessment (Lynam et al., 2011; n = 0), failed ≥3 (out of 6) attentional checks (n = 0), indicated that they did not pay attention to every item or answer honestly (n = 0), had a response time suggestive of invalid responding (500 seconds; n = 2), or exhibited a singular response style (e.g., responding to the survey with all 1s) on 85% or more of the items (n = 0). After invalid responders were removed, self-report data were available for 494 individuals (58% female; 80% White, 8% Asian, 7% Black; mean age = 38.6; SD = 11.6). All measures were presented in a randomized order. Institutional review board approval was obtained for all aspects of the study and the study was pre-registered at https://osf.io/4qngu/?view_only=94c503e772c84289b65cba6ba0896b5f.
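Two of the screening rules described above (failing three or more attention checks, and giving a single response option on 85% or more of the items) can be sketched as follows. This is only an illustration of the logic; the function names, argument names, and example responses are hypothetical, and the actual screening scripts used in the study are not reproduced here.

```python
from collections import Counter


def modal_response_share(responses):
    """Share of items answered with the respondent's most frequent option."""
    most_common_count = Counter(responses).most_common(1)[0][1]
    return most_common_count / len(responses)


def is_flagged(responses, failed_checks, fail_cutoff=3, share_cutoff=0.85):
    """True if either screening rule is met (cutoffs taken from the text)."""
    too_many_failures = failed_checks >= fail_cutoff
    singular_style = modal_response_share(responses) >= share_cutoff
    return too_many_failures or singular_style


# Hypothetical respondents: one near-uniform responder, one varied responder.
uniform = [1] * 9 + [2]
varied = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
print(is_flagged(uniform, failed_checks=0))
print(is_flagged(varied, failed_checks=2))
```

The first respondent is flagged because 90% of the answers use the same option; the second is retained because neither rule is met.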
Measures
Long Measures
International Personality Item Pool NEO–120 (Maples et al., 2014)
The IPIP-NEO-120 is a 120-item, open-source measure of the FFM domains and facets. Items are aggregated to yield five broad domains (i.e., Neuroticism, Extraversion, Openness, Agreeableness, and Conscientiousness) and 30 facets. Participants respond using a 5-point Likert-type scale with options ranging from disagree strongly to agree strongly.
Big Five Inventory–2 (Soto & John, 2017)
The BFI-2 is a 60-item measure of the Big Five personality domains. Participants respond using a 5-point Likert-type scale with options ranging from disagree strongly to agree strongly. Items are aggregated to yield five broad domains and 15 facets.
Brief Measures
Big Five Inventory–10 (Rammstedt & John, 2007)
The BFI-10 is a 10-item, self-report, abbreviated measure of the Big Five Personality Inventory. Participants respond using a 5-point Likert-type scale with options ranging from disagree strongly to agree strongly.
Ten-Item Personality Inventory (Gosling et al., 2003)
The TIPI is a 10-item self-report measure of the FFM. Participants respond using a 7-point Likert-type scale with options ranging from disagree strongly to agree strongly. Each domain (i.e., Neuroticism, Extraversion, Openness, Agreeableness, and Conscientiousness) comprises two items.
Mini-International Personality Item Pool (Donnellan et al., 2006)
The Mini-IPIP is a 20-item measure of the Big Five, which is an abbreviated form of the 50-item IPIP-FFM measure (Goldberg, 1999). Participants respond using a 5-point Likert-type scale with options ranging from very inaccurate to very accurate.
Five-factor model rating form (Mullins-Sweatt et al., 2006)
The FFMRF is a rating form, which consists of bipolar descriptions representing each of the five domains. Participants respond using a 5-point Likert-type scale with options ranging from extremely low to extremely high.
Five-factor model unipolar domain descriptions (Created for the present investigation)
For the purpose of the present investigation, unipolar domain descriptions were written to assess the high and low poles of each of the FFM domains (see online Supplemental Materials). For instance, high Neuroticism was described as, “These individuals tend to experience negative emotions more regularly and intensely (e.g., fear, sadness, embarrassment, anger, guilt). They also tend to be self-conscious in social situations and sensitive to criticism,” whereas low Neuroticism was described as, “These individuals tend to be calm (even in stressful situations), even tempered, and relaxed.” Individuals rated both items for each domain on a 1 (i.e., strongly disagree) to 5 (i.e., strongly agree) Likert-type scale. We created these scales with the hope that they might provide fuller content coverage of the underlying domains by describing all facets in the descriptions.
Validity Checks
Elemental Psychopathy Assessment-Validity Items (EPA; Lynam et al., 2011)
The EPA is a 178-item self-report measure of psychopathy from the FFM perspective. Here, only the 16 items that comprise two validity scales (Infrequency and Unlikely Virtue) were used.
Results
For all analyses for which significance testing is reported, a p value of ≤.005 was used. Prior to conducting analyses, item content of the brief measures was examined to determine whether there was criterion contamination (e.g., identical items) when comparing the shorter and longer inventories. For instance, for the BFI-10, 5 of the 10 items overlap with the BFI-2 (e.g., BFI-10 Item 4; “ . . . is relaxed, handles stress well.”). For the TIPI, no overlapping items were observed. Although the Mini-IPIP does not include overlapping items with other measures of the Big Five, it has six overlapping items with the IPIP-NEO-120 (Maples et al., 2014; e.g., Mini-IPIP Item 5 “have a vivid imagination”). In the instances for which criterion contamination was observed, the overlapping items were removed from the longer inventories. For instance, because five items from the BFI-10 overlap with the BFI-2 (i.e., one item from each domain), the corresponding BFI-2 facet and domain scores were recalculated after removing the overlapping items. These updated composites were used in place of the original BFI-2 domains and facets when examining their relations with BFI-10 domains.
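The recalculation of composites after removing overlapping items can be sketched as below. The sketch assumes that domain scores are item means and that responses have already been reverse-keyed where needed; the item labels and response values are hypothetical placeholders rather than actual BFI-2 item numbers.

```python
def score_domain(responses, domain_items, exclude=()):
    """Mean of a respondent's item scores for one domain, skipping excluded items."""
    kept = [item for item in domain_items if item not in exclude]
    if not kept:
        raise ValueError("no items left after exclusion")
    return sum(responses[item] for item in kept) / len(kept)


# Hypothetical respondent and item labels (already reverse-keyed).
responses = {"n1": 4, "n2": 2, "n3": 3}
neuroticism_items = ["n1", "n2", "n3"]
overlap_with_brief_form = {"n1"}  # item shared with the brief measure

full_score = score_domain(responses, neuroticism_items)
decontaminated = score_domain(responses, neuroticism_items, overlap_with_brief_form)
print(full_score, decontaminated)
```

Correlating the brief-measure domain with the second score rather than the first keeps shared items from inflating the convergent correlation.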
Alphas, omegas, average item correlations, and descriptive statistics for all measures are presented in Table 1. Completion time for FFM measures ranged from 24.09 (BFI-10) to 363.61 seconds (IPIP-NEO-120), with a mean completion time of 128.41 seconds. A repeated measures multivariate analysis of variance revealed significant differences among completion times, multivariate F(6, 488) = 300.74; follow-up pair-wise comparisons indicated that each measure differed from all other measures in terms of completion time, all Fs(1, 493) > 40.0; all ps < .001. Reliabilities for the IPIP-NEO-120 domains ranged from .87 (Openness) to .94 (Neuroticism), with a mean α of .76, whereas reliabilities for facets ranged from .72 (Activity Level) to .92 (Depression and Trust), with a mean α of .79. For the BFI-2, reliabilities for domains were similarly good and ranged from .88 (Agreeableness) to .93 (Neuroticism), with a mean α of .75, and facet reliabilities ranged from .72 (Compassion) to .89 (Sociability), with a mean α of .77. Reliabilities for brief Big Five/FFM measures were, as expected, somewhat lower. For the BFI-10, reliabilities for domains ranged from .61 (Openness) to .80 (Neuroticism), with a mean α of .69, and for the TIPI alphas ranged from .51 (Agreeableness) to .83 (Extraversion), with a mean α of .69. For the Mini-IPIP, alphas for domains ranged from .82 (Neuroticism and Openness) to .87 (Extraversion and Agreeableness), with a mean α of .85. Finally, for the FFMRF, reliabilities ranged from .76 (Openness) to .88 (Conscientiousness), with a mean α of .81. Among the brief measures, the FFM domain descriptions evinced the lowest reliabilities, for which αs for domains ranged from .47 (Openness) to .77 (Extraversion), with a mean α of .59.
Table 1. Descriptive Statistics.
Note. α = Alpha; ω = Omega; AIC = Average Item Correlation; Time S = Mean completion time in seconds; Time SD = standard deviation for completion time in seconds; IPIP = International Personality Item Pool NEO–120 (Maples et al., 2014); BFI = Big Five Inventory–2 (Soto & John, 2017); BFI-10 = Big Five Inventory–10 (Rammstedt & John, 2007); TIPI = Ten-Item Personality Inventory (Gosling et al., 2003); MINI = Mini-International Personality Item Pool (Donnellan et al., 2006); FFMRF = Five-Factor Model Rating Form (Mullins-Sweatt et al., 2006).
Convergence Among Brief Measures
Convergence among brief FFM measures was variable (see Table 2). Convergence was highest for Extraversion (convergent rs ranged from .64 [Mini-IPIP − FFMRF] to .89 [BFI-10 − TIPI], with a mean r of .78), followed by Neuroticism (convergent rs ranged from .67 [FFMRF − BFI-10] to .85 [BFI-10 − TIPI], with a mean r of .76), Conscientiousness (convergent rs ranged from .56 [FFM domain description − BFI-10 and FFMRF − FFM domain description] to .82 [Mini-IPIP − TIPI], with a mean r of .67), Openness (convergent rs ranged from .43 [FFM domain description − BFI-10] to .79 [BFI-10 − Mini-IPIP], with a mean r of .60), and lastly, Agreeableness (convergent rs ranged from .48 [Mini-IPIP − BFI-10] to .66 [FFMRF − FFM domain description], with a mean r of .58). Among the brief measures, the mean convergent validity with other brief measures was .62 for the FFMRF, .66 for the FFM domain descriptions, .70 for the BFI-10, .71 for the Mini-IPIP, and .73 for the TIPI.
Table 2. Bivariate Relations Between Brief Measures.
Note. Convergent rs are in bold. BFI-10 N = Big Five Inventory–10 (Rammstedt & John, 2007) Neuroticism; BFI-10 E = Big Five Inventory–10 Extraversion; BFI-10 O = Big Five Inventory–10 Openness; BFI-10 A = Big Five Inventory–10 Agreeableness; BFI-10 C = Big Five Inventory–10 Conscientiousness; TIPI N = Ten-Item Personality Inventory (Gosling et al., 2003) Neuroticism; TIPI E = Ten-Item Personality Inventory Extraversion; TIPI O = Ten-Item Personality Inventory Openness; TIPI A = Ten-Item Personality Inventory Agreeableness; TIPI C = Ten-Item Personality Inventory Conscientiousness; MINI N = Mini-International Personality Item Pool (Donnellan et al., 2006) Neuroticism; MINI E = Mini-International Personality Item Pool Extraversion; MINI O = Mini-International Personality Item Pool Openness; MINI A = Mini-International Personality Item Pool Agreeableness; MINI C = Mini-International Personality Item Pool Conscientiousness; FFMRF N = Five-Factor Model Rating Form (Mullins-Sweatt et al., 2006) Neuroticism; FFMRF E = Five-Factor Model Rating Form Extraversion; FFMRF O = Five-Factor Model Rating Form Openness; FFMRF A = Five-Factor Model Rating Form Agreeableness; FFMRF C = Five-Factor Model Rating Form Conscientiousness; D FFM N = FFM Neuroticism domain description; D FFM E = FFM Extraversion domain description; D FFM O = FFM Openness domain description; D FFM A = FFM Agreeableness domain description; D FFM C = FFM Conscientiousness domain description.
Convergent and Discriminant Relations Between Domains From Brief and Long Measures
Next, we examined the bivariate relations between brief and long measures of Big Five/FFM domains (see Table 3). Here we tested whether the relations for a given short measure differed as a function of the longer measure (e.g., BFI-10 Neuroticism with BFI-2 and IPIP Neuroticism) and whether different short measures differed from one another in relation to a given longer measure (e.g., BFI-10 Neuroticism vs. TIPI Neuroticism with IPIP-NEO Neuroticism). Generally, brief measure domain scores evinced the most robust relations with BFI-2 and IPIP-NEO-120 Neuroticism (mean rs = .83 and .81, respectively) and Conscientiousness (mean rs = .79 and .77, respectively), followed by Extraversion (mean rs = .77 and .72, respectively), Openness (mean rs = .72 and .70, respectively), and Agreeableness (mean rs = .72 and .70, respectively). Across convergent rs for domains, brief measures typically evinced more robust relations with the BFI-2 than the IPIP-NEO-120. In this same vein, in comparison with brief measures of the Big Five (e.g., BFI-10, TIPI), brief measures of the FFM (e.g., FFMRF and FFM domain descriptions) evinced relations that were often significantly smaller in magnitude with the IPIP-NEO-120 and the BFI-2. Absolute discriminant relations across domains were calculated by averaging the absolute value of each measure’s domain correlations with divergent domains (e.g., the average of the absolute bivariate relations between BFI-10 Neuroticism and Extraversion, Openness, Agreeableness, and Conscientiousness for the IPIP-NEO-120 and the BFI-2). Absolute divergence for brief measures with the longer measures’ domains ranged from .18 (FFM domain descriptions) to .27 (TIPI) with a median of .25.
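The absolute discriminant index described above can be sketched as follows, reading the description as the mean of the absolute correlations between a brief-measure domain and the four non-matching domains of a longer measure. The function name and the correlation values are hypothetical and serve only to show the arithmetic.

```python
def mean_abs_discriminant(correlations, target_domain):
    """Mean absolute correlation with every domain except the matching one.

    correlations maps each long-measure domain label to its correlation
    with a single brief-measure domain score.
    """
    off_target = [abs(r) for domain, r in correlations.items()
                  if domain != target_domain]
    return sum(off_target) / len(off_target)


# Hypothetical correlations of a brief Neuroticism score with the five
# domains of a longer measure; "N" is the convergent value and is skipped.
brief_neuroticism = {"N": 0.80, "E": -0.30, "O": -0.10, "A": -0.20, "C": -0.40}
print(round(mean_abs_discriminant(brief_neuroticism, "N"), 2))  # 0.25
```

Taking absolute values before averaging prevents positive and negative discriminant correlations from cancelling, so the index reflects the overall size of the off-target overlap rather than its direction.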
Table 3. Bivariate Relations Between Brief and Longer Measures.
Note. |.12| = significant at p ≤ .005. Correlations in diagonal represent convergent relations between IPIP-NEO-120 and BFI-2. Significant differences (p ≤ .005) within domains are represented by different superscript letters (i.e., down columns) and significant differences (p ≤ .005) across measures are represented by italics. Mean r listed in rows = Mean r for brief measures only; BFI N = Big Five Inventory–2 (Soto & John, 2017) Neuroticism; BFI E = Big Five Inventory–2 Extraversion; BFI O = Big Five Inventory–2 Openness; BFI A = Big Five Inventory–2 Agreeableness; BFI C = Big Five Inventory–2 Conscientiousness; IPIP N = International Personality Item Pool NEO–120 (Maples et al., 2014) Neuroticism; IPIP E = International Personality Item Pool NEO–120 Extraversion; IPIP O = International Personality Item Pool NEO–120 Openness; IPIP A = International Personality Item Pool NEO–120 Agreeableness; IPIP C = International Personality Item Pool NEO–120 Conscientiousness; BFI-10 N = Big Five Inventory–10 (Rammstedt & John, 2007) Neuroticism; BFI-10 E = Big Five Inventory–10 Extraversion; BFI-10 O = Big Five Inventory–10 Openness; BFI-10 A = Big Five Inventory–10 Agreeableness; BFI-10 C = Big Five Inventory–10 Conscientiousness; TIPI N = Ten-Item Personality Inventory (Gosling et al., 2003) Neuroticism; TIPI E = Ten-Item Personality Inventory Extraversion; TIPI O = Ten-Item Personality Inventory Openness; TIPI A = Ten-Item Personality Inventory Agreeableness; TIPI C = Ten-Item Personality Inventory Conscientiousness; MINI N = Mini-International Personality Item Pool (Donnellan et al., 2006) Neuroticism; MINI E = Mini-International Personality Item Pool Extraversion; MINI O = Mini-International Personality Item Pool Openness; MINI A = Mini-International Personality Item Pool Agreeableness; MINI C = Mini-International Personality Item Pool Conscientiousness; FFMRF N = Five-Factor Model Rating Form (Mullins-Sweatt et al., 2006) Neuroticism; FFMRF E = Five-Factor Model Rating Form Extraversion; FFMRF O = Five-Factor Model Rating Form Openness; FFMRF A = Five-Factor Model Rating Form Agreeableness; FFMRF C = Five-Factor Model Rating Form Conscientiousness; D FFM N = FFM Neuroticism domain description; D FFM E = FFM Extraversion domain description; D FFM O = FFM Openness domain description; D FFM A = FFM Agreeableness domain description; D FFM C = FFM Conscientiousness domain description.
Convergence Among Brief Measures’ Domains and IPIP-NEO Facets
Next, we examined the relations between brief domain scores and the 30 IPIP-NEO-120 facets to determine the degree to which they capture facet-level variance (see Table 4). The relations for the BFI-2 with the IPIP-NEO are also presented as benchmarks for these convergent correlations. For IPIP-NEO-120 Neuroticism facets, brief measure domains were found to be most robustly related to Anxiety (mean r = .76) and most weakly related to Immoderation (mean r = .35). Across brief measures, the Mini-IPIP captured the most facet-level variance in IPIP-NEO Neuroticism (mean r = .59), followed by the BFI-10 (mean r = .58), the TIPI (mean r = .58), the FFM domain description (mean r = .57), and the FFMRF (mean r = .51), although the differences in mean variance accounted for across brief measures were generally small. In general, the brief measures accounted for 6% to 15% less variance in the IPIP-NEO Neuroticism facets than did the BFI-2.
Table 4. Bivariate Relations Between Brief Measures and IPIP Facets.
Note. |.12| = significant at p ≤ .005. Significant differences (p ≤ .005) within domains are represented by different superscript letters (i.e., down columns) and significant differences (p ≤ .005) across measures are represented by different superscript numbers (i.e., across rows). Mean r listed in rows = Mean r for brief measures only; BFI N = Big Five Inventory–2 (Soto & John, 2017) Neuroticism; BFI E = Big Five Inventory–2 Extraversion; BFI O = Big Five Inventory–2 Openness; BFI A = Big Five Inventory–2 Agreeableness; BFI C = Big Five Inventory–2 Conscientiousness; BFI-10 N = Big Five Inventory–10 (Rammstedt & John, 2007) Neuroticism; BFI-10 E = Big Five Inventory–10 Extraversion; BFI-10 O = Big Five Inventory–10 Openness; BFI-10 A = Big Five Inventory–10 Agreeableness; BFI-10 C = Big Five Inventory–10 Conscientiousness; TIPI N = Ten-Item Personality Inventory (Gosling et al., 2003) Neuroticism; TIPI E = Ten-Item Personality Inventory Extraversion; TIPI O = Ten-Item Personality Inventory Openness; TIPI A = Ten-Item Personality Inventory Agreeableness; TIPI C = Ten-Item Personality Inventory Conscientiousness; MINI N = Mini-International Personality Item Pool (Donnellan et al., 2006) Neuroticism; MINI E = Mini-International Personality Item Pool Extraversion; MINI O = Mini-International Personality Item Pool Openness; MINI A = Mini-International Personality Item Pool Agreeableness; MINI C = Mini-International Personality Item Pool Conscientiousness; FFMRF N = Five-Factor Model Rating Form (Mullins-Sweatt et al., 2006) Neuroticism; FFMRF E = Five-Factor Model Rating Form Extraversion; FFMRF O = Five-Factor Model Rating Form Openness; FFMRF A = Five-Factor Model Rating Form Agreeableness; FFMRF C = Five-Factor Model Rating Form Conscientiousness; D FFM N = FFM Neuroticism domain description; D FFM E = FFM Extraversion domain description; D FFM O = FFM Openness domain description; D FFM A = FFM Agreeableness domain description; D FFM C = FFM Conscientiousness domain description.
For Extraversion facets, brief measure domains were most robustly related to Friendliness (mean r = .68) and most weakly related to Activity Level (mean r = .33) and Excitement Seeking (mean r = .43). Across brief measures, the FFMRF showed the largest mean relation with the facets (mean r = .49), followed by the Mini-IPIP (mean r = .47) and then the BFI-10, TIPI, and FFM domain description (mean rs = .46), although these differences were very small. In general, the brief measures accounted for 8% to 11% less variance in the IPIP-NEO Extraversion facets than did the BFI-2.
For Openness, brief measures evinced the most robust relations with Intellect (mean r = .58), followed by Imagination (mean r = .55), Artistic Interests (mean r = .52), Adventurousness (mean r = .46), Liberalism (mean r = .37), and Emotionality (mean r = .13). Across facets, mean rs for brief measures ranged from .36 (TIPI) to .42 (Mini-IPIP). In general, the brief measures accounted for 4% to 7% less variance in the IPIP-NEO Openness facets than did the BFI-2.
Apart from Modesty (mean r = .18), brief measures exhibited moderate to large relations with Agreeableness facets, with mean rs ranging from .41 (Morality) to .55 (Trust). Across facets, the most robust relations were observed for the Agreeableness FFM domain description (mean r = .45), followed by the Mini-IPIP (mean r = .44), the TIPI (mean r = .43), the FFMRF (mean r = .42), and the BFI-10 (mean r = .39). Overall, the differences across brief measures were relatively small. In general, the brief measures accounted for 6% to 11% less variance in the IPIP-NEO Agreeableness facets than did the BFI-2.
Finally, within IPIP Conscientiousness facets, brief measures evinced the most robust relations with Orderliness (mean r = .69), followed by Self-Discipline (mean r = .64), Self-Efficacy (mean r = .59), Dutifulness (mean r = .56), Achievement Striving (mean r = .56), and Cautiousness (mean r = .55). There were relatively substantial differences across brief measures in the mean variance accounted for in the IPIP-NEO facets, which ranged from .44 (FFM domain description) to .59 (TIPI). In general, the brief measures accounted for 6% to 21% less variance in the IPIP-NEO Conscientiousness facets than did the BFI-2.
Convergence Among Brief Measures’ Domains and BFI-2 Facets
Next, we examined the relations between brief domain scores and the BFI-2 facets to determine the degree to which they capture facet-level variance (see Table 5). The relations for the IPIP-NEO with the BFI-2 are also presented as benchmarks. Within BFI-2 Neuroticism facets, mean rs for brief measures were large in magnitude (rs ranged from .73 [Depression] to .76 [Anxiety and Emotional Volatility]). Similarly, robust relations were observed across Neuroticism facets such that mean rs for brief measures ranged from .55 (FFMRF) to .68 (Mini-IPIP). In general, the brief measures accounted for 5% to 16% less variance in the BFI-2 Neuroticism facets than did the IPIP-NEO.
Bivariate Relations Between Brief Measures and BFI-2 Facets.
Note. All rs were significant at p ≤ .005. Significant differences (p ≤ .005) within domains are represented by different superscript letters (i.e., down columns) and significant differences (p ≤ .005) across measures are represented by different superscript numbers (i.e., across rows). Mean r listed in rows = Mean r for brief measures only; IPIP N = International Personality Item Pool NEO–120 (Maples et al., 2014) Neuroticism; IPIP E = International Personality Item Pool NEO–120 Extraversion; IPIP O = International Personality Item Pool NEO–120 Openness; IPIP A = International Personality Item Pool NEO–120 Agreeableness; IPIP C = International Personality Item Pool NEO–120 Conscientiousness; BFI-10 N = Big Five Inventory–10 (Rammstedt & John, 2007) Neuroticism; BFI-10 E = Big Five Inventory–10 Extraversion; BFI-10 O = Big Five Inventory–10 Openness; BFI-10 A = Big Five Inventory–10 Agreeableness; BFI-10 C = Big Five Inventory–10 Conscientiousness; TIPI N = Ten-Item Personality Inventory (Gosling et al., 2003) Neuroticism; TIPI E = Ten-Item Personality Inventory Extraversion; TIPI O = Ten-Item Personality Inventory Openness; TIPI A = Ten-Item Personality Inventory Agreeableness; TIPI C = Ten-Item Personality Inventory Conscientiousness; MINI N = Mini-International Personality Item Pool (Donnellan et al., 2006) Neuroticism; MINI E = Mini-International Personality Item Pool Extraversion; MINI O = Mini-International Personality Item Pool Openness; MINI A = Mini-International Personality Item Pool Agreeableness; MINI C = Mini-International Personality Item Pool Conscientiousness; FFMRF N = Five-Factor Model Rating Form (Mullins-Sweatt et al., 2006) Neuroticism; FFMRF E = Five-Factor Model Rating Form Extraversion; FFMRF O = Five-Factor Model Rating Form Openness; FFMRF A = Five-Factor Model Rating Form Agreeableness; FFMRF C = Five-Factor Model Rating Form Conscientiousness; D FFM N = FFM Neuroticism domain description; D FFM E = FFM Extraversion domain description; D FFM O = FFM Openness domain description; D FFM A = FFM Agreeableness domain description; D FFM C = FFM Conscientiousness domain description.
For Extraversion, in most instances, brief measures captured significantly more variance in Sociability (mean r = .83) than in Assertiveness (mean r = .53) or Energy Level (mean r = .52). Across Extraversion facets, the most robust relations were observed for the TIPI (mean r = .56) and the Mini-IPIP (mean r = .55), followed by the BFI-10 (mean r = .53), the FFMRF (mean r = .51), and the FFM domain description (mean r = .49). In general, the brief measures accounted for 2% to 9% less variance in the BFI-2 Extraversion facets than did the IPIP-NEO.
For BFI-2 Openness, the brief measures exhibited similarly robust relations within and across facets. Within facets, the most robust relations were observed for Intellectual Curiosity and Creative Imagination (mean rs = .63), followed by Aesthetic Sensitivity (mean r = .56). Across brief measures, the mean rs ranged from .32 (FFM domain description) to .60 (Mini-IPIP), suggesting rather substantial divergence in some cases. In general, the brief measures accounted for up to 22% less variance in the BFI-2 Openness facets than did the IPIP-NEO, although in two instances the brief measures accounted for as much (BFI-10) or more (Mini-IPIP) variance than did the IPIP-NEO.
For BFI-2 Agreeableness, brief measures exhibited slightly larger relations with BFI-2 Compassion (mean r = .65) than with BFI-2 Trust (mean r = .62) and Respectfulness (mean r = .56). Across brief measures, differences were observed such that mean rs ranged from .45 (FFMRF) to .55 (TIPI). In general, the brief measures accounted for 2% to 12% less variance in the BFI-2 Agreeableness facets than did the IPIP-NEO.
Finally, for BFI-2 Conscientiousness, mean rs for brief measures were large in magnitude (rs ranged from .71 [Productiveness] to .74 [Organization]); however, across measures, relations were rather variable in magnitude, with mean rs ranging from .46 (FFM domain description) to .67 (Mini-IPIP). In general, the brief measures accounted for 3% to 24% less variance in the BFI-2 Conscientiousness facets than did the IPIP-NEO.
Variance Accounted for by Brief Measures in Domains from the BFI-2 and IPIP-NEO
Finally, the incremental variance accounted for by the longer inventories (i.e., the IPIP-NEO-120 or BFI-2) and each of the brief measures (e.g., the BFI-10) was examined (see Table 6). For each of the trait domains, we conducted two regressions for each of the longer inventories (i.e., the IPIP-NEO-120 and the BFI-2). In the first set of analyses, shown in the top part of Table 6, a domain score from one of the longer inventories was regressed onto the same domain scores from all of the shorter inventories simultaneously. This analysis provides information as to whether any of the short inventories captures unique variance in the longer form beyond the other short inventories. In the second set of analyses, shown in the bottom part of Table 6, the domain score from a longer inventory was regressed onto the same domain score from the other longer inventory and the domain score from a single short inventory. These analyses provide information on the degree to which the other longer inventory predicts above and beyond each short inventory; in a sense, they demonstrate how much variance is lost in moving to the shorter inventory.
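As a rough illustration of the second set of analyses, the unique contribution of each predictor in a two-predictor regression can be computed from the three bivariate correlations involved. The sketch below is not the analysis script used in this study; the function names and the three correlations are hypothetical values chosen only to show the arithmetic.

```python
def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """R^2 when criterion y is regressed on two predictors, from correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def squared_semipartial(r_y1: float, r_y2: float, r_12: float) -> float:
    """Unique variance in y explained by predictor 1 beyond predictor 2."""
    return (r_y1 - r_y2 * r_12) ** 2 / (1 - r_12 ** 2)


# Hypothetical correlations (assumptions, not values from Table 6):
r_long_long = 0.90    # criterion long inventory with the other long inventory
r_long_brief = 0.80   # criterion long inventory with one brief measure
r_other_brief = 0.82  # other long inventory with the same brief measure

# Variance added by the other long inventory beyond the brief measure.
sr2_long = squared_semipartial(r_long_long, r_long_brief, r_other_brief)
# Variance added by the brief measure beyond the other long inventory.
sr2_brief = squared_semipartial(r_long_brief, r_long_long, r_other_brief)
# Total variance explained by both predictors together.
r2_full = r_squared_two_predictors(r_long_long, r_long_brief, r_other_brief)

print(f"sr^2, long beyond brief: {sr2_long:.3f}")
print(f"sr^2, brief beyond long: {sr2_brief:.3f}")
print(f"R^2, both predictors:    {r2_full:.3f}")
```

With these made-up inputs, the other long inventory adds roughly 18% of variance beyond the brief measure, whereas the brief measure adds roughly 1% beyond the long inventory. Each squared semipartial equals the full-model R-squared minus the squared correlation of the other predictor with the criterion.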
Multivariate Relations Between Brief and Longer Measures.
Note. The top part of the table presents results (sr²) from an analysis in which the domain score from each longer inventory is regressed onto the domain scores of all five short inventories simultaneously. The bottom part of the table presents results (sr²) from analyses in which the domain score from each longer inventory is regressed onto the domain score from the other longer inventory and the domain score from a single short inventory. BFI = Big Five Inventory–2 (Soto & John, 2017); IPIP = International Personality Item Pool NEO–120 (Maples et al., 2014); BFI–10 = Big Five Inventory–10 (Rammstedt & John, 2007); TIPI = Ten–Item Personality Inventory (Gosling et al., 2003); MINI = Mini–International Personality Item Pool (Donnellan et al., 2006); FFMRF = Five–Factor Model Rating Form (Mullins–Sweatt et al., 2006); FFM Domains = FFM domain description.
p ≤ .005.
In analysis 1, in which all brief measures were entered simultaneously to predict the domain scores from the two longer measures, the individual brief measures generally accounted for small amounts of unique variance in the IPIP-NEO-120 domains, with mean sr² values ranging from 1% (Neuroticism) to 3% (Extraversion). For the BFI-2, brief measures also accounted for small amounts of unique variance, such that mean sr² values ranged from 1% (Neuroticism) to 3% (Agreeableness). There was evidence of model-based effects, in which FFM measures (e.g., FFMRF; FFM domain descriptions) accounted for unique variance in some IPIP-NEO domains, whereas measures more aligned with the Big Five were more successful in accounting for unique variance in the BFI-2.
In the analyses that examined the incremental utility of the longer measures above and beyond the individual brief measures, the BFI-2 routinely contributed a significant amount of variance beyond brief measures in the prediction of IPIP-NEO domains, with mean sr² values ranging from 16% (Neuroticism) to 24% (Conscientiousness) and an overall mean sr² of .20. Similarly, IPIP-NEO domains accounted for significantly more variance in BFI-2 domains beyond each of the brief measures, with mean sr² values ranging from 16% (Neuroticism) to 23% (Conscientiousness) and an overall mean sr² of .19. Averaging across the longer measures and within domain, the longer measures accounted for an additional 13% (Openness) to 27% (Conscientiousness) of variance beyond the individual brief measures.
In contrast, and as expected, brief measures accounted for relatively little unique variance beyond the longer inventory. For the IPIP-NEO, brief measures accounted for, on average, an additional 1% (Extraversion and Conscientiousness) to 4% (Openness and Agreeableness) beyond BFI-2 domain scores. Similarly, brief measures accounted for small increments of variance in the BFI-2 domains beyond the IPIP-NEO domains, with mean sr² values ranging from 3% (Conscientiousness) to 6% (Openness). It is worth noting, however, that there were cases in which brief measures demonstrated more substantial incremental utility, but here again they seemed to be examples of model-based effects. For instance, the two FFM-based brief measures accounted for significant additional variance in IPIP-NEO Openness above and beyond the BFI-2, whereas Big Five-based brief measures like the TIPI and Mini-IPIP accounted for additional variance in BFI-2 Openness above and beyond the IPIP-NEO.
Discussion
Personality matters. It is a critical correlate and predictor of many important outcomes including academic (Poropat, 2009) and occupational success (e.g., Barrick & Mount, 1991), marital functioning (Malouff et al., 2010), parenting (e.g., Prinzie et al., 2009), as well as physical (e.g., Luchetti et al., 2014; Sutin et al., 2016) and psychological health (e.g., Kotov et al., 2017) including substance use (e.g., Malouff et al., 2006), antisocial behavior and aggression (Vize et al., 2019), well-being (e.g., DeNeve & Cooper, 1998), and even longevity (Terracciano et al., 2008). It goes without saying that the reliable and valid assessment of personality warrants significant attention. The aim of the current study was to examine relatively commonly used, brief measures of the Big Five/FFM that prioritize efficiency and compare them with two validated longer measures: the BFI-2 (Soto & John, 2017) and IPIP-NEO-120 (Maples et al., 2014). We believe such comparisons are needed as measures such as the TIPI (Gosling et al., 2003) have become increasingly popular given their brevity, which allows them to be included in a wide array of surveys and experiments, even those for which personality may be relatively ancillary. Of course, with the use of short forms come concerns that efficiency may be achieved at the cost of validity (e.g., Smith et al., 2000). In the current study, we examined the time savings associated with the use of briefer versus longer measures, reliability, convergent validity among the brief measures and with the longer measures, coverage across personality facets, and incremental validity.
Timing
Bearing in mind the use of MTurk, which includes semi-professional survey takers who have reason to prioritize efficiency and speed when responding, participants were able to take all personality measures relatively quickly. The two longer measures, the 120-item IPIP-NEO and the 60-item BFI-2, took, on average, just over 6 and 3 minutes, respectively. The shorter measures ranged from only 24 seconds for the BFI-10 to just over 2 minutes for the FFMRF. These results suggest that the largest difference across the measures is a little over 5 minutes. Among the shorter measures, which varied from 10 to 30 questions, all were able to be completed in under 3 minutes. Ultimately, quite a bit of personality information can be gleaned relatively quickly, at least among samples with relatively high levels of education and motivation to proceed quickly. Although the differences between the shorter and longer measures may be meaningful in some circumstances, we believe that for most personality-focused work, savings of 2 to 5 minutes may not be sufficient to recommend one of the particularly short measures over longer measures such as the BFI-2 and IPIP-NEO-120. In addition to the reductions in reliability and validity noted here, one loses the ability to work at the level of the lower order facets. The inability to assess facet-level personality data is problematic given research demonstrating that facets often trump domains in terms of predictive power (Paunonen & Ashton, 2001) and can be particularly helpful in describing multidimensional constructs such as personality disorders (e.g., Lynam & Widiger, 2001). For example, a few personality disorders are characterized by only one or two facets within a given domain (e.g., modesty and avoidant personality disorders; Lynam & Widiger, 2001), whereas others are characterized by a mixture of low and high facet scores from within the same domain (e.g., facets of Neuroticism and psychopathy; Miller et al., 2001).
Similarly, working at the level of facets is also particularly helpful when the narrower traits of a broader personality domain relate to important outcomes or constructs in relatively differentiable ways that can be hidden by total scores (Ashton et al., 1995), as is the case for extraversion’s relations to psychopathology (e.g., Watson et al., 2019) and neuroticism’s relations to antisocial behavior and aggression (e.g., Vize et al., 2018).
Reliability
As expected, the reliability of the various measures differed quite substantially. The mean trait domain omegas for the IPIP-NEO and BFI-2 were .94 and .92, respectively, whereas the shorter measures varied quite a bit, with the Mini-IPIP and FFMRF faring well (median omegas = .89 and .86) but with more mixed results for the TIPI (median omega = .73), BFI-10 (median omega = .68), and FFM domain descriptions (median omega = .57). In fact, if one sets an omega of .70 as a cutoff for minimally acceptable internal consistency, several of the measures had multiple unacceptable domain scores: FFM domain description scores (4 of 5), BFI-10 (3 of 5), and TIPI (2 of 5). Although some might argue that test–retest reliability might be a better indicator (McCrae et al., 2011), which we did not assess due to limited resources for the study, these findings are concerning for several of the brief measures.
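One reason very short scales tend to show lower internal consistency is simply the number of items per domain. The sketch below uses the standard formula for coefficient omega under a single-factor model with standardized loadings. The uniform loading of .60 is an assumption for illustration, not an estimate from these data, and the items-per-domain counts are obtained by dividing total inventory length by five domains.

```python
def mcdonald_omega(loadings):
    """Coefficient omega for a one-factor model with standardized loadings.

    omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    where each item's error variance is 1 - loading^2.
    """
    sum_loadings = sum(loadings)
    error_variance = sum(1 - loading ** 2 for loading in loadings)
    return sum_loadings ** 2 / (sum_loadings ** 2 + error_variance)


ASSUMED_LOADING = 0.60  # hypothetical; real items will vary in loading

# Total inventory length -> items per domain (length / 5 domains).
items_per_domain = {10: 2, 20: 4, 30: 6, 60: 12, 120: 24}

for total_items, k in items_per_domain.items():
    omega = mcdonald_omega([ASSUMED_LOADING] * k)
    print(f"{total_items:>3}-item inventory ({k:>2} items/domain): omega = {omega:.2f}")
```

Under this assumption, omega rises steadily with the number of items per domain even though item quality is held constant. Observed omegas will differ from these values because real loadings are neither uniform nor equal across inventories.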
Convergent Validity
Within Brief Measures
Among the briefer measures, convergence varied across domains and measures. Across domains, the brief measures manifested the lowest mean convergence for Agreeableness (mean r = .58), followed by Openness (mean r = .60), Conscientiousness (mean r = .67), Neuroticism (mean r = .76), and Extraversion (mean r = .78). On the one hand, convergent validities for measures of the same construct that range from .70 to .80 seem reasonable and are consistent with those found across many different constructs in clinical psychology (e.g., depression, anxiety, personality disorders, etc.; Miller et al., 2012; Watson et al., 1995) but even this range is mostly below the convergent correlations for the longer forms which ranged from .78 for Openness to .90 for Neuroticism and Conscientiousness. Even in the best-case scenario (i.e., Extraversion) using the mean rs reported above, the measures only share 61% of their variance; in the worst-case scenario (i.e., Agreeableness), they share only 34% of their variance. At the level of the individual short forms, the best- and worst-case scenarios range from 79% variance shared for TIPI and BFI-10 Extraversion to 18% variance shared for the FFM Openness domain description and BFI-10 Openness. The data are clear that the short measures are not isomorphic with one another and this seems especially true for Agreeableness and Openness. To provide some context, the two longer measures shared from 61% (Openness) to 81% (Neuroticism; Conscientiousness) of their variance indicating much closer correspondence.
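The shared-variance figures in this paragraph follow from squaring the corresponding correlations. A minimal sketch of that arithmetic, using the mean convergent correlations reported above (the function names are illustrative only):

```python
import math


def shared_variance(r: float) -> float:
    """Proportion of variance two measures share, given their correlation r."""
    return r ** 2


def correlation_from_shared_variance(percent: float) -> float:
    """Magnitude of the correlation implied by a shared-variance percentage."""
    return math.sqrt(percent / 100.0)


# Mean convergent correlations among the brief measures, by domain.
mean_convergence = {
    "Agreeableness": 0.58,
    "Openness": 0.60,
    "Conscientiousness": 0.67,
    "Neuroticism": 0.76,
    "Extraversion": 0.78,
}

for domain, r in mean_convergence.items():
    print(f"{domain}: r = {r:.2f}, shared variance = {shared_variance(r):.0%}")

# Going the other way: 18% and 79% shared variance imply these correlations.
for percent in (18, 79):
    print(f"{percent}% shared -> r = {correlation_from_shared_variance(percent):.2f}")
```

The same calculation applied to the long-form convergent correlations of .78 and .90 gives the 61% and 81% reported for the two longer measures.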
Brief Measures With Longer Measures
In general, the briefer measures generated reasonable convergent correlations with the two longer measures, which were lowest for Openness (mean rs = .72 and .70) and highest for Neuroticism (mean rs = .83 and .81). When examined in relation to the BFI-2 domains, mean convergence across domains was .65 (FFM domain descriptions), .68 (FFMRF), .77 (BFI-10), .81 (TIPI), and .84 (Mini-IPIP). Convergence with the IPIP-NEO-120 did not span as wide a range, with mean rs of .70 (FFM domain descriptions), .71 (FFMRF), .75 (BFI-10), .76 (Mini-IPIP), and .77 (TIPI). In general, the two brief FFM-based measures performed more poorly than did the Big Five-based measures, and these differences were larger in the cross-model comparisons.
Coverage of Personality Facets
In general, the brief measures provided relatively poor coverage of several IPIP-NEO-120 facets including Emotionality (mean r = .13), Modesty (mean r = .18), Activity Level (mean r = .33), Immoderation (mean r = .35), Liberalism (mean r = .37), Morality (mean r = .41), Excitement Seeking (mean r = .43), and Adventurousness (mean r = .46). There were no substantial gaps (mean rs ≤ .50) in coverage provided by the briefer measures in relation to the BFI-2 facets. Overall, the briefer FFM measures did not fare as well as the Big Five-based measures in accounting for variance in BFI-2 facets of Openness and none of the measures tended to fare as well as the IPIP-NEO-120 in accounting for variance in BFI-2 facets. It is also worth noting that while the FFMRF assesses each of the same facets as the IPIP-NEO-120 using a single item per facet, there was little evidence to suggest that this feature of the FFMRF resulted in more successful capture of the IPIP-NEO-120’s lower order facets, although it did do marginally better for some facets such as Modesty and Cheerfulness.
These results have important implications. As noted previously (e.g., Miller et al., 2011), Big Five measures do not capture the honesty–humility aspects of antagonism that are included in FFM-based measures, which likely explains the rise of the HEXACO model of personality (e.g., Ashton & Lee, 2007). Thus, Agreeableness domain scores from the BFI-2 and all tested brief measures will demonstrate more modest relations with constructs for which these facets are relatively central, such as grandiose narcissism and psychopathy (e.g., O’Boyle et al., 2015) or the “dark triad” more generally (Vize et al., 2020). The current results also suggest that the various brief measures are not as well-situated to capture all aspects of impulsivity, at least as conceptualized by the UPPS model (e.g., Whiteside & Lynam, 2001) that has become quite prominent over the past 20 years, as they lack content related to negative urgency (the tendency to act rashly in the face of negative emotions, captured by the IPIP-NEO-120 facet of Immoderation) and excitement seeking (the tendency to be drawn to novel and potentially dangerous activities). At the domain level, the brief measures appear to prioritize the communal aspects of Extraversion to a greater extent than the agentic components, which bears important implications for understanding some aspects of psychopathology (e.g., Watson et al., 2015), including narcissism (Crowe et al., 2019).
There were also notable differences in the degree of convergence for measures of Openness, which likely reflect, in part, substantial differences in how this domain is represented in the Big Five and FFM measures. For instance, in documenting the creation and revision of the NEO personality inventories, Widiger and Crego (2019, pp. 73–74) reported that Costa and McCrae “conceptualized openness prior to their knowledge of the Big Five to involve ideal personality traits of self-actualization, an open mind, and self-realization . . . this was not how intellect was conceptualized within the Big Five (Goldberg, 1980, 1982).” Widiger and Crego (2019) note that “McCrae (1990) eventually acknowledged that FFM Openness was problematically aligned with Big Five intellect, but no revision to the openness scale ever occurred” (p. 74). Widiger and Crego (2019) also suggest that these differences explain why some measures of openness demonstrate more robust relations to clinically relevant constructs such as schizotypy, oddness, and psychoticism than do others. Similarly, DeYoung’s work (e.g., DeYoung et al., 2007) has emphasized that this domain is undergirded by factors of openness and intellect and that these subdomains can have important differential relations with constructs like intelligence and apophenia (DeYoung et al., 2012). The current results make it clear that different measures of Openness are far from fungible, and while this may be particularly the case across Big Five and FFM-based measures, the degree of convergence within these families of measures is far from perfect.
Incremental Validity
We tested incremental validity in two ways. First, we examined whether any of the brief measures accounted for unique variance above the others in the domain scores from the two longer measures. Second, we tested whether the long measures accounted for substantial additional variance in one another not captured by individual brief measures. There were generally only small pockets of unique variance explained by individual brief measures. For instance, the Mini-IPIP Openness domain accounted for an additional 7% of variance in BFI-2 Openness and 6% of the variance in BFI-2 Conscientiousness not explained by the other brief measures. Alternatively, the FFMRF accounted for an additional 9% and 11% of the variance in BFI-2 and IPIP-NEO-120 Extraversion scores. Other than these few cases, the unique variance accounted for by any given brief measure was relatively minimal.
With regard to the second question, the longer measures accounted for substantial amounts of variance above and beyond the various brief measures. The additional variance accounted for by the IPIP-NEO-120 and BFI-2 ranged from a mean semi-partial r-squared of 13% above and beyond the Mini-IPIP domains to a mean semi-partial r-squared of 27% above and beyond the FFM domain descriptions. This predictive advantage was found for all domains but was lowest for Neuroticism (mean sr² = .16) and highest for Conscientiousness (mean sr² = .23). The variance accounted for by the brief measures over the longer measures was relatively minimal, ranging from a low of 3% (FFMRF) to a high of 5% (Mini-IPIP). Here again, the additional variance was relatively evenly dispersed across domains, with a range from 2% for Conscientiousness to 7% for Openness. These analyses document that longer measures of personality contain a substantial amount of variance not accounted for by the briefer measures that can be assessed when using one of the longer, faceted measures.
Assessment Implications
The present results are somewhat equivocal regarding short forms. The brief measures do, in fact, take little time to administer and do assess reliable personality signal. However, in line with previous findings (e.g., Credé et al., 2012), efficiency comes with substantial costs in that several of the shorter measures manifested problematic internal consistency for one or more domains, manifested limited convergence with other brief measures of the Big Five/FFM, and left meaningful personality variance unexplained. Of the shorter measures, the 20-item Mini-IPIP performed the best across the aforementioned markers. The FFMRF had domain scores that were relatively reliable but manifested more limited convergence in some cases with both shorter and longer measures. Both the BFI-10 and FFM domain descriptions demonstrated problems with internal consistency for multiple domains, with the latter manifesting reliably weaker convergent validity. Given that the longer measures (the BFI-2 and IPIP-NEO) can be administered with little additional cost in terms of participant time and do a better job of capturing the full range of content in these domains, we believe these measures should be prioritized except in cases where time is truly of the essence (e.g., broad-scale epidemiological work). In those cases, the Mini-IPIP is probably the best choice given that it takes only 1 minute (in this setting) and provides the best mixture of reliability and validity. It is worth noting that neither the brief measures nor the BFI-2 captured a great deal of variance in certain IPIP-NEO facets related to aspects of impulsivity and honesty–humility. These facets are central to certain forms of psychopathology.
Limitations and Conclusions
There are several notable limitations of the current study, which include the reliance on a sole methodology, self-report, to evaluate the performance of these measures. Ideally, one might use a multitrait, multimethod approach to studying these issues (e.g., informant-report, behavioral tasks). Relatedly, due to the finite resources we had to conduct the study, we were limited with respect to the number of theoretically relevant criterion constructs included (e.g., only including longer parent measures). Nevertheless, the FFM/Big Five boasts a rich empirical base that connects these measures to a number of well-established outcomes (e.g., Jones et al., 2011; Malouff et al., 2005; Ruiz et al., 2008; Samuel & Widiger, 2008). Additionally, the current study did not examine test–retest reliability, which would have provided important information as to whether the shorter versus longer measures differed in important ways on this index of reliability. The current sampling approach also relies on quasi-professional survey takers who are typically quite familiar with measures of this kind, are positively reinforced for efficiency, and have relatively high levels of education and thus reading ability. This is particularly relevant, for instance, when examining the mean time to completion for these measures. Scholars would be well-served to view these times as the higher end for speed and thus to expect somewhat longer times when the measures are used in samples with lower reading levels and less explicit reinforcement for an efficient approach. Similarly, this sample is limited in terms of demographic diversity and thus the degree to which these results would generalize to less W.E.I.R.D (Henrich et al., 2010) samples requires further testing.
The current results suggest that brief measures of the Big Five/FFM demonstrate reasonable convergence with one another (median r = .65) and with two longer, faceted measures of personality, the BFI-2 and the IPIP-NEO (median r = .64). There was substantial heterogeneity within these findings, however, such that some domains were more substantially related to one another (i.e., Extraversion and Neuroticism) than others (i.e., Agreeableness and Openness) and some brief measures (i.e., Mini-IPIP, TIPI) were more generally convergent with the longer measures than others (i.e., FFM domain descriptions, FFMRF). All measures demonstrated some gaps in their coverage, at least with regard to facets included in the FFM; these “holes” were found most substantially for content related to modesty, straightforwardness/morality, urgency/immoderation, activity level, openness to feelings/emotionality, and openness to values/liberalism. It did not appear that the pattern of relations observed was related to format/method factors. Many of these facets are important components of antisociality/prosociality (e.g., Jones et al., 2011; Thielmann et al., 2020), personality disorders (APA, 2013; Lynam & Widiger, 2001), impulsivity (e.g., Whiteside & Lynam, 2001), and psychopathology (e.g., Watson et al., 2019). More important, the timing data suggest both that shorter and longer measures can be given relatively quickly and that a substantial proportion of meaningful personality variance is left unaccounted for by shorter measures as compared with longer ones. As such, we urge that longer, more reliable, and faceted measures, such as the BFI-2 (Soto & John, 2017) and IPIP-NEO (Maples et al., 2014), be used when feasible.
Supplemental Material
Supplemental material, Supplemental_material for A Comparison of the Validity of Very Brief Measures of the Big Five/Five-Factor Model of Personality by Chelsea E. Sleep, Donald R. Lynam and Joshua D. Miller in Assessment
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
