Abstract
The Hogan Personality Inventory (HPI) and Hogan Developmental Survey (HDS) are among the most widely used and extensively validated personality inventories for organizational applications; however, they are rarely used in basic research. We describe the Hogan Personality Content Single-Items (HPCS) inventory, an inventory designed to measure the 74 content subscales of the HPI and HDS via a single item each. We provide evidence of the reliability and validity of the HPCS, including item-level retest reliability estimates, both self-other agreement and other-other (or observer) agreement, and convergent correlations with the corresponding scales from the full HPI/HDS instruments, and we analyze how similarly the HPCS and full HPI/HDS instruments relate to other variables. We discuss situations where administering the HPCS may have certain advantages and disadvantages relative to the full HPI and HDS. We also discuss how the current findings contribute to an emerging picture of best practices for the development and use of inventories consisting of single-item scales.
The Hogan Personality Inventory (HPI) and Hogan Developmental Survey (HDS; Hogan & Hogan, 2009) have been extensively validated for predicting job performance in high-stakes settings, where respondent scores are used for selecting applicants for jobs and other organizational purposes (Hogan et al., 1996). Hundreds of thousands of people from more than 100 countries complete the HPI and HDS instruments each year for such purposes, with over 11 million total assessments to date. Despite this, the Hogan inventories are rarely utilized in basic research. Particularly given the aims of measuring job performance, leadership potential, and other aspects of how individuals behave in organizational settings, the HPI and HDS have been designed to measure somewhat different personality content than the most commonly used personality inventories in basic research. Although Hogan Assessment Systems offers free administration and scale/subscale scoring of their assessments for academic researchers, the scoring key is considered critical intellectual property and is not shared. To prevent reverse engineering of the scoring key and to protect individual privacy, item-level responses are also not shared. The overall length of the HPI and HDS instruments may further lead many basic researchers to use other instruments for their investigations. Ultimately this serves to further increase the misalignment between basic researchers and practitioners (Kaufman, 2022).
Here we describe the development of the Hogan Personality Content Single-Items inventory, or HPCS, to address these limitations and to help bridge the gap between assessment tools used in applied settings and those used in basic research. Within the HPCS, each of the 74 subscales within the HPI and HDS (which are assessed by 3- to 6-item scales referred to as Homogeneous Item Clusters, or HICs) is assessed by a single item, resulting in an inventory approximately one-fifth the length of the combined HPI and HDS instruments. We continue by describing some reasons for developing single-item inventories in general, and then for doing so more specifically to assess the personality content within the HPI and HDS instruments.
Why Create an Inventory Consisting of Single-Item Scales?
There are a number of personality inventories that have been designed to consist entirely of single-item scales. Examples include the Adjective Check List (ACL; Gough, 1960); the California Adult Q-Sort (CAQ) and California Child Q-Sort (CCQ) (Block, 1961); the Shedler-Westen Assessment Procedure (SWAP; Shedler & Westen, 2007); the Riverside Situation Q-Sort (RSQ; Funder, 2016) and Riverside Behavior Q-Sort (RBQ; Furr et al., 2010); and the Inventory of Individual Differences in the Lexicon (IIDL; Wood et al., 2010). There are also numerous single-item scales that have been developed to assess more specific constructs, such as the Big Five personality traits (Denissen et al., 2008; Woods & Hampson, 2005) and associated facets (Soto & John, 2017b), narcissism (Konrath et al., 2014), trait empathy (Konrath et al., 2018), self-esteem (Robins et al., 2001), and job satisfaction (Wanous et al., 1997).
Single-item scales are often offered as “scales of last resort”—or scales that should be used almost exclusively when corresponding longer measures cannot be administered due to limited time and resources. For instance, in presenting a 15-item inventory to assess the 15 traits found on the BFI-2 with an item apiece, the scale developers suggest administering this instrument “would clearly be better than not measuring personality at all” (p. 77), but use the final sentence of their article to emphasize “for most studies, however, we recommend administering the full measure due to its greater reliability” (p. 79; Soto & John, 2017b).
We depart from the “scales of last resort” stance and describe some positive arguments for the use of inventories comprised of single-item scales, and how they can be developed and used more effectively (Block, 1961; Condon et al., 2020; de Vries et al., 2016; Fisher et al., 2016; Matthews et al., 2022). As we detail, single-item scales are not just “not as bad” as people might think relative to longer multi-item scales but have some advantages that are unshared with multi-item scales.
Brevity of Administration
The most salient and widely understood advantage of inventories comprised of single-item scales to many researchers is simply their practicality. For instance, completing the current HPI and HDS instruments requires participants to rate 374 items. Having respondents rate a single item for each of the 74 HICs contained within these instruments would result in an instrument less than one-fifth this length.
The advantages of shorter inventories should not be readily dismissed. Since ultimately time is a limited commodity for both the survey administrators and respondents, the use of shorter inventories opens opportunities to assess additional content within the survey. Or it can just result in a shorter overall assessment. This in turn can result in higher quality data in many situations as longer surveys often increase careless or insufficient effort responding (Bowling et al., 2022; DeSimone et al., 2015). Shorter surveys can also open opportunities to collect data from individuals who otherwise would be unlikely to complete longer assessments at all (Bergkvist & Rossiter, 2007; Dejonckheere et al., 2022). For instance, it is sometimes recommended to not offer financial incentives when collecting observer reports, as incentives can increase invalid responding (Vazire, 2006). We may reasonably expect a considerably higher response rate if such volunteers are asked to complete a survey requiring about 10 to 15 minutes to complete rather than one requiring closer to an hour.
The Reliabilities of Single-Item Scales are Estimable and Typically Decent
A major reservation toward the use of single-item scales has been the question of how to estimate their reliability.
Some authors have shown how internal consistency-type reliability estimates can be provided for single-item measures (e.g., Denissen et al., 2008; de Vries et al., 2016; Wanous & Hudy, 2001). However, we argue that the more important advance has been a recent understanding that retest correlations may be more appropriate indices of measurement reliability. McCrae and colleagues (2011) noted that the square root of a measure’s reliability serves as the expected upper limit of its ability to correlate with other variables.
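One standard way to write this bound, offered here as an illustration from classical test theory rather than as a reproduction of McCrae and colleagues' own notation, is the attenuation formula. The symbols are assumptions introduced for this sketch: $r_{xx}$ and $r_{yy}$ denote the reliabilities of measures $x$ and $y$, and $\rho_{T_x T_y}$ denotes the correlation between their true scores.

```latex
\[
  r_{xy} \;=\; \rho_{T_x T_y}\,\sqrt{r_{xx}\, r_{yy}}
  \qquad\Longrightarrow\qquad
  |r_{xy}| \;\le\; \sqrt{r_{xx}\, r_{yy}} \;\le\; \sqrt{r_{xx}}
\]
```

The inequalities follow because $|\rho_{T_x T_y}| \le 1$ and $r_{yy} \le 1$. Under these assumptions, a scale with a reliability of .64 would be expected to correlate no higher than $\sqrt{.64} = .80$ with any other variable.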
A major advantage of retest correlations is that they can be estimated for scales of any length—including single-item scales. The reliability of single-item scales is increasingly indexed as the items’ retest correlations over short intervals, such as a couple of days or weeks (de Vries et al., 2016; Henry et al., 2022; Lowman et al., 2018; Matthews et al., 2022; Wood et al., 2010). When retest correlations are estimated in this manner, they have generally been found to be higher than many researchers may expect. For instance, Henry et al. (2022) found the 100 items of the HEXACO-100 to have a mean retest correlation of .65 over 13 days, compared with a mean of .81 for the 25 four-item scales of HEXACO facets formed from the same items. Similarly, Wood et al. (2010) found the 61 items of the IIDL instrument to have a mean retest correlation of .62 over 4 days.
Given that the correlations between items within a multi-item scale are frequently in the neighborhood of .40 (e.g., Clark & Watson, 1995, 2019), the fact that retest correlations of single-item scales over such intervals may tend to exceed .60 may strike some researchers as surprising. Furthermore, for many common research purposes, retest correlations estimated over even a couple of days should be regarded as lower bounds of the scale’s reliability, as such estimates will underestimate how highly the scale may correlate with other scales administered within a larger survey due to occasion-specific variance (Chmielewski & Watson, 2009; Lowman et al., 2018; Wood et al., 2023).
“Hit the Nail on the Head” Item-Development Strategy
When using single-item scales, it is generally possible to create an item that indicates the intended construct in a fairly direct and face-valid manner (Matthews et al., 2022; Wood et al., 2010). For instance, Robins et al. (2001) developed a single-item self-esteem scale consisting of the item “I have high self-esteem”; and Konrath and colleagues (2014, 2018) developed a single-item narcissism scale consisting of the item “I am a narcissist” and an empathy scale consisting of the item “I am an empathetic person.” As these examples indicate, single-item scales can be created to largely serve as a direct self-report of the construct of interest (Gough, 1960; Wood et al., 2010), which we will call the “hit the nail on the head” strategy.
This strategy appears to be somewhat implicit or accidental in the development of single-item measures and differs a bit from how multi-item personality scales are frequently created. For instance, if a scale developer is interested in creating a multi-item scale of sociability, they may use items to measure some of the behaviors more sociable people are more likely to do, such as talking to strangers in lines, making friends easily in new environments, or enjoying crowds. In contrast, using the “hit the nail on the head” strategy may instead involve asking people to report directly on whether they have the trait understood as the common element of these and other behaviors, either by measuring agreement with a direct self-report of the trait (“are you sociable?”) or with a description of the trait’s central defining characteristics (e.g., “do you enjoy interacting with other people?”).
This very direct approach could be regarded as problematic when creating a scale consisting of more than one item, as the resulting items are likely to be highly synonymous or semantically similar to one another, which scale developers sometimes prescribe avoiding to prevent the creation of “bloated specific” factors (Boyle, 1991; Cattell & Tsujioka, 1964; Oltmanns & Widiger, 2018). However, we argue this is a valuable strategy when the intention is ultimately to use a single item to measure the construct—or at least, the individual’s perceived level of the construct. Indeed, the use of direct, face-valid items is recommended for the assessment of certain constructs (Bergkvist & Rossiter, 2007; Burisch, 1984; Thurstone, 1928).
Ability to Present the Full Scale as Seen by Respondents
Many issues within behavioral science research involve instances where the label used to represent the mean or sum of a multi-item scale fails to make clear the specific content seen and rated by respondents. Differences between scale content and scale labels form the essence of the well-known jingle and jangle problems (Block, 1995; Gonzalez et al., 2021).
On the jingle side: scale labels regularly obscure the existence of semantically redundant items across different scales being correlated. For instance, Nicholls and colleagues (1982) noted that the relatively large association between commonly used self-report measures of masculinity and assertiveness could be largely attributed to the fact that many of the items across the two measures were virtually redundant. Problems of this sort persist, with researchers questioning the meaningfulness of operationalizations of transformational and charismatic leadership (van Knippenberg & Sitkin, 2013), Machiavellianism and psychopathy (Miller et al., 2017), job engagement and other job attitudes (Newman & Harrison, 2008; Newman et al., 2010), and psychological grit and Conscientiousness (Credé et al., 2017), among others. The problem is exacerbated by the regular use of “broad” scales, where a broad range of specific items are averaged to form a single scale score, which regularly obscures the existence of semantically similar items forming the scale scores being correlated (Mõttus, 2016).
On the jangle side: diverging correlations in how different measures with the same label relate to a variable of interest can be due to systematic differences in item content. For instance, differences in how various “Extraversion” scales relate to gender (Costa et al., 2001; Feingold, 1994) have been attributed to differences in content emphases within the broader Extraversion domain (e.g., assertiveness, positive affectivity, and sociability). More generally, there is considerable variation in the nature of what different Big Five scales measure, which largely concerns variation in content emphases (Pace & Brannick, 2010).
One way to see whether both jingle and jangle fallacies may be in operation is simply to “eyeball” the items within the scales to see what content is represented and whether this content is overlapping (e.g., Mõttus, 2016; Newman & Harrison, 2008; Nicholls et al., 1982). Single-item scales provide a means of making such inspections easier: When a scale consists of a single item, it is possible to simply not create a separate label for the scale at all—and instead label the scale using the exact item as seen by participants. This can make it clearer to consumers of the research—including the investigators themselves—how specific properties of the stimuli rated by the respondents could account for observed phenomena.
Why Create an Inventory of Single-Item Scales for the HPI and HDS Inventories?
We continue by describing three additional reasons to create a short inventory focused on the content found within the HPI and HDS inventories in particular.
Connecting Basic and Applied Personality Assessments
A range of commonly used brief personality inventories—such as the 60-item BFI-2 (Soto & John, 2017a), the 60- and 100-item versions of the HEXACO (Ashton & Lee, 2009; Henry et al., 2022), and the 60-item NEO-FFI (Costa & McCrae, 1992)—were designed to assess core vectors of individual differences identified from factor analyzing self-ratings of common lexical terms, following an understanding that these vectors may be particularly socially important (Goldberg, 1993; Saucier & Goldberg, 2001; Wood, 2015). In contrast, the HPI and HDS instruments were designed specifically to assess key traits within socioanalytic theory (Hogan, 1982, 1991). Within socioanalytic theory, individuals are expected to have greater success in groups to the extent that they have more specific traits that facilitate “getting along” and “getting ahead” in interpersonal contexts. The focus on assessing traits theorized to facilitate or inhibit success in groups and organizational settings has resulted in the HPI and HDS assessing many traits that are less emphasized within surveys designed to measure lexically derived personality factors—such as competitiveness, conformity, leadership capacity, and mastery orientation. Although dimensions resembling the Big Five can be extracted from the HPI and HDS instruments (Hogan, 1996), the greater focus on assessing traits relevant to success in organizational settings could result in an associated inventory of single-item scales being better positioned to predict such outcomes.
Assessing More Dysfunctional Individual Differences
Research has shown that measures of traits that concern more dysfunctional and antisocial patterns of behavior often add substantially to the prediction of outcomes such as job performance, leader effectiveness, and the quality of interpersonal relationships (Harms & Sherman, 2021; Padilla et al., 2007). Brief measures in this space frequently focus specifically on assessing the “Dark Triad” dimensions of narcissism, psychopathy, and Machiavellianism (e.g., Jonason & Webster, 2010; Jones & Paulhus, 2014; Paulhus et al., 2021). In contrast, the HDS instrument (Hogan & Hogan, 2001) was designed to assess subclinical levels of the personality disorders found in Axis II of the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV; American Psychiatric Association, 1994). Some of these scales are indeed similar to Dark Triad dimensions—for instance, the Bold scale broadly reflects narcissistic tendencies, while the Mischievous scale is broadly similar to psychopathy. However, the HDS assesses many additional dimensions in this space; for instance, the Cautious scale can be regarded as a subclinical variant of Avoidant personality disorder, reflecting tendencies toward being overly indecisive and risk-averse; and the Diligent scale can be regarded as a subclinical variant of Obsessive–Compulsive personality disorder, reflecting tendencies toward being perfectionistic and micromanaging.
The HDS scales have been found to regularly correlate highly (rs ≥ .50) with matching personality disorder scales on instruments such as the Minnesota Multiphasic Personality Inventory (MMPI) (Hogan & Hogan, 2001). However, they were not designed for use in clinical settings, but rather to assess dysfunctional or self-defeating patterns of behavior that may derail one’s relationships or career. These derailing tendencies are thought to manifest when individuals are either unable or unwilling to engage in reputation management by inhibiting their darker impulses.
Better Operationalizing the Distinction Between Identity and Reputation
Socioanalytic theory distinguishes between a person’s identity and reputation (how the person sees themselves and how the person is seen by others, respectively) and argues that a person’s reputation should ultimately be the more predictive of their performance (Hogan & Blickle, 2018; Roberts & Wood, 2006). In practice, the HPI and HDS instruments aim to indicate a person’s reputation indirectly through their responses to self-report items. For instance, a person’s extraversion may be indicated by their self-reported tendencies to go to parties, speak up in meetings, and talk to strangers: actions that may reveal how extraverted the person is seen to be by others even if the person does not personally identify as an extravert (Hogan & Nicholson, 1988). However, a more direct operationalization of reputation would be to simply ask other observers who know the person to describe whether the person is an extravert.
The length of the full HPI and HDS instruments, though, has made the collection of observer reports relatively infrequent. A considerably briefer inventory would make it much easier to collect observer reports. In turn, a person’s trait identity can be operationalized more directly as the person’s self-reported level of the trait, whereas their trait reputation can be operationalized as the person’s mean reported level of the trait from observer reports (cf. Connelly et al., 2022). This in turn can increase the ability to see how self-perceptions and observer perceptions of the traits assessed by the HPI and HDS instruments differ. For instance, we will explore whether self-ratings show a lower tendency to track trait desirability estimates than observer ratings made by friends and family, as has been found in other personality inventories (Kim et al., 2019), or whether scales concerning more observable traits tend to have higher mean ratings (Funder & Colvin, 1997; Wood, 2015) and self-other agreement (Funder & Dobroth, 1987; Watson et al., 2000).
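The operationalization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names, the participant identifiers, and all ratings are hypothetical, and self-other agreement is indexed here simply as the Pearson correlation between self-ratings (identity) and mean observer ratings (reputation) on one item.

```python
from statistics import mean

def reputation_scores(observer_ratings):
    """Mean observer rating per target; targets with no observers are dropped."""
    return {target: mean(r) for target, r in observer_ratings.items() if r}

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical ratings of four targets on a single item (made-up data).
self_ratings = {"p1": 80, "p2": 35, "p3": 60, "p4": 50}
observer_ratings = {"p1": [70, 90], "p2": [40], "p3": [55, 65, 60], "p4": []}

reputation = reputation_scores(observer_ratings)  # identity stays in self_ratings
targets = sorted(reputation)                      # only targets with observers
agreement = pearson([self_ratings[t] for t in targets],
                    [reputation[t] for t in targets])
```

Averaging across observers before correlating is one design choice among several; agreement could also be computed against each observer separately.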
Method
Stage 1: Development of the HPCS Instrument
We created one item for each of the narrow HICs assessed by the HPI and HDS personality inventories (Hogan & Hogan, 2007, 2009). Each HIC consists of a three- to six-item scale designed to measure a single narrow characteristic. The seven primary dimensions of the HPI each contain four to eight HICs, totaling 41 subscales, whereas each of the 11 dimensions assessed by the HDS contains scales for three HICs, for a total of 33 subscales. There were thus a total of 74 narrow dimensions that would be represented by one item each within the new instrument.
A complete list of the HPI and HDS subscales and of the items created to measure these subscales within the HPCS is given in Table 1. We continue by describing the considerations that were used to create the HPCS items below.
Table 1. Items of the HPCS Inventory in Sample 1 and Corresponding HICs Within the HPI and HDS.
Note. a The item differed across Samples 1 and 2; see Appendix for the item given in Sample 2.
Item Format
We created each item to be rated under the general stem “Is someone who. . .”—similar to stems used in inventories such as the BFI and BFI-2 (e.g., “I am someone who. . .”; John & Srivastava, 1999; Soto & John, 2017a) and in the International Personality Item Pool (IPIP; Goldberg et al., 2006). This stem was chosen in part to result in items that could be presented without alteration both for self-ratings and for observer-ratings by only making the general stem slightly more specific—for instance “I am someone who. . .” for self-ratings, and “[target] is someone who. . .” for observer-reports.
Each item was formed to fit a “label (subclause)” format, with a short abstract label followed by a longer subclause given in parentheses. As can be seen in Table 1, the label was typically taken fairly directly from the label given to the HIC within the HPI and HDS technical manuals. We additionally examined descriptions of HICs within the manuals and supporting guides (Hogan Assessment Systems, 2016) to assist in writing the item’s subclause in a manner corresponding to the intended emphasis of the HIC. For instance, within the HPI technical manual high scorers on the Easy to Live With subscale are described as “Tolerant and easygoing nature”; the item we created to indicate this subscale was “is Easy to Live With (easy-going/tolerant nature; works well with others).” We intended the “label (subclause)” format of the items to function somewhat analogously to word (word definitions) linkages within dictionaries, where the subclause served to help ground and clarify the intended meaning of the broader label through somewhat more concrete terms. We additionally tried to make the label and subclause largely synonymous to help focus the meaning of the broader item (Wood et al., 2010), so researchers could generally report just the label component when presenting study results (as we will illustrate in subsequent tables and figures) with minimal drift from the meaning participants understood from reading and responding to the complete item.
Additional Considerations in Item Development
Removal of Negations
A number of the HICs involved labels containing negations, such as “Not Anxious” and “No Hostility.” We removed all negations from the items created for this inventory, as items containing negation terms are thought to be more difficult for respondents to process and rate correctly and have been associated with lower levels of properties associated with item reliability and validity—such as retest correlations and self-other agreement (de Vries et al., 2016; McCrae et al., 2011).
For the HPCS items, we usually simply removed negations found in the HIC labels, which reversed the direction of high scores on the dimension from desirable to undesirable. For instance, the HIC “Not Anxious” was assessed by the item “is Anxious (prone to anxiety or worry).” These reversals resulted in an increase in the percentage of items in which high scores indicated undesirable traits. This is often seen as a desirable property of an inventory, in part as inventories with a better balance in the percentage of items indicating positive and negative characteristics increase the ability to use participant response patterns such as mean ratings, normativeness, and indices of response similarity over retests as data quality screens (Henry et al., 2022; Soto et al., 2008; Wood et al., 2017).
Item Length
We additionally aimed to keep the total length of each item relatively short, both to encourage more frequent reporting of the complete item by researchers and because shorter items likely facilitate quicker and more consistent responses among participants (de Vries et al., 2016). Items in the HPCS ranged from 33 to 99 characters, with a mean of 66. As items will sometimes be shown in results with just the label portion (i.e., by removing the “subclause” portion of the item given in parentheses), we also aimed to keep the abstract label portion of the item particularly short; the abstract label portion of the HPCS items ranged from 7 to 26 characters, with a mean of 16.
Stage 2: Validational Evidence of HPCS Instrument
We utilized data from two samples to examine the properties of the resulting HPCS items, as described below. Materials administered to Sample 1 were approved under University of Alabama IRB #22-09-5972. Sample 2 was conducted by a research team at Hogan Assessment Systems, and the materials administered were reviewed internally by the research team to avoid ethically questionable research practices.
Sample 1 afforded the ability to estimate item-level retest stability and self-other agreement, to provide reliability and validity evidence of the measures. Respondents in Sample 2 completed both the HPCS and the HPI and HDS instruments, allowing estimation of the convergent validity with the full instruments the HPCS was designed to assess. Sample 2 respondents additionally completed measures of organizational attitudes and experiences, which afforded the ability to examine whether the HPCS items and corresponding HICs from the full HPI/HDS instruments performed similarly in predicting organizational criteria. Below, we report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study.
Sample 1: Undergraduate Management Students
The HPCS was collected as part of a larger data collection effort in which students in an undergraduate management course completed one major survey each week for 5 weeks on the Qualtrics platform. Students completed the surveys for course participation credit and completed the HPCS in both the first and the fourth week of the larger data collection.
HPCS Self-Ratings
Items were rated on a continuous slider scale anchored from “Strongly Disagree” to “Strongly Agree,” with a scale midpoint of “Neither Agree Nor Disagree”; responses were scored from 0 to 100. Each of the 74 items was presented in one of five sets, with four sets containing 15 items, and one set containing 14 items. The order in which participants rated the five sets was randomized, and the order of the items within each set was additionally randomized. All of the HPCS items were rated under the general stem “I am someone who. . .” A total of 377 students completed the first administration of the HPCS, and 341 students completed the second administration. Note that some of the students who completed the second administration did not complete the first.
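The two-level randomization described above can be sketched as follows. This is an illustration only: the survey itself was administered in Qualtrics, the assignment of particular items to particular sets is not reported here, and the consecutive split, function names, and seed below are hypothetical.

```python
import random

N_ITEMS = 74
SET_SIZES = [15, 15, 15, 15, 14]  # four sets of 15 items and one set of 14

def build_sets(item_ids, set_sizes=SET_SIZES):
    """Split items into fixed sets (consecutive split is a hypothetical choice)."""
    sets, start = [], 0
    for size in set_sizes:
        sets.append(list(item_ids[start:start + size]))
        start += size
    return sets

def presentation_order(sets, rng):
    """One participant's pages: set order shuffled, then items shuffled within each set."""
    pages = [list(s) for s in sets]  # copy so the fixed sets are left unchanged
    rng.shuffle(pages)               # randomize the order of the five sets
    for page in pages:
        rng.shuffle(page)            # randomize item order within each set
    return pages

item_ids = list(range(1, N_ITEMS + 1))
fixed_sets = build_sets(item_ids)
pages = presentation_order(fixed_sets, random.Random(0))
```

Each participant would receive a fresh call to `presentation_order`, so that set membership stays constant while both levels of ordering vary across respondents.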
Participants were excluded if they failed either a response time or response variability screen. We describe these screens below, and the number and percentage of respondents failing each screen during the first and second administration of the HPCS.
Low Response Time
For each of the five pages of HPCS items, the times of the first and last clicks on the page were saved. This made it possible to compute a seconds-per-item index; participants were excluded if they were indexed as completing any of the five sets of items at a rate faster than 1 second per item, which has been found to be a valid indicator of careless or inattentive responding for items consisting of short phrases or sentences (Jaso et al., 2022; Wood et al., 2017). A total of 25 (6.6%) participants failed the response time screen during the first administration of the HPCS, and 28 (8.2%) failed this screen during the second administration.
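A minimal sketch of this response time screen is given below. It assumes that the index is the elapsed time between the first and last click divided by the number of items on the page; the function names, the tuple layout, and the example timings are hypothetical.

```python
def seconds_per_item(first_click, last_click, n_items):
    """Elapsed seconds between first and last click, divided by items on the page."""
    return (last_click - first_click) / n_items

def fails_time_screen(page_timings, threshold=1.0):
    """Flag a respondent who completed ANY page faster than `threshold` seconds per item.

    page_timings: list of (first_click_sec, last_click_sec, n_items), one per page.
    """
    return any(seconds_per_item(first, last, n) < threshold
               for first, last, n in page_timings)

# Hypothetical respondents: five pages each (four with 15 items, one with 14).
careful = [(2.0, 62.0, 15)] * 4 + [(1.0, 50.0, 14)]  # 4.0 and 3.5 s per item
speeder = [(2.0, 62.0, 15)] * 4 + [(1.0, 9.0, 14)]   # last page under 1 s per item
```

Because the rule applies to any single page, one rushed page is enough to exclude a respondent even when the remaining pages were completed at a normal pace.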
Low Response Variability
The percentage of maximum response variability was indexed for each participant by this equation:
A total of 20 (5.3%) participants failed the response variability screen during the first administration of the HPCS, and 29 (8.5%) failed this screen during the second administration.
When coded dichotomously (failed screen = 1, passed screen = 0), the two screens were highly positively correlated both during the first administration of the HPCS (r = .51), and the second administration 3 weeks later (r = .52). A total of 33 (8.8%) of the 377 respondents of the first administration of the HPCS were excluded due to failing at least one screen. A total of 41 (12.0%) of the 341 respondents of the second administration of the HPCS failed at least one screen and were excluded.
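As a consistency check on these figures, the correlation between two dichotomously coded screens (a phi coefficient) can be recovered from the reported counts alone. The sketch below assumes that the number failing both screens follows from inclusion-exclusion on the reported totals; the function name is hypothetical.

```python
from math import sqrt

def phi_from_counts(n, fail_a, fail_b, fail_either):
    """Phi coefficient between two 0/1 screens, computed from marginal counts.

    n: respondents; fail_a, fail_b: number failing each screen;
    fail_either: number failing at least one screen.
    """
    fail_both = fail_a + fail_b - fail_either  # inclusion-exclusion
    numerator = n * fail_both - fail_a * fail_b
    denominator = sqrt(fail_a * (n - fail_a) * fail_b * (n - fail_b))
    return numerator / denominator

first = phi_from_counts(n=377, fail_a=25, fail_b=20, fail_either=33)
second = phi_from_counts(n=341, fail_a=28, fail_b=29, fail_either=41)
print(round(first, 2), round(second, 2))  # 0.51 0.52
```

Both values match the correlations reported for the first and second administrations, which suggests the failure counts and exclusion totals are internally consistent.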
Of the 344 who completed the first assessment, 196 (57%) were male and 148 (43%) were female. On average, participants were
HPCS Peer-Ratings
After providing self-reports on the HPCS, participants were asked to provide the names of individuals who could provide observer reports. Specifically, participants read: Please provide the names of up to 6 individuals who know you at least relatively well and provide their contact information. Try to include at least some individuals who may know you outside of UA (for instance: parents/siblings, high school friends, coworkers). They will be contacted only once to provide an independent description of your personality on the items you just rated. Their ratings will also be used for in-class exercises. Their ratings will not be shared with you, and their participation (or non-participation) will not affect your course credit.
When the individuals who the participants had listed were contacted, they received the following instructions: Please read the following before continuing to the next page. You have been nominated by [target] as someone who can describe their personality. As part of the activity, you will be asked to complete a personality survey describing [target]. The survey usually takes participants about 10 minutes to complete, and should not take more than 15 minutes. The responses you provide will be used as part of an in-class exercise which concerns how much individuals and the people that know them agree about descriptions of their personality. Please note that [target] will not be able to see your specific ratings, so please feel free to answer honestly. As we value your privacy, we will not contact you again. However, we hope you will agree to help with this exercise! To begin the activity, please select “Continue to survey” below.
The same response time and response variability screens described above were also used to screen observer ratings; failures on the two screens were again highly correlated (r = .60). However, the percentage of observer ratings that failed the screens was considerably lower than for the self-ratings, with only 5 (1.2%) of the 420 observer reports failing each screen, and only 7 (1.7%) failing at least one screen, resulting in 413 observer reports included in the analysis.
Of the 344 participants who completed the first self-report of the HPCS, 200 (58%) had at least one observer report. More specifically, 86 participants had exactly one observer report, 63 had two, 22 had three, 19 had four, 6 had five, and 4 had six.
These observers were asked to indicate the nature of their relationship with the participant; the response options provided were not mutually exclusive and each observer could check multiple options. In all, 46% of observers described themselves as the participant’s friend, 2.9% as their current coworker, 1.9% as a past coworker, 1.7% as their manager or boss, 7% as their romantic partner, 32.4% as their parent or guardian, 9.9% as their sibling, 3.4% as an extended relative, 6.3% as their fraternity brother or sorority sister, and 2.9% as “other,” which included grandparents, roommates, or teammates.
Sample 2: Hogan Assessment Systems Online Panel
A second sample was collected from MTurk by the research team at Hogan Assessment Systems. The sample began with 2,051 workers who were invited to “complete a survey about your attitudes and behaviors.” Workers were required to live in the United States and to have a HIT (Human Intelligence Task) approval rate greater than 95%.
From this initial sample, 407 workers (19.84% of the initial sample) were selected to participate in a multipart, yearlong study. To be selected, workers had to (a) be employed at least part-time, (b) work at least 20 hours per week (outside of MTurk), (c) report a valid job title (workers who listed MTurker, student, or retired were not included), (d) speak English as their primary language, and (e) be willing to take multiple surveys over the course of 1 year. In addition to these criteria, workers needed to correctly answer at least 16 (33%) of 48 true or false questions on a speeded cognitive assessment. Workers were paid $8.50 per hour to complete study materials over the course of a year.
HPI/HDS Assessments
The MTurk participants completed both the full HPI and HDS instruments in addition to the HPCS, allowing us to test convergent validity with full scales. The HPI and HDS were completed by almost all participants on a day in early February, whereas almost all participants completed the HPCS on a day at the end of August, resulting in a measurement interval of 202 days (or 6.7 months) separating these measurements for most participants.
HPCS Assessment
A total of 287 participants who had completed the earlier HPI/HDS instruments went on to complete the HPCS assessment. Of these, 151 (52.6%) were male, and 135 (47%) were female; one participant did not disclose their biological sex. On average, participants were
Thirteen (4.5%) of the 287 participants were removed from data analyses because they either responded to the questions too quickly (i.e., less than 1 second per question, on average) or entered an incorrect sequence of letters and numbers to prove their human identity.
Between Samples 1 and 2, nine HPCS items were modified slightly in consultation with the research team at Hogan Assessment Systems; the changes were made to clarify the item by removing difficult terms or to bring the item closer to the understood focus of the HIC it was designed to assess.
The largest change consisted of changing the item “is Satisfied (satisfied/content with one’s circumstances)” to “is Dissatisfied easily (has many complaints),” as this reversed high scores from indicating satisfaction to dissatisfaction. More moderate differences included the item “is Tough (interpersonally cold; focused on tasks rather than people)” shifting to “is Unsympathetic (interpersonally cold; shows little sympathy for others’ problems),” and the item “is Highly Confident (believes is capable of accomplishing anything they set their mind to)” shifting to “is Overconfident (believes is capable of accomplishing more than realistically possible).” We have indicated the nine items that differed from Sample 1 to Sample 2 by the “†” symbol in Table 3, and these items are given in the Appendix.
The HPCS items were rated on a 7-point scale with scale anchors of “Strongly Disagree” and “Strongly Agree,” and a midpoint of “Neither Agree nor Disagree.” Scores were converted to a 0-to-100 percentage of maximum possible (POMP; Cohen et al., 1999) metric to make item means and standard deviations more directly comparable to Sample 1. The full set of interitem correlations of the HPCS items observed in the two waves of Sample 1 and in Sample 2 is provided in Supplemental Tables S3 to S5.
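The POMP conversion can be illustrated with a minimal Python sketch. It assumes the 7-point responses are coded 1 to 7 (the coding is not stated in the text), and the function name `to_pomp` is hypothetical.

```python
def to_pomp(score, scale_min=1, scale_max=7):
    """Rescale a rating to a 0-to-100 percentage of maximum possible (POMP).

    Assumes responses are coded from scale_min to scale_max
    (here 1 to 7, an assumption about the response coding).
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    return 100.0 * (score - scale_min) / (scale_max - scale_min)


# The lowest response maps to 0, the midpoint to 50, and the highest to 100.
pomp_scores = [to_pomp(raw) for raw in range(1, 8)]
```

Because the transformation is linear, it changes item means and standard deviations but leaves all correlations unchanged.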
Job/Organization Experience Scales
Respondents from the MTurk sample also completed a range of measures about their employment experiences, including organizational commitment (Allen & Meyer, 1990), job satisfaction (Eid et al., 2008), task performance (Goodman & Svyantek, 1999), organizational citizenship behaviors (Lee & Allen, 2002), counterproductive work behaviors (Fox & Spector, 1999), burnout (Demerouti et al., 2010), and work engagement (Schaufeli et al., 2006). These measures were administered 62 days after the full HPI and HDS assessment and 140 days before the HPCS.
Coding HPCS Item Properties
Following the collection of data from these two samples, the HPCS items were rated for their desirability and observability. A total of 13 research assistants rated both the desirability and the observability of the HPCS items.
Item Desirabilities
Raters were instructed to “Please indicate the extent to which you understand it as
Item Observabilities
Raters were instructed: “Some traits refer to attributes that can be easily observed by an outside observer. Other traits refer to attributes that are much more difficult to perceive accurately about someone. For each term, consider how easy or difficult you think it is
Results
Factor Structure of the HPCS
We conducted a factor analysis of the items from the data collected in Sample 1. As there were two waves in which the HPCS was administered, these ratings were averaged, both to create more stable estimates of the participant’s self-rating and to maximize sample size as some participants completed only the first or second wave, resulting in 403 participants included in the analysis. Note that averaging scores in this manner will tend to increase eigenvalues somewhat by increasing score reliability. We conducted a factor analysis using principal axis factoring (PAF) and varimax rotation using the fa function of the psych package in R (Revelle, 2017). The first 10 eigenvalues were 10.1, 7.1, 6.1, 4.5, 3.2, 2.4, 1.7, 1.7, 1.5, and 1.4. We focused on the five-factor solution both to compare the nature of the resulting factors to the Big Five dimensions and as the six-factor solution did not yield an easily interpretable additional dimension (the largest loading of the sixth factor in this solution was .445, and there were only four items with loadings above .400 in magnitude).
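The wave-averaging step and the eigenvalue inspection can be sketched in Python as follows. This is an illustration on simulated data, not the analysis reported here: it takes eigenvalues of the full correlation matrix with NumPy rather than reproducing principal axis factoring as implemented in the psych package, and the function names and simulated values are hypothetical.

```python
import numpy as np


def average_waves(wave1, wave2):
    """Average two waves per participant.

    A participant missing one wave (coded as NaN) keeps the score
    from the wave that is available.
    """
    stacked = np.stack([wave1, wave2])  # shape: (2, participants, items)
    return np.nanmean(stacked, axis=0)


def correlation_eigenvalues(scores):
    """Eigenvalues of the item correlation matrix, largest first."""
    corr = np.corrcoef(scores, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]


# Simulated example: 200 participants, 6 items, two waves.
rng = np.random.default_rng(0)
trait = rng.normal(size=(200, 6))
wave1 = trait + rng.normal(scale=0.8, size=(200, 6))
wave2 = trait + rng.normal(scale=0.8, size=(200, 6))
wave2[:20] = np.nan   # first 20 participants skipped wave 2
wave1[-20:] = np.nan  # last 20 participants skipped wave 1

averaged = average_waves(wave1, wave2)
eigenvalues = correlation_eigenvalues(averaged)
```

The eigenvalues of a correlation matrix sum to the number of items, which provides a quick check on the computation.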
The factor loadings from the five-factor solution are given in Table 2. The factors can be loosely mapped to the Big Five; however, they appeared to differ moderately from the usual Big Five content emphases (John, 2021); we have provided provisional labels for these dimensions in Table 2. The first dimension, labeled as Volatile, maps loosely to the Big Five’s Neuroticism dimension but with a greater emphasis on angry affect. The second dimension, labeled as Sociable, maps well to the Big Five’s Extraversion dimension, blending descriptions of liking and feeling confident or comfortable in social situations. The third dimension, labeled as Diligent, maps loosely to the Big Five’s Conscientiousness dimension, but with greater emphasis on achievement orientation and ambitious and perfectionistic tendencies. The fourth dimension, labeled as Sensitive, maps loosely to the Big Five’s Agreeableness dimension, particularly with the emphasis on empathic and caring tendencies, but also placing considerable emphasis on tendencies to feel guilt-prone, anxious, and fearful. The fifth dimension, labeled as Creative, maps loosely to the Big Five’s Intellect or Openness dimension, with items reflecting self-descriptions of being uniquely creative and gifted, but also of being eccentric and risk-taking.
Five-Factor Structure of HPCS with Varimax Rotation, from Sample 1 Self-Reports.
Note. Items are sorted by their highest absolute factor loading. Positive associations are shown in blue; negative associations are shown in red, with larger associations shown in darker coloration. See the online version of the article for a color version of this table.
More generally, the factors extracted from the HPCS appeared to have somewhat more maladaptive content emphases than are usually found in Big Five models or in five-factor extractions of personality inventories. This is likely due to the HPCS’s greater emphasis on assessing more dysfunctional traits to represent the content assessed by the HDS. For instance, Conscientiousness veers toward perfectionism, Agreeableness toward oversensitivity, and Neuroticism toward emotional volatility. However, some of these content emphases—such as anger and perfectionism—are more emphasized within the six-factor HEXACO model of personality (Ashton et al., 2014).
Properties of HPCS Items
Reliability and Validity Evidence of HPCS Items
Table 3 provides information about the properties of the single-item scales within the HPCS, including means and standard deviations from each sample, 3-week retest correlations, self-other agreement, and observer (or other-other) agreement.
Descriptive Properties of HPCS Items.
Note. S1 = Sample 1; S2 = Sample 2. † indicates that the HPCS item administered to S1 and S2 differed; see Appendix for the full items. Differences in the label portion of the item across S1 and S2 samples are shown in brackets. Subscript “a” indicates this correlation was reversed before examining correlations with other item properties due to the assessed HPCS item reversing the scoring of the content from the HIC label. Supplemental Tables S1 and S2 contain additional properties of the HPCS items and corresponding HICs.
The 3-Week Retest Stability
The 3-week retest stability of the HPCS items ranged from a low of .36 for “Feels Uniquely Sensitive (believes one has unique abilities to understand issues/people)” to a high of .77 for “has Math Ability (works well with numbers)”;
Self-Other Agreement Correlations
Self-other agreement correlations for the HPCS items were estimated by correlating the person’s self-rating at the first survey session with a single observer rating. Self-other agreement for the HPCS items ranged from a low of r = .07 for “is Directionless (lacks well-defined beliefs or interests)” to a high of r = .49 for “Likes Parties (enjoys parties, social gatherings)”;
Observer Agreement Correlations
Observer agreement concerns the degree to which ratings from two different raters of the same target tend to be correlated, and is generally expected to range from 0 (no agreement) to 1 (perfect agreement). This was estimated as the intraclass correlation of ratings using the statsBy function within the psych package in R (Revelle, 2017). The observer agreement correlations ranged from a low of r = .04 for “is Trusting (unsuspicious of others’ intentions)” to a high of r = .46 for “is Perfectionistic (exacting and obsessive about work quality)”;
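A one-way intraclass correlation of this kind can be sketched in Python as follows. This is a generic ICC(1) computed from one-way ANOVA mean squares, with an adjustment for unequal numbers of observers per target; it is not a reproduction of the statsBy implementation, and the function name, the choice to drop targets with a single observer, and the example ratings are all assumptions.

```python
def icc1(ratings_by_target):
    """One-way random-effects ICC(1) for a single item.

    ratings_by_target maps each target to the list of observer
    ratings of that target. Targets with fewer than two ratings
    are dropped (an assumption of this sketch).
    """
    groups = [list(r) for r in ratings_by_target.values() if len(r) >= 2]
    n_groups = len(groups)
    if n_groups < 2:
        raise ValueError("need at least two targets with two or more ratings")
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)

    # Effective number of raters per target when group sizes differ.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Hypothetical ratings of one item; target "t4" has one observer and is dropped.
example = {"t1": [1, 2], "t2": [3, 4], "t3": [5, 6], "t4": [2]}
agreement = icc1(example)
```

In sample data the estimate can fall slightly below 0 when observers of the same target agree less than observers of different targets.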
Relationship Between Matching HPCS and HPI/HDS Scales
Within Sample 2, we were able to estimate the correlations between the single items of the HPCS and scale scores for the corresponding HICs from the HPI or HDS inventory. All correlations differed significantly from zero (p < .05). Furthermore, the only negative correlations between HICs and HPCS items were for the six items in which negations within HIC labels were removed when creating the corresponding HPCS item (e.g., the HIC “No Social Anxiety” and the corresponding HPCS item “is Socially Anxious (reserved/anxious in social situations)”). Consequently, we reversed the correlations for these six items before examining properties of the convergence of HPCS and HIC scales.
The correlations between corresponding HPCS and HIC dimensions ranged from a low of r = .16 for the items “is Indecisive (overly reliant on advice; reluctant to act independently)” and “is Passive-Aggressive (can act outwardly pleasant while feeling inwardly resentful)” to a high of r = .78 for “Likes People (enjoys social interaction),” with a mean correlation of
Correlations Between Item Properties
We next examined how various estimated properties of the HPCS items were correlated with one another at the between-item level of analysis. Note that between-item correlations are referenced here with q to more clearly distinguish “between-item” and “between-person” (or r-) correlations (Cattell, 1952; Wood & Furr, 2016). We were particularly interested in exploring whether retest correlations served as useful indicators of the HPCS item’s reliability. A common way of exploring this question is to examine how retest correlations (or other candidate reliability indicators) relate to other item properties (de Vries et al., 2016; Henry et al., 2022; McCrae et al., 2011; Wood et al., 2018, 2023). These correlations are reported in Table 4; all
Relations Between HPCS Item Properties and Corresponding HIC Properties.
Note.
As shown in Table 4, an item’s 3-week retest correlation was a strong predictor of a range of other item properties. Items with larger retest correlations tended to have higher standard deviations (q = .22), observer agreement (q = .50) and self-other agreement (q = .57). The number of characters of the HPCS item was negatively associated with the item’s estimated retest correlation (q = −.28), self-other agreement (q = −.17), and other-other agreement (q = −.19).
The item’s retest correlation was also a robust predictor of the item’s correlation with the corresponding HIC from the HPI or HDS (q = .41). The estimated correlation between the HPCS item and corresponding HIC was additionally highly related to the HIC’s estimated coefficient alpha (q = .50), was negatively related to the HPCS item’s number of characters (q = −.30), and was positively related to its rated observability (q = .27). This indicates that the correlation between the HPCS item and corresponding HIC could be strongly predicted by the estimated reliabilities of both measures, consistent with classical test theory (Lord & Novick, 1968; McDonald, 1999), and by factors expected to influence their reliabilities. This pattern was observed even though HPCS retest correlations and HIC internal consistencies were estimated in different samples, which provides some evidence that the estimates of both properties have nontrivial levels of generality across samples.
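The classical test theory expectation referenced here can be made concrete with a short Python sketch. Under classical test theory, the observed correlation between two measures equals the correlation between their true scores multiplied by the square root of the product of their reliabilities. The function names and the reliability values below are hypothetical and are not estimates from the present samples.

```python
import math


def max_observable_correlation(rel_x, rel_y):
    """Ceiling on the observed correlation between two measures
    with reliabilities rel_x and rel_y (true scores correlated 1.0)."""
    return math.sqrt(rel_x * rel_y)


def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Estimated true-score correlation implied by an observed r_xy."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values: a single item with retest reliability .64
# and a multi-item subscale with coefficient alpha .81.
ceiling = max_observable_correlation(0.64, 0.81)
```

With these hypothetical reliabilities, the two measures could correlate at most about .72 even if they assessed exactly the same construct, which is why more reliable items and subscales are expected to show higher convergence.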
Finally, HPCS items with higher retest correlations tended to be associated with HICs which had higher interitem correlations (q = .31). Interestingly, this was despite the fact that the HPCS was developed without sampling actual items from the HICs and despite the fact that these properties were estimated in separate samples. Items showing higher levels of both properties included items “has Math Ability,” “is Academic-Oriented,” and “Likes People,” whereas items with lower levels of both properties included the items “is Satisfied,” “is Passive-Aggressive,” and “Avoids Trouble.”
Consistent with prior research (Goldberg, 1982; Wood & Furr, 2016), the mean self-report for the item was very highly related to the rated desirability of the item (q = .86). Interestingly, the mean of observer-reports for the item was even more highly related to the item desirability (q = .94); this difference was significant by Steiger’s (1980) test of dependent correlations (t = 7.78, p < .01). As most raters that supplied observer reports were friends or family members of the participant, this is consistent with the meta-analytic finding that observer-reports of personality from close acquaintances tend to more closely track item desirabilities than self-reports (Kim et al., 2019).
The rated observability of the HPCS items was also highly associated with various item properties. Items rated as more observable tended to have higher 3-week retest stability (q = .28), self-other agreement (q = .35), and observer agreement (q = .40).
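The between-item (q) correlations reported in this section treat each item as one observation and each item property as one variable. A minimal Python sketch of that computation follows; the function name and the property values are hypothetical and are not taken from Table 3 or Table 4.

```python
import numpy as np


def between_item_correlations(properties):
    """Correlate item properties across items.

    properties maps a property name to a list with one value per
    item, in the same item order for every property. Returns the
    property names and their correlation matrix.
    """
    names = list(properties)
    matrix = np.corrcoef(np.array([properties[name] for name in names], dtype=float))
    return names, matrix


# Hypothetical values for five items (not the values reported in the tables).
item_properties = {
    "retest_r": [0.36, 0.52, 0.61, 0.70, 0.77],
    "self_other_r": [0.10, 0.22, 0.25, 0.41, 0.45],
    "n_characters": [78, 60, 66, 41, 38],
}
names, q_matrix = between_item_correlations(item_properties)
q_retest_self_other = q_matrix[names.index("retest_r"), names.index("self_other_r")]
```

With only 74 items as observations, q-correlations carry wide confidence intervals, so modest differences between them should be interpreted cautiously.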
Dimension-Level Scoring of HPCS
It is also possible to create scale scores for the larger dimensions assessed by the HPI and HDS by averaging scores for all HPCS items representing HICs within the dimension. This produces three-item scales for each of the HDS dimensions, and scales ranging from 4 to 8 items for the broader HPI dimensions. Table 5 reports the coefficient alphas and average interitem correlations for the resulting scales, along with the correlations between the HPCS dimension-level scales and the corresponding dimension-level scales of the HPI or HDS.
Properties of Dimension-Level Scores Created from Aggregating HPCS Scale.
Note. “Average interitem correlation” is computed only after appropriate reverse-scoring of items if the broader scale would have both negatively and positively keyed HPCS items.
Clark and Watson (2019) suggested that average interitem correlations in the .40 to .50 range may be preferable for narrow constructs, which could concern the HDS dimensions; whereas interitem correlations in the .15 to .20 range may be preferable for broader constructs, which could concern the HPI dimensions. As shown in Table 5, the average interitem correlations for the different HPI and HDS scales created from HPCS items ranged from lows of .11 and .19 for the HDS Colorful scale and HPI Prudence scale, respectively, to highs of .70 and .56 for the HDS Skeptical scale and HDS Cautious scale, respectively, with a mean of .40. This indicates that the scales differed dramatically in the heterogeneity of their items, with the three HPCS items corresponding to the HDS Skeptical scale providing particularly redundant information about participants.
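The link between scale length, average interitem correlation, and coefficient alpha can be sketched in Python. The functions below use the standard formulas for raw and standardized alpha; the function names are hypothetical, and the standardized alphas computed for three-item scales are illustrations derived from the interitem correlations quoted above, not the values reported in Table 5.

```python
import numpy as np


def mean_interitem_r(scores):
    """Average off-diagonal correlation among items (columns)."""
    corr = np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)
    upper = corr[np.triu_indices(corr.shape[0], k=1)]
    return float(upper.mean())


def cronbach_alpha(scores):
    """Raw coefficient alpha; items should already be reverse-scored."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1 - item_variances.sum() / total_variance))


def standardized_alpha(mean_r, k):
    """Alpha implied by k items with average interitem correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Three-item scales at the extremes of the interitem correlations noted above.
alpha_high = standardized_alpha(0.70, 3)  # about .875
alpha_low = standardized_alpha(0.11, 3)   # about .27
```

The comparison illustrates why a three-item scale with highly redundant items can reach a high alpha, whereas a three-item scale with heterogeneous items cannot.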
The correlations between the dimension-level HPCS scales and dimension-level HPI and HDS scales ranged from .42 to .85, with a mean of .64, and were again somewhat higher for HPI dimensions (
Most Redundant HPCS Items
As shown by Wood and colleagues (2023), it is possible to estimate the informational similarity of all item pairs in an inventory when the inventory is rated by the same set of participants twice, via this equation:
This equation estimates how much lower “lagged” correlations between items X and Y (where
The full set of estimated retest-adjusted correlations for all pairs of HPCS items is available in Supplemental Table S7. There were only three item-pairs that were found to exceed the value of
Similarity of Associations Between HICs and HPCS Items With Other Variables
Finally, we examined whether the HPCS items and corresponding HICs from the HPI/HDS showed similar associations with other variables. We first reverse-scored the six HICs in which the HPCS item was scored in the opposite direction. The correlations between the 74 HICs or HPCS items with each organizational variable of interest are given in Table 6.
Correlations of Demographic and Organizational Variables with HPI/HDS HICs and Corresponding HPCS Items.
Note. All Ns are between 253 and 256. All |r|s ≥ .13 are statistically significant. All |r|s ≥ .30 are shown in bold. Subscript “(R-HIC)” indicates that HIC correlations have been reverse-scored in this row to correspond to scaling of the HPCS item.
The pattern similarity between these vectors is additionally given in Table 6, using center-point correlations,
As shown in Table 6, the 74 HICs and their corresponding HPCS items showed very similar patterns of associations with the organizational experience variables, with the pattern-vector correlations ranging from a low of .83 for organizational commitment and task performance to a high of .88 for organizational burnout.
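The logic of a pattern-vector correlation can be sketched in Python: one vector holds the correlations of each HIC with an outcome, a second holds the correlations of each corresponding HPCS item with the same outcome, and the two vectors are then correlated across the 74 pairs. The function name and example values are hypothetical. The `center` argument is an assumption of this sketch: leaving it as `None` gives an ordinary Pearson correlation, whereas `center=0.0` takes deviations from zero rather than from each vector's mean, which is one plausible reading of a center-point correlation.

```python
import numpy as np


def pattern_similarity(r_full, r_short, center=None):
    """Similarity between two vectors of correlations.

    center=None: deviations from each vector's own mean (Pearson).
    center=0.0:  deviations from zero (assumed center-point variant).
    """
    x = np.asarray(r_full, dtype=float)
    y = np.asarray(r_short, dtype=float)
    if x.shape != y.shape:
        raise ValueError("vectors must have the same length")
    dx = x - (x.mean() if center is None else center)
    dy = y - (y.mean() if center is None else center)
    return float((dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy)))


# Hypothetical correlations of four subscales and four matching items
# with one outcome (not values from Table 6).
hic_rs = [0.30, -0.10, 0.05, 0.22]
item_rs = [0.25, -0.02, 0.10, 0.18]
similarity = pattern_similarity(hic_rs, item_rs)
similarity_zero_centered = pattern_similarity(hic_rs, item_rs, center=0.0)
```

A high value indicates that the subscales most strongly related to the outcome in the full instruments are also the ones whose single items relate most strongly to it.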
We have provided an example of how the HICs and corresponding HPCS items are associated with self-rated task performance in Figure 1A; this variable was selected as it was one with the lowest level of pattern similarity as indexed in Table 6, and as task performance has traditionally been among the most important variables of interest to organizational researchers (Hall et al., 1917; Hogan et al., 1996). Figure 1A shows that task performance related somewhat differently to certain corresponding HICs and HPCS items; for instance, the HPCS item “is Self-Controlled (has considerable self-discipline, control over impulses)” showed a .30 correlation with the Task Performance measure, whereas the corresponding HIC from the HPI, Impulse Control, showed a −.06 correlation. Overall, 6 of the 74 correlations differed by more than a .30 magnitude across the HPI/HDS HICs and the corresponding HPCS item. However, the strong positive linear pattern of how the 74 HICs and HPCS items related to task performance as shown in Figure 1A indicates that the two inventories largely functioned similarly.

Scatterplots Detailing Correlations Between Task Performance (1A), Age (1B), and Gender (1C), With the Full HPI or HDS HIC Scale (X axis) and Corresponding Correlations Between the DVs and the Corresponding HPCS Item (Y axis).
The estimated pattern similarities were slightly smaller for how the HPCS items and HPI/HDS HICs related to age (
General Discussion
In the present study, we provided evidence that the HPCS functions as a useful inventory for assessing the content found within the HPI and HDS inventories via single-item scales. The HPCS items were found to show decent retest reliability over a period of 3 weeks, and additionally tended to show high convergence with corresponding HICs from the HPI and HDS instruments despite being assessed over half a year apart. The HPCS additionally showed high similarity to the HPI and HDS in terms of how corresponding scales related to other organizational and demographic variables.
Relative Benefits of HPCS Compared to Full HPI/HDS Instruments
There are some situations in which the HPCS may be preferable to the HPI and HDS. The most obvious is when respondent time is at a premium; the HPCS has about one-fifth as many items as the full HPI and HDS instruments. However, there are some other situations in which the HPCS may be preferred as well.
First, the HPCS is an open instrument which can be used freely. The information needed to reproduce the full HPCS, including all items as they are seen by respondents, is given within this article and the Supplemental Materials. We encourage researchers to use the HPCS items as administered in Sample 2 for research with this instrument (see Appendix for items that differ across Samples 1 and 2 or see column “Item[Sample2]” in Supplemental Table S1).
Relatedly, the HPCS was designed to facilitate the reporting of item-level results as fully as possible. As we detailed, each HPCS item was written to have a “label (subclause)” format; the full HPCS items are given in Table 1, and the “label” portions of the HPCS items as they were seen by respondents are given in Tables 2, 3 and 6 and Figure 1. This aspect of the inventory is valuable as it allows researchers to have greater clarity about the nature of the scale’s associations with other variables, which can otherwise be masked by discrepancies between the scale label and items—a problem that has been noted as complicating interpretations of personality–outcome associations for decades (Block, 1995; Mõttus, 2016; Nicholls et al., 1982; Wood & Harms, 2016).
The HPCS will also be particularly valuable as a tool for collecting observer reports. In research settings, obtaining observer reports is regarded as a key way to better triangulate the individual’s actual levels of traits of interest, given how issues related to scale-use, self-presentation, and self-insight can complicate interpretations of self-report measures (Connelly et al., 2022; McCrae & Mõttus, 2019; Roberts & Wood, 2006). The HPCS serves as a tool that will allow observer assessments to be completed much more economically and to better ensure that the somewhat distinct content emphases of the HPI and HDS instruments are better represented in such assessments.
There are also situations in which the HPCS should not be used relative to the full HPI and HDS instruments. Perhaps most importantly, the HPCS is not appropriate to use for selection purposes or other “high-stakes” assessments (i.e., assessments in which respondent outcomes are determined by their responses). Although the HPCS was shown here to be highly predictive of corresponding HPI and HDS scales which are used for high-stakes testing (Table 5), the open-source nature of the HPCS makes the instrument less appropriate for such purposes due to the ease of sharing strategies for faking. Furthermore, although the 3-week retest correlations indicate single items of the HPCS to be more reliable than many researchers might suspect (Tables 3 and 4), the retest correlations for corresponding scales from the original HPI and HDS instruments will typically be higher. When score reliability is at a premium, this will serve as another reason to use the original HPI and HDS instruments. It is important to reiterate that the HPI and HDS assessments are available for academic research purposes with the caveat that item-level data gathering and scoring be completed by Hogan Assessment Systems. The HPCS instruments are most appropriate for researchers with testing time constraints or who are interested in item-level analyses.
Future Directions
Further Validation Work
Although we provided evidence of the reliability and validity of HPCS items, further pieces of evidence would be valuable. First, it would be valuable to estimate retest correlations in additional samples, and over different retest intervals. It may be particularly important when using single-item scales to show that estimates of retest correlations generalize across samples (both in terms of their rank-ordering across different items and in their absolute magnitude), as most investigators will be unlikely to administer the inventory twice. Although investigations have found item-level retest correlations to generalize across samples (Henry et al., 2022; Lowman et al., 2018), more stable estimates of these properties would be valuable to obtain for the HPCS.
In addition, the correlations between HPCS items and corresponding HICs should be considerably attenuated by the fact that they were administered over 6 months apart, indicating that the average correlation of .49 across the 74 items may be a “lower-bound” of how highly scores should converge. Administering the HPCS and the full HPI and HDS instruments within a single larger survey session would provide a better picture of how similarly the inventories function.
It would also be valuable to examine how the HPCS and full HPI and HDS compare in predicting non-self-reported outcomes, such as how much the person is liked by others, manager ratings of job performance, objective records of workplace accidents, or credit scores. We expect that this may alter the picture of how HPCS items and corresponding HPI/HDS scales relate to outcomes to some extent. For instance, the HPCS item “is Perfectionistic (exacting and obsessive about work quality)” correlated relatively negligibly with self-reported job performance (r = .08; Table 6). A more positive relationship with objective performance could be masked when using self-reported performance if perfectionistic people tend to appraise the quality of their work in an overly harsh manner.
Finally, given that the HPCS is created to be considerably more amenable for completing brief observer reports, it would be valuable to explore how HPCS observer reports might supplement other methods of predicting expected performance—including standard HPI/HDS self-reports. Such assessments could be expected to improve upon the limited value of letters of recommendation (Kuncel et al., 2014) as a means of obtaining information from references, by having these references provide ratings on a common set of scales.
More Direct Comparison to Other Common Short Measures
We have argued that an advantage of the HPCS relative to commonly used short personality measures of comparable length, such as the 60-item BFI-2 (Soto & John, 2017a), HEXACO-60 (Ashton & Lee, 2009), and NEO-FFI (Costa & McCrae, 1992) instruments, is that the surveyed content was selected to include a greater range of dysfunctional content. Specifically, almost half of the HPCS items (45%) were selected to survey content from the HDS, which was designed to assess traits related to the DSM-IV personality disorders, such as tendencies to be cynical, perfectionistic, impulsive, manipulative, ingratiating, and directionless.
In addition, responses to inventories consisting entirely of single-item scales may be expected to communicate more total information about the respondent relative to inventories of similar length consisting of multi-item scales (Condon & Mõttus, 2021; Condon et al., 2020). This is because standard scale development practices—such as selecting items to cross benchmarks for internal consistency within multi-item scales (e.g., coefficient
These two aspects of the HPCS could result in better prediction of other outcomes than other inventories of similar length. However, it would be valuable to more directly test this through a “good old-fashioned horse race” study, where the HPCS is pitted against other inventories of comparable length to see which measure can best predict outcomes of interest, using modern cross-validational techniques to prevent artifactual overprediction (Condon et al., 2020; Rocca & Yarkoni, 2021; Saucier et al., 2020).
More General Procedures for Developing and Using Single-Item Inventories
The present research contributes to an emerging picture of “best practices” for both the construction and the usage of inventories consisting of single-item scales (see also Condon et al., 2020; Matthews et al., 2022; Shedler & Westen, 2007; Wood et al., 2010). We discuss additional themes below.
Make the Item the Label
With Tables 3 and 6, Figure 1, and in the text, we illustrated that it is possible to present part or all of the exact item responded to by respondents, which helps to reduce issues whereby the scale’s label fails to accurately indicate aspects of the scale driving how it is endorsed and correlates with other variables. We additionally illustrated how secondary codings of the items, such as their rated desirability or observability, can help to identify reasons for effect heterogeneity across indicators of a broader scale or dimension (e.g., Mõttus et al., 2017, 2019; Revelle et al., 2020), such as why certain aspects of the broader HPI Adjustment dimension (e.g., whether the person is Anxious, Calm, Well-Attached) may have higher self-other agreement than others (e.g., whether the person is Trusting or Guilt-Prone; Table 3).
Use Retest Correlations as Reliability Coefficients
The present research provides further evidence that test–retest correlations should be regarded as privileged indicators of the test’s reliability (Guttman, 1945; Henry et al., 2022; McCrae et al., 2011). In the present study, retest correlations were estimated over a 3-week period, which is within the range of about a week to 2 months that some have recommended for obtaining dependability estimates, in which the level of trait change is expected to be negligible (Cattell et al., 1970; Watson, 2004). The present results demonstrated once again that retest correlations over dependability intervals correlate highly with a range of validity-related properties of the scale, such as its level of self-other agreement, observer (or “other-other”) agreement, and the scale’s correlation with the corresponding HIC from the original HPI and HDS instruments. As a scale’s reliability places an upper-limit on its ability to relate to other variables, the ability of retest correlations to track these properties helps to establish these as appropriate and valid reliability estimates.
The results also reinforce the understanding that it is often desirable to attempt to maximize retest correlations when creating scales (Condon et al., 2020), and indicate some ways this can be done. For instance, the present study reinforces prior evidence that respondents rate longer items less reliably (de Vries et al., 2016), and additionally indicates that respondents rate less observable traits (“is Mistrusting,” “Feels Uniquely Sensitive”) less reliably as well. This indicates that test developers should place considerable effort into simplifying items, such as by shortening items and eliminating jargon to the extent possible (Graziano et al., 1998; Hogan & Hogan, 2001; McCrae et al., 2005).
As a final note, the 3-week retest correlations of the HPCS items (Table 3) should be regarded as indicating the upper limit of each item’s ability to correlate with other scales administered 3 weeks apart but should be expected to somewhat underestimate the ability of HPCS scores to correlate with items administered in the same survey session, due to mood and other relatively transient factors affecting how people respond to surveys (Chmielewski & Watson, 2009; Wood et al., 2023). This is nontrivial as much research involves estimating relationships between scales rated within a single testing session. It can be valuable to collect same-session retest correlations to address this issue—for instance, by administering the same inventory twice about 15 minutes apart within a larger survey (Dejonckheere et al., 2022; Lowman et al., 2018).
Appraising the “Hit the Nail on the Head” Item Selection Strategy
In selecting HPCS items, we aimed to measure the scale construct as directly as possible. However, some of the traits that the HPI and HDS instruments were designed to assess—such as those related to overconfidence, self-enhancement, and humility—can be regarded as ones where there is arguably a definitional lack of awareness of one’s trait level. For such constructs, respondents may need to cut through certain paradoxes or metacognitive loops to provide valid self-reports. For instance, if someone strongly agrees that they are someone who “is Overconfident (believes is capable of accomplishing more than realistically possible),” does this mean that they recognize that they can’t do as much as they think? And if they do, why are they strongly agreeing to the item? As we noted, the original HPI and HDS instruments were designed to assess these and other traits somewhat more indirectly, by assessing more specific beliefs people with these traits may tend to have, or actions they may tend to do. Other researchers have suggested that such traits may be usefully thought of and measured as discrepancies between self-reports and other-reports or via related “componential” techniques (Davis et al., 2010; Kwan et al., 2004). It would be worth further exploring the limits of the direct self-report method used here toward forming valid estimates of traits expected to affect career success and other socially consequential outcomes.
Inventory-Level Considerations
Finally, developers of single-item inventories regularly aspire to create more comprehensive assessments of some domain of content (Furr et al., 2010; Sherman et al., 2010; Wood et al., 2010). When this is the aim, scale developers may attempt to identify, remove, and replace items identified as largely redundant with others within the set (Block, 1961). Here, we utilized a recently developed method of identifying redundant items through the use of retest-adjusted correlations (Equation 1; Wood et al., 2023). Further development of single-item inventories could use these estimates as a tool for iteratively increasing the breadth and comprehensiveness of the inventory, by pointing to items that could be replaced or adjusted to provide more distinct nuances of information. For instance, the items “is Moralistic (follows moral rules/conventions)” and “is Virtuous (makes sure to act in a morally upstanding manner)” were identified as providing nearly identical information about participants; in future versions of the HPCS, one of these items could be tweaked to get at a more distinct meaning or one could be removed to make room for a more distinct construct. If done effectively, this should decrease the correlations between items to some degree, which in turn should increase the ability of the set as a whole to predict other outcomes effectively (Altgassen et al., 2023; Condon et al., 2020).
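A minimal sketch of how redundant item pairs might be flagged is given below. It assumes that the retest-adjusted correlation takes the standard disattenuation form, in which the correlation between two items is divided by the square root of the product of their retest correlations; this form is an assumption here and should be checked against Equation 1. The item names, the correlation values, and the 0.90 threshold are hypothetical and are not results from the study.

```python
import math


def retest_adjusted_r(r_xy, r_xx, r_yy):
    """Assumed form: inter-item correlation divided by the geometric
    mean of the two items' retest correlations."""
    return r_xy / math.sqrt(r_xx * r_yy)


def flag_redundant_pairs(pair_stats, threshold=0.90):
    """Return item pairs whose adjusted correlation meets the threshold.

    pair_stats maps (item_a, item_b) to a tuple of
    (inter-item r, retest r of item_a, retest r of item_b).
    """
    flagged = []
    for (a, b), (r_ab, r_aa, r_bb) in pair_stats.items():
        adjusted = retest_adjusted_r(r_ab, r_aa, r_bb)
        if abs(adjusted) >= threshold:
            flagged.append((a, b, round(adjusted, 2)))
    return flagged


# Hypothetical values for illustration only.
pair_stats = {
    ("item_a", "item_b"): (0.60, 0.65, 0.65),
    ("item_a", "item_c"): (0.20, 0.65, 0.70),
}
flagged = flag_redundant_pairs(pair_stats)
print(flagged)
```

In this illustration, only the first pair is flagged: its raw correlation of 0.60 is close to the ceiling implied by retest correlations of 0.65, so the two items would carry nearly the same reliable information.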
Further work in developing single-item inventories may also aspire to better balance positive and negative content within the broader domains measured by the inventory. For instance, most of the broader factors shown in Table 2 were represented predominantly with items loading positively on the factor. This tendency could be reduced by using antonymous items for some of the traits of interest. For instance, the HPCS item labeled “is Thrill-Seeking” could be replaced by one labeled “Avoids Stimulating Situations” and the item labeled “is Eccentric” could be replaced by one labeled “is Ordinary” to create negative markers for the Openness/Intellect-like factor presented in Table 2. Such modifications should help to better disentangle factor-level scores from the rater’s level of acquiescent responding—which concerns the degree to which a rater is inclined (or “leans”) toward agreeing or disagreeing with any item largely independent of its content—by increasing the number of item-pairs representing largely antonymous content (Rammstedt & Farmer, 2013; Soto et al., 2008).
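The role of antonymous item pairs can be sketched as follows. The example assumes a 1-to-5 agreement scale and estimates a rater's acquiescence as the mean response across antonym pairs minus the scale midpoint, on the logic that agreeing with both an item and its opposite reflects a general lean toward agreement rather than content. The item keys, the responses, and the function names are hypothetical, and this is one simple way of indexing acquiescence rather than the procedure used in the cited work.

```python
def acquiescence_index(responses, antonym_pairs, midpoint=3.0):
    """Mean response across antonym pairs, relative to the scale midpoint.

    Positive values suggest a lean toward agreeing regardless of content;
    negative values suggest a lean toward disagreeing.
    """
    total = sum(responses[a] + responses[b] for a, b in antonym_pairs)
    return total / (2 * len(antonym_pairs)) - midpoint


def center_responses(responses, acquiescence):
    """Subtract the rater's acquiescence estimate from every response."""
    return {item: score - acquiescence for item, score in responses.items()}


# Hypothetical antonym pairs modeled on the replacements suggested above.
pairs = [("thrill_seeking", "avoids_stimulation"), ("eccentric", "ordinary")]

# A hypothetical rater who agrees with nearly everything.
rater = {
    "thrill_seeking": 5,
    "avoids_stimulation": 4,
    "eccentric": 4,
    "ordinary": 4,
}

acq = acquiescence_index(rater, pairs)
centered = center_responses(rater, acq)
print(acq, centered)
```

With only positively keyed items, this rater's high scores could not be separated from a general tendency to agree; the antonym pairs make that tendency estimable.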
Conclusion
The HPI and HDS personality instruments are among the most well-validated inventories for predicting employee performance and differ in nontrivial ways from other personality inventories in the traits they have been designed to assess. However, they are infrequently used in basic research, due in part to their length and proprietary nature. In the present article, we have illustrated how the Hogan Personality Content Single-Items (HPCS) Inventory can be employed to measure the 74 subscales (HICs) of the HPI and HDS inventories using a total of 74 items. We showed that the HPCS items have respectable test–retest stability, self-other agreement, and observer agreement, and function in a highly similar manner to their corresponding subscales within the full HPI and HDS instruments. We have also detailed how single-item inventories such as the HPCS offer certain benefits for research purposes, such as the ability to avoid creating distinct scale labels and instead present the items as seen by respondents when reporting results. We expect that the HPCS will serve as a valuable tool in both research and applied settings.
Supplemental Material
Supplemental material, sj-xlsx-1-asm-10.1177_10731911231207796 for Development of the Hogan Personality Content Single-Items Inventory by Dustin Wood, P. D. Harms, Ryne A. Sherman, Michael Boudreaux, Graham H. Lowman and Robert Hogan in Assessment
Appendix
HPCS Items Differing from Sample 1 to Sample 2.
| Item | Sample 1 item | Sample 2 item |
|---|---|---|
| 6 | is Satisfied (satisfied/content with one’s circumstances) | is Dissatisfied easily (has many complaints) |
| 13 | has a Sense of Identity (has a sense of life direction; knows what one wants to be) | has a Clear Sense of Identity (has a sense of life direction; knows what one wants to be) |
| 38 | is Academic-Oriented (enjoys academics, pursuing education) | is School-Oriented (enjoys academics, pursuing education) |
| 45 | is Cynical (doubts others’ intentions; assumes ulterior motives) | is Cynical (doubts others’ intentions; assumes secret reasons for others’ behavior) |
| 53 | is Tough (interpersonally cold; focused on tasks rather than people) | is Unsympathetic (interpersonally cold; shows little sympathy for others’ problems) |
| 54 | is Passive-Aggressive (can act outwardly pleasant while feeling inwardly resentful) | is Passive-Aggressive (act outwardly pleasant while feeling inwardly resentful) |
| 58 | is Highly Confident (believes is capable of accomplishing anything they set their mind to) | is Overconfident (believes is capable of accomplishing more than realistically possible) |
| 62 | is Manipulative (has ‘Machiavellian’ tendencies; not remorseful about manipulating others) | is Manipulative (enjoys deceiving others in an effort to control them) |
| 63 | is Confident in Public (comfortable being center of attention, prominent in social settings) | is Confident in Public (enjoys being the center of attention, prominent in social settings) |
Note. Parts of the items that differ between Samples 1 and 2 are underlined.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Science Foundation under award no. 2121275 to P. D. Harms and Dustin Wood. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Science Foundation.