Abstract
The present study examined whether informant-reported personality was more or less internally consistent than self-reported personality in an epidemiological community sample (n = 1,449). Results indicated that across the 5 NEO (Neuroticism–Extraversion–Openness) personality factors and the 10 personality disorder trait dimensions, informant reports tended to be more internally consistent than self reports, as indicated by equal or higher Cronbach’s alpha scores and higher average interitem correlations. In addition, the informant reports collectively outperformed the self reports in predicting responses on a global measure of health, indicating that informant reports are not only more internally consistent than self reports but can also be useful for predicting an external criterion. Collectively, these findings indicate that informant reports tend to have greater internal consistency than self reports.
There are potential limitations to relying on self reports of personality. For example, self reports may be limited because people are often motivated to present themselves positively (Achenbach, Krukowski, Dumenci, & Ivanova, 2005) or because people lack insight about themselves and therefore do not report certain aspects of their personalities (Clark, Livesley, & Morey, 2007). These limitations call into question the validity of self-reported personality information: are self reports accurately measuring personality? Indeed, the literature is rife with studies addressing this issue (Funder, 1995; Furr, Dougherty, Marsh, & Mathias, 2007; Krueger, 2005; Vazire, 2010). Yet there is relatively little research on the reliability of self-reported personality.
The reliability of self versus informant reports is important to investigate because reliability is generally considered a prerequisite for validity, and reliability can be critical for finding associations with external variables. Validity indicates whether the instrument or scale measures the correct construct, whereas reliability indicates whether the instrument or scale measures the construct consistently (Loevinger, 1957). Thus, for a measure to be valid (to accurately measure the construct of interest, here personality), it generally needs to be reliable (to measure the construct consistently). Of course, there are several types of reliability, including test–retest reliability, interrater reliability, and internal consistency. Of these three, internal consistency is particularly important for personality scale construction because items on a scale are often combined into composites, and these composites are then used as predictors of other variables (say, examining whether neuroticism predicts mortality). If an investigator is interested in studying how neuroticism relates to mortality, the investigator may use a neuroticism scale, sum the scores across items, and use the aggregate (or composite) score as the single indicator of neuroticism. This composite score is then used to determine whether there is an association with the external variable (mortality).
When composite scores are used, which is commonplace in personality research, it is typically important to understand the internal consistency of the items that make up that scale (Streiner, 2003). For one, the internal consistency indicates whether, and to what degree, a scale contains measurement error. If a measure is not internally consistent because it contains significant measurement error, then the composite will have a limited ability to correlate with an external variable. Regarding self and informant reports of personality, if the particular personality scale has greater internal consistency when used by informants than when used by selves, then the informant reports could be more useful predictors of an external variable than would the self reports. For these reasons, it is important to determine whether self or informant reports are more internally consistent.
There are two common indices of internal consistency. The first is coefficient alpha (Cronbach, 1951), which equals the mean of all possible split-half reliability coefficients for a set of items. Alpha indicates how well the different items on a measure capture the core of a construct. When alpha is relatively high (e.g., around .75), 1 minus alpha can be read as an approximate estimate of the proportion of score variance attributable to measurement error (Tavakol & Dennick, 2011). One problem with alpha is that it depends on the number of items: holding the interitem correlations constant, alpha increases as items are added to a scale. For this reason, it is helpful to examine a second index of internal consistency (Clark & Watson, 1995), most commonly the average interitem correlation. This metric also indicates how well the items on a measure capture the core construct the measure is designed to assess, but unlike alpha, it is largely independent of the number of items in a scale. In tandem, the two metrics provide complementary information about a measure’s internal consistency.
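Both indices can be computed directly from a respondents-by-items score matrix. The sketch below is a minimal NumPy illustration, assuming complete data (no missing responses); the function names are hypothetical, and the simulated items are placeholders rather than study data. It also illustrates the item-count point above: a 16-item scale built from the same kind of items as an 8-item scale has a similar average interitem correlation but a higher alpha.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Raw coefficient alpha for a respondents-by-items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed composite
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def mean_interitem_r(items: np.ndarray) -> float:
    """Average of the off-diagonal Pearson correlations among items."""
    r = np.corrcoef(np.asarray(items, dtype=float), rowvar=False)
    k = r.shape[0]
    return (r.sum() - k) / (k * (k - 1))         # drop the k ones on the diagonal

# Simulated placeholder items: one shared trait plus independent noise per item.
rng = np.random.default_rng(0)
n = 1000
trait = rng.normal(size=(n, 1))
items16 = trait + rng.normal(scale=2.0, size=(n, 16))
items8 = items16[:, :8]                          # same kind of items, half as many

for label, x in [("8 items", items8), ("16 items", items16)]:
    print(label, round(cronbach_alpha(x), 2), round(mean_interitem_r(x), 2))
```

With these simulated items the average interitem correlation stays near .20 for both versions, while alpha rises with the longer scale, which is why the two indices are best read together.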
One could make differing predictions as to whether self or informant reports of personality would be more internally consistent. One possibility is that informant reports of major personality traits would tend to be more internally consistent than self reports of these traits. This perspective rests on the notion that informants may have a more consistent and generic view of the targets across situations, whereas self-ratings could be less internally consistent because they are based on more complex and particular variations in personality expression and perception across multiple roles and contexts (cf. Srivastava, Guglielmo, & Beer, 2010; Vazire & Mehl, 2008). For example, informants might essentially label a target as “extraverted” and then answer the questions from an extraversion scale in a very consistent manner. So, on the item “Likes to go to cocktail parties,” the informant would give the target a high rating, even though the target does not drink and, despite liking to socialize, does not like cocktail parties in particular. The target might be less likely to see herself in such a consistent manner and more likely to consider the particulars of the context. She might decide that her feelings and behavior depend on the social event and rate this item lower than other items. Thus, informant reports would be more internally consistent than self reports and perhaps capture the overarching latent trait more systematically (though possibly at some cost in nuance or accuracy), whereas self reports would provide nuance that might be less helpful for predicting outcomes associated with the overarching trait. We see two relevant questions here: (a) Which perspective yields greater internal consistency? and (b) Is this internal consistency useful for predicting important outcomes?
It is entirely possible for a measure to be internally consistent but not useful. For example, informants may see the target in an unrealistically positive light, a so-called “halo” effect (Thorndike, 1920). The opposite can occur as well, with informants seeing targets in a negative light, a so-called “devil” effect (Nisbett & Wilson, 1977). Both scenarios could inflate internal consistency without necessarily increasing utility. As Loevinger noted in her 1957 review, internal consistency alone cannot tell an investigator whether a measure is useful for a particular purpose. So, to determine which type of report, self or informant, has greater utility, we examined which better predicted an important outcome.
We chose global health as our external criterion because it is an important, broad construct that is affected by personality, and our study included an item capturing this construct from both the self’s and the informant’s perspective. Ratings of global health on the RAND-36 Health Status Inventory have been shown to be significantly associated with individuals’ actual health outcomes. The cumulative effects model (see Canada, Stephan, Jaconelli, & Duberstein, 2014) suggests that personality should affect important outcomes such as physical functioning and health, and that this relationship should become especially apparent as individuals enter later life. This model and external criterion therefore suit our aging sample. Thus, we hypothesized that informant-reported personality would show greater internal consistency than self-reported personality, and that informant-reported personality would perform at least as well as (or better than) self-reported personality in predicting not only informant-reported health but also self-reported health.
Method
Participants
Participants were drawn from the St. Louis Personality and Aging Network study, a longitudinal epidemiological study of adults nearing later life (Oltmanns, Rodrigues, Weinstein, & Gleason, 2014). In total, 1,449 participants had the required data for the current study. Just over half (55%) of the sample was female (n = 799). Participants ranged in age from 55 to 65 years (M = 59.56, Mdn = 60.00, SD = 2.73). In terms of ethnicity, 67% (n = 975) self-identified as White and 30% (n = 438) as Black. For the highest level of education achieved, 29% had a high school diploma or equivalent (n = 415), 5% had a vocational degree (n = 77), 11% had some post–high school, pre–bachelor’s education (n = 159), 26% had a bachelor’s degree (n = 373), and 27% had some level of post–bachelor’s education (n = 391).
Informants
Each participant nominated an informant. Participants were instructed to pick a person whom they knew well, were in regular contact with, and who could comment on the participant’s personality traits. The participant-nominated informants were roughly two-thirds female (68%, n = 991). Ages ranged from 16 to 92 years, with a mean of about 55 years (M = 54.95, Mdn = 57.00, SD = 11.47). Informants were 67% White (n = 970) and 31% Black (n = 444). Fourteen percent had a high school diploma or equivalent (n = 208), 29% had completed some college or had an associate’s degree (n = 427), 23% had a bachelor’s degree (n = 327), and 27% had an advanced degree (n = 400).
Forty-nine percent of the participant-nominated informants were spouses or romantic partners of the participants (n = 707). The remainder were other family members (27%, n = 395); friends (22%, n = 314); and other individuals such as coworkers, ex-spouses, and neighbors (2%, n = 32). Participants reported that they had known the informants for an average of about 32 years (M = 31.96, Mdn = 32.00, SD = 15.15). Slightly more than half (51%) of the informants were currently living with the participant (n = 734). Of the informants not currently living with the participant, 46% had lived with the participant at some point in their lives (n = 325). Fifty-four percent of the dyads saw each other at least once per day (n = 783), and about two-thirds (66%) talked to each other every day. Finally, 52% of the dyads knew each other “better than anyone else” (n = 752); the remainder knew each other “very well” (42%, n = 605) or “fairly well” (6%, n = 82).
Materials
Revised NEO Personality Inventory (NEO-PI-R)
The NEO-PI-R (Costa & McCrae, 1992) is a widely used inventory of general personality based on the five-factor model (neuroticism, extraversion, openness, agreeableness, and conscientiousness). The inventory consists of 240 items, to which respondents answer on a 5-point scale ranging from 0 (strongly disagree) to 4 (strongly agree). The NEO-PI-R has self-report and informant-report versions, written in the first and third person, respectively. For each of the five factor scales, all 48 items were used to compute alpha and the average interitem correlation; for each of the 30 facet scales, its eight items were used.
Multisource Assessment of Personality Pathology (MAPP)
The MAPP (Oltmanns & Turkheimer, 2006) is a measure of the Diagnostic and Statistical Manual of Mental Disorders, fourth edition, text revision (DSM-IV-TR) Personality Disorders (PDs) that was specifically designed to assess personality pathology using information from multiple sources. Each MAPP item corresponds to a single DSM-IV-TR diagnostic criterion (American Psychiatric Association, 2000), with the exception of one narcissistic PD criterion, which was split into two MAPP items. Each of the 80 MAPP items was developed by translating a DSM-IV-TR PD diagnostic criterion into lay language (Oltmanns & Turkheimer, 2006). Respondents were asked to rate on a 5-point scale how often each item is true of them, from 0 (I am never like this/0% of the time) to 4 (I am always like this/100% of the time). As a multisource instrument, the MAPP has two versions: a self-report version and an informant-report version, written in the first and third person, respectively.
Health Status Inventory
The Health Status Inventory (Hays, Prince-Embury, & Chen, 1998) is a 36-item self-report measure of various aspects of one’s physical health and functioning. The measure includes an item assessing one’s overall self-rated health, as well as items assessing whether one’s physical health has interfered with one’s usual activities. Specifically, the self-rated health question asks, “In general, would you say your health is” followed by five response options ranging from Poor to Excellent. There is a large psychometric literature on this scale; for the current analyses, we scored responses on a 1-to-5 scale, which is one standard practice in this literature. Importantly for the analyses reported in this article, we obtained both self and informant ratings of the participant’s health, which we examine in relation to personality scores.
Results
To examine the internal consistency of the scales, we computed coefficient alpha (Cronbach, 1951) for the 15 personality variables (see Table 1). As the table shows, for most traits alpha was higher for informant reports than for self reports. For general personality, as measured by the NEO-PI-R, alpha from informant reports ranged from .89 (Openness) to .94 (Neuroticism and Conscientiousness), with a median value of .93. For self reports, alpha ranged from .87 (Agreeableness) to .93 (Neuroticism), with a median value of .89. For disordered personality, as measured by the MAPP, alpha from informant reports ranged from .61 (Obsessive–Compulsive) to .82 (Avoidant), with a median value of .79. For self reports, alpha ranged from .53 (Antisocial) to .81 (Avoidant), with a median value of .72. For 13 of the 15 personality variables (87%), alpha for informant-reported information was higher than alpha for self-reported information.
Internal Consistency for Self- and Informant-Reported Personality.
Note. NEO-PI-R = Revised NEO Personality Inventory; MAPP = Multisource Assessment of Personality Pathology.
These differences could have practical implications. For example, less than perfect reliability places a theoretical ceiling on a measure’s correlation with a second variable (Spearman, 1904, 1910). Imagine a researcher who wanted to know the relationship between a hypothetical third variable, say interpersonal success, and the Borderline Personality Disorder (BPD) dimension, one of the most widely studied PD dimensions. Now assume that the self- and informant-reported scales differed in alpha just as they do in Table 1 (self-report α = .69, informant-report α = .79). In that case, reliability, as reflected in these alpha values, would differentially constrain the maximum possible correlation between interpersonal success and the BPD variable. Using self report, the association would have a maximum explained variance of r² = .69 (maximum r = square root of .69, or .83, assuming a perfectly reliable measure of interpersonal success). Using informant report, the maximum explained variance would be r² = .79, a difference of 10 percentage points (maximum r = square root of .79, or .89; see Muchinsky, 1996, for relevant formulas and discussion). Notably, a difference of 10 percentage points in explained variance would be considered meaningful by most standards in the social and medical sciences. This is, of course, a maximum theoretical difference in that it assumes, among other things, perfect reliability of the interpersonal success measure; what a typical difference would be remains an open empirical question.
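The ceiling described above follows from Spearman’s attenuation relation, in which the largest attainable correlation is the square root of the product of the two measures’ reliabilities. A minimal sketch reproducing the BPD example, assuming (as the text does) a perfectly reliable criterion; the function name and the default criterion reliability of 1.0 are illustrative choices, not part of the original analysis:

```python
import math

def max_correlation(rel_x: float, rel_y: float = 1.0) -> float:
    """Spearman ceiling on an observed correlation: sqrt(rel_x * rel_y)."""
    return math.sqrt(rel_x * rel_y)

# Alpha values for the BPD scale from the example above; criterion assumed perfectly reliable.
for source, alpha in [("self report", 0.69), ("informant report", 0.79)]:
    r_max = max_correlation(alpha)
    print(f"{source}: max r = {r_max:.2f}, max r^2 = {r_max ** 2:.2f}")
# self report: max r = 0.83, max r^2 = 0.69
# informant report: max r = 0.89, max r^2 = 0.79
```

Passing a criterion reliability below 1 (for instance a hypothetical `rel_y=0.80`) lowers both ceilings, which is why the 10-point gap should be read as an upper bound rather than an expected difference.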
The trends in alpha converged closely with those for the average interitem correlations, the second index of internal consistency. For most traits, average interitem correlations were higher for informant reports than for self reports. For general personality, as measured by the NEO-PI-R, average interitem correlations from informant reports ranged from .14 (Openness) to .26 (Neuroticism and Conscientiousness), with a median value of .15. For self reports, they ranged from .12 (Agreeableness) to .21 (Neuroticism), with a median value of .15. For disordered personality, average interitem correlations from informant reports ranged from .17 (Obsessive–Compulsive) to .40 (Avoidant), with a median value of .29; for self reports, they ranged from .16 (Antisocial) to .39 (Avoidant), with a median value of .22. Across 13 of the 15 scales, average interitem correlations were greater for informants than for selves. For Openness the values were equal, and for Obsessive–Compulsive Personality Disorder the difference ran in the opposite direction.
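Because alpha depends on both the average interitem correlation and the number of items, the two sets of values reported above can be cross-checked with the standardized-alpha (Spearman–Brown) formula, α = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ the average interitem correlation. The sketch below is a rough consistency check only: the tables report raw alpha, which can differ slightly from standardized alpha, and the interitem values are rounded. The item counts of 7 (Avoidant) and 8 (Obsessive–Compulsive) are assumptions based on the MAPP’s one-item-per-criterion design and the DSM-IV-TR criterion counts.

```python
def standardized_alpha(mean_r: float, k: int) -> float:
    """Standardized alpha implied by k items with average interitem correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# (scale, average interitem r from the text, assumed number of items, reported alpha)
checks = [
    ("NEO-PI-R Neuroticism, informant", 0.26, 48, 0.94),
    ("NEO-PI-R Openness, informant", 0.14, 48, 0.89),
    ("MAPP Avoidant, informant", 0.40, 7, 0.82),
    ("MAPP Obsessive-Compulsive, informant", 0.17, 8, 0.61),
]
for label, r, k, reported in checks:
    implied = standardized_alpha(r, k)
    print(f"{label}: implied alpha = {implied:.2f} (reported {reported:.2f})")
```

The same formula makes the item-count point concrete: holding r̄ at .17 but raising k from 8 to 48 moves the implied alpha from about .62 to about .91, which is why the 48-item NEO-PI-R factor scales can show higher alphas than the shorter MAPP scales despite lower average interitem correlations.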
To examine the NEO-PI-R trends in more detail, we analyzed the data at the facet level (see Table 2). Alpha values and average interitem correlations for the 30 NEO-PI-R facets (6 facets for each of the 5 factors) tended to be higher for informant reports than for self reports: 27 of the 30 facets (90%) had higher alpha values, and 28 of the 30 (93%) had higher average interitem correlations, for informant report relative to self report. The greatest difference was found for the Compliance facet of Agreeableness: self-reported Compliance had an alpha of .60, whereas informant-reported Compliance had an alpha of .78, a difference of .18.
Internal Consistency for Self- and Informant-Reported NEO-PI-R Facets.
Note. NEO-PI-R = Revised NEO Personality Inventory; N = Neuroticism; E = Extraversion; O = Openness to experience; A = Agreeableness; C = Conscientiousness.
As noted in the introduction, scales can show greater internal consistency for reasons unrelated to (or even at odds with) validity. For this reason, we conducted analyses to judge which perspective had greater predictive utility. Results indicated that only informant-reported personality predicted unique variance in both informant-reported and self-reported health. After controlling for demographic factors, self-reported personality (all 15 NEO and MAPP scores) was associated with self-reported health, explaining a unique 8.5% of the variance, p < .01. Likewise, after controlling for demographic factors, informant-reported personality (all 15 NEO and MAPP scores) was associated with informant-reported health of the target, explaining a unique 10.9% of the variance, p < .01. Perhaps more importantly, self-reported personality did not predict significant variance beyond what informant-reported personality predicted. In contrast, informant reports of personality remained predictive of self-reported health, explaining 2.8% of the variance beyond demographic factors and self-reported personality, p < .01. This result indicates that informant-reported personality is relevant to health and outperformed self-reported personality in these two respects.
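The “unique variance” figures above come from entering predictor blocks in sequence and recording the change in R². A minimal sketch of that incremental-R² logic with NumPy least squares follows; it is not the authors’ analysis code, and the variable names, the 3 hypothetical demographic covariates, the 15 scores per reporting source, and the data-generating process are all assumptions on synthetic placeholder data, so the printed numbers say nothing about the study.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS regression of y on X, with an intercept column added."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()   # residuals have mean 0 because of the intercept

def delta_r2(base, added, y):
    """Variance in y uniquely explained by `added` beyond `base`, plus its F statistic."""
    full = np.column_stack([base, added])
    r2_base, r2_full = r_squared(base, y), r_squared(full, y)
    q = added.shape[1]                      # predictors entered at this step
    df_resid = len(y) - full.shape[1] - 1   # residual df of the full model
    f_stat = ((r2_full - r2_base) / q) / ((1.0 - r2_full) / df_resid)
    return r2_full - r2_base, f_stat

# Synthetic placeholder data (NOT the study's data).
rng = np.random.default_rng(42)
n = 1449
demo = rng.normal(size=(n, 3))                 # hypothetical demographic covariates
trait = rng.normal(size=(n, 15))               # hypothetical latent trait scores
self_rep = trait + rng.normal(size=(n, 15))    # self-report = trait + error
inf_rep = trait + rng.normal(size=(n, 15))     # informant report = trait + error
health = 0.2 * demo[:, 0] + 0.3 * trait[:, 0] + rng.normal(size=n)

# Step 1: demographics; step 2: self reports; step 3: informant reports.
d_self, f_self = delta_r2(demo, self_rep, health)
d_inf, f_inf = delta_r2(np.column_stack([demo, self_rep]), inf_rep, health)
print(f"self-report block beyond demographics: dR2 = {d_self:.3f}, F = {f_self:.2f}")
print(f"informant block beyond demographics + self: dR2 = {d_inf:.3f}, F = {f_inf:.2f}")
```

The F statistic compares the gain in R² per added predictor with the remaining unexplained variance per residual degree of freedom; a p-value would come from the F distribution with `q` and `df_resid` degrees of freedom.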
Discussion
Using data from a representative community sample, we investigated whether self or informant reports of personality showed higher levels of internal consistency. Results indicated that informant reports were more internally consistent than self reports on a large majority of personality dimensions. The differences were particularly striking for Cluster B PDs (antisocial, borderline, histrionic, and narcissistic), consistent with previous speculation (Carlson, Vazire, & Oltmanns, 2013), and they appeared for both alpha and average interitem correlations. The direction of the difference was consistent across the large majority of scales: of the 15 scales in the main analysis (Table 1), 13 had higher alpha values for informants than for selves, and the same general trend held for the facet scales, 27 of 30 of which had higher alpha values for informants. Given that personality is so ubiquitous and influences so many different outcomes, we think that the self- versus informant-report effect is potentially very important.
Our second hypothesis was that informant reports are more internally consistent than self reports because they capture meaningful, systematic variance in personality. If this is the case, informant reports should not only have high internal consistency but also good predictive utility; that is, they should predict important outcomes, such as health, better than self reports do. This is what we found: informant-reported personality predicted global health better than self-reported personality did. Thus, informant reports were more internally consistent, and they are also potentially more useful. The finding does not necessarily mean, however, that informant reports are more valid, and it does not necessarily mean that informant reports would outperform self reports on other variables.
There are limitations to the current findings. First, the relationships between the informants and the targets should be considered more fully. Because the informants in our study had known the participants for an average of more than 30 years, we feel confident that they were in a position to provide meaningful ratings of these individuals. We do not know, however, whether our findings would generalize to circumstances in which informants and participants are less well acquainted (see Connelly & Ones, 2010). Second, the advantage of informant reports with regard to the internal consistency of these personality scores does not necessarily extend to other forms of reliability, such as interrater reliability. That said, internal consistency is an important aspect of reliability, as described in the introduction; among other things, it provides a general gauge of how well a scale could work if used as a composite. Third, although the informant reports were more internally consistent than the self reports, they would almost certainly not be more useful than self reports for all purposes. We tried to guard against this limitation by using an external criterion, but we acknowledge that self-report measures will probably be more useful than informant reports for predicting other criteria (Paunonen & O’Neill, 2010). We think these analyses demonstrate the value of the informant perspective in this context and guard, to some extent, against this criticism. Future research should extend these findings by examining how the type of report, self or informant, influences utility and other types of reliability across different facets of personality.
As the findings indicate, the informant reports were modestly more internally consistent, but we do not know why this is the case, nor whether this attribute offers any advantages over self report in particular settings. Indeed, it is possible that informants simply have less knowledge about the target and so respond more generically, in a way that could sacrifice nuance and even accuracy. Nonetheless, we find it interesting that informants have a more consistent view of the target, and this finding may help explain how informant reports function.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by a grant from the National Institute of Mental Health (R01 MH077840).
