Abstract
Previous work on trait perception has evaluated accuracy at discrete stages of relationships (e.g., strangers, best friends). A relatively limited body of literature has investigated changes in accuracy as acquaintance within a dyad or group increases. Small groups of initially unacquainted individuals spent more than 30 hr participating in a wide range of activities designed to represent common interpersonal contexts (e.g., eating, traveling). We calculated how accurately each participant judged others in their group on the big five traits across three distinct points within the acquaintance process: zero acquaintance, after a getting-to-know-you conversation, and after 10 weeks of interaction and activity. Judgments of all five traits exhibited accuracy above chance levels after 10 weeks. An examination of the trait rating stability revealed that much of the revision in judgments occurred not over the course of the 10-week relationship as suspected, but between zero acquaintance and the getting-to-know-you conversation.
Who among your colleagues consistently asks you to organize social events though you would much rather stay at home curled up with a good book? Who is likely to offer to assist you in performing these tasks? In this report, we have framed the question of trait perception accuracy as the relative ability to discriminate a small group of individuals on the basis of a specific trait. More importantly, we are interested in learning when (at what point in a relationship) this ability to discriminate between others, or accuracy, improves the most. We focused on five major trait domains to uncover which traits can be observed immediately and which traits require an extensive interaction history before they are revealed. We report the trajectories of trait perception accuracy for the five traits assessed at three critical points throughout a developing relationship: (a) zero acquaintance, (b) immediately after completing the first conversation, and finally (c) after becoming well-acquainted with each other 10 weeks later.
Investigations of Trait Perception Accuracy at Discrete Phases in a Relationship
Up until now, much of the research on trait perception accuracy has been conducted at discrete time points in the acquaintanceship process (e.g., after strangers formulate a first impression, among best friends). One of these time points, known as zero acquaintance, attempts to capture the precise moment before a relationship has begun. This construct has been operationalized in a number of ways, ranging from judgments employing photographs of targets (e.g., Shevlin, Walker, Davies, Banyard, & Lewis, 2003) to “thin-slice” paradigms where judgments are made via audio and/or visual recordings (e.g., Ambady, Bernieri, & Richeson, 2000; Beer & Watson, 2010; Borkenau & Liebler, 1992; Dabbs, Bernieri, Strong, Campo, & Milun, 2001), and even in face-to-face contexts where interaction has not yet taken place (e.g., Albright, Kenny, & Malloy, 1988; Ambady, Hallahan, & Rosenthal, 1995). Across all these operationalizations, zero acquaintance is defined as the point at which the acquaintanceship process has not begun.
One might not expect much judgment accuracy at zero acquaintance as the information available is minimal and may not be particularly diagnostic of the constructs being judged (e.g., noting an individual has dark hair is not necessarily indicative of his or her agreeableness). However, from the first investigation to examine this topic (Norman & Goldberg, 1966), studies have consistently demonstrated that many, though not all, attributes of a person can be perceived with some accuracy at zero acquaintance (Kenny, 1994; Kenny & West, 2008). For example, naïve judgments of the interpersonal traits of extraversion and conscientiousness have demonstrated some level of accuracy (Borkenau & Liebler, 1992; Kenny & West, 2008), though indicators of agreeableness, neuroticism, and openness have been shown to be less perceptible by observers (e.g., Carney, Colvin, & Hall, 2007; Watson, 1989). Trained raters show concurrence in ratings of dependency and dominance (Moskowitz & Schwarz, 1982). We also know that a still photograph displayed for 2 s can lead to an accurate judgment of emotion (Nowicki & Duke, 2001), and a thin slice of recorded expressive behavior a minute or so in length can provide diagnostic information leading to the valid prediction of a wide array of individual and interpersonal outcomes (Ambady et al., 2000).
In an emerging relationship where individuals might be considered to be recently acquainted or even label themselves as friends, one might anticipate superior trait perception accuracy with the additional information above what is available at zero acquaintance. Recent acquaintances permitted to interact in an unstructured manner exhibited more trait agreement than those constrained to complete a task (Letzring, Wells, & Funder, 2006). Furthermore, friends, compared with strangers at zero acquaintance, appear to be better at assessing each other’s personality (e.g., Funder & Colvin, 1988). Even college roommates are able to accurately assess one another’s traits (Bernieri, Zuckerman, Koestner, & Rosenthal, 1994; Kurtz & Sherker, 2003), suggesting that during the relationship development process some amount of information is gained about others.
On the far end of the acquaintance scale are those one might consider to be well-acquainted; research has demonstrated that family members and others who have been in relationships for a long time show a good deal of agreement in their perceptions of one another (Biesanz, West, & Millevoi, 2007; Letzring et al., 2006; Watson, Hubbard, & Wiese, 2000). Trait ratings made by well-acquainted others align better with self-ratings than do random pairings (Allik, de Vries, & Realo, 2016). In a comparison between friends, dating couples, and married couples, Watson and colleagues (2000) found significant self–other agreement for all five major traits in all of the dyads; married couples showed the greatest agreement in the evaluation of each other’s traits. Furthermore, parents are more accurate at differentiating the personality of their college-age children from the average personality than are short-term peer acquaintances (Biesanz et al., 2007).
This literature suggests that individuals who are at increasingly familiar levels of acquaintance (e.g., best friends compared with strangers) are likely to interact and observe one another in a larger array of situations than individuals at the beginning stages of their relationship. The accuracy differences may stem from the different relationship histories between groups: individuals in familiar relationships such as friends or family members may have been exposed to traits not yet encountered in the less familiar relationships such as strangers or classmates. These studies operationalized acquaintance in terms of the nature of the relationship (e.g., family, friends, roommates, lovers, strangers; Biesanz et al., 2007; Watson et al., 2000) or the length of time a perceiver has known a target (e.g., hours, days, weeks, years; Letzring et al., 2006). Although these labels define relationships in important ways, they have been criticized for being theoretically ambiguous and were never intended to capture or map precisely the amount of trait-relevant information that was accessible to the perceiver (Bernieri et al., 1994; Kenny, 1991). The rate or trajectory of trait perception accuracy improvement within the same set of individuals from strangers to close friends and beyond (e.g., Paulhus & Bruce, 1992) is less documented and understood.
The Acquaintance Trajectory
Intuitively and theoretically (Funder, 1995, 1999; Kenny, 1991, 1994, 2004), first impressions should not be as accurate as trait judgments of well-known acquaintances, friends, and colleagues. A few studies have explored the acquaintance trajectory and have found some support for its moderation of impression accuracy (e.g., Bernieri et al., 1994; Biesanz et al., 2007; Kurtz & Sherker, 2003; Letzring, Wells, & Funder, 2006; Paunonen, 1989). Paulhus and Bruce (1992), for example, found that trait judgment validity increased for all five major traits as participants spent more time working with one another.
However, at least two phenomena suggest the improvement in accuracy with increasing acquaintance may not be nearly as pronounced as one might expect. First, exhaustive meta-analyses have demonstrated no relationship between the amount of information a perceiver has about a target and their ability to predict interpersonal outcomes (Ambady et al., 2000; Ambady & Rosenthal, 1992). That is, similar levels of accuracy were attained for perceivers observing 10 s of information as for perceivers observing 5 min of information. In fact, the level of accuracy in these brief observations matched that attained by professional personality assessments in other studies. A surprising degree of interpersonally diagnostic information can be chronically embedded throughout people’s expressive behavior (Ambady et al., 2000; Ambady & Rosenthal, 1992), suggesting it is at least possible for first impressions to show a remarkable degree of accuracy.
Whereas this first phenomenon gives us reason to expect some accuracy at zero acquaintance, a second phenomenon can explain why and how accuracy might not improve much even after becoming well-acquainted with a target. Stage theories in person perception (Gilbert, 1998; Trope, 1986) have pointed out that our cognitive evaluations of others are (a) determined first by our prior expectancies about them and (b) resistant to change because of schema effects and cognitive busyness. In other words, even though perceivers have the capacity to observe, evaluate, and assess the behavior of others (see also Funder, 1995, 1999), this process is effortful and requires many resources that typical perceivers may not be motivated or able to devote to the task. Life, it turns out, can be extremely distracting to an interpersonal perceiver. As a result, there is a general tendency for interpersonal assessments to be anchored to initial expectations and perceptions rather than unbiasedly revised as one is exposed to trait-relevant behavior.
A longitudinal study of trait perception accuracy will provide an opportunity to investigate a counterintuitive prediction derivable from these two well-established phenomena. Our intuition would suggest that accuracy should be near zero at zero acquaintance and then monotonically increase with the amount of information the perceiver accumulates about the target. Accuracy after living or working with somebody for 10 weeks should be much greater than that achieved by very first impression or after the first 5-min getting-acquainted conversation. At present, we do not know if this happens. The relevant studies in this area, though valuable, were not designed to address this particular research question with such precision.
Research Objectives
The goal of this project was to investigate person perception as it developed within ecologically representative relationships over several months. This approach contrasts with study methodologies that attempt to collect relevant data within one or two 60-min sessions in a lab. To achieve this goal, we assessed interpersonal perception in these relationships at key points that we felt were psychologically meaningful. For example, new acquaintances typically engage in getting-acquainted conversations where they attempt to learn about one another. Together with the zero acquaintance phase (i.e., when people first encounter one another but before they have had the opportunity to interact), these mark two distinct developmental points in most relationships. A third point occurs after people become well-acquainted with each other. Although the criteria for being “well-acquainted” do not formally exist, one can assume researchers would agree that after acquaintances have done things together (e.g., play, eat, travel, work) and have experienced many of the same events (e.g., parties, work meetings, classes, football games) they would be classified as “well-acquainted.”
In this article, we introduce a research methodology that systematically exposed participants to a standard set of acquaintance-building activities that we felt were ecologically representative of the interpersonal process seen in typical emerging relationships. Participants met 4 times a week for an hour with their group members (i.e., targets) to engage in a series of interpersonal activities designed to represent a wide range of ecologically valid social and interpersonal contexts and experiences. We assessed each participant’s ability to discriminate their group members along five trait dimensions at three theoretically critical points in their relationship: (a) zero acquaintance, (b) immediately after their first getting-to-know-you one-on-one conversation, and (c) 10 weeks later after they had become well-acquainted with each other.
The method employed in this investigation was arrived at after carefully considering the fundamental issues that have been raised over the years regarding the study of impression accuracy and group designs. An explicit attempt was made to heed the call of others who advocated for research designs to be more responsive to the needs of the research question than to designs previously established by convention (e.g., Albright & Malloy, 2000; Crocker, 2010). Albright and Malloy (2000) have noted that just because a study has achieved a high degree of causal validity, it does not follow that any theoretical or practical question of interest has been answered. The issues of valid operationalization of a construct and the existence of a particular causal relationship within the experiment are completely independent of the achievement of causal validity and “real world” application (Campbell, 1986; Cook & Campbell, 1979).
According to both intuition and the theoretical person perception models of Funder (1995, 1999) and Kenny (1991, 1994), the association between perceiver judgments and target Neuroticism-Extraversion-Openness Personality Inventory–Revised (NEO PI-R; Costa & McCrae, 1992) scores would be expected to increase as people become acquainted. However, as stated earlier, the research suggesting first impressions may be more accurate than expected, combined with social cognitive processes inhibiting unbiased revisions upon exposure to additional information, leads to the counterintuitive prediction that increases in accuracy may be minimal. This investigation compared zero acquaintance judgments with both thin-slice judgments (i.e., first impressions after a 5-min one-on-one conversation) and judgments after becoming well-acquainted with targets. Therefore, there was an opportunity to assess directly just how anchored we are to our first impressions and also to examine which traits were most likely to be revised through observation and experience.
Method
Participants
University students enrolled in a 10-week long “Psychological Assessment” research practicum for which they received academic credit. Enrollment was unrestricted with respect to class standing and major, but only 15 to 21 participated during any one term. Of the 183 participants who initially enrolled, six were omitted from this report due to missing data, and a seventh was omitted because they were not 18 years of age at the time of enrollment. This left a sample of 176: 110 women and 66 men. Ages ranged from 18 to 54 with a mean of 22.1 years. The majority described themselves as White and identified English as their primary language (78% and 86%, respectively). According to Cohen (1988), a study with more than 160 participants is operating above a power level of .77 in detecting a medium-sized effect (ρ = .21) at the p < .05 level (two-tailed). Participants were treated in accordance with the “Ethical Principles of Psychologists and Code of Conduct” (American Psychological Association, 2002).
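The power statement can be sanity-checked with the Fisher z approximation for a test of a correlation. The sketch below is not the tabled procedure in Cohen (1988); the function names and the reliance on a normal approximation are assumptions made for illustration, so its output should be expected to agree with the tabled value only approximately.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approx_power_correlation(rho: float, n: int, z_crit: float = 1.959964) -> float:
    """Approximate two-tailed power for testing H0: rho = 0.

    Assumption of this sketch: atanh(r) is treated as normal with
    standard error 1 / sqrt(n - 3) (Fisher z approximation).
    z_crit defaults to the two-tailed critical value for alpha = .05.
    """
    noncentrality = math.atanh(rho) * math.sqrt(n - 3)
    upper_tail = 1.0 - normal_cdf(z_crit - noncentrality)
    lower_tail = normal_cdf(-z_crit - noncentrality)
    return upper_tail + lower_tail


# Effect size and sample sizes taken from the text (rho = .21; n = 160 and n = 176).
power_at_160 = approx_power_correlation(0.21, 160)
power_at_176 = approx_power_correlation(0.21, 176)
print(round(power_at_160, 2), round(power_at_176, 2))
```

Because power rises with sample size, the value for the retained sample of 176 is somewhat higher than the value at the 160-participant threshold.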
The practicum met for 50 min, 4 times a week, for 10 weeks. Three of the weekly meetings were structured as lab sessions where participants completed a large number of psychological measures and interpersonal activities, all designed to be relevant to interpersonal behavior and skills. Only those pertinent to the present investigation will be reported here. Participants also met once a week outside of the supervision of an experimenter at a location of their choosing to engage in various activities designed to be representative of those done with others (e.g., playing games, traveling, eating meals, cleaning). These were intended to afford participants the opportunity to become more acquainted with each other over the course of 10 weeks. Throughout the term, participants heard lectures on test validity and received many (but not all) of their own test results.
Procedures
Zero acquaintance
The first personality judgment was made before any interaction could take place. A multiple perceivers, multiple targets reciprocal (MPMT-R) round-robin design (Kenny & Winquist, 2001) was employed. As participants entered the lab for the very first time, they were given a nametag with a letter of the alphabet as their ID for the term. Participants were divided into three groups ranging from five to seven unacquainted members. A written questionnaire was completed to confirm the level of prior acquaintance between group members. The male to female ratios across groups were made similar.
Once participants received their nametag identifying their group assignment, they were instructed to sit in a chair desk with a matching ID until they received further instructions. After all participants arrived and were seated, participants turned to face one another (see Figure 1).¹ Participants began by rating the participant sitting to their right. Specifically, Participant A would complete ratings for Participants B, C, D, E, F, and G in that order. Participant B would rate Participants C, D, E, F, G, and A, and so on.

Figure 1. Zero acquaintance judgment room and participant layout.
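The rating order described above amounts to a rotation around the seating circle. The following is a minimal sketch under that reading; the function name `rating_order` and the assumption that IDs are listed in seating order, starting with each perceiver's right-hand neighbor, are illustrative and not taken from the study materials.

```python
def rating_order(seat_ids: list[str], perceiver: str) -> list[str]:
    """Return the targets in the order the perceiver rates them.

    Assumes seat_ids is listed in the rating direction, so the next ID
    is the neighbor the perceiver starts with; the perceiver then
    continues around the circle and never rates themselves.
    """
    start = seat_ids.index(perceiver)
    n = len(seat_ids)
    return [seat_ids[(start + step) % n] for step in range(1, n)]


# Hypothetical seven-member group labeled with letter IDs, as in the text.
group = list("ABCDEFG")
for perceiver in group:
    print(perceiver, "rates", rating_order(group, perceiver))
```

In a full round-robin of this kind, a group of n members produces n × (n − 1) directed ratings per trait, so a seven-member group yields 42.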
Five-minute interactions
Over the next two sessions spaced 48 hr apart, each group member conversed with every other group member, one at a time, for 5 min about anything they chose (see Figure 2). The same trait judgments employed at zero acquaintance were collected after these 5-min long “getting-to-know-you” conversations.

Figure 2. Five-minute interaction room and participant layout.
Ten-week acquaintance
Beginning in the second week, participants came to the lab for three 50-min sessions weekly where they (a) completed various questionnaires, (b) were given instructions about tasks and activities, or (c) listened to brief lectures on psychological assessment. With the exception of the NEO PI-R and the personality judgment scales, none of the other measures and in-lab activities are relevant to the present article, and thus will not be discussed. In addition to lab meetings, groups had to organize themselves to meet once a week outside of the supervision of the experimenters. They were given a different purpose for this meeting every week, but no specific instructions were provided to tell them how to conduct themselves (see Table 1 for task descriptions and sample activities). The meetings were designed to be representative of acquaintance development in organically occurring relationships. In addition, topics of the meetings were selected to elicit certain personality characteristics; for example, when participants were asked to clean something, the trait of conscientiousness would be evident.
Table 1. Descriptions of Lab Activities Outside the Classroom.
Participants completed a final set of round-robin ratings 10 weeks after their zero acquaintance session. By this time, the members of the groups had spent more than 35 hr engaged in activities together.
Materials
Personality judgment measure: Ten-Item Personality Inventory (TIPI)
Judgments of personality traits were made using the items appearing in Gosling, Rentfrow, and Swann’s (2003) TIPI. Each trait was represented by two anchors on a bipolar scale and presented in a Likert-type format.² Convergent validities with the Big Five Inventory for the five trait domains range from .65 to .87, and the test–retest reliability is .72 (Gosling et al., 2003). Self-reported traits assessed by the TIPI correlate strongly with their counterparts in the NEO PI-R: extraversion = .87, agreeableness = .70, neuroticism/emotional stability = .81, openness to experience = .65, and conscientiousness = .75 (Gosling et al., 2003).
Criterion measure: NEO PI-R
One of the objectives of this study was to use a temporally stable and robust accuracy criterion that would allow us to remove many of the measurement artifacts plaguing single-item self-reports typically employed as criteria in round-robin impression formation research (Albright et al., 1988; Ambady et al., 1995; Beer & Watson, 2008; Kenny, Horner, Kashy, & Chu, 1992; Levesque & Kenny, 1993; Marcus & Leatherwood, 1998). Thus, we opted for the longest, most comprehensive and accepted version of a Five-Factor Model Inventory available (Costa & McCrae, 1992, 2008). Internal consistency for the NEO PI-R facets ranges from .87 to .93, and retest reliability ranges from .83 to .93 for the five trait domains (McCrae, Kurtz, Yamagata, & Terracciano, 2011). Participants completed the 240-item NEO PI-R several weeks after the zero acquaintance and 5-min interaction judgments but several weeks before the final assessment judgments.
Table 2 reports the descriptive statistics for the NEO PI-R for our sample and for the college population norms reported by Costa and McCrae (1992). The generalizability of the effects that follow will be determined by the representativeness of our sample of participants with respect to their standing on the five major trait domains. All five trait measures fell within the confidence intervals reported for the NEO PI-R.
Table 2. Sample Mean Trait Levels (Population Normsᵃ in Parentheses) With Sample CIs.
Note. CI = confidence interval.
ᵃPopulation norms for college-age students as reported by Costa and McCrae (1992) in bold.
Analyses
To determine both the accuracy of trait judgments at a given time point as well as how this accuracy changes across level of acquaintance, we employed a series of regressions representing the predictability of each trait. That is, a multilevel model for each of the five traits, such as the one below, demonstrated the accuracy and stability of personality judgments:
where Yijkg represents perceiver i’s assessment of target j’s personality for a given trait at time k within group g. Xj represents target j’s personality as measured by the NEO PI-R, standardized within the sample. T1 and T2 are indicator variables representing the time period at which the measurement took place (i.e., T1 is an indicator representing the 5-min interaction and T2 is an indicator representing the final assessment; when both equal zero, the zero acquaintance time point is represented; when T1 = 1 and T2 = 0, the 5-min interaction time point is represented; when T1 = 0 and T2 = 1, the final assessment time point is represented).³ u0g represents the random effect for group g, u0i represents the random effect for perceiver i, and u0j represents the random effect for target j.
Of primary interest for this investigation are b1, b4, and b5. The accuracy coefficient is represented by b1; it is the association between the criterion NEO PI-R scores and the trait judgments. Similar, correlation-based, trait accuracy metrics have been employed in previous research (Hall, Andrzejewski, Murphy, Mast, & Feinstein, 2008; Lippa & Dietz, 2000). The changes in accuracy between time points are represented by b4 and b5. For the case in which the zero acquaintance time point is the reference group, b4 would represent the change in accuracy between zero acquaintance and 5 min for a given trait and b5 would represent the change in accuracy between zero acquaintance and the final assessment for that trait.
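Written out, a model consistent with the symbol definitions above might take the following form. This is a reconstruction offered for orientation and not necessarily the authors' exact specification: the labels b0, b2, and b3 for the intercept and the two time-point main effects, and the residual term e, are assumptions inferred from the coefficient numbering.

```latex
\[
Y_{ijkg} = b_0 + b_1 X_j + b_2 T_1 + b_3 T_2 + b_4 X_j T_1 + b_5 X_j T_2
           + u_{0g} + u_{0i} + u_{0j} + e_{ijkg}
\]
```

Under this form, with zero acquaintance as the reference group, the accuracy slope would be b1 at zero acquaintance, b1 + b4 after the 5-min interaction, and b1 + b5 at the final assessment.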
Analyses were conducted with targets, perceivers, and groups as the levels of analysis. The intercept and slope were allowed to vary randomly across the levels of analysis. All analyses were conducted in R (R Core Team, 2016) using the lme4 package (Kuznetsova, Brockhoff, & Christensen, 2016). Separate ordinary least squares (OLS) regressions were calculated for each trait to estimate the unstandardized regression slope b.
Results
Accuracy
We expected to confirm the general findings that have been previously reported for the zero acquaintance context (e.g., Kenny & West, 2008, 2010). With the exception of extraversion and conscientiousness, we did not expect to see much evidence that people would accurately perceive individuals on the basis of their personality traits as assessed by the NEO PI-R. We did, however, expect that perceivers would become more accurate after 10 weeks due to the activities and situations they experienced together. What we were also keen on learning was how much accuracy improved across the three critical relationship phases. How much accuracy was gained by that first 5-min long conversation compared with the accuracy gained after having access to 35 hr of trait-relevant behaviors? Intuitively, the accuracy after 35 hr should dwarf that attained after a mere 5 min. However, according to the findings reported by Ambady and colleagues (Ambady et al., 2000; Ambady & Rosenthal, 1992) combined with what is known about the social perception process (e.g., Gilbert, 1998; Trope, 1986), accuracy might peak at five minutes and not be impacted by the mass of trait-relevant information that follows.
Accuracy at zero acquaintance
Zero acquaintance accuracy coefficients across perceivers are reported in the first column of Table 3. As expected, extraversion showed the highest accuracy, β = .17, F(1, 290)⁴ = 14.32, p < .001. Although the absolute level of accuracy appears low, it is nevertheless statistically significant. Conscientiousness was also judged according to expectations, β = .08, F(1, 330) = 4.26, p < .05. Contrary to previous reports on zero acquaintance judgments, however, openness, β = .12, F(1, 383) = 9.93, p < .01, also showed levels of accuracy significantly greater than zero while agreeableness ratings were trending toward accuracy, β = .08, F(1, 344) = 3.00, p < .10. Finally, neuroticism accuracy was not statistically significant, β = .06, F(1, 340) = 2.20, p = .14 (see Table 3 for all coefficients and confidence intervals).
Table 3. Regression Coefficients Predicting Assessments of Target Personality Over Time by Trait.
Note. All of the estimates in the table are derived from the same model but with different indicator codings (i.e., switching the reference group). CI = confidence interval.
p < .10. *p < .05. **p < .01. ***p < .001.
Accuracy after 5 min and after 10 weeks
In line with proposed theoretical models (Funder, 1995; Kenny, 1991) and previous findings (Paulhus & Bruce, 1992), every trait was judged with some level of accuracy after 10 weeks. We did not, however, have any precise predictions about how much accuracy would be observed after the first 5 min compared with the accuracy gained after the 10-week project concluded. On one hand, the models described by Funder and Kenny would predict enormous increases in accuracy after becoming well-acquainted. On the other hand, theories of the person perception process (e.g., Asch & Zukier, 1984; Gilbert, 1998; Trope, 1986) would lead one to expect minimal gains in accuracy despite the potential information value contained within the 35 hr of trait-relevant behavior available for analysis.
After 5 min of conversation, all five traits showed significant levels of accuracy (Table 3, column 3). Extraversion showed the largest gain in accuracy of the five traits over these 5 min. The gains for the other traits were smaller (Table 3, column 2), yet they were large enough to make the absolute level of accuracy significant for each trait.
Accuracy of judging all five traits increased between zero acquaintance and 10 weeks (Table 3, column 6). In fact, all five traits showed an intuitively appealing monotonic increase in accuracy as relationships developed. The interesting results in Table 3, however, are revealed in the comparisons of column 1 (zero acquaintance accuracy) with columns 3 and 5 that show the relative increases after the very first conversation in relation to the increases that occur through an actual relationship with an individual. Intuitively, the amount of trait information about a person must be orders of magnitude greater over the course of a relationship than the amount one gets when he or she first meets someone. The traits show different and intriguing disparities in sensitivities to this information.
For extraversion, accuracy is apparent at zero acquaintance, increases after an initial conversation, and increases more over the course of the relationship. Whereas this alone may not be surprising, what may take some readers by surprise is the fact that the magnitude of the accuracy gained as a result of having a 10-week long relationship is about the same as the gain achieved after only 5 min. In contrast, Table 3 shows that agreeableness and conscientiousness require a relationship of shared experiences and events before they become more accurately assessed. Finally, in the situational context examined here, the gains in accuracy for judging another’s neuroticism and openness appeared to occur mostly within the first 5 min. Taken together, the striking lesson to be learned is how quickly accuracy was achieved and, for some traits, how relatively insensitive it is to actual behavioral events and experiences shared with others. For some traits, first impressions appear to last.
Stability of Trait Perceptions
Here we are not trying to assess the accuracy that the judgments attained; instead, we focus on the adjustments made to the ratings themselves, that is, on the observers’ readjustment of the targets’ relative standings on these traits in relation to each other. Stability in this case does not refer to the consistency of trait ratings across levels of acquaintance (e.g., rating a target’s extraversion as 7-7-7) but refers roughly to how closely the relative standing of targets for a given trait at zero acquaintance corresponds to their perceived standing later. Essentially, this correspondence can be determined by correlating judgments across targets at Time 1 with the judgments across targets at Time 2. A correlation of 1.0 for a given judge would mean that their perception of the relative standing of a group of targets on a trait did not change, even if the estimated absolute levels of the trait in question did change. In other words, if an observer rated the extraversion of every member of their group two units higher after 5-min conversations (presumably because everyone seemed talkative in this context), the stability of their judgments would be perfect (r = 1.0) despite the fact that every judgment for each target changed from Time 1 to Time 2 (see Figure 3 for a demonstration of this coefficient).

Figure 3. Sample zero acquaintance, five-minute interaction, and final assessment trait ratings made by one fictitious judge on group members.
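The stability coefficient for a single judge can be illustrated with a small computation. The sketch below uses invented ratings for one hypothetical judge of five targets (not data from the study) and a hand-written Pearson correlation so that it has no dependencies beyond the standard library.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical extraversion ratings by one judge of five group members.
zero_acquaintance = [3, 5, 2, 6, 4]
# Every target is rated two units higher after the 5-min conversations.
after_five_min = [rating + 2 for rating in zero_acquaintance]
# By the final assessment, the third and fifth targets have swapped ratings.
final_assessment = [5, 7, 6, 8, 4]

stability_zero_to_five = pearson_r(zero_acquaintance, after_five_min)
stability_five_to_final = pearson_r(after_five_min, final_assessment)
print(stability_zero_to_five, stability_five_to_final)
```

The first coefficient is 1.0 even though every individual rating changed, because the relative standing of the targets is untouched; the second falls below 1.0 because two targets changed places.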
Table 4 presents the mean stability coefficients (test–retest reliabilities) of observer judgments of the relative standing of group members within each trait. The high standard deviations indicate that many individual observers revised their judgments of group members relative to each other dramatically while others did not. We consider this individual difference (i.e., the stability of the perceived relative standing of targets on a given trait over time) to warrant additional exploration as we suspect that interpersonally sensitive perceivers should become more accurate after being exposed to contexts that are diagnostically relevant.
Table 4. Average Stability of Judgments (Ordering of Targets According to a Trait) Over Acquaintance Level.
Note. For each observer, the judgments made for all of the targets in their group at one acquaintance level were correlated with those made at another acquaintance level, creating a test–retest reliability score for every observer. Correlations were converted to Fisher’s z scores, then averaged, then transformed back to Pearson’s r for display. The more positive the value, the more similar was the judgment of the relative standing of targets from one period to another. These correlations can be interpreted as test–retest correlations for their judgments across a set of targets on a given trait.
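The averaging procedure in the note (convert each correlation to Fisher's z, average, transform back to r) can be sketched as follows. The three observer correlations are hypothetical values chosen for illustration, and the function name `mean_correlation` is an assumption of this sketch rather than anything used in the study.

```python
from math import atanh, tanh

def mean_correlation(rs):
    """Average correlations via Fisher's z.

    Each r must lie strictly between -1 and 1, because atanh is
    undefined at the endpoints.
    """
    zs = [atanh(r) for r in rs]    # r -> z
    mean_z = sum(zs) / len(zs)     # average in z space
    return tanh(mean_z)            # z -> r

# Hypothetical stability coefficients for three observers (made-up values).
observer_rs = [0.20, 0.50, 0.90]

fisher_mean = mean_correlation(observer_rs)
simple_mean = sum(observer_rs) / len(observer_rs)

print(f"Fisher-averaged r: {fisher_mean:.3f}")
print(f"Arithmetic mean r: {simple_mean:.3f}")
```

The two means differ because the sampling distribution of r is skewed near its bounds; averaging in z space and back-transforming is the conventional way to reduce that bias.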
An intriguing result that merits further study was that the perceptions of four out of the five traits made after the first 5 min resembled more closely (higher stability coefficients) those made after 10 weeks than those made at zero acquaintance. In other words, perceptions of group members changed more after the very first 5-min conversation than they changed in the subsequent 10 weeks after 35 hr of doing things together. A group by level of acquaintance (27 between × 2 within) ANOVA was performed on these stability coefficients to determine the significance of these differences. Only the differences for extraversion reached statistical significance. Although accuracy for judging extraversion appeared to increase over 10 weeks, the relative standing of group members changed more after the first 5-min conversation than it did after working, playing, eating, and traveling with each other over the subsequent 10 weeks, F(1, 149) = 38.32, p < .0001, r = .45.
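The effect size r reported alongside the F test above is consistent with a common conversion for F tests with one numerator degree of freedom, r = sqrt(F / (F + df_error)). The text does not state which formula was used, so the sketch below is an assumption offered only as a check on the arithmetic; the function name `effect_size_r` is hypothetical.

```python
from math import sqrt

def effect_size_r(f_value, df_error):
    """Effect size r for an F test with 1 numerator degree of freedom."""
    return sqrt(f_value / (f_value + df_error))

# Values reported in the text for extraversion: F(1, 149) = 38.32.
r_extraversion = effect_size_r(38.32, 149)

print(f"r = {r_extraversion:.2f}")
```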
Contrary to what one might expect, we did not find a significantly greater revision of relative trait judgments after an extensive accumulation of trait-relevant information. Impressions changed as much after the first 5 min as they did over the subsequent 10 weeks (and more, in the case of extraversion).
Discussion
Development of the Acquaintance Effect
Previous literature suggested that at zero acquaintance, only extraversion and conscientiousness would be accurately judged. As expected, extraversion was the most accurately perceived trait, with conscientiousness also displaying significant levels of accuracy. Contrary to expectations, openness and agreeableness (marginal) were accurately assessed at zero acquaintance. We suspect that the precise procedures a lab uses for a zero acquaintance situation could have a large impact on which traits are encoded by targets (Kenny & West, 2008). For example, some studies allow participants to interact prior to the rating procedure to the extent each has to find a chair and sit down without invading each other’s personal space (Norman & Goldberg, 1966; Passini & Norman, 1969; Paulhus & Bruce, 1992; Watson, 1989), whereas others have not (Ambady et al., 1995). Our procedures required participants to remain silent in the presence of each other for up to 15 min. To preserve the zero acquaintance operationalization, we instructed them not to react or communicate with each other nonverbally (e.g., smiles, eyebrow flashes). Although participants were not directly facing one another, it is conceivable that we unwittingly created a situational context that pulled for behaviors related to agreeableness or openness. Although no conversations took place among participants, they may have been able to observe each other’s relative behaviors in a situation that exposed trait-relevant information (e.g., compliance with instructions, comfort with novelty). This would lead to increased encoding and detection of traits such as agreeableness and openness.
Three out of the five traits showed significant levels of accuracy at zero acquaintance, and all five traits were accurately judged after 5 min of conversation; the increase in accuracy between these time points was highest for extraversion. In contrast to what one might expect intuitively, the increases in accuracy between the 5-min conversation and the 10-week acquaintance period were similar to the increases seen between the 5-min conversation and zero acquaintance. We find it remarkable that more than 30 hr of interaction experiences over the course of 10 weeks barely increased the accuracy with which our participants could evaluate the other members of their group in terms of how willing they are to engage in novel experiences.
Intuitively, and according to theoretical models (Funder, 1995, 1999; Kenny, 1991), perceptions of group members should change more after 10 weeks of working, playing, eating, cleaning, scheduling, and traveling together than after a prototypical 5-min getting-acquainted conversation. Among the already acquainted, those with more exposure to the target generally demonstrate a greater ability to distinguish the target’s traits from the average (e.g., Biesanz et al., 2007), suggesting revision of initial impressions over time. A close comparison of the changes in accuracy with the stability coefficients reveals a few notable effects. First, the changes in accuracy between zero acquaintance and 10 weeks are significant for every trait. As one might expect, the stability coefficients for this time period are low, indicating relatively substantial revisions in the ratings themselves. However, inspection of the changes between 5 min and 10 weeks suggests a different process may be taking place. Again, the changes in accuracy, on the whole, are significant or at least marginally so. Rating stability, in contrast, is high for this time period. That is, observers made most of their trait judgment revisions after the first 5 min. Although in the present study accuracy did creep up over time, the stability correlations showed that the revisions after 5 min were larger than those that followed. This contradicts the commonsense notion that new information gained about another, especially if relevant, matters. Yet this finding is perfectly consistent with long-standing theorizing about how resistant to change our perceptions and thinking can be (Anderson & Barrios, 1961; Asch, 1946; Fiske & Taylor, 2010; Lord, Ross, & Lepper, 1979; Sedikides, 1990; Snyder & Swann, 1978) as well as research demonstrating reliability in trait ratings within situational contexts (Borkenau, Mauer, Riemann, Spinath, & Angleitner, 2004).
The Operationalization and Experimental Manipulation of Acquaintance
Although commonly employed in impression formation research, the measurement of acquaintance by assessing the amount of time individuals have been exposed to one another is inherently problematic and thus limited in its value as a theoretical construct to the extent that the variance in social situations is not taken into account. The current methodology addressed this weakness by using the level of acquaintance, defined by interpersonal contexts rather than time, as a proxy for the amount of information known about a target individual (Kenny, 1991; Kruglanski, 1989).
Inspired by Brunswik’s (1956; see also Hammond & Stewart, 2001) notion of representative design and Funder’s (1995, 1999) stage model of interpersonal perception accuracy, we carefully selected a set of situations and activities on the basis of their presumed relevance to the traits being assessed. For example, travel and eating experiences provided relevant opportunities for the expression of openness. In this manner, we not only specified the number of hours participants were exposed to each other over the course of 10 weeks (at least 35 hr) but defined explicitly what social situations and experiences constituted acquaintance. Furthermore, several activities were deliberately unstructured, unsupervised, and held off-site to necessitate coordination within groups, compelling members to undergo all the leadership and interpersonal processes involved in the formation and maintenance of small groups (Tuckman, 1965; Tuckman & Jensen, 1977). We reasoned these processes would be relevant to the expression of agreeableness and neuroticism to the extent groups faced inevitable coordination and scheduling problems. In other words, acquaintance in this study was defined by a set of structured and unstructured typical group contexts that sought to maximize the content validity of interpersonal experiences that define a relationship.
To our knowledge, the standardized exposure of a carefully selected, content-valid sample of situations defining the course of relationship development has not yet been done. In fact, our own experience in this project leads us to believe that more theoretical work is needed to discriminate among different types of relationships by assessing partners’ shared experiences. Terms like friend, best friend, and significant other are too vague to be used theoretically or in empirical studies. We suggest that it may be possible to develop a taxonomy of relationships on the basis of a necessary and sufficient set of shared experiences that would define them.
Limitations and Considerations for Future Research
In accuracy research, there are numerous methodologies to investigate how ratings correspond to criteria (e.g., self–other agreement, consensus, predictive regression models), and these methods should be selected based on the question of interest. This is not to say that each of these means of assessment is without flaws; there are different concessions and considerations that need to be made in each instance, the present research included. Three of these issues are detailed below.
One consideration for the present methodology is that the trait ratings our participants made involved only two TIPI items per trait. Ratings based on just two items may be naturally unstable over time, leading to lower reliability than judgments based on a greater number of items. Prior work indicates the TIPI and NEO PI-R domains have generally high correlations with one another, and the test–retest reliability of the TIPI is moderate to high (Gosling et al., 2003). Nevertheless, this reliability limitation remains. We accepted this limitation gladly, however, because the nature of our experimental design and procedures required the least intrusive, most efficient trait judgment measure we could find. The length of the TIPI was ideally suited to the needs of the present research, though these questions can certainly be investigated further in future work. At the very least, however, we must acknowledge the very real possibility that our results underestimate the true level of accuracy that exists within the general population.
In addition, an issue inherent in the comparison of accuracy across methodologies is the medium through which personality, and subsequent accuracy, is assessed. Indeed, different, and more accurate, information can be attained when judgments are made via spontaneous photographs compared with posed, standardized photographs (Naumann, Vazire, Rentfrow, & Gosling, 2009). Furthermore, evaluating photographs is experientially different from making active judgments in person (Tickle-Degnen & Rosenthal, 1990). Photographs by nature do not carry the rich display of nonverbal behaviors, expressive style, body posture, and charisma that targets may exude in interpersonal contexts. Thus, ratings of photographs may dampen trait perception accuracy compared with ratings made in more dynamic environments as a result of a reduction in information transmission. To discern more precisely how the results in the person perception accuracy literature differ, it would be useful to compare the acquaintance trajectory in a face-to-face, ecologically valid context with photograph-based evaluations of the same targets.
Finally, across psychological research, there is a consistent and pervasive trade-off between experimental control and generalizability. This study is no different. We wanted to ensure relationship development within each group progressed as naturally as possible, and allowing this freedom meant we had to relinquish some experimental control. Although we did provide contexts in which certain trait-relevant information would become available to perceivers (e.g., a cleaning task that might reveal traits of conscientiousness and agreeableness), we cannot say for sure that participants displayed certain characteristics or that their group members picked up on the traits if and when they were revealed. Answering these types of questions would require systematic exposure with the inclusion of comparison groups. Instead, the present research provides a more ecologically representative longitudinal investigation of the development of person perception accuracy in small group contexts.
Conclusion
We examined the effect of acquaintance on how well an individual could compare group members along a trait dimension. The research question framed this way required a correlational approach within small group contexts. By employing a series of regression models, we were able to investigate not only the accuracy of each trait judgment at the different time points, but how accuracy in trait judgment shifts from one time point to the next. These performance metrics can be extended to other contexts, situations, and research purposes. The method employed here generated a pragmatic measure that allows future researchers to investigate judgment-criterion alignment as well as moderator effects.
Overall, we found evidence for some degree of trait perception accuracy at every phase of the relationship, even at zero acquaintance. Such a finding is consistent with the implications of thin-slice research, where it has been established that mere seconds of the behavioral stream encode diagnostic information about an individual (Ambady et al., 2000; Ambady & Rosenthal, 1992). It is clear from the data reported here that trait information is encoded within our expressive behavior and is perceived almost immediately. Although we may revise our assessments of others as we become well-acquainted, our personality judgments do not appear to be influenced by subsequent behavioral events as much as by the very first conversation we experience with another. Our early impressions of others appear to be both surprisingly accurate and more resistant to change than our intuition would have us believe.
Footnotes
Acknowledgements
Special thanks to Nicholas Reyna, Hooman Zoonoozy, Alyson Kraus, Ryan Armstrong, Sara Vogan, James Babcock, Jim Scovell, Elysia Todd, Sarah Erickson, Michelle Best, Sarah Noyes, Becky Baker, Jordan Clark, Chris Grooms, Jessica Waldo, Krikor Gazarian, Joshua Landin, Crystal Fisher, Jason Gibbs, Josh Klein, Shaila McCarthy, Naomi Sprague, Stewart Risinger, Jackson Pugh, Matthew Austin, Lisa Furumasu, Laura Romrell, Tiffany Diec, Rebecca Wooldridge, Greg Hagg, Shawna Smith, and Rory Running for help in conducting the study.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
The supplemental material is available online.
