Self-Other Agreement in Experience Sampling Measures

Abstract

Experience sampling methodology (ESM) has provided researchers with a flexible and innovative measurement tool, and the methodology has become increasingly popular in several fields of psychology. Therefore, validity studies on such measures are important. The present study investigated convergent and discriminant validity of ESM measures using peer-ratings as the criteria. We obtained ESM self- and other-ratings of personality states, situation perceptions, and feelings from 344 occasions, from 49 target participants. The results showed that several—but not all—widely used ESM self-ratings have substantial and distinct self-other agreement. We conclude that many ESM self-reports are likely to capture, at least to an extent, target persons’ actual personality states, feelings, and situation perceptions.

Keywords

experience sampling methodology, self-other agreement, convergent validity, discriminant validity, situation perception, personality states

Experience sampling methods in personality and social psychology have massively increased in popularity during the last 30 years (e.g., Hamaker & Wichers, 2017). Experience sampling methodology (ESM; also referred to as ecological momentary assessment or ambulatory assessment) refers to a data gathering technique in which data are collected from participants repeatedly on many occasions over some—usually relatively short—period of time (Stone & Shiffman, 2002). An example ESM procedure would have participants respond to five brief questionnaires per day for 13 days (e.g., Fleeson, 2001). However, there is plenty of variation in both the length and the frequency of measurement in ESM studies (e.g., Fleeson & Gallagher, 2009; Sun & Vazire, 2019; Wrzus et al., 2015).

Experience sampling methods have given psychologists a novel way to access people’s everyday lives. To name a few examples, ESM studies allow personality researchers to connect personality states to simultaneous feelings and situation perceptions (e.g., Fleeson et al., 2002; Horstmann et al., 2020), motivation researchers to connect everyday life events to goal pursuit (e.g., Ghassemi et al., 2021), and mental health researchers to connect momentary ruminative thoughts to depressive symptoms (Connolly & Alloy, 2017). Furthermore, by sampling individuals’ personality states, feelings, or other psychological constructs repeatedly over time, researchers are able to capture participants’ typical or habitual way of acting or feeling and connect such variables to other person-level characteristics such as personality traits or life satisfaction (e.g., Fleeson, 2001; Grühn et al., 2008; Grzywacz et al., 2004). In sum, ESM offers many opportunities for gaining new types of psychological data.

Because of the popularity and usefulness of ESM techniques, psychometric validation of such techniques is essential. ESM studies typically collect data via self-reports. In a prototypical ESM questionnaire, participants are asked to describe their behavior, feelings, or other psychological activity during the last hour or last 30 min. Increasingly, physiological measures such as indicators of blood pressure (e.g., Ilies et al., 2010) are included in ESM protocols, but self-reports remain the main method. Because participants are asked to describe what they are doing at the moment, ESM greatly reduces the memory bias problem of traditional self-reports. However, ESM self-reports are still susceptible to social desirability concerns (e.g., participants may not want to report undesirable behavior) and lack of self-knowledge (e.g., participants may evaluate their own behavior inaccurately). Thus, validating ESM self-report measures is important.

Several studies have provided evidence of convergent validity for some ESM measures. First, studies using audio recordings of target participants’ everyday lives have shown that observer-ratings of such audio files correlate with time-matched ESM self-ratings of personality states (Sun & Vazire, 2019), quality and quantity of social interaction (Sun et al., 2020a), and positive and negative emotions (Sun et al., 2020b). Fleeson and Law (2015) found correspondence between ESM self-reports and observer-ratings of aggregated extraverted and conscientious behavior in a laboratory setting. Bleidorn and Peters (2011) recruited work colleague dyads and found evidence for self-colleague agreement for PANAS ratings. Abrahams et al. (2021) had student teachers and their supervisors rate student teachers’ personality states and situation perceptions in the same (teaching) situations and found significant (though small) momentary self-other correlations. Finally, Breil et al. (2019, Study 2) found that ESM informant-ratings of sociability correlated with ESM self-ratings of sociability in an event-contingent ESM design.

In sum, previous studies have shown that self-reports of personality states, emotions, situations, and social interactions are related to observer-ratings of the same dimensions. The present study aims to complement this literature in the following ways. First, like Abrahams et al. (2021, Table 7, Table S2), we study self-other agreement on personality states and situation characteristics using an informant embedded in the same situation. However, we use an everyday life ESM design that has the potential to cover a more diverse set of both situations and peer-raters and, thus, to provide information about self-other agreement across several types of natural life situations. Breil et al.’s (2019, Study 2) procedure was quite similar to ours, but they used event-contingent data collection, examined only one personality state (sociability), and focused on the prediction of sociability in particular classes of situations. Instead, we use a general ESM procedure and a wider range of personality state dimensions, and we focus on self-other agreement. Second, the present study complements research using observer-ratings of audio recordings from natural life situations (e.g., Sun & Vazire, 2019): We also sample natural life situations, but use peer-ratings as the criteria, providing a cross-method replication attempt of the results obtained via audio recordings (Sun et al., 2020b; Sun & Vazire, 2019).

Finally, the present study is, to our knowledge, the first to investigate discriminant validity in ESM ratings. Establishing discriminant validity for ESM measures is important because a lack of it may lead to false conclusions. For instance, a researcher may wish to investigate whether momentary dominance is related to momentary social status; however, if their measure of momentary dominance is confounded with other states (i.e., lacks discriminant validity), those other states may in fact be responsible for the relation between dominance and status. Like most previous ESM studies of self-other agreement (Breil et al., 2019; Fleeson & Law, 2015; Sun et al., 2020a, 2020b; Sun & Vazire, 2019), we focus on rank-order agreement, rather than absolute agreement, as the validity indicator of interest. In addition, we focus on agreement on momentary ratings rather than agreement on average (between-person) ratings.

In choosing the dimensions of interest, the present study draws, first, from the Whole Trait theory (Fleeson & Jayawickreme, 2015). According to this theory, personality traits manifest in everyday life as momentary states related to the underlying traits (e.g., Fleeson, 2001). Personality states derived from the Big Five/Five Factor/HEXACO models are commonly used in ESM research (e.g., Abrahams et al., 2021; Fleeson & Gallagher, 2009; Leikas & Ilmarinen, 2017; Sun & Vazire, 2019). Therefore, self-ratings of such personality states are important targets of validation efforts.

Classifying psychologically relevant features of situations has been notoriously difficult in the history of behavioral, personality, and social sciences (e.g., Rauthmann et al., 2015). This is not surprising, as “situation” is a very broad term and can refer to a plethora of characteristics, such as physical (e.g., location) or social (e.g., dyadic vs. group setting) features of the situation, or to the psychological meaning of the situation. Rauthmann et al. (2015) argued that in psychology, it is most fruitful to investigate situations at the level of their psychological meaning. This is also how situations are defined in the present study.

There are currently two well-founded models of situational perceptions: the DIAMONDS model (Rauthmann et al., 2014) and the CAPTION model (Parrigon et al., 2017). However, due to an ongoing effort to build a complementary situational taxonomy using a somewhat different starting point than the DIAMONDS and CAPTION models (a top-down approach; e.g., Reis, 2018), the present study took a different approach. In essence, several theories that have produced reliable evidence of situational effects on behavior and mood were reviewed and used to select situational features. We included the extent to which the situation was approach- and avoidance-related (derived from the approach-avoidance motivation theory, e.g., Elliot, 1999), competitive and cooperative (derived from the social interdependence theory, e.g., Deutsch, 1949, 2011), free and constrained by circumstances (derived from the strong situation theory, e.g., Cooper & Withey, 2009; Mischel, 1977), and pleasant and unpleasant (to cover basic valence differences between situations). These features were selected because they are likely to vary both between persons and between situations and to have consequences for personality states and emotions (e.g., Heller et al., 2007; Impett et al., 2010; Meyer et al., 2010; Stanne et al., 1999).

Finally, momentary mood, stress, and fatigue were measured to investigate self-peer agreement on momentary feeling states.

Method

Transparency and Openness

We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study, and we follow JARS (Kazak, 2018). All data, analysis code, and research materials are available in the Online Appendix at https://osf.io/gc8jw/?view_only=36078622b03d4f35b1ebb5800b1dc259. This study’s design and its analysis were not preregistered.

Power Calculations

The target effect of interest in the present study was the relation between momentary self-reports and momentary peer-reports, that is, a relation between two Level 1 variables in multilevel data. The planned ESM protocol was 10 days: four questionnaires per day. Our previous experiences suggested that with the resources we had for this study, N = 70 is a reasonable expected sample, and about 70% is a reasonable expected response rate. However, the effective sample size would be determined by the number of participants providing an adequate number of peer-reports, and, without precedent, it was difficult to predict how many peer-reports per participant would be obtained. Furthermore, it was difficult to evaluate the expected effect size. In Breil et al.’s (2019, Study 2) study, the average within-person correlation between self-reported and other-reported momentary sociability was r = .28, suggesting a small-to-moderate effect size. As this variable was identical to one of our variables, we used this correlation to guide our power calculations and conducted power calculations in a simulated data set for an effect size of .28. Random intercept variance was set at 0.60, slope variance at 0.03, and residual variance at 1.0. Random parameters were retrieved from previous ESM studies using comparable variables (e.g., Leikas, 2020; Leikas et al., 2021; Leikas & Ilmarinen, 2017). We used the powerSim function in the simr package (Green & MacLeod, 2016) in R (R Core Team, 2020).

The simulation results suggested that with a sample of 30 participants and five observations per participant, the power to detect a Level 1–Level 1 association with an effect size of 0.28 was 0.80 (95% confidence interval [CI] = [0.79, 0.81]). Thus, assuming the effects of interest were at least small to moderate in size, it seemed likely that we would be able to detect them even if the number of participants and the number of reports per participant were considerably lower than in a typical ESM study.
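The logic of a simulation-based power analysis of this kind can be sketched as follows. This is a simplified illustration and not the simr procedure used in the study: instead of fitting a mixed model, it estimates a pooled within-person slope from person-mean centered data and tests it with a normal approximation. The variance parameters and effect size are taken from the text above; the predictor standard deviation of 1, the number of simulations, and the function name simulate_power are assumptions made for illustration, so the resulting estimate need not match the reported value exactly.

```python
import math
import random


def simulate_power(n_persons=30, n_obs=5, slope=0.28,
                   var_intercept=0.60, var_slope=0.03, var_resid=1.0,
                   n_sims=500, seed=1):
    """Estimate power to detect a within-person (Level 1) slope by simulation."""
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided alpha = .05, normal approximation (assumption)
    hits = 0
    for _ in range(n_sims):
        sxx = sxy = syy = 0.0
        for _ in range(n_persons):
            u0 = rng.gauss(0, math.sqrt(var_intercept))        # random intercept
            b_i = slope + rng.gauss(0, math.sqrt(var_slope))   # person-specific slope
            x = [rng.gauss(0, 1) for _ in range(n_obs)]        # predictor SD = 1 (assumption)
            y = [u0 + b_i * xi + rng.gauss(0, math.sqrt(var_resid)) for xi in x]
            mx, my = sum(x) / n_obs, sum(y) / n_obs
            # Person-mean centering removes the random intercept from the estimate.
            for xi, yi in zip(x, y):
                sxx += (xi - mx) ** 2
                sxy += (xi - mx) * (yi - my)
                syy += (yi - my) ** 2
        b = sxy / sxx                                  # pooled within-person slope
        df = n_persons * n_obs - n_persons - 1         # observations minus person means and slope
        resid_var = (syy - b * sxy) / df
        se = math.sqrt(resid_var / sxx)
        if abs(b / se) > z_crit:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print(simulate_power())
```

Because the naive standard error ignores the random slope variance, this sketch is slightly liberal compared with a full multilevel model; it is meant only to show how the design parameters feed into the power estimate.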

No formal stopping rule for data collection was set because based on our previous experiences in recruiting for similar studies, we anticipated that we would end up with a smaller sample than we would prefer. The data were collected in February and March 2018.

Participants

Participants were recruited via e-mail invitations inviting recipients to participate in a daily life study, with a compensation of two cinema tickets (total value ca. 22 €). Invitations were sent to several student mailing lists of Finnish universities. In total, 57 people responded. These individuals were sent detailed information about the study and a link to an online consent form and a background questionnaire. All 57 individuals completed the consent form and were included in the ESM phase. One participant did not respond to any ESM prompt, and seven participants did not provide any peer-reports, leaving us with a sample of 49 (44 women, five men). Participants’ mean age was 26.1 years (SD = 6.02 years, range = 19–41 years).

Procedure

Participants completed a background questionnaire¹ online and went through a 10-day ESM phase with four prompts per day. The prompts were always sent between 9 and 10 a.m., 12 p.m. and 1 p.m., 4 and 5 p.m., and 8 and 9 p.m. (at random times within these hours). The prompt consisted of a text message containing a link to the online ESM questionnaire (programmed via form software “E-lomake”). Answering was possible for 1 hr after the prompt arrived, after which the questionnaire was locked. Participants were asked to respond to as many questionnaires as possible but informed that neither response rate nor the number of peer-reports provided would affect compensation.

The ESM questionnaire had two parts: a self-report part and a peer-report part. The self-report part always appeared first. In that part, participants were first asked to rate their current mood, stress, and fatigue. Then, they were asked to rate their personality states during the last 30 min and, after that, features of the situation(s) they had encountered during the last 30 min (counted from the moment they started to complete the questionnaire). Next, they were asked whether there was another person present who was willing to respond to questions about the participant. The response alternatives were “Yes,” “No,” “I can’t ask anyone at the moment,” and “I’m alone.” If a participant gave any answer other than “yes,” the questionnaire was saved and closed. Participants were not given detailed instructions regarding what to do if peers refused; they could keep asking others or stop asking at their own discretion. This omission was meant to lower participant burden by allowing them to decide what level of pressure they were willing to put on others in each situation.

If the participant responded “yes” to the peer-reporter question, the peer part of the questionnaire appeared, and the participant was instructed to report how long they had been consecutively with the person willing to respond, and then to give their phone to the peer. The peer questionnaire instruction read as follows:

Please respond to the questions below regarding the person who asked you to respond. This person will not see your answers (as long as you press “save” at the end of the questionnaire). We are interested in your perspective, so answer according to your own view. It takes about 2 to 3 minutes to complete the questionnaire. Thank you very much for your help!

The peer-report part first asked the peer to report what their relationship with the participant was. Then, they were asked to respond to the mood, stress, and fatigue items in peer-report form. Next, they were asked how long they had been in the presence of the participant consecutively, after which they were asked to respond to the personality state and situation questions in peer-report form. In addition, peers were asked to rate how easy it was to make the personality state and situation ratings.

Measures

Momentary Mood, Stress, and Fatigue

Momentary feelings were measured with the single items happy, stressed out, and tired, all answered on a 5-point scale from 1 (not at all) to 5 (very much). Participants were asked to “Evaluate your state of mind right now. Are you . . .” followed by the three items. Peers were asked to “Evaluate his/her state of mind right now. Is s/he . . .” followed by the three items.

Personality States

Self- and peer-reported personality states were measured with items derived from the Big Five/Five-Factor models. These items were sociable, dominant, energetic, insecure, nervous, friendly, imaginative, productive, and responsible, complemented with socially competent. Participants were asked to “evaluate your behavior during the last 30 minutes. During the last 30 minutes I have been . . .” followed by the personality state items. Peers were asked to “evaluate his/her behavior during the last 30 minutes or during the time you have spent together (if less than 30 minutes). He/she has been . . .” followed by the behavior items. A 5-point response scale from 1 (not at all) to 5 (very much) was used.

Situational Features

Participants were asked to “Evaluate the situation or situations you have been in during the last 30 minutes” with eight items: (1) the situation required cooperation, (2) a desired goal was pursued, (3) the situation was pleasant, (4) there was an attempt to prevent an unwanted outcome from happening, (5) I was free to do what I wanted, (6) there was competition between myself and someone else, (7) circumstances or other people dictated what I had to do, and (8) the situation was unpleasant. Peers were asked to “Evaluate the situation or situations she/he has been during the last 30 minutes or during the time you have spent together (if less than 30 minutes),” followed by the situation items in peer-report form. The items were answered on a scale from 1 (not at all) to 5 (very much).

Additional ESM Rating (Participant)

If participants responded “yes” to the peer question, they were asked how long they had been consecutively with the person willing to provide the peer-ratings. The response options were a few minutes, 5 to 10 min, 10 to 20 min, 20 to 30 min, 30 to 60 min, and over an hour.

Additional ESM Ratings (Peer)

At the beginning of the peer questionnaire, peers were asked what their relationship with the participant was (response options were friend, relationship partner, mother/father/sister/brother, other relative, work colleague, teacher, child, stranger, none of the above) and how long they had known the participant (with response options I don’t know this person, less than 6 months, 6–12 months, 1–2 years, 3–5 years, 5–10 years, more than 10 years). After the feeling questions, they were asked how long they had been with the participant consecutively, with the same response options as in the participant version of this question. Furthermore, after the personality state and situation ratings, peers were asked how easy it was to make each set of ratings on a scale from 1 (very difficult) to 5 (very easy).

Results

Participants provided a total of 1,542 ESM reports (78.7% of the potential maximum of 1,960). On average, participants provided 31 reports (median = 32, mode = 37, range = 10–38 reports). Out of the 1,542 reports provided, 344 (22.3%) included a peer-report. Participants provided seven peer-reports on average (median = 6, mode = 1, range = 1–23). For more details of the ESM reports, see Online Appendix.

Table 1 presents descriptive statistics for self- and peer-ratings from peer-rating available situations (n = 344), and Table 2 presents variance components for self- and peer-ratings. Table A1 in the Online Appendix presents descriptive statistics for momentary self-reports as a function of the availability of a peer-report, along with multilevel regression results for the differences between the two types of situations (Table A2 in the Online Appendix shows Ms and SDs for all four response options). As shown there, participants seemed to experience peer-report available situations as more enjoyable than no-peer-report situations, reporting, for instance, better mood, lower stress, and higher pleasantness in peer-report available situations. We return to this issue in section “Limitations.”

Table 1

Descriptive Statistics for Self- and Peer-Ratings (From Situations for Which a Peer-Report Was Available)

Category	Rating	Self-ratings, M (SD)	Peer-ratings, M (SD)
Feelings	Happy	3.67 (1.02)	3.79 (0.89)
	Stressed	1.89 (1.03)	2.08 (1.07)
	Tired	2.32 (1.11)	2.43 (1.06)
Personality states	Sociable	3.79 (1.04)	3.94 (0.96)
	Energetic	3.06 (1.12)	3.45 (1.04)
	Dominant	1.99 (1.02)	2.24 (1.15)
	Insecure	1.73 (0.97)	1.65 (0.85)
	Nervous	1.75 (1.01)	1.78 (1.01)
	Friendly	3.73 (0.98)	4.24 (0.87)
	Imaginative	2.23 (1.10)	2.92 (1.05)
	Productive	2.92 (1.15)	3.49 (1.10)
	Soc. comp.	3.39 (1.02)	3.87 (0.99)
Situation perceptions	Cooperative	2.80 (1.32)	2.85 (1.26)
	Competitive	1.27 (0.76)	1.35 (0.87)
	Approach	3.30 (1.44)	3.06 (1.45)
	Avoidance	2.05 (1.31)	1.90 (1.25)
	Free	3.44 (1.19)	3.64 (1.09)
	Constrained	2.46 (1.15)	2.29 (1.16)
	Pleasant	3.84 (1.12)	3.96 (1.06)
	Unpleasant	1.55 (0.95)	1.41 (0.82)

Note. All ratings were made on a scale from 1 to 5 (actual and possible range). Due to partial reports, number of situations varied from 337 to 344. N = 49.

Table 2

Between- and Within-Person Variance Components of Self- and Peer-Reports (Peer-Report Situations Only, N = 49, n = 344)

	Self		Peer
Rating	Between (%)	Within (%)	Between (%)	Within (%)
Sociable	7	93	5	95
Energetic	29	71	12	86
Dominant	35	65	26	74
Insecure	19	81	8	92
Nervous	22	78	21	79
Friendly	24	76	16	84
Imaginative	50	50	23	77
Productive	14	86	7	93
Responsible	31	69	25	75
Socially competent	21	79	12	88
Cooperative	22	78	8	92
Competitive	12	88	1	99
Approach	13	87	14	86
Avoidance	29	71	12	88
Free	14	86	9	91
Constrained	11	89	8	92
Pleasant	14	86	8	92
Unpleasant	18	82	1	99
Happy	37	63	17	83
Stressed	27	73	17	83
Tired	24	76	10	90

Note. Figures represent percentages of total variance, calculated from unconditional (null) multilevel random intercept models by (1) dividing the person variance by the total variance and (2) dividing the residual variance by the total variance.
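As a rough illustration of the variance decomposition described in the note, the sketch below splits the variance of a rating into between-person and within-person shares. It uses the one-way ANOVA (method of moments) estimator rather than a fitted null multilevel model, so it approximates, and is not identical to, the procedure reported here. The function name variance_shares and the toy ratings are hypothetical.

```python
def variance_shares(groups):
    """Return (between %, within %) of total variance.

    groups maps each person to a list of that person's ratings.
    Uses the one-way ANOVA estimator, which allows unequal numbers
    of ratings per person.
    """
    k = len(groups)
    n_total = sum(len(v) for v in groups.values())
    grand_mean = sum(sum(v) for v in groups.values()) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for ratings in groups.values():
        person_mean = sum(ratings) / len(ratings)
        ss_between += len(ratings) * (person_mean - grand_mean) ** 2
        ss_within += sum((r - person_mean) ** 2 for r in ratings)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective number of ratings per person for unbalanced data.
    n0 = (n_total - sum(len(v) ** 2 for v in groups.values()) / n_total) / (k - 1)

    var_between = max(0.0, (ms_between - ms_within) / n0)  # truncate negative estimates
    total = var_between + ms_within
    return 100 * var_between / total, 100 * ms_within / total


# Toy data (hypothetical): three persons with three ratings each.
toy = {"p1": [1, 2, 3], "p2": [3, 4, 5], "p3": [2, 3, 4]}
between_pct, within_pct = variance_shares(toy)
print(round(between_pct), round(within_pct))  # prints: 40 60
```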

Main Analyses

First, we set out to investigate convergent validity. To address the multilevel nature of the data, multilevel regression analyses were used for this purpose. We focused on the situations for which both a self-report and a peer-report were available (N = 49, n = 344);² that is, at this point, we dropped the data from situations for which there was no peer-report. Next, self-ratings were person-mean centered (Raudenbush & Bryk, 2002). Then, each peer-report was predicted from the corresponding self-report in a series of multilevel regressions including a random slope. The analyses were conducted in the R environment (v. 4.0.2, R Core Team, 2020) using the lme4 package (Bates et al., 2015). Random slope models for eight dimensions had a singular fit, most likely due to a combination of very small slope variance and low N. These models were re-run as random-intercept-only models (see Table 3). Standardized betas were calculated via the “pseudo” method in the sjstats package (v. 0.18.1; Lüdecke, 2021) in R; this method standardizes the Level 1 coefficients according to within-cluster (here: within-participant) variance, and it is recommended for multilevel models (Hoffman, 2015, p. 342).³ Because many comparisons (21) were conducted on the same data set, a Bonferroni correction was used. The corrected significance threshold was .05/21 ≈ .002.
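Two of the steps described above, person-mean centering and the Bonferroni correction, can be sketched in a few lines. This is a minimal illustration rather than the analysis code of the study; the function names are hypothetical, and the p values are those reported in Table 3, with values reported as "<" entered at their upper bound.

```python
from collections import defaultdict


def person_mean_center(person_ids, values):
    """Subtract each person's own mean from that person's ratings."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pid, v in zip(person_ids, values):
        sums[pid] += v
        counts[pid] += 1
    return [v - sums[pid] / counts[pid] for pid, v in zip(person_ids, values)]


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# p values from Table 3, in table order ("<" entries entered at their bound).
TABLE3_P = [
    0.000007, 0.00004, 0.00002, 0.0007, 0.00000001, 0.006, 0.064,
    0.00000001, 0.0001, 0.0008,                      # personality states
    0.00003, 0.000006, 0.017, 0.00000002, 0.007, 0.002, 0.00006,
    0.000000003,                                     # situation features
    0.000001, 0.0000001, 0.000002,                   # feeling states
]

threshold = bonferroni_alpha(0.05, len(TABLE3_P))
n_significant = sum(p < threshold for p in TABLE3_P)

print(person_mean_center(["a", "a", "b", "b"], [1, 3, 4, 4]))  # [-1.0, 1.0, 0.0, 0.0]
print(round(threshold, 4), n_significant)                      # 0.0024 17
```

Centering within persons means that the resulting slope reflects only occasion-to-occasion covariation between self- and peer-ratings, not stable differences between target persons.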

Table 3

Results of Multilevel Regression Models Predicting Each Peer-Report From the Corresponding Person-Mean Centered Self-Report (N = 49, n = 338–344)

Personality states	Intercept	B (SE)	β	95% CI	p	Random intercept	Residual σ²	Slope σ²	R²
Sociable	3.94	.44 (.05)	.43	[.33, .54]	.000007	.07	.68	.01	.18
Energetic	3.45	.33 (.07)	.30	[.19, .41]	.00004	.15	.84	.01	.08
Dominant	2.21	.30 (.07)	.23	[.13, .33]	.00002	.36	.93	–	.04
Insecure	1.67	.31 (.08)	.31	[.16, .46]	.0007	.08	.54	.08	.09
Nervous	1.77	.46 (.05)	.42	[.32, .51]	<.00000001	.24	.64	–	.14
Friendly	4.24	.22 (.07)	.22	[.09, .35]	.006	.13	.60	.04	.04
Imaginative	2.95	.15 (.08)	.12	[.00, .24]	.064	.25	.80	.04	.01
Productive	3.50	.45 (.05)	.42	[.33, .52]	<.00000001	.11	.89	–	.16
Responsible	3.62	.23 (.06)	.21	[.10, .31]	.0001	.32	.92	–	.03
Soc. comp.	3.91	.27 (.07)	.25	[.13, .37]	.0008	.12	.76	.03	.06
Situation features
Approach	3.09	.30 (.06)	.28	[.17, .39]	.00003	.33	1.64	.02	.07
Avoidance	1.89	.29 (.06)	.24	[.14, .36]	.000006	.20	1.28	–	.05
Competition	1.35	.31 (.11)	.23	[.07, .40]	.017	.02	.66	.10	.05
Cooperation	2.87	.44 (.06)	.39	[.29, .50]	.00000002	.15	1.19	.02	.14
Free	3.63	.19 (.06)	.19	[.07, .31]	.007	.10	1.04	.03	.03
Constrained	2.33	.31 (.08)	.28	[.14, .42]	.002	.12	1.06	.08	.07
Pleasant	3.97	.32 (.06)	.31	[.19, .43]	.00006	.10	.90	.03	.09
Unpleasant	1.42	.33 (.05)	.31	[.19, .43]	.000000003	.01	.60	–	.10
Feeling states
Happy	3.78	.28 (.06)	.26	[.16, .36]	.000001	.14	.60	–	.06
Stressed	2.11	.42 (.06)	.36	[.26, .45]	<.0000001	.21	.80	–	.10
Tired	2.45	.39 (.07)	.35	[.23, .46]	.000002	.13	.84	.03	.11

Note. R²s are marginal pseudo-R²s calculated with the r.squaredGLMM function in the MuMIn package (v. 1.43.17; Bartoń, 2020) in R, which computes the statistic following Nakagawa et al.’s (2017) recommendation. The formula is R²_GLMM(m) = σ_f² / (σ_f² + σ_α² + σ_ε²), where σ_f² is the variance of the fixed effects, σ_α² is the variance of the random effect(s), and σ_ε² is the observation-level variance (derived via the delta method). A dash in the slope σ² column indicates that the random slope model failed to converge and the model was run as a random-intercept-only model. The variables for which the self-report was a significant predictor per the adjusted p value are in boldface.
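The marginal pseudo-R² formula in the note can be written out directly. The sketch below only evaluates the ratio for given variance components; it does not reproduce how MuMIn derives those components from a fitted model. The function name and the input values are hypothetical and chosen for illustration.

```python
def marginal_r2(var_fixed, var_random, var_resid):
    """Marginal pseudo-R²: share of total variance explained by the fixed effects."""
    return var_fixed / (var_fixed + var_random + var_resid)


# Hypothetical variance components, not taken from Table 3.
print(round(marginal_r2(var_fixed=0.15, var_random=0.25, var_resid=0.60), 2))  # 0.15
```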

The results are presented in Table 3. As shown there, the 95% confidence intervals excluded zero for all dimensions except imaginative, and the self-report was a significant predictor at the adjusted p value for 17 of the 21 dimensions. Agreement was strongest for the personality states sociable, nervous, and productive, for situational cooperativeness, and for stress and fatigue (βs ≥ .35). For the personality states friendly and imaginative, as well as for the situational features competitive and free, self-peer agreement was rather weak (βs ≤ .23) and not significant per the adjusted p value. Individual-level slopes for the dimensions for which they were available are presented in Supplemental Figures S1 to S13. Figure 1 presents a forest plot summarizing the convergent validity findings.

Figure 1

Self-Peer Agreement Effect Sizes (Standardized Betas) With 95% CIs.

Next, discriminant validity of the ESM reports was investigated by creating a multitrait-multimethod (MTMM) matrix (Campbell & Fiske, 1959; see supplemental file), computing repeated measures correlations between all ratings with the rmcorr package (Bakdash & Marusich, 2021) in R. In the matrix, self-peer convergence correlations are highlighted in red. In addition, self–self correlations (heterotrait-monomethod correlations) higher than the relevant convergence correlation are highlighted in yellow, peer–peer correlations (also heterotrait-monomethod correlations) higher than the relevant convergence correlation are highlighted in pink, and self–peer non-convergence correlations (heterotrait-heteromethod correlations) equal to or higher than |.30| are highlighted in turquoise.
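For readers unfamiliar with repeated measures correlation, the coefficient can be understood as the Pearson correlation between two variables after each has been centered on the person's own mean, which removes between-person differences. The sketch below computes the point estimate this way. It is an illustration of the idea and not the rmcorr implementation (which also provides degrees of freedom, p values, and confidence intervals); the function name and toy data are hypothetical.

```python
import math
from collections import defaultdict


def rm_corr(person_ids, x, y):
    """Repeated measures correlation (point estimate only).

    Computed as the correlation of within-person deviations, pooled
    over persons, so stable between-person differences do not contribute.
    """
    by_person = defaultdict(list)
    for pid, xi, yi in zip(person_ids, x, y):
        by_person[pid].append((xi, yi))

    sxx = syy = sxy = 0.0
    for pairs in by_person.values():
        mx = sum(p[0] for p in pairs) / len(pairs)
        my = sum(p[1] for p in pairs) / len(pairs)
        for xi, yi in pairs:
            sxx += (xi - mx) ** 2
            syy += (yi - my) ** 2
            sxy += (xi - mx) * (yi - my)
    return sxy / math.sqrt(sxx * syy)


# Toy data (hypothetical): two persons with very different mean levels
# but identical within-person trends.
ids = [1, 1, 1, 2, 2, 2]
self_ratings = [1, 2, 3, 1, 2, 3]
peer_ratings = [2, 4, 6, 10, 12, 14]
print(rm_corr(ids, self_ratings, peer_ratings))  # 1.0
```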

Convergence correlations for most ratings were higher than the relevant heterotrait-heteromethod correlations, which is the first criterion for discriminant validity (Campbell & Fiske, 1959). The exceptions were happiness, energetic, friendly, responsible, social competence, and situational pleasantness. Several heterotrait-monomethod correlations were higher than all convergence correlations, which violates the second criterion for discriminant validity (Campbell & Fiske, 1959). However, there were strong conceptual overlaps between many of the ratings (e.g., tired and energetic; nervous and insecure). Therefore, we interpret the overall level of discriminant validity through the first criterion only (see Table 4).

Table 4

Summary of Validity Results

Rating	Convergent validity	Discriminant validity	Incremental validity
Personality state
Sociable	Good++	Average+	Average+
Energetic	Good++	Poor–	Poor–
Dominant	Good++	Good++	Average+
Insecure	Good++	Poor–	Average+
Nervous	Good++	Average+	Good++
Friendly	Poor–	Poor–	Poor–
Imaginative	Poor–	Poor–	Poor–
Productive	Good++	Average+	Average+
Responsible	Average+	Poor–	Average+
Soc. comp.	Average+	Poor–	Poor–
Situation perception
Approach	Good++	Poor–	Poor–
Avoidance	Average+	Good++	Average+
Cooperation	Good++	Average+	Good++
Competition	Poor–	Good++	Average+
Free	Poor–	Poor–	Poor–
Constrained	Average+	Good++	Poor–
Pleasant	Good++	Poor–	Good++
Unpleasant	Good++	Average+	Good++
Feeling state
Happy	Average+	Poor–	Good++
Stressed	Good++	Average+	Good++
Tired	Good++	Good++	Good++

Note. Convergent validity was considered “Good” if the convergent validity beta was ≥.30 and the p value was <.002, “Average” if beta was between .20 and .29 and the p value was ≤.002, and “Poor” if these criteria were not fulfilled. Discriminant validity was considered “Good” if the corresponding convergent correlation in the matrix was higher than all relevant heterotrait-heteromethod (hthm) correlations, and the relevant hthm correlations were <|.30|; “Average” if the convergence correlation was higher than all relevant hthm correlations but there were relevant hthm correlations higher than or equal to |.30|; and “Poor” if the convergence correlation was not higher than all relevant hthm correlations. Incremental validity was considered “Good” if the corresponding rating was the only significant predictor in the multiple multilevel regression, the SEM with the target coefficient free had better fit than the fully constrained SEM, and the target coefficient was larger in the free SEM than in the constrained SEM. Incremental validity was considered “Average” if the corresponding rating predicted the peer-report but was not the only significant predictor, and if the free SEM had better fit than the fully constrained SEM. Incremental validity was considered “Poor” if the corresponding rating did not significantly predict the peer-report and/or if the free SEM did not have better fit than the fully constrained SEM. “Good” validity ratings are in boldface and marked with ++, “Average” ratings are in regular font and marked with +, and “Poor” ratings are in italics and marked with –.
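The convergent and discriminant classification rules in the note can be expressed as simple decision functions. The sketch below is one possible reading of those rules, not the authors' procedure: it treats any beta from .20 up to (but not including) .30 as "Average", and it compares the convergence correlation with the absolute values of the hthm correlations. Both choices, as well as the function names, are assumptions.

```python
def classify_convergent(beta, p, alpha=0.002):
    """Classify convergent validity from a standardized beta and its p value."""
    if beta >= 0.30 and p < alpha:
        return "Good"
    if 0.20 <= beta < 0.30 and p <= alpha:
        return "Average"
    return "Poor"


def classify_discriminant(convergence_r, hthm_rs, cutoff=0.30):
    """Classify discriminant validity from the convergence correlation and
    the relevant heterotrait-heteromethod (hthm) correlations."""
    magnitudes = [abs(r) for r in hthm_rs]  # assumption: compare absolute sizes
    if any(convergence_r <= m for m in magnitudes):
        return "Poor"
    if any(m >= cutoff for m in magnitudes):
        return "Average"
    return "Good"


# Examples using beta and p values reported in Table 3.
print(classify_convergent(0.43, 0.000007))  # Sociable: Good
print(classify_convergent(0.21, 0.0001))    # Responsible: Average
print(classify_convergent(0.22, 0.006))     # Friendly: Poor

# Hypothetical correlations for the discriminant rule.
print(classify_discriminant(0.40, [0.10, -0.20]))  # Good
print(classify_discriminant(0.40, [0.35, 0.10]))   # Average
print(classify_discriminant(0.25, [0.30]))         # Poor
```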

Finally, we investigated incremental validity via multiple multilevel regression analyses and multilevel structural equation models. Details of these analyses can be found in the Online Appendix.

Discussion

Self-other agreement was found for several experience sampling measures in the present study—for personality states, feelings, and situation perceptions, and in convergent, discriminant and incremental validity analyses. The results suggest that many ESM self-ratings reflect, at least to an extent, actual personality states, feelings, and perceived situation characteristics that are visible to outside observers. This, in turn, indicates that quite a few ESM self-ratings have some level of validity.

The present results are important given that ESM self-ratings are widely used but have not been extensively validated. To our knowledge, the present study is the first to investigate self-other agreement for personality states, situation ratings, and feelings using informants embedded into the situations, and the first to provide evidence of discriminant validity of ESM measures. Thus, it complements previous ESM validity studies that investigated convergent validity of ESM ratings using observer-ratings from laboratory situations (Fleeson & Law, 2015), observer-ratings of audio recordings (e.g., Sun & Vazire, 2019), or other-ratings of personality states in a teaching situation (Abrahams et al., 2021), as the criteria.

The strongest self-other agreement was found for the personality states sociable, nervous, and productive; cooperativeness of the situation; and tiredness and stress. All these dimensions also showed good or moderate discriminant validity. These results are well in line with previous studies on personality judgment: out of all traits, extraversion and conscientiousness are the easiest for observers and informants to judge (e.g., Carney et al., 2007; Connelly & Ones, 2010), and here, personality states related to these traits (plus nervousness) showed the highest self-other agreement. This increased our confidence that the self-other agreement observed was based on target participants’ actual personality states.

The high self-peer agreement for nervousness and stress was somewhat surprising, as trait neuroticism is typically difficult to judge (e.g., Carney et al., 2007). However, it seems plausible that momentary stress and nervousness are somewhat visible, even if dispositional nervousness or stress-proneness is not. In addition, informants may have had information about whether the target person was in a stressful situation. Another surprising finding was the low agreement for friendliness. Intuitively, it would seem easy to judge others’ friendliness. We believe that the low agreement was due to peers rating targets as very high in friendliness on average (Table 1). Self-reports of friendliness, by contrast, were more normally distributed. Thus, a possible reason for this result is that peer-ratings of momentary friendliness are based on the target person’s behavior, which is typically perceived as very friendly, whereas self-ratings of friendliness may additionally be based on the target’s internal feelings and thoughts, which may have been less friendly on occasion.

The results also provided novel information about self-other agreement on situation perceptions. The results suggested at least moderate agreement for cooperativeness and valence of the situation, and weak-to-moderate agreement for the other situation perceptions, except for freedom. All situation ratings except pleasantness showed moderate discriminant validity. Thus, though it might seem plausible that it is easier for outsiders to judge personality states (and, perhaps, feelings) than situation perceptions, we found no evidence for that: convergent and discriminant validity was found for both personality states and situation perceptions. Perhaps the peer-raters often shared the targets’ situational context, or perhaps peers can evaluate others’ situation perceptions about as well as they can evaluate personality states. It would be interesting to see self-peer agreement results for more established situation perception measures, such as the DIAMONDS (Rauthmann et al., 2014) or CAPTION (Parrigon et al., 2017) measures.

Limitations

The most prominent limitation of the present study was the very small and almost all-female sample, which makes it necessary to view the results as very preliminary. The reliability of the findings remains uncertain until they are replicated in a larger and more representative sample. It is also quite possible that small self-other agreement associations went undetected due to inadequate power. Furthermore, differences between peer-report situations and self-report-only situations (Tables A1 and A2 in the Online Appendix) may limit the generalizability of the results. Unsurprisingly, participants reported higher levels of sociability and friendliness in peer-report situations than in no-peer-report situations. Furthermore, situations for which peer-reports were available seemed to be more enjoyable to participants than situations with self-reports only. Thus, our results may not reflect self-peer agreement in less social and less enjoyable situations.

The lack of explicit instructions for participants regarding asking a peer may have affected the results as well. For instance, some participants may have been more persistent in attempting to recruit peer-raters, and some participants’ daily routines may have allowed more opportunities for acquiring peer-raters. Thus, the results may reflect self-other agreement only among certain types of individuals. We ran some control analyses to explore this possibility and found that participants’ age, personality traits,¹ self-reported number of close friends,¹ and loneliness¹ were unrelated to the number of peer-reports provided (Table A3 in the Online Appendix). Participants who were in a relationship provided significantly more peer-reports on average than single participants (Table A4 in the Online Appendix). However, given the null relations with the other variables (Table A3 in the Online Appendix), we believe this difference reflects access to a peer-rater rather than relevant psychological differences between single and non-single participants.

Using mostly familiar others as peer-reporters is a methodological limitation, as socially desirable responding may affect the reports made by familiar others (i.e., they may have a positive overall view of the target person, or they may wish to present the target person in a positive light; e.g., Leising et al., 2010). However, previous research suggests that socially desirable responding does not have a substantial influence on self-other agreement in personality trait ratings (Lönnqvist et al., 2007; Roth & Altmann, 2019). Moreover, it seems plausible that momentary personality state ratings by peers would be less susceptible to social desirability concerns than peer-ratings of personality traits, because a rating of someone’s momentary state does not say anything about the target person’s stable characteristics, whereas a rating of someone’s personality traits does. Nevertheless, in group situations, our participants could choose their most preferred rater from the group, and they could respond “no” to the peer-rater availability question if they did not want an available peer to respond. Thus, it is possible that the peer-ratings presented here are typically from peers whom participants saw as desirable raters; it remains unclear whether there is self-peer agreement between targets and less desirable peer-raters. Finally, using an ad hoc measure of situation perceptions instead of an established measure made it difficult to connect the present results to existing ones.

Conclusion

Self-other agreement is a central topic in personality and social psychology (e.g., Funder, 1995; Funder & West, 1993; Watson et al., 2000), and it has long been considered a well-founded way of validating self-report instruments (e.g., McCrae et al., 2004; Watson & Clark, 1991; Woodruffe, 1985). The present study provided evidence that many commonly used experience sampling self-ratings also show self-other agreement and, thus, some convergent validity. In sum, momentary ESM self-reports seem to capture, at least to an extent, target participants’ actual personality states, feelings, and situation perceptions.

Footnotes

Handling Editor: Marlone D. Henderson

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the Academy of Finland (grant numbers 266076 and 309537).

Supplemental Material

The supplemental material is available in the online version of the article.

Author Biographies

Sointu Leikas was a post-doctoral researcher at the Swedish School of Social Science, University of Helsinki. She is now a university researcher at the Helsinki Institute for Social Sciences and Humanities, University of Helsinki.

Jan-Erik Lönnqvist is a professor of social psychology at the Swedish School of Social Science, University of Helsinki.

References

Abrahams, Rauthmann, J. F., & Fruyt, F. D. (2021). Person-situation dynamics in educational contexts: A self- and other-rated experience sampling study of teachers’ states, traits, and situations. European Journal of Personality, 35, 598–622.

Bakdash, J. Z., & Marusich, L. R. (2021). rmcorr: Repeated measures correlation (R package version 0.4.4). https://CRAN.R-project.org/package=rmcorr

Bartoń (2020). MuMIn: Multi-model inference (R package version 1.43.17). https://CRAN.R-project.org/package=MuMIn

Bates, Maechler, Bolker, & Walker (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01

Bleidorn, & Peters, A. L. (2011). A multilevel multitrait–multimethod analysis of self- and peer-reported daily affective experiences. European Journal of Personality, 25(5), 398–408. https://doi.org/10.1002/per.804

Breil, S. M., Geukes, Wilson, R. E., Nestler, Vazire, Back, M. D., & Donnellan, M. B. (2019). Zooming into real-life extraversion: How personality and situation shape sociability in social interactions. Collabra: Psychology, 5(1), 7. https://doi.org/10.1525/collabra.170

Carney, D. R., Colvin, C. R., & Hall, J. A. (2007). A thin slice perspective on the accuracy of first impressions. Journal of Research in Personality, 41(5), 1054–1072. https://doi.org/10.1016/j.jrp.2007.01.004

Connelly, B. S., & Ones, D. S. (2010). An other perspective on personality: Meta-analytic integration of observers’ accuracy and predictive validity. Psychological Bulletin, 136(6), 1092.

Connolly, S. L., & Alloy, L. B. (2017). Rumination interacts with life stress to predict depressive symptoms: An ecological momentary assessment study. Behaviour Research and Therapy, 97, 86–95. https://doi.org/10.1016/j.brat.2017.07.006

Cooper, W. H., & Withey, M. J. (2009). The strong situation hypothesis. Personality and Social Psychology Review, 13(1), 62–72. https://doi.org/10.1177/1088868308329378

Deutsch (1949). A theory of co-operation and competition. Human Relations, 2(2), 129–152.

Deutsch (2011). Cooperation and competition. In D. J. Christie (Ed.), Conflict, interdependence, and justice (pp. 23–40). Springer.

Elliot, A. J. (1999). Approach and avoidance motivation and achievement goals. Educational Psychologist, 34(3), 169–189. https://doi.org/10.1207/s15326985ep3403_3

Fleeson (2001). Toward a structure- and process-integrated view of personality: Traits as density distributions of states. Journal of Personality and Social Psychology, 80(6), 1011–1027. https://doi.org/10.1037/0022-3514.80.6.1011

Fleeson, & Gallagher (2009). The implications of Big Five standing for the distribution of trait manifestation in behavior: Fifteen experience-sampling studies and a meta-analysis. Journal of Personality and Social Psychology, 97(6), 1097–1114. https://doi.org/10.1037/a0016786

Fleeson, & Jayawickreme (2015). Whole trait theory. Journal of Research in Personality, 56, 82–92. https://doi.org/10.1016/j.jrp.2014.10.009

Fleeson, & Law, M. K. (2015). Trait enactments as density distributions: The role of actors, situations, and observers in explaining stability and variability. Journal of Personality and Social Psychology, 109(6), 1090–1104. https://doi.org/10.1037/a0039517

Fleeson, Malanos, A. B., & Achille, N. M. (2002). An intraindividual process approach to the relationship between extraversion and positive affect: Is acting extraverted as “good” as being extraverted? Journal of Personality and Social Psychology, 83(6), 1409–1422. https://doi.org/10.1037/0022-3514.83.6.1409

Funder, D. C. (1995). On the accuracy of personality judgment: A realistic approach. Psychological Review, 102(4), 652–670. https://doi.org/10.1037/0033-295X.102.4.652

Funder, D. C., & West, S. G. (1993). Consensus, self-other agreement, and accuracy in personality judgment: An introduction. Journal of Personality, 61(4), 457–476. https://doi.org/10.1111/j.1467-6494.1993.tb00778.x

Ghassemi, Wolf, B. M., Bettschart, Kreibich, Herrmann, & Brandstätter (2021). The dynamics of doubt: Short-term fluctuations and predictors of doubts in personal goal pursuit. Motivation Science, 7(2), 153–164. https://doi.org/10.1037/mot0000210

Green, & MacLeod, C. J. (2016). Simr: An R package for power analysis of generalised linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493–498. https://doi.org/10.1111/2041-210X.12504; https://CRAN.R-project.org/package=simr

Grühn, Rebucal, Diehl, Lumley, & Labouvie-Vief (2008). Empathy across the adult lifespan: Longitudinal and experience-sampling findings. Emotion, 8(6), 753–765. https://doi.org/10.1037/a0014123

Grzywacz, J. G., Almeida, D. M., Neupert, S. D., & Ettner, S. L. (2004). Socioeconomic status and health: A micro-level analysis of exposure and vulnerability to daily stressors. Journal of Health and Social Behavior, 45(1), 1–16. https://doi.org/10.1177/002214650404500101

Hamaker, E. L., & Wichers (2017). No time like the present: Discovering the hidden dynamics in intensive longitudinal data. Current Directions in Psychological Science, 26(1), 10–15. https://doi.org/10.1177/0963721416666518

Heller, Komar, & Lee, W. B. (2007). The dynamics of personality states, goals, and well-being. Personality and Social Psychology Bulletin, 33(6), 898–910. https://doi.org/10.1177/0146167207301010

Hoffman (2015). Longitudinal analysis: Modeling within-person fluctuation and change. Routledge.

Horstmann, K. T., Rauthmann, J. F., Sherman, R. A., & Ziegler (2020). Unveiling an exclusive link: Predicting behavior with personality, situation perception, and affect in a preregistered experience sampling study. Journal of Personality and Social Psychology, 120, 1317–1343. https://doi.org/10.1037/pspp0000357

Ilies, Dimotakis, & Watson (2010). Mood, blood pressure, and heart rate at work: An experience-sampling study. Journal of Occupational Health Psychology, 15(2), 120–130. https://doi.org/10.1037/a0018350

Impett, E. A., Gordon, A. M., Kogan, Oveis, Gable, S. L., & Keltner (2010). Moving toward more perfect unions: Daily and long-term consequences of approach and avoidance goals in romantic relationships. Journal of Personality and Social Psychology, 99(6), 948–963. https://doi.org/10.1037/a0020271

Kazak, A. E. (2018). Editorial: Journal article reporting standards. American Psychologist, 73(1), 1–2. https://doi.org/10.1037/amp0000263

Leikas, Kuula, & Pesonen, A. K. (2021). Does counter-habitual behavior carry psychological costs? Journal of Research in Personality, 92, Article 104077.

Leikas (2020). Sociable behavior is related to later fatigue: Moment-to-moment patterns of behavior and tiredness. Heliyon, 6(5), Article e04033.

Leikas, & Ilmarinen, V. J. (2017). Happy now, tired later? Extraverted and conscientious behavior are related to immediate mood gains, but to later fatigue. Journal of Personality, 85(5), 603–615. https://doi.org/10.1111/jopy.12264

Leising, Erbs, & Fritz (2010). The letter of recommendation effect in informant ratings of personality. Journal of Personality and Social Psychology, 98(4), 668–682. https://doi.org/10.1037/a0018771

Lönnqvist, J. E., Paunonen, Tuulio-Henriksson, & Verkasalo (2007). Substance and style in socially desirable responding. Journal of Personality, 75(2), 291–322. https://doi.org/10.1111/j.1467-6494.2006.00440.x

Lüdecke (2021). sjstats: Statistical functions for regression models (Version 0.18.1). https://doi.org/10.5281/zenodo.1284472; https://CRAN.R-project.org/package=sjstats

McCrae, R. R., Costa, P. T., Jr., Martin, T. A., Oryol, V. E., Rukavishnikov, A. A., Senin, I. G., Hřebíčková, & Urbánek (2004). Consensual validation of personality traits across cultures. Journal of Research in Personality, 38(2), 179–201. https://doi.org/10.1016/S0092-6566(03)00056-4

Meyer, R. D., Dalal, R. S., & Hermida (2010). A review and synthesis of situational strength in the organizational sciences. Journal of Management, 36(1), 121–140. https://doi.org/10.1177/0149206309349309

Mischel (1977). On the future of personality measurement. American Psychologist, 32(4), 246–254. https://doi.org/10.1037/0003-066X.32.4.246

Nakagawa, Johnson, P. C., & Schielzeth (2017). The coefficient of determination R² and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. Journal of the Royal Society Interface, 14(134), Article 0213. https://doi.org/10.1098/rsif.2017.0213

Parrigon, Woo, S. E., Tay, & Wang (2017). CAPTION-ing the situation: A lexically-derived taxonomy of psychological situation characteristics. Journal of Personality and Social Psychology, 112(4), 642–681. https://doi.org/10.1037/pspp0000111

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (Vol. 1). SAGE.

Rauthmann, J. F., Gallardo-Pujol, Guillaume, E. M., Todd, Nave, C. S., Sherman, R. A., Ziegler, Jones, A. B., & Funder, D. C. (2014). The situational eight DIAMONDS: A taxonomy of major dimensions of situation characteristics. Journal of Personality and Social Psychology, 107(4), 677–718. https://doi.org/10.1037/a0037250

Rauthmann, J. F., Sherman, R. A., & Funder, D. C. (2015). Principles of situation research: Towards a better understanding of psychological situations. European Journal of Personality, 29(3), 363–381. https://doi.org/10.1002/per.1994

R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Reis, H. T. (2018). Why bottom-up taxonomies are unlikely to satisfy the quest for a definitive taxonomy of situations. Journal of Personality and Social Psychology, 114(3), 489–492. https://doi.org/10.1037/pspp0000158

Roth, & Altmann (2019). A multi-informant study of the influence of targets’ and perceivers’ social desirability on self-other agreement in ratings of the HEXACO personality dimensions. Journal of Research in Personality, 78, 138–147. https://doi.org/10.1016/j.jrp.2018.11.008

Stanne, M. B., Johnson, D. W., & Johnson, R. T. (1999). Does competition enhance or inhibit motor performance: A meta-analysis. Psychological Bulletin, 125(1), 133–154. https://doi.org/10.1037/0033-2909.125.1.133

Stone, A. A., & Shiffman (2002). Capturing momentary, self-report data: A proposal for reporting guidelines. Annals of Behavioral Medicine, 24(3), 236–243. https://doi.org/10.1207/S15324796ABM2403_09

Sun, Harris, & Vazire (2020a). Is well-being associated with the quantity and quality of social interactions? Journal of Personality and Social Psychology, 119(6), 1478–1496. https://doi.org/10.1037/pspp0000272

Sun, Schwartz, H. A., Son, Kern, M. L., & Vazire (2020b). The language of well-being: Tracking fluctuations in emotion experience through everyday speech. Journal of Personality and Social Psychology, 118(2), 364–387. https://doi.org/10.1037/pspp0000244

Sun, & Vazire (2019). Do people know what they’re like in the moment? Psychological Science, 30(3), 405–414. https://doi.org/10.1177/0956797618818476

Watson, & Clark, L. A. (1991). Self- versus peer-ratings of specific emotional traits: Evidence of convergent and discriminant validity. Journal of Personality and Social Psychology, 60(6), 927–940. https://doi.org/10.1037/0022-3514.60.6.927

Watson, Hubbard, & Wiese (2000). Self–other agreement in personality and affectivity: The role of acquaintanceship, trait visibility, and assumed similarity. Journal of Personality and Social Psychology, 78(3), 546–558. https://doi.org/10.1037/0022-3514.78.3.546

Woodruffe (1985). Consensual validation of personality traits: Additional evidence and individual differences. Journal of Personality and Social Psychology, 48(5), 1240–1252. https://doi.org/10.1037/0022-3514.48.5.1240

Wrzus, Luong, Wagner, G. G., & Riediger (2015). Can’t get it out of my head: Age differences in affective responsiveness vary with preoccupation and elapsed time after daily hassles. Emotion, 15(2), 257–269. https://doi.org/10.1037/emo0000019