Abstract
Neuroticism is an important predictor of well-being that is characterized by high levels of mean negative affect and negative affect variability. However, research has shown that negative affect variability only explained limited additional variance in neuroticism when the confound with mean negative affect was accounted for using a modified version of the standard deviation (SD), the relative standard deviation (RSD). Here, we (a) examined the suitability of the RSD as a variability measure, (b) introduced the number of negative affect episodes as an alternative measure of negative affect variability, and (c) investigated the relationship between neuroticism and negative affect variability, accounting for measurement error. Re-analyzing three experience sampling datasets (N = 430 participants), we found several issues with the RSD that limit its use as a negative affect variability measure and that were not found for the number of negative affect episodes measure. Moreover, only the SD and the number of negative affect episodes explained substantial variance in neuroticism above mean negative affect. Thus, neuroticism was associated with experiencing negative affect more strongly and more often in daily life when measurement error was accounted for, which demonstrates the importance of modeling reliability and correcting for it.
Neuroticism is a personality trait that is characterized by high levels of and high variability in negative emotions such as anger, anxiety, sadness, guilt, or worry (Kamen et al., 2010; Thompson, 2008). It is an important predictor of mental and physical health. For example, it is associated with depression, anxiety, and substance use disorders (Kotov et al., 2010) as well as with greater mortality from cardiovascular disease, diabetes, and cancer (Lahey, 2009). Moreover, neuroticism may come with a heavy economic burden resulting from increased mental and public health care use and costs (Cuijpers et al., 2010). Thus, understanding the underlying affective processes in neuroticism is of great personal and societal importance.
Theoretical accounts of neuroticism define it by a stable and by a dynamic component (Kamen et al., 2010), such that highly neurotic individuals experience more negative affect in general, but also experience more unstable or prolonged episodes of negative affect. These two components are reflected in questionnaires of neuroticism that include scales measuring the general negative responses to fear, threat, or loss (e.g., the subscales anxiety and depression in the NEO-PI-3, McCrae et al., 2005) and scales reflecting the variability in these responses (e.g., the irritability subscale in the NEO-PI-3). To capture the dynamic component, research has often used the standard deviation (SD) of mean (M) negative affect as a measure of negative affect variability, which has shown small-to-medium sized positive associations with neuroticism (Houben et al., 2015). Thus, individuals with higher levels in neuroticism reported both stronger and more variable negative affect in their everyday lives.
However, recent research has shown that these associations were very close to zero when the confound with mean negative affect was accounted for using a modified version of the SD (Kalokerinos et al., 2020). This finding challenges the notion that neuroticism can also be characterized by high variability in negative affect in conjunction with the stable component. In the present research, we expanded on this research by examining (a) whether the modified SD measure is well suited to capture variability in negative affect, (b) whether an alternative affect dynamic measure is less confounded with mean negative affect, and (c) whether the relationship between neuroticism and variability in negative affect increases substantially when accounting for measurement error and other biases.
The (relative) standard deviation as a measure of affect variability
It has been shown that M and SD are non-linearly dependent due to the restricted bounds of the scales (Mestdagh et al., 2018), which makes it necessary to control for this confound when comparing the independent contributions of mean negative affect and negative affect variability as captured via the SD. As an alternative to including both M and SD as predictors, the relative SD (RSD) has been proposed, which standardizes the SD at the maximum SD of a given response (Mestdagh et al., 2018). Using only the RSD to predict neuroticism, Kalokerinos et al. (2020) re-analyzed 11 experience sampling method (ESM) datasets and found that the relationship between neuroticism and negative affect variability was no longer significant and very close to zero.
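The idea of standardizing the SD at its maximum can be illustrated with a minimal sketch. It assumes the large-sample bound on the variance of a bounded variable, (M − lower)(upper − M), as a stand-in for the maximum; the published RSD relies on a finite-sample maximum that is not reproduced here, and the function name `relative_sd` is hypothetical.

```python
from statistics import mean, pstdev

def relative_sd(values, lower, upper):
    """Illustrative relative SD: population SD divided by its largest
    attainable value given the mean and the scale bounds.

    Assumption: the maximum is approximated by the large-sample bound
    sqrt((M - lower) * (upper - M)), not by a finite-sample formula.
    """
    m = mean(values)
    max_sd = ((m - lower) * (upper - m)) ** 0.5
    if max_sd == 0:
        # Mean sits exactly on a scale bound: no variability is possible,
        # so the ratio is undefined.
        return None
    return pstdev(values) / max_sd

# Ratings that alternate between both scale ends reach the maximum SD.
extreme = relative_sd([0, 10, 0, 10], lower=0, upper=10)
# A mostly-zero series has a small SD, but also a small attainable maximum.
near_floor = relative_sd([0, 0, 0, 1], lower=0, upper=10)
# A constant series at the lower bound has no defined relative SD.
undefined = relative_sd([0, 0, 0], lower=0, upper=10)
```

Because the attainable maximum shrinks as the mean approaches a scale bound, the same raw SD yields a larger ratio near the floor, which is one way a dependency on the mean can re-enter the standardized measure.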
However, it has been demonstrated that the RSD also comes with some problems (Wenzel & Kubiak, 2020). First, an empirical test showed that although the mean correlation between mean negative affect and the RSD was not significant and only small in size (r = −.06), there was a large heterogeneity between the individual correlations within the different datasets (I² = 88.7%). Moreover, 4 of the 11 datasets showed large and significant negative associations between mean negative affect and the RSD (Wenzel & Kubiak, 2020). This indicates that standardizing the SD at the maximum SD did not only fail to consistently remove the confound with the mean but also led to a significant confound in the other direction in approximately one third of the datasets. Second, Wenzel and Kubiak (2020) also showed that although the heterogeneity in the mean correlation between neuroticism and RSD was limited (I² = 18.30%), it was fully explained by differences in the association between mean negative affect and the RSD. Thus, it is unclear how well suited the RSD is to correct for the SD’s confound with the mean. Consequently, we wanted to examine in our first research question of the present research whether the RSD offers an appropriate way of correcting for the confound between SD and M.
The number of negative affect episodes as an alternative measure of affect variability
In the second research question, we wanted to introduce an alternative measure of capturing the dynamic component of neuroticism, that is, the number of negative affect episodes, which is easier to interpret than the SD and less confounded with the mean. To best illustrate the course of affective experiences in daily life, affective experiences are often understood as time-series data, where the individual values from observation to observation of a given affective state fluctuate around the individual’s mean of that affective state. The SD, then, summarizes these deviations from the mean, with stronger deviations resulting in larger SDs. However, we argue that it is not the mean from which deviations are important but rather from the point at which individuals do not experience negative affect: Individuals experience episodes of negative affect and differ not only in how strongly they experience these episodes (M) but also in how often and how long they experience these episodes.
Figure 1 illustrates two time series from two participants from one of our datasets, in which anger was assessed on a scale ranging from 0 (not angry at all) to 10 (very angry). We argue that any deviation from 0 is emotionally relevant. Consequently, we defined an emotionally relevant episode as a run of consecutive values above zero that ends with the first observation at which anger is reported to be 0. The mean, then, reflects the total deviation from 0 or, figuratively speaking, the area of the “mountains.” Mean anger can be best understood as how strongly an individual experiences anger over a period of time. However, we can also derive other measures to describe the shape of the time series. The most basic measure is the number of “mountains,” that is, how many negative affect episodes an individual experiences over the course of a period. For example, the participant with the identifier 128 experienced an episode of anger in observations 5 to 7. Thus, the number of negative affect episodes reflects how strongly episodes fluctuate over time and, thus, can be understood as a measure of negative affect variability. Importantly, this measure is not conceptually dependent on mean negative affect: The individuals in Figure 1 show very similar levels of mean negative affect (1.60 vs. 1.53) but differ greatly regarding the number of episodes: The participant with the identifier 128 reported 9 anger episodes, whereas the participant with the identifier 38 reported only 3 episodes. Given that the number of negative affect episodes depends on the total number of observations, that is, a participant who completes more observations can report more episodes, it is best to standardize it on the total number of observations. This shows that participant 128 experienced a negative affect episode 22.5% of the time, whereas participant 38 experienced one only 7.9% of the time. However, the duration of negative affect episodes was more than twice as large on average for participant 38 compared to participant 128, M = 4.33 versus M = 1.88.

Figure 1. Time-series data from two participants, participant 128 (left panel) and participant 38 (right panel), from Dataset 3. Note. Time reflects the consecutive number of observations. A negative affect episode is defined by the deviation from 0, which corresponds to the participant not experiencing any anger. The episode ends when the participant reports an anger value of 0 again.
By introducing the number of negative affect episodes as an alternative measure of negative affect variability that should be less confounded with the mean, we wanted to investigate in our second research question how this measure is associated with negative affect variability as captured via the SD and RSD as well as with neuroticism.
Measurement issues in research on affect dynamics
Another issue that might limit the size of the associations between neuroticism and negative affect variability could be that prior research often used person-aggregated manifest variables to study the relationships. However, this approach is associated with several measurement issues. First, it does not allow accounting for (differences in) reliability. Reliability directly limits the maximal size of an observed correlation and, thus, using variables that contain greater measurement error reduces the dataset’s ability to uncover the true association between an affect measure and neuroticism. Specifically, the observed correlation is the “true” correlation multiplied by the square root of the product of the reliabilities of both variables (Spearman, 1904). Thus, a “true” correlation of .50 is reduced to .35 if both variables’ reliability is .70, which is widely regarded as an acceptable level of reliability (Kline, 2000).
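The attenuation relationship described above can be written as a short sketch. The function names are hypothetical; the formula is the classical attenuation formula cited in the text, and the example reproduces the .50 to .35 reduction for two reliabilities of .70.

```python
import math

def attenuated_r(true_r, rel_x, rel_y):
    """Observed correlation expected when both measures are unreliable."""
    return true_r * math.sqrt(rel_x * rel_y)

def disattenuated_r(observed_r, rel_x, rel_y):
    """Correlation corrected for unreliability in both measures."""
    return observed_r / math.sqrt(rel_x * rel_y)

# A "true" correlation of .50 with both reliabilities at .70
# is observed as approximately .35.
observed = attenuated_r(0.50, 0.70, 0.70)

# Applying the correction to the observed value recovers the "true" value.
recovered = disattenuated_r(observed, 0.70, 0.70)
```

The same functions show why unequal reliabilities matter when two predictors are compared: a measure with lower reliability is attenuated more strongly, even if its "true" association is the same.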
Importantly, prior research has reported lower reliabilities of negative affect variability as captured by the SD compared to mean negative affect. For negative affect, it has been shown that between-person reliability is very high, with RKF = .99 on average across 11 ESM datasets (Kalokerinos et al., 2020) or ω = .94 on average across five ESM datasets (Wenzel et al., 2021). However, the reporting and investigation of the psychometric properties of affect dynamic measures such as negative affect variability are inconsistent across studies. Whereas it is quite common to report reliabilities of mean affect, reliabilities for negative affect variability are often not reported. The exception is research with a dedicated focus on estimating SD reliability, which found reliability estimates ranging between .76 and .83 (Eid & Diener, 1999; Mejía et al., 2014). Moreover, a simulation study showed that reliabilities for mean negative affect are generally larger than for SD: Whereas a reliability of .80 for mean negative affect was achieved after 35 observations, 90 observations per individual were needed on average to obtain a reliability of .80 for SD (Estabrook et al., 2012). Given that only three out of the eleven datasets used by Kalokerinos et al. (2020) had a higher average number than 90 observations, correcting for the (un)reliabilities of mean negative affect and SD is important when comparing the contribution of both measures in explaining the variance in neuroticism. This is particularly important given that prior research has shown that the reliability of the affect measurement scale also impacts the reliability of affect dynamic measures (Du & Wang, 2018).
In the present research, we propose a novel approach of assessing the reliability of negative affect variability and affect dynamic measures in general, that also allows for correcting for their reliability. For this approach, we used dynamic structural equation modeling (DSEM) in Mplus (Asparouhov et al., 2018), which combines multilevel structural equation modeling with time-series features using Bayesian estimation (Hamaker et al., 2018). In this approach, negative affective states load onto a latent negative affect factor, whereas the SDs of each negative affective state are used as indicators for the latent negative affect variability factor. Then, both latent factors are used to predict neuroticism.
This approach addresses several measurement issues and accounts for differences in reliabilities. First, it allows multiple outcomes to be predicted simultaneously in a multilevel time series (Hamaker et al., 2018). In our case, it allows us to derive the SD of each affective state from the residual of the person intercept of the respective negative affective state, to use the SDs as indicators for the latent factor, and to predict neuroticism, all in one model. Such models have been called mixed-effects location scale models (e.g., Blozis et al., 2020) and allow the simultaneous analysis of stable (i.e., location, here M) and dynamic (i.e., scale, here SD) parameters in one model. Research utilizing such models has shown that neuroticism was significantly related to negative affect variability (Geukes et al., 2017), although neuroticism predicted both M and SD and, thus, the estimates were not controlled for the confound with the M. To do so, one would need both M and SD to predict neuroticism, although conceptually negative affect and its variability should be predicted by neuroticism.
A second advantage of this approach is that by computing the dynamic measure for each affective state and not for mean negative affect, the reliability can be estimated in terms of internal consistency, which is the common method of assessing the reliability of affect scales. Third, using the mean of manifest variables disregards heterogeneous item-construct relationships: Some negative affective states, such as anger, might be experienced more variably than others, such as worry, which cannot be accounted for when computing the mean across negative affect. Fourth, prior research first computed the SD and then used it as a predictor, which treats the SD as a perfectly reliably assessed observed variable in the prediction of neuroticism. However, since reliability is affected by the number of observations a measure is based on, disregarding this information yields underestimated standard errors and inflated Type-I errors (Liu et al., 2021). DSEM allows computing the SD as a latent variable, where the number of observations is taken into account, thereby leading to less biased estimates of person-level constructs (Lüdtke et al., 2008). Thus, in the third research question, we wanted to examine whether using a DSEM approach with latent variables to account for measurement error and other biases increased the relationship between neuroticism, mean negative affect, and negative affect variability (as captured by the SD, RSD, and the number of negative affect episodes).
Present research
Evidence regarding the role of negative affect variability underlying neuroticism has produced mixed results in the past (Kalokerinos et al., 2020; Wenzel & Kubiak, 2020). To advance this line of research, we (a) compared the SD and RSD in the prediction of neuroticism, (b) introduced and tested an alternative variability measure that should be less confounded with mean negative affect, and (c) examined whether the relationship between neuroticism and negative affect variability (as captured by the SD, RSD, and the number of negative affect episodes) increased substantially when their reliabilities were accounted for. To that end, we re-analyzed three ESM datasets, whose samples were originally recruited to test other research questions, and computed (dynamic) structural equation models to examine the reliability-corrected relationships between neuroticism, mean negative affect, and affect dynamic measures. None of the research questions or the respective analyses were pre-registered and, thus, the present research should be deemed exploratory in nature.
Method
Table 1. Overview of the three datasets.
Note. ESM = experience sampling method.
a The scales were divided by 10 and rounded down in order to offset the imprecision of the slider on a smartphone, where participants often accidentally selected values of 1 or 2 when they wanted to indicate the absence of negative emotion, and in order to ease calculation of the number of negative affect episodes.
Participants
Dataset 1
To achieve a power of 95% for the research questions of the parent study, 137 undergraduate students were recruited using a range of recruiting options (flyers, mailing lists, social networks, bulletins, and direct approaches). Eleven participants dropped out during the course of the study and one participant was excluded for completing fewer than 33% of the signals, leaving a final sample of N = 125 participants.
Dataset 2
The parent study took a pragmatic approach and did not plan the statistical power a priori. It was pre-registered that the data collection would stop once 90 dyads were enrolled in the study. Due to great interest in the study, this number was slightly exceeded by enrolling 92 dyads. Using bulletins, undergraduate students were recruited together with their romantic partners, with both receiving either partial course credit or 50€ (approx. US$ 55) as compensation. One dyad dropped out of the study after 1 day. Again, we excluded participants with fewer than 33% completed ambulatory signals (n = 7). The remaining sample size was n = 175 participants. Dataset 2 was collected to examine the relationship between mindfulness and static and dynamic affect measures and to study dynamic emotion regulation within dyads.
Dataset 3
The dataset was taken from an ongoing longitudinal study consisting of 135 healthy participants (February 2021), which was collected to examine the relationship between self-compassion and static and dynamic affect measures. Using flyers and personal letters, participants were recruited from two large longitudinal studies, in which individuals from the general population could participate who were at least 18 years old and spoke German fluently. The final compensation for participation was staggered, consisting of a basic amount and several bonuses for high adherence rates, yielding a maximum possible amount of 95€ (approx. US$ 113). Daily e-mail reminders were sent out to inform participants of their current signal compliance and the expected compensation. All participants with fewer than 33% of the signals (n = 5) were excluded, which resulted in a final sample size of n = 130 participants.
Sensitivity power analysis
Given that we re-analyzed existing datasets, these datasets were powered for other research questions. To estimate the post-hoc power regarding the relationship between affect measures and neuroticism, we used the metapower package in R (Griffin, 2021). As parameters, we entered a study size of N = 430, k = 3 studies, and a significance criterion of p = .005. Assuming a heterogeneity of either 0% or 50% yielded power estimates of 65.3% and 46.7% for a small effect of r = .10, 99.5% and 93.3% for a medium effect of r = .20, and 100% and 99.8% for a large effect of r = .30.
Procedure
Dataset 1
The parent study consisted of an ecological momentary assessment study with weekly laboratory sessions. Participants provided their written informed consent and completed several trait questionnaires in the first session, which are not relevant to the present study (a complete overview regarding the design, procedure, and material can be found in the study protocol; Rowland et al., 2016). Starting the next day and for 40 subsequent days, participants received six randomly distributed signals per day (M interval = 103.4 min, SD = 34.3 min) between 10 am and 8 pm. These signals served as prompts to complete items on their current affect, recent event intensity, and how they dealt with these events. In weekly laboratory sessions, all participants completed several questionnaires; participants in the mindfulness training condition also performed a computer-based guided-breathing meditation.
Dataset 2
Dataset 2’s procedure was similar to that of Dataset 1, but with a shorter, more intense ecological momentary assessment portion, which lasted 7 days. Participants received 12 signals in a 12-hr time frame and could choose one of three starting points (8, 9, or 10 am). To ensure that couples received the signals at the same time, we generated one set of signal times randomly for each day at the start of the data collection, with the condition that consecutive signals had to be at least 30 minutes apart (M interval = 58.3 min, SD = 20.6 min). All participants received the same set of randomized signals, which can be found at OSF (https://osf.io/957ew/?view_only=b8b429c907144e4592464ea74c2fb3c8).
Dataset 3
The parent study utilized a longitudinal design that included up to four assessment points for each participant, which were separated by 6 months. Each assessment point followed the same procedure: Participants gave written consent and received a study smartphone and instructions on how to use it in a first lab session. The day after this session, participants completed a 1-week experience sampling period using movisensXS, with 6 semi-random observations per day. Participants could choose a starting point (8, 9, or 10 am) and signals were distributed within a 12-hour window, with the condition that two signals were at least 60 minutes apart from each other (M interval = 111.5 min, SD = 44.4 min). At each signal, participants completed questions regarding their current momentary affective states. After the 1-week experience sampling period, participants either returned to the lab where they provided feedback in a semi-structured interview and returned the study smartphones or used their phones for the final session.
Measures
Table 2. Between-person means, standard deviations, between-person reliabilities, and meta-analytically derived mean zero-order associations.
Note. ω = McDonald’s ω, reflecting between-person reliability. Estimates above the diagonal are based on the manifest variables, estimates below the diagonal on the latent variables. Estimates in bold are significant at p < .005.
a Significant heterogeneity at p_Q < .05.
Analytic approach
The data were prepared in Stata 16 (College Station, TX, USA: StataCorp LP). Due to the multiple signals, the data were hierarchical in nature, with observations (level-1) nested within participants (level-2). To estimate the association using the standard approach with manifest variables, we first computed the variables in Stata 16. To compute mean negative affect, we first took the mean across the negative affect items for each observation (level-1) and then aggregated the means on the person level (level-2). To measure negative affect variability, we computed the standard deviation of mean negative affect for each participant, which yields a level-2 variable. To count the number of negative affect episodes, we identified the runs or spells in the time-series data where the respective negative affect items were not zero. We then summed up this number for each negative affective state, divided it by the total number of observations, and then took the mean across all negative affective states. Thus, a higher value indicated that the participant experienced a higher number of negative affect episodes relative to the total number of observations.
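The manifest person-level variables described above can be sketched in a few lines. The original computations were done in Stata; the Python below is only an illustration, the function names and the example ratings are hypothetical, and it assumes complete data in which all observations of a participant form one uninterrupted series (how missed signals or overnight gaps were treated is not specified here).

```python
from statistics import mean, stdev

def count_episodes(series):
    """Count runs of consecutive non-zero ratings in one item's time series."""
    episodes = 0
    in_episode = False
    for value in series:
        if value > 0 and not in_episode:
            episodes += 1        # a new episode starts
            in_episode = True
        elif value == 0:
            in_episode = False   # a rating of 0 ends the current episode
    return episodes

def person_summaries(items):
    """Compute M, SD, and the standardized episode count for one participant.

    items: dict mapping item name -> list of ratings (equal lengths,
    no missing values; both are assumptions of this sketch).
    """
    n_obs = len(next(iter(items.values())))
    # Level-1: mean across the negative affect items at each observation.
    momentary = [mean(vals[t] for vals in items.values()) for t in range(n_obs)]
    return {
        "M": mean(momentary),        # person mean of negative affect
        "SD": stdev(momentary),      # within-person standard deviation
        # Episodes per item, standardized on the number of observations,
        # then averaged across items.
        "NUM": mean(count_episodes(v) / n_obs for v in items.values()),
    }

# Hypothetical ratings for one participant on two items.
example = {
    "angry": [0, 2, 3, 0, 0, 1, 0, 0],
    "sad":   [0, 0, 1, 1, 1, 0, 0, 4],
}
summary = person_summaries(example)
```

In this example both items contain two episodes across eight observations, so the standardized episode count is 0.25 for each item and for their mean.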
We, then, computed SEMs in Stata 16, using the Satorra–Bentler scaled chi-squared test to account for the non-normality of the variables (Satorra & Bentler, 2001) and reporting the z-standardized estimates. Standardized effect sizes were interpreted with respect to the current guideline by Funder and Ozer (2019), where a standardized coefficient of β = .05, β = .10, β = .20, β = .30, and β = .40 constitutes a very small, small, medium, large, and very large effect, respectively. In all models, mean negative affect was included to control for the confound with the mean.
To estimate the latent association, we used DSEM in Mplus (Asparouhov et al., 2018) and the model types illustrated in Figure 2. In all models, we specified a measurement model for the latent mean negative affect factor on the between-person level (for example, in Dataset 3, using the four negative affect items “angry,” “anxious,” “depressed,” and “sad”) and for the latent neuroticism factor (using, for example, the 7 items from the BFI-44) (see Figure 2a). To examine negative affect variability, we specified a measurement part of negative affect variability with the residual of the person intercept of the respective negative affective state as its indicators (Figure 2b). The residual term is the deviation from the person mean of that negative affective state and, thus, represents the variability of that affective state around the person mean, that is, the variance that cannot be explained by the mean. Consequently, the latent variable of negative affect variability accounts for measurement error as well as adjusts for the fact that different negative affect items might reflect mean negative affect variability differently.

Figure 2. Schematic path diagrams of the multilevel dynamic structural equation models. Note. The latent neuroticism factor is predicted (a) by latent negative affect and (b) by negative affect variability using the standard deviation of each negative affective state, or (c) by the number of negative affect episodes. The means and most variances of the exogenous variables are not shown for the sake of clarity and to focus on the parameters that are of key interest.
For the DSEM, we used the Bayesian estimator and two MCMC chains as well as the default priors and the default Gibbs sampler, so that the results were driven by the data. Moreover, we used a thinning of 50 to reduce the autocorrelations between the iterations, and thus, only every 50th iteration of the estimation was retained. We did not use Mplus’ default convergence cut-off but set it to a value of .0025, so that model estimation would stop when the Potential Scale Reduction (PSR) fell below PSR = 1.005 instead of the default value of 1.100 (BCONVERGENCE = 0.0025 is multiplied by 2 in multivariate models in Mplus). This cut-off was based on prior research, which demonstrated that the default cut-off of Mplus is too high, but that the model properties did not further improve with stricter cut-offs below a PSR of 1.005 (Zitzmann & Hecht, 2019). Importantly, inspecting the autocorrelations and the trace plots of the models did not show any signs of misspecification: All autocorrelations were low (r < .20) and quickly moved to zero, all trace plots resembled what is often called a fat, hairy caterpillar, and all models reached a PSR value of 1.005 or lower at the end.
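To make the two convergence-related settings concrete, the following sketch shows thinning and a PSR computation on simulated chains. It uses the textbook Gelman-Rubin formulation as an assumption; the exact PSR computation in Mplus may differ in detail, and the function names, seed, and simulated draws are purely illustrative.

```python
import random
from statistics import mean, variance

def thin(chain, k):
    """Keep only every k-th draw; the remaining draws are discarded."""
    return chain[k - 1::k]

def psr(chains):
    """Potential scale reduction for equal-length chains of one parameter
    (textbook Gelman-Rubin form; an assumption, not the Mplus internals)."""
    n = len(chains[0])
    within = mean(variance(c) for c in chains)          # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])   # B: scaled variance of chain means
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

rng = random.Random(1)
# Two chains sampling the same distribution: PSR should be close to 1.
same = [[rng.gauss(0, 1) for _ in range(5000)] for _ in range(2)]
# Two chains stuck in different regions: PSR should be clearly above 1.
shifted = [[rng.gauss(0, 1) for _ in range(5000)],
           [rng.gauss(3, 1) for _ in range(5000)]]

psr_same = psr(same)
psr_shifted = psr(shifted)
kept = thin(list(range(100)), 50)  # retains the 50th and 100th draws only
```

A PSR near 1 indicates that the between-chain variation is negligible relative to the within-chain variation, which is the logic behind using a cut-off such as 1.005.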
After computing the models based on the individual datasets, we meta-analyzed the results across the datasets, using a restricted maximum likelihood meta-analysis with Fisher’s Z-transformed standardized coefficients. Given the exploratory nature of the present research, we set the threshold for defining statistical significance to p < .005, as recently recommended (Benjamin et al., 2018). In addition to the sensitivity power analysis described earlier, we also entered the meta-analytically derived mean effect size of the respective measure and the I² heterogeneity estimates of the respective relationship to report the post-hoc power estimates.
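The pooling step can be sketched as follows. Note that this sketch swaps in the closed-form DerSimonian-Laird estimator of the between-study variance instead of restricted maximum likelihood, which requires iterative estimation; it also treats the coefficients like correlations, with a sampling variance of 1/(n − 3) on the Fisher Z scale. The function names and the three coefficients are hypothetical; only the sample sizes are taken from the three datasets.

```python
import math

def fisher_z(r):
    """Fisher's Z transformation of a correlation-type coefficient."""
    return math.atanh(r)

def pool_random_effects(rs, ns):
    """Random-effects pooling on the Fisher Z scale.

    Assumption: DerSimonian-Laird tau^2 (not REML) and variance 1/(n - 3).
    """
    zs = [fisher_z(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]
    w = [1.0 / v for v in vs]
    fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    q = sum(wi * (zi - fixed) ** 2 for wi, zi in zip(w, zs))  # Cochran's Q
    df = len(rs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    w_star = [1.0 / (v + tau2) for v in vs]
    pooled_z = sum(wi * zi for wi, zi in zip(w_star, zs)) / sum(w_star)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"r": math.tanh(pooled_z), "tau2": tau2, "Q": q, "I2": i2}

# Hypothetical coefficients; sample sizes of Datasets 1 to 3.
result = pool_random_effects([0.15, 0.20, 0.25], [125, 175, 130])
# Identical coefficients imply no heterogeneity.
homogeneous = pool_random_effects([0.20, 0.20, 0.20], [125, 175, 130])
```

The pooled estimate is back-transformed to the correlation metric with the hyperbolic tangent, and I² expresses the share of the total variation that is attributed to heterogeneity rather than sampling error.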
Results
SD and RSD as suitable measures of negative affect variability
Prior research has demonstrated that the SD is confounded with the mean (Kalokerinos et al., 2020; Mestdagh et al., 2018). Indeed, Table 2 shows that the correlation in our datasets was very high, with a meta-analytically derived mean of r = .66, SE = .03, p < .001, 99.5% [.61, .72], I² = 0%, when using latent factors, with the individual estimates ranging from r = .64 to r = .69. Although we found very high correlations, all estimates were below r = .70, which has been considered an appropriate threshold above which issues of multi-collinearity arise (Dormann et al., 2013). However, the RSD, in turn, came with several issues of its own. First, in our three datasets, the latent variable correlation between mean negative affect and the RSD was negative and large, r = −.32, SE = .13, p = .013, 99.5% [−.61, .04], although it did not reach our conservative significance threshold of p < .005 (Table 2). Moreover, as in prior research (Wenzel & Kubiak, 2020), this association was highly and significantly heterogeneous, Q = 12.26, p = .002, I² = 86.4%. Thus, the RSD did not remove the confound with the mean in our three datasets but was instead negatively associated with mean negative affect.
Second, we examined the latent variable correlation with the number of negative affect episodes, which can be understood as an alternative measure of negative affect variability. As indicated in Table 2, we found a very large and significant association between the latent factor of the number of negative affect episodes and the latent factor of the SD, r = .67, but an association very close to zero with the latent factor of RSD, r = .03. Thus, standardizing the SD at the maximum SD at a given value did not only reverse the direction of the confound with the mean but also led to very low convergent validity.
Finally, in addition to our analyses in the present research, we re-analyzed the data provided by Kalokerinos et al. (2020). The results show that the manifest variable associations between the RSD and mean negative affect, which we calculated for each of the 11 datasets, correlated very highly with the associations between the RSD and neuroticism, r = .86, SE = .07, z = 13.11, p < .001, 95% CI [.73, .99]. This means that in datasets, where the RSD was highly positively correlated with neuroticism, it was also highly associated with mean negative affect. In turn, when computing the manifest variable association between the SD and neuroticism and controlling for mean negative affect, the individual associations only correlated modestly with the associations between the SD and mean negative affect, r = .22, SE = .22, z = 1.02, p = .310, 95% CI [−.20, .64].
Manifest associations between neuroticism and negative affect variability
Negative affect variability as captured via the SD and RSD
Figure 3 shows the individual and meta-analytical results for the relationship between neuroticism and negative affect variability, using the standard approach with manifest variables. We found a meta-analytically derived mean effect size of β = .42 for the manifest relationship between neuroticism and mean negative affect, indicating a very large effect size (Funder & Ozer, 2019). In turn, the manifest association between neuroticism and negative affect variability as captured by the SD was small-to-medium in size, β = .19. The mean effect size of manifest mean negative affect reduced to β = .33, SE = .08, p < .001, 99.5% [.11, .55], when the manifest SD was included in the model, which indicates that mean negative affect still explained more variance in neuroticism than negative affect variability as captured by the SD. Moreover, adding the SD to the model with mean negative affect only explained an additional variance of

Figure 3. Coefficient plots of the relationship between neuroticism and mean negative affect (M), negative affect variability (SD and RSD), and the number of negative affect episodes (NUM). Note. Dots represent the standardized coefficients, whiskers their 99.5% confidence interval. The results of the random-effects meta-analysis are depicted by the diamond and by showing the explained variance (R²) and the heterogeneity (I²) measures. The vertical line indicates a null relationship. Power = post-hoc statistical power. The power was estimated via the metapower package (Griffin, 2021) in R, using a study size of N = 430, a study number of k = 3, a significance criterion of p = .005, a product-moment correlation r as the effect size as well as the mean effect and I² value of the respective relationship.
However, the contribution of the RSD was even more limited: As illustrated in Figure 3, the coefficient of the manifest relationship between neuroticism and the RSD was only very small and did not explain additional variance above and beyond manifest mean negative affect.
Negative affect variability as captured via the number of negative affect episodes
In the second research question, we tested the number of negative affect episodes as an alternative measure of negative affect variability. First, unlike the SD and the RSD, the latent variable correlation between the number of negative affect episodes and mean negative affect was only medium in size, r = .22, and not significant (Table 2). Thus, the number of negative affect episodes was not substantially confounded with mean negative affect, thereby avoiding issues of multi-collinearity when examining its association with neuroticism.
Second, regarding the manifest variable associations, Figure 3 shows that the number of negative affect episodes was a significant predictor of neuroticism, β = .18, that explained R² = 3.4% additional variance in neuroticism above and beyond mean negative affect. Thus, individuals high in neuroticism reported more episodes of negative affect. Taken together, our results indicate that the number of negative affect episodes can be considered as a suitable measure of negative affect variability that is not substantially confounded with mean negative affect.
Latent associations between neuroticism and negative affect variability
Reliabilities
The reliabilities of all measures in the three datasets are summarized in Table 2. To derive a general measure of reliability across the three datasets, we meta-analyzed the individual reliability estimates via reliability generalization (Vacha-Haase et al., 2002). As illustrated in Table 2, using the negative affective states as indicators of a latent factor demonstrated good to very good reliabilities of all measures, with the highest estimate for mean negative affect, ω = .94, and the lowest estimate for the number of negative affect episodes, ω = .83. Importantly, although the difference between .94 for mean negative affect and .84 for negative affect variability as captured by the SD might seem very small, squaring these measures shows that the measurement of mean negative affect contained approximately 18 percentage points less unexplained variance than the measurement of negative affect variability as captured by the SD.
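The arithmetic behind this comparison can be checked in a few lines. The following is only an illustrative sketch: the two ω values are those reported in the text, while the variable names are our own.

```python
# Reliability estimates (omega) as reported in the text.
omega_mean = 0.94  # mean negative affect
omega_sd = 0.84    # negative affect variability as captured by the SD

# Square both estimates, as described in the text.
squared_mean = omega_mean ** 2
squared_sd = omega_sd ** 2

# Express the gap between the squared estimates in percentage points.
diff_points = (squared_mean - squared_sd) * 100

print(f"squared omega (mean): {squared_mean:.4f}")
print(f"squared omega (SD):   {squared_sd:.4f}")
print(f"difference:           {diff_points:.1f} percentage points")
```

The seemingly small gap of .10 between the two reliabilities thus grows to roughly 18 percentage points once the estimates are squared.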
Negative affect variability as captured via the SD and RSD
When using the DSEM approach with latent variables to account for measurement error and other biases, we found that the mean effect size of mean negative affect only slightly increased to β = .44, which can be interpreted as a very large effect size. In turn, the mean effect size of latent negative affect variability as captured by the SD increased substantially, β = .27, in comparison to the effect size found in the SEM using manifest variables, and can be interpreted as a medium-to-large effect size (Funder & Ozer, 2019). When the latent SD was added to the model, the mean effect size of latent mean negative affect reduced from β = .44 to β = .27, SE = .09, p = .003, 99.5% CI [.09, .43], which was very close to the estimate for latent negative affect variability in the same model, β = .27. Importantly, adding the latent SD to the model explained substantial variance in neuroticism,
We also ran the same models using the RSD by computing the RSD for each negative affective state and included them as indicators for the latent negative affect variability factor (similar to the model depicted in Figure 2c). However, using the DSEM approach to account for measurement error and other biases did not reveal a significant contribution of the RSD in predicting neuroticism (Figure 3), even when controlling for mean negative affect, which was at odds with the results for the SD.
Negative affect variability as captured via the number of negative affect episodes
Finally, we examined the associations between the latent factors of neuroticism and the number of negative affect episodes. Figure 3 shows that the number of negative affect episodes was significantly associated with neuroticism and explained as much additional variance in neuroticism above and beyond mean negative affect, R² = 5.4%, as did the SD. Thus, our results indicate that participants with high levels in neuroticism experienced a higher number of negative affect episodes during the study period, which added substantial information in predicting neuroticism.
Discussion
In the present research, we re-analyzed three ESM datasets to (a) continue the ongoing debate about the utility of the SD and the RSD in capturing affective variability and in predicting neuroticism, (b) investigate the number of negative affect episodes as an alternative variability measure, and (c) assess the associations of static and dynamic affect measures with neuroticism free of measurement error. We found that the SD and our alternative measure of the number of negative affect episodes but not the RSD added substantial variance in explaining neuroticism differences between individuals. Moreover, these associations increased considerably when correcting for measurement error and other biases using the DSEM approach with latent variables. In the following sections, we discuss these results in more detail.
The standard deviation should be preferred over the relative standard deviation as a measure of affect variability
Although mean negative affect correlated highly with the SD in our datasets, it is important to note that the correlations in all three datasets were below the recommended cut-off of r = .70 (Dormann et al., 2013). However, given that issues of multi-collinearity can also arise below this cut-off (Dormann et al., 2013), it is important to find ways to detect possible issues of this confound and to come up with appropriate corrections. To that end, prior research has proposed the RSD as an alternative measure that statistically corrects for the confound with the mean. However, our results showed that the RSD is also problematic given that (a) it demonstrated a large negative latent association with mean negative affect, (b) it was not significantly related to the other measures of negative affect variability (SD and the number of negative affect episodes), and (c) the association of the RSD with mean negative affect was highly correlated with its association with neuroticism, indicating that in datasets where the RSD was highly positively correlated with mean negative affect, it was also highly correlated with neuroticism. Taken together, these results indicate that the RSD of negative affect did not consistently control for the confound with mean negative affect. Moreover, the remaining association was strongly associated with the association of interest (i.e., the association between variability and neuroticism), and the RSD was, unlike the SD, no longer associated with the number of negative affect episodes. Thus, we recommend using the SD instead of the RSD. However, when the RSD is used to control for the non-linear dependency between M and SD, we argue that it is still important to include both the RSD and M in a model, as we did in our prior research using the RSD (Wenzel & Kubiak, 2020).
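To make the logic of a mean-corrected variability index concrete, the following minimal sketch divides a person's SD by an upper bound on the SD that is attainable given the person's mean and the scale limits. Note the assumptions: it swaps in the Bhatia-Davis bound on the population variance, (max − M)(M − min), which is a simplification and not the exact finite-sample maximum on which the published RSD is based. The function name, the scale limits, and the example ratings are hypothetical.

```python
import math
import statistics
from typing import Sequence


def approx_relative_sd(ratings: Sequence[float],
                       scale_min: float,
                       scale_max: float) -> float:
    """SD divided by an upper bound on the SD given the mean.

    ASSUMPTION: uses the Bhatia-Davis bound on the population variance
    of a bounded variable as a stand-in for the exact maximum used by
    the published RSD.
    """
    mean = statistics.fmean(ratings)
    # Largest possible population variance for this mean and these bounds.
    max_var = (scale_max - mean) * (mean - scale_min)
    if max_var <= 0:
        # Mean sits at a scale boundary: no variability is possible.
        return float("nan")
    # Population SD, to be consistent with the population-variance bound.
    sd = statistics.pstdev(ratings)
    return sd / math.sqrt(max_var)


# Hypothetical ratings on a 0-100 scale.
low_mean = [0, 0, 0, 100]   # all mass at the scale ends: maximal variability
flat = [20, 20, 20, 20]     # no variability at all
print(approx_relative_sd(low_mean, 0, 100))
print(approx_relative_sd(flat, 0, 100))
```

Because the denominator shrinks as the mean approaches a scale boundary, any index of this kind remains a function of the mean, which is one reason to keep M in the model alongside it.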
Finally, as illustrated in Figure 3, the post-hoc statistical power of our analyses was very high, except for the RSD. While the post-hoc power for the neuroticism-RSD relationship was very low, it is important to note that the sensitivity analysis pertained to all analyses, including the RSD. Thus, we had enough power to detect small-to-medium sized relationships but not enough power to detect relationships that were very small and close to zero. However, we do not view the null results as a problem of our sample size in the present research but instead as a relatively robust estimate of the effect size of the neuroticism-RSD relationship, which was very small and close to zero.
The number of negative affect episodes might also be a suitable measure of affective variability
Given the issues associated with the RSD, testing alternative measures of negative affect variability that are less confounded with mean negative affect is important to further examine the relationship between neuroticism and negative affect variability free of potential issues of multi-collinearity. In the present research, we proposed the number of negative affect episodes as an alternative measure for capturing negative affect variability. Indeed, our results demonstrated that this measure explained as much additional variance in neuroticism above and beyond mean negative affect as the SD but was less confounded with the mean on average. Thus, these results show that the association of negative affect variability with neuroticism might not be driven by the SD’s confound with the mean. Moreover, given that the SD is relatively difficult to interpret and given that it correlated highly with the number of negative affect episodes, the number of negative affect episodes provides a measure of variability that is easier to interpret: Highly neurotic individuals not only experience stronger negative affect in general compared to less neurotic individuals but also more, and relatively short, negative affect episodes.
However, using the number of episodes also comes with some disadvantages that future research could address. First, the number of episodes depends on the interval between ESM signals, which makes it difficult to compare the scores across datasets. However, the different spacing of the signals in the present analyses (around 2 hours in Datasets 1 and 3, and only 1 hour in Dataset 2) did not substantially impact the results. Second, the measure assumes that two consecutive observations, where, for example, anger occurred, pertain to the same episode, although they could also reflect two separate events that provoked anger. To circumvent this problem, future research should capture eliciting events more elaborately, so that one could tie affective episodes to events and track the impact of emotionally relevant events over time.
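The counting rule described above, in which consecutive observations of the same emotion are treated as one episode, can be sketched as follows. This is a minimal illustration and not the exact operationalization used in our analyses: the function name, the threshold for counting an emotion as present, and the choice to let a missed signal end an episode are all assumptions made for the example.

```python
from typing import Optional, Sequence


def count_episodes(ratings: Sequence[Optional[float]],
                   threshold: float = 0.0) -> int:
    """Count runs of consecutive observations with a rating above threshold.

    ASSUMPTIONS: an emotion counts as present when rating > threshold,
    and a missing observation (None) ends the current episode.
    """
    episodes = 0
    in_episode = False
    for rating in ratings:
        present = rating is not None and rating > threshold
        if present and not in_episode:
            # A new run of above-threshold observations starts here.
            episodes += 1
        in_episode = present
    return episodes


# Hypothetical anger ratings across ten consecutive ESM signals.
anger = [0, 2, 3, 0, 0, 1, 0, 4, 4, 4]
print(count_episodes(anger))
```

In this hypothetical series, the ratings at signals 2 and 3 form one episode, signal 6 a second, and signals 8 to 10 a third. The sketch also makes both disadvantages visible: a denser signal schedule could split or merge runs, and two separate anger-provoking events at adjacent signals would be counted as a single episode.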
Correcting for the reliability of affect dynamic measures
Another implication of our results is that studies aimed at investigating affect dynamic measures, such as negative affect variability, should adopt a standard practice of estimating and reporting the reliability of these measures (Parsons et al., 2019). For negative affect variability, we propose computing the SD for each emotion or the number of negative affect episodes and, then, using these measures as indicators in an SEM. Using this approach, we found that all affect variability measures could be reliably assessed in all three datasets. Thus, using, for example, the SDs as indicators did not only connect well to prior research efforts examining reliability of SDs but also yielded greater reliabilities: Whereas a prior simulation study showed that the SD based on mean affect needed on average 90 observations per individual to obtain a reliability of .80 (Estabrook et al., 2012), our approach demonstrated good reliabilities around ω = .85 based on an average of 58 and 39 observations per participant. Thus, negative affect variability might be reliably assessed based on the number of observations of common experience sampling studies when using a latent factor approach.
The second implication is that our approach also allows correcting for the (un)reliability of these measures and for other biases, such as heterogeneous item-construct relationships or disregarding the number of within-person observations when using person-aggregated predictors (e.g., Brose et al., 2021). When using manifest variables in a two-step approach, as was done in prior research (Kalokerinos et al., 2020), we found a significantly smaller coefficient for negative affect variability compared to mean negative affect, which was similar in size to the RSD-neuroticism relationship when controlled for mean negative affect (Wenzel & Kubiak, 2020). When using latent variables in a one-step approach via DSEM, we found that the significant difference between the predictors of mean negative affect and negative affect variability reduced to nearly zero and became non-significant, with both effects being medium in size (Funder & Ozer, 2019). Together with the increase in explained variance of approximately 5% when negative affect variability was additionally entered into the model, based either on the SD or on the number of negative affect episodes, we can conclude that negative affect variability provides substantial and unique information about neuroticism above and beyond mean negative affect when correcting for the reliabilities of the measures.
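Why unreliability shrinks a manifest association can be illustrated with the classical correction for attenuation. This is a deliberately simpler technique than the latent-variable DSEM approach used here and is shown only to convey the intuition; the function name and all numeric values in the example are hypothetical and are not taken from our results.

```python
import math


def disattenuate(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Classical (Spearman) correction for attenuation.

    Divides an observed correlation by the square root of the product
    of the two reliabilities. NOTE: a textbook stand-in, not the DSEM
    estimation used in the analyses.
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_observed / math.sqrt(rel_x * rel_y)


# Hypothetical values: an observed correlation of .30 between a predictor
# measured with reliability .64 and an outcome measured without error.
corrected = disattenuate(0.30, rel_x=0.64, rel_y=1.0)
print(round(corrected, 3))
```

The less reliable a predictor, the more its manifest coefficient understates the underlying association, so correcting for reliability benefits the variability measures more than the more reliably measured mean.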
The congruence of personality states and traits
Adding about 5% explained variance might sound too low to count negative affect variability as an important feature of neuroticism. However, the added explained variance of
Limitations
First, it is important to note that our findings are based on three nonclinical samples. Dataset 1 consisted mostly of assessments from female undergraduate psychology students, Dataset 2 was more balanced regarding the gender of the participants, but at least one half still consisted of students of psychology while the other half were their partners, and Dataset 3 involved an adult sample. Given that we found similar effects across all datasets, we can rule out that the observed effects were specific to students. However, the observed associations may have been even larger in individuals who carry a psychological diagnosis or experience some other kind of crisis, given that negative affect ratings tend to be rather low in nonclinical samples, which is known to restrict the observed affect variability (Hisler et al., 2020). For example, in a recent study that was conducted during the COVID-19 pandemic, the researchers found that neuroticism was indeed uniquely associated with affect variability (Kroencke et al., 2020), while other studies that neither corrected for measurement errors nor examined individuals during a crisis did not (Hisler et al., 2020; Kalokerinos et al., 2020). Based on these findings, the authors concluded that individuals high in neuroticism may have experienced more variable negative affect during a crisis (Kroencke et al., 2020). Thus, although we think that negative affect variability represents an important aspect of neuroticism, which may be particularly pronounced in stressful situations and can even be observed in a nonclinical sample when correcting for measurement errors, future research may also examine affect variability in relation to neuroticism in clinical samples while correcting for measurement error. Understanding what individuals high in neuroticism feel, and how they feel it, would give further hints on how to provide person-tailored interventions for them.
Second, our affect assessments were at least 30 to 60 minutes apart from each other. Therefore, we may have missed short-term changes in affect over only a few minutes or even seconds, which may also be indicative of neuroticism. Future research may assess emotions more frequently and directly after an emotionally relevant event has occurred in order to represent fine-grained emotional experiences and their variability more accurately in relation to neuroticism.
Third, based on our data, we cannot draw any conclusions about whether experiencing affect variability leads individuals to report high neuroticism characteristics or whether high neuroticism levels result in more variable affective experiences. In a recent study, borderline personality trait characteristics prospectively predicted affect variability 1 year later, but not the other way around (Houben & Kuppens, 2020). However, in another study, negative affect reactivity predicted neuroticism and not vice versa (Wrzus et al., 2021). Thus, future research may also examine whether a more negative and unstable self-concept as reported via neuroticism scales may manifest in more variable affective experiences over time.
Fourth, we used different emotion items to assess negative affect as well as two different questionnaires to assess neuroticism in our datasets, which could also impact the relationship between neuroticism and negative affect variability. Although three datasets are relatively few to arrive at robust conclusions regarding the presence of homogeneity, our results did not indicate any heterogeneity in the individual associations between neuroticism and negative affect variability. Nevertheless, future research could also turn an eye to the neuroticism questionnaires and examine how well these instruments are constructed to capture meaningful variation in negative affect.
Conclusion
Previous research challenged the notion that negative affect variability is an important aspect of neuroticism above and beyond mean negative affect (Kalokerinos et al., 2020). However, this research is based on manifest variables, which differ with regard to their reliability. By using a novel modeling approach for affect variability, we found that negative affect variability explained substantial variance in neuroticism above and beyond mean negative affect. This highlights the importance of modeling and correcting for measurement errors and other biases in a DSEM framework when examining affect dynamics. Using this approach, our results indicate that negative affect variability represents a unique aspect of neuroticism.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by internal grants of the Leibniz Institute for Resilience Research and the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) Grant CRC 1193 C04 (to O.T.).
