Abstract
Although ecological momentary assessment (EMA) provides an opportunity for the examination of intervention mechanisms in real time, there are few validated tools to assess key treatment mechanisms in EMA studies. Our focus in this study is a potentially critical treatment mechanism, improvement in self-efficacy for managing negative emotions. We examined the psychometric properties of the Patient-Reported Outcomes Measurement Information System (PROMIS) self-efficacy for managing negative emotions scale measured via EMA. Participants (n = 145 college students) in a clinical trial of Dialectical Behavior Therapy skills completed four daily EMAs for 6 weeks (13,153 total responses). Results indicated (a) excellent internal consistency and good factor structure, (b) sufficient variability at both the between- and the within-persons levels, and (c) strong construct and predictive validity. This study supports the validity of an EMA measure of self-efficacy for managing negative emotions that can be used in real time, both in intervention studies and in observational research.
Introduction
The idea that it is important to understand the mechanisms of how a psychotherapeutic intervention confers change (Kazdin, 2007) is not a new one (Frank, 1974; Rosenzweig, 1936). Modern frameworks to test the mechanisms by which psychotherapy leads to improved outcomes have existed for the past 20 years (Kazdin, 2009; Kraemer et al., 2002), but the critical need for psychometrically robust measures of putative causal mechanisms has been bolstered in recent years by the U.S. National Institute of Mental Health’s (NIMH) experimental therapeutics approach (Insel & Gogtay, 2014). Despite the existence of methods to test mechanisms of treatment effectiveness, and the funding support to do so, there is a paucity of research that successfully identifies mechanisms (Cuijpers et al., 2019).
This is especially true in studies that employ mobile interventions, where change can occur over smaller temporal intervals (hours, days) than traditional measures (e.g., weekly, monthly) are able to capture. Recent advances in mobile interventions, in which therapeutic content (i.e., ecological momentary intervention) and/or assessments of the effectiveness of that content (i.e., ecological momentary assessment [EMA]) are delivered in real time on a smartphone or other internet-enabled device, allow researchers to examine mechanisms of treatment over briefer durations (Domhardt et al., 2021; Graham et al., 2019). Specifically, EMA can measure self-efficacy for managing negative emotions in real-world, naturalistic contexts (e.g., the ability to manage negative emotions can be measured while participants are actually experiencing negative emotions in their daily lives).
One constraint for the use of EMA to test treatment mechanisms, however, is a lack of validated measures to do so. Accordingly, the goal of this study is to establish the validity of an EMA measure of one potential mechanism of treatment effects: self-efficacy for managing negative emotions. We focused on adapting an existing measure of self-efficacy for managing negative emotions from the Patient-Reported Outcomes Measurement Information System (PROMIS) measures (Gruber-Baldini et al., 2017) for use in repeated, real-time assessments like EMA.
Self-efficacy for managing negative emotions is a form of self-efficacy (Bandura, 1977) that focuses on one’s “confidence to manage/control symptoms of anxiety, depression, helplessness” (Gruber-Baldini et al., 2017, p. 1917). Changes in self-efficacy for managing negative emotions or related broader constructs (e.g., coping self-efficacy; Schwarzer & Renner, 2000) occur naturalistically over the course of the lifespan (Carstensen et al., 2011; Scheibe & Carstensen, 2010) and also precede improved outcomes in many psychotherapeutic interventions, including mindfulness (Luberto et al., 2014), distress tolerance within Dialectical Behavior Therapy (Lynch et al., 2006), several components of the Unified Protocol for Emotional Disorders (Wilamowska et al., 2010), Hope Boxes for suicide risk (Denneson et al., 2019), and exposure therapy for agoraphobia (Williams et al., 1989). It makes logical sense that improvements in self-efficacy for managing negative emotions would precede improvements in outcomes of skills-based treatments. Research suggests that the belief that one can use a skill to achieve a desired goal (e.g., manage negative emotion, in this case) is key to the successful use of that skill (Maddux, 1995; Maddux & Kleiman, 2016), and skill use is associated with better outcomes in Dialectical Behavior Therapy (DBT) (Neacsiu et al., 2010). Self-efficacy for managing negative emotions is an ideal candidate for adaptation into an EMA measure because it is particularly relevant when high levels of negative emotions are experienced in the moment, yet it remains relevant when levels of negative emotions are lower. Thus, we examined the items from the PROMIS four-item short form of the self-efficacy for managing negative emotions scale as a measure of perceptions of self-efficacy for managing negative emotions in the present moment. This will allow researchers to assess this construct precisely at the moment that it is most relevant, that is, while participants are experiencing negative emotions.
The Present Study
The present study had three aims focused on evaluating an EMA measure of self-efficacy for managing negative emotions. The first aim was to establish the within-person and between-person factor structure of a four-item scale of self-efficacy for managing negative emotions. These analyses are important to determine whether the items represent a unitary construct, both within (i.e., observation-to-observation differences from an individual’s mean score) and between (i.e., factor structure using each participant’s average score) individuals.
The second aim was to characterize the variability in self-efficacy for managing negative emotions. Here, we addressed whether, and the extent to which, variability in self-efficacy for managing emotions is due to person-to-person versus moment-to-moment variability. This allows us to identify whether people can be distinguished from others based on their average self-efficacy for managing emotions (i.e., between-person variability) and how much people vary around their own averages (i.e., within-person variability).
The third aim was to assess the validity of responses to the four-item scale of self-efficacy for managing negative emotions. We explored predictive validity by examining whether self-efficacy for managing negative emotions predicted two relevant outcomes also assessed via EMA: ability to tolerate emotions and reflections on how well one handled emotions during the day. We hypothesized that greater momentary self-efficacy for managing emotions would be temporally associated with more tolerable emotions at the next assessment and with higher ratings of how well emotions were handled during the day.
Method
Information from this article, our prior article from this study (Rizvi et al., 2022) which does not overlap scientifically with this one, and our ClinicalTrials.gov record (NCT04558411), provide context on how we determined our sample size, all data exclusions, all manipulations, and all measures in the study. Briefly, our sample size (n = 145) was determined based on the original randomized controlled trial and our desire to detect approximately medium effects in the intervention (d = .55 or larger) with EMA compliance as low as 75%.
Participants
Participants in this study were 145 college students recruited as part of a larger randomized clinical trial testing the effect of a smartphone-based animated intervention teaching Dialectical Behavior Therapy (DBT) skills during the COVID-19 pandemic. Participants were randomized to receive the intervention and EMA (n = 92) or EMA only (n = 53; see Rizvi et al., 2022 for more details). We include participants from both arms and control for intervention group where relevant. The only inclusion criteria were: (a) owning a compatible iOS or Android smartphone, (b) being 18 years or older, (c) being a matriculated undergraduate student at the university (a large university in the Northeastern United States) where the study was occurring during the Fall 2020 semester when the data were collected, and (d) currently residing in the United States (classes were remote that semester). In addition, in the present article, we included only participants who provided at least one EMA response on all variables used in the analyses presented here; thus, we excluded eight participants from the original study (n = 7 from intervention group and n = 1 from control group). Otherwise, we included all participants who were in the full study.
Procedure
Recruitment, Consent, and Baseline
Participants were recruited from university listservs and social media channels for a study on “how to effectively manage the many challenges as an undergraduate in the COVID-19 era.” Those who were interested completed a brief eligibility screener. If eligible, potential participants were emailed a consent form, a brief set of baseline measures, and information on how to install the EMA app (MetricWire) on their phone to begin the smartphone-based EMA monitoring period.
EMA Period
Beginning the day after the baseline assessment, participants began to receive four brief surveys per day on their smartphones delivered at random times within prespecified windows. These surveys assessed a variety of constructs in the present moment (e.g., affect and self-efficacy for managing negative emotion). The last survey of the day was a slightly longer assessment that included questions asking participants to reflect on an entire day. After 2 weeks, two-thirds of the participants were randomized to receive a video teaching DBT skills each day for 14 days (in addition to the EMA they were already completing). The other group continued to only complete EMA. Otherwise, both groups had identical EMA procedures for the full 6-week EMA period. Participants could earn up to US$0.25 per survey or a total of US$42 for the EMA portion (participants could earn an additional US$18 through weekly surveys included in the study but not used in this manuscript).
Measures
Baseline
Participants completed a demographics screener and a variety of baseline assessments not relevant to the present study.
Momentary Self-Efficacy for Managing Emotions
Each EMA survey included a momentary assessment of self-efficacy for managing negative emotions, using the items from the PROMIS Self-Efficacy for Managing Emotions Scale (Gruber-Baldini et al., 2017). The measure asked participants to rate, at that moment using a 1 (I am not at all confident) to 5 (I am very confident) scale, the following four items: (a) I can bounce back from disappointment, (b) I can avoid feeling discouraged, (c) I can find ways to manage stress, (d) I can handle negative feelings. In the initial test of this scale (Gruber-Baldini et al., 2017), a 27-item full measure as well as four- and eight-item short forms were found to be valid. Given the importance of minimizing participant burden in EMA studies, we selected the shorter four-item version as the basis for our EMA adaptation. Our items were identical to those in the initial measure. The only change was the explicit mention of assessment in the present moment.
Momentary Intolerance of Emotions
Intolerance of emotions was also assessed at each EMA survey. Specifically, we asked participants to rate their ability to tolerate emotions in the present moment by providing the stem “The feelings I’m having right now are:” with a 1 (not at all unbearable) to 5 (extremely unbearable) scale.
Momentary Negative Emotion
At each EMA survey, we asked participants how negative they were feeling (“Generally, how negative you feel right now”) on a 0 (not at all) to 6 (very much) scale.
Nightly Rating of Handling Negative Emotions During the Day
Each night, participants were asked to reflect on how well they handled emotions (“How well did you manage your painful feelings and impulses today?”) on a 0 (not well at all) to 5 (very well) scale.
Analytic Strategy
Aim 1: Factor Structure
We conducted a multilevel confirmatory factor analysis (CFA) in R using the lavaan package (Rosseel, 2012) that simultaneously assessed the between- and within-person factor structure of the measure. We assessed model fit using the comparative fit index (CFI, larger is better, especially > .95) and the root mean square error of approximation (RMSEA, smaller is better, especially <.08; Schreiber et al., 2006). We also estimated McDonald’s omega (ω) using the semTools package (Jorgensen et al., 2021). Omega is a preferable metric of internal consistency to Cronbach’s alpha for many reasons, most notably that it does not assume uncorrelated errors (Hayes & Coutts, 2020). It is generally interpreted in a similar way to Cronbach’s alpha (higher is better, with >.8 being an arbitrary cutoff for strong internal consistency). Finally, we calculated the interitem correlation among the self-efficacy for managing emotions items using the psych R package (Revelle, 2017).
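As a rough illustration of the omega coefficient described above, the following Python sketch computes omega in its simplest single-factor form: the squared sum of the loadings divided by that quantity plus the sum of the residual variances. It is not the semTools multilevel routine used in the study. It ignores residual covariances and the two-level structure, and the loadings shown are hypothetical values chosen for illustration, not the study's estimates.

```python
def mcdonald_omega(loadings, residual_variances=None):
    """Omega for a single-factor model with the factor variance fixed to 1.

    If residual variances are not supplied, the loadings are assumed to be
    standardized, so each residual variance is 1 - loading**2.
    """
    if residual_variances is None:
        residual_variances = [1.0 - l ** 2 for l in loadings]
    common = sum(loadings) ** 2  # variance attributable to the common factor
    return common / (common + sum(residual_variances))


# Hypothetical standardized loadings for a four-item scale (illustration only).
example_loadings = [0.80, 0.78, 0.82, 0.79]
omega_example = mcdonald_omega(example_loadings)
print(round(omega_example, 3))
```

Under this simplified formula, stronger loadings raise omega because the common-factor variance grows relative to the residual variance.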
Across all three metrics (CFA, omega, correlations), we estimated both between- and within-person effects. Within-person effects reflect observation-to-observation changes within a person, capturing the deviation between any given response and an individual’s average response. Between-person effects represent the CFA, omega, or correlations using each person’s average score, capturing the deviation of an individual’s average score from the sample’s average score. This strategy allowed us to decompose the between- and within-person variance in the factor structure, reliability, and correlations, helping us better understand the nature of what this measure captures.
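The between/within distinction can be made concrete with a small sketch. The Python code below splits each response into a person mean (the between-person part) and a deviation from that mean (the within-person part). This is manual person-mean centering on hypothetical data, shown only to illustrate the idea. The function name and the data layout are assumptions. The study's models instead used latent variable centering within lavaan.

```python
from collections import defaultdict
from statistics import mean


def decompose(records):
    """Split scores into person means (between) and deviations (within).

    records: list of (person_id, score) tuples.
    Returns (person_means, deviations); deviations align with records.
    """
    by_person = defaultdict(list)
    for pid, score in records:
        by_person[pid].append(score)
    person_means = {pid: mean(scores) for pid, scores in by_person.items()}
    deviations = [score - person_means[pid] for pid, score in records]
    return person_means, deviations


# Hypothetical responses on the 1-5 scale for two participants.
records = [("p1", 4), ("p1", 5), ("p1", 3), ("p2", 2), ("p2", 2), ("p2", 1)]
person_means, deviations = decompose(records)
print(person_means)
print(deviations)
```

Within each person the deviations sum to zero, so the within-person scores carry no information about who tends to score higher overall.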
Aim 2: Characterizing Variability
To characterize variability, we calculated two statistics. First, we calculated intraclass correlation (ICC) using the ICC R package (Wolak et al., 2012). ICC captures the amount of variability due to between-person vs. within-person (i.e., day-to-day) changes. Higher values indicate more variability from person to person than within person. Second, we assessed the root mean square of successive differences (RMSSD; von Neumann et al., 1941) for each person using the psych package. This captures how much a measure varies over time, where higher scores equal more “jagged” variability. RMSSD is on the same scale as the variable it is measuring (e.g., an RMSSD of 0.5 on a 1–5 scale means on average, a response at time T differed from the response at time T+1 by ± 0.5).
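To make the two variability statistics concrete, here is a minimal Python sketch. It assumes a one-way random-effects ICC estimated from ANOVA mean squares with an effective group size for unbalanced data, and an RMSSD defined as the square root of the mean squared successive difference. The exact estimators implemented in the ICC and psych R packages may differ in detail, and the data below are hypothetical.

```python
from math import sqrt


def icc_oneway(groups):
    """One-way ICC; groups maps each person to a list of scores."""
    a = len(groups)
    n_total = sum(len(v) for v in groups.values())
    grand = sum(sum(v) for v in groups.values()) / n_total
    ss_between = sum(len(v) * (sum(v) / len(v) - grand) ** 2
                     for v in groups.values())
    ss_within = sum((x - sum(v) / len(v)) ** 2
                    for v in groups.values() for x in v)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (n_total - a)
    # Effective number of observations per person when group sizes differ.
    k0 = (n_total - sum(len(v) ** 2 for v in groups.values()) / n_total) / (a - 1)
    var_between = (ms_between - ms_within) / k0
    return var_between / (var_between + ms_within)


def rmssd(series):
    """Root mean square of successive differences for one person's series."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical data: two participants, three responses each.
groups = {"p1": [4, 5, 3], "p2": [2, 2, 1]}
print(icc_oneway(groups))
print({pid: rmssd(scores) for pid, scores in groups.items()})
```

In this sketch the ICC is computed once for the whole sample, whereas RMSSD is computed separately for each person and then summarized across people.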
Aim 3: Validity
To assess predictive validity, we conducted three multilevel structural equation models, building on the CFA (i.e., measurement model) established in Aim 1. The first model was concerned with the temporal association with intolerance of emotions. We used all available data for which we had two consecutive time points measured within the same day (since we were not concerned with effects overnight or over periods longer than a day). In this model, the outcome was intolerance of emotions at time T+1. The predictors were study arm (i.e., experimental vs. control), intolerance of emotions at time T, momentary negative emotions at time T, and self-efficacy for managing emotions at time T. The second model was concerned with changes in negative emotions over time. In this model, the outcome was negative emotions at time T+1. The predictors were study arm (i.e., experimental vs. control), momentary negative emotions at time T, and self-efficacy for managing emotions at time T. The third model was concerned with participants’ ratings of how well they controlled emotions at the end of the day. This model used the nightly rating of how well emotions were handled as the dependent variable. The predictors were the study arm, daily negative emotions, and the daily average of assessments of self-efficacy for managing emotions. We used the daily average of self-efficacy for managing negative emotions instead of the momentary variables here because our outcome variable was on a daily level. Retaining the momentary ratings alongside a daily-level outcome would have added a third level (observations within days within people), and thus we aggregated the momentary ratings to the daily level. In this model, the measurement portion consisted of the daily average of each indicator of the latent self-efficacy for managing negative emotions variable.
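The restriction to consecutive time points within the same day can be sketched as a simple pairing step. The Python code below is an illustration under stated assumptions, not the authors' data pipeline: the field names (`person`, `day`, `prompt`) are hypothetical, and "consecutive" is taken to mean adjacent scheduled prompts, so that a skipped prompt breaks the pair.

```python
def same_day_pairs(observations):
    """Return (obs_T, obs_T_plus_1) pairs for adjacent prompts within a day.

    observations: list of dicts with hypothetical keys 'person', 'day',
    and 'prompt' (1-4), plus any measured variables.
    """
    ordered = sorted(observations,
                     key=lambda o: (o["person"], o["day"], o["prompt"]))
    pairs = []
    for prev, nxt in zip(ordered, ordered[1:]):
        same_person_day = (prev["person"] == nxt["person"]
                           and prev["day"] == nxt["day"])
        adjacent = nxt["prompt"] - prev["prompt"] == 1
        if same_person_day and adjacent:
            pairs.append((prev, nxt))
    return pairs


# Hypothetical responses: prompt 3 on day 1 was missed, and day 2 has one response.
observations = [
    {"person": "p1", "day": 1, "prompt": 1, "self_efficacy": 3.5},
    {"person": "p1", "day": 1, "prompt": 2, "self_efficacy": 4.0},
    {"person": "p1", "day": 1, "prompt": 4, "self_efficacy": 3.0},
    {"person": "p1", "day": 2, "prompt": 1, "self_efficacy": 4.5},
]
pairs = same_day_pairs(observations)
print(len(pairs))
```

Pairs never span midnight, which matches the stated aim of excluding overnight effects. Whether a missed prompt should break a pair is a design choice that the text does not specify.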
We controlled for negative emotions at time T in each model because we were interested in how well self-efficacy for managing negative emotions predicted intolerance of emotions and how well one handled their emotions above and beyond what they were currently feeling. We used latent variable centering in each model so that we could decompose the effects that occurred at the between- vs. within-person level.
Separating Observation- and Person-Level Contributions
Across our aims, we conducted analyses that allowed us to separate the contributions due to observation-to-observation variability from the contributions due to person-level variability. Observation-to-observation variability quantifies how a construct varies in everyday life (a strength of EMA) and what factors are temporally related to it. Repeated observations can also be useful in aggregate to provide a nuanced understanding of person-level processes. For example, we could compare those with more versus less stable reports of self-efficacy to manage negative emotions.
Choice of Estimator
In latent variable (i.e., CFA and structural equation modeling [SEM]) and path modeling, there are a variety of estimation methods, called estimators, that offer different ways to estimate the free parameters (e.g., covariances and regression weights) in the model. We initially used the default estimator in lavaan, the maximum likelihood estimator, and then switched to the maximum likelihood robust (MLR) estimator. Because we used two different estimators, we report the model fit statistics from both. We switched to the MLR estimator for two reasons.
The first reason is that MLR produces standard errors that are robust to nonindependent indicators (Yuan & Bentler, 1998). As we note later, the indicators in our CFA were highly correlated at the person level. Second, some estimators, like the MLR estimator used here, involve the full information maximum likelihood (FIML) approach. This estimating approach allows use of all available data on a case-by-case basis, avoiding listwise deletion when there are missing data in an individual observation. Simulation studies have shown that FIML methods like MLR perform similarly to multiple imputation methods. Moreover, FIML methods produce estimates similar to the estimates that would appear if complete data were available (Lee & Shi, 2021). Given that we had a small amount of missing data at the individual case level (2.01% of all responses had at least one missing value) that was found to be missing completely at random (Little’s MCAR test: χ² = 12.61, df = 16, p = .701), this method was appropriate.
Results
Demographics
The mean age of the participants was 20.76 years (SD = 2.66 years, range = 18.07–37.67 years). Regarding gender, 79.72% of the sample identified as cisgender female, 15.38% identified as cisgender male, 1.40% identified as transgender male, and 2.10% identified as non-binary or other gender. Regarding race, 50.35% of the sample identified as White, 48.25% identified as Asian, 6.29% identified as Black or African American, and the remainder identified as more than one race or another race. Regarding ethnicity, 14.69% of the sample was Hispanic/Latinx. Race and ethnicity breakdowns are representative of the undergraduate student body at this university.
Survey Compliance Rates
The 145 participants completed a total of 13,153 observations (M = 90.71 observations/person, SD = 51.97; range: 2 to 176). This yielded an effective compliance rate of 53.9%. There were a total of 2,918 nightly surveys (M = 20.12, SD = 13.13, range: 1 to 42) that assessed all study constructs plus the daily report of how well one handled emotions during the day. Consistent with the larger study (Rizvi et al., 2022), we found higher compliance in the first four weeks of the study than in the last two weeks. The average compliance rate was 57.8% (SD = 33.5%) in the first four weeks compared with 36.5% (SD = 32.1%) in the last two weeks (t = 8.46, df = 144, p < .001). Given that the primary analysis in this paper, the CFA, involved aggregating momentary-level data to the day-level, the rate at which participants gave at least one response on any given day is relevant. Participants completed a survey on 31.03 out of 42 days on average (SD = 13.44 days). At least one survey was completed on 73.9% of all days.
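The compliance figures above follow from simple arithmetic, sketched below in Python. The sketch assumes that every participant was scheduled for four prompts on each of the 42 days. That denominator is an assumption made here for illustration, since the text does not state how the rate was computed.

```python
participants = 145
days = 42            # 6-week EMA period
prompts_per_day = 4
completed = 13_153   # total EMA responses reported

scheduled = participants * days * prompts_per_day  # assumed denominator
compliance = completed / scheduled
mean_per_person = completed / participants
share_of_days = 31.03 / days  # mean days with at least one response

print(f"scheduled prompts: {scheduled}")
print(f"overall compliance: {compliance:.2%}")
print(f"mean responses per person: {mean_per_person:.2f}")
print(f"share of days with a response: {share_of_days:.1%}")
```

Under this assumption the overall rate comes out at roughly 54%, which agrees with the reported 53.9% up to rounding. The per-person mean and the share of days with at least one response match the reported 90.71 and 73.9%.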
Descriptive Statistics
Table 1 shows the intercorrelations among the individual items that made up the self-efficacy for managing negative emotions composite. Although all correlations were significant at p < .001, the within-person correlations were smaller (r_within ranged from .59 to .71) than the between-person correlations (r_between ranged from .86 to .94). The descriptive statistics for the other variables used in our models are as follows: intolerance of emotion, M = 1.62 (SD = 0.88), ICC = 0.39 (95% confidence interval [CI] = [0.34, 0.46]); negative emotion, M = 1.23 (SD = 1.39), ICC = 0.45 (95% CI = [0.39, 0.52]); ratings of how well one managed emotions during the day, M = 3.17 (SD = 1.44), ICC = 0.37 (95% CI = [0.31, 0.43]).
Table 1. Intercorrelations Among Individual Items, Reliability if Item Removed, and Sample-Wide Descriptives.
Note. All correlations significant at p < .001. CI = confidence interval.
Aim 1: Factor Structure
Figure 1 shows the results of the CFA. The factor structure had a poor fit for the non-robust estimator (RMSEA = .089 [95% CI = .082, .096], CFI = .984) but had a far better fit using the robust MLR estimator (RMSEA = .047 [95% CI = .043, .051], CFI = .976). The loadings were quite strong, ranging from .764 to .834 in the within-person model and from .926 to .967 in the between-person model. The overall reliability (i.e., omega; ω) was excellent for both the within-person (ω = .874) and the between-person (ω = .974) models. Table 1 also shows the ω for the scale if each item was removed. The removal of any one item did not meaningfully change the within-person (ω ranged from .827 to .849) or between-person (ω ranged from .961 to .971) reliability.

Figure 1. Multilevel CFA of Momentary Assessment of Self-Efficacy for Managing Negative Emotions.
Aim 2: Characterizing Variability
Figure 2 shows variability over the study period for each participant (one line per participant). The ICC was .701 (95% CI = .652, .750), indicating about 70% of the variability occurred from person-to-person. The individual item ICCs ranged from .634 to .659 (see the rightmost column in Table 1 for 95% CIs). The average RMSSD across participants on the self-efficacy for managing emotions average score was 0.58 (SD = 0.30), suggesting that overall self-efficacy for managing negative emotion mean scores, which could have ranged from 1 to 5, differed from assessment to assessment by 0.58 on average.

Figure 2. Participant-Level Variability in Self-Efficacy to Manage Negative Emotion Across the Study Period.
Aim 3: Validity
Figure 3 shows the model testing the temporal association between momentary self-efficacy for managing emotions and intolerance of emotions. This model had a poor fit for the non-robust estimator, RMSEA = .118 (95%CI= .115, .121), CFI = .877, but had better fit when using the MLR robust estimator (RMSEA = .077, 95%CI= [.075, .079], CFI = .824). Self-efficacy for managing negative emotions was negatively associated with intolerance of emotions at the within-person (i.e., momentary) level but not at the between-person level.

Figure 3. Models Predicting Intolerance of Emotions.
Figure 4 shows the model testing the temporal association between momentary self-efficacy for managing emotions and negative emotions. This model had a good fit for both the non-robust estimator (RMSEA = .043, 95% CI = [.039, .047], CFI = .987) and the MLR robust estimator (RMSEA = .029, 95% CI = [.027, .032], CFI = .981). Self-efficacy for managing negative emotions was negatively associated with negative emotions at both the within- and between-person levels.

Figure 4. Models Predicting Negative Emotions.
Figure 5 shows the model of the association between momentary self-efficacy for managing emotions (aggregated to the daily level) and nightly ratings of how well emotions were handled during the day. This model had a good fit for both the non-robust estimator (RMSEA = .049, 95% CI = [.042, .056], CFI = .984) and the robust MLR estimator (RMSEA = .035, 95% CI = [.030, .041], CFI = .980). Self-efficacy for managing negative emotions was positively associated with ratings of how well emotions were handled during the day at both the day-to-day and person levels.

Figure 5. Models Predicting How Well Emotions Were Handled Over the Past Day.
Discussion
Recent advances in real-time assessment now allow researchers to examine the engagement of putative intervention mechanisms as they occur, such as improved self-efficacy for managing emotions. This provides critically important information about mechanisms associated with improved outcomes that were not available in prior intervention studies. One issue that has slowed progress in this domain of experimental therapeutics is the lack of validated measures of self-efficacy for managing emotions that can be assessed in real time. The goal of the present study was to begin to address this issue by providing initial evaluation of a momentary assessment of self-efficacy for managing negative emotions. This assessment is brief and could be used in a multitude of intervention designs, thus addressing a major gap in intervention outcome research and in research on the natural history of emotions and emotion regulation (Carstensen et al., 2011; Scheibe & Carstensen, 2010).
Our first aim was to examine the between- and within-person factor structure of the measure. We found good factor structure in both the between- and the within-person models. No single item contributed most strongly to the factor structure and removing any single item did not appear to substantially deteriorate the factor structure. This finding suggests that when participant assessment burden is a concern, researchers could likely use three (or possibly fewer) items in this scale. The second aim was to characterize the variability in the measure. We found that although there was more variability from person to person than from assessment to assessment, there was still substantial variability that can only be captured using repeated assessments. Our third aim was to test the validity of this measure. We found that self-efficacy for managing negative emotions was prospectively associated with intolerance of emotions at the next time point and overall daily ratings of how well emotions were handled during the day. Both analyses showing predictive validity showed the effect above and beyond negative emotion itself, suggesting that self-efficacy for managing negative emotion is a unique construct above the experience of emotion. Beyond providing evidence of predictive validity, these analyses were also useful because they support the importance of repeated assessments of self-efficacy to manage negative emotions. Specifically, the participant average and momentary ratings accounted for nontrivial variance in both models.
A final important note from the validity findings is that self-efficacy to manage negative emotions was negatively associated with the momentary severity of negative emotions. This finding could suggest that stronger negative emotions are harder to manage than less intense negative emotions, potentially leading participants to feel less efficacious about their ability to manage them. This idea is in line with other EMA studies assessing general self-efficacy (Veilleux et al., 2021) and self-efficacy related to smoking cessation (Hoeppner et al., 2014).
Several unanswered questions should be addressed in future research. First, this sample, although part of a clinical trial, was an undergraduate sample in a relatively “light touch” intervention, and participants were not recruited based on the presence or severity of psychopathology. Thus, the factor structure might differ among participants with severe psychopathology in more intensive treatment. Future work is needed to evaluate the measure in clinical samples and those characterized by elevated distress. Second, we did not include the trait-level PROMIS assessment of self-efficacy for managing negative emotions, so we cannot evaluate the associations between the trait and momentary measures. Related to this, we included only single-item measures of negative affect, tolerability of emotions, and ratings of how well one handled emotions. Future work should examine predictive validity with more nuanced, multi-item measures. Third, the optimal frequency for assessing this construct is unknown. Given the overall ICC of approximately .70, it seems that there is useful variability to be found when assessing this construct at a frequent interval, but it may not need to be assessed as frequently as it was in our study (i.e., up to four times a day). It may be useful for future studies to attempt to determine how often this construct should be assessed, which may vary based on a variety of factors, including how often the treatment is expected to engage this mechanism.
In conclusion, there are at least two ways in which the information from this study relates to broader literature. First, this study provides support for a measure of a construct likely relevant to a broad range of intervention research and joins only a small handful of other EMA assessments of potential mechanisms of treatment effects (e.g., Veilleux et al., 2018). Self-efficacy for managing negative emotions, especially when assessed in the presence of negative emotions (which this measure is well-suited to do), is particularly relevant to many forms of psychotherapy because it addresses the reality that many psychotherapies aim to improve the ability to bear, manage, and tolerate negative emotion rather than eliminating the experience of emotions altogether. For example, Dialectical Behavior Therapy does not focus on reducing negative emotions but rather focuses on functional engagement in life even when negative emotions are present (Lynch et al., 2006). Second, this assessment of self-efficacy for managing negative emotions may be useful beyond treatment studies and could be included in observational studies of emotion processes and in studies of risk factors for suicidal thinking (Liu et al., 2020) and eating disorders (MacNeil et al., 2012). EMA is quite commonly used in these fields (see Engel et al., 2016; Kleiman & Nock, 2018 for reviews), making this validated assessment particularly useful for studies of eating disorders and suicide risk as well as potentially many other fields.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Rutgers Center for COVID-19 Response and Pandemic Preparedness.
