Abstract
Experience sampling methodologies are likely to play an important role in advancing our understanding of momentary influences on aggression, including short-term antecedent psychological states and situations. In this study, we evaluate whether a newly developed experience sampling measure of aggression, the Aggression Experience Sampler (Aggression-ES), provides a valid and reliable measure of aggression in experience sampling contexts. Participants were a convenience sample of 23 young adults recruited from the local university community. Data were collected using an experience sampling smartphone application over 8 days and analyzed using multilevel structural equation modeling. Our results support the within- and between-person reliability and the criterion validity of the Aggression-ES. The Aggression-ES represents a good choice of measure for use in experience sampling studies of aggression. Further work in other samples will help to provide further validity evidence for the measure.
Experience sampling refers to collecting information about thoughts, emotions, behavior, events, and context in the flow of participants’ daily lives (Bolger & Laurenceau, 2013). Compared with traditional questionnaire measures, experience sampling can minimize retrospective recall bias and improve reliability by collecting multiple datapoints per item. Because the data are collected in real-life contexts, it is also argued to provide more ecologically valid data (e.g., Telford, McCarthy-Jones, Corcoran, & Rowse, 2012). Arguably, experience sampling is the only method currently available that can feasibly collect information on a large scale relating to individual behavior, emotions, cognitions, and experiences at the time and in the context in which they occur.
A popular method of experience sampling is to use brief questionnaire measures administered via participants’ own smartphones, with participants prompted to complete the measures around 3 to 7 times a day (e.g., Hofmann & Patel, 2015). Within psychological and behavioral research, the importance of this method of experience sampling is growing. From a practical perspective, recent increases in smartphone ownership (e.g., Torous, Friedman, & Keshavan, 2014) make it a cost-effective and minimally intrusive means of probing the daily lives of participants. It is estimated that around 90% of people carry their smartphones with them at all times (Rainie & Zickuhr, 2015) with smartphone use embedded within the daily routines of young adults (e.g., Bakker, Kazantzis, Rickwood, & Rickard, 2016). Numerous previous experience sampling studies tackling a range of topics have provided evidence for the feasibility and utility of the method. This includes evidence of high response rates, good participant compliance with protocols, and data that are reliable and accurate (e.g., Hofmann & Patel, 2015; Moskowitz & Young, 2006; Myin-Germeys et al., 2009).
Currently, a wide choice of experience sampling applications is available with varying levels of customization possible, meaning that experience sampling can be adapted to a broad range of research applications (e.g., Thai & Page-Gould, 2018). The questionnaire measures in experience sampling can also be combined with a range of other data sources collected simultaneously, such as biological, movement, or location data. For example, within the aggression and criminology field, a small number of studies have used experience sampling to map areas of high fear of crime (e.g., Solymosi, Bowers, & Fujiyama, 2015). Given these advantages, experience sampling has to date been used to investigate a diversity of research questions related to, for example, aggression, mental health, media use, relationships, and substance use (e.g., DeWall, Lambert, Pond, Kashdan, & Fincham, 2012; Hofmann & Patel, 2015; Shiffman, 2009). Experience sampling is also being used to deliver and evaluate the efficacy of interventions to improve mental and physical health (e.g., Fishbach & Hofmann, 2015; Kramer et al., 2014). It is easy to envisage similar uses of experience sampling in delivering and collecting outcome data in the context of interventions for interpersonal violence.
From a theoretical perspective, smartphone-based experience sampling yields data that can speak to the interaction between situations and traits and thus holds significant potential for testing and advancing theory in aggression and crime research. Contemporary theories of crime and aggression such as the I-cubed theory, situational action theory, the general aggression model, and the general theory of crime all acknowledge the importance of the combined influence of psychological states and situations in creating risk for aggressive and criminal acts (Anderson & Bushman, 2002; Finkel et al., 2012; Gottfredson & Hirschi, 1990; Wikström & Treiber, 2009). Empirically, experience sampling studies have shown significant initial promise, revealing, for example, how events in the social environment (e.g., provocations) and momentary internal emotional states (e.g., gratitude or curiosity) affect aggressive motivations and behavior (e.g., DeWall et al., 2012; Kashdan et al., 2013). Experience sampling also uniquely allows researchers to gain insights into risk factors such as emotional lability (the variability in emotional states over a series of days) or synchrony (the extent to which two things tend to co-occur; for example, provocation and anger) that cannot be calculated from conventional single-occasion survey data (e.g., Mejía, Hooker, Ram, Pham, & Metoyer, 2014). It can thus provide an operationalization of a whole new class of risk factors for crime and aggression.
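To make the two derived risk factors concrete, the following minimal Python sketch computes them for a single participant. It assumes that lability is operationalized as the within-person standard deviation of repeated ratings and synchrony as the within-person Pearson correlation between two repeated series; other operationalizations are possible, and the function names and ratings are hypothetical illustrations rather than data from any study.

```python
from math import sqrt
from statistics import mean, stdev


def lability(states):
    """Within-person standard deviation of one person's repeated ratings."""
    return stdev(states)


def synchrony(x, y):
    """Within-person Pearson correlation between two repeated series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical ratings for one participant across six prompts (illustrative only).
anger = [1, 3, 2, 5, 1, 4]
provocation = [1, 2, 2, 4, 1, 3]

print("lability of anger:", round(lability(anger), 2))
print("anger-provocation synchrony:", round(synchrony(anger, provocation), 2))
```

Both quantities require repeated measurements from the same person, which is why a single-occasion questionnaire cannot supply them.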
Experience sampling data collected over longer time spans also have the potential to test emerging theories about how day-to-day events, thoughts, emotions, and behavior coalesce into long-term trait changes. The TESSERA framework (which stands for “Triggering situations, Expectancies, States/State Expressions, and Reactions”), for example, articulates the basic components of daily experience and behavior that result in patterns of stability and change in personality over developmental time (e.g., Wrzus & Roberts, 2017). Interactions across developmental and momentary timescales are also core to several explicitly “multi-timeframe” theories of development with relevance for the development of aggression including dynamic systems theory (e.g., Granic & Patterson, 2006) and trait-state perspectives (e.g., Edmondson et al., 2013).
However, for the potential benefits of experience sampling to be realized in crime, aggression, and violence research, it is critically important that the measures used in experience sampling are developed and validated with the same rigor as is applied to traditional trait measures. This is even more important given the necessity for brevity of measures in experience sampling studies to minimize participant burden and ensure acceptable levels of retention. This need for brevity creates a greater impetus to ensure that the items selected for use in experience sampling are optimized in terms of reliability and validity.
High reliability is especially important when experience sampling measures are being used to operationalize traits involving variance or covariance over time (e.g., emotional lability; Mejía et al., 2014). There have, however, been very few psychometric studies of the measures utilized in experience sampling studies (see, for example, Carlson et al., 2016; Edmondson et al., 2013, for exceptions). Many studies use new measures that have undergone practically no prior validation or have adapted preexisting trait measures. However, because trait measures were not designed to capture momentary experiences, there is no guarantee that, even with adaptation, they function as intended in experience sampling studies that seek to capture “real-time” processes. In this study, we therefore present a set of measures developed specifically for the purpose of aggression experience sampling research and validated in a pilot sample.
Method
Ethics
Ethical approval for the study was obtained from the Institute of Criminology, University of Cambridge ethics review board. All participants provided full informed consent to participate in the study.
Measures
To develop our experience sampling measure, we began by defining our construct (“aggression”), and then generated a large number of items conforming to the definition of this construct, drawing on the theoretical and empirical literature pertaining to aggression and its antecedents. Aggression is commonly defined as a behavior directed toward another that is carried out with an immediate intent to cause harm and where the victim is motivated to avoid the harm (Anderson & Bushman, 2002). Interviews with 10 volunteer participants were used to identify any issues with items, for example, lack of clarity or variance in item interpretations. No major issues with item wording or clarity were identified in this pretest phase. Items that interviewees perceived to be redundant with one another were candidates for elimination. The small set of items that project team members with expertise in aggression judged to have the highest content validity was administered in the experience sampling survey described below. Team members first reviewed the items independently and then discussed them to reach a consensus on which items should be selected for the survey. The data from the survey were used to evaluate reliability and validity of the items.
Experience sampling measures were developed for our focal construct of aggression (12 items), and also for provocations (eight items) to provide a measure of criterion validity. These items are provided in Appendices A and B. Based on past theory and evidence, we predicted that provocation would be strongly related to aggression (e.g., Anderson & Bushman, 2002). Aggression items were selected to cover both reactive and proactive aggression, as well as both physical and social/indirect forms of aggression, reflecting contemporary theory. We focused on more common manifestations of aggression, because serious aggression is a rare event unlikely to occur in experience sampling timeframes. We also aimed to identify items that had the greatest universal applicability, as our goal was to develop a core of items that could be used validly across demographic and cultural groups.
The aggression items that were selected for the experience sampling survey are provided in Table 1. All were measured on a 5-point scale from very slightly or not at all to extremely. In addition, the negative affect items of the Positive Affect Negative Affect Schedule–Expanded Form (PANAS-X; Watson & Clark, 1999) were administered to provide a measure of hostile affective state. Again, hostile affective state would be expected to be associated with aggression based on past evidence of an association between aggression and hostility, especially reactive aggression (e.g., Ramirez & Andreu, 2006). The reliability and validity of the PANAS-X and the original Positive Affect Negative Affect Schedule (PANAS; which has the same format but a smaller number of items) have been supported across a large number of studies, and these measures have been widely used in past experience sampling studies (e.g., Crawford & Henry, 2004; DeWall et al., 2012). In addition, context at the time of responding to the experience sampling questionnaires was measured using items adapted with permission from Juslin and Västfjäll (2008). Alcohol use was measured using an item adapted with permission from Dimeff, Baer, Kivlahan, and Marlatt (1999). These measures are not analyzed here but make the overall questionnaire more representative of the kinds of experience sampling surveys that may be conducted in crime and aggression research.
Table 1. Descriptive Statistics for Aggression Measures.
Participants were also administered an intake questionnaire at the beginning of the survey to collect data for sample characterization purposes. This measure included demographic, mental health, and victimization questions. An end of survey questionnaire collected information on their mental health and crime victimization during the survey, and their subjective experiences of burden resulting from participation in the survey.
Participants
Twenty-three participants were recruited from the local university community and via social media to complete the experience sampling survey. Our sample thus represents a convenience sample. Participants were eligible to take part if they had access to a smartphone running the iOS or Android operating system. It is estimated that smartphone penetration is 85% among adults (aged 16-75 years) in the United Kingdom (Deloitte, 2017), with iOS and Android together accounting for 99.6% of the market share of smartphones (Thai & Page-Gould, 2018).
Participants were offered prize draw entry as an incentive to take part. Participants had a median age of 22 years, ranging from 20 to 29 years, and were all resident in the United Kingdom at the time of the survey. Due to a technical fault, gender was not recorded by the experience sampling application. In terms of ethnicity, 65% identified as White, 9% as mixed/multiple ethnicities, 17% as Asian/British Asian, 4% as Black/African/Caribbean/Black British, and 4% as “Other ethnic group.” In terms of self-rated health, 35% rated their health as “excellent,” 43% as “very good,” 13% as “good,” and 9% as “fair.” No participants rated their health as “poor.” In terms of mental health, 52% reported that in general they feel worried, tense, or anxious, while 17% reported that in general they feel sad, blue, or depressed. Twenty-six percent of the sample reported that they had ever been a victim of a crime.
Procedure
After providing some basic demographic, health, and well-being information in an intake questionnaire, participants completed the experience sampling survey. The experience sampling survey lasted 8 days and involved answering survey questions when prompted 5 times a day. Prompts were scheduled at random times between 9:00 a.m. and 9:00 p.m. each day. This time window was selected because previous research has suggested that collecting data outside of normal “office hours” can yield additional insights (Kramer et al., 2014). However, this had to be balanced against concerns about participant burden and the intrusiveness of measures. Therefore, early morning and night-time periods were excluded from the survey. The survey was implemented using a smartphone application provided by LifeData LLC, which participants downloaded and used on their own smartphones. The first page in the application reminded participants of study information and collected informed consent. As the smartphone application is available on both iOS and Android, the vast majority of smartphone users would in principle be able to take part. At the end of the survey, participants were invited to complete another brief survey on their smartphones. This survey also asked about general health and well-being, as well as crime victimization and their experiences of participating in the study; however, only 12 participants completed this.
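As an illustration of the prompting protocol, the sketch below draws five random prompt times per day within the 9:00 a.m. to 9:00 p.m. window for 8 days. It is a simplified assumption about how such a schedule could be generated, not a description of the algorithm used by the smartphone application; the function name, the start date, and the absence of a minimum gap between prompts are all hypothetical choices.

```python
import random
from datetime import datetime, timedelta


def draw_prompt_times(start_date, n_days=8, prompts_per_day=5,
                      window_start_hour=9, window_end_hour=21, seed=None):
    """Return one sorted list of random prompt times per study day."""
    rng = random.Random(seed)
    window_minutes = (window_end_hour - window_start_hour) * 60
    schedule = []
    for day in range(n_days):
        day_start = start_date + timedelta(days=day, hours=window_start_hour)
        # Sample distinct minute offsets within the window, then sort them.
        offsets = sorted(rng.sample(range(window_minutes), prompts_per_day))
        schedule.append([day_start + timedelta(minutes=m) for m in offsets])
    return schedule


# Hypothetical start date, used only to make the example runnable.
schedule = draw_prompt_times(datetime(2018, 1, 1), seed=1)
for day_number, prompts in enumerate(schedule, start=1):
    print(day_number, [t.strftime("%H:%M") for t in prompts])
```

In practice, a researcher might additionally enforce a minimum interval between consecutive prompts to limit participant burden.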
Statistical Procedure
Scale reliability was assessed by computing multilevel omega following the method outlined by Geldhof, Preacher, and Zyphur (2014). Omega is similar to Cronbach’s alpha but does not assume tau equivalence and is, therefore, usually more appropriate than alpha. Here, a multilevel confirmatory factor analysis (CFA) is fit to disentangle between-person and within-person factor structure. Loadings from the CFA are then used to compute alpha and omega. Further validity evidence was provided by computing the within-person level associations between aggression, provocations, and hostile affective state. We focused on these criterion associations rather than other possible criterion associations (e.g., substance use, context, company, other affective states) because hostility and provocation could be assumed to show sufficient within- and between-person variation over the time frame of the study and to be among the most proximal causes of aggression (e.g., Anderson & Bushman, 2002). All models were estimated in Mplus 7.0 using robust maximum likelihood estimation (MLR; Muthén & Muthén, 2012). In using robust maximum likelihood estimation, we assume that item distributions approximate a continuous distribution. This is typically justifiable if items have at least five response categories (Rhemtulla, Brosseau-Liard, & Savalei, 2012). We did this in preference to using an ordered categorical estimator (weighted least squares means and variances; WLSMV in Mplus) because robust maximum likelihood estimation can better handle missing data. Maximum likelihood estimation yields unbiased estimates when data are missing at random (MAR), whereas weighted least squares means and variances estimation relies on pairwise deletion, which produces unbiased parameter estimates only if data are missing completely at random (MCAR), an unrealistic assumption for most applications.
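The omega computation at a single level can be sketched as follows. The sketch assumes the standard composite reliability formula, in which omega is the squared sum of the loadings (scaled by the factor variance) divided by that quantity plus the summed residual variances, with twice any residual covariances added to the error term. The function name and the loadings and residual variances shown are hypothetical and are not estimates from the current study; in a multilevel application the same function would be applied separately to the within-person and between-person estimates.

```python
def composite_omega(loadings, residual_variances,
                    residual_covariances=(), factor_variance=1.0):
    """Omega for a unit-weighted sum score at one level of a factor model."""
    # Variance in the sum score attributable to the common factor.
    true_var = factor_variance * sum(loadings) ** 2
    # Error variance; each residual covariance enters the sum score twice.
    error_var = sum(residual_variances) + 2 * sum(residual_covariances)
    return true_var / (true_var + error_var)


# Hypothetical within-person estimates for a three-item factor (illustrative only).
example_loadings = [0.6, 0.7, 0.5]
example_residuals = [0.64, 0.51, 0.75]

print(round(composite_omega(example_loadings, example_residuals), 3))
```

Passing a positive residual covariance lowers the resulting omega, because covariance between items that is not explained by the factor is treated as error in the sum score.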
Results
Descriptive Statistics
Data collected in the prompts were more than 80% complete (17%-18% missingness for the aggression items) with 68% of responses provided within 20 min of the prompt (minimum = 1 min, maximum = 508 min). Item Ns, means, and standard deviations across prompts and participants for the newly developed measures are provided in Table 1 along with the intraclass correlation coefficients. The means and standard deviations of the aggression items suggested that most of the aggressive behaviors included in the study were relatively uncommon, especially physical acts of aggression. Furthermore, intraclass correlation coefficients suggest that the majority of variation occurred within rather than between people; however, at the item level, within-person variance includes both reliable and error variance.
Reliability
A multilevel CFA with aggression specified as unidimensional at both the between-person and within-person level initially fit poorly. Fit was improved by specifying separate physical and nonphysical aggression factors at the within-person level. However, these two factors were highly correlated at the between-person level (r = .93); thus, a two-factor model was specified at the within-person level and a one-factor model was specified at the between-person level. In addition, it was necessary to include a residual covariance between Items 7 and 8 to model their excess covariance at both the within-person and between-person level. This model fit well according to the root mean square error of approximation (RMSEA = .058) but poorly according to the Tucker–Lewis Index (TLI = .80) and the comparative fit index (CFI = .84), probably reflecting the lack of model parsimony due to a need to model two levels (between- and within-person) and the fact that we assumed that items were continuous when in fact they included only five response options. Given that all standardized factor loadings were >.50 (25% overlapping variance and “fair” to “good” according to the Comrey & Lee, 1992, classification), that RMSEA was <.08, and that no large modification indices remained, we judged this model as acceptable for the purposes of estimating reliability.
Omega at the within-person level was .79 and .76 for the verbal/social aggression factor and physical aggression factor, respectively. Omega at the between-person level was .89. The within-person correlation matrix between hostility, provocation, and aggression is provided in Table 2. These correlations indicate that participants were more likely to express verbal/social aggression when in a more hostile affective state and when provoked. However, they were no more likely to express physical aggression in these circumstances. The between-person correlation matrix for hostility, provocation, and aggression is provided in Table 3. These correlations indicate that individuals who tend to be provoked more often also tend to behave aggressively more often. However, they suggest that those individuals who experience hostile states more often/intensely have no greater tendency toward aggression than those who experience these affective states less often/intensely.
Table 2. Within-Person Correlation Matrix.
p < .05.
Table 3. Between-Person Correlation Matrix.
p < .05.
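The distinction between within-person and between-person correlations can be illustrated with a simple observed-score approximation. The sketch below centers each participant's ratings on that participant's own mean and correlates the pooled deviations (within-person), and separately correlates the person means (between-person). This is a simplification that swaps in person-mean centering for the latent decomposition performed by multilevel structural equation modeling, so it would not reproduce the reported estimates; the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def decompose(records):
    """Return (within-person r, between-person r) from person-mean centering.

    `records` maps a participant id to a pair of equal-length rating lists.
    """
    within_x, within_y, means_x, means_y = [], [], [], []
    for xs, ys in records.values():
        mx, my = mean(xs), mean(ys)
        means_x.append(mx)
        means_y.append(my)
        # Deviations from the person's own mean carry the within-person signal.
        within_x.extend(x - mx for x in xs)
        within_y.extend(y - my for y in ys)
    return pearson(within_x, within_y), pearson(means_x, means_y)


# Hypothetical (hostility, aggression) ratings for three participants.
records = {
    "p1": ([1, 2, 3], [1, 1, 2]),
    "p2": ([2, 3, 4], [2, 3, 3]),
    "p3": ([1, 1, 2], [2, 2, 3]),
}
within_r, between_r = decompose(records)
print(round(within_r, 2), round(between_r, 2))
```

As in the results above, the two correlations need not agree: a strong association across moments within people can coexist with a weak association across people.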
End of Study Survey
Among the 12 participants who completed the end-of-study survey, 58% reported that their mental health was not good for at least 1 day of the survey and nobody reported being a victim of a crime. When asked whether, during the total period of the study, the prompts bothered them, 25% answered “not at all,” 33% answered “rarely,” 25% answered “sometimes,” 8% answered “quite a bit,” and 8% answered “very much.” When asked whether the prompts disrupted their daily routine, 25% answered “not at all,” 17% answered “rarely,” 33% answered “sometimes,” 17% answered “quite a bit,” and 8% answered “very much.”
Discussion
In this study, we sought to develop and pilot a new measure of aggression specifically for use in experience sampling research. The process involved construct definition, item generation, participant interviews, item review for content validity, and then piloting in a convenience sample. The result was a 12-item measure of aggression, which showed high within- and between-person reliability as well as criterion validity. These properties are essential for a measure of aggression for use in experience sampling studies that seek to understand the momentary influences on aggression, and related questions such as the effects of individual difference traits on the relations between provocation and aggression, or the effects of day-to-day experiences on the long-term development of aggression. They are also consistent with the idea that aggression shows both systematic trait and state variance.
For general population samples, a nine-item variant excluding acts of physical aggression could be administered as our results suggested that physical aggression was uncommon. This nine-item version measures verbal/social aggression and showed high reliability at the within-person level, supporting its utility for studying the “momentary” influences on aggression. When the goal is to understand more serious acts of aggression in, for example, high-risk samples, the three physical aggression items may be preferred. Together, these three items yielded a scale with high within-person reliability. In terms of criterion validity, only the verbal/social aggression items were correlated at the within-person level with provocation and hostile affective states. This likely reflects the fact that milder forms of aggression are almost always more likely than severe forms to occur in practice, owing to the substantial costs and risks associated with responding to internal psychological states and external provocations in a physically aggressive manner.
Our results build on previous work that has shown the feasibility and value of experience sampling studies in aggression and crime research (e.g., DeWall et al., 2012; Kashdan et al., 2013; Lim, Ilies, Koopman, Christoforou, & Arvey, 2016). DeWall et al. (2012), for example, found in a study of 168 undergraduates reporting their daily social interactions that affective states characterized by gratitude were associated with lower aggression. Using a similar design, Kashdan et al. (2013) found that in a sample of 110 undergraduates, curiosity was associated with lower aggression. However, previous studies have not had a measure of aggression specifically developed and validated for use in experience sampling available to them.
Illustrating the broad scope of experience sampling in crime and aggression research and taking a more situational perspective, Solymosi et al. (2015) used experience sampling to map areas of high fear of crime. Other potential applications include the delivery and evaluation of interventions to reduce interpersonal violence, linkage with physiological data to capture immediate biological antecedents and effects of aggression, and long-term repeat measures of experience sampling to examine how day-to-day profiles of aggression change over developmental time. A large number of longitudinal studies have tracked the development of aggression from childhood to adulthood; however, none have examined how “day-to-day” thoughts, emotions, behaviors, and experiences related to aggression change over these time spans. Incorporating experience sampling into longitudinal studies would thus have enormous value for understanding how levels and manifestations of aggression change over different stages of development.
It is, however, important to acknowledge the potential limitations of experience sampling in crime and aggression research. First, measurement reactivity is a potential challenge. This refers to the phenomenon whereby taking part in an experience sampling study changes a participant’s behavior by making the constructs under study more salient (e.g., Telford et al., 2012). Second, experience sampling may be experienced as more burdensome or intrusive than traditional questionnaire measures, making recruitment and retention more difficult. In the current study, 20 of 23 participants completed all experience sampling measures, suggesting that they were not overly burdensome. However, only 12 completed the final survey at the end of the data collection period. This suggests that the survey (which consisted of 12 items in addition to the day’s experience sampling measures) may have been perceived to be too burdensome to complete on a smartphone. Another possibility is that the participants misperceived the study as complete on receipt of the final notification and did not read on to find the final survey. Unfortunately, as we did not have any information on participants who did not complete the final survey (nor on the three participants who dropped out during the experience sampling phase), it is not possible to ascertain why such a large proportion of the sample dropped out at this final stage. Third, the use of mobile technologies in many of the situations that aggression and crime researchers are interested in may not be feasible. Mobile phone use is, for example, generally restricted in prisons and may not be safe in areas of high crime and violence. Finally, although experience sampling improves upon questionnaire measures in that it minimizes problems of retrospective recall and of providing data out of ecological context, there remains no guarantee that participants accurately report on their emotions, behaviors, and experiences. Participants could still be prone to response biases and/or forgetting over the short recall periods of experience sampling.
In terms of the limitations of the current study, a main limitation is that no gold standard measure exists against which to test the criterion validity of the developed measures. In fact, the lack of validated experience sampling aggression measures was a major impetus for the current study. The sample size was also small; therefore, the study was likely underpowered to detect small effects that may have nonetheless represented substantively interesting phenomena. It was also skewed toward university students in a Western, industrialized, and high-income democratic nation; thus, replication in larger, more varied samples will also be required. In particular, our results require replication in samples with a greater diversity in terms of geography, race, religion, ethnicity, language, age, ability, sex, gender identity, and sexual orientation to evaluate whether the tool developed is valid across a wider range of individuals and situations. This is particularly important given that levels and types of aggression victimization and perpetration are known to vary across groups. Similarly, due to a technical failure, the software did not record gender. We were, therefore, unable to establish the extent to which males and females were adequately represented in the sample. We also did not know how many people were ineligible to take part due to a lack of access to an iOS or Android smartphone; however, given that the majority of young adults in the United Kingdom have access to a smartphone and that the vast majority of smartphones run iOS or Android, we believe this would only account for a very small proportion of prospective participants (Deloitte, 2017; Thai & Page-Gould, 2018). Finally, although we tested the face validity, reliability, and criterion validity of the scale, other forms of validity will need to be tested including, for example, equivalence of functioning across genders or discriminant validity.
It may also be valuable to explore specific adaptations or augmentations of the measure for particular groups. For example, adding items related to dating violence may be particularly relevant for participants in late adolescence and early adulthood. The set of criterion measures could also be expanded to include other theoretically relevant predictors of aggression in the moment. Similarly, expanding the time window of the experience sampling survey beyond 9:00 p.m. has the potential to reveal associations with aggression in the context of nightlife, including further illuminating the role of substance use in aggression.
Conclusion
The current study presents the Aggression Experience Sampler (Aggression-ES) scale for use in aggression experience sampling studies. The purpose of the Aggression-ES is to help meet the growing demand for valid and reliable scales that can be used in experience sampling studies focused on understanding the long- and short-term antecedents and consequences of aggression. The scale showed high within-person and between-person reliability and evidence of criterion validity when assessed against hostile affective state and provocation. These results provide support for the use of the Aggression-ES in experience sampling studies of aggression. The current study presents the first validation of the scale; future studies will be important to build on this initial evidence of its psychometric properties. With the growing popularity of experience sampling, more studies are needed to develop and validate experience sampling measures of other constructs. Presently, experience sampling psychometrics is underresearched.
Appendix
Provocation Items Used in Experience Sampling Survey.
| In the last 30 min . . . |
| Someone offended me? |
| Someone interfered with my goals? |
| Thought ahead to an unpleasant event? |
| Thought about a time when someone had annoyed me? |
| Someone made fun of me? |
| Felt manipulated? |
| Someone tried to start an argument with me? |
| Someone took their bad mood out on me? |
Authors’ Note
Tridip Jyoti Borah and Aja Louise Murray contributed equally.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
