Abstract
The Buss-Perry Aggression Questionnaire (AQ) is a self-report measure of aggressiveness commonly employed in nonforensic and forensic settings and is included in violent offender pre- and posttreatment assessment batteries. The aim of the current study was to assess the fit of the four-factor model of the AQ with violent offenders (N = 271), a population for which the factor structure of the English version of the AQ has not previously been examined. Confirmatory factor analyses did not yield support for the four-factor model of the original 29-item AQ. Acceptable fit was obtained with the 12-item short form, but careful examination of the relationships between the latent factors revealed that the four subscales of the AQ may not represent distinct aspects of aggressiveness. Our findings call into question whether the AQ optimally measures trait aggressiveness among violent offenders.
Contemporary theories of aggression such as the general aggression model (GAM; Anderson & Bushman, 2002) conceptualize trait aggressiveness as a multidimensional construct consisting of distinct but related interactive behavioral, cognitive, and affective subtraits that together make up an individual’s preparedness to aggress (Anderson & Bushman, 2002). Empirical research indicates that an individual’s overall trait aggressiveness influences whether and how frequently he or she behaves violently: Previous studies have found that self-report measures of aggressiveness differentiate groups of violent and nonviolent offenders (Barnett, Fagan, & Booker, 1991; Brennan, Moore, & Shepherd, 2010; Helfritz et al., 2006; McNiel, Eisner, & Binder, 2003; Smith, Waterman, & Ward, 2006; Troisi & Argenio, 2006). Among violent offenders, greater aggressiveness is also associated with indicators of persistence of violence, such as number of convictions for violent offences (Fiqia, Lang, Plutchik, & Holden, 1987; Serin, Gobeil, & Preston, 2009) and greater estimated risk of reoffending (Polaschek, Bell, Calvert, & Takarangi, 2010; N = 30, r = .39 to .41, p < .05).
Hostility and anger—cognitive and affective subtraits of aggressiveness—have also been empirically linked with violent offending. For example, results of a meta-analysis indicated that spousal assaulters were significantly more hostile than nonassaultive men (d = 0.58, 95% CI = [0.45, 0.63], k = 14; Norlander & Eckhardt, 2005). Greater hostility has been linked with increased persistence of both institutional and community violence (James & Seager, 2006; Palmer & Thakordas, 2005; Serin & Kuriychuk, 1994), and can be predictive of violent reoffending (Hanson & Morton-Bourgon, 2004; Hanson & Wallace-Capretta, 2004; McNiel et al., 2003; Menzies & Webster, 1995; van der Put et al., 2012). Turning to the affective component of aggressiveness, anger was also found to significantly differentiate between spousal assaulters and nonassaultive men in Norlander and Eckhardt’s (2005) meta-analysis (d = 0.47, 95% CI = [0.34, 0.45], k = 25). Increased anger has been linked with more institutional and community violence among forensic psychiatric patients (Doyle & Dolan, 2006; Skeem et al., 2006). It is worth noting, however, that results of several reoffending studies suggest that anger may not be predictive of violent reoffending for all offenders (see Loza & Loza-Fanous, 1999; Mills & Kroner, 2003), but may instead be limited to specific subtypes of violent offenders such as spousal assaulters (Grann & Wedin, 2002).
Taken together, the results of the studies reviewed above are consistent with the contention that trait aggressiveness is related to violent offending and predictive of violent recidivism. Thus, measuring this construct accurately and comprehensively is important to the scientific study of violence as well as to the continued development of effective assessment and treatment strategies for violent offenders (Norlander & Eckhardt, 2005).
The Aggression Questionnaire (AQ)
The AQ (Buss & Perry, 1992) is the focus of the current study. The AQ is a commonly used self-report measure of aggressiveness in forensic and nonforensic settings (Diamond, Wang, & Buffington-Vollum, 2005). For example, it has been used in pre- and posttreatment assessment batteries in the Canadian Violence Prevention Program (VPP) and the National Sex Offender Treatment Program (NaSOP; Correctional Service of Canada [CSC], 2009; Kingston, Yates, & Olver, 2013). The four subscales of the AQ are believed to measure related but distinct behavioral (Physical and Verbal Aggression), affective (Anger), and cognitive (Hostility) aspects of aggressiveness. The authors of the scale initially derived these four subscales using exploratory factor analyses (EFA; Buss & Perry, 1992). Confirmatory factor analyses (CFAs) were then conducted to verify the viability of a four-factor model, which successfully replicated. To empirically test the assumption that the four factors map onto the same overarching construct (trait aggressiveness), the authors also tested a hierarchical model in which a second-order factor of trait aggressiveness was added to the original four-factor structure. The hierarchical model also met the authors’ criteria for adequate model fit, supporting the contention that the AQ measures four distinct but related factors (Buss & Perry, 1992). Following Buss and Perry’s (1992) analyses, a sizable body of research has examined the latent structure of the AQ in nonforensic samples. Most researchers have concluded that, with minor modifications, such as removing the two negatively keyed items, the four-factor model provided adequate fit for their data (Archer, Kilpatrick, & Bramwell, 1995; Harris, 1995). However, some researchers have argued that the methodology employed in many factor analyses of the AQ may not have been rigorous enough to warrant confidence in the adequacy of the four-factor model (Bryant & Smith, 2001). 
Bryant and Smith’s (2001) multigroup, multimethod study addressed some of these potential shortcomings. They tested a number of variants of the four-factor and hierarchical models proposed by Buss and Perry (1992); none of these models provided adequate fit. Consequently, the researchers sought to develop a better measurement model by removing items (a) that did not load sufficiently on their designated factors (<.40), (b) that loaded heavily on more than one factor (≥.40), or (c) whose content did not appear to reflect the direct endorsement of aggression. This improved the conceptual clarity and distinctness of AQ subscales and increased the proportion of variance accounted for by the latent factors. The resulting short form of the AQ (AQ-SF) consisted of 12 items. The four-factor model and its hierarchical extension, which included a higher-order factor of aggressiveness, provided adequate fit with the AQ-SF.
The reduced length and improved measurement properties of the AQ-SF increase its potential utility in correctional settings (Diamond & Magaletta, 2006). To date, four studies have conducted CFAs with one or more variants of the English-language AQ in offender samples (i.e., original AQ only, k = 1 [Williams, Boyd, Cascardi, & Poythress, 1996]; AQ and AQ-SF, k = 2 [Diamond et al., 2005; Pettersen, Nunes, & Cortoni, 2016]; AQ-SF only, k = 1 [Diamond & Magaletta, 2006]). Results of these studies are inconsistent. Williams et al. (1996) examined the fit of the simple four-factor model in a sample of (primarily) nonviolent men (62%) and women (38%) pre-trial detainees (N = 200). Three of the four cited fit indices did not meet criteria for adequate fit. A subsequent EFA suggested that a two-factor structure consisting of physical aggression/anger and verbal aggression/hostility best accounted for the data. This model has failed to replicate in subsequent studies (e.g., Diamond et al., 2005), and has since been criticised for accounting for too little variance to represent an acceptable measurement model (i.e., < 50%; Bryant & Smith, 2001).
Diamond and colleagues (2005) tested seven competing models of the AQ and AQ-SF in a sample of mentally disordered male offenders (N = 786). All models of the original 29-item AQ, as well as variants with 27 and 26 items, failed to provide adequate fit. Good fit, however, was obtained for the four-factor structure with the AQ-SF (root mean square error of approximation [RMSEA] = 0.058, comparative fit index [CFI] = .95, goodness-of-fit index [GFI] = 0.96, and the Tucker–Lewis index [TLI] = 0.94). Diamond and Magaletta (2006) subsequently successfully replicated the four-factor model of the AQ-SF with 916 federally incarcerated adult men (RMSEA = 0.051, CFI = .96, and TLI = 0.95). We tested the fit of the four-factor model of the original 29-item AQ, a 26-item variant, as well as the 12-item AQ-SF in a sample of incarcerated sexual offenders (N = 293; Pettersen et al., 2016). Good fit for the four-factor model was obtained only with the AQ-SF (RMSEA = 0.07, CFI = .97, TLI = 0.96, and the weighted root mean square residual [WRMR] = 0.70).
In line with existing multifactor theories of aggression, such as Anderson and Bushman’s (2002) GAM, which conceptualizes behavioral, cognitive, and affective indices of aggression as distinct but interrelated constructs, the factor analytic studies examined above support the viability of the four-factor model of the original 29-item AQ in nonforensic populations. In forensic samples, however, acceptable fit for the four-factor model has only been obtained for the 12-item AQ-SF. Nevertheless, when results of previous factor analyses are considered together, the bulk of the existing evidence supports the theoretical distinction between the four constructs. Results of other studies of the AQ also point to adequate concurrent and criterion validity for the four subscales. The Physical Aggression subscale has shown moderate to strong correlations with other measures of physical aggression and violent behavior in offender and student samples (Diamond & Magaletta, 2006; Tremblay & Ewart, 2005). For example, in Diamond and Magaletta’s (2006) study, the Physical Aggression subscale correlated moderately and significantly with the overall Aggressiveness scale of the Personality Assessment Inventory (PAI; Morey, 2003; r = .69). The convergent and discriminant validity of the Physical Aggression subscale of the AQ was supported: Physical Aggression on the AQ correlated most strongly with the Physical Aggressiveness subscale of the PAI (r = .71, p < .01), relative to the remaining Aggressiveness scales (r = .55 to .57, p < .01). Similarly, the Verbal Aggression subscale has shown moderate correlations with other measures of overall aggressiveness and verbal aggression in both forensic and nonforensic samples. Diamond and Magaletta’s study supported the convergent validity of this subscale as well: The Verbal Aggression subscale correlated most strongly with the Verbal Aggressiveness subscale of the PAI (r = .44, p < .01).
The Anger subscale has shown small to moderate correlations with other measures of anger such as self-reported angry feelings in response to Provoking Situation Vignettes (r = .24, p < .05; Tremblay & Ewart, 2005) and scores on the Irritability scale of the PAI (r = .46, p < .01; Diamond & Magaletta, 2006). Finally, the Hostility subscale of the AQ has shown modest but significant correlations with overall Aggressiveness on the PAI (r = .38, p < .01), as well as with measures of distinct but related aspects of aggressiveness, such as verbal and physical aggression (r = .30 to .36, p < .01; Diamond & Magaletta, 2006), and moderate associations with other measures of hostility (e.g., the Resentment scale of the PAI; r = .49, p < .01; Diamond & Magaletta, 2006).
Taken together, results of previous factor analyses as well as other scale validation studies support the criterion validity of the subscales of the AQ. Nevertheless, studies differ greatly in terms of how many items can be retained without jeopardizing the distinctness of constructs assessed. It is possible that the different findings in studies of offenders and nonoffenders are due to differences in methodology and standards for model acceptance between studies. Another possibility is that the AQ does not measure the same construct equivalently across groups. Measurement invariance can be assessed by examining the “behavior” of measures in multiple distinct groups at once (i.e., a multigroup study), or, more commonly, it is inferred from findings of multiple studies in different populations over time. Measurement noninvariance could have important implications in terms of the substantive interpretation of offenders’ AQ scores, the utility of the AQ in identifying offenders’ treatment needs, and for offender management strategies. If we do not know what we are measuring, it is difficult to determine whether (and how) a factor such as aggressiveness might affect offenders’ risk of reoffending and whether this risk can be reduced in treatment.
The primary goal of the current study was to examine the viability of the four-factor structure of the AQ in a single sample of violent offenders, a population for which the factor structure of the English-language AQ has not previously been assessed. We examined the fit of the four-factor model with the original 29-item AQ and the 12-item AQ-SF. When the data permitted, we also tested the viability of the hierarchical model that included a second-order factor of aggressiveness. We hypothesized that the four-factor structure of the original 29-item AQ would provide suboptimal fit and that removal of the two negatively keyed items would result in appreciable improvement in model fit. The four-factor structure and its hierarchical extension were expected to provide good fit for the AQ-SF.
Method
Participants
The sample consisted of adult men who participated in VPP between 2000 and 2004. The VPP is an intensive cognitive-behavioral treatment (CBT) program for incarcerated violent offenders who are at high risk of reoffending, who have two or more separate convictions for violent offences, and whose needs are not better met in family violence or sexual offender treatment programs (CSC, 2009). The AQ (Buss & Perry, 1992) was part of an assessment battery administered prior to, and upon completion of, treatment (Cortoni & Nunes, 2005). Of the total sample of 877 offenders in the database, a subsample of 271 had pretreatment item-level data on the AQ. Of these, none had missing items. All analyses were conducted using data from these 271 offenders.
Sample characteristics
At the time of pretreatment assessment, participants’ age ranged from 19 to 60 years (M = 33.34, SD = 8.31). The majority of the offenders were White (66.79%), 12.18% were Metis, 12.18% were Aboriginal, and 5.53% were Black. Other races accounted for the remaining 3.32% of the sample. One hundred twenty-five offenders (46.13%) were in a common-law relationship or married, 5.17% were separated or divorced, and 48.34% were single. Marital status was unknown for one offender. On average, the offenders had committed 3.99 violent offences (SD = 3.57, Mdn = 3). The majority of offenders in the sample (95%) were at high risk to reoffend, as indicated by an actuarial (general) risk assessment instrument (i.e., their scores fell in the “poor” and “very poor” risk categories on the SIR-R1; Nafekh & Motiuk, 2002).
Measures
The AQ
The AQ (Buss & Perry, 1992) consists of a total of 29 items divided into four subscales: Physical Aggression (nine items), Verbal Aggression (five items), Anger (seven items), and Hostility (eight items). Participants responded on a 5-point Likert-type scale from 1 (extremely uncharacteristic of me) to 5 (extremely characteristic of me). Total scores can range from 29 to 145 with higher total scores indicating greater aggressiveness. Overall, previous research indicates that the AQ has acceptable reliability and validity. AQ total and subscale scores correlate with peers’ ratings on corresponding areas (Buss & Perry, 1992; O’Connor, Archer, & Wu, 2001). Violent offenders have been found to score significantly higher than nonviolent offenders (Diamond & Magaletta, 2006), and higher scores on the Physical Aggression subscale are significantly associated with greater estimated risk of violent reoffending (Selenius, Hellström, & Belfrage, 2011). Previous studies report acceptable internal consistency for the AQ (e.g., Bryant & Smith, 2001). In the current study, internal consistency was good for both total and subscale scores of the AQ (Table 1).
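As a quick check on the scoring arithmetic described above, the following minimal Python sketch derives the possible sum-score range of each subscale and of the total from the item counts and the 1 to 5 response format. The dictionary and function names are illustrative only and are not part of any published scoring software.

```python
# Item counts per subscale as described in the text (Buss & Perry, 1992).
SUBSCALE_ITEMS = {
    "Physical Aggression": 9,
    "Verbal Aggression": 5,
    "Anger": 7,
    "Hostility": 8,
}
RESPONSE_MIN, RESPONSE_MAX = 1, 5  # 5-point Likert-type response format


def score_range(n_items, low=RESPONSE_MIN, high=RESPONSE_MAX):
    """Return the (minimum, maximum) possible sum score for n_items items."""
    return n_items * low, n_items * high


total_items = sum(SUBSCALE_ITEMS.values())
total_range = score_range(total_items)

for name, n in SUBSCALE_ITEMS.items():
    print(name, score_range(n))
print("Total", total_items, total_range)
```

The total of 29 items and the 29 to 145 range follow directly from the four item counts. Any handling of the two negatively keyed items is outside this sketch.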
Table 1. Mean Scores and (Standard Deviations), Ranges, and Internal Consistency of the Original 29-Item AQ and the 12-Item AQ-SF.
Note. AQ-SF = Aggression Questionnaire–Short Form.
Statistical Analyses
A total of 271 offenders had complete pretreatment data on the AQ and were included in analyses. Pretreatment scores were used rather than posttreatment scores because it allowed for direct comparisons of findings with those of previous studies. Due to the categorical nature of the items, we examined polychoric correlation matrices (Holgado-Tello, Chacón-Moscoso, Barbero-García, & Vila-Abad, 2010). Briefly, polychoric correlations are estimates of the Pearson’s Product–Moment correlations that would have resulted if the underlying continuous distribution crudely represented by the Likert-type scale had been available (Jöreskog & Sörbom, 1996). Although many of the items were not normally distributed, we did not transform or otherwise normalize the data because polychoric correlations are robust against violations of normality (Flora & Curran, 2004). CFAs were conducted in Mplus 6 (Muthén & Muthén, 2010). This software program was selected because it offers an estimation method—the mean and variance adjusted robust weighted least squares (WLSMV) estimator—that minimizes the likelihood of deriving inaccurate parameter estimates and fit values with nonnormal ordinal data (Brown, 2006).
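The motivation for polychoric correlations can be illustrated with a small simulation. The sketch below does not compute a polychoric correlation. It only shows, under assumed values (a latent correlation of .60, five response categories, and arbitrarily chosen thresholds), that a Pearson correlation computed on coarsely categorized responses tends to understate the correlation between the underlying continuous variables. All names and parameter values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def to_likert(z, thresholds=(-1.5, -0.5, 0.5, 1.5)):
    """Map a continuous value to a 1..5 category by counting thresholds exceeded."""
    return 1 + sum(z > t for t in thresholds)


def simulate(rho=0.6, n=20000, seed=0):
    """Return (latent r, r on categorized scores) for bivariate normal data."""
    rng = random.Random(seed)
    latent_x, latent_y = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        e = rng.gauss(0, 1)
        latent_x.append(x)
        # Construct y so that corr(x, y) equals rho in the population.
        latent_y.append(rho * x + math.sqrt(1 - rho ** 2) * e)
    r_latent = pearson(latent_x, latent_y)
    r_likert = pearson([to_likert(v) for v in latent_x],
                       [to_likert(v) for v in latent_y])
    return r_latent, r_likert


r_latent, r_likert = simulate()
print(round(r_latent, 3), round(r_likert, 3))
```

A polychoric estimate aims to recover the latent value from the categorized data. That estimation step requires the bivariate normal distribution function and is left to dedicated software.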
Evaluation of overall model fit
Overall model fit was evaluated using fit indices. Following guidelines for thorough fit assessment in CFA, each model was evaluated using a combination of multiple fit indices (Jackson, Gillaspy, & Purc-Stephenson, 2009; Schreiber, Stage, King, Nora, & Barlow, 2006). We examined the χ2 GFI, the RMSEA (Steiger & Lind, 1980), the CFI, the TLI, and the WRMR. Small, nonsignificant χ2 values indicate perfect model fit. However, the χ2 index is very sensitive to model misspecification and is often significant even with good fitting models. Because perfectly fitted models are rarely obtained with real data, methodologists recommend interpreting the χ2 index cautiously and in conjunction with other indices, and this is how it was used in the current study (Browne & Cudeck, 1993; Kline, 2011). Fit index values considered indicative of adequate model fit were ≤ 0.08 for the RMSEA, ≥ 0.95 for the CFI, ≥ 0.95 for the TLI, and ≤ 0.95 for the WRMR (Breivik & Olsson, 2001; Brown, 2006; Jackson et al., 2009; Muthén & Muthén, 2010).
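The cutoffs listed above can be expressed as a simple lookup. The following sketch is illustrative only: the function name and data structure are assumptions, and the example values are those reported for the AQ-SF by Pettersen et al. (2016) in the introduction, not results of the current study.

```python
# Cutoffs as stated in the text. "max" means the value must not exceed the
# cutoff; "min" means the value must reach at least the cutoff.
CUTOFFS = {
    "RMSEA": ("max", 0.08),
    "CFI": ("min", 0.95),
    "TLI": ("min", 0.95),
    "WRMR": ("max", 0.95),
}


def check_fit(indices):
    """Return {index name: True/False} for each supplied fit index."""
    results = {}
    for name, value in indices.items():
        direction, cutoff = CUTOFFS[name]
        results[name] = value <= cutoff if direction == "max" else value >= cutoff
    return results


# Values cited in the introduction for the AQ-SF (Pettersen et al., 2016).
aq_sf_2016 = {"RMSEA": 0.07, "CFI": 0.97, "TLI": 0.96, "WRMR": 0.70}
result = check_fit(aq_sf_2016)
n_met = sum(result.values())
print(result, n_met)
```

The χ2 statistic is deliberately left out of the lookup because, as noted above, it was interpreted cautiously rather than against a fixed cutoff.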
Localized fit
Localized model fit was assessed by careful examination of the polychoric and residual correlation matrices, as well as factor loadings and their associated standard errors. Following CFA guidelines, modification indices (MIs) were considered complementary sources of information about local fit. No respecifications were made based on MIs alone (MacCallum, Roznowski, & Necowitz, 1992). MI values of ≥ 3.84 (p < .05) are indicative of potential for significant model improvement (Whittaker, 2012).
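The 3.84 threshold corresponds to the critical value of a χ2 distribution with one degree of freedom at p = .05, because an MI approximates the drop in χ2 from freeing a single parameter. Since a 1-df χ2 variable is the square of a standard normal variable, the value can be recovered with the Python standard library alone. The function names below are illustrative.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def chi2_critical_1df(alpha=0.05):
    """Critical value of chi-square with 1 df: the squared two-sided z quantile."""
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return z * z


def chi2_pvalue_1df(statistic):
    """Upper-tail p value of a chi-square statistic with 1 df."""
    return 2 * (1 - _STD_NORMAL.cdf(statistic ** 0.5))


critical = chi2_critical_1df()
print(round(critical, 2))
```

This shortcut is valid only for one degree of freedom, which is the case that applies to a single modification index.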
Criteria for model acceptance
To maximize the consistency and accuracy of model fit evaluations, we established clear a priori criteria for acceptable model fit: (a) Three or more fit indices should meet the cutoff criteria specified above, (b) no factor loadings should fall below the conventional cutoff of .40 (Forero, Maydeu-Olivares, & Gallardo-Pujol, 2009), (c) cross-loadings should not exceed .40 (Kline, 2011), (d) few residual correlations should fall below (−.10) or exceed (+.10), and (e) latent factor correlations should be no larger than .84 for the factors to be considered distinct (Brown, 2006).
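The five criteria can be read as a checklist applied to each fitted model. The sketch below is one possible encoding, not the procedure actually used in the study. In particular, the text does not quantify "few" residual correlations, so the 10% default is a hypothetical placeholder, and all input values in the examples are invented for illustration.

```python
def evaluate_model(n_indices_met, loadings, residuals, factor_correlations,
                   cross_loadings=(), max_share_large_residuals=0.10):
    """Apply criteria (a) to (e) and return a dict of pass/fail flags.

    max_share_large_residuals is an ASSUMED operationalization of "few";
    the source does not give a numeric threshold.
    """
    share_large = sum(abs(r) > 0.10 for r in residuals) / len(residuals)
    checks = {
        "a_fit_indices": n_indices_met >= 3,
        "b_loadings": all(l >= 0.40 for l in loadings),
        "c_cross_loadings": all(abs(c) <= 0.40 for c in cross_loadings),
        "d_residuals": share_large <= max_share_large_residuals,
        "e_factor_correlations": all(r <= 0.84 for r in factor_correlations),
    }
    checks["accepted"] = all(checks.values())
    return checks


# Hypothetical inputs, not values from the current study.
passing = evaluate_model(4, [0.55, 0.62, 0.71],
                         [0.02, -0.04, 0.08, 0.05], [0.60, 0.72])
failing = evaluate_model(4, [0.55, 0.62, 0.71],
                         [0.02, -0.04, 0.08, 0.05], [0.60, 0.91])
print(passing["accepted"], failing["accepted"])
```

The second example fails only on criterion (e), which mirrors the situation in which global fit is acceptable but the latent factors cannot be considered distinct.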
Power
To determine whether there was adequate power to accurately estimate model fit, power was assessed using the Monte Carlo techniques available in Mplus 6 for CFA with categorical items. Population parameter values derived from the initial model were inputted. In accordance with Muthén and Muthén’s (2002) recommendations, 10,000 iterations and three random sampling points (i.e., seed values) were specified. Acceptable statistical power to accurately estimate single parameters was considered present if (a) no parameter bias exceeded 10%, (b) bias in factor loadings and latent factor correlations did not exceed 5%, and (c) coverage values fell between 0.91 and 0.98 (Muthén & Muthén, 2002). Turning to global estimates, Mplus 6 also allows for power analyses of the χ2, RMSEA, and WRMR fit indices. The following were considered indicative of adequate power to accurately estimate the χ2 GFI: (a) The percentage of replications in which the χ2 statistic exceeded its critical value was low, (b) in replications in which the critical value was exceeded, the expected and observed χ2 values were close in terms of absolute value, (c) the average χ2 obtained was similar in magnitude to the model df, and (d) the variance of χ2 across replications was not exceedingly large compared with 2df. Criteria (a) and (b) were also applied to the RMSEA and the WRMR.
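The single-parameter criteria rest on two quantities computed across replications: percentage bias of the average estimate relative to the population value, and coverage, the proportion of replications whose 95% confidence interval contains the population value. The sketch below shows both computations on a handful of invented replications. The function names and all numbers are hypothetical, and a real analysis would use thousands of replications produced by the simulation software.

```python
def percent_bias(estimates, population_value):
    """Percentage bias of the mean estimate relative to the population value."""
    mean_est = sum(estimates) / len(estimates)
    return 100 * (mean_est - population_value) / population_value


def coverage(intervals, population_value):
    """Proportion of (lower, upper) intervals containing the population value."""
    hits = sum(lo <= population_value <= hi for lo, hi in intervals)
    return hits / len(intervals)


def adequate_parameter_power(bias_pct, cover, is_loading_or_factor_corr):
    """Apply the bias (5% or 10%) and coverage (0.91 to 0.98) criteria."""
    limit = 5.0 if is_loading_or_factor_corr else 10.0
    return abs(bias_pct) <= limit and 0.91 <= cover <= 0.98


# Hypothetical replications for a factor loading with population value .70.
bias = percent_bias([0.70, 0.74, 0.72, 0.72], 0.70)
cover = coverage([(0.60, 0.80), (0.65, 0.85), (0.71, 0.90), (0.55, 0.75)], 0.70)
print(round(bias, 2), cover)
```

Criteria (c) and (d) for the χ2 statistic follow from the fact that a χ2 variable has mean df and variance 2df, so large departures of the replication mean or variance from those values signal poor approximation.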
Results
Descriptive statistics and internal consistency for the original 29-item AQ and the 12-item AQ-SF are reported in Table 1. Coefficient alpha can become distorted when used as an index of reliability with ordinal data. An alternative reliability index, ordinal alpha, was therefore computed for the AQ based on output from the CFA models (Table 1; see Zumbo, Gadermann, & Zeisser, 2007, for the formula for ordinal alpha).
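Conceptually, ordinal alpha applies the familiar alpha formula to a polychoric rather than a Pearson correlation matrix. The sketch below uses the correlation-matrix form of alpha, which may differ in computational detail from the loading-based formula given by Zumbo et al. (2007) and from what was actually computed here. The input matrix is hypothetical and is assumed to already contain polychoric correlations.

```python
def alpha_from_correlations(corr):
    """Alpha computed from a square correlation matrix (list of lists).

    With a polychoric matrix as input this corresponds to ordinal alpha;
    with a Pearson matrix it is the standardized coefficient alpha.
    """
    p = len(corr)
    off_diag = [corr[i][j] for i in range(p) for j in range(p) if i != j]
    mean_r = sum(off_diag) / len(off_diag)
    return p * mean_r / (1 + (p - 1) * mean_r)


# Hypothetical 3-item polychoric matrix with all inter-item correlations .50.
example = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
ordinal_alpha = alpha_from_correlations(example)
print(ordinal_alpha)
```

Because polychoric correlations are typically larger than Pearson correlations for the same ordinal items, ordinal alpha will usually be at least as large as coefficient alpha.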
Four-Factor Model of the Original 29-Item AQ
All items were specified to load only on one factor. No error terms were permitted to covary. The model was overidentified, and no errors were reported. The overall fit of the four-factor structure was inadequate. Fit index values are reported in Table 2. Only the RMSEA met the cutoff for acceptable model fit. Localized fit was also poor: The negatively keyed Items 7 and 18 from the Physical Aggression and Anger subscales and Item 10 from the Verbal Aggression subscale had standardized factor loadings that fell below the a priori cutoff of .40 (see Table 3 for factor loadings and item content). Moreover, the model tended to overestimate the relationships between items specified to load on separate factors. Specifically, 15% of residual correlations fell outside of the acceptable range of (−.10) to (+.10) as defined in the “Statistical Analyses” section. Finally, estimated latent factor correlations were very large, suggesting that the hypothesized four latent factors were not distinct when all 29 items were included. Estimated latent factor and subscale correlations are reported in Table 4. Testing of hierarchical models is only purposeful if the original (simple) factor model first demonstrates adequate fit (Brown, 2006). Thus, the hierarchical extension of the four-factor model was not tested with the original 29-item AQ.
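To make the residual criterion concrete: in a simple-structure CFA with standardized estimates, the correlation implied for two items is the product of their loadings, multiplied by the factor correlation when the items load on different factors. A residual is the observed (polychoric) correlation minus that implied value, so a negative residual means the model overestimates the relationship. The sketch below uses invented numbers, not values from Tables 3 or 4.

```python
def implied_correlation(loading_i, loading_j, factor_corr=1.0):
    """Model-implied correlation for two items in a simple-structure CFA.

    factor_corr stays at 1.0 when both items load on the same factor.
    """
    return loading_i * loading_j * factor_corr


def share_outside(residuals, bound=0.10):
    """Proportion of residual correlations outside the range -bound to +bound."""
    return sum(abs(r) > bound for r in residuals) / len(residuals)


# Hypothetical items on different factors: loadings .70 and .60, factor r = .80.
implied = implied_correlation(0.70, 0.60, 0.80)
observed = 0.20                      # hypothetical polychoric correlation
residual = observed - implied        # negative: model overestimates
share = share_outside([residual, 0.03, 0.05, -0.02])
print(round(implied, 3), round(residual, 3), share)
```

When latent factor correlations are estimated as very large, implied cross-factor item correlations rise accordingly, which is one way overestimation of this kind can arise.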
Table 2. Fit Indices for Four-Factor and Hierarchical Models of the Original 29-Item AQ, the 26-Item AQ, and the 12-Item AQ-SF.
Note. Fit index cutoffs employed in the current study are provided in parentheses. Values indicating adequate fit are boldfaced. AQ-SF = Aggression Questionnaire–Short Form; RMSEA = root mean square error of approximation; CFI = comparative fit index; TLI = Tucker–Lewis index; WRMR = weighted root mean square residual.
Hierarchical model of the AQ-SF.
p < .01.
Table 3. Factor Loadings and (Standard Errors) of the Four-Factor Model of the Original 29-Item AQ.
Note. Items removed from the respecified 26-item four-factor model are boldfaced. Items retained in the four-factor model of the 12-item AQ-SF are italicized. AQ-SF = Aggression Questionnaire–Short Form. Std. = standardized; λ = factor loading.
Table 4. Estimated Latent Factor and Subscale Correlations (r) for the Original 29-Item AQ, the 26-Item AQ, and the 12-Item AQ-SF.
Note. Pearson’s (r) AQ total and subscale score correlations are in parentheses. Aggressiveness = second-order latent factor included in the hierarchical model. Estimated correlations ≥ .85 suggesting nonindependence of latent factors are boldfaced. AQ-SF = Aggression Questionnaire–Short Form.
p < .01.
Next, we examined the four-factor model excluding the three low-loading items. No other model respecifications were made. Both overall and localized fit remained inadequate. Only the RMSEA met the cutoff for adequate fit. Although unacceptably low factor loadings were eliminated from the respecified model, 13% of residual correlations still fell outside of acceptable range. As shown in Tables 2 and 4, criteria for testing a hierarchical model were not met for the 26-item AQ.
Four-Factor Model of the 12-Item AQ-SF
Next, we examined the fit of the four-factor model of the AQ-SF (Figure 1). The overall factor structure fit the data adequately. With the exception of the χ2, all fit indices met the a priori cutoffs for good model fit (Table 2). No factor loadings fell below the cutoff of .40. Residual correlations indicated that under- and overestimations of item correlations were rare. The estimated latent factor correlations were sufficiently attenuated from previous models to allow for the hierarchical model to be tested (i.e., < .85; Brown, 2006).

Figure 1. The four-factor model of the 12-item AQ-SF.
Power
Monte Carlo analyses indicated that the model had adequate power to accurately estimate both single parameters and global fit. The criteria employed to assess power are detailed in the “Statistical Analyses” section above. All a priori criteria for adequate estimation of single parameters were met. In terms of power to assess global fit, the χ2 and WRMR were well approximated across replications, while the RMSEA was not (e.g., critical RMSEA was exceeded in more than 50% of replications). In summary, the power to correctly estimate single parameters was good in this model and the power to accurately estimate global fit was adequate.
Aggressiveness: Testing a hierarchical model with the AQ-SF
We explicitly tested the assumption that the four first-order factors measure distinct but related subtraits or aspects of a single overarching latent aggressiveness construct. The original model of the AQ-SF was therefore compared with a model that also included a second-order aggressiveness factor. Global fit indices indicated improved fit for the hierarchical model over the original four-factor model (Table 2). However, careful examination of the estimated latent factor correlations (Table 4) revealed that two first-order factors, verbal aggression and anger, were so highly correlated with the aggressiveness factor that they could not be considered distinct from it.
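In a second-order model with standardized estimates, the correlation between a first-order factor and the second-order factor equals that factor's second-order loading, and the correlation implied between two first-order factors is the product of their second-order loadings. The sketch below illustrates this relationship and the .85 distinctness cutoff. The loadings are hypothetical values chosen for illustration and are not the estimates reported in Table 4.

```python
from itertools import combinations


def implied_factor_correlations(second_order_loadings):
    """Correlations between first-order factors implied by a second-order model."""
    names = list(second_order_loadings)
    return {
        (a, b): second_order_loadings[a] * second_order_loadings[b]
        for a, b in combinations(names, 2)
    }


def not_distinct(second_order_loadings, cutoff=0.85):
    """First-order factors correlating at or above the cutoff with the second-order factor."""
    return [name for name, g in second_order_loadings.items() if g >= cutoff]


# HYPOTHETICAL standardized second-order loadings (not the study's estimates).
loadings = {
    "Physical Aggression": 0.70,
    "Verbal Aggression": 0.90,
    "Anger": 0.95,
    "Hostility": 0.65,
}
implied = implied_factor_correlations(loadings)
flagged = not_distinct(loadings)
print(len(implied), flagged)
```

The hierarchical model replaces the six freely estimated correlations among four first-order factors with four second-order loadings, which is why it is the more restrictive of the two models.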
Discussion
The main purpose of the current study was to examine whether the four-factor model of the AQ (Buss & Perry, 1992) would replicate with violent offenders. As hypothesized, the model failed to provide adequate fit with the original 29-item AQ. Contrary to our expectations, exclusion of low-loading items, including two negatively keyed items, did not result in appreciable improvement in fit. Good fit for the four-factor model, however, was obtained for the 12-item AQ-SF. To empirically test the assumption that the four factors of the AQ-SF represent a single overarching aggressiveness construct, we examined the fit of a hierarchical extension of the original four-factor model that included a second-order aggressiveness factor. Our hypothesis was not clearly supported: The proposed second-order aggressiveness factor and the first-order verbal aggression and anger factors shared too much variance for these to be considered distinct.
We examined the reliability of the original 29-item AQ as well as the 12-item AQ-SF. Internal consistency estimates generally exceeded the conventional cutoff of .70 (Schmitt, 2011) and were comparable with those reported in previous studies (e.g., Dahlen, Martin, Ragan, & Kuhlman, 2004; Williams et al., 1996). With the exception of scores on the Hostility subscale, the current sample of violent offenders had appreciably higher mean total and subscale scores than those reported for student samples in previous studies (e.g., Buss & Perry, 1992).
There are multiple possible explanations for the unexpectedly large latent factor correlations obtained with the 29- and 26-item four-factor models of the AQ. It is possible that the severe multicollinearity between factors reflects the unsuitability of restrictive CFA models to multidimensional constructs such as aggressiveness (e.g., Asparouhov & Muthén, 2009; Marsh et al., 2011). Specifically, given that affective and cognitive components of aggressiveness may be mutually reinforcing and are thought to facilitate aggressive behavior, items representing these different subtraits of aggressiveness should be expected to correlate to some extent. In the current context then, the fact that we set item cross-loadings to zero could conceivably lead to the problem of inflated latent factor correlations. However, numerous successful applications of CFA to the AQ in previous research cast doubt on the current findings being attributable to problems with CFA as an analytic technique (e.g., Bryant & Smith, 2001; Buss & Perry, 1992; Diamond & Magaletta, 2006; Diamond et al., 2005; Harris, 1995).
The four-factor structure appears to be a better measurement model when the number of AQ items is reduced to 12. With the AQ-SF, the factor correlations indicated that, although closely related, the latent factors could be considered distinct constructs (Figure 1). However, as noted above, we also tested the assumption that the four latent factors map onto a common overarching aggressiveness construct. If the four subscales of the AQ-SF measure different aspects of global trait aggressiveness, to the relative exclusion of other constructs, then we would expect to see moderate positive correlations between first-order factors and the second-order factor of aggressiveness. However, the anger and verbal aggression factors correlated too strongly with aggressiveness to be considered distinct from it.
The high degree of overlap between latent factors is not a finding unique to the current investigation. For example, Bryant and Smith (2001) found that the factors of the AQ were not clearly distinct when all 29 items were included. Other researchers have also questioned the multidimensionality of the AQ (e.g., Bernstein & Gesn, 1997), and multicollinearity between latent factors of the original 29-item AQ was found in a recent CFA with a sample of sexual offenders (Pettersen et al., 2016). Pettersen and colleagues also found that the anger and verbal aggression factors were the most problematic in this regard.
More generally, studies using students, nonviolent offenders, and violent forensic inpatients have commonly found large correlations between AQ total scores and various measures of anger (e.g., r = .79, n = 224 [Dahlen et al., 2004]; r = .60 to .76, n = 138 to 206 [Hornsveld, Muris, Kraaimaat, & Meesters, 2009]; r = .86, n = 95 [Lindquist, Dåderman, & Hellström, 2005]), reactive aggression as compared with instrumental aggression (r = .47 vs. r = .19, n = 127; Haden, Scarpa, & Stanford, 2008), impulsivity (r = .70, n = 160; Archer et al., 1995), and negative emotionality (r = .52, n = 234; Sharpe & Desai, 2001). Hornsveld and colleagues (2009) reported large intercorrelations between AQ (and AQ-SF) scores and measures of state and trait anger among forensic patients and students, whereas scores on measures of psychopathy and antisocial lifestyle appeared unrelated to aggressiveness. One interpretation of these findings and of the results of the current study is that the AQ measures impulsive anger and its reactive behavioral manifestations (Haden et al., 2008), rather than overall aggressiveness.
The current study highlights the importance of thorough examinations of the performance of psychological measures in the various populations in which they may be employed (Brown, 2006). The method of assessment of psychological traits and the subsequent interpretation may have important practical implications. For example, if the AQ is a measure of reactive aggressiveness, it is possible that some other aspects of aggressiveness are not being attended to. Thus, aggressiveness in a subset of violent offenders whose offences may not be exclusively or even primarily affectively driven may not be appropriately measured with the AQ. This seems especially relevant to the current study: Our sample consisted of generally and persistently violent offenders. Many of these men may consider violence a rational and effective means to obtain some tangible goal, and not simply a way to express negative emotion.
The current study has some important limitations. First, the sample was too small to lend itself to cross-validation procedures that could help increase confidence in the results. Nevertheless, our sample size is considered adequate for the CFAs we conducted. Second, the present study is also subject to limitations associated with CFA as a tool for the examination of unobservable psychological constructs. CFA is by its nature subjective, and researcher bias and misinterpretation of various indicators of model fit are ever-present threats to sound model evaluation (Kline, 2011). It should be noted, however, that we made every effort to reduce the risk of biased model evaluation by establishing clear a priori criteria for adequate model fit. Third, our sample was exceptional in that it consisted of persistent serious violent offenders. It appears possible that different results would have been obtained with a sample more representative of the violent offender population as a whole. Nonetheless, it is important to examine the AQ factor structure with persistent serious violent offenders, and our findings are in most regards comparable with previous factor analyses of the AQ in forensic samples: The four-factor model fit only the 12-item AQ-SF. Importantly, the multicollinearity between factors of the AQ-SF was not discovered until the viability of the hierarchical model was explicitly tested. To the best of our knowledge, the hierarchical model has not been tested in previous CFAs of the AQ using forensic samples. It may be, then, that our findings are a function of differences in methodological and analytic approach between the current study and past research. Future studies should continue to examine the viability of the theoretical assumptions underlying the structure and contents of the AQ and explicitly test hierarchical models.
Future Directions
Future studies should further examine the factor structure of the AQ with violent offenders. Given the heterogeneity of the violent offender population, future studies may benefit from examining the latent structure of the AQ separately among groups of men whose violent offences are believed to be differentially motivated (e.g., spousal assaulters vs. generally violent offenders). Future studies with offenders should also further examine the AQ-SF. Given that budgetary concerns, understaffing, and time restrictions often complicate the measurement of psychological characteristics in correctional settings, measures that are quick to complete, inexpensive, and predictive of specific behaviors can be especially useful (Diamond & Magaletta, 2006). In this regard, the AQ-SF may be more suitable than the original 29-item AQ.
Further examination of the AQ as a measure of aggressiveness may also be warranted. As mentioned above, some previous studies have found that the AQ has relatively modest discriminant validity (e.g., Archer et al., 1995), and severe multicollinearity between factors was uncovered in the current study. In addition, the content of the majority of items on the AQ appears to indicate a degree of emotional dysregulation (e.g., Item 1 “Once in a while I cannot control the urge to hit another person”), whereas relatively few seem to reflect carefully calibrated aggressive acts in pursuit of specific goals (i.e., instrumental aggression). Anger, impulsivity, and aggressiveness are likely closely related constructs and may all be relevant to research on violent offending and treatment of violent offenders. Nevertheless, determining the precise nature of the constructs assessed with self-report questionnaires such as the AQ is arguably of importance to our understanding of the psychology of violence. Advancing understanding of the latent dimensions of the AQ can help inform how the AQ items should be organized in subsequent studies, the relevance and role of these constructs in violent behavior, and whether and how those constructs could be systematically assessed and used to further improve the accuracy of existing risk assessments and the effectiveness of treatment aimed at reducing future violence.
Footnotes
Acknowledgements
We thank Roberto Di Fazio, Brian Grant, Colette Cousineau, Mark Latendresse, and others at the Correctional Service of Canada for facilitating access to and assisting with the preparation of the dataset.
Authors’ Note
The views expressed are those of the authors and do not necessarily represent the views of the Correctional Service of Canada.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
