Abstract
The Buss-Perry Aggression Questionnaire (AQ) is often used with sexual offenders, but its factor structure has never been examined in this population. The primary aim of this study was to assess the fit of the proposed four-factor model of the AQ reported in previous studies on a sample of incarcerated sexual offenders (N = 293). Results of a series of confirmatory factor analyses did not clearly support the four-factor structure of the full or short version of the AQ. Specifically, very large latent factor correlations suggested that the AQ may not measure a four-dimensional construct in the current sample. Only the physical aggression subscale was independently associated with estimated risk of sexual recidivism. Our findings suggest that the AQ is relevant to risk of sexual recidivism but call into question the appropriateness of the established subscales and their interpretation for sexual offenders.
Trait aggressiveness—an individual’s preparedness to aggress (Anderson & Bushman, 2002)—is theoretically and empirically related to sexual offending (Hall & Hirschman, 1991; Hanson & Morton-Bourgon, 2005; Whitaker et al., 2008). Consequently, measures of aggressiveness are often used in the context of sexual offender assessment and treatment (e.g., the Canadian National Sexual Offender Program [NaSOP]; Correctional Service of Canada [CSC], 2009; Kingston, Yates, & Olver, 2014). Aggressiveness can be broadly defined as a multidimensional construct consisting of behavioral indicators such as physical and verbal aggression, as well as cognitive and affective subtraits like hostility and anger (Anderson & Bushman, 2002; Ramírez & Andreu, 2006).
Hostile cognitions have been theoretically linked with sexual aggression. For example, in the confluence model, hostile masculinity—an approach to women characterized by coercive dominance that is fueled by a perception of women as deceptive and malicious—represents a distinct pathway to sexual aggression by men against women (Malamuth, Check, & Briere, 1986; Malamuth, Sockloskie, Koss, & Tanaka, 1991). Similarly, “grievance thinking”—consistent and excessive rumination on perceived injustices—is thought to contribute to sexual offending by supporting sexual aggression and violence as viable means to personal gratification and vindication (Thornton, 2002, pp. 143-144; Wakeling & Barnett, 2011, pp. 274-275). Most multifactor theories of sexual aggression also include anger as a disinhibiting factor (e.g., Hall & Hirschman, 1991; Howells, Day, & Wright, 2004; Knight & Prentky, 1990; Polaschek, Hudson, Ward, & Siegert, 2001; Polaschek & Ward, 2002).
Generally, results of empirical research support the theoretical implications of aggressiveness and its subtraits to sexual offending. Elevated trait anger/hostility is one of the few problematic psychosocial features common to pedophilic offenders, exhibitionists, offenders with multiple paraphilias, and sexual offenders against adults (Lee, Pattison, Jackson, & Ward, 2001), and hostility has been found to significantly differentiate groups of sexual offenders and nonsexual offenders (d = 0.30, 95% confidence interval [CI] = [0.05, 0.50]; Whitaker et al., 2008). In terms of the relative magnitude of effects, Cohen’s d values of 0.20, 0.50, and 0.80 are considered small, medium, and large effects, respectively (Cohen, 1992). In a recent meta-analysis, greater hostility significantly predicted sexual recidivism (d = 0.17, 95% CI = [0.04, 0.31], k = 9), violent recidivism (d = 0.21, 95% CI = [0.08, 0.34], k = 6), and general recidivism (d = 0.31, 95% CI = [0.19, 0.43], k = 5) among sexual offenders (Hanson & Morton-Bourgon, 2004). In addition, Firestone, Nunes, Moulden, Broom, and Bradford (2005) found that hostility significantly predicted sexual and violent recidivism even after accounting for risk, as assessed by an actuarial instrument. Focusing more specifically on anger, the empirical literature also supports the theoretical link between anger and sexual aggression, with most associations between anger and sexual offending falling in the small to moderate range (e.g., Hanson & Harris, 2000; Hornsveld & De Kruyk, 2005; Hudson & Ward, 1997; Kalichman, 1991; Rada, Laws, Kellner, Stivastava, & Peake, 1983; but see Loza & Loza-Fanous, 1999; Smallbone & Milne, 2000).
Taken together, the results of the studies reviewed above are consistent with the contention that aggressiveness is related to sexual offending and predictive of sexual recidivism. Thus, accurate and comprehensive measurement of this construct is important for research on sexual offending and for adequate assessment in applied settings.
The Aggression Questionnaire (AQ)
The AQ (Buss & Perry, 1992) is the focus of the current study. It is a commonly used self-report measure of aggressiveness both in forensic and nonforensic settings (CSC, 2009; Diamond, Wang, & Buffington-Vollum, 2005; Kingston et al., 2014). The four subscales of the AQ are believed to measure distinct but related behavioral (Physical and Verbal Aggression subscales), affective (Anger subscale), and cognitive (Hostility subscale) subtraits of aggressiveness and were empirically derived using exploratory factor analyses (EFAs; Buss & Perry, 1992). The authors subsequently conducted confirmatory factor analyses (CFAs) with an independent student sample (n = 448) to compare the fit of three competing models: (a) a one-factor model in which all items were forced to load on a single factor of aggressiveness, (b) a four-factor model including only the four factors identified in the initial EFA, and (c) a hierarchical model in which a higher order factor (“super-factor”) of trait aggressiveness was added to the simple four-factor structure. The hierarchical model reflects and explicitly tests the assumption that the four factors have a common origin (i.e., they all represent different aspects of trait aggressiveness). Adequate fit was obtained for the simple four-factor model and its hierarchical extension. There was no statistically meaningful difference in goodness of fit between these two models. The hierarchical structure, arguably a more parsimonious model, was therefore preferred on theoretical grounds (Buss & Perry, 1992).
Following Buss and Perry’s (1992) analyses, a sizable body of research has examined the latent structure of the AQ in nonforensic samples (e.g., Archer, Kilpatrick, & Bramwell, 1995; Harris, 1995). Most researchers have concluded that, with moderate modifications, such as removing the two negatively keyed items, the simple four-factor model provided adequate fit for their data. However, other researchers have argued that failure to provide adequate methodological details in published studies as well as utilization of suboptimal analytic strategies such as overreliance on a small number of fit indices substantially limits confidence in the adequacy of the four-factor model. Bryant and Smith’s (2001) multigroup, multimethod study addressed some of the potential shortcomings of previous CFAs of the AQ. They tested a number of variants of the simple four-factor and hierarchical models proposed by Buss and Perry; none of these models provided adequate fit. Consequently, the researchers sought to improve the measurement model by increasing the proportion of variance accounted for by the latent factors and improving the conceptual clarity and distinctness of AQ subscales. To this end, items that (a) loaded below .40 on their own factor, (b) cross-loaded heavily (≥ .40) on another factor, or (c) did not appear to reflect the direct endorsement of aggression were excluded. The resulting short form of the AQ (AQ-SF) consisted of 12 items. Adequate fit was obtained for the hierarchical model with the AQ-SF as indicated by absolute indices of global fit (root mean square error of approximation [RMSEA] = 0.062) as well as relative fit indices (the comparative fit index [CFI] = 0.96, the goodness-of-fit index [GFI] = 0.94, and the Tucker–Lewis index [TLI] = 0.94). Moderate attenuations of the magnitude of the latent factor correlations indicated that the AQ-SF more clearly reflected a multidimensional construct than did the original AQ.
Importantly, reducing the measure to 12 items did not change the relationships between AQ scores and other measures or outcomes of interest (Bryant & Smith, 2001).
The reduced length and improved measurement properties of the AQ-SF increase its potential utility in correctional settings (Diamond & Magaletta, 2006). To date, three studies have conducted CFA with one or more variants of the English-language AQ in offender samples (i.e., original AQ only, k = 1, Williams, Boyd, Cascardi, & Poythress, 1996; AQ and AQ-SF, k = 1, Diamond et al., 2005; AQ-SF only, k = 1, Diamond & Magaletta, 2006). Results of these studies appear somewhat inconsistent. Williams et al. (1996) examined the fit of the simple four-factor model in a sample of adult pretrial detainees (N = 200) of both genders (38% women) who had been charged with violent (e.g., assault; 22%) or nonviolent (e.g., prostitution; 78%) offense(s). Three of the four cited fit indices did not meet criteria for adequate fit. A subsequent EFA suggested that a two-factor structure consisting of physical aggression/anger and verbal aggression/hostility best accounted for the data (36.4% and 4% of the total variance, respectively). This model has failed to replicate in subsequent studies (e.g., Diamond et al., 2005) and has been criticized for accounting for too little variance to represent an acceptable measurement model (i.e., < 50%; Bryant & Smith, 2001).
Diamond and colleagues (2005) tested seven competing models of the AQ and AQ-SF in a sample of mentally disordered male offenders (N = 786). Seventy percent suffered major affective or psychotic disorders, 56% of whom had comorbid substance abuse disorder(s). Forty percent met criteria for a personality disorder. The sample was split randomly to allow for cross-validation of initially supported models (Sample 1, n = 383; Sample 2, n = 403). All models of the original 29-item AQ, as well as variants with 27 and 26 items, failed to provide adequate fit. The simple four-factor model of the AQ-SF was the only model for which good fit was obtained (RMSEA = 0.058, CFI = 0.95, GFI = 0.96, and TLI = 0.94). Cross-validation procedures provided further support for this model. Diamond and Magaletta (2006) subsequently replicated the simple four-factor model of the AQ-SF with 916 federally incarcerated adult men (RMSEA = 0.051, CFI = 0.96, and TLI = 0.95).
Taken together, the factor analytic results examined above lend some support for the adequacy of the simple four-factor model of the original AQ in nonforensic samples. In offender samples, acceptable fit for the simple four-factor model was largely obtained using the AQ-SF. Failure to provide key methodological details in some past factor analyses of the AQ greatly reduces confidence in the fit and stability of many previously obtained models, including the simple four-factor structure (Bryant & Smith, 2001; Jackson, Gillaspy, & Purc-Stephenson, 2009; Schreiber, Stage, King, Nora, & Barlow, 2006). The possibility remains that inconsistent findings across studies are due to differences in methodology and analytic strategy, or different sample characteristics. Contradictory and less than optimal factor analytic methods may explain, in part, the lack of consensus regarding the factor structure of the AQ (e.g., standards for acceptable model fit have changed substantially over time; Jackson et al., 2009). Alternatively, the discrepant findings may indicate that the measurement properties of the AQ are not equivalent in forensic and nonforensic populations. Measurement noninvariance could have important implications in terms of the substantive interpretation of offender AQ scores, as well as for scoring procedures.
The Current Study
The primary goal of the current study was to examine the viability of the most commonly reported factor structure of the AQ in a sample of sexual offenders, a population in which the latent structure of the AQ has not previously been assessed. We examined the fit of the simple four-factor model and its hierarchical counterpart (Buss & Perry, 1992) with the original AQ. We also examined the fit of these models with the AQ-SF (Bryant & Smith, 2001), because previous research indicates that it may constitute an improved measurement model compared with the complete measure (Bryant & Smith, 2001; Diamond & Magaletta, 2006; Diamond et al., 2005). We hypothesized that the simple four-factor structure of the AQ would provide suboptimal fit in the current sample. We expected appreciable improvement in fit to occur upon removal of the two negatively keyed items. The simple four-factor structure and the hierarchical model were expected to provide good fit for the AQ-SF. We also assessed the relationships between AQ and AQ-SF total and subscale scores and estimated risk of sexual recidivism. We expected that higher AQ total scores would be significantly associated with greater estimated risk of sexual recidivism and that this association would remain significant with the AQ-SF.
Method
Participants
The sample consisted of adult men enrolled in the NaSOP between 1994 and 2005. NaSOP is a cognitive-behavioral treatment program that is delivered by the CSC both in correctional institutions and in the community. It is aimed at reducing the risk of recidivism among sexual offending men by enhancing self-management skills and by addressing specific areas of psychosocial functioning, including cognitive distortions, sexual deviance, social skills, anger and emotional management, and victim empathy (CSC, 2009).
The AQ was part of an assessment battery administered prior to, and upon completion of, treatment (Nunes & Cortoni, 2008). Of the total sample of 523 adult male sexual offenders in the database, a subsample of 305 had pretreatment item-level data on the AQ. Of these, 12 (<5%) had missing data on one or more AQ items. Missing data analyses indicated that the data were missing completely at random (MCAR). Thus, the 12 cases with missing data were excluded and the remaining analyses were conducted with a sample of 293 offenders. All but three offenders included in the current study were incarcerated.
Participants’ age ranged from 19 to 77, and the average age was 44.31 years (SD = 13.03). The majority of the offenders were White (81.2%, n = 237), 8.2% (n = 24) were Aboriginal, 4.5% (n = 13) were Black, and 1.7% (n = 5) were Inuit. Other races accounted for approximately equal proportions of the remaining offenders for a total of 4.1% (n = 12). Two offenders were of unknown race (0.7%). In terms of past and current sexual offense convictions, 15.7% (n = 46) of the total sample had offended exclusively against children younger than 12 years of age, 47.4% (n = 139) had no victims younger than 12, while the remaining 36.9% (n = 108) had both types of victims. Static-99 (Hanson & Thornton, 2000; described in the “Measures” section) scores were available for 214 offenders. Expressed as percentages of the full sample (N = 293), 14.3% (n = 42) were at low risk, 22.5% (n = 66) were at moderate–low risk, 21.8% (n = 64) were at moderate–high risk, and 14.3% (n = 42) were at high risk to reoffend sexually.
Measures
The AQ
The AQ (Buss & Perry, 1992) consists of a total of 29 items divided into four subscales: Physical Aggression (nine items), Verbal Aggression (five items), Anger (seven items), and Hostility (eight items). Participants responded on a 5-point Likert-type scale from 1 (extremely uncharacteristic of me) to 5 (extremely characteristic of me). Total scores can range from 29 to 145, with higher total scores indicating greater aggressiveness. The AQ has been found to have good internal consistency (α = .80-.93) and test–retest reliability (r = .72-.80) in nonforensic and forensic samples (Archer, 2004; Bryant & Smith, 2001; Buss & Perry, 1992; Harris, 1997; Kelley & Lambert, 2012; Mills & Kroner, 2003; Williams et al., 1996). Previous research also supports the overall construct validity of the AQ, as indicated by significant positive correlations with peer-nominated aggressiveness among students (r = .21-.45, n = 98, p < .05; Buss & Perry, 1992), violent institutional infractions among offenders (Physical Aggression subscale, d = 0.77, p < .05, N = 1,271; Diamond & Magaletta, 2006), and greater treatment intensity in samples of sexual offenders (r = .17-.21, n = 171; Olver, Kingston, Nicholaichuk, & Wong, 2013).
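The scoring rule described above can be illustrated with a minimal sketch. It assumes the two negatively keyed items are Items 7 and 18, as stated in the Data Screening section, and that reversal maps a rating x to 6 − x; the function name `aq_total` and the dictionary input format are hypothetical and chosen only for illustration.

```python
# Items 7 and 18 are the negatively keyed items named in the Data Screening section.
REVERSE_KEYED = {7, 18}
N_ITEMS = 29
SCALE_MIN, SCALE_MAX = 1, 5


def aq_total(responses):
    """Return the AQ total score.

    responses: dict mapping item number (1..29) to a raw rating from 1 to 5.
    """
    if set(responses) != set(range(1, N_ITEMS + 1)):
        raise ValueError("expected responses for items 1 through 29")
    total = 0
    for item, rating in responses.items():
        if not SCALE_MIN <= rating <= SCALE_MAX:
            raise ValueError(f"item {item}: rating {rating} is outside 1-5")
        # Reverse-keyed items are flipped so that a higher scored value
        # always indicates greater aggressiveness.
        if item in REVERSE_KEYED:
            rating = SCALE_MIN + SCALE_MAX - rating
        total += rating
    return total
```

With this rule, the lowest possible scored total (29) corresponds to raw ratings of 1 on the 27 positively keyed items and 5 on the two reversed items, and the highest (145) to the opposite pattern.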
Static-99
The Static-99 (Hanson & Thornton, 2000) is an actuarial measure of risk of sexual recidivism consisting of 10 static items. Total scores range from 0 to 12, with higher scores indicating greater risk of sexual recidivism. Scores are sorted into four risk categories: low (0, 1), moderate–low (2, 3), moderate–high (4, 5), and high (6+). Recent meta-analytic results indicate that the Static-99 has good predictive validity for sexual recidivism (d = 0.74; Hanson & Morton-Bourgon, 2009). In the current data set, the Static-99 was completed as part of an intake assessment (Nunes & Cortoni, 2008).
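As a brief illustration, the risk categories listed above can be expressed as a lookup. The function name `static99_category` is hypothetical; the score bands are those given in the text.

```python
def static99_category(score):
    """Map a Static-99 total score (0-12) to the risk category used in the text."""
    if not isinstance(score, int) or not 0 <= score <= 12:
        raise ValueError("Static-99 total scores are integers from 0 to 12")
    if score <= 1:
        return "low"
    if score <= 3:
        return "moderate-low"
    if score <= 5:
        return "moderate-high"
    return "high"  # scores of 6 and above
```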
Overview of Procedures and Analyses
Data Screening
A total of 293 offenders had complete pretreatment (item-level) data on the AQ and were included in the current analyses. Pretreatment scores were used rather than posttreatment scores because they were available in greater numbers and allowed for direct comparisons with findings of previous studies of untreated forensic samples (e.g., Williams et al., 1996). Skewness was observed for the majority of the items. Twelve items showed more than modest skew, as indicated by a value ≥ 5.0 (Flora & Curran, 2004). A small number of items were moderately kurtotic. Results of recent simulation studies indicate that using polychoric correlations reduces model bias relative to using Pearson product–moment correlations when the data are ordinal (e.g., Holgado-Tello, Chacón-Moscoso, Barbero-García, & Vila-Abad, 2010). Thus, we analyzed polychoric correlation matrices, and the data were not transformed or otherwise normalized prior to analyses. Of note, original scores on the two negatively keyed items (Items 7 and 18; see Table 3) were reversed prior to analyses.
Factor analyses were conducted in Mplus 6 (L. K. Muthén & Muthén, 1998-2010). This software program was selected because it offers estimation methods that minimize the likelihood of deriving inaccurate parameter estimates and fit values with nonnormal ordinal data (Brown, 2006). The robust weighted least squares estimator (WLSMV) was used to facilitate accurate modeling with ordinal data (Brown, 2006; B. O. Muthén, 2011).
Model Fit Indices
Following convention, both absolute and relative fit indices were included (Jackson et al., 2009; Schreiber et al., 2006). Indices of absolute fit included the χ2 goodness of fit index and the RMSEA (Steiger & Lind, 1980). Significant χ2 values indicate that there is a statistically meaningful difference between the hypothesized model and the structure of the sample data. Because perfectly fitted models are rarely obtained with real data, methodologists recommend interpreting the χ2 index cautiously and in conjunction with indices less sensitive to model misspecification (Browne & Cudeck, 1993; Kline, 2011; Sun, 2005), and that is how it was used in the current study. When the WLSMV procedure is used, the χ2 value does not follow the appropriate distribution, thereby limiting its utility in the context of model comparisons (Brown, 2006); thus, the nested χ2 (χ2diff) was also included to facilitate direct between-model comparisons. Recommended cutoffs for the RMSEA generally range from ≤ 0.05 to ≤ 0.08 (Browne & Cudeck, 1993; Hu & Bentler, 1999; Schermelleh-Engel, Moosbrugger, & Müller, 2003; T. A. Schmitt, 2011; Sun, 2005). The RMSEA is sensitive to model complexity with small to moderate sample sizes (Breivik & Olsson, 2001). Given that the current analyses included models that vary appreciably in terms of model complexity, we selected a relatively liberal cutoff value of ≤0.08 for the RMSEA. Three relative fit indices were also examined: the CFI (Bentler, 1990), the TLI (Tucker & Lewis, 1973), and the weighted root mean square residual (WRMR; L. K. Muthén & Muthén, 1998-2010). We selected cutoffs of ≥0.95 for the CFI and TLI, reflecting recent recommendations to employ stricter cutoffs with relative fit indices (Jackson et al., 2009; Schermelleh-Engel et al., 2003). A cutoff of ≤0.95 was selected for the WRMR. 
Localized fit was assessed by careful inspection of the polychoric and residual correlation matrices, as well as factor loadings (λ) and their associated standard errors.
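The global cutoffs selected above (RMSEA ≤ 0.08, CFI ≥ 0.95, TLI ≥ 0.95, WRMR ≤ 0.95) can be summarized in a small sketch. The function name and the dictionary layout are hypothetical, and any values passed to it in an example are illustrative rather than taken from Table 2.

```python
# Cutoffs as stated in the text: (direction, threshold) for each index.
CUTOFFS = {
    "RMSEA": ("<=", 0.08),
    "CFI": (">=", 0.95),
    "TLI": (">=", 0.95),
    "WRMR": ("<=", 0.95),
}


def indices_meeting_cutoffs(fit):
    """Return the names of the fit indices in `fit` that satisfy their cutoff.

    fit: dict with keys "RMSEA", "CFI", "TLI", and "WRMR".
    """
    met = []
    for name, (direction, cutoff) in CUTOFFS.items():
        value = fit[name]
        satisfied = value <= cutoff if direction == "<=" else value >= cutoff
        if satisfied:
            met.append(name)
    return met
```

For example, a model with hypothetical values RMSEA = 0.07, CFI = 0.93, TLI = 0.92, and WRMR = 1.20 would satisfy only the RMSEA cutoff.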
Criteria for model acceptance
To maximize consistency and accuracy of model fit evaluations, we established clear a priori criteria for acceptable model fit: (a) three or more fit indices should meet the cutoff criteria specified above, (b) no factor loadings should fall below the conventional cutoff of .40, (c) item cross-loadings should not exceed .40, and (d) few residual correlations should fall below −.10 or exceed +.10 (Forero, Maydeu-Olivares, & Gallardo-Pujol, 2009; Kline, 2011).
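The four acceptance criteria can be sketched as a single decision rule. Because the text specifies only that "few" residual correlations may fall outside ±.10, the `max_residual_share` parameter (set here to 5%) is an assumption introduced for illustration, as are the function name and argument layout.

```python
def model_is_acceptable(n_indices_met, loadings, cross_loadings, residuals,
                        max_residual_share=0.05):
    """Apply criteria (a) through (d) to one fitted model.

    n_indices_met: number of global fit indices meeting their cutoffs.
    loadings: standardized loadings of items on their own factor.
    cross_loadings: loadings (or item correlations) on non-target factors.
    residuals: residual correlations.
    max_residual_share: ASSUMED operationalization of "few" (not from the text).
    """
    enough_indices = n_indices_met >= 3                            # criterion (a)
    loadings_ok = all(value >= 0.40 for value in loadings)         # criterion (b)
    cross_ok = all(abs(value) <= 0.40 for value in cross_loadings)  # criterion (c)
    outside = sum(1 for r in residuals if abs(r) > 0.10)
    share_outside = outside / len(residuals) if residuals else 0.0
    residuals_ok = share_outside <= max_residual_share             # criterion (d)
    return enough_indices and loadings_ok and cross_ok and residuals_ok
```

Under this sketch a model fails as soon as any one criterion is violated, which matches the conjunctive wording of the criteria but is otherwise a design choice.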
Power
To determine whether models tested had adequate statistical power to reliably produce accurate estimates of fit indices and single parameter estimates, we used the Monte Carlo techniques available in Mplus 6 for CFA with categorical indicators. Power analyses were reserved for models that provided acceptable fit for the data. In accordance with L. K. Muthén and Muthén’s (2002) recommendations, 10,000 iterations and three seed values (random points at which the sampling begins) were specified. Population parameter values derived from the initial model were inputted.
Starting with power to estimate single parameters, acceptable statistical power was considered present if (a) no parameter bias exceeded 10%, (b) bias in factor loadings and latent factor correlations did not exceed 5%, and (c) coverage values fell between 0.91 and 0.98 (L. K. Muthén & Muthén, 2002).
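The parameter-level criteria can be made concrete with a short sketch. Percentage bias is computed as the difference between the average estimate across replications and the population value, divided by the population value (L. K. Muthén & Muthén, 2002). The function names and the input format are hypothetical.

```python
def percent_bias(population_value, mean_estimate):
    """Percentage bias of the average estimate relative to the population value."""
    return 100.0 * (mean_estimate - population_value) / population_value


def parameter_power_ok(params):
    """Check criteria (a) through (c) for a list of parameters.

    params: list of dicts with keys
        "kind"          -- e.g. "loading", "factor_correlation", or "other"
        "population"    -- population (starting) value
        "mean_estimate" -- average estimate across replications
        "coverage"      -- proportion of replications whose 95% CI covers
                           the population value
    """
    for p in params:
        bias = abs(percent_bias(p["population"], p["mean_estimate"]))
        if bias > 10.0:                                   # criterion (a)
            return False
        if p["kind"] in ("loading", "factor_correlation") and bias > 5.0:
            return False                                  # criterion (b)
        if not 0.91 <= p["coverage"] <= 0.98:             # criterion (c)
            return False
    return True
```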
Turning to global estimates, the following were considered indicative of adequate power to accurately estimate the χ2: (a) the percentage of replications in which the χ2 statistic exceeded its critical value was low; (b) in replications in which the critical value was exceeded, the expected and obtained χ2 were close in terms of absolute value; (c) average χ2 obtained was close to model df; and (d) the variance of the model χ2 across replications was not exceedingly large compared with 2 df. Criteria a and b were also applied to the RMSEA and the WRMR. Behavioral assessments for the CFI and TLI indices are not currently available in Mplus 6.
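Criteria (c) and (d) rest on the fact that a central χ2 variate has mean df and variance 2 df. The sketch below checks this by simulation in plain Python rather than Mplus; the choice of df = 84, the number of replications, and the seed are illustrative assumptions.

```python
import random
import statistics


def simulate_chi_square(df, n_replications, seed):
    """Draw chi-square variates using the Gamma(shape=df/2, scale=2) identity."""
    rng = random.Random(seed)
    return [rng.gammavariate(df / 2.0, 2.0) for _ in range(n_replications)]


# Illustrative values (assumptions, not the authors' Mplus settings).
draws = simulate_chi_square(df=84, n_replications=10_000, seed=1)
mean_chi2 = statistics.fmean(draws)    # should be close to df
var_chi2 = statistics.variance(draws)  # should be close to 2 * df
```

In a well-behaved Monte Carlo run, the average model χ2 and its variance across replications would be expected to lie near these reference values; large departures suggest the statistic is poorly approximated.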
Results
The Simple Four-Factor Model of the AQ
All items were allowed to load freely on their own factor but not on other factors. No correlated error terms were specified. The model was solvable (i.e., overidentified) and no errors were reported. The complete polychoric correlation matrix is available in Table 1. The RMSEA index indicated adequate model fit, whereas the remaining indices suggested inadequate fit (Table 2). Standardized factor loadings (λ) fell below the cutoff of 0.40 for three items. Two of these items were negatively keyed (i.e., Item 7 from the Physical Aggression subscale: “I can think of no good reason for ever hitting another person”; and Item 18 from the Anger subscale: “I am an even-tempered person”). Item 10 from the Verbal Aggression subscale also fell below the cutoff: “I tell my friends openly when I disagree with them.” Table 3 displays item contents, standardized and unstandardized factor loadings, and their associated standard errors.
Table 1. Polychoric Correlation Matrix
Note. Cross-loadings > .40 are in italics. Correlations of items specified to load on the verbal aggression (F2) and anger (F3) factors are bolded. Items of the verbal aggression (F2) and anger (F3) factors that cross-loaded >.40 are in italics and bolded. AQ = Aggression Questionnaire (Buss & Perry, 1992).
Table 2. Fit Indices of the Simple Four-Factor Models of the AQ With 29, 26, and 12 (AQ-SF) Items, and the Hierarchical Model of the AQ-SF
Note. Fit index cutoffs employed in the current study are provided in parentheses. AQ = Aggression Questionnaire; AQ-SF = 12-item short form of the AQ (Bryant & Smith, 2001); RMSEA = root mean square error of approximation; CI = confidence interval; CFI = comparative fit index; TLI = Tucker–Lewis index; WRMR = weighted root mean square residual.
Hierarchical model of the AQ-SF.
p < .01.
Table 3. Standardized and Unstandardized Factor Loadings (Standard Errors) of the Four-Factor Model
Note. Items removed from the respecified 26-item model are bolded. Items of the AQ-SF (12 items) are italicized. AQ = Aggression Questionnaire (Buss & Perry, 1992). Std. = standardized; λ = factor loading (lambda). AQ-SF = Aggression Questionnaire–Short Form.
Sixteen percent of residual correlations fell outside the acceptable range of −.10 to +.10. The rates of over- and underestimation of item correlations were approximately equal. A number of items loaded heavily on more than one factor (λ > .40; see Table 1). Not surprisingly given the magnitude of the cross-loadings of many items, two of the estimated latent factor correlations were very large (i.e., Φ ≥ .85; Brown, 2006). Very large estimated factor correlations indicate that the hypothesized latent factors may not be distinct (see Table 4).
Table 4. Estimated Latent Factor Correlations for the Simple Four-Factor Model of the AQ With 29, 26, and 12 Items (AQ-SF), and the Hierarchical Model of the AQ-SF
Note. Aggressiveness = second-order latent factor of global aggressiveness. Estimated latent factor correlations suggesting nondistinct factors (Φ ≥ .85) are bolded. AQ = original 29-item Aggression Questionnaire. AQ-SF = Aggression Questionnaire–Short Form; AQ-26 = Aggression Questionnaire with 26 items (i.e., Items 7, 10, and 18 removed).
Respecification: The Simple Four-Factor Model With 26 Items
In light of the above findings, we respecified the original model. The three low-loading items were removed and the fit of the four-factor model was reexamined with the remaining 26 items. As indicated in Table 2, removal of the three low-loading items resulted in slightly more favorable χ2diff, CFI, TLI, and WRMR values. Nevertheless, only the RMSEA met criteria for adequate model fit. No factor loadings fell below 0.40. Twelve percent of residual correlations fell outside the acceptable range. Close inspection of the polychoric correlation matrix revealed that the poor fit of the respecified 26-item four-factor model was due to a number of items that cross-loaded heavily; items of the Verbal Aggression and Anger subscales were particularly problematic in this regard (Table 1).
The Simple Four-Factor Model of the AQ-SF
Next we tested the simple four-factor model with the 12-item AQ-SF. The model χ2 was significant, and χ2diff and RMSEA values did not differ appreciably from those obtained for the 29-item and 26-item four-factor models described above (refer to Table 2). The CFI, TLI, and WRMR indicated good fit. A single residual correlation fell outside the acceptable range (i.e., −.15, Item 2 with Item 4). The estimated latent factor correlations indicated that all factors were strongly related (Table 4). This simple four-factor model of the AQ-SF is shown in Figure 1.
Figure 1. The Four-Factor Model of the AQ-SF
Power
No estimated bias in factor loadings exceeded 5% (M = 1.07). All coverage values fell between 0.91 and 0.98. The 95% CI included the “true” population value 100% of the time. Average bias in estimated latent factor correlations was < 2%, and only one bias exceeded 5%. The proportion of replications in which the model χ2 exceeded the critical value was modest (6%), and in replications in which it was exceeded, the values of observed and expected χ2 were similar in terms of absolute magnitude (106.858 vs. 106.395). The size of the average χ2 value across replications was close to that of model df (M χ2 = 87.50, df = 84). However, the variance of χ2 was exceedingly large compared with 2 df. Taken together, power diagnostics indicated that single model parameters and the χ2 index were moderately well approximated, whereas the RMSEA and the WRMR indices were both well approximated according to the criteria above.
The Hierarchical Model of the AQ-SF: Global Aggressiveness
As noted above, the assumption that the four latent factors corresponding to the four subscales of the AQ (and AQ-SF) measure distinct but related subtraits of overall trait aggressiveness was also of interest in the current study. Although the estimated latent factor correlation between the anger and verbal aggression factors suggested that these factors may not be distinct (Φ = .90), the hierarchical model was nevertheless tested for the purpose of further investigating the relationships between factors. Model fit did not degrade noticeably as a result of the inclusion of the second-order latent factor of global aggressiveness (Table 2). Interestingly, the first-order anger factor shared close to 100% of its variance with the higher order factor. The verbal aggression factor also correlated very strongly with global aggressiveness (Table 4).
AQ Scores, Offender Subtypes, and Risk of Recidivism
We also examined average scores, standard deviations, and internal consistency estimates of the AQ and the AQ-SF, as well as offender subtypes and bivariate correlations between AQ and AQ-SF total and subscale scores and scores on a measure of risk of sexual recidivism (i.e., the Static-99 [n = 214]). Given that the simple four-factor model of the reduced 26-item AQ provided inadequate fit in the current sample, we limited our investigations to the original AQ (29 items) and the AQ-SF. The AQ-SF contains none of the items that were problematic in terms of low factor loadings in the CFA of the original AQ. Means, standard deviations, internal consistency estimates, and bivariate correlations are available in Table 5. Coefficient alpha can become distorted when used as an index of reliability with ordinal data. Therefore, ordinal alpha was computed for the AQ based on output from the CFA models (Table 5; see Zumbo, Gadermann, & Zeisser, 2007, for the formula employed to derive ordinal alpha).
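To illustrate the general idea behind ordinal alpha, the sketch below applies the standardized-alpha formula, α = k r̄ / (1 + (k − 1) r̄), to a correlation matrix. When that matrix holds polychoric rather than Pearson correlations, the result is an alpha for the assumed underlying continuous responses. This is an assumption-laden simplification: the values reported in Table 5 were derived from CFA output using the formula in Zumbo et al. (2007), which may not coincide exactly with this calculation, and the function name is hypothetical.

```python
def alpha_from_correlations(corr):
    """Standardized alpha from a k x k correlation matrix (list of lists).

    Passing a polychoric correlation matrix gives an ordinal-alpha-style
    estimate; passing a Pearson matrix gives ordinary standardized alpha.
    """
    k = len(corr)
    if k < 2 or any(len(row) != k for row in corr):
        raise ValueError("corr must be a square matrix with at least 2 items")
    # Average of the off-diagonal (inter-item) correlations.
    off_diagonal = [corr[i][j] for i in range(k) for j in range(k) if i != j]
    mean_r = sum(off_diagonal) / len(off_diagonal)
    return k * mean_r / (1.0 + (k - 1) * mean_r)
```

For three items that all intercorrelate at .50, the function returns .75.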
Table 5. Means, (Standard Deviations), Internal Consistency, and Bivariate (Pearson’s r) Correlations for Total Sample
Note. Correlations were significant at p < .01 unless otherwise specified. AQ = Aggression Questionnaire (Buss & Perry, 1992); AQ-SF = Aggression Questionnaire–Short Form. Internal consistency values and bivariate correlations for the AQ-SF are in [brackets].
Not significant.
p > .05.
Offender Subtypes
Given that sexual offenders are heterogeneous, it appears plausible that the factor structure of the AQ differs between subgroups of offenders. Unfortunately, the size of the current sample did not allow for separate CFA of the AQ with different sexual offender subtypes. We conducted a preliminary investigation of potential differences in AQ and AQ-SF total and subscale scores between offenders with young victims (i.e., <12 years of age exclusively), older victims (i.e., ≥ 12 years of age exclusively), and offenders with both young and older victims. In terms of aggressiveness, sexual offenders with young victims scored significantly higher on the Hostility subscale than sexual offenders with older victims (d = 0.39, 95% CI = [0.06, 0.73]). Sexual offenders with older victims scored significantly higher on Physical Aggression than offenders with both types of victims (d = 0.27, 95% CI = [0.02, 0.52]). No other group comparisons yielded significant effect sizes.
Risk of Recidivism
Bivariate (Pearson product–moment) correlations indicated that higher AQ total and Physical Aggression and Anger subscale scores were significantly associated with greater estimated risk of sexual recidivism. Verbal Aggression and Hostility subscale scores were not significantly related to risk of recidivism. Bivariate correlations were not always smaller for the AQ-SF (Table 5).
Two multiple regression analyses were conducted to determine the degree to which subscales of the AQ and AQ-SF were independently associated with estimated risk of sexual recidivism (i.e., Static-99; n = 214). On the AQ, only the Physical Aggression subscale was significantly and independently associated with estimated risk of sexual recidivism (B = 2.26, SE = 0.54, 95% CI = [1.20, 3.32]; β = .44, p < .05). This was also the case with the AQ-SF (B = 2.73, SE = 0.43, 95% CI = [1.88, 3.58]; β = .40, p < .05).
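The reported confidence intervals can be related to the unstandardized coefficients and their standard errors with a short calculation. The sketch below computes Wald-type intervals as B ± z × SE from the values given above. The use of a normal critical value is an assumption; intervals based on the t distribution would be marginally wider. The function name is hypothetical.

```python
from statistics import NormalDist


def wald_ci(estimate, std_error, level=0.95):
    """Normal-approximation confidence interval: estimate ± z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


# B and SE for Physical Aggression as reported in the text.
aq_ci = wald_ci(2.26, 0.54)      # full AQ
aq_sf_ci = wald_ci(2.73, 0.43)   # AQ-SF
print([round(x, 2) for x in aq_ci])
print([round(x, 2) for x in aq_sf_ci])
```

Under this approximation, the AQ interval matches the reported bounds after rounding and the AQ-SF interval falls within about .01 of them, a difference that would be expected if the reported intervals were based on a t critical value.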
Discussion
The main goal of the study was to examine whether the simple four-factor model of the AQ (Buss & Perry, 1992), and its hierarchical extension, would replicate with sexual offenders. As hypothesized, the simple four-factor structure did not provide adequate fit for the AQ. Contrary to expectations, adequate fit was also not obtained when the number of items was reduced to 26, with the three low-loading items excluded. Although the simple four-factor structure and the hierarchical model provided good fit for the 12-item AQ-SF (Bryant & Smith, 2001), very large latent factor correlations indicated that the scale may not be measuring four distinct factors in the current sample.
Given that the factor analyses did not yield an unambiguously supported model, we examined the psychometric characteristics of the originally proposed AQ and AQ-SF factors. With regard to reliability, internal consistencies of the AQ and the AQ-SF generally exceeded the conventional threshold value of .70 (N. Schmitt, 1996) and were comparable with those reported in previous studies with nonoffenders and with offenders (e.g., Dahlen, Martin, Ragan, & Kuhlman, 2004; Williams et al., 1996). With regard to construct validity, both the AQ and AQ-SF total scores had small to moderate correlations with estimated risk of sexual recidivism. Multiple regression analyses indicated that only the Physical Aggression subscale was independently associated with estimated risk of sexual recidivism. Average aggressiveness scores obtained with the current sample did not differ appreciably from mean scores reported for student samples in previous studies (e.g., Buss & Perry, 1992). Higher hostility scores were found for offenders with younger victims than those with older victims, and higher physical aggression scores were found for offenders with older victims than those who had both younger and older victims.
Latent Dimensions of the AQ
There are multiple possible explanations for the unexpectedly large estimated latent correlations in the current study. It is possible that multicollinearity between factors reflects common complications associated with the application of restrictive CFA models to multidimensional constructs. Specifically, latent factor correlations may be grossly inflated due to relatively small cross-loadings among items that have been erroneously set to zero (e.g., Asparouhov & Muthén, 2009; Marsh et al., 2011; McCrae, Zonderman, Costa, Bond, & Paunonen, 1996). Given that affective and cognitive components of aggressiveness may be mutually reinforcing and are thought to facilitate aggressive behavior, items representing these different subtraits of aggressiveness should correlate to some extent. In the current context then, the fact that we set item cross-loadings to zero could conceivably lead to the problem of inflated latent factor correlations. However, the fact that the Verbal Aggression items tended to load more heavily on the anger factor than on their own factor cannot easily be attributed to the methodological restrictions associated with CFA. Numerous successful applications of CFA to the AQ in previous research cast further doubt on the current findings being attributable to problems with CFA as an analytic technique (e.g., Bryant & Smith, 2001; Buss & Perry, 1992; Diamond & Magaletta, 2006; Diamond et al., 2005; Harris, 1995).
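The inflation argument can be made concrete with a small population-level calculation. The sketch below assumes a deliberately simplified, symmetric two-factor model in which every item has the same primary loading, the same small cross-loading on the other factor, and unit variance. It then derives the factor correlation that a simple-structure model (cross-loadings fixed to zero) would need to reproduce the same item correlations. The loading values are hypothetical and are not estimates from the current data.

```python
def implied_simple_structure_phi(primary, cross, phi):
    """Factor correlation implied when cross-loadings are forced to zero.

    Population model: items in block A are primary*F1 + cross*F2 + e and
    items in block B are cross*F1 + primary*F2 + e, with corr(F1, F2) = phi.
    """
    # Correlation between two items belonging to the same block.
    within = primary ** 2 + cross ** 2 + 2 * primary * cross * phi
    # Correlation between an item in block A and an item in block B.
    between = 2 * primary * cross + (primary ** 2 + cross ** 2) * phi
    # With equal loadings, simple structure reproduces these correlations
    # exactly only if loading**2 = within and loading**2 * phi' = between.
    return between / within


true_phi = 0.5
inflated_phi = implied_simple_structure_phi(0.7, 0.2, true_phi)
print(round(inflated_phi, 2))
```

In this illustration, cross-loadings of only .20 are enough to raise the factor correlation implied by the restricted model well above the value used to generate the item correlations, which is the mechanism described above.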
Another possible explanation for the multicollinearity between latent factors is the presence of differential item distributions (Bernstein & Gesn, 1997). In the present case, however, only a minority of items from the Verbal Aggression and Anger subscales were similarly distributed. Moreover, the anger and verbal aggression factors remained very strongly related even when the AQ-SF was tested (i.e., Φ = .90), despite the absence of differentially distributed anger items in this model. In the current study then, problematic multicollinearity between latent factors is unlikely to be attributable to item distributions.
When multicollinearity between the latent factors is considered in isolation, findings of the current study could also be consistent with the idea that sexual offenders, particularly those with adult victims, may have more trait anger than nonsexual offenders and nonoffenders (Gannon, Collie, Ward, & Thakker, 2008). However, the mean scores on the Anger subscale for the sample as a whole, and for sexual offenders with older victims specifically, are not appreciably different from averages obtained with student samples (e.g., Dahlen et al., 2004) and are somewhat lower than the average obtained with nonsexual offenders (Williams et al., 1996). Indeed, some researchers have suggested that it may be the ability to regulate anger rather than the intensity or frequency of angry emotions that differentiates sexual offenders from other men (Lee et al., 2001).
A more obvious interpretation of the high correlations between anger and verbal aggression is that they reflect a common factor. This would be consistent with the contention that the items of the Verbal Aggression subscale assess the verbal expression of anger (Mills & Kroner, 2003) and with previous findings of large intercorrelations between the Verbal Aggression subscale and measures of anger, such as the Anger subscale of the AQ (e.g., Hornsveld, Muris, Kraaimaat, & Meesters, 2009; Kelley & Lambert, 2012; von der Pahlen, Sarkola, Seppä, & Eriksson, 2002) and the Anger Expression (Out) subscale of the State-Trait Anger Expression Inventory 2 (STAXI-2; r = .61; Dahlen et al., 2004).
More generally, it is important to note that the high degree of overlap between latent factors is not a finding unique to the current investigation. Studies using students and nonsexual offenders have commonly found large correlations between AQ total scores and various measures of anger (e.g., r = .79, n = 224, Dahlen et al., 2004; r = .45-.76, n = 138-206, Hornsveld et al., 2009; r = .86, n = 95, Lindqvist, Dåderman, & Hellström, 2005), reactive aggression as compared with instrumental aggression (r = .47 vs. r = .19, n = 127; Haden, Scarpa, & Stanford, 2008), impulsivity (r = .70, n = 160; Archer et al., 1995), as well as negative emotionality (r = .52, n = 234; Sharpe & Desai, 2001). Hornsveld and colleagues (2009) reported large intercorrelations between AQ (and AQ-SF) scores and measures of state and trait anger among forensic patients and students, whereas scores on measures of psychopathy and antisocial lifestyle were unrelated to aggressiveness as measured by the AQ and the AQ-SF. In light of the findings of previous studies, it appears plausible that the current results reflect the nature of the AQ as a measure of impulsive anger and its reactive behavioral manifestations (Haden et al., 2008), rather than global aggressiveness.
The current study highlights the importance of thorough examinations of the performance of psychological measures in the various populations in which they may be used (Brown, 2006). The method of assessment of psychological traits and the subsequent interpretation may have important practical implications. For example, if the AQ is a measure of reactive aggressiveness, it is possible that some aspects of aggressiveness are not being attended to. Thus, aggressiveness in a subset of sexual offenders whose offenses may not be primarily affectively driven (e.g., sexual offenders with strong psychopathic traits) may not be appropriately measured. Moreover, intervention and treatment strategies that are appropriate for one group may be ineffective, or even counterproductive, with another group.
Given that sexual offenders with younger victims scored significantly higher on the Hostility subscale than sexual offenders with older victims, some items on the AQ (e.g., Item 27: “I am suspicious of overly friendly strangers”) may reflect fearfulness, which one might expect from many child sexual offenders in a prison. Some Hostility items also appear to reflect grievance thinking (e.g., Item 25: “I wonder why sometimes I feel so bitter about things”). Hostile cognitions, particularly dangerous world beliefs and grievance thinking, may contribute to directing sexual attention to persons perceived as less threatening, such as children (Marziano, Beech, Ward, & Pattison, 2006). For some offenders, children may also provide the recognition and respect of which these men feel they have been unjustly deprived (Ward, 2000). In addition, increased hostility among offenders with younger victims may accurately reflect their social status. Some etiological theories suggest that relative competitive disadvantage or low social status may be related to offending for some men who sexually abuse children (Seto, 2008, pp. 164-182). However, low social status among sexual offenders against children may also be a consequence of the legal and social sanctioning processes that follow from offending (e.g., Levenson, D’Amora, & Hern, 2007).
The current study has some important limitations. First, the sample was too small to lend itself to cross-validation procedures that could help increase confidence in the results. In addition, given that the restrictive sample size of the present study prevented examination of the factor structure separately among subtypes of sexual offenders, it is unknown whether the current findings would replicate in more homogeneous subgroups (e.g., sexual offenders against adults). However, this is the first study that we know of to examine the factor structure of the AQ with sexual offenders, and our sample size was adequate for the CFA we conducted. Second, the present study is also subject to limitations associated with CFA as a tool for the examination of unobservable psychological constructs. CFA is by its nature subjective, and researcher bias and misinterpretation of various indicators of model fit are ever-present threats to sound model evaluation (Kline, 2011). It should be noted, however, that we made every effort to reduce the risk of biased model evaluation by establishing clear a priori criteria for adequate model fit.
Future Directions
Future studies should further examine the factor structure of the AQ among sexual offenders and the construct(s) reflected by the subscales/factors. Given the heterogeneity of the sexual offender population, future studies may benefit from using multiple samples that are large enough to permit separate investigations of the factor structure among more homogeneous subgroups of sexual offenders, and for cross-validation more generally (Kline, 2011). Future studies in offender populations should also further examine the AQ-SF. Although the AQ-SF did not provide an overall improved measurement model in the current study, it retained meaningful associations with estimated risk of sexual recidivism. Given that measurement of psychological characteristics is often complicated in correctional settings, partly due to budgetary concerns, understaffing, and time restrictions, measures that are quick to complete, inexpensive, and predictive of specific behaviors can be especially useful (Diamond & Magaletta, 2006). In this regard, the AQ-SF may be more suitable than the AQ.
Further examination of the AQ as a measure of aggressiveness may also be warranted. As mentioned above, some previous studies have found that the AQ has relatively modest discriminant validity (e.g., Archer et al., 1995), and severe multicollinearity between factors was uncovered in the current study. In addition, the content of the majority of items of the AQ appears to indicate a degree of emotional dysregulation (e.g., Item 1: “Once in a while I cannot control the urge to hit another person”), whereas relatively few items seem to reflect carefully calibrated aggressive acts in pursuit of specific goals (i.e., instrumental aggression). Anger, impulsivity, and aggressiveness are likely closely related constructs and may all be relevant to sexual offending research and treatment. Nevertheless, determining which of these constructs the AQ measures is arguably of importance to our understanding of the psychology of sexual and nonsexual violence. Without such clarity, intervention and treatment efforts may be less effective than they otherwise could be.
Footnotes
The views expressed are those of the authors and do not necessarily represent the views of the Correctional Service of Canada.
