Abstract
Although actuarial instruments are ubiquitously used in the field of sex offender recidivism risk assessment, there is limited empirical information about the underlying constructs from which they are derived. The following study utilized a nonparametric item response theory procedure, a Mokken analysis, and nonlinear factor analysis (Normal Ogive Harmonic Analysis Robust Method) to explore the underlying constructs of the STATIC-99 scores obtained for male sexual offenders (N = 451) referred for an evaluation to determine if they met criteria for civil commitment under a state’s Sexually Violent Persons Law. The results from the analyses indicated that the STATIC-99 comprises the two previously identified constructs associated with sexual deviancy and antisocial behaviors as well as a third, additional construct, associated with the items pertaining to age and past marital-type relationships. These findings support Hanson and Thornton’s assertion that sexual offender recidivism risk is multifactorial and not the result of a single underlying trait. Implications for future research are discussed.
Introduction
Considered to be the most widely used actuarial risk assessment instrument for adult male sex offenders, the STATIC-99 is derived from static, or unchangeable, factors empirically associated with sexual recidivism (Harris, Phenix, Hanson, & Thornton, 2003). It has been cross-validated by dozens of studies that have consistently found moderate predictive accuracy across numerous samples of male sexual offenders (Hanson & Thornton, 1999). Developed by combining two instruments, the Rapid Risk Assessment for Sex Offence Recidivism (Hanson, 1997) and the Structured Anchored Clinical Judgment (Grubin, 1998), the STATIC-99 comprises related but not identical constructs that contribute unique variance to regression equations when the total scores are used to predict sexual recidivism (Hanson & Thornton, 1999). Consequently, as described by its authors, the STATIC-99 is based on an empirically derived multifactorial conceptualization of sexual recidivism risk represented by a single, unified score.
Despite the practical utility of the STATIC-99 for assessing sexual recidivism, some research has suggested that the latent psychological constructs it assesses are not fully understood (Babchishin, Hanson, & Helmus, 2011). For example, Babchishin et al. (2011) found that although the items of the RRASOR (Rapid Risk Assessment for Sex Offence Recidivism) are included in the STATIC-99, the two scales had an opposite relationship with violent recidivism indicating that they appear to be sampling different domains. This finding is unexpected given that the STATIC-99 is conceptualized as containing items measuring the two primary domains of the RRASOR and SACJ (Structured Anchored Clinical Judgment): sexual deviance and nonsexual criminal history or antisocial behavior (Hanson & Thornton, 1999). As concluded by Babchishin and colleagues (2011),
Identifying the constructs being measured requires both theory and empirical evidence; without such evidence, reliability between assessors concerning the latent constructs would be expected to be low. (p. 17)
In fact, the notion that observable phenomena are influenced by underlying and unobserved causes or traits has been widely used to conceptualize psychological characteristics (Glymour, Scheines, Spirtes, & Kelly, 1987). As suggested by some researchers, a majority of psychological tests are actually designed to measure or evaluate latent constructs, and they have practical importance to the extent that they are related to the outcomes under consideration (Babchishin et al., 2011). As concisely stated by Bollen (2002),
. . . latent variables provide a degree of abstraction that permits us to describe relations among a class of events or variables that share something in common, rather than making highly concrete statements restricted to the relation between more specific, seemingly idiosyncratic variables. In other words, latent variables permit us to generalize relationships. (p. 606)
Even though the use of latent variables is pervasive throughout many areas of science, there is not a single definition that would encompass the numerous statistical and data analysis models used to evaluate them (Bollen, 2002). Instead, the various common definitions of latent variables refer to variables not present in the data set, either implicitly or explicitly (e.g., unmeasured variables, factors, unobserved variables, constructs, true scores) and are closely tied to specific statistical models (Bollen, 2002). Despite these conceptual ambiguities, however, latent constructs may be generally distinguished by nonformal or formal definitions. With regard to nonformal definitions, there are three main types: (a) hypothetical constructs or variables (Harman, 1960), (b) unobservable or unmeasured variables (Jöreskog & Sörbom, 1979), and (c) data reduction devices (Harman, 1960). These definitions, taken either individually or combined, appear best suited to exploratory analyses where the relationship between observed and unobserved variables is not specified in advance.
In addition to the above-described definitions, there are also several “formal” definitions including those based on local independence, expected values, nondeterministic functions of observed values, and sample realization (Bollen, 2002). Although an in-depth discussion of these definitions is outside the scope of this article, it is important to note that they apply to the common statistical models used to analyze latent variables (e.g., multiple regression, logistic regression, factor analysis, latent curve models, item response theory, latent class analysis, and structural equation models). Consequently, there is no right or wrong approach to defining latent variables, and the research question is more about finding the definition that is the most useful insofar as it corresponds to a common understanding of what variables should be considered latent.
This article analyzed the latent trait structure of the STATIC-99 by utilizing two of the most common statistical models, factor analysis and item response theory, to empirically evaluate whether its latent factors or traits were consistent with the theoretical constructs asserted by previous research. For example, in a two-part study, Roberts, Doren, and Thornton (2002) examined 103 male sexual offenders who completed their prison terms and were in the process of being evaluated for possible civil commitment under a Sexually Violent Persons Act. Using a principal component analysis of five measures of actuarial risk (the STATIC-99 was removed due to item redundancy with the RRASOR), they found a two-factor solution with the components Antisocial/Violence and Pedophilic Deviance/Repetitiveness. A major limitation of their analysis, however, was that they used only total scale scores or combinations of individual risk indicators. As they noted, other factors may have emerged if the individual items of the actuarial assessments were analyzed.
In the second part of their study, Roberts et al. (2002) evaluated 393 sexual offenders in England and Wales released from prison at the conclusion of their sentence. Combining the individual risk indicators of the STATIC-99 and Risk Matrix 2000, they found that in addition to general criminality and sexual deviance, another factor, “detachment,” contributed to the prediction of sexual reconviction. According to their analysis, this third factor loaded on the items age, stranger victim, single, and index nonsexual violence. These results were replicated in subsequent analyses that identified similar factorial structures relating to the prediction of sexual recidivism as well as other significant factors such as criminal thinking (Barbaree, Langton, & Peacock, 2006; Walters, Deming, & Elliott, 2009).
Although general models of static factors associated with sexual recidivism have been previously identified, none have examined the structure of specific STATIC-99 items for sexual offenders identified as high risk. As a major goal of this study was to explore patterns in the data, not to explicitly test stated hypotheses, an exploratory factor analysis was utilized because it imposes no substantive constraints on the data (Albright & Hun Myoung, 2009). That is, unlike other types of factor analysis (i.e., confirmatory factor analysis), there are no restrictions on the pattern of relationships between observed and latent variables. Moreover, by using a nonparametric item response procedure, this study also evaluated whether the relationships between items are indicative of a unified latent trait or if they represent an empirically derived multifactorial conceptualization of sexual recidivism risk characterized by a single, unified score. Arguably, the theoretical constructs from which recidivism risk assessments are derived are becoming increasingly important because the theoretical perspectives used to interpret them (e.g., Bayesian versus Frequentist) significantly impact the prediction of risk (Elwood, 2013). Specifically, the field of sexual offender risk assessment must eventually resolve a central question: Is sexual recidivism the result of a psychological trait(s), a behavioral outcome resulting from many factors, or some combination of both?
Method
Subjects
This study utilized archival data for 451 convicted male sexual offenders who were referred for an evaluation by licensed psychologists between 1999 and 2011. The primary inclusion criterion for this sample was that the record contained complete STATIC-99 and RRASOR data, that is, total scores and individual item scores for the subjects. Although all 451 subjects were classified as high risk for sexual recidivism by a Department of Corrections review board at the conclusion of their prison sentence, only 146 were subsequently found, by psychological evaluation, to meet the criteria for civil commitment under a Sexually Violent Persons Act. This act requires that to meet the commitment standards, a person must have a mental disorder, acquired or congenital, that affects his emotional or volitional capacity making it more likely than not (greater than 50% probability) that he will commit sexually violent acts. Consequently, of the total sample of subjects, almost one third was determined to possess a characteristic(s) indicative of sexual recidivism not identified in the other subjects.
Analysis
The first step of the analysis was to evaluate demographic and other characteristics relevant to recidivism risk for male sexual offenders including age (Thornton, 2006), actuarial scores (Babchishin et al., 2011), presence of deviancy (Hanson & Bussière, 1998), and treatment completion (Hanson et al., 2002). An additional variable relating to sexual misconduct while incarcerated was also evaluated. Although it is questionable if sexual misconduct during periods of institutionalization relates to recidivism risk, this behavior is often considered during the evaluation process.
The subjects were then separated into two groups: those who were determined to meet criteria for civil commitment, or possess a sexual recidivism characteristic, and those who did not, although still identified as high risk. An area under the curve was calculated for each variable to determine if there were any statistically significant differences. For the total scores of the STATIC-99 and RRASOR, the area under the curve (AUC) was calculated from the receiver operating characteristic (ROC) curve, which can be approximated by connecting the data points (sensitivity, 1 − specificity); the AUC is then estimated using the trapezoidal rule (Lasko, Bhagwat, Zou, & Ohno-Machado, 2005). The calculated AUC has been shown to be equivalent to the Mann-Whitney U statistic normalized by the number of possible pairings. The use of this approach is advantageous because it imposes no structural assumptions on the data and allows for the calculation of confidence intervals. In addition, a Fisher’s exact test was conducted for the variables paraphilia, past treatment, recent treatment, and conduct reports. The Fisher exact test is a significance test for a 2 × 2 table that evaluates all distribution probabilities and produces a probability for a given set of frequencies. The null hypothesis is that the row and column variables are unrelated. Finally, a t test was used to compare the mean age of the subjects referred for commitment with that of those who were not, based on the null hypothesis that the difference is equal to 0 (MedCalc, 1993-2012).
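The equivalence between the trapezoidal AUC and the normalized Mann-Whitney U statistic can be illustrated with a minimal sketch. The function names and the example scores below are hypothetical and are not taken from the study data; ties between groups are counted as one half in the U statistic, which is an assumption of this illustration.

```python
from itertools import product


def auc_trapezoid(pos, neg):
    """AUC of the empirical ROC curve, estimated with the trapezoidal rule."""
    thresholds = sorted(set(pos) | set(neg), reverse=True)
    # Each point is (1 - specificity, sensitivity); the curve starts at (0, 0).
    points = [(0.0, 0.0)]
    for t in thresholds:
        sensitivity = sum(s >= t for s in pos) / len(pos)
        false_pos_rate = sum(s >= t for s in neg) / len(neg)
        points.append((false_pos_rate, sensitivity))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def auc_mann_whitney(pos, neg):
    """Mann-Whitney U (ties count one half) divided by the number of pairings."""
    u = sum(1.0 if p > n else 0.5 if p == n else 0.0
            for p, n in product(pos, neg))
    return u / (len(pos) * len(neg))


# Hypothetical total scores, for illustration only.
committed = [6, 7, 5, 8, 6]
not_committed = [4, 5, 3, 6, 5]
print(auc_trapezoid(committed, not_committed))
print(auc_mann_whitney(committed, not_committed))
```

Both functions return the same value for the same input, which is why no distributional assumptions are needed to interpret the AUC as the probability that a randomly chosen member of one group scores higher than a randomly chosen member of the other.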
The next part of the analysis was based on a nonparametric item response theory model (IRT) that has been shown to be an appropriate approach to understand the underlying structure (i.e., latent trait) of psychological characteristics (Aggen, Neale, & Kendler, 2005; Bolt, Hare, Vitale, & Newman, 2004; Maij-de Meij, Kelderman, & van der Flier, 2008; Meijer & Baneke, 2004; H. A. Miller, Turner, & Henderson, 2009). A primary benefit of identifying a latent trait is that in conjunction with manifest traits, it offers an integrated explanation for the overt or observed behaviors and symptoms (Acton & Zodda, 2005).
The first step of an IRT analysis is to determine if the items fulfill a monotonic homogeneity model (MHM), which means that a greater level of recidivism risk within an individual would be reflected in more assorted forms of behavior or more items on the scale. The assumptions of an MHM are unidimensionality, local independence, and monotonicity. The second step involves determining if the items meet the assumptions of a double monotonicity model (DMM), which includes those of an MHM and also assumes that the item response functions do not intersect. If conditions of monotonic homogeneity and double monotonicity are met, then the sum score yields an ordinal measurement of the underlying trait (Nitschke, Osterheider, & Mokros, 2009). Finally, if the set of items meets the requirements of a Mokken scale, the next analysis involves evaluating whether the scale is of the Guttman type, which indicates that the scale is cumulative, with the highest level of the scale comprising all of the information of the previous levels, and that the total score or “sum score” is an adequate statistic for the trait level.
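One way to make the idea of a cumulative (Guttman-type) scale concrete is to count Guttman errors, that is, cases in which a respondent endorses a less frequently endorsed item without endorsing a more frequently endorsed one. The following is a minimal sketch under that assumption; the function name and the toy data are hypothetical and do not reproduce the MSP5 output.

```python
def guttman_errors(data):
    """Count Guttman errors in a matrix of dichotomous (0/1) item scores.

    Items are ordered from most to least frequently endorsed. An error is
    counted for every respondent and item pair in which the rarer item is
    scored 1 while the more common item is scored 0.
    """
    n_items = len(data[0])
    order = sorted(range(n_items), key=lambda i: -sum(row[i] for row in data))
    errors = 0
    for row in data:
        for a in range(n_items):
            for b in range(a + 1, n_items):
                common, rare = order[a], order[b]
                if row[rare] == 1 and row[common] == 0:
                    errors += 1
    return errors


# Toy data: the last respondent violates the cumulative pattern.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
]
print(guttman_errors(toy))
```

A perfectly cumulative scale yields zero errors, so the sum score alone identifies which items were endorsed.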
This analysis used a nonparametric item response procedure, as described by Sijtsma and Molenaar (2002), which evaluates the responses to items on a test or questionnaire assumed to contain information about the latent (hidden) trait present in the phenomenon being studied. The modern models of item response theory (IRT) are probabilistic: the probability of an affirmative response is high (but not 1) if the person has the trait and the item is easy, and low (but not 0) if the trait is absent and the item is difficult. Thus, the success probability for an item depends on the person’s ability or trait level and is referred to as an item response function (IRF) that is assumed to increase if the person has more of the latent trait. For dichotomous items, the IRF is defined as
Pi(θ) = P(Xi = 1 | θ).
The random variable Xi denotes the item score, which equals 0 for an incorrect or negative response and 1 for a correct or positive response (Sijtsma & Molenaar, 2002). The nonparametric item response theory (NIRT) models assume that an order restriction governs the relationship between Pi(θ) and θ, such that for any pair of values θa and θb with θa < θb, Pi(θa) ≤ Pi(θb). This inequality implies that the IRF is a nondecreasing function of the latent trait θ; thus, the higher the trait, the greater the probability of an affirmative response (Sijtsma & Molenaar, 2002).
The NIRT models must meet the following four assumptions:
Unidimensionality: all items of the test measure the same latent trait.
Local independence: once the latent trait is taken into account, an individual’s response on an item is not influenced by his/her responses to other items.
Monotonicity: the conditional probability of an affirmative response is monotonically nondecreasing in the latent trait (monotonicity of IRFs).
Nonintersection: all the IRFs are nonintersecting.
One specific procedure under the MHM is an automated item selection procedure (AISP) that selects items from larger sets into clusters that sufficiently measure the latent trait. Moreover, the AISP selects clusters of items that satisfy the definition of a scale, which is “ . . . a set of dichotomously scored items for which, for a suitably chosen positive constant c . . . ” (Mokken, 1971, p. 184). That is, a scale must have positively correlating items, or items for which all Hijs are positive, all His are at least c, and the total H is at least c. The constant c is represented by Hi ≥ c > 0 for all items i, with c = .3 as the recommended practical minimum lower bound. An algorithm for item selection was proposed by Sijtsma and Molenaar (2002) that included
selecting the item pair with the highest significantly positive Hij;
selecting the item from the remaining item pool that correlates positively with both items, has an Hi that is significantly greater than 0 and at least c, and maximizes the total H for all three items; and
repeating the procedure until none of the remaining items fulfill the above-stated requirements.
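The selection steps listed above can be sketched in code. The following is a simplified illustration, not the MSP5 implementation: it computes Loevinger's scalability coefficients for dichotomous items as the ratio of observed to maximum possible covariance, and it omits the significance tests on Hij and Hi entirely. All function names and the toy data are hypothetical, and every item is assumed to have an endorsement proportion strictly between 0 and 1.

```python
from itertools import combinations


def _cov_and_max(data, i, j):
    """Observed covariance of items i and j and its maximum given the marginals."""
    n = len(data)
    p_i = sum(row[i] for row in data) / n
    p_j = sum(row[j] for row in data) / n
    p_ij = sum(row[i] * row[j] for row in data) / n
    return p_ij - p_i * p_j, min(p_i, p_j) - p_i * p_j


def h_pair(data, i, j):
    cov, cov_max = _cov_and_max(data, i, j)
    return cov / cov_max


def h_item(data, i, items):
    pairs = [_cov_and_max(data, i, j) for j in items if j != i]
    return sum(c for c, _ in pairs) / sum(m for _, m in pairs)


def h_total(data, items):
    pairs = [_cov_and_max(data, i, j) for i, j in combinations(items, 2)]
    return sum(c for c, _ in pairs) / sum(m for _, m in pairs)


def select_scale(data, c=0.3):
    """Bottom-up item selection (simplified: no significance testing)."""
    k = len(data[0])
    # Step 1: start from the item pair with the highest Hij.
    start = max(combinations(range(k), 2), key=lambda p: h_pair(data, *p))
    if h_pair(data, *start) < c:
        return []
    scale = list(start)
    while True:
        candidates = []
        for j in range(k):
            if j in scale:
                continue
            trial = scale + [j]
            positive = all(h_pair(data, j, i) > 0 for i in scale)
            # Step 2: keep items with Hi at least c, ranked by the total H.
            if positive and h_item(data, j, trial) >= c:
                candidates.append((h_total(data, trial), j))
        # Step 3: stop when no remaining item qualifies.
        if not candidates:
            return sorted(scale)
        scale.append(max(candidates)[1])


# Toy data: items 0-2 form a cumulative pattern; item 3 is unrelated to them.
toy = [
    [0, 0, 0, 1], [0, 0, 0, 0],
    [1, 0, 0, 1], [1, 0, 0, 0],
    [1, 1, 0, 1], [1, 1, 0, 0],
    [1, 1, 1, 1], [1, 1, 1, 0],
]
print(select_scale(toy))
```

In this sketch only one scale is extracted; a fuller procedure would repeat the search on the unselected items to look for additional clusters.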
As suggested by Sijtsma and Molenaar (2002), there are two important considerations when applying an automatic selection procedure. First, they advised that researchers should “predict the most likely dimensional structure for their item set” (p. 69). This may assist with understanding and interpreting unexpected or “odd” item selections. Second, the AISP “ . . . is not a formal test of the MHM” (p. 69). That is, because the AISP selects and rejects items on the basis of inter-item correlations, it may result in accepting an item in conflict with the MHM assumptions and rejecting one that is in agreement with them. Despite not being a formal test of the MHM, however, its resultant item cluster(s) are considered an “excellent” starting point for further item analysis (Sijtsma & Molenaar, 2002).
An automated item selection procedure was conducted on the 10 STATIC-99 items using the MSP5 program, Mokken Scale Analysis for Polytomous Items for Windows, Version 5 (Molenaar & Sijtsma, 2000). Specifically, the item selection method of “search normal” was used, which is an exploratory procedure utilized when little is known about an item set or multidimensionality is suspected (Molenaar & Sijtsma, 2000). From the selected item set, one or more scales are formed using a “step-wise bottom up search procedures” (Molenaar & Sijtsma, 2000, p. 39). Items are added one-by-one to an initial set of items that in this case was determined by the MSP program using formal search criteria of c = .3 and α = .05. The constant c is the recommended practical minimum lower bound, and α is the nominal significance level used to test whether the scalability coefficients are greater than 0. The reliability of the resulting scales can be estimated with a method proposed by Sijtsma and Molenaar (1987). Using a DM model, they showed that in a number of realistic cases their method was almost unbiased, whereas Cronbach’s alpha coefficient almost always underestimates the reliability of the total score (Sijtsma & Molenaar, 2002).
The last step was based on a nonlinear factor analysis using the Normal Ogive Harmonic Analysis Robust Method (NOHARM) that “ . . . has proved reasonably robust against violations of the normal distribution assumption” (McDonald, 1982, p. 387). Specifically, the analysis was conducted using NOHARM, which is a statistical program for fitting “unidimensional and multidimensional normal ogive item response models” (T. R. Miller, 1991). Moreover, it can be used for dichotomous data. As provided by T. R. Miller (1991),
The generalized multidimensional normal ogive model is given as $P(y_{ij} = 1 \mid \Theta_j) = c_i + (1 - c_i)\,\Phi[d_i + a_i^{\prime}\Theta_j]$, where $P(y_{ij} = 1 \mid a_i, d_i, \Theta_j)$ is the probability in an m-dimensional space of a correct response to an item i by person j, $a_i$ is an m-dimensional vector of item discrimination parameters, $d_i$ is a scalar parameter related to item difficulty, $\Theta_j$ is an m-dimensional vector of latent abilities, $c_i$ is a pseudo-guessing parameter, and $\Phi$ is the normal distribution function. (p. 2)
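The quoted model can be evaluated numerically with a minimal sketch. This is an illustration of the formula only, not of NOHARM's estimation routine; the function names and parameter values are hypothetical, and the product of the discrimination vector and the ability vector is taken to be their inner product.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_correct(theta, a, d, c=0.0):
    """Probability of a positive response: c + (1 - c) * Phi(d + a . theta)."""
    if len(theta) != len(a):
        raise ValueError("theta and a must have the same number of dimensions")
    z = d + sum(a_k * t_k for a_k, t_k in zip(a, theta))
    return c + (1.0 - c) * normal_cdf(z)


# Hypothetical three-dimensional item with no guessing.
a_item = [1.0, 0.5, 0.2]
print(p_correct([0.0, 0.0, 0.0], a_item, d=0.0))
print(p_correct([1.0, 0.0, 0.0], a_item, d=0.0))
```

With the pseudo-guessing parameter set to 0, as is natural for items coded from file records rather than answered by the respondent, the model reduces to a multidimensional probit-type item response function.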
Results
In regard to the total sample, 146 (32.4%) of the 451 subjects were determined to meet criteria for civil commitment. The mean age was 41.26 years (SD = 12.45) and 260 (58%) were diagnosed with a paraphilia. The average STATIC-99 total score was 5.5 (SD = 1.7) with an average total RRASOR score of 3.2 (SD = 1.1). Of the total sample, 135 (30%) of the subjects had completed treatment prior to most recent incarceration and 40 (8.9%) completed treatment during their most recent incarceration. There were also 120 (27%) subjects who received sexual misconduct reports of any type. These variables were further analyzed by separating the subjects into two groups, those who were determined to meet commitment criteria and those who were not, and the results are illustrated in Table 1.
Table 1. Demographic Characteristics and RRASOR and STATIC-99 Scores (N = 451).
Note. a = t test; b = Fisher’s exact test; c = area under the curve. CI = confidence interval.
As seen in Table 1, there were significant differences between the groups with the subjects determined to meet commitment criteria being older, having recent sex offender treatment, having a diagnosis of a paraphilia, and receiving more sexual misconduct reports. In addition, those determined to meet commitment criteria had significantly higher RRASOR and STATIC-99 total scores.
The MSP5 analysis indicated that the items did not fulfill the assumptions for monotone homogeneity or double monotonicity, and consequently, the sum score does not yield a measurement of an underlying trait. Moreover, the items did not conform to a cumulative (Guttman-type) scale, so the total score is not an adequate statistic for the trait level. At a setting of c = .3 and α = .05, however, the automated item selection process resulted in three clusters that satisfied the definition of a scale. As illustrated in Table 2, the groupings or clusters may be described as generally corresponding to the underlying factors of the STATIC-99 proposed by previous research: sexual deviancy (Stranger Victims, Prior Sex Offenses, Any Unrelated Victims) and antisocial or criminal behavior (Index Non-Sexual Violence, Prior Non-Sexual Violence, Prior Sentencing Dates). In addition, a third factor emerged represented by the items pertaining to age and relationship history.
Table 2. Results of MSP Analysis of STATIC-99 (N = 451).
The results of the NOHARM analysis and the factor correlations are summarized in Tables 3 and 4, respectively. The factor loadings determine how closely each variable is aligned with the underlying factor and are consequently used for interpreting the factor. The results of the NOHARM analysis are generally consistent with the MSP analysis with Factor 1 having its highest loadings with Items 1, 6, and 7 (MSP scale 3: Stranger Victims; Prior Sex Offenses; Any Unrelated Victims); Factor 2 with Items 2, 4, and 5 (MSP scale 2: Index Non-Sexual Violence; Prior Non-Sexual Violence; Prior Sentencing Dates); and Factor 3 with its highest loadings with Items 9 and 10 (MSP scale1: Young; Single). Items 3 and 8 (Any Convictions for Non-Contact Sex Offenses, Any Male Victims) did not have very high loadings on any of the factors. Moreover, whereas Item 4 (Index Non-Sexual Violence) had its highest loading on Factor 2, it also had almost as large a loading on Factor 1, suggesting that this item may have elements that reflect both factors. 1
Table 3. Promax (Oblique) Rotated Factor Loadings: Results of NOHARM Analysis, Three-Factor Output (N = 451).
The STATIC-99 item “Prior Sex Offenses” was dichotomized so that a value of 1 indicated that the subject had a sex offense prior to the index offense and a value of 0 if he did not.
Table 4. Factor Correlations.
Discussion
By utilizing nonlinear factor analysis and nonparametric item response theory, this study empirically evaluated the underlying or latent traits of the STATIC-99 using the specific items for sexual offenders identified as high risk. That is, although general factorial models of static factors associated with sexual recidivism have been identified, none have examined the dimensional structure of specific STATIC-99 items beyond sexual deviancy and antisociality. The results of the present analysis, using a Mokken procedure and the Normal Ogive Harmonic Analysis Robust Method, yielded two main findings, both of which have been previously identified in the research. First, the STATIC-99 items did not conform to a single scale, which supports previous assertions that its structure is multifactorial and does not represent a single psychological construct or latent trait. That is, these findings are consistent with a conceptualization of sexual recidivism as a behavioral outcome rather than an underlying psychological construct or trait similar to personality or intelligence.
Second, the analyses also identified a third factor associated with age and relationship status that has been referred to as the “Detachment” dimension (Roberts et al., 2002). As a distinct factor, this item cluster or dimension is hypothesized to reflect immaturity, lack of intimate attachments, and lack of social integration. Although several studies have found age and relationship status to be risk factors related to sexual offending recidivism (Hanson & Bussière, 1998; Mann, Hanson, & Thornton, 2010; Thornton, 2006), research has further indicated that they may actually be representative of other, underlying offender characteristics. For example, Thornton (2006) suggested that a decline in recidivism for sexual offenders over 60 years of age may be related to loss of physical health, reporting differences (families not wanting to report their grandparent), and sexual misbehavior being treated as a health problem. In addition, Mann et al. (2010) noted that the lack of emotionally intimate relationships with adults may be reflective of dysfunctional intimacy with differing underlying pathologies.
Although consistent with previous research, this study has three main limitations. First, the subjects for this research were considered high risk to sexually reoffend and, consequently, the results presented here may have limited generalizability to other sexual offenders, such as those on community supervision. Second, the Mokken and NOHARM analyses should be considered exploratory and a starting point for further, more sophisticated statistical procedures. Finally, given that the items within the identified clusters overlapped conceptually (e.g., age and relationship status), there is a question about whether the factors represented similar or distinct psychological constructs. In fact, a post hoc analysis using the Spearman Rank Correlation method indicated that the items of Young and Single were correlated (ρ = −.488, p < .01).
Despite these limitations, however, several conclusions may be proposed based on these results. First, the underlying latent structure of the STATIC-99 for a high-risk group of offenders includes the previously identified constructs of sexual deviancy and antisociality. Second, there appears to be a third construct, “detachment,” based on age and marital relationship history that has not yet been fully empirically explored. And third, these results support Hanson and Thornton’s assertion that sexual recidivism is a multifactorial phenomenon (Hanson & Thornton, 1999). Arguably, the acceptance of sexual recidivism as a behavioral outcome influenced by numerous factors provides important information about which analytical perspectives would be the most efficacious in the prediction of risk for future sexual assaults.
With regard to the identification of a third dimension associated with sexual recidivism, several questions remain: Is the third factor that was identified here reflective of underlying pathologies arising from an interaction between age and dysfunctional intimacy (Mann et al., 2010)? Do the items represent the degree to which offenders are integrated into social structure and/or the extent of their social support systems (Bonta, Rugge, Scott, Bourgon, & Yessine, 2008)? Do these items reflect an additional dimension of social competence that varies with age (Marshall, Barbaree, & Fernandez, 1995)? How one approaches the third dimension in treatment (if deciding to address it) will depend on how this third dimension is understood and operationalized.
As the field of sexual offender risk assessment continues to evolve, it is becoming increasingly evident that sexual recidivism is a much more complex construct than the original conceptualizations of static and dynamic factors. An alternative view, based on a behavioral outcome model, is that sexual recidivism is a set of behaviors occurring within a socio-cultural context that results from a complicated interaction between environments, psychological characteristics, and social phenomena. Moreover, these interactions manifest differently within individuals over time. Hopefully, this present study offers some empirical support for such a model and provides direction for future research as more data and better analytical methods become available.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
