Abstract
This study examined the relationship between psychometric test scores, psychometric test profiles, and sexual and/or violent reconviction. A sample of 3,402 convicted sexual offenders who attended a probation service–run sexual offender treatment programme in the community completed a battery of psychometric tests pre- and posttreatment. Using Cox regression, posttreatment scores on measures of self-esteem, an ability to relate to fictional characters, and recognition of risk factors were, individually, predictive of recidivism. When psychometric tests were grouped into dynamic risk domains, only the pretreatment scores of the domain labelled socioaffective functioning (SAF) predicted recidivism and added predictive power to a static risk assessment. The number of risk domains that were dysfunctional pretreatment also predicted recidivism outcome; however, this did not add predictive power to a static risk assessment tool. Possible explanations for the superiority of pre- over posttreatment scores in predicting reconviction are discussed, and directions for further research considered.
Given the severity of the consequences of sexual and violent crime, determining which offenders are most likely to reoffend in this way is a task of significant importance to those working within the Criminal Justice System and the public who are at risk of victimisation. It is not surprising, therefore, that the assessment of risk of reoffending is an area that has gained increasing prominence in academic, political, and public arenas.
This type of forensic risk assessment started as unstructured, subjective, clinical judgement; typically, this would take the form of a practitioner reviewing information on an offender and then using his or her judgement to determine the risk that individual posed. However, when tested empirically, such approaches tended to demonstrate poor reliability and validity, with high rates of error among even the most experienced risk assessors (e.g., Grove, Zald, Lebow, Snitz, & Nelson, 2000). As a result, so-called second-generation risk assessment tools were developed (Bonta, 1996). These actuarial tools code and combine information on historic, generally unchanging, factors (such as number of previous convictions or age at first offence), which are selected on the basis of their ability to predict reconviction in large samples of offenders (Beech, Fisher, & Thornton, 2003). These static factors cannot be deliberately changed through intervention, so they are of little clinical value and are not selected on the basis of any supposed theoretical link to offending (Beech et al., 2003). However, Beech and Ward (2004) suggested that static factors are actually historic markers of enduring, changeable, psychological problems.
The Risk Matrix 2000 (RM2000; Thornton et al., 2003) is a static risk measure that has been developed to assess risk of sexual reconviction in convicted sexual offenders and is widely used by those in the Criminal Justice System in the United Kingdom, whereas the Static-99 (Hanson & Thornton, 2003) is popular in North America. Such measures have been extensively tested and consistently outperform unstructured clinical judgement in predicting sexual reoffending (Hanson, Morton, & Harris, 2003). Both the RM2000 and the Static-99 have demonstrated moderate to good accuracy in assigning offenders to groups that differ in rates of sexual reconviction (see Hanson & Morton-Bourgon, 2009, for a recent meta-analytic review of risk assessment tools for sexual offenders).
While second-generation tools discriminate between those at higher and lower risk of reoffending, and so help those deciding where resources should be directed, they cannot be used to determine what someone needs to change to lower his or her risk (Beech et al., 2003). Nor do they enable a reappraisal of risk following an intervention aimed at reducing the risk someone poses. Third-generation risk assessment schemes were developed in an attempt to fill this gap; these tools focus on identification of psychologically meaningful, changeable risk factors (dynamic factors) that have demonstrated a reliable relationship with reoffending (Hanson & Harris, 2001). Internationally, there are a number of third-generation risk assessment schemes in use that ask assessors to consider a range of dynamic risk factors that have demonstrated a causal link to reoffending (e.g., Hanson & Harris, 2001; Thornton, 2002). For sexual offending, there are a number of such factors that risk assessors are routinely guided to consider, as a result of the strength of evidence linking them to sexual reoffending (see Mann, Hanson, & Thornton, 2010, for a recent review).
These factors can be grouped into four domains: (a) offence-related sexual interests, such as a sexual interest in sexualised violence, a preoccupation with sex, a sexual preference for prepubescent children, and certain paraphilias that are more easily acted out through offending; (b) offence-supportive attitudes (OSA), such as beliefs that children are not harmed by or can enjoy sex with an adult, that men should dominate women sexually, that men are entitled to sex, and that women are deceitful and lead men on sexually; (c) SAF, which includes an emotional congruence with children; feeling inadequate, lonely, and unable to do anything about the situation; having a suspicious, hostile, vengeful style of thinking; and lacking emotional intimacy with adults; and (d) self-management (SM), which includes being impulsive, having poor emotional control, and having poor cognitive problem-solving ability (Beech, 1998; Thornton, 2002).
Measurement of dynamic risk factors can be problematic, often relying on clinical judgement and self-report. As a result, third-generation schemes, such as the Structured Assessment of Risk (Thornton, 2002), its close derivative the Structured Assessment of Risk and Need (SARN; Webster et al., 2006), or the STABLE (Hanson, Harris, Scott, & Helmus, 2007), tend to require that a range of sources of evidence are considered to make a judgement on the presence, absence, and/or strength of each dynamic factor for an individual. One such source of evidence comes from psychometric assessments that measure psychological characteristics. However, such assessments have been criticised for relying on accurate self-report, which can be affected by the offender’s motivation to be open about his or her problems and by an offender’s level of insight into himself or herself (Holden, Kroner, Fekken, & Popham, 1992).
Despite these concerns, researchers have found links between self-report psychometric tests and subsequent sexual offending. Using samples of convicted sexual offenders about to enter treatment in the community or custody, Beech (1998) developed a method for identifying the severity of dynamic risk posed by individuals based on their pretreatment psychometric test scores. In all, 140 offenders who had sexually abused children completed psychometric tests measuring various dynamic risk factors associated with sexual offending. Based on their psychometric assessment scores, using agglomerative cluster analysis, two clusters of offenders emerged. The mean scores for Cluster A were significantly higher than the mean scores for Cluster B on nine of the measures used. Cluster A was labelled “high deviancy,” as their scores deviated highly from the nonoffender norms for these measures, whereas Cluster B was labelled “low deviancy.” Subsequent analysis found that high- and low-deviancy groups differed significantly in relation to the number and type of their previous victims, and risk of reconviction (as calculated from the offenders’ offence histories).
Beech, Friendship, Erikson, and Hanson (2002) found that, in a sample of 140 convicted child abusers (from both prison and probation settings) with a follow-up period of up to 6 years, psychometric deviancy level added incrementally to the predictive validity of a static risk assessment. Beech and Ford (2006) found similar results with a group of 51 sexual offenders attending a treatment clinic; high-deviancy men were more likely to be reconvicted at the 2-year and 5-year follow-up periods (13% and 44% reoffending, respectively) than the low-deviancy men (4% reoffending at 2 years and 10% at 5 years). Using a sample of child molesters attending treatment in New Zealand, Allan, Grace, Rutherford, and Hudson (2007) reported that psychometric measures of OSA and sexual interests predicted reconviction for a sexual reoffence, despite previous concerns about the face validity of these tests. Taken together, these studies suggest that psychometric assessment can be used to measure dynamic risk of sexual offending and that this could add to the predictive power of static risk assessments.
The Current Study
In the probation service, the psychometric test battery developed by Beech (1997) is currently used to help identify the level of dynamic risk, also termed criminogenic need, of sexual offenders attending treatment. Those with a high level of criminogenic need (high deviancy) are directed to a higher dosage of treatment than those with low levels of criminogenic need (low deviancy). In addition, scores on individual measures are used to help identify an individual’s specific risk factors (e.g., beliefs that sex with children is not harmful), to aid treatment planning. The psychometric profiles of treatment completers are also used in the risk assessment of sexual offenders. Each of the functions that the psychometric tests currently serve rests on the assumption that the measures are indeed reliably capturing factors that are related to risk of sexual reoffending.
Although existing studies provide encouraging results, these have involved only small samples of offenders, all of whom have been abusers of children. The current study aims to examine the relationship the psychometric tests have with future sexual and/or violent reconviction for a large sample of sexual offenders with a range of offence types. The relationship between deviancy level, overall level of functionality posttreatment (based on an amalgamation of psychometric test scores), and reconviction will also be examined for a large group of child abusers. In both structure and content, this article mirrors that of Wakeling, Beech, and Freemantle (2010), to allow direct comparison of psychometric tests used in sexual offender treatment in community and custodial settings in England and Wales.
Research Questions
The questions this study aims to answer are as follows:
Research Question 1: Which psychometric tests from each dynamic risk domain are the best predictors of recidivism, pre- and posttreatment?
Research Question 2: When psychometric tests are grouped to form dynamic risk-domain scores, in which domains do recidivists differ from nonrecidivists, pre- and posttreatment?
Research Question 3: Which of the dynamic risk domains, pre- and posttreatment, are the best predictors of recidivism, and do they add incremental validity to static risk?
Research Question 4: Is there an association between the number of dysfunctional dynamic risk domains (overall functionality), pre- or posttreatment, and recidivism; does this predict recidivism; and does it add incremental validity to static risk?
Research Question 5: Does deviancy level discriminate between recidivists and nonrecidivists in a sample of child abusers, and does this add predictive power to static risk?
Method
Participants
The sample was taken from a population of convicted sexual offenders who had completed treatment in the community in the form of one of the probation service accredited sexual offender treatment programmes since 2002. The sample consisted of 3,402 offenders who completed the psychometric assessments both pre- and posttreatment, and whose psychometric test information was recorded centrally. Noncompleters were not represented in the sample, as they would not have completed the posttreatment psychometric tests. A review of community treatment provision in 2008-2009 indicated that there was an average noncompletion rate of 8.9% (n = 175) across probation-run sex offender group–work programmes (Rehabilitation Services Group [RSG], 2009). This report also indicated that the main reason people were unable to complete the programme was that their community sentence expired before they had finished the course. This suggests that (a) the number of noncompleters not included in this sample is likely to have been small and that (b) the majority of the noncompleters (those whose reason for noncompletion was order expiry) might not have differed from the completers in important ways that could affect either psychometric test scores or reconviction rates. However, it is likely that a small minority of noncompleters were reoffenders or had a high level of dynamic need/more extreme psychological problems, and this was the reason that they failed to complete the programme.
In addition, there would have been a number of offenders who were not included because their assessments were not supplied to the central database. The view of practitioners who provide this information is that such missing data tends to be the result of operational issues, such as a lack of resources for data entry, rather than factors that could lead to systematic bias between those included in the sample and those whose data are missing. This means that while the sample does include psychometric test information from offenders who completed treatment in every probation area that delivers the programmes, and is therefore a national sample, it is not representative of everyone who either started or completed treatment in the community. Central records indicate that around 6,714 offenders have completed one of the sex offender group work programmes in the community since 2002; therefore, the current sample represents around half (50.7%) of those who completed treatment in this time.
Table 1 summarises information on the demographic and offence-related characteristics of the sample. The ages of those in the sample ranged from 18 to 82 years, and they had between one and eight victims (M = 1.96, SD = 0.95). The average length of time spent on a treatment programme in the community was 14.3 months (SD = 7.3), and 15.2% had already engaged in sexual offender treatment in custody. Follow-up started once the participants had completed their posttreatment psychometric test battery. The average length of follow-up was 3 years.
Table 1. Sample Characteristics
Note: RM = Risk Matrix.
Programme Descriptions
Four accredited sexual offender group work programmes are currently active in the community. Three of the programmes share the same treatment targets but operate in different probation regions in England and Wales. These are the Community Sex Offender Group-Work (CSOG) programme, the Thames Valley Sex Offender Group-Work (TVSOG) programme, and the Northumbria Sex Offender Group-Work (NSOG) programme.
The CSOG programme is delivered in 13 of the 42 probation areas and consists of three parts. The induction module represents 50 hr of treatment and is suitable for high- and low-deviancy offenders. Participants attend either as a condition of their community sentence or as a condition of their licence if they have been released from prison without having completed treatment. High-risk, high-deviancy offenders then go on to the long-term therapy programme, which is a further 190 hr of treatment. This covers victim empathy, SM, and offence-related fantasy. Low-risk, low-deviancy offenders who have completed the induction module, high-deviancy offenders who have completed this and the long-term therapy programme, and the offenders who have completed treatment in custody also attend the final part of the programme that focuses on relapse prevention. This section of the programme lasts around 50 hr.
The TVSOG programme is offered in 16 probation areas and is shorter than the CSOG programme. In total, the four components of the TVSOG programme last around 160 hr. High-risk, high-deviancy men are directed to attend the full programme, whereas low-risk, low-deviancy men complete three of the four blocks: the foundation block, the victim empathy block, and the relapse prevention block. Offenders who have completed treatment in custody go directly to the final block of the programme. The foundation block lasts for 60 hr and is delivered over a 2-week period. It covers topics such as offence-related fantasy and OSA. This is followed by a 16-hr victim empathy block and a 40-hr life skills block. Finally, there is a block that focuses on risk management and relapse prevention.
The NSOG programme is delivered in 12 probation areas and takes a total of 180 hr to complete. It has two main components, the first of which is the Core programme. This is attended by high-risk, high-deviancy men who have community sentences or who are on licence and have not completed treatment in prison. The Core aims to address issues such as victim empathy, cognitive distortions, and problem solving. The second component focuses on relapse prevention and consists of around 36 hr of treatment. Low-risk, low-deviancy offenders and offenders who have completed treatment in custody go directly to this part of the programme.
The relapse prevention components of all three programmes were replaced with updated versions whose approach is in line with the Good Lives Model (Ward, 2002). The new versions of this module started to roll out in 2006 and had less of a focus on avoidance goals, and more of a focus on building motivation and skills for a better, offence-free life, than the previous versions.
The Internet Sex Offender Treatment Programme (I-SOTP) is delivered in 28 probation areas and is targeted at low- or medium-risk, low-deviancy offenders whose sexual convictions relate to Internet-based indecent images of children. The I-SOTP is delivered in either a one-to-one format (consisting of a total of 30-45 hr of treatment) or a group format (consisting of a total of 70 treatment hr). I-SOTP consists of six modules covering (a) change motivation; (b) understanding offending; (c) victim empathy; (d) self- and emotional management and intimacy skills; (e) compulsivity, community engagement, and collecting behaviour; and (f) relapse prevention.
Materials
Static risk: RM2000
The RM2000 (Thornton et al., 2003) is a static risk-assessment tool for use with adult males who have been convicted of a sexual offence. At least one of the sexual offences must have been committed when the offender was older than 16. The RM2000/s predicts sexual recidivism and is made up of seven items divided into two scoring steps. Step one comprises three items: age of the offender on release, number of sentencing occasions for a sexual offence, and number of sentencing occasions for any criminal offence. The scores assigned to each of these items are summed and translated into one of four preliminary risk categories: low, medium, high, or very high. The second scoring step considers four risk-raising items (aggravating factors): whether the offender has any male victims of sexual offending, whether any of the offender’s victims were strangers, whether the offender has ever had a stable live-in relationship for over 2 years (termed the “single” item), and whether the offender has ever committed a noncontact sexual offence. These items are scored on a dichotomous scale as either present or not. If two or three of these items are present, the initial risk category is raised by one level (e.g., from low to medium). If all four of these aggravating factors are present, the initial risk category is raised by two levels (e.g., from low to high). According to current guidance on using RM2000 with internet-only offenders (Thornton, 2007), the noncontact and stranger items should not be scored. The present study adheres to these guidelines.
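The second scoring step described above can be sketched in a few lines. This is an illustrative sketch only, not the official scoring procedure: the item labels and function name are hypothetical, the step-one category is taken as given (the item weights are not reproduced here), and capping the result at "very high" is an assumption.

```python
from typing import Iterable

# Ordered RM2000/s risk categories, lowest to highest (as listed in the text).
CATEGORIES = ["low", "medium", "high", "very high"]

# The four aggravating items named in the text (labels are hypothetical).
AGGRAVATING_ITEMS = ("male_victim", "stranger_victim", "single", "noncontact")


def adjust_category(initial: str, present: Iterable[str],
                    internet_only: bool = False) -> str:
    """Apply scoring step two: raise the step-one category by aggravating items."""
    items = set(present) & set(AGGRAVATING_ITEMS)
    if internet_only:
        # Per the guidance cited in the text, the noncontact and stranger
        # items are not scored for internet-only offenders.
        items -= {"noncontact", "stranger_victim"}
    n_present = len(items)
    if n_present == 4:
        raise_by = 2      # all four items present: raise two levels
    elif n_present >= 2:
        raise_by = 1      # two or three items present: raise one level
    else:
        raise_by = 0      # none or one item present: no change
    # Assumption: the category cannot rise above "very high".
    index = min(CATEGORIES.index(initial) + raise_by, len(CATEGORIES) - 1)
    return CATEGORIES[index]
```

For example, `adjust_category("low", ["male_victim", "single"])` gives "medium", matching the one-level example in the text.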
A number of studies have indicated that the RM2000/s has good predictive validity with U.K. samples (Barnett, Wakeling, & Howard, 2010; Craig, Beech, & Browne, 2006; Grubin, 2008; Thornton et al., 2003).
Reconviction Data
Reconviction data were sourced from the Police National Computer (PNC) for all offenders for whom pre- and posttreatment psychometric test data were available. These data included the year of the first reconviction and the type of offence (sexual or violent) to which the reconviction referred.
As can be seen in Table 1, 5.4% of the sample had been reconvicted of a sexual offence (n = 185), 1.6% had been reconvicted of a violent offence (n = 54), and 6.9% had been reconvicted of a sexual or a violent offence (n = 235). Because of the low base rates of sexual reconviction in this sample, this study uses sexual and/or violent reconviction as its outcome of interest. Using this outcome measure also accords with research that suggests that sexual offenders will often reoffend with a nonsexual offence (Hanson & Bussière, 1998).
Psychometric Measures
The Interpersonal Reactivity Index (IRI)
The IRI is a 28-item measure of the cognitive and emotional components of empathy (Davis, 1980). Respondents rate items on a 5-point Likert-type scale ranging from 0 (does not describe me well) to 4 (describes me very well). It has four subscales each consisting of seven items. The Fantasy subscale measures ability to relate to fictional characters, the Empathic Concern subscale measures compassion for others, the Perspective Taking subscale measures ability to take on another’s perspective, cognitively, and the Personal Distress subscale measures a tendency to experience anxiety and negative feelings in response to others’ distress. Each scale is scored separately and scores range from 0 to 28. The four subscales are reported to have satisfactory internal consistency (Fantasy: α = .77, Empathic Concern: α = .72, Perspective Taking: α = .72, and Personal Distress: α = .74), and the test–retest reliability coefficients for each subscale were reported as .77, .79, .81, and .74, respectively (Rallings & Webster, 2001).
The Relapse Prevention Questionnaire
This questionnaire (Beckett, Fisher, Mann, & Thornton, 1997) consists of 18 items that elicit respondents’ recognition of lapse cues, possession of coping skills and strategies, and acceptance of future risk and likelihood of relapse. Responses are coded on a 3-point scale: 0 = no recognition or skills, 1 = has some idea/skills, and 2 = shows good recognition or skills. Higher scores reflect greater relapse-cue recognition and management skills.
The Self-Esteem Scale
This scale (Thornton, Beech, & Marshall, 2004; Webster, Mann, Thornton, & Wakeling, 2007) is an eight-item measure of general self-esteem. Higher scores reflect greater self-esteem. Items are rated either yes or no, and the highest score attainable is 8. Webster et al. (2007) report excellent psychometric test properties for this scale: The internal consistency was α = .84 and the test–retest reliability of the scale was .90.
The University of California, Los Angeles (UCLA) Loneliness Scale
This scale (Russell, Peplau, & Cutrona, 1980) is a measure of emotional loneliness. It was originally a 20-item questionnaire; however, 1 item was removed following a factor analysis of the original items. This 19-item questionnaire indicates the extent to which respondents believe they have meaningful relationships, have people close to them, or are lonely. Item responses are on a 4-point Likert-type scale. Higher scores indicate greater loneliness and fewer close and meaningful relationships. Previous studies indicate that the internal consistency of the scale was α = .95 and the test–retest reliability of the scale was .79 (Rallings & Webster, 2001).
The Children and Sex Questionnaire
This questionnaire (Beckett, 1987) is an 87-item questionnaire that measures respondents’ attitudes, feelings, and thoughts about children and sex. Respondents rate each item on a 5-point Likert-type scale. Only 30 of the 87 items are scored. These 30 items are clustered into two subscales: Cognitive Distortions and Emotional Congruence. Items are summed to produce a total scale score. High scores reflect stronger beliefs that support the sexual abuse of children, and higher congruence and stronger identification with children. Beech, Fisher, and Beckett (1998) report good psychometric test properties for this scale; test–retest reliability was .77 (Cognitive Distortions) and .63 (Emotional Congruence).
The Victim Empathy Distortions Scale
This scale (Beckett & Fisher, 1994) consists of 30 questions about how the offender’s (child) victim might have felt about the offence in both the short and the long term. In addition, there are questions pertaining to the lead-up to the offence as well as questions that aim to assess the offender’s perceptions about whether the victim was culpable. Responses are given on a 4-point scale. Beech et al. (1998) reported that the test–retest reliability of this scale was very good at .95.
The Underassertiveness/Overassertiveness Scale
The Social Response Inventory (SRI; Keltner, Marshall, & Marshall, 1981) consists of 22 items that measure self-reported levels of underassertiveness and overassertiveness in hypothetical situations. Respondents are given certain scenarios and indicate which, from a range of five possible reactions, best describes what they would do. Beech et al. (1998) report good psychometric properties for the Underassertiveness scale; its test–retest reliability was .80.
The Nowicki–Strickland Locus of Control Scale
This scale (Nowicki, 1976) measures the extent to which an individual feels that events are contingent on his or her own behaviour or that events are beyond his or her control. It consists of 40 items with a dichotomous yes/no response format. Beech et al. (1998) report good psychometric properties for this scale; test–retest reliability was .83.
The Barratt Impulsivity Scale–II (BIS-II)
This scale (Barratt, 1994) is a 30-item measure with three subscales. The Motor Impulsivity subscale measures the extent to which individuals act without prior thinking, the Cognitive Impulsivity subscale assesses whether the individual makes quick cognitive decisions, and the Nonplanning subscale measures an individual’s lack of concern for his or her future. Respondents are asked how much the statements apply to them using a 4-point scale ranging from rarely/never to almost always/always. Patton, Stanford, and Barratt (1995) report that the internal consistency of the scale ranges from .79 to .83.
The Paulhus Deception Scales (PDS)
This scale (Paulhus, 1988) has 40 items and consists of two subscales, Self-Deceptive Enhancement (SDE) and Impression Management (IM). The SDE subscale measures an unconscious favourability bias closely related to narcissism. High scores on this subscale reflect a trait-like tendency toward overly self-favourable presentation rather than situational demands. The IM subscale is designed to measure responding that is guided by a desire to create a favourable impression on others and, thus, intends to measure the extent to which the respondent is faking or lying. High scores indicate that the respondent may be exaggerating and purposely trying to impress the assumed audience. Paulhus (1988) reports good psychometric test properties for the measure; the internal consistency of the SDE subscale ranged from .70 to .75 and IM and PDS total coefficients ranged from .81 to .86, and the scale has demonstrated good convergent, discriminant, and construct validity.
Procedure
Psychometric tests were administered pre- and posttreatment in groups, in probation offices, by programme staff. Raw data were entered on-site by programme teams and then sent to and collated in the RSG, where they were added to a central psychometric test database, the Sex Offender Psychometric Scoring System (Mandeville-Norden, Beech, & Middleton, 2006). RM2000/s was scored by trained staff in the field and then sent to RSG along with the psychometric test data. Reconviction data were sourced for those who had both pre- and posttreatment psychometric test data. Time at risk was calculated as the time between completion of the posttreatment psychometric tests and either the date of reoffence or the end of the follow-up period (the date at which the reconviction data were sourced). Recall to prison for a nonsexual, nonviolent offence or for a breach of probation order was taken into account when calculating time at risk (i.e., for those who were recalled, time at risk was calculated as the time from completing the psychometric tests to the time of reincarceration) but was not counted as a new offence.
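The time-at-risk rule described above can be illustrated with a short sketch. It is a hedged illustration rather than the procedure actually used: the function and parameter names are hypothetical, full dates are assumed to be available, and treating a reconviction recorded after a recall as censored is an assumption.

```python
from datetime import date
from typing import Optional, Tuple


def time_at_risk(post_test: date,
                 follow_up_end: date,
                 reconviction: Optional[date] = None,
                 recall: Optional[date] = None) -> Tuple[int, bool]:
    """Return (days at risk, reconvicted?) for one offender."""
    # By default, observation runs to the end of follow-up with no event.
    end = follow_up_end
    event = False
    if recall is not None and recall < end:
        # Recall to prison ends time at risk but is not counted as a new offence.
        end = recall
    if reconviction is not None and reconviction <= end:
        # A sexual or violent reconviction within the at-risk window is the event.
        end = reconviction
        event = True
    return (end - post_test).days, event
```

The returned pair corresponds to the duration and event indicator that a survival model such as Cox regression takes as its outcome.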
Prior to analysis, the psychometric tests were grouped into three of the four known dynamic risk domains for sexual offending: OSA, SAF, and SM. There was no direct psychometric measure of offence-related sexual interest available for this sample. Table 2 lists the psychometric tests assigned to each domain. Psychometric tests were grouped into domains on the basis of the theoretical relationship between the dynamic risk factors they attempt to measure and the risk domains. The Children and Sex Questionnaire and the Victim Empathy Distortions Scale were included in the OSA domain as they measure cognitions relating to beliefs that children can enjoy, are not harmed by, or can actively seek sex with adults. The Emotional Congruence With Children scale was also included in this domain, following the precedent set by Beech (1998). The Short Self-Esteem scale, the Locus of Control Questionnaire, the UCLA Loneliness scale, and the Under- and Overassertiveness scales are all related to the concept of inadequacy (Beech, 1997; Fisher, Beech, & Browne, 1999), which is related to SAF. Also related to SAF are the concepts of Personal Distress, Perspective Taking, and Empathic Concern, all of which could feasibly affect someone’s ability to relate to others. The SM domain was made up of the scales measuring impulsivity, Fantasy, and relapse-prevention planning, all of which relate to the ability to control, cope with, or manage problems.
Table 2. Psychometric Tests by Dynamic Risk Domain
Although the Paulhus scales are not measuring a dynamic risk factor, the current position in the probation service is that those who score highly on this measure should be treated as if they were high deviancy, regardless of their scores on other measures. This is based on the rationale that such an individual’s scores on the other psychometric tests would be unreliable as the individual would be prone to providing socially desirable responses. As the Paulhus scales can be conceptualised as a measure of how defended an individual is, it is also argued that those with higher Paulhus scores require more treatment than those who are less defended, to work through these defences (Beech et al., 1998). Deviancy level was calculated for all child molesters in the sample according to Beech’s (1998) criteria.
Analyses
Cox regression analyses were used to examine whether pre- or posttreatment psychometric test scores were predictive of recidivism. For all regression analyses, the number of variables that can be entered into the models is based on a formula cited in Harrell, Lee, and Mark (1996), which states that up to m/10 predictors can be used, where m is the number of people in the less-frequent outcome category (here the reconviction group). When using sexual and/or violent recidivism as the outcome variable, up to 23 variables can be entered into the model (235 offenders were reconvicted for a sexual or violent offence; 235/10 = 23.5). This number varies for each analysis depending on the number of offenders in the reconviction group for whom the psychometric test data are available. Time at risk or to reoffence (calculated as the time between completing the posttreatment psychometric tests and first proven reoffence or end of follow-up period for nonrecidivists) and recidivism outcome (whether or not there was a proven reoffence in the follow-up period) were the outcome variables.
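The events-per-variable rule described above amounts to a simple integer division. The sketch below is illustrative only; the function name and the default of 10 events per predictor are labels introduced here for the rule attributed to Harrell, Lee, and Mark (1996).

```python
def max_predictors(n_events: int, events_per_variable: int = 10) -> int:
    """Largest whole number of predictors allowed under the m/10 rule.

    n_events is m, the size of the less-frequent outcome category
    (here, the number of reconvicted offenders with usable test data).
    """
    return n_events // events_per_variable


# With 235 sexual and/or violent reconvictions, 235/10 = 23.5,
# which rounds down to 23 predictors.
limit = max_predictors(235)
```

Because m shrinks whenever psychometric data are missing for some recidivists, the limit would need to be recomputed for each model.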
To determine which of the psychometric tests from each domain were most predictive of recidivism outcome, regression analyses were performed; one for each of the three dynamic risk domains represented in the battery of measures used. For example, the three pretreatment psychometric tests measuring those risk factors from the OSA domain were entered as predictor variables into the first model, whose outcome variable was time to sexual and violent reconviction. The same regression analyses were then performed on the posttreatment psychometric test data to establish whether posttreatment psychometric test scores have predictive value.
Next, dynamic risk-domain scores were calculated for all of the offenders to represent the level of problems in each risk domain for each offender. This was achieved by standardising each of the psychometric measures using the mean and standard deviation of the entire sample. Once standardised, scores greater than 0 were considered dysfunctional, as such a score is worse than half of the entire sample. For those psychometric tests on which high scores denote greater functionality than low scores, the standardised scores were initially multiplied by −1, so that greater than 0 was dysfunctional across all psychometric tests. Domain scores were then created by summing the standardised scores of all of the psychometric tests relating to each of the three dynamic risk domains (see Table 2). Psychometric tests were assigned to different risk domains on the basis of the theoretical relationship between the constructs the psychometric tests intend to measure and the nature of the risk domain. Missing psychometric scores were given the average standardised score for that psychometric. If 10% or more of scores within a domain were missing, a total domain score was not computed. If less than 10% of scores within a domain were missing, these missing scores were imputed with the average standardised score. The domain scores were then themselves standardised; if the score was over 0, the domain was counted as dysfunctional.
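The scoring steps described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used in the study: the function names, the toy scores, the use of the population standard deviation, and the reading of the missing-data rule as "10% or more" are all assumptions.

```python
from statistics import mean, pstdev


def standardise(values):
    """z-score values against their own mean and SD; None (missing) stays None.

    Assumes at least two distinct observed values (otherwise the SD is 0).
    """
    observed = [v for v in values if v is not None]
    centre, spread = mean(observed), pstdev(observed)
    return [None if v is None else (v - centre) / spread for v in values]


def domain_scores(tests, higher_is_better, max_missing=0.10):
    """Standardised domain score per offender (None if too much is missing).

    tests: dict mapping a test name to a list of raw scores, one per offender.
    higher_is_better: names of tests whose z-scores are multiplied by -1 so
    that scores above 0 always indicate dysfunction.
    """
    z_by_test = {}
    for name, raw in tests.items():
        z = standardise(raw)
        if name in higher_is_better:
            z = [None if v is None else -v for v in z]
        z_by_test[name] = z

    n_offenders = len(next(iter(tests.values())))
    totals = []
    for i in range(n_offenders):
        row = [z[i] for z in z_by_test.values()]
        share_missing = sum(v is None for v in row) / len(row)
        if share_missing >= max_missing:
            totals.append(None)  # too many missing tests: no domain score
        else:
            # The mean standardised score is 0 by construction, so imputing
            # a missing test with that mean adds 0 to the sum.
            totals.append(sum(0.0 if v is None else v for v in row))
    return standardise(totals)


def dysfunctional(scores):
    """Flag a domain as dysfunctional when its standardised score is above 0."""
    return [None if s is None else s > 0 for s in scores]


# Hypothetical raw scores for four offenders on two SAF measures.
saf_tests = {
    "self_esteem": [10, 4, 7, 3],   # high = more functional, so sign is flipped
    "loneliness": [20, 35, 28, 41], # high = more problematic
}
saf = domain_scores(saf_tests, higher_is_better={"self_esteem"})
flags = dysfunctional(saf)
print(flags)
```

In this toy example the second and fourth offenders, who combine low self-esteem with high loneliness, are flagged as dysfunctional on the domain.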
Finally, a Psychological Deviance Index (PDI) was produced, based on the number of dysfunctional domains for each individual, which can range from 0 to 3. High scores indicate greater dysfunction. The PDI can be used as a measure of psychological deviance allowing testing of this concept with rapists as well as child molesters. t tests, chi-square tests, receiver operating characteristic (ROC) analyses, and Cox regression models were then performed on these dynamic risk-domain scores and overall PDI scores.
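As a brief illustration, the PDI amounts to a count of dysfunctional domains. The sketch below assumes that each domain has already been flagged as dysfunctional or not; the function name and the decision to return no index when a domain flag is missing are assumptions, as the text does not state how such cases were handled.

```python
def psychological_deviance_index(domain_flags):
    """Count the dysfunctional domains (0 to 3) for one offender.

    domain_flags: dict mapping a domain label to True (dysfunctional),
    False (not dysfunctional), or None (no domain score available).
    Returning None when any flag is missing is an assumption.
    """
    flags = list(domain_flags.values())
    if any(flag is None for flag in flags):
        return None
    return sum(bool(flag) for flag in flags)


# Hypothetical offender with problems in two of the three domains.
example_pdi = psychological_deviance_index({"OSA": True, "SAF": False, "SM": True})
print(example_pdi)
```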
Results
RM2000/s and Recidivism
For those in the sample for whom the RM2000 data were available, 40.1% were classified as low risk, 40.7% as medium risk, 16.4% as high risk, and 2.8% as very high risk. The mean RM2000 level was significantly higher for those men who sexually reoffended (M = 1.07, SD = 0.96) than for those who did not (M = 0.80, SD = 0.79; t[180.51] = 3.47, p < .001). The mean RM2000 level was also significantly higher for those men who sexually or violently reoffended (M = 1.12, SD = 0.96) than for those who did not (M = 0.80, SD = 0.79; t[236.09] = 4.84, p < .001). A Pearson’s correlation indicated that the higher risk categories were significantly associated with sexual (.07), violent (.08), and sexual and violent (.10) reconviction (p < .01 for all correlations). ROC analysis indicated that the RM2000/s had an area under the curve (AUC) statistic of .60. The area under the ROC curve provides a measure of predictive accuracy and can range from .5, indicating that prediction is no better than chance, to 1, indicating perfect prediction.
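The interpretation of the AUC given above can be made concrete with a small sketch. One standard way to read the AUC is as the probability that a randomly chosen recidivist has a higher risk score than a randomly chosen nonrecidivist, with ties counted as one half. The function below computes it in that pairwise way; the function name and the toy data are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison.

    scores: risk scores (e.g., risk category coded 0 to 3).
    outcomes: truthy for recidivists, falsy for nonrecidivists.
    """
    recidivists = [s for s, y in zip(scores, outcomes) if y]
    nonrecidivists = [s for s, y in zip(scores, outcomes) if not y]
    if not recidivists or not nonrecidivists:
        raise ValueError("both outcome groups must be present")
    wins = sum(
        1.0 if r > n else 0.5 if r == n else 0.0
        for r in recidivists
        for n in nonrecidivists
    )
    return wins / (len(recidivists) * len(nonrecidivists))


# Hypothetical data: perfect separation, then no separation at all.
perfect = auc([0, 1, 2, 3], [0, 0, 1, 1])
chance = auc([1, 1, 1, 1], [0, 0, 1, 1])
print(perfect, chance)
```

A score that orders every recidivist above every nonrecidivist yields 1, and a score that cannot distinguish the groups yields .5, matching the range described above.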
Socially Desirable Responding
To determine whether or not the sample may have responded to the measures in a socially desirable way, correlational analyses were performed examining the relationship between the IM subscale of the PDS and each of the measures, pre- and posttreatment. The correlations suggest that there was no issue with socially desirable responding: for all measures except emotional loneliness (r = −.10), scores were positively correlated with IM (i.e., more problematic scores on each measure were associated with higher levels of IM).
Research Question 1 (RQ1): Which psychometric tests from each dynamic risk domain are the best predictors of recidivism, pre- and posttreatment?
A total of six Cox regression analyses were performed: three to examine the ability of the pretreatment psychometric tests to predict sexual and violent recidivism by dynamic risk domain and three to examine the ability of the posttreatment psychometric tests to predict sexual and violent recidivism by dynamic risk domain.
Pretreatment scores
The model for the OSA psychometric tests, using the Victim Empathy Distortions, Emotional Congruence With Children, and Cognitive Distortions pretreatment scores (conducted on the child molesters in the sample only), was not significant (−2 log likelihood [LL] = 2823.19, p = .78). None of the individual measures were significant predictors of recidivism outcome (see Table 3). Similarly, the model using the pretreatment SAF psychometric tests was not significant (–2LL = 3364.94, p = .09), and again, none of the individual measures were significant predictors of recidivism outcome (see Table 3).
Summary of Simultaneous Regression Analyses for Pretreatment Offence-Supportive Attitudes, Socioaffective Functioning, and Self-Management Domain Variables Predicting Sexual and/or Violent Reconviction
Note: Tx = treatment.
However, the model using the pretreatment SM domain psychometric tests was significant (–2LL = 3392.17, p < .05). The Fantasy scale was the only significant predictor; higher scores on this measure were associated with recidivism (see Table 3). ROC analyses indicated that the model containing pretreatment Fantasy score and RM2000/s category was more accurate at predicting relative risk (AUC = .63, 95% confidence interval [CI] = [.58, .67]) than RM2000/s alone (AUC = .60, 95% CI = [.55, .64]).
Posttreatment scores
The model for the posttreatment OSA psychometric tests (using the child molester sample only) was not significant (–2LL = 3179.13, p = .94), and none of the individual measures were significant predictors of recidivism outcome (see Table 4). Nor was the model using the posttreatment SAF psychometric tests (–2LL = 3548.87, p = .20). However, the posttreatment Self-Esteem score was a significant predictor; lower Self-Esteem scores were related to recidivism (see Table 4). ROC analyses indicated that the model containing posttreatment Self-Esteem score and RM2000/s was slightly more accurate at predicting relative risk (AUC = .62, 95% CI = [.57, .66]) than RM2000/s alone (AUC = .60, 95% CI = [.55, .64]).
Summary of Simultaneous Regression Analyses for Posttreatment Offence-Supportive Attitudes, Socioaffective Functioning, and Self-Management Domain Variables Predicting Sexual and/or Violent Reconviction
Note: Tx = treatment.
Similar to the pretreatment model for the SM domain, the model using the posttreatment psychometric tests for this domain was significant in predicting time to sexual and/or violent reconviction (–2LL = 3054.83, p < .01). The posttreatment Fantasy and Relapse-Prevention Recognition scales scores were both significant predictors; offenders with higher scores on these measures were more likely to be reconvicted for a sexual and/or violent reoffence (see Table 4). This means that better scores on Relapse-Prevention Recognition (recognising risky thinking, emotions, and situations) were related to reconviction for a sexual or violent reoffence. ROC analyses indicated that the model containing posttreatment Fantasy score and RM2000/s category was more accurate at predicting relative risk (AUC = .63, 95% CI = [.59, .67]) than RM2000/s alone (AUC = .60, 95% CI = [.55, .64]), as was the model containing Relapse-Prevention Recognition score and RM2000/s (AUC = .62, 95% CI = [.58, .67]). The model containing both the posttreatment Fantasy score and the Relapse-Prevention Recognition score with RM2000/s was even better at predicting relative risk (AUC = .65, 95% CI = [.61, .69]).
Research Question 2: When psychometric tests are grouped to form dynamic risk-domain scores, in which domains do recidivists differ from nonrecidivists, pre- and posttreatment?
t-test analyses using the standardised dynamic risk-domain scores found that recidivists had significantly higher (more problematic) scores pretreatment in the SAF domain (t[3,390] = 4.62, p < .001) than the nonrecidivists. Recidivists and nonrecidivists also differed in their pretreatment scores on the SM domain (t[3,400] = 2.22, p < .05); again recidivists had worse scores on this domain than the nonrecidivists. Posttreatment, recidivists had significantly higher (more problematic) scores on the SAF domain only (t[3,400] = 2.62, p < .01).
Research Question 3: Which of the dynamic risk-domain scores, pre- and posttreatment, are the best predictors of recidivism, and do they add incremental validity to a measure of static risk?
Seven Cox regression analyses were performed to examine the ability of the standardised dynamic risk-factor domain scores to predict time to sexual and/or violent reconviction (see Table 5).
Summary of Simultaneous Regression Analyses for Standardised Pretreatment and Post-treatment Domain Scores Predicting Sexual and/or Violent Reconviction
Note: OSA = offence supportive attitudes; SAF = socioaffective functioning; SM = self-management; Tx = treatment.
Pretreatment domain scores
Using only the child molester sample for which the psychometric tests in the OSA domain were relevant, the predictive ability of the OSA pretreatment total score was examined. The model was not significant (–2LL = 3178.37, p = .28).
Using the whole sample, a model incorporating the SAF pretreatment scores was examined. This model was significant (–2LL = 3546.88, p < .001). ROC analyses indicated that the model containing the pretreatment SAF domain standardised score and RM2000/s was more accurate at predicting relative risk (AUC = .64, 95% CI = [.60, .68]) than RM2000/s alone (AUC = .60, 95% CI = [.55, .64]). A third model using the pretreatment SM domain standardised score was not significant (–2LL = 3558.40, p = .22).
A hierarchical Cox regression was then used to examine whether the standardised pretreatment SAF domain score (the only domain score that was predictive of reconviction) provided increased predictive validity for sexual and violent recidivism when controlling for static risk level (see Table 6). The RM2000 risk categories (categorical variable) were entered into the model at the first step, and the domain score was entered at the second step in a forward stepwise model.
Summary of Regression Analysis Using RM2000 Variables and Pretreatment Socioaffective Functioning (SAF) Score (N = 3,059)
Note: RM = Risk Matrix; SAF = socioaffective functioning; Tx = treatment.
The final model was significant (–2LL = 3168.04, p < .001), and the standardised pretreatment SAF domain score added significantly to the predictive power of RM2000 (χ2 = 7.03, p < .01). In addition, ROC analyses indicated that the model containing the pretreatment SAF domain standardised score and RM2000/s category was more accurate at predicting relative risk (AUC = .64, 95% CI = [.60, .68]) than RM2000/s alone (AUC = .60, 95% CI = [.55, .64]).
Posttreatment domain scores
None of the models using posttreatment domain standardised scores were significant in predicting time at risk and sexual or violent reconviction (OSA domain, using child molesters only, –2LL = 3179.23, p = .58; SAF domain, –2LL = 3558.99, p = .33; SM domain, –2LL = 3558.09, p = .72).
Research Question 4: Is there an association between the number of dysfunctional dynamic risk factor domains (PDI), pre- or posttreatment, and recidivism; does this predict recidivism and does this add incremental validity to a measure of static risk?
Chi-square analyses were computed to examine the association between the PDI groups and recidivism outcome. The proportion of offenders who were reconvicted of another sexual or violent offence increased with each PDI group, both pre- and posttreatment. Those who exhibited no problems in any of the domains had a sexual and/or violent reconviction rate of 4.9%, increasing to 6% for those with problems in one domain, 7.2% for those with problems in two, and 11% for those with problems in all three domains measured. The association between PDI group and recidivism outcome was significant pretreatment, χ2(3, 1) = 15.89, p < .01, but not posttreatment, χ2(3, 1) = 5.55, p = .14.
The PDI pretreatment variable was entered into a Cox Regression analysis using time to reconviction for a sexual or violent reoffence as the outcome (see Table 7). The model was significant (–2LL = 3552.52, p < .05). The model using the posttreatment PDI as a predictor for the same outcome was not significant (–2LL = 3357.24, p = .32).
Summary of Three Regression Analyses, the First Using the Pretreatment PDI, the Second Posttreatment PDI Predicting Sexual and/or Violent Reconviction (N = 3,386), and the Third Using RM2000 Variables and Pretreatment PDI Score (N = 3,059).
Note: RM = Risk Matrix; Tx = treatment; PDI = psychological deviance index.
Subsequently, a further Cox regression was performed to examine the ability of the pretreatment PDI score to predict outcome along with RM2000/s risk categories. At Step 1, the RM2000 variables were entered and then at Step 2, the pretreatment PDI score was entered. The final model was significant (–2LL = 3175.51, p < .001) and the RM2000 low, high, and very high risk categories were individually predictive of outcome, but the pretreatment PDI did not add to the predictive power of the RM2000 (χ2 = 1.56, p = .21). However, ROC analyses indicated that the model containing the pretreatment PDI and RM2000/s category was slightly more accurate at predicting relative risk (AUC = .62, 95% CI = [.58, .66]) than RM2000/s alone (AUC = .60, 95% CI = [.55, .64]).
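The p values attached to the step-change chi-square statistics can be checked with a short calculation. The sketch below assumes that each statistic is referred to a chi-square distribution with 1 degree of freedom, on the reasoning that a single variable is added at Step 2; this is an assumption, as the degrees of freedom are not reported in the text. For 1 degree of freedom, the upper-tail probability has the closed form erfc(sqrt(x / 2)).

```python
import math


def chi2_upper_tail_1df(statistic: float) -> float:
    """P(X > statistic) for a chi-square variable with 1 degree of freedom."""
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# Step-change statistics reported in the text.
p_saf = chi2_upper_tail_1df(7.03)  # pretreatment SAF domain score added to RM2000
p_pdi = chi2_upper_tail_1df(1.56)  # pretreatment PDI added to RM2000
print(round(p_saf, 3), round(p_pdi, 2))
```

Under this assumption, the first value falls below .01 and the second is approximately .21, in line with the reported results.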
Research Question 5: Does deviancy level discriminate between recidivists and nonrecidivists, and does it add predictive power to a static risk assessment?
Chi-square analyses were used to determine whether sexual and/or violent recidivism status was associated with deviancy level. Only child molesters were used in these analyses, as the deviancy criteria were developed for use with this group only. Deviancy level was significantly associated with recidivism status, χ2(1, n = 2367) = 3.72, p < .05; high deviancy was associated with recidivism, low deviancy with nonrecidivism. Correlational analyses indicated that deviancy status was also related to RM2000/s risk category (r = .31, p < .01).
The potential incremental validity of the deviancy level was examined using Cox regression analysis (see Table 8). The model was not significant (–2LL = 2174.08, p = .36); deviancy level did not add any incremental predictive validity to the RM2000. ROC analyses also indicated that the model containing deviancy and RM2000/s was no more accurate at predicting relative risk (AUC = .59, 95% CI = [.54, .64]) than RM2000/s alone (AUC = .60, 95% CI = [.55, .64]).
Summary of Regression Analysis Using RM2000 Variables and Deviancy Level (N = 2,513)
Note: RM = Risk Matrix.
Each of the analyses described here was also run using reconviction for a sexual offence only as the outcome variable. None of the analyses produced substantively different results to those reported here.
Discussion
This study aimed to explore the potential relationship between the psychometric test scores of convicted sexual offenders attending treatment in the community and subsequent sexual or violent reconviction. Previous studies have indicated that groups that differ in their psychometric profile can differ in their reconviction rates; those whose scores deviate more highly from the norm having higher reconviction rates than those whose scores are closer to the nonoffender norms for those measures (Beech, 1997, 1998; Beech & Ford, 2006). This study provided some tentative support for these earlier studies. In addition, this sample included sexual offenders with a range of offences suggesting that psychometric tests could potentially discriminate recidivists from nonrecidivists and be predictive of recidivism, for rapists as well as child abusers. This is in line with the findings of Wakeling et al. (2010), who reported similar results for psychometric tests used with sexual offenders undertaking treatment in custody.
First, each psychometric test’s relationship with recidivism was examined. The only measure to predict recidivism pretreatment was the Fantasy subscale of the IRI (Davis, 1980). This measures the ability to relate to fictional characters and was included in the IRI as it was thought to be linked to empathy. The fact that higher scores on this measure, both pre- and indeed posttreatment, are linked to recidivism is curious if you consider that this should be related to better empathy. However, the link between empathy and offending is far from clear; Hanson and Morton-Bourgon’s (2004, 2005) meta-analysis concluded that “the clinical presentation variables (e.g., . . . low victim empathy . . . ) had little or no relationship with sexual or nonsexual recidivism” (Hanson & Morton-Bourgon, 2004, p. 17). In fact, it has been argued that higher scores on the Fantasy subscale are actually indicative of a tendency to engage in emotion-focussed coping; that is, a higher score means an offender has a greater tendency to escape into a fantasy world where things are imagined to be better, to avoid having to deal with the often painful realities of their lives (Elliot, Beech, Mandeville-Norden, & Hayes, 2009).
Another somewhat unexpected finding was that posttreatment, better recognition of risky situations, thoughts, feelings, and actions, as measured by the Recognition subscale of the Relapse Prevention Questionnaire, was predictive of sexual and/or violent reconviction. It may be that people who score more highly on this measure are more alert to potential offending situations than those who do not and that this leads to greater temptation to offend. However, without further evidence, it is not possible to conclude that this is a reasonable explanation for this finding. Indeed, there is a more likely alternative explanation.
The posttreatment self-esteem score was the only individual measure among the SAF domain measures that significantly predicted sexual and/or violent reconviction. This supports the findings of a previous study, which also found that lower scores on this self-esteem measure were associated with higher sexual reconviction rates in a sample of sexual offenders in the United Kingdom (Thornton et al., 2004). However, it is not clear why this relationship was only observed at the posttreatment and not the pretreatment stage. It may be that the treatment programmes the sample attended were particularly good at encouraging the men to identify and acknowledge issues with their self-esteem and that subsequently their posttreatment scores were a truer reflection of their actual levels of self-esteem than their pretreatment scores. However, this is just one possible explanation and remains untested.
It is striking that the measures that predict reconviction seem to have different demand characteristics to the other measures examined. For example, it is reasonable to suppose that it is easier to admit to a problem with self-esteem, which could induce a caring, sympathetic response from others, than to admit holding more explicitly offence-related attitudes, such as those measured in the Children and Sex Questionnaire (Beckett, 1987). It could be argued, therefore, that those more likely to reoffend are scoring in a more socially desirable way on those measures on which they fear a problematic score would make them look bad to others. Extending this argument would lead to the proposition that the only measures on which the recidivists would respond more truthfully would be those which they deem to be more socially acceptable, or less likely to lead to condemnation. The Fantasy subscale is not explicitly offence related, and indeed one could argue that to those completing the questionnaires it could seem to be positive to relate to fictional characters, as it demonstrates an ability to empathise with others on some level. The only measure that predicted recidivism that mentions offending in its items is the Recognition subscale of the Relapse Prevention Questionnaire, on which better scores were predictive of reconviction. This supports the argument, particularly as treatment programmes emphasise the importance of recognising risk and reward such insight. However, this does not explain why better scores on the Coping subscale of the Relapse Prevention Questionnaire were not predictive of reconviction, as one would expect if this argument were correct. In addition, the relationship between scores on a measure of IM and each of these psychometric tests suggests that “faking good” is not an issue for these measures.
However, it may be that the IM scale used is not actually measuring a tendency to respond in a socially desirable way on these measures in this population. Certainly, this is the conclusion that Mathie and Wakeling (2011) came to when examining the validity of this measure with a large sample of convicted sexual offenders in custody.
It could also be that the treatment programmes are particularly good at helping people recognise risk situations but do less well at developing insight into how to cope with those situations, or it may be that the Coping subscale of the Relapse Prevention Questionnaire is simply not as reliable. Alternatively, it may be that there is another explanation for these findings. Further research into this area is required before we can be confident of explaining this result.
Following examination of individual measures, psychometrics were grouped into dynamic risk domains to see whether overall scores on these domains had a relationship with reconviction. Although recidivists scored more highly than the nonrecidivists, pretreatment, on the SM domain, it was the SAF domain that demonstrated the most reliable relationship with reconviction. This mirrors the results of Wakeling et al. (2010), lending strength to the notion that psychometric measures of this domain have the best predictive validity. Both pre- and posttreatment, recidivists scored more highly than nonrecidivists on this domain, and SAF was the only domain that demonstrated a relationship with subsequent sexual and/or violent recidivism. It was only the pretreatment domain score that was predictive of recidivism, but this score was still able to add, incrementally, predictive power to a static assessment of risk (the RM2000; Thornton et al., 2003). This is a significant finding, as it suggests that a psychometric measure of dynamic risk can improve prediction above that provided by static tools alone.
Rates of recidivism increased with the number of domains in which problems were demonstrated, but although the number of domains in which offenders demonstrated dysfunction prior to treatment did predict sexual and/or violent reconviction, this did not add any predictive power to a static risk assessment. For those who had abused children, Beech’s (1998) deviancy level discriminated recidivists from nonrecidivists, supporting previous findings. However, in this sample, deviancy did not add any predictive validity to a static risk assessment.
One of the key findings of this study is that posttreatment scores, both for individual measures and when grouped into domains, were less discriminative and predictive of reconviction than pretreatment scores. There is no clear reason for this, but one explanation is that pretreatment scores are a purer measure of dysfunction than posttreatment scores. That is, posttreatment scores are likely to be affected by a number of factors that could influence how someone responds to the questionnaires.
First, there are likely to be respondents who want to look as though they have improved, either to decrease the chances they will be recommended further treatment or because this is what is expected of them by treatment teams, family members, and others. Second, there will be people who have a false impression of how much they have changed; the treatment environment is intended to be very supportive and to provide a safe place for people to test out new ways of thinking and behaving. It may be that people come away from such an environment believing that they are better at managing their emotions/problem thinking and so on than they really are, because they have only tested these in an artificially safe environment that does not represent the world in which they will be required to implement these new ways of being. Third, there will be people who have genuinely changed, and fourth, those who have stayed the same (either they were fine to begin with or they are still as dysfunctional as they were before treatment). The notion that the posttreatment scores should be different to the pretreatment scores, that people should improve, introduces demand characteristics that we would not expect to affect pretreatment scores. That is not to say that pretreatment scores are not affected by any demand characteristics, but it seems that they are less affected than posttreatment scores. However, whatever the explanation, what is clear is that some measures of dynamic risk are related to risk of reconviction. This has practical application, as it means that treatment providers can look to certain psychometric measures to provide an indication of where someone’s treatment needs lie and can use this to inform individuals’ treatment so it focuses on those areas that we know are related to risk of reoffending. 
This is an important part of any treatment programme, and having knowledge of which measures can help to identify criminogenic needs, and which have demonstrated a relationship with recidivism, can increase confidence that treatment providers are adhering to the principles of effective treatment (see Andrews & Bonta, 2003). However, the poor performance of these measures posttreatment suggests that treatment providers should rely less on these scores as a way of assessing risk after treatment. Indeed, for programme evaluators, these findings suggest less emphasis should be placed on posttreatment scores on psychometric measures of dynamic risk factors as a way of establishing the efficacy of treatment programmes.
There are, of course, cautions to take heed of when considering the results of this study. Importantly, the sample, although national, did not include everyone who had been through treatment and certainly did not include those who had started but did not complete treatment. Given that some studies have indicated raised levels of risk in noncompleters (e.g., Hanson & Bussière, 1998), it would be useful to examine the psychometric profiles of this group, as they may be qualitatively different to those included in this study. In addition, it may be that those whose psychometric tests were not sent to the central database differed in some important way from those in this study, although this seems unlikely as it was whole treatment groups, rather than individual offenders, who tended to be missing from the data set. Nevertheless, for these two reasons, the results of this study may not be generalisable to everyone undertaking probation sexual-offender treatment in the community. Second, the low base rates of sexual and/or violent reconviction, although something to be celebrated, do mean that our results are based on comparisons with a small number of reoffenders, decreasing power to detect effects. In addition, the short follow-up is an important limitation as it often takes some time for sexual and violent offences to result in conviction, if reported. This means that some of those considered nonrecidivists in this study may well have been reported for reoffending sexually or violently in that time but that this had not yet resulted in a conviction. Future studies using longer follow-ups would address this issue. Finally, this study only examined the relationship between psychometric tests and reconviction, not reoffending, rates of which are likely to be much higher than reconviction.
In summary, the findings of this study tentatively suggest that some psychometric scores, particularly those gathered pretreatment and relating to SAF, can be used to predict subsequent sexual and/or violent reconviction in sexual offenders with a range of offences, who are attending programmes in the community. The findings indicate that psychometric measures of dynamic risk might be able to add to the predictive power of static risk assessment, an important finding for both risk assessors and treatment providers alike. However, there is still much that we do not know, particularly about how we can measure change in risk over time. Although there is no space to report on such analyses here, we are in the process of conducting further studies examining what change in psychometric scores over treatment can tell us about subsequent risk of reoffending, something much needed for those involved in the management of sex offenders in the community.
Footnotes
Acknowledgements
We would like to thank Professor Tony Beech for his insight and guidance on this article.
The author declared no potential conflicts of interests with respect to the authorship and/or publication of this article.
The author received no financial support for the research and/or authorship of this article.
