Abstract
Most sexual recidivism risk assessment tools focus primarily on risk factors and deficits without consideration for strengths or protective factors which might mitigate reoffense risk. The current study is the first in a research program designed to develop and validate the Structured Assessment of PROtective Factors for violence risk—Sexual Offence version (SAPROF-SO), a measure of protective factors against sexual reoffending. The study aimed to test interrater reliability and construct validity of the SAPROF-SO with a high-risk (n = 40) and routine (n = 40) sample. Interrater reliability between three independent raters was generally good to excellent for the SAPROF-SO domain and Total scores across both samples and compared favorably with validated measures of dynamic risk. Moreover, the SAPROF-SO demonstrated construct validity and was moderately independent of existing measures of risk. Findings open the door for a more balanced, strengths-based, and accurate approach to recidivism risk assessment.
Assessment technology available to professionals working with men known to have committed sex offenses has yet to catch up with strengths-based approaches to treatment and desistance research (Laws & Ward, 2011; Marshall et al., 2017). Commonly used instruments including the Static-99R (Helmus, Thornton, et al., 2012) and Stable-2007 (Fernandez et al., 2014) are composed primarily of risk factors and understood as measures of risk. Their focus on risk factors and deficits creates a tension between the developing ethos of treatment and the practice of assessment, making it harder for one to inform the other. The development of measures of protective factors represents a promising way forward. The current article describes the development and initial research with a new instrument, the Structured Assessment of PROtective Factors for violence risk—Sexual Offence version (SAPROF-SO), designed to support the systematic and structured assessment of protective factors relevant to sexual recidivism risk.
For our purposes, and in line with other researchers’ definitions (e.g., de Vogel et al., 2012), protective factors can be defined as those factors that are theoretically or empirically associated with reduced rates of sexual or violent recidivism in individuals with at least one apprehension for a sexual offense as an adult. They must signal the presence of a strength, not merely the absence of a risk factor or deficit (de Vries Robbé, Mann et al., 2015). Some protective factors may reflect the opposing, positive pole of a risk factor, and both the positive and negative can coexist (e.g., prosocial and antisocial peers, prosocial and deviant sexual interests; de Vries Robbé, Mann et al., 2015). Other proposed protective factors reflect variables independent of known risk factors (e.g., medication, life goals; de Vogel et al., 2012). Notwithstanding whether protective factors represent anything new, as highlighted by Helmus (2018), “. . .the possibility remains that they are measuring existing factors better” (p. 5). In this initial article, we describe the SAPROF-SO development process and report initial research findings seeking to answer the question, are protective factors, as measured by the SAPROF-SO, valid constructs? We have deliberately set a high bar for an affirmative answer to this question. Specifically, we address (a) whether the SAPROF-SO assesses protective factors in a reliable way, (b) whether the SAPROF-SO’s protective factors show convergent and divergent validity, and (c) whether the SAPROF-SO is measuring something beyond established risk assessment instruments. If the SAPROF-SO fails to produce reliable and independent variance, it could be argued that it lacks novelty, even though its positive reframe of variance captured by existing instruments might still be clinically useful. Questions of predictive utility will be investigated in future studies and are not the focus of the present study.
Relevance to Risk Assessment
Recidivism risk assessments inform a variety of important decisions in forensic and correctional settings including the form and intensity of treatment and supervision, as well as judicial decisions such as readiness for discharge or parole. Accordingly, improving the predictive accuracy of risk assessment practices represents an ongoing research priority. Although contemporary tools perform better than their predecessors, they are not able to identify groups with truly high sexual recidivism rates. For example, Fernandez et al. (2014) reported that after 5 years, the majority of individuals assessed in the high or very high priority categories for sexual reoffending based on combined Static-99R/Stable-2007 scores were not charged or reconvicted for a new sexual offense. Using the Violence Risk Scale—Sexual Offense version (VRS-SO), Olver et al. (2007) found that after 10 years, 70% of men assessed in the highest risk category reoffended with a sexual offense, and in an independent validation study of the VRS-SO, Beggs and Grace (2010) found that 56.2% sexually reoffended after 12 years. The presence of protective factors might reasonably differentiate individuals assessed with a greater than average risk to reoffend who do not reoffend, compared with those who do.
Assessment of Protective Factors
To the authors’ knowledge, prior to the development of the SAPROF-SO, no actuarial tools were designed specifically to assess protective factors against sexual reoffending in adults. The Desistence for Adolescents who Sexually Harm (DASH-13; Worling, 2013) was designed to assess protective factors against sexual reoffending in adolescents, yet few studies have examined its construct and predictive validity (for an exception, see Zeng et al., 2015). Moreover, few tools are available that measure protective factors against general reoffending. Some risk assessment tools include a protective factor subscale incorporating a limited number of protective factors into a risk factor dominant tool (e.g., Miller, 2006; Serin, 2007, 2017). For example, the Dynamic Risk Assessment for Offender Reentry (DRAOR; Serin, 2007, 2017) includes a Protective Factors subscale comprising six strengths reflecting prosocial perceptions (responsive to advice, prosocial identity, high expectations, costs/benefits) and prosocial connectedness (social support, social control). The SAPROF (de Vogel et al., 2012) and its Youth Version (SAPROF-YV; de Vries Robbé, Geers et al., 2015) are the only comprehensive stand-alone measures of protective factors. The SAPROF is a structured professional judgment (SPJ) tool assessing protective factors against violence (general and sexual) in adults. Initial SAPROF studies found predictive validity for sexual and violent recidivism in a forensic psychiatric sample after controlling for risk assessment scores (de Vries Robbé et al., 2011; de Vries Robbé, de Vogel et al., 2015). For men who had sexually offended (N = 83), area under the curve (AUC) values for total SAPROF scores, but not the final protection judgment, were significant for follow-up periods of 3 years (AUC = .76, p < .05) and 15 years (AUC = .71, p < .01).
In other words, the sum of individual item scores was a better predictor than the assessor’s judgment of overall protection, which is somewhat consistent with the observation that actuarial tools perform slightly better than SPJ tools in the assessment of sexual recidivism risk (Hanson & Morton-Bourgon, 2009). Subsequent studies examining the predictive validity of the SAPROF for sexual recidivism have produced less promising results (Turner et al., 2016; Yoon et al., 2016); however, divergent samples (men with a history of offending against adults vs. children) and methodological differences (coding from intake assessments vs. discharge plans) might explain the disparate findings. As a predominantly dynamic tool that considers the extent to which protective factors are present in a defined context, it is likely that the SAPROF is a better predictor of desistance when rated on release/discharge rather than at institutional intake. Not only can internal protective factors (e.g., effective coping skills) change over time including as a result of treatment, release environments differ in the extent to which they afford opportunities for the expression of different protective factors (e.g., structured prosocial leisure activities) which intake assessments cannot account for.
It is well established that structured approaches to recidivism risk assessment including the use of both actuarial and SPJ tools outperform unstructured clinical judgment (e.g., Andrews et al., 2006; Hanson & Morton-Bourgon, 2009). Actuarial tools are slightly more accurate predictors of recidivism than SPJ tools in the assessment of sexual recidivism risk (Hanson & Morton-Bourgon, 2009), perhaps because actuarial measures are less ambiguous, decreasing the opportunity for inherent bias as compared with SPJ tools (Murrie et al., 2009). Moreover, at least in North America and the United Kingdom, actuarial tools are more commonly used than SPJ tools in assessing sexual recidivism risk (Kelley et al., 2018), and the field currently lacks an actuarial measure of protective factors that might bring balance to sexual recidivism risk assessments. Accordingly, a research program was initiated to develop such a tool. The research program aims to (a) develop a theoretically and empirically informed measure of protective factors that can be reliably rated by independent raters, (b) examine the tool’s construct validity through analyzing relationships between protective factors and risk factors, (c) examine the tool’s factor structure to validate hypothesized domains and inform theory underlying protective factors, and (d) test the tool’s predictive validity for recidivism, including concurrent and incremental validity alongside commonly used risk assessment tools. The current article reports findings from empirical research addressing the first two aims.
Developing the SAPROF-SO
As its name suggests, the SAPROF-SO was derived initially from the SAPROF. The SAPROF-SO was developed to measure a broader range of protective factors compared with the SAPROF, including those specific to sexual offending, using an actuarial approach. Although the initial intention was to develop a supplementary tool containing item refinements and additions based on empirical reviews (e.g., de Vries Robbé, Mann et al., 2015; Thornton et al., 2017), the extent of identified changes warranted development of a stand-alone tool. The 24 SAPROF-SO items listed in Table 1 can be compared with the 17 items from the SAPROF available at https://irp-cdn.multiscreensite.com/f430bf1b/DESKTOP/pdf/coding+sheet+saprof+2013.pdf. Information about what each SAPROF-SO item is measuring and its supporting literature has been made available as Online Supplemental Material (see Supplemental Table S1). An expanded response scale was developed to capture greater variance in the level of protection present as well as to better detect incremental changes in the level of protection present. Explicit scoring instructions were developed to increase the interrater reliability of mechanical scoring rules (a less important consideration for SPJ vs. actuarial tools). Attempts were made to integrate an understanding of protective mechanisms into scoring instructions, such that high scores corresponded not simply to the presence of a proposed protective factor, but also to the presence of underlying mechanisms through which the protective factor was thought to operate. Hypothesized mechanisms can fit broadly into two categories: control and prosocial reward (Thornton et al., 2017). Control refers to processes that restrain urges to engage in antisocial behavior, or more broadly, mitigate the operation of risk factors. 
Control may stem from various sources, including internal control (e.g., effective coping), informal social control/social policing (e.g., being accountable to prosocial peers), or formal social control (e.g., imprisonment). Prosocial reward refers to experiencing a prosocial lifestyle as rewarding, thus reinforcing prosocial living. Prosocial reward as a protective mechanism aligns closely with underlying assumptions of the Good Lives Model (see Laws & Ward, 2011; Ward & Stewart, 2003), specifically that offending is less likely when valued outcomes (or primary human goods) such as intimacy, belonging, and a sense of mastery are attained in meaningful, prosocial ways.
Table 1. SAPROF-SO Items, Domains, and Theoretical Alignment
Note. For an overview of desistance theories, see Laws and Ward (2011). SAPROF-SO = Structured Assessment of PROtective Factors for violence risk—Sexual Offence version (Willis et al., 2017–2019); GLM = Good Lives Model (Ward & Stewart, 2003).
a Item new to the SAPROF-SO. b Sexual offense–specific protective factor.
The SAPROF-SO was developed with support from one of the authors of the SAPROF. Given that sexual recidivism is predicted both by general risk factors for reoffending (e.g., antisocial peers) and those specific to sexual reoffending (but not general reoffending, for example, sexual preoccupation; Mann et al., 2010), items were informed by research specific to sexual offending as well as general desistance research. Many of the original SAPROF items were revised to fit the expanded response scale and align with relevant theoretical and empirical research, and new items were added. Specifically, the “Intelligence” item was replaced with “Intact cognitive functioning,” as individuals who demonstrate intelligence as measured with traditional intellectual scales may demonstrate a range of cognitive difficulties in other ways (e.g., executive functioning problems due to head injury). Explicit instructions were added to the “Empathy” item to exclude consideration of victim empathy and general self-control was differentiated from control of sexual impulses. The “Attitudes towards authority” and “Life goals” items were renamed “Attitudes towards rules and regulations” and “Goal-directed living,” respectively, to better capture the intent of both items. Finally, the “Professional care” item was replaced with “Sexual offense–specific treatment” to specifically address the availability of appropriate (i.e., adhering to the principles of Risk, Needs, and Responsivity; Andrews & Bonta, 2010) treatment for sexual offending (vs. general mental health care). New items in the SAPROF-SO included items hypothesized to protect against sexual recidivism specifically (sexual self-regulation, prosocial sexual interests, prosocial sexual identity) as well as items hypothesized to protect against recidivism in general, including sexual recidivism (adaptive schemas, emotional connection to adults, housing stability, and therapeutic alliance). 
As illustrated in Table 1, the SAPROF-SO items were grouped into five domains aligning with desistance theories (e.g., Giordano et al., 2002; Göbbels et al., 2012; Laub & Sampson, 2003; see Laws & Ward, 2011, for an overview) and the strengths-based Good Lives Model of rehabilitation (Ward & Stewart, 2003): Internal Capacity, Prosocial Identity, Prosocial Connection, Stability, and Professionally Provided Support.
The Current Study
The current study is the first in a research program designed to validate an actuarial measure of protective factors exclusively for predicting desistance from sexual offending in adult biologically born males. This study aimed to test the interrater reliability and construct validity of the SAPROF-SO with two diverse samples. The first sample was drawn from a high-risk and highly restrictive (civil commitment) context in the United States, and the second sample approximated a routine sample, drawn from men in a less restrictive (parole/probation) context in New Zealand.
Hypotheses relevant to convergent and divergent validity were as follows:
SAPROF-SO Total scores will be largely independent of Static-99R scores with observed correlations predicted to be less than r = .3 and not statistically significant.
SAPROF-SO Total scores will correlate negatively with measures of dynamic risk and positively with reduction in dynamic risk. Observed correlations will be moderate to high and statistically significant for those measures that more closely reflect ongoing current functioning (VRS-SO Change score and DRAOR risk scores) and small (r < .3) and not statistically significant for those measures that capture long-term functioning (VRS-SO Pretreatment Dynamic scores).
The observed correlation between the SAPROF-SO Total score and the DRAOR Protective Factors scale will be positive, moderate to high, and statistically significant.
Method
Participants
High-Risk Sample (United States)
Participants were 40 adult males committed as Sexually Violent Persons (SVPs) under the Wisconsin State Civil Commitment Law. Individuals are considered by the court to meet the SVP criteria if they have been convicted of one or more qualifying sexually violent offenses, and are considered dangerous due to having a mental disorder which makes it more likely than not they will commit an additional act of sexual violence sometime in the future. This legal commitment includes the secure treatment facility (Sand Ridge Secure Treatment Center) where the first three phases of treatment occur. It also includes the supervised release program, which is considered the final phase of treatment.
The majority of participants in the sample were residing within the secure perimeter of Sand Ridge (82.5%, n = 33), 15% (n = 6) were on supervised release in the community, and one participant (2.5%) was in jail (after committing a nonsexual offense while at the Sand Ridge Secure Treatment Center). Participants ranged in age from 28 to 79 years (M = 51.25, SD = 11.58). They were predominantly White (70%, n = 28); 25.0% (n = 10) identified as Black and 5.0% (n = 2) as Native American. Participants’ Static-99R scores ranged from 2 to 9, with a mean of 5.52 (SD = 1.81), indicating that on average, participants were in the above average risk category for sexual reoffending. Considering sexual offense histories, the majority of participants had charges or convictions for contact offending against children (0–12 years; 70.0%, n = 28), female victims (82.5%, n = 33), and nonfamilial acquaintances (77.5%, n = 31). Approximately half of the sample had charges or convictions for contact offending against teenagers (13–17 years; 50%, n = 20), adults (≥18 years; 45%, n = 18), strangers (45%, n = 18), and male victims (47.5%, n = 19). Approximately one third of the sample had charges or convictions for contact offending against family members (35%, n = 14). Noncontact exposure offenses (20%, n = 8), noncontact Internet-related offenses (2.5%, n = 1), and sexual abuse image offenses (2.5%, n = 1) were less common.
Participants were all assessed between 2015 and 2018 to determine whether they met or continued to meet SVP criteria under Wisconsin State Civil Commitment Law. Cases at Sand Ridge are randomly assigned to individual evaluators, and the current study included all cases randomly assigned to the second author for whom (a) sufficient information was considered available to rate the SAPROF-SO from the assessment report and (b) VRS-SO scores were available. The sample approximated the static risk and demographic profile of all SVP trials in Wisconsin between 2012 and 2016 (N = 132), where the mean Static-99R score was 5.40 (SD = 1.7), the mean age was 51.0 (SD = 10.7), and persons on trial were predominantly White (61.1%; Elwood, 2019).
Routine Sample (New Zealand)
Participants were 40 adult males released from prison following an index sex offense and under the supervision of the community probation service. Participants voluntarily consented to participate in a longitudinal study investigating protective factors against sexual reoffending between late 2018 and mid-2019. Participants were compensated for their time with their choice of a grocery or petrol voucher. Participants’ ages aligned closely with the high-risk sample, ranging from 23 to 73 years (M = 50.50, SD = 13.34), most were of New Zealand European descent (70%, n = 28), and more than one quarter were of Māori descent (28%, n = 12). Participants’ Static-99R scores ranged from −3 to 7, with a mean of 2.38 (SD = 2.37), indicating that overall participants were in the average risk category for sexual reoffending. Considering offense histories, the majority of participants had charges or convictions for contact offending against children (0–12 years; 62.5%, n = 25) and female victims (77.5%, n = 31). Approximately half of the sample had charges or convictions for contact offending against family members (50%, n = 20), nonfamilial acquaintances (55%, n = 22), and teenagers (13–17 years; 50%, n = 20). Noncontact exposure offenses were recorded for 37.5% (n = 15) of participants, and less than one third (27.5%, n = 11) had charges or convictions for contact offenses against male victims. Fifteen percent of the sample (n = 6) had charges or convictions related to the possession of child abuse images. Relative to the high-risk sample, fewer participants were apprehended for contact sexual offenses against adults (≥18 years; 15%, n = 6) or strangers (7.5%, n = 3). Noncontact Internet-related offenses (2.5%, n = 1) were uncommon.
Demographic characteristics and offense history information for the population from which participants were drawn were unavailable; however, participants’ static risk profile approximated routine samples (Helmus, Hanson et al., 2012) and their demographic characteristics were comparable to other New Zealand samples of men imprisoned for sexual offenses (e.g., Beggs & Grace, 2010; Willis & Grace, 2008), suggesting the sample was representative in terms of static risk and demographics for men imprisoned for sexual offenses in New Zealand.
Measures
SAPROF-SO
The SAPROF-SO (Willis et al., 2017–2019) includes 24 items as listed in Table 1. In general, ratings are made based on information from the previous 6 months to make predictions about the upcoming 6 to 12 months; however, longer time periods are canvassed for items considered more stable (e.g., adaptive schemas and prosocial sexual interests). Each item is rated on a 5-point scale ranging from 0 to 4: A score of 4 indicates that the item is clearly present, a score of 2 indicates that the item is somewhat present, and a score of 0 indicates that the item is very rarely present, or absent. Raters first decide which anchor point (0, 2, or 4) best fits the individual, on the basis of all information available (e.g., behavioral observations, file notes, interview data). Middle scores (1 or 3) are assigned when an individual’s functioning falls midway between two anchor points. Similarly, middle scores are assigned if an individual has demonstrated incremental change from one anchor point, but not to the extent defined by the next anchor point. In the event that the rater cannot decide between an anchor point and a middle score (e.g., 2 and 3), the anchor point is assigned (i.e., 2). Ratings can be made for both current and expected future contexts. The current study focused on current context ratings (i.e., using information from the recent past to make predictions about the level of protection present in the upcoming 6–12 months in the individual’s current context).
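The item rating rule described above can be illustrated with a minimal Python sketch. The function names (`resolve_rating`, `total_score`) are hypothetical and are not part of the SAPROF-SO manual. Treating the Total score as a simple sum of the 24 item ratings is an assumption made here for illustration only.

```python
VALID_SCORES = range(0, 5)      # item ratings run from 0 to 4
ANCHOR_SCORES = {0, 2, 4}       # anchor points; 1 and 3 are middle scores
N_ITEMS = 24                    # number of SAPROF-SO items


def resolve_rating(candidates):
    """Return one item rating from the score(s) a rater is considering.

    One candidate is returned unchanged. Two adjacent candidates
    (an anchor point and a middle score) resolve to the anchor point,
    mirroring the tie-break rule described in the text.
    """
    options = sorted(set(candidates))
    if not options or any(score not in VALID_SCORES for score in options):
        raise ValueError("ratings must be integers from 0 to 4")
    if len(options) == 1:
        return options[0]
    if len(options) == 2 and options[1] - options[0] == 1:
        # Exactly one of two adjacent scores is an anchor point.
        return next(score for score in options if score in ANCHOR_SCORES)
    raise ValueError("expected one score or two adjacent scores")


def total_score(item_ratings):
    """Sum the item ratings (ASSUMPTION: the Total score is a simple sum)."""
    if len(item_ratings) != N_ITEMS:
        raise ValueError("expected 24 item ratings")
    if any(score not in VALID_SCORES for score in item_ratings):
        raise ValueError("ratings must be integers from 0 to 4")
    return sum(item_ratings)
```

For instance, a rater undecided between 2 and 3 would record `resolve_rating([2, 3])`, which yields the anchor point 2.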
Static-99R
The Static-99R (Helmus, Thornton et al., 2012) contains 10 items based on commonly available demographic (age, relationship history) and criminal history information (e.g., prior sexual offenses, any unrelated victims, total number of prior sentencing occasions for anything). Static-99R scores range from −3 to 12 and correspond to the following risk levels: I—very low risk (scores of −3 and −2), II—below average risk (scores of −1 and 0), III—average risk (scores of 1, 2, and 3), IVa—above average risk (scores of 4 and 5), and IVb—well above average risk (scores of 6 and higher; Hanson et al., 2017). Static-99R risk levels parallel the standardized risk levels developed for general correctional populations by the Justice Centre of the Council of State Governments (Hanson et al., 2017). Static-99R is the most frequently used sexual recidivism risk assessment tool (Kelley et al., 2018). It can be scored with high interrater reliability (for a review, see Phenix & Epperson, 2016) and has moderate ability to discriminate recidivists from nonrecidivists (Helmus, Hanson et al., 2012).
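The score-to-level correspondence described above can be written as a small lookup. This is a sketch in Python based only on the ranges given in the text; the function name `static99r_risk_level` is hypothetical and is not drawn from any Static-99R software.

```python
def static99r_risk_level(score: int) -> str:
    """Map a Static-99R total score (-3 to 12) to its risk level label."""
    if not -3 <= score <= 12:
        raise ValueError("Static-99R scores range from -3 to 12")
    if score <= -2:
        return "I (very low risk)"
    if score <= 0:
        return "II (below average risk)"
    if score <= 3:
        return "III (average risk)"
    if score <= 5:
        return "IVa (above average risk)"
    return "IVb (well above average risk)"
```

Under this mapping, a score of 2 falls in Level III and a score of 6 falls in Level IVb.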
VRS-SO
The VRS-SO (Wong et al., 2003–2017) consists of 17 dynamic factors that assess sexual recidivism risk and treatment change. It has generally performed well in validation and cross-validation studies (e.g., Beggs & Grace, 2010; Olver et al., 2014; Sowden & Olver, 2017). The VRS-SO has demonstrated good interrater reliability with intraclass correlation coefficients (ICCs) ranging from .73 to .93 for Pretreatment Dynamic Risk and .68 to .83 for Change scores (Beggs & Grace, 2010, 2011; Eher et al., 2015; Olver et al., 2007; Sowden & Olver, 2017). The normative samples have recently been updated to include four samples (N = 913) with a fixed 10-year follow-up. Treatment change on the VRS-SO has been found to be related to a reduction in sexual recidivism. Recently, Olver et al. (2015) applied logistic regression modeling to estimate 5-year rates of sexual recidivism as a function of the combined effect of the Static-99R score, VRS-SO Pretreatment Dynamic score, and VRS-SO Change score. Each of the three scores made a unique contribution in the prediction of risk (i.e., incremental validity; Olver et al., 2018).
In a recent survey, the VRS-SO was the second most utilized dynamic risk assessment tool among civil commitment evaluators (Kelley et al., 2018). At Sand Ridge, 61.5% of the evaluators choose to use the VRS-SO. The Pretreatment Dynamic Risk score is based on a thorough review of the individual’s history in the community, at the time of previous sex offenses, and functioning within incarcerated settings. This includes any treatment they may have participated in prior to being admitted to Sand Ridge. The VRS-SO Change score is based on the individual’s functioning and treatment participation since entering Sand Ridge, with the last 2 years being weighted most heavily. Thus, change is largely a measure of observable behavioral change over the past 2 years as compared with how the individual presented at the point of admission. Given that the Sand Ridge Evaluation Unit is responsible for completing annual updates to the court on individuals’ treatment progress and continued need for commitment, VRS-SO Change scores may change on a yearly basis, but VRS-SO Pretreatment Dynamic Risk scores remain the same.
DRAOR
The DRAOR (Serin, 2007, 2017) was developed to assess general recidivism risk and inform ongoing risk management for adults under community supervision (e.g., parole, probation). Designed for repeated use at each supervision contact, the DRAOR is an SPJ instrument comprising 19 items grouped into three subscales: Stable Dynamic Risk Factors, Acute Dynamic Risk Factors, and Protective Factors. Each item is rated on a 3-point scale; a score of 0 indicates the item is absent, 1 indicates the item is somewhat present or evidence is inconsistent, and 2 indicates the item is strongly present. For research purposes, total scores are calculated by summing the risk subscales and subtracting the protective subscale; however, in practice, as an SPJ tool, total scores are not calculated. The original implementation study in New Zealand found AUCs ranging from .67 to .72 when prospectively predicting general reoffending in a large sample (N = 3,498) of parolees (Hanby, 2013). Subsequent studies have reported similar findings (e.g., Averill, 2016; Yesberg & Polaschek, 2015). Averill (2016) reported that the mean DRAOR total score for 851 men convicted for sexual offenses and released into probation supervision in New Zealand was 4.37 (SD = 6.63).
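The research-only total score described above (risk subscales summed, protective subscale subtracted) can be sketched as follows. This is an illustrative Python example; the function name `draor_research_total` is hypothetical, and the number of items per subscale is deliberately not enforced because only the overall item count is stated here.

```python
def draor_research_total(stable, acute, protective):
    """Research total: (stable risk + acute risk) - protective.

    Each argument is a list of item ratings on the 0-2 scale.
    """
    for ratings in (stable, acute, protective):
        if any(rating not in (0, 1, 2) for rating in ratings):
            raise ValueError("DRAOR items are rated 0, 1, or 2")
    return sum(stable) + sum(acute) - sum(protective)
```

Because the protective subscale is subtracted, the total can be negative when rated strengths outweigh rated risk factors.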
The DRAOR was implemented in New Zealand in April 2010 for use by probation officers to assess and manage risk for all individuals released from prison into probation supervision. The DRAOR is scored at multiple time points based on all information available including contact with clients, their support people, and treatment providers. Repeat scoring is at the supervising probation officer’s discretion and with consideration for initial risk levels and the length of time an individual has been on parole (Yesberg & Polaschek, 2015).
Procedure
An initial SAPROF-SO manual for beta testing was produced in July 2017 which included 25 items. Feedback was requested from researchers and clinicians with expertise in sexual recidivism risk assessment. Initial feedback resulted in dropping one item (“emotional connection to adults one is attracted to”) and adding clarifications and examples for anchor point descriptions across items. The authors then rated four paper cases (developed for previous training delivered by the authors) of men with sex offense convictions for the purpose of initial calibration. Cases were rated independently before scores were discussed, which resulted in a further set of manual refinements (Willis et al., 2017–2019).
High-Risk Sample (United States)
The SAPROF-SO was coded based on an SVP report that the second author had previously completed. SVP assessment reports were made accessible (via a secure online link) in initial batches of three to all authors. Each report typically included the following sections: relevant psychosocial history, sexual and nonsexual criminal history, treatment progress, clinical interview, diagnosis, and sexual recidivism risk assessment. Risk assessment information included the Static-99R total score and a narrative description of VRS-SO results. Most reports included VRS-SO Dynamic and Change scores (62.5%). Individual Static-99R and VRS-SO item scores and domain scores were not available in the forensic reports at the time of SAPROF-SO coding; however, they were later obtained from the second author for the purposes of statistical analysis.
The SVP reports did not contain SAPROF-SO results. However, most of the reports contained information relevant to protective factors that was important for scoring the SAPROF-SO in the current study. Protective factor information was usually found in the treatment progress and clinical interview sections, given that clinical staff members at Sand Ridge have received training on the original SAPROF and have incorporated protective factors into their treatment model. In addition to addressing criminogenic needs, treatment focuses on increasing protective factors (e.g., prosocial connections, employability) in anticipation of eventual discharge.
The three authors coded each batch of reports independently, focusing on information relevant to each SAPROF-SO item. Two of the three coders were blind to the VRS-SO assessment, except data that may have been included in the report. The narrative description of VRS-SO results provided some information relevant to selected SAPROF-SO items (e.g., VRS-SO Sexual Deviance and SAPROF-SO prosocial sexual interests). Scores were reviewed during coding meetings, with each coder’s individual scores recorded and discrepancies discussed and resolved by consensus scoring. Initial interrater reliability was computed after 10 cases were triple coded, and refinements were made to coding instructions for items that produced poor interrater agreement relative to other items. Most refinements were minor and included renaming “Life goals” to “Goal-directed living” to reflect the focus of this item on the extent to which an individual was actively pursuing prosocial goals, as opposed to the simple presence of prosocial goals. Remaining cases were coded in batches of three to four, followed by coding meetings to review scores and identify consensus scores, until all 40 cases were coded.
To examine potential biases affecting correlations between the SAPROF-SO and VRS-SO inherent in the second author completing the latter assessment, we removed the second author’s SAPROF-SO scores, calculated the average between the other two ratings, and examined correlations between key variables (Static-99R, VRS-SO Pretreatment Dynamic Risk, VRS-SO Change score) with the resulting SAPROF-SO Total score. All correlations were within .02 of those obtained when the consensus scores were used and, unsurprisingly, there were no changes to the significance levels obtained.
Routine Sample (New Zealand)
Participants were recruited as part of an ongoing longitudinal study of protective factors against sexual reoffending. Probation officers distributed information about the study to adult males released from prison following an index sexual offense, and information was also distributed in monthly community-based maintenance groups for men who have completed the Kia Marama or Te Piriti prison-based treatment programs for sexual offending against children. Men interested in further information were asked to contact the researchers directly or to complete a consent-to-contact form allowing the researchers to contact them. The routine sample for the current study comprised the first 40 participants who consented to participate in the longitudinal study. Because it was unknown how many men received information about the study, it was not possible to calculate a response rate.
All participants were interviewed via telephone by a trained graduate student or the first author, with interviews structured around the SAPROF-SO items. Interviews were audio recorded and typically lasted 1 to 1.5 h. Following each interview and with participants’ consent, collateral information was extracted from the Department of Corrections electronic records including the most recent psychological report (psychological treatment report and/or report to the parole board prior to the participant’s release), probation officer case notes up to 6 months preceding the interview, and information required to score the Static-99R. Psychological reports typically included a summary of the participant’s index offending and criminal history, background information including family of origin and schooling, offense precipitants and a psychological formulation of the individual’s sexual offending, treatment progress (where applicable), and a sexual recidivism risk assessment. Risk assessments typically utilized the New Zealand–developed Automated Sexual Recidivism Scale (ASRS; Skelton et al., 2006) and either the Stable-2007 or VRS-SO. Probation officer case notes were often brief, reporting on the client’s general presentation, compliance with sentence conditions, and any requests in relation to sentence conditions (e.g., proposed employment or accommodation requiring the probation officer’s approval). Probation officer case notes additionally included DRAOR scores. Narrative descriptions supporting DRAOR scores occasionally provided information relevant to SAPROF-SO items (e.g., social network); however, actual scores were not relevant to coding the SAPROF-SO.
The SAPROF-SO was coded on the basis of all information by three independent raters (the first author and two graduate students trained in the SAPROF-SO by the authors). Coding commenced toward the end of coding the high-risk sample and followed a similar procedure: cases were coded in batches of three to four, discrepancies were addressed, and consensus scores were assigned during regular coding meetings until all 40 cases were triple coded. DRAOR scores most closely aligning with interview dates were then extracted from probation officer case notes for each participant, and Static-99R item scores were entered into a database which automatically calculated total scores. This research was reviewed and approved by the Sand Ridge Institutional Review Board and the University of Auckland Human Participants Ethics Committee.
Results
Descriptive Statistics
SAPROF-SO
Sufficient information was typically available to rate all items for both samples, with missing data more frequent in the high-risk sample (in which ratings were based solely on file information). Specifically, 14 cases (35%) in the high-risk sample were missing ratings for between one and five items (M = 2.21, SD = 1.31), and seven cases (17.5%) in the routine sample were missing ratings for one or two items (M = 1.14, SD = 0.38). The number of valid ratings, together with means and standard deviations for items, domains, and Total scores for both samples, is available as Online Supplemental Material (see Supplemental Table S2). Goal-directed living and the therapeutic alliance items were omitted most frequently in the high-risk sample (n = 9 and n = 6 cases, respectively), and the therapeutic alliance item was omitted most frequently in the routine sample (n = 5 cases). Omitted items were replaced with the sample mean in the calculation of domain and Total scores, and scores of not applicable (N/A) or not relevant (N/R) were replaced with scores of 0 (protective factor not present). The routine sample demonstrated higher scores and greater variance across the Internal Capacity and Prosocial Identity domains than the high-risk sample. Prosocial Connection scores were similar in both samples, and low relative to the maximum possible score. Both samples evidenced high scores on the Stability domain. Professionally Provided Support was greater for the high-risk sample than for the routine sample, with limited within-sample variance.
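The two replacement rules described above can be sketched in code. This is a minimal illustration, not the authors' scoring procedure: the function name `domain_score`, the input layout (one row per case, `None` for an omitted item), and the ratings in the usage example are all hypothetical. It also assumes that the sample mean for an item is taken over all non-omitted ratings, including N/A and N/R ratings already recoded to 0, which the text does not specify.

```python
import numpy as np

def domain_score(items):
    """Sum item ratings per case after applying the two replacement rules.

    items: list of rows (one per case). Each entry is a number, None for an
    omitted item, or the string "N/A" / "N/R". Layout is hypothetical.
    """
    cleaned = []
    for row in items:
        new_row = []
        for value in row:
            if value in ("N/A", "N/R"):
                new_row.append(0.0)           # rule: protective factor not present
            elif value is None:
                new_row.append(np.nan)        # omitted item, filled in below
            else:
                new_row.append(float(value))
        cleaned.append(new_row)
    arr = np.array(cleaned, dtype=float)
    # Rule: an omitted item takes the sample mean of that item.
    # Assumption: the mean includes N/A and N/R ratings recoded to 0.
    item_means = np.nanmean(arr, axis=0)
    filled = np.where(np.isnan(arr), item_means, arr)
    return filled.sum(axis=1)

# Hypothetical ratings for three cases on a three-item domain.
scores = domain_score([[4, 2, "N/A"], [2, None, 3], [0, 4, 1]])
print(scores)  # [6. 8. 5.]
```

In the second case the omitted item is replaced by 3, the mean of the two observed ratings for that item (2 and 4).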
VRS-SO (High-Risk Sample Only)
The Pretreatment Dynamic Risk score corresponded to the well above average risk category (M = 38.69, SD = 4.44), and participants demonstrated an average level of change compared with VRS-SO norms (Change score M = 4.03, SD = 2.92; 57.9 percentile). The sample demonstrated high pretreatment scores within each of the three VRS-SO factors: Sexual Deviance (M = 11.75, SD = 2.13; 89.0 percentile), Criminality (M = 12.28, SD = 3.13; 87.9 percentile), and Treatment Responsivity (M = 10.25, SD = 1.55; 97.2 percentile). At least an average amount of change was demonstrated across each factor based on mean change scores: Sexual Deviance (M = 1.03, SD = 1.04; 54.1 percentile), Criminality (M = 1.36, SD = 1.16; 72.8 percentile), and Treatment Responsivity (M = 1.16, SD = 0.86; 46.9 percentile).
DRAOR (Routine Sample Only)
Stable Dynamic Risk scores ranged from 0 to 10 (possible range: 0–12, M = 5.25, SD = 2.99), Acute Dynamic Risk scores ranged from 1 to 9 (possible range: 0–14, M = 4.22, SD = 2.13), and Protective Factor scores ranged from 3 to 12 (possible range: 0–12, M = 7.35, SD = 2.68). The mean total score and variance in total scores (M = 2.13, SD = 6.75) approximated that reported by Averill (2016) in a large sample of men convicted for sex offenses in New Zealand (N = 851; M = 4.37, SD = 6.63). For the purpose of investigating convergent validity in the current study, the stable and acute subscales were combined to produce an overall dynamic risk score (M = 9.48, SD = 4.76).
Interrater Reliability
ICCs (two-way mixed, absolute agreement) were calculated for single and average raters on numerical data. The single-rater ICC for the Total score was .90, 95% confidence interval (CI) = [.83, .94], in the high-risk sample and .94, 95% CI = [.90, .97], in the routine sample. More detailed results, including for domains and items, are available as Online Supplemental Material (Supplemental Tables S3 and S4). These also include the percentage of ratings for which raters were within 1-point agreement when at least two raters assigned a numeric score.
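For readers who want to see how single-rater and average-rater ICCs for absolute agreement are obtained, the following sketch uses the standard mean-squares formulas for a complete cases-by-raters table. It is offered as an illustration under stated assumptions rather than as the analysis actually run: the function name `icc_absolute_agreement` is hypothetical, the sketch assumes no missing cells, and the software used in the study is not reported here.

```python
import numpy as np

def icc_absolute_agreement(ratings):
    """Two-way ICCs for absolute agreement.

    ratings: (n cases x k raters) table with no missing cells.
    Returns (single_rater_icc, average_rater_icc).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # one mean per case
    col_means = x.mean(axis=0)  # one mean per rater

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_error = np.sum((x - grand) ** 2) - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-case mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square

    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average

# Hypothetical Total scores for four cases rated by three raters.
single, average = icc_absolute_agreement(
    [[20, 22, 21], [35, 33, 36], [12, 15, 13], [28, 27, 30]]
)
print(round(single, 2), round(average, 2))
```

Because the between-rater mean square enters the denominator, a rater who scores systematically higher or lower than the others reduces these coefficients, which is what distinguishes absolute agreement from consistency.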
Across both samples, interrater reliability for a single rater was excellent for the Internal Capacity domain and the Total score and good for the Professionally Provided Support domain according to Koo and Li’s (2016) conventions for interpreting ICCs. Interrater reliability for the Prosocial Identity and Prosocial Connection domains was good in the high-risk sample and excellent in the routine sample; differences were likely explained by some missing data in the high-risk sample, given ratings were reliant on information included in reports. Interrater reliability for the Stability domain was good for the routine sample and moderate for the high-risk sample (the latter finding explained by a lack of variance in one of the two items that make up the Stability domain). The percentage of ratings for which raters were within 1 point of agreement was consistently high; only social network and emotional connection to adults in the high-risk sample failed to reach 70% agreement, and the average percent agreement was 84.3 for the high-risk sample and 86.5 for the routine sample.
Reliable identification of the “N/A” and “N/R” response options versus numeric scores for the sexual offense–specific treatment (routine sample only; no “N/R” scores recorded for the high-risk sample), medication, and therapeutic alliance items was tested using Cohen’s kappa. In the high-risk sample, mean kappas across the three pairs of raters were .70 and .75 for medication and therapeutic alliance, respectively, indicating very good reliability (Regier et al., 2013). In the routine sample, mean kappas across the three pairs of raters were .79 for sexual offense–specific treatment, 1.0 for medication, and .93 for therapeutic alliance, indicating very good to excellent interrater reliability.
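The mean pairwise kappa described above can be illustrated as follows. This is a sketch under assumptions: the function names and the category labels `"num"` (a numeric score was assigned) and `"NA"` (an N/A or N/R option was chosen) are hypothetical, and the ratings are invented for the example.

```python
from itertools import combinations

def cohen_kappa(a, b):
    """Cohen's kappa for two raters' categorical ratings of the same cases."""
    n = len(a)
    labels = set(a) | set(b)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal proportions.
    expected = sum((a.count(label) / n) * (b.count(label) / n) for label in labels)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)

def mean_pairwise_kappa(raters):
    """Average Cohen's kappa over every pair of raters."""
    pairs = list(combinations(raters, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)

# Hypothetical codes for one item across four cases and three raters.
rater_1 = ["num", "num", "NA", "NA"]
rater_2 = ["num", "num", "NA", "NA"]
rater_3 = ["num", "NA", "NA", "NA"]
print(round(mean_pairwise_kappa([rater_1, rater_2, rater_3]), 2))  # 0.67
```

With three raters there are three pairs, so the reported value for each item is the mean of three kappas.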
Construct Validity
These analyses sought to examine the relationships between the SAPROF-SO Total and domain scores with the Static-99R, the VRS-SO, and the DRAOR in a way that would speak to convergent and divergent validity.
High-Risk Sample
Divergent validity was examined by analyzing correlations of the SAPROF-SO domain and Total scores with the Static-99R and VRS-SO Pretreatment Dynamic Risk score. As shown in Table 2, the Static-99R correlated r = −.10, 95% CI = [−.40, .22] with the SAPROF-SO Total score, and the VRS-SO Pretreatment Dynamic Risk score correlated r = .07, 95% CI = [−.25, .37], supporting divergent validity. The SAPROF-SO domains generally showed small, nonsignificant correlations with the Static-99R and VRS-SO Pretreatment Dynamic Risk score with one exception: There was a moderate and significant inverse correlation between the Static-99R and SAPROF-SO Stability domain (r = −.39, p = .014).
Table 2. Divergent Validity of the SAPROF-SO in the High-Risk Sample
Note. High-risk sample, n = 40. Static-99R (Helmus, Thornton et al., 2012); SAPROF-SO = Structured Assessment of PROtective Factors for violence risk—Sexual Offence version (Willis et al., 2017–2019); VRS-SO = Violence Risk Scale—Sexual Offense version (Wong et al., 2003–2017); Tx = Treatment.
*p < .05.
There was a strong positive correlation between the SAPROF-SO Total score and the VRS-SO Change score (r = .72, p < .001), 95% CI = [.53, .84], supporting convergent validity of the SAPROF-SO. As Table 3 shows, correlations between the SAPROF-SO and change scores for each of the VRS-SO factors were also large and statistically significant. With the exception of the Stability domain, significant positive correlations were observed between all SAPROF-SO domains and the VRS-SO Change score.
Table 3. Correlations Between SAPROF-SO Scores and VRS-SO Change Scores in the High-Risk Sample
Note. High-risk sample, n = 40. SAPROF-SO = Structured Assessment of PROtective Factors for violence risk—Sexual Offence version (Willis et al., 2017–2019); VRS-SO = Violence Risk Scale—Sexual Offense version (Wong et al., 2003–2017).
*p < .05. **p < .01. ***p < .001.
Change was also examined as the Pretreatment minus Posttreatment Dynamic Risk score with the Pretreatment score partialed out, allowing for the fact that those with more initial problems have more opportunity to change as well as controlling for regression to the mean (e.g., Olver et al., 2014). This method largely reproduced the results obtained with the simple change score.
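One way to partial out the Pretreatment score is a first-order partial correlation, sketched below. This is an assumption about the general approach, not a description of the analysis actually run; a semipartial correlation or a regression-based residual would be reasonable alternatives. The function name `partial_corr` and the data in the usage example are hypothetical, with `x` standing for the SAPROF-SO Total score, `y` for the change score, and `z` for the Pretreatment score.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y with z partialed out of both."""
    r = np.corrcoef([x, y, z])  # rows are treated as variables
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical scores for six cases.
saprof_total = [18, 25, 30, 22, 35, 27]
change = [2.0, 4.5, 6.0, 3.0, 7.5, 4.0]
pretreatment = [40, 36, 42, 35, 44, 38]
print(round(partial_corr(saprof_total, change, pretreatment), 2))
```

When the control variable is uncorrelated with both of the other variables, the partial correlation equals the ordinary correlation, which provides a simple sanity check on the formula.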
Routine Sample
Relevant to divergent validity, as Table 4 shows, the correlation between the SAPROF-SO Total score and the Static-99R was small (r = .09), 95% CI = [−.23, .39]. Relevant to convergent validity, the correlation of the SAPROF-SO Total score with the DRAOR overall risk score was strong and statistically significant (r = −.60, p < .001), 95% CI = [−.77, −.35], and its correlation with the DRAOR Protective Factors scale was moderate and statistically significant (r = .43, p = .005), 95% CI = [.14, .66]. Correlations for the SAPROF-SO domains were also generally significant and of substantial magnitude, with the notable exception of the Professionally Provided Support domain.
Table 4. Construct Validity of the SAPROF-SO in the Routine Sample
Note. Routine sample, n = 40. Static-99R (Helmus, Thornton et al., 2012); SAPROF-SO = Structured Assessment of PROtective Factors for violence risk—Sexual Offence version (Willis et al., 2017–2019); DRAOR = Dynamic Risk Assessment for Offender Reentry (Serin, 2007, 2017).
*p < .05. **p < .01. ***p < .001.
Discussion
The current study sought to examine the interrater reliability and construct validity of the newly developed SAPROF-SO. Interrater reliability was generally good to excellent for the domain and Total SAPROF-SO scores across both samples, the main exception being Stability in the high-risk sample, a result which likely reflects the small number of items in that domain and the fact that one item had no variance. Interrater reliability for the scale as a whole (.90 and .94 in the two samples) compared favorably with that observed for measures of dynamic risk (e.g., VRS-SO Pretreatment Dynamic: .78; VRS-SO Change: .76; Olver et al., 2019). Overall, raters agreed on the degree of presence of protective factors. The first criterion for protective factors being valid constructs is therefore met.
The pattern of correlations between the SAPROF-SO and risk scales was supportive of construct validity. Divergent validity was supported by the small and not statistically significant correlations between the SAPROF-SO Total score and the Static-99R in both samples, and the small and not statistically significant correlation between the SAPROF-SO Total score and VRS-SO Pretreatment Dynamic Risk score. Convergent validity was supported by the substantial correlations observed between the SAPROF-SO Total score and measures of risk that focus on current ongoing functioning (VRS-SO Change and the DRAOR scores). Furthermore, the SAPROF-SO Total score correlated to a moderate degree with the DRAOR Protective Factors score (which does not include any sexual offense–specific items), and change in each of the three VRS-SO factors was robustly associated with the SAPROF-SO Total score. The last finding was of particular interest. The VRS-SO Sexual Deviance and Treatment Responsivity factors are defined in a way that is specific to sexual offending (e.g., expressions of deviant sexual interests; cognitive distortions relative to sexual offending). That change in these factors correlated well (.61, .71) with the SAPROF-SO indicates that the SAPROF-SO is incorporating protective factors that are specifically relevant to sexual offending. Overall, each hypothesis in relation to construct validity was supported. The second criterion for protective factors being valid constructs is met.
Finally, it should be noted that although large correlations were found in the construct validity analyses, they still leave a large body of reliable variance in the SAPROF-SO that is not accounted for by current dynamic risk measures. The largest correlations were around .70, accounting for about half of the variance, and about 90% of the SAPROF-SO variance is reliable. In short, the SAPROF-SO correlates with measures of risk in a way consistent with its construct validity, but it clearly also captures substantial variance that is independent of the risk measures. The third criterion for protective factors being valid constructs is met.
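The arithmetic behind this argument can be laid out explicitly. The worked example below is a rough illustration under two assumptions: the largest observed correlation is taken as r = .72 (SAPROF-SO Total with VRS-SO Change in the high-risk sample), and the single-rater ICC of .90 is treated as the proportion of reliable variance in the SAPROF-SO Total score. No correction is made for unreliability in the risk measure.

```latex
\begin{align*}
r^2 &= (.72)^2 \approx .52
  && \text{(proportion of variance shared with the risk measure)} \\
.90 - r^2 &\approx .90 - .52 = .38
  && \text{(reliable variance not shared with the risk measure)}
\end{align*}
```

On these assumptions, roughly 38% of the variance in the SAPROF-SO Total score is both reliable and not accounted for by the most strongly related risk measure.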
Implications
All three of our rather modest criteria for protective factors, as measured by the SAPROF-SO, being valid constructs were met. This opens the door to using the SAPROF-SO in clinical practice and in future research. Concerning practice implications, the SAPROF-SO provides a useful clinical and risk management tool that aligns with strengths-based approaches to rehabilitation including the Good Lives Model (Ward & Stewart, 2003). The SAPROF-SO can help bring balance to one-sided, risk-focused assessment practices through identifying a client’s existing strengths, and help inform strengths-based treatment through identifying those protective factors in need of strengthening. Moreover, through orienting clinicians to probable mechanisms underlying protective factors, the SAPROF-SO aims to promote individualized case formulations and treatment planning that extend beyond simple lists of dynamic risk factors.
While it is premature to use the SAPROF-SO as an actuarial risk assessment tool, each of the items is empirically and/or theoretically supported. Therefore, until predictive validity is demonstrated and normative data are available, we advocate for using the SAPROF-SO to aid structured clinical judgment when considering protective factors that may mitigate sexual recidivism risk in the current or expected future setting. Consideration of protective factors present in different settings tends to be especially helpful for those cases in which risk management is particularly contingent upon the environment’s response to the individual (e.g., traumatic brain injury, major mental illness).
In terms of research implications, clearly there is scope for further investigations relevant to construct validity, including examining the natural pattern of change and evolution of protective factors over time in the community, the predictive value of protective factors and the degree to which they can improve prediction beyond that achieved by traditional risk measures, and the degree to which incorporating protective factors into treatment can increase treatment engagement. A particular feature of the SAPROF-SO has been the incorporation of domains coordinated with theoretical ideas regarding sources of desistance. Although we regard this as a virtue, factor analysis will allow the development of more empirically derived subscales that will complement theoretically rational domains. We anticipate that items with more specifically sexual offense–related content may separate from items that are relevant to antisocial behavior in general.
Limitations
Several limitations of the current study must be acknowledged. The routine sample was self-selected. Accordingly, men volunteering to participate may have been further into desistance trajectories than others and seeking to “give back” to society by way of research participation. Conversely, it could be that men well into desistance trajectories were not interested in participating in research to do with sex offending; moreover, such men may have been less motivated by the incentive than men struggling to reintegrate. Regardless, such a limitation is not major for the purposes of exploring interrater reliability and construct validity. At the same time, future studies with representative samples are needed and encouraged.
In both samples, raters coding the SAPROF-SO were not always blind to scores on risk measures, potentially inflating correlations with risk measures. Correlations may also have been inflated by the fact that reports justifying risk scores were part of the data used in assigning SAPROF-SO scores in the high-risk sample. However, correlations between the SAPROF-SO Total score and risk measures were essentially the same, regardless of whether the SAPROF-SO Total was based solely on the two raters who had not scored the VRS-SO or based on consensus scores informed by all raters. Similarly, the narratives justifying DRAOR scores were considered when scoring the SAPROF-SO in the routine sample. Certainly, correlations would have been lower if raters had identified different, sometimes contradictory, case facts when rating different instruments. However, reductions in correlations owing to different perceptions of the underlying facts would not be of theoretical relevance to the underlying relationship between the constructs being assessed.
Although the pattern of correlations observed supported the construct validity of the SAPROF-SO, the sample sizes meant that CIs were wide. Nevertheless, in the high-risk sample, the CI for the key correlation predicted to be large (VRS-SO Change with SAPROF-SO Total) did not overlap with CIs for correlations predicted to be low. In contrast, in the routine sample, the CI for the SAPROF-SO correlation with the DRAOR Protective Factors scale overlapped with that for the Static-99R even though their observed magnitude differed.
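The overlap comparisons described above can be checked with a short sketch. The method used to compute the reported intervals is not stated in this section; the Fisher z transformation used here is a common choice and is an assumption on our part. The function names are hypothetical.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation via the Fisher z transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def intervals_overlap(a, b):
    """True when two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

n = 40
change_ci = fisher_ci(0.72, n)     # SAPROF-SO Total with VRS-SO Change
static_hr_ci = fisher_ci(-0.10, n) # SAPROF-SO Total with Static-99R, high-risk sample
print(tuple(round(v, 2) for v in change_ci))     # (0.53, 0.84)
print(tuple(round(v, 2) for v in static_hr_ci))  # (-0.4, 0.22)
print(intervals_overlap(change_ci, static_hr_ci))  # False

protective_ci = fisher_ci(0.43, n)  # SAPROF-SO Total with DRAOR Protective Factors
static_rt_ci = fisher_ci(0.09, n)   # SAPROF-SO Total with Static-99R, routine sample
print(intervals_overlap(protective_ci, static_rt_ci))  # True
```

With n = 40 the standard error on the z scale is about .16, which is why even a moderate correlation such as .43 yields an interval wide enough to overlap with that of a near-zero correlation.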
The domains are theoretically based, rather than derived through factor analysis. This represents an obstacle to calculating internal consistency reliability coefficients because these assume that scales are unidimensional, and this assumption is unlikely to have been met. Nevertheless, at the suggestion of a reviewer, we calculated McDonald’s ω and the greatest lower bound, and the results are given in the Online Supplemental Material (Supplemental Table S5). The construct validity of SAPROF-SO domains was not systematically examined, which requires identifying other variables/scales specifically relevant to each domain. This is a task for future research, alongside exploration of predictive validity.
The present article speaks primarily to the properties of the SAPROF-SO Total score. Sample sizes were insufficient to support the number of statistical tests required to examine the properties of the domains. Descriptively, we can note that the pattern of results for the domain scores largely mirrors the pattern of results for the Total score, with the exception of the Stability domain in the high-risk sample. This may reflect that the Stability items have a different meaning in a secure institution than they do in the community. Alternatively, the anomalous pattern of correlations may be spurious, reflecting the larger number of significance tests carried out for domain scores. This is a matter to be investigated in future research.
Conclusion
We began this article by posing the question, “Are protective factors valid constructs?” Findings suggest that protective factors can be reliably identified and that the SAPROF-SO measures them in a way that has some construct validity, moderately independent of existing measures of risk. Accordingly, our limited criteria for establishing construct validity were met. Findings open the door to an approach to measurement that can be more balanced, more accurate, and more supportive of the shifting paradigm in forensic/correctional rehabilitation from relapse prevention and deficit-based approaches to strengths-based, holistic approaches, as exemplified by the Good Lives Model (see Laws & Ward, 2011; Ward & Stewart, 2003; Willis et al., 2013).
Supplemental Material
Supplemental material, Supplement_material for Are Protective Factors Valid Constructs? Interrater Reliability And Construct Validity Of Proposed Protective Factors Against Sexual Reoffending by Gwenda M. Willis, Sharon M. Kelley and David Thornton in Criminal Justice and Behavior
Footnotes
Authors’ Note:
The authors acknowledge Melissa Adam and Shane Brown who coded the routine sample with the first author for interrater reliability as part of their doctoral and master’s theses, respectively. The opinions are those of the authors and not necessarily those of Sand Ridge Secure Treatment Center or the New Zealand Department of Corrections. This research was supported by a Rutherford Discovery Fellowship awarded to Gwenda M. Willis.