Abstract
This study examined Static-99R normative data and cross-cultural validity in a sample of 811 Aboriginal and 3257 non-Aboriginal Australian men (N = 4068) serving custodial orders for sexual offences in New South Wales (NSW), Australia. Aboriginal men scored significantly higher on the Static-99R than non-Aboriginal men (M = 4.39 vs. 2.61) and were more likely to be represented in higher categories of risk. The Static-99R showed good discrimination performance for the total sample (AUC = .76; 95% CI = [.73–.80]) and acceptable calibration to expected reoffending rates for routine samples, with slight tendencies towards overestimation. Discrimination accuracy was lower for Aboriginal men (AUC = .68; 95% CI = [.60–.77]) than non-Aboriginal men (AUC = .78; 95% CI = [.74–.83]), although it was significantly better than chance for both groups. Additional analyses indicated that cross-cultural differences in discrimination were partly associated with variance in sample composition between groups. This is the first Australian study to find evidence for significant predictive validity of the Static-99R with Aboriginal men, and while further research is needed, the results provide initial support for cross-cultural applications of the measure in local criminal justice settings.
Introduction
Actuarial risk assessment is a prevalent and potentially valuable tool for decision-making within criminal justice systems. Within corrections settings, risk assessment is typically used to rank individuals according to their likelihood of reoffending, so as to determine their eligibility and priority for behaviour change interventions. In line with Risk Need Responsivity (RNR: Bonta & Andrews, 2017) principles, allocation of limited resources to individuals with the highest risk of recidivism is an important component of effective case management to address the often high social and other costs of reoffending. While risk assessments are often not designed to address specific legal decisions (Helmus et al., 2022), risk can also be a consideration in legal matters such as sentencing options and release to parole. In Australia and elsewhere, this can extend to assessment as part of preventative detention determinations, extended supervision or continuing detention orders in New South Wales (NSW) under the Crimes (High Risk Offenders) Act 2006, with significant implications for community safety and the rights of the individual being assessed.
The most commonly used actuarial measure for assessing risk of sexual recidivism is the Static-99 and its variations (Hanson & Thornton, 2000; Helmus et al., 2012b). The Static-99 estimates risk of sexual reoffending on the basis of 10 items measuring static or historical factors, including age, sexual offending characteristics, and other criminal history. Following revisions to score weightings for age, the Static-99R has been established as the preferred assessment of sexual recidivism risk (Helmus et al., 2012b) and is the focus of the current study. Since its development the Static-99R has been used in criminal justice settings in dozens of countries across the world, and translated into multiple languages (Helmus et al., 2022).
In keeping with the widespread use of the Static-99R, a key consideration is whether evidence for validity and associated normative data are generalisable, or applicable across locations and samples. In the absence of local validation, generalisability of an assessment to different people and settings requires a number of assumptions about consistency in how risk factors are measured, their association with outcomes, and measurement of the outcome itself (Helmus et al., 2011). Actuarial assessments of risk are modelled and normed on specific development samples, and it is not guaranteed that factors such as the relationships between explanatory variables and outcomes, or between a given score and rates of recidivism, will extend to other samples. In this regard, the extent to which an assessment is generalisable has a bearing on both discrimination performance, or accuracy in differentiating reoffenders from non-reoffenders, and calibration performance, or correspondence between expected and observed rates of reoffending.
Empirical considerations involving the generalisability of risk assessments have been associated with increasing concerns about cross-cultural validity and the potential for racial/ethnic bias. The Static-99R has been largely validated using samples from North America and the United Kingdom (Helmus et al., 2022), and these samples therefore predominantly represent the majority White racial/ethnic groups of these countries. As a result, there is often limited data to support use of the Static tools with culturally diverse groups within or between countries. The importance of cross-cultural validation of risk assessments has been highlighted by indications that different groups, such as racial/ethnic minorities and Indigenous peoples, experience culture-specific risk factors or have different experiences of common risk factors compared to other groups (for a review see Gutierrez, 2018), including as a result of histories of discrimination and marginalisation. These factors have critical implications for the likelihood of error in risk assessment performance, with flow-on effects for the accuracy of criminal justice decision-making and therapeutic case formulation to effectively address risk factors. In this respect, risk assessment error could contribute to observed disparities in justice outcomes for minority cultural groups (Shepherd & Lewis-Fernandez, 2016). A related consideration is that the relative dearth of representative data has the potential to impact upon the utility of risk assessment processes in criminal justice systems, with multiple assessments being subject to legal challenges regarding their use with people from different cultural backgrounds in Canada, the United States, and other countries in recent years (see Allan et al., 2018; Day et al., 2018; Shepherd & Lewis-Fernandez, 2016; Skeem & Lowenkamp, 2016).
The existing research also suggests a continued need for examination of cross-cultural validity of risk assessments, particularly in regard to Indigenous peoples. For example, meta-analytic reviews have found that the Level of Service Inventory (LSI: e.g., Andrews & Bonta, 1995) assessments tend to have comparable predictive validity across various population subgroups, including different racial/ethnic groups (Olver et al., 2014), providing support for the commonality of Central Eight risk factors (Bonta & Andrews, 2017). However, a meta-analysis of studies focusing on Indigenous peoples from Canada, the United States, and Australia found that while the LSI tools had significant discrimination for reoffending among Indigenous samples, predictive validity was poorer on five of the eight assessed risk factors relative to non-Indigenous samples (Wilson & Gutierrez, 2014). A similar pattern of results was found in relation to assessment of sexual recidivism risk, with a recent meta-analysis by Ahmed and colleagues (2023) indicating that the Static-99R showed predictive validity for Indigenous samples which was significantly better than chance although consistently lower than non-Indigenous racial/ethnic groups.
The Australian Context
Cross-cultural validity has been identified as a key factor in use of risk assessments in Australian criminal justice contexts, considering the unique and diverse cultural heritages of Aboriginal and Torres Strait Islander peoples. The First Nations peoples of Australia are a large number of groups across mainland Australia and surrounding islands, with distinct traditional lands, languages, and practices, collectively representing the longest continuous living culture in the world (for the purposes of this report, we hereafter use the term ‘Aboriginal’ to refer to all First Nations Australians including Aboriginal and Torres Strait Islander peoples). Similar to the impacts of colonisation on Indigenous peoples in other countries, Aboriginal people disproportionately experience a range of social, health, and other disadvantages, including overrepresentation among those who come into contact with the criminal justice system. Aboriginal people represent 2.6% of the Australian adult population and 31.5% of those in prisons across the country (Australian Bureau of Statistics, 2021); they have been identified as the most incarcerated people in the world (Anthony, 2017). Consistent with this, Aboriginal people are also disproportionately convicted of sexual offences and subject to related criminal justice interventions (e.g., Howard, 2016). It follows that widespread application of the Static tools in Australia has coincided with their routine use with Aboriginal people.
The applicability of the Static tools, which were developed without representative Australian data (Hanson & Thornton, 2000; Helmus et al., 2012b), to local criminal justice contexts and samples has continued to be a subject of debate. Use of existing risk assessments with Aboriginal people convicted of sex offences has been criticised in Australian criminal courts, including in response to the limited culturally specific evidence for validity (for a review, see Allan et al., 2018). Concerns have also been raised about the extent to which assessments such as the Static-99R can account for unique situational and other factors that contribute to offending among Aboriginal people, including those related to histories of colonialism and intergenerational trauma (Day et al., 2018; Shepherd & Lewis-Fernandez, 2016; Spiranovic, 2012). While a recent study of Australian forensic practitioners indicated that most viewed risk factors for sexual recidivism to be common for Aboriginal and non-Aboriginal people, they identified challenges to cultural competence and lack of data on cross-cultural validity as substantial barriers to best practice risk assessment (Allan et al., 2020).
To date, a small number of studies have sought to test the validity of Static measures in Australian samples. Consistent with population densities within Australian states and territories as well as the country as a whole, previous studies have tended to be impacted by methodological limitations associated with small sample sizes and low statistical power. An initial study by Allan et al. (2006) examined Static-99 assessment results for a sample of 144 men from Western Australia who had been convicted of violent and non-violent sexual offences. Only non-Aboriginal men were included in the sample, due to the lack of representative Static-99 data for Aboriginal men. They found that the Static-99 had good discrimination accuracy for non-Aboriginal men, reporting a benchmark area under the curve (AUC) statistic of .78 for sexual reoffending over an unspecified follow-up period.
A study by Spiranovic (2012) also examined Static-99 and Static-99R assessment data for Aboriginal and non-Aboriginal men convicted of sex offences in Western Australia, as part of aims to develop norms for these populations. The study involved retrospective coding of assessment scores for 952 men who had received sentences between 1951 and 2009, including 162 Aboriginal men and 660 non-Aboriginal men who had valid data for calculations of sexual reoffending at a 5-year follow-up period. They found the Static-99 and Static-99R had significant discrimination accuracy for sexual reoffending among non-Aboriginal men (AUC = .70 and .71 respectively for sexual recidivism); however, discrimination was not significantly better than chance for Aboriginal men (AUC = .57 and .52 respectively). As a result, normative data for Aboriginal men were not included in the study. The author concluded that the assessments should not be used with Aboriginal people convicted of sex offences (see also Allan et al., 2006).
Subsequent research by Smallbone and Rallings (2013) examined Static-99 and Static-99R data for a sample of 399 Aboriginal and non-Aboriginal men convicted of sexual offences in Queensland. The sample included a total of 67 Aboriginal men, and analyses for this group were further limited by an average post-release follow-up period of less than three years (mean = 29 months). They found that the Static-99 had adequate predictive validity for both Aboriginal and non-Aboriginal men, reporting AUC values for sexual recidivism of .76 and .82 respectively. Static-99R AUC values were lower at .61 and .79 respectively, and confidence intervals were large for the Aboriginal men, with the result being that discrimination performance was not significantly better than chance for this group.
More recently, a study by Reeves et al. (2018; see also Raymond et al., 2021, for another study using a subset of older men from the same sample) examined the predictive validity of multiple tools including the Static-99, Static-99R, Static-2002 and Static-2002R among 621 men who had been convicted of sexual offences and were under forensic mental health care in Victoria. They reported that only three individuals in the sample identified as being of Aboriginal or Torres Strait Islander background; consequently, their analyses did not consider cross-cultural validity or subgroup analyses. Their results indicated moderate discrimination performance that did not differ significantly across assessments, returning AUC values of .68–.69 for sexual recidivism at 5 years.
Aims
Cross-cultural validation to inform the use of risk assessments across different groups and countries has been identified as a priority for research on the Static-99R (Helmus et al., 2022). The aim of the current study is to expand on the limited existing literature by examining the predictive validity of the Static-99R for men who are convicted of sexual offences in NSW, Australia, using a large and statistically robust sample of assessments conducted under field settings. This study sets out to provide local data on Static-99R norms and conduct tests of predictive validity in reference to both discrimination and calibration performance. To address the dearth of validation evidence for Aboriginal people convicted of sex offences, we present normative data and validation results separately for Aboriginal and non-Aboriginal men, as well as for the sample overall.
An additional aim of this study is to examine relative cross-cultural validity by conducting inferential analyses to test whether relationships between Static-99R scores and sexual reoffending differ as a function of Aboriginal status. Following recent examples (e.g., Skeem & Lowenkamp, 2016), we anticipated that predictive bias would be expressed in significant differences in the associations between test scores and the criterion for Aboriginal and non-Aboriginal men. We also repeated primary comparisons of predictive validity after matching Aboriginal and non-Aboriginal men based on their scores on Static-99R items. This procedure was intended to account for statistical effects of compositional differences between the samples of Aboriginal and non-Aboriginal men, and isolate the effects of culture by comparing Static-99R performance across groups that have equivalent profiles and distributions of risk.
Methods
In this section, we report how we determined our sample size, all data exclusions and manipulations, and all measures in the study. The authors take responsibility for the integrity of the data, the accuracy of analyses, and have made efforts to avoid inflating statistically significant results. Conduct of this study was approved by the Corrective Services Ethics Committee (CSEC) of Corrective Services New South Wales.
Sample
The sample for this study comprised n = 4068 adult men who had received a custodial sentence in NSW between 2000 and 2021. Men were included in the study if they had one or more valid Static-99 or Static-99R assessments completed during their index custodial episode. Individuals were also required to have exited custody to allow for calculation of their survival period and reoffending outcomes.
Across the total sample, the mean age at time of assessment was 43.12 years (SD = 14.44; range = 18–93). Men in the sample received an average total sentence of more than 5 years (M = 2017.01 days; SD = 1891.87) and spent an average of almost 3 years (M = 1079.71 days; SD = 1134.94) in custody during the index episode. The majority (84.2%) were released to parole at the conclusion of their custodial episode; the remainder (15.6%) were released at the expiry of their sentence.
The sample included 811 Aboriginal and 3257 non-Aboriginal men. Aboriginal men were significantly younger than non-Aboriginal men at the time of assessment (M = 37.5 vs. M = 44.5 years; t (4066) = 12.70; p < .001; d = .49). Aboriginal men also had longer sentences (M = 2078.89 vs. M = 2001.23 days; t (4066) = .77; p = .021; d = .04) and spent more time in custody (M = 1124.56 vs. M = 1068.55 days; t (4066) = 1.25; p < .001; d = .05) compared to non-Aboriginal men. Aboriginal men and non-Aboriginal men had similar rates of release to parole at the conclusion of their index custodial episode (85.5% vs. 83.9%; χ2 (1, 4068) = 1.20; p = .27).
Measures
Demographics
Individual demographic and criminal history variables were obtained from the central Corrective Services NSW operational database known as the Offender Integrated Management System (OIMS). A key variable obtained from OIMS was Aboriginal status. Individuals are recorded as Aboriginal if they self-identify as being of Aboriginal and/or Torres Strait Islander cultural background during any historical or current episode with Corrective Services NSW, with the remainder being classed as non-Aboriginal. Other variables obtained from OIMS included date of birth, gender, custodial episode start and end dates, sentence start and expiry dates, and type of discharge from custody.
Static-99R
The Static-99R (Hanson & Thornton, 2000; Helmus et al., 2012b) is a 10-item risk assessment tool that was designed to assess risk of sexual reoffending among adults with a history of sexual offending. All items are assessed based on static or historical factors and are aggregated into a single estimate of sexual recidivism risk, giving a score ranging between −3 and 12.
Men in the sample had scores resulting from administration of either the Static-99 (n = 483) or the Static-99R (n = 3585), but not both. This is in line with the sampling timeframe and local policy governing transition from the Static-99 to the Static-99R after the latter became available. The Static-99R is identical to the Static-99 with the exception of revised age weights (Helmus et al., 2012b). To accommodate this, Static-99R scores were computed from Static-99 scores using administrative data on the individual’s date of birth and custodial episode release date to calculate the updated age category item.
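The conversion from a Static-99 total to a Static-99R total can be sketched as follows. This is an illustrative sketch only, not the code used in the study: the function names are hypothetical, and the age bands and weights are assumptions written from the published coding rules as commonly described (Static-99: 1 point for ages 18 to 24, otherwise 0; Static-99R: 1, 0, −1 and −3 points for ages 18 to 34, 35 to 39, 40 to 59, and 60 or older). They should be verified against the current coding manual before any use.

```python
from datetime import date


def age_at_release(date_of_birth: date, release_date: date) -> int:
    """Age in completed years on the release date."""
    years = release_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the release year.
    if (release_date.month, release_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def static99_age_item(age: int) -> int:
    """Original Static-99 age item (assumed weights: under 25 -> 1, else 0)."""
    return 1 if age < 25 else 0


def static99r_age_item(age: int) -> int:
    """Revised Static-99R age item (assumed weights; verify against the coding rules)."""
    if age < 35:
        return 1
    if age < 40:
        return 0
    if age < 60:
        return -1
    return -3


def static99_to_static99r(static99_total: int, date_of_birth: date, release_date: date) -> int:
    """Swap the original age item for the revised one; the other nine items are unchanged."""
    age = age_at_release(date_of_birth, release_date)
    return static99_total - static99_age_item(age) + static99r_age_item(age)
```

Under these assumed weights, a man released at age 64 with a Static-99 total of 4 would receive a Static-99R total of 1, because the age item changes from 0 to −3.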
Recidivism
Sexual reoffending outcomes were derived from NSW criminal court finalisation data. Individuals were recorded as sexually reoffending if they had one or more reconvictions in the state of NSW relating to offences under the Australian and New Zealand Standard Offence Classification (ANZSOC) Division 3: sexual assault and related offences (which includes a range of sexual offences, including violent and non-violent offences), following release from the custody episode in which the offender received assessment. New convictions were required to be attached to an offence that occurred after the custody release date to account for instances of reconviction for historical offending.
Instances of reconviction were collated from the time of the offender’s release from custody until the data censoring date of 30 September 2021. Depending on analysis, recidivism variables were coded to indicate any reconviction for sexual reoffending at a fixed 5-year follow-up period, or any reconviction for sexual reoffending over the total post-release survival period. In both cases the post-release follow-up period was calculated from days’ free time in the community and was adjusted for any periods of reimprisonment for a non-sexual offence or order violation. Average (mean) survival period was 1859.65 days (SD = 1368.05) or slightly over 5 years, and survival period did not differ significantly between Aboriginal and non-Aboriginal men on average (M = 1827.84; SD = 1366.04 vs. M = 1867.57; SD = 1368.64; t (4066) = .74; p = .23; d = .03).
Procedure
Data collation
Static-99 and Static-99R assessments were conducted by trained Corrective Services NSW psychologists as part of routine case management procedures. While precise data on context and purpose was not available, assessments were typically undertaken by psychologists as part of determining eligibility for custody-based sex offender programs (see Howard, 2016), with others being conducted to support other decision-making within the criminal justice system, such as records of pre-sentence assessments or State Parole Authority (SPA) submissions. Corrective Services NSW policy indicates that all custody-based individuals with an index sex offence require assessment for sex offender programs; however, a large proportion of men convicted of sex offences do not receive a valid assessment (Bell & Howard, 2020). Omissions include individuals with short custodial sentences who are unlikely to be eligible for programs, although other administrative factors may be influential (Howard & Wei, 2022). Individuals who are deemed ineligible for Static-99R assessment according to evaluators’ guide recommendations (Helmus et al., 2021) are also excluded.
A number of men had received more than one assessment with the Static-99 or Static-99R over the study timeframe. To minimise dependence between observations, we only included the first custodial episode where the offender received valid assessments. In the event that men received multiple assessments over the same custodial episode, representing multiple estimates of risk for the same release period, we counted the assessment that was most recent to the time of release. This is because individuals can be re-assessed as a result of additional information becoming available, or where post-approval reviews suggest scoring discrepancies, in which case the latest assessment is deemed to override earlier assessments.
Normative data
Normative data relating to distribution of Static-99R scores, observed and estimated rates of reoffending, and relative risk estimates, were calculated separately for Aboriginal and non-Aboriginal subsamples as well as for the total sample. In accordance with other literature reporting Static-99R norms, reoffending outcomes were assessed against a criterion of sexual recidivism at a fixed follow-up period of 5 years. Calculations of reoffending were conducted only with men in the sample who had a minimum of 5 years’ follow-up time following release from custody, whereas distribution statistics are given for all men in the sample.
Estimated rates of reoffending were calculated by a series of binary logistic regression models conducted for Aboriginal men, non-Aboriginal men, and the total sample. In each model, Static-99R total score was entered as the sole predictor and reconviction for sexual reoffending was entered as the dependent variable. Static-99R scores were centred around the median for the sample of interest, in order to give indicative data about the distributions of relative risk for Aboriginal and non-Aboriginal men assessed in the current study. Median scores were 4 for Aboriginal men, and 3 for non-Aboriginal men and the total sample.
The resulting regression equation gives the constant B0, which after centring represents the log odds of reoffending for men in the sample with a median score, and can therefore be transformed to indicate the rate of reoffending associated with the median Static-99R score in the sample. The equation also gives the coefficient B1, which represents the average change in the log odds of reoffending with each unit change in Static-99R score. Exponentiating this coefficient gives the odds ratio (OR), which expresses the average multiplicative change in odds of reoffending with each unit change in Static-99R score, and can be interpreted so that values greater than one indicate increased odds of reoffending and values less than one indicate decreased odds of reoffending.
Estimated rates for each Static-99R score were then calculated by entering the centred score into the regression equation and transforming the predicted values, expressed as logits, into probabilities. Confidence intervals for estimated rates of reoffending were calculated separately using the procedures described by Hanson et al. (2016). Risk ratios for each score were estimated by dividing the predicted reoffending rate for the given score by the predicted reoffending rate for the median of the sample.
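The transformation from regression coefficients to estimated rates and risk ratios can be illustrated with a short sketch. The function names are hypothetical; the coefficients are those reported in the Results for Aboriginal men (B0 = −2.53, B1 = .303, median score of 4). The confidence interval procedure of Hanson et al. (2016) is not reproduced here.

```python
import math


def estimated_rate(score: int, b0: float, b1: float, median: int) -> float:
    """Predicted probability of reoffending for a score, using median-centred scores."""
    logit = b0 + b1 * (score - median)      # log odds for this score
    return 1.0 / (1.0 + math.exp(-logit))   # inverse-logit transform to a probability


def risk_ratio(score: int, b0: float, b1: float, median: int) -> float:
    """Predicted rate for a score divided by the predicted rate at the sample median."""
    return estimated_rate(score, b0, b1, median) / estimated_rate(median, b0, b1, median)


# Coefficients as reported for Aboriginal men in this study.
B0, B1, MEDIAN = -2.53, 0.303, 4

odds_ratio = math.exp(B1)  # multiplicative change in odds per one-point score increase

for score in range(-3, 13):  # full Static-99R score range
    rate = estimated_rate(score, B0, B1, MEDIAN)
    rr = risk_ratio(score, B0, B1, MEDIAN)
    print(f"score {score:>2}: rate = {rate:.3f}, risk ratio = {rr:.2f}")
```

At the median score the centred predictor is zero, so the estimated rate depends only on the constant and the risk ratio is 1 by construction.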
Discrimination
Our primary measure of relative predictive validity was the Receiver Operating Characteristic (ROC) area under the curve (AUC) statistic. The AUC statistic is a widely used measure of discrimination accuracy which, in the context of this study, assesses the probability that a randomly selected reoffender will have a higher score on the Static-99R than a randomly selected non-reoffender. A common benchmark for interpretation of AUC indicates that values higher than .556 represent a small effect size, values higher than .639 represent a moderate effect size, and values higher than .714 represent a large effect size (Rice & Harris, 2005) in terms of differences in the distributions of scores between reoffenders and non-reoffenders. In all cases, AUC statistics were calculated against the criterion of sexual reconviction within 5 years.
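The probabilistic interpretation of the AUC can be made concrete with a minimal sketch that compares every reoffender with every non-reoffender. The function names and example scores are hypothetical, tied scores are counted as half a "win" (a common convention for discrete scales), and the "negligible" label for values at or below .556 is an assumption added for completeness.

```python
def auc(reoffender_scores, non_reoffender_scores):
    """Proportion of reoffender/non-reoffender pairs in which the reoffender scores higher.

    Ties contribute 0.5, so a value of 0.5 corresponds to chance-level discrimination.
    """
    wins = 0.0
    for r in reoffender_scores:
        for n in non_reoffender_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(reoffender_scores) * len(non_reoffender_scores))


def effect_size_label(value):
    """Benchmarks attributed in the text to Rice and Harris (2005)."""
    if value > 0.714:
        return "large"
    if value > 0.639:
        return "moderate"
    if value > 0.556:
        return "small"
    return "negligible"  # label assumed here; not taken from the source


# Hypothetical Static-99R scores, for illustration only.
reoffenders = [5, 6, 3, 7]
non_reoffenders = [1, 2, 3, 4, 2, 5]
value = auc(reoffenders, non_reoffenders)
print(value, effect_size_label(value))
```

The pairwise loop is quadratic in sample size and is intended only to show the definition; statistical packages compute the same quantity from ranks.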
Calibration
To assess absolute predictive validity for the Static-99R, or the correspondence between predicted and observed recidivism rates, we used the E/O index (see Hanson, 2017; Helmus et al., 2012b). The E/O index is the ratio of the expected number of individuals with a given outcome to the observed number of individuals with that outcome. The E/O index was calculated by comparing observed rates in the current sample to estimated rates provided by Static-99R normative data as indicated in the Evaluator Workbook (Helmus et al., 2021). For this purpose, updated 2021 norms for estimated 5-year sexual recidivism rates in Routine/Complete samples were used (Lee & Hanson, 2021). Rates were converted into expected number of reoffenders by multiplying the rate by the sample size in the current sample for a given Static-99R score.
The E/O index can be interpreted so that a value of 1 indicates perfect correspondence between the number of observed and expected reoffenders, whereas values below 1 indicate underprediction of reoffending relative to observed outcomes, and values above 1 indicate overprediction of reoffending relative to observed outcomes. As a ratio, the E/O index cannot be calculated for any values where the expected or observed number of reoffenders was zero. Relatedly, given that power is a common challenge when calculating the E/O index for single scores of assessments such as the Static-99R (Hanson, 2017), which was evidenced in a number of observed zero values for individual Static-99R scores during subgroup analyses in this study, we collapsed scores into Static-99R risk categories (Helmus et al., 2021) for the purpose of primary tests of calibration. Separate sets of E/O index calculations were made for Aboriginal men, non-Aboriginal men, and the total sample.
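The E/O calculation for a risk category can be sketched as below. The function name is hypothetical, and the rates and counts are placeholder values chosen for illustration; they are not Static-99R normative values or data from this study.

```python
def eo_index(expected_rates, sample_sizes, observed_counts):
    """Ratio of expected to observed reoffenders across the scores in one category.

    expected_rates:  score -> normative reoffending rate (as a proportion)
    sample_sizes:    score -> number of men in the sample with that score
    observed_counts: score -> number of those men observed to reoffend
    Returns None when the ratio is undefined (zero expected or zero observed).
    """
    expected = sum(expected_rates[s] * sample_sizes[s] for s in sample_sizes)
    observed = sum(observed_counts[s] for s in sample_sizes)
    if expected == 0 or observed == 0:
        return None
    return expected / observed


# Placeholder inputs for a hypothetical category spanning scores 4 and 5.
rates = {4: 0.10, 5: 0.15}
sizes = {4: 100, 5: 40}
observed = {4: 8, 5: 5}

ratio = eo_index(rates, sizes, observed)  # (10 + 6) expected / 13 observed
print(round(ratio, 2))                    # above 1, i.e. overprediction
```

Summing expected and observed counts across the scores within a category before taking the ratio is what allows the index to be computed where individual scores have zero observed reoffenders.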
Tests of cross-cultural validity
Regression modelling was used to examine whether the relationship between Static-99R scores and sexual reoffending differed between Aboriginal and non-Aboriginal men. This included a binary logistic regression model where Aboriginal status, Static-99R score, and the interaction of these terms were entered as predictors, and sexual reoffending within 5 years was entered as the dependent variable. As a secondary analysis aimed at maximising sample retention, we also conducted a Cox proportional hazards regression model to estimate the association between the same predictor variables and time to sexual reoffending for all men in the sample. In both models, cross-cultural bias may be indicated by a significant Aboriginal status by Static-99R score interaction term (Skeem & Lowenkamp, 2016).
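In general form, the two moderation models can be written as below. The notation is introduced here for illustration and is not taken from the original analyses: S_i denotes the Static-99R score, A_i an indicator for Aboriginal status, Y_i sexual reoffending within 5 years, and h_0(t) the baseline hazard.

```latex
% Binary logistic regression with an interaction term
\operatorname{logit} \Pr(Y_i = 1) = \beta_0 + \beta_1 S_i + \beta_2 A_i + \beta_3 (S_i \times A_i)

% Cox proportional hazards regression with the same predictors
h_i(t) = h_0(t) \exp\bigl( \gamma_1 S_i + \gamma_2 A_i + \gamma_3 (S_i \times A_i) \bigr)
```

In this notation, the test of predictive bias corresponds to testing whether the interaction coefficients \beta_3 and \gamma_3 differ from zero, that is, whether the slope relating score to outcome differs between groups.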
To further account for potential sources of variance across cultural groups, we repeated the above regression analyses after matching individuals in the Aboriginal and non-Aboriginal groups. We used the Propensity Score Matching (PSM; e.g., Rosenbaum & Rubin, 1983) technique for this purpose and matched individuals according to their scores on each of the 10 items of the Static-99R. PSM was conducted using one-to-one matching without replacement, and using the ‘greedy’ or nearest neighbour matching technique. This allowed for retention of almost all men in the smaller Aboriginal group, as well as an equivalent sample size for the non-Aboriginal men. This procedure was intended to assess the performance of the Static-99R across cultural groups under conditions where both groups had equivalent sample sizes, assessed risk profiles and distributions of scores.
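The matching step can be illustrated with a minimal sketch of greedy one-to-one nearest neighbour matching without replacement. It assumes propensity scores have already been estimated (for example, from a logistic regression of group membership on the 10 item scores). The function name, the identifiers, the processing order, and the optional caliper are assumptions for illustration and are not details reported for the study.

```python
def greedy_match(focal, comparison, caliper=None):
    """Greedy 1:1 nearest neighbour matching without replacement.

    focal, comparison: dicts mapping an individual's id to a propensity score.
    caliper: optional maximum allowed score distance (hypothetical parameter).
    Returns a list of (focal_id, comparison_id) pairs.
    """
    available = dict(comparison)  # comparison cases not yet used
    pairs = []
    for f_id, f_score in focal.items():
        if not available:
            break
        # Closest remaining comparison case on the propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - f_score))
        if caliper is not None and abs(available[c_id] - f_score) > caliper:
            continue  # no acceptable match; this focal case is dropped
        pairs.append((f_id, c_id))
        del available[c_id]  # without replacement: each case is used at most once
    return pairs


# Hypothetical propensity scores, for illustration only.
aboriginal = {"a1": 0.30, "a2": 0.62}
non_aboriginal = {"n1": 0.10, "n2": 0.33, "n3": 0.60, "n4": 0.31}
print(greedy_match(aboriginal, non_aboriginal))
```

Because the algorithm is greedy, results can depend on the order in which focal cases are processed; software implementations typically fix or randomise this order.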
Results
Reoffending Outcomes
A total of 6.7% of men in the sample (n = 274/4068) were reconvicted for sexual reoffending over the total follow-up period, averaging at slightly over 5 years (M = 1859.65 days; SD = 1368.05 days). Among those who had sufficient follow-up time, 4.3% (n = 132/3098) were observed to reoffend within 2 years, and 7.7% (n = 151/1959) reoffended within 5 years. Aboriginal men were more likely to be reconvicted for sexual reoffending than non-Aboriginal men over a follow-up period of 2 years (5.6% vs. 3.9%); a follow-up period of 5 years (10.2% vs. 7.1%); and over the total recidivism follow-up period (8.8% vs. 6.2%).
Distribution of Static-99R Scores
Distributions of Static-99R Scores for Aboriginal Men, Non-Aboriginal Men, and the Total Sample.
Note. Static-99R categories are I = Very Low Risk; II = Below Average; III = Average Risk; IVa = Above Average; IVb = Well Above Average.
Correspondingly, there was significant variation across Static-99R categories, so that Aboriginal men were less likely than non-Aboriginal men to be in the Very Low Risk (1.1% vs. 7.8%), the Below Average Risk (4.2% vs. 16.8%) and the Average Risk (29.8% vs. 37.2%) categories, and were more likely than non-Aboriginal men to be in the Above Average Risk (31.2% vs. 22.8%) and the Well Above Average Risk (33.7% vs. 15.4%) categories (χ2 = 260.07; p < .001).
Observed and Estimated Rates of Reoffending
Observed and Estimated Sexual Reoffending at 5 years for Aboriginal Men.
Note. CI = confidence interval.
Observed and Estimated Sexual Reoffending at 5 years for Non-Aboriginal Men.
Note. CI = confidence interval.
For Aboriginal men, Static-99R score was found to be a significant predictor of reoffending within 5 years (Wald χ2 (1, 404) = 15.61; p < .001), corresponding to a regression coefficient of B = .303 (SE = .077) and an odds ratio of 1.355 (95% CI = [1.17–1.58]). This indicates that each one-point increase on the Static-99R was associated with a 35.5% increase in odds of sexual reoffending within 5 years. The constant for this regression model was B = −2.53, which is equivalent to the log odds of reoffending for Aboriginal men who received a Static-99R score of 4, corresponding to a reoffending probability of 7.4%. The Hosmer and Lemeshow test statistic was not statistically significant (χ2 (6, 404) = 5.00; p = .54), indicating that there were non-significant differences between observed and estimated rates of sexual recidivism for men with a given Static-99R score on average (see also Figure 1).
Figure 1. Line plot showing observed and estimated (with 95% confidence intervals) rates of sexual reoffending associated with each Static-99R score for Aboriginal men.
Note. Values are not plotted at scores where cell size is less than 10. The Routine/Complete Samples line shows estimated 5-year recidivism rates as given in Static-99R normative data.
For non-Aboriginal men, Static-99R score was also a significant predictor of sexual reoffending within 5 years (B = .407; SE = .042; χ2 (1, 1554) = 93.97; p < .001). The odds ratio for Static-99R score was 1.50, indicating that each unit increase in Static-99R score was associated with a 50% increase in odds of reoffending relative to the sample median. A constant of −2.87 was estimated for the model, which corresponds with an average probability of recidivism of 5.4%. The Hosmer and Lemeshow test was highly non-significant (χ2 (7, 1554) = 1.30; p = .99), indicating goodness of fit between observed and expected recidivism rates among individuals with a given Static-99R score on average (see also Figure 2).
Figure 2. Line plot showing observed and estimated (with 95% confidence intervals) rates of sexual reoffending associated with each Static-99R score for non-Aboriginal men. Note. Values are not plotted at scores where cell size is less than 10. The Routine/Complete Samples line shows estimated 5-year recidivism rates as given in Static-99R normative data.
For the total sample, Static-99R score was significant (B = .38; SE = .036; χ2 (1, 1958) = 110.20) and associated with an odds ratio of 1.45 (95% CI = [1.36–1.56]). A constant of −2.87 was estimated for the model, which was centred on a total sample median of 3, corresponding with a 5.3% probability of sexual recidivism. Again, Hosmer and Lemeshow test statistics indicated goodness of fit between observed and estimated rates (χ2 (7, 1958) = .94; p = .99). Additional statistics on observed and estimated reoffending rates among the total sample can be found in the Appendix.
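The conversions between regression coefficients, odds ratios and probabilities reported above follow directly from the logistic model. The following is a minimal sketch using only the coefficients reported in the text; the helper names are illustrative, and small differences from the reported values reflect rounding of the published coefficients.

```python
import math

def odds_ratio(b):
    """Odds ratio associated with a one-unit increase in the predictor."""
    return math.exp(b)

def probability_from_log_odds(log_odds):
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# (B for Static-99R score, model constant) as reported in the text
models = {
    "Aboriginal": (0.303, -2.53),
    "Non-Aboriginal": (0.407, -2.87),
    "Total sample": (0.38, -2.87),
}

for group, (b, constant) in models.items():
    print(f"{group}: OR = {odds_ratio(b):.2f}, "
          f"probability at the centring score = {probability_from_log_odds(constant):.3f}")
```

Because each constant is the log odds at the score on which the model was centred, the inverse logit of the constant gives the estimated recidivism probability at that score.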
Discrimination
For Aboriginal men, the AUC value for sexual recidivism at 5 years was .682 (95% CI = [.597–.767]). For non-Aboriginal men, the equivalent AUC value was .782 (95% CI = [.739–.825]). Reflecting the majority representation of non-Aboriginal men in the sample, the total sample AUC value for sexual recidivism within 5 years was similar at .763 (95% CI = [.726–.800]). Each of the AUC values was statistically significant (ps < .001) and had a 95% confidence interval that did not include .5, indicating discrimination performance that was significantly better than chance. The values correspond to a medium-large effect size in discriminating sexual recidivists from non-recidivists on the basis of their Static-99R scores for Aboriginal men, and large effect sizes for non-Aboriginal men and the total sample (Rice & Harris, 2005).
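The AUC can be read as the probability that a randomly selected recidivist has a higher Static-99R score than a randomly selected non-recidivist, with ties counted as one half. The sketch below applies that definition to the observed total-sample counts listed in the Appendix. It illustrates the statistic only and does not describe the software used for the study; the function name is hypothetical.

```python
def auc_from_grouped_counts(events, totals):
    """AUC from counts per score, with scores ordered from lowest to highest.

    events[i] = number who reoffended at score i; totals[i] = number assessed.
    Ties between a recidivist and a non-recidivist count as one half.
    """
    non_events = [n - e for e, n in zip(events, totals)]
    concordant = 0.0
    non_events_below = 0
    for e, m in zip(events, non_events):
        concordant += e * (non_events_below + 0.5 * m)
        non_events_below += m
    return concordant / (sum(events) * sum(non_events))

# Observed 5-year counts for Static-99R scores -3 to 11 (total sample, Appendix)
n_reoffend = [0, 1, 1, 4, 3, 8, 15, 22, 24, 24, 21, 12, 11, 3, 2]
n_total = [50, 67, 124, 202, 193, 221, 262, 278, 209, 140, 118, 55, 28, 7, 4]

auc = auc_from_grouped_counts(n_reoffend, n_total)
print(f"AUC = {auc:.3f}")
```

Applied to these counts, the calculation yields a value of approximately .763, in line with the total-sample AUC reported above.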
Calibration
E/O Index Calculations for Static-99R Risk Categories and Overall Rates.
Note. CI = confidence interval; E = expected number of reoffenders; O = observed number of reoffenders.
For Aboriginal men, calculation of the E/O index for the lower two risk categories was not possible due to zero observed instances of sexual reoffending. For the remaining categories, Static-99R norms tended to underestimate reoffending for the Average Risk category and overestimate reoffending for the Above Average and Well Above Average categories. Static-99R norms also tended towards overestimation of overall reoffending rates for the group. Confidence intervals indicated that differences between expected and observed reoffending for all E/O calculations were not significant.
E/O index outcomes for non-Aboriginal men were similar to those of the total sample, with tendencies towards overestimation of reoffending for all Static-99R categories and the sample overall. All confidence intervals included the value of 1, indicating that differences between expected and observed rates were not statistically significant.
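For reference, the E/O index is the ratio of expected to observed reoffenders, so values above 1 indicate overestimation. The sketch below assumes a common Poisson-based 95% confidence interval, E/O × exp(±1.96/√O). This interval formula is an assumption, as the calculation is not restated in this section, although it appears consistent with the intervals listed in the Appendix. The example inputs are illustrative: the expected count is back-calculated from an Appendix E/O value of 1.14 and the 15 observed reoffenders at a score of 3.

```python
import math

def eo_index(expected, observed, z=1.96):
    """Return (E/O, lower, upper), or None when no reoffending was observed.

    Assumes a Poisson-based interval: E/O * exp(+/- z / sqrt(O)).
    """
    if observed == 0:
        return None  # the ratio is undefined with zero observed reoffenders
    ratio = expected / observed
    margin = math.exp(z / math.sqrt(observed))
    return ratio, ratio / margin, ratio * margin

# Illustrative inputs: E back-calculated as 1.14 * 15 (assumption), O = 15
ratio, lower, upper = eo_index(expected=17.1, observed=15)
print(f"E/O = {ratio:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
print("Difference significant:", not (lower <= 1.0 <= upper))
```

An interval that includes 1 indicates that the difference between expected and observed reoffending is not statistically significant, and a category with no observed reoffending returns no estimate, as was the case for the two lowest risk categories among Aboriginal men.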
Tests of Cross-Cultural Validity
Total sample
A binary logistic regression model was fitted to estimate sexual recidivism at 5 years, with Static-99R score, Aboriginal status, and a Static-99R by Aboriginal status interaction term entered as predictor variables. The final estimation model was significant (χ2 (3, 1958) = 123.24; p < .001). Examination of the individual coefficients indicated that Static-99R score was a significant predictor of sexual recidivism within 5 years, after adjusting for other factors in the model (χ2 (1, 1958) = 93.97; p < .001; OR = 1.50; 95% CI = [1.38–1.63]). Both the Aboriginal status main effect (χ2 (1, 1958) = .442; p = .51; OR = 1.41; 95% CI = [.51–3.95]) and the Static-99R by Aboriginal status interaction term (χ2 (1, 1958) = 1.39; p = .24; OR = .90; 95% CI = [.76–1.07]) were non-significant predictors of sexual recidivism. This indicates that associations between Static-99R score and odds of sexual recidivism within 5 years did not differ significantly between Aboriginal and non-Aboriginal men.
In order to maximise retention of Aboriginal men in the sample, we replicated the above model using Cox proportional hazard regression, where the outcome was any recorded reconviction for sexual offending while adjusting for individual variation in survival period. The regression model was also significant (χ2 (3, 4067) = 224.35; p < .001). In a similar pattern to the logistic regression model, the main effect of Static-99R was a significant predictor of survival (χ2 (1, 4067) = 174.85; p < .001; HR = 1.43; 95% CI = [1.35–1.50]) whereas Aboriginal status was not (χ2 (1, 4067) = .22; p = .64; HR = 1.12; 95% CI = [.58–2.45]). Importantly, the Static-99R x Aboriginal status interaction term was also non-significant (χ2 (1, 4067) = 1.07; p = .30; HR = .94; 95% CI = [.84–1.06]).
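In a model with a Static-99R by Aboriginal status interaction, the Static-99R coefficient describes the reference group and the interaction term scales that effect for the other group. The sketch below shows how group-specific effects can be recovered from the reported ratios. It assumes that non-Aboriginal men were the reference group (coded 0), which is not stated in this section, and the function name is illustrative.

```python
import math

def group_specific_ratio(main_effect_ratio, interaction_ratio):
    """Per-point odds or hazard ratio for the group coded 1.

    On the log scale the effects add: b_group = b_main + b_interaction,
    so the ratios multiply.
    """
    return math.exp(math.log(main_effect_ratio) + math.log(interaction_ratio))

# Logistic model, 5-year sexual recidivism: OR = 1.50, interaction OR = .90
or_aboriginal = group_specific_ratio(1.50, 0.90)
# Cox model, any sexual reconviction: HR = 1.43, interaction HR = .94
hr_aboriginal = group_specific_ratio(1.43, 0.94)

print(f"Implied OR per point for Aboriginal men: {or_aboriginal:.2f}")
print(f"Implied HR per point for Aboriginal men: {hr_aboriginal:.2f}")
```

Under this assumption, the implied odds ratio of about 1.35 agrees with the odds ratio of 1.355 estimated separately for Aboriginal men, which illustrates why a non-significant interaction term is read as no significant difference in the Static-99R and recidivism association between groups.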
After matching
As mentioned, PSM was used to match individuals in the Aboriginal and non-Aboriginal groups on their scores on each of the 10 Static-99R items. Applying the nearest neighbour matching technique, this resulted in retention of 809 Aboriginal men and 809 non-Aboriginal men. Data diagnostics indicated that the PSM procedure was successful in matching individuals, with standardised mean differences for all variables reduced to less than .05 after matching. To give an indication of the outputs, distributions of Static-99R total scores were almost identical for Aboriginal men (mean = 4.39; SD = 2.43; median = 4; range = −3 to 11) and non-Aboriginal men (mean = 4.39; SD = 2.45; median = 4; range = −3 to 11) after matching.
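To make the matching logic concrete, the sketch below shows one simple form of nearest neighbour matching: greedy 1:1 matching without replacement on a propensity score, followed by a standardised mean difference as a balance diagnostic. The specific variant used in the study (for example, matching order, replacement, or calipers) is not described in this section, so these choices are assumptions, and all identifiers and scores are invented for illustration.

```python
import statistics

def nearest_neighbour_match(treated, controls):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, controls: dicts mapping a person ID to a propensity score.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        pairs.append((t_id, c_id))
        del available[c_id]  # each control can be used only once
    return pairs

def standardised_mean_difference(x, y):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((statistics.variance(x) + statistics.variance(y)) / 2) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / pooled_sd

# Hypothetical propensity scores (probability of being in the Aboriginal group
# given the 10 item scores); values are invented for illustration only.
aboriginal = {"A1": 0.62, "A2": 0.35, "A3": 0.48}
non_aboriginal = {"N1": 0.10, "N2": 0.33, "N3": 0.50, "N4": 0.64, "N5": 0.20}

pairs = nearest_neighbour_match(aboriginal, non_aboriginal)
matched_controls = [non_aboriginal[c] for _, c in pairs]
smd_before = standardised_mean_difference(
    list(aboriginal.values()), list(non_aboriginal.values()))
smd_after = standardised_mean_difference(
    list(aboriginal.values()), matched_controls)
print(pairs)
print(f"SMD before = {smd_before:.2f}, after = {smd_after:.2f}")
```

In this toy example the standardised mean difference shrinks towards zero after matching, which is the pattern the balance diagnostics described above are intended to confirm.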
We first replicated primary AUC analyses of discrimination performance for the Aboriginal and non-Aboriginal groups after matching. Discrimination for sexual reoffending at 5 years remained similar for Aboriginal men, as expected (AUC = .680; 95% CI = [.597–.763]). By comparison, discrimination performance for the non-Aboriginal group declined after matching (AUC = .709; 95% CI = [.622–.797]).
Binary logistic regression and Cox proportional hazard regression models testing interactions between Aboriginal status and Static-99R on sexual reoffending were also replicated after matching, with similar results. For the binary logistic model estimating sexual reoffending at 5 years, Static-99R total score was a significant predictor of outcome (χ2 (1, 770) = 27.08; p < .001; OR = 1.54; 95% CI = [1.31–1.82]) whereas Aboriginal status (χ2 (1, 770) = 1.21; p = .27; OR = 2.20; 95% CI = [.54–8.83]) and the Static-99R by Aboriginal status interaction term (χ2 (1, 770) = −.13; p = .25; OR = .87; 95% CI = [.70–1.01]) were not.
Similarly, for the Cox proportional hazard model estimating sexual reoffending after adjusting for variation in survival period, the Static-99R term was significant (χ2 (1, 1618) = 48.10; p < .001; HR = 1.51; 95% CI = [1.34–1.69]) and the Aboriginal status (χ2 (1, 1618) = 2.39; p = .12; HR = 2.24; 95% CI = [.81–6.21]) and interaction (χ2 (1, 1618) = 2.17; p = .14; HR = .89; 95% CI = [.76–1.04]) terms were not significant predictors of reoffending. In sum, the results of these models indicated that associations between Static-99R scores and sexual recidivism did not differ significantly as a function of Aboriginal status after matching.
Discussion
The extent to which actuarial risk assessments have predictive validity across samples and locations is critical to their utility in cross-jurisdictional criminal justice settings. This is particularly relevant to countries such as Australia, where evidence for the validity of assessments with culturally unique and diverse Aboriginal and Torres Strait Islander peoples is underdeveloped. The aim of the current study was to provide normative data and examine predictive validity of the Static-99R for men convicted of sexual offences in New South Wales, Australia, with a focus on expanding on the limited available data for Aboriginal people and testing evidence for cross-cultural validity. To our knowledge this is the largest study of the Static-99R in Australia, in terms of both Aboriginal and non-Aboriginal samples, completed to date.
Our results indicated that the Static-99R has good predictive validity for the sample overall. As a benchmark index of discrimination performance, the AUC value for sexual reoffending at 5 years was .763 for the total sample. This outcome is above average relative to international samples, with a recent meta-analysis of 56 studies returning AUC values of .68–.69 (Helmus et al., 2022). Calibration testing using the E/O index also suggested that absolute recidivism estimates represented in Static-99R norms for Routine/Complete samples had reasonable correspondence with observed sexual reoffending rates. There was a tendency towards overestimating reoffending, which was significant for overall rates although non-significant when tested for each of the Static-99R risk categories. Variance in absolute risk estimates across samples has been observed elsewhere (e.g., Helmus et al., 2012a) and associated with overestimation of 73% on average (Helmus et al., 2022). An implication for practice is that cross-sample variance in absolute risk may ultimately be best addressed by structuring assessment communications and decision-making around relative risk (Hanson et al., 2016).
When considered separately, there was also evidence to support predictive validity of the Static-99R for Aboriginal men. The observed AUC value of .682 for sexual reoffending at 5 years was higher than that recorded for other international Indigenous samples (Ahmed et al., 2023; Babchishin et al., 2012; Lee et al., 2020) and comparable to average discrimination accuracy for the Static-99R overall (Helmus et al., 2022). Importantly, confidence intervals for both AUC values and logistic regression odds ratios indicated discrimination performance that was significantly better than chance. Previous studies of the Static-99R have returned AUC values for Australian Aboriginal people that were statistically non-significant (Smallbone & Rallings, 2013; Spiranovic, 2012), which may be partly attributable to lower variance and inflation of error associated with small sample sizes. Interestingly, one higher AUC value for Aboriginal men in Australia has been reported for the original Static-99 (AUC = .76; Smallbone & Rallings, 2013), albeit in a small sample with short follow-up times. This may suggest that age adjustments for item 1 of the Static-99R are important to discrimination performance for Aboriginal men; however, the authors did not find significant differences in the association between age and sexual recidivism as a function of Aboriginal status.
Calibration results also indicated acceptable accuracy for Static-99R absolute risk estimates among Aboriginal men. While E/O index calculations were restricted by no observed instances of reoffending for lower risk categories, differences between observed and expected rates for other categories were not statistically significant. Unlike non-Aboriginal men, the direction of miscalibration for Aboriginal men varied, with Static-99R norms tending towards underprediction of reoffending rates for those in the Average Risk category and overprediction for those in the Above Average and Well Above Average categories. However, given that within-category E/O index variance was non-significant in all cases for both Aboriginal and non-Aboriginal men, it is not possible to make conclusions about the direction of effects beyond that attributable to error alone.
While this study found evidence to support predictive validity of the Static-99R with Aboriginal men, tests of relative performance for Aboriginal and non-Aboriginal men yielded a more complex pattern of results. For the total sample, AUC values were markedly lower for Aboriginal men compared to non-Aboriginal men, corresponding to a difference of 8.1% discrimination accuracy between groups. Interpretation of the magnitude of this disparity is moderated by findings that associations between Static-99R scores and sexual reoffending were not statistically significantly different for Aboriginal and non-Aboriginal men. Observable but non-significant differences in Static-99R performance for Indigenous and non-Indigenous samples have been found in other studies (e.g., Babchishin et al., 2012; Lee et al., 2020); however, it is possible that this reflects the limited statistical power of individual studies to detect such differences relative to meta-analytic designs (Ahmed et al., 2023).
In their study of LSI assessment performance for Indigenous and non-Indigenous people, Wilson & Gutierrez (2014) proposed a number of explanations for predictive validity differences across groups. One potential account is that overpolicing of Indigenous communities may inflate recidivism detection and diminish the strength of prospective relationships between criminal-history related static variables and reoffending outcomes. A related explanation is that risk factors may be conceptualised and assessed in a way that is not culturally relevant or fails to take into account the unique experiences of Indigenous people, such as different cultural definitions of familial relationships. Alternatively, risk factors may have similar meanings and associations with recidivism across groups; however, the greater accumulation of risk factors among Indigenous people may affect their predictive validity as individual factors or in aggregate. A fourth explanation is that existing risk assessments omit additional culture-specific variables, including those that reflect cultural experiences of marginalisation and discrimination.
A fifth, and related, potential explanation is that the higher average risk observed among Indigenous cohorts could contribute to restriction-of-range effects in observed or unobserved risk factors (e.g., Ahmed et al., 2023; Hanson et al., 2016). In line with this account, we found that differences in AUC results were attenuated after matching Aboriginal and non-Aboriginal men on their Static-99R item scores. This was associated with a decline in discrimination performance for non-Aboriginal men. The results suggest that when considering complete samples available to this study prior to matching, cross-cultural differences in discrimination accuracy may have been partly associated with statistical artefacts arising from differences in sample composition, such as distributions of global recidivism risk or underlying risk factors.
Although previous reviews have found AUC values for the Static-99R tend to be stable across samples (Hanson et al., 2016; Helmus et al., 2012a; Lee & Hanson, 2021), a recent meta-analysis (Helmus et al., 2022) reported significant variability in discrimination performance. Moderator analyses indicated that a primary contributor to this variability was sample type, particularly whether the sample was preselected as having high risk/need. Such groups are expected to have reduced variance in Static-99R scores, which has been identified as adversely impacting discrimination performance as assessed by AUC values (Howard, 2017; see also Helmus et al., 2012a). Another potential contributing factor is that for assessments of specific reoffending outcomes with low base rates, discrimination specificity may be particularly affected as the risk profile of the sample increases. Consistent with this, a recent study of risk assessments for domestic violence (DV) recidivism found poorer discrimination outcomes marked by weak specificity among higher risk groups (Howard & Zhang, 2020), and suggested that offence-specific estimates may become increasingly confounded by general recidivism risk when the average risk and criminal versatility of the sample increases.
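The effect of restricted score variance on AUC can be illustrated analytically. The sketch below holds the relationship between score and recidivism fixed, using the total-sample logistic coefficients reported in the Results (B = .38 and a constant of −2.87 centred on a score of 3), and compares the expected AUC for two hypothetical score distributions: one spread evenly over the full range of −3 to 11 and one restricted to scores of 3 to 8. Both distributions are invented for illustration and do not correspond to any group in the study.

```python
import math

def expected_auc(score_weights, b=0.38, constant=-2.87, centre=3):
    """Expected AUC when recidivism follows a fixed logistic model.

    score_weights: dict mapping a score to its share of the sample.
    """
    scores = sorted(score_weights)
    p = {s: 1 / (1 + math.exp(-(constant + b * (s - centre)))) for s in scores}
    events = {s: score_weights[s] * p[s] for s in scores}
    non_events = {s: score_weights[s] * (1 - p[s]) for s in scores}
    concordant = 0.0
    below = 0.0
    for s in scores:
        # Recidivists at score s outrank all non-recidivists below s
        # and tie (weight one half) with non-recidivists at s.
        concordant += events[s] * (below + 0.5 * non_events[s])
        below += non_events[s]
    return concordant / (sum(events.values()) * sum(non_events.values()))

wide = {s: 1 / 15 for s in range(-3, 12)}   # hypothetical: scores -3 to 11
narrow = {s: 1 / 6 for s in range(3, 9)}    # hypothetical: scores 3 to 8

auc_wide = expected_auc(wide)
auc_narrow = expected_auc(narrow)
print(f"Full range AUC = {auc_wide:.3f}; restricted range AUC = {auc_narrow:.3f}")
```

Even though the odds ratio per point is identical in both scenarios, the restricted distribution produces a lower expected AUC, which is the kind of statistical artefact that sample composition can introduce into between-group comparisons of discrimination.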
While incidental to our primary tests of cross-cultural validity, we found that Aboriginal men and non-Aboriginal men differed significantly in their distribution of Static-99R scores. Aboriginal men had higher average scores, and were more likely to be represented in above average risk categories of the Static-99R, compared to non-Aboriginal men. Other studies have similarly found that Aboriginal people convicted of sex offences in Australia tend to be assessed as being at higher risk of sexual recidivism (Smallbone & Rallings, 2013; Spiranovic, 2012). An implication is that the Static-99R may contribute to differential impacts of assessment for Aboriginal and non-Aboriginal men. That is, by having higher risk scores or accumulations of risk factors on average, Aboriginal peoples are more likely to be subject to outcomes or decisions that are predicated on risk severity, such as allocation to behaviour change programs or consideration for extended supervision orders. Differential impacts are not evidence of test bias (e.g., Skeem & Lowenkamp, 2016; Warne et al., 2014) and are more pertinent to how an assessment is used rather than its psychometric properties. Nonetheless, there are important policy implications for effective application of the risk principle while ensuring that this does not serve to further entrench cross-cultural disparity and disadvantage within the criminal justice system.
Limitations
A primary limitation of the current study relates to the representativeness of the sample. Static-99R results were drawn from archival data on routine assessments conducted by trained Corrective Services NSW clinicians. While the purpose of routine assessments has remained relatively stable over the sampling timeframe, many people who are convicted of sexual offences do not receive an assessment (Bell & Howard, 2020). It is possible that people who are less likely to receive assessment, such as those with insufficient time on their custodial order to complete programs, have unique characteristics that impact upon generalisation of the results.
Importantly, we also acknowledge that Aboriginal normative data and validation results relate to men serving custodial orders in NSW, and may not be representative of the remarkable cultural diversity of First Nations peoples across Australia. Conversely, Australia is home to people of a range of cultures and ethnicities, and while extensive subgroup analysis was not feasible, it may not be appropriate to consider the non-Aboriginal sample in this study as a culturally homogeneous group. Relevant data were not available to the current study; however, an indicative finding is that around 20% of people referred to behaviour change programs in NSW were identified as having a culturally and linguistically diverse background other than being Aboriginal and/or Torres Strait Islander (Howard & Lobo, 2020).
A related limitation is that although the current study improves upon sampling challenges observed in other Australian studies of the Static-99R, power was nonetheless suboptimal for Aboriginal men, particularly in reference to the small number of Aboriginal men (n = 41) who sexually reoffended. Sample size limitations may be expected to increase error for a number of analyses and group comparisons involving Aboriginal men, and the potential for type II error cannot be discounted. While an identified advantage of AUC statistics as an indicator of discrimination performance is that they are relatively robust to sampling effects on power (e.g., Rice & Harris, 2005), they may be indirectly affected if smaller samples coincide with restricted variance in scores (Howard, 2017). Statistical power is a particular issue in the context of measuring sexual reoffending, which tends to have low base rates in most samples. It would be beneficial for future research to leverage national databases and other linked data assets (e.g., Spiranovic et al., 2020) to maximise sampling for validations of the Static-99R and other risk assessments.
Another limitation relates to our definition of sexual recidivism. As is true of most studies, our measurement of reconviction for subsequent sexual offences is limited to official court closures and likely an underestimate of true recidivism. Circumstances surrounding the likelihood that offending behaviours result in reconviction, such as community over- or under-policing and decisions to proceed with prosecution, are clearly relevant to issues of cross-cultural bias. Our reconviction data were also sourced from the state of NSW only, and it was not possible to systematically account for inter-jurisdictional reoffending following an individual’s movements across states and territories. Further, survival follow-up times were relatively short for many men in the sample, necessitating sub-sample analyses in many cases and precluding examination of longer-term reoffending outcomes.
Conclusion
This study contributes to the limited available evidence on Static-99R norms and performance for Australian men convicted of sexual offences. We found that the measure had good discrimination accuracy for the total sample, giving positive indications for the validity of the Static-99R in Australian settings and supporting its continued use for the purposes of assessing and communicating relative risk. Calibration to routine Static-99R norms was also largely within acceptable limits, although established tendencies towards overestimation were replicated here. An aim of the current study was to provide relatively novel indicative data about the distributions of Static-99R scores and recidivism outcomes for Australian Aboriginal and non-Aboriginal men; however, considering the pattern of results, in addition to the limited samples available to this study, there is a case for practitioners to continue to apply Static-99R normative data (Helmus et al., 2022) in local criminal justice settings.
A significant feature of the study was the availability of adequate data to examine cross-cultural validity for Aboriginal Australians, who are routinely assessed using Static tools (Allan et al., 2018) but have historically been understudied and undersampled. Our results are the first to indicate predictive validity for the Static-99R that is significantly better than chance for Australian Aboriginal men. Nonetheless, discrimination performance was lower than for non-Aboriginal men, following an international pattern in risk assessment that clearly warrants further scrutiny (Shepherd & Lewis-Fernandez, 2016). We agree with the practical implications recently raised in response to similar findings by Ahmed and colleagues (2023), in that continued use of the Static-99R with Aboriginal men is supported where there are limited available alternatives, particularly considering the real or apparent risk of cultural bias through assessments using unstructured clinical judgement. At the same time, it is important that results are interpreted with caution and a culturally informed understanding of relevant factors that may moderate assessment results. There is also a need for further research to examine cultural differences in risk factors and risk assessment outcomes, and to develop or refine tools to better account for those differences. The current study contributes to that objective by finding that sample composition factors are relevant to differences in Static-99R performance between Aboriginal and non-Aboriginal men, suggesting that accounting for potential statistical artefacts may help to support an understanding of underlying causal mechanisms.
We conclude with the recognition that this study is not intended to reach broader conclusions about Aboriginal men’s experiences of bias in their interactions with criminal justice systems in Australia or elsewhere. The results suggest that statistically, relationships between Static-99R scores and sexual reoffending outcomes for Aboriginal men were broadly comparable to those for other men with a history of sexual convictions; a population who themselves have atypical social experiences of discrimination and marginalisation (e.g., Harper et al., 2017). It is also important to consider the impacts of histories of systemic racism and colonialism on accumulation of criminogenic needs, including but not limited to static variables such as criminal history, which were reflected in the elevated risk profiles among Aboriginal men relative to non-Aboriginal men. In this regard, it was beyond the scope of this study to test differential experiences and expressions of specific risk factors across cultures. While accurate actuarial estimates of recidivism risk are an important basis for effective case management, it is critical that these are accompanied by a culturally informed understanding of the unique needs, protective factors, and situational and community influences that have a bearing on individual outcomes.
Footnotes
Author’s Note
This study was completed while all authors were employed by Corrective Services New South Wales.
Acknowledgments
We would like to acknowledge Simon Corben for his extensive support with data management for this study; Sam Ardasinski for his expert review and feedback on the draft manuscript; and Jermaine Haymond and other members of the Corrective Services New South Wales Aboriginal Strategy and Policy Unit for their consultancy on cultural considerations for the project.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Appendix
Note. CI = confidence interval.
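| Static-99R score | Observed: n reoffend | Observed: n total | Observed: % reoffend | Estimated: % | Estimated: CI lower, % | Estimated: CI upper, % | Relative risk |
|---|---|---|---|---|---|---|---|
| −3 | 0 | 50 | 0.0 | 0.6 | 0.3 | 1.1 | .11 |
| −2 | 1 | 67 | 1.5 | 0.9 | 0.5 | 1.5 | .16 |
| −1 | 1 | 124 | 0.8 | 1.2 | 0.8 | 2.0 | .23 |
| 0 | 4 | 202 | 2.0 | 1.8 | 1.2 | 2.7 | .34 |
| 1 | 3 | 193 | 1.6 | 2.6 | 1.9 | 3.6 | .49 |
| 2 | 8 | 221 | 3.6 | 3.7 | 2.9 | 4.9 | .70 |
| 3 | 15 | 262 | 5.7 | 5.3 | 4.3 | 6.6 | 1.00 |
| 4 | 22 | 278 | 7.9 | 7.6 | 6.4 | 9.0 | 1.42 |
| 5 | 24 | 209 | 11.5 | 10.7 | 9.2 | 12.3 | 2.00 |
| 6 | 24 | 140 | 17.1 | 14.8 | 12.8 | 17.1 | 2.77 |
| 7 | 21 | 118 | 17.8 | 20.2 | 17.1 | 23.7 | 3.78 |
| 8 | 12 | 55 | 21.8 | 26.9 | 22.2 | 32.2 | 5.03 |
| 9 | 11 | 28 | 39.3 | 34.8 | 28.2 | 42.2 | 6.52 |
| 10 | 3 | 7 | 42.9 | 43.8 | 34.9 | 53.1 | 8.19 |
| 11 | 2 | 4 | 50.0 | 53.1 | 42.3 | 63.7 | 9.93 |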
Note. CI = confidence interval.
| Static-99R score | Aboriginal: E/O | Aboriginal: 95% CI | Non-Aboriginal: E/O | Non-Aboriginal: 95% CI | Total sample: E/O | Total sample: 95% CI |
|---|---|---|---|---|---|---|
| −3 | — | — | — | — | — | — |
| −2 | — | — | .74 | [.10–5.23] | .74 | [.10–5.23] |
| −1 | — | — | 1.82 | [.26–12.95] | 1.98 | [.28–14.09] |
| 0 | — | — | 1.06 | [.40–2.81] | 1.11 | [.42–2.96] |
| 1 | — | — | 1.8 | [.58–5.59] | 2.06 | [.66–6.38] |
| 2 | .57 | [.18–1.76] | 1.69 | [.70–4.07] | 1.27 | [.64–2.54] |
| 3 | .89 | [.34–2.38] | 1.22 | [.68–2.21] | 1.14 | [.68–1.88] |
| 4 | 1.54 | [.58–4.11] | 1.08 | [.68–1.71] | 1.16 | [.77–1.77] |
| 5 | 1.08 | [.51–2.26] | 1.13 | [.70–1.82] | 1.11 | [.75–1.66] |
| 6 | 1.5 | [.67–3.33] | .87 | [.55–1.38] | 1.03 | [.69–1.53] |
| 7 | 2.09 | [.94–4.66] | 1.03 | [.62–1.70] | 1.33 | [.87–2.04] |
| 8 | 1.63 | [.61–4.34] | 1.32 | [.66–2.63] | 1.42 | [.81–2.50] |
| 9 | .72 | [.33–1.61] | 1.34 | [.56–3.23] | 1.01 | [.56–1.82] |
| 10 | 1.46 | [.21–10.37] | .97 | [.24–3.89] | 1.14 | [.37–3.52] |
| 11 | — | — | — | — | — | — |
