Abstract
The Ontario Domestic Assault Risk Assessment (ODARA) is an actuarial risk assessment tool for intimate partner violence (IPV) recidivism. Despite its international use, there is no published validation of the ODARA’s predictive accuracy for IPV recidivism in a U.S. sample. We studied 356 men identified in New York police records as having perpetrated IPV against a female partner to examine the ODARA’s predictive accuracy for IPV recidivism (base rate 35%), non-IPV violent recidivism (against a nonpartner; 16%), any violent recidivism (49%), and nonviolent recidivism (50%) in a fixed 2-year follow-up. Using 11 scorable ODARA items, area under the curve values were significant and ranged from .590 to .630, indicating small to medium effects. Expected/Observed indices revealed poor calibration with the 2-year IPV recidivism rates in the ODARA construction and cross-validation samples. Findings support the generalizability of the ODARA’s predictive accuracy across populations and outcomes but indicate a need to develop new norms for higher risk populations.
Cases of assault or other violence against an intimate partner (intimate partner violence, or IPV) have grown to represent a substantial proportion of the criminal justice system caseload in the United States and Canada (Hilton & Ennis, 2020). In Canada, IPV accounts for more than one in four self-reported victims of violent offending (e.g., Burczycka & Conroy, 2017) and over half of all completed adult court cases of violent offenses (Beaupré, 2015). IPV represents the largest single category of calls to police in both Canada and the United States (e.g., Burczycka & Conroy, 2017; Klein, 2009). Criminal justice responses to IPV have largely taken a rehabilitative focus, including specialized domestic violence courts and court-mandated IPV intervention programs as a condition of diversion or probation (e.g., Barner & Carney, 2011; Tutty & Babins-Wagner, 2019). Despite this increased attention to IPV in the criminal justice and rehabilitation systems, IPV recidivism remains a concern. For example, rearrest rates in the months after arrest for an IPV offense or after treatment completion have been reported in the range of 20% to 30% (e.g., Buzawa et al., 2012; Hilton & Eke, 2017). Evaluations of arrest, prosecution, and treatment for IPV show inconsistent results, and there is evidence that their effects on IPV recidivism may differ depending on the characteristics of the persons being prosecuted (e.g., Maxwell & Garner, 2012) or given treatment (e.g., Cantos et al., 2019), including assessed risk of violent recidivism (e.g., Cox & Rivolta, 2019; Lila et al., 2019).
Specialized risk assessment tools have been developed to help police identify individuals most at risk of committing another IPV offense (e.g., Hilton et al., 2004; Jung & Buro, 2017; Kropp et al., 2010; McEwan et al., 2018; Turner et al., 2019). These tools assess characteristics such as substance use, previous IPV offenses, other criminal history, use of threats or weapons, and other factors associated with recidivism. Police services have widely adopted these tools as a triage device to manage the front-line IPV caseload (e.g., Ennis et al., 2015; Walton, 2019). Empirically validated risk assessment tools can help prioritize the highest risk cases, to better manage risk and protect victims. This article describes a validation study of one such tool.
The Validity of IPV Risk Assessment Tools
Violence risk assessment tools are designed to identify an individual’s risk of recidivism. Monahan and Skeem (2014) describe violence risk assessment approaches in terms of how much they structure the evaluation of risk. In order of increasing structure, they range from clinical judgment, to a standard list that identifies risk factors, to tools that identify and guide the measurement of risk factors, to those that also provide a way to combine risk factors, and finally to those that produce a final risk estimate. Structured methods for assessing the risk of IPV victimization first appeared in the late 1980s (Campbell, 1986), followed by tools for assessing recidivism risk among individuals who perpetrate IPV (e.g., Kropp et al., 1999). By the late 2010s, researchers had identified 39 different tools that had been tested for their ability to predict IPV or its recidivism (van der Put et al., 2019), including up to 15 tools or variants of tools designed and tested specifically for assessing risk of IPV reassault or intimate partner homicide (Graham et al., 2021). The most structured tools use actuarial methods: the assessor scores risk factors (typically derived from variables that predicted IPV outcomes in follow-up studies of individuals with a history of IPV) and then reports risk based on an actuarial data table (e.g., Hilton, 2021).
Such tools are typically validated in follow-up studies before being applied to new populations. Discrimination is an aspect of predictive validity that shows how well a tool distinguishes between individuals who do and do not recidivate following assessment. The size of this predictive effect is commonly measured using the receiver operating characteristic (ROC) area under the curve (AUC) statistic (e.g., Helmus & Babchishin, 2017). AUC values range from 0 to 1, with .50 indicating chance-level prediction, and values of .556, .639, and .714 correspond to Cohen’s d of .2 (small effect), .5 (medium effect), and .8 (large effect), respectively (Rice & Harris, 2005).
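These AUC benchmarks can be approximated from a textbook relation: if the risk scores of recidivists and nonrecidivists were normally distributed with equal variances, then AUC = Φ(d/√2), where Φ is the standard normal cumulative distribution function. The Python sketch below assumes that binormal model (the function names are our own, for illustration) and uses only the standard library; it is not necessarily the procedure Rice and Harris (2005) used to derive their values.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def d_to_auc(d: float) -> float:
    """AUC implied by Cohen's d, assuming two normal score distributions
    with equal variances: AUC = Phi(d / sqrt(2))."""
    return _STD_NORMAL.cdf(d / 2 ** 0.5)


def auc_to_d(auc: float) -> float:
    """Inverse conversion under the same assumption: d = sqrt(2) * Phi^-1(AUC)."""
    return 2 ** 0.5 * _STD_NORMAL.inv_cdf(auc)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d:.1f} -> AUC = {d_to_auc(d):.3f}")
    print(f"AUC = .587 -> d = {auc_to_d(0.587):.2f}")
```

Under this approximation, d values of .2, .5, and .8 map to AUCs of about .556, .638, and .714, within .001 of the published benchmarks, and an AUC of .587 corresponds to a d of roughly .31.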
Researchers have conducted several meta-analyses of the ability of IPV risk assessment tools to distinguish recidivists from nonrecidivists. In one meta-analytic study, predictive effects ranged from AUCs of .51 up to .86 (Graham et al., 2021). Overall, tools specifically designed for assessing the risk of IPV did not predict recidivism significantly better than more general violence risk assessment instruments (van der Put et al., 2019); however, among IPV risk assessment tools, some tool characteristics were associated with improved performance. For example, actuarial tools have been found to perform better than nonactuarial tools (van der Put et al., 2019). In particular, the Ontario Domestic Assault Risk Assessment (ODARA; Hilton et al., 2004, 2010), a 13-item actuarial tool for assessing the risk of IPV recidivism (Hilton, 2021; Hilton et al., 2004), has performed well when compared with other tools validated in follow-up studies (e.g., Messing & Thaller, 2013; van der Put et al., 2019). The ODARA score is referenced to an actuarial table that indicates the risk associated with each score in terms of probability (the likelihood of IPV recidivism in a given time frame) and percentile rank (how an individual compares with others in terms of IPV risk). Several studies have tested the ODARA’s predictive accuracy and found medium to large effect sizes for discrimination, with AUCs ranging from .64 to .77 (Graham et al., 2021). A growing literature demonstrates the utility of the ODARA in IPV risk assessment and management. For instance, men’s ODARA scores are correlated with the number of criminogenic treatment needs they have (Hilton & Radatz, 2021) and can inform decisions about their priority for IPV treatment (Radatz & Hilton, 2019).
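To make the idea of an actuarial table concrete, the sketch below builds one from a normative sample: for each total score it reports the observed recidivism rate and a percentile rank. The data in the usage example are invented toy values, not ODARA norms, and the percentile-rank convention used here (the percentage scoring lower plus half of those tied at the score) is an assumption; actual values and conventions should be taken from the published ODARA norms (Hilton, 2021).

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def build_actuarial_table(
    scores: Sequence[int], recidivated: Sequence[int]
) -> Dict[int, Tuple[float, float]]:
    """Toy actuarial table: {score: (recidivism_rate, percentile_rank)}.

    scores: total score for each person in a (hypothetical) normative sample.
    recidivated: 1 if that person recidivated within the follow-up, else 0.
    """
    n = len(scores)
    totals = Counter(scores)
    recids = Counter(s for s, r in zip(scores, recidivated) if r)
    table = {}
    below = 0  # running count of people with a lower score
    for score in sorted(totals):
        rate = recids[score] / totals[score]
        # Assumed convention: % scoring lower plus half of those tied.
        pct = 100 * (below + 0.5 * totals[score]) / n
        table[score] = (rate, pct)
        below += totals[score]
    return table


if __name__ == "__main__":
    toy_scores = [0, 0, 1, 1, 1, 2, 2, 3, 4, 5]  # invented, not ODARA norms
    toy_recid = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]
    for score, (rate, pct) in build_actuarial_table(toy_scores, toy_recid).items():
        print(f"score {score}: recidivism {rate:.0%}, percentile rank {pct:.0f}")
```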
Another aspect of predictive validity that is gaining attention in violence risk assessment research is calibration. Calibration refers to the agreement between predicted and observed rates of recidivism, or the correspondence in recidivism rates between samples (e.g., Hanson, 2017). Testing calibration reveals how well the recidivism rates observed in a validation study correspond to (or differ from) the published rates associated with each score on the risk assessment tool. Calibration is most easily measured using the Expected/Observed (E/O) index, which divides the expected number of recidivists by the observed number (as explained in detail by Hanson, 2017; Helmus & Babchishin, 2017). An E/O index below 1 indicates that the risk assessment tool underestimates recidivism rates, and values above 1 indicate that it overestimates them; the distance from 1 reflects the size of the discrepancy. Few studies have tested the calibration of IPV risk assessment tools. Among 185 men who came to the attention of police in Switzerland for assaulting their intimate partner, IPV recidivism rates in a 5-year follow-up calibrated poorly with the published norms, especially for higher ODARA scores (Gerth et al., 2017). Although the ODARA was designed to identify risk of IPV recidivism among men with female partners, Hilton et al. (2014) tested its calibration properties in a 9-year follow-up of 30 women incarcerated in Canada with a police record for violence against an intimate partner. The ODARA predicted IPV recidivism with an AUC of .724, but a statistically significant E/O index revealed a poor fit with the published norms for men, indicating that recalibration would be required before using the ODARA to estimate recidivism probabilities for women (Hilton et al., 2014). Further tests of the ODARA’s calibration properties are needed to build on this emerging research.
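A minimal sketch of the E/O computation follows. The expected count is the sum, over individuals in the new sample, of the normative recidivism probability for each person's score; the observed count is the number who actually recidivated. The confidence interval uses the approximation E/O × exp(±1.96√(1/O)), which treats the observed count as Poisson; we assume this corresponds to the approach described by Hanson (2017) but offer it only as an illustration. Function names and example numbers are hypothetical.

```python
import math
from typing import Mapping, Sequence, Tuple


def expected_count(norm_rates: Mapping[int, float], scores: Sequence[int]) -> float:
    """Expected recidivists: sum of normative probabilities for each score."""
    return sum(norm_rates[s] for s in scores)


def eo_index(expected: float, observed: int,
             z: float = 1.96) -> Tuple[float, float, float]:
    """E/O index with an approximate CI: E/O * exp(+/- z * sqrt(1 / O))."""
    if observed <= 0:
        raise ValueError("observed count must be positive")
    eo = expected / observed
    half_width = z * math.sqrt(1.0 / observed)
    return eo, eo * math.exp(-half_width), eo * math.exp(half_width)


if __name__ == "__main__":
    # Toy example: 100 men whose scores all carry a normative rate of .20.
    expected = expected_count({3: 0.20}, [3] * 100)
    eo, low, high = eo_index(expected, observed=40)
    verdict = ("norms underestimate" if high < 1
               else "norms overestimate" if low > 1 else "no significant misfit")
    print(f"E/O = {eo:.2f}, 95% CI [{low:.2f}, {high:.2f}]: {verdict}")
```

In this toy example, 20 recidivists are expected but 40 are observed, giving E/O = .50 with a 95% CI of about .37 to .68; because the interval excludes 1, the norms would significantly underestimate recidivism in that sample.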
Importance of Replicating Risk Assessment in Different Populations
When new risk assessment tools are created, cross-validation in samples drawn from similar sources and populations can show whether results are reproducible. Most IPV risk assessment validation studies have been conducted in the United States, Canada, and Europe (Graham et al., 2021; van der Put et al., 2019). These include tests of the Danger Assessment (e.g., Campbell et al., 2017), an interview-based tool to help women affected by IPV assess their risk of lethal assault, and the Domestic Violence Screening Instrument–Revised (DVSI-R; e.g., Stansfield & Williams, 2014), a tool designed to be scored from criminal justice data by non-law enforcement personnel following an individual’s arrest for IPV. However, there are few U.S. validation studies of the ODARA (Hilton, 2021), despite legislation in a number of states endorsing the use of the ODARA and similar police-scored tools by law enforcement services (e.g., Walton, 2019).
It is also important to demonstrate replicability by validating tools in samples that differ substantially from the original ones. Generalizability to different populations, such as those in legal jurisdictions outside the originating regions, should be demonstrated before a tool can be used with confidence in these new contexts. Populations may differ in individual characteristics, criminal justice policies and processes, and community supports for justice-involved individuals, among other variables, any of which may affect the scorability, accuracy, or applicability of a risk assessment tool. Because of these potential differences across individuals, systems, and resources, it cannot be assumed that a tool developed and validated for one population will be valid in another. Replication of risk assessment research in new samples provides evidence not only of the generalizability of the predictors of criminal recidivism but also of the applicability of the tool itself.
A particular concern motivating this study is that offense data within the United States are often kept on separate systems within different jurisdictions, thereby rendering it more arduous and time-intensive to compile a complete record of an individual’s criminal history. This challenge may limit the ODARA’s scorability and potentially its discriminative accuracy and calibration with published actuarial norms. We located only one study of the ODARA’s predictive validity in a U.S. sample. Ulmer (2015) tested the ODARA as scored by police officers for 281 men who assaulted a female partner. In a 2-year follow-up, 50% of men were rearrested for any offense; however, the relationship of the victim of any subsequent offense could not be determined. The ODARA significantly predicted any subsequent person-related offense (AUC = .57), but not general criminal recidivism (AUC = .54).
Assessing Risk of Non-IPV Recidivism in Samples Identified for IPV
Researchers have also shown that men apprehended for IPV exhibit a notable base rate of other violent and nonviolent criminal offenses in studies of official records or self-reports (e.g., Hilton & Eke, 2016; Piquero et al., 2014; Richards & Gillespie, 2019). Furthermore, men who perpetrate IPV and also commit general, nonfamilial violence may be at increased risk of perpetrating repeated and severe IPV (e.g., Richards et al., 2013; Verbruggen et al., 2020). This evidence for the nonspecialization of criminal careers suggests that assessment of these individuals should not focus only on IPV outcomes. To support effective and efficient decision-making, there is value in investigating whether risk assessment tools are versatile enough to predict both IPV and non-IPV recidivism. In addition, when researchers determine recidivism from criminal records, a full description of the offense is not always available, and it is not always possible to know whether the victim of the new offense is an intimate partner. This challenge underscores the importance of validating IPV risk assessment tools for broader criminal outcomes as well as for IPV recidivism. Thus, it is valuable to evaluate both non-IPV and IPV recidivism in validation studies of IPV risk assessment tools.
In addition to Ulmer’s work, the ODARA has been found to predict non-IPV outcomes in several published studies. Hilton and her colleagues (2010) reported small to medium effect sizes for violent recidivism and any criminal recidivism among 150 men in an 8-year follow-up after release from a Canadian correctional treatment institution. Rettenberger and Eher (2013) reported similar findings among 66 men convicted of sex offenses against their intimate partners released from prison in Austria. Seewald and her colleagues (2017) reported a large predictive effect for violent recidivism among 30 men following a court-mandated assessment for assaulting or threatening intimate partners. Hilton and Eke (2016) conducted a 7.5-year follow-up of 93 men coming to the attention of police for IPV in Canada and reported medium to large effects for a number of criminal outcomes, including violent, sexual, and nonviolent offenses; threatening; stalking; and conditional release failures. In other studies of men identified as IPV perpetrators by police, the ODARA predicted new police reports of general recidivism with a small effect size in a 5-year follow-up of 186 men in Germany (Sentürk et al., 2016), and Jung and Buro (2017) reported medium to large predictive effects for violent and general recidivism, respectively, in a sample of 246 men followed for an average of 3 years in Canada. Thus, there is evidence that the ODARA predicts other criminal outcomes at least as well as the IPV recidivism that it was designed to assess. In this study, we sought to expand this work by examining the ODARA’s predictive validity for IPV and non-IPV recidivism in a U.S. sample of men who assaulted their female partner.
Present Study
Our review of the current research on IPV risk assessment has identified a gap in the literature regarding validations of police-focused tools in U.S. samples. We have also identified a need for IPV risk assessment validation studies to include both IPV and non-IPV outcomes and to use the statistics best suited to the study design (e.g., AUC for fixed follow-ups). In this study, we examined the distribution of ODARA scores and the ODARA’s predictive validity (discrimination and calibration) in a U.S. sample. We sought to answer two main research questions:
Method
This study was reviewed and approved by the Institutional Review Board of Niagara University (IRB number: 2018-045). Authorization to conduct the study was provided by each of the participating criminal justice agencies. Data for this study were collected from three criminal justice agencies in Niagara County, New York: the county sheriff’s office, the police department of the county’s largest city (Niagara Falls), and the county probation department.
Sample
Upon review of all 528 domestic incident reports (DIRs) completed in 2015 by the city police department, we deleted three incidents that were missing more than three ODARA items. The remaining 525 incidents pertained to 448 different individuals identified as perpetrating IPV. Because the ODARA was designed to evaluate men’s risk of reassaulting a female partner, we excluded 87 women who assaulted their male or female partner and five men who assaulted a male partner. Our final sample comprised 356 men who committed at least one physical assault against a female partner in accordance with the ODARA guidelines for an index assault (Hilton, 2021); that is, any act of violence involving physical contact or a credible threat of death made with a weapon in the presence of the victim.
The 356 men had a mean age of 34.13 years (SD = 11.51), and the women they assaulted had a mean age of 31.58 years (SD = 10.35). The majority of men were identified in the police records as Black (n = 207, 58%), followed by White (n = 109, 31%), Indigenous (n = 6, 2%), and Asian (n = 2, 1%), with the remainder classified as “other” or “unknown” (n = 32, 9%). The women were classified as White (n = 177, 50%), Black (n = 123, 35%), Indigenous (n = 11, 3%), and Asian (n = 1, <1%), with the remainder classified as “other” or “unknown” (n = 44, 12%). Half of the couples were living together at the time of the index assault (n = 179, 50%), and the man was present when police attended the scene in one third of cases (n = 118, 33%). Some index assaults included strangulation (n = 67, 19%), weapon use (n = 59, 17%; including a gun in n = 3, 1%), and sexual assault (n = 4, 1%). All but one index assault resulted in criminal charges.
The ODARA includes 13 items that were the strongest, unique predictors of IPV recidivism in an average 5-year follow-up of police and criminal records in a sample of men with a police record for assaulting their female marital or cohabiting partner (Hilton, 2021; Hilton et al., 2004). The items pertain to events up to and including the index assault used for scoring the ODARA and include criminal history of the man being assessed (preindex domestic assault, preindex nondomestic assault, preindex custodial sentence, failure on preindex conditional release), his other antisocial behavior (substance abuse, preindex violence against others), his relationship with the partner he assaulted (more than one child altogether, her biological child from a previous partner, assault when pregnant), the index assault (threat, confinement), and the partner’s situation (concern about future assaults, barriers to support). Each item is scored “1” if present, “0” if absent, and “missing” if documentation suggests the item might be present but the available information is unclear or ambiguous. The ODARA can be scored with up to five missing items. In this study, two items, substance abuse and barriers to support, could not be scored in our entire data set. Therefore, we included only cases having fewer than three additional missing ODARA items that could not be scored from information within the documentation available to us (see the appendix for details of each item and the documentation we used to score it).
Procedure
A graduate research assistant, supervised by the first author, extracted all data from police and probation records in 2018–2019 and completed the coding using a coding form with detailed variable coding instructions. The ODARA was scored retrospectively from the DIRs logged by the city police for the index assault, which occurred between January and December 2015. For each individual, the first IPV offense in 2015 was considered the index assault, and subsequent offenses were counted as recidivism. Recidivism information was coded from incidents recorded in police, sheriff’s, and probation records within exactly 2 years of the index assault; the coder was not masked to the index assault or the ODARA score. Time at risk was not adjusted for time spent in custody because this information was not available to us. Because of security restrictions and time-limited access to the data sources, especially systems with sensitive victim information, only one coder extracted and coded all the data; however, coding was routinely discussed with the first author, and any coding questions were flagged for review and resolved by consensus with the first and second authors as needed. The coder and first author completed the online ODARA training program at https://odara.waypointcentre.ca; the coder achieved an interrater reliability of .81 (intraclass correlation coefficient, single measures, absolute agreement [ICC(A,1)]; McGraw & Wong, 1996), and the first author achieved an ICC of .94. The first author, who had access only to the incident report for the index assault, also retrospectively coded the three ODARA items that could be scored from this report alone in 49 cases (14% of the current sample). Reliability with the coder was as follows: confinement, ICC = .64; threat, ICC = .73; victim concern, ICC = .65; and three-item sum, ICC = .75.
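For readers reproducing reliability statistics of this kind, the sketch below computes ICC(A,1) (two-way model, absolute agreement, single measures) from the mean squares of a cases × raters table, following the formula given by McGraw and Wong (1996). It is a plain-Python illustration with a hypothetical function name, not the software the authors used.

```python
from typing import Sequence


def icc_a1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(A,1): absolute agreement, single measures (McGraw & Wong, 1996).

    ratings: one row per case (n rows), each holding k raters' scores.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: cases (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Rater (column) variance counts against agreement in the denominator.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


if __name__ == "__main__":
    # Rater 2 always scores one point higher than rater 1.
    print(round(icc_a1([[1, 2], [2, 3], [3, 4], [4, 5]]), 3))
```

In the toy example, the two raters are perfectly consistent but never agree exactly, so ICC(A,1) = 10/13 ≈ .77 rather than 1; this is why the absolute-agreement form is the stricter choice for item scores.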
ODARA Variables
Eleven item variables were coded, and the total 11-item ODARA score was the sum of the items coded as present (1 = yes). Item scoring followed the scoring criteria in Hilton (2021), and each ODARA variable and its source information are described in the appendix. Preindex Domestic Assault signified whether or not a man had a previous domestic incident of assault against a current or former partner in his police record. Preindex Nondomestic Assault denoted whether or not a man had a record of previously assaulting an individual who was not a partner or a partner’s child. Preindex Custodial Sentence of 30 Days or More indicated whether or not a man had a prior custodial sentence of at least 30 days. Failure on Preindex Conditional Release signified whether or not a man had breached a condition of any conditional release that was ordered prior to the index offense. Threat to Harm or Kill denoted whether he had threatened to harm or kill his partner or someone else during the index offense. Confinement of the Partner at the Index Assault indicated whether a man had physically prevented or attempted to prevent his partner from leaving the scene of the incident. Victim Concern represented whether or not a partner felt concerned that he would assault her or her children in the future. More than One Child Altogether signified whether or not the individual and his partner had more than one child altogether (i.e., counting biological or adopted children of either partner). Victim’s Biological Child from a Previous Partner denoted whether, at the time of the index assault, she had a biological child with someone other than the man being assessed. Preindex Violence Against Others represented whether a man had committed a prior violent incident against a nondomestic victim. Assault on Victim when Pregnant indicated whether the man had ever physically assaulted the victim while she was pregnant.
Two ODARA variables, substance abuse and barriers to victim support, could not be obtained from the records of the participating criminal justice agencies. The total ODARA raw score was the number of ODARA variables scored 1 (i.e., present), so possible scores ranged from 0 to 11. We did not prorate for missing information.
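The scoring rules described above (items coded 1, 0, or missing; no prorating; exclusion of cases with too many missing items) can be summarized in a short sketch. The item identifiers are hypothetical labels for the 11 items listed above, and the default exclusion threshold (at most two missing among the 11) is our reading of the inclusion rule in the Sample section, which allowed fewer than three missing items beyond the two unscorable ones.

```python
from typing import Mapping, Optional

# Hypothetical identifiers for the 11 items scorable from the records.
SCORABLE_ITEMS = (
    "preindex_domestic_assault",
    "preindex_nondomestic_assault",
    "preindex_custodial_sentence",
    "failure_on_conditional_release",
    "threat_at_index",
    "confinement_at_index",
    "victim_concern",
    "more_than_one_child",
    "victim_child_previous_partner",
    "preindex_violence_against_others",
    "assault_when_pregnant",
)


def score_11_item(items: Mapping[str, Optional[int]],
                  max_missing: int = 2) -> Optional[int]:
    """Unprorated 11-item total: the number of items coded 1.

    Each item is 1 (present), 0 (absent), or None (missing); items absent
    from the mapping are treated as missing. Returns None when more than
    `max_missing` items are missing (the case would be excluded); otherwise
    missing items simply contribute 0, because no prorating is applied.
    """
    n_missing = sum(items.get(name) is None for name in SCORABLE_ITEMS)
    if n_missing > max_missing:
        return None
    return sum(items.get(name) == 1 for name in SCORABLE_ITEMS)


if __name__ == "__main__":
    case = {name: 0 for name in SCORABLE_ITEMS}
    case.update(preindex_domestic_assault=1, threat_at_index=1, victim_concern=None)
    print(score_11_item(case))
```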
Dependent Variables
This study contained four recidivism outcomes, all coded as dichotomous variables (0 = no, 1 = yes). IPV recidivism indicated that a report of an incident involving a physical assault or threat with a weapon against an intimate partner was filed against the individual within 2 years of his index assault. We calculated the time to IPV recidivism from the date of the index assault to the date of the first postindex offense involving IPV, regardless of any other type of recidivism in the intervening time. Non–intimate partner violence (non-IPV) violent recidivism denoted that, within 2 years of his index assault, the individual had an incident reported to the police in which he had committed a violent act against a person who was not his intimate partner. Any violent recidivism combined the first two measures and indicated that an individual had an IPV and/or a non-IPV violent incident reported to police within the 2 years following his index assault. Nonviolent recidivism indicated a police report of a nonviolent criminal act (e.g., trespassing, drug possession) within the same 2-year follow-up period. All four recidivism measures were limited to records available within the local jurisdiction and excluded state and federal databases.
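The outcome definitions above, including the fixed 2-year window and the time-to-IPV-recidivism variable, can be expressed as a small coding function. The record layout (an `Incident` with a date, a violent flag, and a partner-victim flag) is hypothetical and simplifies the study's definitions; for instance, it treats "violent" as a single flag rather than distinguishing physical assault from threats with a weapon. Cases with no IPV recidivism receive the full window length as a censored time.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Incident:
    """Hypothetical layout for one postindex police report."""
    when: date
    violent: bool          # physical assault or threat with a weapon
    partner_victim: bool   # victim was an intimate partner


def add_years(d: date, years: int) -> date:
    """Same calendar date `years` later; February 29 falls back to February 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def code_outcomes(index_date: date, incidents: List[Incident]) -> dict:
    """Code the four dichotomous outcomes and days to IPV recidivism.

    `incidents` should contain postindex reports only (not the index assault).
    """
    end = add_years(index_date, 2)  # fixed 2-year follow-up window
    window = [i for i in incidents if index_date <= i.when <= end]
    ipv = [i for i in window if i.violent and i.partner_victim]
    non_ipv = [i for i in window if i.violent and not i.partner_victim]
    first_ipv: Optional[date] = min((i.when for i in ipv), default=None)
    return {
        "ipv": int(bool(ipv)),
        "non_ipv_violent": int(bool(non_ipv)),
        "any_violent": int(bool(ipv or non_ipv)),
        "nonviolent": int(any(not i.violent for i in window)),
        # Days to first IPV incident; censored at the end of the window if none.
        "days_to_ipv_or_end": ((first_ipv or end) - index_date).days,
    }


if __name__ == "__main__":
    reports = [
        Incident(date(2015, 6, 1), violent=False, partner_victim=False),
        Incident(date(2016, 1, 1), violent=True, partner_victim=True),
    ]
    print(code_outcomes(date(2015, 3, 1), reports))
```

With an index date of March 1, 2015, a man with no postindex IPV is assigned 731 days, because the 2-year window spans February 29, 2016; this matches the maximum follow-up of 731 days reported in the Results.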
Analytic Strategy
We conducted descriptive analyses to explore the sample characteristics, including characteristics of the index offenses, the distribution of ODARA scores, and the prevalence of recidivism. We tested the ODARA’s predictive validity for IPV recidivism in two ways. First, we tested discriminative accuracy using ROC analysis, yielding an AUC statistic and associated 95% confidence interval (CI). The AUC is most appropriate for data with a fixed follow-up time, as opposed to a ragged follow-up in which individuals are followed for varying lengths of time (e.g., Flores et al., 2017; Helmus & Babchishin, 2017). An AUC whose 95% CI does not include .50 is considered significantly different from chance prediction. We also examined prediction of time to IPV recidivism using bivariate correlation and Cox proportional hazards regression.
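The AUC has a direct probabilistic reading: it equals the probability that a randomly selected recidivist has a higher score than a randomly selected nonrecidivist, with ties counted as one half (the Mann–Whitney formulation). The sketch below computes it that way and attaches an approximate standard error using the Hanley–McNeil formula; statistical packages may use other (e.g., nonparametric) standard errors, so results can differ slightly from those reported. Function names and toy data are our own.

```python
import math
from typing import Sequence, Tuple


def auc_mann_whitney(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """P(random recidivist scores higher than random nonrecidivist); ties = 1/2."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one recidivist and one nonrecidivist")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(auc: float, n_pos: int, n_neg: int,
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Approximate SE and CI for an AUC (Hanley-McNeil formula)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    return se, auc - z * se, auc + z * se


if __name__ == "__main__":
    toy_scores = [3, 4, 5, 1, 2, 3]  # hypothetical ODARA totals
    toy_recid = [1, 1, 1, 0, 0, 0]   # 1 = recidivated within the window
    print(f"toy AUC = {auc_mann_whitney(toy_scores, toy_recid):.3f}")
    se, low, high = hanley_mcneil_ci(0.587, n_pos=124, n_neg=232)
    print(f"SE = {se:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Plugging in the AUC of .587 reported in the Results, with 124 recidivists and 232 nonrecidivists, gives an approximate SE of .032, close to the reported .031.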
Second, we tested the 11-item modified ODARA’s calibration properties using the E/O index, another statistic that is suited to fixed follow-up data (Hanson, 2017). An E/O index that is not statistically significantly different from 1 indicates good calibration. We accessed the data sets used for the published ODARA norms reported by Hilton (2021) and obtained 2-year recidivism rates among the 1,350 men followed for at least 2 years (counting those who recidivated after 2 years as nonrecidivists).
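Deriving fixed 2-year rates from normative data with variable follow-up periods, as described above, amounts to two rules: keep only men followed for at least 2 years, and count anyone who recidivated after 2 years as a nonrecidivist. The sketch below applies those rules per ODARA score; the record layout is hypothetical, and the 730-day window is a simplifying assumption (a calendar 2-year window can be 731 days).

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

WINDOW_DAYS = 730  # assumption: 2 years approximated as 730 days


def fixed_window_rates_by_score(
    records: Iterable[Tuple[int, int, Optional[int]]],
    window_days: int = WINDOW_DAYS,
) -> Dict[int, Tuple[int, int, float]]:
    """Per-score fixed-window recidivism rates from ragged follow-up data.

    records: (score, days_followed, days_to_recidivism or None) per person.
    Returns {score: (n_eligible, n_recidivists, rate)}.
    """
    counts = defaultdict(lambda: [0, 0])  # score -> [eligible, recidivists]
    for score, followed, days_to_recid in records:
        if followed < window_days:
            continue  # not followed for the full window: excluded
        counts[score][0] += 1
        # Recidivism after the window is counted as nonrecidivism.
        if days_to_recid is not None and days_to_recid <= window_days:
            counts[score][1] += 1
    return {s: (n, r, r / n) for s, (n, r) in sorted(counts.items())}


if __name__ == "__main__":
    toy = [(0, 800, None), (0, 1000, 900), (0, 500, 100),
           (3, 800, 100), (3, 730, None)]
    print(fixed_window_rates_by_score(toy))
```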
Finally, we tested the ODARA’s ability to accurately discriminate recidivists from nonrecidivists for other offense outcomes. Specifically, we examined postindex non-IPV violence, any violence, and nonviolent offending, using ROC analyses.
Results
ODARA and Recidivism Information
The mean ODARA score was 2.50 (SD = 1.76), with a range of 0 to 7 out of 11 scorable items. The distribution of scores by ODARA category is shown in Table 1. Table 2 shows the prevalence of each item and the number of cases with missing information for each item coded in this study. Prior custodial sentence, more than one child altogether, and victim’s biological child from a previous partner were missing in 2% to 71% of cases. In the fixed 2-year follow-up, 124 (35%) men met the criteria for IPV recidivism. The mean time until IPV recidivism or end of follow-up was 558 days (SD = 268), with a range of 0 to 731 days. Among men who recidivated, the mean time until IPV recidivism was 240 days (SD = 219), range 0 to 721 days.
Table 1
Distribution of ODARA Scores, IPV Recidivism, and E/O Calibration Index (N = 356)
Note. IPV recidivism is defined as being named as a suspect in a police report of a violent offense against an intimate partner, regardless of criminal charges, within 2 years of the index assault. Published norms are derived from Hilton (2021). ODARA = Ontario Domestic Assault Risk Assessment; IPV = intimate partner violence; E/O = expected/observed; CI = confidence interval.
Table 2
Prevalence of ODARA Items and Predictive Accuracy for IPV Recidivism (N = 356)
Note. Percentages are based on the total N of 356. Missing cases are those for which the presence or absence of the item could not be determined because the available information was incomplete or unclear. IPV recidivism is defined as being named as a suspect in a police report of a violent offense against an intimate partner, regardless of criminal charges, within 2 years of the index assault. ODARA = Ontario Domestic Assault Risk Assessment; AUC = area under the curve; CI = confidence interval; IPV = intimate partner violence.
Prediction of IPV Recidivism
The AUC of the 11-item ODARA score for IPV recidivism was .587 (SE = .031), 95% CI = [.527, .648], a small effect size. AUCs for individual items are shown in Table 2; the Preindex Domestic Assault item (prior domestic incident) had the largest item effect and was the only item with a significant effect.
The 11-item ODARA score was correlated with shorter time to IPV recidivism or end of follow-up, r = −.11, p = .042, but not with time to recidivism among men who recidivated, r = .06, p = .532. The Cox regression model was significant, χ2 = 6.77, p = .009, and the ODARA score was a significant contributor with a hazard ratio of 1.14, 95% CI = [1.033, 1.252]. Thus, each 1-point increase in the ODARA score multiplied the hazard (instantaneous rate) of IPV recidivism at any given time point by about 1.14, an increase of roughly 14%. Those below the median ODARA score of 2 had a mean survival time of 593 days (SE = 22), 95% CI = [549, 638], and those at or above the median had a significantly shorter mean survival time of 539 days (SE = 18), 95% CI = [503, 575] (where means that fall outside each other’s 95% CIs are considered statistically different; e.g., Cumming & Finch, 2005).
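Because the Cox model treats the ODARA effect as log-linear, the hazard ratio for a difference of k points is the per-point hazard ratio raised to the power k, and the same transformation applies to the confidence limits. The short sketch below, with a hypothetical function name, illustrates this using the estimates reported above.

```python
from typing import Optional, Tuple


def hazard_ratio_for_difference(
    hr_per_point: float,
    points: float,
    ci: Optional[Tuple[float, float]] = None,
) -> Tuple[float, Optional[Tuple[float, float]]]:
    """HR (and optional CI) for a `points`-point score difference under a
    proportional-hazards model with a linear score term: HR ** points."""
    hr = hr_per_point ** points
    if ci is None:
        return hr, None
    low, high = ci
    return hr, (low ** points, high ** points)


if __name__ == "__main__":
    hr, ci = hazard_ratio_for_difference(1.14, 3, ci=(1.033, 1.252))
    print(f"3-point difference: HR = {hr:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

For example, a 3-point difference implies a hazard about 1.48 times higher (95% CI roughly 1.10 to 1.96), a multiplicative rather than additive change; because the published hazard ratio is rounded, these values are approximate.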
Table 1 shows the observed IPV recidivism rates as a function of ODARA risk category, along with the rates estimated by logistic regression that were used in the calibration analysis. The Hosmer–Lemeshow goodness-of-fit test for the logistic regression (which tests whether observed recidivism counts across groups of cases depart from the counts predicted by the model) was nonsignificant, χ2 = 6.22 (df = 5), p = .285, indicating adequate model fit. As expected, the overall regression model including the ODARA was significant, χ2 = 7.26 (df = 1), p = .007; constant B = −1.07 (SE = 0.23), p < .001; ODARA B = 0.17 (SE = 0.06), p = .007, odds ratio 95% CI = [1.05, 1.35]. E/O indices based on these estimated rates were nonsignificant for all risk categories.
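The logistic regression estimates reported above can be converted into model-estimated recidivism probabilities for each ODARA score, and the odds ratio and its confidence interval follow from exponentiating B ± 1.96 SE. The sketch below uses the rounded published coefficients, so its outputs only approximate the unrounded values; the function names are hypothetical.

```python
import math
from typing import Tuple

# Rounded coefficients as reported in the text above.
B0, B1, SE_B1 = -1.07, 0.17, 0.06


def predicted_rate(score: float, b0: float = B0, b1: float = B1) -> float:
    """Model-estimated probability of 2-year IPV recidivism for an ODARA score."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * score)))


def odds_ratio_ci(b1: float = B1, se: float = SE_B1,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio per ODARA point and its Wald confidence interval."""
    return math.exp(b1), math.exp(b1 - z * se), math.exp(b1 + z * se)


if __name__ == "__main__":
    for score in range(0, 8):  # observed ODARA range in this sample was 0 to 7
        print(score, round(predicted_rate(score), 3))
    print("OR and 95% CI:", [round(v, 2) for v in odds_ratio_ci()])
```

With the rounded coefficients, a score of 0 corresponds to an estimated 2-year probability of about .26 and a score of 7 to about .53, and the odds ratio is about 1.19 (95% CI roughly 1.05 to 1.33); the small difference from the reported upper limit of 1.35 likely reflects rounding of B and SE.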
Table 1 also shows the calibration analysis based on the ODARA construction and cross-validation samples from Ontario, Canada. The subset of 1,350 men in these samples with at least a 2-year follow-up had a mean total ODARA score of 2.88 (SD = 2.02). Their 2-year recidivism rates significantly underestimated the recidivism rate observed in our sample, both overall and in all but the two highest risk categories.
Prevalence and Prediction of Violent and Nonviolent Outcomes
In total, 242 men (68%) had a new offense of any kind reported. Table 3 shows the prevalence of each postindex outcome and the ODARA’s discrimination for each outcome in terms of the AUC. All tests revealed small but statistically significant predictive effects. Of the 11 items, four (preindex domestic assault, preindex nondomestic assault, preindex custodial sentence, and preindex violence against others) significantly predicted all of these outcomes. Item-level effects ranged from AUC = .592 to .656 for non-IPV violence, from .570 to .602 for any violence, from .584 to .619 for nonviolent offenses, and from .589 to .636 for any offense.
Table 3
Predictive Accuracy of the 11-Item ODARA Score for IPV Recidivism and Other Postindex Outcomes (N = 356)
Note. All outcomes are defined as being named as a suspect in a police report of the relevant offense, regardless of criminal charges, within 2 years of the index assault. ODARA = Ontario Domestic Assault Risk Assessment; AUC = area under the curve; CI = confidence interval; IPV = intimate partner violence.
Discussion
This archival study of 356 men with a police report of assault against their female intimate partner found that an 11-item modification of the ODARA predicted IPV recidivism in a fixed 2-year follow-up. To our knowledge, this is the first study to test the ODARA’s predictive accuracy for IPV recidivism in a U.S. sample of men with a police record of IPV. The predictive effect was small but significant, showing that the ODARA discriminated between recidivists and nonrecidivists better than chance in this sample. Similar results were found for the prediction of other criminal outcomes in the 2 years after the IPV index date. The highest predictive values were observed for nonviolent offending and any criminal reoffending, supporting the generalizability of the ODARA to the prediction of non-IPV outcomes among men with a history of IPV. The occurrence of nonviolent recidivism, and of violent recidivism against people other than intimate partners, is consistent with previous research showing that men with a police record of IPV often do not restrict their criminal behavior to violence against their partners (e.g., Hilton & Eke, 2016; Piquero et al., 2014; Richards & Gillespie, 2019).
Our calibration tests revealed a substantial difference between observed IPV recidivism and the recidivism rates for the same follow-up time in the ODARA construction and cross-validation samples that were used for the published norms. This result is consistent with previous studies that found poor calibration between the published norms and IPV recidivism among men who assaulted or threatened their partner (e.g., Gerth et al., 2017). In contrast to the overestimation of risk found in these previous studies, however, we found that the published norms significantly underestimated recidivism rates in our sample. The mean ODARA score in our sample was similar to that reported in the ODARA normative samples (Hilton et al., 2010) despite our use of an 11-item modification of the ODARA. Because two items could not be scored, our totals likely underestimate full ODARA scores, so the poor calibration may be partly attributable to this underestimation. In addition, our recidivism base rate was substantially higher than in the normative samples.
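Calibration of this kind is often summarized with an Expected/Observed (E/O) index: the number of recidivists expected from the normative rate for each person’s risk category, divided by the number actually observed. Values above 1 indicate that the norms overestimate recidivism, and values below 1 indicate underestimation. The sketch below assumes a commonly used approximate 95% confidence interval, E/O × exp(±1.96 √(1/O)). The function name `eo_index` and the expected rates are hypothetical; they are not the published ODARA norms.

```python
import math


def eo_index(expected_rates, observed_count, z=1.96):
    """Return (E/O, lower, upper) for a calibration check (hypothetical helper).

    expected_rates: for each man, the recidivism rate that the norms assign
        to his risk category (a proportion between 0 and 1).
    observed_count: number of men who actually recidivated in the same window.
    E/O > 1 means the norms overestimate recidivism; E/O < 1 means they
    underestimate it.
    """
    if observed_count <= 0:
        raise ValueError("observed_count must be positive")
    expected = sum(expected_rates)
    eo = expected / observed_count
    # Assumed approximation for the interval: E/O * exp(+/- z * sqrt(1 / O)).
    half_width = z * math.sqrt(1.0 / observed_count)
    return eo, eo * math.exp(-half_width), eo * math.exp(half_width)


# Hypothetical sample of 10 men in three risk categories with made-up
# normative rates; 4 men actually recidivated.
rates = [0.07] * 4 + [0.20] * 3 + [0.40] * 3
eo, lower, upper = eo_index(rates, observed_count=4)
print(round(eo, 2))  # → 0.52
```

Here E/O = 2.08 / 4 = 0.52, the same direction found in this study (norms underestimating recidivism), although with so few hypothetical observed cases the interval is wide and includes 1.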
The 11-item ODARA’s predictive effect size for IPV recidivism was smaller than the average and range reported for the ODARA total score in previous research, according to systematic and meta-analytic reviews (i.e., AUC = .587 in our study vs. .64–.77 in Graham et al., 2021; .666 in Messing & Thaller, 2013; .690 in van der Put et al., 2019). One explanation may be that the two items missing from our data set (i.e., substance abuse and barriers to victim support) contribute important predictive power to the ODARA. Previously published studies of IPV recidivism by men identified in police records have reported mean ODARA scores of 5 to 6 and recidivism base rates averaging 6% to 16% per 2 years of follow-up, across fixed and variable follow-ups of 1 to 8 years for police-reported incidents or charges (e.g., Gerth et al., 2017; Hilton & Eke, 2016; Jung & Buro, 2017; Sentürk et al., 2016). This pattern contrasts with our base rate of 35% IPV recidivism despite a mean ODARA score of 2.50. However, the two missing items cannot fully explain this difference or our relatively high recidivism rate: because each ODARA item contributes at most 1 point, scoring both items could have raised our mean by no more than 2 points, to at most 4.50. Although AUCs are less affected by base rates than other statistics are (e.g., Helmus & Babchishin, 2017), Hilton and colleagues (2008) found that the ODARA’s discriminative accuracy was reduced in a higher risk sample (with 49% IPV recidivism) and that additional information pertaining to antisocial behavior and personality improved prediction. Furthermore, Olver and Jung (2017) reported that scores on purportedly dynamic items (including substance use, among others) incrementally predicted IPV recidivism over the ODARA score. Measuring additional, potentially changeable risk factors may therefore be particularly important for capturing risk that the ODARA does not.
Another possible explanation might lie in differences between the populations studied in Canada, Europe, and Australia and our U.S. sample, and in their differing social contexts, which may not be sufficiently captured by the ODARA items. In this study, 58% of men were identified as Black and 31% as White, compared with the predominantly White samples in research conducted elsewhere (e.g., Jung & Buro, 2017). To the best of our knowledge, the ODARA has not previously been tested in a sample with a majority of non-White individuals. Because caution is warranted when applying risk assessment tools to populations in which they have not been validated, the ODARA’s predictive accuracy needs to be examined in diverse samples. Other ODARA researchers have not always reported the racial composition of their samples, so comparisons of effect sizes as a function of this characteristic are not yet possible. Future studies examining the effect of race and other sample characteristics on the accuracy of IPV risk assessment are still needed.
Limitations and Implications
This study is not without limitations. Because only one coder had access to all the sources of information, interrater reliability testing was limited; however, the coder and the first author, who consulted on coding decisions, both achieved interrater reliability coefficients above .80 on the online ODARA training program test. It is also important to note that the ODARA variables in the data set were coded by researchers and under researcher oversight. Because the ODARA was not scored by police officers, the results cannot be assumed to generalize to the ODARA’s validity when it is scored by officers in the field. As Svalin and Levander (2020) have noted, there is limited research examining the predictive accuracy of practitioners’ use of IPV risk assessments; therefore, future research should examine the validity of the ODARA when scored by police officers and other practitioners.
The data available within the criminal justice records constrained or prevented the use of some key measures. We coded recidivism data from the records available to three criminal justice agencies within one county, as we did not have access to state or federal databases. Therefore, recidivism occurring outside this jurisdiction would have been missed, which could result in underestimated recidivism rates and predictive effect sizes.
We calculated time to recidivism as the number of days between the index assault and the next DIR documenting an IPV offense, without subtracting time spent in jail or other custody, as we did not have access to this information. IPV cases are more likely to result in noncustodial or shorter custodial sentences than non-IPV cases (e.g., Beaupré, 2015; Durose et al., 2005). Olver and Jung (2017) reported omitting only 11 cases from their sample of 300 individuals charged with IPV offenses in Canada because of either no follow-up due to custody or other reasons (e.g., unavailable criminal records, death), suggesting that custody may have had a limited impact on our time-at-risk estimates. However, custody patterns may differ in our U.S. jurisdiction. An estimated 16% of individuals arrested for IPV or other family violence offenses in the United States receive custodial sentences, according to data from a survey of adolescents exposed to family violence (Hamby et al., 2015), and criminal justice data indicate a wide range of incarceration rates among individuals convicted of domestic offenses (e.g., Klein, 2009). Therefore, a substantial minority of our sample may have spent time in custody during our 2-year follow-up. Consequently, our time-to-recidivism data may overestimate actual time at risk disproportionately for a subgroup of our sample, likely those who committed more serious index offenses and perhaps those with higher ODARA scores, which could result in an underestimate of the association between the ODARA and time to recidivism. Follow-up studies that account for time spent in custody are an important next step for domestic violence risk assessment validation research.
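The difference between elapsed time and time at risk can be made concrete with a short sketch of outcome coding in a fixed 2-year window. It assumes the window is 730 days and takes a hypothetical `custody_days` input (days spent in custody before the event or the end of follow-up), which this study did not have. With `custody_days=0`, it reduces to the elapsed-days calculation described above. The helper name `code_ipv_outcome` and the dates are illustrative only.

```python
from datetime import date

# Assumption: the fixed 2-year follow-up is treated as 730 days.
FOLLOW_UP_DAYS = 730


def code_ipv_outcome(index_date, next_ipv_date=None, custody_days=0):
    """Code a fixed-window IPV outcome for one man (hypothetical helper).

    Returns (recidivated, elapsed_days, days_at_risk), where elapsed_days
    runs from the index assault to the next IPV report or to the end of
    the window, and days_at_risk subtracts time spent in custody.
    """
    recidivated = False
    elapsed_days = FOLLOW_UP_DAYS
    if next_ipv_date is not None:
        days = (next_ipv_date - index_date).days
        if days < 0:
            raise ValueError("next IPV report precedes the index assault")
        if days <= FOLLOW_UP_DAYS:
            recidivated = True
            elapsed_days = days
    if not 0 <= custody_days <= elapsed_days:
        raise ValueError("custody_days must lie between 0 and elapsed_days")
    days_at_risk = elapsed_days - custody_days
    return recidivated, elapsed_days, days_at_risk


# Hypothetical case: new IPV report 200 days after the index assault,
# 45 of which were spent in custody.
print(code_ipv_outcome(date(2015, 1, 1), date(2015, 7, 20), custody_days=45))
# → (True, 200, 155)

# Hypothetical case: no new IPV report within the window, 90 days in custody.
print(code_ipv_outcome(date(2015, 1, 1), custody_days=90))
# → (False, 730, 640)
```

Because `elapsed_days` is never smaller than `days_at_risk`, ignoring custody can only lengthen the apparent time at risk, which is the direction of bias described in the paragraph above.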
Information needed to score the two items pertaining to children was missing in a substantial minority of cases. In the police reports, information about children was recorded in a dichotomous (i.e., yes/no) checkbox, and as a result, we were unable to fully ascertain whether these items were answered exactly as prescribed in the ODARA scoring manual. Adherence to scoring procedures is a crucial element of actuarial risk assessment, as actuarial tools use strict inclusion, exclusion, and interpretation criteria. In practice, the ODARA scoring manual recommends seeking more information where possible, or prorating when this is not possible (Hilton, 2021). We did seek more information on children by reviewing the police department’s victim assistance unit records where available. We elected not to prorate for missing information, which allowed us to examine the prevalence and predictive effects of items when scored from the available information.
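Scoring from the available information, as done here, amounts to summing the yes/no items that could be coded and treating missing items as contributing nothing. For contrast, the sketch below also computes a simple proportional prorating (raw score scaled by total items over items scored). This proportional rule is an assumption for illustration only and is not necessarily the procedure in the ODARA scoring manual; the function name, item keys, and values are hypothetical.

```python
def score_items(item_scores, total_items=13):
    """Return (raw_score, prorated_score) for a set of yes/no items.

    item_scores maps item name -> 1 (present), 0 (absent), or None (missing).
    Items that were never captured can simply be left out of the dict.
    The prorating rule (raw * total_items / items_scored) is a generic
    assumption, not the ODARA manual's procedure.
    """
    scored = [v for v in item_scores.values() if v is not None]
    if not scored:
        raise ValueError("no items could be scored")
    if any(v not in (0, 1) for v in scored):
        raise ValueError("item scores must be 0, 1, or None")
    raw = sum(scored)
    prorated = raw * total_items / len(scored)
    return raw, prorated


# Hypothetical man: the 11 items available in the records, with both child
# items missing; substance abuse and barriers to victim support are absent
# from the dict because they were never captured.
example = {
    "preindex_domestic_assault": 1,
    "preindex_nondomestic_assault": 0,
    "preindex_custodial_sentence": 1,
    "failure_on_conditional_release": 0,
    "threat_to_harm_or_kill": 1,
    "confinement_at_index": 0,
    "victim_concern": 0,
    "more_than_one_child": None,
    "victim_child_previous_partner": None,
    "preindex_violence_against_others": 0,
    "assault_when_pregnant": 0,
}

raw, prorated = score_items(example)
print(raw, round(prorated, 2))  # → 3 4.33
```

The raw score of 3 corresponds to the unprorated approach taken in this study; the prorated value shows how much higher a total could be if missing items were assumed to be present at the same rate as scored items.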
We note that the predictive effect, and indeed the interpretation, of a risk assessment tool derives from the total score on the whole instrument rather than from each component item. For this reason, our validation of the ODARA is limited by its incompleteness. Two key items, substance abuse and barriers to victim support, were not captured within the data at all. Only one of the remaining 11 items (preindex domestic assault) significantly predicted IPV recidivism; three other items (preindex nondomestic assault, preindex custodial sentence, and preindex violence against others) had predictive effects of similar magnitude, albeit not statistically significant. Substance use was one of the items most strongly correlated with IPV recidivism in the ODARA construction research (e.g., Hilton et al., 2004) and is a well-established predictor of IPV in other research (e.g., Cafferky et al., 2018), such that its omission may have reduced the ODARA’s predictive accuracy in this study. Furthermore, substance use is an identified criminogenic need among men who have committed an IPV offense (e.g., Hilton & Radatz, 2018), and victim barriers are key factors to address in safety planning with assaulted women (e.g., Hilton et al., 2008). Thus, these two items are important for risk management, and we recommend that police and other criminal justice practitioners collect and document this information to enhance not only risk assessment but also rehabilitative and preventive measures.
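Because each ODARA item is scored yes/no, an item-level AUC reduces to the average of the item’s sensitivity and specificity, (sensitivity + specificity) / 2. The sketch below computes it from hypothetical 0/1 vectors; the function name `binary_item_auc` and the data are illustrative assumptions, not this study’s analysis code.

```python
def binary_item_auc(item, outcome):
    """AUC of a 0/1 item against a 0/1 outcome: (sensitivity + specificity) / 2."""
    if len(item) != len(outcome):
        raise ValueError("item and outcome must have the same length")
    pairs = list(zip(item, outcome))
    tp = sum(1 for x, y in pairs if x == 1 and y == 1)  # item present, recidivated
    fn = sum(1 for x, y in pairs if x == 0 and y == 1)  # item absent, recidivated
    tn = sum(1 for x, y in pairs if x == 0 and y == 0)  # item absent, no recidivism
    fp = sum(1 for x, y in pairs if x == 1 and y == 0)  # item present, no recidivism
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("outcome must include both recidivists and nonrecidivists")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical data for eight men (1 = item present / recidivated).
item = [1, 1, 0, 1, 0, 0, 1, 0]
outcome = [1, 1, 1, 0, 0, 0, 0, 0]
print(round(binary_item_auc(item, outcome), 3))  # → 0.633
```

Here sensitivity is 2/3 and specificity is 3/5, giving 19/30; the same value results from the pairwise comparison of recidivists with nonrecidivists when ties count as half.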
Finally, although this study is the first to examine the predictive accuracy of the ODARA within a U.S. sample of men identified as committing an IPV offense, the present sample from one jurisdiction in New York may not generalize to the broader, more diverse population of the United States. Future researchers should study larger samples and explore gender and racial diversity when testing the ODARA’s predictive accuracy. Notably, if absolute estimates of IPV recidivism probability are desired, research with larger samples is recommended to develop local norms.
We acknowledge that multiple challenges limited our study’s methodology and therefore restrict the confidence with which we can draw conclusions about the ODARA’s predictive validity in the United States or about the differences between our sample and the ODARA construction sample. Despite these limitations, this study provides the first promising evidence of the ODARA’s association with IPV recidivism in a U.S. sample, against the backdrop of police record systems with limited jurisdictional scope and a disadvantaged urban community experiencing relatively high rates of criminal justice involvement. This study is timely and adds to the literature on IPV risk assessment in the United States, as several components of the criminal justice system are legally required to use validated tools to make informed, evidence-based decisions regarding individuals who have committed IPV (e.g., Walton, 2019).
Conclusion
Given the dearth of relevant research in the United States, this study examined the predictive accuracy of the ODARA among men identified by police as committing an IPV offense within a U.S. sample. All but two ODARA items were scorable from the criminal justice records available within one jurisdiction. This study demonstrated the ODARA’s discriminative accuracy in a current, diverse sample of men. Furthermore, the ODARA significantly predicted non-IPV recidivism outcomes within this U.S. sample, which aligns with prior validation studies of the ODARA in Canada and Europe (e.g., Olver & Jung, 2017; Seewald et al., 2017). However, the base rate of IPV recidivism was higher than expected and poorly calibrated with the rates predicted from normative data sets for most risk categories. Despite a number of methodological limitations, the current results provide initial support for criminal justice practitioners’ use of the ODARA to identify who is at relatively high risk within this population and to help allocate risk management resources, but practitioners should not rely on the published norms for expected IPV recidivism rates. In short, the ODARA may be an attractive assessment tool for criminal justice practitioners with limited time and resources. Further research to recalibrate the ODARA norms is advised.
Appendix
Sources Used to Code ODARA Variables
| Variable | Source |
|---|---|
| Preindex domestic assault | Police department databases • A prior domestic incident report (DIR) is listed on file within the database with the man listed as a suspect (not necessarily charged) • Included only what is known within the police department database (i.e., a local jurisdiction check only) |
| Preindex nondomestic assault | Police department databases • A prior nondomestic violent incident report is listed on file within the police department database with the man listed as a suspect (not necessarily charged) • Included only what is known within the police department database (i.e., a local jurisdiction check only) |
| Preindex custodial sentence of 30 days or more | County sheriff’s office database • A sentence of 30 days or more is listed on file within the county sheriff’s office database • Included only what is known within the county sheriff’s database (i.e., a local jurisdiction check only) |
| Failure on preindex conditional release | County probation office database • A failure on a current or prior conditional release is indicated on file within the county probation office database • Included only what is known within the county probation office database (i.e., a local jurisdiction check only) |
| Threat to harm or kill | Police department databases • An indication of a threat to harm or kill was noted in the DIR narrative |
| Confinement of the partner at index assault | Police department databases • An indication of confinement was noted in the DIR narrative |
| Victim concern | Police department databases • An indication of victim concern was noted in the DIR narrative |
| More than one child altogether | Police department databases; police department victim assistance unit records • An indication of more than one child was noted in the DIR narrative and/or in victim assistance unit records |
| Victim’s biological child from a previous partner | Police department databases; police department victim assistance unit records • An indication that the victim has a biological child from a previous partner was noted in the DIR narrative and/or within victim assistance unit records |
| Preindex violence against others | Police department databases • An indication of the man engaging in violence against others is denoted within the police department database • Included only what is known within the police department database (i.e., a local jurisdiction check only) |
| Assault on victim when pregnant | Police department databases • An affirmative check mark was made on a checkbox on the DIR that read, “Has suspect ever beaten you while you were pregnant?” |
| Substance use | Information not available |
| Barriers to victim support | Information not available |
Note. ODARA = Ontario Domestic Assault Risk Assessment; DIR = domestic incident report.
Acknowledgements
We thank the Niagara Falls Police Department, the Niagara County Sheriff’s Office, and the Niagara County Probation Department for permission to conduct this research and assistance with accessing the data. We thank Jaclyn Danvir for her research assistance. Angela Eke, Craig Rivera, and Maaike Helmus gave helpful feedback on an earlier version of this article.
N. Zoe Hilton is an author of the ODARA and declares a financial interest in a publication cited in this manuscript. Dana L. Radatz has no relevant financial interest or affiliations with any commercial interests related to the subjects discussed within this article. Dana L. Radatz received grant funding from Niagara University’s Research Council to support this work.
