Abstract
This study describes the rationale, development, and validation of the Victoria Police Screening Assessment for Family Violence Risk (VP-SAFvR). The actuarial instrument was developed on a sample of 24,440 Australian police reports from 2013-2014. Information from each report and criminal histories of those involved were collected with 12-month follow-up, and binary logistic regression was used to develop an improper predictive model. The selected VP-SAFvR cut-off score correctly identified almost three quarters of cases with further reports, while half of those without were accurately excluded. It was effective for frontline police triage decision-making, with few screened-out cases reporting further family violence, while those screened-in required additional risk assessment. Predictive validity was adequate and consistent across family relationships and demographic groups, although it was less effective in predicting future family violence reports involving same-sex couples or child perpetrators. Further evaluation in a field trial is necessary to determine the validity of the VP-SAFvR in practice.
Recognition of the widespread harms caused by intimate partner and family violence has led to increased interest in evidence-based risk assessments that can inform preventative responses. The past 3 decades have seen a number of instruments published to identify and prioritize intimate partner or family violence cases that are more likely to involve further aggression (Messing & Thaller, 2013; Nicholls, Pritchard, Reeves, & Hilterman, 2013). In police settings, such instruments are used to assist in the allocation of resources toward cases with the greatest potential for harm and away from couples or families where additional interventions are not required (Messing & Campbell, 2016). Validated risk assessment instruments have the additional benefit of providing consistent language and measurement approaches across the multitude of systems that respond to victims and perpetrators of intimate partner and family violence (Kropp, 2004).
Most risk assessment instruments applicable to family violence have been developed to assess the likelihood of future physical violence by men toward their female intimate partners. When their predictive effect has been tested in this context, existing tools tend to obtain small to moderate effect sizes, with the Ontario Domestic Assault Risk Assessment (ODARA; Hilton et al., 2004), Spousal Assault Risk Assessment (SARA; Kropp, Hart, Webster, & Eaves, 1995), and Brief-Spousal Assault Form for the Evaluation of Risk (B-SAFER; Kropp, Hart, & Belfrage, 2010) performing most strongly, and the ODARA obtaining large effect sizes in some studies (Belfrage & Strand, 2012; Gerth, Rossegger, Bauch, & Endrass, 2017; Hilton & Eke, 2016; Hilton, Popham, Lang, & Harris, 2014; Lauria, McEwan, Luebbers, Simmons, & Ogloff, 2017; Loinaz, 2014; McEwan, Bateson, & Strand, 2017; Nicholls et al., 2013; Rettenberger & Eher, 2013; Storey, Kropp, Hart, Belfrage, & Strand, 2014; Williams & Stansfield, 2017). This body of literature indicates that there are a handful of Intimate Partner Violence (IPV) risk assessment instruments with sufficient research support to be used by police interested in identifying cases at increased risk of future male to female physical IPV.
However, the focus on male to female physical IPV presents a barrier to the use of these tools in jurisdictions in which family violence is defined more broadly. Over the past 3 decades, there have been substantial legislative changes in the approach to IPV and family violence in English-speaking countries (Williams, 2012). In Canada, parts of the United States and the United Kingdom, and across Australia and New Zealand, definitions of family violence have broadened considerably beyond married couples and their children to incorporate violence in a wide range of other family relationships (e.g., dating partners, female-to-male violence, same-sex partner violence, violence between children and parents, siblings, and those in other family or even caring relationships). In these jurisdictions, violence between intimate partners (regardless of perpetrator and victim sex) accounts for approximately half to two thirds of family violence reports recorded by police (Burczycka & Conroy, 2018; Millsteed & Coghlan, 2016; Stansfield & Williams, 2014; Strickland & Allen, 2017). This presents a significant problem for police agencies wishing to use evidence-based risk assessment to guide their decision-making, given the absence of a validated tool for a substantial proportion of cases.
It is currently unclear whether tools developed for the particular context of male-perpetrated physical IPV could simply be generalized to other kinds of family violence and family relationships. Tools developed for general violence risk assessment have been shown to have reasonable predictive accuracy for IPV recidivism (Nicholls et al., 2013), suggesting some level of generalization may be possible. Moreover, it has long been recognized that those who engage in IPV are more likely to also engage in violence toward other family members, meaning that the two behaviors likely share many common causes and risk factors (Finkelhor, Gelles, Hotaling, & Straus, 1983; Straus, Gelles, & Steinmetz, 1980/2017). This suggests that instruments developed for assessing risk of male-perpetrated physical IPV may well be effective for predicting other kinds of family violence.
Recent studies of the ODARA have shown that the tool can predict outcomes other than male physical assault of a female partner among men who meet ODARA inclusion criteria (men who come to police attention for assault or credible threat of harm to a female intimate partner and who have or had a marital, cohabiting, or dating relationship with the victim). For example, in a sample of 93 Canadian men, Hilton and Eke (2016) demonstrated that the ODARA could predict future stalking, sexual assault, and nonviolent offending over a 7.5 year follow-up, all with moderate to large effect sizes. Similarly, Lauria and colleagues (2017) recently demonstrated that the ODARA could predict police reports of IPV without physical assault over a 6-month follow-up in a sample of 200 Australian men, again with a large effect size. The only test of the ODARA on cases that do not meet the tool’s inclusion criteria is a small study (n = 30) of its predictive validity among Canadian female IPV perpetrators (Hilton et al., 2014). The large positive effect size suggested that the ODARA may be effective with this group. However, the very small sample size meant that this result was not significant, and the results clearly need to be replicated in a larger sample. There has been no published research on the predictive validity of the ODARA for violence in other kinds of family relationships. Similarly, the B-SAFER and Lethality Screen (LS; Messing, Campbell, Sullivan Wilson, Brown, & Patchell, 2017), both of which purport to be applicable to IPV involving female to male perpetration or same-sex IPV, have not yet been validated for these populations. At present, the evidence base for generalizing the use of an IPV-specific tool simply does not exist, meaning implementation would require a local field trial to test its predictive validity.
To date, the only instrument designed and validated for use in family violence cases more broadly is the Domestic Violence Screening Instrument–Revised (DVSI-R; Williams & Grant, 2006), which was explicitly developed for a U.S. jurisdiction with an expansive legislative definition of family violence (Williams, 2012). The DVSI-R is an actuarial risk assessment instrument, developed through identification of the optimal statistical model for predicting future family violence (in any familial relationship) in a given sample (in this case, almost 15,000 family violence cases from Connecticut; Williams & Grant, 2006). The DVSI-R has since been cross-validated in an entire population of 3,569 family violence cases presenting to Connecticut courts for family violence during 2 months in 2007 (Williams, 2012), and in a further sample from the same jurisdiction collected between 2010 and 2012 (Stansfield & Williams, 2014). It has demonstrated moderate to large predictive effects for a range of family violence outcomes (e.g., further family violence offenses or violation of protection orders) in different family relationships (e.g., child to parent violence, IPV by female perpetrators). In jurisdictions where agencies must respond to a wide range of family violence, an instrument like the DVSI-R may be a feasible way of screening cases to identify those that require more detailed risk assessment and development of risk management plans. However, as an actuarial tool, the DVSI-R would need validation to ensure that the risk factors maintain their level of predictive validity in populations other than that in which it was developed. Unless police in the jurisdiction wishing to implement the tool already collected all DVSI-R items, validation would require a field trial to determine the instrument’s validity and reliability.
An alternative to conducting a field trial of an existing instrument such as the ODARA or DVSI-R is to develop a new actuarial instrument using existing data from the jurisdiction (essentially mimicking the development of the ODARA or DVSI-R, but using local data to develop a novel tool). Actuarial instruments are relatively easy to develop if there is an appropriate database containing information about a wide array of potential risk factors and the relevant outcome. These data can be used to build a predictive statistical model that can then be cross-validated. If shown to be as effective as existing instruments, this is a cost-effective and efficient method of introducing an evidence-based approach to family violence risk assessment, without having to conduct a potentially costly and time-consuming field trial.
In the absence of a database containing DVSI-R or ODARA risk factors that would allow local validation, this study aimed to develop and test a novel actuarial risk assessment that could be used to screen all family violence cases reported to police in Victoria, Australia. We developed the Victoria Police Screening Assessment for Family Violence Risk (VP-SAFvR, pronounced V-P-safer) for use by first responders. Because of this, it needed to be quick to administer (taking no more than 10-15 min to complete) and usable by all police officers, including those without extensive training or particular knowledge and expertise in family violence. The VP-SAFvR was intended to overcome the key disadvantage of the ODARA by being developed from and so applicable to the full range of family violence cases reported to police. Like the DVSI-R, the VP-SAFvR was intended to inform responding officers’ decision-making at the time of a report and screen cases for more comprehensive risk assessment by specialist officers. The screening aim requires correctly detecting as many cases as possible where family violence is reported again (high sensitivity), while accurately excluding as many cases as possible that have no further reports (at least moderate specificity), so as to ensure that police resources are used in the most effective way.
Method
The study used a pseudo-prospective follow-up design to develop the statistical model underlying the VP-SAFvR. Predictive models were created from police data collected by officers responding to family violence incidents during 2013 and 2014, with a 12-month follow-up of each case. Data were obtained from the databases of Victoria Police, the policing agency for the State of Victoria, Australia (population 6.14 million). At the time, Victoria Police had 13,525 officers responsible for all aspects of crime prevention and detection in the state.
Sample
The VP-SAFvR was developed from a population cohort of every family violence incident (FVI) recorded by Victoria Police between July 1, 2013, and June 30, 2014 (n = 65,154). Only data from the first incident involving a specific dyad during the collection period were used to build the model (n = 44,436), with police reports of subsequent incidents between the same pair over the following 12 months counted as family violence recidivism. Victoria Police always identify a respondent and primary victim in their incident records, allowing for linkage with subsequent incidents involving the same two people using individual police identifying numbers. The population was randomly divided into a development sample of 24,440 cases and two cross-validation samples of 9,999 and 9,997 cases. Descriptive statistics about the index FVI for the total sample and each subsample are shown in Table 1.
Descriptive Statistics for the Development and Validation Samples
Note. Intimate partner was coded whenever there had been a romantic relationship between AFM and respondent, regardless of any other characteristics of the case. Victoria Police recorded family violence incidents between same-sex partners as a separate category, meaning it was impossible to determine whether these incidents involved dating or common law relationships (same-sex marriage was not legal in Victoria at the time of the study), or whether the couple was separated. FVI = family violence incident; AFM = affected family member.
191 cases were missing respondent gender; 176 cases were missing AFM gender.
In Victoria, family violence is defined by the Family Violence Protection Act 2008; however, not all forms of family violence described in the Act constitute a criminal offense (e.g., psychological abuse has no specific corresponding offense). Because of this, since 2007, Victoria Police have recorded family violence incidents (FVIs) based on the definition in the Act and separately from records of charges. This provides a far larger sample than would be available if only arrest or charge were used to identify the presence of family violence (e.g., in 2016-2017, police laid charges in only 44% of recorded FVIs; Victoria Police, 2017).
Whenever Victoria Police members respond to an FVI, they record characteristics of the incident, the affected family member (AFM or victim), the respondent (or perpetrator), and their relationship on a specific form, known as the L17. In 2013-2014, the L17 recorded 52 separate risk factors thought to be associated with future family violence or lethal violence, largely based on findings from the intimate partner violence research literature (see Millsteed & Coghlan, 2016, for further information about the L17). The standard use of this form for any report of family violence means that Victoria Police collect information on a wide variety of evidence-based risk factors for future family violence in every case. The L17 is used by police to make judgments about whether future family violence is “likely” or “unlikely,” although previous research has shown these judgments to have only a small predictive effect (Area Under the Receiver Operating Characteristic Curve [AUC] of 0.54-0.56; McEwan et al., 2017). While the L17 itself has not demonstrated predictive utility, in conjunction with other police data, the L17 data set could be used to develop a far more parsimonious, time-efficient, and effective family violence screening and risk assessment instrument.
Procedure
Risk Factor Identification
Data from L17 forms completed by police were extracted from the Victoria Police Law Enforcement Assistance Program (LEAP) database in November 2015. Items recorded as present on the L17 were scored “1,” and absent items were scored “0.” Item absence may reflect true absence or missing data due to the “tick if present” format of the L17, meaning that the VP-SAFvR operates on an “if the risk factor is known to be present” basis.
Additional potential risk factors were identified through review of the IPV and general offender risk assessment literatures (Andrews & Bonta, 2010; Hilton, Harris, & Rice, 2010; Williams, 2012). Those that could be extracted from police databases were included in subsequent analyses: official history of previous FVIs between the same two people; information about the respondent’s history of any FVIs (as either AFM or respondent); the respondent’s and AFM’s histories of violent offending (defined as any offense in the offense categories of Homicide, Rape, Sex [nonrape], Assault, Robbery, Abduction/Kidnap, Arson, Aggravated Burglary); their histories of breaching a court order related to family violence; their histories of breaching any other court order (e.g., bail, a good behavior undertaking, a correctional supervision order); and their histories of stalking offenses.
Definition of Family Violence Recidivism
Each case was followed up for 12 months from the date of the index family violence incident in 2013-2014. Reflecting the broad legislative definition of family violence in Victoria, and the potential that family violence recidivism may involve family members other than the original victim (e.g., children of the perpetrator), a very broad definition of recidivism was used. Family Violence Recidivism included (a) any subsequent FVI involving the same two people (regardless of their role in the subsequent incident) AND/OR (b) a subsequent FVI involving the index perpetrator and a related child.
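The recidivism definition above can be illustrated with a minimal sketch. The record layout (dictionaries with `date`, `respondent`, and `afm` keys), the `related_children` set, and the 365-day window standing in for "12 months" are all hypothetical choices made for illustration; they are not the structure of the Victoria Police databases. The sketch also assumes that criterion (b) applies whichever role the index perpetrator holds in the later incident, which the text does not specify.

```python
from datetime import date, timedelta

def is_recidivist(index, later_incidents, related_children, follow_up_days=365):
    """Return True if any later incident meets criterion (a) or (b).

    index / later_incidents: dicts with 'date', 'respondent', 'afm' keys
    (hypothetical layout). related_children: set of person identifiers.
    """
    window_end = index["date"] + timedelta(days=follow_up_days)
    dyad = {index["respondent"], index["afm"]}
    perpetrator = index["respondent"]
    for incident in later_incidents:
        # Only incidents after the index date and inside the follow-up window count.
        if not (index["date"] < incident["date"] <= window_end):
            continue
        parties = {incident["respondent"], incident["afm"]}
        # (a) same two people, regardless of who is respondent this time
        same_dyad = parties == dyad
        # (b) index perpetrator together with a related child
        with_child = perpetrator in parties and bool((parties - {perpetrator}) & related_children)
        if same_dyad or with_child:
            return True
    return False

index_fvi = {"date": date(2013, 8, 1), "respondent": "P1", "afm": "P2"}
role_reversal = [{"date": date(2014, 1, 10), "respondent": "P2", "afm": "P1"}]
child_incident = [{"date": date(2014, 3, 5), "respondent": "P1", "afm": "C1"}]
too_late = [{"date": date(2014, 9, 1), "respondent": "P1", "afm": "P2"}]
```

In this toy data, the role-reversed incident and the incident involving child `C1` both count as recidivism, while the incident falling outside the follow-up window does not.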
Statistical Analyses
Univariate analyses used χ2 tests of independence to identify variables significantly associated with family violence recidivism, with odds ratios as a measure of effect size. A priori decisions were made to exclude the presence of charges as a predictor, as charging patterns could change over time, affecting the relationship between the presence of a charge and further family violence. In addition, demographic predictors such as age and gender were excluded from the analysis to maximize the applicability and predictive validity of the instrument across all types of family violence. The final model was subject to post hoc testing in samples of different ages and genders to ensure its applicability across these groups. The decision was also made to exclude negatively related predictors to ensure ease of scoring for police.
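As an illustration of the univariate step, the following sketch computes a Pearson χ2 statistic and an odds ratio for a single binary risk factor cross-tabulated against recidivism. The cell counts are invented for the example, and the use of the uncorrected Pearson statistic is an assumption, as the text does not say whether a continuity correction was applied.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    a = factor present, recidivism      b = factor present, no recidivism
    c = factor absent, recidivism       d = factor absent, no recidivism
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def odds_ratio(a, b, c, d):
    """Odds of recidivism with the factor present relative to absent."""
    return (a * d) / (b * c)

# Hypothetical counts, not taken from the study.
a, b, c, d = 300, 400, 700, 1600
chi2 = chi_square_2x2(a, b, c, d)
effect = odds_ratio(a, b, c, d)

# 3.841 is the chi-square critical value for 1 degree of freedom at alpha = .05.
significant = chi2 > 3.841
```

A factor with an odds ratio above 1 and a significant χ2 would be a candidate for the multivariate model, whereas one with an odds ratio below 1 would fall under the a priori exclusion of negatively related predictors.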
Predictor variables that were significantly associated with the outcome of interest at a univariate level were entered into a single step binary logistic regression equation to develop a multivariate predictive model (presence coded as 1). Despite the number of univariate comparisons, significance was evaluated at the .05 level to allow the greatest possible number of predictors in multivariate analyses. The model was developed iteratively through removal and addition of variables with reference to the log likelihood value and the significance of the Wald statistic, in addition to the effect of the variable on rate of correct classification and goodness of fit, until an optimal parsimonious model was determined.
The proper logistic regression model was converted into an improper model that could be used by police members as a risk assessment instrument. Developing an improper model involves assigning prescribed consistent weights to each variable in the model, making it generalizable beyond its original development sample (Dawes, 1979). In this case, each variable was weighted by rounding the value of the odds ratio identified in the logistic regression model to the nearest whole number. The total score for each case was the sum of weighted item scores.
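The weighting step can be sketched as follows. The item names and odds ratios here are hypothetical placeholders rather than the values in the final model; only the procedure (round each odds ratio to the nearest whole number, then sum the weights of the items known to be present) follows the description above.

```python
# Hypothetical items and odds ratios, for illustration only.
HYPOTHETICAL_ODDS_RATIOS = {
    "prior_fvi_same_dyad": 1.9,
    "pregnancy_or_recent_birth": 1.6,
    "respondent_alcohol_use": 1.2,
    "history_of_order_breach": 1.3,
}

def integer_weights(odds_ratios):
    """Round each odds ratio to the nearest whole number.

    int(x + 0.5) rounds halves up for positive x; Python's built-in round()
    rounds halves to the nearest even number, which is not wanted here.
    """
    return {item: int(value + 0.5) for item, value in odds_ratios.items()}

def total_score(case, weights):
    """Sum the weights of items recorded as present (1); absent or unknown items add 0."""
    return sum(weight for item, weight in weights.items() if case.get(item, 0) == 1)

weights = integer_weights(HYPOTHETICAL_ODDS_RATIOS)
example_case = {"prior_fvi_same_dyad": 1, "respondent_alcohol_use": 1}
score = total_score(example_case, weights)
```

Treating an unrecorded item as 0 mirrors the "tick if present" format of the L17, where absence may mean either true absence or missing information.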
The validity of the total improper model score as a predictor of family violence recidivism was ascertained by examining rates of recidivism at each score and by calculating a Receiver Operating Characteristic (ROC) curve. For each possible total score, the ROC curve plots the true positive rate, or sensitivity (the proportion of those who reoffended who had that score or above), against the false positive rate (the proportion of those who did not reoffend who scored at or above the relevant score). The AUC represents the probability that a randomly selected recidivist would have received a higher score than a randomly selected nonrecidivist. The AUC is used as an index of the overall discriminant power of the model, with a value of 0.5 indicating that the score does not discriminate between those who do and do not reoffend. Suggested interpretation guidelines indicate that a small discriminant effect size is .56, a moderate effect is .64, and a large effect is .71, equivalent to d = .80 (Rice & Harris, 2005).
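The probabilistic reading of the AUC can be made concrete with a small sketch that compares every recidivist score with every nonrecidivist score. Counting tied pairs as one half is a common convention adopted here as an assumption, and the scores are invented. The pairwise loop is quadratic in sample size, so it is suited to illustration rather than to a data set of tens of thousands of cases.

```python
def auc_pairwise(recidivist_scores, nonrecidivist_scores):
    """Proportion of (recidivist, nonrecidivist) pairs in which the recidivist
    has the higher score, with tied pairs counted as one half."""
    wins = 0.0
    for r in recidivist_scores:
        for n in nonrecidivist_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(recidivist_scores) * len(nonrecidivist_scores))

# Hypothetical total scores for four recidivists and four nonrecidivists.
auc = auc_pairwise([5, 6, 4, 2], [1, 3, 4, 2])
```

With identical score distributions in both groups, the function returns 0.5, matching the no-discrimination value described above.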
As the VP-SAFvR was intended as a screening tool to help police make decisions about the need for further specialized risk assessment, a recommended threshold score for that decision was identified. For any screening instrument, the primary concern is to maximize sensitivity while maintaining sufficient specificity that a large number of cases without reoffending are not needlessly screened-in. The optimal decision threshold was determined by choosing the score at which the maximum number of reoffenders and non-reoffenders were correctly classified (prioritizing sensitivity over specificity given the triaging nature of the assessment). As the sample represented the entire population of family violence incidents and recidivism, we were also able to report the positive predictive value (the proportion of those at or above the threshold who did reoffend) and the negative predictive value (the proportion of those below the threshold who did not reoffend).
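The four test characteristics described above can be computed from a vector of total scores and observed outcomes as in the following sketch. The data are invented, and cases are treated as screened in when they score at or above the threshold, following the definitions in the text.

```python
def screening_stats(scores, outcomes, threshold):
    """Sensitivity, specificity, PPV, and NPV for a 'score >= threshold' rule.

    outcomes: 1 if the case had family violence recidivism, else 0.
    Raises ZeroDivisionError if any of the four margins is empty.
    """
    tp = fp = tn = fn = 0
    for score, reoffended in zip(scores, outcomes):
        screened_in = score >= threshold
        if screened_in and reoffended:
            tp += 1
        elif screened_in and not reoffended:
            fp += 1
        elif not screened_in and reoffended:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),  # recidivists screened in
        "specificity": tn / (tn + fp),  # nonrecidivists screened out
        "ppv": tp / (tp + fp),          # screened-in cases that reoffended
        "npv": tn / (tn + fn),          # screened-out cases that did not
    }

# Hypothetical scores and outcomes for ten cases.
scores = [0, 1, 2, 3, 4, 5, 6, 7, 4, 2]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1, 0, 0]
stats = screening_stats(scores, outcomes, threshold=4)
```

Running the function over each possible threshold and comparing the resulting sensitivity and specificity is one simple way to examine the trade-off that the threshold choice involves.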
Results
Over the 12-month follow-up period, a further FVI involving the same two people was present in 7,080 cases in the development sample (29%). Adding FVIs involving the index respondent and related children increased the recidivism rate to 30.2% (7,390 cases).
Developing the Predictive Model
A total of 74 possible predictors from the L17 and the police criminal history database were subject to univariate analysis. These included variables related to the presence and nature of any previous abuse or violence in the same relationship; the nature of any charges arising out of the index FVI; behaviors present during the index incident (e.g., threats, sexual assault); victim vulnerabilities (e.g., mental illness, drug or alcohol use, isolation); characteristics of the relationship (e.g., recent separation, pregnancy, presence of children); respondent characteristics (e.g., mental illness, employment, drug or alcohol use); victim level of fear of further violence; the presence and use of weapons; and the respondent’s and AFM’s criminal history and police-reported history of family violence. Several items from the original police L17 form were collapsed into categories for analysis (e.g., “perpetrator alcohol use possible” and “perpetrator alcohol use definite” items recorded by police were collapsed into “perpetrator alcohol use present”). The binary item “Duration of family violence greater than one month” was created using a ROC curve to find the optimum duration for predicting future family violence based on an ordinal categorical duration variable recorded by police on the L17. The full list of potential predictors, their prevalence, and their univariate relationships to family violence recidivism is available from the authors on request.
Of the 74 variables analyzed, 10 had a significant negative relationship to recidivism. All were charge-related variables, or captured behavior during the index FVI that was likely to have resulted in charges (e.g., sexual assault of victim). A further 26 variables were unrelated to family violence recidivism, while 48 were significantly and positively associated and so were entered into a binary logistic regression model. Tests of multicollinearity showed no problems requiring variable exclusion, and variables were removed iteratively based on the significance of their unique contribution and the overall fit of the model. The final optimal model is presented in Table 2 (Hosmer-Lemeshow goodness of fit χ2(8) = 11.74, p = .16). The items “pregnancy or recent birth” and “prior FVI in the same dyad” were assigned weights of 2, and all other items were assigned weights of 1, for a possible total improper model score of 0-16. The distribution of scores was skewed, ranging from 0 to 16 with a modal and median score of 4 (q1 = 2, q3 = 6). Table 3 shows the proportion of cases and recidivism rate at each VP-SAFvR score.
Final Binary Logistic Regression Model Predicting Family Violence Recidivism in the Development Sample
Note. OR = odds ratio; CI = confidence interval; FVI = family violence incident.
Prevalence in development sample (n = 24,116), excluding individuals missing data for family violence duration (n = 324).
VP-SAFvR Total Scores and Family Violence Recidivism by Score in the Development Sample
Note. VP-SAFvR = Victoria Police Screening Assessment for Family Violence Risk.
Scores of 10 and above were grouped due to low prevalence and together accounted for 3.75% of the total sample.
Evaluation of Optimal Threshold for Screening Cases In and Out
Figure 1 plots the sensitivity, specificity, and the positive and negative predictive values (PPV and NPV) at each possible total score on the VP-SAFvR in the development sample. A threshold score of 4 was selected as providing optimal screening capability, balancing high sensitivity with at least moderate specificity. Table 4 shows the test characteristics of this threshold score (sensitivity, specificity, PPV, and NPV), and the overall predictive validity of the instrument in the development and both cross-validation samples. Consistent results across all three samples suggest that the relevance of the risk factors and the performance of the instrument are relatively stable in the entire Victorian population of police-reported family violence.

Test Validity Values for VP-SAFvR Total Scores Predicting Further Family Violence Incidents in the Development Sample
Efficacy of Selected Screening Threshold Score of 4 in the Development and Cross-Validation Samples, and Overall Predictive Validity of Total Scores
Note. PPV = positive predictive value; NPV = negative predictive value; AUC = area under the receiver operating characteristic curve; CI = confidence interval.
Performance in Subsamples of Family Violence Cases
To test whether the VP-SAFvR performed consistently across family violence cases with different characteristics, subsamples were created from the total sample of 44,436 to reflect different relationship types and perpetrator demographics (results shown in Table 5). For most subsamples, the improper model underlying the VP-SAFvR performed as well as it did in the overall development sample (as indicated by overlapping confidence intervals for the AUC values in Tables 4 and 5). For child to parent violence, the AUC remained in the moderate range at AUC = 0.63, but as its confidence interval did not overlap with that of the development sample, it was interpreted as being significantly below the AUC for the total score in that sample. For same-sex IPV cases, and in cases where the respondent was a child (aged under 18 years), the AUC indicated only a small or small-to-moderate effect size (Rice & Harris, 2005). The selected threshold score of 4 had lower sensitivity in these two groups, so the performance of other potential thresholds was examined to determine whether they would be appropriate for screening purposes. For both groups, a threshold of 3 produced better screening of recidivists, comparable to a score of 4 in the other groups (same-sex: sensitivity = .73, specificity = .38, PPV = .37, NPV = .74; child respondent: sensitivity = .66, specificity = .51, PPV = .38, NPV = .77).
VP-SAFvR Performance in Subgroups of Family Violence Types
Note. Each subgroup contains all cases with that characteristic from the total sample (N = 44,436). All test characteristic statistics are reported for a threshold score of 4. VP-SAFvR = Victoria Police Screening Assessment for Family Violence Risk; PPV = positive predictive value; NPV = negative predictive value; AUC = area under the receiver operating characteristic curve; CI = confidence interval; AFM = affected family member.
Intimate partner was coded where there had been any intimate relationship between the AFM and respondent (dating, common law, married, separated, divorced) regardless of any other characteristics of the family violence.
Discussion
This study describes the development of the VP-SAFvR, an actuarial screening instrument designed to help guide frontline police decision-making in cases of family violence. The results of this study suggest that the selected VP-SAFvR threshold score provides an effective screening mechanism, excluding a substantial proportion of cases that will not experience family violence recidivism, while capturing three quarters of those who report further family violence to police over 12 months. These results demonstrate that, if appropriate data are available, it is possible to develop a valid local risk assessment instrument for family violence that demonstrates similar predictive ability to existing tools. This may be a strategy that other police forces could choose to pursue if they are unable to validate an existing instrument such as the ODARA or DVSI-R.
Efficacy of the VP-SAFvR as a Screening Instrument
The most important test characteristic for screening tools is the predictive performance of the selected screening score (Messing & Campbell, 2016). It is this score that determines whether the test correctly detects a sufficient number of cases with reoffending, while correctly screening out enough cases without reoffending, such that there is a genuine reduction in the number referred for comprehensive assessment and response. A good threshold will allow resources to be directed to the cases that are most likely to require them. The choice of threshold is always a balancing act between sensitivity and specificity, with improvements in one invariably resulting in decreases in the other. A score of 4 correctly captured almost three quarters of cases involving future family violence across the development and both validation samples, but was moderately overinclusive of cases without further incidents, incorrectly identifying approximately 50% of these cases as requiring further assessment. Given the screening nature of the VP-SAFvR and the fact that all cases scoring positively would be subject to further risk assessment, the selected threshold was considered acceptable and appropriate.
The optimal sensitivity and specificity of an instrument are to some extent dependent on its purpose and consequences (Messing & Campbell, 2016). Maintaining a moderate level of specificity for an instrument such as the VP-SAFvR is important given the resource implications if low specificity directs large numbers of cases for further, potentially unnecessary, assessment. It may also have negative implications if a false positive score leads to unwarranted punitive interventions for perpetrators. Messing and colleagues (2017) recently reported on the utility of the LS when used to determine outcomes in police-reported cases of intimate partner abuse. They noted that a “high danger” result on the LS (indicating increased likelihood of lethal violence) had very high sensitivity (92%-93%) for detecting cases where there was subsequent near-fatal violence; however, specificity was only 21%, meaning that 79% of cases without subsequent near-fatal violence were incorrectly screened in. In practice, this was not a substantial problem as the only apparent consequence of screening was a brief telephone-based intervention at the time of the report, which may well have been useful to a significant proportion of victims, including those who were not later subject to near-fatal violence. In that context, sacrificing specificity for high sensitivity is appropriate given the relatively low cost and potential positive impact of the intervention that results, and the apparent absence of any punitive correctional or policing measures for perpetrators who are screened-in. The same is not true of the VP-SAFvR, the results of which would not only lead to referrals of victims to specialist police and domestic violence sector agencies for comprehensive assessment and intervention, but also of perpetrators to behavior change programs and potentially to increased police monitoring and response.
The VP-SAFvR predicts a less severe and far more common outcome than the LS and so needs a higher level of specificity to avoid overwhelming police and domestic violence sector resources with cases that do not require additional assessment or preventive action. If a VP-SAFvR threshold were chosen to achieve similar values to those reported by Messing and colleagues for the LS (e.g., a score of 2 would achieve 92% sensitivity, but only 19% specificity in the development sample), this would result in approximately 12,000 additional cases a year being referred for further assessment and intervention across Victoria, only one in five of which would report further family violence to police.
Because the VP-SAFvR was developed using an entire population, it was possible to ascertain exactly how many people scoring above and below the threshold score actually reported family violence recidivism. The negative predictive power of the selected threshold score was good, with fewer than one in five of those screened-out going on to have another family violence report over 12 months, compared with approximately one in three when relying on chance. However, reflecting the relatively low specificity of the selected threshold, only 38% of the sample who scored at or above 4 actually had a further family violence incident within 12 months (positive predictive power). This group was at relatively increased risk, reoffending at twice the rate of those scoring less than 4 (19%), and almost one third higher than the base rate of reoffending of 30.2% in the development sample. This highlights the fact that the VP-SAFvR score is best used to screen out relatively lower risk cases rather than as a way of identifying those who are definitely at high risk of future family violence.
These results confirm that the VP-SAFvR functions as intended to assist with triaging cases in need of further comprehensive assessment rather than acting as a risk assessment per se. Unfortunately, these kinds of predictive statistics have not been reported for the only equivalent tool, the DVSI-R, making it difficult to judge the VP-SAFvR’s comparative performance in this regard. Studies of comparable instruments designed to triage for physical IPV have reported lower positive predictive power at selected thresholds in some cases and higher in others. The LS obtained a PPV of 13.27% for a “high danger” cut score, only a marginal improvement on the base rate for near-fatal violence of 11.48%. Although the ODARA is intended as a comprehensive risk assessment, the authors did report an optimal threshold score of 4 for classification of cases as likely to involve recidivism (Hilton et al., 2004). This score produced a PPV of 54%, almost twice the 29.7% base rate of physical assault in the development sample. Negative predictive power values obtained by the VP-SAFvR were better than or comparable with those reported for these instruments in the same studies (LS = 95.83%; ODARA = 82%; Hilton et al., 2004; Messing et al., 2017), and comparable with recent negative predictive power estimates from a machine-learning algorithm used to predict absence of further domestic violence arrests (NPV = 89% with base rate of domestic violence rearrest of 15%; Berk, Sorenson, & Barnes, 2016).
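The relationship between sensitivity, specificity, base rate, and the predictive values discussed above can be illustrated with a short calculation. The following is a minimal sketch rather than the analysis used in this study: the function name is hypothetical, the VP-SAFvR sensitivity of 0.73 is an assumed stand-in for "almost three quarters," the specificity of 0.50 is the approximate figure given in the text, and the LS sensitivity of 0.92 is the lower end of the reported 92%-93% range.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPV, NPV) implied by sensitivity, specificity, and base rate (Bayes' rule)."""
    true_pos = sensitivity * base_rate                    # recidivists screened in
    false_pos = (1.0 - specificity) * (1.0 - base_rate)   # non-recidivists screened in
    true_neg = specificity * (1.0 - base_rate)            # non-recidivists screened out
    false_neg = (1.0 - sensitivity) * base_rate           # recidivists screened out
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# VP-SAFvR at a threshold of 4: base rate 30.2% as reported;
# sensitivity 0.73 and specificity 0.50 are ASSUMED approximations.
vp_ppv, vp_npv = predictive_values(0.73, 0.50, 0.302)

# LS "high danger" cut score: specificity 21% and base rate 11.48% as reported;
# sensitivity 0.92 is the lower end of the reported range.
ls_ppv, ls_npv = predictive_values(0.92, 0.21, 0.1148)

print(f"VP-SAFvR: PPV={vp_ppv:.3f}, NPV={vp_npv:.3f}")
print(f"LS:       PPV={ls_ppv:.3f}, NPV={ls_npv:.3f}")
```

Under these assumptions the implied values fall close to the figures reported in the text (a PPV near 38% and an NPV near 81% for the VP-SAFvR; a PPV near 13% and an NPV near 95% for the LS). This illustrates that a low PPV can coexist with a high NPV whenever specificity is modest or the base rate is low.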
Overall Predictive Validity of the VP-SAFvR
Results in the development and cross-validation samples showed that increases in total score on the VP-SAFvR were associated with incremental increases in the 12-month incidence of future family violence, suggesting that the VP-SAFvR does have some utility even when used as a general risk assessment instrument rather than a triage tool. The VP-SAFvR obtained, on average, moderate overall levels of predictive validity assessed using Rice and Harris’s (2005) AUC effect size criteria (AUC = .66 across the development and cross-validation samples). The AUCs obtained were generally equivalent to those reported for the DVSI-R, the only other instrument that attempts to predict any reported family violence. On cross-validation, the DVSI-R also achieved a moderate AUC for the outcome that most closely approximates the “family violence recidivism” variable used in this study (total family violence offenses, AUC = .65; Williams, 2012). It should be noted that Williams evaluated DVSI-R scores against the broader outcome of any future family violence offending, while the VP-SAFvR was tested specifically against offending between the same perpetrator/victim dyad or against a related child.
While the AUCs obtained by the VP-SAFvR are lower than those obtained for the ODARA in a recent prospective study in the same jurisdiction (AUC = .72 for another FVI; Lauria et al., 2017), this is likely because the VP-SAFvR is attempting to predict a broader outcome in a broader population. The ODARA has highly selective inclusion criteria and predicts a very specific outcome, leading to less variance between cases and resulting in a more consistent set of risk factors. While this maximizes its predictive validity, as outlined in the introduction, it reduces the ODARA’s practical utility in jurisdictions where family violence is defined more broadly than just male perpetrated intimate partner assault (Lauria et al., 2017; Williams, 2012).
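The AUC values discussed in this section can be read as the probability that a randomly selected recidivist case scores higher than a randomly selected nonrecidivist case, with ties counted as one half. The sketch below illustrates that interpretation only; the function name is hypothetical and the scores are invented toy values, not study data.

```python
def auc_from_scores(recidivist_scores, nonrecidivist_scores):
    """AUC as the share of recidivist/nonrecidivist pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for r in recidivist_scores:
        for n in nonrecidivist_scores:
            if r > n:
                wins += 1.0      # recidivist scored higher: correctly ordered pair
            elif r == n:
                wins += 0.5      # tied scores: half credit
    return wins / (len(recidivist_scores) * len(nonrecidivist_scores))


# Hypothetical total scores for illustration only (NOT study data).
toy_recidivists = [5, 4, 3]
toy_nonrecidivists = [4, 2, 1]

toy_auc = auc_from_scores(toy_recidivists, toy_nonrecidivists)
print(f"Toy AUC = {toy_auc:.3f}")
```

An AUC of .50 corresponds to chance-level ordering of pairs, which is why a confidence interval that includes .50 indicates that discrimination may not differ from chance.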
Validity of the VP-SAFvR for Different “Types” of Family Violence
A key aim in developing the VP-SAFvR was to create a single instrument that could be used in the wide range of family violence cases presenting to Australian police, including abuse and violence within nonintimate family relationships. The results shown in Table 5 demonstrate that, in large part, different family relationships and perpetrator demographic characteristics did not substantially affect the predictive validity of the VP-SAFvR. There were no significant differences in performance between male and female perpetrators, or between different kinds of family relationships.
Given the possibility that different kinds of family violence have quite different causes, this result may be surprising to some. However, it reflects the atheoretical nature of actuarial risk assessment. The validity of risk factors in these tools comes not from their theoretical links to the outcome of interest, but from the efficacy of the model in predicting the outcome (Quinsey, Harris, Rice, & Cormier, 2005). Therefore, the fact that the variables entered into the model to create the VP-SAFvR were not drawn from any particular theoretical perspective associated with family violence does not negatively impact the validity of the final risk assessment instrument derived. Its value is demonstrated by its effect in predicting family violence recidivism, regardless of why it is able to do so. That said, it is unsurprising that risk factors on the L17, which were largely drawn from the IPV literature, were able to predict other forms of family violence. There is a wealth of evidence to suggest that many of the developmental and social predictors of violence are shared, regardless of the context in which the violence occurs (see Hamby & Grych, 2013). As noted in the introduction, those who engage in IPV are known to be more likely to engage in both violence toward other family members and violence outside of the family, meaning that these different forms of violence likely share many common causes and risk factors (Finkelhor et al., 1983; Magdol et al., 1997; Straus et al., 1980/2017). We acknowledge that there are likely to be important differences in how violence in different family relationships should be managed and treated. However, until proven otherwise, there is no reason to assume that the risk of violence in different family relationships cannot be assessed actuarially using a single instrument, as demonstrated by the validity of both the VP-SAFvR and DVSI-R.
While the VP-SAFvR was shown to have predictive validity across demographic subsamples, there were identifiable groups among which it performed less well. The VP-SAFvR has a lower probability of differentiating between reoffenders and non-reoffenders when used with child respondents (having a small to moderate effect size in this group), when compared with use with adult respondents (where there was a moderate to large effect size). The only group for which there was clearly a small effect size was those in same-sex relationships. In this group, the AUC of .57 indicated only a small ability to differentiate between recidivists and nonrecidivists. Moreover, because of the relatively small sample (n = 262), the 95% confidence interval was broad and included .50, suggesting that prediction may not differ from chance for this group. It may be that the VP-SAFvR is simply not suitable for use with same-sex partners, or that predictive performance could be improved in this group by including additional or alternative risk factors. However, given the comparatively limited sample size, this result requires replication and further investigation in other samples of same-sex cases.
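The effect of the small same-sex subsample on the width of the confidence interval can be sketched with the Hanley and McNeil (1982) approximation for the standard error of an AUC. This is an illustrative calculation under stated assumptions, not the method used in this study: the text does not report the number of recidivists in this subgroup, so the overall development-sample base rate of 30.2% is assumed to apply, and the function name is hypothetical.

```python
import math


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate confidence interval for an AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(variance)
    return auc - z * se, auc + z * se


n_total = 262                      # same-sex subsample size, as reported
n_pos = round(n_total * 0.302)     # ASSUMPTION: overall base rate applies to this group
n_neg = n_total - n_pos

ci_low, ci_high = hanley_mcneil_ci(0.57, n_pos, n_neg)
print(f"Approximate 95% CI for AUC = .57: [{ci_low:.3f}, {ci_high:.3f}]")
```

Under these assumptions the approximate interval spans .50, consistent with the conclusion that prediction in this subgroup may not differ from chance. A larger subsample would narrow the interval without changing the point estimate.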
The VP-SAFvR may have performed less well for children and same-sex perpetrators because these groups are less likely to have some of the VP-SAFvR risk factors, meaning that they routinely obtain lower scores (e.g., pregnancy is less likely in same-sex relationships, and children may have had less available time to develop the history of previous offending and family violence required for many risk factors). Although this limits the utility of the VP-SAFvR as a risk assessment instrument per se in these groups, these results do not necessarily mean that the score cannot be used for triage or screening purposes. Rather, it may simply be necessary to adopt a triage threshold of 3 in these groups, to account for their lower VP-SAFvR total scores. Using a threshold score of 3, between two thirds and three quarters of recidivists in these groups would be identified and referred for further assessment, comparable to rates of identification using a threshold of 4 in the other groups. Together, same-sex and child perpetrator cases accounted for only 8% of all FVIs, meaning that the reduced specificity would likely not add dramatically to the overall number of cases screened-in, while ensuring that the majority of appropriate cases are identified for further assessment.
Study Limitations
We were unable to ascertain the interrater reliability of the VP-SAFvR as it was developed from an existing electronic data set rather than relying on researcher coding of data. Moreover, the L17 is a somewhat unreliable data source, due to limitations in both its content and format, and in protocols around its use by police members (State of Victoria, 2016). These limitations would directly affect the content and performance of the VP-SAFvR given it was partially derived from L17 items. That said, it is promising that key risk factors from the L17 that have been identified in other family and IPV risk assessment tools were statistically selected into the VP-SAFvR (e.g., pregnancy, previous violence, controlling behaviors). Risk factors such as the presence of firearms or strangulation were not included in the final VP-SAFvR; however, this may well reflect the fact that these risk factors come from studies of predictors of fatal or near-fatal violence (Campbell, Webster, & Glass, 2009; Campbell et al., 2003), rather than future reports to police of any family violence.
The VP-SAFvR could not be specifically validated for Aboriginal or Torres Strait Islander Australians or culturally and linguistically diverse communities. Victoria Police have no way of reliably ascertaining or recording the ethnicity of perpetrators or victims, meaning that it was impossible for us to select these subgroups for specific validation. Validation in these groups is necessary given evidence that risk assessments validated in majority populations may not be as effective in minority groups (Shepherd & Lewis-Fernandez, 2016).
This study is pseudo-prospective in design, drawing directly on information coded at the time by the responding officer, with cases followed up by researchers over 12 months without prior knowledge of who reoffended. This methodology is stronger than recoding of data from files as in a retrospective study, but still limited, particularly given intervention by police and other agencies would likely have influenced recidivism in some cases, and we were unable to account for this effect. The effects of police actions are evident from the univariate analyses where charges arising from the index FVI were negatively associated with family violence recidivism. While this is obviously a positive for victims and the community, it likely means that the efficacy of the VP-SAFvR is underestimated as some people with scores above the thresholds would have been prevented from reoffending, weakening the predictive validity of the algorithms underlying the tools (Storey et al., 2014; Williams & Stansfield, 2017).
The definition of family violence recidivism was required by our industry partner, Victoria Police, and so differed slightly from that used in studies of the only comparable instrument available, the DVSI-R. Our outcome variable is broader in some ways than that reported in the various studies of the DVSI-R, which used arrest as an indicator of recidivism (Stansfield & Williams, 2014; Williams, 2012; Williams & Stansfield, 2017), while narrower in others, as our outcomes were limited only to incidents involving the same dyad or the perpetrator and a related child. Moreover, while our use of police reports will bear some relationship to risk of experiencing ongoing family violence, police reports will obviously underestimate the true rate of recidivism in the community (Australian Bureau of Statistics, 2017; Tarling & Morris, 2010).
Finally, the VP-SAFvR was not administered as intended in this study. It will be necessary to develop and trial an operational version of the VP-SAFvR that involves developing wording for the items that can be used by police in the field, a system for prorating missing scores, and, potentially, the inclusion of a discretionary score override for users to apply when justified. Each of these changes may affect the performance of the VP-SAFvR when used in practice by police officers rather than in a computer simulation. Further evaluation in a field trial will be necessary to determine the actual validity, and the reliability, of the final instrument.
Future Directions and Conclusion
The specific risk factors identified in the VP-SAFvR may have more relevance to other Australian jurisdictions than risk factors identified in North American samples and included in existing instruments. Nonetheless, like all actuarial instruments, the VP-SAFvR will require further evaluation if considered for use outside of the location where it was developed. While developed in a very large Australian population of family violence cases, it may operate differently in parts of Australia with different demographic characteristics. For example, users in areas with more Aboriginal and Torres Strait Islander citizens, or citizens from other culturally and linguistically diverse backgrounds, may find that the VP-SAFvR performs differently.
These initial results suggest that the VP-SAFvR may be effective as a police screen, but it is not intended to be an instrument that generalizes to every setting. The VP-SAFvR is designed specifically to assist police in identifying cases that are likely to repeatedly come to their attention. In addition to the fact that police reports of family violence represent only a small proportion of all family violence in the community, some victims or perpetrators of family violence may be less inclined to share relevant information with police than with support agency staff. This could lead to differences in ascertainment of risk factors in the different settings, which would affect the predictive performance of the tool. Conversely, the VP-SAFvR relies on data usually only available to police (e.g., criminal history) and so is unlikely to be able to be scored by staff from other agencies who do not have access to similar records.
While the VP-SAFvR is not advised as an appropriate instrument for all settings, it appears that it can perform a specific role for police: ensuring that additional attention is paid to family violence reports in which there is likely to be ongoing harm to victims, including the risk of serious or fatal violence. It is in these cases that police can provide a much-needed preventative response, and it is in these cases that the VP-SAFvR may be able to assist police with evidence-based decision-making. The results of this study are promising, but to determine its true efficacy, the VP-SAFvR will need to be subject to further evaluation in prospective field trials with police responding to active reports of family violence and scoring the VP-SAFvR in real time.
Footnotes
Authors’ Note:
The authors would like to acknowledge Assistant Commissioner Dean McWhirter, Inspector Steven Soden, Det. Sgt. David Galea, Ms. Gordana Letic, Dr. David Ballek, and the members of Victoria Police Family Violence Command for their assistance with this research.
