Persistence and Potential Lethality in Intimate Partner Violence: Evaluating the Concurrent and Predictive Validity of a Dual Risk Assessment Protocol

Abstract

This study seeks to determine the concurrent and predictive validity of a dual risk assessment protocol. It combines the risk of persistence in intimate partner violence (IPV) measured via the Domestic Violence Screening Instrument–Revised (DVSI-R) with supplemental items from the Danger Risk Assessment (DRA) bearing on the risk of potential lethality. We further test whether this assessment protocol reproduces disparities by race and ethnicity found in the larger population. Using a sample of 4,665 IPV male defendants with a female victim, analyses support both types of criterion validity. The DRA risk score is associated with felony charges, incarceration at the initial arrest, and the frequency of subsequent dangerous behavior. Results also suggest minimal predictive bias or disparate impact by race and ethnicity. Incorporating supplemental items bearing on potential lethality risk adds important information concerning the risk management strategies of those involved in IPV.

Keywords

risk assessment, intimate partner violence, femicide

Femicide is a persistent and egregious problem in the United States. Women continue to have a greater risk of being killed by their intimate partners than by any other person (Campbell et al., 2018). Indeed, the most recent 10 years (2008–2017) of homicide data reported to the National Violent Death Reporting System (NVDRS) show that when women are killed, approximately 40% of their deaths were perpetrated by their intimate partners (https://wisqars.cdc.gov:8443/nvdrs/nvdrsDisplay.jsp). The percentage is sharply lower for men killed during this time frame, with about 4% of their deaths perpetrated by intimate partners. The percentages are relatively stable across the 10 years, ranging from 37.5–41.1% for women and 3.4–5.2% for men.

Previous research also shows that half to three quarters of women killed by an intimate partner experienced some form of abuse by that partner prior to the lethal incident. Often the pattern manifests an escalating frequency and severity of abuse over time (Campbell et al., 2007, 2018; Cattaneo & Goodman, 2005; Fox & Fridel, 2017; Hilton & Harris, 2005; Petrosky et al., 2017; Sharps et al., 2003). Still, many women do not recognize their lives are in danger (Nicolaidis et al., 2003). Thus, persisting or escalating intimate partner violence (IPV) may lead to homicide unless women’s exposure to violent partners is reduced. Reduced exposure may come through the timely departure from the relationship (Dugan et al., 2003), the acquisition and effective enforcement of civil restraining orders or criminal protective orders (Buzawa et al., 2015), or identifying the risk of potential life-threatening violence, with the goal of using that information to mobilize some type of preventive intervention (Campbell et al., 2018; Eke et al., 2011).

Achieving this goal requires the development and validation of risk assessment instruments, a topic addressed in previous empirical studies. The literature is rich with examples of empirically validated IPV risk assessment tools focused on the persistence of IPV, including but not limited to the following (for a review, see Messing & Thaller, 2013): Brief Spousal Assault Form for the Evaluation of Risk (B-SAFER; Kropp & Hart, 2004), Domestic Violence Risk Appraisal Guide (DVRAG; Hilton et al., 2008), Domestic Violence Screening Instrument–Revised (DVSI-R; Stansfield & Williams, 2014; Williams, 2012; Williams & Grant, 2006), Ontario Domestic Assault Risk Assessment (ODARA; Hilton et al., 2004), and Spousal Assault Risk Assessment, version 3 (Kropp & Hart, 2015). Just one risk assessment tool is dedicated to life-threatening violence—the Danger Assessment (DA) Guide (Campbell, 2007; Campbell et al., 2009).

Studies blending these two approaches (assessments of persistence with assessments of potential lethality) are scarce to nonexistent. Combining risk of persistence with risk of lethality is critical because focusing solely on assessments of persistence may not capture IPV that escalates to life-threatening proportions. Furthermore, validating a risk assessment protocol that combines these two types of assessments is critical for generating an evidence-based procedure to guide preventive interventions that could save lives, such as effective treatment for batterers and/or effective safety planning for victims.

The use of risk assessment tools in the criminal justice system raises concerns about the introduction of bias in the risk assessment process, particularly for racial and/or ethnic minorities, where risk assessment may be reduced to race assessment depending on what items are included in the risk assessment instrument used (Harcourt, 2015; Starr, 2014, 2015). Although all forms of risk assessment could potentially exacerbate racial disparities in prosecution and sentencing, this assertion remains largely untested in IPV risk assessment research. However, to the extent risk assessment instruments exacerbate racial disparities in outcomes, they must undergo revision to eliminate such bias. Risk assessment instruments must not only be predictive (i.e., accurately differentiate those who are truly “low risk” from those who are truly “high risk”) but also do so equitably across all racial and ethnic groups (Chouldechova, 2017; Dressel & Farid, 2018; Skeem & Lowenkamp, 2016).

The present research addresses these issues by evaluating the implementation of a risk assessment protocol for IPV defendants that uses a risk of persistence instrument (DVSI-R), supplemented by items reflecting the potential lethality of IPV. The identification of the supplemental items drew from previous research on femicide and the DA Guide (Campbell et al., 2003, 2009, 2018). The research includes two key components. First, as described below, we test the concurrent and predictive validity of this “dual” risk assessment protocol. Second, we further test whether this assessment protocol reproduces disparities by race and ethnicity found in the larger population, a critical issue facing any risk assessment used in evidence-based practice (Skeem & Lowenkamp, 2016). The site for the present research was the State of Connecticut, and thus we begin by describing the background that resulted in the implementation of this dual assessment protocol.

Background

Since 2002, the Court Support Services Division (CSSD) of the Judicial Branch in Connecticut has supported the development and validation of the DVSI-R. Statewide use began in 2004 to estimate the likelihood of persistent IPV (Williams, 2008). Linking the risk of persistent IPV (i.e., the DVSI-R) with additional items bearing on the risk of lethal IPV became a pressing issue because the state experienced a significant rise in the number of IPV homicides. The decision to augment the DVSI-R with the risk of potential lethality was presented to the Speaker of the House of Representatives’ Task Force on Domestic Violence in Connecticut in 2010. The main goal of developing a dual risk assessment protocol was to improve the court response to these potentially life-threatening acts of IPV.

The new protocol drew from the research of Campbell and associates (Campbell et al., 2003) with the DA to select items to measure the risk of potential lethality that did not overlap substantially with the items included in the DVSI-R. The DA is used by law enforcement, health care professionals, and domestic violence advocates to determine the danger of women being killed by their partners (dangerassessment.org). The DA has undergone adaptations, with short-form or amended versions developed for use in different settings. As an example, Messing and colleagues (2017) assessed the predictive validity of the DA-5, a short form of the DA developed for use in health care settings with IPV survivors (Snider et al., 2009). The original DA-5 items (Snider et al., 2009) included (a) increase in frequency and severity of IPV, (b) sexual jealousy, (c) survivors’ belief that their partners could kill them, (d) IPV during pregnancy, and (e) use of or threats with a weapon. Analysis by Messing and colleagues (2017) suggested that including a measure of strangulation instead of IPV during pregnancy may provide more accuracy, although the difference was not statistically significant. These results also confirmed that a shorter assessment including strangulation is sufficient for a rapid assessment of homicide potential among IPV survivors.

Drawing from the research of Campbell and associates, five items were selected to measure the risk of potential lethality that did not overlap substantially with the items included in the DVSI-R and that their research found to be predictive of femicide: separation/estrangement, threats to kill, threatened or used a deadly weapon, nonfatal strangulation, and threatened or attempted suicide. These five items form the Danger Risk Assessment (DRA) in Connecticut. The review and selection process involved a collaborative team consisting of the domestic violence researchers and the program officers in Family Services, CSSD of the Connecticut Judicial Branch. The selected items were also reviewed at a statewide meeting of the Family Relations Counselors (FRCs), who conduct the pre-arraignment risk assessments in this state.

The dual risk assessment protocol (i.e., the DVSI-R combined with the DRA) was piloted in five geographic areas (essentially judicial districts) during 2013, with promising results (presented in Campbell et al., 2018). Specifically, defendants scoring two or more on potential lethality items were significantly more likely to be initially arrested or subsequently rearrested on felony charges, incarcerated at the initial arrest or upon subsequent rearrests, or to be engaged in escalating IPV during an 18-month follow-up period (Campbell et al., 2018). The dual risk assessment protocol went statewide in 2014, with additional 18-month follow-up data collected (described below) and analyzed here.

This Study

As previously suggested, the first objective of this study is to determine empirically the concurrent and predictive validity of the DRA over and beyond the estimated effects of the DVSI-R. Cronbach and Meehl (1955) long ago classified these two empirical tests as forms of criterion validity in which data for a test variable and a specified criterion variable are collected on the same research subjects, with the association between these two variables estimated. If the data are collected at the same point in time, the analysis constitutes a test of concurrent validity. If the data on the criterion variable are collected after the test variable, the analysis constitutes a test of predictive validity (Cronbach & Meehl, 1955).

The test for concurrent validation in the present research estimates the relation between the DRA score obtained at the initial intake risk assessment and two criterion outcome measures obtained at that same point in time: receiving felony charges and incarceration (being jailed) at the intake arrest. The predictive validation analysis estimates the relation between the initial intake DRA risk score and criterion outcome measures obtained during an 18-month follow-up period after the initial intake risk assessment, representing the frequency of new felony charges, incarcerations, or subsequent dangerous behavior. Assessing the risk of persistence using the DVSI-R is part of the “business as usual” risk protocol already in place in Connecticut and elsewhere. Hence, the pressing empirical issue is whether the two estimated effects (i.e., concurrent and predictive) of the DRA are independent of the estimated effects of the DVSI-R. If so, the DRA adds vital information to decision-making about the management of IPV cases (i.e., those scoring high may, indeed, reflect potentially life-threatening violence).

To establish the concurrent and predictive validity of the DRA, the following five hypotheses are tested:

Concurrent validity:

Hypothesis 1 (H1): The DRA risk score at initial arrest will be positively and independently associated with felony charges at intake.

Hypothesis 2 (H2): The DRA risk score at initial arrest will be positively and independently associated with being jailed at intake.

Predictive validity:

Hypothesis 3 (H3): The DRA risk score at initial arrest will be positively and independently associated with a higher frequency of felony charges within an 18-month follow-up period after the initial arrest.

Hypothesis 4 (H4): The DRA risk score at initial arrest will be positively and independently associated with a higher frequency of incarceration within an 18-month follow-up period after the initial arrest.

Hypothesis 5 (H5): The DRA risk score at initial arrest will be positively and independently associated with a higher frequency of dangerous behaviors (nonfatal strangulation, death threats, use of or threats with a deadly weapon, attempted or threatened suicide) within an 18-month follow-up period after the initial arrest.

The second objective is to determine whether racial or ethnic bias enters the risk assessment process. Skeem and Lowenkamp (2016) offer specific empirical criteria for addressing this concern. Drawing from The Standards for Educational and Psychological Testing (American Educational Research Association et al., 2014), they recommend empirically assessing risk assessment instruments for predictive bias (differential prediction across racial or ethnic groups) and disparate impact (mean score differences across those groups). Indeed, Skeem and Lowenkamp (2016) maintain that “risk assessment instruments . . . must be empirically examined for both predictive bias and disparate impact. Simply put, risk assessment must be both empirically valid and perceived as morally fair across groups” (p. 685). Finding evidence of either of the two criteria raises questions about whether the risk assessment process is unbiased. By determining whether the relations between the DRA numeric score and the outcome measures of validity vary across racial or ethnic groups, we assess whether predictive bias is present. This is the first assessment of predictive bias conducted with either the DA or modified versions including the DRA.

Method

Sample

This validation study included 4,665 defendants arrested for an act of IPV involving women as victims and men as defendants. The term “defendants” is used because they are assessed prior to adjudication by trained FRCs in Connecticut. In the criminal court, FRCs assess all cases that are referred to Family Services and complete a pre-arraignment intake assessment for continued violence, usually within 24 hr after an arrest. In this process, FRCs review criminal histories, conduct in-depth interviews of defendants, and coordinate with family violence victim advocates who separately interview victims. The results of the victim interviews are released to the FRCs with victim consent. FRCs prepare detailed case assessments and recommendations for the Court pertaining to criminal protective orders (e.g., partial, residential stay away, or full no contact) and placement (prosecution or keeping defendants under family services supervision for further assessment concerning any recommended interventions). Such recommendations are guided (not prescriptively determined) by scores on both the DVSI-R and the DRA, with scoring on two or more DRA items considered “high danger.”

The study includes IPV defendants with initial intake assessments post arrest between July 1, 2014, and December 31, 2014. Intimate partners include spouses and ex-spouses, intimate cohabitants, unmarried couples, those in dating relationships or having a child in common regardless of whether they ever lived together. The DVSI-R, used to estimate the likelihood of IPV persistence, guides these initial intake assessments. This assessment protocol is part of the normal operating procedures for IPV cases and is used by FRCs to inform their recommendations at arraignment, as described above.

The racial and ethnic composition of the sample of defendants consisted of 41% non-Latinx White, 33% African American, and 26% Latinx. Average age was about 37 years, with a range of 19–85 years. As part of a confidentiality agreement with the state, data obtained for the present research were de-identified (no personal identifiers included in the data file).

Outcome Measures

Testing for concurrent and predictive validity requires identifying proxy outcome measures related to potentially lethal IPV, that is, dangerousness. The first measure of IPV dangerousness includes the level of charges for the initial offense (misdemeanor charges = 0, felony charges = 1) at intake (M = 0.27, standard deviation [SD] = 0.44) and the total number of felony charges subsequent to the initial intake and risk assessment (M = 0.33, SD = 0.74, range = 0–8). The second outcome measure is whether defendants were incarcerated (i.e., jailed) at intake for the initial arrest (M = 0.19, SD = 0.40, range = 0–1) or for subsequent rearrests during the 18-month follow-up period (M = 0.22, SD = 0.62, range = 0–6). Both measures reflect more serious, perhaps life-threatening IPV. The predictive validation analysis also includes another outcome measure—persistence in dangerous behavior subsequent to the initial intake and risk assessment (M = 0.49, SD = 1.48, range = 0–18). The forms of such behavior are those included in the DRA: nonfatal strangulation, death threats, use of or threats with deadly weapons, and attempted or threatened suicide. Persistence in these dangerous behaviors is derived by summing their frequency during the 18-month follow-up period after the initial arrest and risk assessment. Hence, the empirical question is whether higher scores on the DRA at the initial arrest and risk assessment are associated with a higher frequency of death threats, use of or threats with a deadly weapon, nonfatal strangulation, and/or attempted or threatened suicide after the initial arrest and risk assessment.

Measures of Risk: Potential Lethality and Persistence

Data for the five DRA items (discussed earlier) were collected by FRCs at the intake risk assessment immediately after the initial arrest. Data collection involved interviews with male defendants and female victims. For example, data on use of or threats with a deadly weapon were obtained by asking defendants, “have you ever threatened to use or actually used a potentially deadly weapon against your partner?,” and victims were asked, “has your partner ever threatened to use or actually used a potentially deadly weapon against you?” If a “yes” response was given by either the defendant or the victim, the item was scored as “1.” If a response was “no” by both the defendant and the victim, the item was scored as “0.” FRCs were instructed to include any type of firearm, knife or sharp object, or any explosive device as a deadly weapon. The total DRA score is calculated by summing across the five items to yield a total score ranging from 0–5 (M = 1.09, SD = 1.19). We should note that the five potential lethality items are administered as a supplement to the DVSI-R only to IPV cases, with the DVSI-R administered to all domestic violence cases (e.g., violence against family and/or other household members in addition to intimate partners) in Connecticut.
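The scoring rule described above can be illustrated with a minimal sketch. It assumes that the either-party rule given for the weapon item applies to all five items and that an unanswered item counts as "no"; the item identifiers and function names are hypothetical and are not taken from the CSSD protocol.

```python
from typing import Dict

# The five DRA items named in the text; identifier spellings are hypothetical.
DRA_ITEMS = (
    "separation_estrangement",
    "threats_to_kill",
    "deadly_weapon_threat_or_use",
    "nonfatal_strangulation",
    "suicide_threat_or_attempt",
)

# Scoring on two or more DRA items is described in the text as "high danger".
HIGH_DANGER_CUTOFF = 2


def score_dra(defendant: Dict[str, bool], victim: Dict[str, bool]) -> int:
    """Sum the five items; an item scores 1 if either party answered yes."""
    total = 0
    for item in DRA_ITEMS:
        # Assumption: a missing answer is treated as "no".
        if defendant.get(item, False) or victim.get(item, False):
            total += 1
    return total


def is_high_danger(total: int) -> bool:
    """Apply the two-or-more cutoff to a DRA total score."""
    return total >= HIGH_DANGER_CUTOFF


defendant_answers = {"threats_to_kill": False, "deadly_weapon_threat_or_use": False}
victim_answers = {"threats_to_kill": True, "nonfatal_strangulation": True}
total = score_dra(defendant_answers, victim_answers)
print(total, is_high_danger(total))
```

In this hypothetical case the victim reports two items that the defendant denies or does not mention, so the total is 2 and the case meets the cutoff.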

The risk of persistence is measured by the DVSI-R, which was empirically evaluated for criterion validity in a number of studies (Stansfield & Williams, 2014, 2021; Williams, 2012; Williams & Grant, 2006; Williams & Stansfield, 2017). The DVSI-R includes 11 items. Seven address the behavioral history of defendants (prior nonfamily assaults, arrest, or criminal conviction; prior family violence assaults, threats, or arrests; prior family violence intervention or treatment; prior violation of orders of protection or court supervision; prior or current verbal or emotional abuse; the frequency of family violence in the previous 6 months; and escalation of family violence in the previous 6 months). The other four items include substance abuse, any objects used as weapons, children present at the scene of the IPV, and employment status. Each item estimates the intensity of the risk factor, not just the presence, where scores ranged from 0–2 or 0–3, depending on the item. The total numeric DVSI-R risk scores range from 0–28 (M = 10.37, SD = 5.33). Internal consistency is alpha = .70. This study uses a modified version of the DVSI-R. Employment status was removed and used as a separate covariate in the analysis. Thus, the modified version has 10 items, with the range being 0–26 (M = 9.47, SD = 5.08). Internal consistency with employment status removed remains as alpha = .70.

Covariates

Four covariates are included in the analysis; all pertain to characteristics of defendants: race, ethnicity, age, and employment status of defendants. Race is measured as a dummy variable with 1 = African American and 0 = others, and ethnicity is also measured as a dummy variable with 1 = Latinx and 0 = others. The age distribution is subdivided into quartiles representing four defendant age categories: 19–27 years, 28–34 years, 35–44 years, and 45 years or older. Employment status is measured as 0 = employed (40.1%), 1 = uncertain or part-time (29.9%), and 2 = unemployed (30.0%).

Analysis Plan

Recall that the first objective of the present research is to determine the concurrent and predictive validity of the DRA independent of the DVSI-R. The analysis bearing on concurrent validity estimates the relation between the DRA and the two outcome measures, felony charges (1 = Yes, 0 = No) and incarcerated at intake (1 = Yes, 0 = No), at the time of the initial intake risk assessment. Control variables include the DVSI-R total score and the covariates described above. Because the outcome measures are dummy variables, estimation involved logistic regression (see Table 1). The analysis bearing on predictive validity estimates the relation between the DRA and the frequency of felony charges, the frequency of incarceration, and the frequency of dangerous behaviors subsequent to the initial intake risk assessment, that is, during the 18-month follow-up period (see Table 2). Covariates are the same as those in the concurrent validity analysis. The outcome measures in the predictive validity analysis are counts with a strong positive skew. Accordingly, negative binomial estimation procedures are used. Fitting these models shows that the over-dispersion parameter alpha is significantly different from zero, indicating that this estimation procedure is preferable to estimating Poisson models.
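A rough sense of why a Poisson specification would fit poorly can be obtained from the descriptive statistics reported under Outcome Measures. The sketch below compares each follow-up outcome's variance with its mean and computes a method-of-moments dispersion value under the common NB2 variance form, Var(Y) = mu + alpha * mu^2. This is an unconditional check that ignores the covariates; it is not the model-based alpha estimated in the analysis, and the function names are illustrative.

```python
# Means and SDs of the three follow-up outcomes, as reported under Outcome Measures.
outcomes = {
    "felony_charges": (0.33, 0.74),
    "incarcerations": (0.22, 0.62),
    "dangerous_behaviors": (0.49, 1.48),
}


def dispersion_ratio(mean: float, sd: float) -> float:
    """Variance-to-mean ratio; a Poisson distribution implies a ratio of 1."""
    return sd ** 2 / mean


def moment_alpha(mean: float, sd: float) -> float:
    """Method-of-moments alpha, assuming Var(Y) = mu + alpha * mu**2 (NB2 form)."""
    return (sd ** 2 - mean) / mean ** 2


for name, (mean, sd) in outcomes.items():
    ratio = dispersion_ratio(mean, sd)
    alpha = moment_alpha(mean, sd)
    print(f"{name}: variance/mean = {ratio:.2f}, moment alpha = {alpha:.2f}")
```

For all three outcomes the variance exceeds the mean, which is the pattern a negative binomial model accommodates and a Poisson model does not.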

Table 1.

Concurrent Validation of the DRA Total Risk Score Controlling for Covariates (Defendant Characteristics) and the DVSI-R Total Risk Score: N = 4,665.

	Felony charges		Incarceration at intake
	Model 1	Model 2	Model 1	Model 2
Variable	OR (SE)	OR (SE)	OR (SE)	OR (SE)
African American	1.18* (0.09)	1.12 (0.09)	1.91** (0.18)	1.82** (0.19)
Latino	1.24** (0.11)	1.31** (0.12)	1.66** (0.17)	1.78** (0.19)
Age categories	0.91** (0.03)	0.85** (0.03)	1.13 (0.04)	1.06 (0.14)
Employment status	1.18** (0.05)	1.01 (0.04)	2.10** (0.10)	1.84** (0.10)
DRA risk score	1.43** (0.03)	1.19** (0.04)	1.41** (0.04)	1.14** (0.04)
DVSI-R risk score		1.14** (0.01)		1.17** (0.01)

Note. DRA = Danger Risk Assessment; DVSI-R = Domestic Violence Screening Instrument–Revised; OR = odds ratio; SE = standard error.

*p ≤ .05. **p ≤ .01.

Table 2.

Predictive Validation of the DRA Total Risk Score Controlling for Covariates (Defendant Characteristics) and the DVSI-R Total Risk Score: N = 4,665.

Variable	Frequency of felony offense		Frequency of new incarceration		Frequency of dangerous behavior
	Model 1	Model 2	Model 1	Model 2	Model 1	Model 2
	IRR (SE)	IRR (SE)	IRR (SE)	IRR (SE)	IRR (SE)	IRR (SE)
African American	1.11 (0.09)	1.08 (0.08)	1.46** (0.14)	1.42** (0.14)	1.44** (0.16)	1.38** (0.15)
Latino	1.03 (0.09)	1.06 (0.09)	1.31** (0.14)	1.36** (0.14)	1.10 (0.13)	1.15 (0.13)
Age categories	0.88** (0.02)	0.85** (0.03)	0.88** (0.03)	0.84** (0.03)	0.94 (0.04)	0.90** (0.04)
Employment status	1.20** (0.05)	1.12** (0.04)	1.44** (0.07)	1.31** (0.07)	1.21** (0.07)	1.15* (0.06)
DRA risk score	1.16** (0.03)	1.04 (0.03)	1.19* (0.04)	1.04 (0.04)	1.65** (0.06)	1.47** (0.06)
DVSI-R risk score		1.07** (0.01)		1.09** (0.01)		1.08* (0.01)

Note. DRA = Danger Risk Assessment; DVSI-R = Domestic Violence Screening Instrument–Revised; IRR = Incident Rate Ratio; SE = standard error.

*p ≤ .05. **p ≤ .01.

The second objective is to determine whether racial or ethnic disparities bias the risk assessment process. Following Skeem and Lowenkamp’s (2016) recommendations, this is done by testing for predictive bias and disparate impact. Predictive bias is assessed by determining whether the estimated relations between the DRA total numeric risk score and the outcome measures bearing on concurrent and predictive validity varied across racial or ethnic groups. Estimation of predictive bias involves calculating interaction terms, which are the cross-product between the DRA total numeric risk score and racial or ethnic group membership (one cross-product for each group) and adding these calculated measures to equations estimating concurrent or predictive validity (see Table 4). Statistically significant interaction terms are indicative of predictive bias—estimated effects of the DRA are different for each group. Insignificant interaction terms are indicative of similar estimated relations across racial and ethnic groups, yielding no evidence of predictive bias.
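The construction of the cross-product terms can be sketched as follows. The predictor names are hypothetical, and the 1–4 numeric coding of the age quartiles is an assumption (the text gives the quartile boundaries but not the codes).

```python
from typing import Dict


def design_row(dra: int, african_american: int, latinx: int,
               age_category: int, employment: int) -> Dict[str, int]:
    """Build one defendant's predictors, including the two cross-product terms."""
    return {
        "african_american": african_american,  # 1 = African American, 0 = others
        "latinx": latinx,                      # 1 = Latinx, 0 = others
        "age_category": age_category,          # age quartile (1-4 coding assumed)
        "employment": employment,              # 0 employed, 1 uncertain/part-time, 2 unemployed
        "dra": dra,                            # DRA total score, 0-5
        # Interaction terms: DRA score multiplied by each group dummy.
        "dra_x_african_american": dra * african_american,
        "dra_x_latinx": dra * latinx,
    }


row = design_row(dra=3, african_american=0, latinx=1, age_category=2, employment=1)
print(row["dra_x_african_american"], row["dra_x_latinx"])
```

Because both dummies equal 0 for non-Latinx White defendants, both cross-products are 0 for that group, which makes it the reference category against which the interaction terms are tested.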

Disparate impact is assessed by examining the distribution of racial and ethnic groups across scoring categories of the DRA, along with the mean DRA scores for each racial and ethnic group (see Table 3). Similar distributions and mean scores suggest minimal group differences, with substantially different distributions and mean scores suggesting disparate impact. As Skeem and Lowenkamp (2016) note, evidence of disparate impact in itself does not necessarily mean that a risk assessment instrument is biased because such differences might reflect actual group differences in recidivism risk. However, use of such an instrument might be perceived as morally unfair.

Table 3.

Distribution of Defendants on the DRA Total Risk Score by Race and Ethnicity: Testing for Disparate Impact.

DRA total risk score	African American	Latinx	Non-Latinx White	Total
	N (%)	N (%)	N (%)	N (%)
0	644 (35.3)	458 (25.1)	723 (39.6)	1,825 (100)
1	493 (32.3)	418 (27.4)	617 (40.3)	1,528 (100)
2 or higher	390 (29.7)	329 (25.1)	593 (45.2)	1,312 (100)
Total	1,527 (32.8)	1,205 (25.8)	1,933 (41.4)	4,665 (100)
Gamma coefficient	−.08	.01	.07
M DRA risk score	1.00	1.09	1.15
SD	1.14	1.18	1.22

Note. DRA = Danger Risk Assessment; SD = standard deviation.

Results

The results of estimating the logistic regression models bearing on concurrent validity are presented in Table 1. The focus is on the estimated effects of the DRA risk score and whether the effects vary when the DVSI-R risk score is added to the models. However, a brief summary of the covariates is warranted. The DVSI-R risk score has statistically significant and positive estimated effects on the odds of felony charges and the odds of incarceration at the initial arrest. Furthermore, the indicators of race, ethnicity, and employment status all have statistically significant and positive associations with both felony charges and incarceration at the initial arrest. However, the estimated effects of race and employment status on felony charges become statistically insignificant when the DVSI-R risk score is controlled. Age is statistically significant and negatively associated with felony charges but has no statistically significant estimated effects on incarceration at intake.

The results for the DRA risk score can be summarized succinctly. Its estimated effects on both felony charges and incarceration at the initial arrest are statistically significant, with a one-unit increase of the DRA risk score associated with a 43% increase in the odds of felony charges (OR = 1.43, confidence interval [CI] = [1.37, 1.49]) and a 41% increase in the odds of incarceration (OR = 1.41, CI = [1.33, 1.49]). Estimated effects of the DRA risk score remain statistically significant when the DVSI-R risk score is added to the models for felony charges and incarceration at the initial arrest, but these estimated effects are attenuated. A one-unit increase of the DRA risk score drops to a 19% increase in the odds of felony charges (OR = 1.19, CI = [1.11, 1.27]) and a 14% increase in the odds of incarceration at intake (OR = 1.14, CI = [1.06, 1.22]).

The results of estimating negative binomial models bearing on predictive validity are reported in Table 2. Summarizing the results of the covariates briefly, the DVSI-R risk score has statistically significant and positive estimated effects on the frequency of subsequent felony charges, the frequency of incarceration at rearrest, and the frequency of subsequent dangerous behavior. The indicators of race and ethnicity have statistically significant estimated effects on the frequency of incarceration upon rearrest before and after the DVSI-R risk score is added to the models. The estimated effects of race are also statistically significant and positive in the models for the frequency of dangerous behavior, but ethnicity is insignificant in the models. Both race and ethnicity are statistically insignificant in the models for the frequency of subsequent felony charges. The indicator of employment status consistently has statistically significant and positive estimated effects, regardless of whether the DVSI-R risk score is excluded or included in the models. Age has statistically significant and negative estimated effects in all models except Model 1 for the frequency of dangerous behavior, where its estimated effect is negative but not statistically significant.

The estimated effects of the DRA total numeric risk score on the frequency of subsequent felony charges and the frequency of incarceration at rearrest are statistically significant and positive. However, the estimated effects become insignificant when the DVSI-R risk score is added to the models. Finally, the DRA risk score significantly predicts the subsequent frequency of dangerous behavior, with a one-unit increase of the DRA risk score associated with a 65% increase in the subsequent frequency of this behavior (Incident Rate Ratio [IRR] = 1.65, CI = [1.53, 1.77]). Once again, the estimated effect declines but remains statistically significant when the DVSI-R risk score is controlled, with a one-unit increase of the DRA risk score associated with a 47% increase in the frequency of subsequent dangerous behavior (IRR = 1.47, CI = [1.35, 1.59]).
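The percentage changes and confidence intervals quoted above can be related to the tabled ratios and standard errors with a short calculation. The sketch assumes that the reported SE is on the ratio scale and that each interval is a symmetric 95% normal approximation (ratio ± 1.96 × SE); the text does not state how the intervals were computed, but under these assumptions the bounds agree with the quoted values.

```python
Z_95 = 1.96  # two-sided 95% normal critical value (assumed confidence level)

# (label, ratio, SE) from Tables 1 and 2; Model 1 omits and Model 2 includes the DVSI-R.
estimates = [
    ("felony charges at intake, Model 1 (OR)", 1.43, 0.03),
    ("felony charges at intake, Model 2 (OR)", 1.19, 0.04),
    ("incarceration at intake, Model 1 (OR)", 1.41, 0.04),
    ("incarceration at intake, Model 2 (OR)", 1.14, 0.04),
    ("dangerous behavior, Model 1 (IRR)", 1.65, 0.06),
    ("dangerous behavior, Model 2 (IRR)", 1.47, 0.06),
]


def percent_change(ratio: float) -> int:
    """Percentage change in the odds (or rate) per one-unit DRA increase."""
    return round((ratio - 1.0) * 100)


def wald_interval(ratio: float, se: float) -> tuple:
    """Symmetric interval on the ratio scale, rounded to two decimals."""
    return (round(ratio - Z_95 * se, 2), round(ratio + Z_95 * se, 2))


for label, ratio, se in estimates:
    low, high = wald_interval(ratio, se)
    print(f"{label}: {percent_change(ratio)}% increase, CI = [{low}, {high}]")
```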

Empirical results bearing on the issue of disparate impact are displayed in Table 3. It shows the race and ethnic distributions across the categories of the DRA total numeric risk score, collapsed into three categories: zero, one, and two or more. Collapsing the upper end of the distributions is done because of their positive skew. Observe that the percentage in each of the scoring categories of the DRA for each of the three groups (African American, Latinx, and non-Latinx White) varied only slightly around each group’s respective composition of the total sample. Specifically, the range for African American is 30–35%, with the percentage in the total sample being 33%, Latinx is 25–27%, with the sample percentage being 26%, and non-Latinx White is 40–45%, with 41% of the sample being in this group. Indeed, the gamma coefficients suggest that the statistical association between the DRA risk scoring categories and race or ethnicity approaches zero. Moreover, the mean scores on the DRA for each group approximate each other, ranging from 1.00 for African Americans to 1.15 for non-Latinx Whites, with Latinx in the middle (1.09). In short, Table 3 shows no evidence of disparate impact by race or ethnicity.
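The gamma coefficients can be approximated directly from the Table 3 counts. The sketch below assumes that each coefficient is a Goodman-Kruskal gamma computed on a 2 × 3 table contrasting one group with all other defendants across the three ordered DRA categories; the text does not spell out this setup, but under that assumption the rounded values match those reported in the table.

```python
from typing import Sequence

# Counts from Table 3, ordered by DRA category: 0, 1, 2 or higher.
COUNTS = {
    "African American": (644, 493, 390),
    "Latinx": (458, 418, 329),
    "non-Latinx White": (723, 617, 593),
}
ROW_TOTALS = (1825, 1528, 1312)  # all defendants in each DRA category


def gamma_2_by_k(others: Sequence[int], group: Sequence[int]) -> float:
    """Goodman-Kruskal gamma for a 2 x k table with ordered columns.

    Rows are ordered (others, group). A pair is concordant when the group
    member falls in a higher DRA category than the non-member, discordant
    when the group member falls in a lower one; ties are ignored.
    """
    concordant = 0
    discordant = 0
    for i, n_other in enumerate(others):
        for j, n_group in enumerate(group):
            if j > i:
                concordant += n_other * n_group
            elif j < i:
                discordant += n_other * n_group
    return (concordant - discordant) / (concordant + discordant)


gammas = {}
for name, group in COUNTS.items():
    others = [total - g for total, g in zip(ROW_TOTALS, group)]
    gammas[name] = gamma_2_by_k(others, group)
    print(f"{name}: gamma = {gammas[name]:.2f}")
```

A negative value indicates that members of the group are slightly less likely than other defendants to fall in the higher DRA categories; values this close to zero indicate a negligible association.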

Table 4 presents the results of testing for predictive bias. The models estimated are identical to those for tests of concurrent and predictive validity, although the DVSI-R total numeric risk score is not included. As noted in the footnote of Table 4, the results are the same when the DVSI-R is included. More importantly, the models include terms for the interaction between the DRA total numeric risk score and the racial and ethnic groups. Estimating this interaction bears directly on the issue of predictive bias: a significant interaction would indicate that the DRA score predicts the outcome differently across groups. In brief, none of the interaction terms is statistically significant. Thus, Table 4 shows no evidence of predictive bias.
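The logic of this test can be written out as a model equation. The following is a sketch in our own notation, not the authors' stated specification: the symbols, the choice of non-Latinx White defendants as the reference group, and the covariate vector (age categories and employment status, as listed in Table 4) are assumptions made for illustration.

```latex
% Binary outcomes (concurrent felony charges, incarceration at intake):
\log\frac{\Pr(Y_i = 1)}{1 - \Pr(Y_i = 1)}
  = \beta_0
  + \beta_1\,\mathrm{DRA}_i
  + \beta_2\,\mathrm{AA}_i
  + \beta_3\,\mathrm{Lat}_i
  + \beta_4\,(\mathrm{DRA}_i \times \mathrm{AA}_i)
  + \beta_5\,(\mathrm{DRA}_i \times \mathrm{Lat}_i)
  + \boldsymbol{\gamma}^{\top}\mathbf{x}_i

% Count outcomes (predictive models): replace the left-hand side with
% \log \mathrm{E}[Y_i].

% Predictive bias is assessed through the interaction coefficients:
H_0 :\ \beta_4 = 0 \quad\text{and}\quad \beta_5 = 0

% Reported ratios are exponentiated coefficients:
\mathrm{OR}_k = e^{\beta_k} \ \text{(logit models)}, \qquad
\mathrm{IRR}_k = e^{\beta_k} \ \text{(count models)}
```

Under this formulation, an interaction ratio of 1 (a coefficient of 0) means the DRA score has the same estimated relation to the outcome in that group as in the reference group.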

Table 4.

Testing for Predictive Bias of the DRA Total Risk Score Controlling for Covariates (Defendant Characteristics): N = 4,665.^a

Variable	Felony charges		Incarceration at intake		Dangerous behavior
	Concurrent	Predictive	Concurrent	Predictive	Predictive
	OR (SE)	IRR (SE)	OR (SE)	IRR (SE)	IRR (SE)
African American	1.07 (0.12)	1.11 (0.12)	1.60** (0.22)	1.39* (0.19)	1.67** (0.25)
Latino	1.22 (0.15)	0.99 (0.11)	1.78** (0.26)	1.08 (0.16)	1.10 (0.18)
Age categories	0.91** (0.03)	0.87** (0.03)	1.13** (0.04)	0.88 (0.03)	0.94 (0.04)
Employment status	1.18** (0.05)	1.20** (0.05)	2.11** (0.10)	1.44** (0.07)	1.21** (0.07)
DRA risk score	1.39** (0.06)	1.15** (0.05)	1.37** (0.07)	1.12* (0.06)	1.71** (0.10)
DRA interaction with race	1.08 (0.07)	1.00 (0.06)	1.15 (0.08)	1.03 (0.08)	0.88 (0.08)
DRA interaction with ethnicity	1.02 (0.07)	1.04 (0.07)	0.95 (0.07)	1.16 (0.09)	1.01 (0.09)

Note. DRA = Danger Risk Assessment; OR = odds ratio; SE = standard error; IRR = Incident Rate Ratio; DVSI-R = Domestic Violence Screening Instrument–Revised.

^a The statistically nonsignificant results for the interaction terms between the DRA total risk score and race or ethnicity also occurred when the DVSI-R total risk score was included in the estimation.

*p ≤ .05. **p ≤ .01.
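To read the interaction rows of Table 4, it can help to combine them with the DRA main effect. The sketch below assumes a standard multiplicative interaction coding with non-Latinx White defendants as the reference group, which the table does not state explicitly; the function name is hypothetical. It uses point estimates only, and because none of the interaction terms is statistically significant, the resulting group-specific ratios should not be read as reliably different from the reference-group ratio.

```python
def group_specific_ratio(main_ratio: float, interaction_ratio: float) -> float:
    """Ratio for a one-unit DRA increase within a non-reference group:
    exp(b_main + b_interaction) = main ratio * interaction ratio."""
    return main_ratio * interaction_ratio


# Point estimates copied from Table 4 (DRA risk score row and
# "DRA interaction with race" row).
dra_main = {"felony charges (concurrent)": 1.39,
            "dangerous behavior (predictive)": 1.71}
dra_by_race = {"felony charges (concurrent)": 1.08,
               "dangerous behavior (predictive)": 0.88}

for outcome, main in dra_main.items():
    combined = group_specific_ratio(main, dra_by_race[outcome])
    print(f"{outcome}: reference group {main:.2f}, "
          f"African American defendants {combined:.2f}")
```

For both outcomes the implied ratio is about 1.50, compared with 1.39 and 1.71 in the reference group, differences that the nonsignificant interaction terms indicate are within sampling error.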

Discussion

The first objective of this study is to determine empirically the concurrent and predictive validity of supplementing the risk of persistence in IPV (i.e., the DVSI-R risk score) with items previous studies found to be associated with the risk of IPV becoming potentially life-threatening. The analytical results reported above support both types of criterion validity. Specifically, the DRA total numeric risk score is consistently and significantly associated with the outcome variables reflecting serious IPV, measured at the same time as the initial DRA (i.e., concurrent validity), independent of the DVSI-R risk score and the other covariates.

Furthermore, the DRA risk score is significantly associated with the outcome variables measured subsequent to the initial risk assessment (i.e., predictive validity), although the estimated effects become statistically nonsignificant for the frequency of felony charges and the frequency of incarceration upon rearrest when the DVSI-R total numeric risk score is added to the models estimated. We believe that the charges and incarceration may be more reflective of IPV persistence as opposed to escalation and lethality. By contrast, previous research shows that future dangerous behaviors are part of a pattern of escalating coercive control (e.g., Stansfield & Williams, 2021) that can lead to a higher risk of potential lethality. Supplemental items may therefore enable identification of potential lethality among defendants who score low on the risk of persistence. The findings generally support incorporating the supplemental items bearing on the risk of potential lethality into a dual risk assessment protocol. Doing so adds important information to the decision-making by FRCs concerning placement of IPV defendants, the issuance of protective orders, and other stipulated conditions.

Equally important, no evidence was found that racial or ethnic disparities are reproduced or exacerbated by this dual risk assessment protocol. To our knowledge, this is one of the first tests of racial equity in IPV risk assessment research, and we recommend that evaluations of other IPV risk assessment tools routinely check for predictive bias and differences by race and ethnicity. As noted by Skeem and Lowenkamp (2016), concerns about disparate effects on racial minorities are applicable to all uses of risk assessment in criminal justice system decision-making. Testing for predictive bias is also important to ensure that the best and most valid criminal history markers remain intact for predictive utility. The five items utilized here appear to have great utility in predicting dangerous behavior across groups.

More generally, empirically demonstrating the validity of risk assessment instruments, such as the DVSI-R and the DRA, is critical because they form the cornerstone of decisions about IPV defendants/perpetrators and victims beyond those made in Connecticut. These instruments are used in a variety of contexts to prevent or reduce the likelihood that IPV will escalate to lethal proportions. They inform safety planning in emergency room hospital settings or with other first responders to connect victims with needed services to reduce harm and intervene in current IPV (Campbell et al., 2018; Petrosky et al., 2017). These instruments also are used to hold perpetrators accountable and guide recommendations for levels of supervision and the intensity of preventive interventions. Given the high stakes associated with such decisions, using risk assessment instruments grounded in empirical evidence is truly vital and holds considerable promise to guide strategies, policies, and programs designed to prevent or reduce persistent IPV and the tragic lethality that too often accompanies such egregious behavior.

Although there is wide variation in sample sizes and follow-up periods used in the reporting of concurrent and predictive validity (Singh et al., 2013), this study draws strength from using a large statewide sample of IPV defendants and following their behavior for an average of 18 months after initial arrest. Although IPV is a persistent public health problem in every state, including Connecticut, IPV cases are screened and handled differently from state to state. States also use different risk assessment tools, which limits the generalizability of our results. In addition, the ultimate goal of risk assessment is violence prevention, not just prediction. By focusing here on the relation between calculated risk scores and serious reoffending to test predictive validity, we were unable to consider what happened to defendants during the period between initial arrest and subsequent rearrests and whether decisions about prosecution and supervision mediated the risk–rearrest relation (Williams & Stansfield, 2017). Although we cannot confirm the validity of a dual assessment protocol in other states, the results provide evidence that the addition of items pertaining to potential lethality should provide important information. If this translates into effective case management decisions, some potentially lethal incidents may be prevented. Other jurisdictions, therefore, may explore the value of combining assessment tools to identify cases at risk of lethality, which cannot be done if the focus is solely on the risk of persistence.

There are other limitations to this study that should be acknowledged and addressed in future research. First, the data rely on incidents that were initially brought to the attention of police in Connecticut and resulted in an arrest. The results therefore may not indicate predictive validity among perpetrators of violence whose acts are never reported or whose incidents are not founded. As an example, dangerous acts of violence such as nonfatal strangulation tend to have low rates of arrest and prosecution given the difficulty of proving injury (Pritchard et al., 2017). Second, our study focused on male defendants with female victims. Recent research suggests that the DVSI-R also has predictive validity for female perpetrators of violence in different-sex and same-sex couples (Gerstenberger et al., 2019). Further research should validate the dual assessment protocol with female defendants.

Conclusion

We recommend continued use of the dual risk assessment protocol in Connecticut and recommend validation in other states that are currently using the DVSI-R to guide risk assessment strategies. Screening for potential lethality in addition to violence persistence allows the opportunity for case workers to recommend stronger supervision (including prosecution) for individuals who may score lower on the DVSI-R yet represent a risk for potentially lethal violence. This could protect many vulnerable survivors of IPV from experiencing severe and potentially deadly injury.

Footnotes

Authors’ Note

This research was conducted with assistance from the Judicial Branch, Court Support Services Division (CSSD) of the State of Connecticut.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Richard Stansfield

Author Biographies

Kirk R. Williams is a professor in the department of criminology, law, and society in the School of Social Ecology at UC-Irvine. He has published widely on the determinants of homicide rate variation and the causes and prevention of youth violence and intimate partner violence.

Richard Stansfield is an associate professor of criminal justice at Rutgers University in Camden, NJ. His research focuses on violence, homicide, and the causes and correlates of violent recidivism.

Jacquelyn Campbell is a national leader in research and advocacy in the field of domestic and intimate partner violence, publishing widely on violence and health outcomes.

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). The standards for educational and psychological testing. American Educational Research Association.

Buzawa, E. S., Buzawa, C. G., & Stark, E. D. (2015). Responding to domestic violence: The integration of criminal justice and human services. Sage.

Campbell, J. C. (Ed.). (2007). Assessing dangerousness: Violence by batterers and child abusers. Springer.

Campbell, J. C., Glass, Sharps, P. W., Laughon, & Bloom (2007). Intimate partner homicide: Review and implications of research and policy. Trauma, Violence, & Abuse, 8(3), 246–269.

Campbell, J. C., Messing, J. T., & Williams, K. R. (2018). Prediction of homicide of and by battered women. In J. C. Campbell & J. T. Messing (Eds.), Assessing dangerousness (3rd ed., pp. 107–138). Springer.

Campbell, J. C., Webster, & Glass (2009). The danger assessment: Validity of a lethality risk instrument for intimate partner femicide. Journal of Interpersonal Violence, 24(4), 653–674.

Campbell, J. C., Webster, Koziol-McLain, Block, Campbell, Curry, M. A., et al. (2003). Risk factors for femicide in abusive relationships: Results from a multisite case control study. American Journal of Public Health, 93(7), 1089–1097.

Cattaneo, L. B., & Goodman, L. A. (2005). Risk factors for reabuse in intimate partner violence: A cross-disciplinary critical review. Trauma, Violence, & Abuse, 6(2), 141–175.

Chouldechova (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2), 153–163.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302.

Dressel & Farid (2018). The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4(1), Article eaao5580.

Dugan, Nagin, D. S., & Rosenfeld (2003). Exposure reduction or retaliation? The effects of domestic violence resources on intimate partner homicide. Law & Society Review, 37, 169–198.

Eke, A. W., Hilton, N. Z., Harris, G. T., Rice, M. E., & Houghton, R. E. (2011). Intimate partner homicide: Risk assessment and prospects for prediction. Journal of Family Violence, 26(3), 211–216.

Fox, J. A., & Fridel, E. E. (2017). Gender differences in patterns and trends in U.S. homicide, 1976–2015. Violence and Gender, 4(2), 37–43.

Gerstenberger, Stansfield, & Williams, K. R. (2019). Intimate partner violence in same-sex relationships: An analysis of risk and rearrest. Criminal Justice and Behavior, 46(11), 1515–1527.

Harcourt (2015). Risk as a proxy for race: The dangers of risk assessment. Federal Sentencing Reporter, 27, 237–243.

Hilton, N. Z., & Harris, G. T. (2005). Predicting wife assault: A critical review and implications for policy and practice. Trauma, Violence, & Abuse, 6(1), 3–23.

Hilton, N. Z., Harris, G. T., Rice, M. E., Houghton, R. E., & Eke, A. W. (2008). An indepth actuarial assessment for wife assault recidivism: The Domestic Violence Risk Appraisal Guide. Law and Human Behavior, 32(2), 150–163.

Hilton, N. Z., Harris, G. T., Rice, M. E., Lang, Cormier, C. A., & Lines, K. J. (2004). A brief actuarial assessment for the prediction of wife assault recidivism: The Ontario domestic assault risk assessment. Psychological Assessment, 16(3), 267–275.

Kropp, P. R., & Hart, S. D. (2004). The development of the Brief Spousal Assault Form for the Evaluation of Risk (B-SAFER): A tool for criminal justice professionals. Research and Statistics Division, Department of Justice.

Kropp, P. R., & Hart, S. D. (2015). The Spousal Assault Risk Assessment Guide Version 3 (SARA-V3). ProActive ReSolutions.

Messing, J. T., Campbell, J. C., & Snider (2017). Validation and adaptation of the danger assessment-5: A brief intimate partner violence risk assessment. Journal of Advanced Nursing, 73, 3220–3230.

Messing, J. T., & Thaller (2013). The average predictive validity of intimate partner violence risk assessment instruments. Journal of Interpersonal Violence, 28(7), 1537–1558.

Nicolaidis, Curry, M. A., Ulrich, Sharps, McFarlane, Campbell, et al. (2003). Could we have known? A qualitative analysis of data from women who survived an attempted homicide by an intimate partner. Journal of General Internal Medicine, 18(10), 788–794.

Petrosky, Blair, J. M., Betz, C. J., Fowler, K. A., Jack, S. P. D., & Lyons, B. H. (2017). Racial and ethnic differences in homicides of adult women and the role of intimate partner violence—United States, 2003–2014. Morbidity and Mortality Weekly Report, 66(28), 741–746.

Pritchard, A. J., Reckdenwald, & Nordham (2017). Nonfatal strangulation as part of domestic violence: A review of research. Trauma, Violence, & Abuse, 18(4), 407–424.

Sharps, Campbell, J. C., & Webster, G. D. (2003). Risk mix: Drinking, drug use and homicide. National Institute of Justice Journal, 250, 9–13.

Singh, J. P., Desmarais, S. L., & van Dorn, R. A. (2013). Measurement of predictive validity in violence risk assessment studies: A second order systematic review. Behavioral Sciences & Law, 31, 55–73.

Skeem, J. L., & Lowenkamp, C. T. (2016). Risk, race, and recidivism: Predictive bias and disparate impact. Criminology, 54, 680–712.

Snider, Webster, O’Sullivan, C. S., & Campbell (2009). Intimate partner violence: Development of a brief risk assessment for the emergency department. Academic Emergency Medicine, 16(11), 1208–1216.

Stansfield & Williams, K. R. (2014). Predicting family violence recidivism using the DVSI-R: Integrating survival analysis and perpetrator characteristics. Criminal Justice and Behavior, 41, 163–180.

Stansfield & Williams, K. R. (2021). Coercive control between intimate partners: An application to nonfatal strangulation. Journal of Interpersonal Violence, 36(9-10), NP5105–NP5124. https://doi.org/10.1177/0886260518795175

Starr (2014). Evidence-based sentencing and the scientific rationalization of discrimination. Stanford Law Review, 66, 803–872.

Starr (2015). The new profiling: Why punishing based on poverty and identity is unconstitutional and wrong. Federal Sentencing Reporter, 27, 229–236.

Williams, K. R. (2008). The Domestic Violence Screening Instrument (DVSI and DVSI-R). In B. L. Cutler (Ed.), Encyclopedia of psychology and law (pp. 240–242). Sage.

Williams, K. R. (2012). Family violence risk assessment: A predictive validation study of the Revised Domestic Violence Screening Instrument (DVSI-R). Law and Human Behavior, 36, 120–129.

Williams, K. R., & Grant, S. R. (2006). Empirically examining the risk of intimate partner violence: The Revised Domestic Violence Screening Instrument (DVSI-R). Public Health Reports, 121, 400–408.

Williams, K. R., & Stansfield (2017). Disentangling the risk assessment and intimate partner violence relation: Estimating mediating and moderating effects. Law and Human Behavior, 41, 344–353.