Abstract
This meta-analysis examined whether psychological treatments delivered to adults with histories of violent offending in correctional and forensic mental health settings were related to change in dynamic risk factors. Twenty-two controlled studies (86% comprised male samples) were identified via a comprehensive search. Treatments had a significant effect overall, with small to moderate improvements in trait anger, social problem solving, impulsivity, and general social skill. The treatment effect for antisocial cognitions was nonsignificant. There were a small number of significant moderators, which provide preliminary indications of treatment features that may promote greater change. However, small sample sizes and difficulties disentangling moderating effects preclude drawing firm conclusions. While these results are useful and promising, the field remains limited by few high-quality outcome studies, disparate outcome/measure selection, inconsistent/incomplete reporting of evaluations, and limited information about whether change in dynamic risk factors predicts reductions in recidivism. Suggestions for advancing the field are provided.
Interpersonal violence is a global public health problem, with manifest costs and consequences that span significant financial strain on the criminal justice and health sectors, adverse psychosocial effects for victims and their families, weakened public perceptions of safety, and reduced quality of life for those who perpetrate violence (Ross et al., 2013; Serin et al., 2009). Most violent crimes are perpetrated by a small group of persistently violent individuals (Falk et al., 2014) who have higher rates of both violent and nonviolent recidivism than other justice-involved individuals. These individuals are often imprisoned or, for some who experience mental illness, detained in secure psychiatric facilities for the dual purpose of protecting the community and promoting rehabilitation (Daffern et al., 2017). Recent estimates suggest that people with convictions for violence make up a large proportion (up to 70%) of imprisoned, securely hospitalized, and community-supervised individuals (Sturge, 2018; Völlm et al., 2018).
It is difficult, then, to overstate the value of psychological treatments for people who are violent that lead to meaningful reductions in violent recidivism. However, theoretical and empirical developments in psychological treatments for this population remain relatively limited (Polaschek & Collie, 2004), and high-quality evidence of the impact of treatment is surprisingly scarce (Polaschek, 2019), particularly for violent individuals experiencing mental illness (Morgan et al., 2012; Papalia et al., 2019). In the first meta-analytic review of interventions for adults with violent convictions, Jolliffe and Farrington (2007) could locate only 11 rigorous quasi-experiments and randomized controlled trials (RCTs). Only eight of these evaluations measured the effects of intervention on violent recidivism. Subsequently, Papalia et al. (2019) were able to locate and meta-analyze only 16 rigorous evaluations of the impact of psychological treatments on violent recidivism. Collectively, these two meta-analyses indicate that treatments for violent individuals are effective, with a reduction in violent recidivism of 8 to 10 percentage points in favor of those receiving treatment. Nevertheless, this effect is modest, highlighting the need to improve treatments for people with histories of violent offending.
Understanding the mechanisms of behavioral change is crucial to deciding the appropriate focus and necessary features of treatments for those who use violence, as well as for advancing theories of violence and aggression (Gilbert & Daffern, 2010; Polaschek, 2019). According to the Psychology of Criminal Conduct, which underscores the risk, need, and responsivity (RNR) principles, a reduction in the likelihood of violence occurs via a weakening in the strength of dynamic risk factors (Andrews & Bonta, 2014). Several dynamic risk factors have been elucidated, and some empirical support exists for the proposition that reduction in the presence and relevance of dynamic risk factors is the mechanism of change in violent individuals (e.g., Lewis et al., 2013). As such, targeting dynamic risk factors in treatment has become the fundamental strategy in contemporary violence reduction treatment programs (Daffern et al., 2017; Klepfisz et al., 2016). However, little is known about whether dynamic risk factors actually change in violence treatments, for whom change is most likely to occur, in what settings/contexts change occurs most readily, and which treatment features are most relevant to change (e.g., treatment type, duration, delivery format, content). Ross et al. (2013) conducted a qualitative systematic review (N = 10) of the impact of psychological treatments on violent behavior in clinical and forensic settings. Although outcome measures pertaining to change in dynamic risk factors (or intermediate treatment outcomes) were included in their review, these authors did not systematically analyze the impact of treatments on these intermediate targets. There is meta-analytic evidence to suggest that violence treatments that incorporate components addressing anger control and interpersonal skills are associated with larger reductions in recidivism (Jolliffe & Farrington, 2007; Papalia et al., 2019). We could extrapolate that these findings suggest that anger control and interpersonal skills improve as a consequence of treatment, but without a direct assessment of these intermediate targets, this is ultimately unknown.
This meta-analysis aimed to synthesize the best available research evidence concerning the effects of treatment for adults who have violently offended on intermediate treatment outcomes—that is, the purported dynamic risk factors addressed and measured in each study. Given the absence of previous work in this area, we took a broad approach, focusing on quality evaluations (i.e., quasi-experiments and RCTs) in both correctional and forensic mental health settings, including males and females. Although there are distinctions between justice-involved males and females and those within correctional versus forensic mental health environments, there is evidence demonstrating considerable overlap in dynamic risk factors and thus treatment needs across these populations (Andrews et al., 2012; Skeem et al., 2014, 2016). We reasoned that any meaningful differences in treatment effectiveness by sex and/or setting could be explored via moderator analyses. This meta-analysis aimed to address two specific questions: (a) Are psychological treatments delivered to violent adults in correctional and forensic mental health settings, in custody and in the community, associated with significant change in intermediate treatment outcomes relative to a comparable control condition? (b) Do factors relating to study design features, sample characteristics, and treatment variables moderate the efficacy of these treatments?
Method
This review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (PRISMA; Moher et al., 2009). It forms part of a larger systematic review examining the efficacy of psychological treatments for violent behavior, with results pertaining to the impact of treatments on community recidivism and institutional misconducts already published (Papalia et al., 2019). Aside from the outcomes selected for analysis, the methodology described below is the same as that outlined in Papalia et al. (2019) and is reproduced with minor modifications. The review protocol was registered with PROSPERO International Prospective Register of Systematic Reviews (Registration Number: CRD42017070173).
Eligibility Criteria
All records were required to describe original research and be available in the English language. Given that the RNR model for the treatment of criminal behavior was first formalized in 1990 and because these principles are widely accepted and associated with reductions in recidivism, we excluded studies published prior to 1990. Both published and unpublished research studies were included.
Population
The target population was adults with a history of violent offending, defined by official records or self- or informant-reports. Due to variation in the definition of violence, all definitions were accepted, including those that encompassed acts of sexual violence and domestic violence. However, studies with a sole focus on individuals with histories of sexual or domestic violence were excluded due to concerns around the generalizability of these studies to the broader population of violent individuals. Studies evaluating violence treatments for people with intellectual and/or learning disabilities were excluded. Samples comprising individuals with histories of varied offenses were only eligible for inclusion if (a) there was clear evidence that at least 80% of the sample comprised people with a history of violent offending or (b) fewer than 80% of the sample comprised individuals with past violence, but study outcomes were reported separately for the violent subsample.
Interventions
All types of structured psychological treatments (e.g., cognitive behavior therapy, schema therapy) were eligible for inclusion. Psychological treatments were defined as “talking therapies” that drew upon psychological principles that intervened in the domains of thoughts, feelings, and/or behaviors. The aims of the treatments must have been broadly stated to reduce violent, aggressive, or otherwise antisocial behavior.
Comparators
Studies were eligible for inclusion if the psychological treatment was contrasted with one or more comparable control conditions. RCTs always satisfied this criterion, whereas single-group pretest and post-test designs never met this criterion. Studies that only employed a treatment attrition/drop-out group as the comparator were excluded. Nonrandomized (quasi-experimental) controlled studies were eligible if one of the following conditions was satisfied: (a) matching of the treatment and control groups on a recognized pretreatment risk variable for offending, such as prior offending history or risk of reconviction score; (b) comparability (i.e., no significant differences) between the treatment and control groups on a pretreatment measure of at least one offending behavior outcome variable (e.g., mean number of aggressive incidents in the previous 6 months); or (c) some other demonstration of pretreatment comparability of the treatment and control groups (e.g., comparability on index sentence length, scores on treatment target measures). These criteria are consistent with study designs at Level 3 to Level 5 on the modified University of Maryland Scientific Methods Scale (MSMS; Friendship et al., 2005; Sherman et al., 1998).
Outcomes
Studies were required to measure and report at least one quantitative primary or secondary outcome variable for the treatment and control groups. Primary outcomes were behavioral acts (e.g., criminal recidivism), whereas secondary outcomes were psychological constructs (e.g., anger, impulsivity). Given the number of studies identified and included in the review, primary and secondary outcomes were reported separately. An earlier paper reports results for impacts on violent reoffending (Papalia et al., 2019), whereas this article reports results for changes in intermediate treatment outcomes, that is, within-treatment change, generally measured pretreatment and post-treatment.
All intermediate outcomes examined across treatment and control groups were initially extracted for synthesis in the review. This process initially resulted in 14 outcomes (anger, hostility, impulsivity, criminal attitudes, violent attitudes, violence risk, social problem solving, social functioning, schemata, psychopathology, personality, coping, locus of control, and readiness for change). Eligibility of each outcome was determined, in part, by the number of separate studies reporting data for pretreatment and post-treatment. Although there is no agreed-upon rule for the minimum number of studies to include in meta-analysis, we followed the recommendations of the United States Agency for Healthcare Research and Quality (Fu et al., 2010), which recommends a minimum of four. Seven outcomes (violence risk, schemata, psychopathology, personality, coping, locus of control, and readiness for change) did not meet this criterion and were excluded from the article. Of the remaining outcomes, criminal attitudes, violent attitudes, and hostility were considered to be closely enough related to combine into a single outcome, which we termed antisocial cognitions—defined as cognitions that encourage criminal or harmful behavior. With the exception of one study (Watt & Howells, 1999), all outcomes examining anger included a trait anger measure. As a result, we decided to exclude the study that did not measure trait anger and redefine the outcome as trait anger—a long-term disposition to experiencing anger as a general tendency or in response to provocations (Carroll, 2013). Outcomes grouped under social functioning examined social-emotional sensitivity and expression, interpersonal competence, wellbeing, social competence, assertiveness, and general social skill. These outcomes were considered too diverse to be included in a single analysis. 
However, only general social skill, defined as the absence of problems in social functioning (Tyrer et al., 2005), had the requisite number of studies to be included in the analysis. Impulsivity was defined as a tendency to engage in behavior with little or no planning or forethought. Social problem solving was defined as the extent to which a person attempts to identify or engage in adaptive coping responses for everyday social problems (D’Zurilla & Nezu, 1990).
Included studies varied in when they measured outcomes post-treatment. Timing was unspecified in 12 studies. Outcomes were measured directly following treatment in five studies, within 2 weeks of the end of treatment in three studies, both during and after treatment (combined) in one study, and during treatment, post-treatment, and during follow-up (combined) in one study. A minority of studies (n = 9) reported follow-up measurements taken after post-treatment testing, with follow-up times ranging from 1 to 12 months. We opted to conduct quantitative analyses on effects at pretreatment and post-treatment to assess the effects across consistent follow-up periods. We examined the effect of longer follow-up periods on the overall effect by recalculating meta-analytic models with data from longer follow-up periods substituted for post-treatment measurements and reporting the pooled effect and 95% confidence intervals (CIs).
Settings
Studies conducted in correctional settings (i.e., violent individuals in prison, supervised on community corrections orders, or released on parole) and forensic mental health settings (i.e., violent patients with mental illness in forensic psychiatric inpatient facilities or receiving outpatient treatment) were eligible for inclusion.
Search Strategy
To locate primary studies for the present review, the following electronic databases were searched from January 1, 1990, to July 5, 2017: Cochrane Central Register of Controlled Trials (CENTRAL); PsycINFO; ISI Web of Science (Core Collection); Embase; MEDLINE; Criminal Justice Abstracts; ProQuest (Criminal Justice Database and Sociology Database); EBSCOhost (CINAHL and Health Source: Nursing/Academic Edition); and CINCH Australian Criminology Database. We used a range of search terms and subject headings relating to violence, justice-involved people, treatment, and study design. An example of an executed search strategy appears in the Supplemental Material (Table S1). Google, Google Scholar, and relevant government websites (e.g., Australian Institute of Criminology, Forum for Corrections Research) were searched to identify relevant gray literature and unpublished research. We also hand-searched 11 existing literature reviews and eight key journals (e.g., International Journal of Offender Therapy and Comparative Criminology, Criminal Justice and Behavior) to gather additional records. These additional searches were conducted on August 1 and 2, 2017. Finally, in November 2017, we contacted 16 leading experts in the field to request information about any additional published, unpublished, or in-progress studies.
Study Selection and Data Extraction
Study selection and data extraction were undertaken by N.P. and three research assistants who were students undertaking doctoral degrees. Disagreements over study eligibility or data coding were resolved through discussion, involving authors M.D. and J.R.P.O. when required. After removing duplicate records, N.P. and two research assistants independently screened titles and abstracts of 200 randomly selected records to remove documents that obviously did not meet inclusion criteria. Interrater agreement for these records was “excellent” (Fleiss’ kappa = 0.85; Fleiss, 1981), and the remaining records were divided between research assistants for screening. Next, full-text articles were obtained and assessed against inclusion criteria using a piloted coding form. To ensure consistent decision-making, a random sample of full-text articles (10%) was independently reviewed and assessed for eligibility by author N.P. and two research assistants. Again, interrater agreement was “excellent” (Fleiss’ kappa = 0.78; Fleiss, 1981). The remaining full-text articles were assessed by the research assistants, with N.P. reviewing all inclusion/exclusion decisions. Attempts were made to contact authors where more information was needed to determine eligibility.
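The agreement statistic reported above can be illustrated with a minimal sketch of Fleiss' kappa for a fixed number of raters per record. The function name and the screening decisions below are hypothetical and are not the review's data. The sketch assumes every record was rated by the same number of raters and that at least two categories were actually used.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of category labels
    (one label per rater). Assumes the same number of raters per item."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of ratings")

    counts = [Counter(item) for item in ratings]
    categories = sorted({label for item in ratings for label in item})

    # Observed agreement: proportion of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of ratings in each category.
    shares = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(s * s for s in shares)

    # Undefined (division by zero) if all ratings fall in a single category.
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical include/exclude decisions from three screeners on four records.
example = [
    ["include", "include", "include"],
    ["exclude", "exclude", "exclude"],
    ["exclude", "exclude", "include"],
    ["exclude", "exclude", "exclude"],
]
print(round(fleiss_kappa(example), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why a single disagreement among four records lowers the value noticeably in this toy example.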
We developed a comprehensive coding form prior to commencing data extraction. We piloted the form using a small subsample of included records (n = 4) coded independently by N.P. and two research assistants. Every item was reviewed, discrepancies resolved, and poor coding items revised in an attempt to improve scorer accuracy and agreement. The final coding form included an array of variables grouped into five areas: study/author (e.g., year, geographic location, setting); sample (e.g., sample size, attrition rates, demographics); research design (e.g., study design, analysis type, length of follow-up); treatment (e.g., treatment type and intensity, session format, presence/absence of specific treatment components); and effect size (e.g., sample size in analysis, pretest and post-test means and standard deviations of treatment and control groups). Refer to Papalia et al. (2019) for a complete list of coding items. The remaining documents were assigned to the research assistants to complete data extraction.
Risk of Bias (RoB) in Individual Studies
RoB in included studies was assessed using the domain-based approach described in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins et al., 2011). Domains included selection bias, performance bias, measurement bias, attrition bias, and reporting bias. Each domain was rated as either low/probably low RoB (plausible bias unlikely to seriously alter the results), high/probably high RoB (plausible bias that seriously weakens confidence in the results), or unclear RoB (insufficient information to permit judgment of RoB). Ratings were made at the study level rather than the outcome level, as most studies included multiple outcomes. We also used the MSMS, which has been widely used in criminological research (Friendship et al., 2005; Sherman et al., 1998). The MSMS provides a metric that integrates various methodological study features regarding the validity of causal inferences. RoB and MSMS ratings were independently scored by two research assistants, with disagreements resolved through discussion with N.P. RoB ratings were used to refine data synthesis narratively, whereas MSMS ratings were used in quantitative analyses as a moderator variable.
Synthesis of Results
The five intermediate outcomes (trait anger, impulsivity, social problem solving, antisocial cognitions, and general social skill) were meta-analyzed separately, with some studies contributing effect sizes to more than one outcome. A small number of studies were excluded from the meta-analyses of treatment effects due to insufficient information to calculate an effect size. A narrative summary of these studies was provided to complement the quantitative syntheses. All statistical calculations and analyses were conducted using the metafor package (Viechtbauer, 2010) for the R statistical computing environment (version 3.60; R Core Team, 2018).
The chosen summary effect measure was the standardized mean change (SMC) for pretest post-test control designs (Morris, 2008). The SMC indicates the average difference in change from pretest to post-test between the control and treatment groups. The SMC is expressed in standard deviation units, so that an SMC of one represents a difference of one standard deviation between the control and treatment group in the amount of pretest to post-test change. The SMC was calculated by obtaining mean scores, standard deviations, and number of participants for control and treatment groups at both pretreatment and post-treatment stages for each study. One study (Howells et al., 2005) reported ranges rather than a specific number of participants for control and treatment groups. A sensitivity analysis was employed where four meta-analytic models were compared utilizing a different n for the control and treatment groups based on the minimum and maximum of the range. Differences across the models were trivial, and a random selection process was used to select the final model for analysis.
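To make the effect measure concrete, the following is a minimal sketch of one standardized mean change estimator for pretest post-test control designs. It assumes the variant described by Morris (2008) that standardizes by the pooled pretest standard deviation and applies a small-sample bias correction. The source does not state which variant or which software option was used, so the function name, the choice of variant, and the input values are illustrative assumptions.

```python
import math


def smc_pretest_posttest_control(
    m_pre_t, m_post_t, sd_pre_t, n_t,
    m_pre_c, m_post_c, sd_pre_c, n_c,
):
    """Standardized mean change for a pretest post-test control design.

    Difference in pre-to-post change (treatment minus control), divided by
    the pooled pretest SD and multiplied by a small-sample correction.
    """
    df = n_t + n_c - 2
    sd_pre_pooled = math.sqrt(
        ((n_t - 1) * sd_pre_t ** 2 + (n_c - 1) * sd_pre_c ** 2) / df
    )
    bias_correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    change_difference = (m_post_t - m_pre_t) - (m_post_c - m_pre_c)
    return bias_correction * change_difference / sd_pre_pooled


# Hypothetical anger scores: treatment drops 5 points, control drops 1 point.
d = smc_pretest_posttest_control(
    m_pre_t=30.0, m_post_t=25.0, sd_pre_t=10.0, n_t=20,
    m_pre_c=30.0, m_post_c=29.0, sd_pre_c=10.0, n_c=20,
)
print(round(d, 3))
```

A negative value here means the treatment group's scores fell more than the control group's, which matches the direction of the trait anger and impulsivity effects reported in the Results.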
Studies differed in the extent to which they reported both subscale and total scores for outcome measures, total scores only, or subscale scores only. For the purpose of parsimony, we relied on total scores where possible. In studies where only subscale scores were reported, the results for each subscale were aggregated following the procedure outlined by Borenstein (2009) and Gleser and Olkin (1994). The aggregation procedure utilized requires an estimate of the correlation between subscales, which was not reported in any of the relevant studies. To account for this, a sensitivity analysis was employed with aggregation performed at strengths ranging from r = .4 to .9 at intervals of r = 0.1 for each aggregation. Five models that varied in terms of the strength of the correlation within the aggregated effects were calculated. Differences between models were trivial, and a random selection process was used to determine which model to use for reporting. Finally, when studies analyzed outcomes using both intent-to-treat and completer-only analyses, we selected the more conservative intent-to-treat analysis in the calculation of effect sizes.
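The aggregation step and its sensitivity analysis can be sketched as follows, assuming the composite-of-correlated-outcomes formula given by Borenstein (2009): the composite effect is the mean of the subscale effects, and its variance depends on the assumed correlation between subscales. The function name, the subscale values, and the use of a single common correlation for all subscale pairs are assumptions made for illustration.

```python
import math


def aggregate_subscales(effects, variances, r):
    """Combine correlated subscale effect sizes into one composite.

    Returns (mean effect, variance of the mean), where the variance is
    (1/m)^2 * [sum of variances + sum over i != j of r * sqrt(V_i * V_j)].
    """
    m = len(effects)
    mean_effect = sum(effects) / m
    total = sum(variances)
    for i in range(m):
        for j in range(m):
            if i != j:
                total += r * math.sqrt(variances[i] * variances[j])
    return mean_effect, total / m ** 2


# Hypothetical subscale effects and sampling variances from a single study.
subscale_effects = [-0.2, -0.4]
subscale_variances = [0.04, 0.04]

# Sensitivity grid over the assumed correlation range stated in the text.
for r in (0.4, 0.5, 0.6, 0.7, 0.8, 0.9):
    effect, variance = aggregate_subscales(subscale_effects, subscale_variances, r)
    print(f"r = {r:.1f}: effect = {effect:.3f}, variance = {variance:.4f}")
```

The composite effect does not depend on the assumed correlation. Only its variance does, so a higher assumed correlation gives the study less weight in the pooled estimate.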
For each outcome, individual effect sizes were pooled across studies using the inverse variance random-effects model. To provide a measure of precision, we report the 95% CI estimate of the pooled effect for each outcome. We examined heterogeneity in the effect sizes through the Q (χ²) statistic and the I² index reflecting the percentage of total dispersion across studies that is due to true variance rather than error variance, with I² values of 25%, 50%, and 75% judged as low, moderate, and high, respectively (Higgins et al., 2003). We also calculated a 95% prediction interval for each outcome, which estimates where the true effects are expected to be for 95% of similar studies that might be undertaken in the future (Borenstein et al., 2017). Standardized residuals were calculated to identify studies having an undue influence on the pooled effect. In all the meta-analyses conducted, there were no standardized residuals above 1.96, suggesting no evidence of outliers. We also conducted sensitivity analysis using the “one study removed” approach. Finally, to examine the impact of possible publication bias, we used Egger’s regression test for funnel plot asymmetry, and both Rosenthal’s (1979) and Orwin’s (1983) fail-safe N methods. Testing of funnel plot asymmetry was not undertaken for outcomes where there were fewer than 10 studies in the meta-analysis.
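The pooling, heterogeneity, and “one study removed” steps can be sketched in a few lines. This is not the authors' code. It assumes the DerSimonian-Laird moment estimator for the between-study variance, since the source does not say which estimator was used, and it uses a normal quantile for the prediction interval to stay within the standard library. A t quantile with k − 2 degrees of freedom, as often recommended for small k, would give a wider interval. All names and data are hypothetical.

```python
import math
from statistics import NormalDist


def pool_random_effects(effects, variances, level=0.95):
    """Inverse-variance random-effects pooling (DerSimonian-Laird tau^2)."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and the I^2 index (percentage of dispersion due to true variance).
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(mu / se)))
    pi_half = z_crit * math.sqrt(tau2 + se ** 2)  # normal approximation
    return {
        "estimate": mu, "se": se, "p": p_value,
        "ci": (mu - z_crit * se, mu + z_crit * se),
        "prediction_interval": (mu - pi_half, mu + pi_half),
        "Q": q, "I2": i2, "tau2": tau2,
    }


def leave_one_out(effects, variances):
    """Re-pool after dropping each study in turn; returns pooled estimates."""
    estimates = []
    for i in range(len(effects)):
        y = effects[:i] + effects[i + 1:]
        v = variances[:i] + variances[i + 1:]
        estimates.append(pool_random_effects(y, v)["estimate"])
    return estimates


# Hypothetical effect sizes and sampling variances for three studies.
y_example = [-0.5, -0.1, -0.3]
v_example = [0.04, 0.04, 0.04]
result = pool_random_effects(y_example, v_example)
print(result["estimate"], result["I2"], leave_one_out(y_example, v_example))
```

The prediction interval is wider than the CI whenever the between-study variance is above zero, because it describes the spread of true effects rather than the precision of the mean.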
Moderator Analysis
Methodology, sample, and treatment-related moderator effects were tested using a series of meta-regressions with single covariates. Subgroups with fewer than two effect sizes were not included in the meta-regressions. Due to large amounts of missing data, inconsistent recording of information within studies, and/or insufficient variation in some variables, we were unable to examine all prespecified moderator variables. The small number of effect sizes available for general social skill precluded any assessment of moderator effects for this outcome.
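A meta-regression with a single covariate amounts to weighted least squares of effect sizes on the moderator, with weights equal to the inverse of each study's total variance. The sketch below is a simplification and not the authors' procedure: it treats the between-study variance (tau2) as a known input rather than estimating it, and the function name, moderator coding, and data are hypothetical.

```python
import math
from statistics import NormalDist


def meta_regression_single(effects, variances, moderator, tau2=0.0):
    """Weighted least squares of effect size on one moderator.

    Weights are 1 / (sampling variance + tau2). Returns the intercept,
    slope, slope standard error, and a two-sided z-test p value.
    """
    w = [1.0 / (v + tau2) for v in variances]
    s_w = sum(w)
    s_wx = sum(wi * xi for wi, xi in zip(w, moderator))
    s_wy = sum(wi * yi for wi, yi in zip(w, effects))
    s_wxx = sum(wi * xi * xi for wi, xi in zip(w, moderator))
    s_wxy = sum(wi * xi * yi for wi, xi, yi in zip(w, moderator, effects))

    det = s_w * s_wxx - s_wx ** 2
    if det == 0:
        raise ValueError("moderator has no variation across studies")

    slope = (s_w * s_wxy - s_wx * s_wy) / det
    intercept = (s_wxx * s_wy - s_wx * s_wxy) / det
    se_slope = math.sqrt(s_w / det)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(slope / se_slope)))
    return {"intercept": intercept, "slope": slope,
            "se_slope": se_slope, "p": p_value}


# Hypothetical binary moderator (0 = nonrandomized, 1 = randomized design).
fit = meta_regression_single(
    effects=[-0.1, -0.1, -0.5, -0.5],
    variances=[0.04, 0.04, 0.04, 0.04],
    moderator=[0, 0, 1, 1],
)
print(fit)
```

With a binary moderator, the slope is the difference between the two subgroups' pooled effects, which is why subgroups need at least two effect sizes each for the comparison to be meaningful.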
Results
Study Selection and Characteristics
Figure 1 depicts the study selection process and reasons for excluding records. Ultimately, 22 independent studies were retained in the review, with 18 reporting sufficient data for meta-analyses. Table 1 displays the 22 included studies with a combined sample size of 1,969 (mean age = 34.2 years, SD = 4.5, range = 24–42; age data unavailable for one study). Most studies were conducted in the United Kingdom (n = 13, 59.1%), followed by the United States (n = 4, 18.2%), Australia (n = 3, 13.6%), Canada (n = 1, 4.5%), and the Netherlands (n = 1, 4.5%). Most studies examined male-only samples (n = 19, 86.4%). Half of the studies (n = 11, 50.0%) were undertaken in prison settings, 10 (45.5%) in an inpatient forensic mental health environment, and one (4.5%) in a community setting. Treatment types included several versions of the Reasoning and Rehabilitation programs (n = 7, 31.8%), other cognitive behavioral based treatments (n = 4, 18.2%), anger management programs (n = 6, 27.3%), third wave cognitive behavior therapies (i.e., dialectical behavior therapy, schema focused therapy; n = 2, 9.1%), and violence reduction programs (n = 3, 13.6%). The estimated average duration of treatment was 54.3 hr (SD = 65.3, range = 16–300; length of treatment [hours] unavailable for one study). Total attrition rate (comprising both treatment and control groups) across studies was considerable, ranging from 0% to 67.1% depending on the outcome examined. Three studies did not examine the attrition rate.

Figure 1. PRISMA Diagram Depicting the Flow of Studies From Identification to Inclusion
Table 1. Characteristics of Studies Included in the Systematic Review
Note. Tx = treatment; NR = not recorded; STAXI = State-Trait Anger Expression Inventory; NAS = Novaco Anger Scale; BDHI = Buss-Durkee Hostility Inventory; O-H = Overcontrolled Hostility Scale; SFQ = Social Functioning Questionnaire; BIS = Barratt Impulsiveness Scale; CIRCLE = Chart of Interpersonal Reactions in Closed Living Environments; SPSI-R = Social Problem Solving Inventory–Revised; IVE = Eysenck Impulsiveness-Venturesomeness-Empathy Scale; HOS = Hostility Scale; MIS = MI Observation Scale; I-7 = I-7 Impulsivity Questionnaire; AQ-H = Aggression Questionnaire–Hostility subscale; DSM-IV = Diagnostic and Statistical Manual of Mental Disorders IV; MVQ = Maudsley Violence Questionnaire; APQ-I = The Antisocial Personality Questionnaire–Impulsivity subscale; DBSP-SP = Disruptive Behavior and Social Problem Scale–Social and Psychological Functioning subscale.
a Where an outcome was measured by more than one tool, the outcome measure in bold indicates the measure selected for meta-analysis.
b Two components of the therapy were described as not being of definite duration. The value presented is a minimum estimate taking into account the duration of all modules with a definite duration and excluding those of indefinite duration.
RoB
Table 1 groups studies according to their rating on the MSMS, with 15 (68.2%) rated as Level 3 (i.e., unmatched groups or poorly matched groups) and seven (31.8%) rated as Level 5 (i.e., randomly assigned groups). Domain-based RoB ratings are depicted in Figure 2. Selection bias was rated as high/probably high in 77.3% of studies, which often reflected the use of unmatched or poorly matched groups, selection of treatment completers, or, for RCTs, inadequate randomization procedures or concealment of allocations prior to assignment. Performance bias arises when the condition received by study subjects differs substantially from the condition intended or planned and was rated as high/probably high in a large minority of studies (40.9%). For RCTs, this was due to a lack of blinding of participants and personnel to allocated interventions, which could have influenced outcomes. For nonrandomized controlled trials (NRCTs), this was due to ambiguity in the classification of interventions (e.g., treatment completion retrospectively defined as having attended half the planned sessions) and/or deviations from intended interventions (e.g., unbalanced cointerventions across groups, poor adherence). Most studies contained insufficient information to properly rate performance bias (59.1%).

Figure 2. Risk of Bias Graph Showing the Proportion of Included Studies (N = 22) Rated as “Low/Probably Low,” “Unclear,” and “High/Probably High” Risk of Bias Across Five Domains
Measurement bias was rated as high/probably high for most studies (90.9%) due to the use of self-reported or informant-rated data, which could have been influenced by knowledge of the intervention and comparator conditions. Attrition bias was assessed as high/probably high in 50% of studies; reasons included the exclusion of treatment noncompleters in outcome analyses and imbalance in numbers or reasons for missing outcome data across groups. Bias due to selective outcome reporting (reporting bias) was rated by examining a preregistered trial protocol (if available) or a prespecified statistical analysis plan. Around 40% of studies were rated as high/probably high RoB in this domain due to incomplete outcome data resulting in exclusion from meta-analysis, missing prespecified outcomes, or insufficiently detailed statistical analysis plans. All studies were rated as high/probably high RoB in at least one domain (M high/probably high RoB ratings = 3).
Effects of Treatments
Trait Anger
Seventeen studies measured trait anger outcomes, of which 14 were included in the meta-analysis of treatment effects. The excluded studies (Doyle et al., 2016; Heseltine et al., 2010; Kubiak et al., 2015) found small, nonsignificant effects when comparing treated groups with controls on measures of trait anger. The effects in Heseltine et al. (2010) and Doyle et al. (2016) favored treatment, whereas the effect in Kubiak et al. (2015) favored controls. Individual effect size data for studies included in the meta-analysis of trait anger outcomes are shown in Figure 3. The pooled SMC was −0.27 (95% CI = [−0.45, −0.10]; p = .002), equivalent to a drop of more than one fourth of a standard deviation relative to controls. The pooled SMC using the “one study removed” approach ranged from −0.21, p = .02 (removing Doyle et al., 2013) to −0.32, p < .001 (removing Bowes et al., 2012). Two studies contained data for an additional follow-up time beyond post-treatment (Cullen et al., 2012; Forbes, 1990). When utilizing follow-up data in place of post-treatment data for these studies, the pooled SMC was slightly reduced to −0.25 (95% CI = [−0.44, −0.06]).

Figure 3. Individual Effect Sizes for Trait Anger Outcomes
Impulsivity
Eleven studies measured impulsivity, nine of which were included in the meta-analysis of treatment effects (Figure 4). The two excluded studies (Doyle et al., 2016; Serin et al., 2009) found small, nonsignificant effects in favor of controls relative to treated groups on measures of impulsivity. The pooled SMC for impulsivity was −0.32 (95% CI = [−0.61, −0.04]; p = .03), equivalent to a drop of nearly one third of a standard deviation relative to controls. Removing one study at a time, the pooled SMC ranged from −0.22, p = .10 (removing Doyle et al., 2013) to −0.40, p = .009 (removing Bowes et al., 2012). One study contained data for an additional follow-up time beyond post-treatment (Cullen et al., 2012). The pooled SMC was slightly reduced to −0.30 (95% CI = [−0.58, −0.02]) when using follow-up data in place of post-treatment data for this study.

Figure 4. Individual Effect Sizes for Impulsivity Outcomes
Social Problem Solving
All nine studies that measured social problem solving were included in the meta-analysis of treatment effects (Figure 5). The pooled SMC was 0.39 (95% CI = [0.13, 0.64]; p = .003), equivalent to an increase of more than one third of a standard deviation relative to controls. The pooled SMC using the “one study removed” analysis ranged from 0.34, p = .02 (removing Doyle et al., 2013) to 0.44, p = .003 (removing Rees-Jones et al., 2012). Only Cullen et al. (2012) contained follow-up data beyond post-treatment, and when used in place of post-treatment data, the pooled SMC was slightly reduced to 0.35 (95% CI = [0.10, 0.61]).

Figure 5. Individual Effect Sizes for Social Problem Solving Outcomes
Antisocial Cognitions
Of the 13 studies that measured antisocial cognitions, 11 were included in the meta-analysis of treatment effects (Figure 6). Doyle et al. (2016) and Serin et al. (2009) lacked sufficient statistics to calculate the required effect size. Doyle et al. found a small nonsignificant reduction in hostility in favor of the control group, and Serin et al. found a nonsignificant difference between treatment and control groups using repeated measures analysis of variance (ANOVA); the effect size was minuscule, with a partial eta squared of .02. For the remaining 11 studies, the pooled SMC was not significant, −0.19 (95% CI = [−0.41, 0.03]; p = .09). Sensitivity analyses showed the SMC ranged from −0.24, p = .04, on removal of Daffern et al. (2017) to −0.13, p = .30, on removal of Rees-Jones et al. (2012). Three studies contained follow-up data beyond post-treatment (Cullen et al., 2012; Jotangia et al., 2015; Rees-Jones et al., 2012). When utilizing follow-up data in place of post-treatment data, the pooled SMC was slightly reduced to −0.17 (95% CI = [−0.41, 0.06]).

Figure 6. Individual Effect Sizes for Antisocial Cognitions Outcomes
General Social Skill
Four studies measured general social skill outcomes, and all were included in the meta-analysis (Figure 7). The pooled SMC was 0.55 (95% CI = [0.14, 0.96]; p = .008), equivalent to an increase of more than half a standard deviation relative to controls. “One study removed” analyses had the pooled SMC ranging from 0.50, p = .04 (removing Jotangia et al., 2015) to 0.67, p = .007 (removing Davidson et al., 2009). No studies reported follow-up data beyond post-treatment.

Figure 7. Individual Effect Sizes for General Social Skill Outcomes
Publication Bias
Egger’s regression test yielded nonsignificant results for trait anger (z = −0.18) and antisocial cognitions (z = −0.13), indicating a symmetrical dispersion of the effect sizes by standard error. Rosenthal’s (1979) fail-safe N method indicated that 34 studies for trait anger, 22 for social problem solving, 20 for impulsivity, and 7 studies for general social skill with null findings would be required to bring the treatment effect for each outcome below the level of significance. Orwin’s (1983) fail-safe N method indicated that to bring the pooled treatment effect to a trivial size (SMC = ±.10), 28 trait anger, 30 social problem solving, 23 impulsivity, 20 general social skill, and 10 antisocial cognition studies with an SMC of 0 would be required.
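Orwin's fail-safe N follows from a simple averaging argument: if k studies have mean effect d and N further studies with mean effect d_fs are added, the combined mean reaches the criterion d_c when N = k(d − d_c) / (d_c − d_fs). The sketch below implements that formula; the function name `orwin_failsafe_n` is illustrative, and entering the reported random-effects pooled SMC as the mean effect is an assumption, since the mean the authors entered is not stated here.

```python
import math

def orwin_failsafe_n(k, mean_effect, criterion, missing_mean=0.0):
    """Orwin's fail-safe N, working with effect magnitudes.

    Number of additional studies averaging `missing_mean` that would pull the
    mean effect of `k` studies down to `criterion`.
    """
    d = abs(mean_effect)
    d_c = abs(criterion)
    d_fs = abs(missing_mean)
    if d_c <= d_fs:
        raise ValueError("criterion must exceed the mean of the missing studies")
    if d <= d_c:
        return 0  # the mean effect is already at or below the criterion
    n = k * (d - d_c) / (d_c - d_fs)
    # Round away floating-point noise before taking the ceiling
    return math.ceil(round(n, 9))

# Antisocial cognitions: 11 studies, pooled SMC of -0.19, trivial effect of 0.10
print(orwin_failsafe_n(11, -0.19, 0.10))
```

Under this assumption the antisocial cognitions value comes out at 10, in line with the reported figure. The same substitution for the other outcomes gives values a few studies lower than those reported, so the sketch should be read as an illustration of the formula rather than a reproduction of the authors' calculation.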
Moderator Analysis
Results of moderator analyses for methodology, sample, and treatment-related variables can be found in the Supplemental Material (Tables S3–S9). None of the methodology and sample-related covariates were significantly associated with treatment effects. Treatments that were primarily facilitated by individuals with qualifications in psychology (i.e., fully qualified psychologists and trainee psychologists) were associated with improved treatment outcomes for trait anger; however, a lack of sufficient information in studies for other outcomes meant that this moderator could only be examined for trait anger. Findings related to impulsivity indicated that the number of sessions per week was significantly associated with treatment effect, such that an increase in the number of sessions per week was associated with a greater reduction in impulsivity scores relative to controls. Two other moderators associated with impulsivity attained significance. The first, treatment format, indicated that studies involving group therapy only were associated with a greater reduction in impulsivity relative to studies involving group and individual treatment. The second, the presence of a moral/values training component, was associated with a reduction in impulsivity relative to studies that were coded as not containing this treatment component.
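The sessions-per-week finding is the kind of result a meta-regression produces: study effect sizes are regressed on a moderator, with each study weighted by the inverse of its sampling variance. The sketch below assumes a single-covariate, fixed-effect (known-variance) model, which may be simpler than the model the authors fitted, and all data values are hypothetical.

```python
import math
from statistics import NormalDist

def meta_regression(x, y, variances):
    """Inverse-variance weighted regression of effect size y on one moderator x."""
    w = [1 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of moderator
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean effect
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1 / sxx)  # standard error under known sampling variances
    p = 2 * (1 - NormalDist().cdf(abs(slope / se_slope)))
    return intercept, slope, se_slope, p

# Hypothetical studies: sessions per week, impulsivity SMC, sampling variance
sessions_per_week = [1, 1, 2, 2, 3, 4]
smc = [-0.05, -0.15, -0.25, -0.30, -0.50, -0.60]
variances = [0.05, 0.06, 0.04, 0.07, 0.05, 0.08]

intercept, slope, se_slope, p = meta_regression(sessions_per_week, smc, variances)
print(f"slope = {slope:.3f} per extra weekly session (SE = {se_slope:.3f}, p = {p:.3f})")
```

A negative slope here corresponds to larger reductions in impulsivity with more frequent sessions. With only a handful of studies the standard error of the slope is large, which is one reason moderator findings from small meta-analyses are tentative.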
Discussion
This is the first meta-analysis to review the impact of psychological treatments for adults with a history of violent offending on intermediate treatment outcomes, that is, dynamic risk factors assessed and measured in original studies. On average, treatments had a statistically significant effect, with small to moderate improvements on indicators of trait anger, social problem solving, impulsivity, and general social skill. These changes were greater than those observed among untreated violent individuals and are unlikely to be fully accounted for by publication bias.
While the meta-analytic findings are positive and provide support for the notion that certain so-called dynamic risk factors can change during treatment, two important caveats must be considered. First, due to possible client and treatment provider bias, we do not know whether changes on intermediate treatment targets reflect change on the relevant latent constructs. The validity of the self-report of justice-involved populations is one of the perennial problems with evaluating intermediate outcomes for criminal justice and forensic mental health populations. Individuals may be more susceptible to impression management and other response biases at the end of treatment in an attempt to show treatment progress or to secure benefits such as parole (Juarez & Howard, 2018). Clinician-rated measures to assess change may also be subject to confirmation bias because the same raters often conduct both pretreatment and post-treatment assessments and are not blind to what occurred during treatment (De Vries Robbé et al., 2015; Yesberg & Polaschek, 2019). In some cases, assessors have provided treatment.
Second, it was not possible to examine whether intermediate change, on any single attribute or combined, was associated with reductions in violent behavior. The studies did not assess and adequately report both change in the intermediate targets and recidivism. Only four studies included in this meta-analysis overlapped with our earlier meta-analysis focusing on behavioral outcomes (i.e., institutional misconducts and community recidivism; Papalia et al., 2019) and none of them tested whether change in intermediate outcomes predicted change in behavioral outcomes. Even more broadly, the evidence for within-treatment change translating into real-world reduction in violent recidivism is limited and inconsistent (e.g., De Vries Robbé et al., 2015; Lewis et al., 2013; Yesberg & Polaschek, 2019). There is also debate within the field regarding what constitutes a dynamic risk factor and how it should be measured (Douglas & Skeem, 2005; Klepfisz et al., 2016); this is evidenced by the wide variability of measurement instruments used to measure change in the violence treatments included in this analysis. The field needs greater investment in violence treatment evaluations that encompass both pretreatment and post-treatment change (relative to a comparable control condition) and longer term behavioral change.
The results suggest that the pooled treatment effect for antisocial cognitions was smaller than other effects and not statistically significant. Antisocial attitudes/cognitions are regarded as one of the “Big Four” criminogenic needs (Gendreau et al., 1996). It may be that antisocial cognitions are simply harder to change than other intermediate outcomes, that the treatments did not adequately address this construct, or that treatment intensity (M = 61 hr) was insufficient to produce significant change. Although many studies utilized validated tools shown to detect change in antisocial cognitions (e.g., Crime-Pics II; Maudsley Violence Questionnaire), the suitability of other measures as change assessments was unclear. For example, the informant-rated Chart of Interpersonal Reactions in Closed Living Environments (CIRCLE; used by Daffern et al. (2017) to assess interpersonal hostility) has strong predictive validity for violence, but evidence for its suitability as a change instrument is limited. There also remains the possibility that the meta-analysis for antisocial cognitions was simply underpowered to detect small changes, with an average study sample size of 49.
Moderators of Treatment Effects
In contrast with prior meta-analytic work (Gannon et al., 2019; Jolliffe & Farrington, 2007; Papalia et al., 2019), we found very few significant effects for the moderators examined. Finding that a moderator is not significantly related to effect size variation does not mean that there is no relationship, particularly in meta-analytic reviews with few studies, as is the case here. Our meta-regressions were underpowered due to the small number of studies (k) available for each outcome and the small samples within primary studies. Therefore, we recommend caution in interpreting the results of our moderator analyses.
Four treatment variables emerged as significant moderators. Treatments that were primarily facilitated by individuals with qualifications in psychology were significantly associated with improved outcomes for trait anger. Missing data precluded analyses with other outcomes, suggesting that future studies would do well to provide more detailed staffing information (Gannon et al., 2019). The finding is consistent with recent meta-analytic work examining specialist offense programs (Gannon et al., 2019) and highlights the importance of psychological training and expertise to the delivery of effective psychological treatment with violent individuals (Gannon & Ward, 2014). However, results should be interpreted cautiously, given the extent of missing data (50%). Furthermore, a closer examination of studies included in this moderator analysis suggests that program facilitation covaried with treatment intensity. Treatments that were “psychology led” were more intense, comprising 72 treatment sessions on average, compared with 12 treatment sessions on average for those that were not “psychology led.”
The remaining treatment-related moderators were relevant to impulsivity. First, we found a dose–response relationship between the number of treatment sessions delivered per week and the size of reductions in impulsivity relative to controls. This is consistent with other studies and reviews demonstrating a relationship between frequency of sessions and reductions in reoffending (Chitty, 2005; Papalia et al., 2019) and could reflect increased opportunities to efficiently develop a positive therapeutic alliance and early treatment engagement. Alternatively, and consistent with RNR’s risk principle (Andrews & Bonta, 2014), treatments involving more frequent sessions were possibly delivered to higher risk people who have greater potential to achieve benefit from treatment (Olver et al., 2014; Yesberg & Polaschek, 2019).
Group-based treatments (relative to treatments that combined group and individual sessions) and treatments incorporating a moral/values-based treatment component were associated with larger reductions in impulsivity. The apparent superiority of group-only programs was recently found by Gannon et al. (2019) in their meta-analysis of treatment programs for sexual offending behavior. They hypothesized that facilitators of group-only programs were more aware there would be no opportunities for additional individual sessions, which may have led all critical issues to be discussed within the group environment, thus enhancing group cohesion, a key contributor to treatment effectiveness (Burlingame et al., 2011). However, these findings are at odds with our earlier review showing similar reductions in recidivism among treatments that were group only and those that combined individual and group work (Papalia et al., 2019). This could be explained by differences in mean treatment length across the two meta-analyses, with combined group/individual-format programs involving an average of 70 hr of treatment in the present review, compared with 207 hr of treatment in our earlier review (Papalia et al., 2019). It is difficult to interpret the larger effect size found for treatments incorporating moral/values reasoning due to insufficient information about what these components actually involved. Furthermore, not only did the inclusion of a moral/values reasoning component covary with session format (i.e., treatments with a moral/values-based component also tended to be group-only treatments), these two moderators also appeared to covary with sessions per week, making it difficult to determine what is driving the effects.
Given the challenges associated with interpreting the moderator analyses, it might be argued that such analyses are premature. However, the number of studies included in our moderator analyses was comparable to other reviews (Deković et al., 2011; Jolliffe & Farrington, 2007). A major limitation within this field is the small number of studies that exist (cf. Gannon et al., 2019; Jolliffe & Farrington, 2007; Papalia et al., 2019; Polaschek & Collie, 2004). Yet, the number of people with violent convictions in correctional and forensic mental health systems around the world is enormous, leaving service providers and treating practitioners with uncertainty about how to treat persistently violent individuals and whether particular treatments (or components) are worthwhile. Our results provide preliminary indications of treatment features that might promote greater gains. Ultimately, however, there is a critical need for more evaluations that provide comprehensive information on methodology, sample, and treatment characteristics, to determine their impact on treatment effectiveness.
Limitations and Future Directions
Very few of the original studies showed large or significant effects for the outcomes examined. Consequently, the outcomes were well suited to meta-analysis, which, in this case, revealed patterns hidden in the original studies. Nevertheless, there are some limitations associated with our meta-analysis. The first relates to the quality and quantity of the original studies included, given (a) the small number of effect sizes analyzed for each outcome (k range = 4–14), underpowering the moderator analyses; (b) the small number of studies (n = 7) that employed a randomized design, and the small sample sizes of those that did (M = 69); (c) the absence of studies rated as Level 4 on the MSMS, which reflect well-matched treatment and control groups; and (d) the high proportion of studies rated as high/probably high RoB in several domains. Indeed, meta-analyses do not always agree with the results of large RCTs; thus, additional confidence in the findings would come from a subsequent large RCT. While study quality/bias did not significantly moderate treatment effects, the point estimates and CIs suggest a trend toward larger treatment effects for less rigorous studies (see Tables S2–S5). There is a clear need to conduct more high-quality quasi-experimental studies and RCTs, which pay attention to, and seek to minimize, potential sources of bias across key domains, and adequately describe their designs (Polaschek, 2019).
There were many variables we could not examine as possible moderators. For example, we were interested in the role of the baseline individual risk level and motivation to change/treatment readiness as treatment moderators, given their relevance to the RNR model and evidence linking them to treatment outcome (Andrews & Bonta, 2014; Higley et al., 2019). However, moderator analyses were not possible due to missing data and inconsistent approaches to defining and recording these variables. Relatedly, we assumed that if program descriptions did not mention a particular treatment feature, the feature was not present (Papalia et al., 2019). Consequently, where program descriptions lacked detail, it is possible we coded particular treatment features as absent, when in reality they were present. Future meta-analytic reviews would be greatly assisted by study authors improving the quality and detail of methodology and treatment program information contained within their reports.
There were major disparities across studies in the outcomes measured and the tools used to assess them. This meant that (a) the outcomes included in this review were not exhaustive and (b) categorizing tools into meaningful and valid outcome groups was not always straightforward. There was often a paucity of information about the particular treatment program theory and theory of change—without clear, theory-driven hypotheses about the violence-relevant constructs on which change is expected, researchers may succumb to a catch-all approach to outcome/measure selection, an issue that others have raised (Gilbert & Daffern, 2010; Polaschek & Collie, 2004). It is recommended that future evaluations adopt a top-down, theory-led approach to outcome and measure selection. One approach is to conceptualize outcomes (i.e., dynamic risk factors) in terms of broad domains that are theoretically relevant to violence and which subsume a number of more specific risk factors (Klepfisz et al., 2016). Polaschek (2006), for example, identifies four domains frequently targeted in violence reduction treatment: (a) attitudinal factors (e.g., offense-supportive attitudes and cognitive or information-processing biases); (b) impulsivity and self-regulation deficits; (c) affective dyscontrol (e.g., anger, hostility, and poor coping skills); and (d) lifestyle-related needs that also predict offending (e.g., substance abuse, antisocial peers, poor interpersonal skills, and family relationships; see also Klepfisz et al., 2016). The task for researchers then becomes selecting the most appropriate methods and tools for assessing these domains, with decisions ideally guided by theory, the extent to which the tools are reliable and valid for the populations with which they are used, and previous research demonstrating their sensitivity to change and association with violence.
Another issue is that outcome measurements rarely continued beyond the end of the treatment program. This precluded a more in-depth and reliable analysis of whether treatment change persisted beyond the difficult post-treatment period (Polaschek, 2019). Future studies should incorporate post-treatment follow-up assessments for treatment and control groups, particularly in light of recent evidence suggesting that post-treatment change may mediate the relationship between within-treatment change and likelihood of recidivism among individuals with violence convictions (Yesberg & Polaschek, 2019). As mentioned previously, it is imperative that future research link treatment-related change to recidivism data. Although we found small to moderate improvements in anger, impulsivity, social problem solving, and general social skill, these gains may be insufficient to effect reductions in violent recidivism (Daffern et al., 2017).
Conclusion
Overall, psychological treatments generated improvements on indicators of trait anger, impulsivity, social problem solving, and general social skill in violent adults across correctional and forensic mental health settings. The small and nonsignificant pooled effect for antisocial cognitions suggests more studies are needed to establish whether treatment is effective for this outcome. There is still a great deal of uncertainty about how, why, and for whom treatment is effective and which treatment components affect which outcomes (Papalia et al., 2019). There is a clear need to undertake a greater volume of high-quality quasi-experiments and RCTs, and it is critical that comprehensive information about design, sample, and treatment characteristics, including both treatment content and the way in which treatment is facilitated, is reported to enable future investigations of moderating effects. To help progress the evidence base beyond its current state, we recommend two parallel streams of research (Papalia et al., 2019). The first should test theory-driven hypotheses to isolate the features of treatment that are effective, in what combination, for whom, and in relation to which violence-relevant outcomes. The second should evaluate the effectiveness of specific treatment programs on intermediate treatment targets (e.g., dynamic risk factors) and long-term behavioral outcomes (e.g., institutional behavior and community recidivism), testing whether change in the former is linked to change in the latter.
Supplemental Material
Supplemental material for “Are Psychological Treatments for Adults With Histories of Violent Offending Associated With Change in Dynamic Risk Factors? A Meta-Analysis of Intermediate Treatment Outcomes” by Nina Papalia, Benjamin Spivak, Michael Daffern, and James R. P. Ogloff in Criminal Justice and Behavior.