Abstract
The Stalking Risk Profile (SRP) is a structured professional judgment guideline to help practitioners evaluate and respond to risks associated with stalking. Using a sample of 217 Australian adults referred to a community forensic mental health service, this study examined the predictive validity of SRP risk judgments made by assessing clinicians. Discrimination, classification, and prediction statistics were examined at 12-month intervals over a 4-year follow-up using police charges for stalking-related offending. Total scores were positively related to risk judgments. Moderate/high risk judgments generally showed good sensitivity, though weaker specificity. Risk judgments about stalking violence and recurrent stalking of the same victim showed moderate to strong discrimination and predictive benefits over the base rate of reoffending, with minor improvements over time. Judgments about stalking different victims or persistent stalking of the same victim were less effective, possibly due to methodological limitations of the prospective study design.
Development of structured approaches for assessing stalking risks began in the late 2000s (McEwan, 2021). Research into the validity of these approaches is more limited than equivalent bodies of research into risk assessment for sexual violence, intimate partner violence, and violence generally, perhaps reflecting the fact that while stalking is prevalent in the community, it is less commonly identified in criminal justice systems than these other offenses (Brady & Nobles, 2017). Nonetheless, stalking is a common and potentially damaging behavior, warranting attention from practitioners with responsibility for assessing and managing risks of future harm. This article presents the third validation study of the Stalking Risk Profile (SRP; MacKenzie et al., 2009), a structured professional judgment (SPJ) guideline intended to help practitioners understand and assess the range of risks that can be present when a client has a history of stalking. The aim of this study was to examine the predictive validity of risk judgments made by psychologists and psychiatrists when using the SRP in their everyday practice.
Defining Stalking and Stalking Risks
Stalking is a pattern of behavior in which someone engages in repeated and unwanted intrusions into the life of another person (or a related group of people), causing them to experience distress and/or fear. Stalking exists in the cumulative impact of the pattern of unwanted behavior, regardless of what that behavior entails (Spitzberg & Cupach, 2014). The frequency and nature of stalking behavior can fluctuate, and the time from the first to the most recent unwanted intrusion is often referred to as a “stalking episode” in research literature (Sheed et al., 2024). Knowing when a stalking episode has ceased can be difficult, but it has been suggested that an episode can be assumed to have stopped if there have been no known stalking behaviors toward the victim(s) for 6 months (MacKenzie et al., 2009). While many stalking cases involve a single stalking episode, some are characterized by recurrent episodes where the person returns to target the same victim(s) after periods of months or even years without any stalking behavior. Other people move on to target a different victim in a new stalking case, a phenomenon Coupland and Storey (2024) have labeled “serial stalking.”
Stalking victimization is associated with significant harms. Stalking victims have shown elevated rates of mental illness, including mood, anxiety, and trauma-related symptomatology (Kamphuis et al., 2003; Kuehner et al., 2007). Research suggests that psychological harm is greatest among victims who are subject to physical violence, threatened with harm (Purcell et al., 2012), or followed in person (Hauch & Elklit, 2023). Unfortunately, such experiences are common, with explicit threats reported by most stalking victims and physical violence by 20% to 50% of victims (McEwan, 2021). Physical violence can range from pushing and shoving to serious sexual assault and even homicide in rare cases (Sheed et al., 2024). Violence is more often reported by ex-partner stalking victims, but is present in a small proportion of cases involving acquaintances, strangers, or estranged family members or friends (McEwan, 2021).
While the potential for physical violence is a central concern, stalking causes harm regardless of its presence. Qualitative studies clearly describe how fear and constant disruption lead victims to withdraw from relationships and activities out of concern for their own or others’ safety and not wishing to be a burden on others (Korkodeilou, 2017). Several studies highlight the economic impacts of stalking, with victims changing residence, giving up jobs, purchasing new cars, incurring substantial legal costs, or, in some cases, even moving abroad (Logan & Landhuis, 2024; Storey et al., 2023). These outcomes make managing the risk of further stalking as important as managing the risk of stalking violence.
The SRP
The SRP was published in 2009 with the intention of helping to structure professionals’ assessments of stalking risks. The SRP guides assessment of four domains of stalking risk (a further two domains are assessed if the victim is a public figure). Risk of future stalking is assessed using either the Persistence domain (if stalking behavior is present in the 6 months prior to assessment, that is, there is a current stalking episode) or the Recurrence domain (if there is no current stalking episode at the time of assessment). The Recurrence domain includes separate judgments about recurrent stalking of the same victim (Recurrencesame) and future serial stalking involving a different victim (labeled Recurrencedifferent in the SRP). Risk of Stalking Violence toward the victim(s) of the current stalking episode is assessed in a separate domain (Stalking Violence is defined as physical contact or attempted contact with a weapon, with the intent to coerce or harm). The SRP also considers the risk of Psychosocial Damage to the person stalking due to their situation.
Each SRP risk judgment is based on a discrete set of risk factors, reflecting the fact that different risk factors relate to different stalking outcomes (McEwan, 2021). A unique feature of the SRP is its use of Mullen and colleagues’ (2000) stalking typology to structure which risk factors are assessed in a given case. This typology encompasses five types: Rejected stalking, which emerges after the breakdown of a close relationship, motivated by anger and/or a desire to resume the relationship; Resentful stalking, which emerges in the context of a personalized grievance with the motivation of rectifying or avenging the perceived injustice; Intimacy Seeking stalking, which is motivated by pursuit of an intimate relationship in the context of a morbid infatuation; Incompetent Suitors who stalk in a gauche attempt to initiate a friendship or sexual relationship; and finally Predatory stalking motivated by a desire for sexual gratification (see Mullen et al., 2000 for further detail). The SRP describes both general risk factors (assessed in all cases) and “type-specific” factors that are assessed only in cases of that stalking type. Between 32 and 40 risk factors are rated when using the SRP, depending on the type of stalking and domains of risk being assessed. Completion of the SRP therefore involves six steps: determining the type of stalking; determining domains of risk to be assessed; rating risk factors; making a risk judgment of “low,” “moderate,” or “high” in each domain assessed; developing a formulation; and making risk management plans.
There have been two prior studies of the predictive validity of the SRP. McEwan and colleagues (2018) used a sample of 256 stalking cases from an Australian community forensic mental health service (241 individual clients; 92.7% male; 50.4% ex-partners). For 84 clients, the SRP was rated from file by a researcher and clinicians used the SRP as part of their clinical assessment in the other 157. Inter-rater reliability (IRR) was assessed in a sub-sample of 66 cases, with moderate to substantial agreement in risk judgments (Intra Class Correlation [ICC] = .70–.90), excellent agreement in classification of stalking type (Kappa = .98), and fair to substantial agreement for most individual risk factors. Further stalking was identified using police charges for stalking or related offenses over an average of 211.52 weeks (SD = 130.13; without controlling for time at risk) and was present in 26.4%. Recidivistic stalking of the same victim was present in 14.9% of cases and stalking of a different victim in 13.6%. Survival analyses showed that those judged to be at high risk of further stalking of the same victim (whether a Persistence or Recurrencesame judgment) were significantly more likely to reoffend over the entire follow-up period than those judged moderate or low risk. Those at high risk of Recurrencedifferent were significantly more likely to reoffend at 6 months and across the whole follow-up time than those at low risk, but only over 6 months compared with those at moderate risk. Discrimination statistics were in the moderate to large range for Persistence and both Recurrence judgments (area under the receiver operating curve [AUC] = .66–.68). Sensitivity in the high risk category for all domains was relatively low (38% for Recurrencesame to 57% for Persistence), though increased markedly when the moderate risk category was also included (62%–80% depending on domain). 
The authors concluded that the SRP showed promise for guiding risk assessment of future stalking in the long term. Stalking Violence judgments were not investigated due to the low base rate of recidivism.
Penney and colleagues (2023) evaluated SRP judgments made from file review in a sample of 86 Canadian forensic mental health hospital patients with a history of stalking (82.6% male). Most (93.0%) were not guilty by reason of mental disorder, although most of the index offenses leading to this finding did not involve stalking. In contrast to the Australian sample, only 14% stalked an ex-partner, with stranger stalking accounting for 43.5% (the latter group constituted only 16.4% of the Australian sample). Further stalking was identified from health files over a maximum of 36 months, defined as any known contact with a prior stalking victim or at least two unwanted contacts with a different victim. Further stalking was present in 34.9% of cases (26.7% same victim and 15.1% different victim). In a sub-sample of 12 cases, IRR was acceptable for both risk judgments and stalking type assignment (ICC = .75–.77). A combined “future stalking” judgment (combining Persistence and Recurrencesame judgments) showed a strong ability to discriminate between those with and without future stalking of the same victim (AUC = .72, comparable to AUC = .73 for the same variable in McEwan et al., 2018). Sensitivity was 46% for the high risk category, and specificity 50%, but positive and negative predictive values were not reported. The authors concluded that the results supported the use of the SRP, though they noted the limitation that the researchers had not been trained in use of the guidelines prior to conducting the study.
These two studies suggest that risk judgments made using the SRP can provide a useful categorization of those at relatively greater or lesser risk of future stalking. However, neither evaluated the performance of the Stalking Violence judgment, nor the relationship between risk factors and overall risk judgment. While a total score is not used clinically to inform risk management, the presence of more risk factors should generally relate to a judgment of increased risk (Douglas et al., 2013) and it is unclear whether this is true of the SRP. This is particularly important given the atypical structure of the SRP, with more risk factors assessed for some stalking types, and different risk factors assessed for different risk outcomes. Therefore, the current research aimed to evaluate the predictive validity of the SRP, replicating and expanding on prior research by addressing the following questions:
How do SRP risk judgments relate to domain total scores? It was hypothesized that domain scores would be positively correlated with risk judgments.
How well do SRP risk judgments and domain total scores discriminate between those with and without further stalking-related charges over time? Discrimination for future stalking was expected to be moderate to strong.
How accurately do SRP risk judgments classify those with and without stalking-related charges over time?
How well do SRP risk judgments predict future stalking-related charges?
Method
Study Design
This study used a pseudo-prospective real-world design. Forensic psychologists and/or psychiatrists employed at a statewide community forensic mental health service in Melbourne, Australia, used the SRP in routine practice between 2012 and 2016. Assessments were extracted from files by researchers between 2019 and 2021, and police records extracted in 2023, with any time incarcerated excluded from the follow-up period (no charges were recorded during periods of incarceration). Due to the real-world nature of the study we could not generate inter-rater reliability data.
Procedure
The sample was part of a wider research project in which files of every new client of a specialist community forensic mental health program between 24 January 2012 and 5 December 2016 (n = 615) were accessed by researchers. Human research ethics approvals were granted by Swinburne University (2019/069) and the Victorian Department of Justice (CF1822498). Risk assessment, offending, and demographic information was extracted and coded. The name, sex, and date of birth of each person assessed were matched to records held in Victoria Police’s Law Enforcement Assistance Program (LEAP) database using SOUNDEX matching (a phonetic algorithm for encoding and indexing names based on sound). Victoria Police provides all policing services for the state of Victoria, Australia, and officers maintain LEAP as a live database, being required to record a potential charge as soon as they are satisfied that, on the balance of probabilities, a crime has occurred. Although a charge is recorded in LEAP, it may not proceed to court, may be withdrawn at court, or may not result in a conviction. Five hundred one (96%) of the original sample were able to be matched. For each matched individual, all charges recorded between 1 June 1967 and 9 March 2020 were extracted from LEAP on 31 August 2023, with details of the commission date of each alleged crime. The same identifying information was provided to the State correctional service, which identified individuals with periods of incarceration between 17 June 1997 and 12 February 2021 (n = 264). Days of follow-up ranged from 553 to 1,460 (M = 141, SD = 123) with time incarcerated removed. Follow-up time was limited to 1,460 days at risk to manage high levels of censoring after this time.
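The phonetic encoding behind SOUNDEX matching can be illustrated with a minimal sketch of the standard American Soundex rules. The article does not describe the exact matching implementation used with LEAP, so the function name `soundex` and the example surnames are illustrative assumptions only.

```python
# Minimal sketch of standard American Soundex (an assumption: the study's
# actual matching software and any extra rules are not described).
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code for a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = first
    prev = CODES.get(first, "")  # the first letter's code can absorb a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels and Y have no code
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a later repeat is coded again
    return (encoded + "000")[:4]  # pad with zeros or truncate to length 4


# Names that sound alike share a code and would be treated as candidates.
print(soundex("Robert"), soundex("Rupert"))
```

Two records would be treated as candidate matches when their name codes agree; in practice the remaining identifiers (sex and date of birth, as in the text) narrow the candidates further.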
Sample
The program where the sample was collected provides services for individuals who have engaged in harmful sexual behavior, violence, stalking, firesetting and other problem behaviors (McEwan & Darjee, 2021). Although part of the statewide community forensic mental health service, clients are accepted due to problem behavior, regardless of the presence of mental disorder. Assessments take an individualized case-formulation approach and involve an interview with the individual, collateral material (usually including formal offending records), and interviews with third parties. Structured professional judgment guidelines are routinely used, with choice of guideline dependent on past problem behavior, the referral question, and clinicians’ discretion. Psychological treatment is offered for high-risk clients with no other treatment options.
The SRP was used to assess 236 individuals during the study period. Nineteen were missing all risk judgment data and were excluded, leaving n = 217 individuals: 23 (10.60%) women and 194 (89.40%) men, aged between 19 and 78 years (M = 37.84, SD = 11.03), of whom 168 (77.42%) were born in Australia. There were no significant differences between those excluded (n = 19, M = 34.96, SD = 12.17) and included (n = 217, M = 37.71, SD = 10.17) in age at assessment, t(23) = 1.66, p = .10, or sex (included: female = 23, male = 194; excluded: female = 4, male = 15; Fisher’s exact test, p = .45). Comparisons of country of birth, relationship status at assessment, employment at assessment, criminal history, incarceration history, and stalking reoffending also showed no significant differences (see Supplementary Material, Table S1 [available in the online version of this article]). Four cases had two SRP administrations. Where judgments did not differ between administrations, the first set of risk judgments recorded on file was used in analyses; where they differed, the administration with the higher assessed risk was used. Assessment date was missing for two individuals. Neither had any reoffending recorded and they were included using the sample average days at risk. There is some overlap with the sample reported by McEwan and colleagues (2018), up to a maximum of 70 individuals assessed in 2012 and 2013. McEwan et al. did not use a consecutive cohort and samples could not be matched as the earlier dataset was non-identifiable.
The SRP was completed by 51 different clinicians: psychologists in 199 cases (91.70%), psychiatrists in three cases (1.38%) and both in 15 cases (6.91%). All had completed a two-day training workshop and engaged in supervised practice in the use of the SRP. Two clinicians were also authors of the SRP, together accounting for 30 administrations (13.75%). Clients were referred from probation services (n = 191, 88.02%) and general mental health services or private health practitioners (n = 19, 8.75%), with missing referral data in seven cases (3.23%). Of the 217 individuals, n = 92 (42.40%) were matched to incarceration data and incarceration time was controlled in analyses. Unmatched cases were treated as not being incarcerated during follow-up.
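Controlling for incarceration amounts to counting only days in the community. The sketch below is one way this could be computed and is not the authors' script: the function and parameter names are hypothetical, custody periods are assumed not to overlap one another, and the cap of 1,460 days is applied to the final total.

```python
from datetime import date


def days_at_risk(assessed, observed_until, custody_periods, cap=1460):
    """Days in the community between assessment and end of observation.

    custody_periods: list of (start, end) dates, assumed non-overlapping.
    cap: maximum follow-up in days at risk (1,460 in the study).
    """
    total = (observed_until - assessed).days
    in_custody = 0
    for start, end in custody_periods:
        # Count only the part of each custody period inside the follow-up.
        overlap_start = max(start, assessed)
        overlap_end = min(end, observed_until)
        if overlap_end > overlap_start:
            in_custody += (overlap_end - overlap_start).days
    return min(total - in_custody, cap)


# Hypothetical client: 730 calendar days of follow-up, 30 of them in custody.
print(days_at_risk(date(2013, 1, 1), date(2015, 1, 1),
                   [(date(2013, 6, 1), date(2013, 7, 1))]))
```

An individual with no matched incarceration record is passed an empty list, which mirrors the treatment of unmatched cases as not incarcerated.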
Measures
The SRP
The SRP (MacKenzie et al., 2009) is a SPJ risk assessment guideline designed to be used with adults (18+) with a history of stalking. The SRP guides assessment of targeted risk, with most risk judgments applying to a specific stalking case. Four domains of risk can be assessed: Persistence, Recurrence, Violence and Psychosocial Damage to the person stalking (each defined above). Most administrations of the SRP involve making risk judgments about either Persistence or Recurrence (depending on the presence of active stalking in the 6 months prior to assessment), Stalking Violence, and Psychosocial Damage. The Psychosocial Damage domain identifies issues germane to the person’s wellbeing but is not intended to have a predictive relationship with future stalking and so was not examined in this study.
Each risk domain has its own set of risk factors assessed as absent, partially/possibly present, or present (coded as 0, 1, and 2 for analyses), with options to rate as Unknown if there is insufficient information or Omit where it is not relevant to the individual (e.g., the risk factor relates to access to shared children and the person has never had an intimate relationship with the victim). Each domain results in a risk judgment of “low,” “moderate,” or “high” risk (assigned values of 0, 1, and 2 for analyses). Risk judgments for each domain are made based on the presence and perceived importance of risk factors to the case, and other case-specific information the evaluator perceives as being relevant to risk. In the Stalking Violence domain, there are also five “red flag” risk factors, with assessors instructed that the risk of Stalking Violence would usually be high if any of these factors are present.
Ascertaining Stalking From Police Data
Of the 217 individuals in the sample, 203 (93.55%) were matched in LEAP. It was assumed that those who were not matched (n = 14, 6.45%) did not have a formal criminal history and were treated as not reoffending in the analyses. Stalking is known to be poorly understood by police and stalking charges are underused (Sentencing Advisory Council, 2022). Given our reliance on police data for establishing reoffending, stalking was identified based on either the presence of a stalking charge (s21A of the Victorian Crimes Act, 1958) or the presence of multiple stalking-related charges (see Supplementary Table S2 [available in the online version of this article]) involving the same victim in separate incidents over more than 2 weeks. The two outcomes were combined for analysis.
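The combined outcome rule can be expressed as a short sketch. Several details here are assumptions rather than the study's coding rules: the `Charge` fields are hypothetical, every charge passed in is assumed to already be on the stalking-related list in Supplementary Table S2, "separate incidents" is approximated by distinct commission dates, and "over more than 2 weeks" is read as more than 14 days between the first and last incident.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Charge:
    victim_id: str        # hypothetical identifier for the alleged victim
    incident_date: date   # commission date of the alleged offense
    is_stalking: bool     # True for a stalking charge itself


def meets_stalking_outcome(charges):
    """True if the charges satisfy either arm of the outcome definition."""
    # Arm 1: any stalking charge is sufficient on its own.
    if any(c.is_stalking for c in charges):
        return True
    # Arm 2: several stalking-related charges against one victim, on
    # different dates, spanning more than 14 days (assumed interpretation).
    dates_by_victim = defaultdict(set)
    for c in charges:
        dates_by_victim[c.victim_id].add(c.incident_date)
    for dates in dates_by_victim.values():
        if len(dates) >= 2 and (max(dates) - min(dates)).days > 14:
            return True
    return False


pattern = [Charge("V1", date(2014, 3, 1), False),
           Charge("V1", date(2014, 4, 2), False)]
print(meets_stalking_outcome(pattern))
```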
To identify whether stalking reoffending involved a new victim or the same victim as the stalking episode leading to assessment, we used modified versions of the numeric person identifiers linked to each charge in LEAP. Contravention of family violence restraining order charges record the “victim” as the court rather than a person. When this charge was present, we used information from linked family violence reports to infer the identity of victims by matching case IDs, victim IDs and charge dates. Nine individuals were identified in LEAP but had no stalking charges related to the index stalking episode, making matching victims impossible. Only one of these people had a subsequent stalking charge and was included as stalking a different victim.
Analyses
All analyses were conducted with RStudio v2024.12.0 and scripts were written in R-Project v4.4.2. The validity of risk judgments was investigated using the relevant stalking outcome shown in Table 1. As it is impossible to identify whether a stalking episode had ceased and recommenced from police administrative data, Persistence and Recurrencesame were assessed against the same outcome of any future stalking of the same victim.
Definitions of Stalking Outcomes Used for Evaluation of Each SRP Risk Judgment
Domain total scores were created by summing risk factors within each risk domain (excluding red flag risk factors for Stalking Violence). Within each risk domain, possible total scores differ between stalking types (due to having greater or fewer risk factors within a domain). The maximum possible domain total scores are: Stalking Violence = 34, Persistence = 24, Recurrencesame = 16, Recurrencedifferent = 20. Stalking types with lower maximum total scores were scaled up to the same range as the maximum by prorating. Each person’s observed total score in the risk domain was divided by the maximum total score for their stalking type, and the result was multiplied by the maximum possible total score for that risk domain across stalking types. For example, for Resentful stalking cases, the maximum Stalking Violence total score is 32. The prorated Stalking Violence total score for a Resentful stalking case with an observed score of 28 is therefore (28 / 32) × 34 = 29.75.
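The prorating rule can be written as a one-line function. In this sketch, the domain maxima come from the text, while the function name `prorate` and the dictionary layout are illustrative assumptions; under these maxima, a Resentful case with an observed Stalking Violence score of 28 out of 32 prorates to 29.75.

```python
# Maximum possible domain totals across stalking types (from the text).
DOMAIN_MAX = {
    "Stalking Violence": 34,
    "Persistence": 24,
    "Recurrencesame": 16,
    "Recurrencedifferent": 20,
}


def prorate(observed, type_max, domain):
    """Scale an observed total to the domain's maximum range.

    observed: summed risk factor ratings for the case
    type_max: maximum total attainable for the case's stalking type
    domain:   key into DOMAIN_MAX
    """
    if not 0 <= observed <= type_max:
        raise ValueError("observed score must lie between 0 and type_max")
    return observed / type_max * DOMAIN_MAX[domain]


# Worked example from the text: Resentful case, Stalking Violence domain.
print(prorate(28, 32, "Stalking Violence"))
```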

Survival Curves for Different Stalking Outcomes and Risk Judgments
Relationships between risk judgments and domain total scores were assessed using τ with supplemental multinomial regression to examine how scores predicted judgments. Discrimination between those with and without stalking reoffending was assessed at 1, 2, 3, and 4 years after the SRP assessment date using AUC analyses. Classification accuracy of the risk judgment categories was assessed using sensitivity and specificity statistics over the same periods. Predictive power can be assessed by calibrating rates of recidivism in each risk category to rates predicted from a large, comparable, independent sample rated with the same tool (Hanson, 2017); however, such a sample is not available for the SRP. Alternatively, calibration can be established by comparing recidivism rates in each risk category to expected values from a logistic function. This is appropriate for an actuarial risk assessment tool constructed using logistic regression. However, SPJ guidelines are constructed rationally, and there is no reason to think that results would be consistent with a logistic function. In the absence of a comparator sample, positive and negative predictive power (PPV and NPV) are a more appropriate predictive measure for SPJ guideline evaluation. They serve a similar function (e.g., PPV allows comparison of the probability of correctly identifying someone as reoffending given their risk categorization with the probability of correct identification given the base rate of reoffending), but they are more meaningful for clinicians as they include information about expected probability of reoffending within the group assigned to the risk category. Therefore, predictive power of SRP risk judgments was evaluated using PPV (the probability that someone classified as being at increased risk will reoffend) and NPV (the probability that someone classified as being at decreased risk will not reoffend).
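The classification and prediction statistics named above can be computed from a 2 × 2 table. The sketch below assumes judgments coded 0 (low), 1 (moderate), and 2 (high), with moderate and high grouped as the "increased risk" category; the function name and the toy data are hypothetical and are not study results.

```python
def classification_stats(judgments, reoffended):
    """Sensitivity, specificity, PPV, NPV, and base rate for low vs.
    moderate/high judgments (0 = low, 1 = moderate, 2 = high)."""
    tp = fp = tn = fn = 0
    for judgment, outcome in zip(judgments, reoffended):
        flagged = judgment >= 1  # moderate or high counts as increased risk
        if flagged and outcome:
            tp += 1
        elif flagged and not outcome:
            fp += 1
        elif not flagged and outcome:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # reoffenders who were flagged
        "specificity": ratio(tn, tn + fp),  # non-reoffenders not flagged
        "ppv": ratio(tp, tp + fp),          # flagged who reoffended
        "npv": ratio(tn, tn + fn),          # not flagged who did not reoffend
        "base_rate": ratio(tp + fn, tp + fp + tn + fn),
    }


# Hypothetical toy data, not drawn from the study.
stats = classification_stats([0, 0, 0, 0, 1, 1, 2, 2, 2, 1],
                             [0, 0, 0, 1, 0, 1, 1, 1, 0, 0])
print(stats)
```

Comparing `ppv` with `base_rate` gives the kind of improvement over the base rate that the text describes, and the false positive rate is 1 minus `specificity`.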
Calculation of sensitivity, specificity, PPV, and NPV requires a dichotomous predictor and outcome, so risk categories were grouped as low vs. moderate/high. Prediction over time was investigated using Kaplan–Meier survival curves stratified by risk judgment category. The log of the negative log of survival was used to calculate the 95% confidence interval for survival curves to avoid truncating the confidence intervals (those exceeding 100%) at the beginning of the curve.
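The log(−log) interval can be illustrated with a small Kaplan–Meier sketch. This is a generic implementation using Greenwood's variance formula, not the authors' R code; the function name and toy follow-up data are hypothetical. Because the interval is built on the log(−log S) scale and transformed back, its bounds cannot fall outside 0 to 1.

```python
import math
from statistics import NormalDist


def kaplan_meier_loglog(times, events, level=0.95):
    """Kaplan-Meier survival with log(-log) confidence limits.

    times:  days at risk until reoffence or censoring
    events: 1 if reoffence observed at that time, 0 if censored
    Returns a list of (time, survival, lower, upper) at each event time.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    event_times = sorted({t for t, e in zip(times, events) if e})
    surv, greenwood, rows = 1.0, 0.0, []
    for t in event_times:
        n_at_risk = sum(1 for u in times if u >= t)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - d / n_at_risk
        if n_at_risk > d:
            # Greenwood's cumulative sum: d / (n * (n - d)).
            greenwood += d / (n_at_risk * (n_at_risk - d))
        if 0 < surv < 1 and n_at_risk > d:
            # Standard error of log(-log S), then back-transform.
            se = math.sqrt(greenwood) / abs(math.log(surv))
            lower = surv ** math.exp(z * se)
            upper = surv ** math.exp(-z * se)
        else:
            lower = upper = surv  # interval undefined when S is 0 or 1
        rows.append((t, surv, lower, upper))
    return rows


# Hypothetical follow-up data for one risk category.
for row in kaplan_meier_loglog([100, 200, 200, 300, 400, 500],
                               [1, 0, 1, 0, 1, 0]):
    print(row)
```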
Results
Sample demographic information is shown in Table 2. In the stalking cases leading to assessment, 66.36% (n = 144) targeted ex-partners, 24.88% (n = 54) targeted acquaintances (including estranged family or former friends), 5.99% targeted a stranger (n = 13), and relationships in the remaining six cases were not known. One hundred and sixty-two cases (74.65%) were classified as Rejected stalking, 25 (11.52%) as Resentful stalking, 11 (5.07%) as Intimacy Seeking, 14 (6.45%) as Incompetent Suitors and fewer than 5 cases were classified as either Predatory or had no clear motivational type. One hundred and twenty-two (56.22%) participants were diagnosed with at least one mental disorder at the time of assessment. Thirty-seven participants (17.05%) had a depressive disorder, 20 individuals (9.21%) a psychotic disorder, and the same number with a personality disorder, whereas 13 (5.99%) met diagnostic criteria for a substance use disorder. Other mental disorders were all present in less than 5% of the sample (bipolar affective disorder, anxiety and trauma-related disorders, neurodevelopmental disorders, eating disorders).
Demographic Characteristics of the Sample at Time of Assessment (N = 217)
Overall, 26.27% (n = 57) of participants had further stalking or a related pattern of charges across the follow-up period. Nine (4.19%) had stalking reoffending involving both the same and a different victim. Among those with subsequent stalking of the same victim, 39% engaged in stalking-related violence (n = 9). The different kinds of stalking-related reoffending at each of the four follow-up periods are shown in Table 3.
Cumulative Frequencies of Stalking Reoffending From 365 Days to 1460 Days at Risk Post SRP Assessment (N = 217)
Note. Individuals may appear in multiple reoffending categories, according to their first date of relevant reoffence type.
Relationship Between SRP Risk Judgments and Domain Total Scores
Table 4 shows the frequency of each risk judgment category, domain total score range, and descriptive statistics. There are relatively fewer Persistence judgments because this domain is rated only when stalking is present in the 6 months prior to assessment, with Recurrence risk assessed in other cases. The normalized total score in each domain was significantly positively correlated with the relevant risk judgment: Stalking Violence (n = 190, τ = 0.58, p < .001); Persistence (n = 50, τ = 0.58, p < .001); Recurrencedifferent (n = 117, τ = 0.63, p < .001); and Recurrencesame (n = 151, τ = 0.55, p < .001). Plotting the predicted probabilities from the multinomial regressions (see Supplementary Table S5 and Figure S2 [available in the online version of this article]) showed the expected pattern for Stalking Violence, Persistence and Recurrencesame, with the probability of a higher risk judgment increasing as the number of risk factors rated present increased. However, for Recurrencedifferent, higher total scores were associated with a greater probability of a moderate rather than high risk judgment.
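A rank correlation of this kind can be computed directly from pairs of cases. The article reports τ without naming the variant, so the sketch below assumes Kendall's tau-b, which adjusts for the many ties that arise when one variable has only three categories; the function name and toy data are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences (O(n^2) pair count)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from every count
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical judgments (0 = low, 1 = moderate, 2 = high) and total scores.
judgments = [0, 0, 1, 1, 2, 2]
scores = [3, 5, 6, 8, 9, 12]
print(round(kendall_tau_b(judgments, scores), 3))
```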
Frequency of SRP Risk Judgment Categories and Distribution of Domain Total Scores (N = 217)
Note. Missing values were coded either when there was some risk factor data on file but no judgment recorded, suggesting no judgment was made, or when all risk factors were rated but the judgment was not available in the file. Stalking Violence and Recurrence domains have different totals for different types of stalking.
Ability of SRP Risk Judgments and Domain Total Scores to Discriminate Between Those With and Without Stalking Reoffending
Discrimination effects of risk judgments were relatively stable over time (see Table 5). While AUC values for Stalking Violence and Recurrencesame suggested moderate to large and statistically significant effects at most time points, those for Persistence and Recurrencedifferent judgments were small to moderate and generally not significant. Recurrencedifferent judgments were not significantly associated with further stalking of the same victim, and Recurrencesame judgments were not significantly associated with future stalking of a different victim (see Supplementary Table S6 [available in the online version of this article]). The low base rate of violence means that the AUC value is more uncertain and the wide range of the 95% confidence intervals (.62–.87) over time should be considered in interpretation.
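The AUC can be read as the probability that a randomly chosen person who reoffended received a higher judgment or score than a randomly chosen person who did not, with ties counted as half. The sketch below computes that quantity by direct pairwise comparison; it does not reproduce the confidence intervals or the handling of time at risk used in the study, and the function name and toy data are hypothetical.

```python
def auc(scores, outcomes):
    """Area under the ROC curve by pairwise comparison (ties count 0.5).

    scores:   risk judgments (0/1/2) or domain total scores
    outcomes: 1 if the relevant reoffending occurred, else 0
    """
    reoffended = [s for s, y in zip(scores, outcomes) if y]
    not_reoffended = [s for s, y in zip(scores, outcomes) if not y]
    if not reoffended or not not_reoffended:
        raise ValueError("both outcome groups are needed to compute AUC")
    wins = 0.0
    for p in reoffended:
        for n in not_reoffended:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(reoffended) * len(not_reoffended))


# Hypothetical judgments and outcomes; 0.50 would indicate chance.
print(auc([0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 0]))
```

With only three judgment categories, ties between groups are common, which is why the half-credit rule matters for ordinal predictors of this kind.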
Discrimination Between Those With and Without Relevant Reoffending Using Risk Judgments and Domain Total Scores
Note. Days = number of days controlling for time at risk; AUC = area under the receiver operating curve; 95% CI = 95% confidence interval of the AUC. Results in bold are statistically significant given the 95% CI does not include .50.
AUCs for Stalking Violence, Recurrencesame, and Recurrencedifferent calculated using prorated total scores.
Most domain total scores were able to discriminate between those with and without the various stalking outcomes over time. The exception, as shown in Table 5, was Persistence total scores. Although Recurrencedifferent scores performed as poorly as the corresponding risk judgments over the first year, their discriminatory effect increased with time, producing a moderate effect size at 4 years post-assessment. The performance of the Recurrencesame domain score was weaker than that of the risk judgment, failing to differentiate between those with and without further stalking of the same victim in the first year but improving over time.
Classification of Reoffending by Risk Judgments Over Time
Table 6 presents sensitivity and specificity data contrasting low risk to a combined moderate/high risk category. In this sample, a judgment of moderate/high risk for Stalking Violence correctly classified all of those who engaged in stalking-related violence over the 4-year follow-up period, while the false positive rate hovered around 50% (false positive rate = 1 − specificity). Moderate/high Persistence judgments showed good levels of sensitivity across time, accurately identifying 80% to 90% of those charged with further stalking of the same victim. This came at the cost of poor specificity (25%–30%), translating to a high false positive rate among those judged moderate/high risk. Recurrencesame judgments, measured against the same outcome, had slightly lower sensitivity, with moderate/high judgments correctly classifying approximately 60% to 70% of cases with subsequent stalking of the same victim. Recurrencedifferent moderate/high judgments displayed poor sensitivity in the first year following assessment, identifying fewer than 50% of cases with this form of stalking recidivism; however, this improved over the subsequent 3 years to correctly classify 60% to 70% of cases with a stalking charge involving a new victim. For both Recurrence moderate/high judgment categories, the false positive rate was approximately 40% to 50%.
Table 6. Classification and Prediction Statistics for Risk Judgments at Four Time Points (Low Risk Vs. Moderate/High Risk)
Note. PPV = Positive predictive power; NPV = Negative predictive power; Days = Days at risk with the number of days incarcerated removed; Violence—Same Victim Violence, sample sizes are Moderate and High (n = 103) compared with Low as a reference group (n = 106). Persistence—Same Victim, sample sizes are Moderate and High (n = 41) compared with Low as a reference group (n = 13). Recurrence Different—Different Victim, sample sizes are Moderate and High (n = 68) compared with Low as a reference group (n = 60). Recurrence Same—Same Victim, sample sizes are Moderate and High (n = 67) compared with Low as a reference group (n = 98).
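The classification and prediction statistics reported in Table 6 all derive from a 2 × 2 cross-tabulation of risk judgment (low vs. moderate/high) against outcome (charged vs. not charged). As a minimal sketch of those definitions, the Python below computes each statistic from the four cell counts. The function name and the counts are hypothetical and chosen for illustration only; they are not the study's data.

```python
def classification_stats(tp, fp, fn, tn):
    """Summarize a 2 x 2 table of risk judgment against outcome.

    tp: judged moderate/high and later charged
    fp: judged moderate/high and not charged
    fn: judged low and later charged
    tn: judged low and not charged
    """
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "base_rate": (tp + fn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts for illustration only (not taken from Table 6).
stats = classification_stats(tp=8, fp=42, fn=2, tn=48)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

In this invented example, sensitivity is high while PPV stays low because the outcome is rare, which is the general pattern described for the moderate/high judgments above.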
Prediction of Future Stalking-Related Charges
As shown in Table 6, the calculation of PPV and NPV for the Stalking Violence risk judgment showed substantial improvements over prediction using the base rate of stalking violence reoffending. From Years 1 to 3, those assessed as moderate or high risk in this domain had twice the rate of reoffending present in the whole sample, dropping to approximately 50% higher than the base rate in Year 4. At each time point, the prevalence of violence in each risk group was low, but the relative increase in prediction associated with a judgment of moderate/high was meaningful. For the three future stalking risk domains, prediction was improved over the base rate at most time points, but the gains were more marginal, ranging from 10% to 25% improvements on the base rate. High NPVs indicate that those identified as low risk of Persistence, Recurrencesame or Stalking Violence rarely go on to have further offenses related to the same stalking victim.
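The comparisons above express PPV relative to the base rate of reoffending. A minimal sketch of that arithmetic follows; the helper name and the input values are hypothetical and serve only to show how "twice the base rate" and "20% above the base rate" would be computed.

```python
def relative_gain(ppv, base_rate):
    """Proportional improvement of PPV over the base rate.

    A return value of 0.25 means the reoffending rate among those judged
    moderate/high is 25% higher than the rate in the whole sample.
    """
    if not 0 < base_rate <= 1:
        raise ValueError("base_rate must be in (0, 1]")
    return ppv / base_rate - 1


# Hypothetical values for illustration only (not taken from Table 6).
doubled = relative_gain(ppv=0.10, base_rate=0.05)   # PPV is twice the base rate
marginal = relative_gain(ppv=0.24, base_rate=0.20)  # PPV is about a fifth higher
print(f"{doubled:.0%} and {marginal:.0%} above the base rate")
```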
Figure 1 shows Kaplan–Meier survival curves for each risk domain against the relevant reoffending outcome. There were no statistically significant differences between risk category survival curves for Persistence or Recurrencedifferent judgments, but differences were observed between low and high Stalking Violence and Recurrencesame judgments (χ² = 13.01, p < .001 and χ² = 10.24, p = .001, respectively). For Stalking Violence, the low and moderate curves also differed significantly (χ² = 8.27, p = .004); this was not the case for Recurrencesame. See Supplementary Table S7 (available in the online version of this article) for non-significant results.
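For readers less familiar with survival analysis, the sketch below shows how a Kaplan–Meier curve of the kind plotted in Figure 1 is estimated from days at risk and a charged/censored indicator. It is an illustrative implementation under stated assumptions, not the authors' analysis code: the function name and the six follow-up records are hypothetical, and the between-group significance test reported above is not reproduced.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: days at risk for each person
    events: 1 if the person was charged at that time, 0 if censored
    """
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)  # everyone leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Multiply by the proportion of the risk set that did not reoffend at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


# Hypothetical follow-up records for illustration only.
days = [100, 200, 200, 365, 500, 730]
charged = [1, 0, 1, 1, 0, 0]
curve = kaplan_meier(days, charged)
for t, s in curve:
    print(f"day {t}: {s:.3f} not yet charged")
```

Censored cases (those not charged by the end of their follow-up) reduce the size of the risk set without lowering the curve, which is why unequal follow-up can be handled without discarding cases.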
Discussion
This research evaluated the predictive validity of risk judgments made using the SRP in everyday clinical practice. As hypothesized, domain total scores were positively and moderately correlated with all SRP risk judgments, suggesting that clinicians were adhering to the adage that more risk factors generally mean higher risk. However, regression analyses showed that this was not true for higher total scores in the Recurrencedifferent domain. In this domain, those with the highest scores were more likely to be judged as moderate than high risk. This suggests that clinicians were applying more clinical judgment in this domain to adjust their risk judgments away from the presence of SRP risk factors, perhaps based on apparent protective factors, other case-specific factors they felt were important, or discounting some SRP risk factors that were present but perceived as not relevant. This kind of clinical adjustment is a core feature of all SPJ risk assessment, but the regression results suggest it was used differently in the Recurrencedifferent domain, though with limited effect on the validity metrics. Across all risk domains, there were minimal differences in AUC statistics for risk judgments and domain total scores, with largely overlapping confidence intervals.
The ability of risk judgments to discriminate between those with and without subsequent charges was partially consistent with hypotheses. This is the first study to report discrimination statistics for Stalking Violence judgments, and results showed a strong discriminatory effect across time, though this must be interpreted cautiously given the small number of people who reoffended violently. The confidence intervals around the AUC values indicate that the discrimination effect may range from small to large in a sample with more people with this outcome. Survival curves showed little differentiation between moderate and high risk groups, but both were differentiated from the low risk group.
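The AUC values discussed here can be read as the probability that a randomly chosen person who was later charged received a higher risk judgment than a randomly chosen person who was not, with ties counted as half. The sketch below computes the AUC directly from that pairwise definition. The function name, the 0/1/2 coding of low/moderate/high, and the ratings themselves are assumptions made for illustration; they are not the study's data or coding scheme.

```python
def pairwise_auc(ratings_charged, ratings_not_charged):
    """AUC as the share of (charged, not charged) pairs ranked correctly.

    A pair counts 1 if the charged person has the higher rating,
    0.5 if the ratings are tied, and 0 otherwise.
    """
    wins = 0.0
    for charged in ratings_charged:
        for not_charged in ratings_not_charged:
            if charged > not_charged:
                wins += 1.0
            elif charged == not_charged:
                wins += 0.5
    return wins / (len(ratings_charged) * len(ratings_not_charged))


# Hypothetical ratings: 0 = low, 1 = moderate, 2 = high (illustration only).
charged = [2, 2, 1]
not_charged = [0, 0, 1, 1, 2, 0]
auc = pairwise_auc(charged, not_charged)
print(f"AUC = {auc:.3f}")
```

With only three charged cases in this invented example, changing a single rating moves the AUC substantially, which mirrors the caution above about wide confidence intervals when few people have the outcome.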
Where statistically significant, AUC values for both Recurrence judgments were comparable to those reported by McEwan et al. (2018) across their total follow-up, while Penney and colleagues (2023) did not report separate AUC statistics for Persistence and Recurrencesame, making direct comparison impossible. For the Recurrencedifferent domain, there was no significant effect over the first 2 years, before a small to moderate discriminatory effect became apparent at Year 3 and into Year 4. In contrast, Recurrencesame judgments sustained a moderate to large discriminatory effect across follow-up points. Examination of survival curves helps explain these patterns, showing that those in the Recurrencedifferent low risk group tended to reoffend as often and as quickly as those in the moderate risk group over the first 18 months to 2 years of follow-up, undermining the discriminatory capacity of the risk judgment during this time. This was less apparent in the Recurrencesame survival curve, though the moderate and low risk curves did intersect in the first 12 months of follow-up, negatively affecting discrimination in the first year. Removal of cases in which the assessing clinician had made an intermediate risk judgment (e.g., “low-moderate”) meaningfully improved AUC values for Recurrencedifferent judgments (see Supplemental Table S2), with a moderate discriminatory effect achieved from the second year of follow-up. Recurrencesame judgments’ discriminatory effect reduced slightly in the same analyses and became non-significant in the 12 months following assessment. This suggests that lack of confidence in the risk judgment had a notable impact on overall ability of judgments to discriminate, warranting further attention in both research and training.
In contrast to McEwan and colleagues’ (2018) study (conducted in the same location and including up to 70 of the same cases), Persistence judgments did not discriminate between those with and without future stalking of the same victim. The survival curve indicates that this was because the moderate risk group reoffended more quickly than the high risk group, mostly in the first 12 months following assessment, whereas most reoffending in the high risk group occurred more than 12 months after assessment. In contrast, approximately half of reoffending in the high-risk group occurred within 12 months of assessment in the 2018 study, allowing for effective discrimination. Two substantial differences between the studies could explain these disparate results. First, a greater proportion of the current sample were referred by probation services and on court orders at the time of assessment, usually for 12 to 18 months post-assessment date (88% vs. 56% in the 2018 study). Any deterrent effect of probation would therefore have influenced a greater proportion of the current sample. Second, the earlier study used retrospective assessments coded by researchers from files in one third of the sample, meaning the risk assessment could not influence subsequent risk management. Conversely, the entire sample in the current study was prospectively assessed by clinicians, meaning the SRP results likely had a direct effect on subsequent risk management. This could affect Persistence in particular, given the risk is related to an active stalking episode, incentivizing immediate risk management in cases thought to be at high risk. These sampling and design differences together mean that the behavior of a greater proportion of the current sample may have been influenced by risk management in the period following assessment. Consistent with this hypothesis, same victim stalking recidivism over 4 years was one third lower in the current study than in McEwan et al. (2018). 
Comparison of survival curves suggests that this is almost entirely due to less recidivism in the high-risk group in the 12 months following assessment in the current study. This lends some credence to the idea that additional risk management during this time may have influenced results. Any protective effects of intervention would have also impacted upon the discriminatory effect of other SRP risk judgments; however, the limited sample with Persistence judgments makes validity statistics more vulnerable to even small changes in reoffending. Alternatively, it is possible that the risk judgment did not discriminate effectively because there is some characteristic of its use in practice (rather than in file review) that renders it invalid. Further evaluation of the validity of this domain is needed.
Classification statistics indicate that, in this sample at least, judging someone as being at increased risk of Stalking Violence led to the identification of 100% of those who went on to engage in stalking violence over the next 4 years. While promising, the small number of people with subsequent stalking-related violence means this result is uncertain. If only one or two people with subsequent stalking violence were rated differently, this would account for a substantial proportion of the recidivist sample and could lead to markedly different results. Judging someone to be at increased risk of future stalking (moderate or high risk judgments in the Persistence or Recurrence domains) correctly identified between 60% and 90% of those charged with further stalking over the 4-year follow-up. Results were very consistent with sensitivity and specificity statistics reported by McEwan et al. (2018) at the same threshold, the only notable difference being somewhat higher sensitivity for the Recurrencedifferent judgment in the current study, and consequently lower specificity.
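To illustrate how fragile a sensitivity estimate is when few people have the outcome, the sketch below recomputes sensitivity and an approximate 95% interval as one or two recidivists are moved from "identified" to "missed". The Wilson score interval is used here as a convenient illustration and is not a method reported in this study; the count of 10 recidivists is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (approximately 95% when z = 1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical: 10 people with the outcome (not the study's actual count).
n_recidivists = 10
results = {}
for identified in (10, 9, 8):
    low, high = wilson_interval(identified, n_recidivists)
    results[identified] = (identified / n_recidivists, low, high)
    print(f"{identified}/{n_recidivists} identified: "
          f"sensitivity {identified / n_recidivists:.0%}, "
          f"interval {low:.2f} to {high:.2f}")
```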
The final research question related to the predictive performance of SRP risk judgments. Consistent with McEwan et al. (2018), positive predictive values suggest that judging someone to be at moderate or high risk using the SRP provides some predictive advantage over randomly selecting people for risk management. This was most obvious for the Stalking Violence domain, where the rate of reoffending in the moderate/high categories was double the base rate in the sample. The predictive gains in other domains were smaller. Overall, moderate/high SRP risk judgments tend to identify most of those with subsequent relevant offending, though with high rates of false positives (those identified as being at increased risk who did not go on to have the risk outcome). Conversely, negative predictive values indicate that low SRP risk judgments generally identify true low-risk individuals (perhaps with the exception of Recurrencedifferent judgments where NPV fell below 90%).
These false positive rates could suggest that the SRP risk categories are too inaccurate, leading to scant resources being allocated to those who do not require them. However, a high false positive rate, even in the high-risk category, does not necessarily mean that clinicians’ judgments informed by the SRP were wrong. They judged these individuals to be part of a group that overall would be more likely to reoffend, and in this, they were correct. Which people in that group actually reoffended would be influenced by personal circumstances, having opportunities to offend, the presence or absence of effective risk management, and other protective factors. In this real-world study the effects of risk management would likely be exacerbated as the risk judgments used in analyses had a direct impact on subsequent management activities. Reoffending may have been prevented in an unknown number of higher risk cases, contributing to the observed high false positive rates.
These findings emphasize the importance of conceptualizing SPJ risk judgments as risk states relative to others with a history of stalking and communicating and acting on them accordingly. Using Stalking Violence judgments as an example, this study suggests that those in moderate or high risk groups are at relatively increased risk compared to those in the low risk group, so risk management resources can have the greatest impact if they are directed toward the former groups (consistent with the Risk Principle; Bonta & Andrews, 2023). This does not mean that all or even most of those in the moderate or high groups will go on to use violence if they continue to stalk the victim. The base rate of stalking violence was 39% among those in the sample with further stalking of the same victim (and only 5.45% in the sample as a whole). This means that risk judgments need to be communicated carefully to minimize misinterpretation (Goossens, 2024; Hilton et al., 2015). Higher risk judgments should be the basis for directing support and intervention, and for highlighting where ongoing monitoring of apparent dynamic risk is necessary. They should not be interpreted as indicating that those in a given risk category will or will not reoffend. In the context of an SPJ guideline such as the SRP, it may be most effective and accurate to communicate categorical risk judgments using phrasing such as “. . . leading to higher/similar/lower concern about the risk of future stalking violence than would be present for others with a history of stalking.” This both emphasizes the relative nature of the risk judgment and that it is the assessor’s opinion rather than a probabilistic statement (Borum, 2023).
Consistent with the principles of SPJ risk assessment, the risk statement must also be clearly tied to the risk factors that have led to it, a formulation of when risk is most likely to be realized, and discussion of the nature and intensity of risk management thought to be necessary (Logan & Johnstone, 2023).
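The caution about base rates can be made concrete with Bayes' rule, which gives PPV as a function of sensitivity, specificity, and the base rate. In the sketch below, the two base rates (5.45% and 39%) are the figures quoted above; the sensitivity and specificity values are hypothetical, and applying the same values to both groups is an assumption made purely for illustration rather than a finding of this study.

```python
def ppv_from_rates(sensitivity, specificity, base_rate):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


# Hypothetical accuracy figures; assumed identical in both groups for illustration.
SENSITIVITY = 0.80
SPECIFICITY = 0.50

ppv_whole_sample = ppv_from_rates(SENSITIVITY, SPECIFICITY, base_rate=0.0545)
ppv_continued_stalking = ppv_from_rates(SENSITIVITY, SPECIFICITY, base_rate=0.39)
print(f"PPV at a 5.45% base rate: {ppv_whole_sample:.1%}")
print(f"PPV at a 39% base rate:   {ppv_continued_stalking:.1%}")
```

Under these assumptions, the same judgment carries a very different absolute likelihood of violence depending on the base rate of the group it is applied to, which is why the relative framing recommended above is less open to misinterpretation than a probability statement.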
The most important limitation of this study is our measure of future stalking and stalking-related violence. Police records of stalking may severely underestimate recidivism. More than half of stalking cases in Australia are not reported to police (Australian Bureau of Statistics, 2016). Similarly, in the United States up to 60% of people who stalk have further contact with the victim following criminal justice intervention, and fewer than half of these are charged with stalking (Mohandie et al., 2006). While we maximized identification of stalking by using police-recorded charges (including those that did not proceed to court) rather than convictions, and by identifying patterns of related charges involving the same victim that indicated the presence of stalking, it is likely that we still severely underestimated the true rate of further stalking in the sample. We were also unable to identify any stalking charges or periods of incarceration in other jurisdictions. These factors would have inflated the false positive and true negative rates in classification and predictive analyses.
The other major limitation was the lack of inter-rater reliability data. Because of the real-world nature of the study and retrospective data collection by research assistants untrained in the SRP, we could not collect the data necessary to calculate reliability statistics. Previous research in the same clinic has demonstrated excellent inter-rater reliability (McEwan et al., 2018), but it is possible that those results may not generalize to the current data. It should also be borne in mind that up to 29% of the current sample overlaps with the sample reported in McEwan et al. (2018). Results involving total scores must be interpreted in light of the prorating approach used to create a scaled total score for each risk domain across stalking types, which may produce different results from analyses conducted on raw total scores within each stalking type. Finally, given that some authors of the SRP were involved in the conduct of this study and were responsible for approximately 14% of the SRP assessments, these results may be affected by authorship bias to some degree (Singh et al., 2013).
With these limitations in mind, the results provide the first evidence that Stalking Violence risk judgments may be predictively valid. They also provide further evidence that SRP risk judgments about recurrent and serial stalking give a valid indication of differences in risk over the longer term, though they suggest that these judgments are less effective in the short to medium term if there are risk management strategies in place. The findings did not support the validity of Persistence risk judgments, although, as discussed, it is possible that sampling and the methodological limitations of a field trial involving active risk management influenced by the risk assessment could explain these results. The findings demonstrate that anchoring judgments to the number of risk factors present provides a sound basis for a valid risk judgment. They also highlight the importance of understanding that SPJ risk judgments are statements about relative risk in different groups rather than probability estimates and communicating them accordingly. Like other SPJ guidelines, the SRP is intended to help inform the assessor’s thinking about why a person might present at higher risk at particular times. Future research would benefit from investigating whether formulations informed by the SRP lead to reliable hypotheses about the reasons for increased risk, and to the identification and implementation of effective strategies that actually reduce risk and stalking over time.
Supplemental Material
sj-docx-1-cjb-10.1177_00938548261445853 – Supplemental material for The predictive validity of the Stalking Risk Profile in everyday practice
Supplemental material, sj-docx-1-cjb-10.1177_00938548261445853 for The predictive validity of the Stalking Risk Profile in everyday practice by Troy E. McEwan, Reneta Slikboer, Benjamin Spivak, Rajan Darjee, Nina Papalia, Ashley Dunne, Melanie Simmons and James R. P. Ogloff in Criminal Justice and Behavior
Footnotes
Authors’ Note:
The authors wish to thank the Victorian Institute for Forensic Mental Health (Forensicare), Victoria Police and Corrections Victoria for assistance with data collection. This project was funded by the Catalyst Consortium. The views expressed herein are solely those of the authors and do not necessarily reflect the views or policies of Forensicare, Victoria Police or Corrections Victoria. Troy E. McEwan and James R. P. Ogloff are co-authors of the Stalking Risk Profile (SRP). Neither has a financial interest in the SRP. Prof. McEwan receives income from facilitating SRP training. Due to legal and ethical restrictions on the disclosure and use of sensitive and personal information, which apply given the nature of the research, supporting data are not available.
References
