Abstract
Global research about empowerment self-defense (ESD)—a sexual assault resistance intervention recommended as a component of a comprehensive sexual assault prevention strategy—continues to emerge, with studies reporting positive effects, including reduced risk of sexual assault victimization. Researchers have suggested ESD may produce additional positive public health outcomes beyond the prevention of sexual violence, but more research is needed to understand the benefits associated with ESD training. However, to conduct high-quality research, scholars have suggested a need for improved measurement tools. To better understand these measurement gaps, the purpose of this study was to identify and review measures used in ESD outcome studies; and in doing so, to determine the range of outcomes previously measured in quantitative studies. Within the 23 articles meeting study inclusion criteria, there were 57 unique scales that measured a range of variables. These 57 measures were grouped into nine construct categories: assault characteristics (n = 1); attitudes and beliefs (n = 6); behavior and behavioral intentions (n = 12); fear (n = 4); knowledge (n = 3); mental health (n = 8); any past unwanted sexual experiences (n = 7); perception of risk and vulnerability (n = 5); and self-efficacy (n = 11). Except for mental health, most scales were developed in the Global North using college student populations, so measures for diverse populations (e.g., diverse in age, culture, ethnicity, geographical origin) are critically needed. Future research should focus on identifying and/or developing standardized tools that measure the full constellation of targeted outcomes. Evaluation of the methodological quality of studies assessing psychometric performance of the tools should also be prioritized.
Assessing Empowerment Self-Defense Intervention Outcomes: A Comprehensive Review of Measures
The magnitude of violence against women, including sexual violence, persists as a major social problem worldwide (Dartnall & Jewkes, 2013; Garcia-Moreno et al., 2006; World Health Organization [WHO], 2018). The Centers for Disease Control and Prevention (CDC) defines sexual violence as any sexual act that occurs without the consent of the victim, including when the person is unable to provide consent. Sexual assault specifically refers to unwanted, nonconsensual sexual contact (Basile et al., 2014). Global research about empowerment self-defense (ESD) as a strategy to prevent sexual assault continues to emerge, with studies reporting positive effects, including reduced risk of sexual assault victimization (Hollander, 2014; Senn et al., 2015, 2017, 2021; Sinclair et al., 2013). There is theoretical support and evidence from qualitative research suggesting that ESD training may have many additional positive outcomes beyond reduced rates of sexual violence. To empirically examine these outcomes, ESD research scientists have identified a need for a comprehensive range of new and/or revised measurement tools to facilitate future ESD research around the globe. The goal of this study was to identify and review measures used in previous ESD studies. In doing so, I also identify the range of intervention outcomes that researchers have previously measured. Results of this study are intended to help identify needs for developing new standardized measures and/or improving existing measures for use in ESD research.
Background
Current estimates indicate that one out of three women worldwide has experienced some form of physical and/or sexual violence (WHO, 2018). In the United States, 19.3% (one in five) of women have experienced a completed or attempted rape, but for Black, Indigenous, and People of Color (BIPOC) women, these rates are even higher (Breiding et al., 2014). Lifetime estimates of completed or attempted rape are 32.3% among multiracial women, 27.5% among American Indian/Alaska Native women, and 21.2% among Black women (Breiding et al., 2014). Sexual minority women are also at increased risk of sexual assault; almost half of bisexual women (46.1%) have experienced a completed rape in their lifetime, and 74.9% have experienced another form of sexual violence (Walters et al., 2013). For lesbian women, the rates are 13.1% (completed rape) and 46.4% (other form of sexual violence) (Walters et al., 2013).
Experiencing sexual violence has been linked to both acute and chronic health-related outcomes. For example, sexual violence victimization is associated with an increased risk for mental health problems, physical health problems, risk behaviors (e.g., substance misuse, sexual risk-taking), disruption of daily life, disengagement from work and/or school, increased risk of revictimization, and disruptions to personal relationships (Basile & Smith, 2011; Jordan et al., 2010; Kendall-Tackett et al., 2013; Smith & Breiding, 2011; Walker et al., 2019).
Prevention and ESD
Because sexual assault can contribute to long-term, damaging outcomes for many survivors, interventions are needed to prevent sexual assault and promote improved physical and mental health outcomes for survivors. It is well established that prevention approaches should be comprehensive and should target multiple levels of the social-ecological model (i.e., individual, relationship, community, and societal) to have a population-level effect on sexual violence (Basile et al., 2016). Empowerment-based education programs for women have been identified by the CDC as a recommended component of a comprehensive prevention strategy (Basile et al., 2016).
One category of empowerment-based training for women is called Empowerment Self-Defense (ESD). During the early second-wave feminist anti-rape movement of the 1960s and 1970s, women began developing a model of self-defense (now referred to as ESD) in response to the realization that available strategies at the time (e.g., martial arts training) were not adequately preparing women to address their highly prevalent real-world experiences with sexism and violence (Bevacqua, 2000). Whereas other self-defense programs tend to focus primarily or exclusively on physical resistance skills to deter attacks from strangers, ESD includes additional, practical skills training designed to interrupt or deter unwanted, gendered behaviors across a continuum of violence, with a predominant focus on the behaviors of acquaintances (Hollander, 2016; Senn et al., 2018).
ESD programs have multiple intermediate and long-term objectives. The intermediate objectives are to increase participants’ ability to label sexual violence and detect risk behaviors, shift beliefs about gender, increase assertiveness, increase self-defense self-efficacy, and decrease emotional and psychological barriers to engaging in assertive and protective behaviors (Hollander, 2018; Nurius & Norris, 1996; Rozee & Koss, 2001; Thompson, 2014). The long-term outcomes include decreased sexual violence victimization, decreased fear, decreased self/victim blame, increased healing, and societal-level shifts in gender norms reflected through decreased population rates of sexual violence perpetration (Gidycz et al., 2006; Hollander, 2021; McCaughey, 1997; Rozee & Koss, 2001; Senn et al., 2018). Consequently, ESD program participants do learn and practice physical skills (e.g., striking and kicking targets, releasing from grabs, holds, and pins), but they also learn and practice evidence-informed and theory-driven strategies that address gender socialization; risk factors for sexual violence within diverse populations (e.g., age, ability, race, ethnicity, geography, religion, sexuality, prior exposure to violence, experience with high episodic drinking, etc.); detection of risky behaviors/situations from potential perpetrators; and identification of psychological and emotional barriers to resistance. ESD program activities incorporate role-play scenarios whereby students practice applying verbal and non-verbal skills in situations that are relevant to the types of unwanted experiences they tend to or are likely to encounter based on their social identity and current environment.
Several national and international studies about ESD programs have demonstrated a significant reduction in sexual assault victimization among participants (e.g., Hollander, 2014; Senn et al., 2017, 2021; Sinclair et al., 2013). A multi-site randomized controlled trial (RCT) of a 12-hour program called Enhanced Assess, Acknowledge, Act (EAAA) demonstrated that program participants, compared to the treatment-as-usual control group participants, were 46% less likely to have experienced a completed rape and 63% less likely to have experienced an attempted rape at a one-year follow-up period (Senn et al., 2015, 2017). Risk of experiencing sexual assault was reduced for women both with and without a history of prior victimization since the age of 14 (Senn et al., 2015, 2017). Much more research is needed, however, to examine the effects of ESD on diverse women and girls in global settings.
It should be emphasized that the burden is not on women and girls to prevent sexual assault. ESD programming makes explicit that women have the ability and the right to defend themselves against violence, but this does not mean that they are responsible for preventing it. Rather, the responsibility for prevention is ascribed to the perpetrator, and women should neither be expected to prevent an assault nor be blamed if they experience one (Hollander, 2018; McCaughey, 1997; Senn et al., 2018). Because sexual violence is a complex and deeply rooted social problem, it is not expected that a single strategy will be effective in addressing the innumerable social structures that facilitate violence. It is necessary both to acknowledge that women are not responsible for preventing violence and to continue empowering women in ways that help reduce their likelihood of experiencing violence (Hollander, 2014, 2018).
Study Rationale
Although the evidence suggests that ESD is a promising intervention for sexual assault prevention, more research (such as replication studies, implementation studies, and effectiveness trials in uncontrolled research environments) is needed to strengthen the evidence base supporting ESD and to promote program adoption (Basile et al., 2016; Brekke et al., 2007; Kerr-Wilson et al., 2020). A challenge for ESD research is the inconsistent selection of standardized instruments across studies, even when measuring the same construct. The lack of consistency makes it difficult to draw conclusions or make comparisons across studies. This inconsistency is also problematic because multiple replication and implementation studies are needed before the external validity of intervention outcomes can be determined. Engaging in this much-needed replication and implementation research requires the necessary tools, namely a consistent, comprehensive, and theoretically driven battery of measures with strong psychometric properties that can be used across diverse populations.
One reason for the inconsistent use of outcome measures may relate to the reported concerns about the existing instruments. Some measures used in past research studies are noted for being too lengthy, incongruent with targeted ESD outcomes, or outdated in their language and context (Hollander, 2018). Other researchers have made an appeal for improved measures for risk detection (e.g., Parks et al., 2016; Senn et al., 2021; Vitek et al., 2018), and for a multi-item measure for sexual assault risk perception and optimism bias (Senn et al., 2021).
Although scholars have pinpointed some of the concerns related to ESD measurement science, there is not yet, to the best of our knowledge, a review of the measures used in ESD outcome research. To address this gap and to help inform recommendations for future ESD measurement science, the goal of the current study was to compile a comprehensive and descriptive record of the standardized tools used previously to measure ESD outcomes in diverse and global populations. The study is guided by the following research question: What are the characteristics of existing measures that have been used to measure outcomes associated with participation in ESD programs? Reporting of this systematic review follows the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) (Shamseer et al., 2015).
Search Strategies
The measures included in the review were drawn from peer-reviewed articles that were identified through database searches and reference harvesting. For the database searches, a Boolean string of keywords was entered into seven databases: Scopus, EMBASE, Academic Search Complete, APA PsycINFO, Psychology and Behavioral Sciences Collection, Social Work Abstracts, and SocINDEX. The Boolean string included keywords to identify articles reporting quantitative results from ESD research studies. The first author consulted with key informant scholars in the field of ESD research to determine the final keyword search string: (self-defense OR self defense) AND (sex* assault OR sex* violence) AND (questionnaire OR assess* OR scale OR instrument OR measure*).
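The structure of the search string can be made explicit with a short sketch. The Python below is illustrative only and was not part of the study procedure; the variable and function names are hypothetical. It groups the reported keywords into three concept blocks, joins synonyms within a block with OR, and joins the blocks with AND.

```python
# Keyword groups copied from the search string reported in the text.
# The group names are illustrative labels, not terms used by the authors.
TOPIC_TERMS = ["self-defense", "self defense"]
VIOLENCE_TERMS = ["sex* assault", "sex* violence"]
MEASUREMENT_TERMS = ["questionnaire", "assess*", "scale", "instrument", "measure*"]


def or_group(terms):
    """Join synonyms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Join each parenthesized OR-group with AND."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(TOPIC_TERMS, VIOLENCE_TERMS, MEASUREMENT_TERMS)
print(query)
```

Keeping each concept block as a separate list makes it straightforward to add or remove a synonym and regenerate the same string for each database, although individual databases may require their own truncation or phrase syntax.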
The database search results were screened systematically by two coders using inclusion and exclusion criteria to identify the articles included in the review. The screening occurred sequentially in the following order: article titles, article abstracts, and the full article. Articles were organized and managed during the multiple stages of review using Zotero, a citation management software.
The second method for identifying articles was reference harvesting, which involved reviewing the reference lists of the articles that were identified using the database search. First, the title of each reference was screened against the inclusion and exclusion criteria (see Eligibility Criteria). The articles for any newly identified titles were retrieved online for further review. The abstracts and, if relevant, the full articles were then screened against the same criteria. Both search methods were conducted in October 2020.
Eligibility Criteria
Articles were included in the sample for the review if they met certain criteria. Peer-reviewed journal articles were included if measurement scales associated with ESD program outcomes were used in the study. Only articles published in English were included due to feasibility, but there were no restrictions for year of publication. Articles were excluded if they only used qualitative methods to evaluate ESD program outcomes. Studies that were cross-sectional in nature (i.e., were not intervention studies) were excluded. Not all ESD programs are labeled as such, so programs were also screened for specific elements. To be included, programs had to address knowledge components and skills practice (both verbal and physical) to deter or interrupt unwanted sexual behaviors. Sexual violence resistance programs that did not include an element of verbal and/or physical skills practice were excluded.
Data Extraction
Multiple descriptive features of the measures were extracted and compiled into purposively designed data extraction tables. Characteristics that were obtained included the name of the measure, subscales (if relevant), number of items in the measure, response options, scoring, mode of completion, study population, and psychometric properties when used and reported in ESD research.
Results
The database searches yielded a total of 128 papers. After removing duplicates, there were 91 papers to screen. Figure 1 summarizes the screening and review process. A total of 23 studies were included in the final sample (Baiocchi et al., 2017; Ball & Martin, 2012; David et al., 2006; Decker et al., 2018; Gidycz et al., 2006; Gidycz et al., 2015; Hollander, 2004; Hollander, 2014; Hollander & Cunningham, 2020; Holtzman et al., 2014; Mouilso et al., 2011; Munsey et al., 2015; Orchowski et al., 2008; Ozer & Bandura, 1990; Pinciotti & Orcutt, 2018; Sarnquist et al., 2014; Sarnquist et al., 2017; Senn et al., 2011; Senn et al., 2015; Senn et al., 2017; Sinclair et al., 2013; Weitlauf et al., 2000; Weitlauf et al., 2001): 18 were conducted in North America and 5 were conducted in sub-Saharan Africa. There were 57 unique measures (see Table 1) used across these 23 ESD outcome studies. The nine concepts featured across the measures include the following: assault characteristics (n = 1); attitudes and beliefs (n = 6); behavior and behavioral intentions (n = 12); fear (n = 4); knowledge (n = 3); mental health (n = 8); any past unwanted sexual experiences (n = 7); perception of risk and vulnerability (n = 5); and self-efficacy (n = 11).
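The counts reported above can be cross-checked with simple arithmetic. The sketch below is an illustrative consistency check rather than part of the review procedure; the variable names are hypothetical, and the numbers are those stated in the text.

```python
# Counts as reported in the Results paragraph.
records_identified = 128
records_after_deduplication = 91
duplicates_removed = records_identified - records_after_deduplication

studies_by_region = {"North America": 18, "sub-Saharan Africa": 5}

measures_by_category = {
    "assault characteristics": 1,
    "attitudes and beliefs": 6,
    "behavior and behavioral intentions": 12,
    "fear": 4,
    "knowledge": 3,
    "mental health": 8,
    "any past unwanted sexual experiences": 7,
    "perception of risk and vulnerability": 5,
    "self-efficacy": 11,
}

total_studies = sum(studies_by_region.values())
total_measures = sum(measures_by_category.values())

print(f"Duplicates removed: {duplicates_removed}")
print(f"Studies included: {total_studies}")
print(f"Unique measures: {total_measures} across {len(measures_by_category)} categories")
```

The regional counts sum to the 23 included studies, and the nine category counts sum to the 57 unique measures.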

Figure 1. Flow diagram of search and selection process.
Table 1. Summary of Findings.
Assault Characteristics
One measure assessed the characteristics of participants’ assault experiences. The measure used multiple question types to gather details about the most severe incident of sexual assault following participation in a sexual assault prevention program (Gidycz et al., 2015). Respondents were asked to provide details about the number of times they had been assaulted, the resistance tactics they used, and the degree to which they blamed themselves or the perpetrator. The measure was administered to a sample of college women. Response options varied by item. For example, the attribution of blame was rated on a Likert-type scale ranging from 1 (not at all responsible) to 5 (very much responsible) (Gidycz et al., 2015).
Attitudes and Beliefs
There were six tools used to measure attitudes and beliefs, although only five of them were used as outcome measures; the sixth was used as a check for acquiescent response bias on the accompanying self-report measures. These measures included: Perceived Causes of Rape Scale (Cowan & Campbell, 1995); Rape Attributions Questionnaire—Causes of Rape (Frazier, 2003); Levenson’s Internality, Powerful Others, and Chance Scales (Levenson, 1972); Liberal Feminism Ideology Scale—Short Form (Morgan, 1996); Illinois Rape Myth Acceptance Scale—Short Form (Payne et al., 1999); and Marlowe-Crowne Social Desirability Scale—Short Form (Reynolds, 1982). Two of the measures, the Perceived Causes of Rape Scale and the Rape Attributions Questionnaire, were created to determine respondents’ beliefs about the causes of rape (Cowan & Campbell, 1995, 1997; Frazier, 2003). The Illinois Rape Myth Acceptance Scale—Short Form is a 45-item measure of rape myth beliefs (Payne et al., 1999), where rape myths are defined as “attitudes and beliefs that are generally false but are widely and persistently held, and that serve to deny and justify male sexual aggression against women” (Lonsway & Fitzgerald, 1995, p. 134). The Liberal Feminism Ideology Scale—Short Form is an 11-item measure of feminist attitudes related to goals of feminism and feminist ideology (Morgan, 1996; Woodbrown, 2015). Levenson’s Internality, Powerful Others, and Chance Scales is a measure of locus of control, defined as “expectancies for control as they relate to involvement in voluntary social action activities” (Levenson, 1972, p. 261). As the name suggests, this measure has three dimensions: internality, powerful others, and chance. The sixth scale, the Marlowe-Crowne Social Desirability Scale—Short Form, is a measure of social desirability (Reynolds, 1982). This scale was designed to function as an adjunct measure to determine the extent to which social desirability affects participant responses on the other self-report measures related to the primary purpose of the study. As such, it is not used as an outcome measure to examine program impact, but rather to assess acquiescent response bias, that is, the validity of participants’ responses on the accompanying measures.
Psychometric information was reported for all six of these measures of attitudes and beliefs. Levenson (1972) reported questionable to acceptable ranges of reliability for the three subscales, with alpha coefficients of 0.64, 0.77, and 0.78. However, when used in the ESD study, Weitlauf et al. (2000) reported unacceptable internal consistency for two of the three subscales in Levenson’s Internality, Powerful Others, and Chance Scales. Coefficient alphas were 0.17, 0.40, and 0.61, respectively, so only the data from the Powerful Others subscale were retained for analysis in their study. The remaining five measures reported reliability metrics that were acceptable or good.
Sample populations were somewhat diverse among the attitudes and beliefs measures. The Liberal Feminism Ideology Scale—Short Form was used on a population of undergraduate college women, but the remaining five measures were tested on mixed-gender populations (or unspecified gender, as in the case of the Rape Attributions Questionnaire [RAQ]). Most of the measures were tested with adult populations, either undergraduate students or adults, but the Perceived Causes of Rape Scale (Cowan & Campbell, 1995) was unique in that a modified version with simpler language was evaluated on a small sample of high school students and found to have acceptable reliability.
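For readers less familiar with the reliability statistic reported here, the following sketch shows how Cronbach's coefficient alpha is computed from item-level data. It is a minimal illustration, not the procedure used in any of the reviewed studies. The function names are hypothetical, and the verbal labels follow a commonly cited rule of thumb (0.70 and above acceptable, 0.80 and above good, below 0.50 unacceptable), which is an assumption rather than a threshold stated by the original authors.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def reliability_label(alpha):
    """Map alpha to a verbal label using an assumed rule of thumb."""
    if alpha >= 0.9:
        return "excellent"
    if alpha >= 0.8:
        return "good"
    if alpha >= 0.7:
        return "acceptable"
    if alpha >= 0.6:
        return "questionable"
    if alpha >= 0.5:
        return "poor"
    return "unacceptable"


# Coefficients reported in the text for Levenson (1972) and Weitlauf et al. (2000).
for reported_alpha in (0.64, 0.77, 0.78, 0.17, 0.40, 0.61):
    print(reported_alpha, reliability_label(reported_alpha))
```

Under this rule of thumb, the labels assigned to the reported coefficients match the descriptions in the text: the Levenson (1972) values fall in the questionable to acceptable range, and two of the three Weitlauf et al. (2000) values are unacceptable.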
Behavior and Behavioral Intentions
There were 12 scales used to measure actual behavior and behavioral intentions. These scales included the following: The Aggression Questionnaire (Buss & Perry, 1992); Sexual Communication Survey (Hanson & Gidycz, 1993); Dating Behavior Survey (Hanson & Gidycz, 1993); Sexual Assault Self-Protection Scale (Holtzman & Menning, 2015); Silencing the Self Scale (Jack & Dill, 1992); Dating Self-Protection Against Rape Scale (Moore & Waterman, 1999); Sexual Assertiveness Scale (Morokoff et al., 1997); Resistance Tactics (Orchowski et al., 2008); Participant and Avoidant Behavior (Ozer & Bandura, 1990); Behavior Test of Self-Protective Skill (Ozer & Bandura, 1990); Rathus Assertiveness Schedule (Rathus, 1973); and the Sexual Assertiveness Questionnaire for Women (Walker, 2006).
Three of the tools—the Sexual Communication Survey (SCS; Hanson & Gidycz, 1993), the Sexual Assertiveness Scale (SAS; Morokoff et al., 1997), and the Sexual Assertiveness Questionnaire for Women (SAQ-W; Walker, 2006)—measured college women’s assertive sexual communication. The SCS was designed to assess respondents’ perceptions about communication of sexual intent in dating situations (Hanson & Gidycz, 1993). The SAS measured three dimensions of assertive sexual communication, including initiation of sexual activity, refusal of sexual activity, and communication around prevention of pregnancy and sexually transmitted illness (Morokoff et al., 1997). The SAQ-W measured four dimensions of sexual assertiveness: relational sexual assertiveness, sexual confidence and communication, commitment focus, and sex-related negative affect (Walker, 2006).
All measures, except for the behavior test of self-protective skill (Ozer & Bandura, 1990), had items that were self-reported by the respondent. The behavior test of self-protective skill was a behavioral observation of three mock assault scenarios in which participants were rated on their strike proficiency and overall effectiveness. Ozer and Bandura (1990) reported strong inter-rater reliability among the raters (r = 0.73 and 0.81, respectively). Reliability data were not reported for the behavior test of self-protective skill (Ozer & Bandura, 1990), resistance tactics (Orchowski et al., 2008), and participant and avoidant behavior (Ozer & Bandura, 1990). Most of the measures (n = 9) were used with a study population composed of women (seven with college women). The Rathus Assertiveness Schedule (Rathus, 1973) was evaluated using both men and women college students between the ages of 17 and 27, and the Dating Self-Protection Against Rape Scale (Moore & Waterman, 1999) included both men and women college students.
Fear
There were four measures related to fear. Both measures by Ozer and Bandura (1990) were single items (i.e., negative thoughts and anxiety arousal). For “negative thoughts,” participants rated on a 6-interval scale how frequently they had thoughts about sexual assault, and the “anxiety arousal” item measured on a 10-interval scale the level of anxiety about the possibility of experiencing a sexual assault (Ozer & Bandura, 1990). No further information about these items was reported.
The two additional fear scales were much lengthier in comparison. The Fear of Rape Scale (Senn & Dzinas, 1996) is a 31-item measure with Likert-type items and true/false items, and the Perceptions of Dangerous Situations Scale is a 37-item measure that is completed three times, so a respondent must provide 111 responses (Hughes et al., 2003). The 37 items are repeated three times to assess participants’ perception of fear of rape, likelihood of victimization, and confidence in being able to manage dangerous situations (Hughes et al., 2003). The reliability of the subscales ranged from poor to good.
Knowledge
There were three scales used to measure knowledge. The Ohio University Sexual Assault Risk Reduction (SARR) Program Knowledge Measure used 30 items to measure a variety of topics covered in the SARR program (Gidycz et al., 2006). These items were a mix of multiple-choice, true/false, and short-answer questions. The second knowledge measure was an item about self-defense tactics in which participants provided a response to an open-ended item: “If a man I knew (e.g., a date or acquaintance) tried to force me to have sex with him when I didn’t want to, I would. . ..” (Senn et al., 2017, p. 151). Two coders scored the responses into dichotomous categories based on whether the respondent mentioned an effective resistance strategy as defined by Ullman (1997). The two coders also recorded a count of the number of forceful resistance strategies mentioned in the participant’s response (Senn et al., 2017). Inter-rater agreement ranged from good to excellent, with Cohen’s kappa coefficients ranging from 0.82 to 0.91 (Senn et al., 2017). The study populations for both of these measures included college women (Gidycz et al., 2006; Senn et al., 2011, 2017). With their community ESD sample, Hollander and Cunningham (2020) used a 4-item tool adapted from Gordon and Riger (1989) that assessed respondents’ knowledge about typical outcomes associated with resisting sexual assault.
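The inter-rater statistic reported for the open-ended item, Cohen's kappa, corrects the observed agreement between two coders for the agreement expected by chance. The sketch below illustrates the calculation for dichotomous codes. The function name and the example codes are hypothetical and are not data from Senn et al. (2017).

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assign one category per response."""
    n = len(coder_a)
    # Proportion of responses on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    categories = set(coder_a) | set(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical codes: 1 = response mentions an effective resistance strategy, 0 = does not.
coder_a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
coder_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 2))
```

In this hypothetical example the coders agree on 9 of 10 responses, chance agreement is 0.50, and kappa is therefore 0.80.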
Mental Health
Recognizing that self-defense programs can attract women with prior experiences of sexual violence, authors of some studies included measures of mental health to determine whether self-defense training could affect mental health outcomes. In total, there were eight tools used to measure mental health outcomes: the Beck Depression Inventory (BDI; Beck et al., 1988); Symptom Checklist-90—Revised (SCL-90-R; Derogatis, 1977); Posttraumatic Stress Diagnostic Scale (PDS; Foa et al., 1997); Ways of Coping Questionnaire (WCQ; Folkman & Lazarus, 1988); Emogram (Computer software, n.d.; Mudge, 2003); Rosenberg Self-Esteem Scale (RSES; Rosenberg, 1965); Washington Self-Description Questionnaire (WSDQ; Smoll et al., 1993); and the Posttraumatic Stress Disorder Checklist—Civilian Version (PCL-C; Weathers et al., 1993).
Of these eight measures, two measured self-esteem: the WSDQ (Smoll et al., 1993) and the RSES (Rosenberg, 1965). Two instruments (PCL-C and PDS) were used to measure PTSD symptoms (Foa et al., 1997; Weathers et al., 1993). Psychological distress was measured with the SCL-90-R (Derogatis, 1977). The BDI (Beck et al., 1988) was used to measure depression symptoms, and the WCQ (Folkman & Lazarus, 1988) was used to measure behavioral and cognitive coping strategies. Only the 8-item “Escape and Avoidance” subscale was used in the ESD study that examined coping as an outcome (Mouilso et al., 2011). This subscale measures avoidant and escape behaviors used to cope with distress (Folkman & Lazarus, 1988).
Most of these mental health measures are completed via self-report, except for the Emogram (Mudge, 2003). The Emogram is a computer-based program that analyzes participants’ emotional state and their changes in 11 emotions: anger, anxiety, contempt, disgust, distress, fear, happiness, interest, sadness, shame, and surprise (Mudge, 2003).
Unwanted Sexual Experiences
All eight measures of unwanted sexual experiences are self-report measures. Of these eight measures, three were versions of the Sexual Experiences Survey (SES; Koss et al., 1985; Koss et al., 1987; Koss et al., 2007). Three of the tools were single-item measures to assess sexual assault victimization (Baiocchi et al., 2017; Decker et al., 2018; Holtzman & Menning, 2015; Sarnquist et al., 2014; Sarnquist et al., 2017; Sinclair et al., 2013; Senn et al., 2011). Pinciotti and Orcutt (2018) used five items from the National Violence Against Women Survey to measure rape and attempted rape (Tjaden & Thoennes, 1998). Ozer and Bandura (1990) used a self-report measure of past experiences (i.e., those that occurred prior to the intervention) with physical and sexual assault, but additional details about the measure were not described.
The most frequently used scale to measure unwanted sexual experiences was the SES (Koss et al., 1985; Koss et al., 1987; Koss et al., 2007). The SES was originally developed in 1982 to examine both sexual violence victimization and perpetration but has been revised and refined over the years to improve reliability and validity, and to capture legal and policy definitions of unwanted sexual experiences more accurately (Koss et al., 2007). The instrument is a self-report measure using behaviorally specific items to identify unwanted sexual experiences. The short form has 10 questions to determine the number of times respondents experienced a particular behavior within the past year, while the long form includes an additional 11 questions and a second recall time period. Respondents answer each item considering their experiences in the past 12 months and since the age of 14; when the SES is used as an outcome measure, respondents consider the period since participating in a program or another specified timepoint (Koss et al., 2007). The SES classifies respondents into six categories: rape, attempted rape, coercion, attempted coercion, sexual contact, and no unwanted sexual experiences. Responses to the SES are scored by calculating prevalence for each category or by calculating mutually exclusive categories to determine frequency of experiences according to a respondent’s most severe experience (Koss et al., 2007).
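The two scoring approaches described above can be illustrated with a brief sketch. This is not the official SES scoring syntax. The severity ordering, the function names, and the respondent data are assumptions introduced for illustration; only the six category names come from the text.

```python
from collections import Counter

# Assumed ordering from least to most severe; the text lists the six
# categories but does not state their rank order.
SEVERITY_ORDER = [
    "no unwanted sexual experiences",
    "sexual contact",
    "attempted coercion",
    "coercion",
    "attempted rape",
    "rape",
]


def most_severe(endorsed):
    """Return the single most severe category a respondent endorsed."""
    rank = max((SEVERITY_ORDER.index(c) for c in endorsed), default=0)
    return SEVERITY_ORDER[rank]


def prevalence(respondents, category):
    """Proportion of respondents who endorsed a category (categories may overlap)."""
    return sum(category in endorsed for endorsed in respondents) / len(respondents)


# Hypothetical respondents, each represented by the set of categories endorsed.
respondents = [
    set(),
    {"sexual contact"},
    {"sexual contact", "attempted rape"},
    {"coercion", "rape"},
]

exclusive_counts = Counter(most_severe(endorsed) for endorsed in respondents)
print(exclusive_counts)
print(prevalence(respondents, "sexual contact"))
```

Prevalence scoring counts a respondent in every category endorsed, whereas most-severe scoring assigns each respondent to exactly one category, so the mutually exclusive counts always sum to the sample size.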
No measure other than the SES showed evidence of having undergone psychometric evaluation. Although on the lower end of acceptability, all versions of the SES have demonstrated Cronbach’s alpha coefficients greater than 0.70 (Koss et al., 2007). Psychometric evaluation of the SES has also demonstrated good internal consistency for diverse populations (Johnson et al., 2017). For example, among a sample of African American adolescent women, Cecil and Matson (2006) reported good internal consistency, convergent validity, and support for discriminant validity. The SES was one of two measures to be used with adolescent populations; the other was the single-item measure used in the Kenya and Malawi studies (Baiocchi et al., 2017; Decker et al., 2018; Sarnquist et al., 2014, 2017; Sinclair et al., 2013). See Discussion for additional details about measures of unwanted sexual experiences used in a similar study in Kenya (Rosenman et al., 2020).
When administering the SES, Senn et al. (2011) included an additional item to measure “close call” experiences, which referred to experiences when the respondent successfully applied resistance strategies (Testa et al., 2006). Participants were asked, “Have you (since the program ended/in the last 3 months) had a dating situation where you believe you AVOIDED sexual coercion or sexual assault by your actions? (e.g., removing yourself from the situation, calling a friend, etc.)”; and if participants answered “yes,” they were then asked to provide more details about that experience (i.e., “Can you tell us what happened?”) (Senn et al., 2011, p. 79).
Perception of Risk and Vulnerability
Five measures have been used to measure personal vulnerability and perceived risk: Perceived Risk of Acquaintance Rape (Gray et al., 1990); Risk Perception Survey (Messman-Moore & Brown, 2006); Perception of Risk—Michael scenario (Norris et al., 1999 with added items from Testa et al., 2006); Personal Vulnerability (Ozer & Bandura, 1990); and Risk Assessment and Discernment (Ozer & Bandura, 1990). Two of these measures were scenario-based (Messman-Moore & Brown, 2006; Norris et al., 1999; Testa et al., 2006). The hypothetical situations in the measures involve an escalation of sexually coercive behaviors by a male acquaintance or a male stranger. The sequence of the man’s behaviors, which ultimately ends with him sexually assaulting the protagonist, becomes known to the respondent after the first administration of the measure. If the measure were administered a second time (e.g., as a post-test survey), participants would likely characterize the man’s escalating behaviors as indicators of risk because they already know he eventually sexually assaults the protagonist. Repeated administration of identical vignettes, therefore, is not recommended because doing so would yield biased responses (Senn et al., 2017). The Perceived Risk of Acquaintance Rape (Gray et al., 1990) and Personal Vulnerability (Ozer & Bandura, 1990) each have a single item. Ozer and Bandura (1990) also created a Risk Assessment and Discernment measure with two items. The only measure of risk perception with reported psychometric properties is the Perception of Risk—Michael scenario (α = 0.81) (Norris et al., 1999; Testa et al., 2006).
Self-Efficacy
Various types of self-efficacy were studied using 11 different measures, all of which are self-report measures. The measures included the following: Coppel’s Self-Efficacy Scale (Coppel, 1980); self-defense self-efficacy according to assailant type (Hollander, 2014); self-efficacy ratings (Marx et al., 2001); coping self-efficacy (Ozer & Bandura, 1990); cognitive control self-efficacy (Ozer & Bandura, 1990); the Physical Self-Efficacy Scale (Ryckman et al., 1982); General Perceived Self-Efficacy (GSE; Schwarzer & Jerusalem, 1995); Self-Efficacy Scale (Sherer et al., 1982); domain-specific self-efficacy (Weitlauf et al., 2001); task-specific self-efficacy (Weitlauf et al., 2000); and self-defense self-efficacy (Weitlauf et al., 2001). The Coping Self-Efficacy Scales by Ozer and Bandura (1990) included three domains: self-defense self-efficacy, interpersonal self-efficacy, and activities self-efficacy. Ozer and Bandura’s self-defense self-efficacy subscale was used to inform the development of the two other self-defense self-efficacy scales by Marx et al. (2001) and Weitlauf et al. (2001). Self-defense self-efficacy was the most frequently measured form of self-efficacy (Ball & Martin, 2012; David et al., 2007; Gidycz et al., 2006; Gidycz et al., 2015; Hollander, 2004; Hollander, 2014; Hollander & Cunningham, 2020; Orchowski et al., 2008; Ozer & Bandura, 1990).
Eight of the 11 self-efficacy measures reported at least some information about psychometric evaluation of the scale. The three measures that did not report this information were the self-defense self-efficacy according to assailant type (Hollander, 2014), the self-efficacy ratings (Marx et al., 2001), and cognitive control self-efficacy (Ozer & Bandura, 1990). The remaining measures were shown to have, at minimum, acceptable reliability.
The GSE is the only measure to have undergone psychometric evaluation with populations diverse in age, gender, race, and nationality (Schwarzer & Jerusalem, 1995). Marx et al. (2001) did not report information about the sample population, but the remaining measures were predominantly evaluated with college women, with the exception of the two Ozer and Bandura (1990) measures, which were evaluated with women between the ages of 18 and 55 years old.
Discussion
The goal of this study was to identify and describe the measures that have been used in ESD intervention research. Across the 23 studies identified through the database search, 57 instruments had been used to measure nine categories of outcomes. There were several notable features about these instruments that have implications for future ESD research and the development of psychometrically sound measures (Table 2).
Implications.
The primary outcomes that ESD interventions aim to achieve are decreased sexual violence victimization, decreased fear, decreased self/victim blame, increased healing, and societal-level shifts in gender norms reflected through decreased population rates of sexual violence perpetration. Although these targeted outcomes were largely represented across the battery of measures, there are many complex considerations that researchers and practitioners should weigh before selecting and using measures described in this study.
Results of this study reveal several concerning attributes among many of the measures. One major concern with many of them is the minimal reporting, and presumably evaluation, of the full range of psychometric properties (i.e., construct validity, content validity, structural validity, internal consistency, measurement invariance and cross-cultural validity, reliability, measurement error, criterion validity, and responsiveness) (Mokkink et al., 2018; Prinsen et al., 2018). This finding reveals a critical need for more robust psychometric testing and reporting of psychometric properties of these measures.
Many existing measures reported in this review were developed using college student and adult populations, so psychometric properties for most measures have not been evaluated with diverse populations, and particularly with adolescents, children, and women in transnational locations. The nature of sexual violence varies across populations diverse in age, race, ethnicity, ability, sexuality, gender, previous exposure to violence, etc., so it is critical to have measures that are valid for diverse populations, including women with heightened risk of victimization (e.g., women who engage in heavy episodic drinking, substance use and misuse, and/or have a prior history of sexual victimization). Additionally, given the prevalence of sexual violence that occurs before the age of 18 (and the corresponding heightened risk of revictimization), there is also an urgent need for measures that are valid and reliable for younger populations.
Most of the scales used in ESD intervention research were developed over 20 years ago in the 1990s and early 2000s. Some constructs such as fear and perception of risk may be susceptible to change over time because these variables could be impacted by social, political, and cultural contexts (Johnson & Johnson, 2021). The perception of risk and vulnerability scales, for instance, were developed in 1990 and 1999 (with additional items added in 2006), and the measures for fear were created in 1990, 1996, and 2003. All of these scales were created long before sexual violence was heavily featured in the public eye (e.g., #MeToo, #SayHerName). The shift in sexual violence discourse may have affected women’s perceptions of risk and vulnerability along with their sense of fear of sexual violence. These social influences are likely to have had differential effects on people according to their race and ethnicity, particularly with the explicit and violent expressions of xenophobia that have increased in recent years. Additional psychometric studies should be conducted to determine the properties of these scales when applied to contemporary and diverse populations.
Many of the studies included in this review used measures of prior victimization. Measures of sexual violence experiences included both single-item and multiple-item measures. While multiple-item measures are often used to characterize various aspects of an abusive situation (Godbout et al., 2009), single-item measures have also been endorsed to balance competing needs: the need to understand the nature of violence and the need to respect the privacy and well-being of respondents (Becker-Blease & Freyd, 2006).
Complicating the decision between using single- or multiple-item measures of violence, there is also no consensus around key victimization outcome variables for prevention trials—there is still debate about whether targeted outcomes “should be total cessation of violence, lower frequency of violent acts, or non-initiation of violence, and whether all violence should be considered together or particular types of violence privileged, or independently examined (such as physical and/or IPV rather than emotional or economic IPV)” (Jewkes et al., 2020, p. 3). While applying multiple measures of violence could be useful to maximize likelihood of examining “the right thing,” it also increases risk of returning positive findings due to chance from multiple testing (Jewkes et al., 2020).
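The multiple-testing concern can be made concrete with a small arithmetic sketch. It assumes the tests are statistically independent, which is a simplification that rarely holds exactly for correlated violence outcomes, and the Bonferroni adjustment shown is one common correction chosen here for illustration rather than a method named by Jewkes et al. (2020). The function names are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across `n_tests` tests.

    Assumes independent tests, each run at significance level `alpha`.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni adjustment."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests
```

With five independent outcome measures each tested at 0.05, `familywise_error_rate(0.05, 5)` is roughly 0.23, meaning there is close to a one-in-four chance of at least one spurious positive finding; a Bonferroni adjustment would test each outcome at 0.01 instead.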
Repeated testing has demonstrated that the SES, commonly used in the United States as a multiple-indicator measure of unwanted sexual experiences, has strong psychometric properties. Less is known about the performance of single-item measures of sexual violence experiences. These single-item measures, however, are congruent with the recommendation to use behaviorally-specific wording, rather than colloquial terms such as rape (Fisher, 2009).
The SES measures multiple forms of sexual victimization, but it also only measures these experiences starting at the age of 14 to reflect most legal definitions of rape in the United States and Canada. Prior sexual assault victimization, including child sexual abuse, is a strong predictor of future sexual assault victimization (Briere, 1992; Messman-Moore & Long, 2000; Walker et al., 2019). Therefore, measuring sexual victimization prior to age 14 could be beneficial to explore how ESD intervention outcomes differ between women who have prior experience of victimization before the age of 14 and those who do not. As is widely recognized, measures for younger populations should be appropriate for their developmental and literacy levels (e.g., instrument design, vocabulary, comprehension, minimum age for achieving valid and reliable responses) (Matza et al., 2013).
Because the SES was informed by legal definitions of rape produced in the Global North, it is not applicable to globally diverse populations, nor to populations under 14. Understandably, the studies in Malawi and Kenya that are reported in this review did not use the SES, and instead used single-item measures. However, in a similar study in Kenya, researchers used seven items to measure victimization (Rosenman et al., 2020). The questions, informed by relevant surveys including the Kenya Violence Against Children Survey and the Stepping Stones project in South Africa, asked for details pertaining to experiences of rape including relationship to the perpetrator, methods of coercion or threat, injury sustained during the assault, presence of alcohol or drugs, and the number of times respondents had been assaulted (Rosenman et al., 2020). Having asked multiple items about victimization, the researchers were able to resolve inconsistent responses and empirically validate their procedure to produce unbiased estimates of baseline rape prevalence. Their approach was particularly novel and relevant to ESD research conducted with younger populations and populations outside of the Global North because, in some cases, measures of sexual assault prevalence have been prone to yielding inconsistent responses (Rosenman et al., 2020). For example, in a study about college women, 20% of responses initially categorized as “rape” were later categorized as “undetermined” because of the inconsistent responses reported in the follow-up surveys (Fisher & Cullen, 2000). Future scholars may benefit from the approaches to survey item development and data validation reported in Rosenman et al. (2020), particularly in culturally and geographically diverse settings.
Measures of fear and attribution of blame were used in prior ESD intervention studies. Out of the four measures of fear, the Fear of Rape Scale was the only one to report strong psychometric properties (Senn & Dzinas, 1996). Although the two measures for attribution of blame, the Perceived Causes of Rape Scale (Cowan & Campbell, 1995; Cowan et al., 1997) and the RAQ (Frazier, 2003), reported strong psychometric properties, both are restrictively lengthy (32 items and 43 items, respectively), were developed over 20 years ago, and have not been widely tested with diverse populations over time to assess for measurement invariance.
Findings from this review indicate that the full constellation of intended outcomes for ESD has not yet been measured, revealing a need to develop and/or apply new scales capable of measuring these outcomes. For example, there was not a conceptual and operationalized measure for “healing.” Several of the mental health measures, such as the PTSD symptom scales and the depression inventory, were used to measure mental health symptomology, but healing from sexual trauma has been identified as a process of recovery that extends beyond mere mitigation of mental health symptoms (Draucker et al., 2009; Sinko et al., 2022). Healing from sexual trauma may deviate from other traumatic experiences because sexual violence is an “intentional violation of bodily autonomy perpetrated by another person” (p. 15). Similarly, previous qualitative research suggests that ESD may contribute to healing from past sexual trauma (Beaujolais, 2022; Senn et al., 2021). Consequently, there is a need for additional quantitative measures capable of assessing all domains of healing from sexual trauma.
Two other intended outcomes of ESD are a societal-level shift in normative gendered behavior and a decrease in rates of perpetration. To date, no ESD study has measured the effect on cultural change and rates of perpetration. Because research in this field is still relatively young, a priority for measurement in ESD research is reduced victimization, so it is likely that the field has not yet been ready to measure the effect of ESD on rates of perpetration within a community where women have received ESD training. However, when the field advances to this stage, the SES—Short Form Perpetration [SES-SFP] could be used to measure this outcome.
There were few measures related explicitly to shifts in beliefs about gender. The Liberal Feminism Ideology Scale—Short Form relates to gender but is focused more on the goals of feminism than it is on gender beliefs. Theoretical support and qualitative evidence suggest strong linkages between ESD training and shifts in beliefs about gender (e.g., Hollander, 2015, 2021), so future research should explore optimal ways to measure shifts in gender beliefs.
Results of the current study reinforce the need for application of additional measures of sexual assault risk detection. Environmental, verbal, and non-verbal cues (many of which are subtle, ambiguous, and nuanced) are known to precede sexual assault (Davis et al., 2009; Norris et al., 1996). In addition to being realistic, scenario-based risk perception measures must be relevant and meaningful to the respondents because this salience is necessary for capturing in vitro perceptions that closely mimic the respondent’s likely in vivo reaction (Noel et al., 2008). It was for this reason that Parks et al. (2016) developed a measure involving three videos depicting low, medium, and high-risk cues for alcohol-related sexual assault. The authors noted a potential confounding effect of racial differences and/or childhood sexual abuse (CSA) on risk perception. For the high-risk scenario, racial minority status was associated with decreased risk perception; however, all respondents in this group had also experienced CSA. Nonetheless, the authors determined that the video measures demonstrated convergent construct validity, reliability, and relatability (Parks et al., 2016). Additionally, authors of a newly developed measure of sexual assault risk, the Sexual Assault Scripts Scale (SASS), reported evidence of criterion validity for the SASS and evidence of acceptable internal consistency for all four subscales of the measure (Yeater et al., 2020). The measures included in this current study were all written scenarios. “While it remains an empirical question whether the mode of scenario presentation, written, audio, or verbal, increases validity, it is clear that standardized measures for assessing risk perception need to be developed and assessed for reliability and validity” (Parks et al., 2016, p. 2).
Additionally, future ESD research could explore the use of these video measures and the SASS to assess the effect of ESD training on participants’ ability to assess risk cues for sexual assault.
Empowerment, which is featured in feminist and empowerment theories, was not explicitly operationalized or measured in any of the studies. As with other constructs, empowerment could potentially be operationalized using other outcome variables such as self-efficacy and assertiveness, but again, these variables fall short of the theoretical construct for empowerment as it is defined in social work and feminist literature (Miguel et al., 2015). Future ESD research might consider including empowerment as a distinct construct to measure.
Limitations
Findings should be considered in the context of the study limitations. This review only included peer-reviewed published studies, subjecting findings to publication bias. Another limitation is that not all ESD programs are labeled as such, which creates a challenge when delineating the boundaries for inclusion and exclusion. For example, the EAAA program (Senn et al., 2011, 2015, 2017) is labeled as an ESD intervention, but it has not consistently been named as such in the literature. Rape, Aggression, and Defense (RAD) programs are not typically branded as ESD because they are not known for being consistently positioned as a feminist, empowerment program. As such, this program name was not included in the keyword search. One study of a RAD program was included, however, because it was identified in the search results, and there was no indication in the study that the program elements were not congruent with ESD. It is possible that other studies of RAD programs were excluded because of the keyword search. Similarly, ESD courses can be considered sexual violence resistance programs, but not all sexual violence resistance programs are congruent with ESD. It was outside of the scope of this review to include all sexual violence resistance programs. However, the inconsistent labeling of programs is an undeniable challenge that researchers in the field need to confront.
An additional limitation is that this study did not include a review of the methodological quality of the studies employing the measures nor of the studies reporting on the development of the measures. To do so was outside the scope of the current study. Future work should apply the rigorous review methodology outlined by the COSMIN (
Conclusion
As the science of sexual violence prevention continues to evolve, so too must measurement science. The importance of strong measures cannot be overstated. Results from this review can inform future directions of measurement development for ESD research. In addition to bolstering the psychometric evaluation of existing scales, research efforts are needed for the development and utilization of measures capable of assessing a broader range of outcomes for diverse populations, and particularly for those diverse in age, ability, and geographical origin. Hopefully, these study findings can help evaluators exercise caution when selecting standardized measures appropriate for meeting their evaluation goals and help researchers as they advance the measurement science for ESD.
Footnotes
Acknowledgements
This study was conducted as a component of my dissertation research. I would like to thank my dissertation committee members who contributed invaluable expertise and support: Dr. Cecilia Mengo (chair), Dr. Susan Yoon, Dr. Michelle Kaiser, and Dr. Christine Gidycz. I would also like to thank the reviewers who provided instructive feedback.
Authors' Note
Brieanne Beaujolais is now affiliated with Mighty Crow Media, Columbus, OH, USA.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
