Abstract
The purpose of the current study is to review the available scientific evidence on the relationship between testosterone and sexual aggression. A systematic search for all primary studies comparing basal testosterone levels in sex offenders and non-sex offenders was undertaken across 20 electronic databases using an explicit search strategy and inclusion/exclusion criteria. A total of seven studies were identified and 11 effect sizes were computed; effects were pooled using both fixed and random effects meta-analysis models. Although individual study findings present a mix of results wherein sex offenders have higher or lower baseline levels of testosterone than non-sex offenders, pooled results indicate no overall difference between groups. Moderator analyses suggest possibly lower levels of testosterone in child molesters than in controls; however, results are dependent on study weighting. Limitations, policy implications with respect to chemical castration laws, and future directions for research are discussed.
Introduction
The use of castration as a sentencing modality in the United States has been controversial. Scholars have questioned the constitutionality of laws mandating chemically induced castration (e.g., Beckman, 1997; Fitzgerald, 1990; Lombardo, 1997; Mancini & Mears, 2013; Scott & Holmberg, 2003), raising many legal and ethical issues surrounding these treatments (Beauregard & Lieb, 2011; del Busto & Harlow, 2011; Mancini, Barnes, & Mears, 2011; Scott & Holmberg, 2003; Tregilgas, 2010). One of the assumptions behind such laws is that chemical or surgical treatment aimed at reducing testosterone levels will reduce sex offenders’ sexual urges, thereby reducing the risk of recidivism (American Psychiatric Association, 1999; Scott & Holmberg, 2003).
The purpose of this article is to investigate whether there is scientific evidence to suggest that sexual offenders have higher levels of testosterone. It is not uncommon for policymakers and professionals dealing with sex offenders to invoke biological defects (including abnormal levels of testosterone) to justify treatments (Lea, Auburn, & Kibblewhite, 1999; Sample & Kadleck, 2008). Researchers reviewing past scholarship, while questioning the overall quality of prior studies and the strength of the evidence, have argued that “although [testosterone] plays an essential role in sexual behavior, there is no clear association between basal [testosterone] levels and forensically relevant sexual disorders such as paraphilias” (Jordan, Fromberger, Stolpmann, & Müller, 2011b, p. 3012; Krueger & Kaplan, 2001). These studies were limited to narrative reviews of selected literature, and to our knowledge, no systematic review on the topic has been conducted. Moreover, past reviews have often been limited to populations with clinically diagnosed sexual disorders and paraphilias, which do not necessarily reflect the makeup of sexual offenders as defined by law (Krueger & Kaplan, 2001).
In this article, we seek to fill this gap by conducting a meta-analytic review of studies on the relationship between testosterone levels and sexual offending. We begin by reviewing the evidence on the relationship between testosterone and sexual behavior and aggression generally. We then present our methodology, followed by the results of our analyses. We conclude with a discussion of the results and policy implications of our findings.
Testosterone and Sexual Behavior
Testosterone (T) is a steroid hormone (androgen) that plays a key role in male sexuality (Jordan, Fromberger, Stolpmann, & Müller, 2011a). It has been shown to have effects on autonomic processes (i.e., involuntary physiological reactions to sexual stimuli), cognitive aspects (i.e., perception and classification of stimuli as sexually inviting), emotional components (i.e., pleasure associated with increases in arousal), and motivational aspects (i.e., perceived urges to engage in sexual activity). T has also been shown to interact with the mesolimbic dopamine system, which increases sexual motivation (Jordan et al., 2011a).
Studies in which hypogonadism (a condition characterized by lower than normal production of sexual hormones) is pharmacologically induced in human subjects have found that lower testosterone levels led to a decrease in the frequency of sexual fantasies, sexual arousal, and sexual motivation (Bagatell, Heiman, Rivier, & Bremner, 1994; Jordan et al., 2011b; Schmidt et al., 2009). However, studies have underscored that normal testosterone levels are much higher than is necessary to maintain normal sexual functions (Christiansen, 2001; Krueger & Kaplan, 2001). This implies that reducing testosterone levels does not guarantee the total loss of sexual function and drive (Jordan et al., 2011a; Krueger & Kaplan, 2001; Rösler & Witztum, 1998). Nevertheless, given the neurophysiological processes that link T and sexual behaviors, a reduction in T is likely to lead to a decrease in sexual activity, and therefore a reduction in recidivism for those whose crimes are sexual in nature.
Indeed, many studies have reported on the effectiveness of chemical castration treatments in reducing sexual drive, erections, and sexual fantasies of sexual offenders (Bradford & Pawlak, 1993; Cooper, 1981; Gijs & Gooren, 1996; Hucker, Langevin, & Bain, 1988; Maletzky & Field, 2003; Meyer & Cole, 1997; Turner, Basdekis-Jozsa, & Briken, 2013; Winder et al., 2014). Chemical castration treatment evaluations have also generally found support for a reduction in recidivism rates (Grossman, Martis, & Fichtner, 1999; Lösel & Schmucker, 2005; Maletzky, Tolan, & McFarland, 2006; Meyer & Cole, 1997; Meyer, Cole, & Emory, 1992). However, as Rice and Harris (2011) pointed out, most studies on the effectiveness of chemical castration in sex offenders have severe methodological flaws, most notably the use of comparisons between individuals who volunteered for treatment with individuals who refused. Many researchers have commented on the lack of rigorous evaluations of medical intervention for the treatment of sex offenders (e.g., Hanson, Bourgon, Helmus, & Hodgson, 2009; Långström et al., 2013) and have argued that methodological flaws often confounded evidence of effectiveness (Lösel & Schmucker, 2005).
Given that chemical castration is known to reduce testosterone levels and sexual drive, why have evaluations of these treatments in sex offenders shown, at best, mixed results? Notwithstanding weak evaluation designs, a possible explanation may be that the relationship between testosterone levels and sexual offending is not as strong as these treatment options imply. As Rice and Harris (2011) contended, “one must regard the professional literature as very curious. The outcome evaluation research is remarkably weak, so weak that, were the treatment not so plausible, it would have to be regarded as empirically unsupported” (p. 325).
The Relationship Between Testosterone and Aggression
The logic behind the association between T and aggression stems from a common observation about gender differences: Males are typically more aggressive than females. According to Beaver (2010), because this observation seems to hold “in every culture studied, at every time period in history, and in every sample ever examined” (p. 89), it follows that biological factors—as opposed to sociocultural factors—are likely to explain well-established gender differences in violence. Therefore, the identification of a biological factor that is associated with aggression and is more prevalent in males should explain these differences (Beaver, 2010). T, it turns out, appears to be a prime candidate for such explanations.
The association between T and aggression was observed decades ago in animal studies (e.g., Beeman, 1947; Bevan, Levy, Whitehouse, & Bevan, 1957; Collias, 1944; Seward, 1945). These studies generally found that when mice and birds were injected with T, they behaved more aggressively. Conversely, when animals were castrated, effectively removing the production of testosterone, aggression decreased. Such findings have been replicated in subsequent studies leading to the belief that there exists a clear causal relationship between testosterone levels and aggression in animals (Haller, 2014).
This relationship appears to hold for humans as well. In a review of the literature on the link between testosterone and aggression in human participants, Archer (1991) found that adults exhibiting more aggressive behaviors tend to have higher levels of testosterone, although correlations were generally low. Such findings were replicated in a subsequent meta-analysis (Book, Starzyk, & Quinsey, 2001; see Archer, Graham-Kevan, & Davies, 2005, for a re-analysis showing a smaller, yet still significant overall effect size).
Although studies generally find that testosterone plays a role in human aggression, other findings cast doubt on the presence of a simple causal relationship. Haller (2014) summarized six of these contradictory findings. First, although females have lower testosterone levels than males, studies have shown that they also engage in aggressive behavior. Second, aggression does not seem to increase at puberty, although this period is associated with the highest levels of testosterone. Third, individuals exhibiting high or low levels of aggression are not consistently found to have differing levels of serum testosterone (i.e., blood levels). Fourth, when males who suffer from hypogonadism are injected with testosterone to regulate sexual activity, aggression levels generally remain stable. Fifth, castration (surgical or chemical) is not consistently associated with a decrease in aggression. Finally, the replication of findings suggesting a positive relationship between testosterone and aggression has proven to be difficult.
As Haller (2014) and others (e.g., Archer, 2006; Mazur & Booth, 1998; Simpson, 2001) have pointed out, it appears unlikely that T causes human aggression, although such a relationship may be true for other animals. Moreover, as Mazur and Booth (1998) have highlighted, if testosterone does play an important role in aggression, it is unlikely to be through circulating blood levels of the hormone. Rather, it seems more likely that testosterone is linked to aggression through its effect on the perinatal development of neural connections that regulate aggressive behaviors (Haller, 2014; Mazur & Booth, 1998).
Furthermore, Mazur and Booth (1998) argued that testosterone may be related to aggression associated with displays of dominance, but not with aggression per se. As the authors explained, aggression in humans is generally defined as behavior associated with the intent to hurt others. The intent behind dominance, however, is to retain or gain status. In animals, the behavioral expression of dominance is often aggression, which is not necessarily the case for humans. Mazur and Booth suggested that such a distinction may explain why the correlation between testosterone and aggression is usually larger in animals compared with humans; testosterone may be associated with aggression in humans in the context of dominance displays, but not in other situations where aggression is used.
There is some empirical evidence for Mazur and Booth’s hypothesis. For instance, Tremblay et al. (1998) found that although testosterone levels were not associated with aggression in adolescents, they were positively associated with measures of social dominance. Still, there is some evidence for at least a correlational link between testosterone and aggressive criminal behavior. For example, Booth and Osgood (1993) found that men with higher testosterone levels tend to be arrested more often and are more likely to use weapons during a fight. Research has generally found support for a positive relationship between testosterone levels and violence in offender populations (Chichinadze, Domianidze, Matitaishvili, Chichinadze, & Lazarashvili, 2010; Dabbs, Carr, Frady, & Riad, 1995; Dabbs, Frady, Carr, & Besch, 1987; Dabbs, Jurkovic, & Frady, 1991; Kreuz & Rose, 1972). Yet, others have failed to find such relationships. For example, Bain, Langevin, Dickey, and Ben-Aron (1987) compared offenders convicted of homicide, assault, and property crime and found no differences in T levels across violent and non-violent offenders. Moreover, many studies argue that other factors such as social aspects (Booth & Osgood, 1993) or other hormones and neurotransmitters (Chichinadze et al., 2010; Dabbs et al., 1991) are better predictors of aggression or mediate the relationship.
Haller (2014) conceded there is little doubt that T plays a role in aggression but that its role is highly complex and likely depends on a variety of contextual factors and the simultaneous influence of multiple mechanisms associated with aggression. This likely explains why, despite strong evidence of a link between T and aggression under specific conditions, larger studies of human participants generally find only a weak correlation. Archer (2006) put it best, arguing that the notion of a causal link between T and human aggression “despite its superficial appeal and its incorporation into media accounts of young men’s behavior, is at best an oversimplification” (p. 320).
It could be argued that—to use Archer’s words—the “superficial appeal” of the association between T and sexual aggression is further emphasized by the ubiquitous association between T and sexual behavior. Understanding the biological causes of sexual aggression (if any) has important policy implications. Although systematic reviews of research on T and aggression have been conducted (Archer et al., 2005; Book et al., 2001), no such reviews exist in the context of sexual aggression. Given how central the relationship between T and sexual offending is to current criminal justice policies, such a review is an important gap that we attempt to fill in this article.
Method
Sample
Identification and selection of studies
A systematic attempt was made to identify and retrieve all primary empirical studies, both published and unpublished, following the Campbell Collaboration’s (2015) guide to information retrieval for systematic reviews. The central technique was a comprehensive search of bibliographic databases, from the inception of each database through April 30, 2015 (see the appendix for a list of the 20 databases included in the search). In addition, the reference lists of all studies that met inclusion criteria were reviewed for any additional studies that might be candidates for the current analysis, and the first authors of included studies were contacted to ask whether they had conducted any additional research on this topic. Finally, Google Scholar was searched for variations of the terms “testosterone and sex offender/rapist/child molester,” and the names of all authors of the included studies (not limited to first authors) were searched in combination with the term “testosterone.”
Search terms
The following keywords were combined in a Boolean abstract search in each of the 20 databases:
(“sex* offen*” OR rapist* OR rape* OR molest* OR “sex* assault*” OR “sexual violence” OR “sex* crime*” OR “sexual aggressi”)
AND (testosterone OR androgen* OR “blood serum”)
The initial search results included numerous irrelevant studies that focused on plant research. We therefore excluded the following terms using the “NOT” operator:
(rapeseed OR “rape seed” OR molestus OR molesta* OR raper OR oilseed)
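The way the three term groups fit together can be sketched in a few lines of Python. This is only an illustration: the helper name `or_group` is hypothetical, the "A AND B NOT C" layout is an assumption about how the groups were combined, and the exact Boolean syntax differs between databases. The terms are copied as they appear above, with straight quotes substituted for typographic ones.

```python
def or_group(terms):
    """Join search terms into one parenthesised OR group."""
    return "(" + " OR ".join(terms) + ")"


# Terms copied from the search strategy above (hypothetical variable names).
offence_terms = ['"sex* offen*"', "rapist*", "rape*", "molest*",
                 '"sex* assault*"', '"sexual violence"', '"sex* crime*"',
                 '"sexual aggressi"']
hormone_terms = ["testosterone", "androgen*", '"blood serum"']
excluded_terms = ["rapeseed", '"rape seed"', "molestus", "molesta*",
                  "raper", "oilseed"]

# Assumed combination: offence terms AND hormone terms, minus plant-research terms.
query = (f"{or_group(offence_terms)} AND {or_group(hormone_terms)} "
         f"NOT {or_group(excluded_terms)}")
print(query)
```

Keeping each group as a list makes it easy to adapt the same strategy to a database with different wildcard or operator conventions.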
Selection criteria
Inclusion criteria
When reviewing the database results from the primary search, a study was retrieved for further review if the abstract or title suggested that it might meet the following five inclusion criteria:
The study measured T levels for a group of convicted sexual offenders and a comparison group of non-sexual offenders or non-offenders.
The sex offender group consisted of adult male offenders above the age of 18 years convicted for at least one prior sexual offense.
The study measured T levels for participants in both the sex offender and comparison groups through blood serum/plasma levels or saliva using a consistent, objective method of sample collection and assay.
The research report provided sufficient numerical or graphical data to allow for the calculation of an effect size.
The report was written in English or French.
Exclusion criteria
Studies were excluded if
The sample focused on young offenders, as T levels are known to increase at puberty (Jordan et al., 2011b).
The sex offender sample overlapped with that reported in another included study. If multiple studies reported on the same population of sex offenders, only one was used so as not to double count the effects.
After retrieving the articles and applying the inclusion and exclusion criteria, information such as measure of T, data collection procedures, offender age, sample, and publication information was extracted systematically for each study. All coding was checked by a second rater for accuracy.
The Effect Size
After finalizing the list of included studies and coding the data for each research report, individual study findings were converted to a similar metric (an effect size) that is commensurate across studies. This allows for the magnitude of study impacts to be considered, as opposed to dichotomous indications of direction (positive vs. negative), or statistical significance (e.g., statistically significant at p < .05 vs. not statistically significant; see Lipsey & Wilson, 2001). In the current study, effect sizes were calculated as standardized mean differences. Specifically, we used Cohen’s d, calculated as the difference between the mean score of the sex offender group and the mean score of the comparison group over the pooled standard deviation: $d = (\bar{X}_{\text{SO}} - \bar{X}_{\text{C}}) / SD_{\text{pooled}}$, where $\bar{X}_{\text{SO}}$ and $\bar{X}_{\text{C}}$ are the mean T levels of the sex offender and comparison groups and $SD_{\text{pooled}}$ is their pooled standard deviation.
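As a minimal sketch of this calculation, the Python functions below compute Cohen’s d from group means, standard deviations, and sample sizes. The function and argument names are hypothetical, and the sampling variance formula is the common large-sample approximation for d; the text does not state which variance formula was actually used, so that part is an assumption.

```python
import math


def cohens_d(mean_so, sd_so, n_so, mean_c, sd_c, n_c):
    """Standardized mean difference: (sex offender mean - comparison mean) / pooled SD."""
    pooled_var = ((n_so - 1) * sd_so ** 2 + (n_c - 1) * sd_c ** 2) / (n_so + n_c - 2)
    return (mean_so - mean_c) / math.sqrt(pooled_var)


def variance_of_d(d, n_so, n_c):
    """Approximate sampling variance of d (assumed formula, not taken from the text)."""
    return (n_so + n_c) / (n_so * n_c) + d ** 2 / (2 * (n_so + n_c))


# Hypothetical illustration values, not data from any included study.
d = cohens_d(mean_so=6.0, sd_so=2.0, n_so=20, mean_c=5.0, sd_c=2.0, n_c=20)
print(d, variance_of_d(d, 20, 20))
```

With this sign convention, a positive d means higher mean T in the sex offender group and a negative d means higher mean T in the comparison group.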
Meta-Analysis
To pool the effect sizes across studies, we first assumed that there is variability between study results and implemented a random effects model (see, for example, Borenstein, Hedges, Higgins, & Rothstein, 2009). In this model, each study is weighted by its inverse variance using a formula that takes into account both within-study variability and between-study variability (see, for example, Sweet & Applebaum, 2004). Random effects models typically produce more conservative estimates of the significance of the effect of interest (by using a wider confidence interval [CI]) than do fixed effect models, described next. The random effects model also gives proportionately larger weights to smaller studies and smaller weights to larger studies than does a fixed effects model.
In addition, we implemented a fixed effects model; this approach differs from the above in the manner in which the between-study variability is handled (Egger & Smith, 2001). The fixed effects model assumes that all studies are estimating the same population parameter and that between-study variability is a result of random variation (Deeks, Altman, & Bradburn, 2001). Both models were used as a robustness check; although between-study variability was evident as shown below, the proportionately higher weights given to smaller studies in the random effects model are not always desirable and may inflate the probability of Type I error when the set of studies is small (e.g., Guolo & Varin, 2017; Schulze, 2007). Schulze (2007) cautions that with fewer than 30 studies, the random effects model is unreliable due to an unstable estimate of between-study variability. Meta-analyses were conducted using the metan module in Stata SE 14.0.
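The difference between the two pooling models can be sketched as follows. This is not the Stata code used in the analysis: it is a plain-Python illustration that assumes inverse-variance weighting for the fixed effects model and the DerSimonian-Laird estimate of between-study variance for the random effects model. All function names and the three example effect sizes are hypothetical.

```python
import math


def pool_fixed(effects, variances):
    """Inverse-variance fixed effects pooling; returns (estimate, SE, weights)."""
    weights = [1.0 / v for v in variances]
    estimate = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights)), weights


def pool_random_dl(effects, variances):
    """Random effects pooling with a DerSimonian-Laird tau^2 (assumed estimator)."""
    fixed_est, _, w = pool_fixed(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed effects estimate.
    q = sum(wi * (d - fixed_est) ** 2 for wi, d in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Between-study variance is added to every study's within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    estimate = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    return estimate, math.sqrt(1.0 / sum(w_star)), tau2, w_star


# Hypothetical effect sizes; the first study has the largest variance (smallest sample).
effects = [0.6, -0.5, 0.1]
variances = [0.20, 0.04, 0.10]
est_f, se_f, w_f = pool_fixed(effects, variances)
est_r, se_r, tau2, w_r = pool_random_dl(effects, variances)
print("fixed :", est_f, se_f, [w / sum(w_f) for w in w_f])
print("random:", est_r, se_r, [w / sum(w_r) for w in w_r])
```

Because the same between-study variance is added to every study, the relative weights move closer together under the random effects model, which is why small studies gain influence and the standard error widens.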
Results
The database search resulted in a total of 451 hits of which 23 were deemed potentially relevant studies based on title and abstract. All studies identified as candidates for inclusion in the analysis were retrieved and screened in full for eligibility. After applying inclusion and exclusion criteria to these 23 studies, seven were selected for the analysis and contributed a total of 11 effect sizes. Table 1 displays descriptive information for these studies.
Table 1. Key Study Features.
Note. The studies by Aromaki (2002_1 and 2002_2), Giotakos (2004_1 and 2004_2), Rada (1976_1 and 1976_2), and Rada (1983_1 and 1983_2) each represent two separate effect sizes presented within the same research report, one focusing on rapists and one focusing on child molesters. The studies by Bain (Bain, Langevin, Dickey, et al., 1988a; Bain, Langevin, Hucker, et al., 1988b) are two separate research reports on separate samples, which happened to be published in the same year.
As shown in Table 1, two studies by Bain and colleagues were included; these studies involve separate samples (Bain, Langevin, Dickey, Hucker, & Wright, 1988a; Bain, Langevin, Hucker, et al., 1988b). The studies by Aromaki, Lindman, and Eriksson (2002_1 and 2002_2); Giotakos, Markianos, Vaidakis, and Christodoulou (2004_1 and 2004_2); Rada, Laws, and Kellner (1976_1 and 1976_2); and Rada, Laws, Kellner, Stivastava, and Peake (1983_1 and 1983_2) report two effect sizes each; one focusing on rapists and the other focusing on child molesters.
Studies were published between 1976 and 2004, in Europe (Finland and Greece), and North America (Canada, California, and Minnesota). Five of the effect sizes presented are from samples of rapists, five are from samples of child molesters, and one effect size is from a study that combined rapists and child molesters into a single offender group (Seim & Dwyer, 1988). Comparison groups were typically non-offenders, either community samples or hospital staff; the Bain et al. (Bain, Langevin, Dickey, et al., 1988a; Bain, Langevin, Hucker, et al., 1988b) studies used comparison groups of non-violent, non-sexual offenders.
Sample sizes of offender groups ranged from 10 to 59 participants, and controls ranged from 11 to 50 participants. The effect sizes were fairly evenly mixed with respect to whether offender samples or comparison group samples had higher mean levels of T. Five of the 11 effect sizes showed higher T levels in the offender group, whereas six showed higher levels in the comparison group.
Figure 1 presents a forest plot of each study’s standardized mean difference effect size, 95% CI, and the relative weight the study contributed to the overall pooled effect using a fixed effects model. In the forest plot, the squares and the horizontal lines correspond to the studies’ effect size estimates and 95% CIs, respectively. The diamond at the bottom of the figure displays the pooled estimate of the effect size (−0.082), with its 95% CI indicated by the left and right edges of the diamond (−0.253, 0.088). It is clear from Figure 1 that the CI of the pooled estimate crosses the line of no effect and that the estimate is not statistically significant (z = 0.95, p = .344). In other words, no significant relationship is shown between T level and sexual offending.

Figure 1. Forest plot of study effect sizes using fixed effects model (k = 11).
The pooled effect from the random effects model was non-significant as well (d = −0.004, 95% CI = [−0.438, 0.430], z = 0.02, p = .986).
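The link between a pooled estimate, its 95% CI, and the z test can be checked with a short calculation. The sketch below assumes a symmetric normal-theory interval, recovers the standard error from the CI width, and derives z and a two-sided p value; the function name is hypothetical. Applied to the pooled estimates and intervals reported above, it should land close to the reported test statistics, with small gaps expected because the published figures are rounded.

```python
import math


def z_and_p_from_ci(estimate, ci_low, ci_high, crit=1.959964):
    """Recover SE, z, and two-sided p from an estimate and its 95% CI."""
    se = (ci_high - ci_low) / (2 * crit)
    z = estimate / se
    # Two-sided normal p value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Pooled results as reported in the text for each model.
print("fixed :", z_and_p_from_ci(-0.082, -0.253, 0.088))
print("random:", z_and_p_from_ci(-0.004, -0.438, 0.430))
```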
Moderators of Effect
Four of the studies in the included set of effect sizes (Aromaki et al., 2002; Giotakos et al., 2004; Rada et al., 1976; Rada et al., 1983) each contributed two effect sizes, which used the same comparison group, but different sex offender groups. As such, dependence was introduced into the analysis. To examine potential differences in T levels among rapist and child molester subgroups and resolve the issue of dependence, the set of effect sizes was split into subgroups of rapists and child molesters. Of the 11 effect sizes, five each focused on rapists and child molesters, whereas one study (Seim & Dwyer, 1988) presented results for a sample of combined rapists and child molesters. The analyses were conducted both with and without the Seim and Dwyer study in each of the subgroups, using both fixed and random effects models.
Rapists
Table 2 displays the overall pooled meta-analytic results for the subgroup of rapists. With the set of six effect sizes focusing on a comparison of rapists with a control group (including the Seim and Dwyer study), no significant pooled effect was found for either the fixed effects (d = 0.023, z = 0.22, p = .827) or the random effects meta-analysis model (d = 0.248, z = 0.81, p = .416). When dropping the Seim and Dwyer study from the set due to its inclusion of child molesters, results for the random effects model were still non-significant (d = 0.438, z = 1.38, p = .168); however, results for the fixed effects model showed a significant pooled effect wherein rapists had higher levels of T than controls (d = 0.279, z = 2.23, p = .025). This latter result is displayed in Figure 2.
Table 2. Pooled Meta-Analytic Results for Subgroup of Rapists.
Note. CI = confidence interval.
*p < .05. **p < .01. ***p < .001.

Figure 2. Forest plot of rapist study effect sizes using fixed effects model (k = 5).
Child molesters
For child molesters, results from the random effects models for both the set of six studies (including Seim and Dwyer) and the set of five studies were non-significant (see Table 3). However, both fixed effects models showed significant pooled effects (six studies: d = −0.420, z = 3.48, p = .001; five studies: d = −0.301, z = 1.97, p = .048), indicating that child molesters had significantly lower levels of T than controls. Results for the six-study subgroup are displayed in Figure 3.
Table 3. Pooled Meta-Analytic Results for Subgroup of Child Molesters.
Note. CI = confidence interval.
*p < .05. **p < .01. ***p < .001.

Figure 3. Forest plot of child molester study effect sizes using fixed effects model (k = 6).
Comparison of Rapists and Child Molesters
The different findings for T levels among rapists versus child molesters were intriguing and were also explored directly using the full set of 10 effect sizes in a fixed effects model (excluding the Seim & Dwyer, 1988, study, which combined rapists and child molesters in the same sample), with offender type as a moderator. Significant heterogeneity was found between these two groups when compared directly (Q = 8.66, p < .01); in other words, the pooled, positive effect size of 0.279 for rapists was significantly different from that of child molesters (d = −0.301).
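The between-group heterogeneity test can be illustrated with the two pooled subgroup estimates. The sketch below assumes a fixed effects moderator analysis with two subgroups, in which case the between-group Q reduces to the squared difference of the pooled estimates divided by the sum of their squared standard errors, evaluated against a chi-square distribution with 1 degree of freedom. The standard errors are not reported in the text; here they are backed out as |d| / z from the reported values, which is an assumption that carries rounding error. The function name is hypothetical.

```python
import math


def q_between(d1, se1, d2, se2):
    """Between-group Q for two fixed effects subgroup estimates (df = 1)."""
    q = (d1 - d2) ** 2 / (se1 ** 2 + se2 ** 2)
    # Upper-tail probability of a chi-square with 1 df: erfc(sqrt(q / 2)).
    p = math.erfc(math.sqrt(q / 2))
    return q, p


# Reported pooled effects; SEs approximated from the reported z values.
d_rapists, z_rapists = 0.279, 2.23
d_molesters, z_molesters = -0.301, 1.97
q, p = q_between(d_rapists, abs(d_rapists) / z_rapists,
                 d_molesters, abs(d_molesters) / z_molesters)
print(q, p)
```

Under these assumptions the result should fall near the reported Q, with any small difference attributable to the rounded inputs.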
Robustness of Findings
A concern with the set of included studies is that two of the effect sizes used comparison groups of non-violent, non-sexual offenders (Bain, Langevin, Dickey, et al., 1988a; Bain, Langevin, Hucker, et al., 1988b), whereas the remaining nine effects were computed based on non-offenders (hospital staff or a community sample). To address this issue, and acknowledging the low statistical power, we used comparison group type as a moderator in a fixed effects model. Results indicate that the non-offender group was not significantly different from the sex offender group (d = 0.049, z = 0.52, p = .603), whereas the sex offender group did have significantly lower levels of T than the offender comparison group (consisting of non-violent, non-sex offenders; d = −0.898, z = 3.85, p < .001).
Discussion
Based on a set of seven studies and 11 effect sizes with a total sample size of 325 sexual offenders and 196 comparison participants, no significant relationship between T and sexual offending was found using either a fixed effects or random effects model. All 11 effect sizes focused on adult male known sex offenders; no juveniles were included, nor were community samples of self-reported sexually aggressive men. All studies obtained physical samples in the morning, important because T levels are known to peak in the morning and fluctuate throughout the day (Rose, Kreuz, Holaday, Sulak, & Johnson, 1972). Although nine of the 11 effect sizes were based on T levels measured in blood samples, a high correlation between blood and saliva measures has been reported (Khan-Dawood, Choe, & Dawood, 1984; Rilling, Worthman, Campbell, Stallings, & Mbizva, 1996); thus, the inclusion of the saliva-based effect sizes from Aromaki et al. (2002) is justified. This study provides no evidence that convicted sex offenders as a whole differ in their baseline levels of T from non-sex offenders.
Subgroup analyses suggest there may be differences in T levels between rapists and child molesters that warrant further investigation. In one of the four models tested, rapists were found to have a significantly higher level of T than the comparison group. The model showing significantly different T levels used a fixed effects weighting algorithm and a set of five studies. In the remaining three models, no significant differences were found. Conversely, in two of the four models tested, child molesters were found to have significantly lower levels of T than controls. The two models showing lower T levels both used a fixed effects approach; the random effects models indicated no significant pooled difference between the groups. Given the tenuous nature of these findings, which differ depending on the manner in which between-study variability is modeled, and are highly dependent on the inclusion of a single study in the case of results for rapists (Seim & Dwyer, 1988, as well as Rada et al., 1983_1), we caution against interpretation of these findings other than as a call for further research assessing baseline levels of T in different sex offender types. Furthermore, close inspection of the means and ranges of age in the rapist and child molester groups suggests that the child molesters were typically older than were the rapists. In addition, in three of the studies of child molesters, the mean age of the comparison group was more than 10 years younger than the mean age of the sex offender group. This difference in age between offender types and comparison groups may well be a contributing factor to differences in T levels found between the rapist and child molester groups.
This study is not without limitations. First, and most importantly, a very small number of studies met the inclusion criteria for the meta-analysis. This result is not due to an overly restrictive set of inclusion criteria with respect to study methods or samples; instead, it is simply a reflection of the surprisingly limited number of studies that have compared T levels in sex offenders and non-sex offenders. We say “surprising” given the policy and clinical implications of this distinction. Regardless of the reason for the dearth of studies examining the association of interest, the small number of effects included in the meta-analysis limits our ability to examine details about the relationship between T and sexual offending.
Second, any methodological or conceptual weaknesses in the primary studies themselves may bias the results of the meta-analysis. Such concerns would include possible inaccurate measurement of T based on single blood sample draws in some studies (as opposed to multiple draws as performed in Rada et al., 1983), different types of assay procedures, and small sample sizes (e.g., 10 offenders in the groups studied by Aromaki and colleagues, 2002).
Third, the pooling of studies using different types of comparison groups may be of concern; these ranged from offenders incarcerated for non-sexual crimes, to hospital employees, to community members. Convicted, incarcerated sex offenders may differ in important ways from non-sex offenders and non-offenders; indeed, moderator analysis based on the set of two studies by Bain and colleagues suggests that sexual offenders have lower levels of T than non-violent, non-sexual offenders. The choice of comparison group is an important consideration. A final methodological concern is that in all of the studies, the focal group of sex offenders had been convicted, and in all but one of the studies (Seim & Dwyer, 1988), the sex offenders were incarcerated. Findings thus cannot be generalized to all rapists or child molesters; those not convicted and incarcerated may differ on some relevant demographic, social, and biological variables.
Importantly, the variability in the findings across the set of study effect sizes was high, as evidenced by significant Q statistics for all models and a fairly even split across the set of studies in terms of direction of effect. More than anything, this finding suggests a need for further research to examine this variability by using additional moderator variables (for example, violent vs. non-violent offenders, or repeat offenders). With this small set of studies and disparate study effects, conclusive statements regarding the link between T and sex offending are difficult to make. Nonetheless, given results from the current analysis, which summarizes the available literature on this topic, we can conclude that at this time, a biological explanation of sex offending in terms of offenders having higher levels of T is not supported.
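For readers less familiar with the heterogeneity statistic mentioned above, the following is a minimal sketch of how Cochran's Q and the derived I² index are typically computed from a set of effect sizes and their sampling variances. It is not the code used for this meta-analysis. The function name `cochran_q` and the numeric inputs are hypothetical and chosen only for illustration; they are not the effect sizes of the included studies.

```python
def cochran_q(effects, variances):
    """Return (pooled effect, Q, degrees of freedom, I^2) for a set of studies.

    effects   -- study effect sizes (e.g., standardized mean differences)
    variances -- sampling variance of each effect size
    """
    # Inverse-variance weights, as in a fixed effect model.
    weights = [1.0 / v for v in variances]
    # Weighted mean effect across studies.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributable to heterogeneity (floored at 0).
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, df, i_squared


# Hypothetical effects that split evenly in direction (illustrative values only).
pooled, q, df, i2 = cochran_q([0.5, -0.4, 0.1, -0.6], [0.04, 0.05, 0.04, 0.05])
print(f"pooled={pooled:.3f}, Q={q:.2f}, df={df}, I^2={i2:.2f}")
```

Under the null hypothesis of homogeneity, Q is referred to a chi-square distribution with k − 1 degrees of freedom, where k is the number of effects. A significant Q therefore indicates more between-study variation than sampling error alone would be expected to produce.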
Future Directions
Our study demonstrates that the available evidence does not support the conventional wisdom that sexual offenders have higher levels of T compared with non-sexual offenders. Furthermore, our analysis suggests that different types of sexual offenders may vary in terms of their T levels. In fact, theories on the T–aggression link suggest that different types of aggression, for instance, aggression related to displays of dominance (Mazur & Booth, 1998) or related to intra-species competition (Archer, 2006), may have different relationships with T. Perhaps such frameworks also apply to variations between sexual offender types. These theories may explain, for example, why our analysis revealed T levels may be different for rapists versus child molesters when compared with controls. Again, more studies are needed to further test these findings.
Considering the extent and quality of available research, it would be premature to state that no relationship exists between T and sexual aggression. As Haller (2014) suggests regarding research on aggression and T, the complexity of biological mechanisms regulating human behavior is such that a relationship may exist but is obscured by conditions that moderate the influence of T on sexual behavior. Moreover, it may be that other hormones, related to the production of T, could better explain sexual offending. Kingston et al. (2012), who did not compare sexual and non-sexual offenders in their study, found that although T levels were not associated with sexual recidivism, levels of luteinizing hormone, which activates the Leydig cells that produce T, were significant predictors of sexual and violent recidivism. Future studies should examine whether sexual offenders have higher levels of hormones involved in the production of T than non-sexual offenders.
Given the relatively widespread use of chemical castration in the United States (Mancini, Barnes, & Mears, 2011), it is surprising that more studies examining the effectiveness of chemical castration on sexual recidivism have not been conducted, and that many of the existing studies rely on relatively small samples. Such studies would not only have important policy implications but could also allow researchers to better understand the relationship between T levels and sexual offending. The widespread use of castration laws may give researchers the ability to collect data from larger samples and may allow them to compare pre-castration T levels between different types of sexual offenders. Furthermore, the mandatory nature of such laws may allow researchers to isolate the effects of T reduction from any propensity an offender may have to change or suppress deviant sexual behaviors; such a propensity may (or may not) be implied by a willingness to volunteer for castration. As we have discussed in the introduction of this article, many prior studies on the topic have had to rely on samples of volunteers.
Policy Implications
Our findings have important implications for chemical castration laws. Chemical castration may be effective in reducing sexual recidivism, although scientific evidence of effectiveness is mixed. The effectiveness of such treatment has generally been shown in studies using non-equivalent comparison groups wherein participants submit voluntarily to the treatment (Berlin, 1997; Berlin & Meinecke, 1981; Prentky, 1997). Many scholars have warned that chemical castration, although effective at reducing sex drive and occurrence of fantasies during treatment, does not suppress sexual function completely and does not ensure that deviant fantasies will disappear after cessation of treatment; chemical castration is effective at reducing sexual drive only under constant and continued use of the treatment (Berlin & Meinecke, 1981; Grubin & Beech, 2010; Lösel & Schmucker, 2005; Meyer et al., 1992; Prentky, 1997; Rice & Harris, 2011; Rösler & Witztum, 1998). This is especially crucial to consider in the context of limited probation sentences. Despite these warnings, only two out of the nine states mandating or offering castration require that offenders receive any type of counseling (Scott & Holmberg, 2003).
Our contribution to the debate is that our study finds no evidence to suggest there is anything chemically wrong with sexual offenders. If chemical castration is indeed effective, it is not because it is treating an abnormal medical condition, but rather because it is inhibiting sexual functioning in the same way it would for most humans. Antiandrogen treatments were primarily proposed to deal with diagnosed paraphiliac patients. Although chemical castration may be an important treatment avenue for such populations, not all sex offenders are paraphiliacs and not all paraphiliacs are sex offenders (Krueger & Kaplan, 2001). Indiscriminate application of chemical castration, as is the case in many states where medical or psychiatric evaluations are not a requirement to prescribe such treatments (Scott & Holmberg, 2003), is unlikely to be medically defensible. Sexual offenders are not necessarily driven by uncontrollable urges (e.g., Lussier, Proulx, & LeBlanc, 2005); situational characteristics (Beauregard & Leclerc, 2007) and prior antisocial tendencies (Lussier, Leclerc, Cale, & Proulx, 2007; McCuish, Lussier, & Corrado, 2015) appear to be as important as, if not more important than, sexual drive or deviant fantasies in decisions to offend sexually.
The appropriateness of mandatory chemical castration, therefore, falls more in the realm of ethical and legal questions than of medical questions. Although it is beyond the scope of this article and the expertise of its authors to elaborate on such legal issues, we argue that our findings should be taken into consideration alongside findings related to the potential harm caused by the side effects of antiandrogenic drugs (e.g., Alibhai, Gogov, & Allibhai, 2006; Grubin & Beech, 2010; McGinty et al., 2014). We believe that such findings should be kept in mind by lawmakers given their importance in analyzing, at least in the United States, the constitutionality of mandatory castration laws (Beckman, 1997; Fitzgerald, 1990; Lombardo, 1997; Mancini & Mears, 2013; Scott & Holmberg, 2003).
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
