Abstract
Adolescents who have sexually offended have unique treatment needs. For mental health professionals to adequately address these unique needs, further research is necessary. To that end, we explored the assessment of sexual interest (which may play an integral role in understanding potential for sexual reoffending) in a sample of 103 male adolescents who have sexually offended. We compared results from a physiological assessment (MONARCH 21 penile plethysmography [PPG]) and an actuarial assessment (Screening Scale for Pedophilic Interest [SSPI]), plus data from an unobtrusive assessment (Affinity, a viewing time measure) in a smaller subsample of 16 male adolescents. One finding that has particular relevance for clinical assessment is that the SSPI may have limited utility with adolescents. We also found evidence for some overlap between data from PPG and viewing time assessments, although whether or not PPG data are ipsatized may affect relationships with other assessment modalities.
Introduction
Numerous researchers have observed that adolescents who have sexually offended are a widely heterogeneous group (e.g., DiCataldo, 2009; Letourneau & Miner, 2005), distinct from both adolescents who have generally offended (i.e., adolescents with histories of nonsexual criminal offending; Seto & Lalumière, 2010) and adults who have sexually offended (Letourneau & Miner, 2005). At the group level, adolescents who have sexually offended are not at particularly high risk of sexual recidivism. In a meta-analysis of 63 studies of adolescent sexual offending with an average follow-up of nearly 5 years, only 7.08% of the total sample of 11,219 adolescents who had sexually offended were found to have reoffended sexually (Caldwell, 2010). The serious impact that sexual offending can have on victims, however, warrants research to increase our understanding of what leads to both initial sexual offending and less frequent sexual reoffending in this population.
The presence of atypical sexual interest may be particularly important for understanding adolescents who have sexually offended and the smaller subset of such adolescents who may also be at high risk of sexual recidivism. Across studies, one of the most consistent findings about adolescents who have sexually offended relates to atypical patterns of sexual interest (e.g., sexual interest in young children or arousal to the use of threats and/or force in sex). Seto and Lalumière (2010) observed in their meta-analysis of 50 studies that the presence of atypical sexual interests was the single largest group difference (d = .67) between adolescents who have sexually offended and their peers who have generally offended. Atypical sexual interests may play a role in explaining both initial index sex offenses and sexual reoffending, perhaps specifically deviant sexual interests (i.e., interests in sexual activity characterized by harm to others or a lack of consent). Hanson and Morton-Bourgon (2005), in a meta-analysis of 82 studies, found that deviant sexual interest was the single strongest predictor (d = .30) of sexual recidivism for both adolescents and adults who have sexually offended, and McCann and Lussier (2008) found a similar relationship in a sample of eight studies of recidivism among adolescents who have sexually offended.
Andrews and Bonta (2010) have noted that, according to their empirically supported “need” principle of treatment for sexual offending, factors related to offending (and/or reoffending) need to be addressed in treatment. With adolescents who have sexually offended, treatment needs might certainly include addressing atypical or deviant sexual interests. Assessing sexual interests in this population, however, is complicated by the fact that adolescents in particular may only have a single known sexual offense from which to draw data for such assessments, thereby limiting the accuracy of offense-based assessments. Therefore, reliable and valid tools for assessing adolescent sexual interest that do not rely solely on offense characteristics have the potential to be valuable in both risk assessment and treatment.
Historically, for both adolescents and adults who have sexually offended, the most common method of assessing sexual interest has been penile plethysmography (PPG). The male PPG assessment subject is required to fit a measurement device (devices vary from rubber strain gauges to volumetric tubes) around his penis, and his penile blood flow (i.e., sexual arousal) is measured in response to a variety of sexually explicit stimuli. No assessment methodology is perfect, of course, and problems associated with PPG include high rates of individuals who exhibit no clinically significant arousal to any stimuli (“flat-line” profiles can be due to a variety of factors, including anxiety and intentional suppression of sexual arousal; Kalmus & Beech, 2005; Mahoney & Strassberg, 1991). The percentage of adolescents with clinically uninterpretable responses to PPG assessments has ranged from a low of roughly 1% (Clift, Rajlic, & Gretton, 2009) to above 30% (Becker, Kaplan, & Tenke, 1992). Nonetheless, out of 373 adolescent treatment programs throughout the United States in one recent survey, just over 9% used PPG assessments for adolescents who have sexually offended (McGrath, Cumming, Burchard, Zeoli, & Ellerby, 2010).
In addition to the frequency of clinically uninterpretable responses (i.e., nonresponsiveness), there are two concerns with PPG assessments specific to adolescents who have sexually offended. The first concern is ethical. Regardless of age, PPG is a clearly invasive assessment procedure. Some authors (e.g., DiCataldo, 2009; Hunter & Becker, 1994) have pointed out that PPG administrations also typically test for atypical sexual interests by presenting not just sexual stimuli, but sexual stimuli including depictions of child sexual abuse, sexual coercion, and rape. For the significant subgroup of adolescents who have sexually offended and have also experienced sexual, physical, and/or emotional maltreatment (Seto & Lalumière, 2010), PPG assessment might be experienced as revictimizing. For adolescents who have sexually offended but have no history of maltreatment, exposure to the atypical sexual stimuli that characterize PPG assessments may nonetheless be harmful in ways not yet identified. If we also consider the inherently physiologically invasive nature of the assessment, using PPG to assess the sexual interests of adolescents who have sexually offended seems less than ideal.
The second concern relates to the validity of what PPG assessments are capturing in adolescents who have sexually offended. The utility of PPG for assessing adults who have sexually offended is relatively clear; for example, pedophilic sexual interests can be reliably detected even in nonadmitting adults (Blanchard, Klassen, Dickey, Kuban, & Blak, 2001). The few research teams conducting empirical studies of PPG use with adolescents who have sexually offended, however, have not yet found clear evidence of the external validity of these PPG results (Becker, Hunter, Goodwin, Kaplan, & Martinez, 1992; Becker, Kaplan, & Tenke, 1992; Clift et al., 2009; Gretton, McBride, Hare, O’Shaughnessy, & Kumka, 2001; Hunter, Goodwin, & Becker, 1994; Kaemingk, Koselka, Becker, & Kaplan, 1995; Rice, Harris, Lang, & Chaplin, 2012; Seto, Lalumière, & Blanchard, 2000).
Most recently, Rice and colleagues (2012) found that both adolescents who have sexually offended and offense-matched adults displayed greater preferences for child sexual stimuli (i.e., had higher child preference indices) than individuals in a nonoffending control group. They also observed, however, that adolescents’ preferences for such stimuli were significantly weaker than those of the adults. There is also some troubling evidence directly speaking to both the ethical appropriateness and interpretability of PPG assessments for adolescents who have been sexually victimized themselves. In one sample, adolescents who have sexually offended and had also been sexually abused in childhood were more likely than other adolescents who have sexually offended to show indiscriminate (i.e., consistently high) arousal to all stimuli (Becker, Kaplan, & Tenke, 1992); there have been similar patterns in more recent samples as well (Murphy, DiLillo, Haynes, & Steere, 2001). Finally, Clift and colleagues (2009) found that, as with adults (Mahoney & Strassberg, 1991), when adolescents who have sexually offended are specifically instructed to suppress their arousal during an assessment, they are effective enough at doing so that the assessment results lose discriminative validity. As a whole, we interpret these concerns as being indicative of the fact that PPG assessment and interpretation with adolescents who have sexually offended is currently a complicated endeavor, and the validity of such assessment is not firmly established with this population.
An alternative to PPG use in the assessment of these adolescents is viewing time assessments. The principle behind the methodology is simple, although we cite research on adults below given the lack of viewing time research with adolescents. Basically, people look longer at images of people to whom they are sexually attracted (Israel & Strassberg, 2009; Quinsey, Ketsetzis, Earls, & Karamanoukian, 1996; Rullo, Strassberg, & Israel, 2010). Among adults who have sexually offended, viewing time methodologies are effective assessment tools (Harris, Rice, Quinsey, & Chaplin, 1996; Laws & Gress, 2004), and some have found evidence that viewing time and PPG assessments may yield information of comparable quality and utility (Tong, 2007). In the most recent study of the issue, Mokros and colleagues (2013) found that adults who have sexually offended differed from both adults who have generally offended and adults who have never offended in terms of the amount of time spent looking at child stimuli (in addition to how attractive they rated such stimuli), although classification accuracy was only mediocre. One potential advantage of viewing time assessments is that they may be less susceptible to attempts at willful misrepresentation; it is obvious to men and male teens undergoing PPG assessments that their penile responses are being assessed, whereas it may never occur to those taking a viewing time assessment that the test is measuring anything other than their subjective ratings of the stimuli.
Currently, adolescents who have sexually offended frequently undergo viewing time assessments in treatment (over 35% of 373 U.S. treatment programs indicated using such practices; McGrath et al., 2010). Empirical research on viewing time assessment of these adolescents has, however, lagged behind its clinical popularity. Two research teams have focused on the Abel Assessment for Sexual Interest (Abel, 1995), with mixed results. An independent team (i.e., not including Abel) found that 2-week test-retest reliability was unacceptably low in a sample of adolescents who have sexually offended (Smith & Fischer, 1999), whereas a team led by Abel found that adolescents who have sexually offended who looked longer at photographs of children were also more likely to have committed greater numbers of child sex offenses than other adolescents who have sexually offended (Abel et al., 2004). Only one team of researchers has examined another commonly used viewing time protocol, the Affinity (Version 1.0); they found preliminary evidence for correspondence between viewing time data and offense characteristics (Worling, 2006). No one to date, however, has directly compared PPG and viewing time responses in these adolescents.
Viewing time assessments are both quicker and less expensive than PPG assessments. If viewing time assessments yield data of potential utility for treatment planning, clinicians in the majority (over 60%) of adolescent treatment programs using neither PPG nor viewing time assessments may be able to add an easy and clinically useful assessment tool to their treatment approach. In addition, for clinicians in the roughly 9% of adolescent treatment programs who still use PPG assessments, switching to viewing time assessments would solve an ethical conundrum, by replacing an invasive and possibly iatrogenic assessment tool with a methodology that is unobtrusive and therefore less likely to be perceived as unreasonably invasive by the public as well as adolescents who have sexually offended and their caregivers.
To provide further empirical evidence regarding data quality from assessments of sexual interest in adolescents who have sexually offended, we explored correlations between assessment instruments in two related samples. First, we compared PPG assessments with adolescents’ scores on the Screening Scale for Pedophilic Interest (SSPI; Seto & Lalumière, 2001) in a sample of 103 male adolescents. The SSPI is a four-item measure based on an individual’s offense characteristics that has some empirical support for its use with adolescents (Seto, Murphy, Page, & Ennis, 2003), and we included it in our analyses because it is a brief and simple screening tool that could potentially be used when more extensive assessment of sexual interest is not feasible. Second, in a smaller subsample of 16 male adolescents drawn from the original sample, we compared PPG, SSPI, and viewing time assessments.
Method
Participants and Procedure
Our larger sample (N = 103) was a national sample of adolescents who have sexually offended and who also underwent an assessment using a MONARCH 21™ PPG (Behavioral Technology, Inc., UT, USA), described in further detail below. The MONARCH 21 PPG is used in a variety of locations throughout the United States, and results from assessments are sent back to the manufacturer (who also owns the private practice where our smaller subsample was assessed). During the assessments, participants completed a brief self-report questionnaire in addition to their MONARCH 21 PPG–based assessment.
Participants were between 15 and 20 years old at the time of their psychosexual evaluation (M = 16.83, SD = 1.21). Participants reported having between 1 and 100 victims (M = 6.54, SD = 12.58, median = 2.00). A majority of the participants described themselves as heterosexual (n = 87, 84.5%), with 3 (2.9%) identifying as gay, 9 (8.7%) identifying as bisexual, and 4 (3.9%) stating they were unsure about their sexual orientation. Information on their racial/ethnic backgrounds was not available.
Our smaller subsample (n = 16) was composed of adolescents who had been assessed at a Salt Lake City–based private practice specializing in assessing and treating sexual offending in adults, where clinicians conduct occasional assessments of adolescents. Participants received court-mandated psychosexual evaluations between 2004 and 2012. At the time of evaluation, all participants had already been enrolled in local adolescent treatment programs; all evaluations were mandated by the Utah Division of Juvenile Justice Services as a result of the marked severity of the participants’ sexual offense history and/or poor progress in treatment. The purpose of the evaluations was to assess participants’ level of risk for both sexual and general recidivism. In addition to completing a brief self-report questionnaire and a MONARCH 21 PPG–based assessment, these participants also completed an Affinity (either Version 2.0 or 2.5) viewing time assessment.
The 16 participants in the subsample were between 15 and 20 years old at the time of their psychosexual evaluation (M = 17.00, SD = 1.55). Participants reported having between 1 and 100 victims (M = 13.13, SD = 1.41, median = 5.50). A majority of the participants described themselves as heterosexual (n = 10, 62.5%), with 2 (12.5%) identifying as gay, 2 (12.5%) identifying as bisexual, and 2 (12.5%) stating that they were unsure about their sexual orientation.
We compared demographic characteristics between the larger sample and our smaller subsample. The 16 adolescents included in the subsample were not significantly different from the original sample in terms of age, as identified by a relevant t test (p = .532). The subsample did have a greater percentage of nonheterosexual adolescents than adolescents included only in the larger sample (37.5% vs. 11.5%, respectively), χ²(1) = 6.97, p = .017. Given the archival and anonymous nature of the data from both the larger sample and smaller subsample, the Institutional Review Board at the University of Utah deemed the project exempt from human subject research reviews.
Measures
Self-report and the Screening Scale for Pedophilic Interest
Prior to PPG assessment, each participant completed a one-page questionnaire of demographic and offense-related questions. Self-report data were coded to score each participant on the SSPI (Seto & Lalumière, 2001), which assesses sexual offense characteristics that have been empirically linked with greater risk of sexual recidivism. The SSPI yields scores from 0 to 5 (1 point each for having multiple victims under the age of 18, any victim under the age of 12, or any extrafamilial victim; 2 points for having a male victim). SSPI scores were positively correlated with PPG-based indices of sexual interest in prepubescent children in three samples of adolescents who had sexually offended (with rs = .46, .24, and .23, for Ns = 45, 141, and 67, respectively; Seto et al., 2003).
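The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function and parameter names are hypothetical and are not part of the published instrument, and the item wording follows the description in this section rather than the original scoring manual.

```python
def sspi_score(multiple_victims: bool,
               victim_under_12: bool,
               extrafamilial_victim: bool,
               male_victim: bool) -> int:
    """Sum the four SSPI items as described in the text (range 0 to 5)."""
    score = 0
    score += 1 if multiple_victims else 0      # multiple victims under age 18
    score += 1 if victim_under_12 else 0       # any victim under age 12
    score += 1 if extrafamilial_victim else 0  # any extrafamilial victim
    score += 2 if male_victim else 0           # any male victim (weighted 2 points)
    return score


# Hypothetical case: one female victim under 12 from outside the family.
print(sspi_score(multiple_victims=False, victim_under_12=True,
                 extrafamilial_victim=True, male_victim=False))
```

Because the male-victim item carries 2 points, four yes/no items produce the 0 to 5 range noted above.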
Utilizing self-reports in forensic contexts always includes the risk of deception; however, three factors about the present study likely mitigate that risk. First, there is evidence that adolescents’ self-reports do not typically diverge wildly from other assessment data; for example, adolescents in one sample self-reported general age preferences (e.g., child vs. teen/adult) that corresponded well with the age preferences evident from their viewing time assessments (Worling, 2006). This evidence is applicable to our entire sample. Regarding the subsample in particular, those 16 adolescents knew they were being given a multimethod assessment and therefore may have been inclined toward honesty if they considered that their self-report might be compared with other measures. Second, complete psychosexual evaluations for this subsample included a clinician review of related police reports. This means that participants would have had little incentive for distorting offense characteristics that could be easily cross-checked with third-party information, although it is of course possible that participants failed to reveal additional victims unknown to law enforcement. Third, the type of treatment in which subsample participants were engaged is typified by exhaustive discussions of participants’ sexual histories, including their sexual offense histories, which means both that treating therapists are another source of third-party information, and that participants would have been practiced in disclosing and discussing the details of their offenses.
Adolescents in the subsample had significantly higher SSPI scores (M = 3.50, SD = 1.51) than did adolescents included only in the full sample (M = 2.23, SD = 1.74), t(23) = 3.02, p = .006.¹ This difference in SSPI scores appears to result from the fact that adolescents in the subsample were more likely to have had multiple victims (75.0% vs. 47.1%), χ²(1) = 4.20, p = .040, and to have had male victims (81.2% vs. 42.5%), χ²(1) = 8.11, p = .004. Adolescents in the subsample also appeared to have relatively high SSPI scores compared with the three other known adolescent samples assessed in one study (with Ms = 3.7, 3.0, and 2.7, for Ns = 45, 141, and 67, respectively; Seto et al., 2003).
MONARCH 21™ PPG
All participants underwent PPG assessment. The equipment for the 90-min MONARCH 21 PPG assessment (Behavioral Technology, Inc.) consists primarily of a strain gauge that measures penile circumference change. In the PPG assessment, all participants first viewed a standard 3-min baseline segment, during which physiological arousal was tracked in the absence of any sexual stimuli. After the baseline segment, all participants viewed 12 standard segments while physiological arousal was tracked via the penile strain gauge: four toddler (ages 3-5) segments, four preteen child (ages 6-11) segments, and four teen (ages 16-18) segments. Each set of segments consisted of two male and two female segments (within genders, one segment described noncoercive sexual contact, whereas the other described coercive sexual contact; we excluded coercive segments from our analyses to avoid conflating possible sources of variance). Each 3-min segment started with a 90-s audio narrative describing sexual contact, followed by four photographs of partially clothed individuals of the corresponding age range and gender. Segment order was randomized during each participant’s assessment.
Affinity (Versions 2.0 and 2.5)
Only the 16 subsample participants underwent Affinity assessment. The 20-min Affinity computerized assessment (Pacific Psychological Behavioural Assessment, BC, Canada) measures both subjective and objective sexual interest.² Participants who completed the Affinity 2.0 assessment viewed 56 photographs (28 female and 28 male) of clothed individuals in four age categories: small children (ages 0-5), prejuveniles (ages 6-10), juveniles (ages 11-15), and adults (ages 18 and above). As participants viewed each photograph, they were asked to rate the photographed person’s attractiveness on a 15-point gradient with anchors of “very unattractive,” “neutral,” and “very attractive” (yielding subjective self-report scores ranging from −7 to 7 for each photograph), while the time spent viewing each photograph was recorded. Participants who completed the Affinity 2.5 assessment viewed 80 photographs (40 female and 40 male) of individuals in the same four age categories used in the Affinity 2.0, using the same rating system. For both versions of the Affinity, the photograph order was randomized during each participant’s assessment. Furthermore, viewing time was measured in two distinct time periods: on-task latency (OTL; the time it took a participant to make an attractiveness rating, which research suggests has more external validity with other assessment measures; Glasgow, 2009) and post-task latency (PTL; the time a participant continued looking at a photograph after having made the attractiveness rating).
Data Analysis
Prior to hypothesis testing, we first addressed whether there were adolescents in either the full sample or subsample whose PPG-assessed responsiveness fell below the acceptable threshold for interpretation. We evaluated responsiveness using two thresholds for identifying adolescents as PPG nonresponders. The first threshold was if an adolescent’s three segments of highest arousal yielded an average circumference change of less than .25 cm; below this threshold, PPG-based diagnostic consistency for pedophilia is no better than chance (Lykins et al., 2010). The second, more conservative threshold was if an adolescent’s single segment of highest arousal yielded a circumference change of less than .47 cm (a value that approximates roughly 20% of full erection; Kuban, Barbaree, & Blanchard, 1999). Both thresholds are based on samples of adult men; no existing research has been used to identify reliable and valid thresholds specifically for adolescents. The number of nonresponders in both the full sample and smaller subsample will be addressed below.
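The two nonresponder thresholds can be expressed as simple checks on one participant's per-segment circumference changes. The sketch below is illustrative, not the scoring code used in the study; the function names and the example values are hypothetical, and it assumes at least one segment score per participant.

```python
def is_nonresponder_lenient(segment_changes_cm, threshold_cm=0.25):
    """True if the mean of the three highest-arousal segments is below the threshold."""
    top_three = sorted(segment_changes_cm, reverse=True)[:3]
    return sum(top_three) / len(top_three) < threshold_cm


def is_nonresponder_conservative(segment_changes_cm, threshold_cm=0.47):
    """True if the single highest-arousal segment is below the threshold."""
    return max(segment_changes_cm) < threshold_cm


# Hypothetical circumference changes (cm) across stimulus segments.
example = [0.10, 0.30, 0.45, 0.20, 0.05, 0.40]
print(is_nonresponder_lenient(example))
print(is_nonresponder_conservative(example))
```

In this made-up profile the three highest segments average about .38 cm, so the participant clears the first threshold, but the peak of .45 cm falls below .47 cm, so the more conservative rule would classify him as a nonresponder.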
A final note about the data generated by both PPG and viewing time assessments is necessary here. The research standard for both types of assessment has been to make interindividual comparisons of intraindividually ipsatized, as opposed to raw, data (Glasgow, 2009; Kalmus & Beech, 2005). This standard is by no means non-controversial; although ipsatizing the raw data facilitates interindividual comparisons, the resulting ipsatized scores actually represent relative preferences between the different categories presented in an assessment, as opposed to an individual’s absolute preferences (see Fischer & Smith, 1999, for a thorough review of this issue, particularly as it relates to clinical interpretation of individual assessment profiles). Given the exploratory nature of this first attempt to compare PPG and viewing time data in the subsample, we conducted all analyses utilizing both raw and individually ipsatized scores from the PPG and viewing time assessments.
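To illustrate why ipsatized scores reflect relative rather than absolute preferences, the following sketch converts one individual's category scores to within-person z scores. The function name and data are hypothetical, and the use of the sample standard deviation is an assumption, because the text does not specify which form was used.

```python
from statistics import mean, stdev


def ipsatize(raw_scores):
    """Convert one individual's category scores to within-person z scores."""
    m = mean(raw_scores)
    sd = stdev(raw_scores)  # sample SD; an assumption made for this sketch
    if sd == 0:
        # A perfectly flat profile carries no relative preference.
        return [0.0 for _ in raw_scores]
    return [(x - m) / sd for x in raw_scores]


# Two hypothetical profiles with the same shape but very different magnitudes.
low_responder = [0.1, 0.2, 0.3]
high_responder = [1.0, 2.0, 3.0]
print(ipsatize(low_responder))
print(ipsatize(high_responder))
```

Both profiles yield essentially the same z scores, even though one individual responds ten times more strongly in absolute terms. This is the information that ipsatizing discards and that raw scores retain.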
Results
Full Sample Analyses
We first addressed the possibility that a portion of adolescents in the full sample would be classified as nonresponsive to the PPG during the preliminary data analyses. Using the less stringent (.25) threshold described above, we found that 5 of 103 participants were nonresponsive to the PPG (4.9%). Using the more stringent (.47) threshold described above, we found that 20 out of 103 participants were nonresponsive to the PPG (19.4%). The following analyses are from the subsample of 83 adolescents with adequate responsivity to the PPG; however, results from the full sample of 103 were identical in terms of statistical significance or lack thereof.
We then computed four PPG-based pedophilic indices based on six stimulus categories: two involving male and female small children (ages 3-5), two involving male and female preteen children (ages 6-11), and two involving male and female teens or adults (ages 16-18). For PPG raw scores (the change in penile circumference, measured in cm, for each category), we calculated one pedophilic index as the mean raw score across the four child categories minus the mean raw score across the two teen categories, and another pedophilic index as the highest raw score across the four child categories minus the highest raw score across the two teen categories (for both indices, higher numbers indicate greater interest in younger stimuli). Numerous other studies have utilized the second approach to calculating pedophilic indices; we included the first given the exploratory nature of this research. We repeated these calculations with PPG z scores (ipsatized PPG scores for each category based on adolescents’ mean response and standard deviation in responses across categories).
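The two index definitions above can be sketched as follows. The category labels, function names, and example values are hypothetical and serve only to make the arithmetic concrete; the same functions could be applied to either raw scores or within-person z scores.

```python
from statistics import mean

# Hypothetical labels for the six stimulus categories.
CHILD_CATEGORIES = ("male_small_child", "female_small_child",
                    "male_preteen", "female_preteen")
TEEN_CATEGORIES = ("male_teen", "female_teen")


def mean_based_index(scores):
    """Mean of the four child categories minus mean of the two teen categories."""
    child = mean([scores[c] for c in CHILD_CATEGORIES])
    teen = mean([scores[c] for c in TEEN_CATEGORIES])
    return child - teen


def max_based_index(scores):
    """Highest child-category score minus highest teen-category score."""
    child = max(scores[c] for c in CHILD_CATEGORIES)
    teen = max(scores[c] for c in TEEN_CATEGORIES)
    return child - teen


# Hypothetical raw circumference changes (cm) for one participant.
example = {"male_small_child": 0.2, "female_small_child": 0.4,
           "male_preteen": 0.3, "female_preteen": 0.7,
           "male_teen": 0.5, "female_teen": 1.1}
print(mean_based_index(example), max_based_index(example))
```

For this made-up participant both indices are negative, indicating stronger responding to the teen categories than to the child categories.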
We next explored the correlation between these pedophilic indices and SSPI scores in our full sample of 83 adolescents. Results are displayed in Table 1. The four PPG-based pedophilic indices were, predictably, strongly positively correlated with one another (rs ranging from .82 to .92, all ps < .001). Contrary to our original hypothesis that PPG data and SSPI scores would be positively correlated, however, both the Raw Pedophilic Index based on means and the Ipsatized Pedophilic Index based on means were weakly, but significantly, negatively correlated with SSPI scores (with rs = −.26 and −.25, respectively; both ps < .05). This finding is also in the opposite direction of previously published research on the relationship between PPG and SSPI data in adolescents (Seto et al., 2003).
Table 1. Correlations Between PPG-Based Pedophilic Indices and the SSPI (Larger Sample; N = 83).
Note. PPG = penile plethysmography; SSPI = Screening Scale for Pedophilic Interest.
*p < .05. **p < .001. All other p values > .05.
The first author had visually inspected bar graphs of the full PPG assessment (including additional segments depicting incest, rape, and internet-based sexual activity) and observed that several teens had indiscriminately high responding across categories, as has been noted elsewhere in the literature (Becker, Kaplan, & Tenke, 1992; Murphy et al., 2001). We thus explored whether the negative correlation between PPG data and SSPI scores might be explained by patterns of indiscriminately high responding. We divided the 83 adolescents into those who scored 0 or 1 on the SSPI (n = 25, 24.3% of the total sample) and those who scored 4 or 5 on the SSPI (n = 43, 41.7%) and then conducted a t test comparing overall responsivity³ between those with low versus high SSPI scores. Adolescents who scored a 0 or 1 on the SSPI exhibited lower overall responsivity (M = .61, SD = .52) to PPG assessment than did adolescents who scored a 4 or 5 on the SSPI (M = 1.09, SD = .84), t(66) = −2.92, p = .005. The negative correlation between PPG-based pedophilic indices and SSPI scores seems less likely to be the result of adolescents who are sexually uninterested in children, and more an issue of greater overall arousability on the part of adolescents who have more extensive and/or serious histories of sexual offending.
Given the surprising findings regarding SSPI scores, we also briefly looked at possible correlations between an adolescent’s self-reported number of victims and our pedophilic indices. Number of victims was significantly positively correlated with SSPI scores (r = .45, p < .001) but was also significantly negatively correlated with all four pedophilic indices (rs ranging from −.25 to −.49, all ps < .013).
Subsample Analyses
Only one participant in our subsample evidenced arousal insufficient for interpretation, regardless of which threshold for nonresponsiveness we used. The percentage of adolescents classified as nonresponsive to the PPG was thus nearly three times smaller in the subsample than in the full sample, although this difference was not statistically significant (21.8% of participants only used in the full sample vs. 6.3% in the subsample, respectively, when utilizing the more stringent .47 cutoff), χ²(1) = 2.10, p = .187. We conducted subsequent subsample analyses both including and excluding the nonresponding adolescent’s data. Only results from analyses including all 16 participants are reported below, to maintain as large an n as possible for this subsample, given that results did not differ between the two approaches.
We next examined the relationship between viewing time variables (see Table 2 for results). We calculated Spearman’s rank order correlation coefficients between OTL, PTL, and the subjective ratings adolescents made regarding how attractive they found the person in each photograph. OTL and PTL were both significantly positively correlated with subjective ratings; these findings match the earlier research (Glasgow, 2009). For ease of comparison between studies, we used OTL in all subsequent analyses as the objective viewing time variable.
Table 2. Correlations Between Three Viewing Time Variables.
Note. OTL = on-task latency; PTL = post-task latency.
p < .001.
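For readers unfamiliar with the rank-order correlations used for the viewing time variables, the following is a minimal sketch of Spearman's coefficient, computed as a Pearson correlation on average ranks. The function names are hypothetical, and the sketch assumes that neither variable is constant.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 through j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank-order correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Working on ranks makes the coefficient insensitive to the skew that is typical of latency data, which is one common reason to prefer it over Pearson's r for viewing times.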
We then tested the correlation between PPG and viewing time data (both OTL and subjective ratings). The MONARCH 21 and Affinity protocols do not include identical stimulus categories, so we limited our comparisons to the most readily comparable categories the two assessment tools share. Six categories are roughly overlapping: two involving male and female small children (ages 3-5 in the PPG assessments, and ages 0-5 in viewing time assessments), two involving male and female preteen children (ages 6-11 in PPG assessments, and ages 6-10 in viewing time assessments), and two involving male and female teens or adults (ages 16-18 in PPG assessments, and ages 18+ in viewing time assessments). For teen stimuli, another possible comparison was between the PPG teen categories (ages 16-18) and the viewing time young teen categories (ages 11-15). A Fisher r-to-z transformation indicated that the correlations between the teen PPG categories and both the adult and young teen viewing time categories did not differ significantly. As a result, we compared teen categories (ages 16-18) of the PPG assessment with adult categories (ages 18+) of the viewing time assessment, given that these categories might overlap to some small degree, whereas comparing 11-year-olds with 18-year-olds would not.
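The r-to-z comparison mentioned above can be sketched as follows. The text does not state which variant of the test was used, so this sketch shows the simplest form, which treats the two correlations as independent. That is an assumption made for illustration; correlations that share a variable and a sample, as here, are often compared with a dependent-correlations test instead. The function names are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher r-to-z transformation: z = atanh(r)."""
    return math.atanh(r)


def compare_correlations(r1, n1, r2, n2):
    """Two-sided z test for the difference between two correlations.

    Assumes the correlations come from independent samples (a simplification).
    """
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z_diff = (fisher_z(r1) - fisher_z(r2)) / se
    # Two-sided p value under the standard normal distribution.
    p_value = math.erfc(abs(z_diff) / math.sqrt(2))
    return z_diff, p_value


# Hypothetical correlations, each based on 16 participants.
z_diff, p_value = compare_correlations(0.40, 16, 0.25, 16)
print(z_diff, p_value)
```

With samples this small the standard error is large, so even visibly different correlations will often fail to differ significantly.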
We explored both raw and ipsatized versions of the PPG and OTL viewing time data (given that all adolescents made subjective ratings along the same −7 to 7 scale, ipsatized subjective ratings scores were unnecessary). For viewing time data, an adolescent’s raw scores were the means (of both OTL and subjective ratings) of all photographs within each category. Ipsatized viewing time data were z scores calculated for each category based on their mean and standard deviation in OTL across the categories. Table 3 summarizes these results.
Correlations Between PPG and Viewing Time Variables Across Six Categories.
Note. PPG = penile plethysmography; OTL = on-task latency.
p < .001. All other p values >.05.
We also considered the possibility that the relatively low correlation between raw PPG and OTL scores, and the lack of correlation between PPG and OTL z scores, might be affected by the varying degree of overlap in ages between PPG and viewing time categories. We re-ran the overall correlations using only the preteen and small child categories (the four categories with a high degree of age overlap across the two measures) and found no significant differences in the correlations between variables. We also re-ran the overall correlations separately by stimulus gender; when only the male categories or only the female categories were analyzed, the correlation between raw PPG and OTL data was no longer statistically significant.
We also assessed whether correlations between the PPG and the subjective (ratings) and objective (OTL) viewing time measures varied notably across categories (we used raw scores here given the lack of correlation between the z scores across measures). For comparison’s sake, we examined PPG responses to teens (again, ages 16-18) in relation to viewing time responses to both young teens (again, ages 11-15) and adults (again, ages 18+). Table 4 summarizes these results. Only one of eight comparisons between raw PPG and viewing time OTL data yielded a statistically significant positive correlation (for the female preteen category), while another approached significance (for the male small child category, p = .061); none of the eight comparisons between raw PPG and viewing time subjective ratings yielded a statistically significant correlation. Within the viewing time data, five of eight comparisons between raw OTL and subjective ratings yielded statistically significant positive correlations. These results bore some resemblance to the pattern noted by Worling (2006) in another study of viewing time assessments of predominantly heterosexually identified male adolescents; specifically, the highest correlations between OTL and subjective ratings were in categories of male stimuli.
Correlations Between PPG Change (cm2) Raw Scores, Viewing Time Raw OTL, and Viewing Time Subjective Ratings Within Eight Category Comparisons.
Note. PPG = penile plethysmography; OTL = on-task latency.
p < .05.
Finally, we calculated Spearman’s rank order correlation coefficients between PPG and viewing time OTL pedophilic indices and SSPI scores in our subsample. As with the full sample, for PPG raw scores, PPG z scores, OTL raw scores, and OTL z scores, we calculated one pedophilic index as the mean score across the four child categories minus the mean score across the two teen/adult categories, and another pedophilic index as the highest score across the four child categories minus the highest score across the two teen categories (for all indices, higher numbers indicate greater interest in younger stimuli).
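The two index definitions above can be sketched as follows. The category labels and scores are hypothetical placeholders; the same function would apply to raw or ipsatized PPG or OTL scores.

```python
from statistics import mean

# Hypothetical labels for the four child and two teen/adult categories.
CHILD = ("male_small_child", "female_small_child", "male_preteen", "female_preteen")
OLDER = ("male_teen_adult", "female_teen_adult")

def pedophilic_indices(scores):
    """Return the mean-based and maximum-based indices for one person.

    Higher values indicate relatively greater response to younger stimuli.
    """
    child = [scores[c] for c in CHILD]
    older = [scores[c] for c in OLDER]
    return {
        "mean_index": mean(child) - mean(older),
        "max_index": max(child) - max(older),
    }

# Hypothetical category scores for one adolescent.
example_scores = {
    "male_small_child": 1.0,
    "female_small_child": 2.0,
    "male_preteen": 3.0,
    "female_preteen": 2.0,
    "male_teen_adult": 4.0,
    "female_teen_adult": 2.0,
}
indices = pedophilic_indices(example_scores)
```

The maximum-based index depends on a single category on each side of the subtraction, so it is more sensitive to one unusually high response than the mean-based index is.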
Results are presented in Table 5. Unsurprisingly, nearly all of the pedophilic indices within each assessment tool were strongly positively correlated with one another (all rs but one ranging from .54 to .94, all ps but one <.05); however, none of the pedophilic indices were significantly correlated either across PPG and viewing time (rs ranging from −.49 to .28, all ps > .05) or with SSPI scores (rs ranging from −.31 to .03, all ps > .05). Given the surprising findings regarding both SSPI scores and number of victims in the full sample, we also briefly looked at possible correlations between the number of victims an adolescent reported having and our pedophilic indices, now including four viewing time–based pedophilic indices. In our subsample, number of victims was not statistically significantly correlated with either SSPI scores or any of the eight pedophilic indices.
Correlations Between PPG- and Viewing Time–Based Pedophilic Indices and the SSPI.
Note. PPG = penile plethysmography; SSPI = Screening Scale for Pedophilic Interest.
*p < .05. **p < .001. All other p values >.05.
Discussion
Our hypotheses were partially supported in the present study, the first of which we are aware to compare PPG and viewing time data in adolescents. PPG and viewing time category-level data were significantly positively correlated with each other in our subsample, but only when comparing raw as opposed to ipsatized data. Neither PPG- nor viewing time–based pedophilic indices were correlated with SSPI scores in our subsample, and we unexpectedly found that two PPG-based pedophilic indices were significantly negatively correlated with SSPI scores in our full sample. Again in the full sample, we found that adolescents with higher SSPI scores had greater overall responsivity to PPG stimuli than did adolescents with lower SSPI scores. Our results have several implications for the methodology and interpretation of assessing sexual interest in adolescents who have sexually offended.
First, the issue of ipsatization is important. Had we compared only ipsatized data from the PPG and viewing time assessments, the two tools would have appeared uncorrelated in our subsample. This is notable because ipsatization is the standard approach for both PPG (e.g., Clift et al., 2009) and viewing time (Glasgow, 2009) data. The rationale for ipsatization is reasonable; calculating z scores facilitates comparisons across different people or assessment instruments. If PPG and viewing time raw data are correlated, however, but ipsatized data are uncorrelated, as they were in our sample, researchers and clinicians looking only at either raw or ipsatized data will have an incomplete picture of how these two assessment tools are related. We suggest that as researchers continue to explore the common ground between PPG- and viewing time–based measures of sexual interest, including both raw and ipsatized data in comparative analyses is warranted. In clinical terms, there is overlap between these adolescents’ penile and visual responses to sexual stimuli, but the degree of overlap may depend on the kind of data (e.g., raw or ipsatized) being examined during assessments, which is often a choice already made in program software rather than by assessing clinicians.
Our results constitute the first empirical test of the relationship between PPG and viewing time data in adolescents who have sexually offended, and we found evidence for a moderate correlation (r = .32) between raw PPG scores and raw OTL-based viewing time scores. The correlation we found may be smaller than what would emerge from larger samples. If this moderate correlation is replicated in larger samples, however, such that PPG and viewing time data are only moderately positively correlated, what are these two assessment tools measuring, and which measurement is more useful in assessing and treating this population of adolescents? We know from the current subsample that, even with just 16 adolescents, objective and subjective viewing time data are strongly positively correlated (r = .69). Do viewing time data, both objective and subjective, represent “true” sexual interest (which may or may not lead to sexual arousal), whereas PPG data represent a purer measurement of physiological arousability? These adolescents may also experience sexual arousal to so many sexual themes that PPG or viewing time assessments may be insufficiently sensitive to clinically meaningful differences. Whether our results reflect insufficient sample size, true differences between what PPG and viewing time assessments measure, or artifacts of the instruments themselves, more research on adolescent responses to both types of assessment is certainly needed. It is not clear, at this point, whether either assessment tool yields data on adolescent sexual interest that are clinically useful for accurate assessment of real-world sexual interest and treatment planning.
We found no evidence that PPG or viewing time data are correlated with SSPI scores; that is, we did not find the expected relationship between an adolescent’s sexual interest in children based on his actual sexual offending characteristics and his responses to child sexual stimuli on assessments. This lack of correlation in our subsample of 16 adolescents may just reflect the limitations of analyses with such a small group. Sample size was not a concern, however, in our finding of a weak negative correlation between SSPI scores and PPG-based pedophilic indices in the full sample of 83 adolescents, a correlation in the opposite direction of what we expected to find. If this correlation is replicable, it would suggest that the two measures are not assessing the same construct, and it would also be unclear which measure, if either, validly assesses adolescent sexual interest in prepubescent children. One possibility is that as a very brief, four-item scale intentionally designed as a screening tool, the SSPI has global limits to its ability to capture meaningful variations between individuals and their offense histories; Mokros and colleagues (2013) recently found, for example, that SSPI scores were uncorrelated with viewing time data in a sample of adults who have sexually offended. In short, despite supportive evidence from three samples in one study (Seto et al., 2003), the SSPI may not be a valid or reliable proxy measure of adolescent sexual interest in children.
One reason that the SSPI may not be valid or reliable for use with this population is suggested by our finding that adolescents with high SSPI scores showed significantly greater responding across all PPG categories than did adolescents with low SSPI scores. Data support the idea that adolescents, as a group, exhibit the greatest arousal during assessments of any age group (Blanchard & Barbaree, 2005; Kaemingk et al., 1995), but individual differences in arousability may also be important. Deviant sexual interests are a risk factor for recidivism (Hanson & Morton-Bourgon, 2005), and perhaps the data in our sample reflect that high, nonspecific sexual arousability is meaningfully linked with sexual offending just as is high, very specific attraction to prepubescent children, at least for adolescents. For adolescents who have sexually offended, high SSPI scores may capture both of these patterns of sexual interest.
The fact that number of victims was significantly negatively correlated with our four PPG-based indices also points to the possibility that high, nonspecific sexual arousability is a real phenomenon worth addressing in these adolescents. For both clinicians and researchers, a cautionary note is that ipsatization can make one adolescent’s high, nonspecific arousal look like another adolescent’s low, nonspecific arousal, although those two profiles may have very different implications for both assessment and treatment. The adolescent with high nonspecific arousal might have truly broad sexual interests or have difficulty disengaging from even not particularly arousing sexual stimuli (e.g., the latter possibility might make it particularly difficult for such an adolescent when faced with a potential trigger for sexual reoffending in the real world). The adolescent with low, nonspecific arousal, however, may have been attempting to suppress his responses (whether specific or nonspecific) in the PPG assessment. For clinicians still utilizing PPG assessments with this population, visually comparing graphs of an adolescent’s raw and ipsatized data can often quickly provide an answer as to whether he exhibited high or low nonspecific arousal.
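The cautionary note about ipsatization can be made concrete with a small worked example. The two raw profiles below are hypothetical (arbitrary units, six stimulus categories) and were chosen so that one is ten times the other; the within-person z transformation shown, using the sample standard deviation, is an assumption about how ipsatized scores are computed.

```python
from statistics import mean, stdev

def within_person_z(raw):
    """Standardize one person's scores against his own mean and SD."""
    m, sd = mean(raw), stdev(raw)
    return [(x - m) / sd for x in raw]

# Hypothetical raw responses across six stimulus categories.
high_responder = [10.0, 11.0, 12.0, 11.0, 10.0, 12.0]
low_responder = [1.0, 1.1, 1.2, 1.1, 1.0, 1.2]

z_high = within_person_z(high_responder)
z_low = within_person_z(low_responder)

# After ipsatization the two profiles are numerically indistinguishable.
same_shape = all(abs(a - b) < 1e-6 for a, b in zip(z_high, z_low))

# The raw data still show a tenfold difference in overall responding.
raw_ratio = mean(high_responder) / mean(low_responder)
```

In this sketch the ipsatized profiles are identical, so only the raw scores reveal which adolescent responded strongly across all categories.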
Our results may indicate that the SSPI is not a useful screening tool for assessing sexual interest in adolescents who have sexually offended, but it is also possible that the categories we used to calculate the PPG-based pedophilic indices contributed to the negative correlation between two of those indices and SSPI scores. Seto and colleagues (2003) found significant positive correlations between PPG data and SSPI scores in three separate samples. Across the three samples, the pedophilic index was calculated as the largest ipsatized response to children minus the largest ipsatized response to adults, individuals over the age of 15, or adolescent peers, respectively. The correlation between a pedophilic index and SSPI scores was strongest (r = .46) in the sample that included only responses to adults in the index calculation. It is quite possible that, for adolescents who have offended sexually, arousal to teens can mean a variety of things, on a continuum from normative arousal to peers (or near-peers) to the very upper age range of truly pedophilic interest. It may simply be more difficult to assess pedophilic interest in adolescents as opposed to in adults, given the smaller age differences between those who have offended and their victims. To that end, our results may well have looked different if the PPG categories available for pedophilic index calculations included adult categories (while ours went no higher than 16-18) that provide the highest contrast in ages to child stimuli. It appears that, when summarizing adolescent sexual interest, both whether data are ipsatized and what categories of sexual stimuli are ipsatized are important considerations. For clinicians, the lack of empirical data on patterns of sexual interest in adolescents who have sexually offended also means that we have no clear guidelines as to how to best conduct clinically useful assessments.
This study has several limitations worthy of consideration. First, particularly with the data from the subsample of 16 adolescents, the small sample size potentially affected all of the analyses. All of these findings are in need of replication with larger samples. Second, with regard to the viewing time data, it is possible that our results were affected by the fact that participants did not all undergo the same version of the Affinity assessment. Specifically, if differences between the two versions (e.g., number of photographs) introduced significant additional variance, the correlation between viewing time and PPG data might have been artificially reduced. Third, in addition to issues of intra-method consistency, inter-measure consistency was also a weakness. The ideal comparison for measures of sexual interest would require that all instruments use identical age ranges within categories (e.g., being able to compare segments featuring teens ages 13-17 on both PPG and viewing time assessments); this lack of direct comparison for teen and adult categories was a significant methodological limitation. Some of the results may therefore be idiosyncrasies resulting from using two assessment tools that were neither designed in tandem nor perfectly analogous. Given that we are unlikely to live in a world of ideal assessments any time soon, more research into the real strengths and limitations of these assessment tools may nonetheless aid clinicians and researchers using the tools already available to them. Fourth and finally, given the finding that an adolescent’s own history of being sexually abused may confound the interpretation of PPG results (e.g., Becker, Kaplan, & Tenke, 1992; Murphy et al., 2001), our inability to account for this variable in the present analyses is a potential confound.
Future research will hopefully incorporate abuse history and other potentially relevant variables (e.g., a brief measure of social desirability might help explain discrepant findings between PPG and viewing time data) as part of a larger picture of adolescent sexual interest.
In summary, results from the current study revealed at least some overlap between PPG and viewing time assessments in a small sample of adolescents who have sexually offended; that is, adolescents exhibited similarities between their penile and visual responses to sexual stimuli. The SSPI, however, may have limited specificity regarding sexual interest in children in some samples of adolescents. Specifically, an adolescent’s actual sexual offense characteristics with regard to child victims did not appear to map onto clinically assessed sexual interest in children. Agreement across assessment measures may also depend heavily on what PPG data are available and how PPG-based pedophilic indices are calculated. In addition, more research is certainly warranted on adolescents who have offended sexually and also exhibit high, nonspecific patterns of sexual arousal. Assessing sexual interest has important implications for both assessing recidivism risk and treatment planning; our results indicate that more work remains to be done to increase the interpretability, and better understand the validity, of our assessment tools.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
