Abstract
Given the centrality of pedohebephilic interest in understanding sexual offending against children, several interventions have been developed to help men manage or inhibit their sexual arousal to children to reduce the intensity of their experience of such arousal. A meta-analytic review was conducted to examine the effectiveness of interventions for managing pedohebephilic arousal, as measured by phallometric testing. A systematic literature review identified 23 within-group design studies and 18 single-case design studies (N = 1,071) for analysis. Behavioral and pharmacological interventions showed moderate to large effects for reducing pedohebephilic arousal. Moderator analyses suggest that men with high pretreatment pedohebephilic arousal showed the greatest reductions in arousal. Small effects were found for comprehensive treatment programs; none of the interventions had the effect of increasing sexual arousal to adults. These results support the effectiveness of behavioral and pharmacological interventions for managing pedohebephilic arousal in men convicted of sexual offenses against children.
The sexual abuse of children has wide-ranging adverse psychological, health, and financial impacts on victims and society (Cotter & Beaupré, 2014; Paolucci et al., 2001; Wang & Holton, 2007). Given the costs associated with child sexual abuse, understanding characteristics that increase individuals’ likelihood of committing sexual offenses against children and identifying effective treatments for these characteristics is a high priority. Most theories of sexual offending include pedohebephilic interest as one risk factor that plausibly explains the initiation of sexual contact with children (for a review, see Seto, 2018), while meta-analytic research has supported these interests as predicting re-offending by men convicted of sexual offenses against children (Hanson & Morton-Bourgon, 2005; Mann et al., 2010; McPhail et al., 2019).
Assessment and Treatment Considerations of Pedohebephilic Interests and Arousal
In forensic and criminal justice contexts, structured assessment approaches are employed with clientele charged with or convicted of sexual offenses to assess their risk for future sexual violence, whereas intervention and management approaches are employed to mitigate that risk and prevent further sexual violence (Bonta & Andrews, 2017). Although a range of clinical rating tools, diagnostic screens, and self-report measures exist, a mainstay of assessing pedohebephilic interest has been phallometric testing, which measures changes in penile volume or tumescence during the presentation of erotic audiovisual stimuli. Increases in penile volume or tumescence during the presentation of erotic stimuli involving children are interpreted as indicating a sexual interest in children of a certain age and sex category; increases in penile volume or tumescence during the presentation of erotic stimuli involving consenting adults are interpreted as an indication of teleiophilic interest (see Table 1 for a list of definitions).
Table 1. Definitions of Sexual Interests
Although phallometric testing is not a risk assessment procedure per se, it assesses a clinically and forensically relevant psychosexological construct that can aid risk formulation and intervention. For instance, a recent meta-analysis found phallometric tests for pedohebephilic arousal to be a strong predictor of sexual recidivism by men who have been convicted of sexual offenses against children (d = 0.44; McPhail et al., 2019). This predictive effect is stronger than most single dynamic risk factors for sexual recidivism (Hanson & Morton-Bourgon, 2005) and even equivalent to some risk assessment tools (Hanson & Morton-Bourgon, 2009). Other meta-analytic research suggests that treatment programs that target arousal control are associated with greater decreases in sexual recidivism (Gannon et al., 2019). Set against these strengths, phallometric testing also raises methodological concerns, such as limited reliability (Marshall & Fernandez, 2003). Taken together, this body of research suggests pedohebephilic interests, as measured by phallometric testing, fall under the rubric of what Mann et al. (2010) have termed psychologically meaningful risk factors; that is, biopsychosocial processes that are possible causes of sexual offending, predict maintenance of sexual offending, and, when treated and managed, lead to reductions in sexual offending.
The risk-need-responsivity (RNR; Bonta & Andrews, 2017) model of effective correctional intervention explicitly links assessment and intervention by positing (a) that the intensity of services (i.e., dosage) should be matched to the risk level of the client (risk principle), (b) that treatment should prioritize psychologically meaningful risk factors linked to criminal behavior (need principle), and (c) that service delivery should be tailored to the unique characteristics of clientele, such as culture, motivation, and learning style, among other areas (specific responsivity principle), within the context of a warm, empathic, firm but fair relationship (general responsivity). Pedohebephilic interests can be construed as an RNR-based construct, given that (a) individuals assessed with a high level of such interests and corresponding arousal tend to be higher risk for sexual offending (although assessment of a single construct is insufficient to determine risk level) and merit services of appropriate intensity (McPhail et al., 2019), (b) pedohebephilic interest is a criminogenic need or psychologically meaningful risk factor associated with future sexual violence to be prioritized for management to reduce risk (Mann et al., 2010), and (c) clinical skill, sensitivity, and discretion are helpful to promote client engagement, to obtain valid assessment information, and to promote positive changes in sexual self-regulation (Marshall et al., 2011).
Given the centrality of pedohebephilic interests in understanding sexual offending toward children, several intervention approaches are used in sexual offense treatment programs (McGrath et al., 2010). In general, behavioral interventions aim to provide men with skills to manage or inhibit their sexual arousal to children in daily life. For instance, behavioral interventions, based on operant conditioning principles, include aversion therapies in which a noxious stimulus is paired with arousal to children or reinforcement therapies in which adult stimuli are associated with a rewarding experience (e.g., masturbation and orgasm; Marshall et al., 2009). In addition, pharmacological interventions, which include antiandrogen medications that reduce sex drive, can be provided as an adjunct or alternative to behavioral conditioning procedures (Garcia & Thibaut, 2011). Pharmacological interventions can alleviate the intensity or frequency of sexual arousal to, or sexual thoughts involving, children. In research testing the effectiveness of these interventions, phallometric testing may be used not only to assess the presence of pedohebephilic arousal, but also as a treatment tool to monitor changes in the capacity to interrupt or inhibit arousal. In addition, phallometric testing may be a somewhat more valid indicator of treatment change because self-reported fantasy and behavior are susceptible to socially desirable responding in the context of treatment (Turner & Briken, 2018).
Research Examining the Effectiveness of Treatment
Research into the effectiveness of treatments for managing pedohebephilic arousal dates to the 1960s; however, no meta-analytic research has been conducted to summarize this literature. Systematic reviews indicate that behavioral treatments for pedohebephilic interests have the effect of decreasing interest in children and increasing interest in female and male adults (Kelly, 1982; Laws & Marshall, 1991). A more recent, nonsystematic review of behavioral treatments reached this same conclusion (Marshall et al., 2009). However, findings from single research studies indicate that behavioral treatments may have limited effects on increasing men’s arousal to adults (Crolley et al., 1998; Johnston et al., 1992; Jones, 2014). Systematic reviews of pharmacological treatment studies suggest that some men experience a decrease in interest in children, though studies using phallometric testing were either sparse (Lewis et al., 2017) or produced equivocal levels of support (Turner & Briken, 2018). The reviews of pharmacological treatments tended not to review whether pharmacological treatments increased arousal to adults (Turner & Briken, 2018), though this omission may be due to pharmacological treatments being expected to have the effect of reducing arousal as a whole, instead of to a particular class of individuals (e.g., children).
Given this state of affairs, there are important gaps in our understanding of these treatment approaches which meta-analytic research is well-suited to address. Perhaps most importantly, the current state of the research provides no overarching or coherent quantitative statement on the effectiveness of these treatment approaches. Whereas behavioral treatments have a reasonable level of support, pharmacological treatments seem to have somewhat more equivocal effectiveness when changes on phallometric tests are the outcome measure. Relatedly, it is unclear whether different treatment modalities produce differential effects. The importance of this for clinical practice is hard to overstate. If certain treatment modalities are effective or a modality is found to be ineffective, this will provide guidance for clinicians regarding what treatments to offer. Furthermore, given the emphasis on reporting and interpreting the results of null hypothesis significance tests in primary studies, information on the magnitude of treatment effects for different treatment modalities is lacking. Having an estimate of the magnitude of change over treatment will allow clinicians to have a more nuanced understanding of how much change to anticipate over a course of treatment.
The existing research is also equivocal regarding whether treatment has an effect on arousal to adults or whether the effect of treatment is solely on helping men manage their arousal to children. While past qualitative reviews suggest that men who commit sexual offenses against children can increase their sexual interest in adults (Kelly, 1982; Marshall et al., 2009), some primary studies suggest this is not the case. Meta-analytic research may help clarify this issue. Further to this, the previous reviews of behavioral treatments are either out of date (Kelly, 1982; Laws & Marshall, 1991) or were less systematic in their approach (Marshall et al., 2009); a meta-analytic review would provide a needed update using a systematic approach.
Important questions also remain unasked or unanswered about whether treatments are effective for sexual offending subgroups, such as those who offend against related or unrelated children, or those with differing intensities of pedohebephilic interest and arousal. Meta-analysis may allow for an understanding of who benefits the most from treatment. The clinical implications of such findings are also considerable because if certain groups are identified as not benefiting from treatment, there would be little to be gained from offering them such treatment. Finally, the existing research has relied on uncontrolled single-group pre–posttreatment designs. Individual studies using this research design, on their own, provide little confidence in what is found; however, meta-analytic methods can improve the collective strength of the body of such studies and, in part, offset threats to internal validity faced by individual studies (Canadian Psychological Association, 2012).
Present Study
This research presents a meta-analytic review of the effectiveness of interventions for managing pedohebephilic arousal. We examined treatment effects for intervention types across developmental stage of sexual offending persons, sexual offense subgroup, ages of individuals depicted in erotic stimuli, and intensity of pedohebephilic arousal. Given the status of phallometric testing as an established assessment measure of pedohebephilic interest and that it is used in much of the treatment literature, the present meta-analytic review focused on studies using phallometric testing.
Method
Study Inclusion
Studies were included in the present meta-analysis if the research (a) included a sample of adult or adolescent males who had sexually offended against children, defined as males who had committed a sexual offense against a child below 15 years of age (Cantor & McPhail, 2015), with samples that had offended against related children, unrelated children, female children, or male children all being eligible; (b) reported data from a phallometric assessment of pedohebephilic interests; (c) provided an intervention targeting pedohebephilic interests in a sample of sexual offenders against children; and (d) included sufficient statistical information to calculate a within-subjects, pre–posttreatment change effect size (ES) in a treatment group.
Literature Search
A literature search to identify eligible studies was conducted through to March 2017 and then updated in October 2019. We systematically searched multiple databases, including ProQuest Dissertations and Theses Global, PsycINFO, Web of Science, and PubMed. Searches included combinations of the following terms: “phallometry,” “penile plethysmography,” “PPG,” “sexual arousal,” “deviant arousal,” “deviant sexual interest,” “sexual preference,” “sexual offender against children,” “child molester,” “child sex offender,” “pedophile,” “pedophilia,” “treatment,” “intervention,” “cognitive behavioral,” “CBT,” “EMDR,” “behavior therapy,” “BT,” “therapy outcome,” and “treatment outcome.” A search of governmental agency websites, journal tables of contents, conference programs, and reference lists from relevant review articles, books, and book chapters was also conducted.
Data Extraction and Coding
Study and Sample Level Characteristics
Studies were coded for publication status, year of publication, setting in which treatment was provided (prison/institution, community, or combination of both), and country.
Study Design-Level Characteristics
Studies were coded according to the type of research design used: single-case design, nonrandomized single treatment group design, nonrandomized treatment and control group design, or randomized controlled trial. Single-case designs were included to capture a relatively large case study literature assessing the effect of behavioral treatments on pedohebephilic interests (k = 18). Data from follow-up assessments were also coded.
Treatment-Level Characteristics
Interventions were coded as behavioral, cognitive-behavioral, pharmaceutical, comprehensive treatment programs, eye-movement desensitization and reprocessing, or a combination of interventions. Specific techniques within these intervention types were also coded. Behavioral treatments included masturbatory reconditioning, olfactory reconditioning/aversion, covert or vicarious sensitization, and satiation. Cognitive-behavioral interventions coded for included thought-stopping, self-talk, identifying automatic thoughts related to sexual arousal, and cognitive restructuring. Pharmaceutical interventions coded for included medroxyprogesterone acetate, cyproterone acetate, luteinizing hormone–releasing hormone analogues, and antipsychotics or selective serotonin reuptake inhibitors. Interventions coded as “comprehensive” included programs which aimed to address multiple psychosocial risk factors, in addition to arousal control, to reduce sexual recidivism. For example, Jones (2014) reports on a comprehensive treatment program in which men completed material on offense chain mapping, victim empathy, arousal reconditioning, relationship skills, mood management, and relapse prevention.
Phallometric Assessment Characteristics
Phallometric tests for pedohebephilic interest differentiate men who sexually offend against children from other groups (Cantor & McPhail, 2015; McPhail et al., 2019), are robust predictors of sexual recidivism, and have adequate reliability (see below). In the present study, the age of persons depicted in stimuli used during phallometric testing was coded. Two age categories were used to identify child stimuli: prepubescent stimuli (aged 10 years and younger) were coded as pedophilic and pubescent stimuli (aged 11–14 years) were coded as hebephilic. These age ranges were adopted because they reflect when pubertal changes occur for children, which is typically before or around age 11 (see Cantor & McPhail, 2015). Some studies describe child stimuli without providing specific ages for the children, complicating the classification of the age of the subjects depicted in the stimuli. In these cases, the stimuli were coded as pedohebephilic. In addition, the pedohebephilic category includes stimuli coded as pedophilic and hebephilic and groups stimuli depicting individuals below the age of 15 into a superordinate category. Stimuli depicting adults were coded as teleiophilic. Phallometric data were also coded according to the type of data reported (raw changes in penile circumference, percent full erection [PFE] data, z score data, indices derived from PFE data, or indices derived from z score data). Data type was used to inform the normative comparison analysis (see below).
Moderating Variables
Recent taxometric research has variously found pedophilic interest to be taxonic (Brankley, 2019; McPhail et al., 2018; Schmidt et al., 2013) or dimensional (Stephens et al., 2017). The main meta-analyses that we conducted model pedohebephilic interest as a dimension. However, a recent taxometric analysis has also generated support for a three-group structure to pedophilic interest, with the groups having increasing levels of pedophilic interest (McPhail et al., 2018). Given that latent structure can have implications for research into treatment effectiveness, we examined to what extent group membership, operationalized as pretreatment intensity of pedohebephilic arousal, may moderate treatment effects. To assess whether pretreatment intensity of arousal moderated treatment effects, the highest pretreatment PFE score in a sample was used to categorize samples into one of three groups, using PFE cut-scores provided by McPhail et al. (2018). We labeled these groups low pedohebephilic arousal, moderate pedohebephilic arousal, and high pedohebephilic arousal. Samples were categorized into the low pedohebephilic arousal group if their highest mean pretreatment PFE score to a child stimulus trial was below 11.6, into the moderate pedohebephilic arousal group if this score was between 11.6 and 24.8, and into the high pedohebephilic arousal group if this score was greater than 24.8. Group membership was used as a moderator variable of treatment effects for all treatments combined, behavioral and comprehensive treatments combined, and behavioral treatments across pedohebephilic, pedophilic, and teleiophilic interests.
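The grouping rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `arousal_group` is hypothetical, and the handling of scores that fall exactly on a cut-score (treated here as moderate) is an assumption, because the text does not say how boundary values were assigned.

```python
# Cut-scores quoted in the text (attributed to McPhail et al., 2018).
LOW_CUT = 11.6
HIGH_CUT = 24.8


def arousal_group(highest_mean_pfe: float) -> str:
    """Assign a sample to an arousal group from its highest mean
    pretreatment PFE score to a child stimulus.

    Assumption: scores exactly equal to either cut-score are placed
    in the moderate group.
    """
    if highest_mean_pfe < LOW_CUT:
        return "low"
    if highest_mean_pfe <= HIGH_CUT:
        return "moderate"
    return "high"


# Hypothetical sample-level scores, for illustration only.
groups = [arousal_group(score) for score in (5.0, 18.2, 40.0)]
```

The resulting labels would then serve as the levels of the moderator variable.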
Risk of Bias
Risk of bias in primary studies was assessed using the ROBINS-I tool, which provides a means to systematically assess risk of bias in nonrandomized studies of interventions (Sterne et al., 2016). Raters relied on descriptions of the items in the ROBINS-I guidance manual (Sterne et al., 2016). Six of the seven domains of bias on the ROBINS-I were coded because the selection bias domain was not relevant to the included studies. Each domain is rated as having a low, moderate, serious, or critical risk of bias. The coder is asked to make two global ratings: the overall risk of bias present in the study and the direction of the bias. All single-group, pre–post design and single-case design studies were rated as having a “critical” risk of bias, and the direction of the bias was rated as unpredictable.
Interrater Reliability
Interrater reliability analyses were based on 10 studies coded by two independent raters, both of whom extracted data from these 10 studies and coded the abovementioned variables. Interrater reliability ranged from κ = .53 to 1.00 for categorical variables, and intraclass correlation (ICC; two-way mixed, single measures, consistency ICC) values ranged from .82 to 1.00 for continuous variables. Disagreements among coders were settled via consensus through discussion and revisiting the original study document.
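For readers unfamiliar with κ, the following is a minimal sketch of Cohen's kappa for two raters coding one categorical variable. The function name `cohens_kappa` and the example codes are hypothetical; the study's actual ratings are not reproduced here.

```python
import math
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same, non-empty set of studies.")
    n = len(rater_a)
    # Observed proportion of studies on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if expected == 1.0:
        # Kappa is undefined when both raters use a single identical category.
        return math.nan
    return (observed - expected) / (1 - expected)


# Hypothetical setting codes from two raters for four studies.
kappa = cohens_kappa(
    ["prison", "prison", "community", "community"],
    ["prison", "prison", "community", "prison"],
)
```

In this made-up example the raters agree on three of four studies, but half of that agreement is expected by chance, so kappa is 0.5.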
Analytic Approach
Calculating ESs
Pre- and posttreatment means and standard deviations, p values for pre- to posttreatment change, t values for pre- to posttreatment change, and differences in means were coded from the studies to calculate the ES, g. The ES metric g is an estimate of the standardized difference between pretreatment and posttreatment mean scores, adjusted to account for bias introduced by small sample size (Borenstein et al., 2009; Morris & DeShon, 2002). A sizable minority of studies presented average phallometric data in figures, and to include these studies, we used WebPlotDigitizer Version 3.8 (Rohatgi, 2018) to extract data from figures. The use of this program in meta-analytic reviews is recommended as a method to capture more study data (Burda et al., 2017). Data from participants across single-case design studies were averaged and a standard deviation for all participants in these studies was computed. This resulted in the single-case design studies producing a single ES estimate for behavioral interventions.
A common issue in intervention studies is that many do not report a pre- to posttreatment correlation, which is needed to compute a within-subjects ES. Some authors recommend imputing a pre- to posttreatment correlation of r = .70 (Rosenthal, 1993); however, this approach can introduce bias (Cuijpers et al., 2017). To compute more accurate ESs for this meta-analysis, we reviewed studies included in McPhail et al. (2019) and additional research to identify studies that report test–retest correlations for phallometric tests for pedohebephilic and teleiophilic interests. We found six studies reporting test–retest correlations and conducted a fixed-effects meta-analysis of these data (see Supplemental Appendix A2, available in the online version of this article, for full citations). For phallometric tests for pedohebephilic interests, the aggregate test–retest correlation was r = .51 (95% confidence interval [CI] = [.47, .55], Q = 4.80, I² = 0.00, k = 6, N = 1,256); for teleiophilic interests, the aggregate test–retest correlation was r = .43 (95% CI = [.26, .57], Q = 2.89, I² = 0.00, k = 5, N = 124). We imputed one of these two values when computing pre- to posttreatment change ESs, depending on the age of persons depicted in the phallometric stimuli used in computing the ES.
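To make the role of the imputed correlation concrete, here is a minimal sketch of a within-subjects standardized mean change with the small-sample correction, following the general formulas in Borenstein et al. (2009). It is an illustration under stated assumptions rather than the authors' exact computation: the function name `within_subject_g`, the example means and standard deviations, and the pre-minus-post sign convention are all assumptions.

```python
import math


def within_subject_g(m_pre, sd_pre, m_post, sd_post, n, r):
    """Return (g, variance of g) for a single-group pre-post design.

    r is the pre-post correlation, imputed when a study does not report it.
    Assumed sign convention: positive g means a lower posttreatment mean.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1.")
    # Standard deviation of the difference scores.
    sd_diff = math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)
    # Rescale to the raw-score (within-group) metric.
    sd_within = sd_diff / math.sqrt(2 * (1 - r))
    d = (m_pre - m_post) / sd_within
    var_d = (1 / n + d**2 / (2 * n)) * 2 * (1 - r)
    # Small-sample correction factor J with df = n - 1.
    j = 1 - 3 / (4 * (n - 1) - 1)
    return j * d, j**2 * var_d


# Hypothetical study: PFE drops from 40 to 25 (SD = 20 at both points),
# n = 20, using the aggregate test-retest value r = .51 from the text.
g, var_g = within_subject_g(40.0, 20.0, 25.0, 20.0, n=20, r=0.51)
```

With these made-up inputs, d is 0.75 and the corrected g is 0.72. A larger imputed r shrinks the variance of g, which is why the choice of imputed value matters for the precision attributed to each study.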
Aggregating ESs
Data from the included studies were aggregated using both fixed-effects and random-effects meta-analysis. Fixed-effects meta-analytic results are conceptually restricted to the particular set of studies included in the meta-analysis, whereas random-effects meta-analytic results allow for more confidence in generalizing to the population from which the current sample of studies is drawn. When variability across studies is low (i.e., Q < degrees of freedom), random-effects and fixed-effects meta-analyses produce identical results. When the analysis includes a small number of studies (k < 30), greater interpretive weight should be given to fixed-effects rather than random-effects analyses because the between-study variability estimate necessary for random-effects analyses loses precision (Schulze, 2007).
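The relationship between the two models can be illustrated with a short sketch. It uses inverse-variance weighting and the DerSimonian-Laird estimator of between-study variance, which is a common choice but an assumption here, since the text does not name the estimator used by the software. The function name `pool` and the example inputs are hypothetical.

```python
import math


def pool(effects, variances):
    """Fixed-effect and DerSimonian-Laird random-effects pooled estimates."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum_w
    # Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    # Between-study variance is truncated at zero, so Q <= df gives tau2 = 0.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    random = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed, "se_fixed": math.sqrt(1.0 / sum_w),
        "random": random, "se_random": math.sqrt(1.0 / sum(w_re)),
        "Q": q, "df": df, "tau2": tau2,
    }


# Hypothetical homogeneous and heterogeneous sets of study effects.
homogeneous = pool([0.50, 0.60, 0.55], [0.04, 0.05, 0.06])
heterogeneous = pool([0.10, 0.90, 0.50], [0.01, 0.01, 0.01])
```

In the homogeneous example Q falls below its degrees of freedom, the between-study variance is estimated as zero, and the two models return the same pooled effect. In the heterogeneous example the random-effects standard error is wider.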
Multiple studies reported data on two or more phallometric outcomes. Two approaches were used for studies with multiple outcomes. In the first approach, ESs from multiple outcomes were averaged within studies using the Comprehensive Meta-Analysis software (Biostat, 2013); this method is used in most of the analyses reported below. For example, Bradford and Pawlak (1993) reported pre- and posttreatment means for three phallometric stimuli and the mean ES of these three outcomes was used in this first method. In the second approach, we selected the phallometric outcome for which the study sample showed the highest average arousal at pretreatment. For example, the sample in Bradford and Pawlak (1993) showed the highest pretreatment arousal to stimuli depicting sexual activity with a passive child and these were the data selected for Bradford and Pawlak (1993) in the second approach.
When two or more study ESs were available, a meta-analytic aggregate effect was computed. Conducting a meta-analysis using only two ESs may produce an inaccurate estimate of dispersion and CIs; for this reason, aggregate effects based on only two ESs should be interpreted with caution and as preliminary estimates (Borenstein et al., 2009). Estimates of the heterogeneity from the fixed-effects model are reported. The Q statistic indicates whether the observed heterogeneity among individual ESs is statistically significant. I² indicates the proportion of the observed variability between studies that is due to factors beyond spurious variation (Borenstein et al., 2009), and this statistic can be interpreted as an estimate of the amount of inconsistency in the findings of the studies included in a meta-analysis (Higgins et al., 2003). Outlier analyses were conducted using the following criteria: four or more studies contributed to the mean ES, the Q statistic was significant (p < .05), an individual study’s ES was the most extreme value, and an individual study’s ES accounted for 50% or more of the Q value (Whitaker et al., 2008). Meta-analyses were conducted using Comprehensive Meta-Analysis, Version 3.0 (Biostat, 2013).
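The heterogeneity statistic and the outlier screen can be sketched as follows. This is an illustrative reading of the criteria, not the authors' procedure: the function names are hypothetical, "most extreme value" is interpreted as the smallest or largest ES, and the significance of Q is passed in by the caller because the Python standard library has no chi-square distribution.

```python
def i_squared(q, k):
    """I-squared (as a percentage) from Q and the number of ESs, k."""
    df = k - 1
    if q <= 0 or q <= df:
        return 0.0
    return 100.0 * (q - df) / q


def flag_outlier(effects, variances, q_is_significant):
    """Return the index of a study meeting all four screening criteria,
    or None if no study qualifies."""
    k = len(effects)
    if k < 4 or not q_is_significant:
        return None
    weights = [1.0 / v for v in variances]
    pooled = sum(w * g for w, g in zip(weights, effects)) / sum(weights)
    # Each study's contribution to the fixed-effect Q statistic.
    contributions = [w * (g - pooled) ** 2 for w, g in zip(weights, effects)]
    q = sum(contributions)
    if q == 0:
        return None
    idx = max(range(k), key=lambda i: contributions[i])
    is_extreme = effects[idx] in (min(effects), max(effects))
    if is_extreme and contributions[idx] / q >= 0.5:
        return idx
    return None


# Hypothetical effects in which the fourth study dominates Q.
outlier = flag_outlier([0.5, 0.5, 0.5, 2.0], [0.04] * 4, q_is_significant=True)
```

As a consistency check, `i_squared(6.23, 5)` gives about 35.8 and `i_squared(3.09, 2)` gives about 67.6, matching the values reported for the natural history benchmarks in this article.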
Planned moderator analyses were conducted using group membership in terms of intensity of pedohebephilic arousal (low, moderate, high). In moderator analyses, the Q statistic is partitioned into Qbetween and Qwithin. Qbetween reflects the variability explained by the moderator variable (between-level variability) and Qwithin reflects the pooled within-level variability (unexplained variability; Borenstein et al., 2009). Qbetween follows a χ² distribution with x – 1 degrees of freedom, where x is the number of levels in the moderator.
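The partition of Q can be sketched under a fixed-effect model as below. The function name `q_partition`, the group labels, and the example values are hypothetical, and the resulting Qbetween would still need to be compared against a χ² critical value from a table or a statistics package.

```python
def q_partition(groups):
    """Partition Q into between- and within-level components.

    groups maps each moderator level to a list of (effect, variance) pairs.
    Returns (q_total, q_within, q_between, df_between).
    """
    def fixed_q(pairs):
        weights = [1.0 / v for _, v in pairs]
        mean = sum(w * g for w, (g, _) in zip(weights, pairs)) / sum(weights)
        return sum(w * (g - mean) ** 2 for w, (g, _) in zip(weights, pairs))

    all_pairs = [pair for pairs in groups.values() for pair in pairs]
    q_total = fixed_q(all_pairs)
    # Pooled within-level variability: Q computed separately in each level.
    q_within = sum(fixed_q(pairs) for pairs in groups.values())
    q_between = q_total - q_within
    return q_total, q_within, q_between, len(groups) - 1


# Hypothetical ESs for two levels of the arousal-intensity moderator.
q_total, q_within, q_between, df_between = q_partition({
    "low": [(0.2, 0.04), (0.3, 0.04)],
    "high": [(0.9, 0.04), (1.0, 0.04)],
})
```

In this made-up example nearly all of the variability (12.25 of 12.5) lies between levels, which is the pattern a strong moderator would produce.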
Publication Bias
Publication bias was assessed using the trim-and-fill method (Duval & Tweedie, 2000), Egger’s test of the intercept (Egger et al., 1997), and visual inspection of the funnel plot. The trim-and-fill method assesses whether studies with negative results are missing from a meta-analysis and can provide an estimate adjusted for missing studies. Egger’s test uses a regression model to detect publication bias; if bias is present, the intercept in the model will deviate from zero. We conducted these analyses when appropriate to do so (I² < 50, nonsignificant Q, five or more studies, and at least one significant effect; Ioannidis & Trikalinos, 2007) and report results when publication bias was present.
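The logic of Egger's test can be illustrated with a minimal sketch: each study's standardized effect (ES divided by its standard error) is regressed on its precision (one over the standard error), and the intercept is inspected. The function name `egger_intercept` and the example data are hypothetical. The sketch returns the intercept and its standard error only; a p value would require a t distribution with k − 2 degrees of freedom from a statistics package.

```python
import math


def egger_intercept(effects, standard_errors):
    """Ordinary least squares intercept (and its SE) of ES/SE on 1/SE."""
    k = len(effects)
    if k < 3:
        raise ValueError("At least three studies are needed.")
    x = [1.0 / se for se in standard_errors]                 # precision
    y = [g / se for g, se in zip(effects, standard_errors)]  # standardized ES
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("Standard errors must not all be identical.")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(sse / (k - 2) * (1.0 / k + x_bar**2 / sxx))
    return intercept, se_intercept


# Hypothetical data: the same ES at every precision (no asymmetry) versus
# larger ESs in the less precise studies (asymmetry).
symmetric, _ = egger_intercept([0.5] * 5, [0.1, 0.2, 0.3, 0.4, 0.5])
asymmetric, _ = egger_intercept([0.2, 0.3, 0.5, 0.8, 1.2],
                                [0.1, 0.15, 0.2, 0.3, 0.4])
```

When the effect does not depend on precision the intercept is zero, whereas small studies reporting larger effects pull the intercept above zero.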
Benchmarking
Most studies used single-group, pre–posttreatment designs. To ameliorate some of the limitations in using such studies in meta-analysis, we constructed natural history benchmarks for pedohebephilic and teleiophilic interests against which to compare treatment effects. These benchmarks provide a means of determining whether treatment effects are greater than change due to natural history processes (Minami et al., 2008). Benchmarks for pedohebephilic interests were created by aggregating pre–post phallometric scores in three waitlist control groups and two test–retest samples (see Supplemental Appendix A3, available in the online version of this article, for full citations). A meta-analysis of these five samples resulted in a pedohebephilic interests natural history benchmark of g = 0.113 (95% CI = [−0.044, 0.269], Q = 6.23, I² = 35.80, k = 5, N = 152). Using two samples, a teleiophilic natural history benchmark was also created (g = −0.076, 95% CI = [−0.309, 0.158], Q = 3.09, I² = 67.64, k = 2, N = 74). A range-null test was used to test whether treatment effects were beyond a critical value and can be interpreted as being statistically significantly greater than natural remission (Minami et al., 2008). This range-null test follows a noncentral t distribution with N – 1 degrees of freedom. Noncentrality parameters, t critical values, and g critical values were derived using formulas presented in Minami et al. (2008). A predetermined margin for a clinically trivial difference was selected (g = 0.20) and aggregate treatment effects were considered clinically relevant if the ES was at least one fifth of a standard deviation larger than the natural history benchmark (i.e., g = 0.313 for pedohebephilic interests).
End-State Normative Comparisons
Benchmarking allows for some confidence in identifying treatment effects that are greater than natural remission. However, these analyses do not address how well men who have sexually offended against children are functioning at the end of treatment. To address this limitation, end-state normative comparisons were conducted (Kendall et al., 1999; McAleavey et al., 2017; McEvoy & Nathan, 2007). Normative comparison data were constructed by aggregating data from samples of men without a history of sexual offending as reported in McPhail et al. (2019). Normative comparison data, in the form of means and standard deviations, were constructed for pedohebephilic, pedophilic, and teleiophilic interests (see Supplemental Appendix A4, available in the online version of this article, for full citations). Posttreatment means and standard deviations from treatment studies were used to construct weighted means and pooled standard deviations across the three sexual interests.
The equivalence of means in the normative and treatment samples was tested using the two one-sided test procedure (Kendall et al., 1999; Lakens, 2017; van Wieringen & Cribbie, 2014). This method requires selecting a range of closeness (δ) that specifies the range within which group differences must fall to be considered equivalent. In the context of phallometric tests, there is little guidance available for what an appropriate interval would be for the normative comparisons. In this situation, multiple values of δ are selected and used in equivalence tests (van Wieringen & Cribbie, 2014). On reviewing the standard deviations in the normative samples (SDnormative sample) and the results in McPhail et al. (2019), we selected two intervals, δ = 0.5SDnormative sample and δ = 0.75SDnormative sample, for the equivalence tests. The results for the 0.5SD and 0.75SD intervals were identical and these results are presented together.
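The two one-sided test logic can be sketched as follows. This is a simplified illustration, not the procedure used in the analyses: it substitutes a normal (z) approximation for the t-based tests described in the cited sources, and the function name `tost_equivalence` and all example values are hypothetical.

```python
import math
from statistics import NormalDist


def tost_equivalence(m_treat, sd_treat, n_treat,
                     m_norm, sd_norm, n_norm,
                     delta_sd=0.5, alpha=0.05):
    """Two one-sided tests for equivalence of two independent means.

    delta_sd sets the equivalence margin as a multiple of the normative SD.
    Assumption: a normal approximation replaces the t distribution.
    Returns (p value, whether equivalence is concluded).
    """
    delta = delta_sd * sd_norm
    se = math.sqrt(sd_treat**2 / n_treat + sd_norm**2 / n_norm)
    diff = m_treat - m_norm
    z = NormalDist()
    # Test 1: reject "diff <= -delta" when diff sits well above -delta.
    p_lower = 1.0 - z.cdf((diff + delta) / se)
    # Test 2: reject "diff >= +delta" when diff sits well below +delta.
    p_upper = z.cdf((diff - delta) / se)
    # Equivalence requires both one-sided tests to be significant.
    p = max(p_lower, p_upper)
    return p, p < alpha


# Hypothetical posttreatment and normative samples (PFE units).
p_same, equivalent_same = tost_equivalence(10.0, 8.0, 100, 10.0, 8.0, 100)
p_far, equivalent_far = tost_equivalence(20.0, 8.0, 100, 10.0, 8.0, 100)
```

Running the function with `delta_sd=0.5` and again with `delta_sd=0.75` mirrors the use of two intervals: a conclusion that holds under both margins is less dependent on the arbitrary choice of δ.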
Results
Study Characteristics
Twenty-three studies describing treatment effects for samples of men identified as having sexually offended against children, producing 197 ESs (MD = 4) and including 1,045 men (MD = 25), were included in the analyses. Of these 23 studies, 74% were published (k = 17), 48% provided treatment in an inpatient setting (k = 11) and 44% provided treatment in an outpatient setting (k = 10), and 74% used a nonrandomized single treatment group design (k = 17), whereas two studies used random assignment in the design. Eighteen studies reported single-case designs and included 26 men who had sexually offended against children. Of these 18 studies, 94% were published (k = 17), 67% provided treatment in an inpatient setting (k = 12), and 22% provided treatment in an outpatient setting (k = 4). See Supplemental Table S1, available in the online version of this article, for more detailed information from each study included in the analyses (see Supplemental Appendix A1, available in the online version of this article, for full citations).
When interpreting the direction of ESs, a positive ES for pedohebephilic or pedophilic arousal indicates that the treatment group showed lower levels of arousal to child stimuli from pre- to posttreatment. For teleiophilic arousal, a positive ES indicates the treatment group showed higher levels of arousal to adult stimuli from pre- to posttreatment.
Overall Effect of Interventions
Behavioral Treatments
The meta-analytic results for the effect of intervention on pedohebephilic, pedophilic, and teleiophilic arousal are shown in Table 2. There was a positive effect for behavioral treatments for pedohebephilic and pedophilic arousal; the magnitude of the effect increased with the inclusion of single-case design studies. When results were restricted to ESs derived from the samples’ highest response to stimuli depicting children, the positive effect of behavioral treatments was large (g = 0.79, 95% CI = [0.63, 0.96], Q = 15.5, I2 = 28.8, k = 12, N = 183). Behavioral treatments demonstrated little effect for increasing phallometric responding to adults. The treatment effects for pedohebephilic and pedophilic arousal were clinically and significantly greater than the natural history benchmark critical value (gcv ranged from 0.43 to 0.66; all p < .01; see Supplemental Table S2, available in the online version of this article). The treatment effect for pedohebephilic arousal was also greater than the natural history benchmark using the samples’ highest response to stimuli depicting children (gcv = 0.50, p < .01; see Supplemental Table S3, available in the online version of this article).
Effects of Interventions on Pedohebephilic, Pedophilic, and Teleiophilic Arousal
Note. CI = confidence interval; LL = lower limit; UL = upper limit.
*p < .05. **p < .01.
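The pooled g, Q, and I2 values reported in this section follow the usual inverse-variance logic, which can be sketched in Python as below. This is a minimal sketch assuming a fixed-effect model and assuming that Q has k − 1 degrees of freedom; the study-level effect sizes and variances are hypothetical, and the function names are not taken from the authors' analysis code.

```python
from math import sqrt


def pool_fixed(effects, variances):
    """Inverse-variance (fixed-effect) pooled estimate, 95% CI, and Q."""
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    se = sqrt(1.0 / total_weight)
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    return pooled, ci, q


def i_squared(q, k):
    """Percentage of variability attributed to heterogeneity."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Two hypothetical studies with equal sampling variance
pooled_g, ci_95, q_stat = pool_fixed([0.5, 0.9], [0.04, 0.04])
i2 = i_squared(q_stat, k=2)
```

Under the stated assumption about degrees of freedom, `i_squared(15.5, 12)` gives roughly 29, close to the I2 of 28.8 reported for behavioral treatments; the small gap is what would be expected if the published Q was rounded.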
Two studies also examined the effect of behavioral treatments from pretreatment to follow-up and from posttreatment to follow-up. The magnitude of change from pretreatment to follow-up was moderate to large (g = 0.74, 95% CI = [0.40, 1.08], Q = 2.8, I2 = 64.5, k = 2, N = 39, average follow-up time = 111 days). In addition, treatment gains in arousal management were maintained from posttreatment to follow-up (g = 0.12, 95% CI = [−0.18, 0.43], Q = 3.4, I2 = 70.7, k = 2, N = 39).
Pharmacological Treatments
Pharmacological treatments showed a similar positive effect for management of pedohebephilic arousal (Table 2); however, there were too few ESs available to evaluate their effect on pedophilic and teleiophilic arousal. Pharmacological treatment effects were greater than the natural history benchmark critical value (gcv = 0.63; p < .05; Table S2). A similar result was found when the treatment effects were examined using the samples’ highest response to stimuli depicting children (g = 0.70, 95% CI = [0.20, 1.30], Q = 2.1, I2 = 0, k = 4, N = 32; benchmark gcv = 0.64, p < .05; Table S3).
Comprehensive Treatments
Comprehensive treatment programs showed a small significant positive effect for management of pedohebephilic and pedophilic arousal (Table 2). Restricting the analysis to treatment effect for the samples’ highest response to stimuli depicting children produced similar findings (g = 0.34, 95% CI = [0.19, 0.48], Q = 1.7, I2 = 0, k = 3, N = 187). Comprehensive programs had a small, positive effect for increasing teleiophilic arousal. These treatment effects were not clinically or statistically greater than the natural history benchmark (Supplemental Tables S2 and S3).
Eye Movement Desensitization and Reprocessing
Two studies reported pre- to posttreatment changes over eye movement desensitization and reprocessing (EMDR) interventions. These studies found significant positive change in pedohebephilic arousal (g = 0.64, 95% CI = [0.14, 1.14], Q = 0.1, N = 13). Although this treatment effect met the predefined threshold for clinical relevance (i.e., g ≥ 0.40) relative to the natural history benchmark, the difference was not statistically significant (p > .05).
Effect of Specific Behavioral Interventions
Pedohebephilic Arousal
Olfactory aversion showed a large, significant effect on pedohebephilic arousal (g = 1.35, 95% CI = [0.57, 2.14], Q = 0.1, I2 = 0, k = 2, N = 15). Moderate and significant effects were also found for covert and vicarious sensitization (g = 0.65, 95% CI = [0.41, 0.89], Q = 0.3, I2 = 0, k = 2, N = 75) and satiation (g = 0.76, 95% CI = [0.54, 0.99], Q = 9.6, I2 = 58.3, k = 5, N = 89). Primary studies also reported on the effects for combinations of behavioral interventions, which were grouped according to conditioning principles informing the intervention (e.g., positive reinforcement or extinction-based interventions). Moderate and significant effects were found for combined positive reinforcement and extinction-based interventions (g = 0.60, 95% CI = [0.21, 1.00], Q = 5.2, I2 = 55.4, k = 6, N = 93) and combined aversion and extinction-based interventions (g = 0.63, 95% CI = [0.45, 0.88], Q = 4.2, I2 = 80.8, k = 2, N = 40). These two treatment effects were clinically and statistically greater than the natural history benchmark for pedohebephilic arousal (gcv = 0.60 and 0.57, p < .05; Table S4). Small effects were found for combined signaled punishment and biofeedback (g = 0.39, 95% CI = [0.08, 0.70], Q = 0.9, I2 = 0, k = 2, N = 150) and when positive reinforcement, aversion, and extinction interventions were combined (g = 0.19, 95% CI = [0.09, 0.29], Q = 9.4, I2 = 89.3, k = 2, N = 410). These two effects were not significantly greater than the natural history benchmark (p > .05).
Pedophilic Arousal
Two primary studies reported ESs for satiation and found a large, significant effect (g = 1.08, 95% CI = [0.72, 1.45], Q < 0.1, I2 = 0.9, k = 2, N = 33). Three single-case reports also examined satiation, and when these cases were included, the aggregate ES was g = 1.12, 95% CI = [0.76, 1.47]. The ES for satiation interventions was clinically and statistically greater than the natural history benchmark for pedohebephilic arousal (gcv = 0.78, p < .01; Table S4). Small ESs were found across two studies that combined aversion and extinction-based interventions (g = 0.30, 95% CI = [0.04, 0.55], Q < .01, I2 = 0, k = 2, N = 58), which were not greater than the natural history benchmark (p > .05).
Teleiophilic Arousal
Studies reporting the effect of individual behavioral interventions for increasing teleiophilic arousal found nonsignificant ES for olfactory aversion (g = 0.12, 95% CI = [−0.35, 0.59], Q = 0.2, I2 = 0, k = 2, N = 14), directed masturbation (g = −0.30, 95% CI = [−0.66, 0.05], Q = 4.7, I2 = 78.7, k = 2, N = 33), and satiation (g = −0.03, 95% CI = [−0.28, 0.23], Q = 0.1, I2 = 0, k = 2, N = 17). Studies reporting combinations of behavioral interventions produced small ES for aversion and extinction-based interventions (g = 0.23, 95% CI = [−0.05, 0.50], Q = 0.1, I2 = 0, k = 2, N = 58) and positive reinforcement, aversion, and extinction-based interventions combined (g = 0.20, 95% CI = [0.10, 0.29], Q = 0.2, I2 = 0, k = 2, N = 432).
Effect of Interventions in Child Sexual Offense Subgroups
The effect of behavioral interventions in different subgroups of men who had sexually offended against children was examined according to the relationship of the individual to the victim(s), the gender of the victim(s), and the age of the perpetrator. Behavioral interventions showed significant ESs for pedohebephilic arousal across all child sexual offense subgroups examined (Table 3). For the incest offending subgroup, the treatment effect exceeded the natural history benchmark for clinical relevance, but was not statistically significant (Table S5). The treatment effects in the other offending subgroups were clinically and statistically greater than the natural history benchmark (p < .05). In contrast, behavioral interventions showed little effect for increasing teleiophilic arousal (Table 3).
Meta-Analyses of Changes in Pedohebephilic and Teleiophilic Arousal During Behavioral Treatments in Child Sexual Offense Subgroups
Note. CI = confidence interval; LL = lower limit; UL = upper limit; SOC-E = sexually offended with unrelated child victims; SOC-I = sexually offended against intrafamilial children; SOC-FV = sexually offended with female child victims; SOC = sexually offended against children.
p < .01.
Treatment Effects Across Pretreatment Arousal Magnitude
Supplemental analyses demonstrated that the magnitude of pretreatment arousal, across arousal type, was a significant moderator when all treatments were combined in the analysis and when behavioral and comprehensive treatments were combined (Supplemental Table S6, available in the online version of this article). With respect to pedohebephilic arousal, only aggregate effects in the high arousal group were associated with decreases in pedohebephilic arousal that were clinically and statistically greater than the natural history benchmarks (all ps < .01; Table S7). There was no significant difference in treatment effect across groups when only behavioral intervention studies were considered. With respect to pedophilic arousal, only treatments provided to samples classified in the high arousal group were associated with decreases in pedophilic arousal that were clinically and statistically greater than the natural history benchmarks (all ps < .01; Table S7). None of the treatment effects for teleiophilic arousal were meaningfully different from the natural history benchmarks.
Normative Comparisons
The two one-sided test procedure was conducted for pedohebephilic and pedophilic arousal when all treatment types were combined, for combined behavioral treatments, and for pharmacological treatments. Teleiophilic arousal was not examined, given that treatments demonstrated little effect for increasing arousal toward adults. The posttreatment PFE and index scores were equivalent to normative group data for pedohebephilic and pedophilic arousal when all treatments were combined (Table 4). When z score–based data were used, the posttreatment scores were not equivalent to normative data, indicating that posttreatment scores remained elevated. A similar pattern was observed for samples receiving behavioral interventions, with the exception that PFE data did not show equivalence for pedophilic arousal and posttreatment scores remained elevated relative to normative data for both groups. Men receiving pharmacological treatments for pedohebephilic arousal were not equivalent to normative men at posttreatment. It should be noted that the sample sizes available for some of the equivalence tests were small and the results should be interpreted with caution.
Comparisons in Pedohebephilic and Pedophilic Arousal at Posttreatment Between SOC and a Normative Comparison Sample
Note. SOC = sexually offended against children; k = number of studies included; PFE = percent full erection.
Discussion
Our meta-analysis of interventions for managing pedohebephilic arousal demonstrated a “dodo bird” effect of sorts, consistent with treatments for other mental health issues (Luborsky et al., 2002). Diverse intervention methodologies demonstrated moderate to large reductions in pedohebephilic arousal. The common thread for most of these interventions is helping the individual develop a skill set to (a) attenuate pedohebephilic arousal, through pharmacological means or through conditioning by pairing arousal to children with a noxious odor, highly aversive imagined consequence, or boredom; (b) develop strategies to control that arousal such as via cognitive/behavioral techniques and/or reduction of serum testosterone; and/or (c) increase the interest in, or normalize, arousal to teleiophilic stimuli.
The meta-analysis supported the former two propositions, but support for the third was generally lacking in the available literature. Importantly, the effect of treatment on pedohebephilic arousal transcended different age attractions and the age of perpetrators or subgroup the offending individual belonged to (i.e., incest or extrafamilial child victims). This is the first meta-analytic research to suggest pedohebephilic arousal fulfills the requirement that, to be a psychologically meaningful risk factor, biopsychosocial processes are susceptible to change via intervention (Mann et al., 2010). Other past meta-analytic studies suggest pedohebephilic arousal predicts sexual recidivism (McPhail et al., 2019) and treatments that target arousal control are more effective in reducing recidivism (Gannon et al., 2019). While more research is needed, taken together, these meta-analytic studies offer support for pedohebephilic arousal as a central psychologically meaningful risk factor for sexual offending against children.
Few men are exclusively pedohebephilic. Extant literature demonstrates that about a third to half of men identified as having sexually offended against children have pedohebephilic interests that coincide with some form of teleiophilic interest, whether that be preferential or not (Seto et al., 1999). The motivation and ability to increase already present teleiophilic interest might be more difficult to detect, which may explain the null effect for increasing arousal to adults found in the present research. Alternatively, this result may indicate that treatments are simply not suited for this purpose. This finding is especially important considering past reviews have suggested treatment enhances sexual interest in adults (Kelly, 1982), which the current review does not support.
The analyses grouping samples according to pretreatment pedohebephilic arousal intensity suggest that, taking into account the sample sizes across studies, between 67% and 86% of men who underwent interventions specifically for the purpose of decreasing pedohebephilic arousal did not show gains above natural processes. By contrast, those men with the highest levels of pedohebephilic arousal, who have the most room to change, demonstrated the most substantive changes in arousal management and hence reductions in risk for future sexual contact with a child. The findings are consistent with the Risk and Need principles, which assert that targeted interventions should be prioritized for individuals who exhibit a problem with a risk factor (Bonta & Andrews, 2017). While such findings may be promising, there are important caveats. Specifically, the methodology used in primary studies cannot rule out the possibility of regression-to-the-mean, given that the studies (a) lacked untreated control groups and (b) seldom linked changes in arousal to an external criterion such as recidivism. Future research using randomization and control samples of highly pedohebephilic men who do not undergo treatment is needed to improve confidence that treatments have a causal effect for men with high levels of pedohebephilic interest.
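The regression-to-the-mean caveat can be illustrated with a short simulation. In this Python sketch no one receives any treatment, yet the subgroup selected for high pretest scores still shows an apparent reduction at retest. Everything here is hypothetical: the score distribution, the measurement noise, the cutoff, and the function name are assumptions chosen for illustration and do not model phallometric data from the included studies.

```python
import random
from statistics import mean


def simulate_untreated(n=10000, cutoff=1.0, seed=0):
    """Simulate two noisy measurements of a stable trait with no treatment.

    Returns the mean pre-minus-post change for the whole sample and for
    the subgroup whose pretest score exceeded the cutoff.
    """
    rng = random.Random(seed)
    pre, post = [], []
    for _ in range(n):
        true_level = rng.gauss(0.0, 1.0)                 # stable underlying level
        pre.append(true_level + rng.gauss(0.0, 1.0))     # noisy pretest
        post.append(true_level + rng.gauss(0.0, 1.0))    # noisy retest
    changes = [a - b for a, b in zip(pre, post)]
    high_changes = [c for a, c in zip(pre, changes) if a > cutoff]
    return mean(changes), mean(high_changes)


overall_change, high_group_change = simulate_untreated()
```

In the full sample the average change is near zero, whereas the high-pretest subgroup shows a clear positive "reduction" produced by selection on a noisy score alone. An untreated comparison group selected with the same cutoff would show the same artifact, which is why such a group allows the treatment effect to be separated from it.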
This dodo bird effect does not extend to comprehensive treatment programs. One possible explanation is that these programs treated men with low levels of pedohebephilic arousal. Indeed, most samples that participated in a comprehensive treatment program had relatively low pretreatment pedohebephilic arousal. In addition, comprehensive programs were more likely to include a combination of behavioral treatment types, which may be less effective than using single behavioral interventions and which may also indicate less experimental control regarding what kinds and doses of treatment individual men were receiving. Comprehensive programs typically last a lengthy period of time, given the range of psychosocial issues addressed in these programs. Such a long period between test and retest may have also reduced detected treatment effects. However, there are compelling reasons why additional foci in programs, such as improving intimacy skills and developing healthy sexuality, are important aspects of treatment if reducing sexual recidivism is the aim of treatment (Marshall et al., 2009). At present, it is generally unclear whether the small effects associated with comprehensive programs are due to providing services to men without pedohebephilic interests, or potentially due to the treatment approach itself or a diluted effect from combining specialized interventions.
The present findings have implications for the recent and ongoing debate regarding the changeability of men’s sexual arousal toward children (see Cantor, 2018; Fedoroff, 2018). However, an important limitation to the evidence reviewed here is that showing changes on phallometric testing is likely best understood as a change in a person’s ability to monitor and manage their arousal as opposed to representing a shift in sexual interest or orientation per se. Furthermore, establishing that interventions can shift sexual orientation, which would include changes in sexual behavior, emotional and romantic attractions, sexual fantasy, sexual identity, and sexual arousal, is beyond the scope of the reviewed evidence base. While this limitation attenuates the conceptual implications of the findings, it is important to underscore that men who experience pedohebephilic arousal and have committed sexual offenses have, by definition, demonstrated an inability and/or unwillingness to control and manage their sexual arousal toward children, at least during the commission of their offense(s). Improving the ability to control arousal in everyday life is likely an important aspect of managing risk when released into the community; indeed, research suggests that men find behavioral interventions to be helpful in this regard (Milner, 2016).
Limitations and Future Directions
There are several noteworthy limitations to discuss. The most notable is that the use of no-treatment comparison groups was absent from most studies included and randomization was completely absent. This methodological omission results in most studies having a critical risk of bias. Single-group pre–posttreatment designs are limited because the resulting treatment effects have some unknown amount of influence from natural processes and regression to the mean. We made efforts to ameliorate the likelihood that natural processes might completely account for the findings, specifically using natural history benchmarks and posttreatment normative comparisons. However, we could not apply rigorous inclusion criteria to establish higher quality natural history benchmarks. This limits the confidence we have in the benchmarking results and that treatment effects were due to intervention, and not natural processes. To address these concerns, research that uses randomized no-treatment or waitlist control group designs is needed to establish the efficacy of interventions for managing pedohebephilic arousal. Such designs, using samples of men who display moderate to high levels of pedohebephilic arousal, are currently feasible and are desperately needed. Given the centrality of pedohebephilic arousal to preventing sexual offending, this is a priority for future research.
A further limitation was that the k for some ESs was small and ranged from 2 to 11 studies. This concern is somewhat offset by limited ES heterogeneity across most analyses (i.e., measures of heterogeneity tended to be small in magnitude or not significant), and a high level of consistency between fixed-effects and random-effects analyses, both of which support the stability of findings and confidence in conclusions. The limited number of studies contributing to some meta-analytic estimates also limited the ability to assess for the influence of publication bias. The presence of publication bias was not found in the meta-analyses where this bias check was conducted; however, multiple meta-analytic effects were not assessed for publication bias, and the influence of this form of bias remains unknown. A further interpretive limitation created by the small number of studies is that most of the equivalence tests likely had low statistical power. The majority of the equivalence tests identified that men with sexual offense histories did not display arousal at posttreatment that was equivalent to nonoffending men. However, these significance tests may have been influenced by low statistical power due to low sample sizes.
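The concern about low power in the equivalence tests can be illustrated with a back-of-the-envelope calculation. This Python sketch assumes a normal approximation, two independent groups of equal size with a common standard deviation, and a true difference of exactly zero; the group sizes are hypothetical and are not the sample sizes of the equivalence tests reported here.

```python
from math import sqrt
from statistics import NormalDist


def tost_power(n_per_group, delta_in_sd=0.5, alpha=0.05):
    """Approximate power of a two one-sided test when the true difference is 0.

    Assumptions: equal group sizes, common SD, and a z approximation.
    delta_in_sd is the equivalence bound expressed in SD units.
    """
    se = sqrt(2.0 / n_per_group)              # SE of the mean difference in SD units
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    margin = delta_in_sd / se - z_crit
    if margin <= 0:
        # The bound is too narrow relative to the SE to ever conclude equivalence
        return 0.0
    return 2.0 * NormalDist().cdf(margin) - 1.0


# Hypothetical group sizes
power_by_n = {n: tost_power(n) for n in (10, 50, 100)}
```

With a bound of 0.5 SD, very small groups cannot demonstrate equivalence even when the groups are truly identical, so a finding of nonequivalence in small samples is weak evidence that arousal actually remained elevated.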
One limitation of the current state of psychological interventions for pedohebephilic arousal is that technical innovation has stagnated for decades, with most studies included being published prior to 1995. Innovations in the treatment of pedohebephilic interest and arousal are both needed, to keep abreast of the growing understanding of the psychophysiology of human sexuality (Janssen, 2007), and possible, given the developing understanding of the influences of learning on sexuality (Hoffmann, 2017). Another reason for technical innovation is that aversive interventions may not be acceptable to some clients and the negative side effects of these interventions are unknown. Encouragingly, recent scholarship has outlined “third-wave” psychotherapeutic approaches that may have value in helping individuals live with paraphilic interests (Walton & Hocken, 2020).
There is a need for the field to examine whether positive treatment change on indices of sexual interest and arousal predicts reductions in sexual recidivism. Only two phallometric studies in the present sample of studies examined such associations; however, a larger literature exists examining within-treatment change in dynamic risk factors and its association with sexual offending, and a synthesis of this work will be forthcoming. Given the link between higher phallometric test scores and higher rates of sexual recidivism (McPhail et al., 2019), we anticipate that meaningful reductions from pre- to posttreatment in phallometric test scores should predict reductions in sexual recidivism among men with histories of sexual offenses against children. Developing an empirical understanding of the relationship between change in pedohebephilic arousal and sexual recidivism will improve our ability to target interventions and will establish pedohebephilic interests as a psychologically meaningful risk factor (Mann et al., 2010).
Conclusion
At present, there are no meta-analytic reviews of the effectiveness of interventions for managing pedohebephilic arousal. The present meta-analysis represents a unique contribution that aggregates the current understanding of treatment effects and extends that knowledge in novel directions. The present results provide much reason for optimism in terms of helping men convicted of sexual offending manage their sexual arousal toward children. Most behavioral and pharmacological interventions were found to be associated with reductions in pedohebephilic arousal, especially for those men who showed high levels of such arousal. However, there was little evidence that these interventions can increase sexual arousal toward adults and the few comprehensive programs examined had small effects.
The clinical implications are considerable. These findings provide clinicians with evidence and recommendations on which clients to offer specialized services to, what interventions may work with their clients, and the amount of change clients can expect to make over the course of treatment in terms of their capacity to manage pedohebephilic arousal. In short, men and youth are capable of managing pedohebephilic arousal when trained in behavioral techniques to manage that arousal. And in some instances, men may display similar levels of arousal to children as non-pedohebephilic individuals. We do not know the long-term effects of these interventions or their longevity. But in principle, the capacity to control or inhibit arousal is a portable skill that can be pivotal to risk management and the prevention of sexual offending against children. Future research using more rigorous methodologies and technical innovations will further advance the field by establishing the efficacy of treatments, expanding the intervention choices for clinicians to offer their clients, and addressing the important relationship between within-treatment change and sexual recidivism.
Supplemental Material
Supplemental material, CJB-19-0164.R2_Online_appendix for Interventions for Pedohebephilic Arousal in Men Convicted for Sexual Offenses Against Children: A Meta-Analytic Review by Ian V. Mcphail and Mark E. Olver in Criminal Justice and Behavior
Footnotes
This research was supported by funding from the Social Sciences and Humanities Research Council of Canada, Public Safety Canada, and the Centre for Forensic Behavioural Science and Justice Studies. Thanks to Stephanie Fernane for support in data extraction and coding for this project.