Abstract
The emotional Stroop task has been used to assess deviant sexual interests of sexual abusers. Two limitations noted in the literature are difficulties surrounding the choice of word stimuli and the task’s inability to elicit significant differences between offender subtypes thus far. The purpose of this study was to examine differences in emotional Stroop bias between three adult groups using new, empirically derived word stimuli intended to reflect sexual interests more specific to sexual abusers. Significant differences were found between sexual abusers and nonoffending controls for affective and sexual word stimuli. The results further support differential processing biases between sexual offenders and nonoffenders; however, difficulties in differentiating between offender groups are still evident. Implications of these findings are discussed and recommendations for future research are made.
For decades, researchers and clinicians have been using penile plethysmography (PPG) to measure deviant sexual interests. However, questions have been raised regarding its construct validity and its ability to accurately measure sexual interest, as opposed to simply measuring sexual arousal (Laws, 2003, 2009; O’Donohue & Letourneau, 1992). It is widely accepted that clinicians and researchers tend to measure the penile response (i.e., arousal) because it is a tangible response to measure. However, it has also been noted that the ways in which they conduct PPG assessments put them at risk of overlooking affective and cognitive components that are also relevant to sexual behavior (Barlow, 1977; Laws, 2003, 2009; O’Donohue & Letourneau, 1992).
In theory, research has demonstrated that there are differences in the cognitive processes of sexual abusers related to their interests (Abel et al., 1989; Beech et al., 2008; Geer, Estupinan & Manguno-Mire, 2000; Nunes, Firestone, & Baldwin, 2007; Price & Hanson, 2007; Smith & Waterman, 2004; Ward, Hudson, Johnston, & Marshall, 1997). It is also argued that one should therefore expect sexual abusers to possess cognitive and affective associations that are specific to their experiences and interests and that they would exhibit biases toward salient stimuli that are relevant to these experiences and interests (McNally, 1998; Smith, 2009). It is with these ideas in mind, and with questions surrounding the construct validity of PPG assessment, that work on the development of information-processing paradigms to measure sexual interest has originated. For example, researchers have tested the ability of the Implicit Association Test (Brown, Gray, & Snowden, 2009; Gray, Brown, MacCulloch, Smith, & Snowden, 2005; Gray, MacCulloch, Smith, Morris, & Snowden, 2003; Nunes et al., 2007), choice reaction time task (Gress, 2008; Mokros, Dombert, Osterheider, Zappalà, & Santilla, 2010), and rapid serial visual presentation (Beech et al., 2008; Flak, 2009; Flak, Beech, & Humphreys, 2009) to assess sexual interest and have found significant differences in responses between offending and nonoffending groups.
The emotional Stroop task is also considered to be an information-processing measure of sexual interest. The emotional Stroop task is a color-word task that presents neutral and emotion-inducing words and/or disorder-related words in various colors. Participants are required to name the ink color of the stimuli presented, and the difference in mean reaction times (RT) between neutral or control words and emotional or experimental words is said to reflect interference in information processing caused by the emotional content of the words (Larsen, Mercer, & Balota, 2006; MacLeod & MacDonald, 2000; Mutter, Naylor, & Patterson, 2005; Wentura, Rothermund, & Bak, 2000). More specifically, the response latencies experienced by an individual following the presentation of salient word stimuli are believed to be a function of the processing of the stimuli presented and a reflection of the strength of representations associated with the words, both cognitively and affectively (Smith, 2009).
Emotional Stroop studies conducted to date that have examined the sexual and aggressive interests of offender populations have shown that sexual offenders display significantly longer response latencies to sexual word stimuli compared to control participants (i.e., adult community sample; Price & Hanson, 2007; undergraduate students; Smith & Waterman, 2004). Similarly, violent offenders display a response bias toward aggression-themed word stimuli relative to undergraduate control participants (Smith & Waterman, 2003, 2004). These studies provide support for the assumption that sexual abusers would exhibit biases toward salient stimuli that are relevant to their experiences and interests.
The measurement of sexual interests informs researchers in the treatment of sexual abusers (Beech, Oliver, Fisher, & Beckett, 2006; Marshall, Anderson, & Fernandez, 1999; Yates, Goguen, Nicholaichuk, Williams, & Long, 2000), in risk assessments (Craig, Browne, & Beech, 2008; Hanson, Harris, Scott, & Helmus, 2007; Hanson & Thornton, 2003; Prentky & Righthand, 2003; Wong, Olver, Nicholaichuk, & Gordon, 2004), and as a predictor of sexual recidivism as measured by PPG (Hanson & Bussière, 1998). In practice, sexual interests also aid researchers in differentiating between persistent sexual abusers and lower-risk offenders and nonoffenders (Thornton & Laws, 2009). Given the importance attributed to sexual interests in the assessment and treatment of sexual abusers, further development of an information-processing tool that could measure the implicit nature of this factor would be a valuable contribution to theory and practice in work with sexual abusers.
Smith (2009) has noted that the offense-related emotional Stroop tasks that have been tested with offender samples have used word stimuli that are general in nature, or do not test for specific deviance, and are thus unable to differentiate between offender subgroups. Smith (2009) and Price and Hanson (2007) suggest that if the word stimuli were altered to reflect more specific deviant sexual interests, then they could be used to assess sexual interests relevant to sexual abusers. Given the inability thus far of the emotional Stroop task to differentiate between offender groups (i.e., between sexual offenders and violent offenders or between pedophiles and rapists), an amendment to the word stimuli could be an appropriate solution to this dilemma. It could be that the results yielded thus far on the emotional Stroop task with sexual abuser samples reflect sexual preoccupation (i.e., “sex drive”; Thornton, 2002; “an obsession with sex so that sex is an unusually salient activity”; Craig & Beech, 2009, p. 98) or a preference for sexual material more generally rather than specific elements of sexual interest (i.e., ideas and beliefs regarding what an individual finds attractive, sexually stimulating, or significant for sexual arousal to occur; Price & Hanson, 2007, p. 203).
This study is an examination of Stroop interference effects experienced by sexual abusers, violent offenders, and nonoffending controls to offense-related word stimuli. This study makes use of the word stimulus set from Smith and Waterman’s (2004) study and a new stimulus set derived to reflect sexual interests that are specific to sexual abusers to investigate differences between the groups. In addition, attention is paid to potential participant characteristics that have been known to have an effect on Stroop results (i.e., age, levels of executive function, and level of vocabulary) and to changes in the administration of the Stroop task (i.e., button-press responses instead of voice-activated responses; see Price, Beech, Mitchell, & Humphreys, 2012) to examine whether they produce similar response patterns across the groups.
Method
Participants
Twenty-seven men convicted of sexual offences took part in the study. Of these, there were five exhibitionists, four incest offenders, nine pedophiles, seven rapists, and two mixed sexual abusers (i.e., had offended against both adult and child victims). Eight of the sexual abusers were on community supervision and recruited through probation; 19 sexual abusers and 21 violent offenders were recruited through poster recruitment and by research representatives from separate wings at a U.K. prison. All of the offending controls had been convicted of either manslaughter or murder. An additional 38 nonoffending control participants took part in the study and were recruited through a university research scheme whereby psychology students complete 10 hr of research credits per year to fulfill degree requirements.
Apparatus/Materials
Information concerning all participants’ age, presence of a learning disability, handedness, and whether they had an existing criminal record was collected. Information of all offenders’ index offense, prior offenses, victim age, victim gender, and relationship to the offender was collected to aid in the subcategorization of offender groups.
The British Picture Vocabulary Scale (BPVS)
The BPVS (Dunn, Dunn, Whetton, & Burley, 1997) was used to confirm that participants had an appropriate level of understanding of the words that were presented in the emotional Stroop task and to ensure that the effects of reading skill deficits were minimized. For example, since few interference effects are experienced by individuals lacking basic reading skills (i.e., participants do not become distracted by word meaning; Schiller, 1966), it was of interest to test for level of vocabulary to ensure that all participants were able to experience interference in information processing that is caused by processing of word information. Therefore, the BPVS was used as a measure of vocabulary understanding in anticipation of the level of learning difficulties that may be prevalent in the sample.
Hayling and Brixton tests of executive function
The Hayling Sentence Completion test (Burgess & Shallice, 1997) is a measure of response initiation and response suppression. It consists of two sets of 15 sentences each missing the last word. In the first section, the examiner reads each sentence aloud, and the participant has to simply complete the sentences, yielding a simple measure of response initiation speed (measured using a stopwatch). The second part of the Hayling requires participants to complete a sentence with a nonsense ending word (and suppress a sensible one), giving measures of response suppression ability and thinking time. Average interrater reliabilities of up to 96.0% have been found for final scoring of the Hayling test (Bielak, Mansueti, Strauss, & Dixon, 2006).
The Brixton test (Burgess & Shallice, 1997) is a visuospatial sequencing task that measures the ability to detect rules in sequences of stimuli. It takes between 5 and 10 min to administer and yields an easily understood scaled score between 1 and 10. The Brixton consists of a 56-page stimulus booklet, each page displaying two rows of five circles numbered between 1 and 10. One circle on each page is filled in with blue color. It is the participant’s task to identify where the blue dot would be on the following page each time, on the basis of a pattern or rule governed in the previous page. The outcome measure of this task is the total number of errors across 55 trials (Burgess & Shallice, 1997). These tasks were used as a measure of executive function and analyzed as potential confounding variables of Stroop interference.
Beliefs About Children Scale (BACS)
Participants completed a shortened version of the BACS (Beckett, 1987). Scoring items on the BACS yields two subscale scores: Cognitive Distortions (CD) and Emotional Congruence With Children (EC). Both subscales are scored through 15 items, where the CD scale is intended to measure beliefs about children and their sexuality; the EC subscale is intended to measure the understanding of what an individual believes to be the thoughts, feelings, and interests of children, where higher scores indicate higher levels of self-reported emotional congruence with children (Harkins, Flak, Beech, & Woodhams, 2012). The CD subscale has yielded a test-retest reliability score of .77 (Beech, 1998) and has been reported by Thornton (as cited in Beech, 1998) to have high internal reliability (α = .90). The test-retest reliability of the EC scale has been reported as .63 (Beech, 1998).
The emotional Stroop task
Computerized versions of the emotional Stroop task were presented randomly on a Toshiba laptop on a 12-in. × 9-in. screen. Word stimuli were presented using Version 2.0 of E-Prime software (Psychology Software Tools, Inc.). The color-identification response latencies for each trial were detected and recorded by a five-button serial response box (Psychology Software Tools, Inc.; Model 200A) with four task-specific colored buttons identified (green, red, blue, and white). Button-press response recordings (as opposed to voice activated) were used because of the noisy environments the research was often carried out in (i.e., prisons, group treatment settings) and to maintain consistency across the groups. Two emotional Stroop word stimulus sets were used (see appendix): (a) Smith and Waterman (2004) word stimulus set and (b) a set of newly derived word stimuli that were intended to reflect deviant sexual interests of sexual abusers.
Word Development
Price and Hanson (2007) derived a new set of word stimuli to be used to measure deviant sexual interests. This was necessary as it was assumed that the sexual words presented in Smith and Waterman’s (2004) study were of a general sexual nature and consequently not refined enough to distinguish between subtypes of sexual abusers. In testing their set of newly derived stimuli, Price and Hanson found no significant differences between the word types. The authors proposed that the stimuli they had developed may not have reflected the motives, thoughts, or feelings experienced by sexual offenders because the words were not empirically derived. Therefore, the aim of empirically deriving a new set of word stimuli was to divert away from general sexual terms used in previous research and to ensure that the words used as stimuli were a true reflection of deviant sexual interest preferences of sexual abusers.
Two groups of workers from local charities working with sexual abusers (n = 18) and two groups of sexual abusers (n = 27) aided in the development of a new word stimulus set for the purpose of this study. The newly derived word stimuli were created from the responses of these groups to Question 6a of the Relapse Prevention Questionnaire (Beckett, Fisher, Mann, & Thornton, 1997): “How would you describe who would be most at risk from you?” It was thought that deriving the word stimuli from offenders’ accounts of what they find sexually stimulating or attractive would result in a purer reflection of deviant sexual interests. Input was requested of workers because of their experience and daily interactions with sexual abusers.
After word stimuli were generated from these four sources, they were arranged in alphabetical order and given to five PhD students to be categorized into separate categories: emotional-personality descriptors (EPD), sexual actions (SA), and physical descriptors (PD). Responses to this categorization task were then reviewed, and words for which three or fewer raters agreed on the word category were removed. Control words were derived from the MRC Psycholinguistic Database, and all words were matched for word frequency, word length, and word type (i.e., adjectives, verbs). The final version of the newly derived word stimulus set can be seen in the appendix.
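The agreement rule described above can be illustrated with a short sketch. It is an illustration under stated assumptions rather than the procedure actually used: the function name `retain_agreed_words`, the placeholder words, and the example ratings are hypothetical, and the most frequently chosen category is taken to be the category on which raters agreed. The threshold of four follows from the text, since words on which three or fewer of the five raters agreed were removed.

```python
from collections import Counter


def retain_agreed_words(ratings, min_agreement=4):
    """Keep words whose most common category was chosen by at least
    `min_agreement` raters; return a {word: agreed_category} mapping."""
    retained = {}
    for word, labels in ratings.items():
        # most_common(1) gives the category with the highest count
        category, count = Counter(labels).most_common(1)[0]
        if count >= min_agreement:
            retained[word] = category
    return retained


# Hypothetical ratings from five raters (placeholder words, not study stimuli).
example_ratings = {
    "word_a": ["EPD", "EPD", "EPD", "EPD", "PD"],  # 4 of 5 agree: kept
    "word_b": ["SA", "SA", "SA", "PD", "EPD"],     # 3 of 5 agree: removed
    "word_c": ["PD", "PD", "PD", "PD", "PD"],      # 5 of 5 agree: kept
}

print(retain_agreed_words(example_ratings))
```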
Analysis
Emotional Stroop bias scores were calculated by subtracting the mean RT of neutral or matched words from the mean RT of target words (i.e., words with emotional content). This resulted in five Stroop bias scores to be compared across the groups for the Smith and Waterman (2004) word stimulus set (i.e., positive, negative, color, aggression, and sexual) and three Stroop bias scores for the newly derived set of stimuli (i.e., EPD, SA, and PD). For the new word set, overall differences between the experimental words and matched words were also explored, resulting in an additional emotional Stroop bias score for the experimental words (EXP). Positive bias scores would indicate that participants were slower to color-name words with emotional content (i.e., interference in information processing was caused by the added emotional content of the experimental words; Larsen et al., 2006; MacLeod & MacDonald, 2000; Mutter et al., 2005; Wentura et al., 2000).
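The bias score calculation can be expressed as a minimal sketch. The function name `stroop_bias` and the RT values below are hypothetical and are not taken from the study data; only the subtraction itself (mean target RT minus mean neutral or matched RT) comes from the description above.

```python
from statistics import mean


def stroop_bias(target_rts_ms, neutral_rts_ms):
    """Mean RT for target (emotional) words minus mean RT for neutral or
    matched words, in milliseconds. Positive values indicate slower
    color-naming of the target words (interference)."""
    return mean(target_rts_ms) - mean(neutral_rts_ms)


# Hypothetical RTs in milliseconds for one participant (illustrative only).
sexual_word_rts = [700, 720, 740]
neutral_word_rts = [650, 660, 670]

bias = stroop_bias(sexual_word_rts, neutral_word_rts)
print(bias)  # 60
```

A positive value such as this one would be read as interference from the target word category; a value near zero would indicate no difference from the neutral words.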
RT and Stroop bias scores were explored for outliers using histograms and boxplots. When outliers were identified, the participant’s full set of data was removed from the data set, if the removal from the data set could be justified. After outliers were removed, if the data still did not meet the assumptions of parametric testing, attempts were made to transform the mean RTs and Stroop bias scores using the log and inverse functions (Miller, 1991, as cited in Gress & Laws, 2009) to correct problems with the distribution of the data across word categories. Difficulties with the assumptions of parametric tests are not uncommon when working with RT data. The data often tend to be skewed (Gress & Laws, 2009), but extreme values may be representative of the research question being examined (Miller, 1991). For example, if we expect that sexual abusers will take longer to process deviant sexual information because it is more salient to that particular group, one would not want to treat extreme values in the sexual word category for this group as outliers.
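As a rough illustration of the two transformations mentioned above, the following sketch applies log and inverse functions to a small set of RTs and compares skewness before and after. The RT values, the function names, the use of the natural logarithm, and the population skewness formula are all assumptions made for illustration; the source does not specify the log base or how skew was assessed beyond histograms and boxplots.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: mean cubed z-score (positive = long right tail)."""
    m = mean(values)
    s = pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


def log_transform(rts_ms):
    """Natural-log transform; compresses long right tails."""
    return [math.log(rt) for rt in rts_ms]


def inverse_transform(rts_ms):
    """Reciprocal transform (1 / RT); reverses the ordering of values."""
    return [1.0 / rt for rt in rts_ms]


# Hypothetical RTs in milliseconds with one slow response (illustrative only).
rts_ms = [520.0, 540.0, 560.0, 600.0, 650.0, 1400.0]

print(skewness(rts_ms))
print(skewness(log_transform(rts_ms)))
print(skewness(inverse_transform(rts_ms)))
```

In this toy example the log transform reduces the positive skew but does not remove it, which is consistent with the point that a transformation may still leave the data short of parametric assumptions.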
All available data, including variables such as age, BACS subscales, tests of executive function, vocabulary, number of errors made on the Stroop tasks, mean Stroop RTs, and emotional Stroop bias scores, were explored to examine whether the assumptions of parametric testing were met. After the exploration of the data was complete, all mean RTs and emotional Stroop bias scores were entered into separate one-way ANOVAs (or the nonparametric equivalent, the Kruskal-Wallis test, if the data did not meet the assumptions of parametric testing) with participant group as the between-groups variable. Post hoc analyses were conducted using a Bonferroni correction to control for the Type I error rate (or Mann-Whitney with a Bonferroni correction). However, Gabriel’s post hoc procedure was also conducted because the sample sizes were unequal and this test has greater power (Field, 2005).
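For readers unfamiliar with the nonparametric test named above, the Kruskal-Wallis H statistic can be sketched in a few lines. This is a textbook-style implementation with the standard tie correction, written for illustration only; it is not the software routine used in the study, and the function names and example values are hypothetical. H is evaluated against a chi-square distribution with k - 1 degrees of freedom, which is why three groups give the H(2) reported in the Results.

```python
from collections import Counter


def average_ranks(values):
    """Map each distinct value to its average 1-based rank, so ties share a rank."""
    positions = {}
    for index, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(index)
    return {value: sum(idx) / len(idx) for value, idx in positions.items()}


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with tie correction for a list of groups of scores."""
    pooled = [v for group in groups for v in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    # Sum over groups of (rank sum squared / group size)
    rank_term = sum(sum(ranks[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N) over groups of tied values
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_sum / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("All values are identical; H is undefined.")
    return h / correction


# Hypothetical scores for three groups (illustrative only).
h_example = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_example, 2))  # 7.2
```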
Because all of the groups completed the same Stroop tasks, it was possible to run a two-way mixed ANOVA with type of emotional Stroop bias score as the repeated-measures variable and participant group as the between-groups variable. Interaction effects were examined to determine whether the type of Stroop bias and participant group have a combined effect on the resulting Stroop bias scores.
In addition to testing for significant differences in Stroop effects between the groups, it was of interest to test whether covariates were present that might also have an influence on the results and to control for these variables within the analysis. Therefore, analysis of covariance (ANCOVA) was carried out on the dependent variables (i.e., mean RTs and Stroop bias scores) with the participant characteristic variables that have been shown to influence Stroop results (i.e., age, level of executive function, and level of vocabulary) set as covariates. ANCOVAs were also conducted on the BACS subscales of CD and EC. It is important to note that although ANCOVAs were run for each dependent variable, when the data violated the assumptions of parametric testing and nonparametric measures were necessary, the results of the ANCOVAs were interpreted with caution because there is no available nonparametric equivalent of ANCOVA (see Price, 2011, for full description of analysis).
Procedures
Participants were provided with an information sheet that outlined the procedures of the experiment and were then asked to sign consent forms agreeing to take part in the study. Participants were required to learn color mappings of the response buttons prior to the presentation of target words by completing four practice blocks (25 words per block) of neutral word stimuli to memorize the color response mappings on the serial response box. Participants then completed one of the two possible emotional Stroop tasks. All Stroop word stimuli were presented randomly, and prior to each word, a fixation x appeared at the center of the screen for 500 ms. Participants were asked to press the button corresponding to the color of ink in which the word was presented and to ignore the semantic meaning of the word. No opportunity to correct mistakes was provided because when a response was provided, the next word stimulus was prompted. Following the first emotional Stroop task, participants completed the BPVS. The second emotional Stroop task was then completed by participants, followed by the Hayling and Brixton tests of executive function, and finally, a shortened version of the BACS questionnaire. The emotional Stroop tasks were counterbalanced to reduce the likelihood of order effects. The BACS was always completed at the end of the session because of the sensitive nature of the questions and to avoid priming effects that would be possible if the questionnaire were to be filled out prior to viewing the sexual word content involved in the emotional Stroop tasks.
Results
Two participant outliers, with respect to the mean RTs and Stroop bias scores, were identified from the sample of nonoffending controls. These participants were removed from the database as they yielded extremely high scores on the BACS subscales and RTs that were two standard deviations above the mean RTs for all word categories. Therefore, the results from these two participants did not appear to be representative of the nonoffending control subgroup. Rather, their results seemed to indicate a hypersensitivity to the emotional Stroop task. Data from one nonoffending control participant were lost for the newly derived word stimulus set.
Significant differences between the groups resulted for participant age, H(2) = 60.50, p < .001; the Hayling test scores, H(2) = 9.51, p < .05; Brixton test scores, H(2) = 13.50, p < .05; and number of errors made on the Smith and Waterman (2004) stimulus set, H(2) = 13.15, p < .05 (see Table 1).
Mean Age, Scores on Study Measures, and Number of Stroop Errors by Participant Group
Note. Standard deviations shown in parentheses. BACS = Beliefs About Children Scale (Beckett, 1987); CD = Cognitive Distortions subscale; EC = Emotional Congruence With Children subscale; BPVS = British Picture Vocabulary Scale (Dunn, Dunn, Whetton, & Burley, 1997).
Post hoc analysis showed that the sexual abusers were significantly older than the offending control participants (U = 115, r = .47) and nonoffending controls (U = 3, r = .85). Sexual abusers scored significantly lower than the nonoffending controls on the Hayling test of executive function (U = 266.5, r = .37) and the Brixton test of executive function (U = 196.5, r = .46). Despite the significant differences in the tests of executive function, all participants scored within the average range or better. There were no significant differences evident between groups for scores on the subscales of the BPVS, meaning that their level of vocabulary was adequate across all groups. Finally, the sexual abusers made significantly fewer mistakes on the Smith and Waterman (2004) emotional Stroop task than did the nonoffending controls (U = 289, r = .35).
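The effect sizes reported with the Mann-Whitney tests can be related to U through one common convention, r = |Z| / sqrt(N), where Z is the normal approximation to U. The sketch below applies that convention without a tie correction. It is not confirmed that the reported values were computed this way, and the group sizes used in the example (27 sexual abusers and 36 nonoffending controls, i.e., 38 minus the two excluded outliers) are inferred from the text rather than stated for this comparison. The function name is hypothetical.

```python
import math


def mann_whitney_effect_size_r(u, n1, n2):
    """Effect size r = |Z| / sqrt(N) from a Mann-Whitney U statistic,
    using the normal approximation without a tie correction."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return abs(z) / math.sqrt(n1 + n2)


# Age comparison, sexual abusers vs. nonoffending controls: reported U = 3.
# Group sizes of 27 and 36 are an assumption inferred from the text.
r_age = mann_whitney_effect_size_r(u=3, n1=27, n2=36)
print(round(r_age, 2))  # 0.85
```

Under these assumptions the result agrees with the reported r = .85 for that comparison. Other reported values may reflect tie corrections or missing data and need not be reproduced exactly by this simplified formula.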
Smith and Waterman Stimulus Set
Results from the one-way ANOVAs, run separately for each word category with participant group as the between-groups factor, revealed significant differences between the groups for all mean RTs: neutral, F(2, 81) = 18.12, p < .001; positive, F(2, 81) = 25.85, p < .001; negative, F(2, 81) = 27.82, p < .001; color, F(2, 81) = 19.15, p < .001; aggression, F(2, 81) = 20.15, p < .001; and sexual, F(2, 81) = 26.00, p < .001. Nonparametric testing confirmed these significant differences for neutral, H(2) = 25.39, p < .001; positive, H(2) = 36.19, p < .001; negative, H(2) = 33.67, p < .001; color, H(2) = 26.47, p < .001; and sexual, H(2) = 33.25, p < .001, word categories. Multiple comparisons for the mean RTs using a Bonferroni correction showed that the sexual abusers took significantly longer to respond to the aggressive word stimuli than the nonoffending controls (p < .05). Post hoc analysis on the nonparametric data revealed that the sexual abusers took significantly longer to respond to negative word stimuli than offending controls (U = 171, r = .34) and nonoffending controls (U = 94, r = .69). Additionally, the sexual abusers took significantly longer to respond than nonoffending controls to neutral (U = 149, r = .59), positive (U = 89, r = .69), color (U = 133, r = .62), and sexual (U = 94, r = .69) word stimuli. Table 2 displays the mean RTs and emotional Stroop bias scores for each adult group from this stimulus set. Although medians are typically reported when nonparametric testing is conducted, the means have been reported in this case to remain consistent with the reporting in the literature and to display the magnitude of variability in the data.
Mean Reaction Time (RT) and Emotional Stroop Bias Scores in Milliseconds for the Smith and Waterman (2004) Word Stimulus Set
Note. Standard deviations shown in parentheses.
Although the groups displayed significant differences in mean RTs, differences in Stroop bias scores demonstrate the word categories that are most salient for each of the groups. Significant differences between the groups were yielded for negative Stroop, F(2, 81) = 6.22, p < .05, ω = .48; positive Stroop, H(2) = 7.12, p < .05; and sexual Stroop, H(2) = 7.92, p < .05, bias scores. Multiple comparisons revealed that the sexual abusers experienced greater negative Stroop interference than did the offending controls and nonoffending controls (p < .05). In addition, sexual abusers displayed significantly larger Stroop bias scores than did the nonoffending controls for the positive (U = 304, r = .32) and sexual (U = 238, r = .32) Stroop bias scores.
Figure 1 displays the means of the main effects of the two-way mixed ANOVA that was run with type of Stroop bias as the within-subjects variable and participant group as the between-subjects variable. The effect of type of Stroop bias was nonsignificant, F(3.41, 275.94) = 1.56, ns, as was the effect of the interaction, F(6.81, 275.94) = 1.44, ns. However, the test of between-subjects effects yielded a significant main effect of participant group, F(2, 81) = 4.51, p < .05, indicating that the participant groups elicit different Stroop bias scores regardless of the specific type of Stroop bias. Bonferroni-corrected post hoc comparisons show that Stroop bias scores between adult sexual abusers and adult offending controls did not significantly differ (p = .412); however, Stroop bias scores did significantly differ between the adult sexual abusers and nonoffending adult controls (p < .025).

Means for the Main Effects of Type of Stroop Bias and Participant Group: Smith and Waterman Stimuli
ANCOVA revealed that the covariate, BACS cognitive distortions, was significantly related to aggression Stroop, F(1, 77) = 5.05, r = .25. The effect of participant group on aggression Stroop bias scores remained nonsignificant. BACS (CD) also displayed a significant relationship with positive Stroop bias scores, F(1, 77) = 4.01, p < .05, r = .22, and the significant effect of participant group on positive Stroop bias scores remained significant after controlling for the effect of BACS (CD), F(2, 81) = 3.35, p < .05. Finally, the results from the ANCOVA from this word stimulus set did not yield any significant relationships between the Stroop bias effects and the covariates age, BACS (EC), Hayling and Brixton tests of executive function, and BPVS raw scores.
Newly Derived Stroop Stimulus Set
Analysis of the mean RTs yielded significant differences between the groups for the following word categories: EPD, H(2) = 31.61, p < .05; matched/control EPD, H(2) = 24.39, p < .05; SA, H(2) = 24.17, p < .05; matched/control SA, H(2) = 25.47, p < .05; PD, H(2) = 28.45, p < .05; matched/control PD, H(2) = 31.45, p < .05; EXP, H(2) = 30.33, p < .05; and matched/control EXP, H(2) = 29.49, p < .05. Table 3 provides the mean RTs and Stroop bias scores per group.
Mean Response Time (RT) and Emotional Stroop Bias Scores in Milliseconds for the Newly Derived Word Stimulus Set
Note. Standard deviations shown in parentheses. EPD = emotional-personality descriptors; SA = sexual actions; PD = physical descriptors; M represents matched categories.
Post hoc analysis revealed that the sexual abusers had slower mean RTs on the EPD and matched/control PD word categories than the offending controls (U = 172, r = .33; U = 174, r = .33) and nonoffending controls (U = 109, r = .66; U = 149, r = .58). Additionally, sexual abusers had significantly slower mean RTs than the nonoffending controls for the SA (U = 156, r = .57), MSA (U = 152, r = .58), PD (U = 119.5, r = .64), EXP (U = 117, r = .64), and matched/control EXP (U = 114, r = .64) mean RTs.
Analysis of the emotional Stroop bias scores revealed a significant difference between the groups for the EPD Stroop bias score, H(2) = 7.60, p < .05, with the sexual abusers exhibiting significantly larger EPD Stroop effects than the nonoffending controls (U = 310, r = .29). There were no other significant effects for the emotional Stroop bias scores. The two-way mixed ANOVA that was run with type of Stroop bias (i.e., EXP, EPD, SA, and PD) as the within-subjects variable and participant group as the between-subjects variable did not reveal any significant main effects.
ANCOVA revealed that the covariate age was significantly related to the EXP Stroop bias score, F(1, 77) = 4.193, p < .05, r = .22; however, the effect of participant group on EXP Stroop bias score remained nonsignificant when the effect of age was controlled for. Finally, analysis of covariance showed that the covariates BACS (CD), BACS (EC), Hayling and Brixton tests of executive function, and BPVS raw scores were not significantly related to the Stroop bias effects from the newly derived word stimulus set.
Discussion
This study examined differences in Stroop response patterns to two stimulus sets containing sexual, aggressive, and sexual interest–themed words. Generally, the sexual abusers took significantly longer to respond to the task than the two comparison groups. Additionally, significant differences for the positive Stroop, sexual Stroop, and Stroop bias for EPD of potential victims were evident between sexual abusers and nonoffending control participants.
The disorganized nature of the data makes it difficult to interpret the results of the Stroop task. Details regarding the violations of the assumptions of parametric testing were provided to display the unsystematic features of RT data and to aid in the interpretation of the analyses. However, it is not unusual to have skewed data, extreme values, or numerous violations of the assumptions of parametric testing when dealing with RT data (Gress & Laws, 2009; Miller, 1991). These difficulties with the data are expected because the results are dependent on the salience of the stimuli to different groups of participants. Additionally, extreme values are often anticipated due to the research question (Gress & Laws, 2009).
Regardless of the difficulties in the data, significant differences were evident between the groups on age and level of executive functioning. However, when the participant characteristics were taken into account in the analysis, they did not appear to have a significant effect on the outcome of Stroop results. This finding demonstrates that although significant differences in participant characteristics were evident between the groups, the differences did not appear to be contributing to the significant differences in Stroop interference evident between the groups.
The results of the subscales from the self-report BACS questionnaire did not reflect the anticipated results. One would expect the sexual abusers to display the highest scores on the scale. Instead, the nonoffending control participants scored similarly to the other groups on CD and highest on EC. Oddly, the covariate BACS (CD) displayed a relationship with aggression and positive Stroop bias scores. The results of the ANCOVAs should be interpreted with caution because of the nonparametric nature of the data. Moreover, when the influence of BACS (CD) was controlled for, the significant results from the between-groups analysis for Stroop bias scores remained the same. Had these measures worked as they were intended, the relationships of the subscales to Stroop bias effects would have been considered more authentic.
Discussion of Stroop Findings
The emotional Stroop results from this study are consistent with previous findings from emotional Stroop task studies using offender samples. The sexual abusers in this study took significantly longer to color-name all word stimuli than the nonoffending control participants, a finding that has also been demonstrated by sexual abuser samples in Price and Hanson’s (2007) and Smith and Waterman’s (2004) studies. Patterns in Stroop bias effects were also similar to previous findings: the sexual abusers in this study yielded significantly larger sexual Stroop bias scores than nonoffending controls, and the offender controls (violent offenders) experienced greater Stroop interference for aggression-themed word stimuli than the other groups, although this latter finding was not significant. Therefore, the offender groups continue to display interference effects toward stimuli that are relevant to their offending behavior using the Smith and Waterman word stimuli, despite the change in Stroop methodology (i.e., button-press recording of responses vs. voice-activated response recordings).
Noteworthy is that the Smith and Waterman (2004) word stimuli were still not able to differentiate between offender types for the offense-related word stimuli. The results from the two-way ANOVA confirmed overall that significant differences were evident between the sexual abusers and nonoffending controls on Stroop interference. No significant interaction effect was evident for the Smith and Waterman stimulus set between type of Stroop bias and participant group, suggesting that the groups are responding in a similar way across the different word categories. This finding could support concerns regarding the nature of the word stimuli and the need to develop word stimuli that are more specific to the interests of sexual abusers.
Interestingly, the control participants in this study generated negative Stroop bias scores for the emotional word categories (i.e., positive, negative, aggression, and sexual), meaning that they took longer to respond to neutral word stimuli than to words with emotional content. This result would indicate that the target words from the Smith and Waterman (2004) stimulus set did not produce interference effects for the nonoffending control participants in this study. Importantly, the nonoffending controls did experience Stroop interference for the incongruent color-word stimuli, confirming that they respond in a manner that is consistent with responses we would expect on the basis of literature from the traditional color-word Stroop task (Stroop, 1935).
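To make the sign convention concrete, the following is a minimal sketch of how a Stroop bias score of this kind can be computed, assuming the score is the mean RT for a target word category minus the mean RT for neutral words (so that a negative score means slower responses to neutral words). The function name, the use of untrimmed means, and the RT values are illustrative assumptions and are not taken from the study's data or scoring procedure.

```python
from statistics import mean


def stroop_bias_scores(rts_by_category, neutral_key="neutral"):
    """Return, per target category, mean RT minus the neutral-word mean RT.

    Positive values indicate interference (slower on target words);
    negative values indicate slower responses to neutral words.
    """
    neutral_mean = mean(rts_by_category[neutral_key])
    return {
        category: mean(rts) - neutral_mean
        for category, rts in rts_by_category.items()
        if category != neutral_key
    }


# Hypothetical RTs in milliseconds, invented for illustration only.
example_rts = {
    "neutral": [700, 720, 710],
    "sexual": [690, 700, 695],
    "negative": [730, 740, 735],
}

bias = stroop_bias_scores(example_rts)
for category, score in bias.items():
    print(f"{category}: {score:+.1f} ms")
```

In this invented example the "sexual" category yields a negative bias score and the "negative" category a positive one, mirroring the two directions of effect discussed above.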
What is notable about the patterns of Stroop interference between the groups is that both offender groups experienced greater Stroop interference effects across each of the word categories compared to control participants. This finding, in combination with the negative Stroop bias scores elicited by the nonoffending controls, would suggest that there are clear differences in the way that offender groups process affective information compared to nonoffenders.
Interestingly, the sexual abusers in this study yielded greater Stroop interference for negative word stimuli than for sexual word stimuli. Mood questionnaires were not administered to participants in this study; therefore, one can only propose that their bias toward negative word stimuli is attributable to the stigma often attached to their offending behaviors or to mood factors elicited by their surroundings (i.e., prison setting).
For the newly derived word stimulus set, the sexual abusers again had significantly longer mean RTs for all word categories than did nonoffending controls. Although there was only one significant difference between sexual abusers and nonoffending controls on Stroop bias scores for this word stimulus set, the patterns in Stroop bias effects between participants across the word categories are interesting.
For example, the sexual abusers displayed a negative Stroop bias for the PD word stimuli, whereas the other two groups experienced interference effects for these types of words. Given that sexual abusers displayed interference effects for all of the word categories containing emotional content in the first Stroop task, one could infer that PDs are not salient or central to sexual interest for sexual abusers.
Furthermore, the sexual abusers experienced the greatest amount of Stroop interference for the EPD word category in this stimulus set. In comparison, these word stimuli did not have as much of an effect for offending controls and did not have an effect at all for nonoffending controls. If we reflect on the question that these word stimuli were derived from (i.e., “How would you describe who would be most at risk from you?”), it could be that the EPDs of potential victims are the more likely elements of sexual interest for sexual abusers.
Finally, the offending control participants displayed negative bias scores for the SA words, whereas the sexual abusers and nonoffending controls responded similarly to these words. Of note is the difference in the nature of the SA words in the new stimulus set compared to the sexual words from the Smith and Waterman (2004) stimulus set. For example, the new words carry fewer violent connotations and are relatively common sexual associations. Therefore, it is interesting that without the aggressive undertones in the word category, the violent offenders are not responding to these word stimuli.
Limitations and Directions for Future Research
Although the methodologies used in this study adhered to suggestions made to control for participant characteristics and experimental variables, this study does suffer some limitations. First, the heterogeneity of the sexual abuser group could have had an impact on the significance of the findings. Not only did the sexual abuser sample vary in offense type, but the length of time that had elapsed since the offenses were committed and the severity of the offenses also differed. Some participants were on community supervision, whereas others had been in prison for more than a decade. Despite efforts made to recruit a more homogeneous sample, difficulties with recruitment resulted in a more mixed sample than intended. Additional analysis of the subgroups of this sample would not have had enough statistical power to allow the results to be generalized to these subgroups; therefore, this analysis was not conducted. However, this limitation leaves room for future research with larger group sizes, which could yield more robust results that generalize to more specific subgroups of sexual abusers.
The heterogeneity of the group also raises questions about whether risk level and time spent in prison would have an effect on Stroop results. For example, would significant differences have been evident between subgroups of sexual offenders scoring in different risk categories (i.e., low, medium, high)? Information on risk level and amount of time spent in prison, or amount of time that had elapsed since the occurrence of the offense, was collected when possible. However, because criminal records and risk-level information were not accessible to the researcher, analysis of these variables was not possible, and no conclusions could be drawn regarding these questions. Future research efforts should examine whether emotional Stroop biases differ between groups when these factors are considered in the analysis.
Another possible limitation is in relation to the newly derived stimulus set. Although the words were empirically derived, the sample that these words were derived from was also heterogeneous in nature (subgroups of sexual abusers and workers). Therefore, it is possible that the words derived were not specific to a particular type of sexual abuser. It could then be valuable to develop word stimuli from the responses of subgroups of sexual abusers and compare them separately among subgroups of sexual abusers.
Although age was not considered to be a covariate, a nonoffending control group of similar age to the offenders would have been desirable and perhaps a more appropriate comparison group for the study. Additionally, had time allowed, it would have been advantageous to administer mood questionnaires.
Conclusions
The overall aims of this study were to examine differences in information processing of sexual material between groups of adult offenders and nonoffenders and to determine whether the emotional Stroop task is a reliable tool to be used in the assessment of deviant sexual interest. This study examined the differences in mean RTs and emotional Stroop bias scores between adult sexual abusers, adult offending controls, and adult nonoffending controls for the Smith and Waterman (2004) and newly derived stimulus sets. The study found that, overall, sexual abusers took significantly longer than nonoffending controls to respond to the emotional Stroop task and that the offenders in general were slow to complete the task. Adult sexual abusers displayed a response bias to the sexual word stimuli and to the emotional-personality words that were specific to their sexual interests. These results provide further evidence that the emotional Stroop task could be a useful tool for measuring sexual interest in sexual abusers, particularly when the development of appropriate word stimuli is a focus of the research.
Previous studies have found that sexual abusers display a processing bias toward sexual word stimuli (Price & Hanson, 2007; Smith & Waterman, 2004). It has been suggested that the word stimuli previously used to assess the sexual interests of sexual abusers on this task are of a general sexual nature and that more should be derived that would reflect the sexual interests of sexual abusers more exclusively (Price & Hanson, 2007; Smith, 2009). The results from this study provide further evidence to support this view, and this has important implications for the assessment of sexual interest.
Therefore, the findings in this study contribute to our understanding of the cognitive processes of sexual abusers and how they process emotional and sexual information. The emotional Stroop task is an information-processing task that measures the strength of representations associated with the words on a cognitive and affective level (Smith, 2009); therefore, the emotional Stroop task may have practical applications in work with sexual abusers.
For example, differences in information processing were present between the sexual abusers and other groups. Not only did the adult sexual abusers demonstrate a processing bias for the sexual stimuli using the Smith and Waterman (2004) word stimulus set, but they displayed significant differences in response bias to the word stimuli (i.e., EPD) that were specified to reflect the sexual interests of sexual abusers. Although the results were not significant between the sexual abusers and other offenders, the differential patterns in response bias across the word categories for the adult groups were interesting. More importantly, the clear differences in response patterns between the two word stimulus sets across the groups support the notion that the Stroop task may be able to measure more individualized sexual interest.
