Abstract
This study investigated the ability of two viewing time (VT) methodologies and a pictorial Stroop task to differentiate sexual interest in a group of nonoffending heterosexual and homosexual men. The two VT methodologies were investigated given the lack of consistency among published studies supporting this method. The results indicated that the VT methodology in which participants are asked to browse through images is slightly superior to the method in which participants are asked to rate their sexual attraction to the images. The results also indicated that the Pictorial Stroop Task did not adequately differentiate between age categories of sexual interest within the two groups. The results have implications for the methodology used in VT tasks.
Singer (1984) presents a model of the process of sexual arousal, in which he identifies three independent, yet generally sequential, components of the erotic response. The first stage, “the aesthetic response,” he proposes, is an emotional reaction to spotting an attractive face or figure. This emotional response produces an increase in attention toward the object of attraction. The second stage, “the approach response,” progresses from the first and involves a physical approach toward the object of attraction. The third stage, “the genital response,” rests on the premise that with both closer proximity and increased attention, physiological reactions will result in genital tumescence.
A number of methodologies have been used to indirectly assess sexual interest, based on the different stages of Singer’s (1984) model. These methodologies can be divided into two camps: those that rely on physiological measures of sexual arousal (e.g., penile plethysmography [PPG]; Freund, 1975) and those that rely on response latency measures reflecting information processing (Banse, Schmidt, & Clarbour, 2010). One such response latency method is the viewing time (VT) assessment measure. This method covertly measures relative VTs on a range of visual stimuli. It is likely that it is influenced by Singer’s (1984) first stage of sexual response, the aesthetic stage, in which an attractive object receives increased attention from the attracted individual. This method is based on the assumption that a person will look longer at stimuli they consider attractive than at stimuli they deem less attractive or neutral. This was first observed by Rosenzweig (1942). Later, Zamansky (1956) found that homosexual males looked longer at male nudes than female nudes, whereas the reverse was true for heterosexual males. A number of different explanations have been put forward to explain this phenomenon. For example, Freund (1990), drawing on evolutionary theory, postulates that VT may be a measure of sexual interest as it reflects the initial stage of courtship, locating and evaluating an appropriate partner.
The VT method has been shown to differentiate between groups based on their sexual interest (Abel et al., 2004; Abel, Huffman, Warberg, & Holland, 1998; Abel, Jordan, Hand, Holland, & Phipps, 2001; Banse et al., 2010; Harris, Rice, Quinsey, & Chaplin, 1996; Worling, 2006) and correlates well with self-reported ratings of sexual attraction (Glasgow, Osborne, & Croxen, 2003; Harris et al., 1996; Landolt, Lalumière, & Quinsey, 1995; Quinsey, Ketsetzis, Earls, & Karamanoukian, 1996) and phallometry (Quinsey et al., 1996). However, the method is not without its critics. Smith and Fischer (1999) were unable to demonstrate discriminant validity in a sample of adolescent sex offenders and adolescent nonoffenders using the VT aspect of the AAIP (Abel Assessment for Interest in Paraphilias). In addition, Fischer and Smith (1999) found only weak support, in terms of the measure’s psychometric properties, for its use with adults.
Information Processing Tasks
An alternative attention-based method of assessing sexual interest would be to utilize the impact increased attention has on information processing tasks (IPTs). These methods are based on the premise that the discrimination of sexual interest is a product of impaired decision making when a sexually attractive stimulus is present, compared with when a sexually unattractive or neutral stimulus is present.
The premise on which these measures are based originates from theories regarding the effects of emotion on attention and cognitive load. The cognitive resource allocation model (Kahneman, 1973) states that the more interested a person is in a stimulus, the more cognitive resources they will devote to viewing that stimulus. This leaves fewer resources available for responding to a secondary task (e.g., in a Stroop task, naming the color of a word on a screen). Thus, a person would be expected to respond faster to a secondary task while viewing a stimulus in which they have less interest. Sexual content-induced delay (Geer & Bellard, 1996; Geer & Melton, 1997) is proposed to occur when a preferred sexual stimulus occupies attentional processes, which then interfere with attentional resources available for other tasks, causing a delay in task processing. The amount of attention used by a sexual stimulus is related to the emotional saliency of the stimulus. Stimuli that have the potential of eliciting more arousal attract more attention compared with less arousing or neutral stimuli.
A number of these attention-based tasks have been developed in recent years and applied to the area of sexual interest. The choice reaction time (CRT) has been shown to differentiate groups in terms of sexual interest in nonforensic populations (Wright & Adams, 1994, 1999) and forensic populations (Mokros, Dombert, Osterheider, Zappalà, & Santtila, 2010). However, Gress (2007) found the CRT differentiated between reaction times for female stimuli in a group of adult sex offenders and juvenile non-sex offenders but not in a sample of university students. Giotakos (2006) found a combination of viewing reaction time and an incidental learning task could serve as an unobtrusive measure of males’ sexual interest, particularly that of extrafamilial child molesters. Beech et al. (2008) used a rapid serial visual presentation task and found an interference effect for child images in a group of child sex offenders. However, recently the task has failed to significantly differentiate between juvenile sex offenders and non-sex offenders (Crooks, Rostill-Brookes, Beech, & Bickley, 2009). Snowden, Wichter, and Gray (2008) compared the implicit association task (IAT) and the priming task (PT) in a group of heterosexual and homosexual men and found that both measures were good predictors of sexual orientation, although the IAT (area under a receiver operating characteristic curve [AUC] = 0.97) performed better than the PT (AUC = 0.86).
The IPT used here is the pictorial-modified Stroop task (Ó Ciardha & Gormley, 2009). The traditional modified Stroop task requires the color of a word to be correctly identified and responded to, while ignoring the semantic content of the word. In a modified Stroop, when the target word is highly emotional to the participant, the latency in responding to the color naming task is generally longer than when a neutral word is presented. The modified Stroop task has been used to investigate a wide variety of research questions, including the responses of cocaine addicts toward images related to cocaine use (Hester, Dixon, & Garavan, 2006). The task has also been applied to a variety of emotional disorders and demonstrated attentional bias, for example, to threatening information (Mathews & MacLeod, 1994; Williams, Mathews, & MacLeod, 1996) and trauma-related stimuli in posttraumatic stress disorder (PTSD) patients (Cisler et al., 2011). The Stroop task has also been applied to the forensic domain. For example, the interference of aggression stimuli has been investigated (Smith & Waterman, 2003, 2005). More specifically, an interference effect for words related to sexual offending was found among sex offenders and violent offenders (Price & Hanson, 2007; Smith & Waterman, 2004). In the present task, pictures are used rather than words as it is expected that images would show greater ecological validity than mere lexical representations. Images have been found to produce a larger interference effect than words (Hester et al., 2006; Stormark & Torkildsen, 2004). It is also suggested that images differentiate between the age categories more clearly than words (Ó Ciardha & Gormley, 2011). Ó Ciardha and Gormley (2011) found that the measure proved effective in discriminating preferred sexual interest both in a group of nonoffenders and a group of child sex offenders.
Gaither (2001) provided less support for VT or IPTs in the assessment of sexual interest. This study assessed the ability of PPG, VT (while subjectively rating images), CRT, and “reaction time to a secondary task” (RTST) to measure sexual interest. The secondary task used in this study involved pressing a keypad when a tone was presented. Results indicated that the experimental measures not only had poor predictive validity and were not highly correlated with one another but also were not highly correlated with self-reported sexual arousal or with penile responses, although the latter two measures were highly positively correlated with one another.
Although there have been a number of different methods developed for measuring VT, there has been no standardized methodology used, thus making comparisons across studies difficult. Some inconsistencies found across studies involve both the methodology used and the latency that is taken as the measure of VT or dependent variable. Abel et al. (1998) and Gray and Plaud (2005) had participants view the slides twice, first to familiarize themselves with the slides and a second time to rate their sexual arousal to each slide on a 7-point Likert-type scale. The time taken to rate each slide was taken as the VT. Gress (2005) presented each image 10 times asking a different question, pertaining to sexual interest, with a similar Likert-type rating for each presentation of the image. Harris et al. (1996) and Quinsey et al. (1996) provided the slides in a “single stratified random order” and asked participants to pay close attention to the slides, as they would be asked questions later. Participants then reviewed the same slides again and on the second viewing, rated how sexually attractive they found the person in the image, using a 10-point Likert-type scale. The inconsistency lies in which VT is taken as the dependent variable: the time taken by participants to browse through the images or the time taken to rate the attractiveness of the image. It is suggested that the differences in methodologies may result in different cognitive processes, particularly given that one task involves participants appraising images, including those of children, in terms of sexual attraction while another task merely asks participants to browse through images. Thus, one measure may be more akin to sexual interest than the other.
The Present Study
The aim of the present study was to scrutinize two different methodologies used within the VT paradigm and an IPT, the pictorial-modified Stroop task, and to subsequently investigate their ability to identify the sexual orientation of nonoffending men. Although sexual orientation is not seen as akin to deviant sexual interest and different cognitive processes may apply in the two groups, the use of a nonoffending sample is important in this context as it allows for the methodologies to be tested on a population who would be expected to be less likely to be dishonest in relation to their sexual interest compared with a forensic population. The sample allows for a comparison across adult gender categories and age categories where a clear preference would be expected. This study attempts to test a pictorial-modified Stroop task and two VT methodologies: one in which participants are asked to simply browse through stimuli while their VT to each stimulus is covertly measured, and one in which participants are asked to rate their sexual attraction to the images as they browse them. It is expected that the method in which the dependent measure of VT is taken when participants browse the images without a subjective rating (Harris et al., 1996) will emerge as the most robust measure for assessing sexual interest, as it is presumed that their cognitive load would be quite low in this task, thus producing less noise and freeing up more resources for sexual interest to emerge.
In line with previous research in the area (Freund, McKnight, Langevin, & Cibiri, 1972), it is hypothesized that longer reaction times (VTs for the VT tasks and color naming for the Stroop task) will emerge for the adult category of interest than all other categories and significant differences will emerge between the adult category of interest and the opposite gender category and the child category of interest. In addition, it is hypothesized that the heterosexual participants will show longer reaction times to adult female images than homosexual participants and the opposite pattern of results will emerge for the adult male images.
Method
Participants
Thirty-five nonoffending males participated in the study, 11 homosexual males and 24 heterosexual males. Participants were classified according to their sexual orientation based on their self-reported primary attraction to one gender. The mean age for the homosexual men was 28.64 years (SD = 9.29) and for the heterosexual men was 26.58 years (SD = 9.93). There was no significant difference found in participant age across the two groups. Ninety-one percent of the participants were currently in or had completed third-level education. Participants were recruited from around Dublin City through recruitment posters and Internet forums. Participants were paid 10 euro or offered course credits for their participation. The data from one of the heterosexual participants on the pictorial-modified Stroop task were excluded as he reported green–blue color-blindness.
Stimuli/Apparatus
A questionnaire was used in which participants were asked if they were primarily attracted to males, females, or both. Participants’ self-reported sexual orientation was not confirmed through any additional measures. Participants were also asked to state whether they were color-blind, their age, and educational achievement.
The images used for the pictorial-modified Stroop task and both VT tasks were of males and females from five age groups ranging from young child to adult. All the images consisted of a single frontal view of an individual in a bathing suit. Thirty-three of the images were taken from the Not Real People (NRP) image set by Laws and Gress (2004). The NRP image set comprises computer-modified images that are developed by compiling and morphing three or more images, plus additional modifications such as hair, eye, and body color, simple pose modifications, and clothing. Each image has been digitally clothed in a bathing suit and corresponds to a Tanner secondary sexual characteristic category (Tanner, 1973). The Tanner (1973) stages reflect the five pubertal stages of primary and secondary sexual development (e.g., size of breasts, genitalia, and development of pubic hair) rather than chronological age. Tanner Stage 1 reflects prepubescence, Stage 2 reflects early pubescence, Stage 3 reflects intermediate pubescence, Stage 4 reflects late pubescence, and Stage 5 reflects full sexual maturity. Given the limited number of adult stimuli from the NRP set, six further adult male and nine adult female images were used for the three tasks. These were developed using the same process as used for the NRP image set for use in a previous presentation of the Stroop task (Ó Ciardha & Gormley, 2009). For each sexual interest task, 6 images were used for each of the Tanner 1, 2, 3, and 4 stages (3 for each gender category), and 12 images were used for each of the adult male and adult female categories (Tanner Stage 5). All 48 images were used for each of the three sexual interest tasks. The experiment was administered on an iMac desktop computer running Mac OS X, and all the tasks were presented using SuperLab software.
Procedure
A mixed factorial design was used in this study. All participants completed eight computer tasks in total and a short questionnaire. Four IATs and a traditional Stroop task as a measure of executive function were administered along with the three sexual interest tasks which are discussed in this article. In the interests of brevity, the IAT and the traditional Stroop results are not reported here. Participants initially completed a short demographic questionnaire outlining their age, educational achievement, color-blindness, and sexual preference. They then completed all eight computerized tasks, the order of which was pseudorandomized across participants, such that the Stroop task and the two VT tasks were never presented consecutively, and they were always separated by the four IATs. In addition, Viewing Time Method 1 (VT1) always immediately preceded Viewing Time Method 2 (VT2), and the traditional Stroop task always preceded the Pictorial Stroop Task. The VT order was maintained to prevent participants from becoming aware that VT1 was a measure of sexual interest. It should be noted that as VT2 always comes after VT1, it could be susceptible to habituation and/or order effects. Potential habituation effects are minimized by the fact that, although the same images are used across both procedures, they are only used once in each version. Thus, given that there are 48 trials in each task, it is unlikely that a participant could habituate to an individual picture, although habituation to a stimulus category remains possible. Each participant was tested individually. Each VT measure took approximately 8 min, and the Pictorial Stroop Task took approximately 10 min. Along with the additional cognitive tasks, the study took approximately 45 to 50 min to complete. After informed consent was given, the participant was seated in front of a computer monitor. 
A female researcher remained in the room during the experiment but did not observe the participants when they were carrying out the tasks. The same researcher tested all participants.
Viewing Time Method 1
Participants were informed that their task was to browse through images, and they would be asked “some simple questions about the images” when they had completed the task. Participants were presented with 4 practice trials and 48 test trials. Each image was presented once in the test section of the task. The order of test trials was randomized across participants. A cue in the form of a fixation cross was presented for 500 ms prior to each trial. Each trial consisted of a morphed image, presented at a size of 14 cm × 18 cm, in the centre of the screen. Participants were instructed to move from one image to the next by pressing the “n” key on the keyboard and to browse through the images at their own pace. Following completion of the task, the participants were told that they would not be asked questions regarding the images but would instead be asked to rate the attractiveness of the images, consistent with the VT2 procedure. The dependent variable in this instance was the time between the stimulus onset and the participant’s response.
Viewing Time Method 2
Participants were presented with 48 test trials using the same stimuli as VT1. A cue in the form of a fixation cross was presented for 500 ms prior to each trial. Each trial consisted of one morphed image accompanied by the question “How sexually attractive do you find the individual in the picture on a scale of 1 (extremely sexually unattractive) to 7 (extremely sexually attractive).” This was presented in large font above each image. A range of Likert-type scale answers (1-7) were presented to the right of each image. Participants were instructed to view each slide and reply to the question by using the numbered section on the left-hand side of the keyboard, corresponding to their answer. After the participant responded, the slide was removed and the next slide presented. Each slide was accompanied by an identical question and identical Likert-type responses. The presentation order of the images was randomized across participants. The dependent variable in this instance was the time between the stimulus onset and the participant’s response (i.e., the time taken for the participant to rate the sexual attractiveness of the stimulus).
Pictorial-Modified Stroop Task
This task followed the procedure of Ó Ciardha and Gormley (2009). In each section, images were presented in four colors: red, yellow, green, or blue. The participants’ task was to respond by pressing a button on the keyboard corresponding to the correct color of the image. This task used the morphed images over which a colored filter was placed. Stimuli were grouped according to age and gender and included a control category of large cats. There were five blocks: adult male, adult female, child male, child female, and large cats. Although a neutral category of cats was used in the design, it was not used in the analysis. This category was originally included as it was expected the scores would be analyzed in relation to this neutral category. However, as it was not deemed appropriate to include a neutral category in the second VT measure, this neutral category was not included in the Stroop analysis to maintain consistency in the type of analysis carried out across all the three measures. Block order was randomized across participants, and trial order was randomized within blocks. No feedback was given regarding correct responses throughout the test. The reaction time was calculated, by the computer, as the time from the image onset to the participants’ response (pressing the colored response key). The dependent variable in this instance was the time it took participants to name the color of the filter, which was superimposed on the image.
Results
For the purposes of this study, two different types of analyses were carried out on the data. First, group analyses were carried out to determine the measures’ abilities to differentiate between the groups and stimulus categories based on participants’ sexual orientation. Second, individual analyses were carried out to determine the ability of the measures to classify participants’ sexual interest based on their individual scores.
To facilitate the first analysis type, it was important to overcome the difficulties of nonnormality inherent in reaction time data. The power of F statistics can be considerably reduced when the data are skewed and contain outliers (Wilcox, 1998), which are characteristics of reaction time data (Whelan, 2008). Each individual’s mean response time to a stimulus category was calculated in relation to their overall reaction time and their reaction time variance, that is, the block mean response time was subtracted from the overall trial response time and divided by the overall variance (overall mean and standard deviation excluded the preferred adult gender category). Outliers beyond three times the interquartile range were removed per participant prior to analysis. This transformation is similar to that applied in PPG research, and it allows for individual differences in response sets (Earls, Quinsey, & Castonguay, 1987). Transforming the data in this way consistently addressed the problem of nonnormality.
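The transformation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script. It assumes a conventional within-participant z score (category mean minus baseline mean, divided by the baseline standard deviation), which is what the "z score" label on the figures suggests, and it assumes that "three times the interquartile range" means Tukey-style fences measured from the quartiles of the participant's pooled trials. The function names and the sample data are hypothetical.

```python
from statistics import mean, stdev, quantiles


def outlier_fences(rts, k=3.0):
    """Return (low, high) fences lying k * IQR beyond the first and third quartiles."""
    q1, _, q3 = quantiles(rts, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def category_z_scores(trials, preferred, k=3.0):
    """Standardize one participant's category means.

    trials:    dict mapping stimulus category -> list of response times (ms)
    preferred: the participant's preferred adult category, which is left out
               of the baseline mean and standard deviation (as in the text)
    """
    # Assumption: outliers are identified on the participant's pooled trials.
    pooled = [rt for rts in trials.values() for rt in rts]
    low, high = outlier_fences(pooled, k)
    kept = {cat: [rt for rt in rts if low <= rt <= high]
            for cat, rts in trials.items()}

    # Baseline excludes the preferred adult category.
    baseline = [rt for cat, rts in kept.items() if cat != preferred for rt in rts]
    base_mean, base_sd = mean(baseline), stdev(baseline)

    return {cat: (mean(rts) - base_mean) / base_sd for cat, rts in kept.items()}


# Hypothetical data for one heterosexual participant (values in ms).
example = {
    "adult_female": [1200, 1300, 1250, 1350],
    "adult_male": [800, 820, 790, 810],
    "child": [780, 800, 790, 9000],  # 9000 ms is an implausible outlier
}
z = category_z_scores(example, preferred="adult_female")
```

Under these assumptions, a positive score means the participant responded more slowly to that category than to their own baseline, which makes scores comparable across participants who differ in overall speed.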
In line with the hypothesis that there would be longer reaction times for the adult age categories of interest for each orientation group, the following group analyses were carried out for each measure:
A three-way interaction was hypothesized to emerge between the sexual orientation of the participant, age of stimulus, and gender of stimulus.
Further two-way age-by-gender interactions were hypothesized to emerge and in the opposite direction for the heterosexual and homosexual groups.
Differences between the adult male and adult female categories for both the heterosexual and homosexual group were also expected. Previous research has shown, using phallometric assessment, gender differences for the adolescent and child categories (Freund & Costell, 1970). However, there is a lack of replication of this type of research with nonoffending men, particularly in relation to attention-based methods. Thus, given the dearth of research in the area, these differences were assessed for all three attention-based measures although no specific pattern of results was expected.
It was hypothesized that there would be a difference found in reaction times to the adult and child category of interest, and this age difference would not emerge for the opposite gender category.
Finally, analysis involved the assessment of between-group differences for the adult and adolescent categories. This was carried out to show the measures’ ability to discriminate between groups in terms of their sexual orientation.
Group Analyses
Group analysis involved conducting a 2 × 3 × 2 factorial ANOVA on the mean reaction time (VT for each VT measure and time taken to name the color of the filter for the Stroop task) for each stimulus category across each of the three sexual interest measures. The between-group factor was the sexual orientation of the participant (heterosexual and homosexual). The within-group factors were the gender of the stimulus (male and female) and the age of the stimulus (child, adolescent, and adult). The between-group, age, and gender effects were analyzed for each method irrespective of whether significant interactions emerged as it allowed for consistent analysis across the three measures. Even if no interaction was found, it would be of interest which categories the measures discriminated between. Bonferroni corrections were applied to decrease the likelihood of type I error. The adjusted alpha level was thus set to .008. Tables 1 and 2 present the results of the post hoc analyses for each method. Table 3 presents the means and standard deviations for each stimulus category.
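The adjusted alpha of .008 is consistent with dividing a conventional .05 level across six comparisons. The text does not state the number of comparisons, so the value of six below is an inference rather than a reported figure, and the helper names are hypothetical.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha under a Bonferroni correction."""
    return alpha / n_comparisons


def is_significant(p_value, alpha=0.05, n_comparisons=6):
    """True if p_value falls below the Bonferroni-adjusted threshold."""
    return p_value < bonferroni_alpha(alpha, n_comparisons)


# Assumption: six post hoc comparisons per measure.
adjusted = bonferroni_alpha(0.05, 6)  # roughly 0.0083
```

With this threshold, a post hoc result of p = .004 would be retained, whereas p = .02 would not.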
Table 1. Hypotheses and Effect Sizes of Three-Way Interaction and Between-Group Differences for Each Sexual Interest Method
Note: ns = nonsignificant result; b/g = between-group differences; ηp2 = partial eta square; d = Cohen’s d effect size; t = t-test value; T4 = Tanner 4; VT1 = Viewing Time Method 1; VT2 = Viewing Time Method 2.
Result found is contrary to the hypothesis.
Table 2. Hypotheses and Effect Sizes of Two-Way Interactions and Within-Group Differences for Each Sexual Interest Method
Note: ns = nonsignificant result; T4 = Tanner 4; t = t-test value. ηp2 = partial eta square; d = Cohen’s d effect size; VT1 = Viewing Time Method 1; VT2 = Viewing Time Method 2.
Result found is contrary to the hypothesis.
Differences found in the direction consistent with sexual orientation.
Age differences represent the difference between the adult and the child category.
Table 3. Mean and Standard Deviation of Reaction Times per Stimulus Category
Note: Standard deviation is presented in parentheses after the mean. T4 female = Tanner 4 female; T4 male = Tanner 4 male; VT1 = Viewing Time Method 1; VT2 = Viewing Time Method 2.
Viewing Time Method 1
A significant three-way interaction emerged in participants’ VTs between stimulus age, stimulus gender, and group sexual orientation for VT1. Further analysis identified a significant age by gender interaction for both the heterosexual group and the homosexual group, as depicted in Figure 1a and Figure 1b, respectively.

Figure 1. Mean reaction time z score of each stimulus category for Viewing Time Method 1
For the heterosexual participants, the impact of gender was only significant at the adult level of age, and the impact of age was only significant for the female images. For the homosexual participants, the impact of gender was significant at the adult and adolescent levels of age, with no significant gender effect for the child images, whereas the impact of age was significant for the male images. All of these effects were found in the expected directions. Between-group differences indicated that the heterosexual men viewed adult female images significantly longer than did homosexual participants. In addition, the homosexual men viewed adult male images significantly longer than did the heterosexual men. Homosexual men also had significantly longer VTs to the adolescent male images than did heterosexual participants.
Viewing Time Method 2
A significant three-way interaction was found between stimulus age, stimulus gender, and group sexual orientation for VT2. A significant age by gender interaction was found for the heterosexual group as depicted in Figure 2a. There was no significant age by gender interaction found for the homosexual group.

Figure 2. Mean reaction time z score of each stimulus category for Viewing Time Method 2
For the heterosexual participants, the impact of gender was significant at the adult and adolescent level of age whereas the impact of age was significant for both male and female images. For the homosexual participants, there was no significant impact of gender at any age category, whereas the impact of age was significant for the male images. The homosexual group viewed adult male images significantly longer than did the heterosexual group. It was also found that heterosexual men viewed the adolescent female images significantly longer than the homosexual participants.
Analysis of the raw means indicated that there was no significant difference between the overall reaction times for VT1 and VT2.
Pictorial-Modified Stroop Task
A significant three-way interaction emerged in participants’ reaction times between stimulus age, stimulus gender, and group sexual orientation for the Pictorial Stroop Task. Further analysis identified a significant age by gender interaction for the heterosexual group as depicted in Figure 3a. There was no significant age by gender interaction for the homosexual group.

Figure 3. Mean reaction time z score of each stimulus category for Pictorial-Modified Stroop Task
As depicted in Figure 3, for the heterosexual participants, the impact of gender was only significant at the adult level of age, with no age effects. For the homosexual group, the impact of gender was significant at the adult and child level but not significant at the adolescent level of age, with a significant age effect found for the male images. The two groups differed significantly in their reaction time to adult female images and adult male images in the expected direction.
Mean Attractiveness Ratings
Analysis of mean attractiveness indicated that, as expected, there were significant differences found between the two groups’ self-reported attraction to the adult female images, t(12.77) = 6.34, p < .001, and adult male images, t(32.27) = −10.03, p < .001. There were also significant differences found between the heterosexual and homosexual groups’ self-reported sexual attraction to adolescent female images, t(33) = 2.04, p = .05, and adolescent male images, t(33) = −3.79, p < .001. These differences were found in the expected direction. Interestingly, however, a significant difference was found between the two groups’ rated sexual attraction to their preferred adult gender, t(33) = 2.20, p < .05. Heterosexual participants reported more sexual attraction to adult female images (M = 5.39, SD = 0.69) than homosexual participants reported to adult male images (M = 4.89, SD = 0.47). There were no significant differences found in the attractiveness ratings for the preferred adolescent category.
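The fractional degrees of freedom reported for the adult comparisons (12.77 and 32.27) suggest that an unequal-variance correction such as Welch's t test was applied there, although the text does not say so. The sketch below shows how Welch's statistic and its Welch-Satterthwaite degrees of freedom are computed. The function name is hypothetical and the ratings are invented purely for illustration; they are not the study's data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error contributed by each group.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Invented 7-point ratings for two groups of unequal size.
group_a = [5, 6, 7, 6, 5]
group_b = [2, 3, 2, 3]
t_value, df_value = welch_t(group_a, group_b)
```

The resulting degrees of freedom always lie between the smaller group's n − 1 and the pooled n1 + n2 − 2, which is why a Welch test on groups of 24 and 11 can yield values such as 12.77 or 32.27 rather than 33.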
Analysis of the differences in reaction times between the two sets of adult images indicated no difference in reaction time between the NRP set and the Ó Ciardha and Gormley (2009) set for the Pictorial Stroop Task or VT2. However, participants had longer reaction times to the Ó Ciardha and Gormley set (M = 3,599, SD = 4,409) than the NRP adult set (M = 2,976, SD = 3,459) for VT1; t(34) = 3.10, p = .004. In addition, participants rated the Ó Ciardha and Gormley set (M = 4.30, SD = 0.83) as more attractive than the NRP set (M = 1.76, SD = 0.83); t(34) = 12.87, p < .001.
Individual Analysis
Individual analysis involved investigating the correlations between the participants’ response time to each image and their self-reported rating of sexual attraction to the image on a 7-point Likert-type scale, and the ability of each measure to correctly classify participants in a manner consistent with their reported orientation. First, a simple method of classification involved calculating the difference between reaction times to the preferred adult category and the opposite adult category. As this method was seen as minimal (i.e., even a small difference of 1 ms between the adult male and adult female reaction times would result in a correct classification), further analysis involved using a specific cutoff for correct classification. This required the participant’s response time for their preferred category to fall beyond 0.5 times the participant’s standard deviation from their overall mean. The AUC analysis was also carried out on the difference scores. All analyses were carried out on the response time data with outliers beyond three times the interquartile range removed. Results across the three empirical measures are presented in Table 4.
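The two classification rules and the outlier screen described above can be sketched as follows. This is an illustrative sketch only, not the authors' analysis script. The function names, the category labels, and the reaction times are hypothetical. It also assumes that the outlier fences sit three interquartile ranges outside the first and third quartiles, that the standard deviation is the sample standard deviation over all of a participant's trials, and that "beyond" means above the overall mean.

```python
from statistics import mean, quantiles, stdev


def remove_outliers(rts, k=3.0):
    """Drop RTs more than k * IQR outside the quartiles (assumed fence rule)."""
    q1, _, q3 = quantiles(rts, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [rt for rt in rts if low <= rt <= high]


def classify(rts_by_category, preferred, other, sd_factor=0.5):
    """Return (difference-score result, cutoff result) for one participant."""
    all_rts = [rt for rts in rts_by_category.values() for rt in rts]
    overall_mean = mean(all_rts)
    overall_sd = stdev(all_rts)  # sample SD over all trials (assumption)
    pref_mean = mean(rts_by_category[preferred])
    other_mean = mean(rts_by_category[other])
    # Rule 1: any positive difference counts as a correct classification.
    diff_correct = (pref_mean - other_mean) > 0
    # Rule 2: preferred-category RT must exceed the overall mean by 0.5 SD.
    cutoff_correct = pref_mean >= overall_mean + sd_factor * overall_sd
    return diff_correct, cutoff_correct


# Hypothetical participant with a clear preference (RTs in ms).
clear = {
    "adult_female": [3000, 3200, 3400, 3100],
    "adult_male": [2000, 2100, 1900, 2200],
    "child_female": [1800, 1700, 1900, 2000],
    "child_male": [1800, 1900, 1700, 2000],
}
# Hypothetical participant whose preferred category is only 1 ms slower.
marginal = {
    "adult_female": [2500, 1502],
    "adult_male": [2500, 1500],
    "child_female": [2500, 1500],
    "child_male": [2500, 1500],
}

print(classify(clear, "adult_female", "adult_male"))     # (True, True)
print(classify(marginal, "adult_female", "adult_male"))  # (True, False)
```

The second participant shows why the difference score was regarded as minimal: a 1 ms difference passes the first rule but not the cutoff.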
Percentages of Correct Classification, ROC Analysis, and Correlations Between Reaction Time and Self-Report for Each Sexual Interest Measure
Note: Diff score = Difference between RT to adult category of interest and RT to the other adult gender category, which must be positive to be deemed a correct classification; 0.5 SD cutoff = RT to adult category of interest must fall at least 0.5 SD beyond the participant’s overall mean to be deemed a correct classification; Het = heterosexual participants; Hom = homosexual participants. Sensitivity involves the homosexual orientation as the condition that was correctly identified. Specificity involves the heterosexual orientation as the condition that was correctly identified. AUC refers to the area under the receiver operating characteristic (ROC) curve.
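As an illustration of how the AUC, sensitivity, and specificity values relate to the difference scores, the sketch below computes the AUC as the proportion of homosexual-heterosexual participant pairs in which the homosexual participant has the higher score, with ties counted as half. It assumes that each participant's score is the adult male RT minus the adult female RT and that a score above zero is classified as homosexual. The function names and all scores are hypothetical and are not taken from the study data.

```python
def auc(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity_specificity(pos_scores, neg_scores, threshold=0.0):
    """Sensitivity and specificity for a fixed threshold on the score."""
    true_pos = sum(s > threshold for s in pos_scores)
    true_neg = sum(s <= threshold for s in neg_scores)
    return true_pos / len(pos_scores), true_neg / len(neg_scores)


# Hypothetical difference scores in ms (adult male RT minus adult female RT).
hom = [400, 250, -50, 600]    # positive condition
het = [-300, -100, 100, -500]  # negative condition

print(auc(hom, het))                      # 0.9375
print(sensitivity_specificity(hom, het))  # (0.75, 0.75)
```

Because the AUC depends only on how the two groups' scores are ranked relative to each other, it can be high even when many individual scores sit close to the classification threshold.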
Viewing Time Method 1
VT1 correctly classified 28 of the 35 participants using the difference score classification method and 25 participants using the threshold of 0.5 times the standard deviation distance from the overall mean. The method also showed an excellent AUC and good levels of correlation with self-report.
Viewing Time Method 2
Using the difference score, 25 of the 35 participants were correctly classified. This dropped to 22 when using the cutoff value. The method resulted in a fair AUC and a high number of correlations with self-report.
Pictorial-Modified Stroop Task
Thirty participants were correctly classified using the difference score. However, this dropped to eight when using the threshold of 0.5 times the standard deviation away from the overall mean. Poor correlations were found with self-report. However, the method resulted in an excellent AUC.
Discussion
The results indicate that VT1 produced the results most consistent with expectations based on the idea that sexual interest in a stimulus influences attentional mechanisms. This emerged both in terms of stimulus category differentiation and individual classification.
VT2 produced less consistent results. The results for the heterosexual group were generally in line with the hypotheses, with the exception of the male age effects. An age by gender interaction failed to emerge for the homosexual group and the group was poorly classified, particularly using a cutoff criterion. Similar inconsistencies in results were also found by Smith and Fischer (1999) using a similar methodology. The measure did, however, result in a high number of correlations with self-reported sexual attraction. It is suggested that some of these inconsistent results, particularly for the homosexual participants, may have resulted from methodological limitations, including the attractiveness of the stimuli used and the order of presentation of both VT tasks, which will be outlined later.
An interesting result found for both VT methodologies was the size and similarity of the effect sizes for the age effects in both groups. This consistency may indicate that the age effect is the most stable effect when investigating the ability of these measures to assess sexual interest, whereas other factors, such as a participant’s comparison of self with other adult males, may affect results when investigating adult gender differences. This is a significant observation given the importance of these measures in assessing age effects when applied to the forensic setting.
The pictorial-modified Stroop task also resulted in some ambiguous patterns. Again, an age by gender interaction failed to emerge for the homosexual group and the heterosexual group failed to show an age effect for the female stimuli. However, the measure showed excellent classification ability in terms of AUC analysis and classification based on a difference score. The classification ability dropped considerably when a cutoff criterion was used. The inconsistency of these results may be due to the inclusion of child images in the analyses. The AUC results and between-group differences are based on adult images only and indicate that the method shows good discriminatory ability when looking at sexual orientation alone. However, when other younger age categories are included, in terms of classification using a cutoff criterion or the age by gender interaction, this ability reduced significantly. This would appear to indicate that additional cognitive processes may be present when responding to younger aged stimuli. Thus, the measure may be good at identifying gender preference but poor at identifying age preference. The results relating to the adult gender categories are similar to those of Ó Ciardha and Gormley (2011).
Overall, VT1 emerged as the only measure to consistently show adequate group effects, correlations with self-report and individual classification. It was found to predict group membership well (AUC = 0.92) and falls within a similar range to other attention-based measures, although with lower levels of specificity than previous studies (e.g., Snowden, Wichter, & Gray, 2008). However, a number of methodological issues were identified and results should be interpreted in light of these. First, the heterosexual group rated the adult female images as more sexually attractive than the homosexual group rated the attractiveness of the adult male images. Although this may have affected participant responses in terms of adult orientation, it is less likely to have had an effect on the age effects given the finding that for both VT tasks, very similar effect sizes were found for age effects for both orientation groups. Furthermore, a mixture of two sets of adult stimuli was used. Although there was no difference in reaction times to the two sets for VT2 and the Stroop task, a significant difference was found in the rating of attractiveness and in reaction times for VT1 across the two sets of stimuli. While this may have affected the results found, it was only seen to have an impact on VT1, which resulted in the most consistent results.
The order of presentation of the tasks was another concern. VT1 always preceded VT2. This order was used to ensure that participants were not aware that VT1 was a measure of sexual interest to minimize any social desirability effects. Nonetheless, it remains possible that habituation or order effects could have negated the potential for VT2 to differentiate sexual interest in a manner similar to VT1. However, the two measures did not differ from each other in terms of overall reaction time, which may suggest that the demands of both tasks were similar. In addition, as each stimulus was presented only once for VT1 and once for VT2, it was expected that this would minimize habituation effects for VT2. Habituation may also have affected responses on the Stroop task, despite the precaution of randomizing the presentation of the VT tasks and the Stroop task and separating their presentation with the four additional IAT tasks. Furthermore, as the study involved a number of other cognitive tasks, this may have led to fatigue effects on the part of the participants. Nonetheless, given the randomization of presentation of the two VT tasks and the Stroop task, these confounds would have had a similar effect on VT1 which, despite the potential noise caused by these effects, still resulted in clear and consistent findings. Given that some strong effects were still found across measures, this would suggest that systematic variation strong enough to produce effects was still in evidence.
The pictorial Stroop task used a category of large cats as a control condition. Given the unusual nature of this category, this may have affected participants’ responding to the task. The category was included as the procedure followed that of Ó Ciardha and Gormley (2009), in which the category was used as a baseline measure. However, it was not deemed appropriate to include a neutral category for the VT measures, particularly VT2, as asking participants to rate their sexual attraction to large cats would have been seen as unusual by the participants. As it was deemed important to analyze the measures in a consistent manner, the large cats category was not included in the final analysis.
A small sample size was used in the current study, and results should be interpreted in light of this. In addition, sexual orientation is different in nature to sexual deviancy and different processes of sexual interest may apply to sexual deviancy than to gender-based sexual orientation. Nonetheless, this sample was chosen as it was expected that participants would be unlikely to lie and more likely to show a clear sexual preference, thus reducing the amount of noise and allowing for a clearer comparison of the methodologies. Volunteer bias may also have influenced results. Research has indicated that volunteers for sexuality research may differ significantly from nonvolunteers. Volunteers for sexuality research have been found to have higher levels of sensation seeking and lower levels of social conformity (Bogaert, 1996), have lower levels of sexual guilt (Strassberg & Lowe, 1995), and have a wider range of sexual experiences (Plaud, Gaither, Hegstad, Rowan, & Devitt, 1999) than nonvolunteers. In addition, research indicates that a minority of men report some sexual interest in children. For example, Ahlers et al. (2011) reported pedophilic sexual arousal patterns of 9.5% and 3.8% in sexual fantasies and real-life sociosexual development, respectively, in a community sample of men. As with all sexuality research, results should be interpreted in light of this.
Furthermore, the participants in the present study indicated a clear sexual attraction to one gender. However, research on sex offending has shown that a large proportion of male child sex offenders do not show an exclusive sexual interest in children and often show an equal or larger sexual interest in adult females (Barsetti, Earls, Lalumière, & Bélanger, 1998; Lang, Black, Frenzel, & Checkley, 1988). In addition, this group of participants would not be expected to deny their sexual interest; however, the transparent nature of the VT measure may be less applicable in a forensic setting where dissimulation may be high. Thus, an IPT, in which the object of the task is less transparent, may be more suitable. Finally, a female researcher carried out the study, and although the researcher did not observe the participants during the tasks, future studies may benefit from matching the researcher to the nonpreferred gender of the participant.
The adolescent categories caused some difficulty when interpreting the results. Both VT tasks showed between- and within-group effects for the adolescent images. The adolescent category incorporated images of boys and girls from Tanner Stage 4 (average 13.1 years; Tanner, 1973). However, in the present study, the Tanner 4 categories included only three stimuli for each gender, thus reducing the variability of the category. Furthermore, the inconsistency of results from VT2 may be due to difficulties in assigning the stimuli to their appropriate categories of biological maturity. This is currently a controversial topic given recent debate regarding the inclusion of hebephilia in the upcoming DSM-V. Hebephilia is proposed to denote the erotic preference for pubescent children (roughly 11 or 12 to 14 years; Blanchard et al., 2009). Despite this debate, however, empirical research on the sexual interest of nonoffending men in pubescent males and females is considerably lacking. Although there are studies showing heterosexual male preference for younger women and homosexual preference for younger men, these studies tend to only include stimuli from age 18 upward (e.g., Silverthorne & Quinsey, 2000). A study on the erotic preference of the nondeviant male indicates that the attraction profile of “normal” heterosexual men could be rank ordered with adult females having the highest ranking, followed by adolescent females, both categories showing a positive erotic appeal (Freund & Costell, 1970). In addition, Mokros et al. (2011) found male participants showed a preference for adolescent and adult females when self-reporting sexual attraction to NRP stimuli. More recently, Lykins et al. (2010) found that gynephilic men show higher genital arousal to adolescent females than to a neutral category. Perhaps these patterns could also be applied to homosexual men. Despite the lack of empirical evidence, this pattern of attraction appears to be something accepted clinically.
For example, Howes states,
Though there is virtually no published research (and certainly nothing current) identifying the extent to which “normal” non-offending adult males are sexually aroused by minor pubescent females, based on their experience many phallometric testing facilities no longer use young teenage girls as stimuli. (R. Howes, personal communication, August 12, 2010)
However, given the limitations identified herein, these results in relation to adolescent images should be interpreted with extreme caution.
The difference in results found between VT methods indicates the importance of standardization of these measures. Although there are a number of studies published indicating the efficacy of VT as a measure of sexual interest, there are large differences between the methodologies used. The present results indicate that the two VT procedures result in different patterns of results, which may indicate that they are tapping into separate cognitive constructs, albeit both having some relation to sexual interest. This is not surprising given the difference in the cognitive tasks involved in both measures. In VT1, participants are asked to browse through stimuli of varying ages, whereas in VT2, participants are asked to rate their attraction to these images, including those of children. This is likely to result in different cognitive and information processes. However, this suggestion should be interpreted with caution given the design limitations outlined above. Thus, future research should investigate this further. A standardization of the procedure used is important, as it allows for direct comparison across studies and also ensures that the most robust methodology is used in clinical settings.
The present study also highlights the effect of different methods of classification and the importance of using a number of different methods to determine a measure’s ability to assess sexual interest. As the results of the Stroop task indicated, classification based on AUC analysis or difference scores alone indicated excellent classification ability. However, the ANOVA illustrated the effect of child images, making its applicability to sexual interest questionable. This would suggest that when assessing the validity of a measure, any method of classification should include an additional discriminatory component. Thus, rather than just looking at the differences between gender categories, the classification could take into account reaction times to all the age and gender categories.
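One possible reading of this suggestion is sketched below. It is purely illustrative and was not part of the present analysis. It assumes a rule under which a participant is counted as correctly classified only if the mean reaction time to the preferred adult category is the longest of all age and gender categories. The function name, the category labels, and the reaction times are hypothetical.

```python
from statistics import mean


def preferred_is_longest(rts_by_category, preferred):
    """True only if the preferred category has the longest mean RT of all."""
    means = {cat: mean(rts) for cat, rts in rts_by_category.items()}
    return all(
        means[preferred] > m for cat, m in means.items() if cat != preferred
    )


# Hypothetical participant who reports a preference for adult males (ms).
participant = {
    "adult_male": [2600, 2600],
    "adult_female": [2000, 2000],
    "child_male": [2700, 2700],
    "child_female": [1900, 1900],
}

# The adult-only difference score is positive (2600 vs. 2000 ms), but the
# stricter rule fails because responses to child male images are slower.
print(preferred_is_longest(participant, "adult_male"))  # False
```

Under such a rule, a participant who passes the adult-only difference score can still be flagged when responses to a younger category are slower, which is the kind of pattern the ANOVA revealed for the Stroop task.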
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received partial funding from the ‘Trinity College Dublin Postgraduate Research Studentship.’
