Abstract
This article analyzes the efficacy of the randomized response technique (RRT) in achieving honest self-reporting about sexual behavior, compared with traditional survey techniques. A survey with a complex sampling design was conducted among 1,246 university students in Spain, who were asked a sensitive quantitative question about their sexual behavior, either via the RRT (n = 754) or by direct questioning (DQ) (n = 492). The RRT estimates of the number of times that the students were unable to restrain their inappropriate sexual behavior were significantly higher than the DQ estimates, among both male and female students. These results suggest that the RRT elicits more self-stigmatizing reports of sexual experiences by increasing privacy in the data collection process. The RRT is shown to be a useful method for investigating sexual behavior.
Introduction
Sexual behavior is difficult to study empirically because of its sensitive nature. Its prevalence and frequency are difficult to estimate with standard survey techniques because respondents tend to withhold or misreport information when asked about it directly. Social desirability bias (the wish to make a favorable impression) poses a significant threat to the validity of self-reports, particularly when they concern sensitive information related to sexual activities (Tan & Grace, 2008).
Since the 1960s, a variety of questioning methods have been devised to protect respondents’ anonymity and to reduce the incidence of evasive answers and the over- or underreporting of socially undesirable acts. These methods, generally known as indirect questioning techniques (IQT), share the principle that the sensitive question is never posed directly to survey participants. Therefore, respondents never need to reveal openly whether they have engaged in socially sensitive activities or hold socially sensitive attitudes.
The randomized response technique (RRT) is one of the most commonly used methods of indirect questioning, and it has held a prominent position in this field of research since Warner’s pioneering work in 1965. In its original formulation, the RRT employs a physical randomization device (decks of cards, colored numbered balls, dice, coins, spinners, random number generators, etc.) that determines whether the respondent should answer the sensitive question, answer an unrelated neutral question, or simply give a prespecified response (e.g., “yes”), irrespective of his or her true status regarding the sensitive behavior.
The rationale of the RRT is that respondents are less inhibited when the confidentiality of their responses is guaranteed. Privacy is preserved because each response depends on the outcome of the randomization procedure, which the researcher does not observe. Although an individual RRT response cannot be used to discover the respondent’s true status regarding the sensitive issue, the data compiled from all the survey participants can be used, after suitable transformations of the observed variables, to draw inferences on parameters of interest for the study population (see, for example, Arcos, Rueda, & Singh, 2015).
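Warner’s original design makes this “transformation” step concrete. The sketch below is a minimal Python illustration assuming Warner’s (1965) mirrored-question design: with probability p the device selects the statement “I have the sensitive trait” and otherwise its negation, so the expected share of “yes” answers is λ = pπ + (1 − p)(1 − π), and the prevalence can be recovered as π = (λ − (1 − p))/(2p − 1) for p ≠ 0.5. The function names and the simulated prevalence of 0.2 are hypothetical, chosen only for illustration.

```python
import random

def simulate_warner(true_pi, p, n, seed=1):
    """Simulate yes/no answers under Warner's (1965) design (illustrative only)."""
    rng = random.Random(seed)
    answers = []
    for _ in range(n):
        has_trait = rng.random() < true_pi   # respondent's private status
        sensitive = rng.random() < p         # device selects "I have the trait"
        # The respondent answers truthfully to whichever statement was selected,
        # so "yes" means (trait and statement) or (no trait and its negation).
        answers.append(has_trait if sensitive else not has_trait)
    return answers

def warner_estimate(answers, p):
    """Moment estimator of the sensitive proportion: (lambda - (1 - p)) / (2p - 1)."""
    if p == 0.5:
        raise ValueError("p must differ from 0.5 for the estimator to exist")
    lam = sum(answers) / len(answers)        # observed proportion of "yes"
    return (lam - (1 - p)) / (2 * p - 1)

answers = simulate_warner(true_pi=0.2, p=0.7, n=100_000)
print(round(warner_estimate(answers, p=0.7), 3))  # should be close to 0.2
```

No single answer reveals a respondent’s status, yet the aggregate estimate is unbiased for π; the price is a larger variance, which grows as p approaches 0.5.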
In recent years, IQT have become popular as an effective means of eliciting more honest responses to sensitive questions. The RRT, introduced by Warner (1965), allows researchers to obtain sensitive information while guaranteeing privacy to respondents. This method encourages greater cooperation from respondents and reduces their motivation to report falsely. The most important claim made for the RRT is that it yields more valid point estimates of sensitive behavior. Sensitive behaviors are reported more frequently when questions are posed via the RRT than with standard direct questioning (DQ) (Williams & Suen, 1994). Warner’s study generated a large body of research literature on alternative RRTs for eliciting sensitive information (Arnab, 2002, 2004; Bouza, 2009; Clark & Desharnais, 1998; Diana & Perri, 2010, 2012; Santiago, Bouza, & Al-Omari, 2016; Singh & Sedory, 2011; Ulrich, Schröter, Striegel, & Simon, 2012) in studies of socially undesirable behavior. The potential generalization of the technique to other cultures has been demonstrated by studies conducted in China (Geng, Gao, Ruan, Yu, & Zhou, 2016), Iran (Vakilian, Mousavi, & Keramat, 2014), the United States (Walsh & Braithwaite, 2008), Europe (Perri, Pelle, & Stranges, 2015), and Spain (Cobo, Rueda, & López-Torrecillas, 2017), all of which observe that surveys based on indirect questioning are commonly used when the questions relate to sensitive issues.
The RRT and its variants have been applied to examine a great variety of subjects, including the use of recreational drugs or of athletic or cognitive performance-enhancing substances (Goodstadt & Gruson, 1975; Kerkvliet, 1994; Shamsipour et al., 2014; Simon, Striegel, Aust, Dietz, & Ulrich, 2006; Striegel, Ulrich, & Simon, 2010); the impact of HIV/AIDS infection in Botswana (Arnab & Singh, 2010); the prevalence of induced abortion in the United States, Mexico, Botswana, Taiwan, and Turkey (Lara, García, Ellertson, Camlin, & Suárez, 2006; Oliveras & Letamo, 2010); and the prevalence of induced abortion and of irregular immigrant status among foreign women in Calabria, Italy (Perri et al., 2015).
With respect to studies of sexual behavior, LaBrie and Earleywine (2000) and Walsh and Braithwaite (2008) used indirect questioning methods to investigate risky sexual activity, and De Jong, Pieters, and Stremersch (2012) compared nonstudent samples from two countries in an analysis of permissive sexual attitudes and risky sexual behavior, using an RRT. These studies highlight the benefits of using the RRT to investigate sensitive issues like sexual behavior, for which it might otherwise be very difficult to obtain accurate responses.
Specifically, LaBrie and Earleywine (2000) used indirect questioning to estimate base rates for risky sexual behavior after drinking, and compared their findings with those obtained by conventional methods. The indirect questioning approach revealed significantly higher base rates than a conventional self-report survey with respect to having had sex, having sex without a condom, and having sex without a condom after drinking. Subsequently, Walsh and Braithwaite (2008) examined the same issues of alcohol consumption and sexual behavior, in a study based on a sample of 842 students at a university in the American Midwest, and concluded that IQT produced higher rates of honest self-reporting than traditional survey techniques.
Geng et al. (2016) described the behavioral risk profile of men who have sex with men (MSM) in Beijing, using three RRTs. Compared with statistics obtained from surveys based on anonymous questionnaires or direct interviews, MSM in the RRT-based surveys reported a younger age at first sexual encounter, more male partners, and a lower rate of consistent condom use during anal sex with male partners. These results suggest that the RRT might be a useful tool to obtain more honest feedback from respondents on sensitive information such as sexual behavior.
Krebs et al. (2011) hypothesized that the validity of self-reported data on sexual assault might be open to doubt if victims were reluctant to disclose what had happened to them. In their study, which used an anonymous, web-based survey, a sample of undergraduate women was asked about experiences of physically forced sexual assault by direct questioning and by indirect questioning (the item count technique). The indirect questions yielded a slightly higher prevalence estimate than direct questioning, but the difference was not statistically significant. These results suggest either that direct questioning yields reasonably valid estimates of the prevalence of sexual assault or that the item count technique does not produce more valid estimates. Miner (2008) used the RRT in a study of 424 men who were imprisoned for child sexual abuse or rape, and reported that the RRT estimate of prior offending (2.20 prior offenses) was significantly higher than the officially recorded figure (0.51 prior offenses), which was obtained from participants’ prison records at the time of recruitment screening. Miner found preliminary evidence that the RRT is a useful method for generating data of a sensitive nature when official records might be inaccurate.
In the present study, direct and indirect questioning methods are used to explore reports of inappropriate sexual behavior by university students. We hypothesized that, because the behavior in question is stigmatizing, responses would be affected by social desirability bias, and therefore that the RRT would yield higher estimates than direct questioning. To our knowledge, no such analysis of social desirability bias in a survey of sexual behavior has previously been performed in Spain.
Method
To determine whether the RRT elicits better estimates of problematic sexual behavior than methods based on direct questioning, we conducted a survey of university students in this respect, using both approaches. The students were all informed about the aims of the study and provided signed informed consent. Ethical approval was obtained from the Research Ethics Committee of the University of Granada. In the following sections, we report how the sample size was determined, describe the procedures applied, and provide full details of the measures considered. All the analyses performed (whether they yielded significant or non-significant results) are presented.
Participants and Sampling Method
The sample population for this survey was composed of students from two universities in the regions of Andalusia and Murcia (southern Spain). A stratified sample of students enrolled in different faculties was selected, with degree programs and years of study represented in proportion to their numbers of students. For the purposes of this study, each cluster corresponded to a class in one of the universities. Clusters (classes) were randomly selected, and all members of each selected class, both female and male, were included.
We determined the sample size needed to estimate the population mean with a coefficient of variation of 0.25. On this basis, we planned to ask 500 students to respond to a survey based on DQ and another 625 to respond to one based on the RRT. The RRT subsample was planned to be larger because the randomization inflates the variance of RRT estimates, so the technique inherently has less statistical power for a given sample size.
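Why the RRT needs more respondents for the same precision can be sketched numerically. Under simple random sampling with replacement (ignoring the cluster design effect and the finite population correction, both simplifying assumptions), the coefficient of variation of a sample mean is σ/(√n μ), so a target CV c requires n = σ²/(c²μ²) direct responses. With the card-based scrambling used in this study (a figure, probability 0.3, gives the true answer; otherwise the answer is multiplied by a number from 1 to 7), each transformed response carries extra variance φ·E[y²]. The population mean and standard deviation below are hypothetical placeholders, not values from the study, and the helper names are illustrative.

```python
import math

def deck_phi(p=0.3, mu_s=4.0, var_s=4.0):
    """Relative randomization variance phi for a scrambled response z = y or y*S.

    Defaults assume the 40-card Spanish deck: P(figure) = 12/40, S uniform on 1..7.
    """
    a = p + (1 - p) * mu_s                 # E[z | y] = a * y
    b = p + (1 - p) * (var_s + mu_s ** 2)  # E[z^2 | y] = b * y^2
    return b / a ** 2 - 1                  # V_R(z / a) = phi * y^2

def required_n(mu, sigma, cv_target, phi=0.0):
    """Sample size for a target CV of the mean under SRS with replacement.

    phi = 0 corresponds to direct questioning; phi > 0 adds the randomization variance.
    """
    per_unit_var = sigma ** 2 + phi * (sigma ** 2 + mu ** 2)  # Var(y) + phi * E[y^2]
    return math.ceil(per_unit_var / (cv_target * mu) ** 2)

# Hypothetical population: mean 1 occurrence, standard deviation 5 (heavily skewed counts)
n_dq = required_n(mu=1.0, sigma=5.0, cv_target=0.25)
n_rr = required_n(mu=1.0, sigma=5.0, cv_target=0.25, phi=deck_phi())
print(n_dq, n_rr)
```

Under these placeholder inputs the RRT requires roughly 50% more respondents than DQ; the actual inflation depends on the population values and on the design effect of the cluster sample.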
The data collection and fieldwork were conducted by the FQM365 research group, part of the Andalusian Research Plan, during 2015. Students in 16 classes were contacted and randomly assigned to one of the two survey modes: RRT (Subsample 1, n = 754; 60.2% female and 79.3% aged 21 years or younger, excluding missing data) and DQ (Subsample 2, n = 492; 57.1% female and 87.4% aged 21 years or younger, excluding missing data). The dataset is available on the Open Science Framework platform at https://osf.io/823nw/?view_only=56833a3d255941e783d9d2922fafd5f0.
Procedure and Measures
All questionnaires were administered on paper during the class break time. All students were invited to participate in the study, and provided signed informed consent. The classroom setting facilitated cooperation, no objection to the survey was raised, and no empty questionnaires were returned.
The questionnaire was the same in the two subsamples. It began with some academic questions, followed by a set of basic demographic questions and then a sensitive question on sexual behavior taken from the “Sexual Dependency Inventory–Revised” (SDI-R; Carnes & Delmonico, 1996). This screening test is a broad and comprehensive assessment designed to assist research on sexual addiction. The sensitive question used in our study was as follows: Over the past 90 days, how many times have you had trouble stopping your sexual behavior when you know it is inappropriate? In Subsample 1 (using the RRT), before respondents answered the sensitive question, the interviewer explained how the randomized response procedure worked and gave an example of its use. The response was randomized using a generalization of the model proposed by Bar-Lev, Bobovitch, and Boukai (2004) for simple random sampling, later extended by Arcos et al. (2015) for use with complex samples.
Respondents in Subsample 2 (using DQ) were instructed to answer the sensitive question directly. Respondents in Subsample 1 (using the RRT) were asked to install the Baraja Española (“Deck of Cards”) app on their phone. This app (available from Google Play Store, 2015) displays one of the 40 cards of the Spanish deck, which has four suits, each consisting of cards numbered 1 to 7 and three figures (face cards); it was used as the randomization mechanism for answering the sensitive question. If the card displayed was a figure (e.g., a king), the respondent answered the sensitive question truthfully; if the card showed a number (e.g., a 5), the respondent reported his or her true answer multiplied by that number. Figure 1 shows the response procedure for the two subsamples.
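The card mechanism can be simulated to check that scrambled answers still carry unbiased information about the mean. The sketch below assumes a uniformly random draw from the 40 cards (12 figures; four copies of each number 1 to 7), as implied by the description above. Under that assumption the reported value z satisfies E[z | y] = (0.3 + 0.7 × 4)·y = 3.1y, so dividing by 3.1 gives a value that is unbiased for the true count y. The helper names are hypothetical; this is neither the app’s code nor the RRTCS implementation.

```python
import random
from fractions import Fraction

# Spanish 40-card deck: 4 suits, each with cards 1..7 and three figures (face cards).
# A number card is stored as its value; a figure is stored as None.
DECK = [k for _ in range(4) for k in range(1, 8)] + [None] * 12

P_FIGURE = Fraction(12, 40)               # probability of answering truthfully
MU_S = Fraction(sum(range(1, 8)), 7)      # mean multiplier on a number card (= 4)
SCALE = P_FIGURE + (1 - P_FIGURE) * MU_S  # E[z | y] = SCALE * y (= 31/10)

def scrambled_answer(y, rng):
    """Value a respondent with true count y writes down after drawing a card."""
    card = rng.choice(DECK)
    return y if card is None else y * card

def unscramble(z):
    """Transformed response z / SCALE; its expectation over card draws equals y."""
    return z / float(SCALE)

rng = random.Random(2015)
draws = [unscramble(scrambled_answer(2, rng)) for _ in range(200_000)]
print(SCALE, sum(draws) / len(draws))  # 31/10 and a value close to the true count 2
```

A single reported value is uninformative on its own: a reported 6 could come from a true count of 6, 3, 2, or 1, which is the source of the privacy protection.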

Figure 1. Procedure of response for the two subsamples.
The researcher explained that this technique preserved the students’ anonymity and that its aim was to avoid provoking mistrust. Following this explanation, all students in Subsample 1 completed the full questionnaire. By contrast, in Subsample 2, with a questionnaire based on DQ, not all respondents completed the survey: the total nonresponse rate was 13%. Nonresponse rates are presented and discussed in the Results section.
Statistical Analysis
Design-based inference was used to estimate the parameters of interest. Because the card provided by the app is randomly selected, and the probabilities of drawing a figure and each number are known, the mean response of the RRT group can be estimated (although the true value for each individual cannot) and compared with the DQ mean. The sampling weights were computed from the stratified cluster sampling design and adjusted for coverage bias, and all statistical analyses were performed using these weights. The Horvitz–Thompson estimator (Singh, 2003) was used to estimate the means for the direct questions. For the randomized responses (RR) obtained with the Bar-Lev technique, the unbiased generalized estimator proposed by Arcos et al. (2015) was used to estimate the mean of the study variable (see Appendix A). In both cases, the estimates were calibrated to the population totals by sex and age, with the aim of improving the validity and accuracy of the mean estimators. All statistical analyses were performed using R (version 3.3.3), with standard packages for estimation in survey sampling (sampling; Tillé & Matei, 2015) and a specific package for handling RR data obtained from complex surveys (RRTCS; Rueda, Cobo, & Arcos, 2015), from which the BarLev() function was used. The means obtained by DQ and by the RRT, overall and for each subgroup, were compared using the method proposed by Wolter and Preisendörfer (2013) (see Appendix B).
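As an illustration of the weighting step, the sketch below computes a weighted mean after calibrating design weights to known population totals of a single categorical variable. Calibrating only to the totals of one categorical variable, such as sex, reduces to post-stratification, the simplest calibration case; calibrating on sex and age margins together would need a general calibration method (for example, linear calibration or raking). The weights, values, and totals are hypothetical, and this Python sketch is not the `sampling` or `RRTCS` R code used in the study. The same function applies to direct answers y_i or to transformed randomized responses r_i.

```python
from collections import defaultdict

def poststratify(design_weights, groups, pop_totals):
    """Rescale design weights so that, within each group, they sum to the population total."""
    group_sum = defaultdict(float)
    for d, g in zip(design_weights, groups):
        group_sum[g] += d
    return [d * pop_totals[g] / group_sum[g] for d, g in zip(design_weights, groups)]

def weighted_mean(values, weights):
    """Weighted mean; with calibrated weights summing to N it equals (1/N) * sum(w_i * y_i)."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical sample: design weight 10 each (1 / inclusion probability); women over-represented
d = [10.0, 10.0, 10.0, 10.0]
sex = ["F", "F", "F", "M"]
y = [0.0, 0.0, 1.0, 3.0]
w = poststratify(d, sex, pop_totals={"F": 50.0, "M": 50.0})
print(weighted_mean(y, d), weighted_mean(y, w))  # uncalibrated 1.0 vs calibrated about 1.667
```

Calibration shifts weight toward the under-represented group, so the estimate moves toward that group’s mean when the groups differ.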
Results
Because the samples did not reproduce the gender composition of the population, the sampling weights were recalibrated to the population gender totals.
Given the sensitive nature of the research topic, we expected that a considerable share of respondents would refuse to answer or would underreport the sensitive behavior. Because the RRT has been found to be easy to understand and is trusted by respondents to protect their anonymity, we examined whether nonresponse rates were lower under the RRT than under DQ.
Table 1 shows the nonresponse rates for the sensitive question in the full sample and by gender and age, for direct and randomized response. The nonresponse rate was significantly lower in the RRT group (p value < .0001). Within the RRT group, nonresponse differed significantly between male and female students (p value < .05), whereas in the DQ group the difference between male and female students was not statistically significant (p value > .05). With respect to age, nonresponse rates also differed significantly between the RRT and DQ groups (p value < .0001). In the DQ group, the nonresponse rate for respondents aged 21 years or younger was significantly lower than that of older students (p value < .0001), whereas in the RRT group there was no significant difference by age (p value > .05).
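One common way to compare two nonresponse rates is the two-proportion z test sketched below; it is not necessarily the test used for Table 1. It assumes simple random sampling, so for a clustered design such as this one a design-based alternative (for example, a Rao–Scott adjusted test) would be more appropriate. The counts are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z test for H0: p1 = p2 using the pooled proportion (SRS assumption)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical counts: 60 of 500 DQ respondents vs 10 of 600 RRT respondents skipped the item
z, p = two_proportion_z(60, 500, 10, 600)
print(round(z, 2), p)
```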
Table 1. Nonresponse Rates of Sample.
Note. DQ = direct questioning; RR = randomized response.
The point estimates of the sensitive question and the 95% confidence intervals for each technique (DQ and RR) are summarized in Tables 2 and 3.
Table 2. Estimates of the Number of Times the Participant Had Engaged in Inappropriate Sexual Behavior Over the Past 90 Days.
Note. DQ = direct questioning; RR = randomized response; CI = confidence interval.
Table 3. p Values for DQ Versus RR Comparisons.
Note. DQ = direct questioning; RR = randomized response.
For the DQ group, the estimated mean number of times that respondents had difficulty controlling inappropriate sexual behavior was 0.23, compared with 1.45 for the RR group (p value < .0001). Analysis by gender showed that the mean was higher with RR than with DQ, both for men (p value < .0001) and for women (p value < .0001). With DQ, the mean was higher for men than for women (p value = .03443). This was also so with RR, but in this case the difference was not significant (p value = .2473). Although RR is arguably less prone to bias than DQ, it is also more susceptible to sampling variability, because the variance due to the randomization process must be added to the sampling variance (see Formula 2 in Appendix A). When the results are considered by age, the difference between DQ and RR is also statistically significant (see Tables 2 and 3). Both genders reported few occurrences when questioned directly (0.48 for men and 0.0686 for women), whereas the indirect method revealed substantially higher means (2.0445 for men and 1.1558 for women).
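The point about sampling variability can be checked with a small simulation. The sketch below repeatedly draws simple random samples from a hypothetical population of counts and compares the spread of the DQ mean (assuming, only for this comparison, that direct answers are truthful) with that of the RR mean obtained with the card-based scrambling described in the Method section. The population, sample size, and number of replications are arbitrary illustrative choices.

```python
import random
import statistics

# 40-card Spanish deck: number cards 1..7 (four of each) and 12 figures (None)
DECK = [k for _ in range(4) for k in range(1, 8)] + [None] * 12
SCALE = 0.3 + 0.7 * 4  # E[z | y] / y for this deck (= 3.1)

def rr_mean(sample, rng):
    """Mean of transformed scrambled responses z / SCALE for one simulated survey."""
    total = 0.0
    for y in sample:
        card = rng.choice(DECK)
        z = y if card is None else y * card
        total += z / SCALE
    return total / len(sample)

def compare_variances(population, n, reps=2000, seed=7):
    """Monte Carlo variances of the DQ and RR means over repeated SRS draws."""
    rng = random.Random(seed)
    dq_means, rr_means = [], []
    for _ in range(reps):
        sample = rng.sample(population, n)     # simple random sample without replacement
        dq_means.append(sum(sample) / n)       # truthful direct answers (assumption)
        rr_means.append(rr_mean(sample, rng))  # same respondents, scrambled answers
    return statistics.variance(dq_means), statistics.variance(rr_means)

# Hypothetical population of 1,000 counts, mostly zeros
population = [0] * 900 + [1] * 50 + [2] * 30 + [5] * 20
var_dq, var_rr = compare_variances(population, n=100)
print(var_dq, var_rr)  # the RR variance should be clearly larger
```

Both estimators are unbiased for the population mean in this setup; only their spread differs.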
Discussion
The present study describes a procedure for asking sensitive quantitative questions about sexual behavior, which was applied in a survey conducted among 1,246 students at two Spanish universities, in Granada and Murcia. Respondents were randomly assigned to be questioned about their sexual behavior either directly or by the RR technique. The calibration adjustments applied to the estimators increased their validity and accuracy. The results of the direct survey were then compared with those of the randomized response survey. The mean number of times that respondents reported having difficulty controlling their inappropriate sexual behavior was much higher according to the RRT responses than with the DQ technique. Hence, according to the “more-is-better” assumption (Lensvelt-Mulders, Hox, van der Heijden, & Maas, 2005), the data collection method that yielded higher estimates of the sensitive characteristic was considered the more valid one (although, unfortunately, we do not have true values from a census of inappropriate sexual behavior among university students with which to compare our findings).
The nonresponse rate was significantly lower in the RR than in the DQ group, which corroborates previous research in this field, such as Goodstadt and Gruson (1975), Geng et al. (2016), and Cobo et al. (2017).
Tourangeau and Yan (2007) suggested that misreporting about sensitive topics is common and largely situational, its extent depending on whether the respondent has anything embarrassing to report and on the design features of the survey. The RRT is designed to decrease social desirability bias and thus obtain more reliable estimates (Arnab & Singh, 2010; De Jong et al., 2012; Geng et al., 2016; Krebs et al., 2011; Lara et al., 2006; Miner, 2008; Oliveras & Letamo, 2010; Perri et al., 2015; Shamsipour et al., 2014; Simon et al., 2006; Striegel et al., 2010). Our results are consistent with those obtained by De Jong et al. (2012), who compared nonstudent samples from two countries on permissive sexual attitudes and risky sexual behavior, using an RRT, and highlighted the advantages of using the RRT for sensitive issues like sexual behavior, for which it can be very difficult to obtain accurate responses.
In many contexts, negative social attitudes toward sensitive issues such as aggressive sexual behavior can result in false and invalid data being provided in self-reported surveys. In this respect, De Jong et al. (2012) and Walsh and Braithwaite (2008) indicated that men’s self-reported sexually aggressive behavior should be interpreted with care, as it is highly subject to social desirability bias and so tends to be underreported. Other researchers, too, have raised concerns about the distortions that can be introduced into standardized measures by the effects of social desirability bias (Schlachter & Rolf, 2017; Tourangeau & Yan, 2007).
Limitations
The present study has certain limitations that should be acknowledged. The sample population was drawn from university students in one specific country, and the results obtained may not be generalizable to other social or cultural contexts. Therefore, further research should be undertaken, seeking to replicate our results in alternative settings. Another limitation of the present study is the size of the sample: We believe it was sufficient for the DQ estimation but possibly not for precise analysis with the RRT.
Our study aim was to estimate the mean values of the variable in question (responses to a sensitive question) and to determine whether they were affected by social desirability bias in the population, or in any of its subgroups. However, we did not address other, more complex statistical analyses, such as regression, which could also be applied to the RRT (see, for example, Blair, Imai, & Zhou, 2015).
Conclusion
Our findings support the view that the RRT enables researchers to obtain sensitive information while guaranteeing privacy to respondents. It encourages cooperation from respondents and reduces their motivation to falsely report their attitudes. The most important claim made for RRT is that it yields more valid point estimates of sensitive behavior. Various types of sensitive behavior are more frequently reported if respondents are questioned via the RRT, rather than directly. In our study, conducted in Spain (where RR techniques are not commonly used for studies of sexual behavior), quantitative variables were taken into account to make the scope of the study as complete as possible.
Nonresponse is a common problem in surveys, especially when sensitive issues are investigated. It is generally agreed that use of the RRT can increase respondents’ degree of cooperation and thus reduce the rate of nonresponse (Barabesi, Diana, & Perri, 2014). The results of our study show that the RRT achieves a significant improvement in response rates, which is in line with previous research findings.
However, the RR approach also has certain drawbacks. First, because only randomized responses are observed, individual response patterns cannot be interpreted directly, nor can individuals or groups of individuals be compared on the basis of their individual answers. Moreover, RR procedures require a randomization device to determine the answer, and using physical devices may be more time consuming and costly than DQ. The randomization mechanism also increases the variance of the estimates, although the use of auxiliary information at the estimation stage can help reduce this variance without additional costs and without infringing respondents’ privacy. Recent contributions in this field have been proposed by Özgül and Cingi (2017) and by Rueda, Cobo, and Arcos (2018), among others.
As a final observation, the RRT could be used not only to study sexually aggressive behavior but also in treatment settings. For example, treatment providers could use the RRT to evaluate whether participants are having sexual thoughts or fantasies about re-offending, an area of inquiry that is highly sensitive and likely to be minimized under DQ.
Appendix A
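Let the answer reported by the ith respondent be

$$
z_i = \begin{cases} y_i & \text{with probability } p, \\ y_i S_i & \text{with probability } 1 - p, \end{cases}
$$

where S is a scramble variable, with a mean $\mu_S$ and variance $\sigma_S^2$, independent of $y_i$. With the deck used in this study, $p = 12/40 = 0.3$ and S is uniform on $\{1, \dots, 7\}$, so that $\mu_S = 4$ and $\sigma_S^2 = 4$.

The transformed variables are defined as

$$
r_i = \frac{z_i}{p + (1-p)\mu_S},
$$

with $E_R(r_i) = y_i$ and $V_R(r_i) = \phi\, y_i^2$, where

$$
\phi = \frac{p + (1-p)\left(\sigma_S^2 + \mu_S^2\right)}{\left[p + (1-p)\mu_S\right]^2} - 1
$$

and $E_R$ and $V_R$ denote expectation and variance with respect to the randomization device. An unbiased estimator of $V_R(r_i)$ is $\hat{V}_R(r_i) = r_i^2\,\phi/(1+\phi)$.

The Horvitz–Thompson estimator of the mean for the RR survey is given by

$$
\hat{\bar{Y}}_{RR} = \frac{1}{N}\sum_{i \in s}\frac{r_i}{\pi_i}, \tag{1}
$$

and the unbiased estimator of the variance of this estimator is

$$
\hat{V}\left(\hat{\bar{Y}}_{RR}\right) = \frac{1}{N^2}\left[\sum_{i \in s}\sum_{j \in s}\frac{\pi_{ij} - \pi_i\pi_j}{\pi_{ij}}\,\frac{r_i}{\pi_i}\,\frac{r_j}{\pi_j} + \sum_{i \in s}\frac{\hat{V}_R(r_i)}{\pi_i}\right], \tag{2}
$$

where πi and πij are the inclusion probabilities (Singh, 2003) of the ith unit and of the ith and jth units, respectively (with $\pi_{ii} = \pi_i$), s denotes the sample, and N the population size.

The confidence interval at the $(1-\alpha)$ level is

$$
\hat{\bar{Y}}_{RR} \pm z_{1-\alpha/2}\sqrt{\hat{V}\left(\hat{\bar{Y}}_{RR}\right)},
$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution.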
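A minimal Python sketch of the RR estimator described in this appendix follows. It assumes the Bar-Lev form of the scrambled response (z = y with probability p, otherwise z = y·S), the transformation r = z/(p + (1 − p)μ_S), the randomization-variance estimate r²(b − a²)/b with a = p + (1 − p)μ_S and b = p + (1 − p)(σ_S² + μ_S²), and a variance estimator that adds Σ v̂_i/π_i to the Horvitz–Thompson design term. It illustrates these assumptions and is not the RRTCS `BarLev()` implementation; the inclusion probabilities correspond to a hypothetical simple random sample without replacement, and the responses are made up.

```python
import math

def barlev_mean_var(z, pi, pij, N, p=0.3, mu_s=4.0, var_s=4.0):
    """Horvitz-Thompson mean of transformed Bar-Lev responses and its variance estimate.

    z   : scrambled responses of the sampled units
    pi  : first-order inclusion probabilities pi_i
    pij : function (i, j) -> second-order inclusion probability pi_ij, for i != j
    N   : population size
    Defaults for p, mu_s, var_s assume the 40-card Spanish deck.
    """
    a = p + (1 - p) * mu_s                        # E[z | y] = a * y
    b = p + (1 - p) * (var_s + mu_s ** 2)         # E[z^2 | y] = b * y^2
    r = [zi / a for zi in z]                      # transformed responses, E_R(r_i) = y_i
    v = [ri ** 2 * (b - a ** 2) / b for ri in r]  # unbiased for V_R(r_i)

    n = len(r)
    mean = sum(r[i] / pi[i] for i in range(n)) / N
    design = 0.0
    for i in range(n):
        for j in range(n):
            pi_ij = pi[i] if i == j else pij(i, j)  # pi_ii = pi_i
            design += (pi_ij - pi[i] * pi[j]) / pi_ij * (r[i] / pi[i]) * (r[j] / pi[j])
    randomization = sum(v[i] / pi[i] for i in range(n))
    return mean, (design + randomization) / N ** 2

# Hypothetical SRS without replacement: n = 6 students from N = 1,000
N, z = 1000, [0.0, 0.0, 3.0, 10.0, 0.0, 14.0]
n = len(z)
pi = [n / N] * n
mean, var = barlev_mean_var(z, pi, lambda i, j: n * (n - 1) / (N * (N - 1)), N)
half = 1.96 * math.sqrt(var)  # 95% normal-approximation interval
print(f"mean = {mean:.3f}, 95% CI = ({mean - half:.3f}, {mean + half:.3f})")
```

Under simple random sampling without replacement, the design term reduces to the familiar (1 − n/N)s²/n computed on the transformed responses, which makes the randomization term easy to isolate.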
Appendix B
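To test the equality of means, given that the variance of the randomized response technique (RRT) estimator includes the randomization term, we use the methodology proposed by Wolter and Preisendörfer (2013) and calculate z scores using the equation:

$$
z = \frac{\hat{\bar{Y}}_{RR} - \hat{\bar{Y}}_{DQ}}{\sqrt{\hat{V}\left(\hat{\bar{Y}}_{RR}\right) + \hat{V}\left(\hat{\bar{Y}}_{DQ}\right)}},
$$

where $\hat{\bar{Y}}_{RR}$ and $\hat{\bar{Y}}_{DQ}$ are the estimated means from the randomized response and direct questioning subsamples, respectively, and $\hat{V}(\cdot)$ denotes their estimated variances, the variance of the RR estimator including the randomization component (Appendix A).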
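The z statistic described in this appendix can be computed as below. The sketch assumes that the two estimates come from independent subsamples, so that their variances add, and reports a two-sided normal p value. The estimates and variances shown are hypothetical, not the values reported in Tables 2 and 3.

```python
import math

def compare_means(mean_rr, var_rr, mean_dq, var_dq):
    """z test for the difference between two independent mean estimates.

    var_rr is assumed to already include the randomization variance component.
    Returns the z statistic and its two-sided normal p value.
    """
    z = (mean_rr - mean_dq) / math.sqrt(var_rr + var_dq)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical estimates (mean, estimated variance) for the two subsamples
z, p = compare_means(mean_rr=1.2, var_rr=0.04, mean_dq=0.3, var_dq=0.01)
print(f"z = {z:.2f}, p = {p:.2g}")
```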
Acknowledgements
The authors extend their thanks to the Editor, the Action Editor, and the anonymous reviewers for their extremely helpful comments and feedback.
Authors’ Note
The authors take full responsibility for the integrity of the data presented and for the accuracy of the analyses made. Every effort was made to avoid inflating statistically significant results.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was partially supported by Ministerio de Educación y Ciencia (grant MTM2015-63609-R).
