Abstract
Surveying human behaviors, especially in demographic, social, medical and public health research, often involves sensitive issues. Posing direct inquiries about stigmatizing or threatening topics may lead survey participants to refuse to answer or to give untruthful responses. Nonresponse and misreporting denote measurement errors that are difficult to treat and are likely to yield unreliable analyses of the surveyed topics.
This problem can be mitigated by adopting survey methods that enhance anonymity and respondent cooperation. One possibility is to create a trustful and confidential relationship between the interviewer and the survey participants. Alternatively, it is possible to fully protect privacy by adopting indirect questioning procedures that elicit information without posing sensitive questions directly.
We consider both of these possibilities and present the results of a real study that explores the effectiveness of the randomized response crossed model proposed by Lee et al. (2013) in producing prevalence estimates for two sensitive traits: cannabis use and support for its legalization.
Introduction
Nowadays, large-scale surveys of human populations delve increasingly into sensitive topics such as abortion, sexual behavior, rape victimization, drug taking, tax evasion, xenophobia, income, and so on. Let us try to imagine the reaction of people engaged in socially undesirable or illegal behaviors when they are directly asked questions like “Have you ever earned illegal income?” or “Are you homosexual?”, or the feelings of those who are asked confidential questions that pertain to their private sphere, even though these concern legal matters and do not trigger social desirability concerns, like “Which party will you vote for in the next election?” or “What is the level of your household income?” It is easy to imagine the embarrassment and the concerns of the interviewees, even when the inquirers do their best to guarantee confidentiality. Most respondents would probably feel uncomfortable with such direct inquiries and may give evasive or misleading replies, or even refuse to answer. Such reactions generate a well-known source of bias in surveys, called social desirability bias, which refers to the tendency to present oneself in a positive light. Survey participants exhibit this bias when they overreport socially acceptable attitudes which conform to social norms (e.g., giving alms, healthy eating, doing voluntary work) or underreport socially disapproved behaviors which deviate from social rules (e.g., gambling, consumption of alcohol, induced abortion, drug taking).
Sensitive research (Liamputtong, 2007; Tourangeau & Yan, 2007; Dickson-Swift et al., 2008) on stigmatizing, highly personal, embarrassing, threatening or even incriminating issues – especially when carried out by traditional techniques based on direct questioning (DQ) survey modes – is likely to produce non-sampling errors, such as nonresponse and untruthful answers, which can lead to inconsistent analyses and erroneous conclusions, including wrong inference about unknown characteristics of the population under study. Although these errors cannot be totally avoided, they can, to some extent, be reduced by enhancing respondent cooperation. There are various aspects to this: how the survey is administered; the presence of the interviewer and whether he/she poses the questions directly; the questionnaire format, the wording and the placing of the sensitive items; the setting for data collection – do the respondents answer in isolation or are others present? All the above factors play a part in encouraging trust and confidentiality (on this, see, e.g., Tourangeau & Smith, 1996; Groves et al., 2004).
In this paper, we discuss the results of a real study which furnishes empirical evidence of possible ways of enhancing respondent cooperation and reducing misreporting in sensitive research. Two aspects, in particular, are investigated: a non-standard data-collection survey method based on indirect questioning (IQ) and the role of the interviewer. Specifically, the study is conceived as a face-to-face mixed-mode interview which allows us to collect information on two topics of different sensitivity, cannabis use and its legalization, and to estimate the prevalence of cannabis users and of people in favour of cannabis legalization. Both the DQ and IQ survey modes are used to collect data in the presence of the interviewer, a person who is a member of the surveyed community and inclined to establish a confidential and trustful rapport with the survey participants. The outcomes of the empirical research clearly highlight the positive effects that can be produced by adopting survey strategies based on IQ and trustworthy interviewers. As expected, the benefits increase when the respondents perceive the investigated topic as being more sensitive.
The remainder of the paper is organized as follows. In Section 2 we introduce the problem of enhancing respondent cooperation in sensitive research. Section 3 is devoted to the description of a particular IQ method: the randomized response crossed model proposed by Lee et al. (2013). In Section 4 we present the main features of a real study on cannabis use and cannabis legalization conceived to evaluate the usefulness of the IQ approach when stigmatizing topics are surveyed. The results of our study are discussed in Section 5. Section 6 concludes the paper with some final considerations.
Increasing respondent cooperation in sensitive surveys
In surveys which cover sensitive issues, it is well established that the decision of the participants to cooperate honestly depends greatly on the extent to which they perceive their privacy to be protected. Consequently, survey modes which ensure respondent anonymity or, at least, a high degree of confidentiality, may go some way to improving cooperation and, hence, gathering more reliable information on sensitive topics than is the case with DQ. It is worth remarking that, when comparing alternative data-collection methods, it is crucial to establish the criterion for choosing the most “accurate” method. In the absence of reliable external sources that provide true population benchmarks of the sensitive behaviors under investigation, the most common approach used in comparative validation studies is based on the “more-is-better” [“less-is-better”] assumption according to which a method that successfully induces more [fewer] reports of a sensitive undesirable [desirable] behavior is expected to yield higher [lower] estimates of the sensitive traits and, therefore, should be judged the more valid.
In order to increase respondent cooperation, survey statisticians have developed many different strategies. One possibility for improving reporting on sensitive topics is to limit the influence of the interviewer on the question-and-answer process on the grounds that the presence of the interviewer tends to increase social desirability effects. This goal is traditionally pursued by means of self-administered questionnaires (SAQs) with paper and pencil, computer-assisted telephone interviewing (CATI), computer-assisted self interviewing (CASI), audio computer-assisted self interviewing (ACASI) or computer-assisted Web interviewing (CAWI). By contrast, many social researchers stress the collaborative, trustful and non-hierarchical relationship that should be built between the interviewer and the survey participants. In this case, the interviewer's self-disclosure or the reciprocal sharing of personal ideas, attitudes and/or experiences concerning matters that might be related to the sensitive topic become important tactics to create a forthcoming and encouraging atmosphere. It may also happen that the interviewer knows most of the members of the stigmatized group or is himself/herself a member of that group (see, e.g., Liamputtong, 2007; Dickson-Swift et al., 2008). In these situations, the respondents would be less inhibited by the presence of the interviewer and might be more willing to release personal information since they have nothing to hide, show indifference to the interviewer's opinion, and do not fear that their personal information may be released to third parties for purposes other than those of the survey.
A second possibility to enhance confidentiality consists in the use of alternative questioning methods – different from the conventional survey modes – that have been devised since the sixties to ensure respondent anonymity and that cut down on evasive answers and misreporting of stigmatizing acts. These methods are generally known as indirect questioning techniques (IQTs) and they obey the principle that no direct questions are posed to the survey participants and, thus, there is no need for respondents to openly reveal their true status. In this way, privacy is protected since the information gathered remains confidential and, consequently, the true status of the respondents remains undisclosed to both the interviewer and the researcher.
Since the IQ approach plays a crucial part in this paper, it is worth providing a more detailed discussion in the next Section.
The randomized response crossed model
The year 1965 marks the beginning of the IQTs, an ingenious approach conceived to reduce nonresponse and gather franker and more reliable responses to sensitive questions than DQ survey modes. The IQ approach represents a category of strategies for eliciting sensitive information which encompasses various alternatives. Given the amount of literature produced since Warner’s (1965) pioneering work, the randomized response (RR) technique (RRT) certainly holds a prominent position among the IQTs. Roughly speaking, the RRT adopts a randomization device (e.g. decks of cards, colored numbered balls, dice, coins, etc.) which determines whether respondents answer the sensitive question, are asked another neutral question, or provide a pre-specified response (e.g. “Yes”) irrespective of their true status concerning the stigmatizing behavior. The randomization device, therefore, generates a probabilistic relation between the sensitive question and a given answer which is used to make inference about unknown parameters of interest, such as the prevalence of a sensitive attribute in the target population.
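To make the probabilistic relation concrete, the following minimal Python sketch illustrates the simplest variant mentioned above, in which the device either directs the respondent to answer the sensitive question truthfully or forces a “Yes”. It is not the crossed model used in this study. The function names, the design probability `p_truth` and the simulated numbers are assumptions introduced purely for illustration. Under this design, P(Yes) = p_truth · π + (1 − p_truth), which can be solved for the prevalence π.

```python
import random


def forced_response_estimate(yes_count, n, p_truth):
    """Moment estimator of the prevalence under a forced-"Yes" design.

    With probability p_truth the respondent answers the sensitive question
    truthfully; otherwise the device forces a "Yes". Therefore
    P(Yes) = p_truth * pi + (1 - p_truth), and solving for pi gives the
    estimator below. In small samples the raw value can fall outside [0, 1].
    """
    if not 0 < p_truth <= 1:
        raise ValueError("p_truth must lie in (0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    lam_hat = yes_count / n  # observed proportion of "Yes" answers
    return (lam_hat - (1 - p_truth)) / p_truth


def simulate_forced_response(n, true_pi, p_truth, seed=0):
    """Simulate n respondents and return the number of "Yes" answers.

    true_pi is a hypothetical prevalence used only to generate data.
    """
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        has_trait = rng.random() < true_pi
        if rng.random() < p_truth:
            yes += int(has_trait)  # truthful answer to the sensitive question
        else:
            yes += 1  # forced "Yes", unrelated to the true status
    return yes


# Hypothetical illustration: 20,000 simulated respondents, prevalence 0.30.
yes = simulate_forced_response(n=20000, true_pi=0.30, p_truth=0.75, seed=1)
print(forced_response_estimate(yes, 20000, 0.75))
```

Because an individual “Yes” may be either truthful or forced, no single answer reveals the respondent's status, yet the aggregate proportion of “Yes” answers still identifies the prevalence.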
A great deal of the work spanning the literature since Warner (1965) on the IQTs is condensed in the monographs of Fox and Tracy (1986), Chaudhuri and Mukerjee (1988), Chaudhuri (2011), Chaudhuri and Christofides (2013), Tian and Tang (2014), Chaudhuri et al. (2016) and Fox (2016).
From the different IQTs, in our study we have employed the RR crossed model (CM) proposed by Lee et al. (2013) to simultaneously estimate, from a single sample, the prevalence of two sensitive attributes, say
Let
and
We note that, in real studies, the estimation procedure may lead to estimates of
In order to evaluate the effectiveness of the CM, we conducted a real study investigating two sensitive topics, namely cannabis use and cannabis legalization. This Section is, therefore, devoted to describing the salient aspects of our study.
Why use the crossed model
Measuring levels and patterns of illicit drug use, their determinants, related behaviors and attitudes requires the use of self-reported methods of investigation. However, the validity of self-reported data has long been questioned (see, e.g., Harrison & Hughes, 1997) and assessed by using urinalysis, blood and hair analyses. Although less intrusive survey methods, such as CATI, ACASI and CAWI, are currently used to increase confidentiality, results still suffer from errors mostly ascribable to misreporting. For instance, some studies show that individuals under criminal justice supervision are loath to report drug use on confidential and anonymous surveys, while others emphasize that a non-negligible percentage of individuals who test positive for drugs by urinalysis deny having used drugs. Underreporting of drug consumption is therefore evident. Similarly, there are reasons to believe that a lesser degree of misreporting may also affect polling findings concerning the legalization of soft drugs, mostly where smoking cannabis is regarded as morally wrong.
The nature of the discussed topics well suggests the use of alternative survey modes which may limit the shortcomings of traditional approaches. This idea inspired us to conduct a sensitive research about illegal cannabis use (hereafter identified as the attribute
It is reasonable to assume that cannabis use is highly stigmatizing because of the social desirability effect or out of fear of punishment by law. On the other hand, cannabis legalization is reasonably assumed to be a less sensitive issue. Consequently, we expect to get higher estimates for the prevalence of promoters of cannabis legalization than for the prevalence of cannabis use.
Some data on cannabis use and cannabis legalization in Italy
Cannabis is the most widely consumed illicit drug in the world today, especially by teens, young people and pregnant women. The latest available data collected in Italy for the year 2017 indicate that about one-third (33.1%) of people aged 15–64 years have used cannabis at least once. Cannabis use is more frequent among men (39.1%) than women (26.4%). The prevalence of people who consumed cannabis during 2017 is 10.2% (12.6% for men and 7.8% for women). The highest prevalence (20.7%) of last-year cannabis use is concentrated among young adults. More detailed information may be found in the 2018 Annual Report to the Parliament on the Use of Drugs in Italy produced by the Anti-Drug Policies Department (Dipartimento Politiche Antidroga, 2018, p. 62) set up at the Presidency of the Council of Ministers.1
Although cannabis is considered a soft drug and less damaging to health than hard drugs (e.g. ecstasy and cocaine), in many countries it is forbidden to use, possess, grow or sell cannabis. Nevertheless, in some parts of the world the possession of a “small amount” of cannabis has been decriminalized and use is allowed for medical/religious purposes, or is even legal. Support for making cannabis legal is nowadays increasing around the world. Many people strongly advocate cannabis legalization in the belief that a decriminalization policy would reduce consumption, particularly among younger age groups, eliminate the illegal trade and associated crime, yield a valuable source of tax revenue to reinvest in welfare and reduce policing costs. In Italy, the percentage of people aged 18 or older who support cannabis legalization has been estimated at 73% by Ipsos Public Affair (2015). According to the opinion poll conducted by the agency SWG in 2014, nearly 46% of interviewees support cannabis legalization or decriminalization, with the peak at 69% observed for the voters of the Democratic Party. Similar findings can be found in a study conducted by Skuola.net2 among about 1500 students aged 11–25, where it is reported that the percentage of those who support legalization is nearly 41% while 40% are not in favour; the remainder do not have a precise opinion in this regard. More recent data released by Eurispes (2019) show that the percentage of Italians who claim to support the legalization of soft drugs is 43.9% (51.1% for men and 36.8% for women), while among young people aged 18–24 the percentage increases up to 63% (61.1% for 25–34 years old).
The survey plan
A mixed-mode study was conducted in a municipality of about 5,000 inhabitants located in southern Italy with a twofold goal. Firstly, we aimed at simultaneously estimating the prevalence of individuals who have used cannabis at least once in their life and of those who were in favour of its legalization. Secondly, we were interested in evaluating the impact of the interviewer on the DQ survey mode. To serve our purposes, data were collected on the same people using both the CM and the DQ survey format through face-to-face interviews. The fieldwork was carried out by a single interviewer, a qualified and well-instructed final-year master student in Statistics who was well known to the residents of the municipality. The interviewer recruited a convenience sample of respondents via personal contacts in the two main squares and streets of the municipality. The interviews were collected during alternating weekends between September and December 2018. On average, each interview took about 20 minutes to complete. In order to enhance cooperation, contacted people were strongly assured that the collected data would be kept strictly confidential and would be used anonymously only for scientific purposes by the research group from the University of Calabria which planned the survey. Hence, in advance of data collection, the interviewer explained to the survey participants all the phases of the survey and obtained their informed consent.
Participants were first submitted to a face-to-face interview using a short paper-and-pencil standardized questionnaire containing some generic sociodemographic information about gender, age, education, employment status, marital status and number of children if any. Then, after a collaborative and confidential atmosphere had been established, participants were provided with the two decks of cards needed to run the CM. Finally, DQ was performed.
As described in Section 3, we prepared two decks of cards: Deck-I contained a proportion
After the CM was carried out and the responses collected, the interviewer passed to the third part of the survey by posing the two sensitive questions to the respondents directly:
D1: “Have you ever used cannabis at least once in your life?” D2: “Are you in favour of cannabis legalization?”
In this phase (hereafter DQ1), it is expected that the presence of the interviewer, who has established a trusting rapport with the respondents, will encourage them to cooperate and reduce to some extent the embarrassment of answering truthfully. To verify this working hypothesis and evaluate the impact of the interviewer on the estimates, DQ was repeated a second time. Thus, in the fourth stage of the survey (hereafter DQ2), the interviewer collected new responses after posing the following request:
“What would have been your answer to my two previous questions if I had not been a fellow-citizen of yours and you had not known me in advance and/or you had not a trusting rapport with me? Now, please, imagine you never have known me and/or you do not trust in me, and answer to my questions D1 and D2 again”
Under the two stages DQ1 and DQ2, we expect that the results may be significantly different for question D1.
Testing the differences between the CM and DQ survey modes
In order to test the significance of the difference in the prevalence estimates obtained across CM, DQ1 and DQ2, and hence to assess the effect on the respondents of the three survey modes, we have conceived and implemented an “adjusted version” of the McNemar (1947) test for paired data to evaluate CM vs DQ1 and CM vs DQ2. It is worth observing that here the McNemar test is used in its “standard version” to ascertain whether a significant difference exists in the responses of the survey participants under both DQ1 and DQ2 stages. On the contrary, when the CM is used, since survey participants provide misclassified responses, it is not possible to disentangle responses related to attribute
Our procedure can be summarized in the following steps:
Based on a sample of size Fill the above table with the joint frequencies as follows:
Derive the contingency table for the CM responses (Yes, Yes), (Yes, No), (No, Yes) and (No, No) from Deck 1 and Deck 2 conditioned to the subsample of respondents bearing the attribute(s) Compute the number of respondents bearing
Complete the contingency table for the McNemar test as follows:
The value of the
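For the DQ1 versus DQ2 comparison, where each participant gives a Yes/No answer under both stages, the standard McNemar test depends only on the two discordant counts. The sketch below is a minimal Python illustration of that standard version, without continuity correction. It does not reproduce the adjusted procedure devised for the CM comparisons. The function name and the counts in the example are hypothetical and are not the study's data.

```python
import math


def mcnemar_test(b, c):
    """Standard McNemar test for paired binary responses.

    b: number of respondents answering "Yes" under the first mode and "No"
       under the second; c: number answering "No" then "Yes".
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    if b < 0 or c < 0:
        raise ValueError("discordant counts must be non-negative")
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical discordant counts, for illustration only.
stat, p_value = mcnemar_test(b=15, c=5)
print(stat, p_value)
```

Concordant pairs (Yes/Yes and No/No) do not enter the statistic, so the test reacts only to respondents who changed their answer between the two stages.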
At the end of the data-collection phase, the analyzable cases included 289 participants, aged 16–60 years, who correctly gave all the information requested during the face-to-face interview under the three different survey modes, CM, DQ1 and DQ2. With regard to the characteristics of the respondents, gender is almost equally represented (53.3% men against 46.7% women) and about 40% of respondents are younger than 30 years. Looking at the employment status, about 50% of interviewees are working, 33% do not have a job and 17% are students. Most of the survey participants claimed to have a medium/high level of education (66.4%), while the remainder have a low level (compulsory education). Just over half of the respondents are married/cohabiting and without children (52.3% and 52.6%, respectively).
Table 1 shows point estimates for
Point estimates through the CM and DQ survey modes. For each of the three pairwise differences between survey-mode estimates, one symbol denotes significance at the 5% level (p-value threshold 0.05) and a second symbol denotes significance at the 1% level.
Proportion of observed responses for the CM
the ascertainment that although cannabis use denotes a stigmatizing and illegal behavior, estimates for estimates for
Similar findings may be recognized for subgroups of the population although, due to the reduced sample size and the extra variability induced by the CM, some results may appear, in a certain sense, unexpected. For instance, in the group of men, the estimated percentage of cannabis users is lower under CM (50.7%) than under DQ1 (58.4%). The same holds for the prevalence of those in favour of cannabis legalization, which is lower under CM (59.1%) than under DQ1 (72.1%) and DQ2 (71.4%). The latter outcome is perhaps not so surprising since favoring cannabis legalization may denote positive and desirable personal traits, such as an open mind and modern social views. Consequently, some subgroups of the population, in contrast to the underreporting of cannabis use, may overreport this attitude when it is surveyed by DQ and produce misleadingly high estimates. In this case, the CM estimates may provide a more reliable picture of reality and the “less-is-better” assumption could be applied to validate the CM.
Looking at the estimates obtained using the CM, we observe that cannabis use is more frequently reported by men (50.7%) than women (43%), and that cannabis consumers are more concentrated among people with a job (61.1%). In the same category, not surprisingly, we also find the highest estimated percentage of people who declare that they support cannabis legalization (80.6%). A high percentage of cannabis users (54.4%) is found in the age group 16–30 years, confirming the tendency that emerges from official statistics according to which the highest prevalence of those who have used cannabis in the previous 12 months is concentrated in young people (see Section 4.2). People with children and with a more stable relationship (married/cohabiting) seem to be less prone to cannabis use.
In this paper, we discussed the differences in the estimates of the prevalence of two sensitive attributes when different questioning survey modes are employed. The survey design was conceived as a mixed-mode research which allowed us to compare both the DQ and IQ approaches in the presence of the interviewer for the simultaneous estimation of the prevalence of cannabis users and the prevalence of people supporting its legalization.
The results of the study highlight some important issues. First, significant improvements in the prevalence estimates can be achieved when the IQ mode is used rather than questions asked by the interviewer directly. In particular, in almost all considered cases, the randomized response CM led to higher estimates than DQ1 and DQ2 and, according to the “more-is-better” assumption, we can conclude that IQ worked better than the direct mode. Large differences were found especially when comparing CM and DQ2.
Moreover, we intentionally considered two correlated sensitive issues with different levels of sensitivity that, as expected, induced survey participants to behave differently when responding. Differences in the prevalence estimates due to sensitivity can be recognized across the three survey modes employed and, therefore, the idea that respondents were more inclined to answer truthfully questions that they perceived as less stigmatizing is confirmed.
Finally, the results demonstrated that the presence of a trustworthy interviewer plays an important role in enhancing respondent cooperation. Comparison of prevalence estimates through DQ1 and DQ2 suggested that, when the DQ mode is performed, the estimation process can be significantly improved if a trusting rapport is established between interviewer and respondent.
To conclude, we would like to remind the reader to be cautious about drawing overly strong conclusions from the results discussed here, since the study involved a convenience sample and tested different working hypotheses on the same set of data. Future research could certainly give a deeper insight into the investigated matters.
Footnotes
See also
See
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
