This paper suggests a new randomized response model for gathering information on a quantitative sensitive variable such as drug usage, tax evasion or induced abortions. The resulting estimator is found to be more efficient than the estimator of Saha (2007) under some realistic conditions. The results are illustrated numerically.
In sociological, health, economic and psychological surveys, people often do not respond truthfully when asked personal or sensitive questions, or they refuse to answer at all. In such situations, procedures that protect anonymity offer a solution. The randomized response (RR) technique and the scrambled response procedure are two extensively used ways of protecting anonymity. The randomized response technique, pioneered by Warner (1965), is an ingenious interviewing procedure for eliciting information on sensitive data while ensuring that respondents' privacy is protected. The scrambled response technique was initiated by Pollock and Bek (1976). Further work in this area has been carried out by various authors, including Himmelfarb and Edgell (1980), Eichhorn and Hayre (1983), Singh et al. (1998), Bar-Lev et al. (2004), Singh and Mathur (2005), Gupta et al. (2006), Saha (2007), Gupta and Shabbir (2008), Gjestvang and Singh (2009), Diana and Perri (2010, 2011), Gupta et al. (2012), Perri and Diana (2013), Hussain and Al-Zahrani (2016), Priyanka et al. (2017) and Priyanka and Trisandhya (2019).
In this paper, following the procedure adopted by Gjestvang and Singh (2009) and taking a cue from Saha (2007), we develop a new randomized response model and an estimator of the mean of a quantitative sensitive variable. Analytical and numerical comparisons of efficiency are carried out to establish the conditions under which the proposed model improves upon Saha's model.
Saha’s (2007) scrambled randomized response model and estimator
Consider a finite population of units and a quantitative sensitive variable under study whose mean and variance are assumed to be unknown. Consider also two positive scrambling random variables, independent of each other and of the sensitive variable, whose distributions are known, so that their means and variances are known as well.
Several devices are available in the literature for estimating the population mean of the variable under investigation; see, for instance, Odumade and Singh (2009). Most of them apply a coding mechanism to the response, i.e. respondents are asked to algebraically perturb their true value using one or more random numbers generated from known scrambling distributions. Saha (2007) suggested gathering information on the sensitive variable by asking the interviewee to report the scrambled randomized response
Here the interviewer is completely unaware of the random numbers used to scramble the true responses, but has complete knowledge of the scrambling distributions. The respondent does not reveal the scrambling numbers to anyone. This model combines the multiplicative and additive models to give respondents greater confidence that their privacy is protected.
A simple random sample (SRS) is drawn with replacement from the population, and the scrambled randomized response is recorded for each selected individual. Then we have
Thus an unbiased estimator for the population mean is given by
where .
The variance of the estimator is given by
where the coefficients appearing in the variance are the squares of the coefficients of variation of the respective variables.
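As a rough numerical check of the unbiasedness and variance statements for a combined scrambling model, the following Python sketch simulates a scrambled randomized response survey. It assumes that the combined multiplicative and additive model takes the form z = w(y + u), where y is the true value and w and u are the two scrambling variables; under that assumption the estimator is the sample mean of z divided by the mean of w, minus the mean of u. The population values, the gamma and uniform scrambling distributions, and all function names are hypothetical choices made only for illustration.

```python
import random
import statistics


def scramble(y, w, u):
    """Assumed combined model: add noise u to y, then multiply by noise w."""
    return w * (y + u)


def estimate_mean(responses, mu_w, mu_u):
    """Unbiased under the assumed model, since E[z] = mu_w * (mu_y + mu_u)."""
    return statistics.fmean(responses) / mu_w - mu_u


def theoretical_variance(population, n, mu_w, var_w, mu_u, var_u):
    """Variance of estimate_mean under SRS with replacement (assumed model)."""
    mu_y = statistics.fmean(population)
    var_y = statistics.pvariance(population)
    cw2 = var_w / mu_w ** 2  # squared coefficient of variation of w
    return (var_y + var_u + cw2 * (var_y + var_u + (mu_y + mu_u) ** 2)) / n


rng = random.Random(2024)
population = [rng.uniform(0.0, 10.0) for _ in range(500)]  # hypothetical values
mu_y = statistics.fmean(population)

# Hypothetical scrambling distributions with known moments.
MU_W, VAR_W = 1.0, 0.04       # w ~ Gamma(shape=25, scale=0.04)
MU_U, VAR_U = 2.0, 1.0 / 3.0  # u ~ Uniform(1, 3)

n, reps = 50, 2000
estimates = []
for _ in range(reps):
    sample = rng.choices(population, k=n)  # simple random sampling with replacement
    z = [
        scramble(y, rng.gammavariate(25.0, 0.04), rng.uniform(1.0, 3.0))
        for y in sample
    ]
    estimates.append(estimate_mean(z, MU_W, MU_U))

v_theory = theoretical_variance(population, n, MU_W, VAR_W, MU_U, VAR_U)
print("true mean:            ", round(mu_y, 4))
print("average of estimates: ", round(statistics.fmean(estimates), 4))
print("empirical variance:   ", round(statistics.pvariance(estimates), 4))
print("theoretical variance: ", round(v_theory, 4))
```

Across replications the average estimate should sit close to the true mean, and the empirical variance close to the theoretical value; only the known moments of the scrambling variables enter the estimator, never the individual scrambling numbers.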
Proposed randomized response model and estimator
Moving along the direction traced by Saha (2007), we consider a scrambled randomized response model in the way suggested by Gjestvang and Singh (2009). Let and be two known positive real numbers. Consider a deck of cards in which is the proportion of cards bearing the statement: and be the proportion of cards bearing the statement: . Let be known. Each respondent is asked to draw one card secretly and report the scrambled response accordingly. Therefore, the response to the sensitive question is
Let be the expected value over all possible samples and be the expected value over the randomized device, then
Thus an unbiased estimator of the population mean is given by
where .
Let be the variance over all possible samples and be the variance over the randomization device; then we have
Now,
Since
and
therefore,
Substituting Eq. (10) into Eq. (9), we obtain the variance of the proposed estimator as
where
An estimator of the variance of the suggested estimator of the population mean is given by
Thus the proposed estimator is more efficient than the estimator of Saha (2007) as long as the conditions in Eq. (14) are satisfied.
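The derivation above takes expectations and variances in two stages, first over the randomization device and then over samples, which relies on the standard decomposition: total variance = E1[V2] + V1[E2]. The Python sketch below verifies this identity exactly on a toy example. The two-card device used here (report y + a·s with probability p, otherwise y − b·s, with p = b/(a + b)) is a hypothetical stand-in chosen only for illustration and is not the model proposed in this paper; the population values and the scrambling distribution are likewise assumptions.

```python
from fractions import Fraction as F

population = [F(2), F(5), F(9)]          # hypothetical sensitive values
s_dist = {F(1): F(1, 2), F(3): F(1, 2)}  # hypothetical scrambling variable s
a, b = F(1), F(3)                        # hypothetical known positive constants
p = b / (a + b)                          # makes the device unbiased for y


def device_outcomes(y):
    """All (response, probability) pairs for a respondent with true value y."""
    outcomes = []
    for s, prob_s in s_dist.items():
        outcomes.append((y + a * s, p * prob_s))        # first card type
        outcomes.append((y - b * s, (1 - p) * prob_s))  # second card type
    return outcomes


def mean_var(outcomes):
    """Mean and variance of a discrete distribution given as (value, prob) pairs."""
    m = sum(z * pr for z, pr in outcomes)
    v = sum((z - m) ** 2 * pr for z, pr in outcomes)
    return m, v


# Stage 2: moments over the randomization device, conditional on each unit.
stage2 = [mean_var(device_outcomes(y)) for y in population]

# Stage 1: one unit drawn with equal probability from the population.
w = F(1, len(population))
e1_e2 = sum(w * m for m, _ in stage2)
e1_v2 = sum(w * v for _, v in stage2)
v1_e2 = sum(w * (m - e1_e2) ** 2 for m, _ in stage2)

# Direct computation over the joint distribution of (unit, card, s).
joint = [(z, w * pr) for y in population for z, pr in device_outcomes(y)]
total_mean, total_var = mean_var(joint)

print("E1[E2] =", e1_e2, " direct mean =", total_mean)
print("E1[V2] + V1[E2] =", e1_v2 + v1_e2, " direct variance =", total_var)
```

Exact rational arithmetic is used so that the two sides of the decomposition can be compared for equality rather than up to rounding error.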
Numerical illustration
To shed light on the performance of the proposed model relative to Saha's model, we compute the percent relative efficiency (PRE) of the proposed estimator with respect to the estimator of Saha (2007) using the following formula:
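A minimal sketch of this computation in Python, assuming the usual definition of PRE as the variance of the reference estimator (here, Saha's) divided by the variance of the proposed estimator, multiplied by 100. The function name and the example variances are hypothetical.

```python
def percent_relative_efficiency(var_reference, var_proposed):
    """PRE of the proposed estimator with respect to the reference estimator.

    Assumes the usual definition: 100 * Var(reference) / Var(proposed).
    """
    if var_reference <= 0 or var_proposed <= 0:
        raise ValueError("variances must be positive")
    return 100.0 * var_reference / var_proposed


# Hypothetical variances, for illustration only.
pre = percent_relative_efficiency(var_reference=3.0, var_proposed=2.0)
print("PRE (%):", pre)
```

A value above 100 indicates that the proposed estimator has the smaller variance.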
We consider a sensitive variable with given mean and with coefficient of variation ranging from 0.1 to 2.0, and we assume that the scrambling variables are Fisher distributed, as in Eichhorn and Hayre (1983) and Diana and Perri (2010).
In both cases the PRE is greater than 100%. Thus the proposed estimator is more efficient than the estimator of Saha (2007) for the parameter values reported in Tables 1 and 2;
The PRE decreases with increasing values of in both cases;
The gain in efficiency from using the proposed estimator over the estimator of Saha (2007) is substantial for smaller values of the coefficient of variation and if ‘’ is closer to zero;
The gain in efficiency is larger in Case-II than in Case-I, except in a few cases.
Thus for a fixed value of , to obtain a considerable gain in efficiency from using the proposed estimator over the estimator due to Saha (2007), a suitable selection of , based on a practicable value of , should be made such that their product remains near zero. Gjestvang and Singh (2009) note that a practical choice of can be fixed from experience gained in repeated surveys.
Acknowledgments
The authors are thankful to the learned referee for valuable comments that helped improve the earlier draft of the paper.
References
1. Bar-Lev, S. K., Bobovitch, E., & Boukai, B. (2004). A note on randomized response models for quantitative data. Metrika, 60, 255-260.
2. Diana, G., & Perri, P. F. (2010). New scrambled response models for estimating the mean of a sensitive quantitative character. Jour Appl Statist, 37(11), 1875-1890.
3. Diana, G., & Perri, P. F. (2011). A class of estimators for quantitative sensitive data. Statistical Papers, 52, 633-650.
4. Eichhorn, B. H., & Hayre, L. S. (1983). Scrambled randomized response methods for obtaining sensitive quantitative data. Jour Statist Plan Infer, 7, 307-316.
5. Gupta, S. N., & Shabbir, J. (2008). On improvement in estimating the population mean in simple random sampling. Jour Appl Statist, 35(5), 559-566.
6. Gupta, S. N., Shabbir, J., Sousa, R., & Corte-Real, P. (2012). Estimation of the mean of a sensitive variable in the presence of auxiliary information. Commun Statist-Theo Meth, 41(13-14), 2394-2404.
7. Gupta, S. N., Thorton, B., Shabbir, J., & Singhal, S. (2006). A comparison of multiplicative and additive optional RRT models. Jour Statist Theor Appl, 5(3), 226-239.
8. Gjestvang, C. R., & Singh, S. (2009). An improved randomized response model: Estimation of mean. Jour Appl Statist, 36(12), 1361-1367.
9. Himmelfarb, S., & Edgell, S. E. (1980). Additive constant model: A randomized response technique for eliminating evasiveness to quantitative response questions. Psychol Bull, 87, 525-530.
10. Hussain, Z., & Al-Zahrani, B. (2016). Mean and sensitivity estimation of a sensitive variable through additive scrambling. Commun Statist-Theo Method, 45(1), 182-193.
11. Perri, P. F., & Diana, G. (2013). Scrambled response models based on auxiliary variables: Advances in theoretical and applied statistics, Springer Verlag, Berlin, doi: 10.1007/978-3-642-35588-2-26.
12. Pollock, K. H., & Bek, Y. (1976). A comparison of three randomized response models for quantitative data. Jour Amer Statist Assoc, 71(356), 884-886.
13. Priyanka, K., Trisandhya, P., & Mittal, R. (2017). Dealing sensitive characters on successive occasions through a general class of estimators using scrambled response technique. Metron, doi: 10.1007/s40300-017-0131-1.
14. Priyanka, K., & Trisandhya, P. (2019). Some classes of estimators for sensitive population mean on successive moves. Jour Statist Theo Practice, doi: 10.1007/s42519-018-0008-5.
15. Saha, A. (2007). A simple randomized response technique in complex surveys. Metron, 65(1), 59-66.
16. Singh, H. P., & Mathur, N. (2005). Estimation of population mean when coefficient of variation is known using scrambled response technique. Jour Statist Plan Infer, 131, 135-144.
17. Singh, S., Horn, S., & Choudhary, S. (1998). Estimation of stigmatized characteristics of a hidden gang in finite population. Austral and New Zealand Jour Statist, 40, 291-297.
18. Warner, S. L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Jour Amer Statist Assoc, 60, 63-69.