This paper considers the problem of estimating the population total of a sensitive variable under a unified approach. Odumade and Singh (2008) suggested a forced quantitative randomized response model together with an unbiased estimator of the population total of the sensitive variable. The optimum version of their estimator, however, depends on the unknown population parameter under investigation, which limits its practical utility. To overcome this drawback, Odumade and Singh (2008) suggested another unbiased estimator of the population total based on two independent samples, which increases the cost of the survey. Keeping this in view, we define an alternative randomized response procedure, and hence an estimator of the population total of the sensitive variable, based on one sample only; it is free from this limitation and more efficient than the Odumade and Singh (2008) estimator. The relative efficiency of the proposed estimator with respect to the Bar-Lev et al. (2004) and Odumade and Singh (2008) estimators is examined through numerical illustration.
The feasibility of using the randomized response technique (RRT) (see Warner (1965)) for studying the proportion of a population having a sensitive characteristic (one that most people would be reluctant to admit having) has been enhanced by a series of theoretical works and some field tests conducted by several investigators. For example, see Horvitz et al. (1967); Greenberg et al. (1969); Moors (1997); Mangat and Singh (1990); Kuk (1990); Mangat (1994); Nayak (1994); Bhargava (1996); Zou (1997); Bhargava and Singh (2001, 2002); Gjestvang and Singh (2006); Kim and Elam (2005); Kim and Warde (2005); and Odumade and Singh (2009, 2010).
Eichhorn and Hayre (1983) suggested a multiplicative model for collecting information on sensitive quantitative variables such as income tax evasion or the amount of drugs used. In their model, each respondent in the sample is requested to report the scrambled response $Z = SY$, where $Y$ is the real value of the sensitive quantitative variable and $S$ is a scrambling variable whose distribution is assumed to be known. In other words, the mean and variance of $S$ are assumed to be known and positive. An unbiased estimator of the population total under simple random sampling with replacement (SRSWR) is then given by:
with variance:
where , and .
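The multiplicative scrambling described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' code: the function names, the uniform population, and the Uniform(0.5, 1.5) scrambling distribution are all assumptions made for illustration. It relies only on the fact that, when the mean of the scrambling variable (called `theta` here) is known, dividing the mean scrambled response by `theta` and scaling by the population size gives an unbiased estimate of the total under SRSWR.

```python
import random

def eh_response(y, draw_s):
    """Scrambled response Z = S * Y; S is drawn privately by the respondent."""
    return draw_s() * y

def eh_total_estimate(responses, N, theta):
    """Estimate the population total as N * mean(Z) / theta under SRSWR."""
    n = len(responses)
    return N * sum(responses) / (n * theta)

# Hypothetical population and scrambling distribution (illustrative only).
rng = random.Random(0)
population = [rng.uniform(10, 50) for _ in range(1000)]
theta = 1.0                               # assumed known mean of S
draw_s = lambda: rng.uniform(0.5, 1.5)    # S ~ Uniform(0.5, 1.5), so E(S) = 1

sample = [rng.choice(population) for _ in range(100)]   # SRSWR, n = 100
responses = [eh_response(y, draw_s) for y in sample]
estimate = eh_total_estimate(responses, N=len(population), theta=theta)
true_total = sum(population)
print(round(estimate), round(true_total))
```

The interviewer sees only `responses`, never the true values in `sample`, which is the privacy mechanism the model relies on.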
Bar-Lev, Bobovitch and Boukai (2004), hereafter BBB, further proposed a quantitative randomized response (RR) procedure which generalizes that of Eichhorn and Hayre (1983). In the BBB model, the distribution of the responses is given by
$$Z = \begin{cases} Y & \text{with probability } p, \\ SY & \text{with probability } 1 - p. \end{cases}$$
In other words, each respondent is requested to rotate a spinner unobserved by the interviewer. If the spinner stops in the shaded area, the respondent is requested to report the real response on the sensitive variable, say $Y$; if it stops in the non-shaded area, the respondent is requested to report the scrambled response, say $SY$, where $S$ is any scrambling variable whose distribution is assumed to be known. Assume that the mean and variance of $S$ are known. Let $p$ be the proportion of the shaded area of the spinner and $1 - p$ the proportion of the non-shaded area, as shown in Fig. 1.
Bar-Lev, Bobovitch, and Boukai (2004, BBB) randomized response device.
Odumade and Singh (2008, GFQRR) randomized response device.
An unbiased estimator of population total is given by:
with variance under SRSWR sampling given by
where
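The spinner mechanism of the BBB model can be sketched in the same way. The code below is an illustrative assumption rather than the paper's own expression: it takes `p` as the probability of a direct response and `theta` as the known mean of the scrambling variable, so that the expected response of a respondent with true value Y is Y(p + (1 - p)theta), and it rescales the mean response accordingly. All names and numerical settings are hypothetical.

```python
import random

def bbb_response(y, p, draw_s, rng):
    """Report Y with probability p, otherwise the scrambled value S * Y."""
    return y if rng.random() < p else draw_s() * y

def bbb_total_estimate(responses, N, p, theta):
    """N * mean(Z) / (p + (1 - p) * theta), using E(Z | Y) = Y * (p + (1 - p) * theta)."""
    n = len(responses)
    return N * sum(responses) / (n * (p + (1 - p) * theta))

# Hypothetical settings (illustrative only).
rng = random.Random(1)
population = [rng.uniform(10, 50) for _ in range(1000)]
p, theta = 0.7, 2.0
draw_s = lambda: rng.uniform(1.0, 3.0)    # S ~ Uniform(1, 3), so E(S) = 2

sample = [rng.choice(population) for _ in range(100)]   # SRSWR, n = 100
responses = [bbb_response(y, p, draw_s, rng) for y in sample]
estimate = bbb_total_estimate(responses, N=len(population), p=p, theta=theta)
true_total = sum(population)
print(round(estimate), round(true_total))
```

Setting `p = 0` makes every response scrambled, which corresponds to the purely multiplicative model of Eichhorn and Hayre (1983).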
Further, Odumade and Singh (2008) suggested a generalized forced quantitative randomized response (GFQRR) model, which is described below.
Odumade and Singh (2008) RR model
Consider a finite population consisting of $N$ units. Let $Y_i$ be the value of the sensitive quantitative variable for the $i$th population unit, and let $\pi_i$ be the probability of including the $i$th unit of the population in the sample under the sampling design. For estimating the population total of the sensitive quantitative variable, Odumade and Singh (2008) suggested a forced quantitative randomized response model under a unified approach. In their model, the $i$th respondent selected in the sample is requested to rotate a spinner bearing three statements:
Report the real value of the sensitive variable, $Y_i$, with probability $P_1$,
Report the scrambled response $Y_i S$, with probability $P_2$,
Report the fixed response $F$, with probability $P_3$,
where $P_1 + P_2 + P_3 = 1$ and $S$ is a scrambling variable whose distribution is assumed to be known. In other words, if $E_R$ and $V_R$ denote the expected value and the variance over the randomization device used in the survey, then $E_R(S) = \theta$ and $V_R(S) = \gamma^2$ are assumed to be positive and known. Consequently, the distribution of the $i$th response is given by
$$Z_i = \begin{cases} Y_i & \text{with probability } P_1, \\ Y_i S & \text{with probability } P_2, \\ F & \text{with probability } P_3. \end{cases}$$
Thus an unbiased estimator of the population total is given by
where $d_i = 1/\pi_i$ is called the design weight.
The variance of is given by
where , denote the probability of including both th and th units in the sample, and .
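The forced-response device and a design-weighted estimator of the total can be sketched as follows. This is one estimator consistent with the three-statement description, under stated assumptions, and the source's own expression may be written differently: the names `p1`, `p2`, `fixed` and `theta` are hypothetical labels for the three spinner probabilities' first two values, the fixed response, and the known mean of the scrambling variable. Since the expected response of a unit with true value Y is Y(p1 + p2 * theta) + p3 * fixed, each response is recentred and rescaled before applying the design weight.

```python
import random

def gfqrr_response(y, p1, p2, fixed, draw_s, rng):
    """Three-outcome spinner: true value, scrambled value, or fixed response."""
    u = rng.random()
    if u < p1:
        return y                  # statement 1: real value
    if u < p1 + p2:
        return draw_s() * y       # statement 2: scrambled response
    return fixed                  # statement 3: fixed response

def gfqrr_total_estimate(responses, weights, p1, p2, fixed, theta):
    """Sum of design-weighted, recentred and rescaled responses."""
    p3 = 1.0 - p1 - p2
    scale = p1 + p2 * theta
    return sum(d * (z - p3 * fixed) / scale for d, z in zip(weights, responses))

# Hypothetical settings (illustrative only): SRSWOR with equal weights N / n.
rng = random.Random(2)
population = [rng.uniform(10, 50) for _ in range(1000)]
N, n = len(population), 100
p1, p2, fixed, theta = 0.7, 0.2, 30.0, 1.0
draw_s = lambda: rng.uniform(0.5, 1.5)    # E(S) = 1

sample = rng.sample(population, n)        # simple random sample without replacement
responses = [gfqrr_response(y, p1, p2, fixed, draw_s, rng) for y in sample]
estimate = gfqrr_total_estimate(responses, [N / n] * n, p1, p2, fixed, theta)
print(round(estimate), round(sum(population)))
```

Passing unequal `weights` covers other sampling designs, which is the sense in which the approach is unified across designs.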
The variance of the estimator at Eq. (9) is minimum when
Thus the minimum variance of the estimator due to Odumade and Singh (2008) is given by
It is to be noted that the optimum value of the fixed response $F$ at Eq. (10) depends upon the unknown population values, and hence the resulting estimator is not practically viable. To overcome this difficulty, Odumade and Singh (2008) suggested taking two independent random samples from the population, each under its own sampling design, and developed an estimator of the population total that is free from this difficulty. However, drawing two independent samples from the population increases the cost of the survey. In view of the above discussion, we suggest two alternative randomized response models based on one sample only. The suggested randomized response models are more efficient than the Odumade and Singh (2008) randomized response model.
In the following sections, we define two randomized response models based on and along with their properties.
Proposed randomized response model-I (RRM-I)
In the proposed RRM-I, the $i$th respondent selected in the sample is requested to rotate a spinner bearing three statements:
Report the real value of the sensitive variable, , with probability ,
Report the scrambled response , with probability ,
Report the scrambled response , with probability ,
where $S$ is a scrambling variable whose distribution is assumed to be known.
Thus the distribution of the $i$th response is given by
Randomized response model-I device.
Randomized response model-II device.
Now, we have the following theorems.
Theorem 1. An unbiased estimator of the population total is given by
where .
Proof Let and be the expected values over the design and the randomization device, say spinner, thus we have
which shows that the estimator is an unbiased estimator of the population total . Thus the theorem is proved.
Theorem 2. The variance of the proposed estimator is given by
where .
Proof Let and denote the variance over the randomization device, say spinner, and over the design, we have
which proves the theorem.
Special cases
Case I If , and , then the suggested randomized response model (RRM-I) reduces to the Eichhorn and Hayre (1983) model.
Case II If , and , then the proposed randomized response model (RRM-I) reduces to the Bar-Lev et al. (2004) model.
Relative efficiency
Under simple random sampling without replacement (SRSWOR), we have $\pi_i = n/N$ and $\pi_{ij} = n(n-1)/\{N(N-1)\}$. Thus the minimum variance of the estimator due to Odumade and Singh (2008) at Eq. (11) and the variance of the proposed estimator at Eq. (15) respectively reduce to
and
where
To get a tangible idea of the performance of the suggested estimator relative to the Bar-Lev et al. (2004) estimator and the Odumade and Singh (2008) estimator, we have computed the percent relative efficiency (PRE) of the proposed estimator with respect to each of these estimators by using the formulae:
For fixed values of , the values of and increase as the value of the coefficient of variation of the scrambling variable increases.
For fixed values of , the values of and decrease with increasing values of the coefficient of variation of the sensitive variable .
For fixed values of , the values of and decrease as the value of increases.
For fixed values of , the value increases while the value of decreases with increasing values of .
For fixed values of and and , , the performance of the proposed estimator with respect to the Bar-Lev et al. (2004) estimator is poor.
It is observed from Table 1 that the proposed randomized response model (RRM-I) could be more profitable over the Bar-Lev et al. (2004) randomized response model than over the Odumade and Singh (2008) randomized response model if it is used with a scrambling variable having the mean value . We note that Gupta et al. (2002) have taken the value of the mean of the scrambling variable . It is further observed that the gain in efficiency from using the proposed randomized response model over the Bar-Lev et al. (2004) randomized response model is higher than the gain over the Odumade and Singh (2008) randomized response model for large values of . The gain in efficiency from using the proposed randomized response model over the Odumade and Singh (2008) randomized response model is larger than the gain over the Bar-Lev et al. (2004) randomized response model for moderately large values of . A substantial gain in efficiency over both the Bar-Lev et al. (2004) and Odumade and Singh (2008) randomized response models is seen for higher values of the coefficient of variation of the scrambling variable.
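The percent relative efficiencies discussed above are, by the usual convention, ratios of variances scaled by 100, with the variance of the proposed estimator in the denominator. A minimal sketch of that computation follows; the function name and the numerical variances are hypothetical and are not values from the paper.

```python
def percent_relative_efficiency(var_competitor, var_proposed):
    """PRE of the proposed estimator: 100 * V(competitor) / V(proposed)."""
    if var_proposed <= 0:
        raise ValueError("variance of the proposed estimator must be positive")
    return 100.0 * var_competitor / var_proposed

# Hypothetical variances, for illustration only.
pre = percent_relative_efficiency(var_competitor=250.0, var_proposed=200.0)
print(pre)  # 125.0
```

A value above 100 indicates that the proposed estimator has the smaller variance, and a value below 100 indicates that the competitor does.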
Proposed randomized response model-II (RRM-II)
In the suggested RRM-II, the $i$th respondent selected in the sample is requested to rotate a spinner bearing three statements:
Report the real value of the sensitive variable, , with probability ,
Report the scrambled response , with probability ,
Report the fixed response with probability ,
where $S$ is the scrambling variable defined earlier.
Thus the distribution of the $i$th response is given by
Consequently, we have the following theorem.
Theorem 3. An unbiased estimator of the population total is given by
where and .
Proof Let and be the expected values over the design and the randomization device, say spinner, thus we have
which proves the theorem.
Theorem 4. The variance of the suggested estimator is given by
Proof Let and denote the variance over the randomization device, say spinner, and over the design, we have
Under simple random sampling without replacement (SRSWOR), we have $\pi_i = n/N$ and $\pi_{ij} = n(n-1)/\{N(N-1)\}$. Thus the variance of the proposed estimator at Eq. (21) reduces to:
where
From Eqs. (5), (16) and (25), we have computed the percent relative efficiency of the proposed randomized response model (RRM-II) under SRSWOR with respect to the BBB model under SRSWR and the Odumade and Singh (2008) model under SRSWOR by using the formulae:
for 1000, 100, 0.7, 0.8, 0.9, , , , and .
Further, from Eqs. (17) and (25), we have computed the percent relative efficiency of the proposed RRM-II estimator with respect to the proposed RRM-I estimator by using the formula:
The PRE of the proposed RRM-II estimator with respect to the Bar-Lev et al. (2004) estimator, the Odumade and Singh (2008) estimator, and the proposed RRM-I estimator for $N = 1000$, $n = 100$ (i.e. $n/N = 0.1$)
For fixed values of , the values of and increase with increasing values of , the coefficient of variation of the scrambling variable .
For fixed values of , the values of and decrease as the value of , the coefficient of variation of the sensitive variable increases.
For fixed values of , the value of and decrease with increasing value of .
For fixed values of , the value of increases with increasing value of . At , 0.7 for ; the value of decreases as increases while for it increases with increasing value of . It is observed that at , 0.8, ; the value of decreases in a very slow manner with increasing value of and for it starts increasing with increasing value of . We also note that for fixed , and for all values of , the value of increases considerably while the value of increases slowly with increasing value of .
For ( 0.7, 0.8; ), the value of increases with increasing value of while the value of decreases. For ( 0.9, ) the value of increases as increases at , ( 0.9, ), the value of decreases while for , it increases with increasing value of .
The PRE of the RRM-II estimator with respect to the RRM-I estimator indicates that the two proposed estimators are almost equally efficient. That is, the performance of the proposed model RRM-II is on par with that of the proposed model RRM-I. Survey practitioners are therefore free to choose either of the two proposed randomized response models (RRM-I or RRM-II) in practice.
Conclusion
This article describes an improvement over the Bar-Lev et al. (2004) and Odumade and Singh (2008) randomized response models. We have suggested two alternatives to the forced quantitative randomized response model due to Odumade and Singh (2008). The proposed randomized response models also remove the drawback of the Odumade and Singh (2008) randomized response model. The numerical illustration shows that the proposed estimators of the population total of the sensitive variable are more efficient than the Bar-Lev et al. (2004) and Odumade and Singh (2008) estimators. We therefore recommend the proposed randomized response models, and the corresponding estimators of the population total of the sensitive variable, for use in practice.
Acknowledgments
The authors are thankful to the learned referee for his constructive suggestions regarding the improvement of the paper.
References
1. Bar-Lev, S. K., Bobovitch, E., & Boukai, B. (2004). A note on randomized response models for quantitative data. Metrika, 60, 255-260.
2. Bhargava, M. (1996). An investigation into the efficiencies of certain randomized response strategies. Unpublished Ph.D. thesis submitted to Punjab Agricultural University, Ludhiana, India.
3. Bhargava, M., & Singh, R. (2001). Efficiency comparison of certain randomized response schemes with U-model. Journal of the Indian Society of Agricultural Statistics, 54(1), 19-28.
4. Bhargava, M., & Singh, R. (2002). On the efficiency comparison of certain randomized response strategies. Metrika, 55(3), 191-197.
5. Eichhorn, B. H., & Hayre, L. S. (1983). Scrambled randomized response methods for obtaining sensitive quantitative data. Journal of Statistical Planning and Inference, 7, 307-316.
6. Gjestvang, C. R., & Singh, S. (2006). A new randomized response model. Journal of the Royal Statistical Society B, 68, 523-530.
7. Greenberg, B., Abul-Ela, A., Simmons, W. R., & Horvitz, D. G. (1969). The unrelated question randomized response: Theoretical framework. Journal of the American Statistical Association, 64, 529-539.
8. Gupta, S. N., Gupta, B. C., & Singh, S. (2002). Estimation of sensitivity level of personal interview survey questions. Journal of Statistical Planning and Inference, 100, 239-247.
9. Horvitz, D. G., Shah, B. V., & Simmons, W. R. (1967). The unrelated question randomized response model. Proceedings of the Social Statistics Section, American Statistical Association, 65-72.
10. Kuk, A. Y. C. (1990). Asking sensitive questions indirectly. Biometrika, 77, 436-438.
11. Kim, J.-M., & Elam, M. E. (2005). A two-stage stratified Warner's randomized response model using optimal allocation. Metrika, 61, 1-7.
12. Kim, J.-M., & Warde, W. D. (2005). A mixed randomized response model. Journal of Statistical Planning and Inference, 133, 211-221.
13. Mangat, N. S., & Singh, R. (1990). An alternative randomized response procedure. Biometrika, 77, 439-442.
14. Mangat, N. S. (1994). An improved randomized response strategy. Journal of the Royal Statistical Society B, 56, 93-95.
15. Moors, J. J. A. (1997). A critical evaluation of Mangat's two-step procedure in randomized response. Discussion paper at Center for Economic Research, Tilburg University, The Netherlands.
16. Nayak, T. K. (1994). On randomized response surveys for estimating a proportion. Communications in Statistics - Theory and Methods, 23(1), 3303-3321.
17. Odumade, O., & Singh, S. (2008). Generalized forced quantitative randomized response model: A unified approach. Journal of the Indian Society of Agricultural Statistics, 62(3), 244-252.
18. Odumade, O., & Singh, S. (2009). Improved Bar-Lev, Bobovitch, and Boukai randomized response models. Communications in Statistics - Simulation and Computation, 38(3), 473-502.
19. Odumade, O., & Singh, S. (2010). An alternative to the Bar-Lev, Bobovitch, and Boukai randomized response model. Sociological Methods and Research, 39(2), 206-221.
20. Warner, S. L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60, 63-69.
21. Zou, G. (1997). Two-stage randomized response procedures as single stage procedures. Australian Journal of Statistics, 39, 235-236.