Improved randomized response technique for estimating population proportion of a sensitive characteristic

Abstract

Obtaining truthful answers to sensitive questions and estimating population parameters for variables that are sensitive in nature is a prevailing problem in survey sampling. This paper studies the estimation of the population proportion of a sensitive characteristic. An improved randomized response device is developed for two cases of the unrelated question: case I, when the proportion of the unrelated characteristic is known, and case II, when it is not known. Two estimators of the population proportion of a sensitive characteristic are proposed, one for a known value of the unrelated-characteristic proportion $\pi_{y}$ and the other for an unknown value; both are found to be unbiased. Expressions for the variances of the proposed estimators and unbiased estimates of these variances are obtained. The optimum sample sizes are worked out, and the corresponding minimum variance of the proposed estimators is obtained. An empirical study, summarized graphically, indicates that the proposed estimators are better than the estimators of Mangat (1992) and Tiwari and Mehta (2016).

Keywords

Randomized response technique; unbiased estimator; variance; relative efficiency

1. Introduction

In surveys of human populations, we often need information on highly personal matters that people usually prefer to hide from others. An inquirer also often feels embarrassed to ask direct questions about private and confidential subjects, especially subjects that carry social stigma, such as smoking, criminal behavior, gambling, drug-taking, tax evasion, cheating in an examination, the extent of any illegal income, history of induced abortion, and many similar aspects. Direct questions on sensitive characteristics generally lead to false responses or non-cooperation from respondents because of the social stigma involved. It becomes difficult to control for these effects and to reach a fair conclusion from the acquired data. Warner (1965) was the first to develop the randomized response (RR) technique, which is the simplest one used for collecting responses to sensitive questions. Other techniques, such as the Bogus Pipeline Technique (BPT) and the Unmatched Count Technique (UCT), are also used to reduce non-response and biased response in surveys. However, we use the randomized response technique (RRT) for our proposed estimator.

Warner (1965) suggested a randomization device under which a proportion ‘ $p$ ’ of respondents, selected at random, is asked the sensitive question and the remaining proportion ‘( $1-p$ )’ is asked the complement of the sensitive question. The device presents each respondent with one of the following questions: (1) “Do you belong to group $S$ ?” and (2) “Do you not belong to group $S$ ?”, where ‘ $S$ ’ is the sensitive group. The respondent replies “yes” or “no” according to the question selected. The interviewer does not know whether the respondent answered the sensitive question or its complement. Thus, the privacy of respondents is protected by the randomization procedure. An estimator of $\pi$ , the proportion of the population belonging to $S$ , can then be formed from the collected “yes” and “no” responses.

After Warner’s model, Greenberg et al. (1969) introduced a model in which the second question of Warner’s procedure is replaced by a question that is unrelated to the sensitive characteristic. This model is therefore known as the Unrelated or U-model. They suggest that, for an optimal choice of $p_{1}$ and $p_{2}$ (where $p_{1}$ and $p_{2}$ are the probabilities of the sensitive question in the first and second sample, respectively), one of them should be close to one and the other close to zero. The proportion of the unrelated characteristic ( $\pi_{y}$ ) should be chosen close to zero or one according to whether $\pi<0.5$ or $\pi>0.5$ , respectively.

Besides that, it is claimed that if one of the values i.e. $p_{2}$ is chosen to be zero, then U-model becomes optimal so far as the choice of $p_{1}$ and $p_{2}$ is concerned (Moors, 1971).

An RR procedure carried out in two stages is known as a two-stage RR model. The first two-stage model was introduced by Mangat and Singh (1990), who added an element of truthful response to Warner’s RR model. Two-stage models in which Warner’s model is taken as the second stage have also been discussed in Kim et al. (1992), Singh and Singh (1992), and Mangat (1994). Singh (1993) and Mangat (1993) independently considered simpler models, which are equivalent and follow the same procedure as Warner’s (1965) device. The estimator based on this model is found to be more efficient than Warner’s estimator if $p>1/3$ .

In contrast, an unrelated question RR procedure in a two-stage model is considered by Mangat (1992) and Mangat et al. (1992). Throughout the exposition, it has been assumed that the proportion of the population belonging to non-sensitive attributes is known.

In addition, Warner’s (1965) model was modified by Mangat et al. (1993). In this modification, the deck of Warner’s (1965) model, with a proportion $p$ of cards representing statement (1) “I belong to sensitive group $S$ ” and the remaining proportion ( ${1-p}$ ) representing statement (2) “I do not belong to sensitive group $S$ ”, was split into two categories. Here a proportion $p_{1}$ of the cards bears that statement and a proportion $p_{2}$ (such that $p_{1}+p_{2}=1-p$ ) of the cards has statement (3) “Draw one more card”. They showed that the estimator under the proposed strategy is unbiased for the population proportion and is always more efficient than Warner’s (1965) usual estimator. Another modification of Warner’s model was given by Mangat and Singh (1995), whose RR device consists of a deck with three types of cards. Mangat and Singh (1995) extended the results of the procedure given by Mangat and Singh (1991) to the case where the sampled respondent might not report truthfully.

Apart from it, a new parameter ‘ $\omega$ ’ (known as sensitivity level) is added in the original RR model under the assumption that if some people do not feel the survey question is sensitive in nature, they can give direct response as they get the option to answer truthfully, however if they feel the question is sensitive, then they can use RR device (Mangat & Singh, 1994; Singh & Joarder, 1997; Gupta, 2001).

A modified Warner (1965) model with the efficient use of two decks of cards is given by Odumade and Singh (2009). They prove that their estimator is more efficient than the estimator of Warner (1965), Mangat and Singh (1990), and Mangat and Singh (1994) under some conditions.

The modification of Gupta’s (2001) model is given by Sihm and Gupta (2014). In this model, the first randomization device has two options – (1) “Do you possess the sensitive characteristic $S$ ?” and (2) “Go to the second randomization device”. The second randomization device follows the method proposed by Gupta (2001).

A drawback of the models suggested by Gupta (2001) and Sihm and Gupta (2014) was that $\omega$ and $\pi$ were both unknown quantities, which had to be estimated using two samples from the population. This made their procedure practically tedious. The problem was solved by applying the RR technique only to those respondents who considered the particular question to be sensitive in nature (Tiwari & Mehta, 2016).

In this procedure, if a respondent feels the survey question is non-sensitive, the true response is obtained directly, and the rest of the respondents follow the procedure given by Mangat and Singh (1990). The estimator of Tiwari and Mehta (2016) is more efficient than those provided by Warner (1965) and Mangat and Singh (1990).

Although numerous modifications have been made by several researchers, there is still scope to further modify the existing RR models. Mangat (1992) improved Warner’s model by using a two-stage unrelated-question device, which suggests that Tiwari and Mehta’s (2016) model can also be improved by using an unrelated question. Combining the ideas of Mangat (1992) and Tiwari and Mehta (2016), we developed an alternative model that covers two cases: the proportion of the unrelated characteristic is known, and it is unknown. We consider the situation in which the sensitivity level can be fixed by a good guess, which makes the procedure practically easier. The proposed estimators for both cases are compared with the estimators suggested by Mangat (1992) and Tiwari and Mehta (2016).

In Section 2, a brief description of some previous RRT models has been presented. The proposed model has been described in Section 3. In Section 4, we have compared the variance of the proposed estimator (both cases) with the variance of the estimators presented in Section 2. The findings of the paper have been discussed in Section 5.

2. A brief description of some RRT models

2.1 Warner (1965)

According to Warner (1965), each respondent in the population either belongs to the sensitive group ‘ $S$ ’ or does not. In his device, the respondent is therefore presented with one of the following two questions: (1) “Do you belong to group ‘ $S$ ’?” and (2) “Do you not belong to ‘ $S$ ’?”, selected with probabilities ‘ $p^{\ast}$ ’ and ‘( $1-p^{\ast}$ )’, respectively.

Let $\pi$ be the proportion of the population belonging to the sensitive group ‘ $S$ ’ then the probability of the “yes” answer is given by

$\displaystyle\theta_{w}=p^{\ast}\pi+(1-p^{\ast})(1-\pi)$ (1)

For $p^{\ast}\neq 1/2$ , the variance of the estimator $\hat{\pi}_{w}$ of $\pi$ is given by

$\displaystyle V(\hat{\pi}_{w})=\frac{\pi(1-\pi)}{n}+\frac{p^{\ast}(1-p^{\ast})}{n(2p^{\ast}-1)^{2}}$ (2)

where $\hat{\theta}_{w}$ is the unbiased estimator of $\theta_{w}$ .
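Warner’s device can be illustrated with a short simulation. The sketch below assumes the usual moment estimator obtained by solving Eq. (1) for $\pi$ , namely $\hat{\pi}_{w}=[\hat{\theta}_{w}-(1-p^{\ast})]/(2p^{\ast}-1)$ ; the function names, the seed and the numerical values are illustrative assumptions, not part of the original study.

```python
import random

def warner_theta(pi, p):
    """Probability of a "yes" answer, Eq. (1)."""
    return p * pi + (1 - p) * (1 - pi)

def warner_estimate(theta_hat, p):
    """Solve Eq. (1) for pi; requires p != 1/2."""
    return (theta_hat - (1 - p)) / (2 * p - 1)

def warner_variance(pi, p, n):
    """Variance of the estimator, Eq. (2)."""
    return pi * (1 - pi) / n + p * (1 - p) / (n * (2 * p - 1) ** 2)

def simulate_warner(pi, p, n, rng):
    """Simulate n respondents and return the resulting estimate of pi."""
    yes = 0
    for _ in range(n):
        in_group = rng.random() < pi
        asked_sensitive = rng.random() < p
        # "yes" if asked question (1) and in S, or asked question (2) and not in S
        yes += in_group == asked_sensitive
    return warner_estimate(yes / n, p)

rng = random.Random(0)  # fixed seed, illustrative only
estimate = simulate_warner(pi=0.3, p=0.7, n=20000, rng=rng)
```

With these hypothetical values the simulated estimate should fall close to the true $\pi=0.3$ , with a spread governed by Eq. (2).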

2.2 Mangat (1992)

In this model, to obtain more truthful answers, one more randomization device was added to the model given by Greenberg et al. (1969). To estimate $\pi$ , a simple random sample of ‘ $n$ ’ respondents is drawn with replacement from the population. Two randomization devices are then given to each sampled respondent. The random device $R_{1}$ consists of two statements, namely (i) “Do you belong to sensitive group ‘ $S$ ’?” and (ii) “Go to random device $R_{2}$ ”, presented with probabilities ‘ $T^{\ast}$ ’ and ‘ $({1-T^{\ast}})$ ’, respectively. The second randomization device is the same as the randomization device of Greenberg et al. (1969). The probability of a “yes” response from each respondent under this procedure is given as

$\displaystyle\theta_{m}=T^{\ast}\pi+(1-T^{\ast})[\pi p^{\ast}+(1-p^{\ast})\pi_{y}]$ (3)

Mangat (1992) gives the model for the case when $\pi_{y}$ is known. Here, we consider the case of an unknown value of $\pi_{y}$ . Two independent random samples of $n^{\prime}_{1}$ and $n^{\prime}_{2}$ respondents are then selected by simple random sampling with replacement (SRSWR). The probabilities of “yes” responses for the first and second samples are given as

$\displaystyle\theta_{m1}=T_{1}^{\ast}\pi+(1-T_{1}^{\ast})[\pi p_{1}^{\ast}+(1-p_{1}^{\ast})\pi_{y}]$ (4)

$\displaystyle\theta_{m2}=T_{2}^{\ast}\pi+(1-T_{2}^{\ast})[\pi p_{2}^{\ast}+(1-p_{2}^{\ast})\pi_{y}]$ (5)

Then, by solving both equations for $\pi$ and using the maximum likelihood estimates of $\theta_{m1}$ and $\theta_{m2}$ i.e. $\hat{\theta}_{m1}$ and $\hat{\theta}_{m2}$ , we get the estimator of $\pi$ as

$\displaystyle\hat{\pi}_{m}=\frac{(1-B_{2})\hat{\theta}_{m1}-(1-B_{1})\hat{\theta}_{m2}}{(B_{1}-B_{2})}$ (6)

where $B_{1}=1-(1-T_{1}^{\ast})(1-p_{1}^{\ast})$ and $B_{2}=1-(1-T_{2}^{\ast})(1-p_{2}^{\ast})$ .

Since $\hat{\pi}_{m}$ is unbiased, the variance of estimator $\hat{\pi}_{m}$ is given by

$\displaystyle V(\hat{\pi}_{m})=\frac{1}{({B_{1}-B_{2}})^{2}}\left[{\frac{(1-B_{2})^{2}\theta_{m1}(1-\theta_{m1})}{n^{\prime}_{1}}+\frac{(1-B_{1})^{2}\theta_{m2}(1-\theta_{m2})}{n^{\prime}_{2}}}\right]$ (7)

and for the optimum value of $n^{\prime}_{1}$ and $n^{\prime}_{2}$ , the minimum variance of estimator $\hat{\pi}_{m}$ is given as

$\displaystyle\min[{V(\hat{\pi}_{m})}]=\frac{\left[{(1-B_{2})\sqrt{\theta_{m1}(1-\theta_{m1})}+(1-B_{1})\sqrt{\theta_{m2}(1-\theta_{m2})}}\right]^{2}}{n({B_{1}-B_{2}})^{2}}\,\text{when}\,n=n^{\prime}_{1}+n^{\prime}_{2}.$ (8)
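Equations (4) to (8) can be checked numerically. The following sketch, with hypothetical device parameters and function names chosen only for illustration, confirms that Eq. (6) recovers $\pi$ when the exact “yes” probabilities are supplied and that the best integer split of $n$ under Eq. (7) approaches the bound in Eq. (8).

```python
from math import sqrt

def b_coef(T, p):
    """B_i = 1 - (1 - T_i)(1 - p_i)."""
    return 1 - (1 - T) * (1 - p)

def theta_m(pi, pi_y, T, p):
    """Probability of "yes" in one sample, Eqs. (4)-(5)."""
    return T * pi + (1 - T) * (pi * p + (1 - p) * pi_y)

def pi_hat_m(th1, th2, B1, B2):
    """Two-sample estimator, Eq. (6)."""
    return ((1 - B2) * th1 - (1 - B1) * th2) / (B1 - B2)

def var_m(th1, th2, B1, B2, n1, n2):
    """Variance for given sample sizes, Eq. (7)."""
    return ((1 - B2) ** 2 * th1 * (1 - th1) / n1
            + (1 - B1) ** 2 * th2 * (1 - th2) / n2) / (B1 - B2) ** 2

def min_var_m(th1, th2, B1, B2, n):
    """Minimum variance under optimal allocation, Eq. (8)."""
    root_sum = ((1 - B2) * sqrt(th1 * (1 - th1))
                + (1 - B1) * sqrt(th2 * (1 - th2)))
    return root_sum ** 2 / (n * (B1 - B2) ** 2)

# Illustrative (hypothetical) values, not taken from the paper
pi, pi_y, n = 0.2, 0.4, 1000
B1, B2 = b_coef(0.6, 0.8), b_coef(0.2, 0.3)
th1, th2 = theta_m(pi, pi_y, 0.6, 0.8), theta_m(pi, pi_y, 0.2, 0.3)

recovered = pi_hat_m(th1, th2, B1, B2)  # exact thetas in, pi out
best_integer_split = min(var_m(th1, th2, B1, B2, n1, n - n1) for n1 in range(1, n))
lower_bound = min_var_m(th1, th2, B1, B2, n)
```

Because sample sizes must be integers, the brute-force minimum can only match Eq. (8) approximately; the gap shrinks as $n$ grows.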

2.3 Tiwari and Mehta (2016)

This device is based on the assumption of sensitivity as given in Mangat and Singh (1995), Singh and Joarder (1997) and Gupta (2001). Thus, if respondents feel the survey question is non-sensitive, the true response is collected from them, and the rest of the respondents follow the procedure given by Mangat and Singh (1990), in which two randomization devices are used. The first randomization device has two options: (1) “Do you belong to the sensitive group ‘ $S$ ’?” and (2) “Go to the second randomization device”, with probabilities ‘ $T^{\ast}$ ’ and ‘( $1-T^{\ast}$ )’, respectively. The second randomization device is the same as Warner’s randomization device. The privacy of the respondents (who feel the question is sensitive) is maintained because the procedure in the second step remains unobserved by the researcher. If ‘ $\theta_{t}$ ’ is the probability that a respondent gives the “yes” answer and $\omega^{\ast}$ is the level of sensitivity, then the unbiased estimator $\hat{\pi}_{t}$ of $\pi$ is

$\displaystyle\hat{\pi}_{t}=\frac{\hat{\theta}_{t}-\omega^{\ast}(1-T^{\ast})(1-p^{\ast})}{1-2\omega^{\ast}(1-T^{\ast})(1-p^{\ast})}$ (9)

Here $\hat{\theta}_{t}$ is the unbiased estimator of $\theta_{t}$ . The variance of the estimator $\hat{\pi}_{t}$ is given by,

$\displaystyle\text{Var}(\hat{\pi}_{t})=\frac{1}{\left[{1-2\omega^{\ast}(1-p^{\ast})(1-T^{\ast})}\right]^{2}}\left[{\frac{\theta_{t}(1-\theta_{t})}{n}}\right]$ (10)

$\displaystyle\text{Var}(\hat{\pi}_{t})=\frac{\pi(1-\pi)}{n}+\frac{A(1-A)}{n(1-2A)^{2}}$

where $A=\omega^{\ast}(1-T^{\ast})(1-p^{\ast})$ .
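The two forms of the variance in Eq. (10) can be compared numerically. The sketch below assumes that the “yes” probability is $\theta_{t}=A+(1-2A)\pi$ , which is the expression implied by inverting Eq. (9) (it is not written out explicitly in the text above); function names and parameter values are illustrative.

```python
def a_coef(omega, T, p):
    """A = omega (1 - T)(1 - p)."""
    return omega * (1 - T) * (1 - p)

def theta_t(pi, omega, T, p):
    """ "Yes" probability implied by Eq. (9): theta_t = A + (1 - 2A) pi (assumption)."""
    A = a_coef(omega, T, p)
    return A + (1 - 2 * A) * pi

def pi_hat_t(theta_hat, omega, T, p):
    """Estimator of pi, Eq. (9)."""
    A = a_coef(omega, T, p)
    return (theta_hat - A) / (1 - 2 * A)

def var_t_theta_form(pi, omega, T, p, n):
    """First form of Eq. (10), written in terms of theta_t."""
    A = a_coef(omega, T, p)
    th = theta_t(pi, omega, T, p)
    return th * (1 - th) / (n * (1 - 2 * A) ** 2)

def var_t_expanded(pi, omega, T, p, n):
    """Second form of Eq. (10), written in terms of pi and A."""
    A = a_coef(omega, T, p)
    return pi * (1 - pi) / n + A * (1 - A) / (n * (1 - 2 * A) ** 2)

# Illustrative values
v1 = var_t_theta_form(0.3, 0.7, 0.4, 0.7, 500)
v2 = var_t_expanded(0.3, 0.7, 0.4, 0.7, 500)
```

Under the stated assumption the two values agree up to floating-point rounding, which is a quick consistency check on the algebra.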

3. Materials and methods

3.1 The proposed improved randomization model

A survey question that is sensitive for some people may not be sensitive for others. For example, a question such as “Are you a patient of tuberculosis?” may not be sensitive for some people, and they can give an honest answer to the interviewer.

Mangat and Singh (1994) and Gupta (2001) used the idea of a sensitivity level in their models, treating it as an unknown parameter. The fraction of the population that feels the question is sensitive in nature is known as the sensitivity level and is denoted by ‘ $\omega^{\ast}$ ’. The model given by Tiwari and Mehta (2016) was based on the sensitive question and its complement. In this paper, we improve this model by using the U-model (unrelated question model) and consider two cases: the proportion of the unrelated characteristic is known, and it is unknown. This makes it practically easier for the interviewer to collect sensitive information with the proposed method. We take the sensitivity level to be known, which makes it easier to estimate the population proportion ( $\pi$ ).

A simple random sample of ‘ $n$ ’ respondents is drawn from a population of size ‘ $N$ ’, and each respondent is provided with a randomization device. In the first stage, a respondent who thinks that the question asked is non-sensitive (a fraction $({1-\omega^{\ast}})$ of respondents) is instructed to answer directly, without using a randomization device: “yes” if he or she belongs to the sensitive group and “no” otherwise, so that such a respondent reports “yes” with probability $\pi$ and “no” with probability $({1-\pi})$ . On the other hand, a respondent who thinks that the question asked is sensitive (a fraction $\omega^{\ast}$ ) is instructed to use a two-stage randomized response (RR) model, the same as that given by Mangat (1992), in which the randomization device $D_{1}$ consists of two types of statements: (1) “ $I\in S$ ” with relative frequency $T^{\ast}$ and (2) “Go to second RR device” with relative frequency $({1-T^{\ast}})$ .

The second RR device $D_{2}$ is the same as suggested by the Greenberg et al. (1969) model consisting of two types of statements, (1) “ $I\in S$ ” and (2) “ $I\in Y$ ” with relative frequencies $p^{\ast}$ and $({1-p^{\ast}})$ , respectively. The privacy of the respondent is maintained as the mode of answering the asked question is not disclosed by the respondent.

Let $\pi$ be the population proportion possessing the sensitive characteristic ‘ $S$ ’. The probability of “yes” answer in the randomization device is given by

$\displaystyle\theta_{mk}=(1-\omega^{\ast})\pi+\omega^{\ast}\{{T^{\ast}\pi+(1-T^{\ast})[{\pi p^{\ast}+(1-p^{\ast})\pi_{y}}]}\}$ (11)

Suppose that the $n_{1}$ respondents report “yes” while the remaining $({n-n_{1}})$ respondents report “no”.

Case-I: The proportion of the unrelated characteristic, $\pi_{y}$ , is assumed to be known.

Equating the observed proportion of “yes” answers, $n_{1}/n$ , to $\theta_{mk}$ (method of moments), the estimator of $\pi$ is given by

$\displaystyle\hat{\pi}_{mk}=\frac{\frac{n_{1}}{n}-\omega^{\ast}(1-T^{\ast})(1-p^{\ast})\pi_{y}}{1-\omega^{\ast}(1-T^{\ast})(1-p^{\ast})}$ (12)

where $\hat{\pi}_{mk}$ denotes the estimator under the proposed model.

The proposed estimator is unbiased for the population proportion of the sensitive characteristic, and its variance is given by

$\displaystyle V(\hat{\pi}_{mk})=\frac{\pi(1-\pi)}{n}+\frac{\pi\omega^{\ast}(1-T^{\ast})(1-p^{\ast})(1-2\pi_{y})}{n[{1-\omega^{\ast}(1-T^{\ast})(1-p^{\ast})}]}+\frac{\omega^{\ast}(1-T^{\ast})(1-p^{\ast})\pi_{y}[{1-\omega^{\ast}(1-T^{\ast})(1-p^{\ast})\pi_{y}}]}{n[{1-\omega^{\ast}(1-T^{\ast})(1-p^{\ast})}]^{2}}$ (13)
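The Case-I procedure can be sketched as a small simulation of the response process behind Eq. (11), followed by the estimator of Eq. (12) and the variance of Eq. (13). This is a minimal sketch under stated assumptions: membership of ‘ $S$ ’ and ‘ $Y$ ’ is drawn independently for each respondent, and the function names, seed and parameter values are illustrative rather than the authors’ implementation.

```python
import random

def a_coef(omega, T, p):
    """A = omega (1 - T)(1 - p): chance that the unrelated statement is drawn."""
    return omega * (1 - T) * (1 - p)

def theta_mk(pi, pi_y, omega, T, p):
    """Probability of a "yes" answer, Eq. (11)."""
    return (1 - omega) * pi + omega * (T * pi + (1 - T) * (pi * p + (1 - p) * pi_y))

def pi_hat_mk(yes_fraction, pi_y, omega, T, p):
    """Estimator of pi for known pi_y, Eq. (12)."""
    A = a_coef(omega, T, p)
    return (yes_fraction - A * pi_y) / (1 - A)

def var_mk(pi, pi_y, omega, T, p, n):
    """Variance of the estimator, Eq. (13)."""
    A = a_coef(omega, T, p)
    return (pi * (1 - pi) / n
            + pi * A * (1 - 2 * pi_y) / (n * (1 - A))
            + A * pi_y * (1 - A * pi_y) / (n * (1 - A) ** 2))

def simulate_answer(pi, pi_y, omega, T, p, rng):
    """One respondent's answer; S and Y membership drawn independently (assumption)."""
    in_S = rng.random() < pi
    if rng.random() >= omega:   # question felt to be non-sensitive: direct answer
        return in_S
    if rng.random() < T:        # device D1 shows the statement about S
        return in_S
    if rng.random() < p:        # device D2 shows the statement about S
        return in_S
    return rng.random() < pi_y  # device D2 shows the statement about Y

params = dict(pi_y=0.5, omega=0.7, T=0.4, p=0.7)  # illustrative values
n, true_pi = 20000, 0.3
rng = random.Random(1)  # fixed seed for reproducibility
yes = sum(simulate_answer(true_pi, rng=rng, **params) for _ in range(n))
estimate = pi_hat_mk(yes / n, **params)
```

The simulated estimate should lie near the true $\pi$ , and Eq. (13) agrees with the compact form $\theta_{mk}(1-\theta_{mk})/\{n[1-\omega^{\ast}(1-T^{\ast})(1-p^{\ast})]^{2}\}$ that follows directly from Eq. (12).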

Case-II: When the value of $\pi_{y}$ , the proportion of neutral character ‘ $Y$ ’ in the population is unknown, we have two unknown parameters $\pi$ and $\pi_{y}$ . So, by using the same RR device we take two independent random samples of sizes $n_{1}$ and $n_{2}$ using SRSWR from the population of size ‘ $N$ ’. Let $n^{\prime}_{1}$ and $n^{\prime}_{2}$ be the number of “yes” answers observed in the first and second samples, respectively.

In the first sample, $\omega_{1}^{\ast}$ is the sensitivity level, that is, the fraction of respondents who use the RR device $D_{1}$ ; $T_{1}^{\ast}$ and $({1-T_{1}^{\ast}})$ are the probabilities of selecting, respectively, the statement about possessing the sensitive characteristic ‘ $S$ ’ and the instruction to use the second device $D_{2}$ . Similarly, $p_{1}^{\ast}$ and $({1-p_{1}^{\ast}})$ denote the probabilities of selecting the statements about membership of the sensitive group ‘ $S$ ’ and of the non-sensitive group ‘ $Y$ ’ in $D_{2}$ . The quantities $\omega_{2}^{\ast}$ , $T_{2}^{\ast}$ and $p_{2}^{\ast}$ are defined analogously for the second sample. The probabilities of a “yes” answer for the first and second samples are given by

$\displaystyle\theta_{mk1}=(1-\omega_{1}^{\ast})\pi+\omega_{1}^{\ast}\{{T_{1}^{\ast}\pi+(1-T_{1}^{\ast})[\pi p_{1}^{\ast}+(1-p_{1}^{\ast})\pi_{y}]}\}$ (14)

$\displaystyle\theta_{mk2}=(1-\omega_{2}^{\ast})\pi+\omega_{2}^{\ast}\{{T_{2}^{\ast}\pi+(1-T_{2}^{\ast})[\pi p_{2}^{\ast}+(1-p_{2}^{\ast})\pi_{y}]}\}$ (15)

Then, the sample estimator $\hat{\pi}_{mk}$ of $\pi$ is given by

$\displaystyle\hat{\pi}_{mk}=\frac{\omega_{2}^{\ast}(1-T_{2}^{\ast})(1-p_{2}^{\ast})\hat{\theta}_{mk1}-\omega_{1}^{\ast}(1-T_{1}^{\ast})(1-p_{1}^{\ast})\hat{\theta}_{mk2}}{[{\omega_{2}^{\ast}(1-T_{2}^{\ast})(1-p_{2}^{\ast})}]-[{\omega_{1}^{\ast}(1-T_{1}^{\ast})(1-p_{1}^{\ast})}]}$ (16)

Here $n^{\prime}_{1}$ is a binomial random variable with parameters $({n_{1},\theta_{mk1}})$ and $n^{\prime}_{2}$ is a binomial random variable with parameters $({n_{2},\theta_{mk2}})$ , so that the sample proportions $\hat{\theta}_{mk1}=n^{\prime}_{1}/n_{1}$ and $\hat{\theta}_{mk2}=n^{\prime}_{2}/n_{2}$ are unbiased for $\theta_{mk1}$ and $\theta_{mk2}$ . Consequently, the population proportion $\pi$ is unbiasedly estimated by the estimator $\hat{\pi}_{mk}$ . The variance of the proposed estimator $\hat{\pi}_{mk}$ is given by

$\displaystyle V(\hat{\pi}_{mk})=\frac{1}{\{{[{\omega_{2}^{\ast}(1-T_{2}^{\ast})(1-p_{2}^{\ast})}]-[{\omega_{1}^{\ast}(1-T_{1}^{\ast})(1-p_{1}^{\ast})}]}\}^{2}}\left[{\frac{[{\omega_{2}^{\ast}(1-T_{2}^{\ast})(1-p_{2}^{\ast})}]^{2}\theta_{mk1}(1-\theta_{mk1})}{n_{1}}+\frac{[{\omega_{1}^{\ast}(1-T_{1}^{\ast})(1-p_{1}^{\ast})}]^{2}\theta_{mk2}(1-\theta_{mk2})}{n_{2}}}\right]$ (17)

The minimum variance of the proposed estimator $\hat{\pi}_{mk}$ , under optimal allocation of the sample sizes $n_{1}$ and $n_{2}$ , is given by

$\displaystyle\min[{V(\hat{\pi}_{mk})}]=\frac{[{A_{2}\sqrt{\theta_{mk1}(1-\theta_{mk1})}+A_{1}\sqrt{\theta_{mk2}(1-\theta_{mk2})}}]^{2}}{n({A_{2}-A_{1}})^{2}},\,\text{when}\,n=n_{1}+n_{2}$ (18)

where $A_{1}=\omega_{1}^{\ast}(1-T_{1}^{\ast})(1-p_{1}^{\ast})$ and $A_{2}=\omega_{2}^{\ast}(1-T_{2}^{\ast})(1-p_{2}^{\ast})$ .
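The Case-II formulas can be exercised in the same way. The sketch below evaluates Eqs. (14) to (17) and reads Eq. (18) with the bracket in its numerator squared, by analogy with Eq. (8). The allocation rule in `optimal_split` is the standard result of minimising Eq. (17) subject to $n_{1}+n_{2}=n$ with real-valued sample sizes; it is derived here rather than quoted from the text, and all names and numerical values are illustrative assumptions.

```python
from math import sqrt

def a_coef(omega, T, p):
    """A_i = omega_i (1 - T_i)(1 - p_i)."""
    return omega * (1 - T) * (1 - p)

def theta_mk(pi, pi_y, omega, T, p):
    """Probability of "yes" in one sample, Eqs. (14)-(15)."""
    return (1 - omega) * pi + omega * (T * pi + (1 - T) * (pi * p + (1 - p) * pi_y))

def pi_hat_mk2(th1, th2, A1, A2):
    """Two-sample estimator, Eq. (16)."""
    return (A2 * th1 - A1 * th2) / (A2 - A1)

def var_mk2(th1, th2, A1, A2, n1, n2):
    """Variance for given sample sizes, Eq. (17)."""
    return (A2 ** 2 * th1 * (1 - th1) / n1
            + A1 ** 2 * th2 * (1 - th2) / n2) / (A2 - A1) ** 2

def optimal_split(th1, th2, A1, A2, n):
    """Real-valued n1, n2 minimising Eq. (17) with n1 + n2 = n (derived, not from the source)."""
    w1 = A2 * sqrt(th1 * (1 - th1))
    w2 = A1 * sqrt(th2 * (1 - th2))
    n1 = n * w1 / (w1 + w2)
    return n1, n - n1

def min_var_mk2(th1, th2, A1, A2, n):
    """Eq. (18), with the bracket squared."""
    root_sum = A2 * sqrt(th1 * (1 - th1)) + A1 * sqrt(th2 * (1 - th2))
    return root_sum ** 2 / (n * (A2 - A1) ** 2)

# Illustrative device parameters (omega, T, p) for the two samples
dev1, dev2 = (0.6, 0.4, 0.65), (0.7, 0.7, 0.5)
pi, pi_y, n = 0.3, 0.6, 1000
A1, A2 = a_coef(*dev1), a_coef(*dev2)
th1, th2 = theta_mk(pi, pi_y, *dev1), theta_mk(pi, pi_y, *dev2)

recovered = pi_hat_mk2(th1, th2, A1, A2)
n1, n2 = optimal_split(th1, th2, A1, A2, n)
attained = var_mk2(th1, th2, A1, A2, n1, n2)
bound = min_var_mk2(th1, th2, A1, A2, n)
```

Note that the denominator $(A_{2}-A_{1})^{2}$ makes the variance very large when the two devices have similar values of $A_{1}$ and $A_{2}$ , so the device parameters of the two samples should be chosen well apart.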

4. Results

The randomization device developed in Section 3 needs an efficiency comparison to show that it is better than the devices of Tiwari and Mehta (2016) and Mangat (1992). For this purpose, the relative efficiency of the proposed randomization device is examined, theoretically and empirically, in both cases: (i) when the proportion of the unrelated characteristic ( $\pi_{y}$ ) is known and (ii) when it is not known.

Case I. The percent relative efficiency of the proposed estimator ( $\hat{\pi}_{mk}$ ) with respect to Tiwari and Mehta (2016) estimator ( $\hat{\pi}_{t}$ ) is defined as

$\displaystyle\textit{PRE}=\frac{V(\hat{\pi}_{t})}{V(\hat{\pi}_{mk})}\times 100$ (19)

The proposed estimator $\hat{\pi}_{mk}$ will be superior to the estimator $\hat{\pi}_{t}$ if the percent relative efficiency defined above is greater than 100, that is, if $V(\hat{\pi}_{t})>V(\hat{\pi}_{mk})$ . On substituting $V(\hat{\pi}_{t})$ from Eq. (10) and $V(\hat{\pi}_{mk})$ from Eq. (13) and solving, we get

$\displaystyle\begin{cases}\pi<\dfrac{1}{(1-2\pi_{y})}\left[{-\dfrac{\pi_{y}\left[{1-\omega^{\ast}(1-T^{\ast})(1-p^{\ast})\pi_{y}}\right]}{\left[{1-\omega^{\ast}(1-T^{\ast})(1-p^{\ast})}\right]}+\dfrac{(1-\omega^{\ast}(1-T^{\ast})(1-p^{\ast}))^{2}}{\left[{1-2\omega^{\ast}(1-T^{\ast})(1-p^{\ast})}\right]^{2}}}\right]&\text{when}\,\pi_{y}\leqslant 0.5\\[2ex]\pi>\dfrac{1}{(1-2\pi_{y})}\left[{-\dfrac{\pi_{y}\left[{1-\omega^{\ast}(1-T^{\ast})(1-p^{\ast})\pi_{y}}\right]}{\left[{1-\omega^{\ast}(1-T^{\ast})(1-p^{\ast})}\right]}+\dfrac{(1-\omega^{\ast}(1-T^{\ast})(1-p^{\ast}))^{2}}{\left[{1-2\omega^{\ast}(1-T^{\ast})(1-p^{\ast})}\right]^{2}}}\right]&\text{when}\,\pi_{y}>0.5\end{cases}$ (20)

From Eq. (20), we can see that the condition depends on the values of $\pi$ , $\pi_{y}$ , $\omega^{\ast}$ , $T^{\ast}$ , and $p^{\ast}$ . Thus, the relative efficiency of the proposed estimator ( $\hat{\pi}_{mk}$ ) with respect to the Tiwari and Mehta (2016) estimator ( $\hat{\pi}_{t}$ ) has been worked out for $\pi\in[{0.1,0.9}]$ using a practical example with parameters $\omega^{\ast}=0.7,0.8,0.9$ , $T^{\ast}=0.4$ and $p^{\ast}=0.7$ , the values of $\omega^{\ast}$ , $T^{\ast}$ and $p^{\ast}$ chosen as optimal for this study. We checked the condition in Eq. (20) numerically and found that it holds in both cases, $\pi_{y}\leqslant 0.5$ and $\pi_{y}>0.5$ . As a result, whatever value of $\pi_{y}$ is taken in this range, the inequality holds for the proposed estimator. Hence, for the parameter values considered, the proposed estimator is more efficient than the Tiwari and Mehta (2016) estimator.

To get an idea of the PRE achieved by the proposed procedure, we evaluate it numerically. In the condition of Eq. (20), $\omega^{\ast}$ , $p^{\ast}$ , and $T^{\ast}$ are quantities known before conducting a survey. Thus, based on the analytical comparison, the proposed estimator can be made more useful than the estimator of Tiwari and Mehta (2016).

To find out the value of relative efficiency of the proposed estimator over the estimator of Tiwari and Mehta (2016), we computed the relative efficiency values by using Eq. (19).

When the sensitivity level $({\omega^{\ast}})$ is 70%, suppose that in the first deck of cards 40% of the cards bear the statement about the sensitive characteristic and the remaining 60% bear the instruction to go to the second RR device (the second deck of cards). In the second deck, 70% of the cards bear the statement “Are you a member of the group ‘ $S$ ’?” and the remaining 30% bear the statement “Are you a member of group ‘ $Y$ ’?” (i.e. the unrelated question). The range of the true proportion of the sensitive characteristic is $\pi\in[{0.1,0.9}]$ and that of the unrelated characteristic is $\pi_{y}\in[{0.1,0.9}]$ .

Table 1 provides the detailed percent relative efficiency results, which were computed with the statistical software SAS. In this study, we considered $\pi$ from 0.1 to 0.9 in steps of 0.1 and the proportion of the non-sensitive attribute $\pi_{y}$ from 0.1 to 0.9 in steps of 0.1, giving a total of 9 $\times$ 9 $=$ 81 relative efficiencies for each value of $\omega^{\ast}$ .

Table 1

Percent relative efficiency of the proposed estimator over the Tiwari and Mehta (2016) estimator for $p^{\ast}=$ 0.7, $T^{\ast}=$ 0.4 and $\omega^{\ast}=$ 0.7, 0.8, 0.9

$\omega^{*}$	$\pi$	$\pi_{y}$
		0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
0.7	0.1	243.44	219.27	200.04	184.41	171.47	160.59	151.35	143.41	136.53
	0.2	178.99	170.36	162.82	156.21	150.38	145.22	140.62	136.53	132.87
	0.3	155.94	151.74	147.98	144.62	141.61	138.93	136.53	134.40	132.51
	0.4	144.44	142.40	140.60	139.03	137.68	136.53	135.57	134.80	134.21
	0.5	137.93	137.31	136.88	136.61	136.53	136.61	136.88	137.31	137.93
	0.6	134.21	134.80	135.57	136.53	137.68	139.03	140.60	142.40	144.44
	0.7	132.51	134.40	136.53	138.93	141.61	144.62	147.98	151.74	155.94
	0.8	132.87	136.53	140.62	145.22	150.38	156.21	162.82	170.36	178.99
	0.9	136.53	143.41	151.35	160.59	171.47	184.41	200.04	219.27	243.44
0.8	0.1	271.24	240.95	217.55	198.96	183.87	171.41	160.97	152.11	144.54
	0.2	195.43	184.63	175.38	167.41	160.50	154.47	149.18	144.54	140.45
	0.3	167.99	162.74	158.11	154.04	150.46	147.30	144.54	142.13	140.05
	0.4	154.26	151.67	149.43	147.51	145.88	144.54	143.46	142.63	142.05
	0.5	146.48	145.63	145.02	144.66	144.54	144.66	145.02	145.63	146.48
	0.6	142.05	142.63	143.46	144.54	145.88	147.51	149.43	151.67	154.26
	0.7	140.05	142.13	144.54	147.30	150.46	154.04	158.11	162.74	167.99
	0.8	140.45	144.54	149.18	154.47	160.50	167.41	175.38	184.63	195.43
	0.9	144.54	152.11	160.97	171.41	183.87	198.96	217.55	240.95	271.24
0.9	0.1	302.02	264.68	236.63	214.85	197.49	183.37	171.70	161.93	153.67
	0.2	213.96	200.61	189.41	179.93	171.83	164.87	158.87	153.67	149.17
	0.3	181.69	175.19	169.57	164.69	160.47	156.81	153.67	150.99	148.72
	0.4	165.48	162.24	159.48	157.15	155.22	153.67	152.47	151.61	151.08
	0.5	156.30	155.14	154.32	153.83	153.67	153.83	154.32	155.14	156.30
	0.6	151.08	151.61	152.47	153.67	155.22	157.15	159.48	162.24	165.48
	0.7	148.72	150.99	153.67	156.81	160.47	164.69	169.57	175.19	181.69
	0.8	149.17	153.67	158.87	164.87	171.83	179.93	189.41	200.61	213.96
	0.9	153.67	161.93	171.70	183.37	197.49	214.85	236.63	264.68	302.02

Table 1 gives the PRE of the proposed model over the model of Tiwari and Mehta (2016) for $p^{\ast}=0.7$ , $T^{\ast}=0.4$ , and $\omega^{\ast}=0.7,0.8,0.9$ . In this table, PRE values are given for $\pi$ and $\pi_{y}$ each ranging from 0.1 to 0.9 in steps of 0.1. From Table 1, it is seen that for a fixed value of $\pi$ , for example $\pi=0.1$ , the PRE decreases gradually as $\pi_{y}$ increases. More generally, for $\pi<0.5$ , a marked decline in PRE with increasing $\pi_{y}$ is observed. For $\pi>0.5$ , the trend is reversed, i.e. there is a marked increase in PRE with an increase in the value of $\pi_{y}$ . A similar trend is followed by the PRE values for $\omega^{\ast}=0.8$ and $\omega^{\ast}=0.9$ .
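The entries of Table 1 can be reproduced from Eq. (19) by combining the second form of Eq. (10) with Eq. (13); the sample size $n$ cancels in the ratio. The following Python sketch is an illustrative re-computation (the original results were obtained with SAS), and its function names are assumptions.

```python
def a_coef(omega, T, p):
    """A = omega (1 - T)(1 - p)."""
    return omega * (1 - T) * (1 - p)

def n_var_t(pi, A):
    """n * Var(pi_hat_t), second form of Eq. (10)."""
    return pi * (1 - pi) + A * (1 - A) / (1 - 2 * A) ** 2

def n_var_mk(pi, pi_y, A):
    """n * V(pi_hat_mk), Eq. (13)."""
    return (pi * (1 - pi)
            + pi * A * (1 - 2 * pi_y) / (1 - A)
            + A * pi_y * (1 - A * pi_y) / (1 - A) ** 2)

def pre(pi, pi_y, omega, T=0.4, p=0.7):
    """Percent relative efficiency, Eq. (19)."""
    A = a_coef(omega, T, p)
    return 100 * n_var_t(pi, A) / n_var_mk(pi, pi_y, A)

grid = [k / 10 for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
# table[omega][i][j]: row i indexes pi, column j indexes pi_y
table = {omega: [[pre(pi, pi_y, omega) for pi_y in grid] for pi in grid]
         for omega in (0.7, 0.8, 0.9)}

for row in table[0.7]:
    print("  ".join(f"{v:7.2f}" for v in row))
```

The grid also makes the symmetry of Table 1 visible: the PRE at $(\pi,\pi_{y})$ equals the PRE at $(1-\pi,1-\pi_{y})$ .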

Table 2

Descriptive statistics of percent relative efficiency values for different values of $\pi$ (for $\omega^{\ast}=$ 0.7, 0.8, 0.9)

$\omega^{*}$	$\pi$	Mean	Standard deviation	Minimum	Median	Maximum
0.7	0.1	178.90	36.20	136.50	171.50	243.40
	0.2	152.67	15.73	132.87	150.38	178.99
	0.3	142.69	8.02	132.51	141.61	155.94
	0.4	138.36	3.54	134.21	137.68	144.44
	0.5	137.11	0.54	136.53	136.88	137.93
	0.6	138.36	3.54	134.21	137.68	144.44
	0.7	142.69	8.02	132.51	141.61	155.94
	0.8	152.67	15.73	132.87	150.38	178.99
	0.9	178.90	36.20	136.50	171.50	243.40
0.8	0.1	193.50	42.80	144.50	183.90	271.20
	0.2	163.56	18.74	140.45	160.50	195.43
	0.3	151.93	9.57	140.05	150.46	167.99
	0.4	146.83	4.24	142.05	145.88	154.26
	0.5	145.35	0.75	144.54	145.02	146.48
	0.6	146.83	4.24	142.05	145.88	154.26
	0.7	151.93	9.57	140.05	150.46	167.99
	0.8	163.56	18.74	140.45	160.50	195.43
	0.9	193.50	42.80	144.50	183.90	271.20
0.9	0.1	209.60	50.00	153.70	197.50	302.00
	0.2	175.81	22.07	149.17	171.83	213.96
	0.3	162.42	11.30	148.72	160.47	181.69
	0.4	156.49	5.02	151.08	155.22	165.48
	0.5	154.76	1.02	153.67	154.32	156.30
	0.6	156.49	5.02	151.08	155.22	165.48
	0.7	162.42	11.30	148.72	160.47	181.69
	0.8	175.81	22.07	149.17	171.83	213.96
	0.9	209.60	50.00	153.70	197.50	302.00

Table 2 summarizes, for each sensitivity level and each value of $\pi$ from 0.1 to 0.9 in steps of 0.1, the PRE values obtained over $\pi_{y}$ from 0.1 to 0.9. Taking a closer view at the 70% sensitivity level, it is seen that the PRE ranges from a minimum of 136.5% to a maximum of 243.4%, with a median of 171.5%, a mean of 178.9% and a standard deviation of 36.2%, for $\pi=0.1$ , i.e. when 10% of the population belongs to the sensitive group ‘ $S$ ’. Likewise, for $\pi=0.2$ (i.e. 20% of the population belongs to the sensitive group ‘ $S$ ’), the mean PRE is 152.67% and the standard deviation is 15.73%, with a minimum of 132.87% and a maximum of 178.99%. Table 2 can be interpreted similarly for the sensitivity levels 80% and 90%.
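Each row of Table 2 is a summary of the corresponding row of Table 1. As a small illustration, the sketch below recomputes the summary for $\omega^{\ast}=0.7$ and $\pi=0.5$ from the values printed in Table 1, using Python’s standard `statistics` module; it assumes that the reported standard deviation is the sample standard deviation (divisor $n-1$ ), which is consistent with the tabulated value.

```python
import statistics

# PRE values for omega* = 0.7 and pi = 0.5, copied from the corresponding row of Table 1
row = [137.93, 137.31, 136.88, 136.61, 136.53, 136.61, 136.88, 137.31, 137.93]

summary = {
    "mean": statistics.mean(row),
    "sd": statistics.stdev(row),      # sample standard deviation (n - 1 divisor)
    "min": min(row),
    "median": statistics.median(row),
    "max": max(row),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

Rounded to two decimals, these figures agree with the $\pi=0.5$ row of Table 2 for $\omega^{\ast}=0.7$ .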

Figure 1.

Percent relative efficiency (PRE) versus $\pi$ versus $\pi_{y}$ for $\omega^{*}=$ 0.7. Note: relative efficiency is expressed as a percentage and is denoted by PRE (percent relative efficiency) in the graphs.

Figure 1 provides a three-dimensional visualization of the behavior of the PRE values as a function of $\pi$ and $\pi_{y}$ for $\omega^{\ast}=0.7$ (70% sensitivity level). Such a visualization makes it clear that, for a given value of $\pi_{y}<0.5$ , the PRE values broadly decrease as $\pi$ increases. For a given value of $\pi_{y}>0.5$ , the PRE values broadly increase as $\pi$ increases. For $\pi_{y}=0.5$ , the PRE behaves like a parabolic curve in $\pi$ . Thus we can conclude that the PRE falls with $\pi$ when $\pi_{y}<0.5$ and rises with $\pi$ when $\pi_{y}>0.5$ , reaching its maximum value (243.44%) where $\pi$ and $\pi_{y}$ are both 0.1 or both 0.9 (Table 1). Figs 2 and 3 can be interpreted in the same way, as the PRE behaves similarly for the other values of $\omega^{\ast}$ .

Figure 2.

Percent relative efficiency (PRE) versus $\pi$ versus $\pi_{y}$ for $\omega^{*}=$ 0.8.

Figure 3.

Percent relative efficiency (PRE) versus $\pi$ versus $\pi_{y}$ for $\omega^{*}=$ 0.9.

The value of $p^{\ast}$ should be chosen as close to 1 as possible without threatening the degree of co-operation by the respondent. The choice of $\pi_{y}$ is recommended to be close to 0 or 1 according to whether $\pi<0.5$ or $\pi>0.5$ . For $\pi=0.5$ , the PRE is largest, that is, the variance of the proposed estimator is smallest, at the tails of $\pi_{y}$ (Table 1).

As $\omega^{\ast}$ increases from 0.7 to 0.9, the mean PRE of the proposed estimator at $\pi=0.1$ increases from 178.9% to 209.6% (Table 2), and the mean PRE increases with $\omega^{\ast}$ for every other tabulated value of $\pi$ as well. We conclude that the gain from the proposed estimator is larger when the sensitivity level $({\omega^{\ast}})$ is high.

Case II: The randomization device developed in case-II of Section-3 needs an efficiency comparison to show that it is better than the device of Mangat (1992). This section compares the two devices empirically for the case of an unknown unrelated characteristic. The estimator $\hat{\pi}_{mk}$ , based on the proposed randomization device for an unknown value of the unrelated characteristic ( $\pi_{y}$ ), will be more efficient than the estimator $\hat{\pi}_{m}$ given by Mangat (1992) if

$\displaystyle\frac{\min[{V(\hat{\pi}_{m})}]}{\min[{V(\hat{\pi}_{mk})}]}>1$ (21)

The minimum variances of the estimators $\hat{\pi}_{m}$ and $\hat{\pi}_{mk}$ are taken from Eqs (8) and (18). From these expressions alone, we cannot conclude which model is better, so we compare the efficiencies empirically. For this, we compute the percent relative efficiency as

$\displaystyle\textit{PRE}=\frac{V(\hat{\pi}_{m})}{V(\hat{\pi}_{mk})}\times 100\%$ (22)

To compute the relative efficiency numerically, we take the true proportion of the sensitive characteristic $\pi\in[{0.1,0.9}]$ and the proportion of the unrelated characteristic $\pi_{y}\in[{0.1,0.9}]$ . For Eq. (22), the device parameters are $p_{1}^{\ast}$ , $T_{1}^{\ast}$ , $\omega_{1}^{\ast}$ and $p_{2}^{\ast}$ , $T_{2}^{\ast}$ , $\omega_{2}^{\ast}$ . As a reasonable guess, we take sensitivity levels of 60% and 70% for the first and second samples, respectively. In the first sample, 40% of the cards in the first deck bear the statement of the sensitive characteristic, and the remaining 60% bear the statement “Go to the second RR device”.

In the second deck of cards, 65% bear the statement of the sensitive characteristic and the rest bear a statement that is unrelated to the sensitive characteristic. Similarly, for the second sample, with a sensitivity level of 70%, 70% of the cards of the first deck and 50% of the cards of the second deck bear the statement of the sensitive characteristic.

Table 3 shows the detailed relative efficiency results. The statistical software SAS was used to obtain the results in Table 1. In this study, we considered $\pi$ and $\pi_{y}$ each ranging from 0.1 to 0.9 in steps of 0.1, and computed $9\times 9=81$ relative efficiencies for the different choices of $\pi$ and $\pi_{y}$ .
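The grid evaluation described above can be sketched as follows. This is only an illustration of how Eq. (22) is applied over the $9\times 9$ grid: the function name `pre_grid` and the two placeholder variance functions are assumptions introduced here, and the placeholders are not the variance expressions of Eqs (8) and (18), which would have to be substituted to reproduce Table 3.

```python
from typing import Callable, Dict, Tuple

# A variance function takes (pi, pi_y) and returns a positive variance.
Variance = Callable[[float, float], float]


def pre_grid(var_m: Variance, var_mk: Variance) -> Dict[Tuple[float, float], float]:
    """Evaluate PRE = V(pi_hat_m) / V(pi_hat_mk) * 100 on the 9 x 9 grid."""
    levels = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
    return {
        (pi, pi_y): 100.0 * var_m(pi, pi_y) / var_mk(pi, pi_y)
        for pi in levels
        for pi_y in levels
    }


# Hypothetical placeholder variances, used only to make the sketch runnable.
# They are NOT the paper's Eqs (8) and (18).
def toy_var_m(pi: float, pi_y: float) -> float:
    return 2.0 * (pi * (1.0 - pi) + 0.1)


def toy_var_mk(pi: float, pi_y: float) -> float:
    return pi * (1.0 - pi) + 0.1


grid = pre_grid(toy_var_m, toy_var_mk)
print(len(grid))  # number of (pi, pi_y) combinations evaluated
```

A PRE above 100% at a grid point means the estimator in the denominator has the smaller variance there, which is the criterion stated in Eq. (21).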

Table 3

Percent relative efficiency of the proposed model over the model of Mangat (1992) for different values of $\pi$ and $\pi_{y}$ , when $p_{1}^{\ast}=$ 0.65, $T_{1}^{\ast}=$ 0.4, $\omega_{1}^{\ast}=$ 0.6, $p_{2}^{\ast}=$ 0.6, $T_{2}^{\ast}=$ 0.5 and $\omega_{2}^{\ast}=$ 0.7

$\pi$	$\pi_{y}$
	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
0.1	465.65	491.32	510.26	524.32	534.72	542.28	547.58	551.03	552.93
0.2	451.61	465.65	477.01	486.10	493.27	498.77	502.80	505.52	507.05
0.3	449.82	458.52	465.65	471.38	475.83	479.10	481.27	482.42	482.60
0.4	452.46	457.96	462.34	465.65	467.96	469.30	469.70	469.19	467.77
0.5	458.31	461.54	463.83	465.20	465.65	465.20	463.83	461.54	458.31
0.6	467.77	469.19	469.70	469.30	467.96	465.65	462.34	457.96	452.46
0.7	482.60	482.42	481.27	479.10	475.83	471.38	465.65	458.52	449.82
0.8	507.05	505.52	502.80	498.77	493.27	486.10	477.01	465.65	451.61
0.9	552.93	551.03	547.58	542.28	534.72	524.32	510.26	491.32	465.65

Table 4

Descriptive statistics of percent relative efficiency values for different values of $\pi_{y}$ when $p_{1}^{\ast}=$ 0.65, $T_{1}^{\ast}=$ 0.4, $\omega_{1}^{\ast}=$ 0.6, $p_{2}^{\ast}=$ 0.6, $T_{2}^{\ast}=$ 0.5 and $\omega_{2}^{\ast}=$ 0.7

$\pi_{y}$	Mean	Standard deviation	Minimum	Median	Maximum
0.1	476.50	34.00	449.80	465.70	552.90
0.2	482.60	30.40	458.00	469.20	551.00
0.3	486.71	28.49	462.34	477.01	547.58
0.4	489.12	27.61	465.20	479.10	542.28
0.5	489.91	27.36	465.65	475.83	534.72
0.6	489.12	27.61	465.20	479.10	542.28
0.7	486.71	28.49	462.34	477.01	547.58
0.8	482.60	30.40	458.00	469.20	551.00
0.9	476.50	34.00	449.80	465.70	552.90

Table 4 summarizes the PRE values for the fixed values $p_{1}^{\ast}=0.65$ , $T_{1}^{\ast}=0.4$ , $\omega_{1}^{\ast}=0.6$ , $p_{2}^{\ast}=0.6$ , $T_{2}^{\ast}=0.5$ and $\omega_{2}^{\ast}=0.7$ , for $\pi_{y}$ ranging from 0.1 to 0.9 in steps of 0.1, across the 9 values of $\pi$ from a minimum of 0.1 to a maximum of 0.9. For a fixed value of $\pi_{y}=0.1$ , the minimum PRE is 449.8% and the maximum PRE is 552.9%, with a mean PRE of 476.5% and a standard deviation of 34%. Likewise, for $\pi_{y}=0.2$ , the mean PRE is 482.6% with a standard deviation of 30.4%, a minimum of 458% and a maximum of 551%, and in the same way, the results from Table 1 can be interpreted.
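As a check on how Table 4 is derived from Table 3, the following minimal sketch recomputes the descriptive statistics for the $\pi_{y}=0.1$ column of Table 3. The use of the sample standard deviation (denominator $n-1$) is an assumption made here; the variable names are illustrative only.

```python
import statistics

# PRE values from the pi_y = 0.1 column of Table 3, for pi = 0.1, ..., 0.9.
pre_column = [465.65, 451.61, 449.82, 452.46, 458.31,
              467.77, 482.60, 507.05, 552.93]

summary = {
    "mean": statistics.mean(pre_column),
    # Sample standard deviation (n - 1 denominator); this is an assumption.
    "sd": statistics.stdev(pre_column),
    "min": min(pre_column),
    "median": statistics.median(pre_column),
    "max": max(pre_column),
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Rounded to one decimal place, these values agree with the $\pi_{y}=0.1$ row of Table 4, which suggests that each row of Table 4 summarizes the corresponding column of Table 3 over the nine values of $\pi$ .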

Figure 4.

Percent relative efficiency (PRE) versus $\pi$ versus $\pi_{y}$ for $\omega_{1}^{*}=$ 0.6 and $\omega_{2}^{*}=$ 0.7.

Figure 4 provides a three-dimensional visualization of the behavior of the relative efficiency values as a function of $\pi$ and $\pi_{y}$ for $p_{1}^{\ast}=0.65$ , $T_{1}^{\ast}=0.4$ , $\omega_{1}^{\ast}=0.6$ , $p_{2}^{\ast}=0.6$ , $T_{2}^{\ast}=0.5$ and $\omega_{2}^{\ast}=0.7$ . The visualization shows that, for a given value of $\pi<0.5$ , the PRE values generally increase as $\pi_{y}$ increases, whereas for a given value of $\pi>0.5$ , the PRE values generally decrease as $\pi_{y}$ increases. From another point of view, for any value of $\pi_{y}$ , the PRE values first decrease as $\pi$ increases and, after reaching a minimum, increase again. For $\pi_{y}=0.5$ , the PRE behaves like a downward parabolic curve as $\pi$ increases.

Fig. 4 shows that, for any value of $\pi_{y}$ , the plot does not meet the axes, i.e. the PRE is always more than 100%. So, for every value of $\pi_{y}$ considered, the proposed estimator for unknown $\pi_{y}$ is more efficient than the estimator given by Mangat (1992).
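The claim that the PRE exceeds 100% everywhere on the grid can also be checked directly against the tabulated values. The sketch below transcribes Table 3 (rows indexed by $\pi$ , columns by $\pi_{y}$ ) and reports the smallest and largest entries; the variable names are illustrative assumptions.

```python
# Table 3: rows are pi = 0.1, ..., 0.9; columns are pi_y = 0.1, ..., 0.9.
levels = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
table3 = [
    [465.65, 491.32, 510.26, 524.32, 534.72, 542.28, 547.58, 551.03, 552.93],
    [451.61, 465.65, 477.01, 486.10, 493.27, 498.77, 502.80, 505.52, 507.05],
    [449.82, 458.52, 465.65, 471.38, 475.83, 479.10, 481.27, 482.42, 482.60],
    [452.46, 457.96, 462.34, 465.65, 467.96, 469.30, 469.70, 469.19, 467.77],
    [458.31, 461.54, 463.83, 465.20, 465.65, 465.20, 463.83, 461.54, 458.31],
    [467.77, 469.19, 469.70, 469.30, 467.96, 465.65, 462.34, 457.96, 452.46],
    [482.60, 482.42, 481.27, 479.10, 475.83, 471.38, 465.65, 458.52, 449.82],
    [507.05, 505.52, 502.80, 498.77, 493.27, 486.10, 477.01, 465.65, 451.61],
    [552.93, 551.03, 547.58, 542.28, 534.72, 524.32, 510.26, 491.32, 465.65],
]

# Flatten to (PRE, pi, pi_y) triples so extremes can be located on the grid.
cells = [
    (value, levels[i], levels[j])
    for i, row in enumerate(table3)
    for j, value in enumerate(row)
]

lowest = min(cells)    # smallest PRE and one grid point where it occurs
highest = max(cells)   # largest PRE and one grid point where it occurs
all_above_100 = all(value > 100.0 for value, _, _ in cells)

print("all PRE > 100%:", all_above_100)
print("lowest:", lowest)
print("highest:", highest)
```

Since even the smallest tabulated PRE is well above 100%, the comparison favors the proposed estimator at every one of the 81 grid points for this choice of device parameters.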

5. Discussions and conclusions

In this study, the proposed estimators have been compared with the existing estimators of Tiwari and Mehta (2016) and Mangat (1992) on the basis of relative efficiency. Numerically, the PRE is calculated for a particular example with specific values of the parameters. Three-dimensional scatter plots are given to better illustrate the PRE with respect to $\pi$ and $\pi_{y}$ . Both cases of the unrelated characteristic lead to the same conclusion. The empirical study shows that, whatever the value of $\pi_{y}$ , the PRE remains above 100%. Regarding the sensitivity level, when respondents feel that the question asked is highly sensitive, the proposed estimator is more efficient in estimating the proportion of people who possess the sensitive characteristic ‘ $S$ ’.

In both cases, the results show that, for the parameter values considered, the proposed estimators are more efficient than the models given by Tiwari and Mehta (2016) and Mangat (1992).

References

1. Greenberg, B.G., Abul-Ela, A.L., Simmons, W.R., & Horvitz, D.G. (1969). The unrelated question randomized response model: Theoretical framework. Journal of the American Statistical Association, 64, 520–539.

2. Gupta, S.N. (2001). Qualifying the sensitivity level of binary response personal interview survey questions. Journal of Combinatorics, Information and System Sciences, 26(1–4), 101–109.

3. Kim, J.H., Ryu, J.B., & Lee, G.S. (1992). A new two-stage randomized response model. The Korean Journal of Applied Statistics, 5, 157–167.

4. Mangat, N.S. (1991). An optional randomized response sampling technique using non-stigmatized attributes. Statistica, 51(4), 595–602.

5. Mangat, N.S. (1992). Two-stage randomized response sampling procedure using unrelated questions. Journal of the Indian Society of Agricultural Statistics, 44(1), 82–87.

6. Mangat, N.S. (1994). An improved randomized response strategy. Journal of the Royal Statistical Society, Series B, 56(1), 93–95.

7. Mangat, N.S., & Singh (1990). An alternative randomized response procedure. Biometrika, 77, 439–442.

8. Mangat, N.S., & Singh (1991). Alternative approach to randomized response survey. Statistica, LI(3), 327–332.

9. Mangat, N.S., & Singh (1994). An optional randomized response sampling techniques. Journal of Indian Statistical Association, 32, 71–75.

10. Mangat, N.S., & Singh (1995). A note on the inverse binomial randomized response procedure. Journal of the Indian Society of Agricultural Statistics, 47, 21–25.

11. Mangat, N.S., Singh, & Singh (1992). An improved unrelated question randomized response strategy. Calcutta Statistical Association Bulletin, 42, 277–281.

12. Mangat, N.S., Singh, & Singh (1993). On the use of a modified randomization device in randomized response inquiries. Metron, 51, 211–216.

13. Mangat, N.S., Singh, & Singh (1995). On the use of a modified randomization device in Warner’s model. Journal of Indian Society of Statistics & Operations Research, 16, 65–69.

14. Moors, J.J.A. (1971). Optimization of the unrelated question randomized response model. Journal of the American Statistical Association, 66, 627–629.

15. Odumade & Singh (2009). Efficient use of two decks of cards in randomized response sampling. Communications in Statistics – Theory and Methods, 38(4), 439–446.

16. Tiwari & Mehta (2016). An improved two-stage optional RRT model. Journal of the Indian Society of Agricultural Statistics, 70(3), 197–203.

17. Singh (1976). A note on randomized response technique. Journal of the American Statistical Association. Social Statistical Section, 772.

18. Singh & Joarder, A.H. (1997). Optional randomized response technique for sensitive quantitative variable. Metron, LV, 151–157.

19. Sihm, J.S., & Gupta (2014). A two-stage binary optional randomized response model. Communications in Statistics-Simulation and Computation, 44(9), 2278–2296.

20. Singh & Singh (1992). An alternative estimator for randomized response technique. Journal of the Indian Society of Agricultural Statistics, 44, 149–154.

21. Warner, S.L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60, 63–69.