Kuk’s Model Adjusted for Protection and Efficiency

Abstract

In this article, we adjust Kuk’s randomized response model for collecting information on a sensitive characteristic so as to increase protection and efficiency, by making use of forced “yes” and forced “no” responses. We first describe Kuk’s model and then the proposed adjustment to it. Next, by means of a simulation study, we compare the efficiency of the adjusted Kuk model relative to the pioneer Kuk model while maintaining at least equal protection of respondents.

Keywords

randomized response technique, estimation of proportion, forced responses

Introduction

The collection of data on sensitive characteristics from human populations is not an easy task. For example, sensitive questions such as (a) Are you an Alawite? (b) Are you gay? (c) Have you ever molested a child? (d) Have you underreported your income on your tax return? (e) Do you smoke marijuana? (f) Have you ever cheated on an exam? (g) Are you a Baath Party Member? (h) Have you ever been involved in a crime? and so on, are not likely to be responded to honestly by the respondents if asked using direct question survey methods.

The randomized response technique (RRT) was first introduced by Warner (1965) to deal with the problem of estimating the proportion of individuals possessing a sensitive characteristic in a finite population. This technique enables respondents to provide truthful information anonymously on sensitive or highly personal questions without endangering their privacy. His design involves the use of two questions or statements, each of which divides the population into two mutually exclusive and complementary groups (say) A and not-A. In order to estimate π, the proportion of the population in the sensitive group A, each person appearing in a simple random sampling with replacement (SRSWR) sample is given a suitable randomization device. Using the device, the respondent is presented with one of the two statements of the form:

(i) I belong to group A (ii) I do not belong to group A.

The statements (i) and (ii) have known relative frequencies P and (1 − P), respectively, in the randomization device. The respondent answers “yes” or “no” according to the statement randomly selected and to his or her actual status with respect to membership in A, without revealing to the interviewer which statement has been chosen. In this way, the respondent preserves his or her privacy. Note that the true probability of a “yes” answer, $θ_{W}^{*}$ , is given by:

θ_{W}^{*} = P π + (1 - P) (1 - π) .

Suppose n* persons in the sample answered “yes” and n – n* answered “no.” An unbiased estimator ${\hat{θ}}_{W}^{*}$ of the probability of “yes” answer, $θ_{W}^{*}$ , is given by n*/n. Then, using P and the observed proportion of “yes” answers, the maximum likelihood estimator (MLE) for π is obtained as:

{\hat{π}}_{W} = \frac{{\hat{θ}}_{W}^{*} - (1 - P)}{2 P - 1}, \quad P \neq 0.5 .

The estimator is unbiased if the respondents are persuaded to respond truthfully. Since $n {\hat{θ}}_{W}^{*}$ , the number of “yes” responses in the sample, follows a binomial distribution $B (n, θ_{W}^{*}),$ with parameters n and $θ_{W}^{*}$ , the variance of the estimator ${\hat{π}}_{W}$ is given by:

V ({\hat{π}}_{W}) = \frac{π (1 - π)}{n} + \frac{P (1 - P)}{n (2 P - 1)^{2}} .

Clearly, the first term in the variance expression above is the usual binomial variance associated with a direct question and truthful replies by all respondents, and the second term is the additional variance due to the randomization device.
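A minimal Python sketch of Warner’s estimator and its variance follows. It is our illustration, not the authors’ SAS or FORTRAN code, and all function names are hypothetical. It also checks numerically that the two-term decomposition above agrees with the binomial form $θ_{W}^{*} (1 - θ_{W}^{*}) / [n (2P - 1)^{2}]$.

```python
def warner_theta(pi: float, p: float) -> float:
    """True probability of a "yes" answer under Warner's design."""
    return p * pi + (1 - p) * (1 - pi)


def warner_estimate(yes_count: int, n: int, p: float) -> float:
    """Warner's estimator of pi from the observed number of "yes" answers."""
    if p == 0.5:
        raise ValueError("P must differ from 0.5")
    theta_hat = yes_count / n  # observed proportion of "yes" answers
    return (theta_hat - (1 - p)) / (2 * p - 1)


def warner_variance(pi: float, n: int, p: float) -> float:
    """Binomial variance plus the extra variance added by the device."""
    return pi * (1 - pi) / n + p * (1 - p) / (n * (2 * p - 1) ** 2)


if __name__ == "__main__":
    pi, n, p = 0.2, 100, 0.7
    theta = warner_theta(pi, p)
    # The same variance written in terms of theta*_W only
    binomial_form = theta * (1 - theta) / (n * (2 * p - 1) ** 2)
    print(warner_variance(pi, n, p), binomial_form)
    print(warner_estimate(38, n, p))  # 38 "yes" answers out of 100
```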

Warner (1965) noted that his estimator could assume values outside the closed interval [0, 1]. This drawback was first removed by Singh (1976) by proposing an estimator known as a shrinkage estimator, and later Raghavarao (1978) also made an attempt in this direction and developed a nonlinear biased estimator. Lee, Sedory, and Singh (2013) have pointed out that the Warner (1965) estimator remains an MLE for large sample sizes. They have provided minimum sample sizes that are required by the Warner (1965) estimator to attain values within the unit interval [0, 1]. They also reported that for small sample sizes the modification of the Warner (1965) estimator suggested by Singh (1976) remains valid.

Tracy and Osahan (1999) studied a new partial randomized response strategy in comparison with the Mangat and Singh (1990) model. Mangat and Singh (1994) provide a modified version of the Warner (1965) randomized design in which the respondent is free to answer “yes” or “no” either by using a randomized response device or without using it. Singh et al. (1994) have suggested a two-stage RRT to estimate the proportion of a population possessing a sensitive attribute. Odumade and Singh (2009) and Singh and Sedory (2011) have investigated a new estimator for estimating the proportion of a sensitive attribute by making use of two decks of cards. Many other models have been developed and compared with the Warner (1965) model by several researchers; see, for example, Mangat and Singh (1990), Mangat (1994), Gjestvang and Singh (2006), Barabesi, Diana, and Perri (2013, 2014), Diana, Giordan, and Perri (2013), and Abdelfatah and Mazloum (2014), among others. An extensive review of the literature on randomized response sampling can be found in the monographs by Chaudhuri (2011) and Chaudhuri and Christofides (2013). Nayak (1994) showed that most of the randomized response models proposed as of 1994 are as efficient as the Warner (1965) estimator at equal protection of respondents. We note that many models, such as Warner (1965), Mangat and Singh (1990), and Mangat (1994), are special cases of another ingenious model due to Kuk (1990). This motivated us to try to improve Kuk’s model from both the protection and efficiency points of view.

Kuk (1990) proposed a randomized response model that makes use of two randomization devices. The first randomization device, R₁ (say), has two possible outcomes. For example, it may be a deck of cards, each card bearing one of two possible questions: (i) Are you a member of group A? and (ii) Are you a member of group A^c?, with known probabilities θ₁ and (1 − θ₁), respectively. The second randomization device, R₂ (say), has the same two possible outcomes. It may also be a deck of cards, each card bearing one of two possible questions: (i) Are you a member of group A^c? and (ii) Are you a member of group A?, with known probabilities θ₂ and (1 – θ₂), respectively. Assume that an SRSWR sample of n respondents is selected from the population of interest. Each respondent selected in the sample is provided with both randomization devices, R₁ and R₂, along with instructions on how to use them. Each respondent is also instructed that, if he or she belongs to the sensitive group A, he or she should use the first randomization device R₁, while if he or she belongs to the nonsensitive group A^c, he or she should use the second randomization device R₂, without disclosing to which group, A or A^c, he or she belongs. That is, the interviewee chooses between the two randomization devices R₁ and R₂ without the interviewer observing the choice. Hence, the privacy of the respondent is maintained.

The true probability of a “yes” answer $θ_{K}^{*}$ is given by:

θ_{K}^{*} = π θ_{1} + (1 - π) θ_{2},

and the maximum likelihood and unbiased estimator of π is given by:

{\hat{π}}_{K} = \frac{{\hat{θ}}_{K}^{*} - θ_{2}}{θ_{1} - θ_{2}}, θ_{1} \neq θ_{2},

where $n {\hat{θ}}_{K}^{*}$ is the number of “yes” responses in the sample. The variance of the estimator ${\hat{π}}_{K}$ is given by:

V ({\hat{π}}_{K}) = \frac{θ_{K}^{*} (1 - θ_{K}^{*})}{n (θ_{1} - θ_{2})^{2}} .

Note that if θ₁ = P and θ₂ = (1 – P), then Kuk’s randomized response model reduces to the Warner (1965) model. If θ₁ = 1 and θ₂ = (1 – P), then Kuk’s model reduces to the Mangat (1994) model, and so on.
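Kuk’s estimator and variance can be coded the same way. As before, this is illustrative Python with hypothetical names. Setting θ₁ = P and θ₂ = 1 − P reproduces Warner’s variance, which is one way to check the reduction stated above.

```python
def kuk_theta(pi: float, theta1: float, theta2: float) -> float:
    """Probability of a "yes" answer in Kuk's model."""
    return pi * theta1 + (1 - pi) * theta2


def kuk_estimate(yes_count: int, n: int, theta1: float, theta2: float) -> float:
    """Kuk's estimator of pi; requires theta1 != theta2."""
    if theta1 == theta2:
        raise ValueError("theta1 must differ from theta2")
    return (yes_count / n - theta2) / (theta1 - theta2)


def kuk_variance(pi: float, n: int, theta1: float, theta2: float) -> float:
    """Exact variance of Kuk's estimator."""
    theta = kuk_theta(pi, theta1, theta2)
    return theta * (1 - theta) / (n * (theta1 - theta2) ** 2)


if __name__ == "__main__":
    pi, n, p = 0.2, 100, 0.7
    # Warner's variance, written out separately for comparison
    warner = pi * (1 - pi) / n + p * (1 - p) / (n * (2 * p - 1) ** 2)
    print(kuk_variance(pi, n, theta1=p, theta2=1 - p), warner)
```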

Adjusted Kuk’s Randomized Response Model

We consider selecting an SRSWR sample of n respondents from the population of interest. Each respondent in the sample is provided with two randomization devices, D₁ and D₂. The randomization device D₁ consists of a deck of cards, each card bearing one of two statements: (i) Use randomization device F₁ and (ii) Use randomization device $F_{1}^{c}$, with probabilities θ₁ and (1 – θ₁), respectively. The randomization device D₂ consists of a deck of cards, each card bearing one of two statements: (i) Use randomization device F₂ and (ii) Use randomization device $F_{2}^{c}$, with probabilities θ₂ and (1 – θ₂), respectively. Each respondent is instructed to use the first device D₁ if he or she belongs to the sensitive group A, and to use the second device D₂ if he or she belongs to the nonsensitive group A^c. The device F₁ named in the first outcome of device D₁ consists of two mutually exclusive statements: (i) Say “yes” and (ii) Say “no,” with probabilities P₁ and (1 – P₁), respectively. The device $F_{1}^{c}$ named in the second outcome of device D₁ also consists of two mutually exclusive statements, (i) Say “yes” and (ii) Say “no,” but with probabilities T₁ and (1 – T₁), respectively. Similarly, the device F₂ named in the first outcome of device D₂ consists of the two statements (i) Say “yes” and (ii) Say “no,” with probabilities P₂ and (1 – P₂), respectively. The device $F_{2}^{c}$ named in the second outcome of device D₂ consists of the same two statements, but with probabilities T₂ and (1 – T₂), respectively. A pictorial representation of the proposed forced randomized response model is given in Figure 1.

Figure 1.

Flow chart of the adjusted Kuk randomized response model.

In the adjusted Kuk model, the probability of a “yes” answer is given by:

\begin{aligned} θ_{c}^{*} = P (\text{Yes}) &= π [θ_{1} P_{1} + (1 - θ_{1}) T_{1}] + (1 - π) [θ_{2} P_{2} + (1 - θ_{2}) T_{2}] \\ &= π [θ_{1} (P_{1} - T_{1}) - θ_{2} (P_{2} - T_{2}) + (T_{1} - T_{2})] + θ_{2} P_{2} + (1 - θ_{2}) T_{2} . \end{aligned}
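The two lines of the expression for $θ_{c}^{*}$ can be checked numerically. The sketch below uses hypothetical function names and the parameter values used later in the simulation study. It evaluates both forms and also confirms a sanity check: with P₁ = P₂ = 1 and T₁ = T₂ = 0, the forced devices simply turn the D₁ and D₂ outcomes into “yes” and “no,” and the adjusted model returns Kuk’s $θ_{K}^{*}$.

```python
def theta_c_direct(pi, theta1, theta2, p1, p2, t1, t2):
    """P(yes) from the first line: mixture over group membership."""
    yes_in_a = theta1 * p1 + (1 - theta1) * t1    # respondent in A uses D1
    yes_in_ac = theta2 * p2 + (1 - theta2) * t2   # respondent in A^c uses D2
    return pi * yes_in_a + (1 - pi) * yes_in_ac


def theta_c_linear(pi, theta1, theta2, p1, p2, t1, t2):
    """P(yes) from the second line: intercept plus slope times pi."""
    slope = theta1 * (p1 - t1) - theta2 * (p2 - t2) + (t1 - t2)
    intercept = theta2 * p2 + (1 - theta2) * t2
    return pi * slope + intercept


if __name__ == "__main__":
    design = dict(theta1=0.7, theta2=0.2, p1=0.90, p2=0.20, t1=0.90, t2=0.35)
    for pi in (0.1, 0.5, 0.9):
        print(pi, theta_c_direct(pi, **design), theta_c_linear(pi, **design))
```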

Let X be the number of observed “yes” answers in the SRSWR sample of n respondents. Then $X \sim B (n, θ_{c}^{*})$; that is, X follows a binomial distribution with parameters n and $θ_{c}^{*}$. Hence, the probability of observing x “yes” answers out of n responses is given by:

P (X = x) = \binom{n}{x} (θ_{c}^{*})^{x} (1 - θ_{c}^{*})^{n - x} .

The log likelihood function is given by:

\log P (X = x) = \log \binom{n}{x} + x \log (θ_{c}^{*}) + (n - x) \log (1 - θ_{c}^{*}) .

On setting

\frac{\partial \log P (X = x)}{\partial π} = 0,

we get

{\hat{θ}}_{c}^{*} = \frac{x}{n},

as the MLE of the probability of a “yes” answer.
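The score equation leads to this result through the chain rule, because the log likelihood depends on π only through $θ_{c}^{*}$:

\frac{\partial \log P (X = x)}{\partial π} = \left[\frac{x}{θ_{c}^{*}} - \frac{n - x}{1 - θ_{c}^{*}}\right] \frac{\partial θ_{c}^{*}}{\partial π}, \qquad \frac{\partial θ_{c}^{*}}{\partial π} = θ_{1} (P_{1} - T_{1}) - θ_{2} (P_{2} - T_{2}) + (T_{1} - T_{2}) .

Provided the last factor is nonzero (the same condition under which the estimator of Theorem 1 is defined), the derivative vanishes only when $x (1 - θ_{c}^{*}) = (n - x) θ_{c}^{*}$, that is, when $θ_{c}^{*} = x/n$.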

By the method of moments, we have the following theorem.

Theorem 1: An unbiased estimator of π is given as follows:

{\hat{π}}_{C} = \frac{{\hat{θ}}_{c}^{*} - θ_{2} P_{2} - (1 - θ_{2}) T_{2}}{θ_{1} (P_{1} - T_{1}) - θ_{2} (P_{2} - T_{2}) + (T_{1} - T_{2})},

where ${\hat{θ}}_{c}^{*} = \frac{x}{n}$ is the observed proportion of “yes” answers in the sample.

Proof: Obvious since $E ({\hat{θ}}_{c}^{*}) = θ_{c}^{*}$ .

Theorem 2: The variance of the proposed estimator ${\hat{π}}_{c}$ is given as follows:

V ({\hat{π}}_{C}) = \frac{θ_{c}^{*} (1 - θ_{c}^{*})}{n [θ_{1} (P_{1} - T_{1}) - θ_{2} (P_{2} - T_{2}) + (T_{1} - T_{2})]^{2}} .

Proof: Note that $X \sim B (n, θ_{c}^{*})$, since the n trials are independent and identically distributed; thus, by the definition of variance, we have

\begin{aligned} V ({\hat{π}}_{c}) &= V \left[\frac{{\hat{θ}}_{c}^{*} - θ_{2} P_{2} - (1 - θ_{2}) T_{2}}{θ_{1} (P_{1} - T_{1}) - θ_{2} (P_{2} - T_{2}) + (T_{1} - T_{2})}\right] \\ &= \frac{V ({\hat{θ}}_{c}^{*})}{[θ_{1} (P_{1} - T_{1}) - θ_{2} (P_{2} - T_{2}) + (T_{1} - T_{2})]^{2}} \\ &= \frac{θ_{c}^{*} (1 - θ_{c}^{*})}{n [θ_{1} (P_{1} - T_{1}) - θ_{2} (P_{2} - T_{2}) + (T_{1} - T_{2})]^{2}} . \end{aligned}

Hence the theorem.
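Theorems 1 and 2 can be verified exactly for a small n by summing over all possible counts x of the binomial distribution. The Python sketch below uses hypothetical helper names and the design values later used in the simulation study. It returns the exact mean and variance of ${\hat{π}}_{C}$, which should equal π and the variance in Theorem 2.

```python
from math import comb

DESIGN = dict(theta1=0.7, theta2=0.2, p1=0.90, p2=0.20, t1=0.90, t2=0.35)


def slope_intercept(theta1, theta2, p1, p2, t1, t2):
    """theta_c* = intercept + slope * pi (second line of the P(yes) formula)."""
    slope = theta1 * (p1 - t1) - theta2 * (p2 - t2) + (t1 - t2)
    intercept = theta2 * p2 + (1 - theta2) * t2
    return slope, intercept


def adjusted_estimate(yes_count, n, **design):
    """Estimator of Theorem 1."""
    slope, intercept = slope_intercept(**design)
    if slope == 0:
        raise ValueError("the denominator of the estimator is zero")
    return (yes_count / n - intercept) / slope


def adjusted_variance(pi, n, **design):
    """Variance of Theorem 2."""
    slope, intercept = slope_intercept(**design)
    theta = intercept + slope * pi
    return theta * (1 - theta) / (n * slope ** 2)


def exact_moments(pi, n, **design):
    """Mean and variance of the estimator, summed over X ~ B(n, theta_c*)."""
    slope, intercept = slope_intercept(**design)
    theta = intercept + slope * pi
    mean = second = 0.0
    for x in range(n + 1):
        prob = comb(n, x) * theta ** x * (1 - theta) ** (n - x)
        est = adjusted_estimate(x, n, **design)
        mean += prob * est
        second += prob * est ** 2
    return mean, second - mean ** 2


if __name__ == "__main__":
    mean, var = exact_moments(0.3, 50, **DESIGN)
    print(mean, var, adjusted_variance(0.3, 50, **DESIGN))
```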

Theorem 3: An unbiased estimator of the variance of the proposed estimator ${\hat{π}}_{c}$ is given as follows:

\hat{V} ({\hat{π}}_{c}) = \frac{{\hat{θ}}_{c}^{*} (1 - {\hat{θ}}_{c}^{*})}{(n - 1) [θ_{1} (P_{1} - T_{1}) - θ_{2} (P_{2} - T_{2}) + (T_{1} - T_{2})]^{2}} .

Proof: Obvious because $E [{\hat{θ}}_{c}^{*} (1 - {\hat{θ}}_{c}^{*})] = (n - 1) V ({\hat{θ}}_{c}^{*})$ .
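The identity used in the proof of Theorem 3 can be checked in the same exact way. In the sketch below, the names are hypothetical. The variable `slope` stands for the denominator θ₁(P₁ − T₁) − θ₂(P₂ − T₂) + (T₁ − T₂), which equals 0.58 for the simulation-study design, and 0.378 is $θ_{c}^{*}$ at π = 0.1 for that design. The expectation of $\hat{V}$ taken over all binomial outcomes should match the true variance.

```python
from math import comb


def variance_estimate(yes_count, n, slope):
    """Unbiased variance estimator of Theorem 3."""
    theta_hat = yes_count / n
    return theta_hat * (1 - theta_hat) / ((n - 1) * slope ** 2)


def expected_variance_estimate(theta, n, slope):
    """E[V_hat] computed by summing over X ~ B(n, theta)."""
    return sum(
        comb(n, x) * theta ** x * (1 - theta) ** (n - x) * variance_estimate(x, n, slope)
        for x in range(n + 1)
    )


if __name__ == "__main__":
    theta, n, slope = 0.378, 40, 0.58
    true_variance = theta * (1 - theta) / (n * slope ** 2)
    print(expected_variance_estimate(theta, n, slope), true_variance)
```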

Measures of Respondents’ Privacy Protection

Following Lanke (1975), we measure the protection of respondents under Kuk’s model and under the proposed model as follows. The conditional probability in Kuk’s model that a respondent reporting “yes” also belongs to the sensitive group A is given by:

P_{K} (A | \text{Yes}) = \frac{P (A \cap \text{Yes})}{P (\text{Yes})} = \frac{θ_{1} π}{θ_{K}^{*}} .

The conditional probability in Kuk’s model that a respondent reporting “no” also belongs to the sensitive group A is given by:

P_{K} (A | \text{No}) = \frac{P (A \cap \text{No})}{P (\text{No})} = \frac{(1 - θ_{1}) π}{1 - θ_{K}^{*}} .

The least protection (or greatest incrimination) of a respondent reporting either “yes” or “no” while experiencing Kuk’s model is given by:

\text{PROTK} = \max [P_{K} (A | \text{Yes}), P_{K} (A | \text{No})] .

The conditional probability that a respondent reporting “yes” with the adjusted Kuk randomized response model also belongs to the sensitive group A is given as follows:

P_{c} (A | \text{Yes}) = \frac{P (A \cap \text{Yes})}{P (\text{Yes})} = \frac{[P_{1} θ_{1} + (1 - θ_{1}) T_{1}] π}{θ_{c}^{*}} .

The conditional probability that a respondent reporting “no” with the adjusted Kuk randomized response model also belongs to the sensitive group A is given by:

P_{c} (A | \text{No}) = \frac{P (A \cap \text{No})}{P (\text{No})} = \frac{[θ_{1} (1 - P_{1}) + (1 - θ_{1}) (1 - T_{1})] π}{1 - θ_{c}^{*}} .

The least protection of a respondent reporting either “yes” or “no” while experiencing the adjusted Kuk randomized response model is given by:

\text{PROTC} = \max [P_{c} (A | \text{Yes}), P_{c} (A | \text{No})] .
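For a numerical illustration of PROTK and PROTC, the following sketch (hypothetical function names) evaluates both conditional probabilities for each model and takes their maximum. At π = 0.1 with θ₁ = 0.7 and θ₂ = 0.2, Kuk’s model gives PROTK = 0.07/0.25 = 0.28. With the simulation-study design P₁ = 0.90, P₂ = 0.20, T₁ = 0.90, and T₂ = 0.35, the adjusted model gives PROTC = 0.09/0.378 ≈ 0.238.

```python
def prot_kuk(pi, theta1, theta2):
    """PROTK: larger of P_K(A | Yes) and P_K(A | No)."""
    theta = pi * theta1 + (1 - pi) * theta2
    a_given_yes = theta1 * pi / theta
    a_given_no = (1 - theta1) * pi / (1 - theta)
    return max(a_given_yes, a_given_no)


def prot_adjusted(pi, theta1, theta2, p1, p2, t1, t2):
    """PROTC: larger of P_c(A | Yes) and P_c(A | No)."""
    yes_in_a = theta1 * p1 + (1 - theta1) * t1
    yes_in_ac = theta2 * p2 + (1 - theta2) * t2
    theta = pi * yes_in_a + (1 - pi) * yes_in_ac
    a_given_yes = yes_in_a * pi / theta
    # theta1 (1 - P1) + (1 - theta1)(1 - T1) equals 1 - yes_in_a
    a_given_no = (1 - yes_in_a) * pi / (1 - theta)
    return max(a_given_yes, a_given_no)


if __name__ == "__main__":
    print(prot_kuk(0.1, 0.7, 0.2))
    print(prot_adjusted(0.1, 0.7, 0.2, p1=0.90, p2=0.20, t1=0.90, t2=0.35))
```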

Comparison of the Models

We define the percentage relative protection (RP) of the adjusted Kuk randomized response model with respect to Kuk’s randomized response model as:

\text{RP} = \frac{\text{PROTK}}{\text{PROTC}} \times 100\% .

We also define the percentage relative efficiency (RE) of the adjusted Kuk randomized response model with respect to the pioneer Kuk model as:

\text{RE} = \frac{V ({\hat{π}}_{K})}{V ({\hat{π}}_{c})} \times 100\% .
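Combining the protection measures with the two variance formulas gives RP and RE directly. The sketch below uses hypothetical names and the design values from the simulation study. Because n cancels in the ratio of the variances, RE does not depend on n. RP at π = 0.1 evaluates to 0.28/0.238095 × 100% = 117.60%, the value reported in Table 4.

```python
THETA1, THETA2 = 0.7, 0.2
DESIGN = dict(p1=0.90, p2=0.20, t1=0.90, t2=0.35)


def kuk_parts(pi):
    """(PROTK, V(pi_hat_K)) with n = 1."""
    theta = pi * THETA1 + (1 - pi) * THETA2
    prot = max(THETA1 * pi / theta, (1 - THETA1) * pi / (1 - theta))
    return prot, theta * (1 - theta) / (THETA1 - THETA2) ** 2


def adjusted_parts(pi, p1, p2, t1, t2):
    """(PROTC, V(pi_hat_c)) with n = 1."""
    yes_in_a = THETA1 * p1 + (1 - THETA1) * t1
    yes_in_ac = THETA2 * p2 + (1 - THETA2) * t2
    slope = yes_in_a - yes_in_ac  # equals the denominator in Theorem 1
    theta = pi * yes_in_a + (1 - pi) * yes_in_ac
    prot = max(yes_in_a * pi / theta, (1 - yes_in_a) * pi / (1 - theta))
    return prot, theta * (1 - theta) / slope ** 2


def rp_re(pi, n=1, **design):
    """Percentage RP and RE; both variances are divided by the same n."""
    prot_k, var_k = kuk_parts(pi)
    prot_c, var_c = adjusted_parts(pi, **design)
    return 100 * prot_k / prot_c, 100 * (var_k / n) / (var_c / n)


if __name__ == "__main__":
    for pi in (0.1, 0.5, 0.9):
        rp, re = rp_re(pi, **DESIGN)
        print(f"pi={pi}: RP={rp:.2f}%  RE={re:.2f}%")
```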

For given values of θ₁ and θ₂ in Kuk’s model, we did a grid search over different choices of values of P₁, P₂, T₁, and T₂ such that

\text{RE} > 100\% \quad \text{and} \quad \text{RP} \geq 100\% .

The Statistical Analysis System (SAS) codes used to produce the results shown in Figure 2 are given in Supplementary Appendix A. The search results, given in Figure 2, demonstrate that for the given choice of parameters in Kuk’s model (say θ₁ = 0.7 and θ₂ = 0.2), and for each value of π from 0.1 to 0.9 in steps of 0.1, there exists a choice of parameters P₁, P₂, T₁, and T₂ for which both criteria are met. One observation that can be made from Figure 2 is that if π is close to 0, then values of RP may be notably higher than 100% in many situations, but when π is close to 0.9, the value of RP does not differ much from 100%. This means that if a characteristic is very sensitive and rare in the population of interest, then it is feasible to construct a randomized response model that is both more protective and more efficient than its competitors; but if the prevalence of the sensitive characteristic in the population is high, then it is difficult to differentiate between the two randomized response models on the basis of privacy protection. To investigate these situations in more detail, we note that there were 1,935 cases in which both criteria are met. Tables 1 and 2 give the descriptive statistics of the percentage RP and percentage RE for each value of the parameter of interest, 0.1 ≤ π ≤ 0.9 in steps of 0.1.
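The grid search can be sketched in Python as follows. This is our illustration, not a translation of the authors’ SAS code. The grid for P₁, P₂, T₁, and T₂ (0.1 to 0.9 in steps of 0.1) and the floating-point tolerance are assumptions, so the number of qualifying combinations it reports need not match the frequencies in Tables 1 and 2.

```python
from itertools import product

THETA1, THETA2 = 0.7, 0.2
GRID = [i / 10 for i in range(1, 10)]  # assumed grid: 0.1, 0.2, ..., 0.9
TOL = 1e-9                             # assumed tolerance for floating-point ties


def kuk_parts(pi):
    """(PROTK, n * V(pi_hat_K)) for the fixed Kuk design."""
    theta = pi * THETA1 + (1 - pi) * THETA2
    prot = max(THETA1 * pi / theta, (1 - THETA1) * pi / (1 - theta))
    return prot, theta * (1 - theta) / (THETA1 - THETA2) ** 2


def adjusted_parts(pi, p1, p2, t1, t2):
    """(PROTC, n * V(pi_hat_c)), or None when the estimator is undefined."""
    yes_in_a = THETA1 * p1 + (1 - THETA1) * t1
    yes_in_ac = THETA2 * p2 + (1 - THETA2) * t2
    slope = yes_in_a - yes_in_ac
    if abs(slope) < 1e-12:
        return None
    theta = pi * yes_in_a + (1 - pi) * yes_in_ac
    prot = max(yes_in_a * pi / theta, (1 - yes_in_a) * pi / (1 - theta))
    return prot, theta * (1 - theta) / slope ** 2


def grid_search(pi):
    """((P1, P2, T1, T2), RP, RE) for every grid point with RE > 100 and RP >= 100."""
    prot_k, var_k = kuk_parts(pi)
    hits = []
    for p1, p2, t1, t2 in product(GRID, repeat=4):
        parts = adjusted_parts(pi, p1, p2, t1, t2)
        if parts is None:
            continue
        prot_c, var_c = parts
        rp = 100 * prot_k / prot_c
        re = 100 * var_k / var_c
        if re > 100 + TOL and rp >= 100 - TOL:
            hits.append(((p1, p2, t1, t2), rp, re))
    return hits


if __name__ == "__main__":
    for pi in GRID:
        print(pi, len(grid_search(pi)))
```

For example, at π = 0.1 the combination P₁ = 0.9, P₂ = 0.2, T₁ = 0.9, T₂ = 0.3 qualifies, with RP ≈ 106.4% and RE ≈ 128.1%.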

Figure 2.

Relative efficiency (RE) versus relative protection (RP) for 0.1 ≤ π ≤ 0.9 with different choices of the other parameters 0.1 ≤ P₁, P₂, T₁, T₂ ≤ 0.9, θ₁ = 0.7, and θ₂ = 0.2.

Table 1.

Descriptive Statistics of the Percentage Relative Protection (RP).

π	Frequency	M	SD	Min	Med	Max
0.1	103	106.36	4.80	100.00	106.00	117.60
0.2	123	105.99	4.74	100.00	104.85	117.19
0.3	154	105.57	4.35	100.00	104.94	116.00
0.4	181	104.97	3.90	100.00	104.13	116.67
0.5	204	104.49	3.66	100.00	103.70	114.07
0.6	244	103.73	2.86	100.00	102.91	111.38
0.7	274	102.84	2.21	100.00	102.26	108.61
0.8	308	101.98	1.56	100.00	101.63	106.30
0.9	344	101.05	0.80	100.00	100.68	103.38

Note: M = mean; SD = standard deviation; Min = minimum; Med = median; Max = maximum.

Table 2.

Descriptive Statistics of the Percentage Relative Efficiency (RE).

π	Frequency	M	SD	Min	Med	Max
0.1	103	113.86	10.86	101.53	112.55	140.26
0.2	123	114.85	11.66	100.08	111.13	144.90
0.3	154	116.21	12.96	100.07	112.64	150.48
0.4	181	118.01	14.53	100.00	114.62	157.45
0.5	204	120.95	16.42	101.42	116.28	166.46
0.6	244	122.66	19.39	100.16	117.55	178.66
0.7	274	127.85	23.27	100.14	122.40	196.15
0.8	308	134.31	29.34	101.04	125.42	223.40
0.9	344	143.20	40.07	100.37	129.40	271.86

Note: M = mean; SD = standard deviation; Min = minimum; Med = median; Max = maximum.

We discuss these tables as follows. For π = 0.1 and the given choice θ₁ = 0.7 and θ₂ = 0.2, there were 103 combinations of P₁, P₂, T₁, and T₂ with RE > 100% and RP ≥ 100%. Among these, the percentage RP of the proposed adjusted forced randomized response model ranged between 100.00% and 117.60%, while the percentage RE values ranged from 101.53% to 140.26%. For π = 0.2, there were 123 combinations of P₁, P₂, T₁, and T₂ with RE > 100% and RP ≥ 100%. Among these, the percentage RP of the proposed model ranged between 100.00% and 117.19%, while the percentage RE values ranged between 100.08% and 144.90%. For π = 0.1, the median RP among the 103 values was 106.00%, with a median RE of 112.55%. The mean RP was 106.36% with a standard deviation (SD) of 4.80%, and the mean RE was 113.86% with an SD of 10.86%. The rest of the results in Tables 1 and 2 can be interpreted in the same way. Note that the mean values of RP decrease as π increases (Table 1), whereas the mean values of RE increase as π increases (Table 2). Consequently, the percentage RE results are not symmetric around π = 0.5; in other words, the mean percentage RE value for π = 0.1 is not the same as the mean percentage RE value for π = 0.9.

In Figure 3, we provide four scatter plots showing the values of P₁, P₂, T₁, and T₂ for the fixed values θ₁ = 0.7 and θ₂ = 0.2, and for all values 0.1 ≤ π ≤ 0.9 in steps of 0.1. A close look at Figure 3 indicates that the value of P₁ remains either close to 0 or close to 1, and that there seems to be no restriction on the choice of values of P₂ and T₁, but the value of T₂, the value of T₁ should also not be close to 0.5 if π = 0.5.

Figure 3.

Different choices of the other parameters 0.1 ≤ P₁, P₂, T₁, T₂ ≤ 0.9 for θ₁ = 0.7, θ₂ = 0.2, and 0.1 ≤ π ≤ 0.9.

Thus, based on Figure 3, an investigator could choose parameters P₁, P₂, T₁, and T₂ such that at least equal protection and higher efficiency are expected from the adjusted Kuk randomized response model than from the original Kuk model. Similar results are observed for other practicable choices of the parameters θ₁ and θ₂, but we report results only for θ₁ = 0.7 and θ₂ = 0.2. Other results can easily be produced by executing the provided SAS codes.

In the Comparison of the Models section, we compared the theoretical variance expressions of the two estimators under at least equal protection. In the Simulation Study Close to a Real Survey section, we perform a simulation study that can be considered very close to a real survey data set collected using the adjusted Kuk model and the pioneer Kuk model.

Simulation Study Close to a Real Survey

In this section, we generate synthetic responses under the two models. The simulation procedure is as follows. For θ₁ = 0.7 and θ₂ = 0.2, we compute the probability of a “yes” answer, $θ_{K}^{*}$, in Kuk’s pioneer model for a given value of π from 0.1 to 0.9 in steps of 0.1. We then use the subroutine CALL RNBIN to generate the number of “yes” answers, say X_k, out of n trials with $θ_{K}^{*}$ as the probability of a “yes” answer, and we estimate the required parameter π. We set the number of iterations (NITR) equal to 10,000, and hence we obtain 10,000 estimates for each value of π. For a given value of n, we count the number of times the pioneer Kuk estimate ${\hat{π}}_{K}$ takes a value outside the interval [0, 1]. We then repeat the process with the adjusted Kuk model, replacing the probability of a “yes” answer with $θ_{c}^{*}$, where we use P₁ = 0.90, P₂ = 0.20, T₁ = 0.90, and T₂ = 0.35, and we use the adjusted Kuk estimator ${\hat{π}}_{c}$ to estimate each value of π. Again, we count the number of times the estimator ${\hat{π}}_{c}$ takes a value outside the interval [0, 1]. The computed PROPK and PROPC are the observed proportions of inadmissible estimates obtained using the pioneer Kuk model and the adjusted Kuk model, respectively. We also computed the percentage empirical RE, RE(Emp), of the proposed adjusted Kuk model with respect to the pioneer Kuk model as follows:

\text{RE(Emp)} = \frac{\sum_{j = 1}^{10{,}000} ({\hat{π}}_{K (j)} - π)^{2}}{\sum_{j = 1}^{10{,}000} ({\hat{π}}_{C (j)} - π)^{2}} \times 100\% .

In addition, we computed the percentage RP of the adjusted Kuk model with respect to the pioneer Kuk model for the adjusting parameters given previously, P₁ = 0.90, P₂ = 0.20, T₁ = 0.90, and T₂ = 0.35. The results are very encouraging in favor of the adjusted Kuk model. The FORTRAN codes used in the simulation study are provided in Supplementary Appendix A.
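A Python analogue of this simulation is sketched below. It is not a translation of the authors’ FORTRAN code. Instead of the RNBIN subroutine, it draws each binomial count by summing n Bernoulli trials from Python’s `random` module. The seed, the function names, and the reduced iteration count in the usage example are our assumptions. With these settings the empirical RE should fluctuate around the theoretical RE, which is about 140% at π = 0.5.

```python
import random

THETA1, THETA2 = 0.7, 0.2                  # Kuk design
P1, P2, T1, T2 = 0.90, 0.20, 0.90, 0.35    # adjusted design


def count_yes(rng, n, prob):
    """Number of "yes" answers among n independent respondents."""
    return sum(rng.random() < prob for _ in range(n))


def simulate(pi, n, n_iter=10_000, seed=2024):
    """Return (PROPK, PROPC, percentage RE(Emp)) for one (pi, n) cell."""
    rng = random.Random(seed)  # arbitrary seed, chosen for reproducibility
    theta_k = pi * THETA1 + (1 - pi) * THETA2
    slope_c = THETA1 * (P1 - T1) - THETA2 * (P2 - T2) + (T1 - T2)
    intercept_c = THETA2 * P2 + (1 - THETA2) * T2
    theta_c = intercept_c + slope_c * pi
    out_k = out_c = 0
    sse_k = sse_c = 0.0
    for _ in range(n_iter):
        pi_k = (count_yes(rng, n, theta_k) / n - THETA2) / (THETA1 - THETA2)
        pi_c = (count_yes(rng, n, theta_c) / n - intercept_c) / slope_c
        out_k += int(pi_k < 0 or pi_k > 1)   # inadmissible Kuk estimate
        out_c += int(pi_c < 0 or pi_c > 1)   # inadmissible adjusted estimate
        sse_k += (pi_k - pi) ** 2
        sse_c += (pi_c - pi) ** 2
    return out_k / n_iter, out_c / n_iter, 100 * sse_k / sse_c


if __name__ == "__main__":
    for n in (100, 500, 1000):
        print(n, simulate(0.1, n, n_iter=1_000))
```

Pure-Python Bernoulli sums are slow. The full design (10,000 iterations, n up to 1,000, nine values of π) is better run with a vectorized binomial generator.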

A summary of the raw results given in Supplementary Table A1 is provided in Tables 3 and 4 for different values of π.

Table 3.

Summary Statistics of the Percentage RE(Emp) Values.

π	Frequency	M	SD	Min	Med	Max
0.1	10	106.77	1.63	104.70	106.53	110.29
0.2	10	113.91	1.56	111.02	114.24	115.90
0.3	10	121.56	2.36	118.79	120.89	125.96
0.4	10	129.27	3.09	123.95	128.59	134.62
0.5	10	140.29	1.96	136.95	140.94	142.67
0.6	10	150.35	4.59	140.84	150.63	156.58
0.7	10	167.27	2.53	163.14	166.58	170.75
0.8	10	191.19	3.17	184.67	192.44	194.48
0.9	10	229.41	5.48	222.99	229.05	236.44

Note: M = mean; SD = standard deviation; Min = minimum; Med = median; Max = maximum.

Table 4.

The Percentage RP Values for Each Level of π.

π	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
RP	117.60	113.04	109.78	107.33	105.43	103.91	102.67	101.63	100.75

Note: RP = relative protection.

When the true variances are compared, the percentage RE does not depend on the sample size, because n cancels in the ratio $V ({\hat{π}}_{K}) / V ({\hat{π}}_{c})$; however, Table 3 shows that there can be slight variation in the percentage RE(Emp) values. For example, if π = 0.1, then the RE(Emp) value ranges from 104.70% to 110.29% as the sample size varies from 100 to 1,000 in steps of 100. The mean RE(Emp) value is 106.77% with an SD of 1.63%, and the median value is 106.53%. For π = 0.2, the minimum value of RE(Emp) was 111.02% and the maximum was 115.90%, with a median of 114.24% and a mean of 113.91% (SD 1.56%). The rest of the table can be interpreted in the same way. Table 4 reports the percentage RP value for each value of π. The RP is 117.60% if π is 0.1 and 113.04% if π is 0.2, and it decreases to 100.75% if π is 0.9.

For another view of the results of the simulation study, we present several scatter plots.

Figure 4 shows that if the sample size is small, say less than 500, then the pioneer Kuk estimate can take inadmissible values when π is close to either 0 or 1.

Figure 4.

Proportion (PROPK) of inadmissible estimates with Kuk’s pioneer model.
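The estimate is a linear function of X/n, where X is binomial. For that reason, the probability of an inadmissible estimate can also be computed exactly rather than simulated. The sketch below is our supplementary check, not part of the authors’ code. The parameter names `slope` and `intercept` are hypothetical and stand for the denominator and constant term of each estimator: 0.5 and 0.2 for Kuk’s model with θ₁ = 0.7 and θ₂ = 0.2, and 0.58 and 0.32 for the adjusted design used in the simulation.

```python
from math import exp, lgamma, log


def binom_pmf(x, n, prob):
    """Binomial probability computed on the log scale to avoid overflow."""
    log_coef = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return exp(log_coef + x * log(prob) + (n - x) * log(1 - prob))


def prob_inadmissible(pi, n, slope, intercept):
    """P(estimate outside [0, 1]) for the estimator (X/n - intercept) / slope,
    where X ~ B(n, intercept + slope * pi)."""
    theta = intercept + slope * pi
    total = 0.0
    for x in range(n + 1):
        est = (x / n - intercept) / slope
        if est < 0 or est > 1:
            total += binom_pmf(x, n, theta)
    return total


if __name__ == "__main__":
    kuk = dict(slope=0.7 - 0.2, intercept=0.2)    # theta1 = 0.7, theta2 = 0.2
    adjusted = dict(slope=0.58, intercept=0.32)   # P1 = T1 = 0.90, P2 = 0.20, T2 = 0.35
    for n in range(100, 1001, 100):
        print(n, prob_inadmissible(0.1, n, **kuk), prob_inadmissible(0.1, n, **adjusted))
```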

Figure 5 shows a similar trend for the adjusted Kuk model: if the value of π is close to 0 or 1, then the estimate from the adjusted Kuk model can also take inadmissible values for small sample sizes, say less than 500. However, according to Supplementary Table A1 (see Appendix A), the proposed estimator falls outside the admissible range much less often than Kuk’s estimator. Thus, a minimum sample size of 500 respondents is recommended if an investigator expects the value of π to be close to either 0 or 1. Figure 6 investigates the relationship between the RE(Emp) and RP values for different values of π. The vertical bars for each value of π from 0.1 to 0.9 indicate that the value of RP does not change but that the percentage RE(Emp) values change slightly as the sample size varies from 100 to 1,000, which is consistent with the results in Tables 3 and 4.

Figure 5.

Proportion (PROPC) of inadmissible estimates with Kuk’s adjusted model.

Figure 6.

RE(Emp) versus RP for each level of the sensitive attribute.

Figure 7 confirms that, because RE(Emp) is computed from 10,000 simulated samples, its value changes for each value of π as the sample size changes from 100 to 1,000; this variation is reflected in the SD values of RE(Emp) in Table 3. There is also no obvious pattern in the variation of RE(Emp) with n for any value of π.

Figure 7.

RE(Emp) versus sample size (n) for different values of π.

Figure 8 illustrates the joint effect of the prevalence π of the sensitive characteristic and the sample size n on the value of RE(Emp).

Figure 8.

RE(Emp) versus sample size (n) versus values of π.

We fit a linear model to predict the percentage RE(Emp) value from a good guess of the value of π, as follows:

\text{RE(Emp)} = 111 + 140 π, \quad R^{2} = 0.812.

Conclusion

In this article, we conclude that forced “yes” and forced “no” answers can be used to adjust Kuk’s randomized response model so as to increase both the respondents’ protection and the efficiency of the Kuk model, without increasing the cost of the survey.

Footnotes

Acknowledgment

The authors would like to thank Editor Prof. Christopher Winship, Editorial Assistant Genevieve Butler, and three learned referees for their very valuable and critical comments on the original version of this article.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Supplemental Material

The online appendices are available at .

References

Abdelfatah and Mazloum. 2014. “Improved Randomized Response Models Using Three Decks of Cards.” Model Assisted Statistics and Applications 9:63–72.

Barabesi, Diana, and P. F. Perri. 2013. “Design-based Distribution Function Estimation for Stigmatized Populations.” Metrika 76:919–35.

Barabesi, Diana, and P. F. Perri. 2014. “Horvitz-Thompson Estimation with Randomized Response and Non-response.” Model Assisted Statistics and Applications 9:3–10.

Chaudhuri. 2011. Randomized Response and Indirect Questioning Techniques in Surveys. Boca Raton, FL: CRC Press.

Chaudhuri and Christofides. 2013. Indirect Questioning in Sample Surveys. Berlin, Germany: Springer-Verlag.

Diana, Giordan, and P. F. Perri. 2013. “Randomized Response Surveys: A Note on Some Privacy Protection Measures.” Model Assisted Statistics and Applications 8:19–28.

Gjestvang, C. R., and Singh. 2006. “A New Randomized Response Model.” Journal of the Royal Statistical Society B 68:523–30.

Kuk, A. Y. C. 1990. “Asking Sensitive Questions Indirectly.” Biometrika 77:436–38.

Lanke. 1975. “On the Choice of the Unrelated Question in Simmons Version of Randomized Response.” Journal of the American Statistical Association 70:80–83.

Lee, Cheon-Sig, S. A. Sedory, and Singh. 2013. “Simulated Minimum Sample Sizes for Various Randomized Response Models.” Communications in Statistics–Simulation and Computation 42:771–89.

Mangat, N. S. 1994. “An Improved Randomized Response Strategy.” Journal of the Royal Statistical Society B 56:93–95.

Mangat, N. S., and Singh. 1990. “An Alternative Randomized Response Procedure.” Biometrika 77:439–42.

Mangat, N. S., and Singh. 1994. “Optional Randomized Response Model.” Journal of the Indian Statistical Association 32:71–75.

Nayak, T. K. 1994. “On Randomized Response Surveys for Estimating a Proportion.” Communications in Statistics–Theory and Methods 23:3303–21.

Odumade and Singh. 2009. “Efficient Use of Two Decks of Cards in Randomized Response Sampling.” Communications in Statistics–Theory and Methods 38:439–46.

Raghavarao. 1978. “On an Estimation Problem in Warner’s Randomized Response Technique.” Biometrics 34:87–90.

Singh. 1976. “A Note on Randomized Response Technique.” Proceedings of the Social Statistics Section, American Statistical Association 772.

Singh and S. A. Sedory. 2011. “Cramer-Rao Lower Bound of Variance in Randomized Response Sampling.” Sociological Methods and Research 40:536–46.

Singh, N. S. Mangat, and D. S. Tracy. 1994. “An Alternative Device for Randomized Responses.” Statistica 54:233–43.

Tracy, D. S., and S. S. Osahan. 1999. “A Partial Randomized Response Strategy.” Test 4:315–22.

Warner, S. L. 1965. “Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias.” Journal of the American Statistical Association 60:63–69.