Addressing Two-Phase Non-Ignorable Missing Data in Mean Estimation for Heterogeneous Population

Abstract

A high percentage of missing data presents a significant challenge to decision and estimation theory, and techniques such as deletion or simple mean imputation can jeopardize the reliability and validity of the results. The challenge becomes harder when the population is heterogeneous and the parameters of the auxiliary variable, which are used to improve the estimators, are unknown. In this paper, for a population with observed heterogeneity, we propose new exponential estimators of the mean of the study variable that use auxiliary variables when the survey variables suffer from non-ignorable missing data at both sampling phases. To verify the efficacy of the suggested estimators, a number of pertinent and promising estimators have been modified for this context; we give the mathematical expressions for the bias and mean square error $(M S q E)$ of these estimators along with the theoretical conditions for the comparative analysis. The performance of the suggested estimators has been assessed through a numerical analysis in R on both real and simulated data sets (symmetric and asymmetric), which validates the theoretical results and their significance.

Keywords

bias; mean; mean square error; simulation; two-phase stratified sampling; non-response

1 Introduction

1.1 Literature Review and Research Gap

In estimation theory, non-response or missing data can distort the results when the missingness is related to the variable of interest. Survey sampling plays a crucial role in diverse areas of data analysis, and so does the treatment of missing data. Survey results based on incomplete data may be sensitive to the non-respondents, and ignoring them may produce biased results. For instance, Allehoff et al. (1983) reported in their research on mental disorders that the non-participation rate among 8-year-old children was 38.5% and was correlated with lower IQ values and scholastic values. Apart from cross-sectional and longitudinal data-collection methods, internet-based data collection has become popular owing to the widespread availability of the internet and its advantages in cost effectiveness, time effectiveness, real-time collection, and accessibility. Despite these advantages, such methods may suffer from missing data when a survey unit fails to respond to the survey invitation, owing to a lack of willingness to participate, or owing to attrition when participants lose interest over time.

In survey sampling, the problem of missing data due to non-response gained recognition after Hansen and Hurwitz (1946) introduced an unbiased estimator of the population mean based on sub-sampling the non-respondents. To estimate the population mean $\bar{Y}$ of the study variable Y when the population mean $\bar{X}$ of the auxiliary variable X is unknown and non-response is present, Rao (1983), Khare and Srivastava (1995, 1997), and Tabasum and Khan (2004) proposed both conventional and alternative ratio, product, and regression estimators. Several other authors, such as Särndal and Swensson (1987), Tripathi and Khare (1997), Okafor and Lee (2000), Singh (2003), Singh et al. (2006), Singh and Kumar (2008), Singh et al. (2009), Singh and Kumar (2010), Khare and Kumar (2011), Shabbir and Khan (2013), Dykes et al. (2015), Chaudhary and Kumar (2016), Azeem and Hanif (2017), Yaqub et al. (2017), Singh et al. (2018), Bhushan and Pandey (2019), Kumar and Zeeshan (2019), Guha and Chandra (2021), Sanaullah et al. (2022), Sinha and Khanna (2023), and Unal and Kadilar (2022, 2023), have done promising research on estimating unknown parameters with the help of the auxiliary mean, under different cases depending on whether the auxiliary mean is known or unknown and on the presence of non-response. Shabbir et al. (2019) also considered this problem and proposed a generalized class of estimators under two-phase stratified sampling for non-response, along with several adapted estimators. According to the methodology employed in these studies, when the population mean $\bar{X}$ of the auxiliary variate X is unknown, a large first phase sample of size $n^{'}$ , drawn from a population of N units by simple random sampling without replacement $(S R S_{(w o r)})$ , is used to estimate $\bar{X}$ . The variate Y under study is then measured on a smaller second phase sample of size n (i.e., $n < n^{'}$ ) drawn from the $n^{'}$ units by $S R S_{(w o r)}$ . If some units of the second phase sample do not respond, a subsample of the non-respondents is taken and contacted again.

Nevertheless, since the second phase sample is selected for the study variable, there is a possibility of non-response for this variable, whereas the mean of the auxiliary variable, $\bar{X}$ , is already estimated by ${\bar{x}}^{'}$ , which is based on $n^{'}$ units. Accordingly, most authors in the literature have assumed that, when the auxiliary mean is unknown, the auxiliary variable is free from non-response in the first phase and suffers from it only in the second phase. Chaudhary and Kumar (2016) took into account non-response during both phases of sampling and suggested conventional ratio, product, and regression estimators of the mean of the study variable $(Y)$ when the population mean of an auxiliary variable $(X)$ is unknown.

Despite the abundance of literature on non-response, little has been done to account for the presence of non-respondents during both phases of survey sampling when the auxiliary mean is unknown. There is therefore wide scope for developing estimators of population parameters in this setting. The main objective of this article is to estimate the population mean of the study variable with exponential type estimators suited to a heterogeneous population, addressing the non-response that occurs in both sampling phases when the auxiliary mean is unknown.

1.2 Methodology and Notations

Our proposal is driven by the desire to make the best use of the available auxiliary information. Rather than estimating the unknown population mean of the auxiliary variable from complete first phase information and assuming that the auxiliary variable is subject to non-response only in the second phase sample, we consider missing data in both phases of sampling, which makes the approach more practical and more widely applicable.

In addition, it is critical to consider the population's heterogeneity, since it sheds light on the complexity, diversity, and variability among the subgroups that make up the population. To handle a population that shows observed heterogeneity among its units, the current study employs a stratified two-phase sampling scheme $(S T P S)$ in the presence of non-response.

Consider the study variable $(Y)$ and the auxiliary variables $(X, Z)$ , all of which suffer from non-response. Assume that the population $U_{N}$ is composed of $L$ exhaustive and mutually heterogeneous strata $(h = 1, 2, 3, \dots, L)$ , with the $h^{t h}$ stratum made up of $N_{h}$ units so that $\sum_{h = 1}^{L} N_{h} = N$ . The $h^{t h}$ stratum is assumed to consist of two mutually exclusive groups, respondents $(R)$ and non-respondents $(N R)$ , with $N_{h (1)}$ and $N_{h (2)}$ units respectively. The unknown auxiliary mean is estimated through a two-phase sampling technique. The first phase and second phase simple random samples $(S R S_{(w o r)})$ of sizes $n_{h}^{'}$ and $n_{h}$ respectively, taken from the $h^{t h}$ stratum, are denoted by $S_{n_{h}}^{'}$ and $S_{n_{h}}$ . In addition, the values of the survey variables $(Y, X,$ and $Z)$ for the $i^{t h}$ population unit of the $h^{t h}$ stratum are denoted by $(Y_{i}, X_{i}$ and $Z_{i})$ respectively.

In accordance with Hansen and Hurwitz (1946), to compute the auxiliary mean, we choose a larger sample $S_{n_{h}}^{'}$ from the $h^{t h}$ stratum so that $\sum_{h = 1}^{L} n_{h}^{'} = n^{'}$ . Further, it is assumed that $n_{h (1)}^{'}$ units out of $n_{h}^{'}$ units are respondents and $n_{h (2)}^{'}$ units are non-respondents. For a face-to-face interview or follow-up call, we now extract a sub-sample of size $r_{h}^{'} (= \frac{n_{h (2)}^{'}}{k^{'}})$ from the $n_{h (2)}^{'}$ units. At the second phase, a sub-sample of size $r_{h} (= \frac{n_{h (2)}}{k})$ is drawn in the same way from the $n_{h (2)}$ non-respondents among the $n_{h}$ units. Here, $n_{h} < n_{h}^{'}$ , $k > 1$ , and $k^{'} > 1$ . The two-phase sampling procedure for a stratified population in the event of non-response is shown in Figure 1.

Figure 1. Flow-chart for Two-phase Sampling Technique Under $h^{t h}$ Stratum.

As a result, in the first phase, the auxiliary mean estimates of $(\bar{X}$ and $\bar{Z})$ are presented as

{\bar{x}}_{s t}^{*'} = \sum_{h = 1}^{L} P_{h} {\bar{x}}_{h}^{*'} and {\bar{z}}_{s t}^{*'} = \sum_{h = 1}^{L} P_{h} {\bar{z}}_{h}^{*'} .

Here, $P_{h} = \frac{N_{h}}{N}$ , ${\bar{x}}_{h}^{*'} (= \frac{n_{h (1)}^{'} {\bar{x}}_{h (1)}^{*'} + n_{h (2)}^{'} {\bar{x}}_{r}^{*'}}{n_{h}^{'}})$ and ${\bar{z}}_{h}^{*'} (= \frac{n_{h (1)}^{'} {\bar{z}}_{h (1)}^{*'} + n_{h (2)}^{'} {\bar{z}}_{r}^{*'}}{n_{h}^{'}})$ are Hansen-Hurwitz unbiased estimators of mean while $({\bar{x}}_{h (1)}^{*'}, {\bar{z}}_{h (1)}^{*'})$ and $({\bar{x}}_{r}^{*'}, {\bar{z}}_{r}^{*'})$ are sample means based on $n_{h (1)}^{'}$ units and $r_{h}^{'}$ units respectively.

Similarly, following the Hansen and Hurwitz (1946) technique at the second phase, the mean estimates for the study variable and the auxiliary variables based on $n$ units are given as:

{\bar{y}}_{s t}^{*} = \sum_{h = 1}^{L} P_{h} {\bar{y}}_{h}^{*}, {\bar{x}}_{s t}^{*} = \sum_{h = 1}^{L} P_{h} {\bar{x}}_{h}^{*}, {\bar{z}}_{s t}^{*} = \sum_{h = 1}^{L} P_{h} {\bar{z}}_{h}^{*} .

Here, ${\bar{y}}_{h}^{*} = \frac{n_{h (1)} {\bar{y}}_{h (1)}^{*} + n_{h (2)} {\bar{y}}_{r}^{*}}{n_{h}}, {\bar{x}}_{h}^{*} = \frac{n_{h (1)} {\bar{x}}_{h (1)}^{*} + n_{h (2)} {\bar{x}}_{r}^{*}}{n_{h}}, {\bar{z}}_{h}^{*} = \frac{n_{h (1)} {\bar{z}}_{h (1)}^{*} + n_{h (2)} {\bar{z}}_{r}^{*}}{n_{h}};$ where, $({\bar{y}}_{h (1)}^{*}, {\bar{x}}_{h (1)}^{*}, {\bar{z}}_{h (1)}^{*})$ and $({\bar{y}}_{r}^{*}, {\bar{x}}_{r}^{*}, {\bar{z}}_{r}^{*})$ are sample means based on $n_{h (1)}$ units and $r_{h}$ units respectively.
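The Hansen-Hurwitz construction above can be illustrated with a minimal Python sketch. The function names `hansen_hurwitz_mean` and `stratified_mean` and the toy numbers are hypothetical and are not taken from the paper; the sketch only mirrors the formulas ${\bar{y}}_{h}^{*} = (n_{h (1)} {\bar{y}}_{h (1)}^{*} + n_{h (2)} {\bar{y}}_{r}^{*}) / n_{h}$ and ${\bar{y}}_{s t}^{*} = \sum_{h} P_{h} {\bar{y}}_{h}^{*}$ .

```python
from statistics import fmean

def hansen_hurwitz_mean(respondent_values, subsample_values, n_nonrespondents):
    """Hansen-Hurwitz mean for one stratum (illustrative helper, not from the paper).

    respondent_values : values observed on the n_h(1) responding units
    subsample_values  : values from the r_h re-contacted non-respondents
    n_nonrespondents  : n_h(2), number of non-respondents in the stratum sample
    """
    n1 = len(respondent_values)
    n2 = n_nonrespondents
    mean_resp = fmean(respondent_values) if n1 else 0.0
    mean_sub = fmean(subsample_values) if n2 else 0.0
    # Respondent mean weighted by n_h(1), sub-sample mean weighted by n_h(2).
    return (n1 * mean_resp + n2 * mean_sub) / (n1 + n2)

def stratified_mean(stratum_means, stratum_sizes):
    """Weighted mean sum_h P_h * mean_h with P_h = N_h / N."""
    total = sum(stratum_sizes)
    return sum(N_h / total * m for m, N_h in zip(stratum_means, stratum_sizes))

# Toy data (assumed for illustration): two strata with N_1 = 60 and N_2 = 40.
y1 = hansen_hurwitz_mean([10.0, 12.0, 14.0], [20.0], n_nonrespondents=2)
y2 = hansen_hurwitz_mean([30.0, 34.0], [40.0, 44.0], n_nonrespondents=4)
y_st = stratified_mean([y1, y2], stratum_sizes=[60, 40])
```

The same two helpers apply unchanged to the first phase means of $X$ and $Z$ , with $n_{h (1)}^{'}$ , $n_{h (2)}^{'}$ and $r_{h}^{'}$ in place of the second phase counts.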

To determine the variances and covariances of the variables under consideration, let us consider the following large sample approximations:

e_{0} = \frac{{\bar{y}}_{s t}^{*}}{\bar{Y}} - 1, e_{1} = \frac{{\bar{x}}_{s t}^{*}}{\bar{X}} - 1, e_{2} = \frac{{\bar{z}}_{s t}^{*}}{\bar{Z}} - 1, e_{1}^{'} = \frac{{\bar{x}}_{s t}^{*'}}{\bar{X}} - 1, e_{2}^{'} = \frac{{\bar{z}}_{s t}^{*'}}{\bar{Z}} - 1.

Such that, $E (e_{i}) = 0; \forall i = 0, 1, 2, E (e_{i}^{'}) = 0, \forall i = 1, 2.$

\begin{aligned} ω_{0} & = E (e_{0}^{2}) = \sum_{h = 1}^{L} (λ_{h} τ_{Y_{h}} + θ_{h} τ_{Y_{h} (2)}), ω_{1} = E (e_{1}^{2}) = \sum_{h = 1}^{L} (λ_{h} τ_{X_{h}} + θ_{h} τ_{X_{h} (2)}), \\ ω_{2} & = E (e_{2}^{2}) = \sum_{h = 1}^{L} (λ_{h} τ_{Z_{h}} + θ_{h} τ_{Z_{h} (2)}), ω_{1}^{'} = E (e_{1}^{' 2}) = \sum_{h = 1}^{L} (λ_{h}^{'} τ_{X_{h}} + θ_{h}^{'} τ_{X_{h} (2)}), \\ ω_{2}^{'} & = E (e_{2}^{' 2}) = \sum_{h = 1}^{L} (λ_{h}^{'} τ_{Z_{h}} + θ_{h}^{'} τ_{Z_{h} (2)}), ω_{11}^{'} = E (e_{1} e_{1}^{'}) = \sum_{h = 1}^{L} (λ_{h}^{'} τ_{X_{h}} + θ_{h}^{'} τ_{X_{h} (2)}), \\ ω_{22}^{'} & = E (e_{2} e_{2}^{'}) = \sum_{h = 1}^{L} (λ_{h}^{'} τ_{Z_{h}} + θ_{h}^{'} τ_{Z_{h} (2)}), ω_{01} = E (e_{0} e_{1}) = \sum_{h = 1}^{L} (λ_{h} τ_{Y X_{h}} + θ_{h} τ_{Y X_{h} (2)}), \\ ω_{01}^{'} & = E (e_{0} e_{1}^{'}) = \sum_{h = 1}^{L} (λ_{h}^{'} τ_{Y X_{h}} + θ_{h}^{'} τ_{Y X_{h} (2)}), ω_{02} = E (e_{0} e_{2}) = \sum_{h = 1}^{L} (λ_{h} τ_{Y Z_{h}} + θ_{h} τ_{Y Z_{h} (2)}), \\ ω_{02}^{'} & = E (e_{0} e_{2}^{'}) = \sum_{h = 1}^{L} (λ_{h}^{'} τ_{Y Z_{h}} + θ_{h}^{'} τ_{Y Z_{h} (2)}), ω_{12} = E (e_{1} e_{2}) = \sum_{h = 1}^{L} (λ_{h} τ_{X Z_{h}} + θ_{h} τ_{X Z_{h} (2)}), \\ ω_{12}^{'} & = E (e_{1}^{'} e_{2}) = E (e_{1} e_{2}^{'}) = E (e_{1}^{'} e_{2}^{'}) = \sum_{h = 1}^{L} (λ_{h}^{'} τ_{X Z_{h}} + θ_{h}^{'} τ_{X Z_{h} (2)}) \end{aligned}

Here, $λ_{h} = \frac{1}{n_{h}} - \frac{1}{N_{h}}, λ_{h}^{'} = \frac{1}{n_{h}^{'}} - \frac{1}{N_{h}}, r_{h} = \frac{n_{h (2)}}{k}, r_{h}^{'} = \frac{n_{h (2)}^{'}}{k^{'}}, θ_{h} = \frac{k - 1}{n_{h}}, θ_{h}^{'} = \frac{k^{'} - 1}{n_{h}^{'}},$

\begin{aligned} W_{h (1)} & = \frac{N_{h (1)}}{N_{h}}, W_{h (2)} = \frac{N_{h (2)}}{N_{h}}, Π_{h} = θ_{h} - θ_{h}^{'}, △_{h} = λ_{h} - λ_{h}^{'}, \\ S_{V_{h}}^{2} & = \frac{1}{N_{h} - 1} \sum_{i = 1}^{N_{h}} (V_{i} - {\bar{V}}_{h})^{2}, S_{V_{h} (2)}^{2} = \frac{1}{N_{h (2)} - 1} \sum_{i = 1}^{N_{h (2)}} (V_{i} - {\bar{V}}_{h (2)})^{2}; for (V = Y, X, Z) \\ ρ_{V_{h} V_{h}^{'} (2)} & = \frac{S_{V_{h} V_{h}^{'} (2)}}{S_{V_{h} (2)} S_{V_{h}^{'} (2)}}, ρ_{V_{h} V_{h}^{'}} = \frac{S_{V_{h} V_{h}^{'}}}{S_{V_{h}} S_{V_{h}^{'}}}; for (V \neq V^{'} = Y, X, Z) . \\ τ_{V_{h}} & = P_{h}^{2} S_{V_{h}}^{2} / {\bar{V}}^{2}, τ_{V_{h} (2)} = P_{h}^{2} W_{h (2)} S_{V_{h} (2)}^{2} / {\bar{V}}^{2}, for (V = Y, X, Z) \\ τ_{V {V^{'}}_{h}} & = P_{h}^{2} ρ_{V_{h} V_{h}^{'}} S_{V_{h}} S_{V_{h}^{'}} / \bar{V} {\bar{V}}^{'}, for (V \neq V^{'} = Y, X, Z) \\ τ_{V {V^{'}}_{h} (2)} & = P_{h}^{2} W_{h (2)} ρ_{V_{h} {V^{'}}_{h} (2)} S_{V_{h} (2)} S_{{V^{'}}_{h} (2)} / \bar{V} {\bar{V}}^{'}, for (V \neq V^{'} = Y, X, Z) \\ A_{y} & = {\bar{Y}}^{2} ω_{0}, A_{x} = {\bar{X}}^{2} (ω_{1} - ω_{1}^{'}), A_{z} = {\bar{Z}}^{2} ω_{2}, D_{z} = {\bar{Z}}^{2} (ω_{2} - ω_{2}^{'}), B_{y z} = \bar{Y} \bar{Z} ω_{02} \\ B_{x z} & = - \bar{X} \bar{Z} (ω_{12} - ω_{12}^{'}), B_{y x} = - \bar{Y} \bar{X} (ω_{01} - ω_{01}^{'}), C_{y z} = \bar{Y} \bar{Z} (ω_{02} - ω_{02}^{'}) . \end{aligned}
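As a worked illustration of these definitions, the following Python sketch evaluates $ω_{0}$ from stratum-level quantities. The `Stratum` container, the function name `omega_0`, and all numerical values are assumptions made for this example only; the formulas for $λ_{h}$ , $θ_{h}$ , $τ_{Y_{h}}$ and $τ_{Y_{h} (2)}$ are the ones defined above.

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    N: int        # N_h, stratum population size
    n: int        # n_h, second phase sample size
    W2: float     # W_h(2) = N_h(2) / N_h, share of non-respondents
    S2: float     # S^2_{Y_h}, variance of Y in the stratum
    S2_nr: float  # S^2_{Y_h(2)}, variance of Y among non-respondents

def omega_0(strata, k, y_bar):
    """omega_0 = sum_h (lambda_h * tau_{Y_h} + theta_h * tau_{Y_h(2)})."""
    N_total = sum(s.N for s in strata)
    total = 0.0
    for s in strata:
        P = s.N / N_total                 # P_h = N_h / N
        lam = 1 / s.n - 1 / s.N           # lambda_h
        theta = (k - 1) / s.n             # theta_h
        tau = P**2 * s.S2 / y_bar**2      # tau_{Y_h}
        tau_nr = P**2 * s.W2 * s.S2_nr / y_bar**2  # tau_{Y_h(2)}
        total += lam * tau + theta * tau_nr
    return total

# Hypothetical two-stratum population with k = 2 and Y-bar = 10.
strata = [
    Stratum(N=100, n=20, W2=0.25, S2=16.0, S2_nr=9.0),
    Stratum(N=100, n=25, W2=0.20, S2=25.0, S2_nr=16.0),
]
w0 = omega_0(strata, k=2, y_bar=10.0)
A_y = 10.0**2 * w0  # A_y = Y-bar^2 * omega_0
```

With these toy inputs, $A_{y} = {\bar{Y}}^{2} ω_{0}$ reproduces $\sum_{h} P_{h}^{2} (λ_{h} S_{Y_{h}}^{2} + θ_{h} W_{h (2)} S_{Y_{h} (2)}^{2})$ , the variance of ${\bar{y}}_{s t}^{*}$ . The other $ω$ terms follow the same pattern with the corresponding $τ$ and, for primed quantities, $λ_{h}^{'}$ and $θ_{h}^{'}$ .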

2 Some Existing and Modified Estimators

This section presents improved estimators of the unknown mean of the study variable for a heterogeneous population subject to non-response. To validate the performance of the suggested estimators, we examine a few well-known existing estimators under $S T P S$ and modify some others for $S T P S$ , as follows:

1. Ratio and product estimators suggested by Chaudhary et al. (2020)	$\begin{array}{rl} t_{1} & = {\bar{y}}_{s t}^{*} \frac{{\bar{x}}_{s t}^{*'}}{{\bar{x}}_{s t}^{*}}, \\ t_{2} & = {\bar{y}}_{s t}^{*} \frac{{\bar{x}}_{s t}^{*}}{{\bar{x}}_{s t}^{*'}} . \end{array}$
2. Modified regression estimator	$t_{3} = {\bar{y}}_{s t}^{*} + b^{*} ({\bar{x}}_{s t}^{*'} - {\bar{x}}_{s t}^{*})$ , where $b^{*} = \frac{s_{x y}^{*}}{s_{x}^{2 *}}$ .
3. Family of combined-type estimators proposed by Chaudhary et al. (2020)	$t_{4} = {\bar{y}}_{s t}^{*} {(\frac{a {\bar{x}}_{s t}^{*'} + b}{α_{c} (a {\bar{x}}_{s t}^{*} + b) + (1 - α_{c}) (a {\bar{x}}_{s t}^{*'} + b)})}^{g_{c}}$ ; $α_{c}$ is an optimizing constant.
4. Modified conventional generalized ratio estimator	$t_{5} = {\bar{y}}_{s t}^{*} {(\frac{{\bar{x}}_{s t}^{*'}}{{\bar{x}}_{s t}^{*}})}^{g}$ ; $g$ is an optimizing constant.
5. Modified Searls (1964) type estimator	$t_{6} = α_{S} {\bar{y}}_{s t}^{*}$ ; $α_{S}$ is an optimizing constant.
6. Modified Searls (1964) type ratio estimator	$t_{7} = α_{S R} {\bar{y}}_{s t}^{*} \frac{{\bar{x}}_{s t}^{*'}}{{\bar{x}}_{s t}^{*}}$ ; $α_{S R}$ is an optimizing constant.
7. Modified Bahl and Tuteja (1991) exponential type ratio estimator	$t_{8} = {\bar{y}}_{s t}^{*} e x p (\frac{{\bar{x}}_{s t}^{*'} - {\bar{x}}_{s t}^{*}}{{\bar{x}}_{s t}^{*'} + {\bar{x}}_{s t}^{*}}) .$
8. Modified Kumar and Bhougal (2011) exponential estimator	$t_{9} = {\bar{y}}_{s t}^{*} e x p [α_{K B} (\frac{{\bar{x}}_{s t}^{*'} - {\bar{x}}_{s t}^{*}}{{\bar{x}}_{s t}^{*'} + {\bar{x}}_{s t}^{*}})]$ ; $α_{K B}$ is an optimizing constant.
9. Modified Rao (1990) difference type estimator	$t_{10} = {\bar{y}}_{s t}^{*} + α_{D} ({\bar{x}}_{s t}^{*'} - {\bar{x}}_{s t}^{*}) .$
10. Modified ratio-ratio type estimator	$t_{11}^{(1)} = {\bar{y}}_{s t}^{*} e x p (\frac{{\bar{x}}_{s t}^{*'} - {\bar{x}}_{s t}^{*}}{{\bar{x}}_{s t}^{*'} + {\bar{x}}_{s t}^{*}}) (\frac{\bar{Z}}{{\bar{z}}_{s t}^{*}})$ , if $\bar{Z}$ is known.
	$t_{11}^{(2)} = {\bar{y}}_{s t}^{*} e x p (\frac{{\bar{x}}_{s t}^{*'} - {\bar{x}}_{s t}^{*}}{{\bar{x}}_{s t}^{*'} + {\bar{x}}_{s t}^{*}}) (\frac{{\bar{z}}_{s t}^{*'}}{{\bar{z}}_{s t}^{*}})$ , if $\bar{Z}$ is unknown.
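To make the structure of the listed estimators concrete, the sketch below evaluates the ratio, product, and exponential ratio forms from given stratified Hansen-Hurwitz means. The function names and the input values are hypothetical; `x_st_first` stands for the first phase mean and `x_st` for the second phase mean of $X$ , and `alpha` plays the role of the optimizing constant in the exponential form.

```python
import math

def ratio_estimator(y_st, x_st, x_st_first):
    """t1: y* scaled by (first phase x mean) / (second phase x mean)."""
    return y_st * x_st_first / x_st

def product_estimator(y_st, x_st, x_st_first):
    """t2: y* scaled by (second phase x mean) / (first phase x mean)."""
    return y_st * x_st / x_st_first

def exp_ratio_estimator(y_st, x_st, x_st_first, alpha=1.0):
    """t8 when alpha = 1; t9 for a general optimizing constant alpha."""
    return y_st * math.exp(alpha * (x_st_first - x_st) / (x_st_first + x_st))

# Hypothetical stratified means (not from the paper's data).
y_st, x_st, x_st_first = 50.0, 20.0, 22.0
t1 = ratio_estimator(y_st, x_st, x_st_first)
t2 = product_estimator(y_st, x_st, x_st_first)
t8 = exp_ratio_estimator(y_st, x_st, x_st_first)
```

When the two phase means of $X$ coincide, all three functions return ${\bar{y}}_{s t}^{*}$ unchanged, which is the expected behaviour of ratio-type adjustments.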

The mean square errors $(M S q E)$ or minimum mean square errors $(M . M S q E)$ of the respective existing and modified estimators are summarized as follows:

\begin{aligned} M S q E (t_{1}) & = \sum_{h = 1}^{L} P_{h}^{2} {λ_{h}^{'} S_{Y_{h}}^{2} + △_{h} (S_{Y_{h}}^{2} + R_{1}^{2} S_{X_{h}}^{2} - 2 R_{1} ρ_{Y_{h} X_{h}} S_{Y_{h}} S_{X_{h}}) + θ_{h}^{'} W_{h (2)} S_{Y_{h} (2)}^{2} \\ + Π_{h} W_{h (2)} (S_{Y_{h} (2)}^{2} + R_{1}^{2} S_{X_{h} (2)}^{2} - 2 R_{1} ρ_{Y_{h} X_{h} (2)} S_{Y_{h} (2)} S_{X_{h} (2)})}, \end{aligned}

(13)

\begin{aligned} M S q E (t_{2}) & = \sum_{h = 1}^{L} P_{h}^{2} {λ_{h}^{'} S_{Y_{h}}^{2} + △_{h} (S_{Y_{h}}^{2} + R_{1}^{2} S_{X_{h}}^{2} + 2 R_{1} ρ_{Y_{h} X_{h}} S_{Y_{h}} S_{X_{h}}) + θ_{h}^{'} W_{h (2)} S_{Y_{h} (2)}^{2} \\ + Π_{h} W_{h (2)} (S_{Y_{h} (2)}^{2} + R_{1}^{2} S_{X_{h} (2)}^{2} + 2 R_{1} ρ_{Y_{h} X_{h} (2)} S_{Y_{h} (2)} S_{X_{h} (2)})}, \end{aligned}

(14)

\begin{aligned} M S q E (t_{3}) & = \sum_{h = 1}^{L} P_{h}^{2} [(λ_{h}^{'} + △_{h} (1 - ρ_{Y_{h} X_{h}}^{2})) S_{Y_{h}}^{2} + θ_{h} W_{h (2)} S_{Y_{h} (2)}^{2} \\ + W_{h (2)} (β^{2} S_{X_{h} (2)}^{2} - 2 β ρ_{Y_{h} X_{h} (2)} S_{Y_{h} (2)} S_{X_{h} (2)}) Π_{h}], \end{aligned}

(15)

\begin{aligned} M S q E (t_{4}) & = \sum_{h = 1}^{L} P_{h}^{2} {λ_{h}^{'} S_{Y_{h}}^{2} + △_{h} (S_{Y_{h}}^{2} + g_{c}^{2} ϑ^{2} α_{c}^{2} R_{1}^{2} S_{X_{h}}^{2} - 2 g_{c} ϑ R_{1} α_{c} ρ_{Y_{h} X_{h}} S_{Y_{h}} S_{X_{h}}) \\ + θ_{h} W_{h (2)} (S_{Y_{h} (2)}^{2} + g_{c}^{2} ϑ^{2} α_{c}^{2} R_{1}^{2} S_{X_{h} (2)}^{2} - 2 g_{c} ϑ R_{1} α_{c} ρ_{Y_{h} X_{h} (2)} S_{Y_{h} (2)} S_{X_{h} (2)}) \\ + θ_{h}^{'} W_{h (2)} (- g_{c}^{2} ϑ^{2} α_{c}^{2} R_{1}^{2} S_{X_{h} (2)}^{2} + 2 g_{c} ϑ R_{1} α_{c} ρ_{Y_{h} X_{h} (2)} S_{Y_{h} (2)} S_{X_{h} (2)})}, \end{aligned}

(16)

\begin{aligned} M . M S q E (t_{5}) & = [\sum_{h = 1}^{L} P_{h}^{2} (λ_{h} S_{Y_{h}}^{2} + θ_{h} W_{h (2)} S_{Y_{h} (2)}^{2}) \\ - \frac{{(\sum_{h = 1}^{L} P_{h}^{2} {△_{h} ρ_{Y_{h} X_{h}} S_{Y_{h}} S_{X_{h}} + Π_{h} W_{h (2)} ρ_{Y_{h} X_{h} (2)} S_{Y_{h} (2)} S_{X_{h} (2)}})}^{2}}{(\sum_{h = 1}^{L} P_{h}^{2} {△_{h} S_{X_{h}}^{2} + Π_{h} W_{h (2)} S_{X_{h} (2)}^{2}})}], \end{aligned}

(17)

\begin{aligned} M . M S q E (t_{6}) = \frac{{\bar{Y}}^{2} ω_{0}}{1 + ω_{0}}, \end{aligned}

(18)

\begin{aligned} M . M S q E (t_{7}) = {\bar{Y}}^{2} [1 - \frac{{(1 + ω_{01}^{'} - ω_{01} - ω_{11}^{'} + ω_{1})}^{2}}{1 + ω_{0} + 3 (ω_{1} - ω_{11}^{'}) + 4 (ω_{01}^{'} - ω_{01})}], \end{aligned}

(19)

\begin{aligned} M S q E (t_{8}) & = \sum_{h = 1}^{L} P_{h}^{2} {λ_{h}^{'} S_{Y_{h}}^{2} + △_{h} (S_{Y_{h}}^{2} + R_{1}^{2} S_{X_{h}}^{2} / 4 - R_{1} ρ_{Y_{h} X_{h}} S_{Y_{h}} S_{X_{h}}) + θ_{h}^{'} W_{h (2)} S_{Y_{h} (2)}^{2} \\ + Π_{h} W_{h (2)} (S_{Y_{h} (2)}^{2} + R_{1}^{2} S_{X_{h} (2)}^{2} / 4 - R_{1} ρ_{Y_{h} X_{h} (2)} S_{Y_{h} (2)} S_{X_{h} (2)})}, \end{aligned}

(20)

\begin{aligned} M . M S q E (t_{9}) & = [\sum_{h = 1}^{L} P_{h}^{2} (λ_{h} S_{Y_{h}}^{2} + θ_{h} W_{h (2)} S_{Y_{h} (2)}^{2}) \\ - \frac{{(\sum_{h = 1}^{L} P_{h}^{2} {△_{h} ρ_{Y_{h} X_{h}} S_{Y_{h}} S_{X_{h}} + Π_{h} W_{h (2)} ρ_{Y_{h} X_{h} (2)} S_{Y_{h} (2)} S_{X_{h} (2)}})}^{2}}{(\sum_{h = 1}^{L} P_{h}^{2} {△_{h} S_{X_{h}}^{2} + Π_{h} W_{h (2)} S_{X_{h} (2)}^{2}})}], \end{aligned}

(21)

\begin{aligned} M . M S q E (t_{10}) & = [\sum_{h = 1}^{L} P_{h}^{2} (λ_{h} S_{Y_{h}}^{2} + θ_{h} W_{h (2)} S_{Y_{h} (2)}^{2}) \\ - \frac{{(\sum_{h = 1}^{L} P_{h}^{2} {△_{h} ρ_{Y_{h} X_{h}} S_{Y_{h}} S_{X_{h}} + Π_{h} W_{h (2)} ρ_{Y_{h} X_{h} (2)} S_{Y_{h} (2)} S_{X_{h} (2)}})}^{2}}{(\sum_{h = 1}^{L} P_{h}^{2} {△_{h} S_{X_{h}}^{2} + Π_{h} W_{h (2)} S_{X_{h} (2)}^{2}})}], \end{aligned}

(22)

\begin{aligned} M S q E (t_{11}^{(1)}) & = \sum_{h = 1}^{L} P_{h}^{2} {λ_{h} (S_{Y_{h}}^{2} + R_{2}^{2} S_{Z_{h}}^{2} - 2 R_{2} ρ_{Y_{h} Z_{h}} S_{Y_{h}} S_{Z_{h}}) + θ_{h} W_{h (2)} (S_{Y_{h} (2)}^{2} + R_{2}^{2} S_{Z_{h} (2)}^{2} \\ - 2 R_{2} ρ_{Y_{h} Z_{h} (2)} S_{Y_{h} (2)} S_{Z_{h} (2)}) + △_{h} (R_{1}^{2} S_{X_{h}}^{2} / 4 - R_{1} ρ_{Y_{h} X_{h}} S_{Y_{h}} S_{X_{h}} \\ + R_{1} R_{2} ρ_{X_{h} Z_{h}} S_{X_{h}} S_{Z_{h}}) + Π_{h} W_{h (2)} (R_{1}^{2} S_{X_{h} (2)}^{2} / 4 - R_{1} ρ_{Y_{h} X_{h} (2)} S_{Y_{h} (2)} S_{X_{h} (2)} \\ + R_{1} R_{2} ρ_{X_{h} Z_{h} (2)} S_{X_{h} (2)} S_{Z_{h} (2)})}, \end{aligned}

(23)

\begin{aligned} M S q E (t_{11}^{(2)}) & = \sum_{h = 1}^{L} P_{h}^{2} {λ_{h}^{'} S_{Y_{h}}^{2} + △_{h} (S_{Y_{h}}^{2} + R_{1}^{2} S_{X_{h}}^{2} / 4 + R_{2}^{2} S_{Z_{h}}^{2} - R_{1} ρ_{Y_{h} X_{h}} S_{Y_{h}} S_{X_{h}} \\ - 2 R_{2} ρ_{Y_{h} Z_{h}} S_{Y_{h}} S_{Z_{h}} + R_{1} R_{2} ρ_{X_{h} Z_{h}} S_{X_{h}} S_{Z_{h}}) + θ_{h}^{'} W_{h (2)} S_{Y_{h} (2)}^{2} + Π_{h} W_{h (2)} (S_{Y_{h} (2)}^{2} \\ + R_{1}^{2} S_{X_{h} (2)}^{2} / 4 + R_{2}^{2} S_{Z_{h} (2)}^{2} - R_{1} ρ_{Y_{h} X_{h} (2)} S_{Y_{h} (2)} S_{X_{h} (2)} - 2 R_{2} ρ_{Y_{h} Z_{h} (2)} S_{Y_{h} (2)} S_{Z_{h} (2)} \\ + R_{1} R_{2} ρ_{X_{h} Z_{h} (2)} S_{X_{h} (2)} S_{Z_{h} (2)})}, \end{aligned}

(24)

and the variance of the unbiased estimator of the population mean under $S T P S$ is given by

\begin{aligned} v a r ({\bar{y}}_{s t}^{*}) = \sum_{h = 1}^{L} (P_{h}^{2} λ_{h} S_{Y_{h}}^{2} + P_{h}^{2} θ_{h} W_{h (2)} S_{Y_{h} (2)}^{2}), \end{aligned}

(25)

where,

R_{1} = \frac{\bar{Y}}{\bar{X}}, R_{2} = \frac{\bar{Y}}{\bar{Z}}, β = \frac{S_{Y X}}{S_{X}^{2}}, ϑ = \frac{a \bar{X}}{a \bar{X} + b} .

3 Proposed Estimators

In recent years, many valuable estimators have been proposed to estimate the population mean using an auxiliary variable when the data under study suffer from non-response. The key feature of the present framework is that the auxiliary information has an unknown mean and is subject to non-response at both phases of the sampling scheme. Inspired by Kumar and Zeeshan (2019) and Chaudhary et al. (2020), we suggest exponential estimators of the population mean for two different scenarios under $S T P S$ . We also state theorems that describe the properties of the proposed estimators up to the first degree of approximation $[Appr (n^{- 1})]$ .

3.1 Situation-I

In this situation, we have considered that the population mean $\bar{X}$ is unknown whereas $\bar{Z}$ is known, and propose an exponential type estimator as

\begin{aligned} T_{E E}^{(1)} = {\bar{y}}_{s t}^{*} e x p [γ_{1} f (u^{'}) + γ_{2} f (v)], \end{aligned}

(26)

where

f (u^{'}) = (\frac{u^{'} - 1}{u^{'} + 1}), u^{'} = \frac{{\bar{x}}_{s t}^{*'}}{{\bar{x}}_{s t}^{*}}, f (v) = (\frac{1 - v}{1 + v}), v = \frac{\bar{Z}}{{\bar{z}}_{s t}^{*}},

and $γ_{1}$ , $γ_{2}$ are the optimizing constants to be applied in order to find the $M . M S q E$ of the suggested estimator $T_{E E}^{(1)}$ .
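A minimal Python sketch of the Situation-I estimator is given here for illustration. The function names `f_u`, `f_v` and `t_ee_1` and the numerical inputs are hypothetical; the arithmetic follows $T_{E E}^{(1)} = {\bar{y}}_{s t}^{*} e x p [γ_{1} f (u^{'}) + γ_{2} f (v)]$ with $f (u^{'}) = (u^{'} - 1) / (u^{'} + 1)$ and $f (v) = (1 - v) / (1 + v)$ .

```python
import math

def f_u(x_st_first, x_st):
    """f(u') with u' = (first phase x mean) / (second phase x mean)."""
    u = x_st_first / x_st
    return (u - 1) / (u + 1)

def f_v(z_bar, z_st):
    """f(v) with v = (known population mean of Z) / (second phase z mean)."""
    v = z_bar / z_st
    return (1 - v) / (1 + v)

def t_ee_1(y_st, x_st, x_st_first, z_st, z_bar, gamma1, gamma2):
    """Situation-I exponential estimator for given constants gamma1, gamma2."""
    return y_st * math.exp(gamma1 * f_u(x_st_first, x_st)
                           + gamma2 * f_v(z_bar, z_st))

# Hypothetical inputs: with gamma1 = gamma2 = 0 the estimator is y* itself.
base = t_ee_1(50.0, 20.0, 22.0, 30.0, 31.0, gamma1=0.0, gamma2=0.0)
# With gamma1 = 1 and gamma2 = 0 it reduces to the exponential ratio form t8.
as_t8 = t_ee_1(50.0, 20.0, 22.0, 30.0, 31.0, gamma1=1.0, gamma2=0.0)
```

In practice the constants would be replaced by their optimum values or by estimates of them, as discussed later in the text.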

Theorem 3.1.1:

Bias and $M S q E$ of the proposed estimator $T_{E E}^{(1)}$ are given by

\begin{aligned} B i a s (T_{E E}^{(1)}) & = \bar{Y} [γ_{1}^{2} \frac{(ω_{1} - ω_{11}^{'})}{8} + γ_{2}^{2} \frac{ω_{2}}{8} + γ_{1} {\frac{2 (ω_{01}^{'} - ω_{01}) - (ω_{1}^{'} - ω_{1})}{4}} + γ_{2} \frac{(2 ω_{02} - ω_{2})}{4} \\ + γ_{1} γ_{2} \frac{(ω_{12}^{'} - ω_{12})}{4}], \end{aligned}

(27)

\begin{aligned} M S q E (T_{E E}^{(1)}) & = {\bar{Y}}^{2} [ω_{0} + γ_{1}^{2} \frac{(ω_{1} - ω_{11}^{'})}{4} + γ_{2}^{2} \frac{ω_{2}}{4} + γ_{1} γ_{2} \frac{(ω_{12}^{'} - ω_{12})}{2} + γ_{1} (ω_{01}^{'} - ω_{01}) + γ_{2} ω_{02}] \end{aligned}

(28)

Proof:

Under the approximations given in sub-section 1.2, the proposed estimator $T_{E E}^{(1)}$ takes the following form:

\begin{aligned} T_{E E}^{(1)} - \bar{Y} & = \bar{Y} [e_{0} + γ_{1} {\frac{(e_{1}^{'} - e_{1}) + (e_{0} e_{1}^{'} - e_{0} e_{1})}{2} - \frac{(e_{1}^{' 2} - e_{1}^{2})}{4}} + γ_{2} (\frac{e_{2} + e_{0} e_{2}}{2} - \frac{e_{2}^{2}}{4}) \\ + γ_{1}^{2} \frac{{(e_{1}^{'} - e_{1})}^{2}}{8} + γ_{2}^{2} \frac{e_{2}^{2}}{8} + γ_{1} γ_{2} \frac{(e_{1}^{'} e_{2} - e_{1} e_{2})}{4}], \end{aligned}

(29)

Taking expectations on both sides of equation (29), we get

\begin{aligned} B i a s (T_{E E}^{(1)}) & = \bar{Y} [γ_{1}^{2} \frac{(ω_{1} - ω_{11}^{'})}{8} + γ_{2}^{2} \frac{ω_{2}}{8} + γ_{1} {\frac{2 (ω_{01}^{'} - ω_{01}) - (ω_{1}^{'} - ω_{1})}{4}} \\ + γ_{2} \frac{(2 ω_{02} - ω_{2})}{4} + γ_{1} γ_{2} \frac{(ω_{12}^{'} - ω_{12})}{4}], \end{aligned}

(30)

The expression for the $M S q E$ of $T_{E E}^{(1)}$ is obtained by squaring equation (29), retaining terms up to the first order of approximation, and taking expectations:

\begin{aligned} M S q E (T_{E E}^{(1)}) = {\bar{Y}}^{2} [ω_{0} + γ_{1}^{2} \frac{(ω_{1} - ω_{11}^{'})}{4} + γ_{2}^{2} \frac{ω_{2}}{4} + γ_{1} γ_{2} \frac{(ω_{12}^{'} - ω_{12})}{2} + γ_{1} (ω_{01}^{'} - ω_{01}) + γ_{2} ω_{02}] \end{aligned}

(31)

Theorem 3.1.2:

Minimum mean square error $(M . M S q E)$ of the proposed estimator $T_{E E}^{(1)}$ at the optimum value of $γ_{1} (= γ_{1 (o p t)})$ and $γ_{2} (= γ_{2 (o p t)})$ is

\begin{aligned} M . M S q E (T_{E E}^{(1)}) = [A_{y} - \frac{(B_{y x}^{2} A_{z} + B_{y z}^{2} A_{x} - 2 B_{y z} B_{x z} B_{y x})}{A_{x} A_{z} - B_{x z}^{2}}], \end{aligned}

(32)

where,

γ_{1 (o p t)} = \frac{2 [ω_{02} (ω_{12}^{'} - ω_{12}) - ω_{2} (ω_{01}^{'} - ω_{01})]}{(ω_{1} - ω_{11}^{'}) ω_{2} - {(ω_{12}^{'} - ω_{12})}^{2}}

and

γ_{2 (o p t)} = \frac{2 [(ω_{01}^{'} - ω_{01}) (ω_{12}^{'} - ω_{12}) - ω_{02} (ω_{1} - ω_{11}^{'})]}{(ω_{1} - ω_{11}^{'}) ω_{2} - {(ω_{12}^{'} - ω_{12})}^{2}} .

Proof:

Minimizing equation (31) with respect to $γ_{1}$ and $γ_{2}$ gives the optimum values of the optimizing constants; substituting them back into equation (31) gives the expression for the $M . M S q E$ .
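The minimization step can be checked numerically. In the Python sketch below, the shorthand $a = ω_{1} - ω_{11}^{'}$ , $c = ω_{2}$ , $b = ω_{12}^{'} - ω_{12}$ , $d = ω_{01}^{'} - ω_{01}$ , $e = ω_{02}$ is introduced only for this example, the function names are hypothetical, and the numerical values are assumed rather than taken from the paper. The code evaluates the bracketed term of equation (31), that is $M S q E / {\bar{Y}}^{2}$ , at the optimum constants and compares it with the closed form behind equation (32).

```python
def optimal_gammas(a, c, b, d, e):
    """Optimum (gamma1, gamma2) from Theorem 3.1.2 in the shorthand a, c, b, d, e."""
    det = a * c - b**2
    g1 = 2 * (e * b - c * d) / det
    g2 = 2 * (d * b - e * a) / det
    return g1, g2

def msqe_rel(w0, a, c, b, d, e, g1, g2):
    """Bracketed term of equation (31): MSqE divided by Y-bar squared."""
    return (w0 + g1**2 * a / 4 + g2**2 * c / 4
            + g1 * g2 * b / 2 + g1 * d + g2 * e)

def min_msqe_rel(w0, a, c, b, d, e):
    """Closed-form minimum, i.e. equation (32) divided by Y-bar squared."""
    return w0 - (d**2 * c + e**2 * a - 2 * e * b * d) / (a * c - b**2)

# Assumed illustrative values with a > 0 and a*c - b^2 > 0.
w0, a, c, b, d, e = 0.05, 0.02, 0.04, -0.01, -0.015, 0.02
g1, g2 = optimal_gammas(a, c, b, d, e)
at_optimum = msqe_rel(w0, a, c, b, d, e, g1, g2)
closed_form = min_msqe_rel(w0, a, c, b, d, e)
perturbed = msqe_rel(w0, a, c, b, d, e, g1 + 0.1, g2 - 0.1)
```

For these inputs the value at the optimum agrees with the closed form, and a perturbed pair of constants gives a larger value, as expected when $a > 0$ and $a c - b^{2} > 0$ .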

Remark 1:

Here it may be verified that $M . M S q E (T_{E E}^{(1)})$ will be less than $v a r ({\bar{y}}_{s t}^{*})$ if the factor $\frac{(B_{y x}^{2} A_{z} + B_{y z}^{2} A_{x} - 2 B_{y z} B_{x z} B_{y x})}{A_{x} A_{z} - B_{x z}^{2}} = \frac{D_{1}}{N_{1}}$ (say) remains positive. Our theoretical investigation of the factor $\frac{D_{1}}{N_{1}}$ leads to the following two results:

Result I: $N_{1}$ is always positive.

Result II: $D_{1}$ will be positive if $ρ_{Y X} > ρ_{Y Z} ρ_{X Z}$ and $ρ_{Y Z} > ρ_{Y X} ρ_{X Z}$

3.2 Situation-II

In this situation, considering that the population means of both auxiliary variables, $\bar{X}$ and $\bar{Z}$ , are unknown, we propose the following exponential estimator of the population mean:

\begin{aligned} T_{E E}^{(2)} = {\bar{y}}_{s t}^{*} e x p [μ_{1} f (u^{'}) + μ_{2} f (v^{'})], \end{aligned}

(33)

where

f (u^{'}) = (\frac{u^{'} - 1}{u^{'} + 1}), u^{'} = \frac{{\bar{x}}_{s t}^{*'}}{{\bar{x}}_{s t}^{*}}, f (v^{'}) = (\frac{1 - v^{'}}{1 + v^{'}}), v^{'} = \frac{{\bar{z}}_{s t}^{*'}}{{\bar{z}}_{s t}^{*}},

and $μ_{1}$ , $μ_{2}$ are the optimizing constants to be applied in order to find the $M . M S q E$ of the proposed estimator $T_{E E}^{(2)}$ .

Theorem 3.2.1:

Bias and $M S q E$ of the proposed estimator $T_{E E}^{(2)}$ are given by

\begin{aligned} B i a s (T_{E E}^{(2)}) & = \bar{Y} [μ_{1}^{2} \frac{(ω_{1} - ω_{11}^{'})}{8} + μ_{2}^{2} \frac{(ω_{2} - ω_{22}^{'})}{8} + μ_{1} {\frac{(ω_{01}^{'} - ω_{01})}{2} - \frac{(ω_{1}^{'} - ω_{1})}{4}} \\ + μ_{2} {\frac{(ω_{02} - ω_{02}^{'})}{2} - \frac{(ω_{2} - ω_{2}^{'})}{4}} + μ_{1} μ_{2} \frac{(ω_{12}^{'} - ω_{12})}{4}] \end{aligned}

(34)

\begin{aligned} M S q E (T_{E E}^{(2)}) & = {\bar{Y}}^{2} [ω_{0} + μ_{1}^{2} (ω_{1} - ω_{11}^{'}) / 4 + μ_{2}^{2} (ω_{2} - ω_{22}^{'}) / 4 + μ_{1} μ_{2} (ω_{12}^{'} - ω_{12}) / 2 \\ + μ_{1} (ω_{01}^{'} - ω_{01}) + μ_{2} (ω_{02} - ω_{02}^{'})] \end{aligned}

(35)

Proof:

Proceeding as in the proof of Theorem 3.1.1, the proposed estimator $T_{E E}^{(2)}$ takes the following form:

\begin{aligned} T_{E E}^{(2)} - \bar{Y} & = \bar{Y} [e_{0} + μ_{1} {\frac{(e_{1}^{'} - e_{1} + e_{0} e_{1}^{'} - e_{0} e_{1})}{2} - \frac{(e_{1}^{' 2} - e_{1}^{2})}{4}} + μ_{2} {\frac{(e_{2} - e_{2}^{'} + e_{0} e_{2} - e_{0} e_{2}^{'})}{2} - \frac{(e_{2}^{2} - e_{2}^{' 2})}{4}} \\ + μ_{1}^{2} \frac{{(e_{1}^{'} - e_{1})}^{2}}{8} + μ_{2}^{2} \frac{{(e_{2} - e_{2}^{'})}^{2}}{8} + μ_{1} μ_{2} \frac{(e_{1}^{'} e_{2} - e_{1} e_{2} - e_{1}^{'} e_{2}^{'} + e_{1} e_{2}^{'})}{4}] . \end{aligned}

(36)

Taking expectations on both sides of equation (36), we get

\begin{aligned} B i a s (T_{E E}^{(2)}) & = \bar{Y} [μ_{1}^{2} \frac{(ω_{1} - ω_{11}^{'})}{8} + μ_{2}^{2} \frac{(ω_{2} - ω_{22}^{'})}{8} + μ_{1} {\frac{(ω_{01}^{'} - ω_{01})}{2} - \frac{(ω_{1}^{'} - ω_{1})}{4}} \\ + μ_{2} {\frac{(ω_{02} - ω_{02}^{'})}{2} - \frac{(ω_{2} - ω_{2}^{'})}{4}} + μ_{1} μ_{2} \frac{(ω_{12}^{'} - ω_{12})}{4}] . \end{aligned}

(37)

Furthermore, squaring and taking the expectation of equation (36) gives the $M S q E$ of $T_{E E}^{(2)}$ , which is

\begin{aligned} M S q E (T_{E E}^{(2)}) & = {\bar{Y}}^{2} [ω_{0} + μ_{1}^{2} (ω_{1} - ω_{11}^{'}) / 4 + μ_{2}^{2} (ω_{2} - ω_{22}^{'}) / 4 + μ_{1} μ_{2} (ω_{12}^{'} - ω_{12}) / 2 \\ + μ_{1} (ω_{01}^{'} - ω_{01}) + μ_{2} (ω_{02} - ω_{02}^{'})] \end{aligned}

(38)

Theorem 3.2.2:

The minimum mean square error $(M . M S q E)$ of the proposed estimator $T_{E E}^{(2)}$ at the optimum value of $μ_{1} (= μ_{1 (o p t)})$ and $μ_{2} (= μ_{2 (o p t)})$ is

\begin{aligned} M . M S q E (T_{E E}^{(2)}) = [A_{y} - \frac{(B_{y x}^{2} D_{z} + C_{y z}^{2} A_{x} - 2 C_{y z} B_{x z} B_{y x})}{A_{x} D_{z} - B_{x z}^{2}}], \end{aligned}

(39)

where,

\begin{aligned} μ_{1 (o p t)} & = \frac{2 [(ω_{02} - ω_{02}^{'}) (ω_{12}^{'} - ω_{12}) - (ω_{2} - ω_{22}^{'}) (ω_{01}^{'} - ω_{01})]}{(ω_{1} - ω_{11}^{'}) (ω_{2} - ω_{22}^{'}) - {(ω_{12}^{'} - ω_{12})}^{2}}, \\ μ_{2 (o p t)} & = \frac{2 [(ω_{01}^{'} - ω_{01}) (ω_{12}^{'} - ω_{12}) - (ω_{02} - ω_{02}^{'}) (ω_{1} - ω_{11}^{'})]}{(ω_{1} - ω_{11}^{'}) (ω_{2} - ω_{22}^{'}) - {(ω_{12}^{'} - ω_{12})}^{2}} . \end{aligned}

Proof:

The optimum values of the optimizing constants are obtained by minimizing equation (38) with respect to $μ_{1}$ and $μ_{2}$ . By substituting these values into equation (38) and simplifying, we obtain the minimum mean square error of $T_{E E}^{(2)}$ .
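For readers who want the intermediate step, the normal equations behind this minimization can be written out as follows. This is a sketch derived from equation (38) rather than a passage from the original derivation; it assumes $ω_{1} - ω_{11}^{'} > 0$ and a positive determinant so that the stationary point is a minimum.

```latex
\begin{aligned}
\frac{\partial\, MSqE(T_{EE}^{(2)})}{\partial \mu_{1}} = 0
  &\;\Rightarrow\; \mu_{1} \frac{(\omega_{1} - \omega_{11}^{'})}{2}
   + \mu_{2} \frac{(\omega_{12}^{'} - \omega_{12})}{2}
   + (\omega_{01}^{'} - \omega_{01}) = 0, \\
\frac{\partial\, MSqE(T_{EE}^{(2)})}{\partial \mu_{2}} = 0
  &\;\Rightarrow\; \mu_{1} \frac{(\omega_{12}^{'} - \omega_{12})}{2}
   + \mu_{2} \frac{(\omega_{2} - \omega_{22}^{'})}{2}
   + (\omega_{02} - \omega_{02}^{'}) = 0 .
\end{aligned}
```

Solving this pair of linear equations by Cramer's rule, with determinant $(ω_{1} - ω_{11}^{'}) (ω_{2} - ω_{22}^{'}) - {(ω_{12}^{'} - ω_{12})}^{2}$ , gives the stated $μ_{1 (o p t)}$ and $μ_{2 (o p t)}$ .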

Remark 2:

Equation (39) is subject to the same conditions as in Remark 1. Writing $\frac{(B_{y x}^{2} D_{z} + C_{y z}^{2} A_{x} - 2 C_{y z} B_{x z} B_{y x})}{A_{x} D_{z} - B_{x z}^{2}} = \frac{D_{2}}{N_{2}}$ , we have $N_{2} > 0$ always and $D_{2} > 0$ if Result II is satisfied, which implies $\frac{D_{2}}{N_{2}} > 0.$

A note on practicability of the proposed estimators: Reddy (1978) and Srivastava and Jhajj (1983) suggested, from a practical standpoint, that prior information from past data on the required parameters, or estimates of them, can be used to obtain the values of the unknown parameters. Sample observations can also be used, which does not affect the performance of the estimator up to the first order of approximation.

4 Theoretical Limitations for Efficiency of Proposed Estimators

To provide a clear understanding of the practical limitations for the applicability of the proposed estimators, theoretical constraints have been established by comparing the efficiency of the suggested estimators in terms of mean square errors against all previously known competing estimators.

For Situation-I

i. $M . M S q E (T_{E E}^{(1)}) < v a r ({\bar{y}}_{s t}^{*})$ ; if $(\frac{A_{z} B_{y x}^{2} + A_{x} B_{y z}^{2}}{2 B_{y x} B_{x z} B_{y z}} - 1) > 0$ ; always true under Remark 1.
ii. $M . M S q E (T_{E E}^{(1)}) < M S q E (t_{1})$ ; if $2 B_{y x} B_{x z} + A_{x} (R_{1} B_{x z} - B_{y z}) < A_{z} \frac{{(R_{1} A_{x} + B_{y x})}^{2}}{(R_{1} B_{x z} + B_{y z})}$ .
iii. $M . M S q E (T_{E E}^{(1)}) < M S q E (t_{2})$ ; if $2 B_{y x} B_{x z} - A_{x} (R_{1} B_{x z} + B_{y z}) < A_{z} \frac{{(R_{1} A_{x} - B_{y x})}^{2}}{(B_{y z} - R_{1} B_{x z})}$ .
iv. $M . M S q E (T_{E E}^{(1)}) < M S q E (t_{3})$ ; if $β^{2} A_{x} + 2 β B_{y x} + \frac{(B_{y x}^{2} A_{z} + B_{y z}^{2} A_{x} - 2 B_{y z} B_{x z} B_{y x})}{A_{x} A_{z} - B_{x z}^{2}} > 0$ ; always true under Remark 1.
v. $M . M S q E (T_{E E}^{(1)}) < M S q E (t_{8})$ ; if $\frac{R_{1}^{2} A_{x}}{4} + R_{1} B_{y x} + \frac{(B_{y x}^{2} A_{z} + B_{y z}^{2} A_{x} - 2 B_{y z} B_{x z} B_{y x})}{A_{x} A_{z} - B_{x z}^{2}} > 0$ ; always true under Remark 1.
vi. $M . M S q E (T_{E E}^{(1)}) < M S q E (t_{11}^{(1)})$ ; if $\frac{R_{1}^{2} A_{x}}{4} + R_{2}^{2} A_{z} - 2 R_{2} B_{y z} + R_{1} B_{y x} - R_{1} R_{2} B_{x z} + \frac{(B_{y x}^{2} A_{z} + B_{y z}^{2} A_{x} - 2 B_{y z} B_{x z} B_{y x})}{A_{x} A_{z} - B_{x z}^{2}} > 0$ .
vii. $M . M S q E (T_{E E}^{(1)}) < M . M S q E (T)$ ; if $(B_{y z} A_{x} - B_{y x} B_{x z})^{2} > 0$ ; always true. Here, $T = t_{4}, t_{5}, t_{9}, t_{10}$ .

For Situation-II

i. $M.MSqE(T_{EE}^{(2)}) < var({\bar{y}}_{st}^{*})$ ; if $(\frac{D_{z} B_{yx}^{2} + A_{x} C_{yz}^{2}}{2 B_{yx} B_{xz} C_{yz}} - 1) > 0$ ; always true under Remark 2.

ii. $M.MSqE(T_{EE}^{(2)}) < MSqE(t_{1})$ ; if $2 B_{yx} B_{xz} + A_{x} (R B_{xz} - C_{yz}) < D_{z} \frac{{(R A_{x} + B_{yx})}^{2}}{(R B_{xz} + C_{yz})}$ .

iii. $M.MSqE(T_{EE}^{(2)}) < MSqE(t_{2})$ ; if $2 B_{yx} B_{xz} - A_{x} (R B_{xz} + C_{yz}) < D_{z} \frac{{(R A_{x} - B_{yx})}^{2}}{(C_{yz} - R B_{xz})}$ .

iv. $M.MSqE(T_{EE}^{(2)}) < MSqE(t_{3})$ ; if $β^{2} A_{x} + 2 β B_{yx} + \frac{(B_{yx}^{2} D_{z} + C_{yz}^{2} A_{x} - 2 C_{yz} B_{xz} B_{yx})}{A_{x} D_{z} - B_{xz}^{2}} > 0$ ; always true under Remark 2.

v. $M.MSqE(T_{EE}^{(2)}) < MSqE(t_{8})$ ; if $\frac{R_{1}^{2} A_{x}}{4} + R_{1} B_{yx} + \frac{(B_{yx}^{2} D_{z} + C_{yz}^{2} A_{x} - 2 C_{yz} B_{xz} B_{yx})}{A_{x} D_{z} - B_{xz}^{2}} > 0$ ; always true under Remark 2.

vi. $M.MSqE(T_{EE}^{(2)}) < MSqE(t_{11}^{(2)})$ ; if $\frac{R_{1}^{2} A_{x}}{4} + R_{2}^{2} D_{z} + R_{1} B_{yx} - 2 R_{2} C_{yz} - R_{1} R_{2} B_{xz} + \frac{(B_{yx}^{2} D_{z} + C_{yz}^{2} A_{x} - 2 C_{yz} B_{xz} B_{yx})}{A_{x} D_{z} - B_{xz}^{2}} > 0$ .

vii. $M.MSqE(T_{EE}^{(2)}) < M.MSqE(T)$ ; if $(C_{yz} A_{x} - B_{yx} B_{xz})^{2} > 0$ ; always true. Here, $T = t_{4}, t_{5}, t_{9}, t_{10}$ .
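
The "always true" statements attached to conditions iv and vii for Situation-II can be checked with a short completing-the-square argument. The sketch below assumes, which this section does not restate, that $A_{x} > 0$ and $A_{x} D_{z} - B_{xz}^{2} > 0$; the symbol $Q$ is introduced here only as shorthand for the recurring fraction and is not the paper's notation.

```latex
% Assumption (not restated in this section): A_x > 0 and A_x D_z - B_{xz}^2 > 0.
% Q is a hypothetical shorthand for the fraction that recurs in conditions iv, v and vi.
\begin{aligned}
Q &= \frac{B_{yx}^{2} D_{z} + C_{yz}^{2} A_{x} - 2 C_{yz} B_{xz} B_{yx}}{A_{x} D_{z} - B_{xz}^{2}}
   = \frac{B_{yx}^{2}}{A_{x}}
   + \frac{(C_{yz} A_{x} - B_{yx} B_{xz})^{2}}{A_{x} (A_{x} D_{z} - B_{xz}^{2})}, \\
\beta^{2} A_{x} + 2 \beta B_{yx} + Q
  &= A_{x} \left( \beta + \frac{B_{yx}}{A_{x}} \right)^{2}
   + \frac{(C_{yz} A_{x} - B_{yx} B_{xz})^{2}}{A_{x} (A_{x} D_{z} - B_{xz}^{2})} \ge 0 .
\end{aligned}
```

Under these assumptions both terms on the right are non-negative, so the expression in condition iv cannot be negative for any real $β$ . It is strictly positive unless $C_{yz} A_{x} = B_{yx} B_{xz}$ and $β = - B_{yx} / A_{x}$ hold together, and the first of these equalities is exactly the case in which the squared term in condition vii vanishes. Replacing $D_{z}$ by $A_{z}$ and $C_{yz}$ by $B_{yz}$ gives the corresponding identity for Situation-I.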

Note: 1. Although a theoretical comparison with the Searls (1964) type mean $(t_{6})$ and ratio $(t_{7})$ estimators appears challenging, it can be confirmed through the numerical analysis (provided in the next section) that the suggested estimators outperform both estimators.

2. After closely examining the $M . M S q E$ , we found that the estimators $t_{4}$ , $t_{5}$ , $t_{9}$ , and $t_{10}$ are equally efficient as the modified regression estimator $t_{3}$ .

5 Numerical Analysis of Performance of Proposed Estimators

This section conducts two types of analyses for efficiency comparison: a simulation analysis using simulated data (symmetric and asymmetric) to account for population type, and an empirical analysis using two different real-world data sets. In the present numerical analysis, we use the same sub-sampling factor k in both sampling phases, although k may differ between phases depending on the choice and requirements of the surveyor.
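
To make the role of the sub-sampling factor k concrete, the following is a minimal sketch of a Hansen and Hurwitz (1946) type mean for a single phase and a single stratum. It is a simplified stand-in, not the two-phase stratified estimators studied in this paper. The function name `hansen_hurwitz_mean`, the rule $r = n_{2} / k$ rounded to an integer, and the toy population are assumptions made for illustration.

```python
import random


def hansen_hurwitz_mean(y_values, responded, k, rng):
    """Illustrative single-phase, single-stratum Hansen-Hurwitz type mean.

    y_values  : study-variable values for the n sampled units
    responded : parallel booleans, True if the unit responded at first contact
    k         : sub-sampling factor (k >= 1); about n2 / k non-respondents
                are followed up (an assumed rounding rule)
    rng       : random.Random instance used to draw the follow-up subsample
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    respondents = [y for y, r in zip(y_values, responded) if r]
    nonrespondents = [y for y, r in zip(y_values, responded) if not r]
    n1, n2 = len(respondents), len(nonrespondents)
    n = n1 + n2
    if n == 0:
        raise ValueError("empty sample")
    if n2 == 0:
        # No non-response: the estimator reduces to the ordinary sample mean.
        return sum(respondents) / n1
    r = max(1, round(n2 / k))  # number of non-respondents re-contacted
    followed_up = rng.sample(nonrespondents, r)
    mean_followed_up = sum(followed_up) / r
    # (n1 * mean of respondents + n2 * mean of followed-up units) / n
    return (sum(respondents) + n2 * mean_followed_up) / n


# Toy illustration: 200 units, roughly 20% flagged as non-respondents.
rng = random.Random(2024)
y = [rng.gauss(178.0, 29.4) for _ in range(200)]
flags = [rng.random() >= 0.20 for _ in y]
for k in (2, 3, 4):
    print(k, round(hansen_hurwitz_mean(y, flags, k, rng), 2))
```

A larger k means fewer non-respondents are followed up, which lowers cost but increases the variance contributed by the non-response group.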

5.1 Efficiency Analysis on Simulated Data

We used the following statistical tools available in R: mvrnorm(), unonr(), sample(), sampling(), and moments().

For both Situations I and II, we generated hypothetical symmetric and asymmetric data sets for the variables $c (Y, X, Z)$ with the parameters given below [readers can obtain the R code upon request].

Symmetric Data Set	Asymmetric Data Set
Mean vector $= c (178, 37, 38)$ ,	Mean vector $= c (178, 37, 38)$ ,
Variance-covariance matrix	Variance-covariance matrix
$\begin{array}{l} = [\begin{array}{ccc} 863.7 & 59.35 & 60.47 \\ 59.35 & 5.91 & 3.95 \\ 60.47 & 3.95 & 5.82 \end{array}], \\ ρ = [\begin{array}{ccc} 1.00 & 0.83 & 0.85 \\ 0.83 & 1.00 & 0.66 \\ 0.85 & 0.66 & 1.00 \end{array}] . \end{array}$	$\begin{array}{l} = [\begin{array}{ccc} 863.7 & 59.35 & 60.47 \\ 59.35 & 5.91 & 3.95 \\ 60.47 & 3.95 & 5.82 \end{array}], \\ ρ = [\begin{array}{ccc} 1.00 & 0.83 & 0.85 \\ 0.83 & 1.00 & 0.66 \\ 0.85 & 0.66 & 1.00 \end{array}] . \end{array}$
	Skewness $= c (1.19, 1.54, 1.51)$ ,
	Kurtosis $= c (6.5, 5.84, 7.1)$
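
As a rough Python analogue of the symmetric case (the authors' own R code is available only on request), the sketch below derives the correlation matrix from the variance-covariance matrix given above and draws trivariate normal values through a Cholesky factor. The function names, the seed and the number of draws are assumptions made for illustration, and the asymmetric case with prescribed skewness and kurtosis is not covered.

```python
import math
import random

# Parameters of the symmetric data set, in the order (Y, X, Z).
MEAN = [178.0, 37.0, 38.0]
COV = [
    [863.7, 59.35, 60.47],
    [59.35, 5.91, 3.95],
    [60.47, 3.95, 5.82],
]


def cov_to_corr(cov):
    """Convert a covariance matrix into a correlation matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]


def cholesky(a):
    """Lower-triangular L with L * L^T = a, for a positive-definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_mvn(mean, chol, rng):
    """One draw from N(mean, chol * chol^T) using independent standard normals."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [mean[i] + sum(chol[i][m] * z[m] for m in range(i + 1))
            for i in range(len(mean))]


corr = cov_to_corr(COV)
for row in corr:
    print([round(v, 2) for v in row])

rng = random.Random(123)  # arbitrary seed, chosen only for reproducibility
chol = cholesky(COV)
population = [draw_mvn(MEAN, chol, rng) for _ in range(5000)]
print([round(sum(u[i] for u in population) / len(population), 2) for i in range(3)])
```

The correlations implied by the covariance matrix agree with the reported $ρ$ up to rounding; for the X and Z pair the implied value is about 0.67 against the reported 0.66.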

The percentage relative efficiency $(P R E)$ and percent relative bias $(P R B)$ of the estimators at different levels of k are shown in Tables 1 and 2. The $P R E$ of each estimator is calculated with respect to ${\bar{y}}_{s t}^{*}$ using $P R E (T) = \frac{v a r ({\bar{y}}_{s t}^{*})}{M S q E (T)} \times 100$ . The $P R B$ of each estimator is calculated using $P R B (T) = \frac{1}{R e p} \sum_{i = 1}^{R e p} | \frac{T_{i} - \bar{Y}}{\bar{Y}} | \times 100$ , where $R e p$ is the number of replications and T = EE, ME and PE.
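
The two performance measures can be computed from replicated estimates as in the minimal sketch below. The function names are hypothetical, and the use of the empirical mean square error over replications in place of $v a r ({\bar{y}}_{s t}^{*})$ and $M S q E (T)$ is an assumption about how the simulation is carried out.

```python
def empirical_mse(estimates, true_mean):
    """Average squared deviation of the replicated estimates from the true mean."""
    return sum((t - true_mean) ** 2 for t in estimates) / len(estimates)


def percent_relative_efficiency(baseline_variance, estimator_mse):
    """PRE(T) = var(baseline) / MSqE(T) * 100."""
    return baseline_variance / estimator_mse * 100.0


def percent_relative_bias(estimates, true_mean):
    """PRB(T) = mean over replications of |(T_i - Ybar) / Ybar|, times 100."""
    total = sum(abs((t - true_mean) / true_mean) for t in estimates)
    return total / len(estimates) * 100.0


# Toy replicated estimates (made-up numbers, not results from the paper).
true_mean = 178.0
baseline_reps = [174.0, 182.0, 176.5, 180.0]
proposed_reps = [177.0, 179.0, 178.5, 177.5]

baseline_var = empirical_mse(baseline_reps, true_mean)
proposed_mse = empirical_mse(proposed_reps, true_mean)
print(percent_relative_efficiency(baseline_var, proposed_mse))
print(percent_relative_bias(proposed_reps, true_mean))
```

A $P R E$ above 100 indicates that the estimator has a smaller mean square error than the baseline, and a smaller $P R B$ indicates estimates that lie closer to the true mean on average.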

Table 1.
$P R B$ and $P R E$ of the Estimators Under Situation-I.

		Symmetric Data Set						Asymmetric Data Set
$k$ Estimator		2		3		4		2		3		4
$k$ Estimator		$P R B$	$P R E$	$P R B$	$P R E$	$P R B$	$P R E$	$P R B$	$P R E$	$P R B$	$P R E$	$P R B$	$P R E$
${\bar{y}}_{s t}^{*}$		0.62	100	0.70	100	0.77	100	0.60	100	0.67	100	0.72	100
$t_{1}$		0.48	171.64	0.53	171.81	0.59	169.53	0.47	175.95	0.52	175.98	0.57	175.35
$t_{2}$		0.82	57.81	0.93	56.33	1.02	57.56	0.79	56.43	0.89	55.11	0.93	55.14
$t_{3}$		0.43	206.74	0.51	187.52	0.55	195.59	0.43	209.69	0.49	194.10	0.55	192.61
$t_{4}$	$g_{c} = 1$ $a = 1$ $b = 0$	0.43	206.30	0.51	187.14	0.56	189.77	0.43	209.60	0.49	192.10	0.55	189.12
	$g_{c} = 1$ $a = 1$ $b = 1$	0.43	204.07	0.51	183.96	0.56	186.64	0.43	207.52	0.49	189.02	0.55	185.90
	$g_{c}$ = $- 1$ $a = 1$ $b = 0$	0.43	206.31	0.51	187.08	0.56	189.69	0.43	209.57	0.49	192.113	0.55	189.03
	$g_{c}$ = $- 1$ $a = 1$ $b = 1$	0.43	204.08	0.51	183.90	0.56	186.56	0.43	207.50	0.49	189.03	0.55	185.80
$t_{5}$		0.43	206.31	0.51	187.14	0.56	189.77	0.43	209.60	0.49	192.12	0.55	189.12
$t_{6}$		0.62	100.01	0.70	100.01	0.77	100.01	0.60	100.00	0.67	100.01	0.72	99.10
$t_{7}$		0.48	171.65	0.53	171.82	0.59	169.53	0.47	175.95	0.52	175.99	0.57	175.34
$t_{8}$		0.54	132.93	0.60	133.98	0.67	132.62	0.53	134.84	0.58	135.81	0.63	135.65
$t_{9}$		0.43	206.31	0.51	187.14	0.56	189.77	0.43	209.60	0.49	192.12	0.55	189.12
$t_{10}$		0.43	206.33	0.51	187.15	0.56	189.82	0.43	209.60	0.49	192.08	0.55	189.06
$t_{11}^{(1)}$		0.36	291.10	0.41	293.79	0.45	290.51	0.36	321.05	0.40	322.24	0.42	328.22
$T_{E E}^{(1)}$		0.28	507.60	0.31	491.66	0.35	473.88	0.27	525.20	0.31	505.91	0.33	511.45

Table 2.

$P R B$ and $P R E$ of the Estimators Under Situation-II.

		Symmetric Data Set						Asymmetric Data Set
$k$ Estimator		2		3		4		2		3		4
$k$ Estimator		$P R B$	$P R E$	$P R B$	$P R E$	$P R B$	$P R E$	$P R B$	$P R E$	$P R B$	$P R E$	$P R B$	$P R E$
${\bar{y}}_{s t}^{*}$		0.62	100	0.70	100	0.78	100	0.61	100	0.69	100	0.75	100
$t_{1}$		0.48	175.80	0.53	167.71	0.59	165.03	0.46	175.48	0.52	165.55	0.57	166.04
$t_{2}$		0.82	56.67	0.93	58.10	1.03	57.57	0.82	56.87	0.92	55.83	1.01	58.27
$t_{3}$		0.43	210.75	0.50	191.88	0.57	174.25	0.42	213.16	0.49	166.75	0.54	186.59
$t_{4}$	$g_{c}$ =1 $a = 1$ $b = 0$	0.43	210.91	0.51	191.06	0.57	177.30	0.42	210.32	0.49	172.56	0.55	183.34
	$g_{c}$ =1 $a = 1$ $b = 1$	0.43	208.68	0.51	188.42	0.57	174.63	0.42	207.76	0.49	170.33	0.55	180.31
	$g_{c}$ = $-$ 1 $a = 1$ $b = 0$	0.43	210.92	0.51	191.06	0.57	177.29	0.42	210.33	0.49	172.57	0.55	183.39
	$g_{c}$ = $-$ 1 $a = 1$ $b = 1$	0.43	208.69	0.51	188.42	0.57	174.63	0.42	207.77	0.49	170.34	0.55	180.35
$t_{5}$		0.43	210.92	0.51	191.07	0.57	177.31	0.42	210.33	0.49	172.56	0.55	183.37
$t_{6}$		0.62	100.00	0.70	100.01	0.78	100.00	0.61	100.00	0.69	100.00	0.75	100.02
$t_{7}$		0.48	175.80	0.53	167.72	0.59	165.02	0.46	175.47	0.52	165.55	0.57	166.06
$t_{8}$		0.54	134.64	0.60	131.85	0.67	131.56	0.53	134.43	0.59	132.87	0.65	131.34
$t_{9}$		0.43	210.92	0.51	191.07	0.57	177.31	0.42	210.33	0.49	172.56	0.55	183.37
$t_{10}$		0.43	210.94	0.51	191.09	0.57	177.32	0.42	210.23	0.49	172.58	0.55	183.29
$t_{11}^{(2)}$		0.41	231.40	0.46	226.91	0.51	150.42	0.39	234.54	0.44	232.80	0.49	211.11
$T_{E E}^{(2)}$		0.37	271.29	0.44	238.05	0.51	182.40	0.37	280.90	0.49	250.47	0.50	222.30

5.2 Efficiency Analysis on Real Data

For the empirical investigation, we used the Hypertension Arterial Mexico Data Set, which is accessible at https://www.kaggle.com/datasets/frederickfelix/hipertensin-arterial-mxico. The data set includes raw information (such as cholesterol level, gender, and different glucose results) taken from the national health and nutrition survey (ENSANUT), https://ensanut.insp.mx/encuestas/ensanutcontinua2022/descargas.php.

In the present investigation, two distinct sets of variables are taken into consideration:

	Combination-1	Combination-2
$Y$	valor_colesterol_total	valor_hemoglobina_glucosilada
$X$	valor_colesterol_hdl	resultado_glucosa
$Z$	valor_trigliceridos	resultado_glucosa_promedio

Using gender as the primary stratification criterion, we classified 20% of the units as the non-respondent group, based on the particular circumstances surrounding their non-response. The parameters for Combinations 1 and 2 are shown in Table 3.

Table 3.
Parameters of Real Data Sets.

Data Sets $\to$	Combination-1		Combination-2
Stratum (h) Parameters $↓\to$	$h = 1$	$h = 2$	$h = 1$	$h = 2$
${\bar{Y}}_{h}$	144.99	143.60	5.39	5.49
${\bar{X}}_{h}$	35.56	36.33	95.65	97.68
${\bar{Z}}_{h}$	143.51	133.34	108.32	111.58
${\bar{Y}}_{h (2)}$	143.71	142.99	5.29	5.43
${\bar{X}}_{h (2)}$	35.08	36.48	92.98	95.03
${\bar{Z}}_{h (2)}$	141.42	130.21	105.55	109.36
$S_{Y_{h}}$	29.48	27.40	0.84	1.01
$S_{X_{h}}$	9.26	7.23	28.11	53.75
$S_{Z_{h}}$	91.41	67.45	24.05	36.95
$S_{Y_{h} (2)}$	19.73	21.34	0.39	0.92
$S_{X_{h} (2)}$	4.67	6.58	12.48	27.13
$S_{Z_{h} (2)}$	70.90	61.36	11.07	26.01
$ρ_{Y_{h} X_{h}}$	0.416	0.551	0.858	0.5776
$ρ_{Y_{h} Z_{h}}$	0.555	0.518	0.991	0.851
$ρ_{X_{h} Z_{h}}$	0.058	0.009	0.858	0.487
$ρ_{Y_{h} X_{h} (2)}$	0.454	0.563	0.653	0.848
$ρ_{Y_{h} Z_{h} (2)}$	0.462	0.349	0.991	0.993
$ρ_{X_{h} Z_{h} (2)}$	-0.127	-0.059	0.653	0.813

To further clarify the methodology of the sampling procedure, a schematic description is provided in Figure 2.

Figure 2.

Sampling design in empirical data set.

The $P R E$ and $P R B$ of the estimators at various levels of k under specified situations I and II for two different combinations of variables are displayed in Tables 4 and 5.

Table 4.

$P R B$ and $P R E$ of the Estimators Under Situation-I.

		Combination-1						Combination-2
$k$ Estimator		2		3		4		2		3		4
$k$ Estimator		$P R B$	$P R E$	$P R B$	$P R E$	$P R B$	$P R E$	$P R B$	$P R E$	$P R B$	$P R E$	$P R B$	$P R E$
${\bar{y}}_{s t}^{*}$		0.32	100.0	0.34	100.0	0.36	100.0	0.30	100.00	0.32	100.00	0.34	100.00
$t_{1}$		0.35	88.47	0.39	85.69	0.43	89.53	0.28	83.05	0.31	66.76	0.33	58.33
$t_{2}$		0.57	34.27	0.63	34.36	0.68	35.60	0.71	17.99	0.78	17.79	0.83	17.70
$t_{3}$		0.28	124.21	0.30	122.38	0.33	122.79	0.19	190.66	0.20	155.46	0.22	136.44
$t_{4}$	$g_{c}$ = $1$ $a$ = $1$ $b$ = $0$	0.28	124.23	0.30	122.38	0.32	122.82	0.18	199.14	0.19	171.03	0.21	156.50
	$g_{c}$ = $1$ $a$ = $1$ $b$ = $1$	0.28	124.23	0.30	122.38	0.32	122.82	0.18	199.14	0.19	171.03	0.21	156.50
	$g_{c}$ = $- 1$ $a = 1$ $b = 0$	0.28	124.23	0.30	122.38	0.32	122.82	0.18	199.14	0.19	171.03	0.21	156.50
	$g_{c}$ = $- 1$ $a = 1$ $b = 1$	0.28	124.23	0.30	122.38	0.32	122.82	0.18	199.14	0.19	171.03	0.21	156.50
$t_{5}$		0.28	124.23	0.30	122.38	0.32	122.82	0.18	199.14	0.19	171.03	0.21	156.50
$t_{6}$		0.32	100.00	0.34	100.00	0.36	100.00	0.30	100.00	0.32	100.00	0.34	100.00
$t_{7}$		0.35	34.69	0.39	32.96	0.43	34.55	0.28	17.62	0.31	15.42	0.33	14.16
$t_{8}$		0.28	123.59	0.31	121.39	0.33	122.28	0.18	197.42	0.19	164.23	0.20	145.66
$t_{9}$		0.28	124.23	0.30	122.38	0.32	122.82	0.18	199.14	0.19	171.03	0.21	156.50
$t_{10}$		0.28	124.23	0.30	122.38	0.32	122.82	0.18	199.14	0.19	171.03	0.21	156.50
$t_{11}^{(1)}$		0.77	19.49	0.85	20.74	0.98	23.00	0.69	22.63	0.72	25.01	0.73	9.69
$T_{E E}^{(1)}$		0.23	181.65	0.26	176.30	0.28	173.22	0.16	263.25	0.17	240.22	0.18	191.36

Table 5.

$P R B$ and $P R E$ of the Estimators Under Situation-II.

		Combination-1						Combination-2
$k$ Estimator		2		3		4		2		3		4
$k$ Estimator		$P R B$	$P R E$	$P R B$	$P R E$	$P R B$	$P R E$	$P R B$	$P R E$	$P R B$	$P R E$	$P R B$	$P R E$
${\bar{y}}_{s t}^{*}$		0.32	100.00	0.34	100.00	0.36	100.00	0.30	100.00	0.30	100.00	0.30	100.00
$t_{1}$		0.35	88.08	0.38	89.27	0.41	92.11	0.28	88.49	0.30	71.85	0.32	62.76
$t_{2}$		0.57	34.23	0.62	34.91	0.67	36.35	0.71	18.01	0.73	17.85	0.74	17.76
$t_{3}$		0.28	124.05	0.30	123.69	0.32	123.18	0.19	202.56	0.20	166.62	0.21	146.31
$t_{4}$	$g_{c}$ =1 $a$ =1 $b$ =0	0.28	124.05	0.30	123.71	0.32	123.27	0.18	209.20	0.19	179.75	0.20	163.91
	$g_{c}$ =1 $a$ =1 $b$ =1	0.28	124.05	0.30	123.71	0.32	123.27	0.18	209.20	0.19	179.75	0.20	163.91
	$g_{c}$ = $-$ 1 $a$ =1 $b$ =0	0.28	124.05	0.30	123.71	0.32	123.27	0.18	209.20	0.19	179.75	0.20	163.91
	$g_{c}$ = $- 1$ $a = 1$ $b = 1$	0.28	124.05	0.30	123.71	0.32	123.27	0.18	209.20	0.19	179.75	0.20	163.91
$t_{5}$		0.28	124.05	0.30	123.71	0.32	123.27	0.18	209.20	0.19	179.75	0.20	163.91
$t_{6}$		0.32	100.00	0.34	100.00	0.36	100.00	0.30	100.00	0.30	100.00	0.30	100.00
$t_{7}$		0.35	34.53	0.38	34.54	0.41	36.13	0.28	18.24	0.30	16.14	0.32	14.85
$t_{8}$		0.28	123.37	0.30	123.15	0.32	122.97	0.18	208.41	0.19	174.91	0.20	155.33
$t_{9}$		0.28	124.05	0.30	123.71	0.32	123.27	0.18	209.20	0.19	179.75	0.20	163.91
$t_{10}$		0.28	124.05	0.30	123.71	0.32	123.27	0.18	209.20	0.19	179.75	0.20	163.91
$t_{11}^{(2)}$		0.76	18.07	0.83	19.04	0.91	20.38	0.67	14.28	0.71	24.01	0.73	25.92
$T_{E E}^{(2)}$		0.24	162.59	0.27	158.68	0.28	156.34	0.17	226.10	0.18	203.53	0.20	186.88

Furthermore, a computational analysis of the proposed estimators on a hypothetical multivariate data set, for various combinations of $ρ_{Y X}$ and $ρ_{Y Z}$ at specific values of k, shows that the proposed estimators outperform all relevant estimators at higher values of $ρ_{Y X}$ and $ρ_{Y Z}$ . Even when $ρ_{Y Z}$ is very low, the proposed estimators perform almost as well as the modified regression estimator at each level of $ρ_{Y X}$ . This indicates that the auxiliary information provided by z in the suggested estimators can lead to consistent improvements in performance over the competing estimators considered in the text.

6 Discussion and Conclusion

This research contributes to ongoing efforts to improve the efficiency of estimation procedures that use the unknown population mean of an auxiliary variable in mean estimation under observed heterogeneity and missing information. To this end, novel exponential estimators based on the dual use of auxiliary information are put forth for two different real situations that involve non-ignorable missing data at two sampling phases. Furthermore, under $S T P S$ , eight prominent and well-known estimators have been modified, and their properties have been examined. The comparative performance of the suggested estimators is assessed through simulation and empirical studies, using well-known data sets of a multidisciplinary nature, in order to verify the theoretical results.

Based on the simulation and empirical studies, Tables 1, 2, 4 and 5 show that the proposed estimators are more efficient than the various existing and modified estimators (given in Section 2) at all levels of the sub-sampling factor $(k)$ . This reinforces the view that missing data at both phases should be taken into account when using the unknown population mean of an auxiliary variable.


Acknowledgments

The authors express their gratitude to the Editor-in-Chief and Co-Editor-in-Chief for reviewing the manuscript, and to the distinguished referees for their insightful comments that significantly enhanced the current version of the research paper.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

ORCID iDs

R. R. Sinha

Anjali Gupta

References

Allehoff, W. H.; Esser; Schmidt, M. H.; Hennicke (1983). Die Bedeutung der Informations-und Kooperationsverweigerung für die Interpretationsreichweite einer mehrstufigen kinderpsychiatrisch-epidemiologischen Untersuchung. Social Psychiatry, 18(1), 29–36. https://doi.org/10.1007/BF00583385

Azeem; Hanif (2017). Joint influence of measurement error and non-response on estimation of population mean. Communications in Statistics-Theory and Methods, 46(4), 1679–1693. https://doi.org/10.1080/03610926.2015.1026992

Bahl; Tuteja (1991). Ratio and product type exponential estimators. Journal of Information and Optimization Sciences, 12(1), 159–164. https://doi.org/10.1080/02522667.1991.10699058

Bhushan; Pandey, A. P. (2019). An efficient estimation procedure for the population mean under non-response. Statistica, 79(4), 363–378.

Chaudhary, M. K.; Kumar (2016). Estimation of mean of finite population using double sampling scheme under non-response. Journal of Statistics Applications and Probability, 5(2), 287–297.

Chaudhary, M. K.; Kumar; Vishwakarma, G. K.; Kadilar (2020). Family of combined-type estimators for population mean using stratified two-phase sampling scheme under non-response. Journal of Statistics and Management Systems, 23(5), 915–928. https://doi.org/10.1080/09720510.2019.1700937

Dykes; Singh; Sedory, S. A.; Louis (2015). Calibrated estimators of population mean for a mail survey design. Communications in Statistics-Theory and Methods, 44(16), 3403–3427. https://doi.org/10.1080/03610926.2013.841932

Guha; Chandra (2021). Improved estimation of finite population mean in two-phase sampling with subsampling of the nonrespondents. Mathematical Population Studies, 28(1), 24–44. https://doi.org/10.1080/08898480.2019.1694325

Hansen, M. H.; Hurwitz, W. N. (1946). The problem of non-response in sample surveys. Journal of the American Statistical Association, 41(236), 517–529. https://doi.org/10.1080/01621459.1946.10501894

Khare, B. B.; Kumar (2011). Estimation of population mean using known coefficient of variation of the study character in the presence of non-response. Communications in Statistics-Theory and Methods, 40(11), 2044–2058. https://doi.org/10.1080/03610921003725820

Khare, B. B.; Srivastava (1995). Study of conventional and alternative two-phase sampling ratio, product and regression estimators in presence of nonresponse. Proceedings of the National Academy of Sciences, India Section A, 65, 195–204.

Khare, B. B.; Srivastava (1997). Transformed ratio type estimators for the population mean in the presence of nonresponse. Communications in Statistics-Theory and Methods, 26(7), 1779–1791. https://doi.org/10.1080/03610929708832012

Kumar; Bhougal (2011). Estimation of the population mean in presence of non-response. Communications for Statistical Applications and Methods, 18(4), 537–548. https://doi.org/10.5351/CKSS.2011.18.4.537

Kumar; Zeeshan, S. M. (2019). Improved two phase sampling exponential ratio and product type estimators for population mean of study character in the presence of non-response. Communications in Statistics-Theory and Methods, 48(9), 2305–2319. https://doi.org/10.1080/03610926.2018.1465082

Okafor, F. C.; Lee (2000). Double sampling for ratio and regression estimation with sub-sampling the non-respondents. Survey Methodology, 26(2), 183–188.

Rao, J. N. K. (1983). Ratio estimators. In Kotz; Johnson, N. L. (Eds.), Encyclopedia of Statistical Sciences (Vol. 4, pp. 639–646). J. Wiley.

Rao, P. S. R. S. (1990). Regression estimators with subsampling of nonrespondents. In Gunar E. Liepins and VRR Uppuluri (Eds.), Data Quality Control, Theory and Pragmatics (pp. 191–208). Marcel Dekker, New York.

Reddy, V. N. (1978). A study on the use of prior knowledge on certain population parameters in estimation. Sankhya C, 40, 29–37.

Sanaullah; Saleem; Gupta; Hanif (2022). Mean estimation with generalized scrambling using two-phase sampling. Communications in Statistics-Simulation and Computation, 51(10), 5643–5657. https://doi.org/10.1080/03610918.2020.1778032

Särndal, C. E.; Swensson (1987). A general view of estimation for two phases of selection with applications to two-phase sampling and nonresponse. International Statistical Review/Revue Internationale de Statistique, 55(3), 279–294.

Searls, D. T. (1964). The utilization of a known coefficient of variation in the estimation procedure. Journal of the American Statistical Association, 59(308), 1225–1226. https://doi.org/10.1080/01621459.1964.10480765

Shabbir; Gupta; Ahmed (2019). A generalized class of estimators under two-phase stratified sampling for non-response. Communications in Statistics-Theory and Methods, 48(15), 3761–3777. https://doi.org/10.1080/03610926.2018.1481969

Shabbir; Khan, N. S. (2013). Some modified exponential-ratio type estimators in the presence of non-response under two-phase sampling scheme. Electronic Journal of Applied Statistical Analysis, 6(1), 1–17.

Singh (2003). Advanced Sampling Theory with Applications: How Michael ‘Selected’ Amy (Vol. 2). Springer Science & Business Media.

Singh, H. P.; Kumar (2008). A regression approach to the estimation of the finite population mean in the presence of non-response. Australian & New Zealand Journal of Statistics, 50(4), 395–408. https://doi.org/10.1111/j.1467-842X.2008.00525.x

Singh, H. P.; Kumar (2010). Estimation of mean in presence of non-response using two phase sampling scheme. Statistical Papers, 51, 559–582. https://doi.org/10.1007/s00362-008-0140-5

Singh, H. P.; Singh; Kim, J. M. (2006). General families of chain ratio type estimators of the population mean with known coefficient of variation of the second auxiliary variable in two phase sampling. Journal of the Korean Statistical Society, 35(4), 377–395.

Singh, H. P.; Tailor; Allen; Kozak (2009). Estimation of ratio of two finite-population means in the presence of non-response. Communications in Statistics-Theory and Methods, 38(19), 3608–3621. https://doi.org/10.1080/03610920802610100

Singh, G. N.; Suman; Kadilar (2018). On the use of imputation methods for missing data in estimation of population mean under two-phase sampling design. Hacettepe Journal of Mathematics and Statistics, 47(6), 1715–1729. https://doi.org/10.15672/HJMS.2018.560

Sinha, R. R.; Khanna (2023). Two-Phase ratio estimation using ordinal and ratio auxiliary variables in non-response. Proceedings of the National Academy of Sciences, India Section A: Physical Sciences, 93(4), 695–702. https://doi.org/10.1007/s40010-023-00824-0

Srivastava, S. K.; Jhajj, H. S. (1983). A class of estimators of the population mean using multi-auxiliary information. Calcutta Statistical Association Bulletin, 32(1–2), 47–56. https://doi.org/10.1177/0008068319830104

Tabasum; Khan, I. A. (2004). Double sampling for ratio estimation with non-response. Journal of the Indian Society of Agricultural Statistics, 58(3), 300–306.

Tripathi, T. P.; Khare, B. B. (1997). Estimation of mean vector in presence of non-response. Communications in Statistics-Theory and Methods, 26(9), 2255–2269. https://doi.org/10.1080/03610929708832045

Unal; Kadilar (2022). A new population mean estimator under non-response cases. Journal of Taibah University for Science, 16(1), 111–119. https://doi.org/10.1080/16583655.2022.2034343

Ünal; Kadilar (2023). Improved population mean estimator with exponential function under non-response. Applied Mathematics-A Journal of Chinese Universities, 38(4), 562–580. https://doi.org/10.1007/s11766-023-4572-4

Yaqub; Shabbir; Gupta, S. N. (2017). Estimation of population mean based on dual use of auxiliary information in non-response. Communications in Statistics-Theory and Methods, 46(24), 12130–12151. https://doi.org/10.1080/03610926.2017.1291969
