In survey sampling, researchers and users of statistics do not always consider which measure of location is most appropriate. As a result, they often choose the mean or total, which is covered more widely in the finite population sampling literature than the median; the median is more complicated to handle because it depends on ordered data. Given the established usefulness of the median estimator for estimating economic indicators with high precision and efficiency, this study improves the estimation of the population median, aiming not only at gains in efficiency but also at less biased estimates. The study suggests an estimator of the population median under single and double sampling. In addition, the minimum mean square error is obtained for a given cost function under double sampling. Theoretical and empirical results show that the proposed estimators perform better when the variables considered come from a highly skewed distribution, such as income, expenditure, or scores. Moreover, the proposed estimators compete favorably with existing estimators of their class, showing less bias and notable gains in efficiency. The study also provides an appropriate way of constructing the cost function, which allows better evaluation than the cost function of an existing estimator considered in this work.
In survey sampling, statisticians often come across variables that have highly skewed distributions, such as income, expenditure, or scores. In such situations, it becomes essential to consider which tool is most appropriate for the measurement of location. Unlike the mean or total, which have been widely discussed in finite population sampling, the median is more complicated to deal with because it depends on ordered data, and thus deserves special attention. Kuk and Mak (1989) were the first to introduce the estimation of the population median of the study variate Y using auxiliary information in survey sampling. Francisco and Fuller (1991) also considered the problem of estimating the median as part of the estimation of a finite population distribution function. Several authors have made useful contributions to improving the precision of survey estimates of population parameters using auxiliary variables. Notable among them are Bahl and Tuteja (1991), who proposed both exponential ratio and product estimators for estimating population medians; Singh, Singh, and Puertas (2003); and Kadilar and Cingi (2004), who modified the exponential ratio estimators by introducing several parameters to improve efficiency.
Because weight adjustments in survey sampling are gaining attention as a means of improving the precision of estimates, and given the robustness of the exponential estimators proposed by Bahl and Tuteja (1991), researchers in this area have adopted several procedures for modifying these estimators to enhance the performance of the median estimators. In most cases, these procedures lead to the same result for the mean square error (MSE) of the median estimator. Singh and Solanki (2013), Aladag and Cingi (2015), and Enang et al. (2016) are a few cases in point. As a deviation, Iseh (2020) calibrated a separate-ratio exponential estimator and obtained a better result than other existing estimators in single-phase sampling.
To further enhance the performance of the median estimators, authors like Singh et al. (2001), Singh, Joarder, and Tracy (2003), and Singh, Singh, and Upadhyaya (2007) have adopted the double sampling procedure to improve the efficiency of the estimators. In addition, Jhajj, Kaur, and Jhajj (2016), Baig, Masood, and Tarray (2019), and Iseh (2021) have modified the exponential estimators through double sampling, which has yielded fruitful results and shown an advantage over existing estimators in single and two-phase sampling. Keeping in mind the usefulness of the median estimator in estimating economic indicators and the need for high precision and efficiency, this study seeks to improve the estimation of the population median, not only for gains in efficiency but also to achieve asymptotically unbiased estimates.
Methodology
Notations
Consider a finite population with size . Let , , and be the study, auxiliary, and support variables respectively. Let represent the sample values of the variable of interest, and let and represent the sample values of the auxiliary and support variables respectively, known for every unit in the population, for the element drawn under simple random sampling without replacement (SRSWOR). Let , , and represent the density functions of the random variables, with , , and as the sample estimates of the population medians , and respectively. Also, let be the integer satisfying and let be the proportion of values in the sample that are less than or equal to the median value which denotes the unknown population parameter. If denotes the of then, with correlation coefficient , where , , where , and , where . Kuk and Mak (1989) defined a matrix of proportion as shown in Table 1.
Table 1. Matrix of proportion (two-way layout with row and column totals summing to 1)
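As an illustration of the idea behind the matrix of proportion, the sketch below classifies each sampled unit by whether its two values fall at or below the respective sample medians and tabulates the resulting proportions. The function name `proportion_matrix`, the toy data, and the row/column orientation are assumptions made for illustration and are not taken from the source; the relation `rho_c = 4 * P11 - 1` is the form commonly attributed to Kuk and Mak (1989).

```python
from statistics import median


def proportion_matrix(y, x):
    """Return a 2x2 list of sample proportions.

    Rows: y <= median(y), y > median(y).
    Columns: x <= median(x), x > median(x).
    (Orientation is an illustrative assumption.)
    """
    if len(y) != len(x) or not y:
        raise ValueError("y and x must be non-empty and of equal length")
    my, mx = median(y), median(x)
    n = len(y)
    counts = [[0, 0], [0, 0]]
    for yi, xi in zip(y, x):
        row = 0 if yi <= my else 1
        col = 0 if xi <= mx else 1
        counts[row][col] += 1
    return [[c / n for c in r] for r in counts]


# Hypothetical toy data in which y and x move together.
y = [3, 7, 1, 9, 5, 8]
x = [2, 6, 1, 10, 4, 7]
P = proportion_matrix(y, x)

# Kuk-Mak style association measure based on the (1, 1) cell.
rho_c = 4 * P[0][0] - 1
```

With perfectly concordant data, all mass sits on the diagonal of the matrix and `rho_c` reaches its upper limit; independence of the two classifications would give a value near zero.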
Some standard derivations for special class of separate estimators
Adopting the concept by Srivastava (1971) and Srivastava and Jhajj (1995)
Let , and , where , , and , then
where it is assumed that as the distribution of the trivariate variable approaches a continuous distribution with marginal densities , , and , for and respectively. This assumption holds in particular under a super population model framework, treating the values of in the population as a realization of independent observations from a continuous distribution. We also assume that and , are non-negative.
Related existing estimators in literature
This section considers some existing estimators with one and two auxiliary variables in single-phase sampling.
i: The classical median estimator due to Gross (1980) is given by
ii: The classical ratio median estimator by Kuk and Mak (1989) is given by
iii: The exponential ratio median estimator following Bahl and Tuteja (1991) is given by
iv: The exponential product-type median estimator following Bahl and Tuteja (1991) is given by
v: The alternative exponential median estimator due to Enang et al. (2016) is given by
vi: Shabbir and Gupta (2017) suggested a generalized difference-type estimator for the population median as
where and are defined to be unknown population parameters and , and are scalar quantities which can take different values like and , and
and
The expressions for the bias and the mean square error up to the first order of approximation are as follows:
and
vii: Baig, Masood and Tarray (2019) suggested an improved class of difference-type estimators for population median using two auxiliary variables
where
viii: Iseh (2021) proposed a separate ratio exponential estimator of the form
And the minimum bias given for
is
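As a rough numerical companion to items i to iv of this list, the following sketch codes the forms in which these estimators usually appear in the literature: the sample median, the ratio form, and the exponential ratio and product forms, together with the standard first-order variance of the sample median under SRSWOR. The function names, the toy samples, and the argument names (`Mx` for the known population median of the auxiliary variable, `f_y` for the density of the study variable at its median) are assumptions introduced here, since the original expressions are not reproduced in the text. The values 69, 17 and 0.00014 are the Population I entries of Table 2; reading them as the population size, sample size and density value is likewise an assumption.

```python
from math import exp
from statistics import median


def classical_median(y_sample):
    """Sample median of the study variable."""
    return median(y_sample)


def ratio_median(y_sample, x_sample, Mx):
    """Ratio-type form: sample median of y scaled by Mx / median(x)."""
    return median(y_sample) * Mx / median(x_sample)


def exp_ratio_median(y_sample, x_sample, Mx):
    """Exponential ratio-type form."""
    mx_hat = median(x_sample)
    return median(y_sample) * exp((Mx - mx_hat) / (Mx + mx_hat))


def exp_product_median(y_sample, x_sample, Mx):
    """Exponential product-type form."""
    mx_hat = median(x_sample)
    return median(y_sample) * exp((mx_hat - Mx) / (mx_hat + Mx))


def classical_median_variance(N, n, f_y):
    """First-order variance of the sample median under SRSWOR."""
    return (1 / n - 1 / N) / (4 * f_y ** 2)


# Hypothetical toy samples; Mx = 13.0 is an assumed known population median.
y_s = [12, 15, 11, 18, 14]
x_s = [10, 13, 9, 16, 12]
estimates = {
    "classical": classical_median(y_s),
    "ratio": ratio_median(y_s, x_s, 13.0),
    "exp_ratio": exp_ratio_median(y_s, x_s, 13.0),
    "exp_product": exp_product_median(y_s, x_s, 13.0),
}

# Assumed reading of the Population I entries in Table 2.
var_pop1 = classical_median_variance(N=69, n=17, f_y=0.00014)
```

Under that reading, `var_pop1` is roughly 5.65 x 10^5. Because the sample median of the auxiliary variable falls below `Mx` in the toy data, the ratio-type forms adjust the estimate upward and the product-type form adjusts it downward.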
The proposed estimators
Let assume values in a bounded closed convex subset of the two-dimensional real space containing the point . Let be a function of and such that
, then the following conditions are satisfied
The function is continuous and bounded in .
The first and second partial derivatives of exist and are continuous and bounded in .
Following Srivastava (1971), this particular class of estimator of the population median, , is defined as
To obtain the bias and mean square error of , the technique for expanding a general class of estimators by Srivastava (1971) is adopted. Hence, the function is expanded about the point in a second-order Taylor series, as shown in Sections 4.1 and 5.2.
Equation (9) can be written explicitly for the single- and two-phase estimators as shown in Eqs (10) and (21).
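For orientation, a generic second-order Taylor expansion of a two-argument function of this kind is written out below. The symbols are assumptions introduced here, because the original notation is not reproduced in the text: H denotes the function, u and v its two arguments, and the expansion point is taken to be (1, 1), as is usual in the Srivastava (1971) class.

```latex
% H_1, H_2: first partial derivatives of H at (1,1)
% H_{11}, H_{12}, H_{22}: second partial derivatives of H at (1,1)
% R: remainder collecting terms of third and higher order
H(u,v) = H(1,1) + (u-1)\,H_1 + (v-1)\,H_2
       + \tfrac{1}{2}\left[(u-1)^2 H_{11} + 2(u-1)(v-1)\,H_{12} + (v-1)^2 H_{22}\right] + R
```

Taking expectations of the terms up to second order gives the first-order approximations of the bias and MSE once the moments of (u-1) and (v-1) are substituted.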
The proposed estimator under simple random sampling
where and are unknown constants obtained while minimizing the MSE of as
then
Application
To validate the theoretical claims, empirical investigations are carried out using the data statistics in Table 2. To obtain the percent relative efficiencies () of the estimators, the MSE values of the existing and proposed estimators are computed as follows:
where is the MSE of the classical median estimator and denotes the MSE of the proposed estimators.
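The percent relative efficiency described above can be sketched in a few lines. The ratio form used here (reference MSE divided by the estimator's MSE, times 100) is an assumption consistent with the classical estimator scoring 100 in the results tables; the function name is hypothetical.

```python
def percent_relative_efficiency(mse_reference, mse_estimator):
    """PRE of an estimator relative to a reference estimator, in percent."""
    if mse_estimator <= 0:
        raise ValueError("mse_estimator must be positive")
    return 100.0 * mse_reference / mse_estimator


# Rounded Population I MSE values from the single-phase results table:
# 5.7 for the classical estimator and 9.9 for the second estimator listed.
pre_example = percent_relative_efficiency(5.7, 9.9)
```

Because the tabulated MSE values are rounded, a PRE recomputed this way (about 57.6 here) can differ slightly from the tabulated figure (57).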
Descriptive statistics
The data statistics for population I, II, III, and IV are given in Table 2.
Table 2. Data statistics from four populations under simple random and two-phase sampling

| Statistics | Population I | Population II | Population III | Population IV |
|---|---|---|---|---|
| | 69 | 97 | 67 | 97 |
| | 17 | 33 | 15 | 24 |
| | 24 | 46 | 23 | 46 |
| | 2068 | 1242 | 4.8 | 21.4 |
| | 2011 | 1233 | 7.0 | 22.8 |
| | 2307 | 1207 | 151 | 22.6 |
| | 0.1505 | 0.2096 | 0.6624 | 0.48 |
| | 0.1431 | 0.15 | 0.7592 | 0.45 |
| | 0.3166 | 0.123 | 0.8624 | 0.44 |
| | 0.00014 | 0.00021 | 0.0763 | 2.303 |
| | 0.00014 | 0.0002 | 0.0526 | 2.510 |
| | 0.00013 | 0.0002 | 0.0024 | 2.398 |
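Table 3. Results for numerical comparison of AB, MSE and PRE under simple random sampling

| Est | Pop. I AB | Pop. I MSE | Pop. I PRE | Pop. II AB | Pop. II MSE | Pop. II PRE | Pop. III AB | Pop. III MSE | Pop. III PRE | Pop. IV AB | Pop. IV MSE | Pop. IV PRE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0.0 | 5.7 | 100 | 0.0 | 1.1 | 100 | 0.0 | 0.66 | 100 | 0.0 | 1.5 | 100 |
| | 246.3 | 9.9 | 57 | 81.7 | 1.9 | 60 | 0.05 | 0.44 | 149 | 0.0 | 1.4 | 107 |
| | 87.3 | 6.3 | 90 | 28.3 | 1.2 | 95 | 0.01 | 0.39 | 170 | 0.0 | 1.1 | 136 |
| | 15.0 | 8.0 | 71 | 2.7 | 1.7 | 67 | 0.03 | 1.26 | 53 | 0.0 | 2.4 | 63 |
| | 4.1 | 5.5 | 102 | 150.9 | 1.1 | 105 | 0.02 | 0.37 | 179 | 0.0 | 1.1 | 136 |
| | 2.0 | 4.9 | 115 | 2.9 | 1.0 | 113 | 0.84 | 0.29 | 229 | 15.5 | 3.59 | 0.0 |
| | 4.6 | 5.0 | 113 | 5.0 | 1.1 | 106 | 0.26 | 0.17 | 391 | 0.0 | 1.0 | 150 |
| | 208.0 | 5.2 | 109 | 89.6 | 1.3 | 88 | 0.07 | 0.20 | 327 | 0.0 | 1.1 | 136 |
| | 208.3 | 5.0 | 113 | 66.0 | 1.1 | 106 | 0.09 | 0.17 | 391 | 0.0 | 1.0 | 150 |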
Population I: Let , and respectively be the number of fish caught by marine recreational fishermen in the years 1995, 1994 and 1993 in the USA, as given by Singh (2003a).
Population II: Let be the district-wise tomato production (tonnes) in 2003, be the district-wise tomato production (tonnes) in 2002 and be the district-wise tomato production (tonnes) in 2001, as given by MFA (2004).
Population III: Let be the U.S. exports to Singapore in billions of Singapore dollars, be the money supply figures in billions of Singapore dollars and be the local supply in U.S. dollars, as given by Aczel and Sounderpandian (2004).
Population IV: The study variable y is considered as total fertility rate, the supplementary variable is defined as crude birth rate and is considered as crude death rate. A transformation has been applied on the original data of the variables and the minimum value is for all the variables. The transformation of variables is defined as ; , and Source: Silverman (1986) and Singh (2003b).
Results
The results to validate the theoretical claims for single-phase sampling, computed for the bias, mean square error, and percent relative efficiency, are given in Table 3 for populations I, II, III, and IV.
Two phase sampling
Existing estimators under two phase sampling
This section considers some existing estimators with one and two auxiliary variables in double sampling.
(i) Singh, Joarder, and Tracy (2003) suggested a ratio estimator for the median in two-phase sampling
(ii) Singh, Singh, and Upadhyaya (2007) studied a ratio-type estimator of the median using two auxiliary variables
(iii) Jhajj, Kaur, and Jhajj (2016) defined a ratio-exponential-type estimator as
(iv) Baig, Masood and Tarray (2019) suggested an improved class of difference-type estimators for population median under two phase sampling with two auxiliary variables
where and are constants given as and
(v) Iseh (2021) proposed a separate ratio exponential estimator of the form
For optimum value of the MSE of
Proposed estimator under two-phase sampling
where , and
Note: The proposed estimators in single and two-phase sampling are special members of the generalized class of estimators proposed by Srivastava (1971), where , for two supplementary variables, and , and , with the estimator .
Optimum sample sizes for fixed cost and variances
Existing estimators
i) Following Gross (1980), the cost function for the usual median estimator is given as
where is the fixed cost of the survey and is the cost per unit of obtaining information on the study variable, such that the minimum variance is obtained as
ii) Following Singh et al. (2001), the cost function for an estimator with a single auxiliary variable in double sampling is given as , where is the cost per unit of obtaining information on the auxiliary variable in the first phase.
where , and
iii) Baig, Masood and Tarray (2019), proposed a cost function for two auxiliary variables under double sampling as
with minimum mean square error as
where
Under the suggested estimator
The question here is whether the reduction in variability is worth the extra expenditure required to observe the auxiliary variables.
Consider a cost function with as the fixed cost, as the cost per unit of obtaining information on the study variable in the second phase, and and as the costs per unit of obtaining information on the auxiliary/helping variables in the first phase. Then, following Singh, Joarder and Tracy (2001) and Allen et al. (2002), a cost function for two auxiliary variables under double sampling is given as
In what follows, the optimum first- and second-phase sample sizes are obtained for the fixed-cost and fixed-variance cases by considering the Lagrange function
Differentiating Eq. (25) partially with respect to and and solving gives respectively
Substituting Eqs (26) and (27) in Eq. (19), we obtain the minimum MSE as
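The Lagrange argument above can be illustrated with a generic sketch. It assumes, purely for illustration, that the part of the MSE depending on the sample sizes has the form A/n + B/n', with n the second-phase and n' the first-phase sample size, and that the variable part of the cost is c1*n + c2*n', where `budget` is the total cost minus the fixed cost and `c2` is the combined per-unit cost of the auxiliary variables. The constants `A` and `B`, the function name, and the numerical values are hypothetical and do not come from the source.

```python
from math import sqrt


def optimum_two_phase_sizes(A, B, c1, c2, budget):
    """Minimise A/n + B/n_prime subject to c1*n + c2*n_prime = budget.

    Setting the partial derivatives of the Lagrange function to zero gives
    n proportional to sqrt(A/c1) and n_prime proportional to sqrt(B/c2).
    The constraint n_prime >= n is not enforced in this sketch.
    """
    if min(A, B, c1, c2, budget) <= 0:
        raise ValueError("all inputs must be positive")
    denom = sqrt(A * c1) + sqrt(B * c2)
    n = budget * sqrt(A / c1) / denom
    n_prime = budget * sqrt(B / c2) / denom
    min_mse = denom ** 2 / budget
    return n, n_prime, min_mse


# Hypothetical inputs: the study variable is four times as costly to
# observe as the auxiliary information.
n_opt, n_prime_opt, mse_opt = optimum_two_phase_sizes(
    A=1.0, B=4.0, c1=4.0, c2=1.0, budget=100.0
)
```

In this toy case the cheaper first-phase units receive the larger sample, which is the usual rationale for double sampling. Real sample sizes would be rounded to integers and checked against the requirement that the second-phase sample is a subsample of the first.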
Application
The data statistics for populations I, II, III, and IV given in Table 2 are used in the empirical investigation under the two-phase sampling scheme, with the results shown in Table 4.
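Table 4. Results for numerical comparison of AB, MSE, and PRE under two-phase sampling

| Est | Pop. I AB | Pop. I MSE | Pop. I PRE | Pop. II AB | Pop. II MSE | Pop. II PRE | Pop. III AB | Pop. III MSE | Pop. III PRE | Pop. IV AB | Pop. IV MSE | Pop. IV PRE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0.0 | 5.7 | 100 | 0.0 | 1.1 | 100 | 0.0 | 2.22 | 100 | 0.0 | 1.5 | 100 |
| | 26.0 | 7.3 | 78 | 8.1 | 1.5 | 78 | 0.03 | 1.89 | 117 | 0.0 | 1.4 | 107 |
| | 14.7 | 5.1 | 112 | 2.1 | 1.1 | 103 | 0.09 | 0.57 | 390 | 0.0 | 1.1 | 136 |
| | 245.5 | 5.1 | 112 | 285.3 | 1.1 | 103 | 0.0 | 0.57 | 390 | 0.0 | 1.1 | 136 |
| | 25.7 | 5.0 | 112 | 1.4 | 1.1 | 104 | 0.35 | 0.13 | 1677 | 0.0 | 1.0 | 150 |
| | 95.0 | 5.4 | 106 | 44.6 | 1.2 | 93 | 0.04 | 1.03 | 216 | 0.0 | 1.2 | 125 |
| | 37.8 | 3.4 | 168 | 15.5 | 6.6 | 170 | 0.09 | 1.25 | 178 | 0.0 | 7.0 | 214 |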
Population V: Consider the information provided in Population IV. In an institute, the Director fixed a cost as for conducting a survey to estimate the median of total fertility rate in the world. Source: Silverman (1986) and Singh (2003b).
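Table 5. PRE of some estimators in two-phase sampling over for various sample sizes

| Second-phase sample size | First-phase sample size | | | | |
|---|---|---|---|---|---|
| 10 | 40 | 107.7 | 161.5 | 127.3 | 323.1 |
| 10 | 50 | 107.7 | 168.0 | 127.3 | 381.8 |
| 10 | 60 | 107.7 | 168.0 | 127.3 | 466.7 |
| 15 | 40 | 112.5 | 158.8 | 128.6 | 270.0 |
| 15 | 50 | 108.0 | 168.8 | 128.6 | 337.5 |
| 15 | 60 | 108.0 | 168.8 | 128.6 | 385.7 |
| 20 | 40 | 105.6 | 158.3 | 126.7 | 211.2 |
| 20 | 50 | 111.8 | 158.3 | 126.7 | 271.4 |
| 20 | 60 | 111.8 | 172.7 | 126.7 | 316.7 |
| 25 | 40 | 107.7 | 140.0 | 127.3 | 175.0 |
| 25 | 50 | 107.7 | 155.6 | 127.3 | 233.3 |
| 25 | 60 | 107.7 | 155.6 | 127.3 | 280.0 |
| 30 | 40 | 110.0 | 127.5 | 122.2 | 157.1 |
| 30 | 50 | 110.0 | 157.1 | 122.2 | 220.0 |
| 30 | 60 | 110.0 | 157.1 | 137.5 | 275.0 |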
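Table 6. PRE of some estimators in two-phase sampling over under fixed cost

| Cost ($) | | | | | |
|---|---|---|---|---|---|
| 250 | 15 | 0.0 | 12284.8 | 43469.2 | 47091.6 |
| 250 | 15 | 2.5 | 12284.8 | 43469.2 | 47091.7 |
| 250 | 15 | 5.0 | 12284.8 | 43469.2 | 43469.2 |
| 250 | 20 | 0.0 | 11772.9 | 43469.2 | 43469.2 |
| 250 | 20 | 2.5 | 11772.9 | 40364.3 | 40364.3 |
| 250 | 20 | 5.0 | 11772.9 | 40364.3 | 37673.3 |
| 300 | 15 | 0.0 | 8561.8 | 33635.7 | 33635.7 |
| 300 | 15 | 2.5 | 8561.8 | 31393.3 | 31393.3 |
| 300 | 15 | 5.0 | 8561.8 | 31393.3 | 31393.3 |
| 300 | 20 | 0.0 | 8261.4 | 31393.3 | 31393.3 |
| 300 | 20 | 2.5 | 8261.4 | 31393.3 | 29431.3 |
| 300 | 20 | 5.0 | 8261.4 | 29431.3 | 27700.0 |
| 350 | 15 | 0.0 | 6404.8 | 25218.8 | 25218.8 |
| 350 | 15 | 2.5 | 6404.8 | 25218.8 | 23735.3 |
| 350 | 15 | 5.0 | 6404.8 | 23735.3 | 22416.7 |
| 350 | 20 | 0.0 | 6113.6 | 23735.3 | 22416.7 |
| 350 | 20 | 2.5 | 6113.6 | 23735.3 | 21236.8 |
| 350 | 20 | 5.0 | 6113.6 | 22416.7 | 21236.8 |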
Results under two phase sampling
The results to validate the theoretical claims for two-phase sampling, computed for the absolute bias (AB), mean square error (MSE), and percent relative efficiency (PRE) of the existing and proposed estimators, are given in Table 4 for populations I, II, III, and IV. Table 5 shows the percent relative efficiency of the proposed estimator over the existing estimators for various sample sizes. In addition, Table 6 gives the PRE under a fixed cost of the survey for the proposed estimator , and some existing estimators , and .
Discussion
From the results in Table 3, it is observed that the proposed estimator competes favorably with the existing estimators for the four populations considered in this study. As seen in the theoretical derivation, under single-phase sampling, both the proposed estimator and the existing estimator have the same MSE and have outperformed the other existing estimators considered in this study, except , which performed better in populations I and II. As a result, the proposed estimator, having been shown to be less biased than and , with smaller MSE than in populations III and IV, is considered a better estimator of the population median.
Under two-phase sampling, as shown in Table 4, the proposed estimator has a favorable bias compared to other existing estimators. In terms of MSE and PRE, the proposed estimator outperformed the other existing estimators in populations I, II, and IV. Hence, has a remarkable gain in efficiency compared to (which has the same PRE performance under single-phase sampling) and the other existing estimators considered in this study. This gain in efficiency provides direction for the formulation of models in median estimation and for the choice of the auxiliary variables.
Again, under two-phase sampling, population was examined with different sizes of the first-phase sample varied against different sizes of the second-phase sample. As shown in Table 5, increasing the first-phase sample size while holding the second-phase sample size fixed results in outstanding performance of the proposed estimator in terms of gains in efficiency relative to the classical median and other existing estimators and under two-phase sampling.
In survey sampling, it is often necessary to find an estimator with minimum MSE under a fixed cost of the survey (see Allen et al., 2002). This is illustrated using data from population to estimate the median of the total fertility rate in the world. As shown in Table 6, the proposed estimator performs better in terms of gains in percent relative efficiency than the ratio estimator and the difference estimator for a fixed cost of the survey . However, (apparently the incorrect cost function version of Baig, Masood and Tarray (2019)) seems to have a slight edge over the proposed estimator as (the cost of the survey in enumerating the study variable) increases. This is because the authors used in enumerating the study variable for a large first-phase sample size, thereby trading a higher cost for efficiency. The proposed estimator, by contrast, is cost effective in enumerating the study variable while improving efficiency, which agrees with the concept of a double sampling scheme.
Conclusion
This study was conceived to elucidate the direction of formulating models for enhancing efficiency in the estimation of the population median. It has been observed that the proposed estimators in single and two-phase sampling are special members of the generalized class of estimators proposed by Srivastava (1971) with the medians of two auxiliary variables. Given the several works on improving the exponential ratio estimator, which seems to be the most robust among the classes of estimators for the population median, the proposed estimator has favorable qualities in single-phase sampling and stands out in two-phase sampling compared to others. Having examined the proposed estimator and other existing estimators in double sampling with varying sample sizes and a fixed cost of the survey, the results indicate that the proposed estimator is to be preferred for estimating the population median in two-phase sampling, giving greater gains in efficiency at minimum cost. Consequently, the proposed estimator is suitable and recommended when the variable under study has a skewed distribution.
Acknowledgments
With profound gratitude, I acknowledge the anonymous reviewer for his expert review of the manuscript to see that it meets the required standard of the Journal.
References
1. Aczel, A.D., & Sounderpandian, J. (2004). Complete business statistics (5th ed.). New York: McGraw Hill.
2. Aladag, S., & Cingi, H. (2015). Improvement in estimating the population median in simple random sampling and stratified random sampling using auxiliary information. Communications in Statistics - Theory and Methods, 45(5), 1013-1032. doi: 10.1080/03610926.2012.753090.
3. Allen, J., Singh, H.P., Singh, S., & Smarandache, F. (2002). A generalized class of estimators of population median using two auxiliary variables in double sampling. In Randomness and Optimal Estimation in Data Sampling (2nd ed., pp. 26-43). American Research Press, USA.
4. Bahl, S., & Tuteja, R.K. (1991). Ratio and product type exponential estimator. Journal of Information and Optimization Sciences, 12(1), 159-164.
5. Baig, A., Masood, S., & Tarray, T.A. (2019). Improved class of difference-type estimators for population median in survey sampling. Communications in Statistics - Theory and Methods. doi: 10.1080/03610926.2019.1622017.
6. Enang, E.I., Etuk, S.I., Ekpenyong, E.J., & Akpan, V.M. (2016). An alternative exponential estimator of population median. International Journal of Statistics and Economics, 17(3), 85-97.
7. Francisco, C.A., & Fuller, W.A. (1991). Quantile estimation with a complex survey design. Ann. Statist., 19, 454-469.
8. Gross, T.S. (1980). Median estimation in sample surveys. In American Statistical Association Proceedings of Survey Research Methodology Section, 181-184.
9. Iseh, M.J. (2020). Enhancing efficiency of ratio estimator of population median by calibration techniques. International Journal of Engineering Sciences & Research Technology, 9(8), 14-23.
10. Iseh, M.J. (2021). Towards the efficiency of the ratio estimator for population median in sampling survey. International Journal of Innovation Science, Engineering & Technology, 8(6), 518-533.
11. Jhajj, H.S., Kaur, H., & Jhajj, P. (2016). Efficient family of estimators of median using two-phase sampling design. Communications in Statistics - Theory and Methods, 45(15), 4325-4331. doi: 10.1080/03610926.2014.911912.
12. Kadilar, C., & Cingi, H. (2004). Ratio estimators in simple random sampling. Applied Mathematics and Computation, 151, 893-902.
13. Kuk, A.Y.C., & Mak, T.K. (1989). Median estimation in the presence of auxiliary variable. Journal of the Royal Statistical Society, Series B, 51, 261-269. doi: 10.1111/j.2517-6161.1989.tb01763.x.
14. MFA. (2004). Crops area production. Government of Pakistan, Ministry of Food, Agriculture and Livestock, Economic Wing. Islamabad, Pakistan.
15. Shabbir, J., & Gupta, S. (2017). A generalized class of difference-type estimator for population median in survey sampling. Hacettepe Journal of Mathematics and Statistics, 46, 1015-1028. doi: 10.15672/HJMS.201610614759.
16. Silverman, B.W. (1986). Density estimation for statistics and data analysis. Monographs on Statistics and Applied Probability. London: Chapman and Hall.
17. Singh, S. (2003a). Advanced Sampling Theory and Applications: How Michael ‘Selected’ Amy. Volumes I and II. Kluwer Academic Publishers, the Netherlands.
18. Singh, S. (2003b). Advanced Sampling Theory With Applications: How Michael Selected Amy (Vol. 2). Springer Science & Business Media.
19. Singh, S., Joarder, A.H., & Tracy, D.S. (2001). Median estimation using double sampling. Australian and New Zealand Journal of Statistics, 43, 33-46. doi: 10.1111/1467-842X.00153.
20. Singh, S., Singh, H.P., & Upadhyaya, L.N. (2007). Chain ratio and regression-type estimators for median estimation in survey sampling. Statistical Papers, 48, 23-46. doi: 10.1007/s00362-006-0314-y.
21. Singh, H.P., Singh, S., & Puertas, S.M. (2003). Ratio-type estimators for the median of finite populations. Allgemeines Statistisches Archiv, 87, 369-382.
22. Singh, H.P., & Solanki, R.S. (2013). Some classes of estimators for the population median using auxiliary information. Communications in Statistics, 42, 4222-4238.
23. Srivastava, S.K., & Jhajj, H.S. (1995). Classes of estimators of finite population mean and variance using auxiliary information. Jour. Ind. Soc. Ag. Statistics, 47(2), 119-128.
24. Srivastava, S.K. (1971). A generalized estimator for the mean of a finite population using multi-auxiliary information. Jour. Amer. Statist. Assoc., 66(334), 404-407.