Ranked set sampling with varied order statistics for skew distributions

Abstract

Ranked set sampling (RSS) is advantageous when quantifying every sampling unit is costly but small sets of units can be ranked on the characteristic under investigation by methods that do not require actual measurement. RSS uses the units corresponding to each rank, and it performs better than simple random sampling (SRS) in estimating the population mean and other population parameters. In this paper, a new RSS procedure (RSSVO) for estimating the population mean of skew distributions is suggested. RSSVO measures only one or two order statistics, depending on the set size. The proposed estimator under RSSVO is compared with the estimators based on SRS and on RSS with equal allocation and with Neyman’s optimal allocation. It is shown that the relative precision of the estimator based on RSSVO is higher than that of the estimators based on SRS and RSS (both equal and Neyman’s optimal allocation) when the distributions under consideration are highly positively skewed. Further, using the lognormal distribution as an example, it is shown that the performance of the proposed estimator improves as the skewness increases.

Keywords

Ordered observations, Neyman’s allocation, relative precision, skewness, unbiased estimator

1. Introduction

Ranked set sampling (RSS) for estimating a population mean was introduced by McIntyre (1952) as a cost-efficient alternative to simple random sampling (SRS) when observations can be ranked according to the characteristic under investigation by visual inspection or other methods not requiring actual measurements. Dell and Clutter (1972) and Takahasi and Wakimoto (1968) provided the mathematical foundation for RSS. Dell and Clutter also showed that the estimator of the population mean based on RSS is at least as efficient as the estimator based on SRS with the same number of observations, even when there are ranking errors. RSS is a nonparametric procedure. However, it has also been used in the parametric setting (see Bhoj, 1997a, 1997b; Bhoj & Ahsanullah, 1996; Lam et al., 1994, 1996; Stokes, 1995). Most of the distributions considered by these investigators belong to the family of random variables with cumulative distribution function of the form $F\left({\frac{x-\mu}{\sigma}}\right)$ , where $\mu$ and $\sigma$ are the location and scale parameters, respectively.

The selection of a ranked set sample of size $k$ involves drawing $k$ random samples, with $k$ units in each sample, from the population whose mean is to be estimated. The units in each sample are ranked by judgment or other inexpensive methods. The unit with the lowest rank is measured from the first sample, the unit with the second lowest rank is measured from the second sample, and this procedure is continued until the unit with the highest rank is measured from the last sample. The $k^{2}$ ordered observations in the $k$ samples can be displayed as:

$\displaystyle\begin{array}{l}y_{(11)},y_{(12)},\ldots,y_{(1k)}\\ y_{(21)},y_{(22)},\ldots,y_{(2k)}\\ \quad\vdots\\ y_{(k1)},y_{(k2)},\ldots,y_{(kk)}\end{array}$

We measure only the $k$ diagonal observations $y_{(ii)}$ , $i=1,2,\ldots,k$ , and they constitute the ranked set sample. McIntyre (1952) proposed the average of these diagonal observations as an unbiased estimator of the population mean regardless of errors in ranking. Note that these $k$ observations are independently distributed. In RSS, $k$ is usually kept small to reduce ranking errors; therefore, to increase the sample size, the above procedure is repeated $m\geqslant 2$ times to obtain a sample of size $n=mk$ . Since each order statistic is measured an equal number of times, this procedure is called balanced RSS or RSS with equal allocation. For convenience, it is assumed that $m=1$ . Suppose that the mean and variance of the $i^{\text{th}}$ order statistic for set size $k$ are denoted by $\mu_{(i:k)}$ and $\sigma_{(i:k)}^{2}$ , respectively. The unbiased estimator of the population mean $\mu$ under RSS with equal allocation is given by

$\displaystyle\bar{y}_{\textit{eql}}=\frac{1}{k}\sum\limits_{i=1}^{k}{y_{(ii)}},\text{ with }\textit{Var}({\bar{y}_{\textit{eql}}})=\frac{1}{k^{2}}\sum\limits_{i=1}^{k}{\sigma_{(i:k)}^{2}}.$ (1)
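The selection procedure and the estimator in Eq. (1) can be sketched in a few lines of Python. This is only an illustrative simulation, not part of the original study: it assumes perfect ranking (each set is sorted on the true values), a single cycle ($m=1$), and a hypothetical `draw` callback that returns one unit from the population.

```python
import random

def ranked_set_sample(draw, k, rng):
    """Return the k measured units y_(ii) of one balanced RSS cycle.

    `draw` is a hypothetical callback that returns one population unit.
    Perfect ranking is assumed: each set is sorted on its true values.
    """
    measured = []
    for i in range(k):
        # Draw a fresh set of k units and rank it.
        ranked = sorted(draw(rng) for _ in range(k))
        # From the (i+1)-th set, measure only the unit holding rank i+1.
        measured.append(ranked[i])
    return measured

def rss_mean(sample):
    """Equal-allocation RSS estimator of Eq. (1): the plain average."""
    return sum(sample) / len(sample)

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = ranked_set_sample(lambda r: r.lognormvariate(0.0, 1.0), k=4, rng=rng)
print(sample, rss_mean(sample))
```

Only $k$ of the $k^{2}$ drawn units are ever "measured"; the sorting step stands in for the inexpensive judgment ranking.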

It is known that RSS with equal allocation is more precise than SRS (Kaur et al., 1997). However, the gain from RSS can be improved further when an appropriate unequal allocation of the order statistics is made. For skewed distributions, Neyman’s allocation $m_{i}=\frac{n\sigma_{(i:k)}}{\sum\limits_{j=1}^{k}{\sigma_{(j:k)}}}$ provides the optimal allocation (Bhoj & Chandra, 2019). In this model, the sample size corresponding to each order statistic is proportional to the standard deviation of that order statistic. The unbiased estimator of $\mu$ under Neyman’s optimal allocation model is given by

$\displaystyle\bar{y}_{\textit{Ney}}=\frac{1}{k}\sum\limits_{i=1}^{k}{\frac{T_{i}}{m_{i}}},\text{ with }\textit{Var}({\bar{y}_{\textit{Ney}}})=\frac{\bar{\sigma}^{2}}{n},$ (2)

where $T_{i}$ denotes the sum of all measured $y$ -values of $m_{i}$ observations of $i^{\text{th}}$ order statistic and $\bar{\sigma}=\frac{1}{k}\sum\limits_{i=1}^{k}{\sigma_{(i:k)}}$ is the average within-rank standard deviation.
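The variance in Eq. (2) follows from substituting Neyman’s $m_{i}$ into the general unequal-allocation variance $\frac{1}{k^{2}}\sum_{i}\sigma_{(i:k)}^{2}/m_{i}$ . The short sketch below checks this numerically. The standard deviations are hypothetical values chosen for the illustration, and the allocations are left real-valued; in practice they would need rounding to integers.

```python
def neyman_allocation(sigmas, n):
    """Neyman allocation: m_i proportional to sigma_(i:k), summing to n."""
    total = sum(sigmas)
    return [n * s / total for s in sigmas]

def var_unequal(sigmas, m):
    """Variance of (1/k) * sum_i T_i / m_i for independent measurements."""
    k = len(sigmas)
    return sum(s * s / mi for s, mi in zip(sigmas, m)) / k**2

sigmas = [1.0, 3.0]  # hypothetical within-rank standard deviations, k = 2
n = 8
m = neyman_allocation(sigmas, n)
sigma_bar = sum(sigmas) / len(sigmas)

# Both expressions give the same value, as Eq. (2) states.
print(m, var_unequal(sigmas, m), sigma_bar**2 / n)
```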

In the next section, RSS with varied order statistics (RSSVO) is proposed for positively skewed distributions. The proposed estimator based on RSSVO is then compared with the estimators based on SRS and on the two RSS schemes.

2. RSS with varied order statistics (RSSVO)

Several RSS procedures for skew distributions have been suggested (see Bhoj & Chandra, 2019; Chandra et al., 2018; Bhoj & Kushary, 2016; Tiwari & Chandra, 2011; Kaur et al., 1997). In this paper, RSSVO is proposed for estimating the population mean $\mu$ of highly positively skewed distributions. The broad steps in RSSVO are the same as in RSS: one unit is measured from each of the $k$ ranked samples, so in both schemes $k$ observations are accurately measured out of $k^{2}$ ranked observations. The schemes differ in which ranks are measured. RSS measures every order statistic once $({y_{(ii)},i=1,2\ldots,k})$ , whereas RSSVO measures only one or two distinct order statistics $({y_{(ir)},i=1,2\ldots,k})$ , depending on the set size $k$ . In RSSVO, for set size $k$ , the $r^{\text{th}}$ order statistic of the $i^{\text{th}}$ sample, $y_{(ir)}$ , is measured, where, for $i=1,2,\ldots,k$ , $r$ is given by

$\displaystyle r=\left\{\begin{array}{ll}k-1, & \text{for }k=2,3,4\\ k-1\text{ for the first two samples and }k-2\text{ for the last three samples}, & \text{for }k=5\\ k-2, & \text{for }k=6,7\end{array}\right.$ (3)

It may be noted that the value of $r$ changes from $({k-1})$ to $({k-2})$ as the set size $k$ increases from 4 to 6. For $k=$ 5, the appropriate order statistic is still in transition from $({k-1})$ to $({k-2})$ . Therefore, a combination of the $({k-1})^{\text{th}}$ and $({k-2})^{\text{th}}$ order statistics is proposed for this set size.

Now, the estimator for population mean $\mu$ based on RSSVO scheme is proposed as:

$\displaystyle\bar{y}_{VO}=\left\{\begin{array}{ll}\frac{1}{k}\sum\limits_{i=1}^{k}y_{(i,k-1)}, & \text{for }k=2,3,4\\ \frac{1}{k}\left[\sum\limits_{i=1}^{2}y_{(i,k-1)}+\sum\limits_{i=3}^{k}y_{(i,k-2)}\right], & \text{for }k=5\\ \frac{1}{k}\sum\limits_{i=1}^{k}y_{(i,k-2)}, & \text{for }k=6,7\end{array}\right.$ (4)
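The rank rule of Eq. (3) and the estimator of Eq. (4) can be written compactly in code. The sketch below is illustrative only: the function names are hypothetical, `draw` is an assumed callback returning one population unit, and ranking is assumed to be perfect.

```python
import random

def rssvo_ranks(k):
    """Rank r measured in each of the k samples, following Eq. (3)."""
    if k in (2, 3, 4):
        return [k - 1] * k
    if k == 5:
        # (k-1)th order statistic from the first two samples,
        # (k-2)th order statistic from the last three samples.
        return [k - 1] * 2 + [k - 2] * 3
    if k in (6, 7):
        return [k - 2] * k
    raise ValueError("Eq. (3) defines r only for k = 2, ..., 7")

def rssvo_mean(draw, k, rng):
    """RSSVO estimator of Eq. (4) for one cycle, perfect ranking assumed."""
    total = 0.0
    for r in rssvo_ranks(k):
        ranked = sorted(draw(rng) for _ in range(k))  # rank a fresh set
        total += ranked[r - 1]                        # measure rank r only
    return total / k

rng = random.Random(1)
print(rssvo_ranks(5))
print(rssvo_mean(lambda g: g.lognormvariate(0.0, 1.0), k=5, rng=rng))
```

Writing the rule as a list of ranks, one per sample, lets the $k=5$ case be handled by the same loop as the other set sizes.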

Variance of $\bar{y}_{VO}$ is given by

$\displaystyle\textit{Var}({\bar{y}_{VO}})=\left\{\begin{array}{ll}\frac{\sigma_{(k-1:k)}^{2}}{k}, & \text{for }k=2,3,4\\ \frac{2\sigma_{(k-1:k)}^{2}+3\sigma_{(k-2:k)}^{2}}{k^{2}}, & \text{for }k=5\\ \frac{\sigma_{(k-2:k)}^{2}}{k}, & \text{for }k=6,7\end{array}\right.$ (5)

MSE of the estimator based on RSSVO is given by $\textit{MSE}=\textit{Var}({\bar{y}_{VO}})+B^{2}$ , where bias $B$ in the proposed estimator is $B=\mu-E({\bar{y}_{VO}})$ and for different values of $k$ , it is given by

$\displaystyle B=\left\{\begin{array}{ll}\mu-\mu_{(k-1:k)}, & \text{for }k=2,3,4\\ \mu-\frac{2\mu_{(k-1:k)}+3\mu_{(k-2:k)}}{5}, & \text{for }k=5\\ \mu-\mu_{(k-2:k)}, & \text{for }k=6,7\end{array}\right.$ (6)

Now, the estimators based on RSS with equal allocation, RSSVO and RSS with Neyman’s optimum allocation are compared with the estimator based on SRS, for the same sample size $n=mk$ , in terms of the following relative precisions (RPs):

$\displaystyle RP_{1}=\frac{\textit{Var}({\bar{y}_{\textit{SRS}}})}{\textit{Var}({\bar{y}_{\textit{eql}}})}=\frac{\sigma^{2}}{\overline{\sigma^{2}}},$ (7)

$\displaystyle RP_{2}=\frac{\textit{Var}({\bar{y}_{\textit{SRS}}})}{\textit{MSE}({\bar{y}_{VO}})},$ (8)

$\displaystyle RP_{3}=\frac{\textit{Var}({\bar{y}_{\textit{SRS}}})}{\textit{Var}({\bar{y}_{\textit{Ney}}})}=\frac{\sigma^{2}}{\bar{\sigma}^{2}},$ (9)

where $\overline{\sigma^{2}}=\frac{1}{k}\sum\limits_{i=1}^{k}{\sigma_{(i:k)}^{2}}$ is the average within-rank variance and $\bar{y}_{\textit{SRS}}$ is the unbiased estimator of the population mean based on SRS. It is known that $\textit{Var}({\bar{y}_{\textit{Ney}}})<\textit{Var}({\bar{y}_{\textit{eql}}})<\textit{Var}({\bar{y}_{\textit{SRS}}})$ (Bhoj & Chandra, 2019).
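Given the means and variances of the order statistics, Eqs (5) to (9) reduce to a few sums. The following sketch shows one way to organize the calculation for $m=1$ . The function names are hypothetical, and the numerical inputs are illustrative moment values for $k=2$ assumed for this example rather than taken from the tables used in the paper.

```python
import math

def rssvo_ranks(k):
    """Rank measured in each sample under Eq. (3)."""
    if k in (2, 3, 4):
        return [k - 1] * k
    if k == 5:
        return [4, 4, 3, 3, 3]
    if k in (6, 7):
        return [k - 2] * k
    raise ValueError("Eq. (3) defines r only for k = 2, ..., 7")

def relative_precisions(mu, var, mu_os, var_os):
    """Return (RP1, RP2, RP3) of Eqs (7)-(9) for one cycle (n = k).

    mu, var       : population mean and variance
    mu_os, var_os : means and variances of the order statistics,
                    with index i-1 holding the i-th of k
    """
    k = len(var_os)
    var_srs = var / k
    var_eql = sum(var_os) / k**2                       # Eq. (1)
    sigma_bar = sum(math.sqrt(v) for v in var_os) / k
    var_ney = sigma_bar**2 / k                         # Eq. (2), n = k
    ranks = rssvo_ranks(k)
    var_vo = sum(var_os[r - 1] for r in ranks) / k**2  # Eq. (5)
    bias = mu - sum(mu_os[r - 1] for r in ranks) / k   # Eq. (6)
    mse_vo = var_vo + bias**2
    return var_srs / var_eql, var_srs / mse_vo, var_srs / var_ney

# Illustrative inputs for k = 2 (assumed values for this example).
rp1, rp2, rp3 = relative_precisions(
    mu=1.5, var=0.75, mu_os=[1.2, 1.8], var_os=[0.06, 1.26]
)
print(round(rp1, 4), round(rp2, 4), round(rp3, 4))
```

Summing the variances of the measured ranks and dividing by $k^{2}$ reproduces all three branches of Eq. (5) without special-casing $k=5$ .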

3. Comparison of estimators

In this Section, the performance of the estimators of population mean based on RSS with equal allocation, RSS with Neyman’s optimal allocation and RSSVO are compared with the estimator based on SRS by using RPs given in Eqs (7) to (9). For this purpose, the following four highly positive skew distributions are used.

3.1 Lognormal distribution

The probability density function (pdf) of the lognormal distribution $LN(a,b)$ with the location parameter $a$ and scale parameter $b$ is given by

$\displaystyle f(x)=\frac{1}{xb\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\log x-a}{b}\right)^{2}\right],\text{ for }x>0,\ -\infty<a<\infty,\ b>0,$ (10)

with population mean $=\exp\left({a+\frac{b^{2}}{2}}\right)$ and population variance $=\exp({2a+2b^{2}})-\exp({2a+b^{2}})$ .

In this paper, we use the standard lognormal distribution $LN(0,1)$ for the computations.

3.2 Pareto distribution

The pdf of the Pareto population having shape and scale parameters $\nu$ and $a$ is given by $f(x)=\frac{\nu a^{\nu}}{x^{\nu+1}},x\geqslant a,a>0,\nu>0$ ; with population mean $\mu=\frac{a\nu}{\nu-1}$ and population variance $\sigma^{2}=\frac{a^{2}\nu}{({\nu-1})^{2}({\nu-2})}$ .

We use the parameter values ( $a=$ 1, $\nu=$ 3) and ( $a=$ 1, $\nu=$ 5); the resulting distributions are denoted by P (3) and P (5), respectively.

3.3 Gamma distribution

The pdf of standard gamma distribution ( $G(\alpha)$ ) with parameter $\alpha>0$ is

$\displaystyle f(x)=\frac{x^{\alpha-1}e^{-x}}{\Gamma(\alpha)},\quad x\geqslant 0,\ \alpha>0,$

with population mean $=$ population variance $=\alpha$ .

For the calculation of the RPs, the distribution G (0.5) is used.

The means and variances of order statistics for LN (0, 1), P (3) and P (5) are available in Harter and Balakrishnan (1996). However, for G (0.5), these are available in Breiter and Krishnaiah (1968). For these four distributions, the RPs are computed for set sizes $k=2,3,\ldots,7$ and are given in Table 1.

Table 1
The values of RPs, $RP_{1}$ , $RP_{2}$ and $RP_{3}$ for four distributions with $k=$ 2 (1) 7

Set size ( $k$ )		2	3	4	5	6	7
LN (0, 1)	$RP_{1}$	1.1872	1.3393	1.4711	1.5891	1.6971	1.7974
	$RP_{2}$	2.3236	3.4103	3.8797	4.5442	4.8240	6.9610
	$RP_{3}$	1.5765	2.1182	2.6219	3.1347	3.6193	4.0898
P (3)	$RP_{1}$	1.1364	1.2422	1.3305	1.4072	1.4755	1.5373
	$RP_{2}$	3.1250	4.3203	5.0546	5.5621	5.6014	8.6644
	$RP_{3}$	1.6000	2.1505	2.6441	3.1671	3.6790	4.1762
P (5)	$RP_{1}$	1.2277	1.4179	1.5861	1.7390	1.8797	2.0126
	$RP_{2}$	1.9295	2.9161	3.2311	4.0242	4.4660	5.6586
	$RP_{3}$	1.5439	2.0601	2.5847	3.0724	3.5473	4.0340
G (0.5)	$RP_{1}$	1.2448	1.4832	1.6963	1.9053	2.0708	2.2770
	$RP_{2}$	1.7300	2.3568	2.5505	3.1447	3.2130	4.2444
	$RP_{3}$	1.5060	2.1012	2.6368	3.1911	3.6671	4.2059

It is seen that $\bar{y}_{VO}$ is uniformly superior to the estimators based on SRS and on RSS with equal allocation, and the gains in RP are substantial for all four distributions. The performance of $\bar{y}_{VO}$ is also better than that of $\bar{y}_{\textit{Ney}}$ in every case except G (0.5) with $k=$ 4, 5 and 6. This might be because G (0.5) has a coefficient of skewness of 2.8284, which is not that high. Further, $RP_{1}$ , $RP_{2}$ and $RP_{3}$ all increase with the set size $k$ .
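For the Pareto distribution, the moments of the order statistics have a closed form, so the $RP_{2}$ entries can be cross-checked without tables. The sketch below is an independent illustration, not the computation used for Table 1: it relies on the representation $X_{(r:k)}=aU_{(k-r+1:k)}^{-1/\nu}$ with $U$ uniform on (0, 1), which gives $E[X_{(r:k)}^{m}]=a^{m}\,\Gamma(k+1)\Gamma(j-s)/[\Gamma(j)\Gamma(k+1-s)]$ with $j=k-r+1$ and $s=m/\nu$ . All function names are hypothetical.

```python
import math

def rssvo_ranks(k):
    """Rank measured in each sample under Eq. (3)."""
    if k in (2, 3, 4):
        return [k - 1] * k
    if k == 5:
        return [4, 4, 3, 3, 3]
    if k in (6, 7):
        return [k - 2] * k
    raise ValueError("Eq. (3) defines r only for k = 2, ..., 7")

def pareto_os_moment(r, k, nu, m, a=1.0):
    """E[X_(r:k)^m] for a Pareto(a, nu) sample; needs k - r + 1 > m / nu."""
    j = k - r + 1  # index of the matching uniform order statistic
    s = m / nu
    log_val = (math.lgamma(k + 1) + math.lgamma(j - s)
               - math.lgamma(j) - math.lgamma(k + 1 - s))
    return a**m * math.exp(log_val)

def rp2_pareto(k, nu, a=1.0):
    """RP2 of Eq. (8) for Pareto(a, nu), one cycle (n = k), nu > 2."""
    mu = a * nu / (nu - 1)
    var = a * a * nu / ((nu - 1) ** 2 * (nu - 2))
    ranks = rssvo_ranks(k)
    means = [pareto_os_moment(r, k, nu, 1, a) for r in ranks]
    variances = [pareto_os_moment(r, k, nu, 2, a) - mean**2
                 for r, mean in zip(ranks, means)]
    mse = sum(variances) / k**2 + (mu - sum(means) / k) ** 2
    return (var / k) / mse

for k in range(2, 8):
    print(k, round(rp2_pareto(k, nu=3), 4))
```

For P (3) with $k=2$ this evaluates to 3.125, in agreement with Table 1. For the other set sizes the closed-form values should be close to the tabulated ones, with small differences in the later decimals depending on the rounding of the tabulated moments.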

4. Performance of RSSVO with skewness

The performance of the three RSS methods (equal allocation, RSSVO and Neyman’s optimum allocation) is now examined as the skewness of a family of distributions increases. For this purpose, we choose one skew family, $LN({0,b})$ . The pdf of $X\sim LN({a,b})$ is given by Eq. (10).

Then the skewness ( $Sk$ ) and the shape parameter ( $p$ ) are given by

$\displaystyle Sk=\sqrt{\beta_{1}}=\sqrt{\exp({b^{2}})-1}\,({\exp({b^{2}})+2}),$

$\displaystyle p=\exp({b^{2}}).$
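Because $p=\exp(b^{2})$ , both $b$ and $Sk$ can be recovered from the shape parameter alone: $b=\sqrt{\log p}$ and $Sk=\sqrt{p-1}\,(p+2)$ . The small sketch below (the function name is hypothetical) evaluates these two expressions for a few values of $p$ .

```python
import math

def lognormal_shape(p):
    """Return (b, Sk) for LN(0, b) with shape parameter p = exp(b^2)."""
    b = math.sqrt(math.log(p))
    sk = math.sqrt(p - 1.0) * (p + 2.0)
    return b, sk

for p in (1.8, 2.0, 3.0):
    b, sk = lognormal_shape(p)
    print(p, round(b, 4), round(sk, 2))
```

For $p=2$ the skewness is exactly 4, since $\sqrt{2-1}\,(2+2)=4$ .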

The performance of these three methods relative to SRS with $k=$ 4 is presented in Table 2 for the lognormal family of distributions over a range of values of the parameter $b$ . The variances of the order statistics were computed from the variances of order statistics for different values of the shape parameter, which are readily available in Balakrishnan and Chen (1999). From Table 2, we observe that as skewness increases the performance of (i) RSS with equal allocation decreases, and (ii) RSS with Neyman’s allocation and RSSVO increases. The rate of increase with skewness is greater for RSSVO than for Neyman’s allocation.

Table 2
Values of $RP_{1}$ , $RP_{2}$ and $RP_{3}$ for Lognormal $LN({0,b})$ distributions for $k=$ 4

Shape parameter ( $p$ )	$b$	Sk $({\sqrt{\beta_{1}}})$	$RP_{1}$	$RP_{2}$	$RP_{3}$
1.8	0.7667	3.40	1.7022	2.9275	2.5200
1.9	0.8012	3.70	1.6651	3.0486	2.5353
2.0	0.8326	4.00	1.6322	3.1641	2.5499
2.1	0.8614	4.30	1.6026	3.2746	2.5639
2.2	0.8880	4.60	1.5760	3.3810	2.5773
2.3	0.9126	4.90	1.5518	3.4836	2.5902
2.4	0.9357	5.21	1.5298	3.5828	2.6026
2.5	0.9572	5.51	1.5097	3.6789	2.6145
2.6	0.9775	5.82	1.4912	3.7724	2.6260
2.7	0.9966	6.13	1.4741	3.8633	2.6370
2.8	1.0147	6.44	1.4583	3.9520	2.6478
2.9	1.0318	6.75	1.4436	4.0387	2.6581
3.0	1.0481	7.07	1.4299	4.1234	2.6682

5. Conclusions and discussion

In this paper, a new ranked set sampling scheme with varied order statistics (RSSVO) for highly positively skewed distributions is proposed. In this scheme, only the $({k-1})^{\text{th}}$ order statistic of each sample is selected for actual measurement when $k=$ 2, 3, 4, and only the $({k-2})^{\text{th}}$ order statistic of each sample when $k=$ 6, 7. If the set size is $k=$ 5, the $({k-1})^{\text{th}}$ order statistic from the first two samples and the $({k-2})^{\text{th}}$ order statistic from the last three samples are chosen for actual measurement. In RSS, the number of observations in each sample is usually small to minimize ranking errors. In that case, for $k\leqslant 4$ , the proposed scheme becomes very simple.

The proposed method RSSVO is used for estimating the population mean of four highly positively skewed distributions. The proposed estimator is biased. However, it offers gains in relative precision and is simple to use in real applications, especially for $k<5$ . Three schemes, RSS with equal allocation, RSSVO and RSS with Neyman’s allocation, are compared in terms of relative precision. From Table 1, it is seen that the estimator of the population mean based on RSSVO is uniformly better than SRS and RSS with equal allocation, with substantial gains in relative precision, and is also better than RSS with Neyman’s allocation except for set sizes $k=$ 4, 5 and 6 of G (0.5). Further, all the relative precisions have the desirable property that they increase with the set size $k$ . Finally, from Table 2, it is seen that the advantage of RSSVO over Neyman’s optimal allocation grows as skewness increases.

Therefore, the proposed method based on varied order statistic is recommended for estimating the population mean for highly positive skew distributions and small set sizes.

References

Balakrishnan, & Chen, W.S. (1999). Handbook of Tables of order statistics from lognormal distribution with applications, Springer, U.S.A.

Bhoj, D.S. (1997a). New parametric ranked set sampling. Journal of Applied Statistical Sciences, 6, 275-289.

Bhoj, D.S. (1997b). Estimation of parameters of extreme value distributions with ranked set sampling. Communication of Statistics: Theory and Methods, 26(3), 653-667.

Bhoj, D.S., & Ahsanullah (1996). Estimation of the parameters of the generalized geometric distribution using ranked set sampling. Biometrics, 52, 685-694.

Bhoj, D.S., & Chandra (2019). Simple unequal allocation procedure for ranked set sampling with skew distributions. Journal of Modern Applied Statistical Methods, 18(2), eP2811. doi: 10.22237/jmasm/1604189700.

Bhoj, D.S., & Kushary (2016). Ranked set sampling with unequal samples for skew distributions. Journal of Statistical Computations and Simulations, 86(4), 676-681.

Breiter, M.C., & Krishnaiah, P.R. (1968). Tables for the moments of gamma order statistics. Sankhya, Series B, 30, 59-72.

Chandra, Bhoj, D.S., & Pandey (2018). Simple unbalanced ranked set sampling for mean estimation of response variable of developmental programs. Journal of Modern Applied Statistical Methods, 17(1-1). doi: 10.22237/jmasm/1543856083.

Dell, T.R., & Clutter, J.L. (1972). Ranked set sampling theory with order statistics background. Biometrics, 28, 545-555.

Harter, H.L., & Balakrishnan (1996). CRC handbook of tables for the use of order statistics in estimation, CRC Press, Boca Raton, New York.

Kaur, Patil, G.P., & Taillie (1997). Unequal allocation models for ranked set sampling with skew distributions. Biometrics, 53, 123-130.

Lam, Sinha, B.K., & Wu (1996). Estimation of location and scale parameters of a logistic distribution using a ranked set sample. In: Nagaraja H.N., Sen P.K., Morrison D.F. (eds) Statistical Theory and Applications. Springer, New York, NY.

Lam, Sinha, B.K., & Wu (1994). Estimation of parameters of the two-parameter exponential distribution using ranked set sample. Annals of the Institute of Statistical Mathematics, 46, 723-736.

McIntyre, G.A. (1952). A method for unbiased selective sampling using ranked sets. Australian Journal of Agricultural Research, 3, 385-390.

Stokes, S.L. (1995). Parametric ranked set sampling. Annals of the Institute of Statistical Mathematics, 47, 465-482.

Takahasi, & Wakimoto (1968). On unbiased estimates of the population mean based on the sample stratified by means of ordering. Annals of the Institute of Statistical Mathematics, 20, 1-31.

Tiwari, & Chandra (2011). A systematic procedure for unequal allocation for skewed distributions in ranked set sampling. Journal of the Society of Agricultural Statistics, 65(3), 331-338.
