The estimation of the finite population mean on the current occasion in two-occasion successive sampling in the presence of non-response is investigated using tuned jackknife estimators. Estimators have been proposed using the tuned jackknife technique, based on the availability of auxiliary information at the population level (Info U) and at the sample level (Info s). Estimators of the variance of the proposed estimators are also discussed. Different cases of occurrence of non-response are explored, and the estimators are mutually compared. The properties of these estimators are studied through a simulation study using a natural population.
Successive sampling is a well-known survey sampling methodology in which the same population is sampled repeatedly and the same study variable is observed on each occasion. It provides reliable, precise and cost-effective estimates. In addition, it can estimate the level of a characteristic, the change in that level over time, and the direction of this change.
For example, in these challenging times of the global Covid-19 pandemic, government agencies may be interested in estimating inflation or GDP at different points in time and the pattern of change in inflation or GDP over a period of time. An environmental scientist may be interested in estimating the Air Quality Index (AQI) at different points in time and the change in the level of AQI over a period of time.
The idea of successive sampling was initiated by Jessen (1942) and was subsequently explored extensively by Patterson (1950), Narain (1953), Tikkiwal (1953, 1964), Eckler (1955), Sen et al. (1973), Scott and Smith (1974), Okafor (1992), Artes et al. (1998), Singh and Priyanka (2006, 2010), Priyanka and Mittal (2015, 2016, 2017a) and many others.
All of the works cited above are based on traditional techniques. However, Rueda et al. (2009) and Priyanka et al. (2019) used calibration estimators for the population mean in successive sampling.
A major problem faced by survey statisticians is non-response, which leads to incomplete data. Incompleteness becomes worse when data are collected on more than one occasion because, even with a complete sampling frame, one may fail to obtain responses for one reason or another. For example, in a survey of mines, one may be interested in the mean yield from the mines. However, the yield of a particular mine may not be recorded because the mine has been exhausted, has been shut down due to government action, or has been ruined by a natural calamity.
Hence, one has to proceed with incomplete data and devise methodologies to adjust for this incompleteness so that the estimates are not much affected. There are two kinds of non-response: unit non-response and item non-response. Therefore, before proceeding, the kind of non-response must be identified and a suitable methodology for its treatment devised. Immense effort has been devoted to dealing with non-response (see, e.g., Rubin (1976)). Generally, subsampling of non-respondents or imputation techniques are used to construct estimators in the presence of non-response. However, recent techniques such as calibration, jackknifing and tuned jackknifing have not yet been investigated for dealing with non-response in successive sampling.
Recently, Singh et al. (2016) introduced a new concept of tuning design weights using jackknife estimators.
The aim of the present work is to apply the newly tuned jackknife estimator in the successive sampling framework in the presence of unit non-response, using additional auxiliary information. This additional auxiliary information is an important factor in adjusting the design weights to deal with non-response.
In successive sampling, it is common practice to use the study variable observed on the previous occasion as an auxiliary variable on the current occasion. In addition, the availability of an additional auxiliary variable has been considered at two levels, namely at the population level (Info U) and at the sample level (Info s) (see Lundström and Särndal (1999)).
Tuned jackknife weights have been used to propose estimators for the matched and unmatched portions of the sample in the presence of unit non-response, considering both Info U and Info s for the additional auxiliary information, to estimate the population mean on the current occasion in two-occasion successive sampling. Detailed properties of the proposed estimators have been discussed. The newly tuned estimators under Info U and Info s are mutually compared. The various possibilities for the occurrence of non-response on the two occasions have been explored, and the resulting estimators are also compared with a similar estimator for the complete response case. Simulation studies have been carried out using a natural population to judge the behaviour of the considered estimators in terms of absolute relative bias and percent relative efficiency.
Notations and sampling design
Let be the -element finite population, which is sampled over two successive occasions. Let denote the study variable on the first (second) occasion. Non-response is assumed to occur on both occasions, and the aim is to estimate the population mean of the study variable on the current occasion in the presence of this non-response.
For the two-occasion successive sampling set-up, let denote a simple random sample (without replacement) of size from the population on the first occasion. Let of size be the response set obtained from . Further, let be a simple random subsample of size drawn from (the set of responding units on the first occasion) to be used on the current (second) occasion. A fresh simple random sample (without replacement), denoted by , of size is drawn on the current (second) occasion from the units of the population not sampled on the first occasion. Let of size be the response set obtained from such that . The values observed on the first (second) occasion are shown in Table 1.
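The sampling design described above can be simulated in a few lines. The following Python sketch (using NumPy) draws the index sets for the two occasions: a first-occasion sample, its response set, a matched subsample taken from the respondents, a fresh sample from the unsampled units, and the fresh sample's response set. The function name `two_occasion_design`, its parameters, and the completely-at-random response mechanism are illustrative assumptions, not the procedure used by the author.

```python
import numpy as np


def two_occasion_design(N, n, m, u, resp_rate_1, resp_rate_2, seed=None):
    """Draw unit indices for a two-occasion successive sampling design.

    Hypothetical helper: responses are generated completely at random with
    the given response rates, which is an assumption made for illustration.
    """
    rng = np.random.default_rng(seed)
    population = np.arange(N)

    # First occasion: SRSWOR of size n, then the response set.
    s_n = rng.choice(population, size=n, replace=False)
    r_n = rng.choice(s_n, size=int(round(resp_rate_1 * n)), replace=False)

    # Matched portion: SRSWOR subsample of size m from the first-occasion respondents.
    if m > r_n.size:
        raise ValueError("matched sample larger than the first-occasion response set")
    s_m = rng.choice(r_n, size=m, replace=False)

    # Fresh sample: SRSWOR of size u from units not sampled on the first occasion.
    s_u = rng.choice(np.setdiff1d(population, s_n), size=u, replace=False)
    r_u = rng.choice(s_u, size=int(round(resp_rate_2 * u)), replace=False)

    return s_n, r_n, s_m, s_u, r_u


# Example: N = 100, n = 20, m = 8, u = 12, response rates 80% and 90%.
s_n, r_n, s_m, s_u, r_u = two_occasion_design(100, 20, 8, 12, 0.8, 0.9, seed=1)
```

Because the matched subsample is drawn only from first-occasion respondents, it can never be larger than the first-occasion response set; the sketch raises an error in that case rather than silently shrinking it.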
Table 1. Response set and observed values on two occasions
Occasion | Sample | Response set | Observed value for study variable
In order to deal with non-response on the two occasions, an auxiliary variable is assumed to be available on both occasions and to be stable over time. Depending on the information available, the auxiliary variable can be used at the population level (Info U) or at the sample level (Info s). Both possibilities have been explored.
Successive tuned jackknife estimator
The traditional concept of calibration has been modified by replacing the calibrated weights with the newly tuned jackknife weights, based on the jackknife sample mean. Both Info U and Info s for the auxiliary variable have been used, and the newly tuned jackknife estimators have been proposed based on samples of size and , respectively, under non-response.
Tuned jackknife estimator based on fresh sample
Following Singh et al. (2016), a newly tuned jackknife estimator based on the fresh sample of size , considering non-response on both occasions and using Info U for the auxiliary information, is given as
where
is the jackknife sample mean of the study variable which is obtained by removing unit from the sample and
is the tuned jackknife weight corresponding to the calibrated weight .
The tuned jackknife weights are chosen such that the following two tuning constraints are satisfied:
and
where
is the jackknife sample mean of the auxiliary variable, obtained by removing the unit from the sample .
Remark 1 If Info U is not available for the auxiliary variable , Info s can be utilized. In this case, the constraint in Eq. (5) should be modified to
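Each jackknife sample mean is the mean of the sample with one unit deleted, so all of them can be computed at once from the sample total as (total minus the deleted value) divided by (sample size minus one). A minimal Python sketch; the helper name `jackknife_means` is hypothetical.

```python
import numpy as np


def jackknife_means(values):
    """Return the n leave-one-out means; entry j omits unit j."""
    v = np.asarray(values, dtype=float)
    n = v.size
    if n < 2:
        raise ValueError("at least two units are needed")
    # Mean of the remaining n - 1 units: (total - v_j) / (n - 1).
    return (v.sum() - v) / (n - 1)


y_jk = jackknife_means([2.0, 4.0, 6.0, 8.0])   # study variable
x_jk = jackknife_means([1.0, 3.0, 2.0, 5.0])   # auxiliary variable
```

The average of the n jackknife means always equals the ordinary sample mean, so weights equal to 1/n applied to the jackknife means reproduce the usual sample mean.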
and hence, the estimator may be modified to .
Tuning with Chi-square type distance function for fresh sample
Following Deville and Särndal (1992) and Singh et al. (2016), the tuned weight is obtained by minimizing the following modified chi-square type distance function:
where are arbitrarily chosen weights.
This chi-square type function is minimized subject to the tuning constraints in Eqs (4) and (5) by the method of Lagrange multipliers. Let the Lagrangian function be denoted by and defined as
where and are the Lagrange multipliers.
Putting , we get
Now, using Eq. (10) in constraints Eqs (4) and (5), the following set of normal equations is obtained, from which the optimum values of and are found:
Now, substituting the optimum values of and in Eq. (10) and finally using it in Eq. (1), we get the newly tuned jackknife estimator under non-response for the fresh sample as
Simplifying the above expression, is obtained as:
where
Remark 2 If constraint Eq. (7) were used in place of constraint Eq. (5), then, similarly, the tuned jackknife estimator for the fresh sample under Info s could be obtained as
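The Lagrange-multiplier derivation above has a closed form: setting the derivative of the Lagrangian to zero gives each tuned weight as the design weight plus a linear correction, and the constraints then yield a small system of normal equations for the multipliers. The Python sketch below implements that general chi-square solution and applies it to a fresh sample. It assumes the distance function Σ (w_j − d_j)²/(d_j q_j) with d_j = 1/r and q_j = 1, and it uses Σ w_j = 1 and Σ w_j x̄_(j) = X̄ as stand-ins for the tuning constraints in Eqs (4) and (5); the paper's exact constraints may differ, and the function names `tune_weights` and `tuned_jackknife_fresh` are hypothetical.

```python
import numpy as np


def tune_weights(d, q, A, b):
    """Minimize sum((w - d)**2 / (d * q)) subject to A @ w = b.

    Setting the derivative of the Lagrangian to zero gives
    w = d + d*q*(A.T @ lam), and the constraints give the normal
    equations (A diag(d*q) A.T) lam = b - A @ d.
    """
    d, q = np.asarray(d, float), np.asarray(q, float)
    A, b = np.asarray(A, float), np.asarray(b, float)
    dq = d * q
    lam = np.linalg.solve((A * dq) @ A.T, b - A @ d)  # Lagrange multipliers
    return d + dq * (A.T @ lam)


def tuned_jackknife_fresh(y, x, X_bar):
    """Hypothetical fresh-sample estimator sum_j w_j * ybar_(j) under Info U."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    r = y.size
    y_jk = (y.sum() - y) / (r - 1)       # jackknife means, study variable
    x_jk = (x.sum() - x) / (r - 1)       # jackknife means, auxiliary variable
    d = np.full(r, 1.0 / r)              # assumed equal design weights
    q = np.ones(r)                       # assumed q_j = 1
    A = np.vstack([np.ones(r), x_jk])    # assumed: sum w = 1, sum w * xbar_(j) = X_bar
    w = tune_weights(d, q, A, [1.0, X_bar])
    return float(w @ y_jk), w


y = np.array([10.0, 12.0, 15.0, 11.0, 14.0])
x = np.array([5.0, 6.0, 8.0, 5.5, 7.0])
estimate, w = tuned_jackknife_fresh(y, x, X_bar=6.0)
```

When X_bar equals the sample mean of x, the right-hand side of the normal equations vanishes, the tuned weights return the design weights, and the estimator reduces to the ordinary sample mean of y.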
Tuned jackknife estimator based on matched sample
In sampling on successive occasions, it is common practice to utilize the information obtained on the previous occasion as auxiliary information, in addition to any additional auxiliary information available either as Info U or Info s.
Hence, using Info U and based on the sample of size , the tuned estimator is proposed as
where,
is the jackknife sample mean based on the matched portion of the sample.
Now, to obtain the tuned jackknife weight , we minimize the chi-square type function
subject to the newly tuned jackknife constraints:
and
and
where are arbitrarily chosen weights and is the tuned jackknife estimator of the study variable on the first occasion, which for Info U is obtained as
where
where are arbitrarily chosen weights.
Now, minimizing the chi-square type function in Eq. (18) subject to the three constraints Eqs (19)–(21), the tuned weight is obtained as
The optimum values of can be obtained from the following set of normal equations
Substituting the optimum values of from Eqs (25) into Eqs (24), and finally substituting the value of in Eq. (16), we get the tuned jackknife estimator based on the sample of size on the current occasion as
where
Remark 3 Similarly, if Info s is used instead of Info U for the additional auxiliary information on both occasions, then the chi-square type function in Eq. (18) needs to be minimized subject to the tuning constraint in Eq. (19) and the following modified constraints for Info s:
and
where
Hence, proceeding as in the Info U case, the estimator based on the matched portion of the sample for Info s becomes
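For the matched portion, the target of one tuning constraint is itself a tuned jackknife estimate from the first occasion, so the tuning proceeds in two steps. The Python sketch below assumes that the three constraints take the form Σ w_j = 1, Σ w_j z̄_(j) = Z̄ (additional auxiliary variable, Info U) and Σ w_j x̄_(j) = the tuned first-occasion estimate, with equal design weights and q_j = 1. These forms, the variable names `y1` (study variable, first occasion), `y2` (study variable, current occasion) and `z` (additional auxiliary variable), and the function names are all hypothetical illustrations rather than the paper's Eqs (19)–(21).

```python
import numpy as np


def tune_weights(d, q, A, b):
    """Chi-square distance tuning: minimize sum((w - d)**2/(d*q)) s.t. A @ w = b."""
    dq = d * q
    lam = np.linalg.solve((A * dq) @ A.T, b - A @ d)
    return d + dq * (A.T @ lam)


def jk(v):
    """Leave-one-out (jackknife) means of a 1-D array."""
    return (v.sum() - v) / (v.size - 1)


def tuned_jackknife_matched(y1_resp, z_resp, y1_m, y2_m, z_m, Z_bar):
    """Hypothetical two-step tuning for the matched portion (Info U)."""
    # Step 1: tune the first-occasion respondents to Z_bar and estimate
    # the first-occasion mean of the study variable.
    r = y1_resp.size
    A1 = np.vstack([np.ones(r), jk(z_resp)])
    w1 = tune_weights(np.full(r, 1.0 / r), np.ones(r), A1, np.array([1.0, Z_bar]))
    y1_hat = w1 @ jk(y1_resp)

    # Step 2: tune the matched weights under three assumed constraints.
    m = y2_m.size
    A2 = np.vstack([np.ones(m), jk(z_m), jk(y1_m)])
    b2 = np.array([1.0, Z_bar, y1_hat])
    w2 = tune_weights(np.full(m, 1.0 / m), np.ones(m), A2, b2)
    return float(w2 @ jk(y2_m)), w2, float(y1_hat)


# Toy data: first-occasion respondents and a matched subsample of them.
y1_resp = np.array([20.0, 22.0, 25.0, 21.0, 24.0, 23.0, 26.0, 19.0])
z_resp = np.array([10.0, 11.0, 13.0, 10.5, 12.0, 11.5, 13.5, 9.5])
idx = [0, 2, 3, 5, 6]
y2_m = np.array([21.0, 27.0, 22.0, 24.0, 28.0])
estimate, w2, y1_hat = tuned_jackknife_matched(
    y1_resp, z_resp, y1_resp[idx], y2_m, z_resp[idx], Z_bar=11.0
)
```

The three constraint rows must be linearly independent for the normal-equation matrix to be invertible, which in practice requires at least three matched respondents whose auxiliary and first-occasion values are not collinear.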
Composite successive tuned jackknife estimators
Taking convex linear combinations of the proposed tuned jackknife estimators and for Info U, and and for Info s, under non-response on both occasions, the final tuned jackknife estimators are proposed as
and
where and are given in Eqs (13), (26), (15) and (32), respectively, and are scalars to be chosen suitably.
Estimation of variance of successive tuned estimator
The variance of the estimator is given as
Since and are based on two non-overlapping samples of sizes and , respectively, . Therefore, Eq. (35) becomes
Note that is a function of . Hence, minimizing it with respect to , the optimum value of is obtained as
Finally, substituting the optimum value of from Eq. (37) into Eq. (36), the optimum variance of the tuned jackknife estimator is obtained as
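Because the two component estimators are based on non-overlapping samples, the covariance term drops out and the variance of the composite is quadratic in the weight, so the optimum follows from a single derivative. The sketch assumes the composite has the form φ·T_fresh + (1 − φ)·T_matched, which is consistent with the later remark that a lower weight places more emphasis on the matched portion; the names `composite_variance` and `optimum_composite` are hypothetical. Setting d/dφ [φ² V_fresh + (1 − φ)² V_matched] = 0 gives φ_opt = V_matched / (V_fresh + V_matched) and a minimum variance of V_fresh·V_matched / (V_fresh + V_matched).

```python
def composite_variance(phi, v_fresh, v_matched):
    """Variance of phi*T_fresh + (1 - phi)*T_matched when the two are independent."""
    return phi**2 * v_fresh + (1 - phi) ** 2 * v_matched


def optimum_composite(v_fresh, v_matched):
    """Minimizer of composite_variance and the resulting minimum variance."""
    # d/dphi: 2*phi*v_fresh - 2*(1 - phi)*v_matched = 0
    phi_opt = v_matched / (v_fresh + v_matched)
    v_opt = v_fresh * v_matched / (v_fresh + v_matched)
    return phi_opt, v_opt


phi_opt, v_opt = optimum_composite(2.0, 3.0)   # phi_opt = 0.6, v_opt = 1.2
```

Since the minimum variance equals V_fresh·V_matched / (V_fresh + V_matched), the optimally weighted composite is never less precise than either component on its own.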
Now, following Singh et al. (2016), the estimators of variance of the estimators and are given as
where each newly tuned doubly jackknife estimator of the population mean is given by
Similarly, the estimator of variance of is defined as
where
Remark 4 As the sample size increases, the sample mean converges to the population mean. Therefore, the proposed jackknife estimator of variance of converges to the usual jackknife estimator of variance of the study variable as
Remark 5 Following similar procedures the estimator of variance of the estimator for Info s can be obtained.
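Remark 4 refers to the usual jackknife estimator of variance. The Python sketch below implements only that usual delete-one form, v_J = ((n − 1)/n) Σ_j (θ_(j) − θ̄)², not the tuned doubly jackknife version of the paper; the function name `jackknife_variance` is hypothetical. For the sample mean, this estimator reduces exactly to s²/n.

```python
import numpy as np


def jackknife_variance(sample, statistic=np.mean):
    """Usual delete-one jackknife variance estimator of `statistic`."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    # Recompute the statistic with each unit deleted in turn.
    theta = np.array([statistic(np.delete(x, j)) for j in range(n)])
    return (n - 1) / n * np.sum((theta - theta.mean()) ** 2)


v = jackknife_variance([2.0, 4.0, 6.0, 8.0])   # equals s**2 / n = 5/3 for the mean
```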
Special Cases
Case-I: When there is non-response only at first (previous) occasion
In this situation, the proposed estimator of the population mean becomes
where the estimator can be obtained by replacing by in Eq. (13) and in the other equations that depend on it, and is defined in Eq. (26).
Case-II: When there is non-response only at second (current) occasion
In the presence of non-response only on the second (current) occasion, the estimator of the population mean becomes
where the estimator can be obtained by replacing by in Eq. (26) and in the other equations that depend on it, and is defined in Eq. (13).
Case-III: When there is no non-response at any occasion
In this situation, the proposed estimator becomes
The estimators and can be obtained by replacing by in Eq. (13) and by in Eq. (26), respectively, and in the subsequent equations that depend on them.
Simulation study
A simulation study has been performed using a natural population to examine the behaviour of the proposed newly tuned jackknife estimators. The aim is to identify which of the estimators proposed under non-response is the most efficient. The natural population is described in Table 2.
Table 2. Description of natural population
Population | Description | Variables
AQI | Air Quality Index (AQI) () at different spots in Delhi, India [Source: www.aqi.in] | : AQI on June 4, 2020
 | | : AQI on May 4, 2020
 | | : AQI on May 1, 2020
The behaviour of the proposed newly tuned estimators and has been studied for and for different non-response rates on both occasions.
For the simulation, independent samples have been generated on the two occasions. All samples are drawn by simple random sampling without replacement, and non-response is introduced through the simulation process. Non-response rates of 10%, 20% and 30% have been assumed on both the first and the second (current) occasions, and all estimators are studied for each of these rates.
The entire simulation has been replicated for different values of , grouped into the following sets:
The performance of the estimators has been evaluated in terms of absolute relative bias and percent relative efficiency, defined as:
where,
Figure 1. Absolute relative bias for the estimators and , respectively, for different choices of non-response rate.
Figure 2. Percent relative efficiency of estimator with respect to for different choices of non-response rate.
Similarly, is defined. Here the index ‘’ denotes the simulation run. Random number generation and all computations for the estimators were carried out in MATLAB.
The simulation results are presented in Figs 1 and 2 for varying , where .
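The two performance measures can be computed directly from the R simulated values of each estimator. The sketch below assumes the common definitions ARB(T) = |(1/R) Σ_r (T_r − Ȳ)| / Ȳ and PRE(T, T′) = 100 × MSE(T′) / MSE(T), with each MSE estimated as the average squared deviation of the runs from the true mean; the paper's exact formulas may differ in detail, and the function names `arb`, `mse` and `pre` are hypothetical. Under this convention a PRE above 100 means that T is more efficient than the reference T′.

```python
import numpy as np


def arb(estimates, true_mean):
    """Absolute relative bias of an estimator over R simulation runs."""
    t = np.asarray(estimates, dtype=float)
    return abs(t.mean() - true_mean) / abs(true_mean)


def mse(estimates, true_mean):
    """Simulated mean squared error over R runs."""
    t = np.asarray(estimates, dtype=float)
    return np.mean((t - true_mean) ** 2)


def pre(estimates, reference, true_mean):
    """Percent relative efficiency of `estimates` with respect to `reference`."""
    return 100.0 * mse(reference, true_mean) / mse(estimates, true_mean)


Y_bar = 10.0
T1 = [10.0, 10.0, 11.0, 9.0]   # runs of the estimator of interest
T2 = [12.0, 8.0, 12.0, 8.0]    # runs of the reference estimator
print(arb(T1, Y_bar), pre(T1, T2, Y_bar))  # 0.0 800.0
```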
Comparison of tuned jackknife estimators at optimum conditions
The estimators and depend on the constants and , respectively. Hence, substituting the estimated values of and from Eqs (39) and (41) into Eqs (37) and (38), respectively, the estimated optimum values of and are obtained as
Similarly, for , the optimum values can be obtained as
Now, using the AQI population described in Section 6, these values have been computed for the different sets and non-response rates. The results are tabulated in Table 3.
Table 3. Comparison of and at optimum conditions for different sets and different choices of non-response (NR)
Set | NR | |
I | 10% | 0.7339 | 0.5652
I | 20% | 0.9117 | 0.6084
I | 30% | 0.6179 | 0.3982
II | 10% | 0.2062 | 0.1247
II | 20% | 0.5270 | 0.4329
II | 30% | 0.9837 | 0.9648
III | 10% | 0.6090 | 0.2384
III | 20% | 0.6855 | 0.1852
III | 30% | 0.7855 | 0.6698
Comparison of different cases of non-response with respect to complete response
In this section, the simulated percent relative efficiency of the estimator (no non-response) with respect to the estimators (non-response on both occasions), (non-response only on the previous occasion) and (non-response only on the current occasion) is computed for the natural population described in Section 6. The simulated percent relative efficiency is denoted and defined as
where,
The results are presented in Table 4 for different choices of and varying non-response rates.
Table 4. Percent relative efficiency of complete response with respect to different cases of non-response (NR) for sets I, II and III
 | Set | NR 10%: PRE(1) PRE(2) PRE(3) | NR 20%: PRE(1) PRE(2) PRE(3) | NR 30%: PRE(1) PRE(2) PRE(3)
0.1 | I | 108.22 103.76 104.75 | 125.59 109.84 113.30 | 126.62 107.22 119.23
0.1 | II | 112.24 104.49 108.05 | 114.63 101.54 116.35 | 125.39 105.14 119.26
0.1 | III | 112.39 100.47 115.65 | 125.46 104.93 119.17 | 134.56 100.12 137.08
0.2 | I | 153.59 105.35 150.56 | 201.71 105.03 203.67 | 243.69 107.34 236.41
0.2 | II | 142.67 101.54 141.12 | 180.34 100.60 110.35 | 227.71 102.72 224.28
0.2 | III | 188.07 100.11 191.32 | 243.57 101.06 240.65 | 349.69 104.39 342.86
0.3 | I | 233.69 102.72 232.14 | 374.01 105.13 361.99 | 466.55 107.48 460.83
0.3 | II | 212.95 100.71 213.04 | 316.50 101.94 319.56 | 481.49 108.42 472.08
0.3 | III | 397.12 100.62 404.42 | 507.85 102.97 500.63 | 823.32 109.75 817.51
0.4 | I | 370.02 102.68 368.27 | 615.18 103.07 612.91 | 813.93 107.12 809.59
0.4 | II | 311.00 101.43 305.72 | 628.56 106.86 617.09 | 832.07 102.99 826.16
0.4 | III | 746.12 103.75 748.07 | 1071.90 101.79 1054.30 | 1720.80 103.65 1730.60
0.5 | I | 450.89 100.32 447.07 | 861.39 104.72 854.21 | 1262.10 106.30 1267.50
0.5 | II | 432.51 101.74 435.35 | 807.33 101.39 813.51 | 1372.00 104.28 1373.90
0.5 | III | 1257.20 108.65 1248.50 | 1689.60 100.19 1691.80 | 3252.70 107.41 3245.10
0.6 | I | 565.95 101.79 566.24 | 1084.20 101.39 1079.20 | 1503.50 102.91 1501.00
0.6 | II | 604.36 100.51 599.06 | 1191.50 105.73 1179.10 | 1372.00 104.28 1373.90
0.6 | III | 1858.40 102.42 1848.70 | 2852.70 103.71 2835.00 | 4758.90 107.00 4759.00
0.7 | I | 659.52 100.63 660.71 | 1038.90 101.59 1039.10 | 1719.20 103.26 1723.40
0.7 | II | 779.78 103.54 778.74 | 1445.60 102.33 1442.80 | 2089.80 102.31 2084.80
0.7 | III | 2255.20 101.24 2262.50 | 3331.60 100.17 3352.50 | 5307.50 105.77 5312.70
0.8 | I | 609.49 100.11 607.30 | 1289.80 100.91 1289.20 | 1778.40 100.27 1780.90
0.8 | II | 835.82 100.64 834.94 | 1495.10 101.98 1494.60 | 2361.00 101.08 2362.00
0.8 | III | 2369.60 101.56 2376.90 | 3928.30 102.23 3923.60 | 5657.60 100.75 5665.50
0.9 | I | 706.60 100.03 706.42 | 1347.50 100.26 1346.70 | 1851.00 100.62 1850.70
0.9 | II | 823.99 100.90 824.86 | 1639.90 100.44 1641.40 | 2512.10 100.31 2510.60
0.9 | III | 2057.60 100.47 2550.70 | 3837.30 101.94 3838.50 | 6115.80 100.60 6115.20
Discussion of results
Figures 1 and 2 show the absolute relative bias (ARB) and the percent relative efficiency (PRE), respectively, obtained in the simulation study for several values of and (termed sets) and different non-response (NR) rates in the AQI population.
Some noteworthy findings from Figs 1 and 2 are as follows:
It is observed that, in general, the newly tuned jackknife successive estimators in the presence of non-response have small bias, all within a reasonable range. In terms of PRE, the tuned jackknife estimator is better than .
The results confirm the good behaviour, in terms of PRE, of the newly tuned jackknife successive estimator (Info U) compared with (Info s) for all sample sizes and all non-response rates considered on the two occasions.
It is further observed that, for a fixed set and fixed non-response rate, the bias of the estimators decreases as increases. However, PRE first decreases, then increases, and finally decreases again as increases.
A lower value of places more weight on the estimator for the matched portion of the sample, which is obtained from the units that responded on the first occasion. This may explain the decrease in bias with increasing .
For fixed and for fixed , as the non-response rate increases, PRE(1) also increases.
From Table 3, it is observed that the optimum values of and exist for the estimators and , respectively, for all sets considered. Further, it is seen that for all sets and all non-response rates considered. This shows that is better than at their respective optimum conditions.
From the simulation results in Table 4, it is observed that , which shows that the complete response case is always better than the non-response cases.
As the non-response rate increases, the values of PRE(1) and PRE(3) also increase for fixed .
In general, it is observed that , which shows that the cases with non-response on both occasions or only on the current occasion are worse than the case with non-response only on the first (previous) occasion. This may be due to the retention of the matched sample from the units that responded on the first occasion.
For a fixed non-response rate and fixed set, as increases, the values of and increase, while remains almost stable.
Concluding remarks
The new tuned jackknife technique can be applied in successive sampling and has proved to be a useful tool for dealing with non-response occurring on successive occasions. The estimator under Info U performs better than that under Info s, which is in accordance with the theory that the performance of an estimator improves as more information on the auxiliary variable becomes available. Tuned estimators are also easy to compute, which is an added benefit in today's era of digital technology. Therefore, the proposed methodology may be recommended to practitioners in this field as a useful alternative with a variety of desirable properties.
Acknowledgments
The author is thankful to the reviewer for reading the paper thoroughly and providing constructive suggestions that led to improvements over the earlier version of the paper. The author is also thankful to SERB, New Delhi, India for providing financial assistance to carry out the present work and sincerely acknowledges the free access to data at www.aqi.in and https://app.cpcbccr.com/AQI_India.
References
1. Artes, E., Rueda, M., & Arcos, A. (1998). Successive sampling using a product estimate. Applied Sciences and the Environment, Computational Mechanics Publications, 23, 85-90.
2. Deville, J. C., & Särndal, C. E. (1992). Calibration estimators in survey sampling. J Amer Statist Assoc, 87, 376-382.
3. Eckler, A. R. (1955). Rotation sampling. Ann Math Statist, 26(4), 664-685.
4. Jessen, R. J. (1942). Statistical investigation of a sample survey for obtaining farm facts. Iowa Agri Exp Stat Res Bull, 304, 1-104.
5. Lundström, S., & Särndal, C. E. (1999). Calibration as a standard method for treatment of non-response. Journal of Official Statistics, 15, 305-327.
6. Narain, R. D. (1953). On the recurrence formula in sampling on successive occasions. J Indi Soci Agri Statis, 5, 96-99.
7. Okafor, F. C. (1992). The theory and application of sampling over two occasions for the estimation of current population ratio. Statistica, 1, 137-147.
8. Patterson, H. D. (1950). Sampling on successive occasions with partial replacement of units. J Royal Stat Soci, 12, 241-255.
9. Priyanka, K., & Mittal, R. (2015). A class of estimation for population median in two occasion rotation sampling. HJMS, 44(1), 189-202.
10. Priyanka, K., & Mittal, R. (2016). Searching effective rotation patterns for population median using exponential type estimators in two occasion rotation sampling. Communications in Statistics-Theory and Methods, 45(18), 5443-5460.
11. Priyanka, K., & Mittal, R. (2017). New approach using exponential type estimator with cost modelling for population mean on successive waves. Statistics in Transition-new Series, 18(4), 569-587.
12. Priyanka, K., Kumar, A., & Trisandhya, P. (2019). Calibration estimators for quantitative sensitive mean estimation under successive sampling. Comm in Stat-Theory and Meth, doi: 10.1080/03610926.2019.1649430.
13. Rao, J. N. K., & Graham, J. E. (1964). Rotation design for sampling on repeated occasions. Journal of the American Statistical Association, 59, 492-509.
14. Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581-592.
15. Rueda, M., Martínez, S., Arcos, A., & Muñoz, J. F. (2009). Mean estimation under successive sampling with calibration estimators. Comm Stat-Theo and Meth, 38, 808-827.
16. Sen, A. R., Seller, S., & Smith, D. N. (1973). The use of ratio estimate in successive sampling. Biometrica, 31, 673-683.
17. Scott, A. J., & Smith, T. M. F. (1974). Analysis of repeated surveys using time series methods. Journal of the American Statistical Association, 69, 674-678.
18. Singh, G. N., & Priyanka, K. (2006). On the use of chain-type ratio to difference estimator in successive sampling. International Journal of Applied Mathematics and Statistics, 5(SO6), 41-49.
19. Singh, G. N., & Priyanka, K. (2010). Estimation of population mean at current occasion in presence of several varying auxiliary variates in two occasion successive sampling. Statistics in Transition-new Series, 11(1), 105-126.
20. Singh, S., Sedory, S. A., Rueda, M. del M., Arcos, A., & Arnab, R. (2016). A new concept for tuning design weights in survey sampling. Elsevier Publication.
21. Tikkiwal, B. D. (1953). Optimum allocation in successive sampling. J Indian Soc Agri Stat, 5, 100-102.
22. Tikkiwal, B. D. (1964). A note on two stage sampling on successive occasions. Sankhya, 26(A), 97-100.