Log-Ratio Estimation in Survey Sampling: A Comparative Study Using Agricultural and Cancer Data with Simulations

Abstract

The estimation of finite population characteristics, particularly the mean, is critical in survey sampling, as it directly affects the validity of results drawn from sample data. As the volume and complexity of available data increase, so does the demand for more robust and precise estimators. Traditional estimators frequently fail to take full advantage of auxiliary information, which can be critical for improving estimation accuracy. This paper presents a novel log-ratio estimator that incorporates auxiliary information in a non-linear manner using logarithmic transformations of the study and auxiliary variables. The log-ratio estimator's performance is assessed against nine classical and modern estimators using two metrics: mean squared error (MSE) and percentage relative efficiency (PRE). Our empirical analysis covers two agricultural applications, estimating the area under wheat from cultivated area and peppermint oil production from field area, and three real-world cancer datasets: breast cancer deaths, cancer deaths by gender (male and female), and brain tumor survival rates. To supplement these applications, a simulation study with a population size of 100,000 and a sample size of 1,000 was replicated 100,000 times to assess the estimators' resilience and stability. The empirical and simulation results consistently show that the log-ratio estimator produces lower MSE and higher PRE values across all datasets than the traditional estimators. These findings suggest that the log-ratio estimator gives more trustworthy and efficient estimates, particularly when dealing with complicated data. As such, this estimator is a promising tool for future survey sampling, helping to improve estimation methodologies for modern, large-scale datasets.

Keywords

log-ratio estimator; mean squared error; percentage relative efficiency; simple random sampling

1 Introduction

Sampling theory, a basic component of statistical inference, provides the foundation for drawing inferences about a whole population from a smaller sample. Sampling is a vital technique in research across many domains because, in practice, gathering data from an entire population is either impractical or prohibitively expensive. Sampling theory focuses on the design of estimators, sample selection, and the statistical inferences that can be made from them. The objective is to draw trustworthy and accurate conclusions about population parameters from data gathered on a subset of the population. Effective estimators are essential to this procedure: they determine the quality and dependability of inferences made from sample data, which in turn affects decision-making. The ratio estimator is one of the most effective estimators in survey sampling, especially when the correlation between the study and auxiliary variables is positive and high.

A ratio estimator was initially developed by Cochran (1940) for survey sampling to estimate the population mean of a study variable using information from a related variable. Sisodia and Dwivedi (1981) suggested a modified ratio estimator for cases where the population coefficient of variation is known. Prasad (1989) developed a class of ratio-type estimators for population means in finite population sample surveys under simple random sampling without replacement, for use when data on an auxiliary variable positively correlated with the study variable are available. Upadhyaya and Singh (1999) created a modified ratio-type estimator that takes into account both kurtosis and the coefficient of variation. Kadilar and Cingi (2003), who employed ratio estimators to examine apple productivity and the number of apple trees in Turkish villages, reported that the traditional ratio estimator was the most efficient of the estimators they considered. Singh (2003) discussed several ideas and methods for creating an effective estimator. Triveni and Danish (2023) created a ratio-type estimator to estimate the population mean of a study variable using information from a correlated variable. To estimate the population mean of the study variate, Singh and Tailor (2005) employed a ratio estimator that uses the population mean and coefficient of variation of an auxiliary character. Singh and Tailor (2003) presented a ratio estimator based on a known correlation coefficient. Kadilar and Cingi (2004) proposed an efficient ratio estimator in simple random sampling. Singh et al. (2004) provided an improved estimator of the population mean using a power transformation. For the population mean, Yan and Tian (2010), Singh and Rani (2006), and Sing and Kumar (2011) offered ratio-type estimators that depend on the known skewness of the auxiliary variable.
Subramani and Kumarapandiyan (2012a, 2012b) proposed two modified ratio estimators of the population mean of the study variable that linearly combine the known population median and the coefficient of variation of the auxiliary variable. Koyuncu and Kadilar (2009) developed a family of estimators that takes advantage of auxiliary data under simple random sampling. Khan et al. (2015) proposed ratio-type estimators of the finite population mean that use the maximum and minimum values of the auxiliary variable. To estimate the population mean of the study variable, Triveni and Danish (2024a, 2024b) developed an improved separate ratio estimator using a highly correlated auxiliary variable. Audu et al. (2021) presented three difference-cum-ratio estimators that use the known population mean and variance of an auxiliary variable to estimate the finite population coefficient of variation of a study variable. Ounrittichai et al. (2024) proposed three ratio estimators of the population ratio under simple random sampling without replacement (SRSWOR). Kadilar and Cingi (2007) developed a modified ratio estimator based on the correlation coefficient, building on the estimators of Upadhyaya and Singh (1999). Zaman et al. (2022) suggested a ratio estimator of the population mean in simple random sampling that uses robust methods to get the most out of the auxiliary variable. According to Yadav et al. (2024), utilising known information on an auxiliary variable can improve the estimation of population means. By including bivariate auxiliary data, Triveni and Danish (2024a, 2024b) proposed a novel combination ratio-type estimator for the population mean of the study variable. Lakshmi et al. (2025) proposed log and power transformations for ratio estimation and applied them to a solar radiation dataset. Singh et al. (2025) provided three exponential estimators for handling fuzzy data.
Javed et al. (2025) proposed a different type of estimator using the exponential function approach to forecast the population mean.

While there are several estimators for survey sampling, many of them still rely on simplifying assumptions that do not completely account for the complexity of modern data. Traditional linear models frequently fail to account for non-linear relationships between study and auxiliary variables, which limits their applicability in real-world settings. Furthermore, most existing research focuses on a single estimator or fails to compare multiple approaches across various data sources. This study seeks to address these shortcomings by introducing a log-ratio estimator in the framework of simple random sampling (SRS) that incorporates auxiliary information more flexibly and handles complex data structures more successfully. By comparing the suggested estimator against nine alternatives, we gain a more comprehensive understanding of its performance across various datasets and scenarios and assess its efficiency and accuracy in estimating population parameters. The findings shed light on the limitations of standard estimators and the advantages of more robust, non-linear techniques.

1.1 The Terms and Notations

Let N denote the population size and n the sample size. We consider a study variable Y and an auxiliary variable X. The population means of X and Y are $\bar{X} = \frac{1}{N} \sum_{i = 1}^{N} X_{i}$ and $\bar{Y} = \frac{1}{N} \sum_{i = 1}^{N} Y_{i}$, respectively. The population variances of Y and X are $S_{y}^{2} = \frac{1}{N - 1} \sum_{i = 1}^{N} (Y_{i} - \bar{Y})^{2}$ and $S_{x}^{2} = \frac{1}{N - 1} \sum_{i = 1}^{N} (X_{i} - \bar{X})^{2}$. The covariance between X and Y is $S_{x y} = \frac{1}{N - 1} \sum_{i = 1}^{N} (Y_{i} - \bar{Y}) (X_{i} - \bar{X})$, and the coefficients of variation of X and Y are $C_{x} = \frac{S_{x}}{\bar{X}}$ and $C_{y} = \frac{S_{y}}{\bar{Y}}$. The population correlation coefficient between X and Y is $ρ_{x y} = \frac{S_{x y}}{S_{x} S_{y}}$, also written $ρ$. The skewness and kurtosis coefficients of the auxiliary variable are denoted by $β_{1 (x)}$ and $β_{2 (x)}$, respectively. Here $λ = \frac{1 - f}{n}$, where $f = \frac{n}{N}$ is the sampling fraction.
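As an illustration of these definitions, the following minimal Python sketch computes the population quantities from two equal-length lists. The function name `population_summary` and the toy values are hypothetical, and the sketch assumes $f = n / N$, which is consistent with the values of f and $λ$ reported in Tables 1 and 2.

```python
import math


def population_summary(y, x, n):
    """Population quantities of Section 1.1 for study values y, auxiliary values x, sample size n."""
    N = len(y)
    Y_bar = sum(y) / N
    X_bar = sum(x) / N
    # Variances and covariance use the N - 1 divisor, as in the definitions above.
    S_y2 = sum((v - Y_bar) ** 2 for v in y) / (N - 1)
    S_x2 = sum((u - X_bar) ** 2 for u in x) / (N - 1)
    S_xy = sum((v - Y_bar) * (u - X_bar) for v, u in zip(y, x)) / (N - 1)
    f = n / N  # assumed sampling fraction
    return {
        "Y_bar": Y_bar,
        "X_bar": X_bar,
        "C_y": math.sqrt(S_y2) / Y_bar,
        "C_x": math.sqrt(S_x2) / X_bar,
        "rho": S_xy / math.sqrt(S_x2 * S_y2),
        "lam": (1 - f) / n,
    }


# Toy population (hypothetical values, for illustration only).
summary = population_summary(y=[2.0, 4.0, 6.0, 8.0], x=[1.0, 2.0, 3.0, 4.0], n=2)
print(summary)
```

The returned dictionary holds exactly the inputs that the MSE and bias expressions in the following sections require.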

2 Literature Review

The classical ratio estimator of Cochran (1940) is defined as

\begin{aligned} {\bar{y}}_{1} = \frac{\bar{y}}{\bar{x}} \bar{X} \end{aligned}

(1)

The MSE of (1) is

\begin{aligned} M S E ({\bar{y}}_{1}) = (\frac{1 - f}{n}) {\bar{Y}}^{2} [C_{y}^{2} + C_{x}^{2} - 2 ρ_{x y} C_{x} C_{y}] \end{aligned}

(2)

also, the bias of (1) is

\begin{aligned} B ({\bar{y}}_{1}) = (\frac{1 - f}{n}) \bar{Y} [C_{x}^{2} - ρ_{x y} C_{x} C_{y}] \end{aligned}

(3)
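Equations (1) to (3) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the numerical inputs are the rounded Data Set-I parameters reported in Table 1, so the results inherit that rounding.

```python
def ratio_estimate(y_bar, x_bar, X_bar):
    """Classical ratio estimator, equation (1)."""
    return y_bar * X_bar / x_bar


def ratio_mse(Y_bar, C_y, C_x, rho, lam):
    """First-order MSE of the ratio estimator, equation (2); lam = (1 - f) / n."""
    return lam * Y_bar**2 * (C_y**2 + C_x**2 - 2 * rho * C_x * C_y)


def ratio_bias(Y_bar, C_y, C_x, rho, lam):
    """First-order bias of the ratio estimator, equation (3)."""
    return lam * Y_bar * (C_x**2 - rho * C_x * C_y)


# Rounded parameters as reported for Data Set-I (Table 1).
mse_1 = ratio_mse(Y_bar=199.44, C_y=0.75, C_x=0.72, rho=0.98, lam=0.17)
bias_1 = ratio_bias(Y_bar=199.44, C_y=0.75, C_x=0.72, rho=0.98, lam=0.17)
print(mse_1, bias_1)
```

With these inputs the bias is negative because $ρ_{x y} C_{y}$ exceeds $C_{x}$, which is the sign behaviour equation (3) predicts.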

When the population coefficient of variation $C_{x}$ is known, Sisodia and Dwivedi (1981) suggested a modified ratio estimator for $\bar{Y}$ as

\begin{aligned} {\bar{y}}_{2} = \bar{y} \frac{\bar{X} + C_{x}}{\bar{x} + C_{x}} \end{aligned}

(4)

The MSE of (4) is

\begin{aligned} M S E ({\bar{y}}_{2}) = (\frac{1 - f}{n}) {\bar{Y}}^{2} [C_{y}^{2} + {φ_{1}}^{2} C_{x}^{2} - 2 φ_{1} ρ_{x y} C_{x} C_{y}] \end{aligned}

(5)

Also, the bias of (4) is

\begin{aligned} B ({\bar{y}}_{2}) = (\frac{1 - f}{n}) \bar{Y} [{φ_{1}}^{2} C_{x}^{2} - φ_{1} ρ_{x y} C_{x} C_{y}] \end{aligned}

(6)

where $φ_{1} = \frac{\bar{X}}{\bar{X} + C_{x}}$.

A modified ratio estimator was proposed by Singh and Tailor (2003) as

\begin{aligned} {\bar{y}}_{3} = \bar{y} \frac{\bar{X} + ρ}{\bar{x} + ρ} \end{aligned}

(7)

The MSE of (7) is

\begin{aligned} M S E ({\bar{y}}_{3}) = (\frac{1 - f}{n}) {\bar{Y}}^{2} [C_{y}^{2} + {φ_{2}}^{2} C_{x}^{2} - 2 φ_{2} ρ_{x y} C_{x} C_{y}] \end{aligned}

(8)

Also, the bias of (7) is

\begin{aligned} B ({\bar{y}}_{3}) = (\frac{1 - f}{n}) \bar{Y} [{φ_{2}}^{2} C_{x}^{2} - φ_{2} ρ_{x y} C_{x} C_{y}] \end{aligned}

(9)

where $φ_{2} = \frac{\bar{X}}{\bar{X} + ρ}$.

A ratio-type estimator was suggested by Yan and Tian (2010) as

\begin{aligned} {\bar{y}}_{4} = \bar{y} \frac{\bar{X} + β_{1 (x)}}{\bar{x} + β_{1 (x)}} \end{aligned}

(10)

The MSE of (10) is

\begin{aligned} M S E ({\bar{y}}_{4}) = (\frac{1 - f}{n}) {\bar{Y}}^{2} [C_{y}^{2} + {φ_{3}}^{2} C_{x}^{2} - 2 φ_{3} ρ_{x y} C_{x} C_{y}] \end{aligned}

(11)

Also, the bias of (10) is

\begin{aligned} B ({\bar{y}}_{4}) = (\frac{1 - f}{n}) \bar{Y} [{φ_{3}}^{2} C_{x}^{2} - φ_{3} ρ_{x y} C_{x} C_{y}] \end{aligned}

(12)

where $φ_{3} = \frac{\bar{X}}{\bar{X} + β_{1 (x)}}$.

The coefficient of variation and skewness coefficient on the auxiliary variable were used by Yan and Tian (2010) to suggest a ratio-type estimator as

\begin{aligned} {\bar{y}}_{5} = \bar{y} \frac{\bar{X} C_{x} + β_{1 (x)}}{\bar{x} C_{x} + β_{1 (x)}} \end{aligned}

(13)

The MSE of (13) is

\begin{aligned} M S E ({\bar{y}}_{5}) = (\frac{1 - f}{n}) {\bar{Y}}^{2} [C_{y}^{2} + {φ_{4}}^{2} C_{x}^{2} - 2 φ_{4} ρ_{x y} C_{x} C_{y}] \end{aligned}

(14)

Also, the bias of (13) is

\begin{aligned} B ({\bar{y}}_{5}) = (\frac{1 - f}{n}) \bar{Y} [{φ_{4}}^{2} C_{x}^{2} - φ_{4} ρ_{x y} C_{x} C_{y}] \end{aligned}

(15)

where $φ_{4} = \frac{\bar{X} C_{x}}{\bar{X} C_{x} + β_{1 (x)}}$.

The modified ratio estimator proposed by Subramani and Kumarapandiyan (2012a) is given by

\begin{aligned} {\bar{y}}_{6} = \bar{y} \frac{\bar{X} + M_{d}}{\bar{x} + M_{d}} \end{aligned}

(16)

The MSE of (16) is

\begin{aligned} M S E ({\bar{y}}_{6}) = (\frac{1 - f}{n}) {\bar{Y}}^{2} [C_{y}^{2} + {φ_{5}}^{2} C_{x}^{2} - 2 φ_{5} ρ_{x y} C_{x} C_{y}] \end{aligned}

(17)

Also, the bias of (16) is

\begin{aligned} B ({\bar{y}}_{6}) = (\frac{1 - f}{n}) \bar{Y} [{φ_{5}}^{2} C_{x}^{2} - φ_{5} ρ_{x y} C_{x} C_{y}] \end{aligned}

(18)

where $φ_{5} = \frac{\bar{X}}{\bar{X} + M_{d}}$.

The modified ratio estimator proposed by Subramani and Kumarapandiyan (2012a) is given by

\begin{aligned} {\bar{y}}_{7} = \bar{y} \frac{\bar{X} C_{x} + M_{d}}{\bar{x} C_{x} + M_{d}} \end{aligned}

(19)

The MSE of (19) is

\begin{aligned} M S E ({\bar{y}}_{7}) = (\frac{1 - f}{n}) {\bar{Y}}^{2} [C_{y}^{2} + {φ_{6}}^{2} C_{x}^{2} - 2 φ_{6} ρ_{x y} C_{x} C_{y}] \end{aligned}

(20)

Also, the bias of (19) is

\begin{aligned} B ({\bar{y}}_{7}) = (\frac{1 - f}{n}) \bar{Y} [{φ_{6}}^{2} C_{x}^{2} - φ_{6} ρ_{x y} C_{x} C_{y}] \end{aligned}

(21)

where $φ_{6} = \frac{\bar{X} C_{x}}{\bar{X} C_{x} + M_{d}}$.

The modified ratio estimator proposed by Subramani and Kumarapandiyan (2012b) is given by

\begin{aligned} {\bar{y}}_{8} = \frac{\bar{y} + b (\bar{X} - \bar{x})}{\bar{x} + M_{d}} (\bar{X} + M_{d}) \end{aligned}

(22)

The MSE of (22) is

\begin{aligned} M S E ({\bar{y}}_{8}) = (\frac{1 - f}{n}) [γ_{1}^{2} S_{x}^{2} + S_{y}^{2} (1 - ρ^{2})] \end{aligned}

(23)

Also, the bias of (22) is

\begin{aligned} B ({\bar{y}}_{8}) = (\frac{1 - f}{n}) \frac{S_{x}^{2}}{\bar{Y}} γ_{1}^{2} \end{aligned}

(24)

Where $γ_{1} = \frac{\bar{Y}}{\bar{X} + M_{d}}$

The modified ratio estimator proposed by Subramani and Kumarapandiyan (2012b) is given by

\begin{aligned} {\bar{y}}_{9} = \frac{\bar{y} + b (\bar{X} - \bar{x})}{\bar{x} C_{x} + M_{d}} (\bar{X} C_{x} + M_{d}) \end{aligned}

(25)

The MSE of (25) is

\begin{aligned} M S E ({\bar{y}}_{9}) = (\frac{1 - f}{n}) [γ_{2}^{2} S_{x}^{2} + S_{y}^{2} (1 - ρ^{2})] \end{aligned}

(26)

Also, the bias of (25) is

\begin{aligned} B ({\bar{y}}_{9}) = (\frac{1 - f}{n}) \frac{S_{x}^{2}}{\bar{Y}} γ_{2}^{2} \end{aligned}

(27)

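where $γ_{2} = \frac{\bar{Y} C_{x}}{\bar{X} C_{x} + M_{d}}$.

Throughout, $C_{x}$ is the coefficient of variation of the auxiliary variate, $β_{1 (x)}$ is its skewness coefficient, $M_{d}$ is its population median (reported as $M_{x}$ in the tables of Sections 5 and 6), and $b$ is the sample regression coefficient of y on x.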
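Because estimators ${\bar{y}}_{2}$ to ${\bar{y}}_{7}$ share one MSE form that differs only in $φ_{j}$, and ${\bar{y}}_{8}$ and ${\bar{y}}_{9}$ share another that differs only in $γ_{k}$, two small functions cover all of them. The sketch below is a hedged illustration: the function names are hypothetical, the inputs are the rounded Data Set-I values from Table 1, and it assumes that $M_{d}$ equals the tabulated $M_{x}$. Relative efficiency is expressed as $100 \times M S E ({\bar{y}}_{1}) / M S E ({\bar{y}}_{m})$.

```python
def phi_family_mse(Y_bar, C_y, C_x, rho, lam, phi):
    """Shared MSE form of equations (5), (8), (11), (14), (17), (20); phi = 1 gives equation (2)."""
    return lam * Y_bar**2 * (C_y**2 + phi**2 * C_x**2 - 2 * phi * rho * C_x * C_y)


def gamma_family_mse(S_y, S_x, rho, lam, gamma):
    """Shared MSE form of equations (23) and (26)."""
    return lam * (gamma**2 * S_x**2 + S_y**2 * (1 - rho**2))


# Rounded parameters as reported for Data Set-I (Table 1).
Y_bar, X_bar = 199.44, 208.88
S_y, S_x = 150.22, 150.51
C_y, C_x, rho, lam = 0.75, 0.72, 0.98, 0.17
M_d = 150.0  # assumption: M_d is the tabulated median M_x

mse_1 = phi_family_mse(Y_bar, C_y, C_x, rho, lam, phi=1.0)
mse_6 = phi_family_mse(Y_bar, C_y, C_x, rho, lam, phi=X_bar / (X_bar + M_d))
mse_8 = gamma_family_mse(S_y, S_x, rho, lam, gamma=Y_bar / (X_bar + M_d))

pre_6 = 100 * mse_1 / mse_6
pre_8 = 100 * mse_1 / mse_8
print(pre_6, pre_8)
```

Since the inputs are rounded to two decimals, the printed values should be read as approximations of the published efficiencies rather than exact reproductions.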

3 Proposed Estimator

Because the logarithmic function dampens variability across the spread of the data, we propose a ratio estimator formed as a linear combination of a logarithmic term and the classical ratio term, where $α$ is a constant to be chosen:

\begin{aligned} {\bar{y}}_{p} = \bar{y} (α \log (\frac{\bar{X}}{\bar{x}}) + (\frac{\bar{X}}{\bar{x}})) \end{aligned}

(28)

To evaluate this expression, we write the sample means in terms of the relative errors $e_{0}$ and $e_{1}$:

\begin{aligned} \bar{y} & = \bar{Y} (1 + e_{0}), \bar{x} = \bar{X} (1 + e_{1}) \end{aligned}

\begin{aligned} {\bar{y}}_{p} & = \bar{Y} (1 + e_{0}) (α \log (\frac{\bar{X}}{\bar{X} (1 + e_{1})}) + (\frac{\bar{X}}{\bar{X} (1 + e_{1})})) \end{aligned}

Expanding the logarithm and the reciprocal in powers of $e_{1}$ and retaining terms up to the second order, we get

\begin{aligned} {\bar{y}}_{p} & = \bar{Y} (1 + e_{0}) (- α e_{1} + α \frac{e_{1}^{2}}{2} + 1 - e_{1} + {e_{1}}^{2}) \end{aligned}

\begin{aligned} {\bar{y}}_{p} & = \bar{Y} (- α e_{1} + α \frac{e_{1}^{2}}{2} + 1 - e_{1} + {e_{1}}^{2} - α e_{1} e_{0} + e_{0} - e_{1} e_{0}) \end{aligned}

(29)
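The second-order expansion of the bracketed term can be checked numerically. The sketch below compares the exact value of $α \log (1 / (1 + e_{1})) + 1 / (1 + e_{1})$ with its truncated series for a few small values of $e_{1}$; the function names and the value of $α$ are arbitrary choices made for illustration.

```python
import math


def bracket_exact(e1, alpha):
    """Exact bracketed term of the proposed estimator with x_bar = X_bar * (1 + e1)."""
    return alpha * math.log(1.0 / (1.0 + e1)) + 1.0 / (1.0 + e1)


def bracket_second_order(e1, alpha):
    """Series truncated after the e1**2 terms, as used in the derivation above."""
    return -alpha * e1 + alpha * e1**2 / 2 + 1 - e1 + e1**2


alpha = 0.5  # arbitrary illustrative value
gaps = {}
for e1 in (0.1, 0.01, 0.001):
    gaps[e1] = abs(bracket_exact(e1, alpha) - bracket_second_order(e1, alpha))
    print(e1, gaps[e1])
```

The gap shrinks roughly like $e_{1}^{3}$, which is what a series truncated after the second-order terms should show.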

The bias of the proposed estimator is given by

\begin{aligned} B I A S ({\bar{y}}_{p}) & = E ({\bar{y}}_{p}) - \bar{Y} \end{aligned}

\begin{aligned} B I A S ({\bar{y}}_{p}) & = \bar{Y} (α \frac{E (e_{1}^{2})}{2} + E ({e_{1}}^{2}) - α E (e_{1} e_{0}) - E (e_{1} e_{0})) \end{aligned}

\begin{aligned} B I A S ({\bar{y}}_{p}) & = \bar{Y} (α \frac{￡_{1}^{2}}{2} + ￡_{1}^{2} - α ￡_{01} - ￡_{01}) \end{aligned}

(30)

where

\begin{aligned} E (e_{0}) = E (e_{1}) = 0; E (e_{0}^{2}) = λ C_{y}^{2} = ￡_{0}^{2}; E (e_{1}^{2}) = λ C_{x}^{2} = ￡_{1}^{2}; E (e_{0} e_{1}) = λ ρ C_{x} C_{y} = ￡_{01} \end{aligned}

MSE for the proposed estimator is given by

\begin{aligned} M S E ({\bar{y}}_{p}) & = E ({\bar{y}}_{p} - \bar{Y})^{2} \end{aligned}

\begin{aligned} M S E ({\bar{y}}_{p}) & = {\bar{Y}}^{2} (α^{2} E (e_{1}^{2}) + E (e_{1}^{2}) + E (e_{0}^{2}) + 2 α E (e_{1}^{2}) - 2 α E (e_{0} e_{1}) - 2 E (e_{0} e_{1})) \end{aligned}

\begin{aligned} M S E ({\bar{y}}_{p}) & = {\bar{Y}}^{2} (α^{2} ￡_{1}^{2} + ￡_{1}^{2} + ￡_{0}^{2} + 2 α ￡_{1}^{2} - 2 α ￡_{01} - 2 ￡_{01}) \end{aligned}

(31)

Now, differentiating equation (31) with respect to $α$ and equating the derivative to zero, that is

$\frac{\partial M S E ({\bar{y}}_{p})}{\partial α} = 0$ which implies

\begin{aligned} α = \frac{￡_{01} - ￡_{1}^{2}}{￡_{1}^{2}} \end{aligned}
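Substituting this value of $α$ back into (31) gives the smallest first-order MSE the proposed estimator can attain. The step below is a short derivation from the expressions already stated, offered as a worked check rather than as an additional result from the source. Writing (31) as ${\bar{Y}}^{2} (￡_{0}^{2} + (1 + α)^{2} ￡_{1}^{2} - 2 (1 + α) ￡_{01})$ and using

\begin{aligned} 1 + α = \frac{￡_{01}}{￡_{1}^{2}} = ρ \frac{C_{y}}{C_{x}} \end{aligned}

we obtain

\begin{aligned} M S E_{\min} ({\bar{y}}_{p}) = {\bar{Y}}^{2} (￡_{0}^{2} - \frac{(￡_{01})^{2}}{￡_{1}^{2}}) = λ {\bar{Y}}^{2} C_{y}^{2} (1 - ρ^{2}) \end{aligned}

This coincides with the familiar first-order MSE of the linear regression estimator. It depends on the auxiliary variable only through $ρ$, and the optimal $α$ requires $ρ$, $C_{y}$, and $C_{x}$ to be known or estimated.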

4 Efficiency Comparisons

In survey sampling, MSE is an important metric for determining estimator efficiency. We employ MSE equations for all estimators in the literature review, including the proposed one, to compare the efficiency of competitive estimators to that of the proposed model. This section describes the mathematical conditions under which the suggested estimator is more efficient than the other estimators utilised for comparison.

On comparing (2) with (31), the proposed estimator is more efficient than ${\bar{y}}_{1}$ when

\begin{aligned} M S E ({\bar{y}}_{p}) - M S E ({\bar{y}}_{1}) < 0 \\ {\bar{Y}}^{2} (α^{2} ￡_{1}^{2} + ￡_{1}^{2} + ￡_{0}^{2} + 2 α ￡_{1}^{2} - 2 α ￡_{01} - 2 ￡_{01}) < (\frac{1 - f}{n}) {\bar{Y}}^{2} [C_{y}^{2} + C_{x}^{2} - 2 ρ_{x y} C_{x} C_{y}] \end{aligned}

(32)

Comparing the MSE of the proposed estimator with those of the other estimators gives conditions (33) and (34):

\begin{aligned} M S E ({\bar{y}}_{p}) - M S E ({\bar{y}}_{i}) < 0, i = 2, 3, \dots, 9. \end{aligned}

When i = 2, 3, 4, 5, 6 and 7, the efficiency condition is

\begin{aligned} {\bar{Y}}^{2} (α^{2} ￡_{1}^{2} + ￡_{1}^{2} + ￡_{0}^{2} + 2 α ￡_{1}^{2} - 2 α ￡_{01} - 2 ￡_{01}) < (\frac{1 - f}{n}) {\bar{Y}}^{2} [C_{y}^{2} + {φ_{j}}^{2} C_{x}^{2} - 2 φ_{j} ρ_{x y} C_{x} C_{y}], j = i - 1 = 1, 2, \dots, 6 \end{aligned}

(33)

When i = 8 and 9, the efficiency condition is

\begin{aligned} {\bar{Y}}^{2} (α^{2} ￡_{1}^{2} + ￡_{1}^{2} + ￡_{0}^{2} + 2 α ￡_{1}^{2} - 2 α ￡_{01} - 2 ￡_{01}) < (\frac{1 - f}{n}) [γ_{k}^{2} S_{x}^{2} + S_{y}^{2} (1 - ρ^{2})], k = i - 7 = 1, 2 \end{aligned}

(34)

When the conditions in (32), (33), and (34) are satisfied, the proposed estimator is more efficient than the competing estimators considered in this study.
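As a worked check of these conditions, suppose $α$ is set to the value that minimises (31), in which case the first-order MSE of the proposed estimator is $λ {\bar{Y}}^{2} C_{y}^{2} (1 - ρ^{2})$ with $λ = \frac{1 - f}{n}$. The simplifications below follow from the formulas stated in this paper under that assumption and are not an additional result from the source. Condition (32) becomes

\begin{aligned} C_{y}^{2} (1 - ρ^{2}) < C_{y}^{2} + C_{x}^{2} - 2 ρ C_{x} C_{y} \iff (C_{x} - ρ C_{y})^{2} > 0 \end{aligned}

condition (33) becomes

\begin{aligned} C_{y}^{2} (1 - ρ^{2}) < C_{y}^{2} + {φ_{j}}^{2} C_{x}^{2} - 2 φ_{j} ρ C_{x} C_{y} \iff (φ_{j} C_{x} - ρ C_{y})^{2} > 0 \end{aligned}

and, since ${\bar{Y}}^{2} C_{y}^{2} = S_{y}^{2}$, condition (34) becomes

\begin{aligned} S_{y}^{2} (1 - ρ^{2}) < γ_{k}^{2} S_{x}^{2} + S_{y}^{2} (1 - ρ^{2}) \iff γ_{k}^{2} S_{x}^{2} > 0 \end{aligned}

Under this assumption the three conditions therefore hold whenever $C_{x} \ne ρ C_{y}$, $φ_{j} C_{x} \ne ρ C_{y}$, and $γ_{k} \ne 0$; in the boundary cases the corresponding MSEs are equal to the first order of approximation.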

5 Empirical Study

5.1 Data Set-I

Table 1 gives the population parameters for the estimation of wheat area in 1974 (study variable) from the cultivated wheat area in 1973 (auxiliary variable), with data taken from Singh and Chaudhary (1986).

Table 1.
Population Parameters for Data Set-I.

N	n	$\bar{Y}$	$\bar{X}$	$S_{y}$	$S_{x}$	$C_{y}$	$C_{x}$
34	5	199.44	208.88	150.22	150.51	0.75	0.72
$M_{y}$	$M_{x}$	$ρ$	$C_{y x}$	$β_{1 (x)}$	$β_{2 (x)}$	f	$λ$
142.5	150	0.98	0.53	0.87	5.91	0.15	0.17

5.2 Data Set-II

Table 2 gives the population parameters for a data set taken from Yadav et al. (2019), in which peppermint oil output is the study variable and field area in bigha is the auxiliary variable.

Table 2.
Population Parameters for Data Set-II.

N	n	$\bar{Y}$	$\bar{X}$	$S_{y}$	$S_{x}$	$C_{y}$	$C_{x}$
150	40	33.46	4.21	25.50	3.08	0.76	0.73
$M_{y}$	$M_{x}$	$ρ$	$C_{y x}$	$β_{1 (x)}$	$β_{2 (x)}$	f	$λ$
25	3	0.91	0.51	2.80	16.44	0.27	0.02

Tables 1 and 2 list all parameters required to calculate the MSE values of the respective estimators. The MSE values are calculated from the corresponding MSE equations, and the PRE values, which are provided in Table 3, are computed using the formula below.

\begin{aligned} PRE ({\bar{y}}_{1}, {\bar{y}}_{m}) = \frac{MSE ({\bar{y}}_{1})}{MSE ({\bar{y}}_{m})} \times 100; m = 1, 2, 3, \dots, 9, p . \end{aligned}

Table 3.

Values of PRE Based on Datasets I and II.

	Data Set-I	Data Set-II
Estimators	PRE	PRE
${\bar{y}}_{1}$	100	100
${\bar{y}}_{2}$	100	96.59
${\bar{y}}_{3}$	99.43	93.39
${\bar{y}}_{4}$	100	61.17
${\bar{y}}_{5}$	99.3	51.24
${\bar{y}}_{6}$	18.29	58.89
${\bar{y}}_{7}$	13.77	49.23
${\bar{y}}_{8}$	11.44	35.54
${\bar{y}}_{9}$	14.86	42.32
${\bar{y}}_{p}$	101.39	101.47
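The PRE formula can be sketched in a few lines for the proposed estimator. The code below is a hedged illustration, not the authors' computation: the function names are hypothetical, $α$ is assumed to take the value that minimises equation (31), and the inputs are the two-decimal parameters printed in Tables 1 and 2. The common factor $λ {\bar{Y}}^{2}$ cancels in the ratio, so only $C_{y}$, $C_{x}$, and $ρ$ are needed.

```python
def ratio_mse_factor(C_y, C_x, rho):
    """Bracket of equation (2): MSE of the ratio estimator divided by lam * Y_bar**2."""
    return C_y**2 + C_x**2 - 2 * rho * C_x * C_y


def proposed_mse_factor(C_y, C_x, rho, alpha=None):
    """Equation (31) divided by lam * Y_bar**2; alpha defaults to its MSE-minimising value."""
    if alpha is None:
        alpha = rho * C_y / C_x - 1  # assumed optimal alpha
    return C_y**2 + (1 + alpha) ** 2 * C_x**2 - 2 * (1 + alpha) * rho * C_x * C_y


def pre_proposed(C_y, C_x, rho):
    """PRE of the proposed estimator relative to the ratio estimator."""
    return 100 * ratio_mse_factor(C_y, C_x, rho) / proposed_mse_factor(C_y, C_x, rho)


pre_I = pre_proposed(C_y=0.75, C_x=0.72, rho=0.98)   # Data Set-I, Table 1
pre_II = pre_proposed(C_y=0.76, C_x=0.73, rho=0.91)  # Data Set-II, Table 2
print(pre_I, pre_II)
```

Because the tabulated parameters are rounded, the values obtained this way land close to, but not exactly on, the ${\bar{y}}_{p}$ entries of Table 3.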

6 A Real Data Application

We used different data sets taken from Kaggle to conduct a real data application. The following is a discussion of the three distinct data sets that describe various forms of cancer.

6.1 Data Set-III

Using breast cancer fatalities as an auxiliary variable and breast cancer cases as a study variable, we examined the data presented by Gupta (2025) and the required data parameters are provided in Table 4.

Table 4.
Parameters for Data Set- III.

N	n	$\bar{Y}$	$\bar{X}$	$S_{y}$	$S_{x}$	$C_{y}$	$C_{x}$
500	100	1.94E+04	1.04E+05	1.14E+04	6.11E+04	0.59	0.59
$M_{y}$	$M_{x}$	$ρ$	$C_{y x}$	$β_{1 (x)}$	$β_{2 (x)}$	f	$λ$
1.89E+04	1.01E+05	1	0.34	0.07	−1.09	0.2	0.01

6.2 Data Set - IV

The data set is taken from the American Cancer Society (2024). The predicted number of cancer deaths in the US in 2024 is the auxiliary variable, whereas the study variable is the anticipated number of new invasive cancer cases in the US in 2024 (both sexes). The analysed data statistics are provided in Table 5.

Table 5.
Parameters for Data Set-IV.

N	n	$\bar{Y}$	$\bar{X}$	$S_{y}$	$S_{x}$	$C_{y}$	$C_{x}$
57	30	2.99E+04	1.00E+05	8.54E+04	2.74E+05	2.86	2.72
$M_{y}$	$M_{x}$	$ρ$	$C_{y x}$	$β_{1 (x)}$	$β_{2 (x)}$	f	$λ$
5.20E+03	2.24E+04	0.97	7.53	6.26	43.24	0.53	0.02

6.3 Data Set - V

Using data provided by Miadul (2025), we calculated the required statistical data, as shown in Table 6, where the survival rate from brain tumours serves as the research variable and the tumour development rate is considered a related variable. The calculated PRE values are provided in Table 7.

Table 6.
Parameters for Dataset-V.

N	n	$\bar{Y}$	$\bar{X}$	$S_{y}$	$S_{x}$	$C_{y}$	$C_{x}$
1000	150	71.46	1.55	17.43	0.82	0.24	0.53
$M_{y}$	$M_{x}$	$ρ$	$C_{y x}$	$β_{1 (x)}$	$β_{2 (x)}$	f	$λ$
71.90	1.57	0.03	0.004	−0.03	−1.10	0.15	0.01

Table 7.

PRE Values of Estimators for Data Sets- III, IV and V.

	Data Set-III	Data Set-IV	Data Set-V
Estimators	PRE	PRE	PRE
${\bar{y}}_{1}$	100	100	100
${\bar{y}}_{2}$	80.07	100	158.31
${\bar{y}}_{3}$	68.7	100	103.4
${\bar{y}}_{4}$	97.4	100	96.45
${\bar{y}}_{5}$	95.54	100	93.30
${\bar{y}}_{6}$	5.02E-07	64.51	265.82
${\bar{y}}_{7}$	3.14E-07	89.82	369.26
${\bar{y}}_{8}$	4.76E-07	9.48	257.56
${\bar{y}}_{9}$	8.64E-07	7.58	358.31
${\bar{y}}_{p}$	191.87	100.31	556.72

7 Simulation Study

Along with the suggested estimator, we run a simulation study for every estimator utilised in the study. Using a population size of 100,000 and a sample size of 1000, we simulate the sample data 100,000 times. The average PRE values of these 100,000 simulations are given in Table 8.

Table 8.
PRE Values of Estimators Under Simulation Based on Different Values of the Correlation Coefficient.

	Different Values of the Correlation Coefficient ( $ρ$ )
Estimators	0.5	0.6	0.7	0.8	0.9
${\bar{y}}_{1}$	100	100	100	100	100
${\bar{y}}_{2}$	100.32	100.51	100.46	100.31	100
${\bar{y}}_{3}$	100.16	100.51	100.46	100.31	100.51
${\bar{y}}_{4}$	100	100.13	100	100	100
${\bar{y}}_{5}$	100	100.13	100	100	100
${\bar{y}}_{6}$	271	280.89	289.40	287.72	229.60
${\bar{y}}_{7}$	321.03	317.67	301.38	258.27	156.35
${\bar{y}}_{8}$	158.88	143.08	125.22	102.82	70.36
${\bar{y}}_{9}$	210.07	192.67	172.73	146.43	107.07
${\bar{y}}_{p}$	334.76	322	307.75	290.27	268.39
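A simulation of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the source does not specify the generating distribution, so a bivariate normal population is assumed; the population size, sample size, and number of replications are reduced so the example runs quickly; only ${\bar{y}}_{1}$ and ${\bar{y}}_{p}$ are compared; and $α$ is set to its MSE-minimising value computed from the simulated population. The function name `simulate_pre` and all numeric settings are hypothetical.

```python
import math
import random


def simulate_pre(N=5000, n=100, reps=2000, rho=0.5, seed=1):
    """Empirical PRE of the log-ratio estimator relative to the ratio estimator under SRSWOR."""
    rng = random.Random(seed)
    # Assumed population: bivariate normal with means 50 and standard deviations 10.
    xs, ys = [], []
    for _ in range(N):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(50 + 10 * z1)
        ys.append(50 + 10 * (rho * z1 + math.sqrt(1 - rho**2) * z2))
    X_bar, Y_bar = sum(xs) / N, sum(ys) / N
    s_xx = sum((x - X_bar) ** 2 for x in xs) / (N - 1)
    s_xy = sum((x - X_bar) * (y - Y_bar) for x, y in zip(xs, ys)) / (N - 1)
    # alpha = rho * C_y / C_x - 1, rewritten as (S_xy / S_x^2) * (X_bar / Y_bar) - 1.
    alpha = (s_xy / s_xx) * (X_bar / Y_bar) - 1

    se_ratio = se_log = 0.0
    for _ in range(reps):
        idx = rng.sample(range(N), n)  # simple random sample without replacement
        x_bar = sum(xs[i] for i in idx) / n
        y_bar = sum(ys[i] for i in idx) / n
        r = X_bar / x_bar
        se_ratio += (y_bar * r - Y_bar) ** 2
        se_log += (y_bar * (alpha * math.log(r) + r) - Y_bar) ** 2
    # The number of replications cancels in the ratio of empirical MSEs.
    return 100 * se_ratio / se_log


pre = simulate_pre()
print(pre)
```

Varying `rho` over 0.5 to 0.9 mimics the layout of Table 8, although the numbers depend on the assumed population and will not match the published values.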

8 Interpretation of the Findings

The suggested log-ratio estimator attains the highest Percentage Relative Efficiency (PRE), and hence the lowest Mean Squared Error (MSE), among the ten estimators compared on every dataset. In the empirical studies summarised in Table 3, it achieved PRE values of 101.39 and 101.47 when estimating wheat area from cultivated area and peppermint oil production from field area. This is a modest gain over the classical ratio estimator and a large gain over the median-based estimators ${\bar{y}}_{6}$ to ${\bar{y}}_{9}$, whose PRE values fall below 60 on both datasets. This suggests that the log-ratio estimator uses auxiliary information more effectively than the competing ratio-type estimators. In the real-world applications, which included complicated datasets such as breast cancer deaths based on cancer cases, projected new cancer cases based on estimated fatalities, and survival rates based on brain tumour development, the suggested estimator also performed well. It attained a PRE of 556.72 on the tumour survival rate dataset (Table 7), indicating robustness in highly variable and non-linear medical data. Many standard estimators exhibited considerable decreases in efficiency, with some producing extremely low PREs, implying limited applicability in such complicated circumstances. The log-ratio estimator's logarithmic transformation appears to adapt well to skewed and kurtotic data structures, making it a useful tool for a variety of real-world survey sampling problems. Furthermore, in the simulation exercise, where data were generated and evaluated across 100,000 iterations with a large population size (100,000) and sample size (1,000), the log-ratio estimator outperformed all other estimators at every correlation coefficient examined (Table 8). This provides evidence that it is stable under large-scale sampling and repeated trials.

Overall, the results show that the log-ratio estimator is a non-linear alternative to existing techniques that can provide accurate and efficient estimates in both controlled and real-world contexts. Its consistent advantage in MSE and PRE across empirical, applied, and simulated data indicates that it is a dependable and generalizable method for estimating finite population means, especially in the setting of modern, complex datasets where traditional estimators may fall short.

9 Conclusion

In this paper, we present a log-ratio estimator and obtain its MSE and bias equation up to first-order approximation. A detailed study of empirical, real-world, and simulated datasets indicates that the proposed log-ratio estimator outperforms standard and modified ratio estimators. Its ability to incorporate auxiliary information via a logarithmic transformation allows it to effectively handle complex relationships and distributional anomalies found in modern datasets. The estimator continuously showed lower MSE and higher PRE values, demonstrating its stability, precision, and efficiency. In both controlled empirical settings and real-world cancer and agricultural datasets, it beat rival estimators. The simulation analysis supports its advantages under repeated and large-sample settings.

These findings not only demonstrate the efficiency of the log-ratio estimator but also illustrate the growing importance of nonlinear, data-adaptive techniques in survey sampling. As data complexity grows, so does the demand for estimators that can handle a variety of features without sacrificing performance. Because of its design and proven efficiency, the log-ratio estimator stands out as a viable contender for wider use in modern survey methods. Future research might look into its adaptability in stratified or cluster sampling frameworks, as well as its performance across various domains, to cement its position as an important tool in statistical estimation.

Footnotes

ORCID iD

Faizan Danish

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

American Cancer Society. (2024). Cancer facts & figures 2024. American Cancer Society.

Audu

Yunusa

M. A.

Ishaq

O. O.

Lawal

M. K.

Rashida

Muhammad

A. H.

Muili

J. O.

(2021). Difference-cum-ratio estimators for estimating finite population coefficient of variation in simple random sampling. Asian Journal of Probability and Statistics, 13(3), 13–29. https://doi.org/10.9734/ajpas/2021/v13i330308

Cochran

W. G.

(1940). Some properties of estimators based on sampling schemes with varying probabilities. Journal of Agricultural Science, 30(2), 262–275. https://doi.org/10.1017/S0021859600048012

Gupta

(2025). Global insights into breast cancer. Kaggle .

Javed

Irfan

Shongwe

S. C.

Hussain

M. A.

Meetei

M. Z.

(2025). Difference-cum-exponential-type estimators for estimation of finite population mean in survey sampling. PLoS One, 20(1), e0313712.

Kadilar

Cingi

(2003). Ratio estimators in stratified random sampling. Biometrical Journal, 45(2), 218–225. https://doi.org/10.1002/bimj.200390007

Kadilar

Cingi

(2004). Ratio estimators in simple random sampling. Applied Mathematics and Computation, 151(3), 893–902. https://doi.org/10.1016/S0096-3003(03)00803-8

Kadilar

Cingi

(2007). An improvement in estimating the population mean by using the correlation coefficient. Quality Control and Applied Statistics, 52(4), 443–446.

Khan

Ullah

Al-Hossain

A. Y.

Bashir

(2015). Improved ratio-type estimators using maximum and minimum values under a simple random sampling scheme. Hacettepe Journal of Mathematics and Statistics, 44(4), 923–931.

10.

Koyuncu

Kadilar

(2009). Efficient estimators for the population mean. Hacettepe Journal of Mathematics and Statistics, 38(2), 217–225.

11.

Lakshmi

N. V.

Danish

Alrasheedi

(2025). Enhanced estimation of finite population mean via power and log-transformed ratio estimators using an auxiliary variable in solar radiation data. Journal of Radiation Research and Applied Sciences, 18(2), 101379. https://doi.org/10.1016/j.jrras.2025.101379

12.

Miadul. (2025). Brain tumor dataset. Kaggle.

13.

Ounrittichai

Utha

Choopradit

Chaipitak

(2024). Performance comparison of three ratio estimators of the population ratio in simple random sampling without replacement. International Journal of Analysis and Applications, 22, 121–121. https://doi.org/10.28924/2291-8639-22-2024-121

Prasad (1989). Some improved ratio type estimators of population mean and ratio in finite population sample surveys. Communications in Statistics – Theory and Methods, 18(1), 379–392. https://doi.org/10.1080/03610928908829905

Singh & Kumar (2011). A note on transformations on auxiliary variable in survey sampling. Model Assisted Statistics and Applications, 6(1), 17–19. https://doi.org/10.3233/MAS-2011-0154

Singh, Sharma, & Aloraini (2025). Estimation of population mean using neutrosophic exponential estimators with application to real data. International Journal of Neutrosophic Science, 25(3). https://doi.org/10.54216/IJNS.180301

Singh & Chaudhary, F. S. (1986). Theory and analysis of sample survey designs. John Wiley & Sons.

Singh, G. N., & Rani (2006). Some linear transformations on auxiliary variable for estimating the ratio of two population means in sample surveys. Model Assisted Statistics and Applications, 1(1), 3–7. https://doi.org/10.3233/MAS-2006-1102

Singh, H. P., & Tailor (2003). Use of known correlation coefficient in estimating the finite population mean. Statistics in Transition, 6(4), 555–560.

Singh, H. P., & Tailor (2005). Estimation of finite population mean with known coefficient of variation of an auxiliary variable. Statistica, 65(3), 301–313.

Singh, H. P., Tailor, & Kakran, M. S. (2004). An improved estimator of population mean using power transformation. Journal of the Indian Society of Agricultural Statistics, 58(2), 223–230.

Singh (2003). Advanced sampling theory with applications. Springer.

Sisodia, B. V. S., & Dwivedi, V. K. (1981). A modified ratio estimator using coefficient of variation of auxiliary variable. Journal of the Indian Society of Agricultural Statistics, 33(1), 13–18.

Subramani & Kumarapandiyan (2012a). Estimation of population mean using coefficient of variation and median of an auxiliary variable. International Journal of Probability and Statistics, 1(4), 111–118. https://doi.org/10.5923/j.ijps.20120104.04

Subramani & Kumarapandiyan (2012b). Modified ratio estimator for population mean using median of the auxiliary variable. Proceedings of the National Conference on Recent Developments in the Applications of Reliability Theory and Survival Analysis, 2–3 February.

Triveni, G. R. V., & Danish (2023). Heuristical approach for optimizing population mean using ratio estimator in stratified random sampling. Journal of Reliability and Statistical Studies, 16(1), 137–152.

Triveni, G. R. V., & Danish (2024a). Development of novel separate ratio estimator in stratified random sampling with applications on real data. International Journal of Applied Nonlinear Science, 4(2), 122–131. https://doi.org/10.1504/IJANS.2024.137170

Triveni, G. R. V., & Danish (2024b). Exploring the dependability of combined ratio estimators in stratified ranked set sampling: Insights from COVID-19 data. Alexandria Engineering Journal, 92, 267–272. https://doi.org/10.1016/j.aej.2024.02.051

Upadhyaya, L. N., & Singh, H. P. (1999). Use of transformed auxiliary variable in estimating the finite population mean. Biometrical Journal, 41(5), 627–636. https://doi.org/10.1002/(SICI)1521-4036(199909)41:5<627::AID-BIMJ627>3.0.CO;2-W

Yadav, S. K., Arya, Koc, & Zaman (2024). An efficient family of ratio type estimators for simple random sampling. Journal of Science and Arts, 24(1), 69–94. https://doi.org/10.46939/J.Sci.Arts-24.1-a07

Yadav, S. K., Dixit, M. K., Dungana, H. N., & Mishra, S. S. (2019). Improved estimators for estimating average yield using auxiliary variable. International Journal of Mathematical, Engineering and Management Sciences, 4(5), 1228.

Yan & Tian (2010). Ratio method to the mean estimation using coefficient of skewness of auxiliary variable. In International conference on information computing and applications, part II (pp. 103–110). https://doi.org/10.1007/978-3-642-16339-5_14

Zaman, Bulut, & Yadav, S. K. (2022). Robust ratio-type estimators for finite population mean in simple random sampling: A simulation study. Concurrency and Computation: Practice and Experience, 34(25), e7273. https://doi.org/10.1002/cpe.7273

Table 1. Population Parameters for Data Set-I.

| N | n | $\bar{Y}$ | $\bar{X}$ | $S_{y}$ | $S_{x}$ | $C_{y}$ | $C_{x}$ |
|---|---|---|---|---|---|---|---|
| 34 | 5 | 199.44 | 208.88 | 150.22 | 150.51 | 0.75 | 0.72 |
| $M_{y}$ | $M_{x}$ | $\rho$ | $C_{yx}$ | $\beta_{1}(x)$ | $\beta_{2}(x)$ | $f$ | $\lambda$ |
| 142.5 | 150 | 0.98 | 0.53 | 0.87 | 5.91 | 0.15 | 0.17 |
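The derived entries of Table 1 can be cross-checked against the primary ones. The sketch below assumes the usual survey-sampling definitions, which this table does not spell out: sampling fraction $f = n/N$, $\lambda = (1 - f)/n$, $C_{y} = S_{y}/\bar{Y}$, $C_{x} = S_{x}/\bar{X}$ and $C_{yx} = \rho C_{y} C_{x}$. The variable names are illustrative only.

```python
# Primary parameters of Data Set-I, copied from Table 1.
N, n = 34, 5
Y_bar, X_bar = 199.44, 208.88
S_y, S_x = 150.22, 150.51
rho = 0.98

# Assumed standard definitions (not stated in the table itself).
f = n / N                # sampling fraction
lam = (1 - f) / n        # finite population correction term
C_y = S_y / Y_bar        # coefficient of variation of the study variable
C_x = S_x / X_bar        # coefficient of variation of the auxiliary variable
C_yx = rho * C_y * C_x   # relative covariance term

derived = {"f": f, "lambda": lam, "C_y": C_y, "C_x": C_x, "C_yx": C_yx}
tabulated = {"f": 0.15, "lambda": 0.17, "C_y": 0.75, "C_x": 0.72, "C_yx": 0.53}

for name, value in derived.items():
    print(f"{name}: computed {value:.4f}, tabulated {tabulated[name]}")
```

Under these assumed definitions each computed value agrees with the tabulated one to two decimal places, which suggests the table follows the same conventions.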

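Table 2. Population Parameters for Data Set-II.

| N | n | $\bar{Y}$ | $\bar{X}$ | $S_{y}$ | $S_{x}$ | $C_{y}$ | $C_{x}$ |
|---|---|---|---|---|---|---|---|
| 150 | 40 | 33.46 | 4.21 | 25.50 | 3.08 | 0.76 | 0.73 |
| $M_{y}$ | $M_{x}$ | $\rho$ | $C_{yx}$ | $\beta_{1}(x)$ | $\beta_{2}(x)$ | $f$ | $\lambda$ |
| 25 | 3 | 0.91 | 0.51 | 2.80 | 16.44 | 0.27 | 0.02 |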

Table 4. Parameters for Data Set-III.

| N | n | $\bar{Y}$ | $\bar{X}$ | $S_{y}$ | $S_{x}$ | $C_{y}$ | $C_{x}$ |
|---|---|---|---|---|---|---|---|
| 500 | 100 | 1.94E+04 | 1.04E+05 | 1.14E+04 | 6.11E+04 | 0.59 | 0.59 |
| $M_{y}$ | $M_{x}$ | $\rho$ | $C_{yx}$ | $\beta_{1}(x)$ | $\beta_{2}(x)$ | $f$ | $\lambda$ |
| 1.89E+04 | 1.01E+05 | 1 | 0.34 | 0.07 | −1.09 | 0.2 | 0.01 |

Table 5. Parameters for Data Set-IV.

| N | n | $\bar{Y}$ | $\bar{X}$ | $S_{y}$ | $S_{x}$ | $C_{y}$ | $C_{x}$ |
|---|---|---|---|---|---|---|---|
| 57 | 30 | 2.99E+04 | 1.00E+05 | 8.54E+04 | 2.74E+05 | 2.86 | 2.72 |
| $M_{y}$ | $M_{x}$ | $\rho$ | $C_{yx}$ | $\beta_{1}(x)$ | $\beta_{2}(x)$ | $f$ | $\lambda$ |
| 5.20E+03 | 2.24E+04 | 0.97 | 7.53 | 6.26 | 43.24 | 0.53 | 0.02 |

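Table 6. Parameters for Data Set-V.

| N | n | $\bar{Y}$ | $\bar{X}$ | $S_{y}$ | $S_{x}$ | $C_{y}$ | $C_{x}$ |
|---|---|---|---|---|---|---|---|
| 1000 | 150 | 71.46 | 1.55 | 17.43 | 0.82 | 0.24 | 0.53 |
| $M_{y}$ | $M_{x}$ | $\rho$ | $C_{yx}$ | $\beta_{1}(x)$ | $\beta_{2}(x)$ | $f$ | $\lambda$ |
| 71.90 | 1.57 | 0.03 | 0.004 | −0.03 | −1.10 | 0.15 | 0.01 |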
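To illustrate how tabulated parameters of this kind feed into an MSE and percentage relative efficiency (PRE) comparison, the sketch below applies the textbook first-order approximations for the sample mean and the classical ratio estimator under simple random sampling without replacement: $\lambda \bar{Y}^{2} C_{y}^{2}$ and $\lambda \bar{Y}^{2}(C_{y}^{2} + C_{x}^{2} - 2\rho C_{y} C_{x})$. This is not the paper's log-ratio estimator, and the function names are hypothetical. Because the inputs are rounded to two decimals, the printed figures are only indicative and are not the results reported in the paper.

```python
def mse_sample_mean(lam, y_bar, c_y):
    """First-order variance of the sample mean under SRSWOR."""
    return lam * y_bar**2 * c_y**2


def mse_ratio(lam, y_bar, c_y, c_x, rho):
    """First-order MSE of the classical ratio estimator (textbook formula)."""
    return lam * y_bar**2 * (c_y**2 + c_x**2 - 2 * rho * c_y * c_x)


def pre(mse_reference, mse_candidate):
    """Percentage relative efficiency of the candidate over the reference."""
    return 100 * mse_reference / mse_candidate


# Rounded values copied from the parameter tables above.
# Data Set-III is skipped: with rho rounded to 1 and C_y = C_x, the
# first-order ratio MSE collapses to zero and the PRE is undefined.
datasets = {
    "I": dict(lam=0.17, y_bar=199.44, c_y=0.75, c_x=0.72, rho=0.98),
    "II": dict(lam=0.02, y_bar=33.46, c_y=0.76, c_x=0.73, rho=0.91),
    "IV": dict(lam=0.02, y_bar=2.99e4, c_y=2.86, c_x=2.72, rho=0.97),
    "V": dict(lam=0.01, y_bar=71.46, c_y=0.24, c_x=0.53, rho=0.03),
}

results = {}
for name, p in datasets.items():
    reference = mse_sample_mean(p["lam"], p["y_bar"], p["c_y"])
    candidate = mse_ratio(**p)
    results[name] = pre(reference, candidate)
    print(f"Data Set-{name}: PRE of ratio estimator = {results[name]:.1f}")
```

A PRE above 100 means the ratio estimator improves on the plain sample mean. In this approximation that happens only when $\rho$ exceeds $C_{x}/(2C_{y})$, so the strongly correlated data sets benefit while Data Set-V, with $\rho = 0.03$, does not.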