The Evaluation of Statistical Methods for Estimating the Lower Limit of Detection

Abstract

In estimating a quantitative assay's lower limit of detection (LOD), standard deviation (SD) is the most common measure used to quantify the dispersion of the data, yet this LOD calculation method assumes that the low concentration samples follow a Gaussian distribution, which is not always true in practice. Here, several LOD estimation methods based on different dispersion measures were investigated, and each method's performance was evaluated across various distribution scenarios. Nine methods for LOD estimation that use different measures of data dispersion (SD, mean absolute deviation [MD], median absolute deviation, Gini's mean difference [GMD], percentiles [PCT], Algorithm A, S_n, Q_n, and inter-quartile range) were evaluated using both simulations and real-life datasets. LOD estimates calculated using different variability measures were compared to the true LOD value under different scenarios. A method was judged to be good if it had a relatively stable formula, low bias, a narrower confidence interval, and a confidence interval that covered the true value of LOD at the desired frequency (coverage probability [CP]). First, the nine methods were screened for formula consistency across different distribution scenarios. Methods showing the greatest formula variation were removed from further analysis; the remaining methods were then examined and compared. The GMD-based method had a relatively stable formula and demonstrated the best overall performance, with low bias, a narrower confidence interval, and good CP across all situations. The PCT-based method performed well only if the sample size was large. The MD-based method in general had larger bias than the GMD-based estimator. LOD estimates based on SD, which assume a Gaussian distribution in all scenarios, will often generate poor results. Instead, the GMD-based estimator, whose simple formula makes it easy to use in practice, demonstrated robust performance across varying situations.

Introduction

Analytical sensitivity is an important assay performance characteristic that determines the suitability of the assay's intended use. Usually, sensitivity refers to the assay's ability to generate reliable signal when the amount of measurand is very low; this is critical because detection of small quantities is necessary to define disease states, screen for presence of disease, or reveal the presence of substances such as toxins, contaminants, and drugs. It is also important for assays that measure circulating levels of tumor markers, hormones, infectious disease agents, therapeutic drugs, and other biomarkers where low results separate subjects into different disease or exposure categories.¹

Generally, different measures of detection capability are used to specify the increasing quantitative certainty within the low-end region of the measuring interval. These include the upper boundary on blank sample measurements (the limit of blank or LOB), yes/no detection of the measurand's presence (the limit of detection or LOD), and the minimal measurand amount that can be quantitated reliably with respect to defined accuracy goals (the limit of quantitation or LOQ).^2–9 One or more of these estimates may be necessary to adequately characterize performance in the low-end region of the measuring interval. Based on the Clinical and Laboratory Standards Institute (CLSI) guideline EP17A, the definitions of LOB, LOD, and LOQ used here are as follows: LOB is the highest measurement result that is likely to be observed with a stated probability (1−type I error α) for blank samples; LOD is the lowest concentration of analyte that can be consistently detected in greater than or equal to a proportion (1−type II error β) of samples tested under routine clinical laboratory conditions; LOQ is the lowest amount of measurand in a material that can be quantitatively determined with stated accuracy under stated experimental conditions.⁴ LOB and LOD are objective statistical constructs that are calculated solely on the inherent precision of the measurement procedure. In contrast, the LOQ reflects performance of the measurement procedure versus a pre-established accuracy goal. The focus of this research is LOD. See Figure 1 for graphical illustration of LOB and LOD.

Fig. 1.

An illustration of limit of blank (LOB) and limit of detection (LOD). For the blank sample (left curve), 95% of its measurement results (α=0.05) fall at or below the LOB. For a sample whose measurand content equals the LOD (right curve), 95% of its measurement results (β=0.05) exceed the LOB. The truncated blank sample distribution reflects that some instrument systems suppress measurement results below zero.

LOD is generally determined in one of two ways: (i) statistical estimation based on the observed low sample data; or (ii) empirical testing of serial dilutions of samples with a known concentration of the target substance in the analytical range of the expected detection limit, to find the concentration that meets the pre-specified criteria. Using statistical approaches, the parametric formula for the true, not estimated, value of LOD is as follows:

\begin{align*} LOD = LOB + c_{\beta} \times D_{LOD}, \tag{1} \end{align*}

where LOB is the true LOB, c_β is the multiplicative factor that is associated with the target acceptable error risk of false negatives (β denotes the tolerated false-negative rate, usually β=0.05), and D_LOD is the true dispersion for data generated from samples with concentration of LOD.

Using experimental data from blank samples and low samples, there are different methods to estimate D_LOD as well as the c_β associated with it. For LOD estimation, current methods rely heavily on the assumption that the data of the low samples follow a Gaussian distribution, which is not true in some cases.^10,11 Limpert et al. argued, “Many measurements show a more or less skewed distribution. Skewed distributions are particularly common when mean values are low, variances large, and values cannot be negative, as is the case, for example, with species abundance, lengths of latent periods of infectious diseases, and distribution of mineral resources in the Earth's crust. Such skewed distributions often closely fit the log-normal distribution.”¹² In biology, it is quite common that the data from immunoassays are positive/right skewed, and usually they can be approximated by log-normal distributions.^13,14 Moreover, even when the data are not skewed, they may still not follow a Gaussian distribution. If the data do not follow a Gaussian distribution, the typical method that uses standard deviation (SD), along with c_β determined based on a Gaussian model, may provide poor estimates of the true LOD.¹⁰ Therefore, the interest of this research was to evaluate statistical methods for estimating LOD, using both computer simulations and real-life data, when the distributions of the low samples did not necessarily follow a Gaussian distribution.

Materials and Methods

Assumptions for the Statistical Methods

LOD calculation is a two-step process; it involves using blank samples to calculate the LOB and then using low samples to calculate D_LOD. This investigation focused on the second step, and the LOB was assumed to be fixed (LOB is known or has been verified). Further, this research was interested in the situations where the variation of the low samples was approximately constant over a limited range of low concentrations, which is a reasonable assumption for many assays such as serial dilution and ligand-binding assays.^15,16 If the variation of the low samples is not constant, more complicated methods may be adopted to estimate LOD,⁴ but that is beyond the scope of this research.

Under these assumptions, the LOD estimate can be expressed in a general formula as

\begin{align*} LOD = LOB + c_{\beta} \times D_{low}, \tag{2} \end{align*}

since the variation for the low samples is assumed to be constant: D_LOD = D_low. Therefore, when the LOB is assumed to be fixed, LOD estimation is a matter of determining the dispersion measure D_low and the multiplicative factor c_β. A few statistical methods were investigated through theoretical reasoning, simulation, and real-life data applications. A method was deemed to be superior if (i) it had a simple mathematical formula for D_low, so it would be easy to implement in practice; (ii) its multiplicative factor c_β was easy to determine, preferably a constant; (iii) it demonstrated low bias in estimating the true LOD; (iv) its confidence interval was narrow (low uncertainty) and had high probability to cover the true value; and (v) it showed robust performance across different scenarios.

Statistical Methods for LOD Calculation

When α=0.05, the LOB is estimated by the 95th percentile of the distribution of the blank samples. If the data of the blank samples follow a Gaussian distribution,

\begin{align*} \widehat{LOB} = \hat{\mu}_B + 1.645 \times \hat{\sigma}_B, \end{align*}

where $\hat{\mu}_B$ and $\hat{\sigma}_B$ stand for the estimates of the mean and the SD of the blank sample data, respectively. In the following discussion, the dataset of the low samples is denoted by X and the data values by $x_1, x_2, \ldots, x_N$, where N represents the sample size (the number of low samples times the number of replicate measures for each sample).
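
The Gaussian LOB estimate above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `estimate_lob_gaussian` and the blank readings are hypothetical, and the quantile 1.645 is recovered from the standard normal distribution rather than hard-coded.

```python
import statistics
from statistics import NormalDist

def estimate_lob_gaussian(blank_values, alpha=0.05):
    """Parametric LOB estimate: mean + z_(1-alpha) * SD of the blank data.

    Assumes the blank measurements are approximately Gaussian.
    """
    mu_b = statistics.mean(blank_values)
    sigma_b = statistics.stdev(blank_values)   # sample SD (n - 1 denominator)
    z = NormalDist().inv_cdf(1 - alpha)        # about 1.645 when alpha = 0.05
    return mu_b + z * sigma_b

# Hypothetical blank readings, made up purely for illustration
blanks = [0.10, 0.12, 0.08, 0.11, 0.09, 0.13, 0.07, 0.10]
lob_hat = estimate_lob_gaussian(blanks)
print(round(lob_hat, 4))
```

For non-Gaussian blank data, a nonparametric 95th percentile would replace the parametric formula; that variant is not shown here.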

The statistical methods considered in this research for LOD estimates are summarized in Table 1. More details of the methods are provided in the Supplementary Data (available online at www.liebertonline.com/adt).

Table 1.

Measures of Dispersion ($\widehat{D}_{low}$)

Measure	Formula	Comment
SD	$SD = \sqrt{\frac{1}{n - 1} \sum_{i} (X_i - \bar{X})^2}$	Standard deviation (SD), along with the c_β based on a Gaussian model, is currently the most commonly used method for limit of detection (LOD) estimation.
MD	$MD = \frac{1}{N} \sum_{i=1}^{N} \lvert X_i - \mathrm{med}_j(X_j) \rvert$	Mean absolute deviation (MD) measures the mean of absolute deviation of every observation from the median of the dataset.
MAD	$MAD = \mathrm{med}_i \left( \lvert X_i - \mathrm{med}_j(X_j) \rvert \right)$	Median absolute deviation (MAD) is the median of the absolute deviation of every observation from the median of the dataset.
GMD	$GMD = \frac{1}{\binom{n}{2}} \sum_{i<j} \lvert X_i - X_j \rvert$	Gini's mean difference (GMD) is the expectation of the absolute difference between two random observations.¹⁷
PCT	$D_{(0.05,0.5)} = \widehat{PCT}_{0.50} - \widehat{PCT}_{0.05}$	The percentiles (PCT) method measures D_(0.05,0.5), the difference between the median (50th percentile) and the 5th percentile of the data. The multiplicative factor c_β=1 for this method.
Algorithm A	The estimate of the dispersion, noted as s^*, derived from an iterative process.	This estimator, called Algorithm A, is suggested by ISO-5725. The details of this method can be found in ISO-5725 part 5;¹⁸ they are also provided in the Supplementary Data.
S_n and Q_n	$S_n = c_{Sn} \times 1.1926 \, \mathrm{med}_i \{ \mathrm{med}_j \lvert X_i - X_j \rvert \}$, where $c_{Sn} = \begin{cases} \frac{n}{n - 0.9} & \text{if } n \text{ is odd} \\ 1 & \text{if } n \text{ is even} \end{cases}$; $Q_n = c_{Qn} \times 2.2219 \, \{ \lvert X_i - X_j \rvert ; i < j \}_{(k)}$, where $k = \binom{[n/2] + 1}{2}$ and $c_{Qn} = \begin{cases} \frac{n}{n + 1.4} & \text{if } n \text{ is odd} \\ \frac{n}{n + 3.8} & \text{if } n \text{ is even} \end{cases}$	The S_n and Q_n methods were proposed by Croux and Rousseeuw as alternative nonparametric estimators of the dispersion.^19,20
IQR	IQR = Q₃ − Q₁	The inter-quartile range (IQR) is defined as the distance between the lower quartile (Q₁, the 25th percentile) and the upper quartile (Q₃, the 75th percentile) of the data.

Provided are the formulae for the measurement of the data dispersion $\widehat{D}_{low}$, which are used by the nine methods for estimating LOD: $\widehat{LOD} = \widehat{LOB} + c_{\beta} \times \widehat{D}_{low}$.
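
Several of the dispersion measures in Table 1 are simple enough to sketch directly. The Python below is an illustrative sketch using only the standard library; the function names are hypothetical, the percentile helper uses linear interpolation (one of several conventions, chosen here as an assumption), and Algorithm A, S_n, and Q_n are omitted for brevity.

```python
import statistics
from itertools import combinations

def sd(x):
    """Sample standard deviation (n - 1 denominator)."""
    return statistics.stdev(x)

def md(x):
    """Mean absolute deviation from the median."""
    med = statistics.median(x)
    return sum(abs(v - med) for v in x) / len(x)

def mad(x):
    """Median absolute deviation from the median."""
    med = statistics.median(x)
    return statistics.median([abs(v - med) for v in x])

def gmd(x):
    """Gini's mean difference: mean absolute difference over all pairs i < j."""
    pairs = list(combinations(x, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def percentile(x, p):
    """Percentile by linear interpolation between order statistics (assumed convention)."""
    s = sorted(x)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def pct(x):
    """Distance between the median and the 5th percentile."""
    return percentile(x, 0.50) - percentile(x, 0.05)

def iqr(x):
    """Inter-quartile range, Q3 - Q1."""
    return percentile(x, 0.75) - percentile(x, 0.25)

def lod_estimate(lob, c_beta, d_low):
    """General form: LOD-hat = LOB-hat + c_beta * D-hat_low."""
    return lob + c_beta * d_low

# Toy data, made up for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
results = {f.__name__: f(x) for f in (sd, md, mad, gmd, pct, iqr)}
print(results)
```

The pairwise GMD loop is quadratic in the sample size, which is harmless for the sample sizes discussed here (N up to a few hundred).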

Computer Simulations to Evaluate LOD Calculation Methods

To evaluate and compare the performance of the LOD estimation methods, parametric simulation studies using R 2.10.1 were conducted using a variety of distribution types for the low samples.^21,22 Specifically, scenarios of Gaussian distributions, symmetric-but-not-Gaussian distributions (T-distributions with different degrees of freedom, e.g., T10 stands for T-distribution with 10 degrees of freedom), and asymmetrical distributions (log-normal distributions with different levels of skewness) were considered.

All of the methods have the form

\begin{align*} \widehat{LOD} = \widehat{LOB} + c_{\beta} \times \widehat{D}_{low} \tag{3} \end{align*}

When $\widehat{LOB}$ was assumed to be fixed, the estimate $\widehat{LOD}$ was thus dependent upon the multiplicative factor c_β and the estimate of the dispersion $\widehat{D}_{low}$ for the low sample data. In order for a statistical method to be widely used, the multiplicative factor c_β has to be easily determined. This is possible if a closed formula exists (as in the SD-based method for Gaussian distributions), or if the factor is constant or changes little as the data change.

Therefore, the first undertaking of this study was to investigate the behavior of the multiplicative factor c_β in the various methods. Based on simulation results across Gaussian, T20, T10, T5, and T3 distributions, those methods with variable c_β values were subsequently removed from further investigation. The simulation was conducted as follows: the type I and type II errors were assumed to be α=β=0.05. For a specific distribution, the LOB was assumed to be fixed, that is, $\widehat{LOB} = 1$; the true LOD value can then be determined theoretically as LOD = 1 + D_(0.05,0.5), where D_(0.05,0.5) is the distance between the 5th percentile and the 50th percentile of the distribution. For example, when the low samples are assumed to follow a standard Gaussian distribution, the true LOD is LOD = 1 + 1.645 × 1 = 2.645. The true LOD values for all distributions/scenarios are provided in Table 4. In the simulations, it was assumed that the data consisted of one low sample, which had 100 replicate measures. For each simulation run under a given distribution, data (N=100) were simulated from the distribution such that the center of the distribution equaled the true LOD. Let $\widehat{D}_k^{(i)}$ $(k = 1, 2, \ldots, 9)$ denote the dispersion of the low sample based on the kth method from the ith simulation run. The multiplicative factor c_β for the kth method determined from the ith simulation run is

\begin{align*} c_{\beta_k}^{(i)} = \frac{LOD - \widehat{LOB}}{\widehat{D}_k^{(i)}}, \tag{4} \end{align*}

where LOD is the value of the true LOD and $\widehat{LOB} = 1$. Ten thousand runs were conducted for each distribution; the median of the $c_{\beta_k}^{(i)}$ $(i = 1, 2, \ldots, 10{,}000)$ is reported as the c_β for the kth method under the current distribution. The methods showing high variation in c_β values across different distributions were eliminated from further investigation.
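
The screening simulation for c_β can be sketched as follows for the SD-based method under a standard Gaussian scenario. This is a rough Python illustration rather than the R code used in the study: the function name `simulate_c_beta`, the seed, and the reduced number of runs (1,000 instead of 10,000, to keep the example fast) are all assumptions.

```python
import random
import statistics

def simulate_c_beta(dispersion, draw, true_lod, lob=1.0, n=100, runs=1000, seed=0):
    """Median over runs of (LOD - LOB) / D-hat, following Equation (4).

    dispersion: function mapping a sample to its dispersion estimate
    draw:       function taking a random.Random and returning one observation
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(runs):
        sample = [draw(rng) for _ in range(n)]
        ratios.append((true_lod - lob) / dispersion(sample))
    return statistics.median(ratios)

# Standard Gaussian scenario: LOB fixed at 1, true LOD = 1 + 1.645 * 1
true_lod = 1.0 + 1.645

# Low-sample data are centered at the true LOD with unit SD
c_sd = simulate_c_beta(
    dispersion=statistics.stdev,
    draw=lambda rng: rng.gauss(true_lod, 1.0),
    true_lod=true_lod,
)
print(round(c_sd, 2))
```

Under these assumptions the result should fall roughly around 1.65, in line with the Gaussian entry for SD in Table 2. Swapping in another dispersion function or another sampling distribution would reproduce other cells of that table in the same way.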

The remaining methods are thus those whose multiplicative factor c_β is relatively constant; a best c_β value is determined as a fixed value for each method based on further investigation. With the formula fixed (i.e., a fixed value was determined for c_β), each LOD estimation method was then evaluated for performance based on bias, the width of its confidence interval, and how often this confidence interval covers the true LOD (coverage probability [CP]) through simulations on broader sampling scenarios. Specifically, Gaussian, T-distribution with different degrees of freedom, and log-normal distributions with different levels of skewness were considered. In addition, for each distribution, the performance was also investigated when the number of low samples varied. Ten thousand runs were carried out for each distribution scenario, and the LOD estimate was calculated for each method as described below. For a given method and a given distribution, let $\widehat{D}_{low}$ denote the median of the 10,000 estimates, and let D_low denote the true value. Let $c_{\beta}^{ll}$ and $c_{\beta}^{ul}$ be the lower and upper limits of the multiplicative factors, such that $c_{\beta}^{ll} \times \widehat{D}_{low}$ and $c_{\beta}^{ul} \times \widehat{D}_{low}$ correspond to the 2.5th percentile and the 97.5th percentile of the 10,000 D_low estimates, respectively. A confidence interval for each simulated sample (though for the Gaussian distribution, a theoretical formula exists for the SD-based method) had the form [$c_{\beta}^{ll} \times \widehat{D}_{low}^{i}$, $c_{\beta}^{ul} \times \widehat{D}_{low}^{i}$], where $\widehat{D}_{low}^{i}$ is the D_low estimate from the ith simulated sample. The width of this interval reflected the uncertainty for the method under consideration: for a given confidence level, the narrower the better. Each method was also evaluated for the CP of its confidence interval, which was calculated as the percentage of times the confidence interval covered the true D_low value. Since the confidence intervals were constructed for the 95% confidence level, a CP close to 95% demonstrated high accuracy. The bias of the method was calculated as $(\widehat{D}_{low} - D_{low}) / D_{low}$.
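
The bias, interval limits, and CP defined above can be computed from a set of simulated dispersion estimates as sketched below. This Python example follows a literal reading of the description and is only an illustration: the helper names, the linear-interpolation percentile, the seed, and the use of 1,000 runs with the SD-based estimator under a standard Gaussian (true D_low = 1) are assumptions.

```python
import random
import statistics

def percentile(values, p):
    """Percentile by linear interpolation between order statistics (assumed convention)."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def evaluate(estimates, true_d):
    """Relative bias, interval multipliers, and coverage probability."""
    med = statistics.median(estimates)
    # Multipliers chosen so c_ll * median and c_ul * median hit the 2.5th / 97.5th percentiles
    c_ll = percentile(estimates, 0.025) / med
    c_ul = percentile(estimates, 0.975) / med
    # Interval for run i is [c_ll * D_i, c_ul * D_i]; count how often it contains the true value
    covered = sum(1 for d in estimates if c_ll * d <= true_d <= c_ul * d)
    return {
        "bias": (med - true_d) / true_d,
        "c_ll": c_ll,
        "c_ul": c_ul,
        "coverage": covered / len(estimates),
    }

rng = random.Random(1)
# 1,000 simulated SD estimates, each from N = 100 standard Gaussian draws
estimates = [
    statistics.stdev([rng.gauss(0.0, 1.0) for _ in range(100)])
    for _ in range(1000)
]
summary = evaluate(estimates, true_d=1.0)
print(summary)
```

The ratio c_ul / c_ll gives a scale-free view of interval width, which may be convenient when comparing methods whose dispersion measures are on different scales.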

Real-Life Data to Evaluate LOD Calculation Methods

The first dataset chosen for the real-life applications was supplied by CLSI EP17A and consisted of measures of total mercury in blood (μg/L).⁴ There were 117 measurements on a variety of blank samples, and the LOB was calculated by CLSI and claimed to be 0.239 μg/L based on the nonparametric percentile (PCT) method. The low sample data comprised 15 subjects with low levels of mercury in the blood. For each subject, the laboratory had obtained a series of 20 samples over a period of at least 4 days. A subset of 4 subjects from the original 15 had the raw data provided in EP17A, and this subset was used in this analysis. The raw data of the four subjects are provided in Supplementary Table S1 and Supplementary Figure S1.

The second real-life dataset was from a fluorescence resonance energy transfer (FRET) assay to detect protein marker X activity in cellular supernatant of cells derived from a fresh patient tumor. Marker X is believed to be minimally expressed by normal cells and overexpressed by tumor cells. Equal numbers of live cells were separately derived from a portion of the normal tissue and a portion of the tumor tissue dissected from a live tumor of a breast cancer patient. The two sources of cells were then put into equal volumes of media and grown for 96 h. Replicates of equal volume of the cellular supernatant from the tumor cells were drawn and measured for the activity of marker X using the FRET activity assay; the same was done for the cellular supernatant from the cells from the normal tissue. Twenty of the tumor replicates (low sample) and 18 of the normal replicates (blank sample) provided valid assay results.

Results

Screening of Statistical Methods for the Behavior of c_β

The medians of the 10,000 estimates of c_β, $(c_{\beta_k}^{(1)}, c_{\beta_k}^{(2)}, \ldots, c_{\beta_k}^{(10{,}000)})$, for each of the eight methods are reported in Table 2 (c_β=1 for the method based on PCT). It is evident that the c_β values were greatly affected by the shape of the distribution. The c_β values for methods based on median absolute deviation (MAD), Algorithm A, S_n, Q_n, and inter-quartile range (IQR) were particularly variable across the distributions examined when compared to the variability of c_β values produced by methods based on mean absolute deviation (MD), Gini's mean difference (GMD), and SD. Consequently, LOD estimation methods based on MAD, Algorithm A, S_n, Q_n, and IQR were removed from further investigation in this study.

Table 2.

c_β Values for Each Limit of Detection Estimation Method Across Various Symmetrical Distributions

Distribution	MD	MAD	Algorithm A	GMD	S_n	Q_n	SD	IQR
T3	2.17	3.13	2.04	1.44	1.97	1.91	1.48	1.58
T5	2.17	2.84	1.88	1.48	1.83	1.79	1.61	1.43
T10	2.12	2.64	1.76	1.47	1.73	1.71	1.64	1.33
T20	2.11	2.57	1.73	1.47	1.69	1.68	1.65	1.29
Gaussian	2.09	2.49	1.68	1.46	1.66	1.65	1.66	1.25

This table reports the c_β values (β=0.05) produced by each method in simulations for symmetric-but-not-Gaussian and Gaussian distributions. Here the name of the dispersion measure is used to refer to the LOD estimation method that is based on it. The more a method's c_β values vary across different distributions, the less stable the method's formula and the more difficult it would be to implement.
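
One simple way to summarize the stability of c_β in Table 2 is the range (maximum minus minimum) of each column. The Python sketch below uses the table values as given; the choice of the range as the stability metric is an assumption made for illustration and is not stated as the criterion used in the study.

```python
# c_beta values from Table 2, ordered T3, T5, T10, T20, Gaussian
c_beta = {
    "MD":          [2.17, 2.17, 2.12, 2.11, 2.09],
    "MAD":         [3.13, 2.84, 2.64, 2.57, 2.49],
    "Algorithm A": [2.04, 1.88, 1.76, 1.73, 1.68],
    "GMD":         [1.44, 1.48, 1.47, 1.47, 1.46],
    "S_n":         [1.97, 1.83, 1.73, 1.69, 1.66],
    "Q_n":         [1.91, 1.79, 1.71, 1.68, 1.65],
    "SD":          [1.48, 1.61, 1.64, 1.65, 1.66],
    "IQR":         [1.58, 1.43, 1.33, 1.29, 1.25],
}

# Range of c_beta across the five distributions for each method
spread = {method: max(vals) - min(vals) for method, vals in c_beta.items()}

# Methods ordered from most stable (smallest range) to least stable
ranked = sorted(spread, key=spread.get)
for method in ranked:
    print(f"{method:12s} range = {spread[method]:.2f}")
```

By this metric the three smallest ranges belong to GMD, MD, and SD, which matches the set of methods retained for further investigation.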

Determining the Best c _β Values for the Remaining Methods

Besides the shape of the distribution, the c _β values also depended on the number of low samples and the number of replicates for each sample. A grid search was conducted to investigate the behavior of c _β, and to find its best value, for the MD-based and GMD-based methods under different numbers of low samples (one and four), varying numbers of replicates for each low sample (total $N = 20, 40, 60, \ldots, 200$), and a variety of distribution types (T3, T5, and Gaussian). Other numbers of low samples were also investigated for each distribution, and the conclusion remained the same (data not shown); therefore, only the results for one low sample and four low samples are shown here. For each scenario, the c _β value was determined as the value for which the LOD estimate, defined as the median of the 10,000 iteration-level estimates of LOD (each calculated from $\widehat{LOD} = \widehat{LOB} + c_{\beta} \times \widehat{D}_{low}$), had the lowest bias. The best c _β was then taken to be a fixed value that was a compromise among the c _β values from all scenarios. See Supplementary Figures S2 and S3, Supplementary Tables S2 and S3, and accompanying discussion in the Supplementary Data for illustrations of this simulation procedure and results using the T5 distribution as an example.

When there were multiple low samples, SD was calculated as the square root of the pooled variance; MD and GMD were calculated as the average of the dispersions of the individual low samples; PCT was calculated from the pooled data of the median-adjusted individual samples (i.e., the individual samples were shifted so that they all had the same median).
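The pooling rules for multiple low samples can be sketched as follows. This is an illustrative reading rather than the authors' implementation: it assumes the pooled variance is weighted by degrees of freedom, that MD is the mean absolute deviation about the sample mean, and that the 5th percentile is taken by linear interpolation between order statistics. All function names are hypothetical.

```python
import math
import statistics


def pooled_sd(samples):
    """Square root of the pooled variance (weighted by n_i - 1; an assumption)."""
    numerator = sum((len(s) - 1) * statistics.variance(s) for s in samples)
    denominator = sum(len(s) - 1 for s in samples)
    return math.sqrt(numerator / denominator)


def mean_abs_dev(x):
    """Mean absolute deviation about the sample mean (assumed definition of MD)."""
    center = statistics.fmean(x)
    return statistics.fmean([abs(v - center) for v in x])


def average_dispersion(samples, dispersion):
    """Average of a dispersion measure computed separately on each low sample."""
    return statistics.fmean([dispersion(s) for s in samples])


def pct_distance(samples, beta=0.05):
    """Median minus the beta-quantile of the pooled, median-centered data."""
    # Shift every sample so that its median is zero, then pool.
    pooled = sorted(v - statistics.median(s) for s in samples for v in s)
    position = beta * (len(pooled) - 1)
    lower = int(math.floor(position))
    upper = min(lower + 1, len(pooled) - 1)
    fraction = position - lower
    quantile = pooled[lower] + fraction * (pooled[upper] - pooled[lower])
    return statistics.median(pooled) - quantile


# Hypothetical low samples, for illustration only
low_samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
print(pooled_sd(low_samples))                         # square root of 2.5
print(average_dispersion(low_samples, mean_abs_dev))  # average of 2/3 and 4/3
```

With equal replicate counts per sample, as in the scenarios described here, the degrees-of-freedom weighting reduces to a plain average of the variances.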

Taking the T5 distribution as an example (Table 3): for the GMD-based method, when there was only one low sample with N=20, the c _β value was 1.49; when N=40, 60, or 80, the c _β value was 1.47; when N=100 or 200, the c _β value was 1.46. The c _β values remained unchanged for the GMD-based method when there were four low samples. For the MD-based method, when there was only one low sample with N=20, the c _β value was 2.24; when N=40 or 60, the c _β value was 2.17; when N=80 or 100, the c _β value was 2.15; when N=200, the c _β value was 2.14. However, when there were four low samples (with the same total N), the c _β values for the MD-based method were quite different from those under one low sample, in contrast to the GMD-based method. (This is why the biases differed between the one-sample and four-sample scenarios for the MD-based method in the simulations.)

Table 3 reports the c _β values for GMD- and MD-based methods in situations with one low sample or four low samples, each evaluated across the different distributions and various numbers of replicates (total N=20 to 200, in increments of 20). The c _β values for the GMD-based method were all close to 1.46; therefore, 1.46 was chosen as the best c _β value for this method (fixed for all situations); that is, the LOD estimate based on the GMD method has the fixed formula:

\begin{align*} \widehat{LOD}_{GMD} = \widehat{LOB} + 1.46 \times GMD \tag{5} \end{align*}
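Formula (5) is simple enough to compute directly. The sketch below assumes the usual definition of GMD as the average absolute difference over all distinct pairs of replicates; the function names and the replicate readings are hypothetical, while LOB=0.239 and the average GMD of 0.12222 are the values reported for the mercury dataset in the Real-Life Data Applications section.

```python
from itertools import combinations


def gini_mean_difference(values):
    """Average absolute difference over all distinct pairs of replicates."""
    pairs = list(combinations(values, 2))
    if not pairs:
        raise ValueError("need at least two replicates")
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def lod_gmd(lob, gmd_value, c_beta=1.46):
    """GMD-based LOD estimate with the fixed multiplier from formula (5)."""
    return lob + c_beta * gmd_value


# Hypothetical replicate readings of one low sample, for illustration only
replicates = [0.31, 0.42, 0.28, 0.39, 0.35, 0.47]
print(lod_gmd(0.239, gini_mean_difference(replicates)))

# Reported inputs for the mercury data: LOB = 0.239, average GMD = 0.12222
print(round(lod_gmd(0.239, 0.12222), 4))  # 0.4174
```

The pairwise loop is quadratic in the number of replicates, which is not a concern at the sample sizes considered here (N up to 200).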

Table 3.

c _β Values for Mean Absolute Deviation and Gini's Mean Difference over Three Distributions, with Different Sample Numbers and Varying Numbers of Sample Replicates

	MD's c _β values with smallest bias (1 sample)			GMD's c _β values with smallest bias (1 sample)			MD's c _β values with smallest bias (4 samples)			GMD's c _β values with smallest bias (4 samples)
N	T3	T5	G	T3	T5	G	T3	T5	G	T3	T5	G
20	2.29	2.24	2.16	1.50	1.49	1.47	>2.40	>2.40	>2.40	1.49	1.49	1.47
40	2.22	2.17	2.11	1.47	1.47	1.46	2.31	2.29	2.23	1.46	1.47	1.46
60	2.19	2.17	2.09	1.46	1.47	1.46	2.27	2.25	2.19	1.46	1.47	1.46
80	2.18	2.15	2.09	1.45	1.47	1.46	2.23	2.21	2.15	1.45	1.47	1.46
100	2.17	2.15	2.08	1.44	1.46	1.46	2.21	2.19	2.13	1.44	1.46	1.46
200	2.15	2.14	2.07	1.44	1.46	1.46	2.17	2.15	2.10	1.44	1.46	1.46

The c _β values (β=0.05) detailed here illustrate c _β behavior for MD and GMD when the number of samples changes from 1 to 4 and the total number of replicates ranges from 20 to 200.

N, total number of replicates; G, Gaussian distribution.

The c _β values produced by the MD-based method showed relatively larger variation than those produced by the GMD-based method. As a compromise, 2.19 was chosen as the best c _β for the MD-based method since it resulted in relatively low biases across different situations; that is, the MD-based method has the fixed formula:

\begin{align*} \widehat{LOD}_{MD} = \widehat{LOB} + 2.19 \times MD \tag{6} \end{align*}
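As a rough sanity check on the Gaussian columns of Table 3, the large-sample value of c _β can be worked out by hand. The derivation below is a sketch under stated assumptions: the low sample is Gaussian with standard deviation σ, MD is the mean absolute deviation about the mean, GMD is the mean absolute difference over all pairs, and finite-sample effects are ignored.

```latex
\begin{align*}
c_{\beta} &\approx \frac{z_{1-\beta}\,\sigma}{E[D_{low}]}, \qquad z_{0.95} = 1.6449 \\
E[GMD] &= \frac{2\sigma}{\sqrt{\pi}} \;\Rightarrow\; c_{\beta}^{GMD} \approx 1.6449 \times \frac{\sqrt{\pi}}{2} \approx 1.458 \\
E[MD] &= \sigma\sqrt{\frac{2}{\pi}} \;\Rightarrow\; c_{\beta}^{MD} \approx 1.6449 \times \sqrt{\frac{\pi}{2}} \approx 2.062
\end{align*}
```

These limits sit close to the N=200 Gaussian entries in Table 3 (1.46 for GMD and 2.07 for MD). The chosen MD multiplier of 2.19 lies above the Gaussian limit, which is consistent with its role as a compromise that also covers the heavier-tailed and small-N scenarios.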

Method Performance Comparison Based on Simulations

Simulations under different scenarios were then conducted to compare the four screened LOD estimation methods, based on SD, MD, PCT, and GMD, respectively. The symmetric distributions considered included the Gaussian distribution and T-distributions with 3, 5, 10, and 20 degrees of freedom. The asymmetric distributions considered included log-normal (0, 0.05), with a skewness of 0.15, and log-normal (0, 0.25), with a high skewness of 0.8. Table 4 summarizes the simulation results of 10,000 runs for one low sample with N=60 replicates (more comprehensive simulation results are provided in Supplementary Tables S4 and S5).

Table 4.

Method Comparison Across Different Scenarios

Distribution	True LOD		SD	PCT	MD	GMD
Gaussian	2.645	Bias	0%	−5.9%	4.9%	0%
		95% CI	(0.82, 1.19)	(0.70, 1.37)	(0.82, 1.20)	(0.82, 1.19)
		CP	95%	92%	93%	95%
T20	2.725	Bias	−0.2%	−6.0%	4.3%	−0.2%
		95% CI	(0.82, 1.21)	(0.70, 1.39)	(0.81, 1.21)	(0.82, 1.20)
		CP	95%	92%	94%	95%
T10	2.812	Bias	−0.5%	−6.1%	3.4%	−0.4%
		95% CI	(0.83, 1.19)	(0.69, 1.42)	(0.80, 1.22)	(0.80, 1.22)
		CP	94%	93%	95%	95%
T5	3.015	Bias	1.9%	−7.3%	1.5%	−0.7%
		95% CI	(0.78, 1.37)	(0.67, 1.51)	(0.80, 1.26)	(0.79, 1.28)
		CP	93%	93%	94%	95%
T3	3.353	Bias	7.2%	−8.1%	0.6%	0.4%
		95% CI	(0.72, 1.81)	(0.64, 1.66)	(0.76, 1.37)	(0.75, 1.44)
		CP	87%	95%	95%	94%
LogN(0,0.05)	1.079	Bias	3.9%	−5.3%	9.8%	4.3%
		95% CI	(0.82, 1.19)	(0.70, 1.35)	(0.81, 1.20)	(0.82, 1.19)
		CP	94%	92%	88%	94%
LogN(0,0.25)	1.337	Bias	26.2%	−4.5%	30.7%	24.8%
		95% CI	(0.80, 1.25)	(0.71, 1.33)	(0.80, 1.23)	(0.81, 1.23)
		CP	46%	92%	32%	46%

Comparison of each method's bias, the width of its 95% CI (β=0.05), and the frequency with which the CIs cover the true LOD value (CP), obtained by computer simulation across Gaussian, T-, and log-normal distributions. Representative evaluation criteria for one low sample with 60 replicates are listed for each method.

CI, confidence interval; CP, coverage probability.

Symmetric-but-not-Gaussian distributions

The SD-based method performed poorly when the low sample distribution was symmetric but not Gaussian: it demonstrated relatively large bias and a wider confidence interval (high uncertainty). MD- and GMD-based methods performed well with symmetric distributions. The only exception was the extremely heavy-tailed T3 distribution, for which the confidence interval of the GMD-based method was slightly wider. Overall, the confidence intervals for GMD- and MD-based methods had almost the same width. The GMD-based method demonstrated very low bias, unlike the MD-based method, which demonstrated high bias in some cases. When the low sample data followed a Gaussian distribution, the GMD-based method was comparable to the SD-based estimator (the optimal method for Gaussian distributions), whereas the MD-based estimator was noticeably biased.

Asymmetric distributions

For the asymmetric distributions, exemplified by a log-normal model, GMD- and SD-based methods performed well when the distribution was moderately skewed. The two methods showed similar performance in terms of bias and uncertainty, while the MD-based method produced very large bias; the PCT-based method also displayed low bias, but it required a large sample size to reduce its uncertainty. When the distribution was highly skewed, all four methods performed poorly; however, the GMD-based method was relatively better: its confidence interval was narrower and its bias was smaller than those of the MD-based and SD-based methods. Overall, if the low sample data were approximately log-normal, GMD- and SD-based methods performed better than the other two methods when the sample size was not large.

Sample size

The percentile method generated large uncertainty and large bias if the sample size was small. When the sample size was large, the percentile method performed well. The uncertainty of the GMD-based and the MD-based methods was almost the same. The MD-based method was heavily affected by the number of low samples (in these simulations, one low sample and four low samples were considered), whereas the GMD-based method showed little variation with respect to the number of low samples. The GMD-based method yielded good coverage probability (94%–95%), as opposed to the MD-based method, which had relatively lower coverage probability. This difference persisted even when the sample size was large, because the GMD-based method had less bias than the MD-based method.

Real-Life Data Applications

For the analysis of the mercury level in blood, the LOB was reported as 0.239 in EP17A, which was considered fixed in this analysis. The raw data of the four subjects were used as the four low samples (Supplementary Table S1 and Supplementary Fig. S1). The Cochran test showed no significant differences in variation among the four subjects (P value=0.1459),²³ so the variation was assumed to be constant in the low sample data. The Shapiro–Wilk test indicated that the four subjects' data were approximately Gaussian (P value=0.06329, 0.06277, 0.1608, and 0.2391 for the data of subjects 1, 2, 3, and 4, respectively)²⁴; therefore, the SD-based method was justified. Table 5 reports the estimated LOD values based on the four methods. For example, based on the 80 observations in the four subjects, the pooled SD was $\widehat{SD}_{pool} = 0.1076$. The number of degrees of freedom used to estimate the pooled SD was 76. The LOD estimate based on SD was

\begin{align*} \widehat{LOD}_{SD} = 0.239 + 1.6449 \times \frac{1}{1 - \frac{1}{4 \times 76}} \times 0.1076 = 0.4165 \end{align*}
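The arithmetic in the SD-based estimate can be reproduced with a few lines. This sketch simply mirrors the displayed formula, with the Gaussian quantile for β=0.05 and the small-sample correction factor 1/(1 − 1/(4·df)); the function name `lod_sd` is hypothetical.

```python
from statistics import NormalDist


def lod_sd(lob, pooled_sd, df, beta=0.05):
    """SD-based LOD: LOB + z_(1-beta) / (1 - 1/(4*df)) * pooled SD."""
    z = NormalDist().inv_cdf(1.0 - beta)      # about 1.6449 for beta = 0.05
    correction = 1.0 / (1.0 - 1.0 / (4.0 * df))
    return lob + z * correction * pooled_sd


# Reported inputs for the mercury data: LOB = 0.239, pooled SD = 0.1076, df = 76
estimate = lod_sd(0.239, 0.1076, 76)
print(f"{estimate:.4f}")
```

With these rounded inputs the result is about 0.4166, which agrees with the reported 0.4165 up to rounding of the pooled SD.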

Table 5.

Comparison of the Estimated Limit of Detection Values in the Real-Life Datasets

		Data 1			Data 2
Method	c _β	LOB	D_low	LOD	LOB	D _low	LOD
SD	1.65	0.239	SD=0.1076	0.4165	6,244.9	SD=1,894.01	9,360.56
GMD	1.46	0.239	GMD=0.1222	0.4174	6,244.9	GMD=1,702.78	8,730.98
MD	2.19	0.239	MD=0.085	0.4258	6,244.9	MD=1,070.38	8,599.75
PCT	1	0.239	D _(0.05,0.5)=0.1994	0.4384	6,244.9	D _(0.05,0.5)=952.18	7,197.10

Assuming the LOB values are fixed for both datasets, the table lists the LOD estimates (β=0.05) based on the four methods. For data 1, assuming the low sample data follow Gaussian distributions, the true LOD value is 0.4165; for data 2, assuming the low sample data follow a log-normal distribution, the true LOD value is 7,630.53.

LOB, limit of blank.

For the GMD-based method, the average GMD value from the four subjects was 0.12222, so the estimate of LOD was

\begin{align*} \widehat{LOD}_{GMD} = 0.239 + 1.46 \times 0.12222 = 0.4174 \end{align*}

The LOD estimates based on MD and PCT are 0.4258 and 0.4384, respectively. Since the data were approximately Gaussian, the SD-based method was expected to provide a good estimate of LOD. The estimate produced by the GMD-based method was very similar to that produced by the SD-based method. The estimates based on the other two methods were slightly higher.

For the FRET assay data (data 2), the fitted log-normal distribution for the normal sample data had log-scale mean and SD of μ=7.1788 and σ=0.9657; the fitted log-normal distribution for the tumor sample data had log-scale mean and SD of μ=7.886 and σ=0.4473. The Shapiro–Wilk normality test was performed separately on the log-transformed data from the blank sample (normal tissue) and the tumor tissue, and the P values were 0.8558 and 0.0918, respectively. The LOB calculated from the normal sample was LOB=6,244.915, which was then assumed to be fixed. Assuming the low sample data indeed follow a log-normal distribution, the true LOD should be calculated as the LOB plus the distance between the 5th percentile and the 50th percentile of the true distribution LogNormal(7.886, 0.4473). This resulted in a true LOD of 7,630.53. Table 5 shows that the SD-, MD-, and GMD-based methods are all positively biased; this is because the large value of σ=0.4473 makes the log-normal curve heavy-tailed and asymmetric (this agrees with the simulation results for the highly skewed log-normal distribution). Relatively speaking, the GMD-based method is better than the SD-based method. The PCT-based method generated an LOD estimate that is close to the true value. Note that the actual true LOD is unknown: the P value of 0.0918 for the Shapiro–Wilk normality test on the tumor data suggests that the log-normal assumption holds only approximately, so the true LOD might differ somewhat from 7,630.53.
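The "true LOD" under the log-normal assumption follows directly from the fitted parameters. The sketch below assumes that μ and σ are the mean and SD on the natural-log scale, so the median is exp(μ) and the 5th percentile is exp(μ + z₀.₀₅σ); the function name is hypothetical.

```python
import math
from statistics import NormalDist


def true_lod_lognormal(lob, mu, sigma, beta=0.05):
    """LOB plus the distance between the median and the beta-quantile
    of a log-normal distribution with log-scale parameters mu and sigma."""
    z = NormalDist().inv_cdf(beta)          # negative, about -1.6449
    median = math.exp(mu)
    lower_quantile = math.exp(mu + z * sigma)
    return lob + (median - lower_quantile)


# Reported inputs for the FRET data: LOB = 6,244.915, mu = 7.886, sigma = 0.4473
value = true_lod_lognormal(6244.915, 7.886, 0.4473)
print(f"{value:.1f}")
```

With the rounded parameters quoted in the text, this lands within a fraction of a unit of the reported 7,630.53; the small gap is presumably due to rounding of μ and σ.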

Discussion

This study assessed several statistical methods for LOD estimation across various scenarios. After initial screening based on the stability of the c _β value, four methods (based on SD, MD, GMD, and PCT) remained the focus of this investigation. Among the four methods, the GMD-based method demonstrated good performance across a range of scenarios (distribution shapes, numbers of low samples, and total sample sizes). Although each method depended on a variety of factors, the GMD-based method showed consistently good performance with a relatively stable formula (nearly invariant c _β values) across all scenarios; this stability is an important strength because it makes the method easy to implement. The investigation in this research assumed that β=0.05; the c _β values differ for other choices of β. For the GMD-based method, the c _β values for β=(0.01, 0.025, 0.05, 0.1) are c _β=(2.15, 1.76, 1.46, 1.12), respectively.

The major advantage of GMD as a measure of data dispersion is that it does not depend on a measure of location, and it is therefore superior to location-dependent measures when the distributions are asymmetric. GMD attaches the largest weight to the section closest to the median, and the weights then decline symmetrically the farther the section is from the median.¹⁹ Therefore, it is still affected by outliers, though not as much as other methods. The MD-based method was generally superior to the SD-based method for non-Gaussian distributions, but both MD and SD as measures of data dispersion depend on the location of the data and are adversely affected by outliers. Still, when the low-level sample was assumed to be symmetric but not necessarily Gaussian, the MD-based LOD estimator performed well. When the distribution changed from Gaussian to extremely heavy-tailed and the sample size changed, the change in c _β value was moderate for this method. The MD-based estimator demonstrated relatively lower uncertainty (narrower confidence interval) than the PCT-based method when the sample size was small, yet the MD-based estimator generally exhibited larger bias than the GMD-based estimator. In comparison, the PCT-based method was simple, easy to calculate, and not affected by the shape of the distribution; however, the PCT-based estimate had higher uncertainty (wider confidence interval), so it requires a large sample size to achieve a reliable estimate.

In this study, LOB was considered to be a fixed value so that the work could focus on LOD. LOB may be considered as a random variable in future research studies. Linnet and Kondratovich's 2004 publication considered the random blank samples when LOD was discussed.¹⁰ In that case, the heteroscedasticity between the blank sample and the low concentration sample was an important factor for consideration.¹⁰ Another factor that may need to be revisited in the future is the constant variation in the lower range of concentration. In this study, a constant variation was assumed to be true, and in most real-life assays, this is an appropriate assumption. However, when variation in low sample concentrations cannot be assumed to be constant, some other approaches to estimate LOD need to be proposed.

In general, the GMD-based method produced good estimates of LOD and maintained a relatively invariant multiplicative factor throughout the scenarios examined here, making it an effective and easy method to use in practice. The MD-based method yielded better estimates of LOD than the SD-based method did if the distributions were symmetric but not necessarily Gaussian. It is not surprising that the PCT method performs well. By definition, the LOD is the level at which 5% of the low sample data fall below the LOB. The PCT-based method identifies the 5% quantile and the 50% quantile of the low sample data and, in effect, places the 5% quantile at the LOB and takes the 50% quantile as the LOD. Therefore, in theory, the PCT-based method is guaranteed to be the best. However, reliable estimates of the 5% and the 50% quantiles require a large sample size; in our simulations, a large sample size means >200, which is usually unrealistic for assay developers.

The major intention of this research was to find an alternative to the current standard method, the SD-based method. The GMD method is clearly superior to the SD-based method for non-Gaussian distributions (symmetric or asymmetric) and is comparable to the SD-based method when the distribution is Gaussian. In conclusion, we recommend that GMD be used as the dispersion measure if the data distribution is relatively symmetric. If the data are highly skewed, a large sample should be obtained if possible and the PCT-based method used to estimate LOD; if the sample size has to be small, GMD may be preferable but may yield a biased LOD estimate.

Acknowledgment

Rebecca J. Palmer, PhD, assisted in the preparation of this article.

Disclosure Statement

None.

References

1. Burd. Validation of laboratory-developed molecular assays for infectious diseases. Clin Microbiol Rev, 2010; 23:550–576.

2. Thomsen, Schatzlein, Mercuro. Limits of detection in spectroscopy. Spectroscopy, 2003; 18:112–114.

3. Armbruster, Pry. Limit of blank, limit of detection and limit of quantitation. Clin Biochem Rev, 2008; 29,Suppl 1:S49–S52.

4. CLSI/NCCLS: Protocols for Determination of Limits of Detection and Limits of Quantitation. Approved Guideline. CLSI Document EP17-A. Clinical and Laboratory Standards Institute: Wayne, PA, 2004.

5. ISO: Capability of Detection—Part 1. Terms and Definition. ISO 11843-1. International Organization for Standardization: Geneva, 1997.

6. ISO: Capability of Detection—Part 2. Methodology in the Linear Calibration Case. ISO 11843-2. International Organization for Standardization: Geneva, 2000.

7. ISO: Capability of Detection—Part 3. Methodology of Determination of the Critical Value for Response Variable When No Calibration Data Are Used. ISO 11843-3. International Organization for Standardization: Geneva, 2003.

8. ISO: Capability of Detection—Part 4. Methodology for Comparing the Minimum Detectable Value with a Given Value. ISO 11843-4. International Organization for Standardization: Geneva, 2003.

9. ISO: Capability of Detection—Part 5. Methodology in the Linear and Non-Linear Calibration Case. ISO 11843-5. International Organization for Standardization: Geneva, 2008.

10. Linnet, Kondratovich. Partly nonparametric approach for determining the limit of detection. Clin Chem, 2004; 50:732–740.

11. Linnet. Nonparametric estimation of reference intervals by simple and bootstrap-based procedures. Clin Chem, 2000; 46:867–869.

12. Limpert, Stahel, Abbt. Log-normal distributions across the sciences: keys and clues. BioScience, 2001; 51:341–352.

13. Koch. The logarithm in biology. I. Mechanisms generating the log-normal distribution exactly. J Theor Biol, 1966; 12:276–290.

14. Koch. The logarithm in biology. II. Distributions simulating the lognormal. J Theor Biol, 1969; 23:251–268.

15. Gelman. Hierarchical bayes methods for serial dilution assays. http://iserp.columbia.edu/content/hierarchical-bayes-methods-serial-dilution-assays. 2011 October 20.

16. Gad. Handbook of Pharmaceutical Biotechnology, 1st. Wiley-Interscience: Hoboken, NJ, 2007.

17. Yitzhaki. Gini's mean difference: a superior measure of variability for non-normal distributions. METRON Int J Stat, 2003; LXI:285–316.

18. ISO: Accuracy (Trueness, Precision) of Measurement Methods Results—Part 5. Alternative Methods for the Determination of the Precision of a Standard Measurement Method. ISO 5725-5. International Organization for Standardization: Geneva, 1998.

19. Croux, Rousseeuw. Time-efficient algorithms for two highly robust estimators of scale. Comput Stat, 1992; 1:411–428.

20. Rousseeuw, Croux. Alternatives to the median absolute deviation. J Am Stat Assoc Theory Methods, 1993; 88.

21. DiCiccio, Efron. Bootstrap confidence intervals. Stat Sci, 1996; 11:189–228.

22. The R project for statistical computing. www.r-project.org. 2011 October 24.

23. Cochran. The distribution of the largest of a set of estimated variances as a fraction of their total. Ann Hum Genet (London), 1941; 11:47–52.

24. Shapiro, Wilk. An analysis of variance test for normality (complete samples). Biometrika, 1965; 52:591–611.