Abstract
In classical group testing, one is given a population
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${ \cal N}$$\end{document}
and an unknown subset
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$D \subset { \cal N}$$\end{document}
of positive items, and the goal is to determine D by testing subsets of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${ \cal N}$$\end{document}
. Threshold group testing is a generalization of classical group testing, where the outcome of a group test is determined by the number of positive items in the test. In group testing on inhibitor model, inhibitors are the third type of item that dictate the test outcome to be negative regardless of how many positives are in the test. The threshold group testing on k-inhibitor model is a natural combination of threshold group testing and inhibitor model. In this article, we provide nonadaptive algorithms to conquer the threshold group testing on k-inhibitor model where error-tolerance is considered. Furthermore, we provide a two-stage algorithm to identify all inhibitors and find a g-approximate set.
1. Introduction
Group testing was originally proposed to reduce the number of blood tests required to detect syphilis in soldiers during World War II (Dorfman, 1943). The idea of group testing is to group blood samples and then apply a test to the group. Group testing has been well known for its applications in various fields, including communication networking, image compression, and molecular biology. Group testing also has strong relationships with several disciplines, such as coding theory, information theory, and computational learning theory. We refer readers to the books by Du and Hwang (2000, 2006) for a comprehensive review of the development and the major results in this area.
In the classical group testing, we have a set
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${\cal N}$$\end{document}
of n items, each of which is either positive or negative. A group test, also called a pool, is a subset of items that yields a positive outcome if it contains at least one positive item. The task of group testing is to determine the positive items in
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${\cal N}$$\end{document}
using as few group tests as possible. Conventionally, the positive items stand for those of interest, and their number is assumed to be at most d, which is much smaller than n. In the early stages of group-testing research, minimizing the number of tests required to identify positives is the main objective and thus most algorithms are sequential (tests are performed one by one and the outcomes of previous tests are known at the time of preparing the current test). The new applications of group testing to biological screening lead to a distinctive scenario. In the biological test, a test takes much time, and screening one pool at a time is far more expensive than screening many pools in parallel. These new challenges motivate the use of nonadaptive algorithms (also called pooling designs), where all tests are specified in advance without knowing their outcomes, and the use of multistage algorithms, where stages are sequential and tests in a stage are parallel.
Threshold group testing, introduced by Damaschke (2006), is a natural generalization of classical group testing, where a group test yields a positive outcome if it contains at least u positive items. Besides, a group test is negative if it contains at most l positive items, and otherwise the outcome is arbitrary. In addition to the theoretical interest, threshold group testing has wide applications, particularly in biology and chemistry where a testing outcome is determined by various factors such as the concentration of positive samples. The difference g := u−l−1 between the thresholds is known as the gap parameter. Let D denote the set of positive items throughout the article. Damaschke (2006) showed that the identification of D is possible only when |D| ≥ u; moreover, in general the set D can only be approximately identified within up to g false positives and g false negatives. A set S is called g-approximate if |S\D| ≤ g and |D\S| ≤ g. Chen and Fu (2009) provided a nonadaptive algorithm to find a g-approximate set in O(du+1 log(n/d)) tests, and its corresponding decoding procedure takes O(nudu+1 log(n/d)) computational time, where u is treated as a constant.
In some biological or chemical applications, there is a third type of item called inhibitors whose existence may cancel the effect of positives. Farach et al. (1997) first proposed the 1-inhibitor model in which a single inhibitor dictates the test outcome to be negative regardless of how many positives are in the test. De Bonis and Vacarro (2003) extended the above model to k-inhibitor model in which k inhibitors dictate the test outcome to be negative. Chang and Hwang (2007) proposed the general inhibitor model in which the exact cancellation effect of inhibitors on positive items is not specified. Recently, Chang et al. (2010) provided a pooling design to solve the general inhibitor model in O((d+h)2 log n) tests and O((d+h)2n log n) decoding time.
The threshold group testing on inhibitor model (introduced by He et al., 2012) is a natural combination of threshold group testing and inhibitor model. A group test yields a positive outcome if it contains at least u positive items and less than k inhibitors and a negative outcome whenever it contains at most l positive items or at least k inhibitors; otherwise, the outcome is arbitrary. He et al. (2012) extended the pooling designs used to solve the threshold group testing in Chen and Fu (2009) to solve the threshold group testing on inhibitor model; however, it is not working well and further adjustment is needed.
In this study, we provide nonadaptive algorithms to conquer the threshold group testing on k-inhibitor model where error-tolerance is considered. The rest of the article is organized as follows. In Section 2, we introduce preliminary notions and combinatorial structures. For the threshold group testing on inhibitor model, we first deal with the gap-free case (Section 3) and then extend our results to the case g>0 (Section 4). Furthermore, we provide a two-stage algorithm to identify all inhibitors and find a g-approximate set (Section 5).
Notation: Throughout the article,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${\cal N}$$\end{document}
, D, and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${\cal I}$$\end{document}
represent the set of items, the set of positives, and the set of inhibitors, respectively. Furthermore,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\mid {\cal N} \mid$$\end{document}
= n, u ≤ |D| ≤ d, and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\mid {\cal I} \mid$$\end{document}
≤ h are known information and the number of errors in testing outcomes is assumed to be at most e. In the literature, e, l, u, and k are assumed to be constants.
2. Preliminaries
A nonadaptive algorithm can be represented by a 0-1 matrix, where columns are indexed by items, rows are indexed by tests, and cell (i, j) = 1 if and only if the test i contains the item j. For a 0-1 matrix, we can view a column C as a set of row indices each intersecting column C (or having a 1-entry in column C); henceforth, we can take the union and intersection actions on columns. The main tool we use to solve the threshold group testing on the inhibitor model is as follows.
Definition 1
A matrix is (d, u; z]-disjunct if for any d+u columns
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$C_1 , \cdots , C_{d + u}$$\end{document}
, there exist z rows interesting
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$C_1 , \cdots C_u$$\end{document}
but none of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$C_{u + 1} , \cdots , C_{u + d} ,$$\end{document}
i.e.,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*} \left | \bigcap_{i = 1}^{u} C_i \backslash \bigcup_{i = u + 1}^{u + d}C_i \right| \geq z.\end{align*}\end{document}
In the literature, this generalized notion is referred to as strongly disjunct matrices. Cover-free families first introduced by Kautz and Singleton (1964) in the context of superimposed binary codes are equivalent to (d, 1; 1]-disjunct matrices. Stinson and Wei (2004) studied the generalized cover-free families that have the same structure as (d, u; z]-disjunct matrices. Besides applications in group testing, these structures have been applied to cryptography and communications (Stinson and Wei, 2004). Let t(n, d, u; z] denote the minimum number of rows among all (d, u; z]-disjunct matrices with n columns. A recent upper bound provided by Chen et al. (2008) is that t(n, d, u; z] < z(k/u)
u
(k/d)
d
[1+k(1+ln(n/k+1))], where k = d+u.
3. Gap-Free Threshold Group Testing
First, we consider that g = 0; that is, a pool is positive if and only if it contains at least u positive items and at most k−1 inhibitors. In this case, a complete identification of positive items is possible.
Whenever we apply a pooling design to a group-testing problem, let τ0(X, S) denote the number of negative pools that contain X and none of the items in S. Let us temporarily put aside the occurrence of errors. A set X of u positive items would not appear in a negative pool except that the pool also contains k inhibitors; thus τ0(X, S) = 0 if |S| = h−k+1 and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$S \subseteq {\cal N} \setminus X$$\end{document}
contains the most number of inhibitors. This idea is also employed by He et al. (2012); however, what they enumerate is the number of negative pools that contain X and at most k−1 items from a given set R of k inhibitors. Nevertheless, such number is not necessarily zero since X could belong to a negative pool due to the appearance of k inhibitors not in R. Next, we attempt to create a design such that τ0(X, S)>0 for any (h−k+1)-subset S of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${\cal N}$$\end{document}
\X if X contains less than u positive items. Let
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tau_0^{h - k + 1} ( X ) : = \min \limits_{S \subseteq {\cal N} \setminus X \atop {\mid S \mid = h - k + 1}} \tau_0 ( X , S )$$\end{document}
. We use this counter along with the following design to distinguish sets of u items.
Lemma 2
Apply (d+h−u−k+2,u; 2e+1]-disjunct matrix as a pooling design to the threshold group testing on k-inhibitor model with error-tolerance e. Then for a u-subset X of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${\cal N}$$\end{document}
,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tau_0^{h - k + 1} ( X ) \leq e$$\end{document}
if and only if X is a set of positive items.
Proof
We have observed that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tau_0^{h - k + 1} ( X ) \leq e$$\end{document}
if X is a set of u positive items. On the other hand, let X be a u-set containing not only positive items. For any (h−k+1)-set
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$S \subseteq {\cal N} \setminus X$$\end{document}
, let Y be a (d−u+1)-subset of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${\cal N}$$\end{document}
\(X ∪ S) that contains the most number of positive items. By the (d+h−u−k+2, u; 2e+1]-disjunctness, there are 2e+1 pools containing X and none in S and Y. Thus each of these pools contains at most u−1 positive items and is counted in τ0(X, S), implying τ0(X, S) ≥ 2e+1−e = e+1 (the possibility of up to e errors is considered). Hence
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tau_0^{h - k + 1} ( X ) > e$$\end{document}
. ■
Remark 3
A decoding algorithm associated with the design in Lemma 2 is as follows: Compute
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tau_0^{h - k + 1} ( X )$$\end{document}
for each u-set X and output the union of all u-sets X satisfying
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tau_0^{h - k + 1} ( X ) \leq e.$$\end{document}
Since
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t: = t ( d + h - u - k + 2 , u; 2e + 1 ] = O ( ( d + h ) ^ {u + 1 } \log ( \frac {n } {d + h } ) )$$\end{document}
. The decoding complexity is
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O \left( t {n \choose u } {n - u \choose h - k + 1 } \right) = O ( n^ {h + u - k + 1 } ( d + h ) ^ {u + 1 } \log ( \frac {n } {d + h } ) )$$\end{document}
.
The above is an extension of strategies provided in Chen and Fu (2009) and is also a remedy of results in He et al. (2012). In the following, we provide another decoding algorithm that is advantageous and beats the previous one when d < h.
Let τ1(X, l) be the number of positive pools containing at most l elements in X. Then τ1(D, l) = 0 (if no error occurs) since every positive pool must contain more than l elements in D. We now apply the design in Lemma 2 and take this counter as a fundamental step in our decoding algorithm (Algorithm 1) to identify D.
Lemma 4
Algorithm 1 outputs D in
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O ( n^d ( d + h ) ^ {u + 1 } \log ( \frac {n } {d + h } ) )$$\end{document}
computational time.
Proof
For the gap-free case, l = u−1. Then as we have observed, τ1(D, u−1) ≤ e if there are at most e errors. Let P be the set returned by the algorithm. Then |P| ≤ |D| since the size of the candidate set X scanned by the algorithm is from u up to d (see Line 2). Suppose that a positive item x is absent from P. Let A be a set of u positives including x, and I be an (h−k+1)-subset of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${\cal N}$$\end{document}
\A containing the most number of inhibitors (namely, all inhibitors except at most k−1 are in I). Let S be a (d−u+1)-subset of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${\cal N}$$\end{document}
\(A ∪ I ) containing the most number of items in P. Then all elements in P except at most u−1 are in I ∪ S. Since the design is (d+h−u−k+2, u; 2e+1]-disjunct, there are 2e+1 tests, each containing all items in A and none in I ∪ S. Then each of these 2e+1 tests has at least u positives and at most k−1 inhibitors and thus yields a positive outcome; further, it is counted in τ1(P, u−1) since it contains at most u−1 elements in P. It follows that τ1(P, u−1) ≥ 2e+1−e=e+1, a contradiction. We obtain that D ⊆ P and thus P=D since |P| ≤ |D|. ■
The computation of each τ1(X, u−1) takes O(td) time where
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = O ( ( d + h ) ^ {u + 1 } \log ( \frac {n } {d + h } ) )$$\end{document}
, and thus Algorithm 1 takes
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O \left( td \left( {n \choose u } + {n \choose u + 1 } + \cdots + {n \choose d } \right)\right) = O ( tn^d ) = O ( n^d ( d + h ) ^ {u + 1 } \log ( \frac {n } {d + h } ) )$$\end{document}
time. ■
Theorem 5
For the error-tolerant threshold group testing on k-inhibitor model, there exists a nonadaptive algorithm to identify all positives in
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O ( ( d + h ) ^ {u + 1 } \log ( \frac {n } {d + h } ) )$$\end{document}
tests. Furthermore, its decoding complexity is
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O ( \min ( n^d , n^ {h + u - k + 1 } ) ( d + h ) ^ {u + 1 } \log ( \frac {n } {d + h } ) )$$\end{document}
.
Proof
Concluded from Lemma 2, Remark 3, and Lemma 4. ■
4. Threshold Group Testing With A Gap
In this section, we deal with the case where g = u−l−1 >0. The techniques that we have developed in Section 3 can be extended here.
Again, τ0(X, S) ≤ e if X is a set of u positive items and S is a set of h−k+1 items containing the most number of inhibitors.
Lemma 6
Apply (d+h−l−k+1, u; 2e+1]-disjunct matrix as a pooling design to the threshold group testing on k-inhibitor model with error-tolerance e. Then for any u-subset X of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${\cal N}$$\end{document}
,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}\begin{cases} \tau_0^{h - k + 1} ( X ) \leq e \qquad \qquad {if \ X \ is \ a \ set \ of \ positive \ items} , \\ \tau_0^{h - k + 1} ( X ) \geq e + 1 \qquad \ {if \ \mid X \setminus D \mid \geq g + 1.}\end{cases}\end{align*}\end{document}
Proof
We have
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tau_0^{h - k + 1} ( X ) \leq e$$\end{document}
if X is a set of u positive items. Let X be a u-set satisfying |X\D| ≥ g+1. Thus |X ∩ D| ≤ l. For any (h−k+1)-set S ⊆
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${\cal N}$$\end{document}
\X, let Y be a (d−l)-subset of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${\cal N}$$\end{document}
\(X ∪ S) that contains the most number of positive items. This implies that S ∪ Y contains all positives except at most l. By the (d+h−l−k+1, u; 2e+1]-disjunctness of the design, there are at least 2e+1 pools each containing X and none of the elements in S ∪ Y. Thus each of these pools contains at most l positive items and is counted in τ0(X, S), implying τ0(X, S) ≤2e+1−e. Hence
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tau_0^{h - k + 1} ( X )\le e+ 1$$\end{document}
. ■
Now, a cutoff function τ0(h−k+1)(X) is prepared, and we build an algorithm to find a g-approximate set according to the function. A set P is called affirmative if each of its u-subset X satisfies
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\tau_0^{h - k + 1} ( X ) \leq e$$\end{document}
(this is different from the setting in Damaschke (2006) where a set is affirmative if each of its u-subsets yields a positive outcome). We can easily obtain that for any affirmative set P with |P| ≥ u, |P\D| ≤ g. An approach of finding a g-approximate set employed by Damaschke (2006) and described in Chen and Fu (2009) and He et al. (2012) is to establish a sequence of increasing affirmative sets until |D\P| ≤ g is satisfied. Cooperating with the strongly disjunct design in Lemma 6, we develop it as Algorithm 2 in the following.
Lemma 7
Algorithm 2 outputs a g-approximate set in
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O ( n^ {h + u - k + 1 } ( d + h ) ^ {u + 1 } \log ( \frac {n } {d + h } ) )$$\end{document}
computational time.
Proof
It is assumed that |D| ≥ u and by Lemma 6 a set of u positives is affirmative; hence, Line 3 is satisfied. This implies |X\D| ≤ g. Notice that in the subsequent steps (Lines 4–12) the set X is expanded and keeps affirmative whenever the algorithm substitutes it by another set.
Let P be the set returned by the algorithm. Then we have the following two cases.
Case 1: P is returned due to the reason that there is no pair (A, B) to expand P. Suppose that |D\P| > g. Then let A be a (g+1)-subset of D\P and B be a g-subset of P containing P\D (this is feasible since |P\D| ≤ g according to the affirmativeness of P). Then (P ∪ A)\B is a subset of D and thus affirmative, contradicting the nonexistence of (A, B). Hence |D\P| ≤ g.
Case 2: The algorithm terminates at |X| ≥ d, that is |P| ≥ d. Since P is affirmative, |P\D| ≤ g and thus |D\P| = |D|−|D ∩ P| ≤ d−|P\(P\D)| ≤ d−(d−|P\D|) ≤ g.
Hence P is g-approximate. On the other hand, Line 2 takes
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O \left( t{n \choose u}{n - u \choose h - k + 1} \right) = O ( tn^{h + u - k + 1} )$$\end{document}
time where
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = O ( ( d + h ) ^ {u + 1 } \log \big ( \frac {n } {d + h } \big )\big )$$\end{document}
. Each round of the while-loop takes
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O \big ( {n - u \choose g + 1}{d \choose g}{d \choose u} \big )$$\end{document}
time and the set X is expanded by at most d−u times. Hence, Lines 3–12 take O(dg+u+1ng+1) time. Therefore, the computational complexity of Algorithm 2 is
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O ( n^ {h + u - k + 1 } ( d + h ) ^ {u + 1 } \log \big( \frac {n } {d + h } \big ) \big )$$\end{document}
. ■
In the following, we provide another decoding algorithm (Algorithm 3), which is an extension of Algorithm 1 and improves Algorithm 2 when d < h.
Lemma 8
Algorithm 3
outputs a g-approximate set in
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O ( n^d ( d + h ) ^ {u + 1 } \log ( \frac {n } {d + h } ) )$$\end{document}
computational time.
Proof
We have observed that τ1(D, l) ≤ e if there are at most e errors. Again, the set P returned by the algorithm satisfies |P| ≤ |D|. Suppose that |D\P| ≥ g+1. Let A be a set of u positives including g+1 positives in D\P and I be an (h−k+1)-subset of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${\cal N}$$\end{document}
\A containing the most number of inhibitors. Let S be a (d−l)-subset of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${\cal N}$$\end{document}
\(A ∪ I) containing the most number of items in P. Thus all elements in P belong to I ∪ S except at most l. Since the design is (d+h−l−k+1, u; 2e+1]-disjunct, there are 2e+1 tests, each containing A and none in I ∪ S. Then each of these 2e+1 tests yields a positive outcome; further, it is counted in τ1(P, l) since it contains at most l elements in P. This implies that τ1(P, l) ≥ 2e+1−e = e+1, a contradiction. Hence |D\P| ≤ g; further, |P\D| ≤ g since |P| ≤ |D|.
On the other hand, the computation of each τ1(X, l) takes O(td) time, where
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$t = O ( ( d + h ) ^ {u + 1 } \log \big ( \frac {n } {d + h } \big ) \big )$$\end{document}
and thus Algorithm 3 takes
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O \left( td \left( {n \choose u } + {n \choose u + 1 } + \cdots + {n \choose d } \right)\right) = O ( tn^d ) = O \left(n^d ( d + h ) ^ {u + 1 } \log \left( \frac {n } {d + h } \right)\right)$$\end{document}
time. ■
Theorem 9
For the error-tolerant threshold group testing on k-inhibitor model, there exists a nonadaptive algorithm to identify a g-approximate set in
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O ( ( d + h ) ^ {u + 1 } \log \big ( \frac {n } {d + h } \big ) \big )$$\end{document}
tests. Furthermore, the decoding complexity is
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O ( \min ( n^d , n^ {h + u - k + 1 } ) ( d + h ) ^ {u + 1 } \log \big ( \frac {n } {d + h } \big ) \big )$$\end{document}
.
Proof
Concluded from Lemma 6, Lemma 7, and Lemma 8. ■
5. Two-Stage Algorithm
The idea of our two-stage algorithm is to identify all inhibitors nonadaptively and eliminate them, and then go back to the threshold group testing. Let τ1(X) be the number of positive tests containing X.
Lemma 10
An (h−k+1, u+k; 2e+1]-disjunct matrix can be used to identify all inhibitors in threshold group testing on k-inhibitor model with at most e erroneous outcomes.
Proof
Let X be a set of k items. If X ⊆
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${\cal I}$$\end{document}
, then τ1(X) ≤ e since a test containing k inhibitors must yield a negative outcome, except the occurrence of an error. On the other hand, suppose that X⊄I. Let S be a set of u positive items and Y ⊆
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$${\cal N}$$\end{document}
\(X ∪ S) be an (h−k+1)-set containing the most number of inhibitors. By the (h−k+1, u+k; 2e+1]-disjunctness of the pooling design, there are at least 2e+1 tests each containing X ∪ S and none of the items in Y. Then each of these tests contains at least u positive items and at most k−1 inhibitors; thus, it yields a positive outcome except an error. Hence τ1(X) ≥ 2e+1−e = e+1.
Accordingly, the union of k-subsets, each matching the criteria τ1(·) ≤ e, is the set of inhibitors. ■
Now, we can remove all inhibitors and apply a pooling design to threshold group testing. As provided by Chen and Fu (2009), a g-approximate set can be nonadaptively identified in O(du+1 log(n/d)) tests, and its corresponding decoding procedure takes O(nudu+1 log(n/d)) computational time. Therefore, we have
Theorem 11
For the error-tolerant threshold group testing on k-inhibitor model, there exists a two-stage algorithm to identify all inhibitors and a g-approximate set in O(hu+k+1 log(n/h)+du+1 log(n/d)) tests. Furthermore, the decoding complexity is O(nkhu+k+1 log(n/h)+nudu+1 log(n/d)).
Proof
Each computation of τ1(X) takes O(tk) time, where t := t(h−k+1, u+k; 2e+1) = O(hu+k+1 log(n/h)). A decoding algorithm associated with the design in Lemma 10 has computational complexity
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O \left( tk{n \choose k} \right) = O ( n^kh^{u + k + 1} \log ( n / h ) )$$\end{document}
. ■
6. Conclusion
Chang et al. (2010) provided a pooling design to identify all inhibitors, and our result (Lemma 10) is its extension to threshold group testing on inhibitor models. On the other hand, Theorem 9 is also an extension of the result provided by Chen and Fu (2009) to solve the threshold group testing.