Designing DNA Barcodes Orthogonal in Melting Temperature by Simulated Annealing Optimization

Abstract

Molecular barcode arrays are widely employed in the analysis of large strain libraries, whereby probes linked to unique oligonucleotides (“antitags”) are used to detect selected DNA targets (“tags”) by highly specific hybridization. One of the major problems for such screen designs is thus insuring a high degree of probe–target specificity and a low level of nonspecific binding (in sum, “orthogonality”) across the entire tag population (“collection”). Several approaches have been previously proposed for designing orthogonal DNA tags by–among others–focusing on their individual or pair-wise structures, such as Smith Waterman sequence similarity, the widely used nearest neighbor method, and full thermodynamic estimates of sequences. However, these methods generally involve imposing various heuristic constraints (“design rules”) on possible tag/antitag (TaT) sequences in order to achieve probe–target specificity across the collection. The resulting lack of freedom in considering all putative sequences can lead to potentially suboptimal designs and to the ensuing reduction in the degree of orthogonality within the constructed TaT collections. Here, we demonstrate that a randomized-search algorithm based on simulated annealing optimization can be used in order to substantially free the design process from the limitations of sequence constraints—allowing for the elucidation of potentially more optimal DNA tag collections. The designed sets of DNA oligonucleotides are optimized for the highest degree of orthogonality as quantified by melting temperature T_m—an experimentally relevant system property, which could also be used as a theoretically meaningful thermodynamic metric for optimizing TaT binding specificity. That is, this work describes an approach to constructing tag/antitag libraries, which offer the greatest melting temperature separation between specific probe–target duplexes and other nonspecific structures. The proposed method finds, with high probability, the global solution that maximizes the difference in T_m between the specific and nonspecific tag–antitag hybridizations across a collection of given size for TaTs of specified length. An application of this approach is demonstrated using 2 different DNA probe sets.

Introduction

DNA tag is a unique nucleotide sequence used for the identification of selected DNA segments via specific tag–antitag pairing. Target DNA fragments are reverse Watson-Crick complements of probe-linked antitags, which causes them to preferentially hybridize with one another, generating stable double-stranded DNA duplexes. As can be immediately seen, any experimental protocol involved is critically dependent on the specificity of probe–target hybridization between individual tags and their complementary antitags.

For instance, in DNA microarray applications, a set of oligonucleotides is immobilized at specific locations on chip surface to serve as probes for the presence of specific nucleotide sequences. The solution containing tagged target samples is then brought in contact with microarray surface to hybridize with probe antitags. The unbound molecules are subsequently washed out, while those that remain hybridized are monitored by scanning the chip. Since each location on the chip corresponds to a specific probe oligonucleotide, expression data is elucidated by measuring signal intensities of sample-specific labels at each location.

In this context, we call an event when a probe hybridizes to its own (Watson-Crick complementary) target a “perfect-hybridization”. When such perfect-hybridization reactions predominate, an assay is likely to produce a strong signal specific to target's label that results in its accurate identification (Fig. 1a). However, when probes spuriously hybridize to wrong targets or to each other, “cross-hybridization” events occur—leading to misidentification of the target by the noncognate probe or to signal reduction due to probe–probe sequestration (Fig. 1b). (Analogously, we also refer to a pair of cross-hybridized sequences as “cross-hybrid” and perfect-hybridized sequences as “perfect-hybrid” in subsequent discussion.) Therefore, a reliable identification assay requires a tag/antitag library (“collection”) that is optimized to exhibit strong perfect-hybridization and weak cross-hybridization for any possible pairing among its individual elements (i.e., it has to be orthogonal) (Ben-Dor et al., 2000).

FIG. 1.

(a) Identification by a perfect-hybridization of the probe 5′-CGATT-3′ with its target (reverse complement), 5′-AATCG-3′. (b) Misidentification by a cross-hybridization of the probe 5′-CGATT-3′ with another probe (or a wrong target) 5′-ACTCG-3′ which is near complementary. There is one mismatch T/C on the duplex, indicating that the near-complementary sequence is not the expected target.

In this work, we develop a method for finding a set of DNA oligonucleotides with the highest degree of orthogonality—for a specified set size/sequence length and without the need to impose any heuristic design constraints on oligonucleotide composition—by using an algorithm based on randomized-search simulated annealing. Note that the definition of “orthogonality” is subject to the specification of the metric with which the amount of perfect- versus cross-hybridization is quantified. Here, we employ melting temperature (T_m) as the thermodynamic basis for such a metric. While previous works have often used other system parameters, such as the free energy of hybridization (ΔG) or reaction equilibrium constants, we propose to target melting temperature for optimization since it could be most readily controlled in experimental settings, thus offering significant advantages in practical applications.

System model for orthogonal tags

Oligonucleotide orthogonalization

Let us begin by considering 2 DNA oligonucleotides: antitagU and antitagV (with their Watson-Crick complements referred to as tagU and tagV, respectively). This pair is taken to be “orthogonal” when the highest measure of its cross-hybrid affinities is much less than the lowest affinity of its perfect-hybrids (i.e., the highest affinity among antitagU–antitagV, tagU–tagV, antitagV–tagU, and antitagU–tagV dimers has to be much lower than the lowest affinity of either antitagU–tagU or antitagV–tagV duplex). This will lead dimerization reaction conditions to favor the production of desirable perfect-hybrid duplexes over the undesirable cross-hybrid ones. (In our application, melting temperature will be used as a measure of affinity, as discussed below.) While this may be easily achieved for the case of 2 antitags, the problem becomes significantly more difficult as the desired size of the library to be designed (i.e., the number of simultaneously resolvable elements) increases. The 2 problems involved are the presence of (1) near-complementary antitags, and (2) sequence-similar antitags.

A pair of near complementary antitags (or tags) are—in sequence—very similar to the reverse complements of each other, so that a strong hybridization between them will occur, making a stable DNA duplex (e.g., the duplex 5′-ACTG-3′/5′-CAGG-3′). Alternatively, a pair of sequence-similar antitags are almost identical except for a few nucleotides (e.g., 5′-ACTG-3′ and 5′-ACTT-3′) and so are likely to bind each other's targets. For instance, 5′-ACTT-3′ is near complementary to the target of 5′-ACTG-3′ and is thus more likely to form a duplex with the wrong target as 5′-ACTT-3′/5′-CAGT-3′ (Fig. 1). Therefore, to achieve orthogonality, antitags are designed to be oligonucleotides that are most dissimilar from each other as well as from each other's complements. Below we further refine this criterion via 4 joint subsidiary problem classes.

I. It is undesirable to have highly stable DNA duplexes formed by antitag–antitag interactions, as might be the case with near-complementary tags. In other words, even when the targets are not present in the solution, we desire to have DNA probes to be able to stay in single-stranded (melted) forms at a particular temperature to insure the reliability of their identification, else the most adjacently placed identifiers would tend to bind to each other instead of their targets even when the samples are present—thus decreasing assay sensitivity. This problem is dealt with by designing the antitags to have the melting temperature of any oligonucleotide duplex formed by the distinct antitags as low as possible.

In particular, securing a low level of probe–probe interactions can offer greater flexibility in certain microarray design problems dealing with probe–probe affinities, such as border minimization problem (Hannenhalli et al., 2002) (i.e., minimizing overall border length of nucleotide masks in synchronous chip design), or in general microarray layout problem (de Carvalho and Rahmann, 2006) (i.e., arranging the probes on microarray chips by synthesizing the oligonucleotides on specific chip locations in the order which should maximize the overall quality of probe syntheses) (de Carvalho and Rahmann, 2007).

II. Tags are analogously optimized for reduced dimerization in assay solution to insure that the targets remain single-stranded within some temperature range and avoid dimer formation when incubated on the microarray.

III. The design problem is incorporated into the case of similar tags by keeping the melting temperature of any noncognate antitag–tag duplex as low as possible in order to prevent binding of probes to the wrong targets, which is known to be one of the main causes of cross-signal contamination (Ben-Dor et al., 2000).

IV. Finally, probes (antitags) that bind weakly to their targets (tags) must be eliminated. That is, probes must exhibit strong perfect-hybridizations with their targets, allowing for the existence of a separation temperature (“safety distance”) regime, where most perfect-hybrids are in duplex form while most cross-hybrids are still melted, thus allowing strong signal-to-noise ratio.

When optimizing binding specificity, we also take into account secondary structures of antitag/tag oligonucleotides, such as hairpins, bulges, interior loops, etc. (Zuker, 2003). This ensures that the melting temperature of a duplex is computed by simulating the self-folding of the strands as well as other strand structure formation and profiling their competition [i.e., between double-strand (hetero-dimer), self-folding (self-dimerization), and homo-dimer formations].

Using these criteria, we can now formulate the design problem for the construction of a tag/antitag library based on T_m optimization.

Melting temperature optimization

In the experimental settings, double-stranded DNA melting is most readily affected through temperature control, whereby the solution is heated until the stacking interactions and hydrogen bonds primarily responsible for holding the molecular duplex together are broken, resulting in its dissociation into two single strands (Strachan and Read, 2010). This feature could be efficiently utilized when performing highly specific DNA target recognition, if—for a given tag/antitag library—there exists a temperature range where all the desired perfect-hybrids are present in the predominantly duplex form, while all of the spurious cross-hybrids have largely melted.

The formation/dissociation of a duplex is a process of chemical reaction, where the transition from one state to another state results in a change of energy in the system. The change in Gibbs free energy (ΔG) of the reaction is equal to the thermodynamic transition parameters enthalpy (ΔH) and entropy (ΔS) as ΔG=ΔH−T×ΔS. Note that ΔG and hence the equilibrium constant of the hybridization reaction depend on the temperature of the surrounding environment. In this setting, given an equal mixture of complementary DNA strands, a “melting temperature” of the underlying sequence can be defined as the temperature at which half of the molecules are in a duplex and half in single-stranded form. Using a simple two state (hybridized vs dissociated) DNA strand model, melting temperature can be estimated as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*}T_m = \frac {\Delta H} {\Delta S + R \times \ln \left(\frac {C_t} {f} \right)} , \tag {1} \end{align*} \end{document}

where C_t corresponds to molar strand concentration, R is the universal gas constant, and f=1 when the two strands are the same or f=4 if the strands are different (Markham and Zuker, 2008). The sequence-dependent factors are applied as the amount of changes in transition parameters ΔG, ΔH, and ΔS, and salt corrections are added to T_m from published tables (Allawi and SantaLucia, 1997; SantaLucia, 1998; SantaLucia and Hicks, 2004).

However, the above equation is a very simple calculation of T_m that models double-stranded DNA dissociation as a single-step process. A more comprehensive definition of the “melting temperature” may be as the temperature at which the system passes through a sharp heat capacity maximum, ∂C_p/∂T=0, due to the cooperative loss of stacking interactions in the course of duplex dissociation (Mikulecky and Feig, 2006; Markham and Zuker, 2008). While this typically yield similar values to the ones obtained with the former half-melted duplex definition above (which also allows for a more direct interpretation of empirical results), the latter could be most immediately related to and calculated from the underlying biophysical and biochemical properties of the DNA polymer molecules involved.

In the latter case, the heat capacity is given by the equation (Mikulecky and Feig, 2006) \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*}C_p = - T \frac {\partial^2 \Delta G} {\partial T^2} ,\end{align*} \end{document}

and T_m is obtained from the maximum or local maxima of the heat capacity curve such that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*}T_m (C_p) = \arg \max_{T} \{C_p | \partial C_p / \partial T = 0 \} \tag{2}\end{align*} \end{document}

In this work, we used the T_m(C_p) calculation given in (2) as the melting temperature metric (Mikulecky and Feig, 2006; Markham and Zuker, 2008).

Problem formulation

We seek to construct a set of orthogonal DNA probes such that the difference (X) between the minimum T_m of perfect-hybridizations (antitag–tag) and the maximum T_m of cross-hybridizations (antitag–antitag, tag–tag, antitag–[foreign]tag, [foreign]antitag–tag) is as large as possible across any pairing of sequences in the collection. Namely,

1. For all combinations of antitags/tags of U and V: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \begin{split} max [T_m (U , V) , \ T_m (U , \underline{V}) , \ T_m (\underline{U} , V) , \ T_m (\underline{U} , \underline{V})] \\ \quad - min [T_m (U , \underline{U}) , \ T_m (V , \underline{V})] \leq X ,\end{split} \tag{3}\end{align*} \end{document}

where U and V are the reverse complements of U and V.

2. Minimize X.

Notice that when X is a positive number the melting temperature of perfect- and cross-hybrids cannot be fully separated (Fig. 2); otherwise, when X is a negative number it quantifies the “safety distance” (separation temperature) between perfect- and cross-hybrids. In the latter case, X represents the size of the range from which a temperature could be selected to enable high fidelity identification assay to be performed (Fig. 3). Note that the more negative X is, the larger the safety distance from which a temperature could be chosen and so the more efficient the assay separating cross- from perfect-hybrids. Thus, in this sense, “−X” could be viewed as a global measure of orthogonality for any given set of oligonucleotides.

FIG. 2.

Heat map and histogram of the average random set, with orthogonality X=10. Color images available online at www.liebertpub.com/nat

FIG. 3.

Heat map and histogram of the optimized Optimal Tags, X=−13. Color images available online at www.liebertpub.com/nat

Figures 3 and 4 show the histogram of melting temperatures for all possible perfect- and cross-hybrids in two TaT libraries. Both libraries have nonoverlapping temperature ranges between perfect- and cross-hybrids that allow for their full separation thus achieving orthogonality. These tag collections will later be discussed in further detail.

FIG. 4.

Heat map and histogram of the optimized Collection Tags, X=−15. Color images available online at www.liebertpub.com/nat

In this context, given a set of N unique L-mer probes, the diagonal elements of a 2N×2N matrix of hybridization scores (heat map) represent the perfect-hybridizations, while the off-diagonal elements represent the cross-hybridizations (Fig. 5). The orthogonality score X of the set is the outcome of the cost function associated with the heat map (i.e., the difference between the minimum diagonal and the maximum off-diagonal values). The corresponding probe set can be defined as a point in the sample space that consists of 4 ^N×L possible sequence sets.

FIG. 5.

Heat map for a tag/antitag (TaT) library corresponding to 7 orthogonal probes. Upper-left 7×7 matrix represents the melting temperatures (T_ms) of possible tag–antitag bindings; here, probe–target bindings (diagonal) show strong (red) perfect-hybridizations while probe–[foreign]target bindings (off-diagonal) have weak (blue) cross-hybridizations. Upper-right 7×7 matrix elements represent the probe–probe cross-hybrids. Lower-left matrix shows the target–target bindings. Lower-right matrix shows the target–probe (diagonal) and [foreign]target-probe (off-diagonal) bindings, which is the transpose of the upper-left matrix. Color images available online at www.liebertpub.com/nat

To improve the orthogonality X of a set of antitags one could perform an optimization by walking over the sample space of sets one by one, and searching the entire space by “brute force.” However, for most practical applications, antitag length L and set size N are large enough to make such a walk over all possible sets computationally infeasible. Furthermore, replacing an antitag once causes a transition to another point/set in the sample space, which affects not only the replaced antitag's perfect-hybrids but also those of all cross-hybrids related to it (i.e., 4N−1 cross-hybrids). Replacement of any one antitag generally affects the cross-hybrid affinities of all other antitags in the set. To see this, let us assume that a subset of antitags set* in the current set (set* ⊆ set) has been optimized (i.e., set* corresponds to the weakest cross-hybrids and the strongest perfect-hybrids). Should the next step of the optimization, however, attempt to replace an antitag in set\set* with a new antitag, which happens to diminish the orthogonality of set*, for example, because new antitag is near complementary (or sequence-similar) to one in set*, the overall set orthogonality is likely to be diminished and set* would need to be reoptimized as long as new antitag remains in the set. Because of this joint nonlocal nature of the problem, the optimization has to account for all interactions in the set at once in each step. Thus, no single oligonucleotide, subset of oligonucleotides, or even nucleotide subsequence can be a priori guaranteed to remain optimal or suboptimal with respect to a specific orthogonal TaT library design. That is, if we divide the problem into subsidiary problems and optimize them disjointly (i.e., considering only a group of antitags at a time), the complete solution may only result in a limited orthogonality improvement.

As an example, consider the sub-matrixes of the simple heat-map given in Fig. 5 and the four subsidiary problem classes defined previously, and let us attempt a method that solves these subsidiary problems separately using some sort of a “greedy” approach (e.g., Xu et al., 2009a). That is, suppose that we have first determined a set that has the weakest antitag–antitag interactions as measured by low T_m levels, a second set that has the weakest tag–tag interactions, a third set that has weakest antitag–[foreign]tag interactions, and a fourth set that has strongest antitag–tag interactions with high T_m scores. Then, it may be expected that the larger the set size, the fewer antitags these 4 sets will have in common. So, despite the easy implementation and faster convergence in each problem class, the ultimate solution will not be the global optimum.

Because of this, using such divide-and-conquer heuristics may not be the best strategy when looking for the global optima. Instead, we need to formulate a method that can be iteratively applied to generate simultaneous improvements across all interaction classes in tag/antitag library.

Designing orthogonal tags by probe elimination

To make this more concrete, consider a recent work by Xu et al. (Xu et al., 2009a), which describes a framework for elimination of unsuitable probes from a large initial set. It consists of 2 elimination phases: the first is based on analyzing nucleotide contents of sequences individually; and the second evaluates the mutual similarity between pairs. The method starts with a randomly generated 10 million 25-mer oligo DNA sequences. The next step involves the elimination of sequences by using a number of filters: restriction enzyme filter for excluding the candidates that contain restriction enzyme sites reserved for cloning purposes, T_m filter based on the nearest neighbor (NN) model for excluding sequences that have perfect-hybrid T_ms that are too high (>68°C) or too low (<58°C), GC composition filter for excluding sequences with GC contents above 60% or below 40%, and repetitive sequence filter for excluding sequences that contain 5 or more single nucleotide repeats or 4 or more double nucleotide repeats, respectively. Sequence candidates that pass all of these filters are then compared among each other for sequence similarity using basic local alignment search tool (BLAST). Finally, two distinct sets of 240,000 orthogonal sequences are determined by the Network Elimination Algorithm (Xu et al., 2009b).

This protocol can provide a fast approach for designing large sets of orthogonal sequences, but the speed is attained at the cost of reduced orthogonality, since filters employing probe design rules have to unnecessarily restrain the variety and bias the selection of sequences because many fundamentals of DNA hybridization cannot be “boiled down” or well understood in terms of such simple selection rules. For example, in (Xu et al., 2009a) it is shown that the T_m filter based on the NN model is not very helpful as a probe design rule due to frequent occurrences of good probes with T_ms falling out of the minimum-maximum T_m interval that end up being filtered out. Additionally, reliance on BLAST ignores the physical differences (e.g., energetics of binding) between different sequences, since considering pair-wise alignment is not sufficient to establish physical properties, such as melting temperature, which are frequently the ones most relevant in practice. Besides, there are at least 100 million other sets of 10 million distinct sequences beyond those allowed by the filtering constraints, since the sample space for 25-mers is as large as 4²⁵≅11×10¹⁴. Although the restricted set of 10 million sequences randomly generated in the initial step may appear large enough for determining 240,000 orthogonal tags, the contribution from the universal sample space is likely to contain many more orthogonal solutions.

Another related work presented by Shoemaker et al. (Shoemaker et al., 1996) had proposed a quantitative method for generating distant 20-bp tags, which have since been widely used in a number of studies. This method is based on 3 elimination phases. First it begins by considering all 20-mer sequences in the sample space, then it performs the elimination based on known hybridization problems, such as secondary structure, base composition, runs of single nucleotides, etc. Next, it uses pair-wise analyses to eliminate sequences that contain common 9-mers by assuming that long contiguous base-pairings (≥10 bp) is what largely causes cross-hybridization. The resulting 51,082 distinct sequences are predicted to have similar T_ms for perfect-hybrids within a min–max interval. Finally, the remaining sequences are passed through another filter to further eliminate those that are most likely to cross-hybridize, resulting in sets of different sizes—9,105, 2,643, 853, 170, and 42 tags—depending on threshold levels.

The utilization of sequence-similarity and T_m filters proposed in both studies can allow fast searches for target sets of sequences; however, they are often inclined to unnecessary removal of good candidates as pointed out in Xu et al., 2009a. Such T_m filters evaluate thermodynamic characteristics of probes individually, while ignoring potential interactions with other sequences. A recent case study has further highlighted similar concerns for other aforementioned design filters by evaluating the probe collections from 74 Affymetrix arrays (Tulpan, 2012). In contrast, the orthogonality measure X proposed here is influenced by all types of interactions across tag/antitag library, thus potentially giving a more practical handle for limiting cross-hybridization. Furthermore, X is a measure less sensitive to probe nucleotide contents than that imposed by T_m filters.

Given these considerations, we propose a method that uses metric X for probe interactions in order to reduce the melting temperatures of cross-hybrids as much as possible, while securing high melting temperatures of perfect-hybrids, thus allowing for a broad separation regime (where a reaction run at temperature above maximum cross-hybrid T_m and below minimum perfect-hybrid T_m will produce predominantly true probe–target duplexes, while most cross-hybrids remain melted). Our problem formulation naturally corresponds to sequence-similarity filter and T_m filter, since minimizing X also minimizes problematic sequence similarities by reducing cross-reactivity between probes and forces the perfect-hybrid T_ms to fall into narrow higher temperatures ranges. However, we still apply other probe design rules since they are widely accepted in literature for securing probe–target specificity (i.e., GC composition filter) and single and double nucleotide repetitive filters.

Designing orthogonal tags by simulated annealing optimization

When designing orthogonal tags (antitags), each with length L, the optimum set of N tags is the minimum point of the cost function over the N×L dimensional space S that contains 4^N×L constellation points—each representing a probe set. The cost function of every point gives the outcome value of orthogonality metric, X. The sets of unique tags form a subspace U⊂S. Each of the \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$ \frac {4^L!} {(4^L - N) !} $$ \end{document} points in U represents the type of probe set we are interested in; however, certain classes of points in U correspond to identical probe sets due to the permutation-based redundancy: the same set is referred by N! different points due to the different orders of N tags. Since only 1 point in every N! points in U is distinct and belongs to the subspace of distinct sets, D⊂S, it suffices to find the minimum point of a cost function in U∩D, that is, the set of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Big (\begin{matrix}4^L \\ N\end{matrix} \Big)$$ \end{document} points, in order to find the most mutually orthogonal set. To improve memory efficiency, we may pass on keeping track of every set that has been walked, thus ignoring the exceedingly rare multiple visits to the same set (at the cost of having to reanalyze it, should a repeat visit occur) and basing our optimization on the sets/points in U.

Using an iterative improvement strategy by doing a random walk in U leads the walk through a local minimum depending on the starting point that, however, may or may not be globally optimal. Since an iterative improvement always takes a path downhill (towards a more orthogonal set), it avoids transfers to higher points (less orthogonal sets). Yet, such points may potentially stand on the path toward the global minima and so a search method should be allowed to sometimes transfer to higher points. An approach based on simulated annealing optimization offers some appropriate schedules for allowing such hill-climbing transitions, thus avoiding being trapped in local optima. As employed in this study, the major benefit of the simulated annealing approach lies in the fact that the global optimum set may possibly be hidden behind some sequences that are excluded in the initial sequence pool or filtered by the individual constraints such as T_m filter based on the NN model used in other algorithms. Although these filters usually find problematic sequences, excluding them should be avoided as much as possible. That is, some points/sets may not be very orthogonal but can still act as a narrow gate on the way towards the global optimum set; thereby, visits to these points are essential when searching for global optimum and should not be neglected due to pool or sequence constraints in an optimization procedure. When compared with other optimization techniques, another benefit of simulated annealing is its greater scalability with increasing number of sequences.

The idea of simulated annealing for the combinatorial optimization was introduced by Kirkpatrick et al. (Kirkpatrick et al., 1983). In physics, the annealing process of a solid material starts with an initial temperature level, C, which is allowed to gradually decrease, eventually reaching thermal equilibrium. Based upon this analogy, the simulated annealing optimization approach used here starts with an arbitrary state that is a random set of unique tags. The state of the process is defined by the measure of orthogonality, X, of the current set. During each iteration, a tag is replaced by a randomly determined unique tag, which is called a “new state” and the set with the new tag is called a “neighboring set.” By replacing one tag at a time, the algorithm checks the orthogonality of every new state and decides whether or not to transfer to the neighboring set with a gradually decreasing preference.

In each step, the probability of transferring to the new neighboring set depends on the closeness of the new state, iteration/step number, k, and the annealing temperature constant, C, as: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*}P_T = \exp \left(\frac {- [X_ {new} - X] ^ +} {T_k} \right) , \tag {4} \end{align*} \end{document}

where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*}T_k = \frac {C} {\ln (1 + k)} \tag {5} \end{align*} \end{document}

Note that if a better or an identical score is achieved, the transfer is realized with probability 1 (notice that [X_new−X]⁺ function in P_T takes the maximum of 0 and the gain in score); if the score is worse – there is still a possibility of transferring depending on the closeness of the new state's score and the annealing temperature T_k—an inversely proportional logarithmic function of the iteration number, k. This decision of transferring allows for the “hill-climbing” moves that prevent the process from being stuck in a local minima. Notice that the probability of transferring P_T is logarithmically decreasing as the optimization continues, since T_k is a logarithmic function of k. Early steps of the process are also more unstable because of high T_k values so transitions from one state to another may be frequently observed in similarity to the physical annealing. Thus, the algorithm more likely allows for the hill-climbing moves in the early phases of optimization. These hill-climbing moves allow the process to walk over a very scattered subspace of U, increasing the possibility of finding a path to the global optima. Later, the process reaches an equilibrium level so that there remains no better neighboring set to which to transfer. This is the result of P_T being lowered to very low percentages, so that it rejects almost any hill climbing move and the optimization goes through a stable phase. In this later phase of optimization, the process reaches a local minimum, which is, with high probability, the global minimum in the coordinate space U that will also be the global minimum in S.

Most authors studying simulated annealing have concentrated on determining an ideal annealing schedule, basically defining an initial temperature C and a convergence condition to finish the optimization process (Hajek, 1988). In this work, we assumed that the annealing iterations are large enough until P_T drops to the threshold of 0.01. For a small set (tags<20) of 20-mers, most of the optimization runs are assumed to end up with global optimum solutions when C=2, since the same process with C=3 does not achieve any better result. For large libraries (tags>100) and/or libraries with longer sequences (>50-mers), however, it is expected that the process is more likely to reach the global optimum when C≥2. The details of simulated annealing optimization method are summarized below.

Summary of the algorithm

• Initialize the algorithm.

- Pick a random initial point in U: Define an initial random dataset set consisting only of unique tags.

- Generate the heat map for the set and compute the orthogonality score as X=max[T_m,cross(set)] – min[T_m,perfect(set)].

- Initialize the best point of the random walk in U.

set_best←set

X_best←X

• Iterate until convergence: P_T<threshold.

- Do a random walk to a new neighboring set set_new by replacing a random tag and compute the new score X_new after updating T_ms of perfect and cross-hybrids related to the replaced tags on the heat map.

- Decide on the transition: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$P_T = \exp \left(\frac {- [X_ {new} - X] ^ +} {T_k} \right)$$ \end{document} , where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$T_k = \frac {C} {\ln (1 + k)} $$ \end{document} .

* Transfer if P_T>U (0, 1), where U (0, 1) is a random number uniformly distributed on [0, 1].

set←set_new

X←X_new

* Update the best set if X<X_best.

set_best←set

X_best←X

- k←k+1

• Return set_best and X_best.

• Replace the tags that are not approved by probe design rules.

• Mark as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$set^V_{best}$$ \end{document} the probes in set_best that violate any of the auxiliary design rules (for example, demanding that GC composition must be above 35% or below 65%, or that the sequence contains no more than 4 single nucleotide repeats, or 3 double nucleotide repeats, or not contain CCCC in 5′ half of the probe, or not contain AAAA or GGGG anywhere in the probe, etc.).

• Re-optimize set_best by (mostly) searching the neighboring sets of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$set^V_{best}$$ \end{document} that comply with aforementioned rules:

* Initialize the starting point with {set_best, X_best}.

* Iterate the optimization procedure until convergence: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$set^V_{best} = \phi$$ \end{document} .

• Return set_best and X_best.

Materials and Methods

Complexity

The proposed algorithm computes the heat map of an initial set in O(N²) time by simulating all N×(2N+1) perfect and cross-hybrids. Later in each step, the heat map is updated for the replaced tag in O(N) time that requires the simulation of 4N−1 putative hybrids. If the heat map is known, the optimization problem reduces to polynomial time complexity, O(Nk), where k is the number of iterations. The proposed method is applicable to parallel processing (i.e., in each step, 4N−1 simulations can be distributed among available processors of a cluster for the speedup).

The T_m of each duplex is computed after generating the duplex's melting profile by UNAFold software. This dependency may represent a major computational speed bottleneck in practical implementation, since each step requires pipelining the results from UNAFold to continue the optimization process. However, UNAFold is a greater source of accuracy in various hybridization conditions since it is based on such underlying physical properties of tags as nucleic acid folding structures, homo-/hetero-dimer formations and melting profiles, etc. (Markham and Zuker, 2008).

UNAFold

The hybridization of each pair of tags and/or antitags is simulated by UNAFold and the corresponding melting profile is generated. Given the melting profile of a duplex, there are various ways of computing its melting temperature. In our optimizer software, we considered the T_m(C_p), the prediction of melting temperature based on the heat capacity. In case of multiple distinct peaks on the heat-capacity curve, the T_m(C_p) value corresponding to the largest temperature is assumed as the melting temperature (Markham and Zuker, 2005). We used the following parameters as standard in all T_m computations:

• Initial concentrations: uniform concentrations are selected as 10 μM for each tag.

• Salt conditions: Na⁺=25 mM, Mg⁺⁺=10 mM for every hybridization.

Orthogonal tag optimizer server

A user-specified set of DNA tags can be requested from the online implementation of the proposed method (www.columbia.edu/∼ae2321/TagApplet/starter.html).

Users can submit a job by defining several inputs for the dataset of interest:

• Initial concentrations (mol/L): The uniform strand concentration for each tag.

• Salt conditions (mol/L): Na⁺ and Mg⁺⁺ concentrations in the solution.

• Number of tags, N: A set of 1,000 tags can be optimized approximately in 3 weeks of processing on a single processor.

• Tag length, L: Flexible for different sizes.

• Number of simulations: Several optimizations can be run with different initial points and the best result will be presented.

• Optimization parameter, C: Annealing temperature constant to adjust optimization schedules.

• E-mail address: When the results are ready, user is notified via email.

Results

In (Oh et al., 2010), a universal TagModule Collection is introduced for parallel genetic analysis of microorganisms. This collection has 4,280 TagModules. All TagModules are identical except for two unique 20-mer DNA identifiers labeled as Uptag and Downtag. These identifier tags have been selected from the sets generated in Shoemaker's works (Shoemaker et al., 1996). Here we are interested in the orthogonality of these 8,560 Uptags and Downtags in comparison with the 20-mer tags optimized by our approach.

Recall that the expected orthogonality for a set of probes reduces as the number of probes in the set increases. Although the library of 8,560 DNA probes has certain orthogonality with respect to its size, it is not guaranteed for smaller sub-sets to have the same relative orthogonality. We evaluated the degree of interactions X among tags in TagModule collection by randomly determining a set of 100 tags and generating its heat map. By repeating this, 30 different random sets were drawn, and it is observed that in almost every set there is a cross-hybrid contamination, where the corresponding orthogonality scores range from X=−1 to X=55, with the average score of X_avg=9.9. Figures 2, 6, and 7 show the heat maps and distributions for several sets among 30 random sets: an average set with X=10, the best random set with slight orthogonality X=−1, and the most contaminated set with X=55, respectively. As can be seen, the subsets of TagModule collection contain highly cross-reactive probes and thereby, an optimization procedure should be applied to discover orthogonal solutions. We applied our method to perform this optimization task since it studies all mutual interactions among tags.

FIG. 6.

Heat map and histogram of the best random set, X=−1. Color images available online at www.liebertpub.com/nat

FIG. 7.

Heat map and histogram of the most contaminated random set, X=55. Color images available online at www.liebertpub.com/nat

By using simulated annealing optimization, we have elucidated the most mutually orthogonal subset of 100 sequences from the TagModule collection. That is, we have restricted the space of possible sets to those only consisting of given 8,650 probes and used an appropriate annealing schedule (C=2) for the optimization. We simulated 30 parallel runs and extracted the best data set, Collection Tags, corresponding to the lowest score of degree of interaction, X_Coll=−15, which is, with high probability, the global solution in TagModule collection for the given set size. Figure 4 shows that cross-hybrid T_ms are distributed from 0°C to 53°C, and perfect-hybrid T_ms are distributed from 68°C to 80°C. The improvement is obvious when compared to the orthogonality of random subsets (Figs. 2, 6, 7) of the TagModule collection.

For comparison, we have applied our approach to the same problem—using the same optimization schedule—with the only difference being that no quantitative filters are used other than eliminating identical or complementary tags and finally replacing those that are not approved by the given probe design rules. Each optimization is executed by drawing purely random DNA tags and performing the pair-wise analyses for each type of tag/antitag interaction by evaluating full thermodynamic properties of pairs, and replacing undesired probes that cause strong cross-hybridizations and/or weak perfect-hybridizations. Among the 30 optimized sets, the best one, Optimal Tags, with the lowest score of degree of interaction X_Opt=−13 is displayed in Fig. 3. In this set, cross-hybrid T_ms are distributed from 0°C to 55°C, and perfect-hybrid T_ms are distributed from 68°C to 81°C.

Note that our Optimal Tags set reaches the same degree of cross-interactions on median (41°C), and in both sets the perfect-hybrids are distributed narrowly in a similar range and reduce to the same lower bound (T_m=68°C). Although Collection Tags has slightly better orthogonality score than Optimal Tags, this can be explained by the faster convergence of algorithm resulting from the restriction of optimization to the TagModule probes. Consider further that it is very unlikely to elucidate the Optimal Tags from a given database such as TagModule collection or even from large random pools. That is, after several tests we observed that none of the probes in Optimal Tag Collection are found in randomly generated initial pools consisting of 1 million 20-mers. This may indicate the diminishing effect of restriction of the search to such initial pools. Notice that by using a pool of 1 million≃4¹⁰ distinct 20-mers one can only cover the sample space of a 10-mer sequence, which is equivalent to predefining half of the nucleotides randomly in each elucidated probe.

Discussion

In this work, we propose an approach for designing orthogonal sets of probe–target oligonucleotides. The method offers potentially enhanced specificity and reduced cross-reactivity of tag–antitag recognition because it does not rely on auxiliary constraints—such as sequence-similarity filters, T_m filters, restricted oligonucleotide databases or predefined initial pools—which may restrict other approaches to potentially suboptimal solutions. The proposed method offers an alternative way of determining orthogonal probes by searching an entire feasible set on the sequence space so that the elucidated set can optimally satisfy any predefined properties as far as possible. On the other hand, if it is necessary to rely on a given probe database/source sequences (e.g., when designing polymerase chain reaction primers), our approach can be used to find, with high probability, the global optimum solution for the given database (e.g., the most mutually orthogonal subset of a given size —Collection Tags—from the TagModule collection designed previously (Oh et al., 2010).

The key benefit of our approach is the additional degree of freedom it offers towards the selection of tag/antitag sequences, which reach the global optimum, while imposing a thermodynamic quality control feature that evaluates all probe and target interactions involved. This degree of freedom helps the algorithm search a very scattered sample space as well as deal with similar and near-complementary tags that have led to major drawbacks in TaT design techniques (Tulpan, 2012). The proposed approach employs the simulated annealing method in order to free the optimization process from being stuck in local minima, finding—with high probability—the global solution of a probe set with the desired properties. These properties can include various individual or pair-wise constraints and/or probe design rules imposed on the tags, such as, restricting the tags in order to exclude/contain restricted enzyme sites or other sequence patterns or motifs, eliminating tags that have consecutive base-pair matches, defining the range of allowed GC contents, etc.

Here, we used the difference between the minimal melting temperature of perfect oligonucleotide hybrids and the maximal melting temperature of any spurious self- or cross-hybrids as the optimization parameter. What makes melting temperature between interacting probes and targets a practically relevant quantity when measuring orthogonality is that T_m values can be both controlled in situ as well as robustly estimated by accounting for the physical properties of the tags. Note that this behavior of DNA hybridization cannot be predicted by only studying the nucleotide matches between sequence pairs. In this work, we have employed the software published by (Markham and Zuker, 2008) for generating the melting profiles of each putative hybrid. For a sophisticated analysis of hybridization behaviors, it combines a number of auxiliary programs that performs the free energy minimization and full partition function calculations and stochastic sampling in hybridized or folded sequences (Hofacker, 2003; Zuker, 2003; Ding et al., 2004). For the interacting molecules of a duplex, the competitions between all possible molecular species are taken into account, (i.e., a hetero-dimer duplex, 2 homo-dimers, and 2 foldings). The folding predictions are based on free energy minimization and full partition function calculations, and all possible competitions between the dimer formations and foldings of the individual strands are simulated. For each species, the partition function computations are performed over a range of temperatures with overall ensemble quantities being computed for ΔG, ΔH, ΔS, and heat capacity after combining the results (Markham and Zuker, 2008).

It should be noted, however, that these calculations might still fall short in accurately estimating T_m because the algorithms do not always detect some unusually stable structures in single strands (Zuker, 1989). Furthermore, the employed software is designed for simulating the hybridizations in solution. If the probes are being designed for microarrays, it is necessary to use the effective concentrations of the oligos that will be immobilized on the microarray surface (Markham and Zuker, 2008). That is, one needs to compute the surface phase concentration and solution volume per surface area and to calculate the effective concentrations (Zhang et al., 2005). This calculation may be complicated due to certain factors that cause the effective volume to differ from the calculated ratio of total volume to total surface area (Zhang et al., 2005). In surface hybridizations, another important factor is the diffusion effects that determine the rate of hybridization (i.e., the time needed to reach the equilibrium state in a hybridization cell) (Markham and Zuker, 2008). The rates of hybridization in microarray cells can vary depending on the binding site concentrations and the equilibrium binding constants of the duplexes being hybridized—for instance, cells with the most stable duplexes may take longer to reach the equilibrium state than may otherwise be expected (Livshits and Mirzabekov, 1996). Even with incubation times on the order of hours, some sequences may not reach equilibrium in their cells and so probe–target signal strengths may vary due to the differences in hybridization rates. Thus, it should be emphasized that a probe set being designed by the proposed method may deviate from the expected characteristics due to aforementioned issues.

Hybridization characteristics of an oligo duplex also depend on various environment factors (e.g., salt conditions, oligo concentrations, etc.). It is well known that the melting temperature of an oligonucleotide or polymer hybrid is a salt dependent metric, whereby T_m decreases with lowering of salt concentration (Schildkraut and Lifson, 1965). To account for with this, the salt correction formulae published by (SantaLucia, 1998) are used to refine the thermodynamic parameters ΔG, ΔH, and ΔS when generating the melting profiles. The employed software further allows for ranges of salt conditions: the sodium concentration [Na⁺] must be between 0.01 M and 1 M, and magnesium concentrations [Mg²⁺] must be lower than 0.1 M in default oligomer mode (Markham and Zuker, 2005). This can give flexibility for the T_m predictions when optimizing oligonucleotide set designs under specific conditions.

The proposed method is based on simulating every hybrid of an initial probe set to generate the initial heat map and then simulating the putative hybrids of the tags being replaced during the optimization. The number of possible interactions in a probe library grows exponentially as the number of tags to be designed increases, so generation of the initial heat map also increases computational costs. Although this can be mitigated by applying parallel processing, the dependence on the software that generates the melting profiles of hybrids poses another limitation. Thus, it may be necessary to define a realistic maximum for the size of the library that can be designed using the proposed method in each individual setting. In our tests, applying parallel processing in combination with the appropriate annealing schedules, the proposed method can optimize sets that contain tens of thousands of sequences, for example, 10,000 tags can be elucidated in 8 days of processing on a cluster of 1,000 central processing units.

Footnotes

Acknowledgments

XW and MS conceived of the project. AE, GJ, XW, and MS participated in the design of the method. AE performed the computer experiments; AE and MS contributed in the writing of the draft. All authors read and approved the final manuscript.

Author Disclosure Statement

No competing financial interests exist.

References

Allawi

H.T.

, SantaLucia

1997. Thermodynamics and NMR of internal G·T mismatches in DNA. Biochemistry, 36:10581–10594.

Ben-Dor

, Karp

, Schwikowski

, Yakhini

2000. Universal DNA tag systems: a combinatorial design scheme. J. Comput. Biol., 7:503–519.

de Carvalho

S.A.

, Rahmann

2006. Microarray layout as quadratic assignment problem. German Conference on Bioinformatics. Huson

, Kohlbacher

, Lupas

, Nieselt

, Zell

Lecture Notes in Informatics P-83, 11–20.

de Carvalho

S.A.

, Rahmann

2007. Improving the design of genechip arrays by combining placement and embedding. Comput. Syst. Bioinformatics Conf., 6:417–27.

Ding

, Chan

C.Y.

, Lawrence

C.E.

2004. Sfold web server for statistical folding and rational design of nucleic acids. Nucleic Acids Res., 32:135–141.

Hajek

1988. Cooling schedules for optimal annealing. Math. Oper. Res., 13:311–329.

Hannenhalli

, Hubbell

, Lipshutz

, Pevzner

2002. Combinatorial algorithms for design of DNA arrays. Chip Technology, volume 77 of Advances in Biochemical Engineering/Biotechnology. Hoheisel

et al. Springer: Berlin, 1–19.

Hofacker

I. L.

2003. The vienna RNA secondary structure server. Nucleic Acids Res., 31:3429–3431.

Kirkpatrick

, Gelatt

C.D.

, Vecchi

M.P.

1983. Optimization by simulated annealing. Science, 220:671–680.

10.

Livshits

, Mirzabekov

1996. Theoretical analysis of the kinetics of DNA hybridization with gel-immobilized oligonucleotides. Biophys. J., 71:2795–2801.

11.

Markham

N.R.

, Zuker

2005. DINAMelt web server for nucleic acid melting prediction. Nucleic Acids Res., 33:W577–W581.

12.

Markham

N.R.

, Zuker

2008. UNAFold: software for nucleic acid folding and hybridization. Keith

J. M.

, Walker

J. M.

Methods in Molecular Biology, 453 Bioinformatics. Humana Press: Totowa, NJ, 3–31.

13.

Mikulecky

, Feig

2006. Heat capacity changes associated with nucleic acid folding. Biopolymers, 82:38–58.

14.

, Fung

, Price

M.N.

, Dehal

P.S.

, Davis

R.W.

, Giaever

, Nislow

, Arkin

A.P.

, Deutschbauer

2010. A universal TagModule collection for parallel genetic analysis of microorganisms. Nucleic Acids Res., 38:e146.

15.

SantaLucia

1998. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci. U. S. A., 95:1460–1465.

16.

SantaLucia

Jr. , Hicks

2004. The thermodynamics of DNA structural motifs. Annu. Rev. Biophys. Biomol. Struct., 33:415–440.

17.

Schildkraut

, Lifson

1965. Dependence of the melting temperature of DNA on salt concentration. Biopolymers, 3:195–208.

18.

Shoemaker

D.D.

, Lashkari

D.A.

, Morris

, Mittmann

, Davis

R.W.

1996. Quantitative phenotypic analysis of yeast deletion mutants using a highly parallel molecular bar-coding strategy. Nat. Genet., 14:450–456.

19.

Strachan

, Read

A.P.

2010. Human Molecular Genetics. 4th. Garland Science: New York, 192–207.

20.

Tulpan

, Ghiggi

, Montemanni

2012. Computational sequence design techniques for DNA microarray technologies. Systemic Approaches in Bioinformatics and Computational Systems Biology. Lecca

, Tulpan

, K. R. IGI Global: Hershey, PA, 57–91.

21.

, Schlabach

, Hannon

, Elledge

2009a. Design of 240,000 orthogonal 25-mer DNA barcode probes. Proc. Natl. Acad. Sci. U. S. A., 106:2289–2294.

22.

, Schlabach

, Hannon

, Elledge

2009b. Implementation of the network elimination algorithm. http://elledgelab.bwh.harvard.edu/barcode/.

23.

Zhang

, Hammer

D.A.

, Graves

D.J.

2005. Competitive hybridization kinetics reveals unexpected behavior patterns. Biophys. J., 89:2950–2959.

24.

Zuker

1989. On finding all suboptimal foldings of an RNA molecule. Science, 244:48–52.

25.

Zuker

2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res., 31:3406–3415.