A Simple Approach to the Reconstruction of a Set of Points from the Multiset of n 2 Pairwise Distances in n 2 Steps for the Sequencing Problem: II. Algorithm
Available accessResearch articleFirst published online December, 2016
A Simple Approach to the Reconstruction of a Set of Points from the Multiset of n 2 Pairwise Distances in n 2 Steps for the Sequencing Problem: II. Algorithm
A new uniform algorithm based on sequential removal of redundancy from inputs is proposed to solve the turnpike and beltway problems. For error-free inputs that simulate experimental data with high accuracy, the size of inputs decreases from \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O ( {n^2} )$$
\end{document} to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O ( n )$$
\end{document}, which permits one to eliminate exhaustive search almost completely and reconstruct sequences in \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${n^2}$$
\end{document} steps. Computational experiments show high efficiency of the algorithm for both the turnpike and beltway cases, with the reconstruction time for sequences of lengths up to several thousand elements being within 1 second on a modern PC.
1. Introduction
In bioinformatics, the problems of reconstruction of points from a multiset of pairwise distances are related to technologies for restriction mapping and de novo sequencing.
The problem of constructing restriction maps, that is, identifying all the locations of restriction sites on a genome, was critical in early sequencing projects of the late 20th century. There are three variants of the computational problems arising in the context of technologies for restriction mapping: the partial digestion problem (PDP), where the experiment provides data about all pairwise distances between restriction sites generated by one enzyme; the double digestion problem (DDP), which employs the experimental data formed by DNA digestion with two enzymes; and the simplified partial digestion problem (SPDP), where one enzyme is applied to two sets of DNA clones.
Different combinatorial approaches and computer programs have been suggested for PDP (Waterman and Griggs, 1986; Allison and Yee, 1988; Krawczak, 1988; Skiena and Sundaram, 1994), DDP (Fitch et al., 1983; Goldstein and Waterman, 1987; Ho et al., 1990; Pevzner, 1995; Sur-Kolay et al., 2005, 2009), and SPDP (Blazewicz et al., 2001, 2007, 2011). However, despite significant efforts, the problems still cannot be considered resolved. All of the algorithms encounter significant difficulties when applied to constructing physical maps with more than seven or eight sites for each restriction enzyme.
The PDP for linear DNA is also known as the turnpike problem in computer science (Weiss, 1992). The turnpike problem is algorithmically the simplest case. It can be resolved by different variants of brute force algorithm with time complexity from \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O ( \mid X{ \mid ^{n - 2}} )$$
\end{document} down to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O ( {n^{2n - 4}} )$$
\end{document} (Neil and Pevzner, 2004). By using the method proposed by Rosenblatt and Seymour (1982), Lemke succeeded in converting an integer instance of the turnpike problem to one of factoring a polynomial constructed from \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X$$
\end{document} in pseudopolynomial time (Lemke and Werman, 1988). Skiena et al. (1990) proposed a backtracking algorithm. They proved that in general the procedure ran in \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O{ ( 2^n} \log n )$$
\end{document} time, but it usually ran in \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O ( {n^2} \log n )$$
\end{document} time in the best or average case, and the instances when it took more time were rare. A class of examples for which this algorithm took an exponential time was given by Zhang (1982). A predictive backtracking algorithm, which selected \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n$$
\end{document} elements of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X$$
\end{document} such that the pairwise distance multiset of these points plus zero equaled \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X$$
\end{document} and allowed a lot of backtracks to be avoided, was proposed by Nadimi et al. (2011).
Dakic (2000) showed that certain classes of inputs of the turnpike problem were poly-time solvable. For example, the cases given by Zhang, which take exponential time for the backtracking algorithm, are poly-solvable by the 1-th semidefinite relaxations proposed by Lovasz and Schrijver (1990). In addition, the inputs that are solvable in polynomial time not only satisfy certain specified properties but also have one unique solution, and the multiset \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X$$
\end{document} has no repeats.
De novo sequencing, that is, the direct reconstruction of the primary sequences of peptide chains from mass spectra, has been in use since the early 2000s (Johnson and Taylor, 2002; Ma et al., 2003; Frank and Pevzner, 2005). Its major advantage is that it can be applied when no genomic information is available. Thus, de novo sequencing is applicable to proteins bearing polymorphisms (Tanner et al., 2005) or to artificially modified peptide chains. The problem of sequencing a linear peptide from a complete and error-free spectrum corresponds to PDP or the turnpike problem, and cyclic peptide sequencing problem (CPSP) corresponds to the beltway problem (Lemke et al., 2003).
Many algorithms were proposed for de novo sequencing; see the review of (Allmer, 2011). One of the first methods used for CPSP was the brute force approach (Sakurai et al., 1984; Hamm et al., 1986). Currently, the most popular approaches employ the transformation of a peak list into a graph, whose nodes are connected by edges if their m/z values differ by the mass of an amino acid (Lu and Chen, 2003; Frank et al., 2007; Chi et al., 2010). Other approaches involve dynamic programming (Chen et al., 2001; Mo et al., 2007), divide and conquer method (Zhang, 2004), and hidden Markov models (Fischer et al., 2005), etc. There are also approaches that use lossy data conversion of spectra to reveal some important information. Pevzner et al. (2000) introduced spectral convolution and spectral alignment for revealing similarities between related, but different, spectra. Bandeira et al. (2008) used spectral autoconvolution and autoalignment to reveal key features of spectra for the identification of a peptide.
Despite the enormous efforts made in recent years, the turnpike and beltway problems are still considered open. A new simple approach to reducing overhead costs for both the problems, based on the sequential removal of redundant information from inputs, is proposed in the first part of this work (see A simple approach to the reconstruction of a set of points from the multiset of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${n^2}$$
\end{document} pairwise distances in \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${n^2}$$
\end{document} steps for the sequencing problem: I. Theory.). For error-free inputs that simulate de novo sequencing spectra with high accuracy, up to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${10^{ - 3}}$$
\end{document} Da, the size of inputs decreases drastically, from \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${n^2}$$
\end{document} to O(n), which permits one to eliminate exhaustive search from algorithms almost completely and reconstruct sequences in numbers of steps in direct ratio to the input size, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${n^2}$$
\end{document}. In this part of the work, the approach proposed is implemented in a universal algorithm for solving both the turnpike and beltway problems. In addition, the benefits and drawbacks of the algorithm are shown by comparing it with the well-known backtracking algorithm for the turnpike problem and with the graph network algorithm (Fomin, 2015) proposed for the beltway problem.
2. Algorithm
In solving the problem, we use the original set, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X$$
\end{document}, and a series of subsets obtained by the reduction procedures: subset of unique elements, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_u} \in \Delta X ,$$
\end{document} subset of symmetrical pairs, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_s} \in \Delta {X_u} ,$$
\end{document} and subsets obtained by convolution on partial solution of rank k, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_k} = ( \Delta {X_u}*{X_k} ) , 3 \le k \le n$$
\end{document}. The original set, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X$$
\end{document}, is required for proving the correctness of a solution obtained.
Organize the search for a solution in such a way that the reduced subsets form a chain of sequentially nested sets: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_n} \subset \Delta {X_{n - 1}} \subset .. \subset \Delta {X_2} \subset \Delta {X_s} \subset \Delta {X_u} \subset \Delta X$$
\end{document}, that is, elements of an ensuing set are sought among elements of the preceding set. Since (1) reduction procedures retain at least one of the solutions, (2) each subsequent reduction step operates with a finite number of elements, and (3) set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_n}$$
\end{document}
is a solution, then the algorithm includes a finite number of operations.
Obviously, the algorithm includes exhaustive search because the reduced subsets contain spurious matches. In case of an error in construction of a partial solution of the rank \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$k$$
\end{document}, preservation of the solution \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$X$$
\end{document} in the chain of reduced sets \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_n} \subset \Delta {X_{n - 1}} \subset ..\subset \Delta {X_k}$$
\end{document} is no more ensured. Thus, an algorithm should retain the option of return to lower rank partial solutions found previously; that is, the algorithm should be recursive.
The proposed algorithm includes the following preparatory steps associated with the construction of the initial set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_2}$$
\end{document} for recursion start:
(1) Construction of a set of unique elements \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_u}$$
\end{document} by removal of all duplicates from \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X$$
\end{document};
(2) Construction of set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_s}$$
\end{document} whose elements combine into pairs symmetrical in the range \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$[ 0 , \Delta {x_{max}} ]$$
\end{document} (turnpike case) or pairs symmetrical in the ranges, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$[ 0 , \Delta {x_ \gamma } ]$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$[ \Delta {x_ \gamma } , \mid \Delta X \mid ]$$
\end{document}, where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {x_ \gamma } \in \Delta X$$
\end{document} is a random nonzero element (beltway case);
(3) Choice of the points, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$( 0 , \Delta {x_{min}} )$$
\end{document} (turnpike case) or \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$( 0 , \Delta {x_ \gamma } )$$
\end{document} (beltway case), as elements of a rank 2 partial solution and construction of set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_2}$$
\end{document} by convolution, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_2} = ( \Delta {X_u}*{X_2} )$$
\end{document};
(4) Elimination of elements of set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_2}$$
\end{document} that do not enter set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_s}$$
\end{document}, that is, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_2} \leftarrow \Delta {X_2} \cap \Delta {X_s}$$
\end{document};.
The key step of the algorithm is recursive. It includes transition from set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_k}$$
\end{document} to set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_{k + 1}}$$
\end{document}. The algorithm supposes that the sets, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_u}$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_s}$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_2}$$
\end{document}, .., \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_n}$$
\end{document}, are ordered. The pseudocode of this algorithm step is as follows:
Construction of subset ΔXk+1 ← ΔXk
Require:n: Number of elements in solution ΔX
Require: ΔXu: Set of unique ΔX elements
Require:lmin,lmax: Minimal and maximal L elements
Require:Xk = {x1,x2,..,xk}: Partial solution of rank k
Require: ΔXk: Result of the convolution of Xk with ΔXu
functionrecursive step(Xk,ΔXk)
if |Xk| = = nthen
ifunique control(Xn) then printXn
end if
Return
end if
for all Δxin [xk + lmin,xk + lmax]∈ΔXkdo
if Δx − xk∈ΔXuthen continue
end if
Xk+1 ←{Xk,Δx}
ΔXk+1 ← ΔXk∩ (ΔXu * Xk+1)
if ΔXk+1≠∅ thenrecursive step(Xk+1,ΔXk+1)
end if
end for
end function
Description the algorithm in more detail: The construction of a sequence \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${X_k}$$
\end{document} of length \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$k = n$$
\end{document} is followed by check for the uniqueness of the resulting solution, saving and output of the unique solution, and return to previous levels of the recursion to seek other possible solutions. At each recursion step, not the entire ordered set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_k}$$
\end{document} is viewed, but only its subrange, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$[ {x_k} + {l_{min}} , {x_k} + {l_{max}} ] ,$$
\end{document} which contains elements closest to the end of the partially reconstructed solution \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${X_k}$$
\end{document}. Note that the condition \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta x - {x_k} \notin \Delta {X_u}$$
\end{document} may be replaced by the more stringent condition \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta x - {x_k} \notin L$$
\end{document}, which allows greater efficiency. The next candidate \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta x$$
\end{document} is appended to the sequence \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${X_k}$$
\end{document}, and the resulting set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${X_{k + 1}}$$
\end{document} is utilized for the convolution \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$( \Delta {X_u}*{X_{k + 1}} )$$
\end{document}. The convolution result is then filtered through the intersection with set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_k}$$
\end{document} to remove some spurious points. If the resulting set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_{k + 1}}$$
\end{document} is nonvoid, it is assumed that the current candidate \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta x$$
\end{document} has withstood the check, and construction of a partial solution of the next rank can be attempted by recursive call of the same function. If no candidate withstands the check, it means that an error was made with a candidate of the preceding level, and the function returns to the preceding recursion level.
3. Results
The algorithm is implemented in C++ and is developed on a personal computer with a Core i7 processor Intel 3.5 GHz running Linux Ubuntu 14.04. To show the benefits and drawbacks of the algorithm, comparisons with Skiena's backtracking algorithm (turnpike case) and with the graph network algorithm (beltway case) are presented. We ran the algorithms with 300 randomly generated instances for each plot. In addition, for the turnpike case, we generated the Zhang instances, where the backtracking algorithm takes exponential time.
The algorithm efficiency was tested with the following data models to simulate experimental data with different accuracy degrees.
• Model \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_a}$$
\end{document} is built on the set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_a}$$
\end{document} = {50, 100, 150, 200}. It can be regarded as a model with extremely poor spectral resolution, accurate to 50 Da, in peptide sequencing with the following presentation of amino acid residue weights: {57, 71} \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\to$$
\end{document} 50; {87, 97, 99, 101, 103, 113, 114, 115} \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\to$$
\end{document} 100, {128, 129, 131, 137, 147, 156, 163} \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\to$$
\end{document} 150, {186} \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\to$$
\end{document} 200.
• A model \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_b}$$
\end{document} with spectral resolution, accurate to 1 Da, involves the set {57, 71, 87, 97, 99, 101, 103, 113, 114, 115, 128, 129, 131, 137, 147, 156, 163, 186}, whose elements equal molecular weights of 18 amino acid residues.
• A model \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_c}$$
\end{document} is built on the set {57.021, 71.037, 87.032, 97.052, 99.068, 101.047, 103.009, 113.084, 114.042, 115.026, 128.058, 115.026, 128.058, 129.042, 131.040, 137.058, 147.068, 156.101, 163.063, 186.079}, allowing simulation of spectral data with high accuracy to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${10^{ - 3}}$$
\end{document} Da.
For each model, random sequences \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$X = \{ 0 , {x_1} = {l_{{j_1}}} , {x_2} = {x_1} + {l_{{j_2}}} , .. , {x_{n - 1}} = {x_{n - 2}} + {l_{{j_{n - 1}}}} \} $$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mid X \mid = {x_{n - 1}} + {l_{{j_n}}}$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${l_{{j_k}}} \in L$$
\end{document} of lengths from 10 up to 6000 elements were generated and sets of pairwise distances \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X$$
\end{document} were obtained. Figure 1 shows the dependence of the time \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$T$$
\end{document} required to obtain the solution on the size \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n$$
\end{document} of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$X$$
\end{document} generated on sets \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$L = \{ {L_a} , {L_b} , {L_c} \} $$
\end{document}. The Y-axis is logarithmic in all diagrams. Use of the logarithmic scale enables us to distinguish clearly between the exponential (solid curve) and polynomial (dashed curve) dependences on plotting data. In the first case, the plot is linear, but in the second case, its shape is logarithmic. Data for the plots were prepared by the following algorithms: the algorithm proposed here (reduce), the backtracking algorithm (back), and the graph network algorithm (graph).
Time \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$T$$
\end{document} of reconstruction of sequences versus their lengths \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n$$
\end{document} for the sets \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$L = \{ {L_a} , {L_b} , {L_c} \} $$
\end{document} for the turnpike (left) and beltway (right) cases. The inset (right, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_a}$$
\end{document}) shows the variation of the number of independent solutions.
Turnpike case
Compare the proposed algorithm with the backtracking algorithm for this case:
• The model\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_a}$$
\end{document}. For the set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_a}$$
\end{document}, the backtracking algorithm is the best choice because it runs in polynomial time for all \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X$$
\end{document}, as expected. In the Zhang instances, this algorithm runs in exponential time. The slope of the regression line combining the parameters, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mathop { \log } \nolimits_{10} T$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n = \sqrt {2 \mid \Delta X \mid }$$
\end{document}, is 0.079. The proposed algorithm runs in exponential time, with the regression slope equaling 0.34, that is, its performance is the worst.
• The model\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_b}$$
\end{document}. For the set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_b}$$
\end{document}, in the range \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$0 \le n \le 200$$
\end{document}, the new algorithm runs in polynomial time and shows efficiency similar to the backtracking algorithm. The function that best approximates the data plot is a power function \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${n^k}$$
\end{document} with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$k = 1.9$$
\end{document}. However, for sequences longer than 200 elements, the performance of the new algorithm worsens sharply, and it begins to run in exponential time.
• The model\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_c}$$
\end{document}. For the set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_c}$$
\end{document}, both the algorithms run in polynomial time for sequences of lengths up to 6000 elements. The degree \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$k$$
\end{document} of the power function that best approximates data is \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$k = 2.1$$
\end{document}.
Beltway case
In this study, we compare the new algorithm with the graph network algorithm (Fomin, 2015). We present the data that were calculated by two modifications of the algorithm proposed: with (reduce+) and without (reduce−) knowledge of the basic elements of the set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$L$$
\end{document}.
• The model\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_a}$$
\end{document}. For the set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_a}$$
\end{document}, both the algorithms run in exponential time with the same efficiency. The slope of the regression line combining the parameters, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mathop { \log } \nolimits_{10} T$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n = \sqrt { \mid \Delta X \mid }$$
\end{document}, is 0.47, which is more than that for the turnpike case. The main cause of this is the exponential growth in the number of unique solutions (see inset in Fig. 1-\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_a}$$
\end{document}(right)).
• The model\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_b}$$
\end{document}. The numbers of unique solutions found for this set are usually unity, and exceptions are exponentially rare. Specifically, exceptions are typical of sequences consisting only of elements, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\{ 57 , 114 \} \in {L_b}$$
\end{document}, because they are related to each other in the same way as elements, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\{ 50 , 100 \} \in {L_a}$$
\end{document}. In this case, all versions (reduce+, reduce−, and graph) run in exponential time with very different efficiencies. The reduce + version runs 5 times faster than reduce− and almost 200 times faster than graph. The slopes of regression lines for all the plots are close, from 0.037 to 0.039.
• The model\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_c}$$
\end{document}. For the set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_c}$$
\end{document}, all the algorithms run in polynomial time, with the reduce+ and reduce− versions having the same efficiency, far exceeding that of graph. The functions that best approximate the plots are power functions \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${n^k}$$
\end{document} with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$k = 2.2$$
\end{document} (with the coefficient of determination \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${R^2} = 0.999$$
\end{document}) for the reduce+ and reduce− plots and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$k = 4.9$$
\end{document} for the graph plot. It is worth noting that due to the shortage of physical memory, the graph network algorithm cannot restore sequences longer than 400.
4. Discussion
Figure 2 shows the mean numbers of backtracks \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${n_b}$$
\end{document} generated by the new algorithm for the generating sets \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$L = \{ {L_b} , L_c^{0.1} , L_c^{0.01} , L_c^{0.001} \} $$
\end{document} simulating spectral data with different accuracies: to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$1 , 0.1 , 0.01$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$0.001$$
\end{document} Da, respectively. The plots are presented for both the turnpike (left) and beltway (right) cases. The X-axis is logarithmic in all diagrams.
The mean numbers of backtracks \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${n_b}$$
\end{document} in our algorithm depending on sequence length \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n$$
\end{document} and set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$L = \{ {L_b} , L_c^{0.1} , L_c^{0.01} , L_c^{0.001} \} $$
\end{document} for turnpike (left) and beltway (right) cases. For all instances, the regression lines correlating \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\log {n_b}$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n$$
\end{document} are showed.
It is apparent that the numbers of backtracks in all the sets \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$L$$
\end{document} grow with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n$$
\end{document} exponentially, but the slope of the regression line correlating \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\log {n_b}$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n$$
\end{document} goes down with the increase in data accuracy. The slope of the regression line decreases from \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$1.1 \cdot {10^{ - 2}}$$
\end{document} down to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$8.5 \cdot {10^{ - 5}}$$
\end{document} and from \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$4.9 \cdot {10^{ - 2}}$$
\end{document} down to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$3.3 \cdot {10^{ - 4}}$$
\end{document} for turnpike and beltway cases, respectively. Thus, for the data with high accuracy, the mean numbers of backtracks drop dramatically, down to almost null.
In this way, the main factor determining the exponential growth of computational time, namely the processing of the exponentially growing number of backtracks, is eliminated. The small number of backtracks also implies that convolution procedures permit one to reach a high degree of redundancy elimination. For set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$L_c^{0.001}$$
\end{document}, the number of convolution procedures sufficient for complete purification of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X$$
\end{document} varies from 3 to 6. The consequence is that even a partial solution as short as three elements entirely purifies the pairwise distance set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X$$
\end{document} from redundancy. It is pertinent to note that such short partial solutions can be obtained by mere overview of the data.
Because the efficiency of sequence reconstruction depends on data accuracy, a breakthrough in this area is likely to be driven by the development of experimental techniques. Let us consider some experimental errors that deteriorate data accuracy. There are several types of errors, which can affect the input data for the digestion method (Dix and Kieronska, 1988; Inglehart and Nelson, 1994; Wright et al., 1994). Measurement errors, which are very difficult to avoid, are caused by the imprecise measurement of the lengths of the cut fragments. According to Marra et al. (1997), the relative deviation of 5% from the true fragment length can be guaranteed in biochemical experiments for fragments with lengths in the range from 600 to 12,000 base pairs. This low data accuracy means that the experimental data of the digestion method can be described by a data model intermediate between models, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_a}$$
\end{document} (relative accuracy 25%) and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_b}$$
\end{document} (0.5%). In our opinion, it is hardly likely that the accuracy of biochemical experiments can be improved significantly. Therefore, the cyclic PDP is unlikely to be solved in the near future. It should also be noted that the search for poly-time algorithms for all digestion problems is more of academic than practical interest because current digest approaches are becoming less popular with the development of cheaper sequencing techniques.
The situation with de novo sequencing is different. The accuracy that can be described by a model such as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_c}$$
\end{document} is provided by modern mass spectrometers. For example, the resolution, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$m / \delta m{ = 5*10^5}$$
\end{document}, can be readily obtained on an ion cyclotron resonance spectrometer with masses about 500 Da. Thereby, measurements of ion masses can be accurate to the fourth or fifth decimal place (Scigelova et al., 2011). So, the CPSP is likely to be solved in the immediate future.
5. Conclusions
Summarize both parts of the work. With regard to the results of reductive operations—deduplication \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X \to \Delta {X_u}$$
\end{document}, symmetrization \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X \to \Delta {X_s}$$
\end{document}, and convolution \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X \to \Delta {X_k}$$
\end{document}—the plurality \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Omega$$
\end{document} of all bounded instances \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X$$
\end{document} of the turnpike and beltway problems, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mid \Delta X \mid \le C < \infty$$
\end{document}, can be nonstrictly divided into the following three subpluralities:
• \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \Omega _a}$$
\end{document}: the numbers of elements in the sets \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_u} , \Delta {X_s} , \Delta {X_k}$$
\end{document} are approximately the same and different from the lower and upper limit values of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mid X \mid$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mid \Delta X \mid$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mid \Delta X \mid > \mid \Delta {X_u} \mid \simeq \mid \Delta {X_s} \mid \simeq \mid \Delta {X_k} \mid > \mid X \mid$$
\end{document};
• \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \Omega _b}$$
\end{document}: the numbers of elements in all the sets differ from each other, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mid \Delta X \mid > \mid \Delta {X_u} \mid > \mid \Delta {X_s} \mid > \mid \Delta {X_k} \mid > \mid X \mid$$
\end{document};
• \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \Omega _c}$$
\end{document} (opposite to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \Omega _a}$$
\end{document}): the numbers of elements in the sets \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta {X_u} , \Delta {X_s} , \Delta {X_k}$$
\end{document} differ from each other and almost coincide with the limits, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mid \Delta {X_u} \mid \simeq \mid \Delta X \mid > \mid \Delta {X_s} \mid > \; \mid \Delta {X_k} \mid \simeq \mid X \mid$$
\end{document}.
With regard to the beltway problem, the instances \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X \in { \Omega _a}$$
\end{document}
have an exponentially large set of solutions and therefore they are unsolvable in polynomial time; the instances \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X \in { \Omega _b}$$
\end{document} have usually one solution, but they are also unsolvable in polynomial time; and, finally the instances \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Delta X \in { \Omega _c}$$
\end{document}, which have one solution each and contrary to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \Omega _a}$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \Omega _b} ,$$
\end{document} can be resolved in polynomial time.
As demonstrated above, depending on accuracy, the complete and error-free instance of CPSP experimental data can be classified into one of three subpluralities, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \Omega _a}$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \Omega _b}$$
\end{document}, or \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \Omega _c}$$
\end{document}, with different expectations concerning the time of reconstruction.
Mass spectra for CPSP with low resolutions contain intense strong signals determined by numerous overlapping indistinct spectrum signals. Hence, the main feature of such data is a high level of duplication. As shown above, data with high duplication levels (model \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_a}$$
\end{document}) have a great number of solutions for the beltway case, and this number exponentially increases with sequence length. For such data, spectrum purification from redundancy is inefficient, sequence reconstruction is ambiguous, and the computational cost is exponential.
If the mass spectrum resolution is sufficient to distinguish ions differing in weight by 1 Da, amino acid residue weights can be described by integer numbers (model \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L_b}$$
\end{document}). In this case, purification from redundancy efficiently reduces the decision tree and allows design of algorithms for the beltway case to be fairly efficient with sequences up to 200–250 elements in length.
In high-resolution spectra, strong signals are split into a number of single ones, and duplication disappears. As shown above, at the resolution of 0.001 Da, sequences of lengths up to 2000 elements can be reconstructed in \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${n^2}$$
\end{document} steps.
To sum up, it is reasonable to expect that the ongoing progress in bioinformational approaches to peptide sequencing with use of tandem mass spectrometry can be achieved with transition to processing of high and ultrahigh-resolution spectra. Paradoxically, the increase in data volume simplifies rather than complicates the search for solution. Just for such spectra, sequences can be uniquely reconstructed with the computational cost determined by the second-power polynomial \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$T \sim c{n^2}.$$
\end{document}
Footnotes
Acknowledgment
This study was supported by the Russian Science Foundation, project 14-24-00123.
Author Disclosure Statement
No competing financial interests exist.
References
1.
AllisonL., and YeeC.1988. Restriction site mapping is in separation theory. Comput. Appl. Biosci. 8, 97–101.
2.
AllmerJ.2011. Algorithms for the de novo sequencing of peptides from tandem mass spectra. Expert Rev. Proteomics, 8, 645–657.
3.
BandeiraN., NgJ., MeluzziD., et al.2008. De novo sequencing of nonribosomal peptides, 181–195. In Research in Computational Molecular Biology, volume 4955 of Lecture Notes in Computer Science Volume. Springer Berlin Heidelberg.
4.
BlazewiczJ., BurkeE., KasprzakM., et al.2007. The Simplified partial digest problem: Enumerative and dynamic programming algorithms. IEEE/ACM Trans. Comput. Biol. Bioinf. 4, 668–680.
5.
BlazewiczJ., BurkeE., KasprzakM., et al.2011. The Simplified Partial Digest Problem: Approximation and a graph-theoretic model. Eur. J. Operat. Res. 208, 142–152.
6.
BlazewiczJ., FormanowiczP., KasprzakM., et al.2001. Construction of DNA restriction maps based on a simplified experiment. Bioinformatics, 17, 398–404.
7.
ChenT., KaoM., TepelM., et al.2001. A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry. J. Comput. Biol. 8, 325–337.
8.
ChiH., SunR., YangB., et al.2010. pNovo: De novo peptide sequencing and identification using HCD spectra. J. Proteome Res. 9, 2713–2724.
9.
DakicT.2000. On the Turnpike problem [Ph.D. thesis]. Simon Fraser University. B.C., Canada.
10.
DixT., and KieronskaD.1988. Errors between sites in restriction site mapping. Comput. Appl. Biosci. 4, 117–123.
11.
FischerB., RothV., RoosF., et al.2005. NovoHMM: A hidden Markov model for de novo peptide sequencing. Anal. Chem. 77, 7265–7273.
12.
FitchW., SmithT., and RalphW.1983. Mapping the order of DNA restriction fragments. Gene, 22, 19–29.
13.
FominE.2015. Reconstruction of sequence from its circular partial sums for cyclopeptide sequencing problem. J. Bioinform. Comput. Biol. 13, 1540008.
14.
FrankA., and PevznerP.2005. PepNovo: De novo peptide sequencing via probabilistic network modeling. Anal. Chem. 77, 964–973.
15.
FrankA., SavitskiM., NielsenM., et al.2007. De novo peptide sequencing and identification with precision mass spectrometry. J. Proteome Res. 6, 114–123.
16.
GoldsteinL., and WatermanM.1987. Mapping DNA by stochastic relaxation. Adv. Appl. Math. 8, 194–207.
HoS., AllisonL., and YeeC.1990. Restriction site mapping for three or more enzymes. Comput. Appl. Biosci., 6, 195–204.
19.
InglehartJ., and NelsonP.1994. On the limitations of automated restriction mapping. Comput. Appl. Biosci. 10, 249–261.
20.
JohnsonR., and TaylorJ.2002. Searching sequence databases via de novo peptide sequencing by tandem mass spectrometry. Mol. Biotechnol. 22, 301–315.
21.
KrawczakM.1988. Algorithms for the restriction site mapping of DNA molecules. Proc. Natl. Acad. Sci. USA. 85, 7298–7301.
22.
LemkeP., SkienaS., and SmithW.2003. Reconstructing sets from interpoint distances. Discr. Comput. Geom. Alg. Comb. 25, 597–631.
23.
LemkeP., and WermanM.1988. On the complexity of inverting the autocorrelation function of a finite integer sequence, and the problem of locating n points on a line, given the \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\left( {n \atop 2} \right)$$
\end{document} unlabelled distances between them. Technical Report 453, Institute for Mathematics and its Application IMA.
24.
LovaszL., and SchrijverA.1990. Cones of matrices and set-functions and 0–1 optimization. SIAM J. Optim. 1, 166–190.
25.
LuB., and ChenT.2003. A suboptimal algorithm for de novo peptide sequencing via tandem mass spectrometry. J. Comput. Biol. 10, 1–12.
26.
MaB., ZhangK., HendrieC., LiangC., LiM., Doherty-KirbyA., and LajoieG.2003. PEAKS: Powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun. Mass Spectrom. 17, 2337–2342.
27.
MarraM., KucabaT., DietrichN., et al.1997. High throughput fingerprint analysis of large-insert clones. Genome Res. 7, 1072–1084.
28.
MoL., DuttaD., WanY., and ChenT.2007. MSNovo: A dynamic programming algorithm for de novo peptide sequencing via tandem mass spectrometry. Anal. Chem. 8, 325–337.
29.
NadimiR., FathabadiH., and GanjtabeshM.2011. A fast algorithm for the partial digest problem. Jpn. J. Indust. Appl. Math. 28, 315–325.
30.
NeilC., and PevznerP.2004. An Introduction to Bioinformatics Algorithms (Computational Molecular Biology). MIT Press, England.
31.
PevznerP.1995. DNA physical mapping and alternating Eulerian cycles in colored graphs. Algorithmica, 13, 77–105.
32.
PevznerP., DancikV., and TangC.2000. Mutation-tolerant protein identification by mass spectrometry. J. Comput. Biol. 7, 777–787.
33.
RosenblattJ., and SeymourP.1982. The structure of homometric sets. SIAM J. Algebra Discr. Methods, 3, 343–350.
34.
SakuraiT., MatsuoT., MatsudaH., et al.1984. PAAS 3: A computer program to determine probable sequence of peptides from mass spectrometric data. Biol. Mass Spectrom. 11, 396–399.
35.
ScigelovaM., HornshawM., GiannakopulosA., and MakarovA.2011. A. Fourier transform mass spectrometry. Mol. Cell. Proteom. 10, M111.009431–009M111.009431.
36.
SkienaS., SmithW., and LemkeP.1990. Reconstructing sets from interpoint distances, 332–339. In Proceedings of the Sixth ACM Symposium on Computational Geometry. Berkeley, CA. ACM, New York, NY.
37.
SkienaS., and SundaramG.1994. A partial digest approach to restriction site mapping. Bull. Math. Biol. 56, 275–294.
38.
Sur-KolayS., BanerjeeS., MukhopadhyayaS., et al.2005. Genetic algorithm for double digest problem, 623–629. In PalS., BandyopadhyayS., and BiswasS., eds., Pattern Recognition and Machine Intelligence, volume 3776 of Lecture Notes in Computer Science Volume. Springer Berlin Heidelberg.
39.
Sur-KolayS., BanerjeeS., MukhopadhyayaS., et al.2009. The double digest problem: Finding all solutions. Int. J. Bioinf. Res. Appl. 5, 570–592.
40.
TannerS., ShuH., FrankA., et al.2005. InsPecT: Identification of posttranslationally modified peptides from tandem mass spectra. Anal. Chem. 77, 4626–4639.
41.
WatermanM., and GriggsJ.1986. Interval graphs and maps of DNA. Bull. Math. Biol. 48, 189–195.
42.
WeissM.1994. Data Structures and Algorithm Analysis. The Benjamin/Cummings Publishing Company, Redwood City.
43.
WrightL., LichterJ., ReinitzJ., et al.1994. Computer-assisted restriction mapping: An integrated approach to handling experimental uncertainty. Comput. Appl. Biosci. 10, 435–442.
44.
ZhangZ.1982. An exponential example for a partial digest mapping algorithm. J. Comp. Biol. 1, 235–240.
45.
ZhangZ.2004. De novo peptide sequencing based on a divide-and-conquer algorithm and peptide tandem spectrum simulation. Anal. Chem. 76, 6374–6383.