1. Introduction
Transposition is an important genome rearrangement operation, and sorting a permutation by transpositions (SPbT) is an important problem in bioinformtics. In the transposition operation, a segment is cut out of the permutation and pasted in a different location. SPbT was first studied by Bafna and Pevzner (1998), who discussed the first 1.5–approximation algorithm which had quadratic running time. Eriksson et al. (2001) gave an algorithm that sorts any given permutation of n elements by at most
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\frac {2} {3} n$$\end{document}
transpositions. Later, Hartman and Shamir (2006) used the concept of simplified breakpoint graph to design another 1.5–approximation algorithm with O(n2) running time. They further used the splay tree to implement this simplified algorithm and thereby reducing the time complexity to
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$O (n^ \frac {3} {2} \ \sqrt {\log n})$$\end{document}
(Hartman and Shamir, 2006; Sleator and Tarjan, 1985). Finally, Elias and Hartman (2006) presented an 1.375–approximation algorithm, which is the best known approximation algorithm for SPbT in the literature so far. The running time of that algorithm (Elias and Hartman, 2006), however, is O(n2). Very recently, Feng and Zhu (2007) improved the running time of the 1.5–approximation algorithm of Hartman and Shamir (2006) to O(n log n) by introducing and using a new data structure named the permutation tree. In this paper, with the help of the permutation tree data structure we improve the running time of the 1.375 Approximation Algorithm for SPbT of Elias and Hartman (2006) to O(n log n).
2. Preliminaries
A transposition τ ≡ trans(i, j, k) on
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\pi = (\pi_0 \ldots \pi_{n - 1})$$\end{document}
is an exchange of two disjoint contiguous segments
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$X = \pi_i , \ldots , \pi_{j - 1}$$\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$Y = \pi_j , \ldots , \pi_{k - 1}$$\end{document}
. Given a permutation π, the SPbT asks to find a sequence of transpositions to transform π into the identity permutation such that the number of transpositions t is minimized. The transposition distance of a permutation π, denoted by d(π), is the smallest possible value of t. The breakpoint graph G(π) (Bafna and Pevzner, 1998) is an edge-colored graph on 2n vertices
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\{l_0 , r_0 , l_1 , r_1 , \ldots , l_{n - 1} , r_{n - 1} \}$$\end{document}
. For every 0 ≤ i ≤ n − 1, ri and li+1 are connected by a grey edge, and for every πi, lπi and rπi−1 are connected by a black edge, denoted by bi. The breakpoint graph uniquely decomposes into c(π) cycles. A k-cycle has k black edges; if k is odd (resp. even), the cycle is odd (resp. even). Further, if k < 3, it is short and else, long. The number of odd cycles is denoted by codd(π), and we define Δcodd(π, τ) = codd(τ.π) − codd(π), where τ.π denotes the result after τ is applied. A transposition τ is a k-move if Δcodd(π, τ) = k. A cycle is called oriented if there is a 2-move that is applied on three of its black edges; otherwise, it is unoriented. If G(π) contains only short cycles, then, both π and G(π) are called simple. A permutation π is 2-permutation (resp. 3-permutation) if G(π) is contains only 2-cycles (resp. 3-cycles). Permutations can be made simple by inserting new elements into the permutations and thereby splitting the long cycles (Gu et al., 1999).
Two pairs of black edges (a, b) and (c, d) are said to intersect if their edges occur in alternated order in the breakpoint graph, i.e., in order a, c, b, d. Cycles C and D intersect if there is a pair of black edges in C that intersects with a pair of black edges in D. A configuration of cycles is a subgraph of the breakpoint graph that is induced by one or more cycles. Configuration A is connected if for any two cycles c1 and ck of A there are cycles
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$c_2 , \ldots , c_{k - 1} \in A$$\end{document}
such that, for each
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$i \in [ 1 , k - 1 ]$$\end{document}
, ci intersects with ci+1. A component is a maximal connected configuration in a breakpoint graph. The size of a configuration or a component is the number of cycles it contains. A configuration (similarly, a component) is said to be unoriented if all of its cycles are unoriented. A configuration (similarly, a component) is small if its size is at most 8; otherwise, it is big. Small components that do not have an
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\frac {11} {8}$$\end{document}
-sequence are called bad small components (Elias and Hartman, 2006).
In a configuration, an open gate is a pair of black edges of a 2-cycle or an unoriented 3-cycle that does not intersect with another cycle of that configuration. A configuration not containing open gates is referred to as a full configuration. An (x, y)–sequence of transpositions on a simple permutation (for x ≥ y) is a sequence of x number of transpositions, such that at least y of them are 2-moves and that leaves a simple permutation at the end.
A permutation tree (Feng and Zhu, 2007) is firstly a balanced binary tree T with root r, where each internal node of T has two children. The left and right children of an internal node t are denoted by L(t) and R(t), respectively. The height of t is denoted by H(t); a leaf node has height zero. Secondly, a permutation tree must correspond to a permutation. The permutation tree corresponding to π has n leaf nodes, labeled by
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\pi_1 , \pi_2 , \ldots , \pi_n$$\end{document}
, respectively. Each node corresponds to an interval of π and has a value equal to the maximum number in the interval. The interval corresponding to an internal node t is the concatenation of the two intervals corresponding to L(t) and R(t). The height of the permutation tree of π is bounded by O(log ∣π∣). A permutation tree (Build operation) can be built in O(∣π∣) time. Suppose, T,t1, and t2 correspond to
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$(\pi_1 \pi_2 \ldots \pi_{m - 1} \pi_m \pi_{m + 1} \ldots \pi_n)$$\end{document}
,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$(\pi_1 \pi_2 \ldots \pi_m)$$\end{document}
, and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$(\pi_{m + 1} \pi_{m + 2} \ldots \pi_n)$$\end{document}
, respectively. Then, Join(t1, t2) returns T in O(H(t1) − H(t2)) time and Split(T, m) returns tl and t2 in O(log n) time.
3. Faster Running Time for Elias and Hartman's Algorithm
The 1.375–approximation algorithm for SPbT of Elias and Hartman (referred to as the EH algorithm henceforth) is presented in Algorithm 1. Now, to achieve our goal, we need to be able to use the permutation tree for applying (x, y)–sequence and k–move. Additionally, given a pair of black edges we can find, with the help of a permutation tree, another pair of black edges such that these two pairs intersect. Feng and Zhu (2007) used the following lemma to find such a pair of black edges.
Lemma 1 (Bafna and Pevzner, 1998). Let bi and bj are two black edges in an unoriented cycle C such that i < j. Let πk = maxi<m≤ jπ
m
and πl = πk + 1. Then the black edges bk and bl−1 belong to the same cycle and the pair 〈bk, bl−1〉 intersects the pair 〈bi, bj〉.
Feng and Zhu (2007) suggested that a permutation tree can be used for query and transposition as follows. Assume that the permutation tree T corresponding to a simple permutation
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\pi = (\pi_1 \ldots \pi_n)$$\end{document}
has been constructed by procedure Build. Now, Procedure Query(π, i, j) finds a pair of black edges intersecting the pair 〈bi, bj〉 and Procedure Transposition(π, i, j, k), applies a transposition trans(i, j, k) on π. These two procedures can be implemented as follows.
Query(π, i, j): Split T into three permutation trees, t1, t2 and t3, corresponding to, respectively,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$(\pi_1 , \ldots , \pi_i)$$\end{document}
,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$(\pi_i + 1 , \ldots , \pi_j)$$\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$(\pi_j + 1 , \ldots , \pi_n)$$\end{document}
. Clearly this can be done in O(log n) time by two splitting operations of T. The value of the root of t2 is the largest element (say, πk) in the interval
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$[\pi_i + 1 \ldots \pi_j]$$\end{document}
. Assume that πl = πk + 1. By Lemma 1, pair 〈bk, bl−1〉 intersects pair 〈bi, bj〉.
Transposition(π, i, j, k): Split T into four permutation trees t1, t2, t3 and t4, corresponding to, respectively
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$(\pi_1 , \ldots , \pi_{i - 1})$$\end{document}
,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$(\pi_i , \ldots , \pi_{j - 1})$$\end{document}
,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$(\pi_j , \ldots , \pi_k - 1)$$\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$(\pi_k , \ldots , \pi_n)$$\end{document}
. Then, join the four trees by executing Join(Join(Join(t1, t3), t2), t4). Clearly, adjusting the permutation tree T can be done by three splitting and three joining operations spending O(log n) time.
In the rest of this section, we state and prove a number of lemmas concerning the running time of different steps of the the EH algorithm, Algorithm 1, achieving an O(nlog n) running time for the algorithm in the sequel.
Lemma 2. Step 1 of the EH algorithm can be implemented in O(n) time.
Proof. A permutation π is made simple by (g, b)-splits acting on the breakpoint graph G(π). A (g, b)-split for G(π) splits one cycle into two shorter ones. Equivalently, this operation inserts a new element into π (Hannenhalli and Pevzner, 1999). A breakpoint graph G(π) can be transformed into
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$G (\hat{\pi})$$\end{document}
containing only 1-cycles, 2-cycles, and 3-cycles by a series of (g, b)-splits (Hartman and Shamir, 2006), that is, the permutation corresponding to
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$G (\hat{\pi})$$\end{document}
beomces simple. This can be done by scanning the permutation linearly and inserting a new element when necessary. Thus, Step 1 can be implemented in O(n) time.
Lemma 3. Step 2 of the EH algorithm can be implemented in O(n log n) time.
Proof. To check whether a (2, 2)-sequence exists, the following sub-steps are executed:
We check whether there are (at least) four 2-cycles. If yes, then we are done; otherwise we go to the next step.
If there are two intersecting 2-cycles, then a (2, 2)-sequence exists and we are done (Elias and Hartman, 2006). Otherwise, we go to the following step.
If there are two nonintersecting 2-cycles, we apply a transposition on three of the four black edges of the two 2-cycles (check all four possibilities). Clearly, this is a 2-move (Christie, 1999). Now, there is a (2,2)-sequence iff in the resulting graph there is an oriented cycle. Otherwise we go to the following step.
In this case the permutation is a 3-permutation. Here, if all cycles are unoriented, there is no (2,2)-sequence. Otherwise, for each oriented 3-cycle, we need to check if, after applying a 2-move on it, there is an oriented cycle in the resulting graph. There is a (2,2)-sequence iff the answer is yes for some cycle.
Clearly, the complexity depends on Sub-steps 3 and 4 as these two cases involve applying the 2-move and the transpositions. Hence, the result follows.
Lemma 4. Steps 3 and 4 of the EH algorithm can be implemented in O(n log n) time.
Proof. We have even number of 2-cycles in the breakpoint graph for a simple permutation (Feng and Zhu, 2007). A 2-move in Step 3 transforms two 2-cycles into a 1-cycle and a 3-cycle. All the 2-cycles of G(π) can be found in linear time and be eliminated by at most
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\frac {n} {2}$$\end{document}
2-moves. Since one transposition takes O(log n) time, Step 3 can be done in O(n log n) time. Finally, for Step 4, all the 3-cycles can be marked by a linear scan of the breakpoint graph which takes at most O(n) time. Hence, the result follows.
Lemma 5. The while loop at Step 5 of the EH algorithm can be implemented in O(n log n) time.
Proof. The loop iterates at most n times and each iteration takes O(log n) time as follows. Step 7 runs in O(log n) time, because, to apply a 2-move, we use the transposition operation on the permutation tree. Now, consider Steps 9 to 15. There are two types of extensions that are sufficient for extending any cycle C:
Type 1: Extensions closing open gates, and
Type 2: Extensions of full configurations such that the extended configuration has at most one open gate.
To do a sufficient extension of Type 1 (add a cycle that closes an open gate), we need to pick an arbitrary open gate and find another cycle that intersects with the open gate. For this, we query the permutation tree with the black edge 〈bi, bj〉 of the open gate under consideration. The query procedure in turn returns the intersecting pair 〈bk, bl−1〉 as stated above. This step takes O(log n) time.
If the configuration is full, i.e., there are no open gates, we do sufficient extension of Type 2. To do this, we query the permutation tree with each pair of black edges of each cycle in the configuration, until we find a cycle that intersects with a pair. If such a cycle is found, we extend the configuration by this cycle to find a component of size greater than or equal to 9. As there can be at most 24 such pairs of black edges, this step takes O(log n) time as well. Finally, we apply an
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\frac {11} {8}$$\end{document}
-sequence by using the transposition procedure of permutation tree which takes O(log n) time. Hence, the result follows.
Lemma 6. Steps 20–22 of the EH algorithm can be implemented in O(n log n) time.
Proof. In Step 20, the application of an
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland,xspace}\usepackage{amsmath,amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$\frac {11} {8}$$\end{document}
-sequence takes O(log n) time, and this step iterates at most O(n) times. Now, applying a (3,2)-sequence is essentially equivalent to applying 3 transpositions such that at least 2 of them are 2-moves. Since there can be no more than n 3-cycles, Step 21 also runs in O(n log n) time. Hence, the result follows, since Step 22 of the EH algorithm can also be implemented in O(n log n) time (Feng and Zhu, 2007).
4. Conclusion
From the above lemmas, it is easy to see that the EH algorithm, implemented with permutation tree, runs in O(n log n) time.