Abstract
Abstract
In computional biology, up-to-date homology-based methods for the reconstruction of ancestral gene orders usually rely on two phases. First, potential ancestral co-localizations of some genomic markers are detected from homologies between extant species. Next, the assembling phase mainly consists in resolving the conflicts between the potential ancestral features. This can be done using many methods, but one of the most advanced solutions is to identify and discard from the set of potential features those that belong to inclusivewise minimal conflicting sets of features. It relies on the consecutive ones property (C1P), and the notion of minimal conflicting set (MCS), widely used in physical mapping problems. Let
1. Introduction
I
Up-to-date reconstruction of ancestral gene orders from homologies between extant species usually consists of two phases. First, potential ancestral co-localizations of some genomic markers are detected from homologies between extant species by detecting potential ancestral contiguities of some genomic loci (see Ma et al., 2006; Chauve and Tannier, 2008; Bertrand et al., 2010; Muffato et al., 2010). In practice, these genomic loci are genes or genomic segments present in extant genomes. The potential ancestral syntenies are sets of genomic loci that are found co-localized in some extant genomes, inferring a signal of potential co-localization in the ancestral genomes.
Next, the assembling phase mainly consists in resolving the conflicts between the potential ancestral features. They are nonconflicting if they can be assembled into a single ancestral genome such that all syntenies are preserved. Given a set of n genomic loci and m potential ancestral syntenies, this property can be exactly translated into the consecutive ones property (C1P) in a 0-1 matrix.
Let
A subset
On real biological matrices, the C1P is rarely satisfied as some potential ancestral co-localizations might be due to false-positive signals. Thus, only some subsets of rows might satisfy the desired property. However, the combinatorics of such sets is difficult to handle, and a strategy to deal with them has been proposed by Bergeron et al. (2004), Chauve and Tannier (2008), and Stoye and Wittler (2009). It consists in identifying and discarding some rows belonging to minimal conflicting subsets of rows that do not satisfy the C1P, but such that any of their proper subsets does.
Definition 1
A set
However, it is not difficult to build examples of matrices such that the number of MCS is polynomial or even exponential in the number of rows. Figure 1 shows an example of a matrix for which the number of MCS is exponential in the number of rows. Indeed, let k be the number of nodes of external rows, which are r7, r8, and r9 in the figure. The total number of rows is 3k, the number of columns 2k, and the number of MCS is 2 k because any induced chordless cycle in the row intersection graph of the matrix (Fig. 1b) constitutes an MCS.

From a computational point of view, the first question that arises is the following: is a given row
In this article we present a new simpler O(m2n2 + nm7) time algorithm for deciding if a given row belongs to at least one MCS and if true, exhibits one. Our algorithm is based on an alternative approach considering minimal forbidden induced subgraphs of interval graphs (see Lekkerkerker and Boland, 1962) instead of Tucker matrices. Moreover, our central paradigm consists in reducing the recognition of complex forbidden induced subgraphs to the detection of induced cycles in ad-hoc graphs. Our approach is faster and simpler, but a difficulty shared by both approaches resides in avoiding to report conflicting sets that are not minimal.
2. MCS And Forbidden Induced Subgraphs
The row–column intersection graph of a 0-1 matrix
It should be noted that a column vertex (white) is only connected to row vertices (black). The neighborhood N(r) of a row r is the set of rows intersecting r,
Theorem 1 (Lekkerkerker and Boland, 1962; Theorem 4)
A 0-1 matrix

Forbidden induced subgraphs for the row–column intersection graph of
Property 1
From Theorem 1, a set
Given an MCS
Definition 2
A row of an MCS
Property 2
An induced subgraph of the form II, III, IV, or V always contains at least one kernel, whereas an induced subgraph of the form I contains no kernel.
We denote by GR(M) the subgraph of GRC(M) induced by the set of rows
Graph sizes
GR(M) has m vertices and at most m(m − 1)/2 edges, whereas GRC(M) has m + n vertices and at most nm + m(m − 1)/2 edges.
3. A Global Algorithm
Our algorithm to decide if a row 1. MCS of type I, 2. MCS of size 3 (type IV or V), 3. MCS of type II, 4. MCS of type III, 5. MCS of type IV and size larger than 3, and MCS of type V and size larger than 3.
See Figure 3 for an overview. Steps 2 to 4 are based on brute-force algorithms to detect forbidden induced subgraphs of GRC(M) containing at most four black vertices (rows) including r. Steps 1, 5, and 6 rely on a reduction to the detection of induced chordless cycles in ad-hoc graphs.

The different steps of the algorithm: in each case, when row r has a specific location in the forbidden induced subgraph that is looked for, this location is indicated in bold character. Other rows and columns of the forbidden induced subgraph are indicated in lightface characters.
In the following, we simply write GRC(M) as G and GR(M) as GR.
3.1. Step 1: Forbidden Induced subgraph I
We first test if r belongs to an MCS of the form I. The rows (black vertices) of such an MCS necessarily constitute an induced chordless cycle in GR. If it is true, then r belongs to an induced chordless cycle of G of length at least 4 containing only black vertices. Such a cycle exists in G if and only if it is also a chordless cycle in GR because GR is the subgraph of G induced by the set of rows
Proposition 1
Algorithm Check_I is correct and worst case O(m5) time.
Proof
The correctness of Algorithm Check_I comes from the fact that r is contained in a MCS of the form I if and only if r belongs to an induced chordless cycle of GR of length at least 4 whose set of vertices
Algorithm Check_I might be implemented in O(m5). The test performed on a given P4 containing r (lines 2–5 of the algorithm) can be achieved in O(m2 + m log m) as follows: removing the neighborhood of its internal vertices might be done in m2 time, and finding a chordless path between the two extremities might be performed using Dijkstra's algorithm in O(m2 + m log m) time. Enumerating all P4 containing r might be done in time O(m3) using a BFS from r stopping at depth 4. Eventually, the whole algorithm is in O(m3(m2 + m log m)) = O(m5) time. ■
Precomputation
In the following steps, we assume that the following precomputations have been achieved:
• For any triplet of rows (r, ri, rj) that are pairwise intersecting, i.e., each couple is an edge in G, r − (ri ∪ rj) and (ri ∩ rj) − r are precomputed; • Two rows ri and rj are overlapping if ri ∩ rj ≠ ∅ and ri − rj ≠ ∅ and rj − ri ≠ ∅. The overlapping relation between any couple of rows is precomputed; • For any quadruplet of rows (r, ri, rj, rk) such that ri, rj, and rk overlap r, r − (ri ∩ rj ∩ rk) is precomputed.
All those precomputations can simply be performed in O(nm4) time using straightforward algorithms, that is, scanning the n columns of the input matrix for each triplet or quadruplet of rows.
3.2. Step 2: Forbidden induced subgraph responsible for an MCS of size 3
We then test if r belongs to an MCS of size 3. An MCS of size 3 is necessarily caused by a forbidden induced subgraph of the form IV or V. As a consequence, the following property is immediate.
Property 3
An MCS of size 3 is always composed of 3 rows that are pairwise overlapping.
Thus, in this step, it suffices to use a brute-force algorithm, Check_IV_V_3 (see Algorithm 2), running in O(m2) time to search for a triplet of rows in G including r, satisfying one of the two configurations shown in Figure 3. IV_V_3.
Proposition 2
Algorithm Check_IV_V_3 is correct and runs in O(m2) time.
Proof
The correctness of Algorithm Check_IV_V_3 comes from the fact that r is contained in an MCS of size 3 if and only if this MCS is caused by a forbidden induced subgraph of the form IV or V (Property 3). Thus, r should belong to a triplet of rows (r, ri, rj) that are pairwise overlapping, and satisfy the conditions given in:
• either line 2 of the algorithm to produce a forbidden induced subgraph of the form IV (left-hand graph in Fig. 3.IV_V_3), • or line 5 of the algorithm to produce a forbidden induced subgraph of the form V (right-hand graph in Fig. 3.IV_V_3).
In both cases, Algorithm Check_IV_V_3 returns the set {r, ri, rj} as an MCS if such a set of rows exists. This set cannot contain a smaller subset of rows that is an MCS as 3 is the minimum size of any MCS.
Algorithm Check_IV_V_3 runs in O(m2) time because, given r, there might be O(m2) couples (ri, rj) on which the tests performed (lines 2–8 of the algorithm) might be achieved in O(1), thanks to the precomputations that have been done. ■
3.3. Step 3: Forbidden induced subgraph II
We test here if r belongs to an MCS of the form II, with the assumption that r is not contained in an MCS of size 3. Note that such an MCS is of size 4. In this step, it suffices to use a brute-force algorithm, Check_II_4 (see Algorithm 3), running in O(m3) time to search for a quadruplet of rows in G including r, satisfying one of the two configurations shown in Figure 3.II_4.
Proposition 3
Algorithm Check_II_4 is correct and runs in O(m3) time.
Proof
The correctness of Algorithm Check_II_4 comes from the fact that, if r belongs to an MCS of the form II, then r should belong to a quadruplet of rows (r, ri, rj, rk) such that one of these rows is a kernel, and the three other rows do not intersect each other. Thus, the row r is:
• either a kernel of the MCS, tested in lines 1–5 of the algorithm (left-hand graph in Fig. 3.II_4), • or not a kernel of the MCS tested in lines 6–10 of the algorithm (right-hand graph in Fig. 3.II_4).
In both cases, Algorithm Check_II_4 returns the set {r, ri, rj, rk} as an MCS if such a set of rows exists. This set cannot contain a smaller subset of rows that is an MCS as this subset would be a subset of 3 rows that cannot satisfy Property 3.
Algorithm Check_IV_V_4 runs in O(m3) time because all the tests performed on a given triplet (ri, rj, rk) in lines 2–4 and 7–9 of the algorithm can be achieved in O(1), and given r there might be O(m3) such triplets. ■
3.4. Step 4: Forbidden induced subgraph III
We test here if r belongs to an MCS of the form III, with the assumption that r is not contained in an MCS of size 3. Note that such an MCS is of size 4. In this step, we use a brute-force algorithm, Check_III_4 (see Algorithm 4), running in O(m3) time to search for a quadruplet of rows in G including r, satisfying one of the three configurations shown in Figure 3.III_4.
Proposition 4
Algorithm Check_III_4 is correct and runs in O(m3) time.
Proof
The correctness of Algorithm Check_III_4 comes from the fact that r belongs to an MCS of the form III if and only if r should belong to an quadruplet of rows (r, ri, rj, rk) included in an induced subgraph of the form III such that two of these rows are kernels of the subgraph, and one of these kernels contains a column of the induced subgraph that is not shared with any of the other rows. Let us call this kernel kernel_1, and the other kernel kernel_2. For example, in the left-hand graph in Figure 3.III_4, kernel_1 = r, and kernel_2 = rj.
Thus, the row r is:
• either kernel_1, tested in lines 1–5 of the algorithm (left-hand graph in Fig. 3.III_4), • or not a kernel, tested in lines 6–10 of the algorithm (middle graph in Fig. 3.III_4), • or kernel_2, tested in lines 11–15 of the algorithm (right-hand graph in Fig. 3.III_4).
In the first and third cases, the set {ri, rj, rk} cannot be an MCS because such a set cannot satisfy Property 3, In all cases, Algorithm Check_III_4 returns the set {r, ri, rj, rk} as an MCS if such a set of rows exists, and {ri, rj, rk} is not an MCS (in the second case). As we made the assumption that r is not contained in an MCS of size 3, there cannot exist a smaller subset of {ri, rj, rk} containing r that is an MCS.
Algorithm Check_III_4 runs in O(m3) time using a similar proof as the complexity proof for Check_IV_V_4: all the tests performed by the algorithm (lines 2–4, 7–9, and 12–14 of the algorithms) on a given triplet (ri, rj, rk) are achieved in O(1) thanks to the precomputations, and given r there might be O(m3) such triplets. ■
3.5. Step 5: Forbidden induced subgraph IV
We test here if r belongs to an MCS of the form IV, with the assumption that r is contained, neither in an MCS of size 3 nor in an MCS of type I. Depending on whether the size of the MCS is 4 or larger than 4, we describe two algorithms.
3.5.1. MCS of size 4
We first test if r belongs to an MCS of the form IV of size 4. In this step, it suffices to use a brute-force algorithm, Check_IV_4 (See Algorithm 5), running in O(m3) time to search for a quadruplet of rows in G including r, satisfying the configurations shown in Figure 3.IV_4.
Proposition 5
Algorithm Check_IV_4 is correct and runs in O(m3) time.
We look for a triplet of rows (ri, rj, rk) such that the set {r, ri, rj, rk} is an MCS of the form IV (Fig. 3.IV_4). In an induced subgraph of the form IV containing 4 rows {r, ri, rj, rk}, two rows are kernels, and in that case, r is either a kernel of the MCS, or not. If r is a kernel, then it is either a kernel—called kernel_1—containing a column of the induced subgraph that is not shared with any of the other rows, or not—called kernel_2. For example, in the left-hand graph in Figure 3.IV_4, the two kernels are the two central black vertices of the graph: the top one is a kernel_1, and the bottom one a kernel_2. Algorithm Check_IV_4 looks for each of these configurations: r is a kernel_1, tested in lines 1–5 of the algorithm; r is not a kernel, tested in lines 6–10 of the algorithm; r is a kernel_2, tested in lines 11–15 of the algorithm.
3.5.2. MCS of size larger than 4
We test here if r belongs to an MCS of the form IV of size larger than 4. An MCS of the form IV of size larger than 4 contains one and only one kernel. Depending on whether r is the kernel or not, we distinguish two cases here.
Case 1: If row r is the kernel of the MCS
Algorithm Check_IV
k
(see Algorithm 6) recovers an MCS
Proposition 6
Algorithm Check_IVk is correct and runs in O(nm2) time.
Proof
Note that, if the MCS exists, then all the rows belonging to the MCS, except r, belong to a same connected component of H. Thus, in each connected component of H, the algorithm looks for a chordless path Q linking two vertices ri, rj satisfying (1) ri and rj are not connected, and (2) ri, rj overlap r, and (3) Q does not contain any smaller subpath satisfying conditions (1) and (2). These conditions are necessary and sufficient for the set {r} ∪ Q to form the rows of an induced subgraph of the form IV. The set {r} ∪ Q cannot contain a subset that is an MCS as such a smaller MCS should be:
• either an MCS of size 3 including r, which is impossible by assumption, • or an MCS of type II or III necessarily including r as kernel, • or an MCS of type IV and size larger than 3 having r as kernel.
The two last cases are also impossible, because Q would not satisfy condition (3) in these cases.
Next, there might be n columns
Case 2: If row r is not the kernel of the MCS
Algorithm Check_IV
p
(see Algorithm 7) recovers an MCS
Algorithm Check_IV is called in Algorithm Check_IV
p
. It recovers an MCS
Proposition 7
Algorithm Check_IVp is correct, and runs in O(nm6) time.
Proof
The correctness and the complexity of Check_IV p follows directly from the correctness and the complexity of Algorithm Check_IV that is called in Algorithm Check_IV p .
The correctness of Check_IV comes from the fact that r does not belong to any chordless cycle in the graph C computed at line 2 of the algorithm by assumption. Then at line 6 of the algorithm, any chordless cycle in the graph D containing vertex r necessarily contains at least one edge (ri, rj) belonging to the set Ea. The number of edges belonging to the set Ea in such a chordless cycle Q cannot be greater than 1, as any couple of such edges in the chordless cycle would induce a chord. Indeed, if Q contains more than one edge belonging to Ea, any two such edges would have two extremities in Va, one from each of the two edges, that are not connected in the graph C. These extremities would thus be linked by an edge in Ea, creating a chord for the cycle Q in the graph D.
Therefore, the set of vertices of the chordless cycle Q induces a chordless path in G such that each vertex of Q is connected to vertex a by definition of the graph H, and the extremities ri and rj of Q satisfy (1) ri and rj are not connected in G, and (2) ri, rj overlap r, and (3) Q does not contain any smaller subpath satisfying conditions (1) and (2). These conditions are necessary and sufficient for the set {a} ∪ Q to form the rows of an induced subgraph of the form IV, and this set cannot contain a smaller MCS because such an MCS would be: (a) either an MCS of size 3 including a; (b) or an MCS of type II or III necessarily including a as kernel; (c) or an MCS of type IV and size larger than 3 having a as kernel.
The three cases are impossible, because they would induce a chord from the set Ea in the chordless cycle induced by Q in the graph D.
Algorithm Check_IV calls Algorithm Check_I. Both algorithms have the same time complexity in O(m5) time. It follows immediately that Algorithm Check_IV p runs in O(nm6) time. ■
3.6. Step 6: Forbidden induced subgraph V
We test here if r belongs to an MCS of the form V, with the assumption that r is contained neither in an MCS of size 3 nor in an MCS of type I. Depending on whether the size of the MCS is 4, 5, or larger than 5, we describe three algorithms.
3.6.1. MCS of size 4 or 5
We first test if r belongs to an MCS of the form V of size 4 or 5.
In this step, it suffices to use two brute-force algorithms, Check_V_4 (See Algorithm 9) and Check_V_5) (see Algorithm 10), running in O(m3) time and O(m4) time, respectively to search for a quadruplet and a quintuplet of rows in G including r, satisfying the configurations shown in Figure 3.V_4 and V_5).
Proposition 8
Algorithm Check_V_4 is correct and runs in O(m3) time.
Proof
Algorithm Check_V_4 looks for an induced subgraph with 4 black vertices {r, ri, rj, rk} that are pairwise connected to each other. These 4 black vertices should be such that there exist three different couples of vertices among them, such that two couples are disjoint and the third one (called couple_kernel) overlaps the two first, and the 2 rows of each of these couples share a column that is not shared with the two other rows of the set. In this case, if {ri, rj, rk} is not an MCS, then the subgraph induced by {r, ri, rj, rk} and the 3 columns (white vertices) connected to the 3 couples of rows is of the form V, and is responsible for an MCS {r, ri, rj, rk}. Algorithm Check_V_4 looks for two cases, depending on whether r belongs to couple_kernel (lines 3–5) or not (lines 6–8).
Next, all the tests performed by Algorithm Check_V_4 (lines 2–9 of the algorithm) on a given triplet (ri, rj, rk) are achieved in O(1) thanks to the precomputations, and given r there might be O(m3) such triplets. Thus, Algorithm Check_V_4 runs in O(m3) time. ■
Next, for an MCS of size 5, we look for a quadruplet of rows (ri, rj, rk, rl) such that the set {r, ri, rj, rk, rl} is an MCS of the form V (Fig. 3.V_5). Algorithm Check_V_5 looks for an induced subgraph of the form V, consisting of 5 rows (black vertices) r, ri, rj, rk, rl that are pairwise connected, except for one missing edge, say (ra, rb) in {r, ri, rj, rk, rl} × {r, ri, rj, rk, rl}, and three columns (white vertices) satisfying the configuration of Figure 3.V_5.
Proposition 9
Algorithm Check_V_5 is correct and runs in O(m4) time.
Proof
Algorithm Check_V_5 looks for an induced subgraph with 5 black vertices {r, ri, rj, rk, rl} that are pairwise connected, except for one missing edge (ra, rb) in {r, ri, rj, rk, rl} × {r, ri, rj, rk, rl}. The 4 black vertices that belong to the set with r should correspond to a set of rows that is C1P. Moreover, there should exist two particular rows (black vertices) of the set, with three columns (white vertices) that satisfy the conditions on line 4 of the algorithm in order to fit the configuration depicted in Figure 3.V_5.
Next, all the tests performed by Algorithm Check_V_5 (lines 2–8 of the algorithm) on a given quadruplet (ri, rj, rk, rl) are achieved in O(1) thanks to the precomputations, and given r there might be O(m4) such triplets. Thus, Algorithm Check_V_5 runs in O(m4) time. ■
3.6.2. MCS of size larger than 5
An MCS of the form V of size larger than 5 contains exactly two kernels. Depending on whether r is a kernel or not, we distinguish two cases.
Case 1: If row r is a kernel of the MCS
Algorithm Check_V
k
(see Algorithm 11) recovers an MCS
Proposition 10
Algorithm Check_Vk is correct and runs in O(n2m2) time.
Proof
The proofs are similar to the proofs for the correctness and the complexity of Algorithm Check_IV
k
as the two algorithms are based on the same principle. However, here the complexity is multiplied by a factor n due to considering all black vertices
Case 2: If row r is not a kernel of the MCS
Algorithm Check_V p (See Algorithm 12) recovers an MCS S of the form V of size larger than 5 containing r, such that r is not a kernel of the MCS, with the assumptions that r is not contained in an MCS of size 3 or 4, and r does not belong to an induced chordless cycle of GR (Fig. 3.V).
The principle of the algorithm is similar to that of Algorithm Check_IV p . It consists in first choosing the two kernels (a, b) of the MCS S among the black vertices (rows) neighbors of r, and the column c, of the induced subgraph responsible for S, that is contained in both a and b, but in no other row of the MCS. Next, the algorithm calls Algorithm Check_V (see Algorithm 13) to look for the MCS S with r, (a, b), c, and G given as parameters.
Proposition 11
Algorithm Check_Vp is correct and runs in O(nm7) time.
Proof
In order to prove the correctness and the complexity of Algorithm Check_V p , we need to prove the correctness and give the complexity of Algorithm Check_V that is called in Check_V p .
The correctness of Check_V comes from the fact that r does not belong to any chordless cycle in the graph C computed at line 2 of the algorithm by assumption. Let Q be a chordless cycle in the graph D containing vertex r, computed at line 9 of the algorithm. As r does not belong to an induced chordless cycle of the C by assumption, then Q necessarily contains at least one edge belonging to the set EAB ∪ Ea ∪ Eb. We first give two trivial but useful properties for the remaining of the proof: (i) for any two edges of Q, there always exist two extremities u and v of these edges, one from each edge, that are not connected by an edge in the graph C, i.e., u ∩ v = ∅; (ii) V
A
⊆ V
b
and V
B
⊆ V
a
. We also prove the following useful property: (iii) V
a
⊆ (V
B
∪ V
b
) and V
b
⊆ (V
A
∪ V
a
). Let
We now prove that the cycle Q necessarily contains at most one edge of the set EAB ∪ Ea ∪ Eb. Indeed, let us suppose that Q contains two edges of EAB ∪ Ea ∪ Eb. Then let u, v be two disconnected extremities of these edges in the graph C [Property (i)]. We show that (u, v) necessarily constitutes an edge that is a chord for the chordless cycle Q in the graph D : contradiction.
(1) If (2) If (3) If (4) If (5) If (6) If (7) If
In consequence, there exists at most one edge, and then exactly one edge of the set EAB ∪ Ea ∪ Eb in the cycle Q in the graph D. Next, let (ri, rj) be the only edge of Q belonging to EAB ∪ Ea ∪ Eb. We show that
So we have
The correctness of Algorithm Check_V p follows immediately from the correctness of Algorithm Check_V. Algorithm Check_V calls Algorithm Check_I. Both algorithms have the same time complexity in O(m5) time. It follows immediately that Algorithm Check_IV p runs in O(nm7) time. ■
4. Conclusion
We describe an O(n2m2 + nm7) time algorithm for deciding if a given row r of an m × n 0-1 matrix M belongs to at least one MCS.
The algorithm consists of a precomputation phase that runs in O(nm4), followed by 6 consecutive steps in which we look for some MCS containing r, and constituting the set of rows of a forbidden induced subgraph in the row–column intersection graph of M (see Table 1). The core of the algorithm relies on reducing the recognition of complex forbidden induced subgraphs to the detection of induced cycles in ad-hoc graphs.
Footnotes
Acknowledgment
This work was partly supported by the French MAPPI project (ANR-2010-COSI-004). We would like to thank Nicolas Trotignon for his valuable comments on induced subgraphs and also Juraj Stacho for his participation in some meetings on the subject.
Disclosure Statement
The authors declare that no competing financial interests exist.
