Abstract
Abstract
The breakage-fusion-bridge (BFB) mechanism was proposed over seven decades ago and is a source of genomic variability and gene amplification in cancer. Here we formally model and analyze the BFB mechanism, to our knowledge the first time this has been undertaken. We show that BFB can be modeled as successive inverted prefix duplications of a string. Using this model, we show that BFB can achieve a surprisingly broad range of amplification patterns. We find that a sequence of BFB operations can be found that nearly fits most patterns of copy number increases along a chromosome. We conclude that this limits the usefulness of methods like array CGH for detecting BFB.
1. Introduction

The Breakage-Fusion-Bridge (BFB) mechanism.
When the fused chromosome is torn apart during anaphase, it likely does not tear exactly in the middle of the two centromeres. As a result, one daughter cell receives a chromosome with a terminal inverted duplication while the other receives a chromosome with a terminal deletion. After many BFB cycles, repeated inverted duplications can result in a dramatic increase in the copy number of segments of the unstable chromosome (Fig. 1B).
BFB's ability to amplify chromosomal segments suggests a role for the mechanism in cancer. Gene amplification is common in tumors. A recent review identified 77 genes whose amplification is implicated in cancer development (Santarius et al., 2010). Multiple lines of evidence indicate that BFB may be responsible for much of this amplification. Telomere dysfunction and crisis is associated with tumorigenesis (Artandi and DePinho, 2010). Such dysfunction is consistent with the initiation and continuation of BFB cycles. Some tumors also display the cytogenetic hallmarks of BFB: chromosomes that stretch across spindle poles during anaphase, dicentric chromosomes, and homogeneously staining regions. Thus, an improved understanding of BFB may shed light on how genomic instability leads to tumor formation and progression.
Observing BFB through classical cytogenetic techniques can be difficult and only provides coarse detail. Recent studies have begun to apply modern methods to the problem of detecting BFB and elucidating its role in generating genomic aberrations. Kitada and Yamasaki performed FISH and array CGH on a lung cancer cell line and showed that the pattern of amplification and rearrangement they observed was consistent with BFB (Kitada and Yamasaki, 2008). Later, Bignell et al. sequenced breakpoints in a breast cancer cell line and confirmed that the copy count and breakpoint patterns on 17q were consistent with a BFB model, purportedly the first demonstration of a sequence-level hallmark of BFB in human cancer (Bignell et al., 2007).
These studies, among others, illustrate the promise of new methods for gaining a more complete understanding of BFB. However, making observations that are consistent with a model does not allow one to conclude that the model is correct. Moreover, without a precise definition of the model under consideration, an investigator cannot use data to refine the model and may succumb to bias when considering evidentiary support for the model.
To address these concerns, we present perhaps the first formal description of the BFB mechanism. We use this formalization to consider the range of amplification patterns that can be produced by BFB. The underlying algorithmic problems are challenging and of unknown complexity. We develop heuristic algorithms and rules to determine if a given pattern is consistent with BFB, based on, among other observations, a tight connection between BFB patterns and trees with certain symmetries. The methods make the problems tractable for practical instances.
Using these methods, we show that BFB-associated amplification patterns are common. In fact, if some imprecision is allowed, a majority of possible patterns are consistent with BFB. This suggests that observing that an amplification pattern could have been produced by BFB is not conclusive evidence that BFB produced the amplification.
2. Formalizing The BFB Schedule
Our first task is to describe a model of the breakage-fusion-bridge mechanism that is both consistent with its biological features and is amenable to computational techniques. We begin by considering a chromosome arm that has lost its telomere. Suppose we label potentially unequally sized intervals along this chromosome arm
For illustration, suppose that the chromosome arm, after loss of the telomere, is composed of five intervals, ABCDE. Then, the chromosome is analogous to Figure 1Ba where A,B,C,D,E correspond to the magenta, cyan, green, red, and blue segments, respectively. After replication, fusion, and centromere separation, the whole arm has been duplicated to form a palindrome, EDCBAABCDE, as in Figure 1Bc. This palindromic stretch of DNA is then torn apart. Unless it breaks in the center, between the two A segments, one daughter cell will have a deletion and the other will have an inverted duplication. In Figure 1Bd, one daughter cell gets CDE while the other gets BAABCDE. Thus, BFB cycles can be thought of as operations on a string of chromosomal segments. The only significant restriction this creates is that breaks must occur at segment boundaries, but this is not a loss of generality since the segment boundaries can be defined freely. Moreover, BFB breakpoints may be reused in subsequent BFB cycles (Shimizu et al., 2005), so it is likely useful to keep track of candidate breakpoints.
We are now ready to describe our model. Let Σ (|Σ| = k) be an alphabet where each symbol corresponds to a chromosomal segment, and the ordering of Σ corresponds to the ordering of the segments from telomere to centromere. Let xt denote the string representing chromosomal segments after t BFB operations. Before any BFB operation, the string is just the initial segments of the chromosome in order. Therefore, x0 consists of all k lexicographically ordered characters from Σ. Let x−1 be the reverse of string x, pref(x) be a prefix of the string x, and suff(x) be a suffix of the string x.
Define a BFB-schedule as a specific sequence of BFB operations, that is, inverted prefix duplications and prefix deletions. Define x as a BFB(x0)-string if it can be generated from x0 through a series of BFB operations. For ease of notation, x0 is implied, and we refer to x being a BFB-string. The following simple lemmas establish basic properties and ensure that we do not need to worry about deletions.
Lemma 1
Let x be a BFB-string. If x 0 is a suffix of x, then x can be obtained from x 0 using only inverted prefix duplications.
If a prefix deletion removes some of the characters produced by an inverted prefix deletion, then the deletion's affect can be achieved by making the inverted prefix shorter. If the deletion removes all of the characters produced by an inverted prefix deletion, its affect can be achieved by omitting the inverted prefix deletion.
A prefix deletion will remove all or some of the characters produced by at least one previous prefix inverted duplication and all of the characters produced by any duplications after that inverted prefix duplication but before the prefix deletion. If the deletion is removed from the BFB schedule and the appropriate inverted duplications are removed or shortened, then the final string produced will remain the same. ▪
Lemma 2
3. Algorithms for BFB
We begin with a simple problem: Given string x of length n, determine if x is a BFB-string. This can be solved in O(n) time using the following algorithm:
CheckBFB(x)
Theorem 3
Algorithm CheckBFB checks if x is a BFB-string in O(n) time.
A linear time algorithm for finding maximum palindrome sizes is described in the appendix. The remainder of CheckBFB visits each character at most once, so CheckBFB is in O(n). ▪
For illustration, consider the palindrome array P1 for the string x1 = BAABCCBAAAABC. Starting with i = 1, we can advance i as 1 → 3 → 6 → 10 → 11→ True. For x2 = BBCCCCBBBAABC, we advance i as 1 → 5 → 6 → False.
Thus, if we are given the full ordering of segments of a chromosome arm, we can determine if BFB could have produced that ordering. However, such complete information is often not available from current technologies. For example, an array CGH or sequencing experiment may only give the count of each segment, not the order. So, we would like to determine if a pattern of copy counts of chromosomal segments could have been produced by BFB. Formally, we define a sequence of positive integer copy counts from telomere to centromere along a chromosome arm as a count-vector (
In this case, x4 has 6, 3, and 5 of characters A, B, and C. We now define our problem:
The BFB-count-vector Problem: Given a count vector
BFB-Pivot
Lemma 4
BFB-Pivot(
BFB-Pivot attempts to prepend both eligible characters to the existing string until either an acceptable string is found or the current string fails because it is not a BFB-string or the count of a character exceeds the count in the count vector. By checking all such candidate BFB-strings, BFB-Pivot is guaranteed to find a BFB-string satisfying
It is useful to consider a graphical representation of BFB-Pivot, shown in Fig. 2. As nodes of the same character are added, the color of the added node oscillates. For each leftmost node, two possible edges are considered, a left edge ( ← ) and either an up (

An illustration of BFB-Pivot searching for candidate BFB strings. (nextLetter and prevLetter return the character immediately lower or higher lexicographically, and firstChar returns the leftmost character of a string.) See The BFB-Pivot algorithm in the text.
Each BFB-string has a “lowest” character. For example, in the BFB-string CCCBBCCBAAAAAABC, that character is C. We will call this the base character. Base characters appear in pairs within a BFB-string, with all other characters appearing within these pairs. For example, the above BFB-string can be broken into CC, CBBC, and CBAAAAAABC. We will call these substrings bounded by base characters return-blocks. If the count of the base character of a BFB-string is even, then the entire BFB-string will be composed of return-blocks.
Lemma 5
Every return-block is a palindrome.
BFB-strings are produced by inverted prefix duplications. As a result, they are composed of palindromes such that the left edge of one palindrome is the center of a subsequent palindrome. Consider the rightmost RB of a BFB-string. It is flanked by two base characters. The only palindrome these characters could be a part of is one centered between them, so the rightmost RB is a palindrome.
Now consider an RB where all RBs to the right are palindromes. Either a palindrome center lies within the RB or no centers lie within the RB. In the latter case, the RB is the reverse of a palindromic RB to the right, and hence a palindrome. In the former case, we have a situation similar to that of the rightmost RB: the left base character can only be part of a palindrome centered at the center of the RB. Thus, the RB is a palindrome. ▪
Lemma 6
Every return-block is a BFB-string.
If there are palindrome centers within the RB, then there is a palindrome with a left edge within the RB. The portion of that palindrome that lies within this RB is a suffix of an RB to the right, and hence is a BFB-string. Subsequent BFB operations that extend this suffix to the full RB still results in a BFB-string. ▪
Now consider a count-vector
Lemma 7
The count-vector
As a result, we can focus on RB-BFB-schedules without loss of generality. Lemma 6 shows that BFB-strings have a recursive structure that allows us to represent them as trees (Fig. 3b,c), Consider a BFB-string for a count-vector

BFB-tree generated from an RB-BFB-schedule.
We use this idea to define rooted labeled trees. Each node is labeled so all nodes an identical distance from the root have the same label. For a node v with 2ℓ + 1 (odd) children, number the child nodes as
LabelTraverse(T(r))
A labeled tree is mirror-symmetric if for all nodes v, LabelTraverse(T(v)) = LabelTraverse(T(v))−1. In other words, ‘rotation’ at v results in the same tree. The labeled traversal of a tree T visits each node exactly twice, outputting its label each time. Define a partial order on the nodes of T in which u < v if the last appearance of u precedes the first appearance of v in the labeled traversal. For any u < v, let T < (u, v) denote the subtree of T containing the LCA of u, v and all nodes w s.t. u < w < v. We call (u, v), with u < v, a label-pair if u and v have the same label σ, and no node in T < (u, v) has label σ. A labeled tree is pair-symmetric if for all label-pairs (u, v), T < (u, v) is mirror-symmetric. Finally, we say a labeled tree has long-ends if starting from the root, and following the least numbered node at each step, we can reach each layer k,
Definition 1
A BFB-tree is a labeled tree with long-ends, mirror-symmetry, and pair-symmetry.
Theorem 8
Let T(r) be a BFB-tree rooted at r. Then LabeledTraversal (T(r)) is an RB-BFB-string.
Because T(r) has long-ends, we know that the label traversal of each return-block does not contain any characters lexicographically higher than its initial ordered substring. For example, the return-block DCBBCCBBCD begins with BCD and hence cannot contain any A's. So, if there are two consecutive return-blocks such that the highest character in the return-block on the left is not higher than the highest character in the return-block on the right, there is a palindrome centered between the two return-blocks that extends to another palindrome center on the left. For example, if we have DCBBCCBBCDDCBAABCD, we have two return-blocks: DCBBCCBBCD and DCBAABCD. And, between them, there is the palindrome BCDDCB whose left edge is the center of the palindrome CBBC. So, we are able to find a set of appropriately overlapping palindromes.
On the other hand, if there are two consecutive return blocks such that the highest character in the return-block on the left is higher than the highest character in the return-block on the right, pair-symmetry guarantees that there is still a palindrome that extends left to another palindrome center. For example, the string CBAABCCBBCCBAABC is composed of three return-blocks: CBAABC, CBBC, and CBAABC. Pair-symmetry ensures that there is a palindrome that reaches to the center of the leftmost return-block, in this case ABCCBBCCBA.
Thus, the structure of T(r) guarantees that its label traversal is composed of overlapping palindromes and is thus a BFB-string. ▪
Theorem 9
The tree T S derived from an RB-BFB-string S is a BFB-tree.
Denote the children of node v by the set
BFB-Tree(T, j, I)
Note that once a node is dependent (I(v) = 0), its descendants remain dependent. Further, the multiplicity of the node is a power of 2, and is doubled each time an independent node is a non-central child of its parent. As the multiplicities increase, the number of valid assignments decreases quickly, improving the running time in practice. Further analysis of the algorithm is presented in the appendix.
Consider a count-vector
Lemma 10
[Subsequence Rule] Let
For example, if (6,4,6,8) admits a BFB schedule, then so must (6,4,6), (6,6,8), (6,4,8), (4,6,8), etc.
Lemma 11
[Rule of One] If ∃ i < j ≤ k such that ni = 1, nj > 1, then
Lemma 12
[Odd-Even Rule] If ∃i ≤ j ≤ k such that ni is odd and nj is even, then
We will show that, in a string yielded by any BFB schedule, the number of A's to the right of any B is even. First consider the trivial case where the B in question is the rightmost B. Then the number of A's to the right of that B is zero, which is even.
Now, suppose there is only one run A's to the right of the B in question, B1. Then the string is
where m > = 0 and (A)n means a string of n A's.
In this case, B
Finally consider the case when there are arbitrarily many runs of A's after the given B, and all but the leftmost run contain an even number of A's
In this case B
Similar reasoning can establish that the count of B's to the right of any A is odd. When a BFB schedule begins, A can be duplicated arbitrarily many times before B is duplicated for the first time, yielding
In this case, the count of B's to the right of any A is 1, which is odd. Now, consider the case where there are only two runs of B's after a given A. One of those runs must be the original B, so we have
Using the reasoning above, n must be even because A1 could only have been generated after a copying of an A from (A) m , which would have doubled the B's leading to the run (B) n .
In the case where there are arbitrarily many runs of B's after an A, and all of the runs have an even number of B's except the leftmost run and the rightmost run of length 1. Then, the count of B's to the right of every A is odd.
Now, return to the question of whether a BFB schedule can create a final string with an odd number of A's and and even number of B's. The final string ends in either A or B. If it ends in A, then the count of B's is odd because the count of B's to the right of the last A must be odd. But, we want the count of B's to be even, so the string cannot end in A. If the string ends in B, then the total count of A's must be even, but we want the count of A's to be odd. So, there is no string resulting from a BFB schedule where the count of A is odd and the count of B is even.
By using the Subsequence Rule, we can extend this result to conclude that no odd count can precede an even count in a count-vector achievable by BFB. ▪
Lemma 13
[Rule of Four] Suppose, all counts in
Lemma 14
[Two Reduction] If ni = 2 for some i. Then,
Now, if d > 1 or e > 1, then D and E must also be duplicated the first time C is duplicated. If they were not, then the string after the C duplication would be C[AB]*CDE, and all D's and E's would be after a C and therefore ineligible to be part of any prefix inversion/duplication. So, the string must be EDC[AB]*CDE, and subsequent prefix inversion/duplications will not extend past the final ED. The subsequent BFB schedule must achieve counts of e − 1 and d − 1 for E and D, respectively. ▪
Lemma 15
[Four Reduction] If ∃ i < j < k s.t. ni = 4, nk = 4, and nj > 4. Then
Lemma 16
[Odd Reduction] Suppose all counts in
We also have a sufficient condition for BFB(
Lemma 17
[Count Threshold Rule] BFB
Once the rules have been applied, we apply BFB-Pivot or BFB-Tree. Note that some of these rules are reductions, that is, they reduce an instance of the count-vector problem to simpler instances. These can be applied multiple times. More detail is available in the appendix.
4. Results
Nearly all of the count-vectors that the rules could not resolve admitted a BFB schedule (Table 1), so BFB-Pivot and BFB-Tree could usually halt once an acceptable BFB string was found rather than exhausting all possible paths or trees.
We ran BFB-Pivot and BFB-Tree on each of the 333,980 count-vectors that could not be resolved by rules. The running times plotted against n for each algorithm is shown in Figure 4. As expected, both algorithms' worst-case running times grew exponentially with n. However, the worst case running times for BFB-Tree were orders of magnitude lower than for BFB-Pivot. For example, the longest running count-vector for BFB-Tree was [20,3,19,19,19] which took 10 seconds to complete. The longest running count-vector for BFB-Pivot was [20,18,20,20,6] which took 23,790 seconds to complete.

Pivot
On its face, this is a reasonable conclusion. Only 15% of the count-vectors with k ≤ 5 admit a BFB schedule. If we expand that analysis to the 64,000,000 count-vectors with k = 6 and ni ≤ 20, only 7.3% admit a BFB schedule. So, it is plausible that observing a count-vector that admits a BFB schedule is much more likely if BFB did in fact occur. However, using our model of BFB, we show that this inference is usually incorrect.
Note first that experimentally derived count-vectors are often imprecise. Experiment typically provides a small range of values for each element in the count-vector rather than a single definite value. This is a result of the imprecision of the experimental method as well as potentially high levels of structural variation or aneuploidy in the genome being studied. Therefore, the observation made about the data may not be that a particular count-vector admits a BFB schedule but that there is a count-vector that admits a BFB schedule “nearby,” that is, within experimental precision of the observed count-vector.
Further, the uncertainty in a count tends to increase with the magnitude of the count. It is easier to distinguish between copy counts 1 versus 2 than between 32 versus 33. For simplicity, we assume that the relationship between uncertainty and magnitude is linear and use the Canberra distance (Deza and Deza, 2009) to compare count-vectors:
For each of the 64,000,000 count-vectors with 6 segments, we searched for the nearest count-vector that admits a BFB schedule and recorded the distance to that count-vector. For the 7.2% of count-vectors that admit a BFB schedule, the distance was, of course, zero. The distribution of distances is shown in Figure 5.

The results are striking. Consider the count-vector [14, 7, 18, 16, 9, 12], which does not admit a BFB-schedule. The nearest count-vector that does is [14, 7, 19, 17, 9, 13], at a distance of .097. Half of all count-vectors tested were at least this close to a count-vector admitting a BFB-schedule. So, if the precision of the experimental method employed is such that it can not reliably discern between a copy count of 12 and a copy count of 13 or a copy count of 18 and a copy count of 19, then half of the count-vectors we examined would appear to admit a BFB schedule. Similarly, if the method can not reliably discern a copy count of 9 and and 8 or 7 and 8, then 70% of count-vectors will appear to admit a BFB schedule. Fig. 5 has distances and and example vector pairs for additional percentiles.
Thus, even with small amounts experimental uncertainty, a majority of count-vectors admit a BFB schedule, so mechanisms other than BFB are likely to produce count-vectors that look like they were created by BFB. Therefore, finding a count-vector consistent with BFB should only slightly increase one's belief that BFB occurred.
On the other hand, observing a count-vector that is distant from any BFB admitting count-vector, provides strong evidence that BFB was not the cause of the amplification. The likelihood of observing a count-vector that is distant from a count-vector that admits a BFB schedule is low in any case, and is surely even lower if the chromosome arm underwent BFB.
5. Discussion
We present perhaps the first formalization of the BFB mechanism. Our main result is that BFB can result in a surprisingly broad range of amplification patterns along a chromosome arm. Indeed, for most contiguous patterns of amplification, it is possible to find a BFB schedule that yields either the given pattern or one that is very similar. As a result, one must be cautious when interpreting copy count data as evidence for BFB. The presence of amplification at all, or the presence of a terminal deletion from a chromosome arm can both suggest the occurrence of BFB, though they may not distinguish well between BFB and other amplification hypotheses. However, unless the counts are known precisely, a specific pattern of copy counts along the chromosome arm cannot offer compelling support for BFB.
This study suggests several avenues for future work. There are other types of evidence that can be deployed to argue that BFB has occurred. FISH can reveal, to some extent, the arrangement of segments along a chromosome arm. Next-generation sequencing can reveal the copy counts of breakpoints between rearranged chromosomal segments and may allow for a fuller characterization of a chromosome arm. Methods similar to those presented in this paper may prove useful for evaluating and interpreting these different types of evidence. Modeling may also be helpful for evaluating proposed refinements to models of BFB.
Finally, we have outlined two algorithms for determining whether a count-vector admits a BFB schedule, but neither is polynomial in reasonable measures of the input size. This problem, along with related problems, is interesting as computational problems per se. We hope in the future to have either faster solutions to these problems or proofs of hardness.
6. Appendix
A. Proofs
Theorem 3
Algorithm CheckBFB checks if x is a BFB-string in O(n) time.
We would like to describe an algorithm that finds the radius of the largest palindrome centered at each interstice in a string in linear time. We will, in fact, describe two. The first lends itself to clear exposition and establishes that the problem can be solved in linear time. The second lends itself to easier implementation and is the method we used for results in this paper.
Our first algorithm has three steps and is similar to the algorithm described by Gusfield (1997).
1. Build a generalized suffix tree from the string and the reverse of the string, that is, a suffix tree that contains both suffixes and prefixes of the string. As usual, each leaf corresponding to a suffix beginning at position i is labeled i. Each leaf corresponding to a prefix ending at i is labeled −i. It is well known this can be done in linear time. 2. For each position in the string, i, find the lowest common ancestor of i and −i. Using Tarjan's algorithm, this can be done with linear preprocessing time and constant query time (Harel and Tarjan, 1984). 3. Find the distance from the root to the LCA. This is the size of the largest common prefix ending and i and suffix starting at
While this is indeed a linear time algorithm to find maximal palindromes, Tarjan's algorithm is not considered to be particularly implementable. So, we implemented another linear time algorithm that is similar to Manacher's algorithm (Manacher, 1975). This algorithm proceeds through each interstice in the string, from left to right. As each interstice, is visited, pairs of characters at the boundary of the current known palindrome are tested to see if that palindrome can be expanded. If it cannot, then the palindrome radius is appended to the R array. Then, for each interstice to the right lying in the radius of this palindrome, the corresponding maximal palindrome size for the interstice lying to the left is checked. If the maximal palindrome to the left lies completely within the radius of the current palindrome, then we know that the radius of the maximal palindrome to the right is equal to the radius of the maximal palindrome on the left. If the maximal palindrome on the left is a proper prefix of the current palindrome or it extends beyond the radius of the current palindrome, then the corresponding palindrome on the right has a radius that extends at least to the current palindrome's radius.
MaximalPalindromes
Note that if the first if clause is true, then ℓ is incremented by one. If the inner loop iterates all ℓ times, then ℓ is set to zero and i is increased by ℓ + 1. If the inner loop finds a palindromic suffix and breaks, then ℓ is set to ℓ − i′ − 1 and i is set to i + i′ + 2. In all three cases, i + ℓ is incremented by one in each iteration of the outer loop. Therefore the outer loop only iterates n times. The inner loop appends a value to R in iteration and thus is called at most n − 1 times. Thus, MaximalPalindromes is in O(n).
Finally, note that both algorithms presented above give us the radius of largest palindrome measured from the center of the palindrome. The algorithm CheckBFB uses the length of largest palindrome measured from its leftmost character to its center, so we need to convert the R array we get from MaximalPalindromes to the P array used by CheckBFB. This can easily be done in linear time:
RtoP
RtoP reads through the R array once. Each time it finds a nonzero palindrome radius, it updates the appropriate element in the P array. Then, it reads through the P array once and updates each value so it is the maximum of its current value or the appropriately decremented value for a palindrome that contains the element. ▪
B. Applying BFB rules
Some of the rules are “reductions” rather than simple tests if a count-vector admits a BFB schedule. This means that they show that a count-vector admits a BFB schedule if (and sometimes only if ) some set of simpler count-vectors all admit a BFB schedule. We therefore apply these rules in a recursive fashion outlined below:
ApplyRules
C. Analysis of BFB_tree
Let I(v) denote the multiplicity of a node v that is independent, and 0 otherwise. At step j of the algorithm, we must assign the nj nodes at the current level to the independent nodes at level j − 1. Let njv be the nodes assigned to node v. Then, all assignments must satisfy
And the total number of trees considered is bounded by
By definition, these assignments will satisfy mirror-symmetry, and the long-end property is easily checked as well. However, checking all pair-symmetry constraints at level j requires a single traversal of the tree.
However, the actual performance should be superior. The algorithm also requires maintaining the count-vector I. Note that I(r) = 1 for the root node r. For other nodes v with children
Note that once a node is dependent (I(v) = 0), its descendants remain dependent. Further, the multiplicity of the node is a power of 2, and is doubled each time an independent node is a non-central child of its parent. As the multiplicities increase, the number of assignments decreases quickly, so that in practice, there will be very few valid assignments after the first few levels.
Specifically, consider an input data-set with k + 1 levels, a single node at level k + 1, and a maximum of m nodes at each of the k levels.
let It be the multiplicity of the left-most node at level t from a random assignment. The long-ends property ensures that It ≠ 0, Then,
It is not hard to show that It almost doubles with each round, and reaches the maximum value m after log2(m) rounds. Once It = n, for some t there is only one assignment possible. Prior to that, the maximum number of assignments at each level is O(2
m
). Therefore, the number of labeled trees with long-ends and mirror symmetry is bounded by
Footnotes
Acknowledgments
This research was supported in part by grants from the NIH (5RO1-HG004962) and the NSF (CCF-1115206).
Disclosure Statement
No competing financial interests exist.
