Ability to quantify dissimilarity of different phylogenetic trees describing the relationship between the same group of taxa is required in various types of phylogenetic studies. For example, such metrics are used to assess the quality of phylogeny construction methods, to define optimization criteria in supertree building algorithms, or to find horizontal gene transfer (HGT) events. Among the set of metrics described so far in the literature, the most commonly used seems to be the Robinson–Foulds distance. In this article, we define a new metric for rooted trees—the Matching Pair (MP) distance. The MP metric uses the concept of the minimum-weight perfect matching in a complete bipartite graph constructed from partitions of all pairs of leaves of the compared phylogenetic trees. We analyze the properties of the MP metric and present computational experiments showing its potential applicability in tasks related to finding the HGT events.
1. Introduction
Ability to quantify dissimilarity of different phylogenetic trees describing the relationship between the same group of taxa is required in various types of phylogenetic studies (Boc et al., 2010; Koonin et al., 2011; Smith et al., 2013; Whidden et al., 2014; Chaudhary et al., 2013). The most common application of such distances is measuring the difference of trees inferred from the same data but using different methods, for example, the distance between the “true” tree and a tree reconstructed using a particular method can be used as an indicator of the particular method's accuracy [see Bogdanowicz et al. (2012)].
Metrics are also used directly in some of phylogenetic tree construction methods, for example, in super trees construction, whereby the sum of distances in a particular metric between input trees, whose leaf sets overlap partially, and a super tree (containing all the taxa) is an object of minimization [see Lin et al. (2009); Bansal et al. (2010); Chaudhary et al. (2013); Whidden et al. (2014)]. Quantifying similarities between phylogenies is also useful in an analysis and visualization of a group of phylogenetic trees (Hillis et al., 2005).
The values of some metrics are defined as a minimum number of operations of a particular type needed to transform one phylogenetic tree into another, for example, nearest neighbor interchange, subtree prune and regraft (SPR), and tree bisection and reconnection operations. In particular, SPR metric can be used in modeling and analyzing horizontal gene transfers (HGTs) (Section 5). Unfortunately, the computation of the value of all three metrics is proven to be NP-hard (DasGupta et al., 1997; Allen and Steel, 2001; Bordewich and Semple, 2005; Hickey et al., 2008). Fortunately, there are methods for the analysis of HGTs that incorporate other polynomially computable distances (Boc et al., 2010). As metrics are important tools in phylogenetics, properties of the metric spaces are often subjects of ongoing research (Gordon et al., 2013; Humphries and Wu, 2013).
Recently (Bogdanowicz and Giaro, 2012), we described a general framework for defining phylogenetic metrics, which is based on computing a minimum-weight perfect matching in bipartite graphs. Each partition of such a graph describes one of the compared trees. The weights of the edges of the graph represent the distance between the elements (e.g., splits and clusters) used to describe analyzed trees. Using this framework, we described two metrics: the Matching Split (MS) distance for unrooted trees (Bogdanowicz and Giaro, 2012) and the Matching Cluster (MC) distance for rooted trees (Bogdanowicz and Giaro, 2013). Both metrics are polynomially computable and can be regarded as generalizations of the Robinson–Foulds (RF) metric (Robinson and Foulds, 1981). Recently, another generalization of RF metric was formulated (Böcker et al., 2013). This metric is also based on the concept of matchings similar to MC, but with additional assumptions related to the phylogenetic information that, unfortunately, makes computation NP-hard (Böcker et al., 2013).
Although the concept of matching metrics is relatively new, the MS distance constructed based on this idea has already been found useful in biological studies (Smith et al., 2013). Both the MS and MC metrics have some advantages in comparison with the RF metric (despite the fact that they operate on the same tree representation, i.e., splits or clusters). They are more discriminative. Both metrics have a property that a relocation of a small subtree results in a relatively small distance increase (Bogdanowicz and Giaro, 2012, 2013). Note that the RF metrics do not have such a feature (Bogdanowicz and Giaro, 2012), because it is possible for two trees to have a maximum RF distance although they only differ in the position of a single leaf. It should also be noted that there is no simple answer to the question about “the best” phylogenetic metric in general. Depending on application, some of the phylogenetic metric's features can be more or less desired, for example, if the main criterion is computational efficiency, then most probably the RF distance will be the most suitable, but if discriminative power is considered, then other metrics (e.g., MS) might be a better choice.
In this article, we define and analyze the properties of a new phylogenetic metric based on the already mentioned matching concept—the Matching Pair (MP) distance (Section 3). We show that the MP distance has many interesting features (Section 4), for example, similarly to MC, a small change in a tree structure results in a small change in the MP distance, which does not hold for instance for the RF metric, but in contrast to MC (and MS), average MP distance between random trees grows as fast as its diameter. Moreover, MP performs very well in comparison with five other metrics, known in the literature, when applied to the detection of HGTs (Section 5).
2. Basic Definitions and Notation
For sets A, B let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$A \oplus B = ( A \backslash B ) \cup ( B \backslash A )$$
\end{document} be their symmetric difference, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\vert A \vert$$
\end{document} denote the cardinality of set A, and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${2^A} = \{ B:B \subseteq A \} $$
\end{document}. Let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$G = ( V , E )$$
\end{document} be a graph with a set of vertices V and a set of edges E. A family of nonempty sets \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${A_1} , \ldots , {A_k}$$
\end{document} such that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\bigcup \nolimits_{i = 1}^k {A_i} = A$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${A_i} \cap {A_j} = \emptyset$$
\end{document} for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$i \ne j$$
\end{document} is a partition of the set A. A bipartite graph\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$G ( {V_1} , {V_2} , E )$$
\end{document} has vertices partitioned into two disjoint sets \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${V_1} \cup {V_2} = V$$
\end{document} such that no two vertices within the same set are adjacent. A bipartite graph is complete if every two vertices \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${v_1} \in {V_1}$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${v_2} \in {V_2}$$
\end{document} are adjacent. A tree is a connected acyclic graph.
A matching\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$M \subseteq E$$
\end{document} in a graph \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$G = ( V , E )$$
\end{document} is a set of pairwise nonadjacent edges; that is, no two edges share a common vertex. A perfect matching covers all vertices of the graph. If we assign a weight function \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$w:E \to { \mathbb{Z}_{ \ge 0}}$$
\end{document} to the edges of G, then a minimum-weight perfect matching is defined as a perfect matching, where the sum of the weights of its edges has a minimum value. Minimum-weight perfect matchings in bipartite graphs can be computed efficiently in time \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O ( \vert E \vert \sqrt { \vert V \vert } \log ( \vert V \vert \mathop { \max } \nolimits_{e \in E} w ( e ) ) )$$
\end{document} (Gabow and Tarjan, 1989; Orlin and Ahuja, 1992).
A rooted phylogenetic tree\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$T = ( V , E )$$
\end{document} is a tree whose leaves, that is, vertices (nodes) of degree 1, are labeled bijectively by the elements of a finite set L (representing the species), there is exactly one distinguished nonleaf vertex \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$r ( T ) \in V \backslash L$$
\end{document} called the root and none of the vertices of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$V \backslash \{ r ( T ) \} $$
\end{document} has degree 2.
Present-day species under examination form the finite set L and are represented by leaves of a tree. Internal vertices, that is, members of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$V \backslash L$$
\end{document}, represent hypothetical ancestors of the taxa of L. In particular, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$r ( T )$$
\end{document} is the ancestor of all species under study. For the sake of simplicity, we can identify the leaves with their labels, that is, for a phylogenetic tree T, by \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$L ( T )$$
\end{document} we denote the set of leaves of T or the set of labels of those leaves. By \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L^{ ( 2 ) }}$$
\end{document} we denote the set of all unordered pairs of leaves, that is, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L^{ ( 2 ) }} = \{ \{ x , y \} :x , y \in L , x \ne y \} $$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\vert {L^{ ( 2 ) }} \vert = \vert L \vert ( \vert L \vert - 1 ) / 2$$
\end{document}.
A rooted binary phylogenetic tree is a rooted phylogenetic tree such that the root has degree 2 and all other internal vertices have degree 3. By RL and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R_L^B$$
\end{document} we denote the sets of all rooted phylogenetic trees and all rooted binary phylogenetic trees over the set of leaves L, respectively. For \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$L = \{ 1 , \ldots , n \} $$
\end{document}, we use the notations Rn and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R_n^B$$
\end{document}. A rooted tree T defines a partial order relation of being descendant (and ancestor) on its vertices, denoted by \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \le _T}$$
\end{document}. For \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$a , b \in V ( T ) $$
\end{document}, we have \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$a{ \le _T}b$$
\end{document} (i.e., a is a descendant of b and b is an ancestor of a) if the path in T from a to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$r ( T )$$
\end{document} contains b. In particular, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$v{ \le _T}r ( T )$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$v{ \le _T}v$$
\end{document} for any \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$v \in V ( T )$$
\end{document}. To every vertex v, we can assign its cluster\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$c ( v ) \subseteq L$$
\end{document}, that is, the set of leaves (labels) that are descendants of v. There are \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\vert L \vert + 1$$
\end{document}trivial clusters in a tree T that are related to leaves u (where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$c ( u ) = \{ u \} $$
\end{document}) and to the root (where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$c ( r ( T ) ) = L ( T )$$
\end{document}), all other clusters are nontrivial. By \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\sigma ( T )$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \sigma _*} ( T )$$
\end{document}, we denote families of all clusters of T and all nontrivial clusters of T, respectively. A rooted phylogenetic tree T is uniquely described by a set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \sigma _*} ( T )$$
\end{document} and the translation between these two descriptions can be performed efficiently in linear time [see Semple and Steel (2003) section 3.5].
Let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$A \subseteq L$$
\end{document}, and let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$T ( A )$$
\end{document} be a minimal subgraph of T that connects leaves of A and choose its root as the vertex closest to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$r ( T )$$
\end{document}. The subtree of T induced by A is a tree \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_{ \vert A}} \in {R_A}$$
\end{document} obtained from \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$T ( A )$$
\end{document} by successively removing all vertices of degree 2 (with exception of the root) and identifying their adjacent edges. The tree \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_{ \vert A}}$$
\end{document} contains all phylogenetic information about evolutionary history of taxa from A, which is represented by T. A tree \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$T \in {R_A}$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$A \subseteq L$$
\end{document}, is an agreement subtree for trees \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_1} , {T_2} \in {R_L}$$
\end{document} if \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$T = {T_{1 \vert A}} = {T_{2 \vert A}}$$
\end{document}. An agreement subtree having maximum number of leaves is called a Maximum Agreement Subtree (MAST) (Finden and Gordon, 1985).
The lowest common ancestor (LCA), also called the most recent common ancestor, of a pair of leaves \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$u , v$$
\end{document} of a rooted tree T, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$l ( u , v )$$
\end{document}, is the closest vertex to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$r ( T )$$
\end{document} on the path connecting u and v in T. To every internal vertex v of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$T \in {R_L}$$
\end{document}, we can assign a set of pairs of leaves \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$lp ( v )$$
\end{document} for which v is the LCA, that is, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$lp ( v ) = \{ \{ x , y \} \in {L^{ ( 2 ) }}:l ( x , y ) = v \} $$
\end{document}. We will call the set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$lp ( v )$$
\end{document} the pair set of v. By \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\gamma ( T )$$
\end{document}, we denote the family of all pair sets of T, so \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\gamma ( T )$$
\end{document} is a partition of the set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L^{ ( 2 ) }}$$
\end{document} determined by T. Note that a rooted phylogenetic tree T is uniquely described by a set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\gamma ( T )$$
\end{document} because \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\gamma ( T )$$
\end{document} determines \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \sigma _*} ( T )$$
\end{document} and we have \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$c ( v ) = \bigcup \nolimits_{z \in lp ( v ) } z$$
\end{document} for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$v \in V \backslash L$$
\end{document}.
To compare phylogenetic histories represented by trees \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_1} , {T_2} \in {R_L}$$
\end{document}, the structure of a metric space in the set RL is introduced. One of the most widely used metrics on a set RL is the RF distance (Robinson and Foulds, 1981) based on clusters.
Definition 1.The RF distance between two rooted trees\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_1} , {T_2} \in {R_L}$$
\end{document}is defined as*
Before we proceed to the formal definition of a new metric, we recall the general construction of matching metrics presented in Bogdanowicz and Giaro (2012).
Definition 2.There are given a finite set D, an element\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O \,\notin\, D$$
\end{document}, and a metric h on\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$D \cup \{ O \} $$
\end{document}. We define a metric\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${d_h}\,{:2^D} \times {2^D} \to { \mathbb{R}_{ \ge 0}}$$
\end{document}, where the distance between\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$A , B \in {2^D}\ {d_h} ( A , B )$$
\end{document}is equal to the value of a minimum-weight perfect matching in a complete bipartite graph\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$G = ( {V_1} , {V_2} , E )$$
\end{document}defined as follows:
• for arbitrary\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s , t \in { \mathbb{Z}_{ \ge 0}}$$
\end{document}such that\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s - t = \vert A \vert - \vert B \vert$$
\end{document}, we define the sets\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*} & {V_1} = \{ {a_1} , \ldots , {a_{ \vert A \vert }} , {a_{ \vert A \vert + 1}} , \ldots , {a_{ \vert A \vert + t}} \} , \\ & {V_2} = \{ {b_1} , \ldots , {b_{ \vert B \vert }} , {b_{ \vert B \vert + 1}} , \ldots , {b_{ \vert B \vert + s}} \} , \end{align*}
\end{document}
as the vertices partitions of the graph\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$G ( {V_1} , {V_2} , E )$$
\end{document}and vertex labeling\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$l:{V_1} \cup {V_2} \to D \cup \{ O \} $$
\end{document}, so that\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*} & A = \{ l ( {a_i} ) :1 \le i \le \vert A \vert \} , \\ & B = \{ l ( {b_j} ) :1 \le j \le \vert B \vert \} , \end{align*}
\end{document}
and\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$l ( {a_i} ) = l ( {b_j} ) = O$$
\end{document}for\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\vert A \vert + 1 \le i \le \vert A \vert + t$$
\end{document},\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\vert B \vert + 1 \le j \le \vert B \vert + s$$
\end{document};
• the weights of the edges are defined using the metric h as\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$w ( \{ {a_i} , {b_j} \} ) = h ( l ( {a_i} ) , l ( {b_j} ) )$$
\end{document}.
The function dh is a metric on \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${2^D}$$
\end{document} (i.e., for any \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$A , B , C \in {2^D}$$
\end{document} holds \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${d_h} ( A , B ) = 0 \Leftrightarrow A = B$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${d_h} ( A , B ) = {d_h} ( B , A )$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${d_h} ( A , B ) + {d_h} ( B , C ) \ge {d_h} ( A , C )$$
\end{document}) and the value of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${d_h} ( A , B )$$
\end{document} does not depend on s and t (when \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s - t = \vert A \vert - \vert B \vert$$
\end{document}). Hence, we can always assume that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\min \{ s , t \} = 0$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\max \{ s , t \} = \vert \vert A \vert - \vert B \vert \vert$$
\end{document}. Moreover, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${d_h} ( A , B ) = {d_h} ( A \backslash B , B \backslash A )$$
\end{document} (Bogdanowicz and Giaro, 2012).
In two recent articles (Bogdanowicz and Giaro, 2012, 2013), we described properties of phylogenetic metrics created according to the shown schema, that is, the MS distance for unrooted trees and the MC distance for rooted tree. In the case of the matching metric for rooted trees, the dissimilarity between two clades \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$A , B$$
\end{document} can be measured as the number of elements that appear in one of the clade but not in the other, that is, the cardinality of the set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$A \oplus B$$
\end{document}. As the cardinality of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$A \oplus B$$
\end{document} introduces a metric space structure in an arbitrary family of finite sets, we obtain (Bogdanowicz and Giaro, 2013) the following:
Definition 3.Let\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_1} , {T_2} \in {R_L}$$
\end{document}be rooted phylogenetic trees,\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${h_C}\,{:2^L} \times {2^L} \to { \mathbb{Z}_{ \ge 0}}$$
\end{document}be such that\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${h_C} ( A , B ) = \vert A \oplus B \vert$$
\end{document}, and let\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O = \emptyset$$
\end{document}. According to Definition 2 we define the MC distance\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${d_{MC}}:{R_L} \times {R_L} \to { \mathbb{Z}_{ \ge 0}}$$
\end{document}as
In this work, we want to recall the definition of a metric introduced in Bogdanowicz (2008), which, similarly to MC, is based on the schema presented in Definition 2. The rest of the article contains a detailed study of properties of the metric.
Definition 4.Let\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_1} , {T_2} \in {R_L}$$
\end{document}be rooted phylogenetic trees,\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${h_P}\,{:2^{{L^{ ( 2 ) }}}} \times {2^{{L^{ ( 2 ) }}}} \to { \mathbb{Z}_{ \ge 0}}$$
\end{document}be such that\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${h_P} ( A , B ) = \vert A \oplus B \vert$$
\end{document}, and let\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O = \emptyset$$
\end{document}. According to Definition 2 we define the MP distance\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${d_{MP}}:{R_L} \times {R_L} \to { \mathbb{Z}_{ \ge 0}}$$
\end{document}as
Observe that the MP distance between phylogenetic trees \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_1} , {T_2} \in {R_L}$$
\end{document} is equal to the transfer distance between partitions \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\gamma ( {T_1} )$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\gamma ( {T_2} )$$
\end{document} of the set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L^{ ( 2 ) }}$$
\end{document}. The transfer distance between two partitions of a finite set S, equal to the minimum number of elements that must be moved from one set of a partition to another (possibly empty) to turn first partition into the second (Guénoche, 2011) has been introduced by Régnier (1965). The equivalence between the value of a minimum-weight perfect matching in a bipartite graph constructed according to Definitions 2 and 4 and the value of the transfer distance has been proven by Day (1981), for more details about the relation, see also Charon et al. (2006); Denœud (2008); Gusfield (2002); and Konovalov et al. (2005). An equivalent definition of the transfer distance between two partitions P and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$P ^\prime$$
\end{document} of a set S, which can be found in Gusfield (2002), says that the distance is equal to “the minimum number of elements that must be deleted from S, so that the two induced partitions (P and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$P ^\prime$$
\end{document} restricted to the remaining elements) are identical.”
For example, we calculate the MP distance between trees as shown in Figure 1. The weight of a minimum-weight perfect matching in a bipartite graph shown in Figure 1 is equal to 6, so \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${d_{MP}} ( {T_1} , {T_2} ) = 3$$
\end{document}. The MP metric between two trees \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_1} , {T_2} \in {R_n}$$
\end{document} can be computed in time \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O ( {n^{5 / 2}} \log ( n ) )$$
\end{document} using scaling algorithms for finding minimum-weight perfect matchings (Gabow and Tarjan, 1989; Orlin and Ahuja, 1992).
Calculation of the MP distance between trees T1 and T2. The bipartite graph of their partitions has a perfect matching of minimum weight equal to 6. MP, matching pair.
Note that, although both the MP and MC metrics are defined as the weight of a minimum-weight perfect matching in a bipartite graph, there is no direct relationship between \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${d_{MC}}$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${d_{MP}}$$
\end{document}. In other words, if we know \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${d_{MC}} ( T , T ^\prime )$$
\end{document}, we cannot compute easily \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${d_{MP}} ( T , T ^\prime )$$
\end{document}.
4. Properties of The Matching Pair Distance for Rooted Phylogenetic Trees
4.1. Small topological transformations
In the case of the MC metric (as well as for the MS metric for unrooted trees), it was shown that adding a new leaf to compared trees can decrease their distance (Bogdanowicz and Giaro, 2012, 2013). Regarding this feature, the MP metric behaves more similarly to RF metric, which is expressed in Theorem 1 and Corollary 2. Before we present the theorem, we give two lemmas that we then use in the proof of the theorem.
An example of trees at a distance of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$( 1 / 2 ) {n^2} - ( 3 / 2 ) n + 2$$
\end{document} in the MP metric.
Proof. Let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$L = A \cup \{ z \} $$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${A_z} = \{ \{ x , z \} :x \in A \} $$
\end{document}. We have ■
Let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$M = \bigcup \nolimits_{i = 1}^k \{ ( {g_i} , {h_i} ) \} $$
\end{document}, where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${g_i} \in \gamma ( {T_1} ) \cup \{ \emptyset \} $$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${h_i} \in \gamma ( {T_2} ) \cup \{ \emptyset \} $$
\end{document}, be an optimal pairing that defines the distance \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${d_{MP}} ( {T_1} , {T_2} )$$
\end{document}. Let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$M ^\prime$$
\end{document} be a pairing formed based on M by removing from it all leaf pairs containing z, that is, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$M ^\prime = \bigcup \nolimits_{i = 1}^k \{ ( {g ^\prime _i} , {h ^\prime _i} ) \} $$
\end{document}, where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${g ^\prime _i} = {g_i} \backslash {A_z}$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${h ^\prime _i} = {h_i} \backslash {A_z}$$
\end{document}. Note that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$M ^\prime$$
\end{document} is a pairing of elements of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\gamma ( {T_{1 \vert A}} ) \cup \{ \emptyset \} $$
\end{document} with sets of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\gamma ( {T_{2 \vert A}} ) \cup \{ \emptyset \} $$
\end{document}. Moreover, each pair set of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\gamma ( {T_{1 \vert A}} )$$
\end{document}, as well as each element of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\gamma ( {T_{2 \vert A}} )$$
\end{document}, appears in \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$M \prime$$
\end{document} exactly once. Hence, we have
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*} { d_ { MP } } ( { T_1 } , { T_2 } )& = \frac { 1 } { 2 } \mathop \sum \limits_ { i = 1 } ^k \vert { g_i } \oplus { h_i } \vert \\ & = \frac { 1 } { 2 } \mathop \sum \limits_ { i = 1 } ^k \left( { \vert { { g \prime } _i } \oplus { { h ^\prime } _i } \vert + \vert ( { g_i } \cap { A_z } ) \oplus ( { h_i } \cap { A_z } ) \vert } \right) \\ & \ge \frac { 1 } { 2 } \mathop \sum \limits_ { i = 1 } ^k \vert { g \prime _i } \oplus { h ^\prime _i } \vert \ge { d_ { MP } } ( { T_ { 1 \vert A } } , { T_ { 2 \vert A } } ) . \end{align*}
\end{document}
Tightness of this inequality is straightforward, which can be seen, for example, for trees defined so that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_1} = {T_2} \in {R_L}$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_{1 \vert A}} = {T_{2 \vert A}}$$
\end{document}.
The second inequality follows directly from the fact that the MP distance is a special case of the transfer distance and Equation (6), because \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\vert {A_z} \vert= n - 1$$
\end{document}.
Let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$L = \{ {a_1} , {a_2} , \ldots , {a_n} \} $$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${a_n} = z$$
\end{document}, hence \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$A = L \backslash \{ {a_n} \} $$
\end{document}. For a tight example of the inequality, consider trees \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T ^\prime _1} , {T ^\prime _2} \in R_A^B$$
\end{document} constructed as presented in Figure 2 but for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n - 1$$
\end{document} leaves. Note that trees \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_3} , {T_4} \in R_L^B$$
\end{document} (Fig. 3) can be constructed by adding leaf an to trees \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T ^\prime _1} , {T ^\prime _2}$$
\end{document}, respectively. Hence, by Lemmas 1 and 2, we obtain that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${d_{MP}} ( {T_3} , {T_4} ) - {d_{MP}} ( {T ^\prime _1} , {T ^\prime _2} ) \ge n - 1$$
\end{document}. ■
Proof. The proof follows from repeated application of Theorem 1. ■
It seems to be a very natural property, a similar inequality holds for other classical metrics for phylogenetic trees, for example, RF and Triples metrics (Critchlow et al., 1996), but it is not fulfilled, for example, for MC (Bogdanowicz and Giaro, 2013).
Corollary 3.Let\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_1} , {T_2} \in {R_L}$$
\end{document}and N be the number of leaves in MAST for trees\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_1} , {T_2}$$
\end{document}. The following inequality holds:
Proof. Let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$X \subset L , \vert L \vert = n , \vert X \vert = N$$
\end{document}, and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_*} \in {R_X}$$
\end{document} be a MAST for trees \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_1} , {T_2} \in {R_L}$$
\end{document}, so \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_*} = {T_{1 \vert X}} = {T_{2 \vert X}}$$
\end{document} and T1 (as well as T2) can be obtained from \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_*}$$
\end{document} by a sequence of additions of leaves from \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$L \backslash X$$
\end{document}. Based on Equation (5) from Theorem 1, it is clearly seen that by adding a single leaf to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_*}$$
\end{document} we can obtain trees at the MP distance of at most \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$N$$
\end{document}. Therefore, by adding \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n - N$$
\end{document} leaves to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_*}$$
\end{document} we can obtain trees at the MP distance less than or equal to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$N + ( N + 1 ) + \cdots + ( n - 2 ) + ( n - 1 ) = ( {n^2} - n - {N^2} + N ) / 2$$
\end{document}. ■
The metric space determined by the MC distance has an interesting feature, which we can call “regularity” or “lack of isolated islands” (Bogdanowicz and Giaro, 2013), that is, any two trees of RL can be connected by a sequence of trees defined so that any two subsequent elements of it are at a distance of 4 at the most. The MP metric, however, does not have this property.
Lemma 3.For each\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s \in { \mathbb{Z}_{ > 0}}$$
\end{document}and a finite set L with\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\vert L \vert > 8s$$
\end{document}, there exists a partition\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$S \cup S ^\prime = {R_L}$$
\end{document},\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$S \cap S ^\prime = \emptyset$$
\end{document}such that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
{ \forall _{T \in S}}{ \forall _{T ^\prime \in S ^\prime }}{d_{MP}} ( T , T ^\prime ) > s.
\end{align*}
\end{document}
Note that there are metrics, for example, Quartet metric (QT) (Estabrook et al., 1985), whereby even the minimal distance between different trees grows together with the number of leaves. In the case of MP, such a situation does not hold. Moreover, for any tree \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$T \in {R_L}$$
\end{document}, there exists a tree \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$T ^\prime \ne T \in {R_L}$$
\end{document} close to T, that is, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${d_{MP}} ( T , T ^\prime ) \le 4$$
\end{document}. The same remains true for binary trees, that is, if \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$T \in R_L^B$$
\end{document} then \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$T ^\prime$$
\end{document} can be chosen from \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R_L^B$$
\end{document}. Such a tree can be constructed based on T using one of the operations presented in Figure 4. The operations 1 and 2 are feasible within a family of binary trees. The MP distance between trees after performing operation 1 is equal to 4, the distance between trees taking part in operations 2 and 3 equals 1. Therefore, operations 2 and 3 lead to the creation of a neighboring tree (a tree being at the minimal positive distance) in MP.
Operations that create close trees. By letters Ai, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$i = 1 , \ldots , k$$
\end{document} we denote single leaves or rooted subtrees.
4.2. Matching pair metric space diameter
One of the important and frequently studied properties of any phylogenetic metric space is its diameter, that is, the maximum distance between n-leaf trees. Diameters of phylogenetic metrics indicate the range, in which a particular metric defines differences between trees. Usually, more discriminative metrics have larger diameters, provided that the minimum distance between distinct trees is constant or grows much slower than the diameter. In this section we present bounds on the maximal distance in the MP metric.
Theorem 2 (Charon et al., 2006). Let\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$P , Q$$
\end{document}be partitions of a set X having the cardinality at most p and q,\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$p \le q$$
\end{document}, respectively. The maximum transfer distance between such partitions is equal to\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\vert X \vert - \lceil \vert X \vert / q \rceil$$
\end{document}.
Theorem 3.The maximal distance in the MP metric between trees of Rn can be characterized as
Proof. By Corollary 1, the lower bound is realized by the trees shown in Figure 3. The upper bound can be obtained using Theorem 2 by taking \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$P = \gamma ( {T_1} )$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$Q = \gamma ( {T_2} )$$
\end{document}, where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\vert X \vert = n ( n - 1 ) / 2$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$q < n$$
\end{document}. ■
Construction of distant trees in the MP metric is presented in Figure 3. The trees are at maximum possible MP distance in Rn for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$3 \le n \le 6$$
\end{document}. We suspect that Figure 3 shows trees at maximum possible MP distance also for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n \ge 7$$
\end{document} and Equation (7) is tight.
4.3. Distances of random trees
To fully interpret the level of dissimilarity of two trees using the value of the distance between them in a particular metric, a reference value is usually needed. In most cases, the average distance between random trees generated according to a particular model can be used for such purposes. In this section, we consider the distance in the MP metric between binary phylogenetic trees drawn independently from any label-invariant model. A model of random phylogenetic trees is label invariant if the probability of a tree remains constant under an arbitrary permutation of its taxa labels (Steel and Penny, 1993). This is a natural assumption stating that the probability of a particular phylogenetic tree depends on its shape and does not depend on labeling of its leaves. It is easy to observe that the following lemma holds.
Lemma 4.Let\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$A \subseteq L$$
\end{document}and T be a tree chosen randomly from\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R_L^B$$
\end{document}according to any label-invariant model. Then, the probability distribution of trees\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_{ \vert A}}$$
\end{document}over\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R_A^B$$
\end{document}is also label invariant.
Two models of random phylogenetic trees can be distinguished as those that are most often used in the literature: the uniform model, whereby equal probability is assigned to each possible tree, and the Yule model, whereby trees are constructed iteratively: starting from three random taxa, new taxa (chosen randomly) are added to a branch connected to a leaf (also chosen uniformly at random). Both models are label invariant [see McKenzie and Steel (2000) for more information].
The main result in this section, expressed in Theorem 4, concerns estimation of an asymptotic behavior of the expected MP distance between two random binary trees. To prove this fact, we utilized a combinatorial tool called Steiner triple system [see Colbourn and Dinitz (2006)], which is a pair \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$( V , \mathcal{B})$$
\end{document} of a set V together with a collection \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mathcal{B}$$
\end{document} of three-element subsets (called blocks or triples) of V with the property that every two-element subset of V occurs in exactly one block \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$B \in \mathcal{B}$$
\end{document}.
Lemma 5 (Kirkman, 1847). If V is a set of cardinality\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n = \vert V \vert > 1$$
\end{document}such that\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\vert V \vert \equiv 1 \bmod 6$$
\end{document}or\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\vert V \vert \equiv 3 \bmod 6$$
\end{document}, then there exists a Seiner triple system, in which\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\vert {\mathcal{B}} \vert = n ( n - 1 ) / 6$$
\end{document}.
A term migrated pair has been introduced by Chen (2012) in a general context of partition distance. For two partitions P1 and P2 of a fine set V, a set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\{ s , t \} \in {V^{ ( 2 ) }}$$
\end{document} is a migrated pair if s and t are in the same set in one partition and in different sets in the other. It is easy to see from Chen (2012) that the cardinality of any set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$S \subseteq {V^{ ( 2 ) }}$$
\end{document} of disjoint migrated pairs is a lower bound on a partition distance between P1 and P2, because at least one element from any migrated pair has to be deleted to make considered partitions identical.
Theorem 4.For rooted trees\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_{{1_n}}} , {T_{{2_n}}} \in R_L^B$$
\end{document},\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n = \vert L \vert$$
\end{document}generated independently at random according to any label-invariant model, for example, the Yule model or the uniform model, their expected distance is
Proof. The upper bound \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O ( {n^2} )$$
\end{document} on the expected distance follows from Theorem 3. Therefore, we have to prove that the lower bound is \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Omega ( {n^2} )$$
\end{document} and by Theorem 1, Equation (4), and Lemma 4 we may also assume that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n = 6k + 1$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$k \in {Z_{ > 0}}$$
\end{document}. We consider a tree induced by unordered triples \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\{ x , y , z \} \subseteq L$$
\end{document} of leaves. There are only three possible subtrees (Figure 5). Let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L^{ ( 3 ) }} = \{ \{ x , y , z \} :x , y , z \in L , x \ne y \ne z \ne x \} $$
\end{document} be a family of all unordered triples of leaves of L. We define the indicator function \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$I:{L^{ ( 3 ) }} \times R_L^B \times R_L^B \to \{ 0 , 1 \} $$
\end{document} such that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$I ( \{ x , y , z \} , {\hat T} , \check{T}) = 1$$
\end{document} if and only if induced trees \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \hat T_{ \vert \{ x , y , z \} }}$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\check{T}_{ \vert \{ x , y , z \} }$$
\end{document}are different.
The three possible trees having \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\{ x , y , z \} $$
\end{document} leaf set.
Let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$( L , \mathcal{B} )$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\vert L \vert = n$$
\end{document} be a Steiner triple system. Such a system exists based on Lemma 5 and the fact that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n = 6k + 1$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$k \in {Z_{ > 0}}$$
\end{document}. Consider a triple \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$B = \{ x , y , z \} \in \mathcal{B}$$
\end{document}. If the set B induces different subtrees in the compared trees \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_{{1_n}}} , {T_{{2_n}}} \in R_L^B$$
\end{document}, then note that at least one pair of leaves from B has to be transferred between pair sets of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\gamma ( {T_{{1_n}}} )$$
\end{document} to transform it to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\gamma ( {T_{{2_n}}} )$$
\end{document} (Fig. 5). Hence, if \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$I ( B , {T_{{1_n}}} , {T_{{2_n}}} ) = 1$$
\end{document}, then \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${B^{ ( 2 ) }}$$
\end{document} contains a migrated pair for partitions \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\gamma ( {T_{{1_n}}} )$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\gamma ( {T_{{2_n}}} )$$
\end{document} of the set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${L^{ ( 2 ) }}$$
\end{document}. For example, a pair \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\{ s = \{ x , y \} , t = \{ x , z \} \} $$
\end{document} is a migrated pair for partitions induced by trees T and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$T ^\prime$$
\end{document} in Figure 5. Moreover, sets \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${B^{ ( 2 ) }}$$
\end{document} are disjoint for different elements B of a Steiner system \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$( L , \mathcal{B} )$$
\end{document}, hence we obtain the following relation:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
{d_{MP}} ( {T_{{1_n}}} , {T_{{2_n}}} ) \ge \mathop \sum \limits_{B \in \mathcal{B}} I ( B , {T_{{1_n}}} , {T_{{2_n}}} ) . \tag{10}
\end{align*}
\end{document}
Thus by taking the expected value according to random choices of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_{{1_n}}}$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_{{2_n}}}$$
\end{document} from both sides of the inequality, we obtain that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
\mathbb{E} [ {d_{MP}} ( {T_{{1_n}}} , {T_{{2_n}}} ) ] \ge \mathbb{E} \left[ { \mathop \sum \limits_{B \in \mathcal{B}} I ( B , {T_{{1_n}}} , {T_{{2_n}}} ) } \right] = \mathop \sum \limits_{B \in \mathcal{B}} \mathbb{E} [ I ( B , {T_{{1_n}}} , {T_{{2_n}}} ) ] . \tag{11}
\end{align*}
\end{document}
As the trees \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_{{1_n}}} , {T_{{2_n}}} \in R_L^B$$
\end{document} are drawn independently according to some label-invariant model, from Lemma 4 \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mathbb{E} [ I ( B , {T_{{1_n}}} , {T_{{2_n}}} ) ] = ( 9 - 3 ) / 9 = 2 / 3.$$
\end{document} Consequently, as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\vert {\mathcal{B}} \vert = n ( n - 1 ) / 6$$
\end{document}, by Equation (11) we obtain the lower bound on \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mathbb{E} [ {d_{MP}} ( {T_{{1_n}}} , {T_{{2_n}}} ) ]$$
\end{document}, which is at least \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n ( n - 1 ) / 9$$
\end{document}. ■
One of the main advantages of MC and MS metrics based on Definition 2 in comparison with RF metric is the fact that relocation of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O ( 1 )$$
\end{document} leaves in compared trees \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_1} , {T_2} \in {R_n}$$
\end{document} can change their distance by at most \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O ( n )$$
\end{document}, which is a relatively small value when referring to the maximum distance being \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Theta ( {n^2} )$$
\end{document} for MC and MS metrics (Bogdanowicz and Giaro, 2012, 2013). We can say that a little change in phylogenetic trees cannot cause large (considering the whole space Rn) changes in their distance. Theorems 1 and 3 show that the mentioned statements are valid also for MP. It is worth mentioning that the expected MP distance between random trees is \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Theta ( {n^2} )$$
\end{document}, so the value is \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Omega ( n )$$
\end{document} times more than the possible MP distance change, this being the result of a relocation of a bounded number of leaves. Considering this property further, we can notice that the MP metric behaves more naturally than the MC metric or the MS metric, for which the expected value grows asymptotically slower than the diameter \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Theta ( {n^{3 / 2}} )$$
\end{document} in the uniform model (Bogdanowicz and Giaro, 2012, 2013) and only \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O ( n \log n )$$
\end{document} in the Yule model.† In Table 1 we summarized the main properties of MP metric discussed in this section.
Comparison of Selected Properties of Analyzed Metrics for Binary Trees
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Theta ( n )$$
\end{document}
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O ( n \log n )$$
\end{document}
In Supplemental Materials, we present comparisons of distributions of various metrics, that is, RF metric, the Triples distance (TT) (Critchlow et al., 1996), the Nodal Splitted metric with L2 norm (NS) (Cardona et al., 2010), MC, MP, and the cophenetic metric with L2 norm (CPH) (Cardona et al., 2013), between 1000 random pairs of trees having 100 leaves generated according to the two most popular models, that is, the Yule model and the uniform model.
5. Heuristics for Rooted Subtree Prune and Regraft Distance: An Application of The Matching Pair Distance in Detecting Horizontal Gene Transfer
In this section, we provide a preliminary study of the usefulness of the MP distance in identifying HGT. HGT is a direct transfer of genetic material from one lineage to another (Boc et al., 2010). HGT events lead to SPR transformations (Wu, 2009; Boc et al., 2010), appearing in rooted phylogenetic trees constructed based on respective DNA regions.
There are HGT detection methods present in literature (Wu, 2009; Boc et al., 2010), whose main concept is based on determining the SPR distance between phylogenetic trees T1 and T2, that is, the minimum number of SPR (or rooted SPR [rSPR] for rooted trees) operations needed to transform tree T1 into T2. In this section, we present two simple heuristic approaches to the problem of computing the rSPR distance, which make use of the polynomially computable comparison metrics.
Unfortunately, computing the rSPR distance is NP-hard (Bordewich and Semple, 2005), so some approximation or heuristic algorithms have to be applied to large instances of the problem. In the next part of this section, we present the results of computational experiments on the effectiveness of heuristics for estimation of the rSPR distance between analyzed trees. The tests have been performed based on randomly generated pairs of trees of various sizes.
For small instances, that is, when the compared trees are rather similar, exact exponential algorithms work reasonably fast. We have tested the SPRDist application (Wu, 2009), which was able to compute all our test cases (each test case consists of 1000 comparisons between randomly generated pair of trees) of size up to 20 leaves, and UltraNet (Chen et al., 2015) able to compute test cases up to 40 leaves. For test cases with 50 leaves UltraNet (which worked much faster than SPRDist) was able to compute only 25 distances in about 5 days on a desktop computer, so we decided to interrupt further computation of the exact rSPR distance. Finally, we manged to compute exact distances for all test cases on 50 leaves using rSPR 1.2.0 software (Whidden et al., 2014) (http://kiwi.cs.dal.ca/Software/RSPR).
All of the trees used in the experiments were generated according to the uniform model. In our experiments, we included most of the known metrics for rooted trees, that is, RF metric, the Triples distance (TT) (Critchlow et al., 1996), the NS metric (Cardona et al., 2010), MC and MP metrics, and the CPH metric (Cardona et al., 2013). It turns out that the MP distance performs better than any other metric in both experiments.
5.1. Single metric algorithm
Here, we present a simple heuristic that uses the value of an arbitrary phylogenetic trees metric as an indicator for consequent rSPR moves. The idea of such type of heuristics has been introduced by Boc et al. (2010). The method presented hereunder, in contrast to that described in Boc et al. (2010), uses only metrics for rooted phylogenetic trees and does not check any evolutionary constraints on possible rSPR moves.
Let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_a} , {T_b} \in R_L^B$$
\end{document} be the input trees. In the first step, we compute the distance in a selected polynomially computable metric d between each tree Tc of 1 − rSPR neighborhood of Ta and the target tree Tb. We call the trees within 1 − rSPR neighborhood as candidate trees. Let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T ^\prime _c}$$
\end{document} be an arbitrary chosen candidate tree whose distance in the metric d to Tb is minimal. In the next step, we compute the 1 − rSPR neighborhood of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T^ \prime _c}$$
\end{document} and again choose from it an arbitrary candidate tree of minimal distance to Tb. We then repeat the steps until the tree Tb is reached and the number of steps is an obtained estimation (upper bound) on rSPR distance between Ta and Tb.
Note that, to be useful in such a procedure, a phylogenetic tree metric d should guarantee that for any two trees \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_a} , \,{T_b} \in R_L^B$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${T_a} \ne {T_b}$$
\end{document}, there exists \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$T \in R_L^B$$
\end{document} obtained from Ta by a single rSPR operation such that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$d ( {T_a} , {T_b} ) > d ( T , {T_b} )$$
\end{document}. The fact that RF and QT metrics for unrooted trees fulfill the mentioned condition (where unrooted SPR instead of rSPR is considered) has been proved in Bordewich et al. (2009). During our experiments, we found that such a property does not hold for the NS, MC, and CPH metrics, but we did not find any counterexamples for metrics TT and MP, which suggests that both these metrics have this property.‡
In Table 2 we present the results, that is, the number of rSPR operations estimated using a particular metric, true rSPR distance (the last column), and the ratio of the estimated distance to the exact value of rSPR metric. Each row contains average values from 1000 test cases. Table 2 in Supplementary Materials presents the fraction of the test cases, in which a particular method gave the best result (note that more than one method can give the best result in a single test case). Based on the data, we can see that the best results for all analyzed sizes of trees have been obtained using the MP metric.
Performance Comparison of a Single Metric Rooted Subtree Prune and Regraft Heuristic—Average Distance and Relative Performance
RF
TT
MP
N
y
y/rSPR
y
y/rSPR
y
y/rSPR
rSPR
10
5.958
1.169
5.552
1.090
5.458
1.071
5.103
15
10.499
1.209
9.774
1.125
9.590
1.104
8.701
20
15.212
1.230
14.265
1.152
13.901
1.122
12.396
30
24.886
1.236
23.972
1.190
23.098
1.147
20.152
40
34.617
1.233
34.143
1.216
32.583
1.160
28.099
50
44.355
1.226
44.396
1.227
42.327
1.170
36.184
By y we denote the average number of rSPR operations obtained using a particular heuristic. SPR, subtree prune and regraft; rSPR, rooted SPR. Bold indicates the minimum value among the analyzed heuristics.
5.2. Guided Robinson–Foulds metric algorithm
Here, we consider a modified version of the mentioned procedure, in which two metrics are used. First, the RF distances between candidate trees and the destined tree are computed. Next, ties in a subset of candidate trees of the minimum RF distance in a particular iteration are resolved using any other metric. Note that any metric (or even only a dissimilarity measure) can be chosen as a tiebreaker because using the RF metric as the based one guarantees that the procedure finishes in a number of steps not greater than the RF distance between compared trees [see footnote on this page and similar analysis for unrooted trees for RF and QT metrics described in Bordewich et al. (2009)].
The results of the experiment are presented in Table 3 and in Supplementary Materials, where we present the fraction of test cases in which a particular method gave the best result. Similarly to the previous experiment, Table 3 contains average values from 1000 test cases for the estimated number of rSPR operations, true rSPR distance, and the ratio of the estimated distance to the exact value of rSPR metric. The MP metric performs better that the other distances in almost all cases.
Performance Comparison of a Guided Robinson–Foulds Metric Rooted Subtree Prune and Regraft Heuristic—Average Distance and Relative Performance
RF+CPH
RF+NS
RF+MC
RF+TT
RF+MP
n
y
y/rSPR
y
y/rSPR
y
y/rSPR
y
y/rSPR
y
y/rSPR
rSPR
10
5.553
1.089
5.473
1.073
5.450
1.068
5.429
1.065
5.406
1.060
5.103
15
9.757
1.124
9.622
1.108
9.569
1.102
9.459
1.089
9.475
1.091
8.701
20
14.014
1.132
13.849
1.119
13.811
1.115
13.624
1.100
13.573
1.096
12.396
30
23.177
1.151
22.890
1.137
22.846
1.135
22.481
1.116
22.421
1.114
20.152
40
32.584
1.160
32.128
1.144
32.234
1.148
31.546
1.123
31.386
1.117
28.099
50
41.952
1.160
41.409
1.145
41.485
1.147
40.742
1.126
40.621
1.123
36.184
By y we denote the average number of rSPR operations obtained using a particular heuristic. Bold indicates the minimum value among the analyzed heuristics.
Based on the results from both experiments, we obtain that the strategy incorporating two metrics (e.g., \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \rm{RF}} + { \rm{MP}}$$
\end{document}), where the second metric is used as a tiebreaker, is worth further development and can lead to better results than using a single metric. A similar observation has been recently reported by Whidden et al. (2014), where the RF distance is used as a tiebreaker for the SPR supertree method.
6. Conclusion
By using a general matching method (Bogdanowicz, 2008; Bogdanowicz and Giaro, 2012), we defined a new phylogenetic metric that can be used for general (binary and nonbinary) rooted phylogenetic trees and analyzed its main properties (such as minimum positive distance, diameter, and average distance between random trees). The interesting fact is that the MP metric becomes the transfer distance between partitions of the set of unordered leaf pairs determined by compared trees. As the transfer distance has already been found useful in bioinformatics (Konovalov et al., 2005), we believe that the MP distance will also become a valuable supplement to the current set of computational tools for phylogenies.
Among the analyzed properties of the MP distance, at least the following two deserve more attention, the quadratic (with respect to the number of leaves) diameter and the quadratic distance between random trees, and a relatively low, that is, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O ( n )$$
\end{document} distance increase after relocation of a bounded number of leaves in comparison with the maximum or average distance. Presented computational experiments show potential area in which the MP metric can be successfully applied, that is, HGT analysis according to the method introduced in Boc et al. (2010). Application of MP metric as a tiebreaker metric in the procedure of searching for SPR supertree (Whidden et al., 2014) might also be interesting and requires further research.
BocA., PhilippeH., and MakarenkovV.2010. Inferring and validating horizontal gene transfer events using bipartition dissimilarity. Syst. Biol., 59, 195–211.
4.
BöckerS., CanzarS., and KlauG.W.2013. The generalized Robinson-Foulds metric, 156–169. In Algorithms in Bioinformatics. LNCS 8126. Springer, Berlin, Heidelberg.
5.
BogdanowiczD.2008. Comparing phylogenetic trees using a minimum weight perfect matching. Proc. 1st International Conference on Information Technology, 451–454. Gdansk, Poland.
6.
BogdanowiczD., and GiaroK.2012. Matching split distance for unrooted binary phylogenetic trees. IEEE/ACM Trans. Comput. Biol. Bioinform., 9, 150–160.
7.
BogdanowiczD., and GiaroK.2013. On a matching distance between rooted phylogenetic trees. Int. J. Appl. Math. Comput. Sci., 23, 669–684.
8.
BogdanowiczD., GiaroK., and WróbelB.2012. TreeCmp: Comparison of trees in polynomial time. Evol. Bioinform., 8, 475–487.
9.
BordewichM., GascuelO., HuberK.T., et al.2009. Consistency of topological moves based on the balanced minimum evolution principle of phylogenetic inference. IEEE/ACM Trans. Comput. Biol. Bioinform., 6, 110–117.
10.
BordewichM., and SempleC.2005. On the computational complexity of the rooted subtree prune and regraft distance. Ann. Comb., 8, 409–423.
11.
CardonaG., LlabrésM., RossellóF., et al.2010. Nodal distances for rooted phylogenetic trees. J. Math. Biol., 61, 253–276.
12.
CardonaG., MirA., RossellóF., et al.2013. Cophenetic metrics for phylogenetic trees, after Sokal and Rohlf. BMC Bioinformat. 14, 3.
13.
CharonI., DenœudL., GuénocheA., et al.2006. Maximum transfer distance between partitions. J. Classification, 23, 103–121.
14.
ChaudharyR., BurleighJ.G., and Fernández-BacaD.2013. Inferring species trees from incongruent multi-copy gene trees using the Robinson-Foulds distance. Algorithms Mol. Biol. 8, 28.
15.
ChenY.H.2012. The k partition-distance problem. J. Comp. Biol., 19, 404–417.
16.
ChenZ.-Z., FanY., and WangL.2015. Faster exact computation of rSPR distance. J. Comb. Optim., 29, 605–635.
17.
ColbournC.J., and DinitzJ.H.2006. Handbook of Combinatorial Designs. Second Edition, Chapman and Hall/CRC. Boca Raton, FL, USA.
18.
CritchlowD.E., PearlD.K., and QianC.1996. The triples distance for rooted bifurcating phylogenetic trees. Syst. Biol., 45, 323–334.
19.
DasGuptaB., HeX., JiangT., et al.1997. On distances between phylogenetic trees. Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 427–436.
20.
DayW.H.E.1981. The complexity of computing metric distances between partitions. Math. Soc. Sci., 1, 269–287.
21.
DenœudL.2008. Transfer distance between partitions. Adv. Data Anal. Classification, 2, 279–294.
22.
EstabrookG.F., McMorrisF.R., and MeachamC.A.1985. Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Syst. Biol., 34, 193–200.
23.
FindenC.R., and GordonA.D.1985. Obtaining common pruned trees. J. Classification, 2, 255–276.
24.
GabowH.N., and TarjanR.E.1989. Faster scaling algorithms for network problems. SIAM J. Sci. Comput., 18, 1013–1036.
25.
GordonK., FordE., and St. JohnK.2013. Hamiltonian Walks of Phylogenetic Treespaces. IEEE/ACM Trans. Comput. Biol. Bioinform., 10, 1076–1079.
26.
GuénocheA.2011. Consensus of partitions: a constructive approach. Adv. Data Anal. Classif., 5, 215–229.
27.
GusfieldD.2002. Partition-distance: A problem and class of perfect graphs arising in clustering. Inf. Process. Lett., 82, 159–164.
28.
HickeyG., DehneF., Rau-ChaplinA., et al.2008. SPR distance computation for unrooted trees. Evol. Bioinform., 4, 17–27.
29.
HillisD.M., HeathT.A., and St. JohnK.2005. Analysis and visualization of tree space. Syst. Biol., 54, 471–482.
30.
HumphriesP.J., and WuT.2013. On the neighborhoods of trees. IEEE/ACM Trans. Comput. Biol. Bioinform., 10, 721–728.
31.
KirkmanT.P.1847. On a problem in combinatorics. Cambridge Dublin Math. J., 2, 191–204.
32.
KirkpatrickM., and SlatkinM.1993. Searching for evolutionary patterns in the shape of a phylogenetic tree. Evolution, 47, 1171–118.
33.
KonovalovD.A., LitowB., and BajemaN.2005. Partition-distance via the assignment problem. Bioinformatics, 21, 2463–2468.
34.
KooninE.V., PuigbòP., and WolfY.I.2011. Comparison of phylogenetic trees and search for a central trend in the “forest of life.”. J. Comp. Biol., 18, 917–924.
35.
LinH.T., BurleighJ.G., and EulensteinO.2009. Triplet supertree heuristics for the tree of life. BMC Bioinformat. 10(Suppl 1), S8.
36.
McKenzieA., and SteelM.2000. Distributions of cherries for two models of trees. Math. Biosci., 164, 81–92.
37.
OrlinJ.B., and AhujaR.K.1992. New scaling algorithms for the assignment and minimum mean cycle problems. Math. Program., 54, 41–56.
38.
RégnierS.1965. Sur quelques aspects mathématiques des problèmes de classification automatique. I.C.C. Bulletin, 4, 175–191. Reprint in Mathématiques et Sciences Humaines 82, 13–29, 1983.
39.
RobinsonD.F., and FouldsL.R.1981. Comparison of phylogenetic trees. Math. Biosci., 53, 131–147.
40.
SackinM.J.1972. “Good” and “bad” phenograms. Syst. Zool., 21, 225–226.
41.
SempleC., and SteelM.2003. Phylogenetics. Oxford University Press. Oxford, United Kingdom.
42.
ShaoK.-T., and SokalR.R.1990. Tree balance. Syst. Zool., 39, 266–276.
43.
SmithK.C., Castro-NallarE., FisherJ.N.B., et al.2013. Phage cluster relationships identified through single gene analysis. BMC Genomics, 14, 410.
44.
SteelM.A., and PennyD.1993. Distributions of tree comparison metrics–some new results. Syst. Biol., 42, 126–141.
45.
WhiddenC., ZehN., and BeikoR.G.2014. Supertrees based on the subtree prune-and-regraft distance. Syst. Biol., 63, 566–581.
46.
WuY.2009. A practical method for exact computation of subtree prune and regraft distance. Bioinformatics, 25, 190–196.
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.