In this section, we present the results of our study. We first prove a generalization of Hoeffding's inequality that relates to quartet plurality scores (see Section 2.1) and develop a statistical test based on this generalization. We then apply the test to sets of plurality quartets constructed from either real or simulated data, which enables us to identify several cases where the differences between those sets are statistically significant.
3.1. Mathematical theory
Here, we prove a theoretical result pertaining to dependent random variables indexed by quartets, based on a generalization of Hoeffding's inequality (Hoeffding, 1963), which is due to Janson (2004). We use $\mathbb{E}X$ to denote the expectation of the random variable $X$ and $\mathbb{P}\left( \mathrm{event} \right)$ to denote the probability of an event.
We denote by $X_q$ the random variable representing the plurality score of the quartet $q$. We assume that the phylogenies of two disjoint sets of species are completely independent, and hence, we assume that if $q_1 \cap q_2 = \emptyset$, then $X_{q_1}$ and $X_{q_2}$ are independent random variables. More generally, we assume that $X_{q_0}$ is independent of $X_{q_1}, X_{q_2}, \ldots, X_{q_m}$ if the following condition pertaining to the quartets $q_0, q_1, \ldots, q_m$ holds:
\begin{align*}
q_0 \cap \left( \bigcup_{i = 1}^{m} q_i \right) = \emptyset. \tag{1}
\end{align*}
Our main theoretical result is the following:
Lemma 3.1 Suppose that $Q$ is the set of all $\binom{n}{4}$ quartets induced by a set of taxa $\{1, 2, \ldots, n\}$ and that $\bar X = \frac{1}{\binom{n}{4}} \sum_{q \in Q} X_q$. Then, for every $t > 0$,
\begin{align*}
\mathbb{P}\left( \bar X \ge \mathbb{E}\bar X + t \right) \le \exp\left( -\frac{9}{2} \cdot \frac{\binom{n}{4} t^2}{\binom{n}{4} - \binom{n-4}{4}} \right), \tag{2}
\end{align*}
which can be approximated by
\begin{align*}
\mathbb{P}\left( \bar X \ge \mathbb{E}\bar X + t \right) \le \exp\left( -n \left( 9/32 + o(1) \right) t^2 \right). \tag{3}
\end{align*}
The same holds for $\mathbb{P}\left( \bar X \le \mathbb{E}\bar X - t \right)$.
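To get a feel for the size of the bounds in Lemma 3.1, the following minimal Python sketch evaluates the right-hand side of (2) and the leading-order form of (3) with the $o(1)$ term dropped. The function names `lemma_bound` and `lemma_bound_asymptotic` are illustrative choices and are not part of the original analysis.

```python
from math import comb, exp


def lemma_bound(n: int, t: float) -> float:
    """Right-hand side of (2) for n >= 4 taxa and deviation t > 0."""
    total = comb(n, 4)                    # number of quartets on n taxa
    overlapping = total - comb(n - 4, 4)  # quartets sharing a taxon with a fixed quartet, itself included
    return exp(-4.5 * total * t * t / overlapping)


def lemma_bound_asymptotic(n: int, t: float) -> float:
    """Leading-order form of (3), with the o(1) term dropped (an approximation)."""
    return exp(-n * (9 / 32) * t * t)


if __name__ == "__main__":
    for n in (50, 200, 1000):
        print(n, lemma_bound(n, 0.1), lemma_bound_asymptotic(n, 0.1))
```

For large $n$ the two values are close, which is what the $o(1)$ term in (3) expresses; for small $n$ the exact form (2) is the one to use.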
Before proving Lemma 3.1, we recall the definition of a dependency graph (Janson, 2004): for a set of random variables $\{X_\alpha\}_{\alpha \in \mathcal{A}}$, a dependency graph is any graph $\Gamma$ with vertex set $\mathcal{A}$ such that if $\mathcal{B} \subseteq \mathcal{A}$ and $\alpha \in \mathcal{A}$ is not connected by an edge to any vertex in $\mathcal{B}$, then $X_\alpha$ is independent of $\{X_\beta\}_{\beta \in \mathcal{B}}$.
Notice that in most cases, a dependency graph for a given set of random variables is not unique. The following theorem, due to Janson (2004, Theorem 2.1), extends the classical Hoeffding inequality (Hoeffding, 1963). We present here a simplified version that better suits our needs:
Theorem 3.2 Suppose that $X = \sum_{\alpha \in \mathcal{A}} X_\alpha$, where $\{X_\alpha\}_{\alpha \in \mathcal{A}}$ are random variables and $\{a_\alpha, b_\alpha\}_{\alpha \in \mathcal{A}}$ are real numbers such that $a_\alpha \le X_\alpha \le b_\alpha$ for all $\alpha \in \mathcal{A}$. Then, for every $t > 0$ and every $\Gamma$, which is a dependency graph for $\{X_\alpha\}_{\alpha \in \mathcal{A}}$, we have
\begin{align*}
\mathbb{P}\left( X \ge \mathbb{E}X + t \right),\ \mathbb{P}\left( X \le \mathbb{E}X - t \right) \le \exp\left( -2 \frac{t^2}{\left( \Delta(\Gamma) + 1 \right) \sum_{\alpha \in \mathcal{A}} (b_\alpha - a_\alpha)^2} \right), \tag{4}
\end{align*}
where $\Delta(\Gamma)$ denotes the maximum degree of $\Gamma$.
We now turn to proving Lemma 3.1.
Proof: We define a graph $G = (V, E)$ as follows. We set $V = Q$, that is, the set of vertices $V$ is the set of all $\binom{n}{4}$ quartets. We then connect $q_1, q_2 \in Q$ ($q_1 \ne q_2$) by an edge if and only if $q_1 \cap q_2 \ne \emptyset$, and define the set of edges $E$ as the collection of all edges thus constructed. Let us explain why $G$ is a dependency graph for $\{X_q\}_{q \in Q}$. Indeed, if $q_0$ is not connected to any vertex in $\{q_1, q_2, \ldots, q_m\}$, then $q_0 \cap q_k = \emptyset$ for $k = 1, 2, \ldots, m$. This implies that (1) holds and thus $X_{q_0}$ is independent of $\{X_{q_1}, X_{q_2}, \ldots, X_{q_m}\}$, as required.
Clearly, the number of vertices in $G$ is $\binom{n}{4} = n^4 \left( 1/24 + o(1) \right)$. The degree of any vertex in $G$ can be readily calculated: since a quartet is not connected to itself, nor to any of the $\binom{n-4}{4}$ quartets with which it has no elements in common, the degree of any vertex in $G$ is
\begin{align*}
\Delta(G) = \binom{n}{4} - \binom{n-4}{4} - 1. \tag{5}
\end{align*}
We now estimate $\mathbb{P}\left( \bar X \ge \mathbb{E}\bar X + t \right)$.
The following is obvious:
\begin{align*}
\mathbb{P}\left( \bar X \ge \mathbb{E}\bar X + t \right) = \mathbb{P}\left( \sum_{q \in Q} X_q \ge \mathbb{E}\sum_{q \in Q} X_q + \binom{n}{4} t \right). \tag{6}
\end{align*}
We apply (4) to the right-hand side of (6). Since $\{X_q\}_{q \in Q}$ are random variables of plurality scores, $1/3 \le X_q \le 1$ for all $q \in Q$, so we set $a_\alpha \equiv 1/3$, $b_\alpha \equiv 1$ and get
\begin{align*}
\mathbb{P}\left( \sum_{q \in Q} X_q \ge \mathbb{E}\sum_{q \in Q} X_q + \binom{n}{4} t \right) \le \exp\left( -2 \frac{\binom{n}{4}^2 t^2}{\left( \Delta(G) + 1 \right) \binom{n}{4} (2/3)^2} \right) = \exp\left( -\frac{9}{2} \cdot \frac{\binom{n}{4} t^2}{\Delta(G) + 1} \right). \tag{7}
\end{align*}
Combining (6), (7), and (5), we conclude that
\begin{align*}
\mathbb{P}\left( \bar X \ge \mathbb{E}\bar X + t \right) \le \exp\left( -\frac{9}{2} \cdot \frac{\binom{n}{4} t^2}{\binom{n}{4} - \binom{n-4}{4}} \right), \tag{8}
\end{align*}
which establishes (2). Furthermore, since $\Delta(G) = n^3 \left( 2/3 + o(1) \right)$ and $\binom{n}{4} = n^4 \left( 1/24 + o(1) \right)$, we write
\begin{align*}
\mathbb{P}\left( \bar X \ge \mathbb{E}\bar X + t \right) \le \exp\left( -\frac{9}{2} \cdot \frac{n^4 \left( 1/24 + o(1) \right) t^2}{n^3 \left( 2/3 + o(1) \right)} \right) = \exp\left( -n \left( 9/32 + o(1) \right) t^2 \right), \tag{9}
\end{align*}
which establishes (3) and completes the proof. ▪
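The degree count used in the proof can be sanity-checked by brute force for small $n$. The sketch below builds the quartet graph explicitly (two distinct quartets are adjacent when they share at least one taxon) and compares every vertex degree with $\binom{n}{4} - \binom{n-4}{4} - 1$. The helper names are hypothetical, and the quadratic enumeration is only feasible for small $n$.

```python
from itertools import combinations
from math import comb


def quartet_graph_degrees(n: int) -> list[int]:
    """Degree of every vertex in the quartet graph on taxa 1..n (brute force)."""
    quartets = [frozenset(q) for q in combinations(range(1, n + 1), 4)]
    degrees = []
    for q1 in quartets:
        # Count the other quartets that share at least one taxon with q1.
        deg = sum(1 for q2 in quartets if q2 != q1 and q1 & q2)
        degrees.append(deg)
    return degrees


def predicted_degree(n: int) -> int:
    """Closed form from the proof: all quartets, minus the disjoint ones, minus the quartet itself."""
    return comb(n, 4) - comb(n - 4, 4) - 1


if __name__ == "__main__":
    for n in range(4, 10):
        observed = set(quartet_graph_degrees(n))
        print(n, observed, predicted_degree(n))
```

Every vertex has the same degree, so the maximum degree $\Delta(G)$ coincides with the common value, as the proof uses.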
Theorem 3.2 enables one to prove the following:
Corollary 1 Let us assume that $Y_1, Y_2, \ldots, Y_k, Z_1, Z_2, \ldots, Z_m$ are random variables taking values in the interval $[a, b]$. We define $A = \{Y_1, Y_2, \ldots, Y_k\}$ and $B = \{Z_1, Z_2, \ldots, Z_m\}$. We also define $\bar Y = \frac{1}{k} \sum_{i=1}^{k} Y_i$ and $\bar Z = \frac{1}{m} \sum_{i=1}^{m} Z_i$. Then
\begin{align*}
\mathbb{P}\left( \bar Y - \bar Z - \left( \mathbb{E}\bar Y - \mathbb{E}\bar Z \right) \ge t \right) \le \exp\left( -2 \frac{t^2}{\left( \Delta\left( \Gamma(A, B) \right) + 1 \right) (b - a)^2 \left( \frac{1}{k} + \frac{1}{m} \right)} \right), \tag{10}
\end{align*}
where $\Gamma(A, B)$ is any dependency graph of the union $A \cup B = \{Y_1, Y_2, \ldots, Y_k, Z_1, Z_2, \ldots, Z_m\}$.
In addition, if $Z_j$ is independent of $A = \{Y_1, Y_2, \ldots, Y_k\}$ for $j = 1, 2, \ldots, m$ and $Y_i$ is independent of $B = \{Z_1, Z_2, \ldots, Z_m\}$ for $i = 1, 2, \ldots, k$, then (10) can be simplified to
\begin{align*}
\mathbb{P}\left( \bar Y - \bar Z - \left( \mathbb{E}\bar Y - \mathbb{E}\bar Z \right) \ge t \right) \le \exp\left( -2 \frac{t^2}{\left( \max\left( \Delta\left( \Gamma(A) \right), \Delta\left( \Gamma(B) \right) \right) + 1 \right) (b - a)^2 \left( \frac{1}{k} + \frac{1}{m} \right)} \right). \tag{11}
\end{align*}
The proof of Corollary 1 can be found in the Supplementary Material. In the next section, we will show examples of how this corollary may be used.
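As a rough illustration of how (11) could be turned into a one-sided test, the sketch below evaluates the right-hand side of (11) and inverts it to obtain the smallest deviation $t$ for which the bound drops to a chosen level. The function names, the argument `max_degree` (standing for $\max(\Delta(\Gamma(A)), \Delta(\Gamma(B)))$), and the default range $[1/3, 1]$ (the plurality-score range used in the proof of Lemma 3.1) are assumptions made for this example only.

```python
from math import exp, log, sqrt


def two_sample_bound(t: float, k: int, m: int, max_degree: int,
                     a: float = 1 / 3, b: float = 1.0) -> float:
    """Right-hand side of (11) for sample sizes k and m."""
    denom = (max_degree + 1) * (b - a) ** 2 * (1 / k + 1 / m)
    return exp(-2 * t * t / denom)


def critical_difference(alpha: float, k: int, m: int, max_degree: int,
                        a: float = 1 / 3, b: float = 1.0) -> float:
    """Smallest t at which the bound in (11) equals alpha (0 < alpha < 1)."""
    denom = (max_degree + 1) * (b - a) ** 2 * (1 / k + 1 / m)
    # Solve exp(-2 t^2 / denom) = alpha for t.
    return sqrt(denom * log(1 / alpha) / 2)


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to exercise the functions.
    t_crit = critical_difference(0.05, k=500, m=500, max_degree=40)
    print(t_crit, two_sample_bound(t_crit, 500, 500, 40))
```

Because (11) is an upper bound on a tail probability, a threshold obtained this way is conservative: an observed difference exceeding it is significant at the chosen level, but smaller differences are not necessarily insignificant.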
3.2. Applications
As mentioned above, quartet trees may be used as input for computer programs that implement heuristic methods of phylogenetic construction based on the supertree approach. However, determining the overall agreement among the input quartets in a quantitative way remains a largely open problem. We offer MPS as a measure of the strength of this agreement. Several examples, applying the theory of Section 3.1 to simulated data and then to real data, demonstrate how this measure can be used.
3.2.1. Simulation runs
Here, we employ the theory we developed in Section 3.1 in several tests involving simulated gene trees. The gene trees were constructed by subjecting simulated species trees to an HGT process of a certain “rate.” We briefly mention that an HGT rate
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\lambda$$
\end{document}
implies an expected value of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\lambda$$
\end{document}
HGT events per unit length of the tree. The gene trees, having
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n = 1000$$
\end{document}
leaves each, were divided into disjoint sets based on the HGT rates that underlie their construction. We used the HGT rates of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\lambda = 0.01 , 0.02 , \ldots , 0.1$$
\end{document}
, so that a total of 10 sets of simulated gene trees were constructed, with 100 trees in each set (see Section 2.2 for details).
We test the statistical significance of the differences between the sets of simulated gene trees. For each set of gene trees, we consider
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Big ( { \begin{matrix} n \\ 4 \end{matrix} }\Big )$$
\end{document}
random variables (
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n = 1000$$
\end{document}
), representing the plurality scores of the
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Big ( { \begin{matrix} n \\ 4 \\ \end{matrix} } \Big )$$
\end{document}
quartets they induce. We then compute the MPS of each of the 10 sets of gene trees described above. An increase in the rate of HGT events implies an increase in the number of quartets affected by HGT, which is likely to weaken the tree signal that those quartets support. Consequently, we expect the MPS to be inversely correlated with the rate of HGT. Since the plurality score of each quartet, as induced by one set of gene trees, is independent of all the other sets of gene trees, we use inequality (11) to bound the probability of the observed difference between two MPSs; first, however, we replace the inequality's parameters with concrete numeric values. In accordance with Corollary 1, when comparing two sets of gene trees, we define
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$A = \left\{ {{Y_1} , {Y_2} , \ldots , {Y_{ \left( { \begin{matrix} n \\ 4 \\ \end{matrix} } \right) }}} \right\} $$
\end{document}
to be the plurality scores of all
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Big ( { \begin{matrix} n \\ 4 \end{matrix} }\Big )$$
\end{document}
quartets as computed based on the first set and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$B = \left\{ {{Z_1} , {Z_2} , \ldots , {Z_{ \left( { \begin{matrix} n \\ 4 \\ \end{matrix} } \right) }}} \right\} $$
\end{document}
to be the plurality scores of all
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Big ( { \begin{matrix} n \\ 4 \end{matrix} }\Big )$$
\end{document}
quartets as computed based on the second set. The null hypothesis that we test is that the two MPSs are identical, hence we assume that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mathbb{E} \bar Y = \mathbb{E} \bar Z$$
\end{document}
. Naturally, since we deal with plurality scores, we set
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$a = 1 / 3$$
\end{document}
,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$b = 1$$
\end{document}
. Furthermore, we consider all
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Big ( { \begin{matrix} n \\ 4 \end{matrix} }\Big )$$
\end{document}
quartets in this analysis, hence we set
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$k = m = \Big ( { \begin{matrix} n \\ 4 \end{matrix} }\Big )$$
\end{document}
(for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$n = 1000$$
\end{document}
). Finally, as in the proof of Lemma 3.1, for both A and B, we can construct dependency graphs whose maximal degree is
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Big ( { \begin{matrix} n \\ 4 \end{matrix} }\Big ) - \Big ( { \begin{matrix} {n - 4} \\ 4 \\ \end{matrix} } \Big ) - 1$$
\end{document}
. Thus, we may write
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \rm{ \Delta }} \left( {{ \rm{ \Gamma }} \left( A \right) } \right) = { \rm{ \Delta }} \left( {{ \rm{ \Gamma }} \left( B \right) } \right) = \Big ( { \begin{matrix} n \\ 4 \\ \end{matrix} } \Big ) - \Big ( { \begin{matrix} {n - 4} \\ 4 \\ \end{matrix} } \Big) - 1$$
\end{document}
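. Substituting these values into inequality (11), we get
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
\mathbb{P} \left( \bar Y - \bar Z \ge t \right) \le \exp \left( - \frac{9 t^2 \binom{n}{4}}{4 \left( \binom{n}{4} - \binom{n - 4}{4} \right)} \right). \tag{12}
\end{align*}
\end{document}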
Table 1 lists the HGT rates and MPSs of the 10 sets of gene trees, together with the resulting estimates of the probabilities of their differences based on inequality (12). The cells in the table are marked according to the following rule: if the probability is greater than 5%, the cell is marked with a gray circle; if it is between 1% and 5%, with a gray triangle; and if it is less than 1%, with a gray rhombus. First, Table 1 demonstrates that an increase in the rate of HGT results in a decrease in the MPS, as expected. Second, the resulting differences in the MPSs between the different sets of quartets are at times statistically significant, with a probability of less than 5%, or even less than 1%, of occurring by chance. Thus, in some of the cases we test, the theory developed in the previous section allows us to reject the null hypothesis of identical MPSs, which points to a genuine difference in the underlying HGT rates.
The table presents the estimates of the probabilities of the differences between the MPSs of the different sets of simulated gene trees, as computed using inequality (12). The cells of the table are marked based on those estimates: if the probability is greater than 5%, the cell is marked with a gray circle; if the probability is 1%–5%, the cell is marked with a gray triangle; and if the probability is less than 1%, the cell is marked with a gray rhombus.
HGT, horizontal gene transfer; MPS, mean plurality score.
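To make the substitution of concrete values into inequality (11) tangible, the following Python sketch plugs in the simulation parameters stated above (n = 1000, a = 1/3, b = 1, k = m equal to the number of quartets, and the maximal degree of the dependency graphs) and evaluates the resulting bound. The values of t in the loop are hypothetical MPS differences chosen for illustration only; they are not taken from Table 1.

```python
import math

n = 1000
k = math.comb(n, 4)                  # number of quartets per set (k = m)
delta = k - math.comb(n - 4, 4) - 1  # maximum degree of each dependency graph
a, b = 1 / 3, 1.0                    # bounds on every plurality score

# Coefficient of t^2 in the exponent of inequality (11) when k = m:
# 2 / ((delta + 1) * (b - a)^2 * (1/k + 1/m)).
coefficient = 2.0 / ((delta + 1) * (b - a) ** 2 * (2.0 / k))


def bound(t):
    """Upper bound on P(Ybar - Zbar >= t) under the null hypothesis."""
    return math.exp(-coefficient * t ** 2)


for t in (0.05, 0.10, 0.15, 0.20):   # hypothetical MPS differences
    print(f"t = {t:.2f}: bound = {bound(t):.4f}")
```

Because each quartet shares a species with only a small fraction of all quartets when n is large, the factor (delta + 1) / k is small, which is what allows moderate MPS differences to become significant in this setting.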
3.2.2. Real data analysis
Here, we demonstrate how our theoretical result can be applied to real data analysis. A key difference between studies of real data gene trees and simulated gene trees is the pattern of HGT events. While we were able to impose a uniform HGT model on the construction of simulated gene trees, we clearly have no control over the HGT events that affect the evolution of genes in nature. Indeed, several articles specifically mention a variety of causes that yield biased HGTs, such as toxicity (Sorek et al., 2007), phylogenetic proximity (Popa et al., 2011; Skippington and Ragan, 2012), gene function (Beiko et al., 2005; Nakamura et al., 2004; Wellner et al., 2007), or restricted recombination (Thomas and Nielsen, 2005). Therefore, real HGT events are expected to be nonuniform. In this section, we test whether a pattern of domain-dependent biased HGTs can be detected.
We study a collection of gene trees that was first constructed and studied in Puigbò et al. (2009), based on a set of 100 species (41 archaea and 59 bacteria). In this article, we focus on a collection of 123 NUTs, that is, gene trees with at least 90 taxa that are of particular interest because of their relatively high stability (Puigbò et al., 2010). We divided the plurality quartets that the NUTs induce into three disjoint sets: quartets with four archaea, quartets with four bacteria, and quartets with two archaea and two bacteria. For brevity, we refer to these groups of quartets as a4b0, a0b4, and a2b2, respectively. For simplicity, quartets with three archaea and one bacterium or with one archaeon and three bacteria are ignored. It is noteworthy that genes that evolve with no HGT events also induce a perfect separation between the archaea and the bacteria. Moreover, intra-domain HGTs (involving two archaea or two bacteria) cannot violate this archaea–bacteria separation since they do not involve the transfer of genetic information between the two domains. Hence, we expect the plurality scores of a2b2 quartets that are less than 100% to reflect inter-domain HGTs, involving one archaeon and one bacterium. Similarly, plurality scores of a4b0 quartets and a0b4 quartets are likely to reflect intra-archaea and intra-bacteria HGTs, respectively. Therefore, our goal was to compute the MPSs of the three quartet sets separately and to test the significance of the differences between them.
In Table 2, we present the number of quartets and the MPSs of the three quartet sets above. The results show that a0b4 quartets have the lowest MPS, a4b0 quartets have a higher MPS, and a2b2 quartets have the highest MPS. As the MPS is inversely correlated with the rate of HGT (see previous section), this is indicative of a relatively high rate of intra-bacteria HGTs compared with intra-archaea HGTs, and of an inter-domain HGT rate that is lower than both intra-domain HGT rates.
The table presents the information relevant to the three sets of real data quartets tested. It includes the type of quartet set, the number of quartets in each set, and the computed MPSs.
Our null hypothesis is that the differences between the computed MPSs are not significant. In accordance with Corollary 1, we test this hypothesis based on inequality (10). The results are presented in Table 3. Three pairs of quartet sets were evaluated (a4b0-a2b2, a4b0-a0b4, and a2b2-a0b4), with each row in Table 3 corresponding to one pair. As in Corollary 1, when comparing two sets of quartets, we define
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$A = \left\{ {{Y_1} , {Y_2} , \ldots , {Y_k}} \right\} $$
\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$B = \left\{ {{Z_1} , {Z_2} , \ldots , {Z_m}} \right\} $$
\end{document}
to be the random variables representing the plurality scores of the quartets of the first and the second set, respectively. The parameter t, representing the difference between the MPSs, was computed based on Table 2. As in the proof of Lemma 3.1, we construct a dependency graph for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$A \cup B$$
\end{document}
in which two quartets are connected by an edge if and only if they have at least one species in common. The corresponding values of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \rm{ \Delta }} \left( {{ \rm{ \Gamma }} \left( {A , B} \right) } \right)$$
\end{document}
for each row (i.e., the maximum degree of the dependency graph thus constructed) were obtained by direct computation using a script that we wrote.
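The script used for this computation is not reproduced here. As a minimal sketch of how such a maximum degree could be obtained, assuming each quartet is given as a collection of four species labels, the following Python code connects two quartets whenever they share at least one species and returns the largest number of neighbors. The function name `max_dependency_degree` and the toy species labels are hypothetical, and the brute-force approach is only practical for small inputs.

```python
from itertools import combinations


def max_dependency_degree(quartets):
    """Maximum degree of the graph in which two quartets are adjacent
    if and only if they have at least one species in common."""
    quartets = [frozenset(q) for q in quartets]

    # Map every species to the indices of the quartets that contain it.
    by_species = {}
    for idx, quartet in enumerate(quartets):
        for species in quartet:
            by_species.setdefault(species, set()).add(idx)

    best = 0
    for idx, quartet in enumerate(quartets):
        # Neighbors are all quartets sharing a species, excluding the quartet itself.
        neighbours = set().union(*(by_species[s] for s in quartet))
        neighbours.discard(idx)
        best = max(best, len(neighbours))
    return best


# Hypothetical toy example: 6 archaea and 6 bacteria.
archaea = [f"a{i}" for i in range(6)]
bacteria = [f"b{i}" for i in range(6)]
a4b0 = list(combinations(archaea, 4))
a2b2 = [x + y for x in combinations(archaea, 2) for y in combinations(bacteria, 2)]
print(max_dependency_degree(a4b0 + a2b2))
```

When the input consists of all quartets over n species, the result agrees with the closed form used in the simulation analysis, namely the total number of quartets, minus the quartets disjoint from a given quartet, minus one.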
The table presents the information needed to estimate the significance of the differences between the computed real data MPSs. A and B represent the compared quartet sets; k and m represent the number of elements in A and B, respectively (corresponding to Table 2); t represents the difference between the relevant MPSs;
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \rm{ \Delta }} \left( {{ \rm{ \Gamma }} \left( {A , B} \right) } \right)$$
\end{document}
is the maximum degree of the constructed dependency graph of
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$A \cup B$$
\end{document}
, and the resulting probability estimation is based on the right-hand side of inequality (10).
Finally, we assume
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mathbb{E} \bar Y = \mathbb{E} \bar Z$$
\end{document}
(as implied by the null hypothesis) and set
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$a = 1 / 3 , b = 1$$
\end{document}
(since every plurality score is bounded from below by 1/3 and from above by 1). Plugging all of these into inequality (10), we get the probability estimates of the differences in the MPSs shown in Table 3. As Table 3 shows, the probabilities of the differences between the computed MPSs are all 63% or greater. Thus, the null hypothesis is not rejected in any of the real data cases that we study. This is partly related to the size of the species set, as we explain in the following section.
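To give a feel for how the size of the species set enters, the following Python sketch inverts a bound of the same shape as inequality (11) to find the smallest MPS difference t that would reach a given significance level. It rests on two assumptions that should be stated plainly: that inequality (10) has the same form with a single maximum degree in place of the maximum over the two graphs, and that all quartets over n species are used in both sets (as in the simulation setup, which is a simplification and not the real data configuration). The function names are hypothetical.

```python
import math


def min_significant_difference(alpha, max_degree, k, m, a=1 / 3, b=1.0):
    """Smallest t for which exp(-2 t^2 / scale) <= alpha, where
    scale = (max_degree + 1) * (b - a)^2 * (1/k + 1/m)."""
    scale = (max_degree + 1) * (b - a) ** 2 * (1.0 / k + 1.0 / m)
    return math.sqrt(math.log(1.0 / alpha) * scale / 2.0)


def all_quartets_threshold(n, alpha=0.05):
    """Threshold when both sets contain all quartets over n species
    (an assumed, simplified setup mirroring the simulation analysis)."""
    k = math.comb(n, 4)
    max_degree = k - math.comb(n - 4, 4) - 1
    return min_significant_difference(alpha, max_degree, k, k)


for n in (100, 1000):
    print(n, round(all_quartets_threshold(n), 3))
```

Under these assumptions, the required difference is larger for 100 species than for 1000, because with fewer species each quartet overlaps a larger fraction of the other quartets.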