Abstract
Abstract
In this article we study canonical γ-structures, a class of RNA pseudoknot structures that plays a key role in the context of polynomial time folding of RNA pseudoknot structures. A γ-structure is composed of specific building blocks that have topological genus less than or equal to γ, where composition means concatenation and nesting of such blocks. Our main result is the derivation of the generating function of γ-structures via symbolic enumeration using so called irreducible shadows. We furthermore recursively compute the generating polynomials of irreducible shadows of genus ≤ γ. The γ-structures are constructed via γ-matchings. For 1 ≤ γ ≤ 10, we compute Puiseux expansions at the unique, dominant singularities, allowing us to derive simple asymptotic formulas for the number of γ-structures.
1. Introduction and Background
A
Secondary structures can be interpreted as (partial) matchings in a graph of permissible base pairs (Tabaska et al., 1998). When represented as a diagram, that is, as a graph whose vertices are drawn on a horizontal line with arcs in the upper halfplane, one refers to a secondary structure with crossing arcs as a pseudoknot structure.
Folded configurations exhibit the stacking of adjacent base pairs and specific minimum arc-length conditions (Smith and Waterman, 1978), where a stack is a sequence of parallel arcs
The topological classification of RNA structures (Bon et al., 2008); Andersen et al., 2013) has recently been translated into an efficient folding algorithm (Reidys et al., 2011). This algorithm a priori folds into a novel class of pseudoknot structures, the γ-structures. These γ-structures differ from pseudoknotted RNA structures of fixed topological genus of an associated fatgraph or double-line graph (Orland and Zee, 2002; Bon et al., 2008), because they have an arbitrarily high genus. They are composed of irreducible subdiagrams whose individual genus is bounded by γ and contain no bonds of length one (1-arcs; see Section 2 for details).
Nebel and Weinberg (2011) study a plethora of RNA structures. The authors study asymptotic expansions for γ = 1 and find that in the limit of large n there are
In this article we study canonical γ-structures, that is, partial matchings composed by irreducible motifs of genus ≤ γ without isolated arcs and 1-arcs. These motifs are called irreducible shadows. We first establish a functional relationship between the generating function of γ-matchings and that of irreducible shadows. Via this relation, we identify a polynomial Pγ(u, X), whose unique solution equals the generating function of γ-matchings. We then derive a recurrence of the generating function of irreducible shadows using the Harer-Zagier recurrence (Harer and Zagier, 1986). The generating function of γ-matchings is then expanded at its unique dominant singularity as a Puiseux-series. This implies, by means of transfer theorems (Flajolet and Sedgewick, 2009), simple asymptotic formulas for the numbers of γ-matchings.
γ-Matchings are the stepping stone to derive via Lemma 3 the further refined, bivariate generating function of γ-shapes, that is, γ-matchings containing only stacks composed by a single arc. This generating function keeps additional track of the 1-arcs, which are vital for the later inflation into γ-structures. We then compute the generating function of τ-canonical γ-structures, inflating γ-shapes by means of symbolic enumeration.
2. Some Basic Facts
2.1. γ-Diagrams
A diagram is a labeled graph over the vertex set
A stack of length τ is a maximal sequence of “parallel” arcs,
A stack of length ≥ τ is called a τ-canonical stack; that is, a stack of length zero is an isolated arc. The particular arc (1, n) is called a rainbow, and an arc is called maximal if it is maximal with respect to the partial order (i, j) ≤ (i′, j′) iff i′ ≤ i ∧ j ≤ j′ (Fig. 1).

A stack of length

A schematic showing σ- and P-intervals.
We shall consider diagrams as fatgraphs,
A diagram
The shadow of a diagram of genus g is obtained by removing all noncrossing arcs, deleting all isolated vertices, and collapsing all induced stacks (i.e., maximal subsets of subsequent, parallel arcs) to single arcs (Fig. 3). We denote shadows by σ.

The shadow is obtained by removing all noncrossing arcs and isolated points and collapsing all stacks and resulting stacks into single arcs.
The shadow of a diagram

The four shadows of genus one.
A diagram is called irreducible if and only if for any two arcs, α1, αk contained in E, there exists a sequence of arcs
Let
For instance, for genus 1 and 2 we have
The shadow
Any diagram

A diagram
• one removes (i.e., cuts the backbone at two points and after removal merges the cut-points) irreducible
• if the removal of an irreducible
A diagram,
We denote the set of τ-canonical γ-diagrams by
2.2. Some generating functions
In this article we denote the ring of polynomials over a ring R by R[X] and the ring of formal power series ∑n≥0 anXn by R[[X]]. R[[X]] is a local ring with maximal ideal (X), that is, any power series with nonzero constant term is invertible. A Puiseux-series (Wall, 2004) is a power series in fractional powers of X, that is, ∑n≥0 an Xn/k for some fixed
We denote the generating functions of a set of diagrams
Let
where
Furthermore, there is a natural projection ϑ from γ-matchings to γ-shapes defined by collapsing each non-empty stack onto a single arc
which is surjective and preserves irreducible shadows as well as the number of 1-arcs. ϑ restricts to a surjection
which collapses each stack to an arc and preserves any irreducible shadow and also the number m of 1-arcs.
3. Combinatorics of γ-Matchings
In this section, we study γ-matchings.
Theorem 1
Let
(a) The generating function of γ-matchings,
equivalently,
In particular, there exists a polynomial
(b) Equation (3.1) determines
Claim 1
To prove Claim 1 we consider a
and Claim 1 follows.
Let
Note that each
Claim 2
We shall construct

First, a fixed irreducible shadow σ is inflated into a
An induced arc, that is, an arc together with at least one nontrivial
Clearly, we have for a single induced arc
By construction, the maximal arcs of an
with generating function
Next we inflate the arcs of the
This inflation process generates
whence Claim 2.
Claim 3
Let M be the set of irreducible shadows of genus g ≤ γ. Then
The maximal arcs of a
These maximal arcs induce exactly (2m − 1)t σ-intervals. In each σ-interval, we find again an element of
The passage from
Here
We next derive the functional equation for
where
Setting wu(X) = 1 − uX2, Equation 1 gives rise to the polynomial
where κγ = 6γ − 2,
It remains to prove (b). Since M is the finite set of irreducible shadows of genus g ≤ γ, and any such shadow has 2g ≤ m ≤ κγ arcs (Andersen et al., 2012), any M-shadow has ≤ κγ arcs. Setting
and consequently
All coefficients of
4. Irreducible Shadows
The bivariate generating function of irreducible shadows of genus g with m arcs is denoted by
Let
The bivariate generating function of matchings of genus g with m arcs is denoted by
Theorem 2
The generating functions
equivalently,
• blocks whose maximal component contains only one arc,
• blocks whose maximal component is an irreducible (nonempty) matching.
In the first case, the removal of the maximal component (one arc) generates again an arbitrary matching, which translates into the term
Let
Let σ be a fixed irreducible shadow of genus g having n arcs. Let
where
We shall construct

Step I: Inflation of each arc in σ into a sequence of induced arcs.
Clearly, we have for a single induced arc
Inflating each arc into a sequence of induced arcs,
since the genus is additive.

Step II: Inflation of each arc in the component with shadow σ into stacks.

Step III: Insertion of additional matchings at exactly (2n − 1) σ-intervals.
Combining these three steps and utilizing additivity of the genus, we arrive at
Therefore
We derive
completing the proof of Equation 12. ■
Now we can derive a recursion for
Corollary 1
For g ≥ 1,
where
Note that
Hence,
Setting
completing the proof. ■
A seminal result (Harer and Zagier 1986), computes a recursion and generating function for the number c g (m) as follows:
Lemma 1
(Harer and Zagier 1986) The
where
The recursion Equation 13 is equivalent to the ODE
where
with initial condition
Theorem 3
(Andersen et al., 2013) For any g ≥ 1 the generating function
where Qg(z) is a polynomial with integral coefficients of degree at most (3g − 1), Qg(1/4) ≠ 0, [z2g]Q g (z)≠ 0 and [zh]Qg(z) = 0 for 0 ≤ h ≤ 2g − 1.
The recursion Equation (14) permits the calculation of the polynomials Qg(z), the first five of which are given as follows (Andersen et al., 2013):
Applying Corollary 1 together with the generating function
For example, for g = 1,
For 1 ≤ g ≤ 8, we list
We conjecture that the polynomial
5. Asymptotics of γ-Matchings
Let us begin recalling the following result of Flajolet and Sedgewick (2009):
Theorem 4
Let y(u) = ∑n≥0ynun be a generating function, analytic at 0, satisfying a polynomial equation Φ(u, y) = 0. Let ρ be the real dominant singularity of y(u). Define the resultant of Φ(u, y) and
then y(u) has the following expansion at ρ
for some nonzero constant λ
Further the coefficients of y(u) satisfy
for some constant c > 0.
Then we apply Newton's polygon method to determine the type of expansion and find the first exponent of u to be
Combining Theorem 1 and Theorem 4, the asymptotic analysis of
Theorem 5
For 1 ≤ γ ≤ 10, let
the resultant of Pγ(u, X) and
for some cγ > 0.
6. Combinatorics of γ-Diagrams
Lemma 2
For any γ ≥ 1, we have
The proof of Lemma 2 can be obtained by standard symbolic method.
Lemma 3
Let λ be a fixed γ-shape with s ≥ 1-arcs and m ≥ 0 1-arcs. Then the generating function of τ-canonical γ-diagrams containing no 1-arcs that have shape λ is given by
In particular,
Our main result about enumerating τ-canonical γ-structures follows.
Theorem 6
Suppose γ, τ ≥ 1 and let
In particular for 1 ≤ s, i ≤ 2 we have
for some constants ks,i > 0, for
According to Lemma 3,
using Lemma 2 in order to confirm Equation 23, where the second equality follows from direct computation. Let
denote the argument of
from which we immediately conclude that
According to Theorem 5 we have
For τ = 1, 2, we verify directly that ρ1,i and ρ2,i are the unique solutions of minimum modulus of θ1(z) = μi and θ2(z) = μi. These solutions are strictly smaller than any other singularities of θ1(z) and θ2(z) and furthermore satisfy
where s = 1, 2 and ks,i is some positive constant. ■
Theorem 6 has its analogue for τ-canonical, γ-diagrams containing 1-arcs. The asymptotic formula in case of τ = 1, γ = 1,
is due to Nebel and Weinberg (2011), who used the explicit grammar developed in Reidys et al. (2011) to obtain an algebraic equation for
Corollary 2
Suppose γ, τ ≥ 1 and let
In particular for γ = 1, we have
for some constants j1, j2, where
7. Discussion
The symbolic approach based on γ-matchings allows not only for computing the generating function of canonical γ-structures. On the basis of Theorem 6, it is possible to obtain a plethora of statistics of γ-structures by means of combinatorial markers.
For instance, we can analogously compute the bivariate generating function of τ canonical γ-structures over n vertices, containing exactly m arcs,
where uτ(z, t) is given by
This bivariate generating function is the key to obtain a central limit theorem for the distribution of arc-numbers in γ-structures (Bender, 1973) on the basis of the Lévy-Cramér theorem on limit distributions (Feller, 1991).
Statistical properties of γ-structures play a key role for quantifying algorithmic improvements via sparsifications (Busch et al., 2008; Möhl et al., 2010; Wexler, 2007). The key property here is the polymer-zeta property (Kabakcioglu and Stella, 2008; Kafri et al., 2000), which states that the probability of an arc of length ℓ is bounded by k ℓc, where k is some positive constant and c > 1. Polymer-zeta stems from the theory of self-avoiding walks (Vanderzande, 1998) and has only been empirically established for the simplest class of RNA structures, namely those of genus zero. It turns out, however, that the polymer-zeta property is genuinely a combinatorial property of a structure class. Moreover, our results allow for quantifying the effect of sparsifications of folding algorithms into γ-structures (Andersen et al., 2012; Huang and Reidys, 2012).
We finally remark that around 98% of RNA pseudoknot structures catalogued in databases are, in fact, canonical 1-structures. RNA pseudoknot structures like the HDV virus exhibiting irreducible shadows of genus two are relatively rare.
Footnotes
Acknowledgments
We thank Fenix W.D. Huang for discussions and comments. We furthermore acknowledge the financial support of the Future and Emerging Technologies (FET) program within the Seventh Framework Programme (FP7) for Research of the European Commission, under the FET-Proactive grant agreement TOPDRIM, number FP7-ICT-318121.
Author Disclosure Statement
The authors declare that no competing financial interests exist.
