Abstract
Abstract
The topological filtration of interacting RNA complexes is studied, and the role is analyzed of certain diagrams called irreducible shadows, which form suitable building blocks for more general structures. We prove that, for two interacting RNAs, called interaction structures, there exist for fixed genus only finitely many irreducible shadows. This implies that, for fixed genus, there are only finitely many classes of interaction structures. In particular, the simplest case of genus zero already provides the formalism for certain types of structures that occur in nature and are not covered by other filtrations. This case of genus zero interaction structures is already of practical interest, is studied here in detail, and is found to be expressed by a multiple context-free grammar that extends the usual one for RNA secondary structures. We show that, in O(n6) time and O(n4) space complexity, this grammar for genus zero interaction structures provides not only minimum free energy solutions but also the complete partition function and base pairing probabilities.
1. Introduction
An RNA molecule is a linearly oriented sequence of four types of nucleotides, namely,

RNA interaction structures are structures over two backbones. We distinguish internal arcs and external arcs as having their endpoints on the same and different backbones, respectively. Interaction structures are represented as two backbones with internal and external arcs drawn in the upper halfplane. Alternatively, they can be represented by drawing the two backbones on top of each other (Fig. 2).

The simplest approach for folding RNA-RNA interaction structures concatenates two (or more) interacting sequences one after another remembering the specific merge point (cut-point) and then employs the standard secondary structure folding algorithm on a single strand with a slightly modified energy model that treats loops containing cut-points as external elements. The software tools RNAcofold (Hofacker et al., 1994; Bernhart et al., 2006), pairfold (Andronescu et al., 2005), and NUPACK (Dirks et al., 2007) subscribe to this strategy. This approach falls short in predicting many important motifs such as kissing-hairpin loops. The paradigm of concatenation has also been generalized to include cross-serial interactions (Rivas and Eddy, 1999). The resulting model, however, still does not generate all relevant interaction structures (Chitsaz et al., 2009b; Qin and Reidys, 2007). An alternative line of thought, implemented in RNAduplex and RNAhybrid (Rehmsmeier et al., 2004), is to neglect all internal base pairings in either strand—i.e., to compute the minimum free energy (MFE) secondary structure of hybridization of otherwise unstructured RNAs. RNAup (Mückstein et al., 2006, 2008) and intaRNA (Busch et al., 2008) restrict interactions to a single interval that remains unpaired in the secondary structure for each partner. As a special case, snoRNA/target complexes are treated more efficiently using a specialized tool (Tafer et al., 2009) due to the highly conserved interaction motif. Algorithmically, the approaches mentioned so far are close relatives of the “classical” RNA folding recursions given by Zuker and Sankoff (1984) and Waterman and Smith (1978). A different approach was taken independently by Pervouchine (2004) and Alkan et al. (2006), who proposed MFE folding algorithms for predicting the AP-structure of two interacting RNA molecules. In this model, the intramolecular structures of each partner are pseudoknot-free, the intermolecular binding pairs are non-crossing, and there is no so-called “zig-zag” motif (see Section 2). The optimal joint structure can be computed in O(N6) time and O(N4) space by means of dynamic programming.
In contrast to the RNA secondary folding problem, where minimum energy folding and partition functions can be obtained by similar algorithms, the case of interaction structures is more involved. The reason is that simple unambiguous grammars are known for RNA secondary structures (Dowell and Eddy, 2004), while the disambiguation of grammar underlying the Alkan-Pervouchine algorithm requires the introduction of a large number of additional non-terminals (which algorithmically translate into additional dynamic programming tables). The partition function was derived independently by Chitsaz et al. (2009b) (piRNA) and Huang et al. (2009) (rip1). In Huang et al. (2010), probabilities of interaction regions as well as entire hybrid blocks were derived. Although the partition function of joint structures can also be computed in O(N6) time and O(N4) space, the current implementations require large computational resources. Salari et al. (2009) recently achieved a substantial speed-up making use of the observation that the external interactions mostly occur between pairs of unpaired regions of single structures. Chitsaz et al. (2009a), on the other hand, use tree-structured Markov Random Fields to approximate the joint probability distribution of multiple (≥3) contact regions. The RNA-RNA interaction structures of Huang et al. (2010), Alkan et al. (2006), Hofacker et al. (1994), and Bernhart et al. (2006) have the following features:
• when drawing the two backbones on top of each other, all base pairs are non-crossing, i.e., no pseudoknots formed by internal or external arcs are allowed, • zig-zag motifs are disallowed.
This article will relax the above constraints and propose a novel filtration of RNA-RNA interaction structures based on the topological fitration of RNA interaction structures. Interaction structures that do not belong to the Alkan-Pervouchine class exist: for instance the integral RNA (hTER) of the human telomerase ribonucleoprotein has a conserved secondary structure that contains a potential pseudoknot (Ly et al., 2003). There is evidence that the two conserved complementary sequences of one stem of the hTER pseudoknot domain can pair intermolecularly in vitro, and that formation of this stem as part of a novel “transpseudoknot” is required for the telomerase to be active in its dimeric form. The classification and expansion of pseudoknotted RNA structures over one backbone via topological genus of the associated fatgraph were first proposed by Orland and Zee (2002), Penner (2004), and Bon et al. (2008)
In Reidys et al. (2011) and Zagier (1995), it was proved that, for any genus, there are only finitely many shadows, i.e., particular, simple atomic motifs. In case of genus one, these shadows were first presented in Bon et al. (2008). Shadows give rise to a novel structure class, naturally generalizing RNA secondary structures. These γ-structures (Reidys et al., 2011) are generated by concatenation and nesting of irreducible building blocks of genus ≤γ. We shall present the topological classification of RNA-RNA interaction structures. This filtration gives rise to the notion of γ-structures over two backbones. In analogy to their one-backbone counterparts, γ-structures over two backbones are composed of irreducible building blocks of genus ≤γ and have accordingly arbitrarily high genus. We shall see that, for any fixed genus, there are only finitely many irreducible shadows over two backbones. In particular, we study genus zero structures over two backbones. The latter are the two backbone analogue of RNA secondary structures.2 0-structures over two backbones already exhibit interesting features not shared with AP-structures (Fig. 3). We furthermore derive an unambiguous grammar for 0-structures over two backbones, which translates into an efficient dynamic programming algorithm. This grammar, illustrated in Figure 4, allows the calculation of the minimum free energy, partition function and Boltzmann-sampling. It explicitly treats hybrids and gap structures, i.e., maximal regions with exclusively intermolecular interactions and maximal regions with base pairs over one backbone. The grammar thus facilitates the computation of the probability of hybrids, the target interaction probability between two RNA strands, and the probability of gap structures.


An unambiguous grammar of RNA-RNA interaction structures of genus zero over two backbones. Basic building blocks are: tight structures (gray), secondary structures, and hybrid structures
2. Basic Facts
2.1. Diagrams
A diagram is a labeled graph over the vertex set
The vertices and arcs of a diagram correspond to nucleotides and base pairs, respectively. For a diagram over b backbones, the leftmost vertex of each backbone denotes the 5′ end of the RNA sequence, while the rightmost vertex denotes the 3′ end. In case of b > 1, we shall distinguish two types of arcs: an arc is called exterior if it connects different backbones and interior otherwise. Diagrams over b backbones without exterior arcs are disjoint unions of diagrams over one backbone.
The particular case b = 2 is referred to as RNA interaction structures (Huang et al., 2009, 2010) (Fig. 2A). As mentioned before, interaction structures are oftentimes represented alternatively by drawing the two backbones R and S on top of each other, indexing the vertices R1 to be the 5′ end of R and S1 to be the 3′ of S.
A zig-zag is defined as follows: given two sequences R and S, suppose that RaSb (i.e., Ra is base paired with Sb), RiRj, and

A zig-zag structure. R1R4 and S2S5 are dependent interior arcs owing to the base pair R3S3, but in view of R2S1 and R6S4, neither subsumes the other.
2.2. From diagrams to topological surfaces
One approach for deriving meaningful filtrations of RNA structure is to pass from diagrams to topological surfaces (Massey, 1967). It is natural to make this transition from combinatorics to topology via fatgraphs (Penner et al., 2010, 2011). A fatgraph
For an RNA diagram, we may draw a representation as usual so that the backbone is a horizontal line oriented from left to right, and the arcs lie in the upper half-plane. This determines a unique fattening on any diagram; compare the leftmost two panels in Figure 6 for the fatgraph and its corresponding surface. Each boundary component of

This backbone-collapse preserves orientation, Euler characteristic and genus by construction. It is reversible by inflating each vertex to form a backbone. Using the collapsed fatgraph representation, we see that, for a connected diagram over b backbones, the genus g of the surface (with boundary) is determined by the number n of arcs as well as the number r of boundary components, namely, 2 − 2g − r = v −e = b − n (Fig. 6).
Diagrams over one and two backbones are related by gluing, i.e., we have the mapping
where α(E) is obtained by keeping all arcs in E and connecting the 3′ end of R and the 5′ end of S (Fig. 7A).

In addition to gluing, there is another operation mapping a pairs of diagram over two backbones into a diagram over two backbones: given two diagrams over two backbones,
It is straightforward to see that • is an associative product with unit given by the diagram over two empty backbones. The product • is not commutative.
3. Shadows
Definition 1
A stack in a diagram is a maximal collection of parallel arcs of the form
A diagram determines a shadow by removing all non-crossing arcs, deleting all isolated vertices and collapsing each induced stack to a single arc as in Figure 8. We shall denote the shadow of a diagram X by σ(X), so σ2(X) = σ(X). Projecting into the shadow does not affect genus, i.e., g(X) = g(σ(X)). In case there are no crossing arcs, σ(X) becomes an empty diagram on the same number of backbones as X as in Figure 8C). By definition, any empty backbone contributes one boundary component. For example, for a diagram X over b backbones that contains no crossing arcs, σ(X) is a sequence of b empty backbones with b boundary components.

Shadows:
Let us begin by refining an observation about shadows over one backbone from Reidys et al. (2011):
Theorem 1
Shadows of genus g ≥ 1 over one backbone have the following properties:
Proof
First note that if there is more than one boundary component, then there must be an arc with different boundary components on its two sides and removing this arc decreases r by exactly one while preserving g since the number of arcs is given by n = 2g + r − 1. Furthermore, if there are νℓ boundary components of length ℓ in the polygonal model, then 2n = ∑ℓℓνℓ since each side of each arc is traversed once by the boundary. For a shadow, ν1 = 0 by definition, and ν2 ≤ 1 as one sees directly. It therefore follows that 2n = ∑ℓℓνℓ ≥ 3(r − 1) + 2, so 2n = 4g + 2r − 2 ≥ 3r − 1, i.e., 4g − 1 ≥ r. Thus, we have n = 2g + (4g − 1) − 1 = 6g − 2, i.e., any shadow can contain at most 6g − 2 arcs. The lower bound 2g follows directly from n = 2g + r − 1 since r ≥ 1.
Let S2g be a shadow containing 2g mutually crossing arcs, i.e., each arc crosses any of the remaining (2g − 1) arcs. S2g has genus g and contains a unique boundary component of length 4g, i.e., traversing 4g non-backbone arcs counted with multiplicity. We construct a new shadow S2g+1 of genus g containing 2g + 1 arcs, by inserting an arc crossing into S2g from the 5′ end of S2g such that the boundary component in S2g splits into one boundary component of length 3 and another of length 4g + 2 − 3 = 4g − 1. The latter becomes the first boundary component of S2g+1. The newly inserted arc is by construction crossing, splits a boundary component and preserves genus. We now prove the assertion by induction of the number of inserted arcs. By the induction hypothesis, there exists a shadow S2g+i of genus g having 2g + i arcs, whose first boundary component has length 4g − i. Again, we insert a crossing arc as described above thereby splitting the first boundary component into one of length 3 and the other of length (4g − (i + 1)). After i = 4g − 2 such insertions, we arrive at a shadow whose first boundary component has length 2 while all other boundary components have length 3. Accordingly, there exists a set

Constructing the sequence of shadows Sℓ for genus g = 2, see Theorem 1, for 2g = 4 ≤ ℓ ≤ 6g − 2 = 10. Newly inserted arcs are drawn bold.
Corollary 1
A shadow over two backbones has the following properties:
Proof
We first claim that any shadow of genus g over two backbones can be obtained by cutting the backbone of a shadow over one backbone having either genus g or g + 1. To see this, suppose we are given a shadow of genus g, having r boundary components and n arcs so that 2 − 2g − r = b − n, i.e., g = (2 + n − r − b) / 2, where b = 1. Cutting the backbone then either splits a boundary component or merges two distinct boundary components. Since cutting does not affect arcs and increases the number of backbones by one, we have the resulting genus
as was claimed. We next observe that a shadow of genus g = 0 over two backbones has at least 2 arcs, while the maximum number of arcs contained in such a shadow is given by 6(0 + 1) − 2 = 4. For g ≥ 1, it is impossible to cut a shadow of genus g having 2g arcs and keep the genus. Thus the shadow of genus g over two backbones has at least 2g + 1 arcs. We can always map an arbitrary shadow over two backbones of genus g via α into a shadow over one backbone, whence the assertion. Theorem 1 guarantees that there are only finitely many such shadows, and the corollary follows. ■
Corollary 2
There exist exactly seven non-trivial shadows over two backbones having genus 0.
Proof
There exists no non-trivial shadow over one backbone of genus 0 since 0-structures over one backbone are secondary structures containing exclusively non-crossing arcs. In view of Corollary 1, all non-trivial shadows over two backbones having genus 0 are therefore obtained by cutting the backbone of shadows of genus 1 over one backbone. By inspection, there are seven possible such cuts as in Figure 10. ■

The shadows over two backbones having genus 0 obtained by cutting the four shadows of genus 1 over one backbone.
4. Irreducibility
Definition 2
A diagram E over b backbones is called irreducible if and only if it is connected and for any two arcs, α1, αk contained in E, there exists a sequence of arcs
We proceed by refining Theorem 1:
Corollary 3
An irreducible shadow having genus g = 0 over two backbones contains at least 2 and at most 4 arcs, and for and 2 ≤ ℓ ≤ 4, there exists an irreducible shadow of genus g = 0 over two backbones having exactly ℓ arcs. An irreducible shadow having genus g ≥ 1 has the following properties:
Proof
Part (a) follows directly from Theorem 1, and for (b), the shadows
Definition 3
Let X be a diagram. We call S′ an irreducible shadow of X (irreducible X-shadow) if and only if S′ is an irreducible shadow and any arc in S′ is contained in X. Let
Clearly, our notion of irreducibility recovers for diagrams over one backbone that of Reidys et al. (2011) and Bon et al. (2008). A diagram D over one backbone can iteratively be decomposed by first removing all non-crossing arcs as well as isolated vertices and second by removing irreducible D-shadows iteratively as follows:
• one removes (i.e., cuts the backbone at two points and after removal merges the cut-points) irreducible D-shadows from bottom to top, i.e., such that there exists no irreducible S-shadow that is nested within the one previously removed. • if the removal of an irreducible D-shadow induces the formation of a non-trivial stack as in Figure 11, then it is collapsed into a single arc.

Removing irreducible shadows from “bottom to top.” Any stacks, that are induced by these removals are collapsed into single arcs.
We next extend the decomposition of diagrams over one backbone (Reidys et al., 2011) to diagrams over two backbones. Let E be a diagram over two backbones. By definition, irreducible E-shadows over two backbones are either connected or a disjoint union of two irreducible shadows over one backbone. Thus, E can be decomposed by removing first all non-crossing arcs as well as any isolated vertices and second all irreducible E-shadows in two rounds as follows:

Decomposition of a shadow over two backbones. First, from bottom to top, the only irreducible shadow over one backbone is removed. During its removal, a stack of length two is induced (bold arcs), which is projected into a single arc. Second, the two irreducible shadows over two backbones are iteratively removed.
5. γ-Structures Over Two Backbones
Definition 4
A diagram X over b backbones is a γ-structure over b backbones if and only if we have g(S′) ≤ γ for any irreducible X-shadow S′.
With foresight, we refine the notion of irreducible X-shadow as follows:
Lemma 1
Suppose E is a γ-structure over two backbones. Then
Proof
By construction, α(E) is a shadow over one backbone consisting of irreducible components of genus at most γ + 1. Thus, α(E) is a (γ + 1)-structure and
Let
Claim 1. Suppose
To prove this, we use the operation
and in view of eq. (5.2), Claim 1 follows.
Claim 2. If
We claim that
6. A Grammar For 0-Structures Over Two Backbones
In this section, we develop an unambiguous decomposition grammar
The key building blocks of • gap-structures: a gap structure • hybrid-structures: a hybrid structure • tight structures: a tight structure (TS) • pre-tight structures: a pre-tight structure (PTS) is a structure
Now we are in position to formulate the production rules of

The grammar
Lemma 2
Any 0-structure over two backbones can uniquely be decomposed via
Proof
First, we show that a 02-structure can uniquely be decomposed into blocks containing exclusively non-crossing arcs. We shall establish this by induction on the number of its irreducible shadows.
Induction basis: any 02-structure over two backbones that contains no shadow of genus zero over two backbones exhibits no crossing arcs. Namely, it contains only blocks that are either secondary structures or hybrids. Accordingly, such a structure can be decomposed uniquely via the context-free grammar of secondary structures or the unique decomposition of hybrid-structures.
Induction step: Suppose Em is a 02-structure containing m ≥ 1 irreducible shadows over two backbones of genus 0. We decompose from “inside to outside,” i.e., from the 3′-end of R and the 5′-end of S. Suppose we encounter a substructure S which collapses into an irreducible shadow over two backbones of genus 0. S itself determines a unique maximal tight structure, TS, such that σ(TS) = S. Removing TS from Em yields a diagram Em−1 over two backbones containing m − 1 irreducible shadows over two backbones of genus 0. The induction hypothesis guarantees the unique decomposition of Em−1 via
It remains to show how to decompose tight structures: the shadow of a tight structure is by construction irreducible and is given by one of the seven irreducible shadows over two backbones described in Corollary 2. In order to decompose a tight structure, we dissect it into maximal gap structures and hybrid-structures (which in turn collapse into interior and exterior arcs of the irreducible shadow, respectively), as well as secondary structures. All of these elements are
Finally, we show that
Theorem 2
The grammar
Proof
Assertion
of 02-structures, where R is the universal gas constant, T is the temperature, G(s) is energy of structure s over sequence x, and
As for assertion
The probabilities
where the sum is over all 02-interaction structures containing Ni,j;h,ℓ.
Suppose Ni,j;h,ℓ is obtained by decomposing θs. The conditional probabilities
From this backward recursion, one immediately derives a stochastic backtracing recursion from the probabilities of partial structures that generates a Boltzmann sample of 0-interaction structures; (Tacker et al., 1996; Ding and Lawrence, 2003; Huang et al., 2010). The basic data structure for this sampling is a stack A which stores blocks of the form (i, j; r, s, N), presenting interaction substructures of nonterminal symbols N. L is a set of base pairs storing those removed by the decomposition step in the grammar. We initialize with the block (1, n, I) in A, and L = Ø. In each step, we pick up one element in A and decompose it via the grammar with probability QM/QN, where QN is the partition function of the block which is picked up from A, and QM is the partition function of the target block which is decomposed by the rewriting rule. The base pairs which are removed in the decomposition step are moved to L. For instance for the decomposition rule of
The sampling step is iterated until A is empty. The resulting 02-interaction structure is given by the list L of base pairs. The probability of hybrid-structures can be calculated since a hybrid structure is by construction a block in the grammar, (Huang et al., 2010). The probability of interactions involving a fixed interval [i, j] is given by
A gap structure, representing a maximal non-crossing stem on either backbone is also a
In order to prove assertion
1.
2.
3.
4.
5.
By virtue of these new blocks, we may rewrite

The decomposition of
7. Discussion
In this article, we have introduced the toplogical filtration of RNA interaction structures and developed the notions of shadows, irreducibility and γ-structures for them. Shadows are of particular importance for the minimum free energy folding since they represent the basic motifs of genus g. Since we have proved that for any genus there are always finitely many such shadows, it is therefore in principle possible to assign them individual energies, which would presumably lead to high specificity.
The simplest topological class of RNA interaction structures is that of 0-structures over two backbones. This is the two-backbone analogue of the classical RNA secondary structures. Despite their simple irreducible shadows (Corollary 2), 0-structures over two backbones exhibit features not present in the AP-structures of Pervouchine (2004) and Alkan et al. (2006). Namely, they allow for pseudoknots formed by exterior arcs as reported, for instance, in Homo sapiens ACA27 snoRNA, (Figs. 3 and 15).

Let us next compare AP-structures and 0-structures over two backbones in more detail. Recall that an AP-structure, J(R, S, I), is a graph such that:
1. R, S are secondary structures, 2. I is a set of exterior arcs without external pseudoknots, 3. J(R, S, I) contains no zig-zags.
A tight AP-structure (R(TS)) is a substructure that cannot be decomposed via block decomposition (Huang et al., 2009, 2010). Accordingly, the shadow of a R(TS) is connected and hence irreducibile. R(TS) and tight structures of 0-structures over two backbones are distinct concepts. We have already observed that 0-structures over two backbones are not contained in the set of AP-structures. Likewise, AP-structures are not contained in the set of 0-structures over two backbones, for example, consider a shadow of a 0-structure over two backbones which consist of 3 < x distinct, irreducible shadows over two backbones having genus 0. According to Lemma. 1, the genus of this diagram is x − 1. Drawing an interior arc covering the R-endpoints of these x shadows tightly, the resulting diagram is by construction a R(TS) as in Figure 15. As inserting a single arc changes the genus at most by one, the diagram, R(TS), has genus ≥1, has an irreducible shadow and is consequently not a 0-structure over two backbones.
Footnotes
Disclosure Statement
No competing financial interests exist.
1
with respect to the a priori specified energy function.
2
Which are well-known to be genus zero structures over one backbone.
3
which are in fact O(n16) for
