Probing RNA Native Conformational Ensembles with Structural Constraints

Abstract

Noncoding ribonucleic acids (RNA) play a critical role in a wide variety of cellular processes, ranging from regulating gene expression to post-translational modification and protein synthesis. Their activity is modulated by highly dynamic exchanges between three-dimensional conformational substates, which are difficult to characterize experimentally and computationally. Here, we present an innovative, entirely kinematic computational procedure to efficiently explore the native ensemble of RNA molecules. Our procedure projects degrees of freedom onto a subspace of conformation space defined by distance constraints in the tertiary structure. The dimensionality reduction enables efficient exploration of conformational space. We show that the conformational distributions obtained with our method broadly sample the conformational landscape observed in NMR experiments. Compared to normal mode analysis-based exploration, our procedure diffuses faster through the experimental ensemble while also accessing conformational substates to greater precision. Our results suggest that conformational sampling with a highly reduced but fully atomistic representation of noncoding RNA expresses key features of their dynamic nature.

1. Introduction

Noncoding ribonucleic acid (RNA) molecules mediate a wide range of biological processes in the cell. Their function is often modulated by highly dynamic, conformational substates (Cruz and Westhof, 2009; Fonseca et al., 2014; Kim et al., 2014; Leulliot and Varani, 2001; van den Bedem and Fraser, 2015). Characterizing conformational substates of RNA holds the promise of uncovering functional mechanisms (Lipfert et al., 2007; Zhang et al., 2007) or predicting molecular interactions of RNA subunits (Guo, 2010; Rother et al., 2014) and protein-RNA complexes (Cléry et al., 2008), which, in turn, can lead to RNA-based therapeutics (Dorsett and Tuschl, 2004; Cooper et al., 2009) or nanomedicine (Zhou et al., 2011).

Conformational sampling procedures based on energy evaluations, such as Monte-Carlo (Frellsen et al., 2009; Landau and Binder, 2009) and molecular dynamics (MD) (Frenkel and Smit, 2001) can accurately explore the free-energy landscape of a molecule but are computationally expensive. Sampling techniques based on motion planning (Thomas et al., 2005; Al-Bluwi et al., 2012) and loop closure using inverse kinematics (Canutescu and Dunbrack, 2003; Coutsias et al., 2004; Cortés et al., 2004; Shehu et al., 2006; Yao et al., 2008) can significantly increase the efficiency of such methods. Constraint-based samplers rely exclusively on the geometry of the molecule and nonlocal constraints (Wells et al., 2005; Zavodszky et al., 2004; Yao et al., 2012) and easily jump large energy barriers to broadly sample conformational space. Normal mode analysis (NMA) and elastic network models are also popular constraint-based sampling methods that encode nonlocal interactions as harmonic restraints (Ma, 2005; Schröder et al., 2007; Chennubhotla et al., 2005; Al-Bluwi et al., 2013). The majority of these efficient techniques have only been implemented and tested on proteins.

In this study we present an inverse kinematics technique, kino geometric sampling for RNA (KGSrna), an efficient, constraint-based sampling procedure for RNA inspired by robotics. In KGSrna, an RNA molecule is represented with rotatable bonds as degrees of freedom (DOFs) and groups of atoms as rigid bodies. Noncovalent bonds are distance constraints that create nested cycles (Fig. 1b). To avoid breaking the noncovalent bonds, conformational changes in cycles require coordination (van den Bedem et al., 2005; Yao et al., 2008, 2012; Budday et al., 2015; Pachov and van den Bedem, 2015). Closed cycles greatly reduce conformational flexibility and consequently deform the biomolecule along preferred directions in the conformational landscape. We also integrated a differentiable parameterization of ribose conformations into the kinematic model.

FIG. 1.

Geometric constructions: tree and ribose. (a) The molecular graph (undirected), constructed from atoms, and the covalent and hydrogen bond networks. (b) The kinematic graph (undirected), constructed by edge-contracting all nonrotatable bonds in the molecular graph. (c) The kinematic tree (directed), constructed by finding a spanning tree in the kinematic graph. (d) Geometric characterization of ribose ring kinematics. The position of C1′ is determined from an ideal O4′-C1′ distance (yellow sphere), an ideal C1′-C2′ distance, and ideal C1′-C2′-C3′ angle (yellow circle).

In the remainder, we first detail the methodology and the implementation of KGSrna. Next, we demonstrate that KGSrna accurately recovers all representative models in the experimental NMR bundle starting from a single member. We then perform a direct comparison with the NMA method by López-Blanco et al. (2011), which shows that KGSrna maintains high-quality geometry of the molecules while locally exploring more diverse portions of conformational space.

2. Methods

The purpose of KGSrna is to sample the unweighted native ensemble of RNA molecules starting from a single member of an ensemble. For this purpose, KGSrna takes as input an initial conformation, q_init, and an exploration radius, r_init ∈ ℝ. First, a graph is constructed such that atoms are represented as vertices and covalent bonds and hydrogen bonds are edges. A minimal directed spanning tree is extracted from this graph and two conformational operators acting on this tree, the null space perturbation and the rebuild perturbation, are used to make conformational moves that maintain bonds in the graph. KGSrna then grows a pool of conformations by repeatedly perturbing a seed conformation, q_seed, selected among previously generated conformations in the pool (or q_init).

2.1. Construction of the tree

A graph G_m = (V_m, E_m) is constructed such that V_m contains all atoms and E_m contains all covalent or hydrogen bonds (see Fig. 1a). In RNA, only the hydrogen bonds A(N3)–U(H3) and G(H1)–C(N3) in canonical Watson-Crick (WC) base pairs are included as edges. WC base pairs are taken as all base pairs labeled XX, XIX in the Saenger nomenclature from RNAView (Yang et al., 2003).

Next, a compressed graph G_k = (V_k, E_k) is constructed from G_m by edge contracting members of E_m that correspond to (1) partial double bonds, (2) edges (u, v) where u or v has degree one, or (3) edges in pentameric rings (ribose in nucleic acids or proline in amino acids) (Fig. 1b). Each edge in E_k thus corresponds to a revolute joint, that is, a rotating bond with 1 degree of freedom, and vertices in V_k correspond to collections of atoms that form rigid bodies.
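
As a rough illustration of the edge-contraction step, the sketch below merges the endpoints of every nonrotatable bond with a union-find structure, so that each resulting set is a rigid body and each remaining bond is a revolute joint. The atom labels, the precomputed `rotatable` flag, and the function name `contract_rigid_bodies` are assumptions made for this example and are not taken from the KGSrna implementation.

```python
def contract_rigid_bodies(atoms, bonds):
    """Contract nonrotatable bonds; return (rigid_bodies, joints).

    atoms: list of hashable atom labels (hypothetical names).
    bonds: list of (u, v, rotatable) tuples; the flag is assumed to be
           precomputed from the three contraction rules in the text.
    """
    parent = {a: a for a in atoms}

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Merge the two endpoints of every nonrotatable bond.
    for u, v, rotatable in bonds:
        if not rotatable:
            parent[find(u)] = find(v)

    # Each union-find set is one rigid body (a vertex of G_k).
    bodies = {}
    for a in atoms:
        bodies.setdefault(find(a), []).append(a)

    # Each rotatable bond becomes an edge of G_k between two rigid bodies.
    joints = [(find(u), find(v)) for u, v, rotatable in bonds if rotatable]
    return bodies, joints


# Toy backbone fragment; the rotatable flags are illustrative only.
atoms = ["P", "O5'", "C5'", "H5'", "C4'"]
bonds = [
    ("P", "O5'", True),
    ("O5'", "C5'", True),
    ("C5'", "H5'", False),  # H5' has degree one, so this edge is contracted
    ("C5'", "C4'", True),
]
bodies, joints = contract_rigid_bodies(atoms, bonds)
```

In this toy fragment C5′ and H5′ end up in the same rigid body, while the three rotatable bonds remain as joints.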

Finally, a rooted minimal spanning tree, T_k = (V_k, E_k′), is constructed from G_k (Fig. 1c). Forward kinematics are defined as propagation of atom coordinate transformations from the root of T_k, along the direction of edges in E_k′. Constraints are defined as all edges in C_k ≡ E_k \ E_k′. The two perturbation methods use the forward kinematics specified by T_k and an inverse kinematics method to maintain the constraints specified by C_k. As the two perturbation methods are approximations that can introduce small displacements of constraints, we assign a weight of 1 to covalent bonds and 2 to hydrogen bonds, and use Kruskal's algorithm for the spanning tree construction. This guarantees that covalent bonds are favored over hydrogen bonds for inclusion in E_k′. As the choice of root for T_k does not have any effect on the sampling of internal coordinates, we let the O5′ terminal of the first chain be the root.
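
The weighting scheme can be sketched with a plain Kruskal pass over the edges of the kinematic graph. The weights 1 and 2 come from the text; the edge representation, the function name `kruskal_tree_and_constraints`, and the four-vertex example are assumptions for illustration. The sketch returns an undirected tree; rooting it and orienting its edges is a separate step not shown here.

```python
def kruskal_tree_and_constraints(vertices, edges):
    """Split edges into spanning-tree edges and cycle-closing constraints.

    edges: list of (u, v, kind) with kind in {"covalent", "hbond"}.
    """
    weight = {"covalent": 1, "hbond": 2}  # weights stated in the text
    parent = {v: v for v in vertices}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree, constraints = [], []
    # Lighter (covalent) edges are considered first, so they enter the tree
    # before any hydrogen bond can.
    for u, v, kind in sorted(edges, key=lambda e: weight[e[2]]):
        ru, rv = find(u), find(v)
        if ru == rv:
            constraints.append((u, v, kind))  # would close a cycle
        else:
            parent[ru] = rv
            tree.append((u, v, kind))
    return tree, constraints


# Four rigid bodies in a chain, closed into a cycle by one hydrogen bond.
vertices = ["A", "B", "C", "D"]
edges = [
    ("A", "D", "hbond"),  # listed first on purpose; still ends up a constraint
    ("A", "B", "covalent"),
    ("B", "C", "covalent"),
    ("C", "D", "covalent"),
]
tree, constraints = kruskal_tree_and_constraints(vertices, edges)
```

Here all three covalent edges enter the tree and the hydrogen bond is left over as the single cycle-closing constraint.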

2.2. Modeling the conformational flexibility of pentameric rings

The flexibility of RNA is particularly dependent on conformational flexibility of ribose rings (Leontis et al., 2006), but directly perturbing a torsional angle in pentameric rings breaks the geometry of the ring. While pseudorotational angles (Altona and Sundaralingam, 1972) are frequently used to characterize ribose conformations, they are not convenient for a kinematic model as the equations mapping a pseudorotation angle to atom positions are nontrivial. We therefore introduce a parameterization inspired by Ho et al. (2005) from a continuous differentiable variable τ to the backbone δ angle (C5′-C4′-C3′-O3′), so that ideal geometry of the ribose is maintained (Fig. 1d).

The positions of O4′, C4′, and C3′ are determined by (torsional) DOFs higher in the kinematic tree. The position of C2′ and the branch leaving C3′ in the kinematic tree is determined from the C5′-C4′-C3′-O3′ torsion, δ. Thus, only the remaining atom C1′ needs to be placed. Positions of C1′ with ideal C1′-C2′ distance and C1′-C2′-C3′ angle are represented by a circle (Fig. 1d), centered on the C3′-C2′ axis and having the C3′-C2′ axis as its normal vector. Positions of C1′ that have ideal C1′-O4′ distance are represented by a sphere centered on O4′. The position of C1′ is on either of the intersections between the sphere and the circle, indicated by the variable u ∈ {−1, 1}.
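
The circle-sphere construction reduces to solving a cos θ + b sin θ = k for the angle θ along the circle. The following is a minimal geometric sketch of that step only. The function name, the argument layout, and the unit-sized example are assumptions, and no real bond lengths or angles are used. Which of the two returned points corresponds to u = 1 or u = −1 is a convention that this sketch does not fix.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)


def circle_sphere_intersections(center, normal, radius, s_center, s_radius):
    """Return the 0 or 2 points of a 3D circle that lie on a sphere."""
    n = unit(normal)
    # Build an orthonormal basis (e1, e2) of the plane containing the circle.
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = unit(cross(n, helper))
    e2 = cross(n, e1)
    # Points on the circle: p(theta) = center + radius*(cos(theta)*e1 + sin(theta)*e2).
    # |p - s_center|^2 = s_radius^2 becomes a*cos(theta) + b*sin(theta) = k.
    d = sub(center, s_center)
    a, b = dot(d, e1), dot(d, e2)
    k = (s_radius ** 2 - dot(d, d) - radius ** 2) / (2.0 * radius)
    m = math.hypot(a, b)
    if m == 0.0 or abs(k) > m:
        return []  # the circle misses the sphere (or lies entirely on it)
    phi = math.atan2(b, a)
    alpha = math.acos(k / m)
    points = []
    for theta in (phi + alpha, phi - alpha):
        points.append(tuple(
            c + radius * (math.cos(theta) * x + math.sin(theta) * y)
            for c, x, y in zip(center, e1, e2)))
    return points


# Unit circle in the xy-plane and a unit sphere centered on the circle itself.
points = circle_sphere_intersections(
    (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 1.0, (1.0, 0.0, 0.0), 1.0)
```

When the two solutions coincide (α = 0) the function returns the same point twice, which corresponds to the extreme values of δ where the choice of u no longer matters.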

To avoid using u, which is discontinuous, and δ, which is limited by the ring geometry, we introduce the periodic and continuous variable τ, which uniquely specifies both δ and u. Since δ is restricted to move in the range 120° ± A where A is typically ≈40°, we set δ = 120° + A cos τ. By defining u = sgn(sin τ), the ribose conformation follows a continuous, differentiable, and periodic motion for τ ∈ ℝ. This is essential as the inverse kinematics methods described in the following rely on taking position derivatives.
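
The mapping from τ to (δ, u) is small enough to write out directly. The sketch below assumes A = 40°, the typical value quoted above, and takes τ in radians; the function name `ribose_from_tau` is hypothetical. At sin τ = 0 the sign is arbitrary, because δ is then at an extreme of its range, and the sketch returns u = 1 there by convention.

```python
import math


def ribose_from_tau(tau, amplitude_deg=40.0, center_deg=120.0):
    """Map the periodic variable tau (radians) to (delta in degrees, u)."""
    delta = center_deg + amplitude_deg * math.cos(tau)  # delta = 120 + A cos(tau)
    u = 1 if math.sin(tau) >= 0.0 else -1               # u = sgn(sin(tau))
    return delta, u


# Evaluate at two example tau values given in degrees.
for tau_deg in (215.0, 44.0):
    delta, u = ribose_from_tau(math.radians(tau_deg))
    print(f"tau = {tau_deg:5.1f} deg -> delta = {delta:6.2f} deg, u = {u:+d}")
```

With these settings τ = 215° gives δ near 87° and τ = 44° gives δ near 149°.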

2.3. Null space perturbations

The full conformation of a molecule is represented as a vector q containing values of all DOFs, both torsions and τ. To make a conformational move, we perform a so-called null space projection of a random trial vector, which ensures that constraints stay together (van den Bedem et al., 2005; Yao et al., 2012).

Consider a constraint c ∈ C_k with endpoints a and b and the paths L and R from each endpoint to their nearest common ancestor. Maintaining a constraint corresponds to maintaining the six equations

$$\mathbf{f}_L(\mathbf{a}, \mathbf{q}) = \mathbf{f}_R(\mathbf{a}, \mathbf{q}) \tag{1}$$

$$\mathbf{f}_L(\mathbf{b}, \mathbf{q}) = \mathbf{f}_R(\mathbf{b}, \mathbf{q}) \tag{2}$$

where f_L(x, q) and f_R(x, q) are the positions of x after applying forward kinematics of the DOFs in q along L and R, respectively. We denote the subspace of conformations that satisfy these equations for all constraints the closure manifold. The first-order approximation of these equations can be written J dq = 0, where J is a 6|C_k| × n matrix containing partial derivatives of endpoints with respect to the n DOFs. Solutions to this equation are in the null space of J, which constitutes the tangent space to the closure manifold at the point q. The right-singular vectors of the singular value decomposition J = UΣV^T that correspond to vanishing singular values form a basis, N_J, for the null space of the Jacobian. As long as sufficiently small steps are taken in the null space it is possible to traverse any connected component of the closure manifold. A null space perturbation of q_seed is therefore performed by finding a small random trial vector Δq and setting q_new ← q_seed + N_J N_J^T Δq. The trial vector was scaled so its largest torsional component was at most 5.7 degrees. Computing the singular value decomposition of the Jacobian dominates the running time, so the Intel Math Kernel Library was used for its efficient parallel implementation of LAPACK. Sampling based only on null space perturbations is thus fast but might not always account for functionally important moves of individual nucleotides.
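
To make the projection step concrete, the sketch below computes N_J N_J^T Δq for a small dense Jacobian. To stay within the Python standard library it does not use an SVD; instead it orthonormalizes the rows of J with modified Gram-Schmidt and subtracts the row-space component of Δq, which gives the same projector for small, well-conditioned examples. The function names, the tolerance, and the 2 × 3 example matrix are assumptions, and this is not how KGSrna itself computes the null space.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def null_space_project(J, dq, tol=1e-10):
    """Return the component of dq lying in the null space of J."""
    # Orthonormal basis of the row space of J (modified Gram-Schmidt).
    basis = []
    for row in J:
        v = list(row)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if norm > tol:  # skip linearly dependent rows
            basis.append([vi / norm for vi in v])
    # Remove the row-space component; what remains satisfies J * out = 0.
    out = list(dq)
    for b in basis:
        c = dot(out, b)
        out = [oi - c * bi for oi, bi in zip(out, b)]
    return out


def scale_trial(dq, max_component):
    """Shrink dq so that its largest absolute component is at most max_component."""
    largest = max(abs(x) for x in dq)
    if largest <= max_component:
        return list(dq)
    factor = max_component / largest
    return [x * factor for x in dq]


# Toy Jacobian with a one-dimensional null space spanned by (1, -1, 1).
J = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
trial = scale_trial([1.0, 0.0, 0.0], max_component=math.radians(5.7))
step = null_space_project(J, trial)
residual = [dot(row, step) for row in J]  # J * step, zero up to rounding
```

Because the projection is only a first-order correction, a full sampler would still need small steps to stay close to the closure manifold, as the text notes.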

2.4. Rebuild perturbations

The conformations of ribose rings change when performing null space perturbations, but in general the changes are small enough that a full change from C3′-endo to C2′-endo is very rarely observed, even in flexible loop regions. As shifts from one ribose conformation to another are frequent and biologically important in RNA molecules (Levitt and Warshel, 1978), a rebuild perturbation was designed that can completely change a ribose conformation and rebuild the backbone so the conformation stays on the closure manifold.

A rebuild perturbation first picks a segment of two nucleotides, neither of which is constrained by hydrogen bonds or aromatic stacking. It then disconnects the C4′-C5′ bond at the 3′ end of the segment, stores the positions of C4′ and C5′, and resamples the τ values of the two nucleotides, which breaks the C4′-C5′ bond.

To reclose the broken bond we let q ′ denote the backbone DOFs in the segment (not including τ-angles) and let e denote the end-effector vector, which points from the current positions of C4′ and C5′ to the stored ones. A first-order approximation to the problem of finding a vector q ′ that minimizes | e | can be written J d q ′ = e where J is the 6 × n′ Jacobian matrix containing the derivatives of end-points with respect to the n′ DOFs in q ′.

In general J is not invertible, so instead the pseudo-inverse, J^†, which gives the least-squares solution to the above equation, is used. The pseudo-inverse can be found from the singular value decomposition of J: J^† = V Σ^† U^T, where Σ is a diagonal matrix with entries s_ii, and Σ^† is a diagonal matrix with entries 1/s_ii if s_ii > 0 and 0 otherwise. To reclose the C4′-C5′ bond we therefore iteratively set q′ ← q′ + 0.1 · J^† e until the distance between the current and stored positions of the C4′ and C5′ atoms is less than 0.0001.
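
The damped pseudo-inverse iteration can be tried on a toy problem. The sketch below is a deliberately simplified stand-in: a planar three-joint chain with a 2 × 3 Jacobian instead of the 6 × n′ Jacobian of two 3D endpoints, and the closed form J^† = J^T (J J^T)^{-1}, valid for full row rank, instead of an SVD. Only the step factor 0.1 and the tolerance 0.0001 come from the text; link lengths, start angles, and all function names are assumptions.

```python
import math

LINKS = (1.0, 1.0, 1.0)  # hypothetical link lengths of the toy chain


def end_point(q):
    """Forward kinematics: position of the chain end for joint angles q."""
    x = y = angle = 0.0
    for qi, length in zip(q, LINKS):
        angle += qi
        x += length * math.cos(angle)
        y += length * math.sin(angle)
    return x, y


def jacobian(q):
    """2 x n matrix of derivatives of the end point w.r.t. each joint angle."""
    cumulative, angle = [], 0.0
    for qi in q:
        angle += qi
        cumulative.append(angle)
    n = len(q)
    J = [[0.0] * n, [0.0] * n]
    for j in range(n):
        for i in range(j, n):  # joint j moves every link from j onward
            J[0][j] -= LINKS[i] * math.sin(cumulative[i])
            J[1][j] += LINKS[i] * math.cos(cumulative[i])
    return J


def pinv_times(J, e):
    """Compute J^+ e as J^T (J J^T)^{-1} e for a full-row-rank 2 x n matrix."""
    a = sum(x * x for x in J[0])
    b = sum(x * y for x, y in zip(J[0], J[1]))
    d = sum(y * y for y in J[1])
    det = a * d - b * b
    w0 = (d * e[0] - b * e[1]) / det
    w1 = (a * e[1] - b * e[0]) / det
    return [J[0][j] * w0 + J[1][j] * w1 for j in range(len(J[0]))]


def reclose(q, target, step=0.1, tol=1e-4, max_iter=10000):
    """Iterate q <- q + step * J^+ e until the end point is within tol of target."""
    q = list(q)
    for _ in range(max_iter):
        x, y = end_point(q)
        e = (target[0] - x, target[1] - y)  # end-effector error vector
        if math.hypot(e[0], e[1]) < tol:
            break
        dq = pinv_times(jacobian(q), e)
        q = [qi + step * di for qi, di in zip(q, dq)]
    return q


target = end_point([0.3, 0.4, 0.5])         # stored end position
q_closed = reclose([0.5, 0.2, 0.6], target)  # perturbed chain, then reclosed
gap = math.dist(end_point(q_closed), target)
```

The small step factor trades speed for robustness: each iteration removes roughly a tenth of the remaining error, which keeps the linearization valid.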

Ribose conformations in experimental structures mainly fall in two distinct peaks corresponding to C2′-endo and C3′-endo. To mimic this behavior, τ-angles are sampled using a mixture of wrapped normal distributions. The following bimodal distribution (Fig. 2) was obtained by fitting to the τ-angles of riboses taken from the high-resolution RNA dataset compiled by Bernauer et al. (2011):

$$P(\tau) = 0.6 \cdot N(\tau, 215^{\circ}, 12^{\circ}) + 0.4 \cdot N(\tau, 44^{\circ}, 17^{\circ})$$
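
Drawing from this mixture is straightforward: pick a component by its weight, draw from the corresponding normal distribution, and wrap the result onto [0°, 360°). The sketch below assumes that the third argument of N is a standard deviation rather than a variance, which the text does not state explicitly; the names `COMPONENTS` and `sample_tau` are hypothetical.

```python
import random

# (weight, mean, spread) in degrees, taken from the mixture above.
# Assumption: the spread is a standard deviation.
COMPONENTS = ((0.6, 215.0, 12.0), (0.4, 44.0, 17.0))


def sample_tau(rng):
    """Draw one tau angle (degrees) from the wrapped-normal mixture."""
    weights = [w for w, _, _ in COMPONENTS]
    _, mean, spread = rng.choices(COMPONENTS, weights=weights, k=1)[0]
    # Wrapping a normal draw modulo 360 yields a wrapped-normal sample.
    return rng.gauss(mean, spread) % 360.0


rng = random.Random(0)
samples = [sample_tau(rng) for _ in range(20000)]
near_major_peak = sum(150.0 < t < 280.0 for t in samples) / len(samples)
```

With enough samples the fraction near the 215° peak approaches the mixture weight of 0.6.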

FIG. 2.

Kino geometric sampling (KGS) illustrated by τ angle. (a) Distributions of ribose conformations in KGS samples and in the NMR bundle of the MLV readthrough pseudoknot (2LC8). Ribose conformations of 1000 samples are displayed vertically as color-coded histograms with a bin width of 1.8 degrees. The top panel shows distributions without rebuilding steps and the bottom panel with rebuilding steps. Rebuild perturbations recover the full range of τ-angles in the NMR bundle for free nucleotides. The distribution from which τ-angles are sampled is shown on the right. The large peak corresponds to C3′-endo conformations and the smaller one to C2′-endo conformations. (b) The relationship between the τ-angle and the pseudorotational angle introduced by Altona and Sundaralingam (1972), for all nucleotides in the benchmark set.

Only nucleotides that are not part of any base-pairing or stacking, as obtained by RNAView, were included.

After resampling a loop segment, most loop closure methods tend to overly distort DOFs near the endpoint of the chosen segment. Our method addresses this by (1) resampling randomly chosen segments of only two nucleotides, so the end points are not always in the same location, and (2) using the inverse Jacobian method, which tends to distribute the DOF updates more evenly along the segment than, for example, cyclic coordinate descent (Canutescu and Dunbrack, 2003) or random loop generation (Cortés et al., 2004).

2.5. Experimental design

A benchmark set of sixty RNA molecules (see Supplementary Table S1, available online at www.liebertpub.com/cmb) was compiled from the Biological Magnetic Resonance Bank (BMRB) (Ulrich et al., 2008) by downloading single-chain RNAs that contain more than 15 nucleotides and were solved with NMR spectroscopy. RNAs with high sequence similarity were removed so that the edit distance between the sequences of any pair was at least five.

For each molecule in the benchmark set, the first NMR model is chosen as q_init, and a pool of conformations is generated by repeatedly perturbing a seed conformation and placing the new conformation in the pool. The seed conformation is selected from the pool of existing conformations by picking a random nonempty interval of width r_init/100 between 0 and r_init. If there is more than one conformation in the pool whose distance to q_init falls within this interval, a completely random conformation is generated and the conformation nearest to the random structure is chosen as q_seed. This ensures that samples in sparsely populated regions within the exploration radius are more likely to be chosen as seeds and that the sample population will distribute widely. A rebuild perturbation of two free nucleotides or a null space perturbation is then performed at a rate of 10/90, that is, 10% rebuild perturbations and 90% null space perturbations. A null space perturbation can start from a seed generated by a rebuild perturbation or vice versa, allowing detailed exploration of remote parts of conformation space.
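
The seed-selection rule can be sketched independently of any molecular detail. In the sketch below conformations are opaque objects, and the distance functions and the random-conformation generator are passed in as callbacks; all of these names are hypothetical. It also reads "the conformation nearest to the random structure" as the nearest among the conformations in the chosen interval, which is an interpretation of the text rather than something it states.

```python
import random


def select_seed(pool, dist_to_init, r_init, random_conformation, distance,
                rng, bins=100):
    """Pick a seed from pool, favoring sparsely populated distance shells."""
    width = r_init / bins
    # Group conformations by which distance interval from q_init they fall in.
    occupied = {}
    for conf in pool:
        idx = min(int(dist_to_init(conf) / width), bins - 1)
        occupied.setdefault(idx, []).append(conf)
    # Choose uniformly among nonempty intervals, not among conformations.
    candidates = occupied[rng.choice(sorted(occupied))]
    if len(candidates) == 1:
        return candidates[0]
    # Several candidates: break the tie with a random target conformation.
    target = random_conformation(rng)
    return min(candidates, key=lambda c: distance(c, target))


def choose_move(rng, rebuild_fraction=0.10):
    """Rebuild with probability 0.10, null space perturbation otherwise."""
    return "rebuild" if rng.random() < rebuild_fraction else "null_space"


# One-dimensional toy: a "conformation" is a number and q_init is 0.0.
rng = random.Random(1)
pool = [0.0, 0.5, 0.505, 1.2]
seed = select_seed(
    pool,
    dist_to_init=lambda c: abs(c),
    r_init=2.0,
    random_conformation=lambda r: r.uniform(-2.0, 2.0),
    distance=lambda a, b: abs(a - b),
    rng=rng,
)
move = choose_move(rng)
```

Choosing among intervals rather than among conformations is what gives a lone sample in a sparse shell the same chance of selection as a crowded shell.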

If a new conformation contains a clash between two atoms it is rejected and a new seed is chosen. An efficient grid-indexing method is used for clash detection by overlapping van der Waals radii (Halperin and Overmars, 1994). The van der Waals radii were scaled by a factor of 0.5.
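
A grid index makes clash checks local: atoms are hashed into cubic cells at least as wide as the largest possible clash distance, so only the 27 neighboring cells need to be searched. The following sketch illustrates the idea under stated assumptions. The radii table uses commonly quoted Bondi-style values and is not taken from the paper; the function name `has_clash` is hypothetical. The sketch also does not exclude covalently bonded pairs, which a real implementation would have to do.

```python
import math
from itertools import product

# Assumed van der Waals radii in angstroms (not specified in the text).
VDW = {"C": 1.70, "N": 1.55, "O": 1.52, "P": 1.80, "H": 1.20}


def has_clash(atoms, scale=0.5):
    """Return True if any two atoms overlap with radii scaled by `scale`.

    atoms: list of (element, (x, y, z)) tuples.
    """
    # Cell width equals the largest possible clash distance.
    cell = 2.0 * scale * max(VDW.values())

    def key_of(pos):
        return tuple(math.floor(c / cell) for c in pos)

    # Hash every atom index into its grid cell.
    grid = {}
    for i, (_, pos) in enumerate(atoms):
        grid.setdefault(key_of(pos), []).append(i)

    for i, (elem_i, pos_i) in enumerate(atoms):
        key = key_of(pos_i)
        # Only the 27 surrounding cells can hold a clashing partner.
        for offset in product((-1, 0, 1), repeat=3):
            neighbor = tuple(k + o for k, o in zip(key, offset))
            for j in grid.get(neighbor, ()):
                if j <= i:
                    continue  # test each pair once
                elem_j, pos_j = atoms[j]
                limit = scale * (VDW[elem_i] + VDW[elem_j])
                if math.dist(pos_i, pos_j) < limit:
                    return True
    return False


close_pair = [("C", (0.0, 0.0, 0.0)), ("O", (1.0, 0.0, 0.0))]
far_pair = [("C", (0.0, 0.0, 0.0)), ("O", (3.0, 0.0, 0.0))]
```

With the scale factor of 0.5 from the text, the carbon-oxygen pair at 1.0 Å counts as a clash and the pair at 3.0 Å does not.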

The iMod toolkit (López-Blanco et al., 2011) uses normal mode analysis (NMA) in internal coordinates to explore conformational flexibility of biomolecular structures, for instance via vibrational analysis, pathway analysis, and Monte-Carlo sampling. The iMod Monte-Carlo sampling application was used for comparison with KGSrna and run with the default settings: heavy atoms, five top eigenvectors, 1000 Monte-Carlo iterations per output structure, and a temperature of 300 K.

3. Results and Discussion

To assess the performance of our model in representing RNA modes of deformation, we compared the distribution of our samples to the available NMR bundles. For this purpose, we performed sampling runs that all start from a single member of the NMR bundle and diffuse out to a predefined exploration radius. We define the exploration width as the ability of KGSrna to quickly diffuse away from the starting conformation and the exploration accuracy as the ability to sample conformations close to any biologically relevant member of the native ensemble. To evaluate the width and accuracy of the exploration we consider NMR models as representative members of the native ensemble and measure how close to KGSrna samples these members are, both in terms of local measures (τ-angle distributions) and in terms of full chain measures (RMSD). KGSrna was used to generate 1,000 samples, starting from the first model of each of the sixty RNA structures in the benchmark set (Supplementary Table S1). The largest RMSD distance between any two models was used as the exploration radius for that molecule. The sampling took on average 372 seconds on an Intel Xeon E5-2670 CPU.

3.1. Broad and accurate atomic-scale sampling of the native ensemble

To assess the importance of the rebuilding procedure we evaluated the sampling with and without rebuild perturbations. Figure 2a illustrates distributions of the τ angle in KGSrna samples and NMR bundle structures for the Moloney MLV readthrough pseudoknot (PDB-id 2LC8). Without a rebuilding step, KGSrna samples show a very narrow sampling in the geometrically constrained loop region starting at nucleotide 40. With rebuilding enabled, the distributions of τ-angles widen significantly and all ribose conformations present in the NMR bundle are reproduced in the KGSrna sampling. When sampling without rebuilding, 9 out of the 196 nucleotides in the benchmark set that have both C3′-endo and C2′-endo conformations are fully recovered. When enabling rebuild perturbations all but four ribose conformations (98%) are recovered. These four are all in less common conformations, such as O4′-endo or C1′-endo. Supplementary Figure S1 shows the effects of KGSrna sampling with rebuilding on a δ/ε-plot.

Traditionally, ribose conformations are described using the pseudorotation angle, P, which depends on all five torsions in the ribose ring (Altona and Sundaralingam, 1972). Figure 2b shows the relationship between τ and P for all nucleotides in the benchmark set. While the two are not linearly related there is a monotonic relationship indicating that τ is as useful as P in characterizing ribose conformations in addition to being usable as a differentiable degree of freedom in a kinematic linkage.

3.2. Large-scale deformations

We evaluated the performance of KGSrna in probing conformational states on whole-molecule scale using the root mean square deviation (RMSD) of C4′ coordinates after optimal superposition. Figure 3a shows the evolution of the minimum and maximum distance from each of the 10 NMR bundle structures to the KGSrna sample of the Moloney MLV readthrough pseudoknot (2LC8) as the sampling progresses. The sampling has expanded to the limits of the exploration radius after 400 samples. The minimum distance to each of the noninitial NMR bundle conformations quickly converges to approximately 2Å RMSD. Both these trends are consistent across the benchmark set with an average minimum RMSD of 1.2Å as shown in Supplementary Table S1.

FIG. 3.

Conformational exploration of KGSrna at molecular scale illustrated using the Moloney MLV readthrough pseudoknot (2LC8). (a) The evolution of smallest (lower bright-green curves) and largest (upper dark-green curves) RMSD as the sampling progresses. RMSD distances are measured to each of the 10 structures in the NMR bundle (initial in bold). (b) The conformation of the initial structure with 25 randomly chosen samples superposed. The color and thickness of the backbone indicates the degree of flexibility for nearby degrees of freedom. Very flexible regions are shown as thick and red-shifted while rigid regions are thin and green.

Regions of the molecule that are either constrained by tight sterics or by hydrogen bonds are difficult to deform, which is implicitly represented in KGSrna's model of flexibility. Figure 3b uses color-coding to highlight the regions of 2LC8 where the degrees of freedom show a particularly high variance. The base-paired regions that are tightly woven in a double helix show little flexibility while the unconstrained loop region displays the highest degree of flexibility. Even though the O3′-terminal end (right-most side of Fig. 3b) does not by itself display a large degree of flexibility, it still moves over a large range as shown by the 25 randomly chosen overlaid KGSrna samples.

3.3. KGSrna as an alternative to NMA

The iMod Monte-Carlo application (iMC) is one of the state-of-the-art methods most directly comparable to KGSrna as it efficiently performs large conformational moves that reflect the major modes of deformation of biomolecules.

Figures 4a and b show results of running iMC for 1,000 iterations on the Moloney MLV readthrough pseudoknot (2LC8). While KGSrna is able to sample sugar conformations widely, the standard deviation of τ is less than 1 degree for all nucleotides in the iMC sample set. Supplementary Figure S2 shows a similar comparison for the remaining backbone torsions. Furthermore, KGSrna samples widely and reaches the exploration radius of 5Å after 400 samples, while iMC has converged on 3.3Å after 1,000 samples. KGSrna generates structures closer than 2Å to an NMR bundle conformation while the best iMC conformation is just over 2.5Å from its nearest NMR bundle structure. This indicates both a broader exploration width and higher exploration accuracy of KGSrna compared to iMC.

FIG. 4.

Conformational exploration of iMC illustrated using the Moloney MLV readthrough pseudoknot (2LC8). (a) Distributions of ribose conformations in 1000 iMC samples started from the same molecule and displayed on the same scale as KGSrna samples in Figure 2. (b) The evolution of minimum (light red curves) and maximum (dark red curves) C4′ RMSD as the iMC sampling progresses. Minimum (resp. maximum) KGSrna curves are provided in light (resp. dark) green for reference. This panel is directly comparable to Figure 3a. (c) Distributions of hydrogen bond lengths in WC base pairs. The vast majority of samples generated by KGSrna have hydrogen bonds that fluctuate by less than 1Å. The same trend was observed over the rest of the benchmark set as well (data not shown).

Figure 4c shows distributions of hydrogen bond length in WC base pairs in the 1,000 samples from iMC and KGSrna respectively. The average standard deviation of hydrogen bond distances is 1.04Å for iMC base pairs, which for most applications would constitute a full break of the bond. The standard deviation is only 0.33Å for KGSrna. The source of hydrogen bond fluctuations in KGSrna is primarily the null space moves, where a relatively high step size causes the first-order approximations to introduce small deviations from the closure manifold.

An alternative approach to model the flexibility of a molecule is to predict its structure “de novo” from the sequence and secondary structure. The resulting set of “decoys” can be used as representatives of a conformational ensemble. The macromolecular conformations by symbolic programming method (MC-sym) is an RNA 3D structure modeling system that takes as input the sequence and secondary structure in dot-bracket notation (Lemieux and Major, 2002). Figure 5 shows the 175 structures generated by running MC-sym for 96 hours on the primary and secondary structure of 2LC8:

FIG. 5.

Conformational exploration of MC-sym illustrated using the Moloney MLV readthrough pseudoknot (2LC8). (a) Distributions of ribose conformations in 175 MC-sym samples generated using the sequence and secondary structure of 2LC8 (comparable to Fig. 2). (b) The evolution of minimum (light curves) and maximum (dark curves) C4' RMSD as the MC-sym sampling progresses.

GGUCAGGGUCAGGAGCCCCCCCCUGAACCCAGGAUAACCCUCAAAGUCGGGGGGCA

((((((((((((..[[[[[[[.))))..))))))))............]]]]]]].
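
For readers unfamiliar with the notation, the sketch below decodes the dot-bracket string above into index pairs, using one stack per bracket type so that the square brackets of the pseudoknot can cross the round ones. The sequence and structure strings are copied from the text; the function name `parse_dot_bracket` is hypothetical, and this is only an illustration of the notation, not of how MC-sym reads its input.

```python
def parse_dot_bracket(structure):
    """Return sorted 0-based (i, j) base-pair indices from a dot-bracket string."""
    openers = {"(": ")", "[": "]"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}  # one stack per bracket type
    pairs = []
    for i, ch in enumerate(structure):
        if ch in openers:
            stacks[ch].append(i)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


SEQ = "GGUCAGGGUCAGGAGCCCCCCCCUGAACCCAGGAUAACCCUCAAAGUCGGGGGGCA"
DB = "((((((((((((..[[[[[[[.))))..))))))))............]]]]]]]."

pairs = parse_dot_bracket(DB)
print(len(SEQ), "nucleotides,", len(pairs), "annotated pairs")
for i, j in pairs[:3]:
    print(i + 1, SEQ[i], "-", j + 1, SEQ[j])
```

Keeping a separate stack for each bracket type is what allows the helix written with square brackets to interleave with the one written with round brackets.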

While some flexibility is permitted for sugar conformations, they are relatively limited for most residues and fail to recover the NMR bundle conformation in the flexible loop region. The minimum RMSD to any NMR bundle member is a little under 5Å, which was the largest permitted exploration radius for KGS. This indicates that only very few of the MC-sym structures reached the native ensemble. The fragment assembly of RNA with full-atom refinement method (FARFAR) is another de novo method, but it is mainly tested on nucleic acid chains of length 20 or less. The server version rejects any chain length longer than 32 residues, which excludes a large portion of our benchmark set (Das et al., 2010; Lyskov et al., 2013).

4. Conclusion

As opposed to MD simulations, nondeterministic sampling algorithms coupled with simplified, knowledge-based potentials provide no information on dynamics but can broadly explore the conformational landscape (Bernauer et al., 2011; Das and Baker, 2007; van den Bedem and Fraser, 2015). Our analysis demonstrates that conformational ensembles of noncoding RNAs in solution are accessible from efficiently sampling coordinated changes in rotational degrees-of-freedom that preserve the hydrogen bonding network. Each member of a synthetic ensemble was approximated to within 2Å on average by a KGSrna sampled conformation on a benchmark set of sixty noncoding RNAs without relying on a forcefield. By contrast, an NMA-based sampling algorithm diffuses through the folded state at a slower rate, approximating each ensemble member with 25% less accuracy. Additionally, de novo methods that model conformational space only from primary and secondary structures were demonstrated to require much longer computation times while obtaining less sampling accuracy, which is not surprising given the more demanding nature of the problem they seek to solve.

Hydrogen bonds and similar noncovalent constraints, such as hydrophobic interactions, encode preferred pathways on the conformational landscape, enabling our procedure to efficiently probe the conformational diversity resulting from equilibrium fluctuations of the ensemble. Our procedure is generic, atomically detailed, and mathematically well founded, and it makes minimal assumptions about the nature of atomic interactions. Combined with experimental data, it can provide insight into which substates are adopted. Our procedure is easily adapted to DNA and to protein–protein or protein–nucleic acid complexes. It could provide insight into the flexibility of systems such as RNA aptamers and RNA–protein recognition complexes, and possibly help characterize riboswitch structures. Software is available online (KGS, 2016).

Acknowledgments

This work is part of the ITSNAP Associate Team. We thank the Inria Équipe Associée program for financial support. J.B. acknowledges access to the HPC resources of TGCC under the allocation t2013077065 made by GENCI.

This work was supported by the U.S. National Institute of General Medical Sciences Protein Structure Initiative [U54GM094586] and by a SLAC National Accelerator Laboratory LDRD (Laboratory Directed Research and Development) grant [SLAC-LDRD-0014-13-2 to H.v.d.B].

Author Disclosure Statement

No competing financial interests exist.

References

Al-Bluwi, Siméon, and Cortés 2012. Motion planning algorithms for molecular simulations: a survey. Comput. Sci. Rev., 6, 125–143.

Al-Bluwi, Vaisset, Siméon, et al. 2013. Modeling protein conformational transitions by a combination of coarse-grained normal mode analysis and robotics-inspired methods. BMC Struct. Biol., 13(Suppl 1), S2.

Altona and Sundaralingam 1972. Conformational analysis of the sugar ring in nucleosides and nucleotides: new description using the concept of pseudorotation. J. Am. Chem. Soc., 94, 8205–8212.

Bernauer, Huang, Sim A.Y.L., et al. 2011. Fully differentiable coarse-grained and all-atom knowledge-based potentials for RNA structure evaluation. RNA, 17, 1066–1075.

Budday, Leyendecker, and van den Bedem 2015. Geometric analysis characterizes molecular rigidity in generic and non-generic protein configurations. J. Mech. Phys. Solids, 83, 36–47.

Canutescu A.A. and Dunbrack R.L. 2003. Cyclic coordinate descent: a robotics algorithm for protein loop closure. Protein Sci., 12, 963–972.

Chennubhotla, Rader A.J., Yang L.W., et al. 2005. Elastic network models for understanding biomolecular machinery: from enzymes to supramolecular assemblies. Phys. Biol., 2, S173.

Cléry, Blatter, and Allain F.H.T. 2008. RNA recognition motifs: boring? Not quite. Curr. Opin. Struct. Biol., 18, 290–298.

Cooper T.A., Wan, and Dreyfuss 2009. RNA and disease. Cell, 136, 777–793.

Cortés, Siméon, Remaud-Siméon, et al. 2004. Geometric algorithms for the conformational analysis of long protein loops. J. Comput. Chem., 25, 956–967.

Coutsias E.A., Seok, Jacobson M.P., et al. 2004. A kinematic view of loop closure. J. Comput. Chem., 25, 510–528.

Cruz J.A. and Westhof 2009. The dynamic landscapes of RNA architecture. Cell, 136, 604–609.

Das and Baker 2007. Automated de novo prediction of nativelike RNA tertiary structures. Proc. Natl. Acad. Sci. USA, 104, 14664–14669.

Das, Karanicolas, and Baker 2010. Atomic accuracy in predicting and designing noncanonical RNA structure. Nat. Methods, 7, 291–294.

Dorsett and Tuschl 2004. siRNAs: applications in functional genomics and potential as therapeutics. Nat. Rev. Drug Discov., 3, 318–329.

Fonseca, Pachov D.V., Bernauer, et al. 2014. Characterizing RNA ensembles from NMR data with kinematic models. Nucleic Acids Res., 42, 9562–9572.

Frellsen, Moltke, Thiim, et al. 2009. A probabilistic model of RNA conformational space. PLoS Comput. Biol., 5, e1000406.

Frenkel and Smit 2001. Understanding Molecular Simulation: From Algorithms to Applications, vol. 1. Academic Press, New York, NY.

Guo 2010. The emerging field of RNA nanotechnology. Nat. Nanotechnol., 5, 833–842.

Halperin and Overmars M.H. 1994. Spheres, molecules, and hidden surface removal. Comput. Geom., 11, 113–122.

B.K., Coutsias E.A., Seok, et al. 2005. The flexibility in the proline ring couples to the protein backbone. Protein Sci., 14, 1011–1018.

KGS, 2016. https://simtk.org/home/kgs/.

Kim, Abeysirigunawarden S.C., Chen, et al. 2014. Protein-guided RNA dynamics during early ribosome assembly. Nature, 506, 334–338.

Landau D.P. and Binder 2009. A Guide to Monte Carlo Simulations in Statistical Physics. Cambridge University Press, Cambridge.

Lemieux and Major 2002. RNA canonical and non-canonical base pairing types: a recognition method and complete repertoire. Nucleic Acids Res., 30, 4250–4263.

Leontis N.B., Lescoute, and Westhof 2006. The building blocks and motifs of RNA architecture. Curr. Opin. Struct. Biol., 16, 279–287.

Leulliot and Varani 2001. Current topics in RNA-protein recognition: control of specificity and biological function through induced fit and conformational capture. Biochemistry, 40, 7947–7956.

Levitt and Warshel 1978. Extreme conformational flexibility of the furanose ring in DNA and RNA. J. Am. Chem. Soc., 100, 2607–2613.

Lipfert, Das, Chu V.B., et al. 2007. Structural transitions and thermodynamics of a glycine-dependent riboswitch from Vibrio cholerae. J. Mol. Biol., 365, 1393–1406.

López-Blanco J.R., Garzón J.I., and Chacón 2011. iMod: multipurpose normal mode analysis in internal coordinates. Bioinformatics, 27, 2843–2850.

Lyskov, Chou F.C., Conchúir, et al. 2013. Serverification of molecular modeling applications: the Rosetta Online Server that Includes Everyone (ROSIE). PLoS One, 8, e63906.

2005. Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes. Structure, 13, 373–380.

Pachov and van den Bedem 2015. Nullspace sampling with holonomic constraints reveals molecular mechanisms of protein Gαs. PLoS Comput. Biol., 11, e1004361.

Rother, Rother, Skiba, et al. 2014. Automated modeling of RNA 3D structure. Methods Mol. Biol., 1097, 395–415.

Schröder G.F., Brunger A.T., and Levitt 2007. Combining efficient conformational sampling with a deformable elastic network model facilitates structure refinement at low resolution. Structure, 15, 1630–1641.

Shehu, Clementi, and Kavraki 2006. Modeling protein conformational ensembles: from missing loops to equilibrium fluctuations. Proteins, 65, 164–179.

Thomas, Song, and Amato N.M. 2005. Protein folding by motion planning. Phys. Biol., 2, S148.

Ulrich E.L., Akutsu, Doreleijers J.F., et al. 2008. BioMagResBank. Nucleic Acids Res., 36(Suppl 1), D402–D408.

van den Bedem and Fraser J.S. 2015. Integrative, dynamic structural biology at atomic resolution—it's about time. Nat. Methods, 12, 307–318.

van den Bedem, Lotan, Latombe J.C., et al. 2005. Real-space protein-model completion: an inverse-kinematics approach. Acta Crystallogr. D Biol. Crystallogr., 61, 2–13.

Wells, Menor, Hespenheide, et al. 2005. Constrained geometric simulation of diffusive motion in proteins. Phys. Biol., 2, S127.

Yang, Jossinet, Leontis, et al. 2003. Tools for the automatic identification and classification of RNA base pairs. Nucleic Acids Res., 31, 3450–3460.

Yao, Dhanik, Marz, et al. 2008. Efficient algorithms to explore conformation spaces of flexible protein loops. IEEE/ACM Trans. Comput. Biol. Bioinform., 5, 534–545.

Yao, Zhang, and Latombe J.C. 2012. Sampling-based exploration of folded state of a protein under kinematic and geometric constraints. Proteins, 80, 25–43.

Zavodszky M.I., Lei, Thorpe M.F., et al. 2004. Modeling correlated main-chain motions in proteins for flexible molecular recognition. Proteins, 57, 243–261.

Zhang, Stelzer A.C., Fisher C.K., et al. 2007. Visualizing spatially correlated dynamics that directs RNA conformational transitions. Nature, 450, 1263–1267.

Zhou, Shu, Guo, et al. 2011. Dual functional RNA nanoparticles containing phi29 motor pRNA and anti-gp120 aptamer for cell-type specific delivery and HIV-1 inhibition. Methods, 54, 284–294.
