Evidence of a Direct Evolutionary Selection for Strong Folding and Mutational Robustness Within HIV Coding Regions

Abstract

A large number of studies demonstrated the importance of different HIV RNA structural elements at all stages of the viral life cycle. Nevertheless, the significance of many of these structures is unknown, and plausibly new regions containing RNA structure-mediated regulatory signals remain to be identified. An important characteristic of genomic regions carrying functionally significant secondary structures is their mutational robustness, that is, the extent to which a sequence remains constant in spite of despite mutations in terms of its underlying secondary structure. Structural robustness to mutations is expected to be important in the case of functional RNA structures in viruses with high mutation rate; it may prevent fitness loss due to disruption of possibly functional conformations, pointing to the specific significance of the corresponding genomic region. In the current work, we perform a genome-wide computational analysis to detect signals of a direct evolutionary selection for strong folding and RNA structure-based mutational robustness within HIV coding sequences. We provide evidence that specific regions of HIV structural genes undergo an evolutionary selection for strong folding; in addition, we demonstrate that HIV Rev responsive element seems to undergo a direct evolutionary selection for increased secondary structure robustness to point mutations. We believe that our analysis may enable a better understanding of viral evolutionary dynamics at the RNA structural level and may benefit to practical efforts of engineering antiviral vaccines and novel therapeutic approaches.

1. Background

HIV is an extremely lethal human viral pathogen that causes about 1–2 million of new contaminations and deaths each year, with more than 35 million carriers worldwide. Despite being recognized as a global public health concern, today, there is no approved vaccine or preventive therapy for the HIV infection.

In addition to being a virus of an immense clinical significance, HIV possesses important biophysical properties making it especially appealing for researchers. An extremely high mutation rate (Cuevas et al., 2015), a complex coding capacity despite its relatively small genome size (Frankel and Young, 1998; Hryckiewicz et al., 2011), and a variety of expression regulation mechanisms and codes such as frameshifts, alternative splicing, and secondary structures (Jacks et al., 1988; Malim et al., 1989; Cullen, 1991; Parkin et al., 1992; Damgaard et al., 2004; Seelamgari et al., 2004; Goff, 2007; Stoltzfus, 2009; Brakier-Gingras et al., 2012; Karn and Stoltzfus, 2012; Schiralli Lester and Henderson, 2012), together with availability of reachable experimental data (e.g., Shankarappa et al., 1999; Liu et al., 2006; Watts et al., 2009; Zanini et al., 2015) and high-quality annotated genomic sequences (“HIV Databases,” n.d.), turn HIV into a good model system for studying viral evolution and replication (Shankarappa et al., 1999; Liu et al., 2006; Zanini and Neher, 2013).

HIV and other viruses undergo a rapid evolutionary selection that enables them to evade the host immune system and to efficiently compete with endogenous transcripts of the host cell over the gene expression machinery. Mechanisms that facilitate an efficient and selective viral replication are inherent in the nucleotide composition of the viral genomic sequence itself and can involve the recruitment and/or modification of specific host factors. Nonsynonymous mutations that alter the amino acid sequence provide a distinct evolutionary advantage due to selective pressure, allowing viruses to escape from innate defense mechanisms and acquired immune surveillance of the host, and to rapidly adapt to new cell types, tissues, or species. Yet, genomes (and even coding sequences), both viral and of other organisms, not only code for protein products but also carry additional information encrypted in the composition of alternating codons (Lavner and Kotlar, 2005; Parmley and Huynen, 2009; Tuller et al., 2010; Plotkin and Kudla, 2011; Li et al., 2012; Zur and Tuller, 2012; Mortimer et al., 2014; Tuller and Zur, 2014; Goz and Tuller, 2015). Being related to different biophysical and evolutionary characteristics, this information can be induced by synonymous mutations, which preserve the underlying protein, and may play an important regulatory role in different viral replication stages.

In particular, an important feature of viral RNA, which is encoded in synonymous (and nonsynonymous) aspects of its nucleotide composition, is its structure/conformation that is formed by the molecule folding upon itself as a result of hydrogen bonds and other biochemical connections acting between pairs of nucleotides. The number and strength of these bonds determine the minimum free folding energy (MFE), which is related to the folding strength of a structure; more negative MFE is related to possibly stronger and more stable folding, whereas less negative MFE corresponds to weaker and less structured conformations.

A large number of experimental and computational studies in HIV and other viruses and organisms demonstrated the importance of different RNA structural elements at all stages of the viral life cycle. Among others, these elements were found to be involved in transcription activation, reverse transcription initiation, facilitation of genomic dimerization, direction of HIV packaging, manipulation of reading frames, regulation of RNA nuclear export, regulation of translation and replication, polyadenylation signaling, and interactions with viral and host proteins (Malim et al., 1989; Cullen, 1991; Abbink and Berkhout, 2003; Damgaard et al., 2004; D'Souza and Summers, 2005; Gaudin et al., 2005; Moore and Hu, 2009; Stoltzfus, 2009; Watts et al., 2009; Karn and Stoltzfus, 2012). An RNA secondary structure model for the complete HIV genome has recently been published based on SHAPE technology (Watts et al., 2009); as a result, several well-known RNA structures were confirmed, and numerous new structured motifs were proposed. Nevertheless, the significance of many of these structures is unknown, and plausibly, new regions containing RNA structure-mediated regulatory signals remain to be identified.

An important characteristic of genomic regions carrying functionally significant conformations is their mutational robustness, that is, the extent to which the phenotype associated with the secondary structure remains invariant despite mutations. In the light of the extremely high error rate in the process of HIV replication, robustness may protect viruses in the face of deleterious mutation and prevent fitness loss due to disruption of possibly functional secondary structures.

Several previous studies have addressed the evolution of secondary structure robustness in viruses and other organisms. For example, Borenstein and Ruppin (2006) provided an evidence of a direct evolutionary selection of mutational robustness in micro RNA molecules, whereas Tuller et al. (2011) suggested a signal of mutational robustness at the beginning of the mRNA. In Wagner and Stadler (1999), it was shown that conserved RNA secondary structure elements are consistently more robust than nonconserved elements in different groups of RNA viruses (in particular, trans-activation response element (TAR) and Rev responsive element (RRE) elements of HIV were analyzed). In Churkin et al. (2010), the authors demonstrated the direct evolution of mutational robustness in the functionally important structural elements of the Hepatitis C virus (HCV); in Waldispühl et al. (2008), the structure robustness of HCV cis-acting replication element and HIV TAR was investigated.

In the current work, we performed for the first time, a genome-wide computational analysis of HIV coding regions aiming to identify significant signals encoded by synonymous information that are related to evolutionary selection for strong mRNA folding. Further, we focused on sites spanned by the predicted, evolutionarily significant, and local structural elements to analyze whether in addition to being selected for stability, their structures also undergo a direct selection pressure toward increased mutational robustness.

The study in Wagner and Stadler (1999), included, among others, the analysis of two HIV elements (TAR and RRE) and was based on statistical comparison of conserved and not conserved regions from the viral genome; in contrast, we analyzed the selection for a structural robustness on predicted and significantly stable sites within the coding regions, basing on randomization models inspired by the previous studies on other biological entities (Borenstein and Ruppin, 2006; Churkin et al., 2010). These randomization models exclude the possibility of an implicit enhancement of robustness as a side effect of specific sequence (e.g., amino acid content and order of the viral proteins) and structure (e.g., the strength of the RNA folding) characteristics, and permit to reveal statistically significant signals of a direct selection pressure. HIV robustness was also addressed in Waldispühl et al. (2008) by computing energetically favorable multiple-point mutations and studying their impact on the robustness of the secondary structure; however, this study analyzed only the TAR noncoding element and did not make use of a null model to provide the statistical significance of the reported findings and to control for factors that could have a roundabout effect on robustness (as mentioned above).

We believe that our analysis will enable a better understanding of viral evolutionary dynamics at the structural level and may benefit practical efforts of engineering antiviral vaccines and novel therapeutic approaches.

2. Results

The different general stages of the study appear in Figure 1 (see Section 4 for additional details). HIV coding regions of the HXB2 HIV strain were prepared according to published annotations (Fig. 1A I). To analyze the evolutionary selection for strong folding in local genomic elements, we used a reference model which promises that signals of increased structural stability cannot be explained by some of the fundamental properties of the wild-type sequences— the amino acid composition of the encoded proteins and/or disrupted stacking base-pairs (dinucleotide and amino acid preserving model). To this aim, we designed 1000 randomized variants that preserved both the encoded protein and the distribution of frequencies of pairs of adjacent nucleotides (dinucleotides) known to have a significant impact on the folding strength (Zuker and Stiegler, 1981; Nussinov, 1984; Workman, 1999) (Fig. 1A II).

FIG. 1.

(A) Flow diagram of the study. Our study includes the following general steps (details are in the main text): I. Nine HIV coding regions were prepared according to HXB2 reference strain annotations. II. Each of thecoding regions was randomized 1000 times based on dinucleotide and amino acid preserving randomization model. III. Local MFE profiles were predicted in all 150 nt genomic windows for each wild-type and randomized sequences, separately. IV. Wild-type and randomized MFE profiles were compared to identify genomic sites with significantly strong folding signals (p-value < 0.01, BH-FDR = 0.01). V. 1000 dinucleotide and amino acid preserving randomized variants were generated for each selected region. VI. 1000 structure preserving variants were generated for each selected region. VII. SMR was computed for each selected region and its randomized variants. VIII. Wild-type and randomized SMR values were compared to identify regions in which local RNA structure has evolved a significantly increased robustness to mutations (p-value < 0.05). (B). Prediction of MFE in local windows (red broken-line square) along the coding sequence (brown); each position i in the sequence was assigned with the MFE value predicted in the 150 nt window starting at this position. (C). Modification of the wild-type secondary structure (top) after introducing a single-point mutation (bottom); the mutated nucleotide is marked in red; the base-pair distance between the wild-type and mutated secondary structures (number of changes in the base-pair connections required to transferring one structure into another) in this example is d = 13. (D). Computation of the SMR: L, sequence length; d, base-pair distance between the secondary structure of the wild-type sequence (S(WT)) and the secondary structure of the mutant (S(MT)), averaged over all single-point mutants at all positions along the sequence (total of 3L mutants). MFE, minimum free folding energy; SMR, structure- based mutational robustness.

MFE was estimated in all genomic windows of length 150 nt within each wild-type coding region and its randomized variants, and the resulting values were used to construct local MFE profiles; each position in a profile contained an MFE value computed in a window starting at this position (Fig. 1A III, B).

To identify positions along the coding regions that were possibly selected during the course of viral evolution for significantly strong folding (i.e., more negative MFE), we investigated the positionwise statistical differences between the MFE profiles corresponding to the wild-type sequences and MFE profiles of their randomized variants (Fig. 1A IV). For each coding sequence, we identified positions for which the MFE values were found to be lower than in more than 99% of the corresponding randomized variants (i.e., positions with empiric MFE-associated p-value <0.01) with the Benjamini–Hochberg false discovery rate (BH-FDR) of 1%; regions spanned by these positions have significantly less negative MFE in comparison to random and are conjectured to undergo an evolutionary selection for strong folding (for convenience we call them MFE-selected positions/regions).

To investigate which of the MFE-selected regions have also evolved an increased robustness to mutations associated with its secondary structure, we first defined the structure-based mutational robustness (SMR) measure as an extent to which single-point mutants tend to modify the secondary structure of a region (Fig. 1C, D). The SMR takes values between 0 and 1; the higher the value, the higher is the robustness.

To infer a direct selection for SMR of specific genomic regions and exclude the effect of intrinsic robustness associated with the increased stability of the secondary structure (strongly affected by stacking pairs of nucleotides) and/or the selection for amino acid order and/or content, we used the aforementioned dinucleotide and amino acid preserving randomization model (Fig. 1V).

In addition, to provide the evidence that the increased structural robustness to mutations has evolved independently and not as a side effect of the evolution of specific conformations (certain conformations may be inherently more robust than others), it was essential to demonstrate that a native secondary structure is significantly more robust than phenotypically similar randomized variants. To this aim, we used a second randomization model that controls for the effect of the specific conformation by generating random sequences with exactly the same structure as in the wild type (but not necessary preserving the encoded peptide) (Fig. 1VI).

These randomization models aimed at providing an initial answer to the question whether the SMR values of the analyzed sites can be explained by nondirect factors, other than mutational robustness, such as the selection for specific secondary structure, the selection for folding strength, and/or amino-acid compositions.

For every MFE-selected region, we computed the SMR measure for wild-type coding sequences and their variants, generated by each one of the randomization models (1000 variants per model; Fig. 1VII). The wild-type and randomized SMR values were then compared (for each region and randomization model separately); a region, for which the wild-type SMR was found to be higher than in 95% of the variants (SMR-associated p-value <0.05) for both randomization models (Fig. 1VIII), was hypothesized to undergo a direct evolutional selection for SMR.

In the following section, we present a detailed analysis of our findings. More details about the methods used in the study can be found in Section 4.

2.1. Evidence that specific regions of HIV structural genes undergo an evolutionary selection for strong folding

In each one of the HIV structural genes (gag, pol, and env), one cluster of MFE- selected positions was identified. In env and pol genes, these clusters overlap with the RRE and the ribosomal frameshift site, both known to contain important functional secondary structures (Gaudin et al., 2005; Fernandes et al., 2012).

To the best of our knowledge, the cluster identified in gag gene does not correspond to any known functional site and may play important (yet unknown) roles in the viral life cycle.

Table 1 summarizes our findings.

Table 1.

Clusters of MFE-Selected Positions in HIV Structural Genes

Gene	Site	Site position	Cluster position	References
env	RRE	7758–8050	7709–8063	Fernandes et al. (2012)
pol	Frameshift	2085–2130	2085–2252	Gaudin et al. (2005) and Staple and Butcher (2005)
gag	?	?	1975–2100

The start–end positions of identified clusters of MFE-selected positions, and start–end positions of known functional sites (as given in the references) overlapping with these clusters are specified. The cluster start position records the start position of the first 150 nt MFE-selected window in the cluster; the cluster end position records the end position of the last 150 nt MFE-selected window in the cluster. All coordinates are given with respect to the HXB2 reference genome. We have not found in the literature any known functional site that corresponds to the predicted cluster in gag (the site coordinates are unspecified and therefore marked by “?”

MFE, minimum free folding energy; RRE, Rev responsive element.

In the remaining seven nonstructural genes, we found no statistical evidence of evolutionary selection for strong folding within their coding regions; although several positions where the MFE was more negative than in 99% of the randomized variants (p-value < 0.01) were detected, none of them passed the BH-FDR correction (Fig. 2).

FIG. 2.

Evidence that specific regions of HIV structural genes undergo an evolutionary selection for strong folding. Each panel corresponds to wild-type (blue) and mean randomized (green) MFE profiles for one gene. The y-axis corresponds to the MFE (kcal/mol) in the 150 nt genomic window starting at positions specified along the x-axis (nucleotide coordinated given with respect to the start of the coding region); red points—positions with MFE-related p-value <0.01 (in these positions, the wild-type folding is stronger than in 99% of randomized variants); yellow points—MFE-selected positions (MFE p-value <0.01 & BH-FDR = 0.01), these positions span genomic regions that are conjectured to undergo an evolutionary selection for strong folding. We can see clusters of MFE-selected positions in each one of the structural genes (env, gag, pol); in other genes, no evidence of selection for strong folding was found.

2.2. HIV Rev responsive element undergoes a direct evolutionary selection for increased secondary structure robustness to point mutations

Next, we were interested to check whether the significantly structured regions corresponding to the identified clusters of MFE-selected positions (Table 1) also undergo a direct evolutionary selection for robustness of their secondary structures with mutations in the underlying sequence.

In Figure 3, we can see that RRE was found to be significantly more robust than in random (p-value = 0.028, z-score = 1.6 with respect to the structure preserving randomized variants; p-value = 0.013, z-score = 1.2 with respect to the dinucleotide and amino acid preserving models), and this robustness cannot be explained by the specific secondary structure of the corresponding region, its folding strength, and/or other sequence attributes, such as composition of dinucleotides and amino acids. Thus, we can conclude that our analysis supports the conjecture that the RRE secondary structure undergoes a direct evolutionary selection for mutational robustness.

FIG. 3.

Structure-based mutation robustness of RRE. X-axis—variant id: 1, wild type; 2, 1001–randomized. Y axis—SMR (see Section 4). The red line corresponds to the wild-type SMR value. The p-value (portion of randomized variants with a higher robustness than in wild type) and z-score (number of standard deviation of the wild-type SMR that is higher than the mean randomized SMR) are specified in red. Results for structure preserving (A) and dinucleotide and amino acid preserving (B) randomizations are shown. We can conclude that RRE is significantly more robust than in random, and this robustness cannot be explained by the specific secondary structure of the corresponding region, its folding strength, and/or other sequence attributes, such as composition of dinucleotides and amino acids.

In contrast, we have not found a significant robustness signal for the other two selected sites in pol and gag genes (data not shown).

3. Discussion

We have shown for the first time that previously known functional genomic elements, RRE and gag-pol ribosomal frameshift site, seem to undergo an evolutionary selection for strong local mRNA folding.

RRE is a conserved and highly structured element that consists of ∼350 nucleotides spanning the border of gp120 and gp41 proteins in env coding region. It is a necessary regulatory factor for HIV expression and serves as a binding region on which molecules of the viral protein Rev assemble. The Rev-RRE complex promotes the export of viral mRNAs containing RRE from the nucleus to the cytoplasm, where they are both translated to produce new proteins and serve as a genomic material for new viral particles.

In agreement with the studies that showed that both RRE sequence and structure are necessary for proper functionality (Zapp and Green, 1989; Olsen et al., 1990; Fernandes et al., 2012), we have demonstrated that RRE not only undergoes an evolutionary selection for strong folding but its underlying secondary structure is also directly selected for an increased robustness to mutations.

Previous studies have demonstrated a correlation between the thermodynamic stability of RNA secondary structure and its mutational robustness (Wuchty et al., 1999); therefore, in general, it is plausible that the increased robustness may arise from the increased thermodynamic stability of the analyzed sequences. However, our randomization models, which control both the sequence characteristics affecting the folding strength and the underlying structure, support the conjecture that the discovered robustness has not evolved as a correlated side effect of sequence features, such as thermodynamic stability and/or encoded protein, but underwent a direct positive selection in the course of the viral evolution.

Another important site that has been predicted to undergo an evolutionary selection for strong folding was found to overlap with the beginning of the pol gene, taken to be the first T in the sequence TTTTTTAG. This sequence forms part of the secondary structure that potentiates ribosomal slippage on the RNA, resulting in the −1 frameshift and the translation of the Gag-Pol polyprotein (Jacks et al., 1988; Gaudin et al., 2005; Staple and Butcher, 2005; Low et al., 2014). Although we have shown that this secondary structure seems to undergo an evolutionary selection for strong folding, we found no evidence that it also undergoes an evolutionary pressure toward mutational robustness.

It can be speculated that the folding strength and not the RNA structure itself is important for controlling the efficiency of the frameshift. For example, in Mouzakis et al. (2013), the role of the secondary structure associated with frameshift was investigated and found to have a modest impact on frameshift efficiency, compatible with the hypothesis that the genomic secondary structure attenuates frameshifting by affecting the overall rate of translation. Consistently, in Ritchie et al. (2012), it was found that increased frameshifting efficiency was correlated with an increased tendency to form alternate and incompletely folded structures, suggesting a more complex picture of the role of the structure involving the conformational dynamics.

Finally, we have identified a previously unknown site in the gag gene with a highly significant signal of a strong mRNA folding, which can imply that this site may play an important functional role in the viral replication cycle. Although this site seems to be selected for a strong folding, our analysis revealed no evidence for a direct evolutionary selection for a structure-based robustness therein. Further studies are needed to explore the function of this element and its effect on the viral fitness. This can be done, for example, through a mutagenesis of this region followed with measurements of viral replication rates.

4. Methods

4.1. Data

The analysis presented herein was based on the HXB2 genome (GenBank accession number K03455). HXB2 was selected as the prototype, because it is the most commonly used reference strain for many different kinds of functional studies. All genomic coordinates/positions specified within this study were taken with respect to the start of the HXB2 genome. All the described computations were performed on each gene separately according to the annotated coordinates (“HIV Databases,” n.d.).

4.2. Local MFE profiles

The local MFE profiles were constructed by applying a 150 nt length sliding window to a genomic sequence; in each step, the MFE of a local subsequence enclosed by the corresponding window was calculated by Vienna (v. 2.1.9) package RNAfold function with default parameters (Lorenz et al., 2011). This function predicts the MFE and the associated secondary structure for the input RNA sequence, using a dynamic programming based on the thermodynamic nearest-neighbor approach (the Zucker algorithm) (Zuker and Stiegler, 1981).

4.3. Dinucleotides and amino acids preserving randomization

To model the composition of nucleotide pairs that are argued to have an important effect on formation of secondary structures, a model that preserves both the amino acid order and content and the frequency distribution of 16 possible pairs of adjacent nucleotides (dinucleotides) for each sequence separately was used. Although efficient methods exist for preserving the amino acid (e.g., permutation of synonymous codons) or the dinucleotide content [e.g., random generation of an Euler path in a De Bruijn-like graph, whose edges represent the dinucleotides (Altschul and Erickson, 1985)] separately, it has been difficult to combine them for satisfying both the constraints. To overcome these difficulties, we used an elegant algorithm proposed in Zhang et al. (2013), which is based on a multivariate Boltzmann sampling scheme, initially introduced in the context of enumerative combinatorics. This algorithm produces random variants that feature both correct dinucleotide frequencies and coding capacity, while being generated with provably uniform probability. We used the original source code, which can be found in http://csb.cs.mcgill.ca/sparcs

4.4. Structure-based mutational robustness

Following Borenstein and Ruppin, 2006, we define the SMR of an RNA sequence of length, L, as SMR = 〈(L−d)/L〉, where d is the base-pair distance between the secondary structure of the original sequence and the secondary structure of the mutant, averaged over all 3L one-mutant neighbors. The base-pair distance measures the dissimilarity between two structures as the number of closed pairs that are present in only one of the two structures (i.e., the number of pairs that should be opened or closed to transform one structure to the other). The SMR take values between 0 and 1; the higher the value, the higher is the robustness.

4.5. Structure preserving randomization model

To generate randomized sequences that fold into a given structure, we applied the inverse folding algorithm implemented in the RNAinverse function of Vienna package (Hofacker et al., 1994). Such inversely folded random reference set does not preserve the encoded protein, but perfectly retains the structural phenotype of the wild type.

Footnotes

Acknowledgment

We thank Dr. Richard Neher for very helpful discussions. E.G. is supported, in part, by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel-Aviv University. T.T. is partially supported by the Minerva ARCHES award.

Author Disclosure Statement

No competing financial interests exist.

References

Abbink

T.E.M.

, and Berkhout

2003. A novel long distance base-pairing interaction in human immunodeficiency virus type 1 RNA occludes the Gag start codon. J. Biol. Chem., 278, 11601–11611.

Altschul

, and Erickson

1985. Significance of nucleotide sequence alignments: A method for random sequence permutation that preserves dinucleotide and codon usage. Mol. Biol. Evol., 2, 526–538.

Borenstein

, and Ruppin

2006. Direct evolution of genetic robustness in microRNA. Proc. Natl. Acad. Sci. U. S. A., 103, 6593–6598.

Brakier-Gingras

, Charbonneau

, and Butcher

S.E.

2012. Targeting frameshifting in the human immunodeficiency virus. Expert Opin. Ther. Targets. 16, 249–258.

Churkin

, Cohen

, Shemer-Avni

, et al. 2010. Bioinformatic analysis of the neutrality of RNA secondary structure elements across genotypes reveals evidence for direct evolution of genetic robustness in HCV. J. Bioinform. Comput. Biol., 8, 1013–1026.

Cuevas

J.M.

, Geller

, Garijo

, et al. 2015. Extremely high mutation rate of HIV-1 in vivo. PLoS Biol. 13, e1002251.

Cullen

B.R.

1991. Regulation of HIV-1 gene expression. FASEB J. 5, 2361–2368.

D'Souza

, and Summers

M.F.

2005. How retroviruses select their genomes. Nat. Rev. Microbiol., 3, 643–655.

Damgaard

C.K.

, Andersen

E.S.

, Knudsen

, et al. 2004. RNA interactions in the 5′ region of the HIV-1 Genome. J. Mol. Biol., 336, 369–379.

10.

Fernandes

, Jayaraman

, and Frankel

2012. The HIV-1 Rev response element: An RNA scaffold that directs the cooperative assembly of a homo-oligomeric ribonucleoprotein complex. RNA Biol. 9, 6–11.

11.

Frankel

A.D.

, and Young

J.A.

1998. HIV-1: Fifteen proteins and an RNA. Annu. Rev. Biochem., 67, 1–25.

12.

Gaudin

, Mazauric

M.-H.

, Traïkia

, et al. 2005. Structure of the RNA signal essential for translational frameshifting in HIV-1. J. Mol. Biol., 349, 1024–1035.

13.

Goff

S.P.

, 2007. Host factors exploited by retroviruses. Nat. Rev. Microbiol., 5, 253–263.

14.

Goz

, and Tuller

2015. Widespread signatures of local mRNA folding structure selection in four Dengue virus serotypes. BMC Genomics, 16 Suppl 1, S4.

15.

HIV Databases [www Document], 2016. Available at: www.hiv.lanl.gov/content/index (Accessed March 16, 2016).

16.

Hofacker

I.L.

, Fontana

, Stadler

P.F.

, et al. 1994. Fast folding and comparison of RNA secondary structures. Monatshefte fur Chemie Chem. Mon., 125, 167–188.

17.

Hryckiewicz

, Bura

, Kowala-Piaskowska

, et al. 2011. HIV RNA splicing. HIV AIDS Rev. 10, 61–64.

18.

Jacks

, Power

M.D.

, Masiarz

F.R.

, et al. 1988. Characterization of ribosomal frameshifting in HIV-1 gag-pol expression. Nature, 331, 280–283.

19.

Karn

, and Stoltzfus

C.M.

2012. Transcriptional and posttranscriptional regulation of HIV-1 gene expression. Cold Spring Harb. Perspect. Med. 2, a006916.

20.

Lavner

, and Kotlar

2005. Codon bias as a factor in regulating expression via translation rate in the human genome. Gene, 345, 127–138.

21.

G.-W.

, Oh

, and Weissman

J.S.

2012. The anti-Shine-Dalgarno sequence drives translational pausing and codon choice in bacteria. Nature, 484, 538–541.

22.

Liu

, McNevin

, Cao

, et al. 2006. Selection on the human immunodeficiency virus type 1 proteome following primary infection. J. Virol., 80, 9519–9529.

23.

Lorenz

, Bernhart

S.H.

, Höner Zu Siederdissen

, et al. 2011. ViennaRNA Package 2.0. Algorithms Mol. Biol., 6, 26.

24.

Low

J.T.

, Garcia-Miranda

, Mouzakis

K.D.

, et al. 2014. Structure and dynamics of the HIV-1 frameshift element RNA. Biochemistry, 53, 4282–4291.

25.

Malim

M.H.

, Hauber

, Le

S.Y.

, et al. 1989. The HIV-1 rev trans-activator acts through a structured target sequence to activate nuclear export of unspliced viral mRNA. Nature, 338, 254–257.

26.

Moore

M.D.

, and Hu

W.S.

2009. HIV-1 RNA dimerization: It takes two to tango. AIDS Rev. 11, 91–102.

27.

Mortimer

S.A.

, Kidwell

M.A.

, and Doudna

J.A.

2014. Insights into RNA structure and function from genome-wide studies. Nat. Rev. Genet., 15, 469–479.

28.

Mouzakis

K.D.

, Lang

A.L.

, Vander Meulen

K.A.

, et al. 2013. HIV-1 frameshift efficiency is primarily determined by the stability of base pairs positioned at the mRNA entrance channel of the ribosome. Nucleic Acids Res. 41, 1901–1913.

29.

Nussinov

1984. Strong doublet preferences in nucleotide sequences and DNA geometry. J. Mol. Evol., 20, 111–119.

30.

Olsen

, Nelbock

, Cochrane

, et al. 1990. Secondary structure is the major determinant for interaction of HIV rev protein with RNA. Science, 247, 845–848.

31.

Parkin

N.T.

, Chamorro

, and Varmus

H.E.

1992. Human immunodeficiency virus type 1 gag-pol frameshifting is dependent on downstream mRNA secondary structure: Demonstration by expression in vivo. J. Virol., 66, 5147–5151.

32.

Parmley

J.L.

, and Huynen

M.A.

2009. Clustering of codons with rare cognate tRNAs in human genes suggests an extra level of expression regulation. PLoS Genet. 5, e1000548.

33.

Plotkin

J.B.

, and Kudla

2011. Synonymous but not the same: The causes and consequences of codon bias. Nat. Rev. Genet., 12, 32–42.

34.

Ritchie

D.B.

, Foster

D.A.N.

, and Woodside

M.T.

2012. Programmed −1 frameshifting efficiency correlates with RNA pseudoknot conformational plasticity, not resistance to mechanical unfolding. Proc. Natl. Acad. Sci. U. S. A., 109, 16167–16172.

35.

Schiralli Lester

G.M.

, and Henderson

A.J.

2012. Mechanisms of HIV transcriptional regulation and their contribution to latency. Mol. Biol. Int. 2012, 614120.

36.

Seelamgari

, Maddukuri

, Berro

, et al. 2004. Role of viral regulatory and accessory proteins in HIV-1 replication. Front. Biosci., 9, 2388–2413.

37.

Shankarappa

, Margolick

J.B.

, Gange

S.J.

, et al. 1999. Consistent viral evolutionary changes associated with the progression of human immunodeficiency virus type 1 infection. J. Virol., 73, 10489–10502.

38.

Staple

D.W.

, and Butcher

S.E.

2005. Solution structure and thermodynamic investigation of the HIV-1 frameshift inducing element. J. Mol. Biol., 349, 1011–1023.

39.

Stoltzfus

C.M.

2009. Chapter 1. Regulation of HIV-1 alternative RNA splicing and its role in virus replication. Adv. Virus Res., 74, 1–40.

40.

Tuller

, Veksler-Lublinsky

, Gazit

, et al. 2011. Composite effects of gene determinants on the translation speed and density of ribosomes. Genome Biol. 12, R110.

41.

Tuller

, Waldman

Y.Y.

, Kupiec

, et al. 2010. Translation efficiency is determined by both codon bias and folding energy. Proc. Natl. Acad. Sci. U. S. A. 107, 3645–3650.

42.

Tuller

, and Zur

2014. Multiple roles of the coding sequence 5′ end in gene expression regulation. Nucleic Acids Res. 43, 13–28.

43.

Wagner

, and Stadler

P.F.

1999. Viral RNA and evolved mutational robustness. J. Exp. Zool., 285, 119–127.

44.

Waldispühl

, Devadas

, Berger

, et al. 2008. Efficient algorithms for probing the RNA mutation landscape. PLoS Comput. Biol. 4, e1000124.

45.

Watts

J.M.

, Dang

K.K.

, Gorelick

R.J.

, et al. 2009. Architecture and secondary structure of an entire HIV-1 RNA genome. Nature, 460, 711–716.

46.

Workman

1999. No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution. Nucleic Acids Res. 27, 4816–4822.

47.

Wuchty

, Fontana

, Hofacker

I.L.

, et al. 1999. Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers, 49, 145–165.

48.

Zanini

, Brodin

, Thebo

, et al. 2015. Population genomics of intrapatient HIV-1 evolution. Elife, 4, e11282.

49.

Zanini

, and Neher

R.A.

2013. Quantifying selection against synonymous mutations in HIV-1 env evolution. J. Virol., 87, 11843–11850.

50.

Zapp

M.L.

, and Green

M.R.

1989. Sequence-specific RNA binding by the HIV-1 Rev protein. Nature, 342, 714–716.

51.

Zhang

, Ponty

, Blanchette

, et al. 2013. SPARCS: A web server to analyze (un)structured regions in coding RNA sequences. Nucleic Acids Res. 41, W480–W485.

52.

Zuker

, and Stiegler

1981. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9, 133–148.

53.

Zur

, and Tuller

2012. Strong association between mRNA folding strength and protein abundance in S. cerevisiae. EMBO Rep. 13, 272–277.