Efficient Manipulations of Synonymous Mutations for Controlling Translation Rate: An Analytical Approach

Abstract

Gene translation is a central process in all living organisms with important ramifications for almost every biomedical field. Previous systems evolutionary studies in the field have demonstrated that in many organisms coding sequence features undergo selection to optimize this process. In the current study, we report for the first time analytical proofs related to the various aspects of this process and its optimality. Among our results we show that coding sequences with monotonic increasing profiles of translation efficiency (i.e., with slower codons near the 5′UTR) mathematically optimize ribosomal allocation by minimizing the number of ribosomes needed for translating a codon per time unit. Thus, the genomic translation efficiency profile reported in previous studies for many organisms is optimal in this sense. In addition, we show that improving translation efficiency of a codon in a gene may result in a decrease in the translation rate of other genes, demonstrating that the relation between codon bias and protein translation rate is less trivial than was assumed before. Based on these observations we describe an efficient heuristic for designing coding sequences with specific translation efficiency and minimal ribosomal allocation for heterologous gene expression. We demonstrate how this heuristic can be used in biotechnology for engineering a heterologous gene before expressing it in a new host.

1. Introduction

Gene translation is the process in which a messenger RNA (mRNA) sequence is decoded by a ribosome into a chain of amino acids that can later fold into a protein (Alberts et al., 2002). Translation consists of three major stages: initiation, elongation and termination, which form a recurring cycle of events. The initiation stage includes mainly the binding of a ribosome to the mRNA sequence with the help of several initiation factors. The elongation step is the iterative stage in which a ribosome decodes the codons of the mRNA sequence and translates them into a chain of amino acids. Each iteration of this stage involves the translation of a single codon with the help of a transfer RNA (tRNA) molecule that recognizes it (Alberts et al., 2002). The last stage—termination—involves the disassembly of the ribosome-mRNA complex after approaching a stop codon.

In recent years many systems biology studies have dealt with questions related to the process of gene translation (Kudla et al., 2009; Drummond and Wilke, 2009; Cannarozzi et al., 2010; Kurt and Michael, 2010; Tuller et al., 2010a,b; Taniguchi et al., 2010; Uemura et al., 2010; Reuveni et al., 2011; Plotkin and Kudla, 2010).

A large portion of these studies were based on large-scale measurements of this process generated by new technologies that have matured recently (Arava et al., 2003; Newman et al., 2006; Kudla et al., 2009; Ingolia et al., 2009; Welch et al., 2009; Taniguchi et al., 2010; Uemura et al., 2010; Vogel et al., 2010). However, the systematic study of translation is relatively new compared to the study of transcription, and the field includes many fundamental open questions related to the cellular and coding sequence features that affect its efficiency (Kudla et al., 2009; Welch et al., 2009; Tuller et al., 2010b; Plotkin and Kudla, 2010).

For example, the fact that different codons may have different translation efficiencies has been known for a few decades (Ikemura, 1982; Sharp and Li, 1987; Duret and Mouchiroud, 1999). Based on this observation, it was suggested that a gene's codon composition can affect protein levels (Ikemura, 1982; Sharp and Li, 1987; Duret and Mouchiroud, 1999); thus, measures of the codon bias of a gene can be used for estimating its expression levels or protein abundance (Sharp and Li, 1987; Comeron and Aguad, 1998).

Recently, systems biology studies that were based on large-scale genomic data and cellular measurements have demonstrated that the order of codons, and not only their composition, may also significantly influence the efficiency of translation (Cannarozzi et al., 2010; Kurt and Michael, 2010; Tuller et al., 2010a; Reuveni et al., 2011). Specifically, it was shown that in many species, a “ramp” of codons with lower translation rates tends to appear in the first 30–50 codons of the mRNA. It was suggested that this systematic trend serves as a late stage of translation initiation, forming a means to reduce ribosomal traffic jams, thus minimizing the cost of protein expression by reducing the ribosomal densities over the mRNA sequences (Kurt and Michael, 2010; Tuller et al., 2010a; Plotkin and Kudla, 2010).

In addition, over the years, several mathematical models based on physical and stochastic properties of the translation process were suggested and studied with the intention of mapping and quantifying different factors that could influence translation efficiency from a mathematical point of view. However, these factors were mainly analyzed using computer simulations (MacDonald et al., 1968; Heinrich and Rapoport, 1980; Zhang et al., 1994; Shaw et al., 2003; Reuveni et al., 2011).

In this study, we focus on the elongation stage and on the interaction between initiation and elongation, providing analytical proofs of some of the results previously reported in studies based on systems biology analysis. These observed properties were analyzed in this article for a pool with an infinite number of ribosomes, unless stated otherwise.

Next, we describe a method that uses these translation properties to develop an approach for controlling the translation efficiency of heterologous genes by manipulating the efficiency of their codons with the help of synonymous mutations. The implementation of this heuristic can have various biotechnological applications (Gustafsson et al., 2004; Wenzel and Müller, 2005; Burgess-Brown et al., 2008; Welch et al., 2009; Mueller et al., 2010; Thomas and Ming-Qun, 2011).

The rest of the article is organized as follows: in Sections 2.1, 4.1, and 4.2, we provide some details of the analyzed mathematical models in the study. In Section 4.4, we provide analytical proofs related to properties of translation, which are followed with demonstrations in Sections 2.2, 2.3, and 2.5. Sections 2.6 and 2.7 describe an approach for controlling translation rate of a gene, while Sections 2.7 and 4.6 demonstrate the applications of this method on the human gene insulin. The Discussion includes some implications of the results reported in this study and future directions. All details of the proofs appear in the Methods and Proofs section, and some additional proofs appear in Appendix B.

2. Results

2.1. Translation models that are sensitive to the order of codons

Over the years, several mathematical models for describing translation have been suggested. In this study, we focus on models that can incorporate several important physical properties of the translation process, such as the volume of the ribosome, the different translation rate of each type of codon, and the order of the codons on the mRNA. Initiation time was shown to depend on several features, such as the number of free ribosomes in the cell and features of the 5′UTR, such as the folding energy at the beginning of the ORF (Kozak, 1987; Kudla et al., 2009); therefore, this factor can differ among genes and hosts, and should be parameterized in order to study its influence on the overall translation efficiency.

As a result, the Totally Asymmetric Simple Exclusion Process (TASEP) mathematical model used in previous studies (MacDonald et al., 1968; Heinrich and Rapoport, 1980; Shaw et al., 2003; Reuveni et al., 2011) was chosen for incorporating all these properties. A generic scheme of TASEP is presented in Figure 1A. In this model, ribosomes span several codons, and if two ribosomes are adjacent, the trailing one is delayed until the ribosome in front of it has proceeded onwards (Fig. 1A). Ribosomes were assumed to cover 11 codons (the size of the ribosome footprint in eukaryotes; Ingolia et al., 2009; Tuller et al., 2010a; Reuveni et al., 2011), and the initiation time as well as the time a ribosome spends translating each codon were defined as stochastic, similarly to these processes in nature. Specifically, in this study the initiation and translation times were defined to be exponentially distributed, but other distribution types could also be used. Average codon translation times were defined to be inversely proportional to tRNA abundance in the host.

FIG. 1.

The models of translation. (A) Graphical description of the translation model assumed in this study. Each codon has a translation time. In the case of TASEP, translation times are stochastic, while in the case of DTASEP they are deterministic; ribosomes have volume (and a footprint of several codons) and they can block each other; the models also include initiation and termination times. (B) Histogram of Spearman correlation coefficients calculated between DTASEP and TASEP translation rates based on 1000 permutations of the codons of 100 S. cerevisiae genes; each correlation is calculated on translation rates of 1000 permutations of the codons of the same gene (a control for the codon content). (C) Percentage of translation rate difference between DTASEP and TASEP for the genes in (B). Each bar depicts the mean (center of the bar) and standard deviation (half length of the bar) of the percentage difference between DTASEP and TASEP translation rates. (D) DTASEP translation rates versus protein abundance values of all genes of the S. cerevisiae genome. (E) TASEP translation rates versus protein abundance values of all genes of the S. cerevisiae genome.

Despite the rather simple description of the mathematical equation describing the flow of a single ribosome from one site (codon) to the next, no exact solution for the steady state translation rate exists for this model (Shaw et al., 2003). Because of TASEP's poor mathematical tractability, steady state ribosomal occupation and translation rate are calculated for this model using computer simulations.

However, to allow for analytical study of the translation process, we examined a deterministic model with similar properties, named Deterministic Totally Asymmetric Simple Exclusion Process (DTASEP) that has been employed in a few previous studies (Zhang et al., 1994; Tuller et al., 2010a). Thus, all analytical proofs of the various translation properties reported in this work were done on DTASEP. To show these properties' relevance to TASEP, these were also demonstrated using TASEP simulations.

Reuveni et al. (2011) suggested approximating TASEP with the deterministic Ribosome Flow Model (RFM). Therefore, several theorems reported in this article that are also correct for the RFM model were proved and appear in Appendix B together with a description of the RFM. In addition, in this study S. cerevisiae was used as a model organism to demonstrate some of the presented claims.

To supply justification of the selected mathematical models for describing the translation process, translation rates of all 5869 genes of the S. cerevisiae genome were calculated with TASEP and DTASEP. Spearman correlation between the calculated translation rates and the genes' Protein Abundance (PA) resulted in fair correlations (TASEP: R = 0.55, P < 10⁻¹⁵; DTASEP: R = 0.53, P < 10⁻¹⁵; Fig. 1D, E) suggesting that these models can be used for describing the translation process.

Recently it was suggested that the codon order of a gene can influence its translation efficiency (Cannarozzi et al., 2010; Kurt and Michael, 2010; Tuller et al., 2010b); we therefore compared the ability of DTASEP and TASEP to capture this effect. For this task, 100 different S. cerevisiae genes with different PA were selected, and the codon order of each gene was permuted 1000 times. For each gene we computed the Spearman correlation coefficient between the translation rate predictions of DTASEP and TASEP. The results show that, on average, the two models correlate highly (mean correlation: R = 0.81, P < 10⁻⁸; for more details, see Fig. 1B, C and Section 4.3). Further investigation showed that the DTASEP and TASEP models also yield similar numerical values, with TASEP almost consistently resulting in lower rates (on average, translation rates calculated with DTASEP were higher by 17.75 ± 9.5% than those calculated with TASEP; Fig. 1C). Therefore, these results provide justification for the approximation of TASEP by DTASEP.

Throughout the paper several basic symbols are used, denoting key parameters of the DTASEP and TASEP models. To ease reading, these symbols are summarized as follows: let L denote the number of codons in a gene and let H denote the number of codons a ribosome covers. In this study we use the tilde accent to mark random variables. Let U_i and $\tilde{U}_i$ denote the time it takes a ribosome to translate codon i in the DTASEP and TASEP models, respectively. Let B and $\tilde{B}$ denote the initiation time in the DTASEP and TASEP models, respectively. In this paper we analyze DTASEP at steady state, i.e., B(t) = B, U_i(t) = U_i. More details about the models appear in Sections 4.1 and 4.2.
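To make the roles of L, H, U_i, and B concrete, the following is a minimal sketch of a deterministic exclusion recurrence. It is an illustrative simplification and not the DTASEP definition of Section 4.1: the blocking rule, the treatment of B as a minimal spacing between consecutive initiations, the function names `simulate` and `steady_state_rate`, and the codon times are all assumptions made here.

```python
def simulate(U, H, B, n_ribosomes):
    """Return (initiation_time, exit_time) per ribosome for a deterministic sketch.

    U: per-codon translation times; H: ribosome footprint in codons;
    B: minimal spacing between consecutive initiations (an assumption).
    """
    L = len(U)
    prev = [float("-inf")] * (L + 1)   # times of the ribosome ahead (none yet)
    history = []
    for _ in range(n_ribosomes):
        cur = [0.0] * (L + 1)          # cur[i]: front reaches codon i; cur[L]: exit
        # Initiation waits for the spacing B and for the first H codons to clear.
        cur[0] = max(0.0, prev[0] + B, prev[min(H, L)])
        for i in range(L):
            # Finish codon i, then wait until the ribosome ahead has cleared codon i + 1.
            cur[i + 1] = max(cur[i] + U[i], prev[min(i + 1 + H, L)])
        history.append((cur[0], cur[L]))
        prev = cur
    return history


def steady_state_rate(U, H, B, n_ribosomes=400):
    """Estimate proteins per time unit from the exit times of the later ribosomes."""
    exits = [exit_time for _, exit_time in simulate(U, H, B, n_ribosomes)]
    half = n_ribosomes // 2
    return (n_ribosomes - 1 - half) / (exits[-1] - exits[half])


# Hypothetical codon times (not taken from any real gene).
codon_times = [40, 120, 20, 300, 60, 20, 80, 20]
for B in (1, 1000, 10000):
    print(B, steady_state_rate(codon_times, H=3, B=B))
```

In this sketch the rate is measured only over the second half of the simulated ribosomes, so that the start-up transient from an empty mRNA does not bias the estimate.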

2.2. The influence of initiation time on translation rate

We start by studying the relation between initiation time and elongation rate. It was suggested before that either the initiation or the elongation step of the translation process can be rate limiting (Kudla et al., 2009; Tuller et al., 2010b; Reuveni et al., 2011; Plotkin and Kudla, 2010). In this article we formally show that for DTASEP, high initiation time values (relative to the translation times of the gene's codons) have a major impact on translation rate. This property is formally stated in the following lemma:

Lemma 2

A high initiation time $B \gg \{U_j\}_{j=1}^{L}$ becomes the rate-limiting factor of translation.

The full proof of Lemma 2 based on the DTASEP model appears in Section 4.4. To demonstrate this claim, translation rates of 100 different genes from the S. cerevisiae genome were calculated using different initiation times, ranging from 1 to 15000 time units, while the host's codon translation times varied between 16.8 and 1355 time units. As seen in Figure 2A, for low initiation time values (1 time unit) the mean and standard deviation of the calculated translation rates are relatively high (4.9 × 10⁻⁴ ± 2 × 10⁻⁴), indicating that the translation rate of the genes is mainly dictated by the translation times of the codons composing the genes. For high initiation time values (10000 time units) the mean translation rate and its standard deviation decreased (to 9.28 × 10⁻⁵ ± 3.23 × 10⁻⁶), suggesting that translation rate is mainly dictated by the initiation time regardless of the genes' codon composition.
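As a rough consistency check of the high-initiation-time regime, assume (a simplification made here, not the formal statement proved in Section 4.4) that consecutive ribosomes cannot initiate less than B time units apart. The steady-state rate R is then bounded by 1/B, and for B = 10000 time units the bound lies close to, and above, the reported mean:

```latex
R \le \frac{1}{B}, \qquad
B = 10^{4}\ \text{time units} \;\Rightarrow\; \frac{1}{B} = 1.0 \times 10^{-4}
\quad \text{(reported mean: } 9.28 \times 10^{-5}\text{)}.
```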

FIG. 2.

The effect of initiation time on translation rate. (A) Mean and standard deviation of translation rates of 100 S. cerevisiae genes for different initiation times. (B) A specific example: TASEP translation rate as a function of initiation time for the YJR011C gene from the S. cerevisiae genome, consisting of 103 codons. The upper plot of (B) shows the translation rate versus the initiation time, and the lower plot of (B) shows the translation time distribution of the gene's codons. When the initiation time is higher than the translation time of the slowest codon in the coding sequence (cyan vertical bar around 1400 time units), it becomes a “bottleneck” and the translation rate starts to decrease significantly.

This behavior was also demonstrated on a single gene (YJR011C). As seen in Figure 2B, the translation rate remained almost constant for initiation times lower than the translation time of the slowest codon in the gene (1355 time units). An increase in initiation time beyond this value caused a decrease in translation rate, and eventually the initiation time entirely dictated it.

Thus, since we aimed at isolating the effect of codons (their composition and ordering) on translation rates, we neutralized the initiation time effect in Sections 2.3, 2.5, and 2.6 by setting the initiation time parameter to be lower than the translation time of the fastest codon in the genome (in practice, 65% of the fastest codon).

2.3. The influence of increasing translation time of a codon on translation rate

Intuitively, decreasing translation time of a codon in a gene should not decrease its translation rate. We formally phrase this claim into the following statement:

Lemma 3

A decrease in the translation time U_i of one of a gene's codons cannot decrease the translation rate R of the gene.

The full proof of Lemma 3 appears in Section 4.4, and in this section it is also demonstrated on the same 100 genes selected for demonstrating Lemma 2 in Section 2.2. For each gene a random codon was selected and its translation time was varied from 16.8 to 1400 in steps of 50 time units. Figure 3A presents the mean and standard deviation of the calculated translation rates for the selected genes as a function of the translation time of the manipulated codon, while Figure 3B shows the translation rate values for four selected genes (LIP5, YDR248C, PAU12, and RPS24B). As seen from this simulation, a high translation time of even a single codon can drastically reduce the translation rate regardless of the gene's codon composition, while decreasing a codon's translation time can increase the translation rate only up to a certain saturation value that is mainly dictated by the translation times of the other codons in the gene. This behavior was observed to be general, as demonstrated in Figure 3A.

FIG. 3.

Influence of the translation time of a single codon on the translation rate of a gene. (A) Average TASEP translation rates and standard deviation of 100 different genes from the S. cerevisiae genome as a function of the translation time of a randomly selected codon. (B) Translation rates of four selected genes as a function of the translation time of a single randomly selected codon.

Therefore, adding a codon to the mRNA chain cannot increase translation rate, regardless of its value. This modification can be also perceived as increasing translation time of a codon from 0 to a finite value, which results in the inverse behavior described by Lemma 3. This observation was formalized in the following statement and its full proof appears in Section 4.4.

Lemma 4

Adding a codon to the mRNA cannot increase its translation rate R.

This lemma also suggests that for low initiation time values, where codon composition can be influential, longer genes have lower translation rates. This result partially supports the calculated negative correlation between the lengths of all genes in S. cerevisiae and their measured PA values (R = −0.17, P < 10⁻¹⁶; Fig. 4A). TASEP translation rates and gene lengths were also found to correlate negatively (Spearman R = −0.43, P < 10⁻¹⁶; Fig. 4B), as suggested by the lemma.

FIG. 4.

(A) Spearman correlation between protein abundance and gene lengths for the S. cerevisiae genome. (B) Spearman correlation between TASEP predicted translation rates and gene lengths for S. cerevisiae genome.

2.4. Increasing translation rate of a codon(s) may have a negative effect on translation rate of a different gene in the host when the pool of ribosomes is finite

In the previous subsection it was shown that decreasing translation time of a codon in a gene cannot decrease its translation rate (Lemma 3). However, this modification may increase ribosomal density on the altered mRNA, and as a result may decrease the number of available free ribosomes in the cell. A steep decrease in the amount of available free ribosomes in the host can lead to ribosomal starvation and eventually reduce translation efficiency of other genes.

This effect was demonstrated using two different genes (SIF2 and RPL27B) taken from S. cerevisiae. First, we calculated the translation rates of the two selected genes and the number of ribosomes allocated on each one of the mRNAs (for an infinite number of ribosomes in the pool). Next, we created a mutated fast version of one of the genes (SIF2) by reducing the translation times of some of its codons and recalculated the translation rate and number of allocated ribosomes. The genes' codon translation times are described in Figure 5A, while the calculated translation rates and ribosomal densities are summarized in Table 1.

FIG. 5.

A simulation showing that improving the translation rate of some codons in a gene can decrease the translation rate of a different gene. (A) Codon translation times of the analyzed genes, from left to right: gene SIF2 before (blue) and after mutations (red), and gene RPL27B (green). (B) Average number of ribosomes on a single mRNA. (C) Average translation rate of the mutated SIF2 gene (cyan circles) and RPL27B (blue squares) as a function of the number of ribosomes in the pool, calculated with the TASEP model in a pool of six mRNA copies per gene. The simulation starts with the number of ribosomes needed for translating genes SIF2 and RPL27B and increases this value until reaching the number of ribosomes needed for translating the mutated version of the SIF2 gene and the RPL27B gene.

Table 1.

Translation Rate and Number of Allocated Ribosomes on mRNAs of SIF2, Mutated SIF2, and RPL27B Genes for an Infinite Pool of Ribosomes

	SIF2	Mutated SIF2	RPL27B
Translation rate	0.00046	0.00086	0.00138
Number of ribosomes on mRNA	26.5	43.5	9.9

As can be seen from Table 1, for an infinite number of ribosomes in the pool, reducing the translation times of some of SIF2's codons increased the translation rate of the gene by a factor of almost two, but also increased the number of ribosomes on its mRNA (from 26.5 to 43.5).
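The fold changes implied by Table 1 can be checked with a few lines of arithmetic. The sketch below only re-uses the numbers in the table; the dictionary layout and the helper name `rate_per_ribosome` are illustrative choices, and the ratio it computes is meant in the sense of the translation rate per ribosome discussed in Section 2.5.

```python
# Values copied from Table 1 (infinite ribosome pool).
table1 = {
    "SIF2":         {"rate": 0.00046, "ribosomes": 26.5},
    "Mutated SIF2": {"rate": 0.00086, "ribosomes": 43.5},
    "RPL27B":       {"rate": 0.00138, "ribosomes": 9.9},
}


def rate_per_ribosome(entry):
    """Translation rate divided by the number of ribosomes on the mRNA."""
    return entry["rate"] / entry["ribosomes"]


# How much the mutation changed the rate and the ribosome load of SIF2.
rate_fold = table1["Mutated SIF2"]["rate"] / table1["SIF2"]["rate"]
ribosome_fold = table1["Mutated SIF2"]["ribosomes"] / table1["SIF2"]["ribosomes"]

print(f"rate fold change:     {rate_fold:.2f}")
print(f"ribosome fold change: {ribosome_fold:.2f}")
for name, entry in table1.items():
    print(f"{name}: rate per ribosome = {rate_per_ribosome(entry):.3e}")
```

The rate rises by a factor of about 1.87 while the ribosome load rises by a factor of about 1.64, so the mutated gene ties up roughly 17 additional ribosomes per mRNA copy.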

In order to demonstrate the effect of this change in a host with a finite number of ribosomes, we simulated an environment of six mRNA copies for each one of the two genes (the mutated SIF2 and RPL27B). The simulation starts with a number of ribosomes that is equal to the needed number of ribosomes for each one of the genes (before the mutation) multiplied by six. In the case of a finite number of ribosomes, we also assume that the initiation rate (initiation time⁻¹) is proportional to the current number of available ribosomes in the pool; thus the initiation time of each one of the translated mRNAs was defined as:

$$B(t) = c \, \frac{\mathit{numRibosomesInPool}}{\mathit{numFreeRibosomes}(t)}$$

where c is a normalization factor that was defined as the minimal translation time of codons in the host. Therefore we started with a pool that contains the number of ribosomes needed for translating the original SIF2 and RPL27B genes and in each step of the simulation we increased the number of ribosomes until reaching the needed number of ribosomes for the mutated SIF2 and RPL27B genes.
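The initiation-time rule and the two pool sizes that bracket the simulation can be written out directly. This is a minimal sketch: the function name `initiation_time`, the handling of an empty pool, and the choice c = 16.8 (the smallest codon translation time quoted in Section 2.2) are assumptions made for illustration; the ribosome counts come from Table 1.

```python
def initiation_time(c, num_ribosomes_in_pool, num_free_ribosomes):
    """B(t) = c * numRibosomesInPool / numFreeRibosomes(t)."""
    if num_free_ribosomes <= 0:
        return float("inf")   # assumption: no free ribosome means no initiation
    return c * num_ribosomes_in_pool / num_free_ribosomes


C = 16.8            # assumed: minimal codon translation time in the host
COPIES = 6          # mRNA copies per gene in the simulated environment

# Ribosomes needed per mRNA for an infinite pool (Table 1).
SIF2, MUTATED_SIF2, RPL27B = 26.5, 43.5, 9.9

start_pool = COPIES * (SIF2 + RPL27B)          # about 218.4 ribosomes
end_pool = COPIES * (MUTATED_SIF2 + RPL27B)    # about 320.4 ribosomes

print(round(start_pool, 1), round(end_pool, 1))
# With every ribosome free, B equals c; with half of them free, B doubles.
print(initiation_time(C, 200, 200), initiation_time(C, 200, 100))
```

Under this rule the initiation time grows without bound as the free pool empties, which is how the starvation of one gene by another enters the simulation.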

Figures 5B and 5C present the steady state translation rates and number of ribosomes on the mutated SIF2 (cyan circles) and RPL27B (blue squares) genes for different numbers of ribosomes in the pool. As can be seen from the figures, the translation rate of the mutated SIF2 gene did not change significantly for various amounts of free ribosomes in the pool (cyan circles) and was similar to the translation rate calculated for a pool with an infinite number of ribosomes (Table 1). However, for a low number of ribosomes in the pool, the translation rate of the RPL27B gene decreased significantly from 0.00138 to 0.00084 (blue squares). Increasing the number of ribosomes in the pool in the presence of the mutated SIF2 gene caused the translation rate of the RPL27B gene to increase gradually until reaching a value similar to the one obtained for an infinite number of ribosomes (0.00132 vs. 0.00138).

For a low number of ribosomes in the pool, the average number of ribosomes per mRNA in the simulation decreased for both the mutated SIF2 gene (30.2 vs. 43.5) and the RPL27B gene (4.2 vs. 9.9), as can be seen in Figure 5B. This value increased for both genes as the number of ribosomes in the pool increased, reaching values similar to those estimated with an unlimited number of ribosomes in the pool (45.49 vs. 43.5 for the mutated SIF2 gene and 8.23 vs. 9.9 for the RPL27B gene).

Although ribosomal density of both genes was reduced for a low number of available ribosomes in the pool, translation rate of the mutated SIF2 gene did not decrease significantly. This could be explained by the specific features of the mutated gene: it was engineered with relatively efficient codons at its 5′ end, while codons at the 3′ end preserved their original relatively high translation time values. When a high number of ribosomes were available in the pool, they tended to accumulate on the 3′ end of the mutated gene, causing ribosomes to spend more time on its mRNA due to ribosomal “traffic jam.” Decreasing the number of available ribosomes up to a certain level (i.e., when no ribosome is delayed by other ribosomes) reduced the number of accumulated ribosomes on the mRNA without reducing translation time.

On the other hand, the ability of the mutated SIF2 gene to accumulate ribosomes reduced the number of free ribosomes in the pool, causing a decrease in the ribosomal allocation of the RPL27B gene, which eventually led to ribosomal starvation and a decrease in its translation rate. When the number of ribosomes in the pool was increased, this rate increased until reaching the translation rate value calculated for this gene when no restrictions were made on the number of available ribosomes.

It is important to note that the phenomenon reported in this section highly depends on the location of the altered codons and the magnitude of change in their translation time. Decreasing translation time of codons that do not eventually increase mRNA ribosomal density has no effect on the number of available ribosomes in the host, thus cannot affect in this manner the translation rate of other genes. This simulation aimed at demonstrating that increasing translation rate of a specific gene can reduce protein production rate of other genes, thus potentially interfering with the growth rate of the host. With this in mind, changing translation rate of a selected gene by using synonymous mutations should take into consideration changes in ribosomal allocation. In the next sections we show how to include this constraint in the design of genes with altered translation rates.

2.5. Attaining a low ribosomal allocation limit

As mentioned in the introduction, Tuller et al. (2010a) showed that on average the first 30–50 codons of a gene are translated with lower efficiency. In addition, in eukaryotes, the last ∼50 codons were observed to show higher translation efficiency. Inspired by this observation, we show in this section that given a gene with a finite set of codons, arranging the codons by monotonically decreasing translation times minimizes the total time a ribosome is delayed by other ribosomes on the mRNA, and thus achieves, in steady state, the lowest number of allocated ribosomes on the mRNA and the highest translation rate per ribosome. This phenomenon is formulated in Lemmas 5 and 6 and proved in Section 4.4. Thus, this observed trend in endogenous genes is mathematically optimal.

Lemma 5

Given a gene with a finite set of codons $\{U_i\}_{i=1}^{L}$, sorting the codons in descending order according to their translation times achieves minimal ribosomal allocation among all codon permutations of this gene.

Lemma 6

Given a gene with a finite set of codons $\{U_i\}_{i=1}^{L}$, sorting the codons in descending order according to their translation times maximizes translation rate per ribosome among all codon permutations of this gene.

In such an arrangement, no ribosome is delayed by other ribosomes on the mRNA, and each ribosome therefore spends on each site only the time required for translating the relevant codon. The overall time a ribosome spends on the mRNA is thus minimized to a value dictated only by the translation efficiency of its codons.

This property was also demonstrated for the same 100 selected genes used in Sections 2.1, 2.2, and 2.3. For each one of the genes we performed 1000 random codon permutations and compared the number of ribosomes allocated on each permutation to the number of ribosomes obtained when ordering its codons according to their translation time in a monotonic decreasing order.

Figure 6A depicts for each gene the average, minimal and maximal steady state number of allocated ribosomes on the mRNA (using a vertical blue bar), and the number of allocated ribosomes for a monotonic decreasing arrangement (red square). The simulation results show that indeed the suggested arrangement of Lemma 5 attains minimal ribosomal allocation. A closer examination also showed that on average, monotonic decreasing arrangements induced only 22% of the mean number of ribosomes inferred for other gene permutations, thus suggesting that non-optimal genes in terms of ribosomal allocation can have a major influence on the host's resources.

FIG. 6.

A codon permutation with monotonic decreasing codon translation times attains the minimal number of ribosomes and the maximal translation rate per ribosome in comparison to all other permutations. Analysis results of 100 S. cerevisiae genes calculated with the TASEP model. For each gene, the number of ribosomes on the mRNA (A) and the translation rate per ribosome (B) were calculated for 1000 different codon permutations and compared to the decreasing arrangement of codons (red squares). As can be seen, in all 100 cases the sequence with monotonic decreasing translation times had the lowest ribosomal density and the highest translation rate per ribosome.

Moreover, as proved by Lemma 6, a monotonic decreasing codon arrangement maximizes the ratio of translation rate to ribosomal allocation (see parameter K in Section 4.1). Figure 6B shows the range of this ratio for each one of the 100 genes (calculated for 1000 codon permutations) using a blue bar, and the ratio achieved using the monotonic arrangement, denoted with red squares. On average, this ratio was 80% higher for the monotonic arrangements in comparison to the mean ratio inferred for other gene permutations. These results indicate that the maximal translation rate per ribosome (i.e., the minimal time needed for a ribosome to translate the mRNA) can be achieved by using a monotonic arrangement in terms of codon translation times.
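The ordering effect stated in Lemmas 5 and 6 can be illustrated with a small deterministic experiment. The sketch below is not the DTASEP or TASEP implementation used for Figure 6: the blocking recurrence, the use of Little's law (ribosomes on the mRNA = rate × mean time on the mRNA), the names `simulate` and `ribosomes_on_mrna`, and the randomly generated codon times are all assumptions made for illustration.

```python
import random


def simulate(U, H, B, n_ribosomes=400):
    """Return (steady-state rate, mean time on the mRNA) for a deterministic sketch."""
    L = len(U)
    prev = [float("-inf")] * (L + 1)   # times of the ribosome ahead (none yet)
    history = []
    for _ in range(n_ribosomes):
        cur = [0.0] * (L + 1)          # cur[i]: front reaches codon i; cur[L]: exit
        cur[0] = max(0.0, prev[0] + B, prev[min(H, L)])
        for i in range(L):
            # Translate codon i, then wait for the ribosome ahead to clear codon i + 1.
            cur[i + 1] = max(cur[i] + U[i], prev[min(i + 1 + H, L)])
        history.append((cur[0], cur[L]))
        prev = cur
    late = history[n_ribosomes // 2:]  # discard the start-up transient
    rate = (len(late) - 1) / (late[-1][1] - late[0][1])
    mean_transit = sum(exit_t - entry_t for entry_t, exit_t in late) / len(late)
    return rate, mean_transit


def ribosomes_on_mrna(U, H, B):
    """Little's law: mean ribosomes on the mRNA = rate * mean time on the mRNA."""
    rate, mean_transit = simulate(U, H, B)
    return rate * mean_transit


rng = random.Random(0)
gene = [rng.choice([20, 40, 80, 160, 320]) for _ in range(30)]  # hypothetical codon times
sorted_gene = sorted(gene, reverse=True)                          # slowest codons first
H, B = 3, 1

n_sorted = ribosomes_on_mrna(sorted_gene, H, B)
n_shuffled = []
for _ in range(100):
    perm = gene[:]
    rng.shuffle(perm)
    n_shuffled.append(ribosomes_on_mrna(perm, H, B))

print("descending order:", n_sorted)
print("random permutations (min, mean, max):",
      min(n_shuffled), sum(n_shuffled) / len(n_shuffled), max(n_shuffled))
```

In this sketch a ribosome on the descending arrangement is never blocked after initiation, so its time on the mRNA equals the sum of the codon times, which is the smallest value any permutation can reach.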

2.6. A heuristic for controlling the translation efficiency of a gene while minimizing its ribosomal allocation

An efficient heuristic for designing heterologous genes with a specific translation rate and minimal ribosomal density was motivated by the optimality principle of Lemma 5.

Of course, in reality, the codon order of a gene cannot be changed without altering the coding of its amino acids; therefore, the arrangement suggested by Lemma 5 is usually not possible. However, we can utilize this property by building different codon sequences using only synonymous mutations (that do not alter the gene's coding) with monotonically (or almost monotonically) decreasing codon translation times, and searching among them for the sequence whose translation rate is closest to the requested value. Thus, the new engineered gene attains the requested rate, but with a minimal ribosomal allocation, which reduces the influence on the translation rates of other genes (as demonstrated in Section 2.4).

We name this heuristic Minimal Ribosomal density Target translation Rate (MRTR) heuristic. MRTR can be implemented with any model of translation that satisfies the Lemmas reported in this study (DTASEP, TASEP, or other similar model) and is based on two principles:

1. Translation rate of a series can be increased by decreasing the translation time of one of its codons (Lemma 3)

2. Minimal ribosomal allocation can be achieved by engineering a series with codons arranged by monotonic decreasing times (Lemma 5)

For simplicity, let us define the translation times of a sequence by using the translation time ranking of its codons. If the number of different possible translation times in an organism is P, we consider sequences of integers with values between 1 (lowest translation time) and P (highest translation time). Thus in general, a coding sequence of L codons induces P^L possible sequences of translation ranks.

To find a series of codons with a specific translation rate and minimal ribosomal allocation coding a specific gene, binary search could be applied on only monotonic decreasing series (in terms of translation times) sorted by their translation rates.
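To give a feel for how much the restriction to monotonic decreasing series shrinks the search space, the following minimal sketch enumerates all non-increasing rank series for given P and L and compares their number with P^L. The function name `descending_series` is an illustrative assumption and is not part of the original study; the count C(L + P − 1, L) is the standard number of multisets of size L drawn from P values.

```python
from itertools import combinations_with_replacement
from math import comb


def descending_series(P, L):
    """Return all rank series of length L over 1..P that are non-increasing.

    combinations_with_replacement keeps elements in input order, so feeding
    it the ranks P, P-1, ..., 1 yields only non-increasing tuples.
    """
    return [list(c) for c in combinations_with_replacement(range(P, 0, -1), L)]


if __name__ == "__main__":
    P, L = 4, 3
    series = descending_series(P, L)
    print("all rank series:", P ** L)
    print("descending rank series:", len(series), "=", comb(L + P - 1, L))
```

Only the descending series need to be ordered by translation rate, which is what makes a fast search over them practical.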

It was already shown by Lemma 3 that the translation rate of a gene can be increased by decreasing the translation time of one of the codons in the sequence, but a strict ranking of all possible codon combinations cannot be obtained from this principle alone. For example, given a sequence of three codons with P = 3, it is easy to determine by using Lemma 3 that the translation rate of series [3,3,3] is lower than or equal to that of series [3,3,2], but the relation between the translation rates of series [3,3,1] and series [3,2,2] is not straightforward.

Therefore, a more systematic way is needed to sort all descending sequences according to their translation rate, to enable the application of a fast search algorithm. In this study, the translation rate of a monotonic decreasing series is increased by decreasing the rank order of a single codon by a single rank unit, in one of two ways:

1. Right Rule: decrease the rank order of the codon that already has the lowest rank that is not 1. In case several codons fulfill this condition, decrease the codon that is closest to the 3′UTR.

2. Left Rule: decrease the translation rank order of the codon with the highest rank. In case several codons fulfill this condition, decrease the codon that is closest to the 3′UTR.

The suggested options were designed to preserve the series' monotonic decreasing characteristic. For example, for P = 5 and L = 4, the translation rate of a series with codons represented by rank orders [5,5,5,3] could be increased by applying the Right Rule to obtain [5,5,5,2] or by applying the Left Rule to obtain [5,5,4,3].
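The two rules can be sketched as follows. This is a minimal illustration operating on lists of rank orders written 5′ to 3′; the function names `right_rule` and `left_rule`, and the choice to return `None` when no codon can be decreased, are assumptions made for the example.

```python
def right_rule(series):
    """Decrease by one the lowest rank that is not 1 (rightmost on ties)."""
    candidates = [r for r in series if r > 1]
    if not candidates:
        return None  # every codon already has rank 1
    target = min(candidates)
    # On ties, pick the codon closest to the 3' end (largest index).
    idx = max(k for k, r in enumerate(series) if r == target)
    out = list(series)
    out[idx] -= 1
    return out


def left_rule(series):
    """Decrease by one the highest rank (rightmost on ties)."""
    target = max(series)
    if target == 1:
        return None  # every codon already has rank 1
    idx = max(k for k, r in enumerate(series) if r == target)
    out = list(series)
    out[idx] -= 1
    return out


if __name__ == "__main__":
    print(right_rule([5, 5, 5, 3]))
    print(left_rule([5, 5, 5, 3]))
```

Choosing the rightmost tied codon is what keeps the result non-increasing: under the Right Rule everything to its right already has rank 1, and under the Left Rule the next codon has a strictly lower rank.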

Using these two rules, all descending series of L amino acids, each coded by P different codons, can be sorted according to their translation rate in a 2-D lattice. More details about the building of the lattice appear in 4.6.1. For example, Figure 9 shows all possible descending series for P = 4, L = 3. Each diagonal in the lattice contains a set of series that are ordered with respect to their translation rate, so that the series located at the top of a diagonal has the slowest translation rate and the series located at the bottom of a diagonal has the highest translation rate (with respect to all series in the diagonal).

To find a sequence with translation rate that is as close as possible to a specific wanted translation rate, a simple binary search could be applied on the translation rates of the series of each diagonal (Fig. 9). Let ɛ denote a certain threshold. Eventually candidate series from each diagonal that are in the limits of ± ɛ% distance from the original wanted translation rate value are re-evaluated. If several series fit this criterion, then the series with the lowest ribosomal allocation can be chosen.
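The search along one diagonal can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes the diagonal is supplied as a list already ordered from slowest to fastest translation rate, and `rate_fn` stands in for whichever translation model (DTASEP, TASEP, or similar) is used. The toy rate function in the usage part is purely hypothetical.

```python
def closest_on_diagonal(diagonal, rate_fn, target):
    """Binary search for the series whose rate is closest to target.

    diagonal: non-empty list of series, assumed sorted by increasing rate.
    rate_fn:  callable mapping a series to its translation rate.
    Returns (series, rate, number_of_rate_evaluations).
    """
    cache = {}

    def rate(k):
        # Each rate evaluation is the expensive O(f(L)) step, so cache it.
        if k not in cache:
            cache[k] = rate_fn(diagonal[k])
        return cache[k]

    lo, hi = 0, len(diagonal) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if rate(mid) < target:
            lo = mid
        else:
            hi = mid
    best = min((lo, hi), key=lambda k: abs(rate(k) - target))
    return diagonal[best], rate(best), len(cache)


def within_tolerance(rate, target, eps_percent):
    """True if rate lies within +/- eps_percent of the target rate."""
    return abs(rate - target) <= target * eps_percent / 100.0


if __name__ == "__main__":
    # Hypothetical diagonal and toy rate model (rate falls as ranks grow).
    diagonal = [[4, 4, 4], [4, 4, 3], [4, 4, 2], [4, 4, 1]]
    toy_rate = lambda s: 1.0 / sum(s)
    series, r, evaluations = closest_on_diagonal(diagonal, toy_rate, 0.1)
    print(series, r, evaluations, within_tolerance(r, 0.1, 5))
```

Candidates returned from each diagonal that pass `within_tolerance` would then be compared by ribosomal allocation, as described above.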

It is worth mentioning that this type of mapping does not assure the uniqueness of a series in the lattice; therefore, search time could be further reduced by using the fact that a series has a lower or equal translation rate in comparison to its left and right child series and their descendants. Let us assume that for a given diagonal, the wanted translation rate is bounded by two series S_j, S_j₊₁, where S_j₊₁ is the child of S_j. Using the above property, all child series of the upper bound have a higher or equal translation rate, and can therefore be excluded from further searching. An example of this optimization appears in Figure 11 using the lattice presented in Figure 10. Here, the search starts from the outer left diagonal and the upper and lower bound series are contoured with green boxes. Child series of the upper bound in other diagonals (having translation rates higher than the upper bound series) are covered with a gray grid, and thus can be excluded from the search. This step can be applied again when defining the lower and upper bound in the next diagonal to further reduce the number of tested series in the lattice.

Overall, the search time complexity of MRTR including the optimization step is O(f(L)(log(PL))²) and its space complexity is O(PL), where O(f(L)) is the time complexity of calculating the translation rate of a series of L codons. For more details regarding complexity calculations, see Section 4.6.

Applying the MRTR heuristic on real data measurements

When applying the heuristic on real genes, codons coding an amino acid have less than P possible values and translation rates cannot be calculated on series of codons rank order. Therefore to get a ‘full’ lattice with real translation times, for each amino acid we simply mapped each of the P possible ranking values to the real translation times of the codons coding it. An example of this mapping could be seen in Figure 12, where the host has P = 10 different codon translation times, but the specific presented amino acid can be coded by only three codons, represented by three rank orders. Therefore, several rank orders for this amino acid are missing real translation time values. To overcome this, the missing values were mapped to the closest lower rank order (and codon) that has an existing translation time. As a result, rank order series in the lattice that are selected for translation rate estimation by the binary search can be translated into series of real translation times that represent a possible codon combination (that does not alter the original gene coding).
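The rank-to-codon mapping described above can be sketched as follows. This is a minimal illustration under stated assumptions: the codon labels and translation times are hypothetical placeholders, and the fallback used for ranks that lie below the amino acid's lowest available rank (mapping them to that lowest available codon) is an assumption, since the text only specifies mapping to the closest lower rank.

```python
def fill_rank_map(available, P):
    """Map every rank 1..P to a real (codon, translation_time) pair.

    available: dict {rank: (codon, time)} for the ranks this amino acid has.
    Missing ranks take the entry of the closest lower available rank.
    """
    lowest = min(available)
    filled = {}
    for rank in range(1, P + 1):
        lower_or_equal = [r for r in available if r <= rank]
        # Assumption: ranks below the lowest available one fall back to it.
        source = max(lower_or_equal) if lower_or_equal else lowest
        filled[rank] = available[source]
    return filled


if __name__ == "__main__":
    # Hypothetical amino acid with three codons in a host with P = 10.
    available = {2: ("codon_a", 0.8), 5: ("codon_b", 1.4), 9: ("codon_c", 2.3)}
    for rank, (codon, time) in fill_rank_map(available, 10).items():
        print(rank, codon, time)
```

With such a table per amino acid, any rank series selected by the search can be translated back into a synonymous codon sequence and its real translation times.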

It should be mentioned that other mapping variations could be implemented; for example, Grantham (1974) showed that amino acids could be replaced by others with similar properties (thus increasing the number of possible translation times), or the heuristic could be applied on chunks of codons (Reuveni et al., 2011) (thus attaining P or more different translation time values).

2.7. Demonstrating the MRTR heuristic on insulin gene

A common application in biotechnology focuses on producing human insulin from bacteria (Romanos et al., 1992; Abrahmsén et al., 1986; Moks et al., 1987). The human insulin gene depicted in Figure A1 was engineered using the MRTR heuristic, implemented with DTASEP, to achieve different translation rates (which are monotonically correlated to protein abundance) in S. cerevisiae, while consuming minimal ribosomal levels of the host.

First, a mutated version of the gene was engineered to achieve similar translation rates to the wild-type, but with a minimal ribosomal allocation (engineered gene 1). The resulting sequence is depicted in Figure A2. Second, another version of the gene was engineered to maximize translation rate without increasing ribosomal allocation of the original gene (engineered gene 2). The resulting sequence is depicted in Figure A3. Translation rates and ribosomal density were calculated for the original and both engineered genes also using TASEP. The results are summarized in Table 2. Additional details of the parameters used for calculating these values appear in Section 4.7.

Table 2.

Translation Rate of Original and Engineered Insulin Genes Calculated Using DTASEP and TASEP

Gene	DTASEP translation rate	DTASEP number of ribosomes	TASEP translation rate	TASEP number of ribosomes
Original gene	0.000412	5.25	0.000346	5.33
Engineered gene 1	0.000409	2.10	0.000404	2.09
Engineered gene 2	0.001583	5.26	0.001472	5.28

As seen from the results, by using the suggested heuristic, translation rates could be preserved while reducing ribosomal usage by 60% (for both DTASEP and TASEP). On the other hand, translation rates could be increased by a factor of 3.84 for DTASEP and 4.25 for TASEP without increasing ribosomal allocation, thus allowing an increase in protein abundance without influencing the host. The wild-type and engineered gene sequences are presented in Figures A1, A2, and A3.
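The figures quoted above follow directly from Table 2. The short sketch below recomputes them; the dictionary layout and function names are illustrative assumptions, while the numbers are copied from the table.

```python
# (translation rate, number of ribosomes) per gene, copied from Table 2.
TABLE_2 = {
    "DTASEP": {
        "original": (0.000412, 5.25),
        "engineered_1": (0.000409, 2.10),
        "engineered_2": (0.001583, 5.26),
    },
    "TASEP": {
        "original": (0.000346, 5.33),
        "engineered_1": (0.000404, 2.09),
        "engineered_2": (0.001472, 5.28),
    },
}


def ribosome_reduction_percent(model):
    """Percent drop in ribosomes from the original gene to engineered gene 1."""
    original = TABLE_2[model]["original"][1]
    engineered = TABLE_2[model]["engineered_1"][1]
    return 100.0 * (1.0 - engineered / original)


def rate_fold_change(model):
    """Translation rate of engineered gene 2 relative to the original gene."""
    return TABLE_2[model]["engineered_2"][0] / TABLE_2[model]["original"][0]


if __name__ == "__main__":
    for model in TABLE_2:
        print(model,
              round(ribosome_reduction_percent(model), 1),
              round(rate_fold_change(model), 2))
```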

3. Discussion

In this study, we provided several new theorems related to the process of gene translation. In addition, we demonstrated how such theorems can be used for efficiently designing heuristics for engineering genes for heterologous gene expression.

This article emphasizes the differences between coding sequences that have been shaped by evolution and coding sequences that are optimized for various biotechnological goals. Sequences that are shaped by evolution are usually far from being optimal from the translation point of view (i.e., the models that were presented in the current study). This phenomenon can be explained by a few reasons:

First, the “optimization criteria” of evolution includes not only translation efficiency, but also additional considerations that influence the fitness of the organism. For example, the amino acid bias and the function/structure of the protein, the metabolic costs for generating the protein, the efficiency of the transcription stage, global considerations such as the effect of the translation of a single gene on other genes, and the response time—e.g., the time it takes to translate the first protein.

Second, often the evolution process does not converge to a global optimal point. For example, it is possible that all small mutations/changes in the organism's genome do not improve its fitness significantly; however, a large set of genomic changes (that cannot occur in a natural way) can improve it very significantly.

Third, organisms are evolving all the time as a response to environmental changes, resulting in constant change in the amino acid frequencies, the GC content of the genome, and the cellular tRNA pool. Thus, to maintain the translation optimality of a coding sequence, the sequence should also keep evolving as a response to these changes, usually without converging to the optimal point in terms of translation efficiency.

We practically demonstrated in this article how a large-scale systems biology study of endogenous genes can be used for designing optimal coding sequences for biotechnological usage. In general, the approach demonstrated in this study includes the following steps:

1. Analyze endogenous genes with a required feature(s) (e.g., highly expressed genes, if the aim is maximizing translation rate or cost). Although, as discussed above, evolution does not perfectly optimize each coding sequence, on average, important/relevant features should be enriched in relevant groups of genes.

2. Formulate in an analytical/compact way the features that are over-represented in the gene set (e.g., highly expressed genes have relatively slower codons in the beginning of the ORF).

3. Formulate and prove mathematical theorem(s) explaining these features from a physical point of view (e.g., monotonically increasing profiles of translation efficiency optimize/minimize ribosomal density).

4. Use these theorem(s) for developing procedures for optimizing coding sequences (e.g., the MRTR heuristic).

One of the central results presented in this study is related to the effect of the order of codons on the number of ribosomes on the mRNA sequence. Our study emphasizes the need to minimize the ribosomal density in heterologous genes: genes with strong promoters but with non-optimal codons tend to consume a large number of ribosomes, cause a depletion in cellular resources such as the number of available ribosomes, and can decrease the protein abundance of other genes. It is important to highlight the fact that slower codons at the beginning of a coding sequence lower its ribosomal density, even for cases when the number of ribosomes on the mRNA is relatively low and there are almost no collisions between ribosomes. This is done by effectively increasing the initiation time of ribosomes, such that the portion of the time that the gene is occupied by a ribosome(s) is lower.

In this study, we focused on the simplest models that capture the effect of codon order on translation efficiency. We supplied a few central lemmas about DTASEP, after showing that it supplies a fair and fast approximation of the TASEP model. These models can be generalized in various ways: for example, one can challenge the steady state assumption, consider other features of the coding sequence that may contribute to the translation rate of a codon (Tuller et al., 2010b, 2011), the effect of number of available ribosomes on the initiation rate, and the recycling of tRNA molecules (Cannarozzi et al., 2010). We aim to analytically analyze such models in the future in a similar way to the one presented here.

In addition, we aim to develop algorithms for optimizing coding sequences while considering additional features of the coding sequences such as the 5′UTR and 3′UTR of the gene (that may affect the initiation and termination rates) and the GC content of the gene that may affect the stability of its mRNA (Gu et al., 2010; Raab et al., 2010). This will increase the accuracy of the used theoretical models to enable synthesis of various genes with different translation rates that could be biologically validated in the near future.

4. Methods and Proofs

4.1. Modeling translation with DTASEP

Zhang et al. (1994) described the translation process using a deterministic set of equations; we name this model Deterministic TASEP (DTASEP). The mRNA was described as a 1-D lattice of L sites, L depicting the number of codons. Ribosomes were described by particles covering H consecutive sites at a time. A site can be covered only by a single ribosome at a time and ribosomes can translate a single codon at a time. A ribosome of length H can translate codon i if all i . . i + H − 1 sites on the lattice are free of other ribosomes. Initiation time describes the amount of time it takes a ribosome to bind itself to the mRNA chain, once the initiation site is available. Let us denote the initiation time by B(t) where t represents time. This value depends on several features, such as the number of free ribosomes and folding energy (Kozak, 1987; Alberts et al., 2002; Plotkin and Kudla, 2010; Gingold and Pilpel, 2011).

A ribosome can start binding to the mRNA if the initiation site is available and the first H codons are not occupied. Let us denote the time it takes a ribosome to translate codon i by U_i(t). In this study translation times were assumed to be constant, i.e., B(t) = B, U_i(t) = U_i. The model iteratively simulates the attachment of ribosomes to the lattice and their advance on it according to the supplied parameters. Two important features are calculated in steady state: the translation rate and the number of ribosomes on the lattice. The following parameters were used in Zhang's model:

• n: index number of the ribosome participating in the translation simulation

• N: steady state index number, defined as the state where the number of ribosomes on the message has reached a constant value

• L: the number of sites in the lattice (i.e. the number of codons in the mRNA sequence)

• H: the size of the ribosome in terms of number of codon sites; also used to define the minimum space (in codon units/sites) between adjacent ribosomes on the lattice

• B_n: the time it takes for the n-th ribosome to form an initiation complex given that the initiation site is available. In this study it is assumed that B_n = B

• Q_n: the needed time for the ribosome to be released from the lattice after it reaches the stop (end) codon. In this study it is assumed that Q_n = 0

• U_n,i: the ‘nominal time’ required for the n-th ribosome to translate the i-th codon if no other ribosome ahead is blocking it to move forward. In this work we assumed U_n,i = U_i

• E_n,i: the ‘actual time’ it takes the n-th ribosome to translate the i-th codon. This value depends on U_n,i and on the delay caused by other ribosomes down the lattice, therefore E_n,i ≥ U_n,i. By definition E_n,0 = B and E_n,L₊₁ = Q = 0. For simplicity let us mark E_N,i = E_i

• T_n,p: the total time it takes the n-th ribosome to finish translating the p-th codon and move forward to the next site, starting from the time the initiation site becomes available for binding. Therefore,

\begin{align*} T_{n, p} = \sum_{j = 0}^{p} E_{n, j} \tag{1} \end{align*}

Notice that T_n,p = 0 for p < 0 or n < 1. For simplicity, let us mark T_N,p = T_p.

• I_n: the time the n-th ribosome has to wait until the binding site becomes available and the first H codons are free of other ribosomes. This time is composed of the time it takes the (n − 1)-th ribosome to form an initiation complex and the time it takes it to translate the first H codons. Therefore

\begin{align*} I_{n} = B + \sum_{j = 1}^{H}E_{n - 1, j} = \sum_{j = 0}^{H}E_{n - 1, j} \tag{2} \end{align*}

For simplicity, let us mark I_N = I; therefore, at steady state we get that

\begin{align*} I = \sum_{j = 0}^{H}E_j \tag{3} \end{align*}

• W_n,i: the time delay of the n-th ribosome on the i-th site. This value is determined by the state of both the current ribosome n and the downstream ribosome n − 1. The delay of the n-th ribosome at site i is defined as the time the n-th ribosome has to wait after finishing translating codon i until the n − 1-th ribosome finishes translating codon i + H, so that the n-th ribosome will be able to move to the next site.

Let us define t₁ = 0 as the time that the (n − 1)-th ribosome can bind to the mRNA and t₂ = I_n₋₁ as the time the (n − 1)-th ribosome moves to codon H + 1, allowing the n-th ribosome to start binding to the mRNA.

At time t₃ = I_n₋₁ + T_n,i₋₁ + U_n,i the n-th ribosome finishes translating the i-th codon and can be delayed at site i only if the (n − 1)-th ribosome did not finish translating codon i + H. The overall time it takes the (n − 1)-th ribosome to finish translating the first i + H codons and move to site i + H + 1 is defined by T_n₋₁,i₊H. Let us define this time as t₄. This of course is also the time the n-th ribosome moves to site i + 1. The overall time it takes the n-th ribosome to translate the first i codons and move to the next site is therefore

\begin{align*} t_4 = I_{n - 1} + T_{n, i - 1} + U_{n, i} + W_{n, i} \end{align*}

where W_n,i is the delay of ribosome n at site i caused by the (n − 1)-th ribosome. Using both definitions of t₄ we get that

\begin{align*} t_4 = I_{n - 1} + T_{n, i - 1} + U_{n, i} + W_{n, i} = T_{n - 1, i + H} \tag{4} \end{align*}

which yields the following relationship:

\begin{align*} W_{n, i} = T_{n - 1, i + H} - [ T_{n, i - 1} + U_{n, i} + I_{n - 1} ] \tag{5} \end{align*}

Therefore, the actual time the n-th ribosome spends at site i is:

\begin{align*} E_{n, i} = U_{n, i} + W_{n, i} \tag{6} \end{align*}

A negative W_n,i has no physical meaning and is therefore set to zero. In addition, for i within the last H sites on the lattice, W_n,i takes the value of 0: when a ribosome reaches a site i such that i > L − H and finishes translating its codon, it already covers the next H sites down the lattice, so no other ribosome can delay it, resulting in

\begin{align*} W_{n, i} = 0, \quad \forall i > L - H \tag{7} \end{align*}

For simplicity let us mark W_N,i = W_i.

• P_n: the translation time of the mRNA by the n-th ribosome, which can be expressed as:

\begin{align*} P_n = \sum_{j = 0}^{L} E_{n, j} \tag{8} \end{align*}

For simplicity let us mark P_N = P.

• D: the number of ribosomes per message at steady state, defined as the number of initiations occurring in the period of time it takes a single ribosome to translate the mRNA in steady state. Therefore D is determined by the following relationship:

\begin{align*} D = \frac {P} {I} \tag {9} \end{align*}

• R_n: the release rate of a ribosome from the message, also referred to as the translation rate. This value is defined as {the time interval between the n-th and (n − 1)-th ribosome release}⁻¹, which is mathematically expressed as:

\begin{align*} R_n = \frac {1} {P_n + I_{n - 1} - P_{n - 1}}, \ n > 1 \tag {10} \end{align*}

In steady state P_n = P_n₋₁ = P_N = P and I_n₋₁ = I, therefore the steady state translation rate is:

\begin{align*} R_N = \frac {1} {I} \tag {11} \end{align*}

For simplicity, let us mark R_N = R.

• K: the steady state translation rate per ribosome measure, defined as the steady state translation rate divided by the number of ribosomes on the lattice in steady state

\begin{align*} K = \frac {R} {D} = \frac {\frac {1} {I}} {\frac {P} {I}} = \frac {1} {P} \tag {12} \end{align*}
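The definitions above can be turned into a short iteration over successive ribosomes. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses constant B and U_i with Q = 0, treats the first ribosome as never blocked, takes the start offset between consecutive ribosomes to be the sum in Equation 2 evaluated on the previous ribosome, and stops when the per-site times no longer change. The function name, the stopping tolerance, and the toy inputs are all illustrative.

```python
from itertools import accumulate


def dtasep_steady_state(U, B, H, max_ribosomes=10_000, tol=1e-12):
    """Iterate ribosomes until the actual per-site times E_i stop changing.

    U: nominal codon times U_1..U_L, B: initiation time, H: ribosome size.
    Returns steady-state I, P, R, D, and K (Equations 3, 8, 11, 9, 12).
    """
    L = len(U)
    E_prev = [float(B)] + [float(u) for u in U]  # first ribosome: E_i = U_i
    for _ in range(max_ribosomes):
        T_prev = list(accumulate(E_prev))  # T_prev[p] = sum_{j<=p} E_prev[j]
        offset = T_prev[min(H, L)]         # start delay of the next ribosome
        E = [float(B)]                     # E_{n,0} = B
        T = float(B)                       # running T_{n,i-1}
        for i in range(1, L + 1):
            u = float(U[i - 1])
            if i <= L - H:
                # Equation 5, with negative delays set to zero.
                W = max(0.0, T_prev[i + H] - (T + u + offset))
            else:
                W = 0.0                    # Equation 7
            E.append(u + W)                # Equation 6
            T += u + W
        converged = max(abs(a - b) for a, b in zip(E, E_prev)) < tol
        E_prev = E
        if converged:
            break
    I = sum(E_prev[: H + 1])
    P = sum(E_prev)
    return {"I": I, "P": P, "R": 1.0 / I, "D": P / I, "K": 1.0 / P}


if __name__ == "__main__":
    # Toy three-codon message with the slow codon last versus first.
    print(dtasep_steady_state([1, 1, 5], B=1, H=1))
    print(dtasep_steady_state([5, 1, 1], B=1, H=1))
```

In this toy input, placing the slow codon first (a decreasing arrangement) gives a smaller D and a larger K than placing it last, in line with the behavior reported for monotonic decreasing arrangements.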

4.2. Modeling translation with TASEP model

Translation was also modeled using TASEP, as previously described by different studies (MacDonald et al., 1968; Shaw et al., 2003; Reuveni et al., 2011). Similarly to DTASEP, the mRNA is characterized by L sites. Each ribosome on the chain covers H codons and any codon may be covered by a single ribosome at most.

In each step of the simulation, a single ribosome can attach itself to the lattice or advance on it if the first/next $\left\lfloor \frac {H} {2} \right\rfloor + 1$ codons are not occupied. The time between initiation attempts is exponentially distributed with a rate of B⁻¹, where B is the initiation time defined in DTASEP in Section 4.1. The time between jump attempts of a ribosome from site i to site i + 1 is also exponentially distributed, with a rate of $U_i^{- 1}$, with U_i as defined in Section 4.1.

Thus, the time between events (initiation/jump) is exponentially distributed with rate $\mu \{n_i \} = B^{- 1} + \sum\limits_{i = 1}^N n_iU_i^{- 1}$, where n_i = 1 if codon i is being translated and n_i = 0 otherwise. Therefore, the initiation probability is defined as (Bμ{n_i})⁻¹ and the probability of a ribosome to jump from site i to site i + 1 is given by n_i(U_iμ{n_i})⁻¹, as shown also by Reuveni et al. (2011).
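The event probabilities above can be computed directly from the occupancy vector. The sketch below is a minimal illustration with a hypothetical function name and toy inputs; it only evaluates the attempt probabilities and does not check whether an attempted move is blocked by a downstream ribosome.

```python
def event_probabilities(B, U, occupied):
    """Return (mu, p_init, p_jump) for one simulation step.

    B:        initiation time.
    U:        nominal codon translation times U_1..U_L.
    occupied: n_i per codon, 1 if codon i is being translated, else 0.
    """
    mu = 1.0 / B + sum(n / u for n, u in zip(occupied, U))
    p_init = 1.0 / (B * mu)
    p_jump = [n / (u * mu) for n, u in zip(occupied, U)]
    return mu, p_init, p_jump


if __name__ == "__main__":
    # Toy message of three codons with a ribosome translating codon 2.
    mu, p_init, p_jump = event_probabilities(2.0, [1.0, 4.0, 2.0], [0, 1, 0])
    print(mu, p_init, p_jump, p_init + sum(p_jump))
```

By construction the initiation and jump probabilities sum to one, so exactly one event type is drawn per step.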

The steady state translation rate was determined by counting the total ribosomes that finished translating the mRNA during the simulation divided by the total simulation time. In this study 1,000,000 steps were used for achieving an initial scattering of ribosomes on the mRNA and another 100,000,000 steps for calculating the translation rate.

The number of iterations needed for achieving a steady state solution highly depends on the values of L, H, B and $\{U_i \}_{i = 1}^L$. To assure reliable simulation results, the number of iterations was set to achieve almost constant translation rates (±2%) for the 10 slowest (sum of codon translation times) examined genes in S. cerevisiae for different initiation times B₀, B₀/2, B₀/4, where B₀ was defined to be min $\{U_i \}_{i = 1}^L$. This assured that translation rates of “slow” genes would not artificially decrease due to lack of enough iterations. The model supplies ribosomal density per codon, translation rate and average number of ribosomes on the mRNA chain in steady state.

4.3. Showing correlation between DTASEP and TASEP models

In the absence of direct measurements of translation rates, protein abundance (PA) was selected as a predictive measure, assuming that PA of a gene is expected to increase monotonically with translation rate (Reuveni et al., 2011).

In this study characteristics of the translation models were demonstrated on genes taken from S. cerevisiae genome, unless stated otherwise. Codon translation times were taken from Tuller et al. (2010a).

To estimate the degree of similarity between translation rates calculated with DTASEP and TASEP with respect to the effect of codon ordering on translation, 100 random genes with different translation rates were selected from the S. cerevisiae genome, and for each gene the codon locations were permuted 1000 times. The translation rate was calculated for each permutation using both mathematical models. The Spearman correlation coefficient between TASEP and DTASEP translation rates was calculated separately for each gene (over all its 1000 permutations). The results of this validation are presented in Section 2.1 and in Figure 1B,C.
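The per-gene comparison can be sketched as below. This is a minimal illustration rather than the procedure actually run in the study: `model_a` and `model_b` are placeholders for the two translation models, the function names and the fixed seed are assumptions, and the Spearman coefficient is computed in plain Python as the Pearson correlation of average ranks.

```python
import random


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = avg
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def permutation_correlation(times, model_a, model_b, n_perm=1000, seed=0):
    """Correlate two models' rates over random codon-order permutations."""
    rng = random.Random(seed)
    rates_a, rates_b = [], []
    for _ in range(n_perm):
        perm = list(times)
        rng.shuffle(perm)
        rates_a.append(model_a(perm))
        rates_b.append(model_b(perm))
    return spearman(rates_a, rates_b)


if __name__ == "__main__":
    # Hypothetical stand-in models; real use would pass DTASEP and TASEP.
    toy_a = lambda s: s[0] + 2.0 * s[1]
    toy_b = lambda s: 10.0 * (s[0] + 2.0 * s[1])
    print(permutation_correlation([1.0, 2.0, 3.0], toy_a, toy_b))
```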

4.4. Characteristics of DTASEP translation model

Lemma 1

In steady state, if a ribosome that finished translating the i-th codon is being delayed by other ribosomes down the lattice, the following relationship exits: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} I = \sum_{j = i + 1}^{i + H}E_j \tag{13} \end{align*} \end{document}

otherwise
\begin{align*} I \geq \sum_{j = i + 1}^{i + H} E_j \tag{14} \end{align*}

Proof

By definition, in steady state the delay of the ribosome at site i is determined by the following relationship:
\begin{align*} W_i & = T_{i + H} - T_{i - 1} - U_i - I \\ & = \sum_{j = 0}^{i + H} E_j - \sum_{j = 0}^{i - 1} E_j - U_i - I \\ & = \sum_{j = i}^{i + H} E_j - U_i - I \tag{15} \end{align*}

By definition
\begin{align*} E_i = U_i + W_i \end{align*}

therefore
\begin{align*} E_i & = U_i + \Bigg(\sum_{j = i}^{i + H} E_j - U_i - I \Bigg) \\ & = U_i + E_i + \sum_{j = i + 1}^{i + H} E_j - U_i - I \tag{16} \end{align*}

thus
\begin{align*} I = \sum_{j = i + 1}^{i + H} E_j \tag{17} \end{align*}

For W_i = 0, using the definition of W_i and eq. (15) we get that
\begin{align*} \sum_{j = i}^{i + H} E_j - U_i - I \leq 0 \tag{18} \end{align*}

therefore, since $E_i = U_i$ when $W_i = 0$,
\begin{align*} I \geq \sum_{j = i + 1}^{i + H} E_j \tag{19} \end{align*}
▪
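Lemma 1 can be checked numerically on a toy example. The following Python sketch is not the authors' implementation: it assumes a deterministic update rule inferred from eqs. (13)-(15) and (20), in which a ribosome leaves codon i at the later of (a) the time it finishes translating codon i and (b) the time the preceding ribosome leaves codon i + H, and in which initiation starts once the preceding ribosome has left codon H. The function names, the number of simulated ribosomes and the toy values of U, B and H are illustrative assumptions.

```python
def steady_state(U, B, H, n_ribosomes=500):
    """Simulate successive ribosomes; return (I, E) for the last one.

    U[i-1] is the translation time of codon i, B the initiation time and
    H the ribosome footprint in codons. E[0] = B and E[j] is the total
    time (translation plus delay) the last ribosome spent on codon j.
    """
    L = len(U)
    prev = None       # prev[i]: time the preceding ribosome left codon i
    interval = None   # time between two consecutive initiations
    for _ in range(n_ribosomes):
        # Assumed rule: initiation starts once the preceding ribosome
        # has left codon H, which is consistent with eq. (20).
        start = 0.0 if prev is None else prev[min(H, L)]
        cur = [start + B]
        for i in range(1, L + 1):
            t = cur[i - 1] + U[i - 1]        # codon i has been translated
            if prev is not None and i + H <= L:
                t = max(t, prev[i + H])      # wait until codon i + H is free
            cur.append(t)
        if prev is not None:
            interval = cur[0] - prev[0]
        prev = cur
    E = [B] + [prev[j] - prev[j - 1] for j in range(1, L + 1)]
    return interval, E


def check_lemma1(U, B, H, tol=1e-9):
    """Return True if eqs. (13) and (14) hold at every site 1..L-H."""
    I, E = steady_state(U, B, H)
    L = len(U)
    for i in range(1, L - H + 1):
        W_i = E[i] - U[i - 1]
        window = sum(E[i + 1:i + H + 1])     # E_{i+1} + ... + E_{i+H}
        if W_i > tol:
            if abs(I - window) > tol:        # delayed site: eq. (13)
                return False
        elif I < window - tol:               # undelayed site: eq. (14)
            return False
    return True
```

For the toy gene U = (1, 1, 5, 1) with B = 1 and H = 1, this sketch settles at I = 5 with E = (1, 4, 5, 5, 1): codons 1 and 2 are delayed behind the slow third codon and satisfy eq. (13), while codon 3 is not delayed and satisfies eq. (14).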

Lemma 2

Proof

The steady state translation rate is defined as R = I⁻¹. By definition, we get that
\begin{align*} I & = \sum_{j = 0}^{H} E_j \\ & = B + \sum_{j = 1}^{H} E_j \\ & = B + \sum_{j = 1}^{H} U_j + \sum_{j = 1}^{H} W_j \tag{20} \end{align*}

The delay W_j of the last H codons is zero. Let m denote the biggest index with W_m > 0, so that E_j = U_j for all j > m. Then, by using Lemma 1, I can be expressed solely as
\begin{align*} I = \sum_{j = m + 1}^{m + H} U_j \tag{21} \end{align*}

However, the I variable from eq. (20) is at least of order O(B), while the variable I from eq. (21) is of order $O(\{U_j\}_{j = 1}^L)$, where $O(\{U_j\}_{j = 1}^L) \ll O(B)$. Therefore, such m cannot exist, resulting in W_j = 0, ∀j. Therefore, eq. (20) can be rewritten as
\begin{align*} I = B + \sum_{j = 1}^{H} U_j \tag{22} \end{align*}

indicating that $O(I) = O(B) > O(\{U_j\}_{j = 1}^L)$, making B the dominant factor influencing I, and therefore the limiting translation rate factor. ▪
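As a small worked illustration of this argument, consider hypothetical values chosen here for illustration only (they are not taken from the study): H = 2, B = 50 and codon translation times (U_1, ..., U_4) = (3, 1, 4, 2). Every window of H consecutive codons then sums to at most 4 + 2 = 6, so eq. (21) could only give I ≤ 6, whereas eq. (20) requires I ≥ B = 50. Hence no codon can be delayed, and eq. (22) gives

```latex
\begin{align*}
I = B + \sum_{j = 1}^{H} U_j = 50 + 3 + 1 = 54,
\qquad
R = I^{-1} = \frac{1}{54} \approx 0.0185
\end{align*}
```

In this example the initiation time contributes 50 of the 54 time units in I, which is the sense in which B limits the translation rate.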

Lemma 3

A decrease in the translation time U_i of one of the codons cannot decrease the translation rate R of a gene.

Proof

Let us select a general codon i and mark the decrease in its translation time by ΔU_i, so that the codon's new translation time will be
\begin{align*} \widetilde{U}_i = U_i - \Delta U_i, \ \Delta U_i > 0 \tag{23} \end{align*}

Let us mark the translation times of the rest of the codons after this change with $\widetilde{U}_j$, such that $\widetilde{U}_j = U_j$ for all $j \neq i$. Let us mark the initiation time before the change in the codon's translation time with I, and with $\widetilde{I}$ the initiation time after a decrease in the translation time of one of the codons. As mentioned in the definitions section, R = I⁻¹; therefore, we will show that a decrease in the translation time of a general codon i cannot increase I, i.e., $I \geq \widetilde{I}$.

First, let us define m as the biggest index of a site with W_m > 0, s.t. $W_{m + k} = 0$, ∀k > 0. Let us mark using $\tilde{m}$ the codon index with this property after a decrease in the translation time of codon i. Because the last H sites in the chain are not delayed, W_j = 0, L − H < j ≤ L. In case none of the codons is delayed, then m = 0.

Using the above definition, let us check the relation between I and $\widetilde{I}$ for all possible values of m, $\tilde{m}$ and i, mapped by the following cases:

1. m = 0

2. $\tilde{m} = m < i$

3. $\tilde{m} = m \geq i$

4. $m < \tilde{m}, i \leq \tilde{m}$

5. $m < \tilde{m} \leq i$

6. $\tilde{m} < m$

resulting in:

1. Given m = 0 and $\tilde{m} = 0$, then $W_j = \widetilde{W}_j = 0$, ∀j.

If i ≤ H then $\widetilde{U}_i < U_i$, therefore I and $\widetilde{I}$ can be expressed as
\begin{align*} I = \sum_{j = 0}^H U_j > \sum_{j = 0}^H \widetilde{U}_j = \widetilde{I} \tag{24} \end{align*}

If i > H then $I = \widetilde{I}$.

If m = 0 and $\tilde{m} > 0$, we can express I and $\widetilde{I}$ at site $\tilde{m}$ by using Lemma 1, where $\widetilde{W}_{\tilde{m}} > 0$ and $W_{\tilde{m}} = 0$, resulting in
\begin{align*} \begin{cases} I \ge \sum\limits_{j = \tilde{m} + 1}^{\tilde{m} + H} U_j \\ \widetilde{I} = \sum\limits_{j = \tilde{m} + 1}^{\tilde{m} + H} \widetilde{U}_j \le \sum\limits_{j = \tilde{m} + 1}^{\tilde{m} + H} U_j \end{cases} \tag{25} \end{align*}

2. For $\tilde{m} = m < i$ we can express I and $\widetilde{I}$ for site $\tilde{m}$, where $\widetilde{W}_{\tilde{m}} > 0$ and $W_{\tilde{m}} > 0$:
\begin{align*} I = \sum_{j = \tilde{m} + 1}^{\tilde{m} + H} U_j \ge \sum_{j = \tilde{m} + 1}^{\tilde{m} + H} \widetilde{U}_j = \widetilde{I} \tag{26} \end{align*}

resulting in $I \geq \widetilde{I}$.

3. For $\tilde{m} = m \geq i$, let us express I and $\widetilde{I}$:
\begin{align*} I = \sum_{j = \tilde{m} + 1}^{\tilde{m} + H} U_j = \sum_{j = \tilde{m} + 1}^{\tilde{m} + H} \widetilde{U}_j = \widetilde{I} \tag{27} \end{align*}

4. For $m < \tilde{m}, i \leq \tilde{m}$ let us express I and $\widetilde{I}$ at site $\tilde{m}$ by using Lemma 1, where $\widetilde{W}_{\tilde{m}} > 0$ and $W_{\tilde{m}} = 0$:
\begin{align*} \begin{cases} I \ge \sum\limits_{j = \tilde{m} + 1}^{\tilde{m} + H} U_j \\ \widetilde{I} = \sum\limits_{j = \tilde{m} + 1}^{\tilde{m} + H} \widetilde{U}_j = \sum\limits_{j = \tilde{m} + 1}^{\tilde{m} + H} U_j \end{cases} \tag{28} \end{align*}

5. For $m < \tilde{m} \leq i$ let us express I and $\widetilde{I}$ at site $\tilde{m}$ by using Lemma 1, where $\widetilde{W}_{\tilde{m}} > 0$ and $W_{\tilde{m}} = 0$:
\begin{align*} \begin{cases} I \ge \sum\limits_{j = \tilde{m} + 1}^{\tilde{m} + H} U_j \\ \widetilde{I} = \sum\limits_{j = \tilde{m} + 1}^{\tilde{m} + H} \widetilde{U}_j \le \sum\limits_{j = \tilde{m} + 1}^{\tilde{m} + H} U_j \end{cases} \tag{29} \end{align*}

6. For $\tilde{m} < m$ let us express I and $\widetilde{I}$ at site $\tilde{m}$ by using Lemma 1, where $\widetilde{W}_{\tilde{m}} > 0$ and $W_{\tilde{m}} > 0$:
\begin{align*} \begin{cases} I = \sum\limits_{j = \tilde{m} + 1}^{m + H} U_j + \sum\limits_{j = \tilde{m} + 1}^{m + H} W_j \\ \widetilde{I} = \sum\limits_{j = \tilde{m} + 1}^{\tilde{m} + H} \widetilde{U}_j \le \sum\limits_{j = \tilde{m} + 1}^{\tilde{m} + H} U_j \end{cases} \tag{30} \end{align*}

where W_j are the possible delays at sites $\tilde{m} + 1, \ldots, \tilde{m} + H$ before changing the translation time of codon i. Therefore we can write the following relationship:
\begin{align*} I = \sum_{j = \tilde{m} + 1}^{m + H} U_j + \sum_{j = \tilde{m} + 1}^{m + H} W_j \ge \sum_{j = \tilde{m} + 1}^{m + H} \widetilde{U}_j \ge \sum_{j = \tilde{m} + 1}^{\tilde{m} + H} \widetilde{U}_j = \widetilde{I} \tag{31} \end{align*}

resulting again in $I \geq \widetilde{I}$.

We have shown that for all possible values of m, $\tilde{m}$ and i, $I \geq \widetilde{I}$, therefore $R \leq \widetilde{R}$. ▪
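The monotonicity stated in Lemma 3 can be probed numerically. The Python sketch below is an illustration under stated assumptions, not the authors' code: it uses a deterministic update rule inferred from eqs. (13)-(15) and (20) (a ribosome leaves codon i once it has translated it and the preceding ribosome has left codon i + H; initiation starts once the preceding ribosome has left codon H). The function names, the halving factor and the toy inputs are hypothetical.

```python
def steady_state_interval(U, B, H, n_ribosomes=500):
    """Return the time between consecutive initiations (I) after
    simulating n_ribosomes ribosomes under the assumed update rule.

    U[i-1] is the translation time of codon i, B the initiation time
    and H the ribosome footprint in codons.
    """
    L = len(U)
    prev = None       # prev[i]: time the preceding ribosome left codon i
    interval = None
    for _ in range(n_ribosomes):
        start = 0.0 if prev is None else prev[min(H, L)]
        cur = [start + B]
        for i in range(1, L + 1):
            t = cur[i - 1] + U[i - 1]        # codon i has been translated
            if prev is not None and i + H <= L:
                t = max(t, prev[i + H])      # wait until codon i + H is free
            cur.append(t)
        if prev is not None:
            interval = cur[0] - prev[0]
        prev = cur
    return interval


def check_lemma3(U, B, H, factor=0.5, tol=1e-9):
    """Shorten each codon in turn (U_i -> factor * U_i) and verify that
    I never increases, i.e. that R = 1 / I never decreases."""
    I = steady_state_interval(U, B, H)
    for i in range(len(U)):
        faster = list(U)
        faster[i] = U[i] * factor
        if steady_state_interval(faster, B, H) > I + tol:
            return False
    return True
```

For the toy gene U = (1, 1, 5, 1) with B = 1 and H = 1, the sketch gives I = 5; shortening the third codon to 2 gives I = 2, so the rate rises from 1/5 to 1/2. Shortening any of the other codons leaves I at 5, which illustrates the equality cases of the proof.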

Lemma 4

Adding a codon to the mRNA cannot increase its translation rate R.

Proof

As mentioned in the definitions section, R = I⁻¹. Let us assume that R is the steady state translation rate of an mRNA with L codons. Now, let us assume that we decrease the translation time of one of the codons i by ΔU_i = U_i, such that $\widetilde{U}_i = 0$. The reduction of the translation time of a codon to 0 is equivalent to removing it from the sequence, thus obtaining a new sequence with L − 1 codons. Let us mark the translation rate of the new sequence with $\widetilde{R}$. Using Lemma 3, a decrease in the translation time of a codon cannot decrease the sequence translation rate, resulting in
\begin{align*} R \le \widetilde{R} \tag{32} \end{align*}

Therefore, the translation rate R of a sequence with L codons cannot be higher than the translation rate of its partial series (preserving the same codon order) with a lower number of codons. ▪

Lemma 5

Given a gene with a finite set of codons $\{U_i\}_{i = 1}^L$, sorting the codons in descending order according to their translation times achieves minimal ribosomal allocation among all other codon permutations of this gene.

Proof

Let us assume that for such a codon arrangement a ribosome at steady state can be delayed at some sites. Let us define by m the biggest codon index such that W_m > 0. Using Lemma 1, for such a site the following relationship exists:
\begin{align*} I = \sum_{j = m + 1}^{m + H} E_j = \sum_{j = m + 1}^{m + H} U_j \tag{33} \end{align*}

However, by definition \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} I = B + \sum_{j = 1}^H E_j = \sum_{j = 1}^H U_j + \sum_{j = 1}^HW_j \tag{34} \end{align*} \end{document}

Because of the selected arrangement (U_i ≥ U_i+1), we get that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \sum_{j = 1}^H U_j \ge \sum_{j = m + 1}^{m + H}U_j \tag{35} \end{align*} \end{document}

therefore by combining eqs. (33), (34), and (35), we get that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} I = B + \sum_{j = 1}^H E_j > \sum_{j = 1}^H U_j > \sum_{j = m + 1}^{m + H}U_j = I \tag{36} \end{align*} \end{document}

therefore, such site m with W_m > 0 cannot exist, resulting in W _j = 0, ∀j.

In DTASEP, the number of ribosomes on the lattice at steady state, D, is defined as \begin{align*} D = \frac {P} {I} \tag {37} \end{align*}

which can also be expressed as \begin{align*} D = \frac {P} {I} = \frac {\sum \limits_{j = 0}^L E_j} {\sum \limits_{j = 0}^H E_j} = 1 + \frac {\sum \limits_{j = H + 1}^L E_j} {\sum \limits_{j = 0}^H E_j} \tag {38} \end{align*}

To minimize the number of ribosomes D on the mRNA, we need to minimize the numerator while maximizing the denominator of the last fraction in eq. (38): \begin{align*} \begin{cases} \min \sum \limits_{j = H + 1}^L E_j \\ \max \sum \limits_{j = 0}^H E_j \end{cases} \tag{39} \end{align*}

For a monotonic decreasing series ($U_i \ge U_{i+1}$), the numerator $\sum \limits_{j = H + 1}^L E_j$ is equal to the sum of the translation times of the L − H codons with the lowest translation times and contains no delay. The denominator $\sum \limits_{j = 0}^H E_j$ is equal to the sum of the translation times of the H codons with the highest translation times, again with no delays.

We will show that no other codon arrangement can achieve a lower D value than the one achieved with a monotonic decreasing series.

First, let us mark with I* the total initiation time of a monotonic decreasing series, having the slowest H codons located in the first H sites. Let us mark those codons with $\{\widetilde{U}_i \}_{i = 1}^H$. Now we will prove that for a general series with H < L, its total initiation time I is bounded by I*.

For a general (non-monotonic decreasing) series, let us mark again with m the biggest codon index such that W_m > 0. If such an m exists, then by using Lemma 1, we get that \begin{align*} I = \sum_{j = m + 1}^{m + H} U_j \tag{40} \end{align*}

Because of the non-monotonic arrangement we get that \begin{align*} I = \sum_{j = m + 1}^{m + H} U_j \le B + \sum_{j = 1}^H \widetilde {U}_j = I^* \tag{41} \end{align*}

If such an m does not exist, then I is determined only by the translation times of the first H codons, which again results in I ≤ I*.

It was already shown that for a general H, a monotonic decreasing series has W_j = 0; therefore such an arrangement minimizes the numerator $\sum \limits_{j = H + 1}^L E_j$ of eq. (38). A monotonically decreasing series has I* ≥ I in comparison to all other series, thus maximizing the denominator. Thus, a monotonic decreasing series minimizes the number of ribosomes per message, D, achieving its lower bound. ▪
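As a numerical illustration of eq. (38) for the descending arrangement of Lemma 5, the following Python sketch evaluates D when no delays occur (E_j = U_j). The function name, the sample translation times, and the values of B and H are hypothetical. The sketch assumes E_0 = B, as suggested by comparing eqs. (34) and (38), and it does not compute delays, so it should not be used for arbitrary codon orders.

```python
def ribosomes_without_delays(times, B, H):
    """Evaluate D = P / I (eq. 38) for the descending arrangement.

    Assumes W_j = 0 for all j (shown in Lemma 5 for descending series)
    and E_0 = B (an assumption of this sketch).
    """
    ordered = sorted(times, reverse=True)  # slowest codons first
    P = B + sum(ordered)                   # sum of E_j for j = 0..L
    I = B + sum(ordered[:H])               # sum of E_j for j = 0..H
    return P / I


# Hypothetical translation times for a six-codon sequence.
D = ribosomes_without_delays([1, 3, 2, 5, 4, 2], B=2, H=2)
print(D)
```

Because the H slowest codons enter the denominator and the remaining codons enter the numerator of the last fraction in eq. (38), any change to `H` shifts codons between the two sums.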

Lemma 6

Given a gene with a finite set of codons $\{U_i \}_{i = 1}^L$, sorting the codons in descending order according to their translation times maximizes the translation rate per ribosome among all other codon permutations of this gene.

Proof

By definition, the steady state translation rate per ribosome measure K is mathematically defined as \begin{align*} K = \frac {1} {P} \tag {42} \end{align*}

Thus, in order to maximize K, P should be minimized. Using the definition of P, we get that \begin{align*} P = \sum_{j = 0}^L E_j = \sum_{j = 0}^L U_j + \sum_{j = 0}^L W_j \tag{43} \end{align*}

The first sum does not depend on the codon order, and it was already shown in Lemma 5 that a monotonically decreasing series has W_j = 0; thus it minimizes P and maximizes K. ▪

4.5. Used data

Various properties of the translation models used in this paper were demonstrated on genes taken from the S. cerevisiae genome. Codon translation times were taken from Tuller et al. (2010a). The insulin gene sequence was taken from the National Center for Biotechnology Information.

4.6. MRTR heuristic for determining translation rate using synonymous mutations—additional details

4.6.1. Ordering all descending series according to their translation rate

As mentioned in Section 2.6, in order to find a sequence with a specific translation rate and minimal ribosomal allocation, we can apply a binary search algorithm on all descending sequences (according to their codon translation times) sorted by their translation rate. As already shown in Section 2.6, the sorting of all descending series is not trivial due to the multiple possibilities of increasing translation rate of a series only by decreasing translation time of a single codon.

The P possible codon translation times of a host can be described also according to their rank order W_i. Therefore W_i = 1 represents the lowest translation time while W_i = P represents the highest translation time in the host. In this study, we used two methods for increasing translation rate of a series:

1. Right Rule: decrease the rank order of the codon that already has the lowest rank that is not 1. If several codons fulfill this condition, decrease the one closest to the 3′UTR.

2. Left Rule: decrease the rank order of the codon with the highest rank. If several codons fulfill this condition, decrease the one closest to the 3′UTR.

For specific implementation of these two rules, see also the pseudo code in Figures 7 and 8.

FIG. 7.

Left Rule pseudo code.

FIG. 8.

Right Rule pseudo code.

Therefore all descending series representing a gene with L codons and P possible codon translation times in a host can be mapped and sorted in a lattice according to their translation rate in the following manner:

1. Start with the series containing the codons with the highest possible rank order (the series with the lowest translation rate), i.e., $\{W_i \}_{i = 1}^L = P$.

2. Apply the Right Rule on the current sequence and reapply it recursively until reaching the series with the fastest possible translation rate, i.e., $\{W_i \}_{i = 1}^L = 1$. This step builds the outer right diagonal of the lattice.

3. For each sequence in the outer right diagonal, apply the Left Rule and reapply it recursively until reaching the series with the fastest possible translation rate, i.e., $\{W_i \}_{i = 1}^L = 1$. This step builds all internal right-to-left diagonals.

An example of this construction for L = 3 and P = 4 can be seen in Figure 10. Another lattice can be obtained by first building the outer left diagonal using the Left Rule and then applying the Right Rule recursively to each of its series, which builds all left-to-right internal diagonals. An example of this construction for L = 3 and P = 4 can be seen in Figure 9. Although the two lattices contain the same series, they are mapped in a different order; therefore the lattices are not identical.
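The two rules and the lattice construction described above can be sketched in Python as follows. This is an illustration only, not the pseudo code of Figures 7 and 8: the function names are hypothetical, a series is assumed to be a list of ranks written from the 5′ end to the 3′ end (so "closest to the 3′UTR" means the largest index), and each rule returns `None` once all ranks equal 1.

```python
def right_rule(series):
    """Lower by one the lowest rank that is not 1 (ties: closest to the 3' end)."""
    candidates = [r for r in series if r > 1]
    if not candidates:
        return None                       # already the fastest series
    target = min(candidates)
    idx = max(i for i, r in enumerate(series) if r == target)
    return series[:idx] + [target - 1] + series[idx + 1:]


def left_rule(series):
    """Lower by one the highest rank (ties: closest to the 3' end)."""
    target = max(series)
    if target == 1:
        return None                       # already the fastest series
    idx = max(i for i, r in enumerate(series) if r == target)
    return series[:idx] + [target - 1] + series[idx + 1:]


def diagonal(start, rule):
    """Apply `rule` repeatedly from `start` until no rank can be lowered."""
    nodes = [start]
    nxt = rule(start)
    while nxt is not None:
        nodes.append(nxt)
        nxt = rule(nxt)
    return nodes


def build_lattice(L, P):
    """Outer right diagonal via the Right Rule, then one Left Rule diagonal per node."""
    outer = diagonal([P] * L, right_rule)
    return [diagonal(node, left_rule) for node in outer]


lattice = build_lattice(L=3, P=4)
print([d[0] for d in lattice])   # the outer right diagonal
print(lattice[0])                # the Left Rule diagonal starting at [4, 4, 4]
```

Each rule lowers exactly one rank by one, so every diagonal that starts k steps below the root has k fewer nodes than the outer diagonal. This sketch stores full copies of every series for readability and therefore does not reflect the reduced memory footprint discussed in Section 4.6.3.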

FIG. 9.

An example of the search lattice for L = 3 and P = 4. The external left diagonal is first built by applying the Left Rule recursively, starting with the series with the slowest translation rate (in this example [4 4 4]). Then, on each series in the outer left diagonal the Right Rule is applied recursively until the translation rate cannot be further increased (in this example [1 1 1]). Each child node in the lattice represents a series with a higher or equal translation rate in comparison to its parent series.

FIG. 10.

Another mapping of the lattice for L = 3 and P = 4. In this method the outer right diagonal is first built by applying the Right Rule recursively, and then the Left Rule is applied on each one of its series. Each child node in the lattice represents a series with a higher or equal translation rate in comparison to its parent series.

FIG. 11.

Optimizing search in the lattice. After searching the wanted translation rate in the external left diagonal, two sequences are found to upper and lower bound the wanted translation rate, depicted in this figure by green boxes. Given the upper bound, series below it in left diagonals (underlined by the blue arrow) and their child series have higher or equal translation rate to the upper bound, therefore can be excluded from the search (marked with the gray triangle). Thus the search in the next diagonal can be reduced, as depicted by the red arrow in this figure.

FIG. 12.

Mapping example between continuous ranking variable W_i values to real (sparse) translation time values U_i of codons coding the same amino acid. In this figure P=10. The left table depicts the possible values of the ranking variable (1-10) for this specific host. The middle table presents the mapping between ranking variables to codon translation times for a selected amino acid, where only three codons can code it. As a result, other ranking orders are missing a mapping to real translation times. To overcome this, ranking orders without a mapping are mapped to the closest lower possible translation rate, achieving a full mapping between the ranking variable and real translation times (without changing the coding of the amino acid), as depicted in the right table.

4.6.2. Time complexity analysis for building the lattice

The straightforward time complexity of the Left Rule and Right Rule methods is O(L). This complexity can be further reduced to O(1) by copying the series from the previous node on the lattice and managing the indexes of the codons with the highest and lowest rankings. The external right and left diagonals of the lattice contain (P − 1)L + 1 nodes; therefore the total number of nodes in the lattice is \begin{align*} \sum_{i = 1}^{(P - 1) L + 1} i = \frac {((P - 1) L + 1) ((P - 1) L + 2)} {2} \end{align*}

Thus, the time complexity of building the lattice is \begin{align*} O \left(\frac {((P - 1) L + 1) ((P - 1) L + 2)} {2} \right) = O (L^2P^2) \end{align*}
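As a worked instance of the node count, consider the lattice dimensions used in Figures 9 and 10, L = 3 and P = 4. The outer diagonal length and the total number of nodes (counting a series once for every position at which it appears in the lattice) then evaluate to \begin{align*} (P - 1) L + 1 = 3 \cdot 3 + 1 = 10, \qquad \sum_{i = 1}^{10} i = \frac {10 \cdot 11} {2} = 55 \end{align*}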

4.6.3. Memory complexity analysis for building the lattice

As shown in the previous section, there are $\frac {((P - 1) L + 1) ((P - 1) L + 2)} {2}$ nodes in the lattice, each containing an array storing a sequence of length L. As already mentioned in Section 4.6.1, each series in the lattice differs from its parent series by exactly one codon, whose ranking is lower by exactly one. Therefore, each node of the lattice needs to contain only the index of the changed codon (relative to its parent). Moreover, the heuristic can be implemented by building the outer right diagonal and then building only one internal diagonal at a time from one of the nodes of the outer right diagonal; therefore only two diagonals need to be stored in memory simultaneously. The longest diagonal has (P − 1)L + 1 nodes; thus, the space complexity is O(PL).
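The compact representation described above can be illustrated with a short Python sketch, in which a diagonal is stored as its first series plus one changed-codon index per subsequent node. The function name and the list layout are hypothetical; the stored indices correspond to applying the Left Rule from [4 4 4] with 0-based positions, assuming that series are written from the 5′ end to the 3′ end.

```python
def node_from_indices(root, changed, k):
    """Rebuild the k-th series on a diagonal from the root series and
    the stored index of the codon changed at each step."""
    series = list(root)
    for idx in changed[:k]:
        series[idx] -= 1      # each child lowers exactly one rank by one
    return series


root = [4, 4, 4]
changed = [2, 1, 0, 2, 1, 0, 2, 1, 0]   # one integer per node after the root
print(node_from_indices(root, changed, 4))
```

With this layout a diagonal of (P − 1)L + 1 nodes costs one series of length L plus (P − 1)L integers, at the price of replaying up to k decrements to recover the k-th series.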

4.6.4. Time complexity analysis for search heuristic

Given a set of N sorted numbers, the time complexity of a binary search is O(log(N)). The time complexity of calculating the translation rate given a sequence of L codons and a translation model is O(f(L)); therefore the time complexity of finding the closest possible rate on a diagonal of N sequences is O(log(N) f(L)). Given (P − 1)L + 1 internal diagonals of length N, the time complexity of finding the closest rate in each diagonal is therefore \begin{align*} O (((P - 1) L + 1) \log (N) \;f (L)) = O (PL \log (N) \;f (L)) \end{align*}

However, the diagonals in the lattice are not of equal length; their lengths range from 1 to (P − 1)L + 1, reducing the time complexity to \begin{align*} O \left(\sum_{i = 1}^{(P - 1) L + 1} \log (i) f (L) \right) & = O \left(f (L) \log \left(\prod_{i = 1}^{(P - 1) L + 1}i \right) \right) \\ & = O (f (L) ((P - 1) L) \log ((P - 1) L + 1)) \\ & = O (f (L) PL \log (PL)) \end{align*}

This is of course an upper bound for the search heuristic. Using the optimization suggested in Section 2.6, on average the length of each diagonal can be reduced by a factor of 2 (relative to its previous diagonal), giving the following complexity: \begin{align*} & O \left( f (L) \left\{ \log ((P - 1) L + 1) + \log \left( \frac {(P - 1) L + 1} {2} \right) + \log \left( \frac {(P - 1) L + 1} {4} \right) + \cdots + 1 \right\} \right) \\ & \quad = O \left( f (L) \left\{ (\log ((P - 1) L + 1))^2 + \left( 1 + \frac {1} {2} + \cdots + 2^{- \log ((P - 1) L + 1)} \right) \right\} \right) \\ & \quad = O (f (L) (\log ((P - 1) L + 1))^2) \\ & \quad = O (f (L) (\log (PL))^2) \end{align*}

Finally, to find the closest rate among all candidate rates (one series per internal diagonal) we need another (P − 1)L + 1 comparisons, overall attaining a time complexity of \begin{align*} O \left(f (L) (\log (PL))^2 \right) + O (PL) \end{align*}

In general, O(f(L)) ≫ O(PL); therefore, the last term can be removed, resulting in a time complexity of \begin{align*} O (f (L) (\log (PL))^2) \end{align*}
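The per-diagonal step of the search can be sketched as a binary search that evaluates the translation model only O(log(N)) times. The Python code below is a minimal illustration under stated assumptions: `closest_on_diagonal` and `toy_rate` are hypothetical names, `toy_rate` is a made-up monotone stand-in for a real translation model (it is neither DTASEP nor TASEP), and the diagonal is the Left Rule diagonal for L = 3 and P = 4 written out by hand.

```python
def closest_on_diagonal(diagonal, rate_fn, target):
    """Return (index, series) of the series whose rate is closest to target.

    Assumes the rate is non-decreasing along the diagonal, as stated for
    parent and child nodes in the lattice.
    """
    lo, hi = 0, len(diagonal) - 1
    while lo < hi:                      # find the first rate >= target
        mid = (lo + hi) // 2
        if rate_fn(diagonal[mid]) < target:
            lo = mid + 1
        else:
            hi = mid
    # The closest series is either that node or the one just before it.
    neighbours = [lo] if lo == 0 else [lo - 1, lo]
    best = min(neighbours, key=lambda i: abs(rate_fn(diagonal[i]) - target))
    return best, diagonal[best]


def toy_rate(series):
    # Hypothetical stand-in: lower ranks (faster codons) give a higher rate.
    return 1.0 / sum(series)


left_diagonal = [[4, 4, 4], [4, 4, 3], [4, 3, 3], [3, 3, 3], [3, 3, 2],
                 [3, 2, 2], [2, 2, 2], [2, 2, 1], [2, 1, 1], [1, 1, 1]]
index, series = closest_on_diagonal(left_diagonal, toy_rate, target=0.12)
print(index, series)
```

Replacing `toy_rate` with a function that runs an actual translation model would make each probe cost O(f(L)), which is the source of the f(L) factor in the expressions above.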

4.6.5. Time complexity analysis for mapping ranking series to real translation time series

A translation table from codon rank order to real translation times can be built once for each amino acid type, with a time complexity of O(P). Therefore, the mapping of each one of the rank orders of a series to its real translation time can be done in O(1). Each series contains L codons; therefore, the time complexity required to map a ranking series to a translation time series is O(L). The mapping needs to be applied to a rank order series only when the binary search on a diagonal requires the translation rate of that specific series. In the previous section, it was shown that the translation model is activated O((log(PL))²) times; therefore the mapping time complexity is \begin{align*} O (L (\log (PL))^2) \end{align*}
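A possible construction of such a translation table for a single amino acid is sketched below in Python. The function name, the available ranks, and the translation times are hypothetical. Following the caption of Figure 12, a rank without a real codon is assumed to map to the nearest available slower codon (higher rank); ranks above the slowest available codon are assumed to fall back to that codon, which is a choice made for this sketch only.

```python
def build_rank_table(P, available):
    """Build a rank -> translation time table for one amino acid in O(P).

    `available` maps the ranks that correspond to real synonymous codons
    to their translation times. Index 0 of the returned list is unused.
    """
    table = [None] * (P + 1)
    current = available[max(available)]   # fallback: slowest real codon
    for rank in range(P, 0, -1):          # scan from slowest to fastest
        if rank in available:
            current = available[rank]
        table[rank] = current
    return table


# Hypothetical amino acid with three synonymous codons among P = 10 ranks.
table = build_rank_table(10, {2: 0.8, 5: 1.5, 9: 3.0})
rank_series = [9, 6, 5, 3, 1]
times = [table[r] for r in rank_series]   # O(1) lookup per codon
print(times)
```

Because the table is non-decreasing in the rank, a descending rank series maps to a non-increasing series of translation times.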

4.6.6. General time and memory complexity analysis

The general time complexity is determined by building and searching on the lattice, resulting in \begin{align*} O (L^2P^2) + O (L (\log (PL))^2) + O (f (L) (\log (PL))^2) \end{align*}

Given that O(f(L)) ≫ O(L²P²), the overall time complexity is \begin{align*} O (f (L) (\log (PL))^2) \end{align*}

The memory complexity is determined by the dimension of the lattice, resulting in O(PL) space complexity.

4.6.7. Implementation optimizations

This MRTR heuristic can be implemented using the DTASEP or TASEP models to calculate the translation rate of a given series. However, because DTASEP parameters are deterministic, fewer iterations (by a few orders of magnitude) are needed to reach the steady state translation rate than with TASEP. To shorten running times, MRTR can first be run with DTASEP, and its results can then be revalidated with TASEP.

4.7. Applying MRTR on insulin gene—technical details

The human insulin sequence was taken from the National Center for Biotechnology Information and is depicted in Figure A1. Codon translation times were taken from Tuller et al. (2010a).

The genes in both demonstrations were engineered with MRTR, using the DTASEP model and also re-validated with TASEP. These models were run with the parameters appearing in Table 3.

Table 3.

Parameter Values for DTASEP and TASEP Used for the Demonstration of the MRTR Heuristic

Variable	Value
B₀ (for DTASEP)	11
B₀ (for TASEP)	11
H (in codon units)	11
ɛ for producing engineered gene 1	2
ɛ for producing engineered gene 2	5
Initial number of iterations for TASEP	2.5 × 10⁴
Steady state number of iterations for TASEP	10⁶

4.8. DTASEP, TASEP, and MRTR running times—a short benchmark

DTASEP, TASEP, and MRTR running times were measured on a quad-core workstation with 4 GB of memory running the Ubuntu 11.04 operating system. The code was implemented and run in Matlab 7.8, 32-bit.

For benchmarking purposes, DTASEP and TASEP were run 100 times on the insulin gene presented in Figure A1 to get an average running time of the models, using the parameters presented in Table 3. The MRTR running time was measured for generating both engineered genes. The results are presented in Table 4.

Table 4.

Average Running Times of DTASEP and TASEP, and Running Time of MRTR (for Engineering Gene 1 and Gene 2), on the Human Insulin Gene Described in Figure A1

Task	Running time
DTASEP (wild-type gene)	0.94 msec
TASEP (wild-type gene)	76 sec
MRTR (engineered gene 1)	1347 sec
MRTR (engineered gene 2)	1352 sec

Appendix

Appendix A

FIG. A1.

Wild-type human insulin sequence.

FIG. A2.

Engineered insulin sequence achieving similar translation rate as the wild-type with minimal ribosomal allocation

FIG. A3.

Engineered insulin sequence achieving maximal translation rate with similar ribosomal allocation as the wild-type

Appendix B

Another deterministic model describing the translation process, the ribosome flow model (RFM), was suggested by Reuveni et al. (2011). The mRNA strand was also described by a 1-D lattice of L sites, and the ribosome length was set to the size of a single codon. To overcome this limitation, mRNA molecules were coarse grained into chunks of C sites. This parameter was associated with the ribosome's various geometrical properties and selected by maximizing the correlation between the translation rates of this model and protein abundance. Given that the first site is not occupied, the ribosome initiation rate was denoted by λ (which is equivalent to the B⁻¹ parameter in DTASEP). A ribosome occupying site i moves with a rate λ_i to site i + 1, given that the latter is not occupied by another ribosome (λ_i is equal to U_i⁻¹, used in the DTASEP model). The translation process of this model is described by a set of differential equations, each describing the movement of a ribosome from site i to site i + 1. Their solution supplies the steady state occupation probabilities of each site and the ribosome flow through the system.

B.1. General model description

The following parameters were used to describe the RFM model:

• L - the number of sites (codons) in the translated mRNA

• C - the number of sites in a chunk

• λ - initiation rate, given that the first site is not occupied by another ribosome

• λ_i - the rate of a ribosome translating codon i, given that no ribosome occupies site i + 1

• p_i(t) - the probability that site i is occupied by a ribosome at time t

• π_i - the steady state occupation probability of site i

• R - the steady state ribosome flow through the system

• D - the number of ribosomes per message at steady state, defined as the sum of densities \begin{align*} D = \sum \nolimits_{j = 1}^L \pi_j \end{align*}

• The flow rate of ribosomes into the system is equal to the initiation rate multiplied by the probability of the first site being empty, i.e., \begin{align*} \lambda [ 1 - p_1 (t) ] \end{align*}

• The flow rate of ribosomes at site i is given by the flow entering site i from site i − 1 minus the flow exiting site i to site i + 1 \begin{align*} \frac {dp_i (t)} {dt} = \lambda_{i - 1} p_{i - 1} (t) [ 1 - p_i (t) ] - \lambda_ip_i (t) [ 1 - p_{i + 1} (t) ] \end{align*}

• K - the steady state translation rate per ribosome measure, defined as \begin{align*} K = \frac {R} {D} \end{align*}

The ribosome flow model is described by the following set of differential equations: \begin{align*} \begin{cases} \frac {dp_1 (t)} {dt} = \lambda [ 1 - p_1 (t) ] - \lambda_1p_1 (t) [ 1 - p_2 (t) ] \\ \frac {dp_i (t)} {dt} = \lambda_{i - 1} p_{i - 1} (t) [ 1 - p_i (t) ] - \lambda_ip_i (t) [ 1 - p_{i + 1} (t) ] \qquad 1 < i < L \\ \frac {dp_L (t)} {dt} = \lambda_{L - 1} p_{L - 1} (t) [ 1 - p_L (t) ] - \lambda_L p_L (t) \end{cases} \tag {44} \end{align*}

The steady state solution of eq. (44) is: \begin{align*} \begin{cases}R = \lambda [ 1 - \pi_1 ] = \lambda_1 \pi_1 [ 1 - \pi_2 ] \\ R = \lambda_{i - 1} \pi_{i - 1} [ 1 - \pi_i ] = \lambda_i \pi_i [ 1 - \pi_{i + 1} ] \qquad 1 < i < L \\ R = \lambda_{L - 1} \pi_{L - 1} [ 1 - \pi_L ] = \lambda_L \pi_L\end{cases} \tag{45} \end{align*}

where the steady state occupation probabilities $\{\pi_i\}_{i=1}^L$ are constant in time.
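To make eqs. (44) and (45) concrete, the following Python sketch integrates eq. (44) numerically until the occupation probabilities stop changing and then reads off the flow R. It is an illustration under stated assumptions rather than the procedure used in this study: the forward Euler scheme, the step size, the number of steps, the function name, and all rate values are hypothetical choices.

```python
def rfm_steady_state(lam0, lams, dt=0.01, steps=50_000):
    """Integrate eq. (44) with forward Euler and return (R, pi).

    lam0 is the initiation rate (lambda) and lams[i] is the elongation
    rate of site i + 1 (lambda_1 .. lambda_L).
    """
    L = len(lams)
    p = [0.0] * L                              # start from an empty mRNA
    for _ in range(steps):
        # flow[i] enters site i (0-based); flow[L] leaves the last site.
        flow = [lam0 * (1.0 - p[0])]
        for i in range(L - 1):
            flow.append(lams[i] * p[i] * (1.0 - p[i + 1]))
        flow.append(lams[-1] * p[-1])
        p = [p[i] + dt * (flow[i] - flow[i + 1]) for i in range(L)]
    return lams[-1] * p[-1], p                 # R = lambda_L * pi_L


# Hypothetical rates for a five-site chain.
R, pi = rfm_steady_state(1.0, [2.0, 1.5, 1.0, 2.5, 3.0])
print(R, pi)
print(1.0 * (1.0 - pi[0]))   # entry flow, to compare with R (eq. 45)
```

At steady state all expressions in eq. (45) should agree, so comparing the entry flow with the exit flow is a simple convergence check. Calling the function again with one elongation rate changed gives a numerical way to explore how R responds to a single codon.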

B.2. Correlation between RFM translation rates and PA

The RFM model uses ribosomes of single site length. To overcome this limitation, mRNA sites were coarse grained into chunks whose size was optimized by maximizing the correlation between the PA of S. cerevisiae and the translation rates of its genes. The maximal Spearman correlation coefficient was 0.67 (P < 10⁻¹⁴), obtained with chunks of 23 sites, similar to the results previously obtained by Reuveni et al. (2011).

B.3. Correlation between RFM and TASEP translation rates

To establish the similarity between the RFM and TASEP models, translation rates were calculated for 1000 codon location permutations of 100 different genes from the S. cerevisiae genome. The average Spearman correlation coefficient between TASEP and RFM translation rates was 0.89 (P < 10⁻⁸).

B.4. Characteristics of the RFM model

Lemma 2

A low initiation rate \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\lambda < < \{\lambda_j \}_{j = 1}^L$$ \end{document} becomes the limiting translation rate factor.

Proof

This proof is taken from Reuveni et al. (2011).

As seen from eq. (45), the steady state translation rate R is determined by $\lambda_{i - 1} \pi_{i - 1} [ 1 - \pi_i ]$. The probabilities π_i are by definition smaller than 1, therefore R is limited by the slowest rate in the system (with λ_0 ≡ λ denoting the initiation rate):

\begin{align*} R \leq \min \{\lambda_j \}_{j = 0}^L \tag{46} \end{align*}

Thus, for low initiation rates we get that

\begin{align*} R \leq \lambda \ll \min \{\lambda_j \}_{j = 1}^L \tag{47} \end{align*}
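A first-order estimate makes the bound concrete. The following is a sketch added for illustration and is not part of the proof by Reuveni et al. (2011); it assumes that all occupancies are small, which follows from R ≤ λ when λ is much smaller than every λ_j:

\begin{align*} \pi_L &= \frac{R}{\lambda_L} \leq \frac{\lambda}{\lambda_L} \ll 1, \qquad \pi_j = \frac{R}{\lambda_j [ 1 - \pi_{j + 1} ]} \approx \frac{R}{\lambda_j} \ll 1, \\ R &= \lambda [ 1 - \pi_1 ] \approx \lambda \left[ 1 - \frac{R}{\lambda_1} \right] \;\Longrightarrow\; R \approx \frac{\lambda}{1 + \lambda / \lambda_1} \approx \lambda \left( 1 - \frac{\lambda}{\lambda_1} \right) \end{align*}

For instance, with λ = 0.01 and λ_1 = 1 this estimate gives R ≈ 0.0099, about 1% below the initiation rate, so R is set almost entirely by λ.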

Lemma 3

A positive change to one of the rates λ_i increases the protein production rate R.

Proof

Let us assume, by contradiction, that a positive change in one of the rates λ_i decreases R.

A general change of Δλ to λ_i, 1 < i < L, will change the steady state solution of eq. (45) to:

\begin{align*} \begin{cases}R + \Delta R = \lambda [ 1 - \pi_1 - \Delta \pi_1 ] = \lambda_1 (\pi_1 + \Delta \pi_1) [ 1 - (\pi_2 + \Delta \pi_2) ] \\ R + \Delta R = \lambda_{j - 1} (\pi_{j - 1} + \Delta \pi_{j - 1}) [ 1 - \pi_j - \Delta \pi_j ] \\ \quad \quad \quad \ \; = \lambda_j (\pi_j + \Delta \pi_j) [ 1 - \pi_{j + 1} - \Delta \pi_{j + 1} ] \quad 1 \leq \, j < L, j \neq i \\ R + \Delta R = \lambda_{i - 1} (\pi_{i - 1} + \Delta \pi_{i - 1}) [ 1 - \pi_i - \Delta \pi_i ] \\ \quad \quad \quad \ \; = (\lambda_i + \Delta \lambda) (\pi_i + \Delta \pi_i) [ 1 - \pi_{i + 1} - \Delta \pi_{i + 1} ] \ 1 \leq \,i < L \\ R + \Delta R = \lambda_{L - 1} (\pi_{L - 1} + \Delta \pi_{L - 1}) [ 1 - \pi_L - \Delta \pi_L ] \\ \quad \quad \quad \ \; = \lambda_L (\pi_L + \Delta \pi_L) \end{cases} \tag{48} \end{align*}

After subtracting eq. (45) from eq. (48) we get that:

\begin{align*} \begin{cases} \Delta R = - \lambda \Delta \pi_1 \\ \Delta R = \lambda_j \Delta \pi_j (1 - \pi_{j + 1} - \Delta \pi_{j + 1}) - \lambda_j \pi_j \Delta \pi_{j + 1} \ 1 \leq \,j < L, j \neq i \\ \Delta R = (\lambda_i \Delta \pi_i + \Delta \lambda \pi_i + \Delta \lambda \Delta \pi_i) (1 - \pi_{i + 1} - \Delta \pi_{i + 1}) - \lambda_i \pi_i \Delta \pi_{i + 1} \ 1 \leq i < L \\ \Delta R = \lambda_L \Delta \pi_L\end{cases} \tag{49} \end{align*}

All rates {λ_i} and occupation probabilities {π_i} are positive by definition; therefore, if ΔR < 0, the last line of eq. (49) gives

\begin{align*} \Delta R = \lambda_L \Delta \pi_L \end{align*}

resulting in Δπ_L < 0.

Using Δπ_L < 0, we can determine ΔR for j = L − 1 with the help of eq. (49), getting

\begin{align*} \Delta R = \lambda_{L - 1} \Delta \pi_{L - 1} (1 - \pi_L - \Delta \pi_L) - \lambda_{L - 1} \pi_{L - 1} \Delta \pi_L \tag{50} \end{align*}

If ΔR < 0 then

\begin{align*} \lambda_{L - 1} \Delta \pi_{L - 1} (1 - \pi_L - \Delta \pi_L) - \lambda_{L - 1} \pi_{L - 1} \Delta \pi_L < 0 \tag{51} \end{align*}

therefore

\begin{align*} \lambda_{L - 1} \Delta \pi_{L - 1} (1 - \pi_L - \Delta \pi_L) < \lambda_{L - 1} \pi_{L - 1} \Delta \pi_L \tag{52} \end{align*}

But

1. λ_{L−1} > 0

2. (1 − π_L − Δπ_L) > 0

3. π_{L−1} > 0

4. Δπ_L < 0

therefore it can be derived from eq. (52) that

\begin{align*} \Delta \pi_{L - 1} < 0 \tag{53} \end{align*}

Assuming ΔR < 0, we can show in the same manner that

\begin{align*} \{\Delta \pi_j \} < 0, \quad i + 1 \,\leq \,j \,\leq\, L \tag{54} \end{align*}

By checking eq. (49) for a general i we get that

\begin{align*} \Delta R = (\lambda_i \Delta \pi_i + \Delta \lambda \pi_i + \Delta \lambda \Delta \pi_i) (1 - \pi_{i + 1} - \Delta \pi_{i + 1}) - \lambda_i \pi_i \Delta \pi_{i + 1} \quad 1 \leq \,i < L \tag{55} \end{align*}

Using the assumption ΔR < 0 we get from eq. (55) that

\begin{align*} (\lambda_i \Delta \pi_i + \Delta \lambda \pi_i + \Delta \lambda \Delta \pi_i) (1 - \pi_{i + 1} - \Delta \pi_{i + 1}) - \lambda_i \pi_i \Delta \pi_{i + 1} < 0 \tag{56} \end{align*}

Again, this inequality can be rewritten as

\begin{align*} (\lambda_i \Delta \pi_i + \Delta \lambda \pi_i + \Delta \lambda \Delta \pi_i) (1 - \pi_{i + 1} - \Delta \pi_{i + 1}) < \lambda_i \pi_i \Delta \pi_{i + 1} \tag{57} \end{align*}

By using λ_i > 0, 0 < π_i < 1, and Δπ_{i+1} < 0 in the last inequality, its right-hand side is negative, and we get that

\begin{align*} (\lambda_i \Delta \pi_i + \Delta \lambda \pi_i + \Delta \lambda \Delta \pi_i) (1 - \pi_{i + 1} - \Delta \pi_{i + 1}) < 0 \tag{58} \end{align*}

Since 0 < π_{i+1} < 1 and Δπ_{i+1} < 0, the right factor satisfies (1 − π_{i+1} − Δπ_{i+1}) > 0, therefore we can reduce the last inequality to

\begin{align*} \Delta \pi_i (\lambda_i + \Delta \lambda) + \Delta \lambda \pi_i < 0 \tag{59} \end{align*}

However, Δλ π_i ≥ 0 and (λ_i + Δλ) > 0; therefore we conclude that Δπ_i < 0.

Using this, and repeating the argument of eqs. (50)–(53) for the sites preceding i, we can show that for a change Δλ to λ_i, 1 ≤ i < L, if ΔR < 0 then

\begin{align*} \{\Delta \pi_j \} < 0, \quad 1 \,\leq \,j\, \leq \,L \tag{60} \end{align*}

When the change Δλ > 0 is applied to λ_L, we get from eq. (49) that

\begin{align*} \begin{cases} \Delta R = - \lambda \Delta \pi_1 \\ \Delta R = \lambda_j \Delta \pi_j (1 - \pi_{j + 1} - \Delta \pi_{j + 1}) - \lambda_j \pi_j \Delta \pi_{j + 1} \quad 1 \leq j < L \\ \Delta R = (\lambda_L + \Delta \lambda) \Delta \pi_L + \pi_L \Delta \lambda\end{cases} \tag{61} \end{align*}

By checking the equation

\begin{align*} \Delta R = (\lambda_L + \Delta \lambda) \Delta \pi_L + \pi_L \Delta \lambda \tag{62} \end{align*}

under the assumption that ΔR < 0 we get that

\begin{align*} (\lambda_L + \Delta \lambda) \Delta \pi_L + \pi_L \Delta \lambda < 0 \tag{63} \end{align*}

By definition λ_L > 0, Δλ > 0, and π_L > 0, therefore Δπ_L < 0. Using this, we can similarly show that for a change Δλ > 0 applied to λ_L, the assumption ΔR < 0 implies

\begin{align*} \{\Delta \pi_j \} < 0, \quad 1 \,\leq \,j \,\leq\, L \tag{64} \end{align*}

We have therefore shown that if a positive change Δλ > 0 to any λ_j, 1 ≤ j ≤ L, causes a decrease in the protein production rate (ΔR < 0), then all the occupation probabilities also decrease, i.e., {Δπ_j} < 0.

However, by definition (see eq. (49))

\begin{align*} \Delta R = - \lambda \Delta \pi_1 \tag{65} \end{align*}

For ΔR < 0 we showed that Δπ_1 < 0. Again, by definition λ > 0, therefore

\begin{align*} - \lambda \Delta \pi_1 > 0 \tag{66} \end{align*}

contradicting our assumption.

Therefore, we conclude that for any positive change Δλ > 0 in any of the λ_j, 1 ≤ j ≤ L, the production rate R cannot decrease, i.e., ΔR ≥ 0.

Let us show that for Δλ > 0, the solution of eq. (45) must change. Let us define

\begin{align*} \underline{\lambda}_1 = [ \lambda_1, \lambda_2, \ldots, \lambda_L ]^T \tag{67} \end{align*}

and let us define

\begin{align*} \underline{\lambda}_2 = [ \lambda_1, \ldots, \lambda_j + \Delta \lambda, \ldots, \lambda_L ]^T \tag{68} \end{align*}

Let us denote the solution of eq. (45) given $\underline{\lambda}_1$ as $\underline{\pi}_1$ and the solution of eq. (45) given $\underline{\lambda}_2$ as $\underline{\pi}_2$.

Claim

For the defined $\underline{\lambda}_1, \underline{\lambda}_2$, the solutions $\underline{\pi}_1, \underline{\pi}_2$ of eq. (45) must be different, i.e., $\underline{\pi}_1 \neq \underline{\pi}_2$.

Proof

Let us assume $\underline{\pi}_1 = \underline{\pi}_2$. Then, from the last line of eq. (45), we get that

\begin{align*} \lambda_L \pi_{1, L} = \lambda_L \pi_{2, L} = R \tag{69} \end{align*}

therefore the production rate R is identical in both cases. This means that the j-th equation in the equation set (45) fulfills

\begin{align*} \lambda_j \pi_{1, j} [ 1 - \pi_{1, j + 1} ] = (\lambda_j + \Delta \lambda) \pi_{2, j} [ 1 - \pi_{2, j + 1} ] = R \tag{70} \end{align*}

Because $\underline{\pi}_1 = \underline{\pi}_2$, we get that the last relation is correct only if

\begin{align*} \Delta \lambda = 0 \tag{71} \end{align*}

contradicting the assumption Δλ > 0.

In the same manner we can also contradict the assumption when

\begin{align*} \underline{\lambda}_2 = [ \lambda_1 + \Delta \lambda, \ldots, \lambda_j, \ldots, \lambda_L ]^T \tag{72} \end{align*}

or

\begin{align*} \underline{\lambda}_2 = [ \lambda_1, \ldots, \lambda_j, \ldots, \lambda_L + \Delta \lambda ]^T \tag{73} \end{align*}

Because {π_j} is the steady state solution of eq. (45), for Δλ > 0 the occupation probabilities {π_j} have to change. Since R cannot decrease, any positive change Δλ > 0 increases the protein production rate R. ▪
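The simplest case illustrates the lemma in closed form. As a sketch only, take L = 1 and assume that eq. (45) then reduces to the single balance between initiation and termination:

\begin{align*} R = \lambda [ 1 - \pi_1 ] = \lambda_1 \pi_1 \;\Longrightarrow\; \pi_1 = \frac{\lambda}{\lambda + \lambda_1}, \qquad R = \frac{\lambda \lambda_1}{\lambda + \lambda_1}, \qquad \frac{\partial R}{\partial \lambda_1} = \frac{\lambda^2}{(\lambda + \lambda_1)^2} > 0 \end{align*}

Here an increase in λ_1 lowers the occupancy π_1 and raises R, consistent with the relation ΔR = −λ Δπ_1 used in the proof.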

Lemma 4

Adding a new codon to the mRNA in the RFM model decreases the translation rate R.

Proof

See the proof for the DTASEP model.

B.5. The correctness of Lemma 5 and Lemma 6 for RFM

Using DTASEP, it was proven that given a gene with a finite set of codons $\{\lambda_j \}_{j = 1}^L$, sorting its codons in descending order according to their translation times (or in ascending order according to their translation rates) achieves minimal ribosomal allocation (Lemma 5) and maximal translation rate per ribosome (Lemma 6) among all codon permutations of this gene.

Simulations of the RFM indicate that this claim holds for most codon permutations. However, some scenarios were found not to obey this rule. For example, for λ = 10 and $\{\lambda_j \}_{j = 1}^4 = [1, 5, 8, 10]$, the arrangement [1,5,10,8] achieves a lower ribosomal density than the arrangement sorted by translation efficiency, [1,5,8,10] (D = 1.3047 vs. D = 1.3071). To rule out the possibility that this difference results from numerical inaccuracies of the differential-equation toolbox used for solving eq. (44), R was calculated analytically from eq. (45), resulting in

\begin{align*} 1 - R / \lambda = \frac {R / \lambda_1} {1 - \frac {R / \lambda_2} {1 - \frac {R / \lambda_3} {1 - R / \lambda_4}}} \tag {74} \end{align*}

as shown also by Reuveni et al. (2011).

This equation can be rewritten as a cubic equation

\begin{align*} \frac {R^3} {\lambda \lambda_2 \lambda_4} - R^2 \left(\frac {1} {\lambda} \left(\sum_{j = 2}^{4} \frac {1} {\lambda_j} \right) + \frac {1} {\lambda_1 \lambda_3} + \frac {1} {\lambda_1 \lambda_4} + \frac {1} {\lambda_2 \lambda_4} \right) + R \left(\frac {1} {\lambda} + \sum_{j = 1}^4 \frac {1} {\lambda_j} \right) - 1 = 0 \tag {75} \end{align*}

and analytically solved, supporting the presented findings.
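As a numerical cross-check of eq. (75), the cubic can be solved directly and the occupancies recovered by back-substitution into eq. (45). The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that the physical root is the one lying between 0 and the slowest rate, which it checks by bracketing rather than proves.

```python
def rfm4_rate_from_cubic(lam, l1, l2, l3, l4, iters=200):
    """Root of the cubic in eq. (75) on (0, slowest rate), found by bisection."""
    c3 = 1.0 / (lam * l2 * l4)
    c2 = ((1.0 / l2 + 1.0 / l3 + 1.0 / l4) / lam
          + 1.0 / (l1 * l3) + 1.0 / (l1 * l4) + 1.0 / (l2 * l4))
    c1 = 1.0 / lam + 1.0 / l1 + 1.0 / l2 + 1.0 / l3 + 1.0 / l4

    def cubic(R):
        return ((c3 * R - c2) * R + c1) * R - 1.0

    lo, hi = 0.0, min(lam, l1, l2, l3, l4)
    if cubic(hi) <= 0.0:  # cubic(0) = -1, so a sign change is needed at hi
        raise ValueError("root not bracketed; the assumption above fails")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cubic(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def occupancies(R, rates):
    """Back-substitute eq. (45) from the last site to the first."""
    pi, nxt = [], 0.0
    for rate in reversed(rates):
        nxt = R / (rate * (1.0 - nxt))
        pi.append(nxt)
    return pi[::-1]


lam, rates = 10.0, [1.0, 5.0, 8.0, 10.0]
R = rfm4_rate_from_cubic(lam, *rates)
pi = occupancies(R, rates)
# Consistency with the first line of eq. (45): lam * (1 - pi_1) should equal R.
print(R, lam * (1.0 - pi[0]))
```

The two printed numbers agree when the root of the cubic is also a solution of eq. (45), which is the consistency the analytical calculation relies on.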

Another example, contradicting Lemma 6, was found for λ = 10 and $\{\lambda_j \}_{j = 1}^3 = [ 1, 2, 3 ]$, where the maximal ratio K is achieved by the permutation [1,3,2], resulting in K = 0.4082, while the ascending arrangement [1,2,3] achieves only K = 0.3933.
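For three codons, eq. (45) reduces to a quadratic in R, so this comparison can be reproduced in a few lines. The Python sketch below is illustrative: the function name is hypothetical, and K is assumed here to be the translation rate divided by the total occupancy, K = R / (π_1 + π_2 + π_3), a definition under which the values reported for the permutation [1,3,2] and for the ascending arrangement [1,2,3] are recovered.

```python
import math


def rfm3_rate_per_ribosome(lam, l1, l2, l3):
    """Return (R, K) for a three-codon RFM at steady state (illustrative sketch)."""
    # Eliminating pi_1..pi_3 from eq. (45) with L = 3 gives
    #   (1 - R/lam) * (1 - R/l2 - R/l3) = (R/l1) * (1 - R/l3),
    # i.e. a*R^2 + b*R + 1 = 0 with the coefficients below.
    a = (1.0 / l2 + 1.0 / l3) / lam + 1.0 / (l1 * l3)
    b = -(1.0 / lam + 1.0 / l1 + 1.0 / l2 + 1.0 / l3)
    R = (-b - math.sqrt(b * b - 4.0 * a)) / (2.0 * a)  # smaller root, assumed physical
    pi3 = R / l3
    pi2 = R / (l2 * (1.0 - pi3))
    pi1 = R / (l1 * (1.0 - pi2))
    if not all(0.0 < p < 1.0 for p in (pi1, pi2, pi3)):
        raise ValueError("smaller root does not give valid occupancies")
    return R, R / (pi1 + pi2 + pi3)  # K: assumed to be rate per unit occupancy


for order in ([1.0, 3.0, 2.0], [1.0, 2.0, 3.0]):
    R, K = rfm3_rate_per_ribosome(10.0, *order)
    print(order, round(K, 4))
```

Under this assumed definition the script gives K ≈ 0.4082 for [1,3,2] and K ≈ 0.3933 for [1,2,3].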

Acknowledgments

We would like to thank Elchanan Mossel for helpful discussions. T.T. was partially supported by a Koshland fellowship at Weizmann Institute of Science.

Disclosure Statement

No competing financial interests exist.

References

Abrahmsén, Moks, Nilsson et al. 1986. Secretion of heterologous gene products to the culture medium of Escherichia coli. Nucleic Acids Res., 14:7487–7500.

Alberts, Johnson, Lewis et al. 2002. Molecular biology of the cell. Garland Science.

Arava, Wang, Storey J.D. et al. 2003. Genome-wide analysis of mRNA translation profiles in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA, 100:3889–3894.

Burgess-Brown N.A., Sharma, Sobott et al. 2008. Codon optimization can improve expression of human genes in Escherichia coli: a multi-gene study. Protein Express. Purif., 59:94–102.

Cannarozzi, Schraudolph N.N., Faty et al. 2010. A role for codon order in translation dynamics. Cell, 141:355–367.

Comeron J.M., Aguadé 1998. An evaluation of measures of synonymous codon usage bias. J. Mol. Evol., 47:268–274.

Drummond D.A., Wilke C.O. 2009. The evolutionary consequences of erroneous protein synthesis. Nat. Rev. Genet., 10:715–724.

Duret, Mouchiroud 1999. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. USA, 96:4482–4487.

Gingold, Pilpel 2011. Determinants of translation efficiency and accuracy. Mol. Syst. Biol., 7:481.

Gu, Zhou, Wilke C.O. 2010. A universal trend of reduced mRNA stability near the translation-initiation site in prokaryotes and eukaryotes. PLoS Comput. Biol., 6:e1000664.

Gustafsson, Govindarajan, Minshull 2004. Codon bias and heterologous protein expression. Trends Biotechnol., 22:346–353.

Heinrich, Rapoport T.A. 1980. Mathematical modelling of translation of mRNA in eucaryotes; steady states, time-dependent processes and application to reticulocytes. J. Theor. Biol., 86:279–313.

Ikemura 1982. Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs. J. Mol. Biol., 158:573–597.

Ingolia N.T., Ghaemmaghami, Newman J.R.S. et al. 2009. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science, 324:218–223.

Kozak 1987. An analysis of 5′-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Res., 15:8125–8148.

Kudla, Murray A.W., Tollervey et al. 2009. Coding-sequence determinants of gene expression in Escherichia coli. Science, 324:255–258.

Kurt, Michael 2010. How the sequence of a gene can tune its translation. Cell, 141:227–229.

MacDonald C.T., Gibbs J.H., Pipkin A.C. 1968. Kinetics of biopolymerization on nucleic acid templates. Biopolymers, 6:1–5.

Moks, Abrahmsén, Holmgren et al. 1987. Expression of human insulin-like growth factor I in bacteria: use of optimized gene fusion vectors to facilitate protein purification. Biochemistry, 26:5239–5244.

Mueller, Coleman J.R., Papamichail et al. 2010. Live attenuated influenza virus vaccines by computer-aided rational design. Nat. Biotechnol., 28:723–726.

Newman J.R.S., Ghaemmaghami, Ihmels et al. 2006. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature, 441:840–846.

Plotkin J.B., Kudla 2010. Synonymous but not the same: the causes and consequences of codon bias. Nat. Rev. Genet., 12:32–42.

Raab, Graf, Notka et al. 2010. The GeneOptimizer Algorithm: using a sliding window approach to cope with the vast sequence space in multiparameter DNA sequence optimization. Syst. Synth. Biol., 4:215–225.

Reuveni, Meilijson, Kupiec et al. 2011. Analysis of translation elongation with a ribosome flow model. PLoS Comput. Biol., 7:e1002127.

Romanos A.M., Scorer C.A., Clare J.J. 1992. Foreign gene expression in yeast: a review. Yeast, 8:423–488.

Sharp P.M., Li W.H. 1987. The codon Adaptation Index–a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res., 15:1281–1295.

Shaw L.B., Zia R.K.P., Lee K.H. 2003. Totally asymmetric exclusion process with extended objects: a model for protein synthesis. Phys. Rev. E, 68:021910.

Taniguchi, Choi P.J., Li et al. 2010. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science, 329:533–538.

Thomas C.E., Ming-Qun 2011. Heterologous gene expression in E. coli. Humana Press: Totowa, NJ.

Tuller, Carmi, Vetsigian et al. 2010a. An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell, 141:344–354.

Tuller, Waldman Y.Y., Kupiec et al. 2010b. Translation efficiency is determined by both codon bias and folding energy. Proc. Natl. Acad. Sci. USA, 107:3645–3650.

Tuller, Veksler, Gazit et al. 2011. Composite effects of gene determinants on the translation speed and density of ribosomes. Genome Biol., 12:R110.

Uemura, Aitken C.E., Korlach et al. 2010. Real-time tRNA transit on single translating ribosomes at codon resolution. Nature, 464:1012–1017.

Vogel, de Sousa Abreu, Ko et al. 2010. Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line. Mol. Syst. Biol., 6:400.

Welch, Govindarajan, Ness J.E. et al. 2009. Design parameters to control synthetic gene expression in Escherichia coli. PLoS ONE, 4:e7002.

Wenzel S.C., Müller 2005. Recent developments towards the heterologous expression of complex bacterial natural product biosynthetic pathways. Curr. Opin. Biotechnol., 16:594–606.

Zhang, Goldman, Zubay 1994. Clustering of low usage codons and ribosome movement. J. Theor. Biol., 170:339–354.