Abstract

The last comprehensive treatment of the problem of the protein-coding alphabet was the Weber and Miller study of 1981 (Weber and Miller, 1981), which gave a case-by-case account of why each of the 20 standard amino acids might be a useful addition to proteins and why other amino acids found by prebiotic synthesis in Urey-Miller experiments or on the Murchison meteorite might be absent. Since then, although considerable work has been done on the order in which amino acids were added to the genetic code (Trifonov, 2000; Di Giulio and Amato, 2009; Higgs, 2009), relatively little work has focused on understanding the structure of the protein-coding alphabet.
Many characteristics of amino acids have been frequently noted as inherently important for protein structure formation; these include charge, hydrophobicity, side chain volume, and propensity to appear in specific secondary structures. Similarly, many factors might be important for metabolic/biological reasons, such as the availability or complexity of synthesis (including prebiotic synthesis). Studies of amino acid properties that avoid the problems inherent in using exchange of amino acids over evolutionary time in modern proteins (Di Giulio, 2001) support hydrophobicity, size, and charge as important features of how the amino acids differ from one another (Atchley et al., 2005; Yampolsky and Stoltzfus, 2005; Stoltzfus and Yampolsky, 2007). These features have also been extensively used in studies of the optimality of the genetic code itself (see Knight et al., 1999, for review).
In this paper (Philip and Freeland, 2011), Philip and Freeland go beyond Lu and Freeland's earlier work (Lu and Freeland, 2006, 2008), and beyond other quantitative models of the genetic code (Ardell and Sella, 2002; Sella and Ardell, 2002), by defining a metric for the diversity of a pool of amino acids that takes into account not only the range of a parameter of interest (e.g., hydrophobicity) but also how evenly the pool is spread across that range. This measure, the “coverage” of an amino acid pool, is more biologically and chemically meaningful because it scores most highly the pool of amino acids that maximizes heterogeneity with respect to the relevant characteristic.
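To make the distinction between range and evenness concrete, here is a minimal Python sketch. It is not the coverage statistic as defined by Philip and Freeland: the function names `value_range` and `gap_variance`, the use of gap variance as a stand-in for evenness, and the example values are all assumptions chosen for illustration.

```python
from statistics import pvariance

def value_range(values):
    """Span between the smallest and largest property value in the pool."""
    return max(values) - min(values)

def gap_variance(values):
    """Variance of the gaps between neighbouring sorted values.

    Assumed proxy for evenness: 0 means perfectly even spacing,
    larger values mean the pool is more clustered.
    """
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return pvariance(gaps)

# Hypothetical property values (not real amino acid data).
clustered = [0.0, 0.5, 1.0, 9.0]   # same range, most values bunched together
spread = [0.0, 3.0, 6.0, 9.0]      # same range, evenly spaced

for name, pool in (("clustered", clustered), ("spread", spread)):
    print(name, value_range(pool), gap_variance(pool))
```

Both pools span the same range, but only the evenly spaced one has zero gap variance, which is the kind of difference a range-only measure would miss.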
Using this approach, Philip and Freeland show that evolving genetic codes tend to sample the pool of amino acids that are actually found in the 20 far more than chance would predict. Specifically, they simulated 1 million genetic codes, drawing 8 of the 50 plausible prebiotic candidates (or from a larger pool of 76 candidates including biosynthetically but not prebiotically plausible amino acids), and found that the 20 had both greater range and greater evenness than randomly chosen sets for size, charge, and hydrophobicity, or any combination of those features. The results strongly suggest that the 20 are very special compared to other sets of 20 amino acids that might have arisen. They do not, however, appear to be the best of all possible sets of amino acids on these measures; as with the codon assignments of the genetic code itself, they appear to be optimized rather than optimal (Freeland et al., 2000). In other words, although the vast majority of alternative choices are worse, there are better ones to be found when the space is searched extensively.
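The general logic of comparing an observed set against randomly drawn alternatives can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation: the function `fraction_at_least_as_good`, the placeholder property values, and the choice of range as the score are hypothetical, and no real amino acid data are used.

```python
import random

def value_range(values):
    """Span between the smallest and largest value in a set."""
    return max(values) - min(values)

def fraction_at_least_as_good(observed, candidates, score, trials=10_000, seed=0):
    """Fraction of random same-size draws from `candidates` scoring >= the observed set."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    target = score(observed)
    hits = 0
    for _ in range(trials):
        # Draw a random set of the same size, without replacement.
        draw = rng.sample(candidates, len(observed))
        if score(draw) >= target:
            hits += 1
    return hits / trials

# Placeholder property values for a hypothetical pool of 50 candidates.
candidates = [float(i) for i in range(50)]
# A hypothetical observed set of 8 that spans the whole pool.
observed = [0.0, 7.0, 14.0, 21.0, 28.0, 35.0, 42.0, 49.0]

p = fraction_at_least_as_good(observed, candidates, value_range)
print(f"fraction of random sets with range >= observed: {p:.4f}")
```

A small fraction means few random sets match the observed one on the chosen score. The same function could be reused with an evenness score, or a combined score, in place of `value_range`.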
This paper provides an example of how computational approaches are rapidly adding new dimensions of testability to “origin of life” questions. Such approaches work best as companions to bench experiments, acting as a sketch of where best to focus our time and money to fill in the details. As computational chemistry techniques continue to improve, especially in concert with our synthetic biology capabilities, it seems likely that we will soon be able to build novel organisms that will provide a view of ancestral life. Similarly, ancient environments have already been inferred from ancestral state reconstructions of the proteins they contained (Gaucher et al., 2003). The prospects for making fundamental discoveries about previously inaccessible evolutionary pathways thus seem increasingly bright.
Acknowledgments
We would like to thank Dan Knights, Justin Kuczynski, and Laura Wegener Parfrey for useful discussion of this manuscript.
