Abstract
Introduction
Innovation
In this study, we introduced the concept of thioredoxomes (i.e., sets of thiol oxidoreductases in organisms). We also developed computational tools for the identification of these proteins on a genome-wide scale. Using these tools, we characterized thioredoxomes in various organisms, including those of yeast and humans, which in turn revealed the composition and properties of thioredoxomes and, therefore, of thiol-based redox control in general.
Many thiol oxidoreductases have previously been identified and characterized; however, understanding of the overall sets of such proteins in organisms and of their combined and protein-specific functions is quite limited (11, 12). This is because these proteins are difficult to identify by available protein function prediction methods (5–7, 10). Two thiol-based redox systems, the thioredoxin and glutaredoxin systems, have been particularly well characterized (14, 15), with many thousands of publications on each of their components, including thioredoxin, thioredoxin reductase, glutathione reductase, glutaredoxin, glutathione peroxidase, and peroxiredoxin. Additional systems based on thiol oxidoreductases include the prokaryotic disulfide bond formation (Dsb) system (16) and the protein disulfide isomerase-based system located in the endoplasmic reticulum (ER) of eukaryotes (4, 25). Methionine sulfoxide reductases, which are proteins that repair oxidatively damaged methionines, have also received much attention (17, 18, 20), and several other thiol-based redox regulatory processes have been described (12).
Catalytic redox-active Cys residues in thiol oxidoreductases are highly conserved, and many such proteins have homologs in which Cys is replaced with selenocysteine (Sec) (12). In these homologous selenoproteins, Sec is also the catalytic redox-active residue. However, while Cys residues may have many functions, Sec is always a catalytic redox residue. Thus, identification of a Sec residue in the position corresponding to Cys in a protein indicates a redox function of this protein and points to the exact location of the catalytic redox-active Cys residue. Although selenoproteins are relatively rare, the recent dramatic increase in sequence information derived from various genome and other sequencing projects allows efficient identification of both selenoproteins and the corresponding Cys-containing thiol oxidoreductases.
In addition, many thiol oxidoreductases form protein complexes in which reducing equivalents are transferred from one thiol oxidoreductase to another, both containing conserved catalytic Cys residues. If one such protein is a known thiol oxidoreductase, establishing a functional link between the two proteins may identify the second protein as an additional thiol oxidoreductase. For example, many proteins that form thiol oxidoreductase pathways, in which reducing equivalents are transferred from one protein to another, show fusion events. These proteins also show similar transcriptional regulation, and in bacteria, they are often located in the same operons. These considerations suggest a second approach to identify thiol oxidoreductases. Thiol oxidoreductases can also be identified by extensive homology analyses, as many of these proteins cluster within several protein folds but diverge to the extent that standard BLAST analyses do not detect homology (12). Thus, exhaustive homology analyses within the sequence universe offer a third independent approach to detect thiol oxidoreductases.
In this work, we used these three strategies to carry out genome-wide searches for thiol oxidoreductases and further characterized the overall functions of thioredoxomes in organisms. These data illustrate the universal use of thiol oxidoreductases in cellular life and show the essential and widespread functions of thiol-based redox control in regulation of cellular processes.
Results
Genome-wide identification of thiol oxidoreductases
Thiol oxidoreductases are involved in a variety of biological functions (Fig. 1) and are characterized by different folds, protein length, and location of their catalytic redox-active Cys residues (12, 21). Various classes of thiol oxidoreductases show no similarity with regard to patterns of occurrence in organisms. To examine the use of thiol oxidoreduction in organisms across the three domains of life, we sought to identify these enzymes on a genome-wide scale. We used three independent methods: (i) identification of Cys/Sec pairs in homologous sequences; (ii) a comparative genomics approach that functionally links known and unknown thiol oxidoreductases through domain fusions; and (iii) exhaustive homology searches starting from known thiol oxidoreductases.

The first method (Fig. 2A) is based on the observation that the majority of known thiol oxidoreductases have homologs in which the catalytic redox Cys is replaced with Sec. Whereas only some Cys residues serve thiol oxidoreductase functions in proteins (11), Sec always serves a redox function. Thus, identification of a pair of homologs, wherein one is a Sec-containing protein and another a Cys-containing protein, suggests a catalytic redox function for the Cys that aligns with Sec and is flanked by homologous regions. This method is relatively simple and efficient, and it is independent of protein family, protein structure, and organism of origin (12). To identify thiol oxidoreductases on a genome-wide scale, we used Cys-containing protein sequences from the non-redundant NCBI database. We also utilized a nucleotide sequence database that included all completely sequenced genomes, expressed sequence tags (ESTs), and environmental genome projects from NCBI. These datasets were then cross-analyzed with TBLASTN to identify Cys/Sec pairs.
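The core test of the Cys/Sec approach can be illustrated with a minimal Python sketch. It assumes that a pairwise alignment of a Sec-containing protein (Sec written as U) and a Cys-containing homolog is already available as two gapped strings of equal length. The function name, the flank width, and the identity threshold are hypothetical choices made for illustration; they are not parameters reported in this study.

```python
def find_cys_sec_pairs(sec_aln, cys_aln, flank=5, min_flank_identity=0.5):
    """Find alignment columns where Sec (U) aligns with Cys (C).

    Returns a list of (column_index, cys_residue_position) tuples, where the
    residue position is 1-based in the ungapped Cys-containing protein.
    A pair is reported only if the flanking columns are sufficiently
    identical (a crude, assumed stand-in for "flanked by homologous regions").
    """
    if len(sec_aln) != len(cys_aln):
        raise ValueError("aligned sequences must have equal length")

    pairs = []
    cys_pos = 0  # residues seen so far in the ungapped Cys-containing protein
    for col, (s, c) in enumerate(zip(sec_aln, cys_aln)):
        if c != "-":
            cys_pos += 1
        if s != "U" or c != "C":
            continue
        # Collect flanking columns on both sides, excluding the pair itself.
        start = max(0, col - flank)
        end = min(len(sec_aln), col + flank + 1)
        flank_cols = [j for j in range(start, end) if j != col]
        if not flank_cols:
            continue
        identical = sum(
            1 for j in flank_cols
            if sec_aln[j] == cys_aln[j] and sec_aln[j] != "-"
        )
        if identical / len(flank_cols) >= min_flank_identity:
            pairs.append((col, cys_pos))
    return pairs


# Toy sequences invented for illustration only.
toy_pairs = find_cys_sec_pairs("VLVNFWASUCGPC", "VLLDFWATCCGPC")
```

In practice the flanking-conservation test would come from the alignment statistics of the search program rather than from a fixed identity fraction; the fixed fraction is used here only to keep the sketch self-contained.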

The second method (Figs. 2B–2D) took advantage of the observation that thiol oxidoreductases often form functional complexes/modules, wherein one thiol oxidoreductase acts on another. For example, thioredoxin reductase provides reducing equivalents to thioredoxin, which in turn reduces peroxiredoxin. All these proteins are thiol oxidoreductases containing catalytic redox-active Cys residues. These linkages can be captured by comparative genomics methods (i.e., these proteins often form fusions or are located in the same operon in prokaryotes). Continuing with the example shown above, both thioredoxin reductase-thioredoxin and thioredoxin-peroxiredoxin fusion proteins are known, and, sometimes, these proteins cluster in operons. Thus, to identify novel thiol oxidoreductases, we employed known thiol oxidoreductases, searched for domains fused with these proteins in representative organisms, and further filtered out the set to select the fused domains that contained conserved Cys residues.
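The final filtering step of the domain fusion approach, selecting fused domains that contain conserved Cys residues, might look like the following Python sketch. It assumes each candidate domain is represented by a multiple sequence alignment given as a list of equal-length gapped strings. The function names, the 90% conservation threshold, and the toy alignments are hypothetical and are not taken from this study.

```python
def conserved_cys_columns(alignment, min_fraction=0.9):
    """Return column indices where Cys occurs in at least min_fraction of rows."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    columns = []
    for i in range(width):
        n_cys = sum(1 for seq in alignment if seq[i] == "C")
        if n_cys / len(alignment) >= min_fraction:
            columns.append(i)
    return columns


def filter_fused_domains(domains, min_fraction=0.9):
    """Keep only fused domains that have at least one conserved Cys column.

    `domains` maps a domain name to its alignment; the result maps each
    retained domain name to the list of its conserved Cys columns.
    """
    kept = {}
    for name, alignment in domains.items():
        columns = conserved_cys_columns(alignment, min_fraction)
        if columns:
            kept[name] = columns
    return kept


# Toy alignments invented for illustration only.
toy_domains = {
    "domA": ["GACPK", "GSCPR", "GACPK"],  # Cys conserved in column 2
    "domB": ["GASPK", "GCSPR", "GATPK"],  # Cys present in only one sequence
}
toy_kept = filter_fused_domains(toy_domains)
```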
For comparison, we developed a third set of predicted thiol oxidoreductases by carrying out exhaustive PSI-BLAST searches starting with known thiol oxidoreductases. By this method, distant homologs of these proteins could be identified that conserve the catalytic redox Cys. The three approaches were then applied to various sets of open reading frames (ORFs), proteomes of model organisms, and other protein datasets (Fig. 3A). All three approaches were efficient in detecting thiol oxidoreductases. For example, the Cys/Sec method detected 27,701, domain fusion 17,799, and PSI-BLAST 20,367 proteins in 803 completely sequenced bacterial genomes (Fig. 3B). In addition, these sets of proteins were highly overlapping, with the domain fusion and PSI-BLAST protein sets being essentially a subset of the Cys/Sec set. Since the PSI-BLAST set included all known thiol oxidoreductases, the Cys/Sec approach was clearly highly efficient in identifying these proteins: it found essentially all such proteins (and also predicted additional candidate thiol oxidoreductases). Based on this information, we defined sets of thiol oxidoreductases in organisms as those included in either Cys/Sec or PSI-BLAST sets. We further designate these sets as thioredoxomes. It should be noted that, in this work, we did not analyze noncatalytic redox Cys, which are subject to posttranslational modification, such as glutathionylation and S-nitrosylation.
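The set logic behind this definition can be sketched in a few lines of Python. The example treats each method's output as a set of protein identifiers, defines the thioredoxome as the union of the Cys/Sec and PSI-BLAST sets, and measures how much of another set is contained in the Cys/Sec set. The function names and the identifiers are hypothetical placeholders, not data from this study.

```python
def coverage(subset, reference):
    """Fraction of `subset` also found in `reference` (None if subset is empty)."""
    if not subset:
        return None
    return len(subset & reference) / len(subset)


def summarize_predictions(cys_sec, domain_fusion, psi_blast):
    """Combine per-method predictions into a thioredoxome and overlap summary."""
    return {
        # Thioredoxome: proteins present in either the Cys/Sec or PSI-BLAST set.
        "thioredoxome": cys_sec | psi_blast,
        "fusion_in_cys_sec": coverage(domain_fusion, cys_sec),
        "psi_blast_in_cys_sec": coverage(psi_blast, cys_sec),
    }


# Placeholder identifiers invented for illustration only.
toy_summary = summarize_predictions(
    cys_sec={"p1", "p2", "p3", "p4", "p5"},
    domain_fusion={"p2", "p3"},
    psi_blast={"p1", "p2", "p6"},
)
```

A coverage value close to 1 for both the domain fusion and PSI-BLAST sets would correspond to the situation described above, in which those sets are essentially subsets of the Cys/Sec set.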

Properties of thioredoxomes
The basic properties of thioredoxomes were further examined. First, we observed a consistent increase in the number of thiol oxidoreductases with the increase in the proteome size. The dependence was approximately linear (Fig. 4A). This pattern contrasted with the changes in the abundance of basic metabolic enzymes, such as glycolytic enzymes (Fig. 4A), whose numbers only slightly increased in organisms with the larger proteomes. This observation held for thiol oxidoreductases identified with any of the three methods (Fig. 4B). The thioredoxin fold dominated thioredoxomes, accounting for approximately 50% of thiol oxidoreductases in organisms (Supplementary Fig. S1A; Supplementary data are available online at

We also found a strong correlation between the number of thiol oxidoreductases and the number of catalytic redox-active Cys residues (Supplementary Fig. S1B). The latter was slightly larger: the majority of thiol oxidoreductases had a single catalytic Cys, but some of these enzymes were composed of two or more thiol oxidoreductase domains, each having the catalytic redox Cys. The largest number of catalytic redox-active Cys in a thiol oxidoreductase was four (in protein disulfide isomerase). We observed a correlation with the proteome size for both thioredoxin-fold and non-thioredoxin-fold thiol oxidoreductases (Supplementary Figs. S1C and S1D), suggesting an even distribution of these protein groups in organisms.
Analysis of thioredoxomes of various bacterial phyla revealed a highly variable occurrence of thiol oxidoreductases (Fig. 4C). This distribution correlated with the genome/proteome size rather than with being a part of a particular bacterial group or phylum. These properties of prokaryotic thioredoxomes were also evident in eukaryotes. Eukaryotes had larger thioredoxomes, but only because their proteomes were larger. When prokaryotic and eukaryotic thioredoxomes were normalized to the proteome size or compared in organisms with equivalent proteome size, the prokaryotic thioredoxomes were actually slightly larger (Fig. 4D).
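The two calculations used in this comparison, a linear fit of thioredoxome size against proteome size and normalization to proteome size, can be sketched in Python as follows. This is an ordinary least-squares fit written out by hand; the fitting procedure actually used for Fig. 4 is not stated here, and the data points are hypothetical values chosen only to make the example run.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def normalized_size(thioredoxome_size, proteome_size):
    """Thioredoxome size as a fraction of the proteome."""
    if proteome_size <= 0:
        raise ValueError("proteome size must be positive")
    return thioredoxome_size / proteome_size


# Hypothetical (proteome size, thioredoxome size) pairs, not data from the study.
proteome_sizes = [1000, 2000, 4000]
thioredoxome_sizes = [10, 20, 40]
slope, intercept = linear_fit(proteome_sizes, thioredoxome_sizes)
fractions = [
    normalized_size(t, p) for t, p in zip(thioredoxome_sizes, proteome_sizes)
]
```

With an approximately linear dependence and a small intercept, the slope approximates the fraction of the proteome devoted to thiol oxidoreductases, which is why normalizing by proteome size makes organisms of different sizes directly comparable.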
The smallest thioredoxomes corresponded to the archaeon Nanoarchaeum equitans and seven bacteria from the Borrelia taxon (B. afzelii, B. burgdorferi, B. burgdorferi, B. duttonii, B. garinii, B. recurrentis, and B. turicatae). N. equitans has three thiol oxidoreductases: thioredoxin reductase, thioredoxin, and peroxiredoxin (Fig. 5A), whereas the Borrelia species have thioredoxin reductase, thioredoxin, and coenzyme A disulfide reductase. Interestingly, B. hermsii has an additional glutathione peroxidase. The largest detected thioredoxome, that of Arabidopsis thaliana, consisted of 194 proteins (Fig. 5B).

An NADPH-dependent thioredoxin system, consisting of thioredoxin reductase, thioredoxin, and peroxiredoxin, is the main system that controls the redox state of Cys in proteins, and we found that it was present in essentially all analyzed organisms. However, host-associated Phytoplasma asteris Onion Yellows (an organism with 1021 ORFs) lacked the thioredoxin system. Yet, this organism still had four thiol oxidoreductases: arsenate reductase, rhodanese-like protein, lipoamide dehydrogenase, and hydroperoxide reductase OsmC. Further analysis of bacterial habitats showed that host-associated organisms had fewer thiol oxidoreductases (Supplementary Fig. S2). We hypothesize that phytoplasma Onion Yellows and certain other host-associated organisms may access thiol oxidoreductases of the host for their metabolic/regulatory needs. In contrast to host-associated organisms, terrestrial organisms had greater numbers of thiol oxidoreductases (Supplementary Fig. S2).
Representative thioredoxomes
Next, we examined thioredoxomes of model organisms. The Saccharomyces cerevisiae set of thiol oxidoreductases had 47 proteins covering 31 conserved domains, including 10 that have not previously been characterized with regard to thiol oxidoreductase function. It would be of interest to characterize these candidate thiol oxidoreductases. Subdivided by protein family, the largest families of thiol oxidoreductases in S. cerevisiae were glutaredoxin-like proteins (five monothiol and three dithiol glutaredoxins), protein disulfide isomerases (six proteins), peroxiredoxins (five proteins), ubiquitin-activating-like proteins (three proteins), thioredoxins (three proteins), and sulfhydryl oxidases and methionine sulfoxide reductases (Fig. 6). The majority of yeast thiol oxidoreductases localized to the cytosol (20 proteins) and the ER (10 proteins).

The human thioredoxome consisted of 111 proteins containing catalytic redox-active Cys and 25 proteins containing Sec, for a total of 136 proteins (Fig. 7A). The largest protein families were thioredoxin-like (19 proteins) and protein disulfide isomerase-like proteins (12 proteins) (Fig. 7B). As in yeast, the majority of these proteins localized to the cytosol and the ER (42 and 34 proteins, respectively), whereas there was a lower abundance of mitochondrial thiol oxidoreductases (based on the presence of a predicted signal peptide and available localization data) (Fig. 7C).

Roles of environmental factors in influencing thioredoxome composition
We examined a possible influence of environmental factors, such as growth temperature and oxygen content, on the size of thioredoxomes. Thermophilic and hyperthermophilic organisms showed smaller thioredoxomes in comparison with psychrophilic organisms (Supplementary Fig. S3). Oxygen content also did not influence thioredoxomes significantly, with the exception of the microaerobic group, which had fewer thiol oxidoreductases (Supplementary Fig. S4). It should be noted that our observations do not exclude a possible role of environmental factors in regulating thiol oxidoreductase gene expression.
Distribution of thiol oxidoreductase families in bacterial phyla
We analyzed the distribution of thiol oxidoreductase families in various bacterial phyla and found that common thioredoxin-fold proteins, such as thioredoxins and peroxiredoxins, were present in almost all searched organisms. However, other thiol oxidoreductases, such as DsbB, GST, DsrE, and HesB, had scattered patterns of occurrence. This observation suggested that some thioredoxin-fold oxidoreductases formed a core of thioredoxomes while other thiol oxidoreductases had more specialized functions.
Database and computational resources
As a result of our work, several web tools were developed (located at two mirror sites:
Discussion
Thiol oxidoreductases form a group of functionally related proteins that utilize redox chemistry of their catalytic Cys for redox regulation of cellular processes. These proteins are represented by several protein families and folds (2, 3, 11–13, 22, 24, 27), and the extent of their utilization in biology has been unclear. Thiol oxidoreductases are also important because they could serve as a tool to understand organization and mechanisms of thiol-based redox control. Nearly every cellular process is now known to be regulated by the redox status of catalytic Cys residues in proteins, as well as by post-translational modifications, such as S-nitrosylation, glutathionylation, and disulfide bonding. However, Cys residues in proteins have many functions, such as metal coordination, structural stabilization, membrane targeting, post-translational modifications, and redox and non-redox catalysis (11). Thus, sequence- and structure similarity-based algorithms alone cannot be used for the identification of thiol oxidoreductases and characterization of sets of these proteins in organisms (5–7, 9, 10).
In this work, we defined the overall use of thiol oxidoreductases by organisms in the three domains of life. Much of the previous work focused on the characterization of known thiol oxidoreductases, especially proteins of the thioredoxin fold, such as thioredoxins, glutaredoxins, protein disulfide isomerases, peroxiredoxins, and glutathione peroxidases. However, how many such proteins as well as other (especially nonthioredoxin fold) thiol oxidoreductases are present in organisms was not clear. We applied three independent methods for genome-wide identification of thiol oxidoreductases. The most efficient approach was the one that identified Sec/Cys pairs flanked by conserved sequences (12, 19). This method is based on the observation that Sec residues in proteins exclusively serve redox functions and that these redox Sec are replaceable with Cys only in the catalytic sites of thiol oxidoreductases. Therefore, identification of Sec/Cys pairs indicates the identity of redox-active catalytic Cys residues in proteins. This approach had the best sensitivity of all tested approaches and detected nearly all known thiol oxidoreductase families. Two other methods analyzed fusions of thiol oxidoreductases to other domains containing conserved Cys residues and carried out exhaustive searches for distant homologs of known thiol oxidoreductases. These two methods were somewhat less sensitive than the Sec/Cys approach; however, their selectivity was still sufficient to use them as alternative methods for thiol oxidoreductase prediction (Fig. 4B). A combination of the three methods then offered an opportunity to describe the sets of thiol oxidoreductases in organisms.
With this approach, we showed that thiol oxidoreductases are present in all living organisms (for which genome sequences are available) and generally account for 0.5%–1% of the proteome, establishing thioredoxomes as essential and unexpectedly abundant enzymatic systems in cells. We observed that host-associated organisms have smaller thioredoxomes, but surprisingly, we did not find strong influence of environmental factors. However, environmental factors could regulate gene expression of thiol oxidoreductases and thus influence their function. An interesting observation was that the size of thioredoxomes correlated with the proteome size. Such correlation was observed in all branches of life, and it was especially pronounced in prokaryotes.
Many thiol oxidoreductases are strategically located in between electron donors (e.g., NADPH) and acceptors (e.g., hydrogen peroxide, oxygen, reactive nitrogen species) and establish a flow of reducing equivalents that is linked to diverse cellular processes, such as antioxidant defense, repair of oxidative protein modifications, protein folding, DNA synthesis and repair, and other processes. A correlation of the number of proteins in a system with the proteome size is not unique for thiol oxidoreductases. Similar trends were observed for signaling proteins, such as kinases, phosphatases, and transcription factors (1, 23, 26). It appears that the extent of the use of such proteins supports organismal complexity. The role of thiol oxidoreductases in regulation and signaling is an interesting, yet poorly studied area of redox biology. Known signaling functions are limited to a group of thioredoxin-fold proteins, such as thioredoxins, glutaredoxins, and thiol peroxidases. The observed correlation between proteomes and thioredoxomes suggests a general role of thioredoxomes in biology, which provide the backbone of thiol interaction and redox regulation in the cell.
The sets of thiol oxidoreductases also illustrate the complexity of thiol-based redox control in cells and organisms and provide tools for the analyses of these processes. We believe it would be important to further examine the combined function of thiol oxidoreductases (i.e., the thioredoxome). In this regard, this research area could benefit from the previous studies that characterized systems, such as kinomes and phosphoproteomes, in cells. Currently, identification of redox-regulated Cys residues in proteins is difficult and there are no known sequence- or structure-based motifs that could describe these residues. Therefore, an ability to efficiently identify proteins that act on these redox-regulated Cys should be of great value, and these proteins could be used as tools for comparative genomic analyses of thiol-based redox regulation. From our work, it is already clear that since thiol oxidoreductases are present in all living organisms, the thiol-based redox control is a widespread, basic, essential process in all life forms.
Materials and Methods
Thiol oxidoreductases were identified by three independent methods. First, these proteins were found through sequence similarity to selenoproteins and their catalytic redox-active Cys were defined as the amino acid corresponding in sequence alignments to Sec. In this method, Cys-containing proteins from NCBI non-redundant database and completely sequenced genomes were searched against a local non-redundant dataset of known selenoproteins and their Cys-containing homologs, which was separately compiled based on our previous research, literature searches, and additional homology analyses that identified distant homologs of known selenoproteins. A standalone TBLASTN program was used with the expectation value of 1e-3. Proteins with Cys corresponding to Sec in multiple alignments were considered as thiol oxidoreductases.
Second, a collection of protein domains that are fused to known thiol oxidoreductases was generated using STRING (
Third, all predicted ORFs from NCBI were searched against a database of known thiol oxidoreductases using the NCBI BLAST program with an expectation value of 1e-3 to identify distant homologs of thiol oxidoreductases. These analyses were supplemented with exhaustive PSI-BLAST searches starting from known thiol oxidoreductases.
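The expectation value cutoff of 1e-3 used in the searches described above can be applied to search results with a short Python sketch. It assumes the hits were saved in the 12-column tabular format produced by the BLAST+ option `-outfmt 6` (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The output format actually used in this study is not stated, and the function name and example hit lines are hypothetical.

```python
def filter_hits(lines, max_evalue=1e-3):
    """Keep BLAST tabular hits whose E-value is at or below max_evalue.

    Returns a list of (query_id, subject_id, evalue) tuples.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected 12 tab-separated columns, got {len(fields)}")
        evalue = float(fields[10])  # column 11 holds the E-value
        if evalue <= max_evalue:
            hits.append((fields[0], fields[1], evalue))
    return hits


# Hit lines invented for illustration only.
toy_lines = [
    "q1\ts1\t45.0\t100\t55\t0\t1\t100\t1\t300\t1e-20\t90.1",
    "q1\ts2\t30.0\t50\t35\t0\t1\t50\t1\t150\t0.5\t25.0",
]
toy_hits = filter_hits(toy_lines)
```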
Thiol oxidoreductases identified in these searches were analyzed separately and were also used to create a combined database of thiol oxidoreductases. In addition, associated web tools were prepared, including the Sec/Cys pair method, a tool to search for homologs of known thiol oxidoreductases, a tool to search for thiol oxidoreductase domain fusion events, and a method combining all three search strategies. These search tools are located at two mirror sites:
Footnotes
Acknowledgments
This research was completed in part utilizing the PrairieFire Beowulf cluster from Research Computing Facility of the University of Nebraska–Lincoln. Supported by NIH Grant GM065204.
Author Disclosure Statement
The authors declare no competing interests.
Abbreviations Used
References
