Bioinformatics of Selenoproteins

Abstract

Significance:

Bioinformatics has brought important insights into the field of selenium research. The progress made in the development of computational tools in the last two decades, coordinated with growing genome resources, provided new opportunities to study selenoproteins. The present review discusses existing tools for selenoprotein gene finding and other bioinformatic approaches to study the biology of selenium.

Recent Advances:

The availability of complete selenoproteomes allowed assessing a global distribution of the use of selenocysteine (Sec) across the tree of life, as well as studying the evolution of selenoproteins and their biosynthetic pathway. Beyond gene identification and characterization, human genetic variants in selenoprotein genes were used to examine adaptations to selenium levels in diverse human populations and to estimate selective constraints against gene loss.

Critical Issues:

The synthesis of selenoproteins is essential for development in mice. In humans, several mutations in selenoprotein genes have been linked to rare congenital disorders. And yet, the mechanism of Sec insertion and the regulation of selenoprotein synthesis in mammalian cells are not completely understood.

Future Directions:

Omics technologies offer new possibilities to study selenoproteins and mechanisms of Sec incorporation in cells, tissues, and organisms.

Introduction

Selenium is an essential trace element present in selenoproteins in the form of the amino acid selenocysteine (Sec). The human genome encodes 25 selenoproteins (51), which play important roles in health and disease [reviewed in references (40, 54)]. Most selenoproteins are enzymes with Sec located at the active site and perform redox reactions in diverse cellular functions, including removal of hydroperoxides, reduction of thioredoxin, repair of oxidized methionines, and metabolism of thyroid hormones, among others.

Sec is encoded by the opal stop codon UGA, which is recoded through the concerted action of a dedicated tRNA and protein machinery, triggered in selenoprotein mRNAs by the presence of specific RNA motifs. Selenoproteins are found scattered in eukaryotes, bacteria, and archaea, with important differences in the mechanisms of Sec insertion in the three domains (Fig. 1).

FIG. 1.

Pathways of Sec biosynthesis (A) and Sec incorporation in response to a UGA codon in (B) bacteria, (C) archaea, and (D) eukaryotes. Sec is synthesized from serine in a single step in bacteria by SELA [(A); top path], and in a two-step conversion by PSTK and SEPSECS in eukaryotes and archaea [(A); bottom path]. The selenium donor is Se-P, provided by selenophosphate synthetase SEPHS2 (called SELD in prokaryotes). The location and structure of the SECIS motif differ between the three domains. SELB/aSELB/EEFSEC, Sec-specific elongation factor. SECISBP2, eukaryotic SECIS-binding protein. The SECIS-binding factor is not known in archaea. Sec, selenocysteine; SECIS, selenocysteine-insertion sequence; Se-P, selenophosphate.

This review discusses the bioinformatic approaches used to address the challenges of selenoprotein annotation in genomes, the most relevant computational tools and resources in selenium genomics, and the current knowledge in the evolution and distribution of selenoproteins across the tree of life.

Preface: the discovery and characterization of Sec

Following the identification of selenium as an integral constituent of some proteins in 1973 (3, 27, 86, 96), a series of studies elucidated the molecular and genetic basis for the synthesis of the newly discovered selenoproteins. It was established that selenium was present in selenoproteins in the form of Sec (22, 49), and that the Sec residue was encoded by an in-frame TGA codon (UGA in mRNA) (45, 112). A Sec-inserting tRNA (tRNA-Sec) was identified as the SelC gene in Escherichia coli (56) and animals (55). Thus, Sec was recognized as an expansion to the genetic code, through which the nonsense codon UGA is recoded as a sense codon to specify Sec insertion. The letter “U” was added to the one-letter amino acid code to designate Sec (9).

The sufficient Sec recoding signal was identified as a cis-acting RNA secondary structure present in selenoprotein mRNAs, which was termed Sec-insertion sequence (SECIS) (8, 98, 113) (Fig. 2). Besides tRNA-Sec, additional trans-acting protein factors required for Sec insertion were then identified, including a Sec-specific elongation factor (EEFSEC) and an SECIS-binding protein (SECISBP2), revealing lineage-specific differences between the bacterial, archaeal, and eukaryotic Sec systems (54). Sec is the most widespread expansion to the genetic code in nature, and UGA is the only codon with ambiguous meaning in all domains of life. For this reason, selenoproteins are commonly mispredicted by standard gene finding programs, and misannotated in genome annotation projects and protein databases.

FIG. 2.

Consensus structures and conserved sequence motifs in SECIS elements in the three domains.

Selenoprotein Gene Annotation

Selenoproteins are usually mispredicted in nonmodel organisms because standard gene annotation programs consider UGA only as a stop signal, whereas correct identification of Sec-UGA codons requires additional curation steps. Thus, selenoprotein sequences are commonly truncated in protein databases. Typical misannotations include the Sec-UGA being treated as a stop codon, the Sec-UGA-containing exon being skipped, or the coding sequence (CDS) starting downstream of the Sec-UGA (Fig. 3D). These errors propagate as protein sequences from databases are used to annotate new genomes.
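The truncation caused by reading UGA only as a stop can be illustrated with a minimal Python sketch. The toy coding sequence, the `translate` helper, and its `sec_positions` parameter are hypothetical names introduced here for illustration and do not come from any tool discussed in this review; only the codon table (the standard genetic code) is a fixed fact.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds, sec_positions=()):
    """Translate a CDS up to the first stop codon.

    Codon indices listed in sec_positions (0-based, hypothetical parameter)
    are read as selenocysteine ("U") when the codon is TGA.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        if codon == "TGA" and i // 3 in sec_positions:
            protein.append("U")
            continue
        aa = CODON_TABLE.get(codon, "X")  # "X" for ambiguous codons
        if aa == "*":
            break  # a standard predictor stops here, TGA included
        protein.append(aa)
    return "".join(protein)


toy_cds = "ATGGCTTGAGGTAAATAA"  # ATG GCT TGA GGT AAA TAA (invented example)
print(translate(toy_cds))                      # truncated product
print(translate(toy_cds, sec_positions={2}))   # Sec-aware product
```

With this toy sequence, the standard reading stops at the UGA and yields MA, whereas marking codon 2 as Sec yields MAUGK.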

FIG. 3.

Selenoprotein gene annotation in eukaryotic nucleotide sequences. (A) Sequence homology approach: a previously known selenoprotein sequence is used to identify the coding exons of a selenoprotein gene. The Sec residue in the query aligns with a TGA in the target nucleotide sequence, with sequence conservation on both sides of the TGA. (B) A complete eukaryotic selenoprotein gene model, consisting of a TGA-containing ORF and a properly located SECIS element in the 3′UTR. (C) Ab initio approach: the coding exons of the gene are identified by their coding potential, combining triplet periodicity and predicted splice sites. Good triplet periodicity is found past the Sec-TGA. (D) Common misannotations generated by standard gene prediction programs, where TGA is considered only as stop and excluded from the coding sequence, resulting in a truncated protein sequence. (E) Alternatively, the sequence upstream of a candidate SECIS can be used to search a protein database to find TGA/Cys alignments with conservation on both sides. Cys, cysteine; ORF, open reading frame; UTR, untranslated region. Color images are available online.

The main features to support the correct identification and annotation of selenoprotein candidates are (i) the presence of a properly located SECIS; (ii) identification of selenoprotein homologues (Sec/Sec alignment); (iii) identification of cysteine (Cys)-containing homologues (Sec/Cys alignment); and (iv) sequence signatures of protein coding potential both at the 5′ and 3′ of UGA. Finding selenoproteins encoded in genomic sequences can be divided into two conceptually different problems: finding known selenoproteins and finding novel selenoproteins.

Gene finding for known selenoproteins

Selenoprotein genes can be identified by their homology to sequences of previously characterized selenoproteins (Fig. 3A). Briefly, a “query” Sec-containing amino acid sequence is used to scan a “target” DNA or RNA sequence assembly (e.g., genome, transcriptome, or metagenome) using Tblastn (2) or analogous approaches. Tblastn translates the nucleotide sequence into all six frames and finds high-scoring regions of amino acid sequence similarity between query and target. Hits that align the Sec residue in the query with an in-frame TGA in the target are potential indicators of a selenoprotein gene. Typically, SECIS elements are then searched at an appropriate distance from candidates (SECIS prediction is summarized below). Homology searches allow identifying not only selenoproteins but also additional standard genes in the same protein families. Typically, these homologues carry Cys in place of Sec at the active site, so we refer to them as Cys-homologues.
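The core idea of this search, a Sec residue in the query lining up with an in-frame TGA in a six-frame translation, can be sketched in Python as follows. This is a deliberately simplified, ungapped identity scan and not a reimplementation of Tblastn; the function names, the `min_identity` threshold, and the toy sequences are assumptions made for illustration.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
CODONS["TGA"] = "U"  # tentatively read every TGA as Sec


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(dna):
    """Yield (strand, offset, translation) for all six reading frames."""
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", revcomp(dna))):
        for offset in range(3):
            codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
            yield strand, offset, "".join(CODONS.get(c, "X") for c in codons)


def sec_tga_hits(query, dna, min_identity=0.8):
    """Ungapped scan: report windows where every query U aligns to a TGA."""
    sec_cols = [i for i, aa in enumerate(query) if aa == "U"]
    hits = []
    for strand, offset, frame in six_frames(dna):
        for start in range(len(frame) - len(query) + 1):
            window = frame[start:start + len(query)]
            if "*" in window:  # TAA/TAG inside the window: not one ORF
                continue
            identity = sum(a == b for a, b in zip(query, window)) / len(query)
            if identity >= min_identity and all(window[i] == "U" for i in sec_cols):
                hits.append((strand, offset, start, window, round(identity, 2)))
    return hits


# Invented target: two filler bases, then ATG GCT TGA GGT AAA TAA.
print(sec_tga_hits("MAUGK", "CCATGGCTTGAGGTAAATAA"))
```

A real search would use a substitution matrix, gaps, and statistical significance, and would be followed by SECIS prediction near each candidate, as described in the text.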

Gene finding for novel selenoproteins

The strategies used for identification of novel selenoproteins (i.e., those without any Sec-containing homologue) rely on the identification of SECIS elements. These cis-acting elements differ among the domains of life in location (3′ untranslated region [UTR] in eukaryotes, archaea; flanking Sec-UGA in bacteria) (Fig. 1), as well as RNA structure and identity (Fig. 2), so that every domain requires dedicated adjustments of the general approach. Briefly, a DNA or an RNA sequence assembly is scanned to find SECIS candidates with dedicated methods (detailed later). The corresponding region of each SECIS is analyzed for the occurrence of TGA-containing open reading frames (ORFs), and then evaluated in terms of their likelihood to be coding for a selenoprotein. In the field of gene prediction, two main principles exist to assess the “coding potential” of a sequence: (i) ab initio gene finding and (ii) homology-based matching.

In ab initio gene prediction (Fig. 3C), coding regions are distinguished from noncoding regions by virtue of intrinsic sequence features, such as nucleotide composition, codon usage, and triplet periodicity. The conserved patterns of the splice junctions are also used to build complete gene models, including intron/exon structures. Ab initio prediction is used for de novo identification of selenoproteins, searching for such protein coding signature extending past a TGA codon with a predicted SECIS at the appropriate location (15, 46).
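The first step of such a search, listing ORFs that extend past a TGA, can be sketched in Python. This is a minimal illustration under simplifying assumptions (forward strand only, no introns, no coding-potential score), so it does not reproduce geneid or any published selenoprotein predictor; the function name and the `min_flank` parameter are hypothetical.

```python
TERMINATORS = ("TAA", "TAG", "TGA")


def tga_containing_orfs(dna, min_flank=2):
    """List ATG-initiated ORFs that read through exactly one in-frame TGA.

    Returns tuples (frame, start, end, tga_pos) in nucleotide coordinates,
    where end is exclusive and includes the terminating codon. The first
    in-frame TGA is tentatively read as Sec; a second TGA, TAA, or TAG
    terminates the ORF. min_flank is the minimum number of codons required
    on each side of the candidate Sec-TGA.
    """
    dna = dna.upper()
    found = []
    for frame in range(3):
        codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
        used_ends = set()  # report only the most upstream ATG per terminator
        for s, codon in enumerate(codons):
            if codon != "ATG":
                continue
            tga = None
            for j in range(s + 1, len(codons)):
                c = codons[j]
                if c == "TGA" and tga is None:
                    tga = j  # candidate Sec codon
                elif c in TERMINATORS:
                    if (tga is not None and j not in used_ends
                            and tga - s >= min_flank
                            and j - tga - 1 >= min_flank):
                        used_ends.add(j)
                        found.append((frame, frame + 3 * s,
                                      frame + 3 * j + 3, frame + 3 * tga))
                    break
    return found


# Invented example: ATG GCT GCT TGA GGT AAA TAA
print(tga_containing_orfs("ATGGCTGCTTGAGGTAAATAA"))
```

In an actual ab initio search, each candidate would then be scored for coding potential on both sides of the TGA and checked for a predicted SECIS element at the appropriate location.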

In homology-based matching, nucleotide sequences near SECIS elements are translated in six frames and aligned to a database of proteins. This procedure is analogous to the detection of known selenoproteins outlined above, in which a TGA is aligned to the Sec residue of an annotated protein. For novel selenoproteins, however, matches between a TGA codon and a Cys residue of an annotated protein are considered. The assumption is that Cys-homologues of selenoproteins that have not yet been discovered are already annotated in protein databases. Since almost all selenoproteins have Cys-homologues, this strategy allows novel selenoproteins to be discovered.
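The TGA/Cys criterion can be expressed as a simple check on a pairwise alignment. The Python sketch below assumes that an alignment between a translated candidate (with TGA written as U) and an annotated protein is already available as two equal-length gapped strings; the function name, the `flank` width, and the identity threshold are illustrative assumptions, not parameters of any published pipeline.

```python
def sec_cys_columns(candidate_aln, homolog_aln, flank=5, min_flank_identity=0.5):
    """Return alignment columns where U (candidate) pairs with C (homologue)
    and the flanking columns on both sides are sufficiently conserved."""
    if len(candidate_aln) != len(homolog_aln):
        raise ValueError("aligned sequences must have the same length")

    def identity(lo, hi):
        # Fraction of identical, non-gap columns in [lo, hi).
        pairs = list(zip(candidate_aln[lo:hi], homolog_aln[lo:hi]))
        if not pairs:
            return 0.0
        return sum(a == b and a != "-" for a, b in pairs) / len(pairs)

    columns = []
    for i, (a, b) in enumerate(zip(candidate_aln, homolog_aln)):
        if a == "U" and b == "C":
            upstream = identity(max(0, i - flank), i)
            downstream = identity(i + 1, i + 1 + flank)
            if upstream >= min_flank_identity and downstream >= min_flank_identity:
                columns.append(i)
    return columns


# Invented sequences: conservation on both sides of the U/C pair.
print(sec_cys_columns("MKLVAUGTLKE", "MKLVACGTLRE"))
# Conservation only upstream: consistent with a genuine stop codon.
print(sec_cys_columns("MKLVAUGTLKE", "MKLVACPQWRS"))
```

Requiring conservation downstream of the TGA is what separates a candidate Sec codon from an ordinary stop, since a true stop is not expected to be followed by sequence that aligns to the homologue.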

Sec machinery factors

The biosynthesis and cotranslational insertion of Sec require the action of a dedicated genetic machinery. While essential for human and other vertebrates, Sec is not ubiquitous to life. The identification of the Sec machinery in the genomes of different species has been used to perform large-scale surveys of the distribution of Sec usage across the tree of life (55, 73, 79, 82, 88). Besides the cis-acting SECIS element, the Sec pathway in eukaryotes requires six obligate trans-acting factors: tRNA-Sec (Sec-tRNA[Ser]Sec) charged with Sec (100), selenophosphate synthetase SEPHS2 (99), l-seryl-tRNA(Sec) kinase PSTK (12), Sec synthase SEPSECS (101), Sec-specific elongation factor EEFSEC (95), and SECIS-binding protein SECISBP2 (24). Additional protein cofactors are believed to be involved in UGA recoding (1, 10, 20), but they are not specific for Sec and their roles are not well understood (65).

The biosynthesis of Sec occurs on its cognate tRNA, whose anticodon UCA is complementary to UGA. tRNA-Sec is initially aminoacylated with serine (Ser) by seryl-tRNA synthetase, which provides the backbone for Sec synthesis. In bacteria, the pyridoxal phosphate-dependent protein Sec synthase (SELA) converts Ser to Sec. In eukaryotes and archaea, a two-step reaction occurs, in which Ser is first phosphorylated by the kinase PSTK, producing phosphoserine, which is then converted to Sec by the Sec synthase SEPSECS (100) (phylogenetically distinct from bacterial SELA). The active selenium donor selenophosphate is synthesized from selenide by selenophosphate synthetase SEPHS2 (SELD in prokaryotes) (34, 64).
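The complementarity between the UCA anticodon and the UGA codon follows from antiparallel base pairing, which a few lines of Python make explicit. The helper name is hypothetical, and the sketch considers Watson-Crick pairs only, ignoring wobble pairing and base modifications.

```python
# Watson-Crick RNA base pairs.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_for_anticodon(anticodon):
    """Return the codon (5' to 3') that pairs with an anticodon (5' to 3').

    Codon and anticodon pair in antiparallel orientation, so the anticodon
    is reversed before each base is complemented.
    """
    return "".join(PAIR[base] for base in reversed(anticodon.upper()))


print(codon_for_anticodon("UCA"))  # anticodon of tRNA-Sec
```

For the tRNA-Sec anticodon UCA, this returns UGA.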

Cotranslational incorporation of Sec is promoted by the SECIS secondary structure and requires the Sec-specific elongation factor EEFSEC in eukaryotes and SELB in prokaryotes. In bacteria, SELB also functions as an SECIS-binding protein, whereas, in eukaryotes, SECISBP2 binds the SECIS (EEFSEC has no SECIS-binding activity). The Sec pathway in archaea is analogous to eukaryotes, but notably there is no SECISBP2 and its function is still not assigned in this lineage (84). Identification of Sec machinery in all domains is typically performed by homology matching. Specialized tools have been developed for this task, described later.

The Rise of Selenium Genomics

Selenium genomics of eukaryotes

The investigation of selenoprotein genes in genomes was kick-started by the development of programs to detect SECIS elements in sequences. The first computational tool for eukaryotic SECIS finding was SECISearch (53). The program was used to predict SECIS elements in transcript sequences, which were then analyzed to find corresponding TGA-containing ORFs with selenoprotein-coding potential. An analogous approach was carried out in a second study (57). These two studies led to the first three novel selenoproteins identified through computational means, and Sec incorporation into these proteins was confirmed by labeling the cells with radioactive selenium. These three proteins are currently known as MSRB1, SELENOT, and SELENON (33).

The first complete selenoproteome was characterized 2 years later in the then newly sequenced genome of Drosophila melanogaster (15, 75). One study used an ab initio strategy, in which TGA-containing ORFs near predicted SECIS elements were assessed for coding potential using the program geneid (37). Geneid was modified specifically for this task to allow in-frame TGA in ORFs. Another study used a similar approach focusing on the prediction of SECIS elements followed by the identification of selenoprotein ORFs (75). This resulted in the identification of three selenoproteins in D. melanogaster, two of which were novel.

A major advance in the field was the characterization of the complete human selenoproteome (51). This study exploited the sequence conservation of SECIS elements in orthologous selenoprotein genes between human, rat, and mouse, to improve the specificity of SECIS genomic searches. Twenty-five human selenoproteins were identified, seven of which were novel.

Later studies also used comparative approaches to discover additional selenoproteins in other organisms. For example, the selenoprotein U (SELENOU), found in fish and present as Cys-homologue in mammals, was found by comparing TGA-containing ORF predictions from the puffer fish Fugu and human (16). The limitations of the comparative approaches, which require Sec- or Cys-containing sequence homologues, were highlighted by the ab initio discovery of selenoprotein J (SELENOJ) in another puffer fish, which has no homologue in mammals (14). Since then, many more selenoproteomes have been described using computational approaches (5, 47, 48, 60, 61, 71, 93, 111).

Selenium genomics of prokaryotes

Naturally, analogous computational approaches were applied to bacteria and archaea, accounting for the differences in the domain-specific SECIS elements (Fig. 1) (50). The bacterial SECIS (bSECIS) is located immediately downstream of the UGA, within the CDS (7), and its function is constrained by the stem-loop structure and the distance to the UGA codon (21). The conserved structural features among known bSECIS elements were used to build a consensus model, which allowed developing the program bSECISearch to identify selenoproteins in bacterial genomes (105). The program scans a nucleotide sequence and examines the occurrence of potential bSECIS downstream of each UGA triplet. SECIS-independent approaches based on homology matching with Cys-containing proteins have also been applied to bacterial genomes and environmental samples, yielding many novel selenoproteins (104, 106).
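The general idea of examining the sequence downstream of each UGA for a stem-loop can be sketched in Python. This is not the bSECISearch algorithm: the actual structural pattern and sequence constraints of that program are not given here, so the stem length, loop sizes, window, and distance limits below are placeholder assumptions, and all function names are hypothetical.

```python
# Allowed base pairs on DNA (T stands for U); G-T models the G-U wobble pair.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("G", "T"), ("T", "G")}


def best_hairpin(window, min_stem=4, loop_range=(3, 8)):
    """Return (stem_length, stem_start, loop_length) for the longest
    uninterrupted stem found in window, or None if none reaches min_stem."""
    best = None
    n = len(window)
    for i in range(n):  # i: innermost base of the 5' arm
        for loop in range(loop_range[0], loop_range[1] + 1):
            j = i + loop + 1  # j: innermost base of the 3' arm
            if j >= n:
                break
            stem = 0
            while (i - stem >= 0 and j + stem < n
                   and (window[i - stem], window[j + stem]) in PAIRS):
                stem += 1
            if stem >= min_stem and (best is None or stem > best[0]):
                best = (stem, i - stem + 1, loop)
    return best


def tga_with_downstream_hairpin(dna, window=40, max_gap=20,
                                min_stem=4, loop_range=(3, 8)):
    """For every TGA triplet, look for a hairpin starting within max_gap
    nucleotides downstream; return (tga_position, hairpin) pairs."""
    dna = dna.upper()
    hits = []
    for pos in range(len(dna) - 2):
        if dna[pos:pos + 3] == "TGA":
            hairpin = best_hairpin(dna[pos + 3:pos + 3 + window],
                                   min_stem, loop_range)
            if hairpin is not None and hairpin[1] <= max_gap:
                hits.append((pos, hairpin))
    return hits


# Invented sequence: TGA, two spacer bases, then GCGCC-TTTT-GGCGC.
print(tga_with_downstream_hairpin("ATGTGAAAGCGCCTTTTGGCGCAAAA"))
```

A realistic predictor would also restrict the scan to in-frame UGAs within ORFs, allow bulges and mismatches in the stem, and apply the conserved sequence features of known bSECIS elements.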

More recently, a novel selenoprotein named DUF466, with a predicted C-terminal Sec residue, was identified in Helicobacter pylori (25), and analysis of bacterial genomes with tRNA-Sec but without known selenoproteins revealed putative Sec-containing “redox-active disulfide 2” genes in Brachyspira bacteria (88). As of today, more than 50 bacterial selenoproteins are known (103). It is estimated that only 20%–25% of bacteria use Sec, with a scattered distribution across different phyla (73, 79).

Archaeal selenoprotein genes possess, unlike bacteria, an SECIS element located in the 3′ UTR (98). A SECIS-based approach was developed to identify selenoproteins in archaeal genomes (52). The number of archaeal genomes that contain selenoproteins is limited. Until recently, they were confined to Methanococcales and Methanopyrales (85). Then, analyses of uncultivated metagenomes from the newly discovered Asgard archaea revealed selenoproteins in Lokiarchaeota (70) and Thorarchaeota (59).

Computational Tools for Selenoprotein Gene Prediction

The fast pace of genome sequencing in the last decade has rendered impractical the manual identification of selenoproteins. Thanks to the following computational tools dedicated to selenium genomics, a large number of genomes can be analyzed nowadays with little or no manual intervention.

Selenoprofiles (68) is a homology-based pipeline for prediction of known selenoproteins in genomic sequences. The program is also suited for finding Cys-containing homologues, as well as the protein factors of the Sec machinery. The program comes with a default set of manually curated profiles for all known selenoproteins and machinery protein factors, and can be extended to annotate standard genes (89). Selenoprofiles is available for download.

SECISearch3 (67, 69) is currently the most efficient and widely used method for identification of eukaryotic SECIS elements. SECISearch3 constitutes an improvement of the original SECISearch (53) through the incorporation of covariance models, improving speed and accuracy of RNA motif finding. The program is also a component of Seblastian (67, 69), an SECIS-dependent pipeline for the identification of eukaryotic selenoproteins. Seblastian analyzes the regions upstream of SECIS elements to find selenoprotein genes by homology. Seblastian can be used either to identify known selenoproteins (matching ORFs to Sec-containing annotated homologues) or to predict novel selenoproteins (matching to Cys-homologues). Both SECISearch3 and Seblastian are available online as web servers. In addition, structural analysis of eukaryotic SECIS elements can be performed with a dedicated tool, SECISaln (18).

For finding bacterial SECIS elements, the method bSECISearch (105) is used. This program, available online, scans nucleotide sequences and returns potential bSECIS elements as well as their host ORF. bSECIS elements are identified by matching a predefined structural pattern and several sequence constraints.

SelGenAmic (46) is an ab initio gene predictor specifically developed for selenoprotein genes. The program is based on geneid (37) and uses the coding potential and splice sites to build gene models that include an in-frame TGA codon. Coupled with SECIS prediction, the program has been used to identify selenoproteins in various metazoans (47, 48).

Secmarker (88) is a program for the identification of tRNA-Sec in genomes, developed after it was realized that existing tools for generic tRNA finding in genomes performed poorly for tRNA-Sec. This gene is a good marker for the presence of selenoproteins in a genome. Therefore, Secmarker can be used to quickly scan thousands of genomes and predict which ones might encode selenoproteins. The program is available online as a web server and for download.

Selenoprotein databases

Due to the historic unreliability of broadly used bioinformatic resources for selenium research, a number of specialized databases have been created for selenoproteins. SelenoDB was built to provide manually curated selenoproteomes, which include gene, protein, and SECIS sequences for a number of model organisms (13). Its second release expanded the database with automatic annotations by Selenoprofiles to include a larger number of genomes, mostly vertebrates (81). The database dbTEU is a collection of proteins from both prokaryotes and eukaryotes that are related to trace elements, including selenoproteins (108). Recode is a database of genes that use nonstandard translation through recoding events, such as Sec insertion and other types of stop codon redefinition (6). Finally, recent efforts were made to reannotate selenoproteins in the NCBI Reference Sequence database (RefSeq) for prokaryotes (39) and vertebrates (80).

Distribution of Selenoproteins Across the Tree of Life

The rise of sequencing technologies put great biodiversity at the disposal of scientists in the form of nucleotide sequences. Many researchers analyzed sequences throughout the tree of life to profile the taxonomic distribution of selenoproteins and other forms of selenium utilization. There is compelling evidence that the Sec trait evolved only once: for example, prokaryotes and eukaryotes use analogous Sec biosynthesis and insertion pathways, and some selenoproteins are shared among the three domains. The Sec trait was then lost independently in many lineages, resulting in a scattered distribution of selenoproteins in nature (72, 73, 109).

Eukaryotes

The distribution of selenoproteins in eukaryotes outlines a highly dynamic evolutionary history (Fig. 4). Many selenoprotein families are shared between single-cell eukaryotes and vertebrates, notably selenophosphate synthetase (SEPHS2), glutathione peroxidases (GPXs), thioredoxin reductases, and methionine-R-sulfoxide reductase, among others (54), indicating an early origin for most eukaryotic selenoproteins (32). Some of them subsequently replaced Sec by Cys, or were lost altogether, in different lineages (61). The number of selenoproteins varies greatly between organisms: the largest selenoproteome described to date corresponds to the harmful pelagophyte alga Aureococcus anophagefferens, with 59 selenoproteins (35, 36), while the nematode Caenorhabditis elegans encodes only a single selenoprotein (93), and many other eukaryotes lack selenoproteins completely (73).

FIG. 4.

Distribution of selenoproteins and their extinctions in eukaryotes. The approximate phylogeny of sequenced eukaryotic genomes is shown in a circular tree. The selenoproteome size (number of selenoproteins in the genome) is shown for each species by the length of the black bars. The reported Sec extinctions are indicated in red. The recent discovery of selenoproteins in fungal species revealed multiple Sec extinctions in Fungi. Protists show a highly scattered distribution of selenoproteins. Color images are available online.

Selenoproteins are widespread among metazoans. The richest selenoproteomes among animals are found in vertebrates and other deuterostomes [such as lancelet (48) and echinoderms]. The reconstructed ancestral vertebrate selenoproteome consists of 28 genes, whose evolutionary history was thoroughly described (71). The 25 human selenoproteins are tightly conserved across mammals (mouse has 24, with GPX6 containing Cys instead of Sec). Fish genomes encode up to 38 selenoprotein genes, product of multiple lineage-specific gene duplications (71). Three selenoprotein families are vertebrate-specific: SELENOI (EPT1 or SelI), SELENOV (SelV), and SELENOE (Fep15, found only in fish). Interestingly, SELENOI is involved in neural development (42) and appeared to be one of the most important human selenoproteins, based on selective constraints against loss-of-function (LoF) variants in the general population (87).

The patterns of selenoprotein evolution in vertebrates have been studied to investigate the evolutionary forces that shaped them. Two environmental factors have been proposed to play important roles in the evolution of selenoproteins: availability of selenium and aquatic environment. The two effects may be intertwined, as selenium is abundant in water but scarce on land; however, an alternative explanation is that selenoproteins are more beneficial in aquatic than terrestrial environments because of different exposures to oxidants. Several observations support the role of these factors. Aquatic organisms typically encode larger sets of selenoproteins than terrestrial ones (5, 61).

Further insights come from the analysis of SELENOP, the only human selenoprotein with multiple Sec residues. SELENOP is a secreted plasma protein that distributes selenium throughout the body, prioritizing tissues with high requirements for this element (11). SELENOP functions as a quantitative marker of selenium utilization at various evolutionary scales: its concentration in plasma reflects selenium availability in mouse and human (11, 41), and its number of encoded Sec residues correlates with the selenoproteome size across genomes of vertebrates (63) and other animals (5). Notably, SELENOP has fewer Sec residues in mammals compared with teleost fishes (63), again supporting the role of aquatic environment in shaping selenoprotein evolution. Indeed, recent evolutionary analyses showed that the SELENOP Sec content is under selection in fish, but in contrast it declined in a neutral manner in mammals (90). In the same study, terrestrial vertebrates showed relaxed evolutionary constraints throughout the whole set of genes that use or regulate selenium, when compared with fish (90).

Protostomes, the sister clade of deuterostomes (which includes vertebrates), encompass a huge diversity of invertebrate animal life-forms. Insects and nematodes encode minimal selenoproteomes: D. melanogaster has three selenoproteins, and C. elegans only one. Some noninsect arthropods encode larger selenoproteomes (Fig. 4), but no systematic study has been reported for these groups yet. All known selenoprotein-less animals belong to arthropods or nematodes. Within arthropods, Sec was lost independently in several insect orders (17, 62, 66), and in at least two genera of mites (arachnids) (88). Some parasitic plant nematodes are also devoid of selenoproteins (78), while other nematodes have minimal selenoproteomes (93) (Fig. 4).

An extensive study describing the evolution of SELENOP in Metazoa, including diverse invertebrate lineages, was recently reported (5). SELENOP is present throughout Metazoa, but it was lost in tunicates, crustaceans, platyhelminthes, most nematodes, and most insects. The Sec content appeared to be particularly high in some Lophotrochozoa (lineage including Annelida, Nemertea, and molluscs): SELENOP of pacific oyster Magallana gigas has 46 Sec-UGAs; in freshwater mussel Elliptio complanata, it has 132 Sec-UGAs (5). The remarkable diversity of invertebrates is particularly interesting for the study of the evolution of selenoproteins and can provide valuable insights into the biology of selenium in animal life.
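The Sec content discussed here corresponds to the number of in-frame UGA codons within a coding sequence, which can be counted with a short Python function. The sketch assumes that a correctly annotated CDS is supplied, beginning at the start codon and ending with its terminating codon; the function name and the toy sequences are hypothetical.

```python
def count_inframe_tga(cds):
    """Count in-frame TGA codons in a CDS, excluding the terminating codon.

    The final codon is skipped because a TGA in that position acts as the
    stop signal rather than encoding Sec.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return sum(codon == "TGA" for codon in codons[:-1])


# Invented examples.
print(count_inframe_tga("ATGTGAGCTTGATGCTAA"))  # ATG TGA GCT TGA TGC TAA
print(count_inframe_tga("ATGTGAGCTTGA"))        # final TGA is the stop
```

Counting by codon rather than by substring matters because an out-of-frame TGA, such as the one spanning two adjacent ATG codons, does not specify Sec.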

Selenoproteins were recently discovered in fungi, a lineage that was previously believed to be devoid of these proteins (72). At least nine genomes from diverse early-branching fungal phyla appear to encode selenoproteins and the Sec machinery. The absence of Sec in most fungal genomes is the product of multiple independent Sec extinction events (Fig. 4). Notably, Sec was lost at the root of Dikarya, which comprises Ascomycota and Basidiomycota, and led to the absence of selenoproteins in Saccharomyces cerevisiae and other yeasts. No selenoproteins have been identified in land plants so far (73), whereas algae use Sec and are often selenoprotein rich (4, 58, 61, 77).

Protists are a paraphyletic group of unicellular eukaryotes representing massive and largely unexplored biodiversity. The distribution of selenoproteins is highly scattered throughout protist genomes (73, 88) including many selenoprotein-less species, as well as the largest selenoproteomes known (4, 36) (Fig. 4).

Bacteria

Analysis of fully sequenced genomes indicates that selenoproteins are used by 20%–25% of bacteria, displaying a widely scattered distribution across different lineages (73, 79, 109). Around 50 selenoprotein families have been identified so far in bacteria (103), only a few of which are also present in eukaryotes: selenophosphate synthetase (SelD), GPX, deiodinase-like, methionine-S-sulfoxide reductase (MsrA), alkyl hydroperoxide reductase C (AhpC) and radical SAM domain proteins (RSAM). Most bacterial selenoproteins were predicted using bioinformatics in genomes or metagenomes derived from environmental sequencing projects (19, 52, 104–106).

Although the majority of bacteria (∼75%) do not use Sec, most clades contain members with selenoproteins. Selenoprotein-rich bacteria were identified in three phyla: Deltaproteobacteria, Clostridia, and Synergistetes (79), with the largest selenoproteome to date identified in Syntrophobacter fumaroxidans (Deltaproteobacteria) with 39 selenoproteins (107). Selenoprotein-containing bacteria encompass a great diversity, but most of them are obligate or facultative anaerobes (83, 92). Many of the characterized selenoproteins are involved in energy metabolism, as well as other processes such as redox cycling and Sec synthesis itself (83, 92). However, most bacterial selenoproteins were identified through bioinformatic analysis and their functions could only be inferred by sequence homology. Additional studies will be necessary to better understand the biology of selenium in prokaryotes.

In addition, it is very likely that many selenoproteins have not been identified yet, and more efficient and accurate tools will be necessary to analyze the overwhelming number of genomes and metagenomes being sequenced. Recently, it was discovered that some bacteria use non-UGA codons for Sec incorporation (76), in a remarkable example of flexibility of the genetic code. tRNA-Sec genes were identified that recognize the stop codons UAG and UAA, as well as 10 sense codons. Accordingly, selenoprotein genes that contain the corresponding non-UGA codon in the Sec position were identified, and Sec insertion by those tRNAs was confirmed experimentally.

Archaea

Selenoproteins in archaea show a very narrow distribution, observed so far only in three phyla (84). Nine selenoprotein families are known, the majority of which are involved in methanogenesis (84, 92). The selenoprotein synthesis pathway in archaea resembles the eukaryotic system, rather than the bacterial one (92). The recent discovery of selenoproteins in the newly described Asgard lineage, the closest archaeal relatives to eukaryotes (102), brought key insights into the evolution of Sec insertion (70). Asgard archaea have selenoproteins with SECIS elements that resemble those of eukaryotes, rather than those of other archaea (methanogens), indicating that the eukaryotic SECIS motif predated the origin of eukaryotes. A key feature of this motif, the kink-turn core, was also found in genomes of methanogens: it forms part of the archaeal SECIS of a single selenoprotein gene, vhuD, leading to speculation that vhuD is the progenitor of eukaryotic SECIS elements. A long-standing question regarding the selenoprotein synthesis pathway in archaea remains unanswered: the identity of the archaeal SECIS-binding protein is unknown (84).

Beyond Gene Prediction

Functional studies: Ribo-seq

The synthesis of selenoproteins is essential for mammals and its disruption causes lethality in mice [reviewed in ref. (23)]. Mutations in selenoproteins or in genes required for their synthesis result in various diseases in humans [reviewed in refs. (28, 30, 91)]. Selenoprotein synthesis has been extensively studied, and all the required and sufficient factors for Sec insertion in eukaryotes are known (38). Yet, many aspects of the mechanism for Sec incorporation are still unclear, including the roles of the various factors, its efficiency, and regulation (44). Ribosome profiling (Ribo-seq) has emerged as an important tool for studying the mechanism of selenoprotein synthesis (26). This relatively new technique is based on high-throughput sequencing of ribosome-protected fragments (RPFs), ∼30-nucleotide long reads roughly corresponding to the sequence covered by a single ribosome. The analysis of RPFs reveals which regions of the transcriptome are under active translation. Quantification of RPFs provides a measure of ribosome abundance with codon resolution.

Ribo-seq experiments are useful for studying many aspects of translation. For selenoproteins in particular, they allow UGA readthrough to be quantified as a proxy for Sec incorporation efficiency. Ribo-seq of selenoprotein translation in mouse liver has been used to study the effects of dietary selenium (43, 94), the role of Secisbp2 in UGA recoding and selenoprotein mRNA stability (29), and the effects of pathogenic missense mutations in Secisbp2 mouse models (110). One particularly interesting example of UGA recoding is the gene SelenoP, which contains two SECIS elements and multiple UGA codons. Ribo-seq was applied to study the role of each of the two SECIS elements in mouse (74) and, more recently, to study the translation of the SelenoP of the Pacific oyster, which contains 46 UGA codons (5).
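One simple way to express UGA readthrough as a number is to compare the footprint density downstream of the UGA with the density upstream of it. The sketch below assumes per-codon RPF counts as input; `readthrough_ratio` and its `flank` parameter are hypothetical names chosen for this example, and published analyses may normalize the data differently.

```python
from statistics import mean


def readthrough_ratio(codon_counts, sec_index, flank=0):
    """Ratio of mean RPF density downstream vs. upstream of a UGA codon.

    codon_counts: RPF counts per codon along the coding sequence.
    sec_index:    0-based index of the Sec-encoding UGA codon.
    flank:        codons skipped on each side of the UGA, so that ribosome
                  pausing at the codon itself does not bias either density.
    """
    upstream = codon_counts[:max(sec_index - flank, 0)]
    downstream = codon_counts[sec_index + 1 + flank:]
    if not upstream or not downstream:
        raise ValueError("UGA is too close to an end of the coding sequence")
    up_density = mean(upstream)
    if up_density == 0:
        raise ValueError("no upstream footprints; ratio is undefined")
    return mean(downstream) / up_density
```

With counts `[10, 10, 10, 10, 3, 5, 5, 5, 5]` and the UGA at index 4, the function returns `0.5`, meaning that footprint density after the UGA is half of the density before it. A ratio near 1 would suggest efficient recoding, and a ratio near 0 would suggest that most ribosomes terminate at the UGA.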

Population genetics

Genetic variation in selenoprotein genes across human populations is a powerful resource for studying human adaptation, selection, and disease. Recently, the population dynamics of variants in selenoproteins and selenium-related genes were analyzed to address the response of different populations to variable selenium levels in the soil (97).

More recently, selenoprotein genes were analyzed to assess the frequency of loss-of-function (LoF) variants in human populations, yielding estimates of the tolerance to the loss of these genes in humans (87). These estimates were compared with knockout phenotypes in mice. While good overall correspondence was found between human and mouse, notable differences emerged for some genes. Mice appeared more sensitive than humans to the loss of GPx4. Strikingly, iodothyronine deiodinases seemed not to tolerate LoF variants in humans, whereas their deletion in mice produced only a mild phenotype (31).

Conclusions

Over the past 20 years, bioinformatics has pushed forward the field of selenium research. Once the main genetic features of selenoproteins were established, scientists found in selenoproteins intriguing questions that could be addressed by sequence analysis. Selenoproteins were an early application of bioinformatics, predating the publication of the human genome. The prediction of selenoproteins in nucleotide sequences is still challenging, mainly because of the growing number of genomes and transcriptomes available. Automated programs have alleviated this problem and now allow large-scale surveys, which have provided a much more detailed map of Sec usage across the tree of life.

Selenoprotein research has also benefited from new sequencing techniques such as ribosome profiling, which now make it possible to estimate the efficiency of Sec incorporation in vivo. Sequencing of the genomes and exomes of human populations, now in the range of hundreds of thousands of individuals, together with sequencing of patients, promises new insights into the molecular mechanisms of selenoproteins and the role of selenium in health and disease.

Funding Information

Supported by National Institutes of Health grants to V.N.G.

References

1. Allmang, Wurth, and Krol. The selenium to selenoprotein pathway in eukaryotes: more molecular partners than anticipated. Biochim Biophys Acta, 1790: 1415–1423, 2009.

2. Altschul, Madden, Schäffer, Zhang, Miller, and Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25: 3389–3402, 1997.

3. Andreesen and Ljungdahl. Formate dehydrogenase of Clostridium thermoaceticum: incorporation of selenium 75, and the effects of selenite, molybdate, and tungstate on the enzyme. J Bacteriol, 116: 867–873, 1973.

4. Araie, Sakamoto, Suzuki, and Shiraiwa. Characterization of the selenite uptake mechanism in the coccolithophore Emiliania huxleyi (haptophyta). Plant Cell Physiol, 52: 1204–1210, 2011.

5. Baclaocos, Santesmasses, Mariotti, Bierła, Vetick, Lynch, McAllen, Mackrill, Loughran, Guigó, Szpunar, Copeland, Gladyshev, and Atkins. Processive recoding and metazoan evolution of selenoprotein P: up to 132 UGAs in molluscs. J Mol Biol, 431: 4381–4407, 2019.

6. Bekaert, Firth, Zhang, Gladyshev, Atkins, and Baranov PV. Recode-2: new design, new search tools, and many more genes. Nucleic Acids Res, 38: D69–D74, 2010.

7. Berg, Baron, and Stewart. Nitrate-inducible formate dehydrogenase in Escherichia coli K-12. II. Evidence that a mRNA stem-loop structure is essential for decoding opal (UGA) as selenocysteine. J Biol Chem, 266: 22386–22391, 1991.

8. Berry, Banu, Chen, Mandel, Kieffer, Harney, and Larsen. Recognition of UGA as a selenocysteine codon in type I deiodinase requires sequences in the 3′ untranslated region. Nature, 353: 273–276, 1991.

9. Böck, Forchhammer, Heider, Leinfelder, Sawers, Veprek, and Zinoni. Selenocysteine: the 21st amino acid. Mol Microbiol, 5: 515–520, 1991.

10. Bulteau and Chavatte. Update on selenoprotein biosynthesis. Antioxid Redox Signal, 23: 775–794, 2015.

11. Burk and Hill. Regulation of selenium metabolism and transport. Annu Rev Nutr, 35: 109–134, 2015.

12. Carlson, Xu, Kryukov G, Rao, Berry, Gladyshev, and Hatfield. Identification and characterization of phosphoseryl-tRNA[Ser]Sec kinase. Proc Natl Acad Sci U S A, 101: 12848–12853, 2004.

13. Castellano, Gladyshev, Guigó, and Berry. SelenoDB 1.0: a database of selenoprotein genes, proteins and SECIS elements. Nucleic Acids Res, 36: D332–D338, 2008.

14. Castellano, Lobanov A, Chapple, Novoselov S, Albrecht, Hua, Lescure, Lengauer, Krol, Gladyshev, and Guigó. Diversity and functional plasticity of eukaryotic selenoproteins: identification and characterization of the SelJ family. Proc Natl Acad Sci U S A, 102: 16188–16193, 2005.

15. Castellano, Morozova, Morey, Berry, Serras, Corominas, and Guigó. In silico identification of novel selenoproteins in the Drosophila melanogaster genome. EMBO Rep, 2: 697–702, 2001.

16. Castellano, Novoselov S, Kryukov G, Lescure, Blanco, Krol, Gladyshev, and Guigó. Reconsidering the evolution of eukaryotic selenoproteins: a novel nonmammalian family with scattered phylogenetic distribution. EMBO Rep, 5: 71–77, 2004.

17. Chapple and Guigó. Relaxation of selective constraints causes independent selenoprotein extinction in insect genomes. PLoS One, 3: e2968, 2008.

18. Chapple, Guigó, and Krol. SECISaln, a web-based tool for the creation of structure-based alignments of eukaryotic SECIS elements. Bioinformatics, 25: 674–675, 2009.

19. Chaudhuri and Yeates. A computational method to predict genetically encoded rare amino acids in proteins. Genome Biol, 6: R79, 2005.

20. Chavatte, Brown, and Driscoll. Ribosomal protein L30 is a component of the UGA-selenocysteine recoding machinery in eukaryotes. Nat Struct Mol Biol, 12: 408–416, 2005.

21. Chen GFT, Fang, and Inouye. Effect of the relative position of the UGA codon to the unique secondary structure in the fdhF mRNA on its decoding by selenocysteinyl tRNA in Escherichia coli. J Biol Chem, 268: 23128–23131, 1993.

22. Cone, Del Río, Davis, and Stadtman. Chemical characterization of the selenoprotein component of clostridial glycine reductase: identification of selenocysteine as the organoselenium moiety. Proc Natl Acad Sci U S A, 73: 2659–2663, 1976.

23. Conrad and Schweizer. Mouse models that target individual selenoproteins. In: Selenium, edited by Hatfield, Schweizer, Tsuji, and Gladyshev. Cham, Switzerland: Springer International Publishing, 2016, pp. 567–578.

24. Copeland, Fletcher, Carlson, Hatfield, and Driscoll. A novel RNA binding protein, SBP2, is required for the translation of mammalian selenoprotein mRNAs. EMBO J, 19: 306–314, 2000.

25. Cravedi, Mori, Fischer, and Percudani. Evolution of the selenoproteome in Helicobacter pylori and Epsilonproteobacteria. Genome Biol Evol, 7: 2692–2704, 2015.

26. Dalley, Baird, and Howard. Studying selenoprotein mRNA translation using RNA-Seq and ribosome profiling. Methods Mol Biol, 1661: 103–123, 2018.

27. Flohe, Günzler, and Schock. Glutathione peroxidase: a selenoenzyme. FEBS Lett, 32: 132–134, 1973.

28. Fradejas-Villar. Consequences of mutations and inborn errors of selenoprotein biosynthesis and functions. Free Radic Biol Med, 127: 206–214, 2018.

29. Fradejas-Villar, Seeher, Anderson, Doengi, Carlson, Hatfield, Schweizer, and Howard. The RNA-binding protein Secisbp2 differentially modulates UGA codon reassignment and RNA decay. Nucleic Acids Res, 45: 4094–4107, 2017.

30. Friedmann Angeli JP and Conrad. Selenium and GPX4, a vital symbiosis. Free Radic Biol Med, 127: 153–159, 2018.

31. Galton, Schneider, Clark, and St. Germain DL. Life without thyroxine to 3,5,3′-triiodothyronine conversion: studies in mice devoid of the 5′-deiodinases. Endocrinology, 150: 2957–2963, 2009.

32. Gladyshev VN. Eukaryotic selenoproteomes. In: Selenium, edited by Hatfield, Schweizer, Tsuji, and Gladyshev. Cham, Switzerland: Springer International Publishing, 2016, pp. 127–139.

33. Gladyshev, Arnér, Berry, Brigelius-Flohé, Bruford, Burk, Carlson, Castellano, Chavatte, Conrad, Copeland, Diamond, Driscoll, Ferreiro, Flohé, Green, Guigó, Handy, Hatfield, Hesketh, Hoffmann, Holmgren, Hondal, Howard, Huang, Kim H-Y, Kim, Köhrle, Krol, Kryukov GV, Lee, Lei, Liu, Lescure, Lobanov AV, Loscalzo, Maiorino, Mariotti, Sandeep Prabhu, Rayman, Rozovsky, Salinas, Schmidt, Schomburg, Schweizer, Simonović, Sunde, Tsuji, Tweedie, Ursini, Whanger, and Zhang. Selenoprotein gene nomenclature. J Biol Chem, 291: 24036–24040, 2016.

34. Glass, Singh, Jung, Veres, Scholz, and Stadtman. Monoselenophosphate: synthesis, characterization, and identity with the prokaryotic biological selenium donor, compound SePX. Biochemistry, 32: 12555–12559, 1993.

35. Gobler, Berry, Dyhrman, Wilhelm, Salamov, Lobanov A, Zhang, Collier, Wurch, Kustka, Dill, Shah, VerBerkmoes, Kuo, Terry, Pangilinan, Lindquist, Lucas, Paulsen, Hattenrath-Lehmann, Talmage, Walker, Koch, Burson, Marcoval, Tang Y-Z, Lecleir, Coyne, Berg, Bertrand, Saito, Gladyshev, and Grigoriev IV. Niche of harmful alga Aureococcus anophagefferens revealed through ecogenomics. Proc Natl Acad Sci U S A, 108: 4352–4357, 2011.

36. Gobler, Lobanov A, Tang, Turanov, Zhang, Doblin, Taylor, Sañudo-Wilhelmy, Grigoriev I, and Gladyshev. The central role of selenium in the biochemistry and ecology of the harmful pelagophyte, Aureococcus anophagefferens. ISME J, 7: 1333–1343, 2013.

37. Guigó, Knudsen, Drake, and Smith. Prediction of gene structure. J Mol Biol, 226: 141–157, 1992.

38. Gupta, Demong, Banda, and Copeland. Reconstitution of selenocysteine incorporation reveals intrinsic regulation by SECIS elements. J Mol Biol, 425: 2415–2422, 2013.

39. Haft, DiCuccio, Badretdin, Brover, Chetvernin, O'Neill, Li, Chitsaz, Derbyshire, Gonzales, Gwadz, Lu, Marchler, Song, Thanki, Yamashita, Zheng, Thibaud-Nissen, Geer, Marchler-Bauer, and Pruitt. RefSeq: an update on prokaryotic genome annotation and curation. Nucleic Acids Res, 46: D851–D860, 2018.

40. Hatfield, Schweizer, Tsuji, and Gladyshev (Eds). Selenium: Its Molecular Biology and Role in Human Health. Cham, Switzerland: Springer International Publishing, 2016.

41. Hill, Xia, Åkesson, Boeglin, and Burk. Selenoprotein P concentration in plasma is an index of selenium status in selenium-deficient and selenium-supplemented Chinese subjects. J Nutr, 126: 138–145, 1996.

42. Horibata, Elpeleg, Eran, Hirabayashi, Savitzki, Tal, Mandel, and Sugimoto. EPT1 (selenoprotein I) is critical for the neural development and maintenance of plasmalogen in humans. J Lipid Res, 59: 1015–1026, 2018.

43. Howard, Carlson, Anderson, and Hatfield. Translational redefinition of UGA codons is regulated by selenium availability. J Biol Chem, 288: 19401–19413, 2013.

44. Howard and Copeland. New directions for understanding the codon redefinition required for selenocysteine incorporation. Biol Trace Elem Res, 192: 18–25, 2019.

45. Hüttenhofer and Böck. RNA structures involved in selenoprotein synthesis. Cold Spring Harb Monogr Arch 35, 1998.

46. Jiang and Liu. SelGenAmic: an algorithm for selenoprotein gene assembly. Methods Mol Biol, 1661: 29–39, 2018.

47. Jiang, Liu, and Ni. In silico identification of the sea squirt selenoproteome. BMC Genomics, 11: 289, 2010.

48. Jiang, Ni, and Liu. Evolution of selenoproteins in the metazoan. BMC Genomics, 13: 446, 2012.

49. Jones, Dilworth, and Stadtman. Occurrence of selenocysteine in the selenium-dependent formate dehydrogenase of Methanococcus vannielii. Arch Biochem Biophys, 195: 255–260, 1979.

50. Krol. Evolutionarily different RNA motifs and RNA-protein complexes to achieve selenoprotein synthesis. Biochimie, 84: 765–774, 2002.

51. Kryukov, Castellano, Novoselov S, Lobanov A, Zehtab, Guigó, and Gladyshev. Characterization of mammalian selenoproteomes. Science, 300: 1439–1443, 2003.

52. Kryukov and Gladyshev. The prokaryotic selenoproteome. EMBO Rep, 5: 538–543, 2004.

53. Kryukov, Kryukov, and Gladyshev. New mammalian selenocysteine-containing proteins identified with an algorithm that searches for selenocysteine insertion sequence elements. J Biol Chem, 274: 33888–33897, 1999.

54. Labunskyy, Hatfield, and Gladyshev. Selenoproteins: molecular pathways and physiological roles. Physiol Rev, 94: 739–777, 2014.

55. Lee, Rajagopalan, Kim, You, Jacobson, and Hatfield. Selenocysteine tRNA[Ser]Sec gene is ubiquitous within the animal kingdom. Mol Cell Biol, 10: 1940–1949, 1990.

56. Leinfelder, Zehelein, Mandrand-Berthelot, and Böck. Gene for a novel tRNA species that accepts L-serine and cotranslationally inserts selenocysteine. Nature, 331: 723–725, 1988.

57. Lescure, Gautheret, Carbon, and Krol. Novel selenoproteins identified in silico and in vivo by using a conserved RNA structural motif. J Biol Chem, 274: 38147–38154, 1999.

58. Liang, Wei, Xu, Li, Kumar Sahu, Wang, Li, Fu, Zhang, Melkonian, Liu, Wang, and Liu. Phylogenomics provides new insights into gains and losses of selenoproteins among Archaeplastida. Int J Mol Sci, 20: 3020, 2019.

59. Liu, Zhou, Pan, Baker, Gu J-D, and Li. Comparative genomic inference suggests mixotrophic lifestyle for Thorarchaeota. ISME J, 12: 1021–1031, 2018.

60. Lobanov, Delgado, Rahlfs, Novoselov, Kryukov, Gromer, Hatfield, Becker, and Gladyshev. The Plasmodium selenoproteome. Nucleic Acids Res, 34: 496–505, 2006.

61. Lobanov AV, Fomenko, Zhang, Sengupta, Hatfield, and Gladyshev. Evolutionary dynamics of eukaryotic selenoproteomes: large selenoproteomes may associate with aquatic life and small with terrestrial life. Genome Biol, 8: R198, 2007.

62. Lobanov AV, Hatfield, and Gladyshev. Selenoproteinless animals: selenophosphate synthetase SPS1 functions in a pathway unrelated to selenocysteine biosynthesis. Protein Sci, 17: 176–182, 2008.

63. Lobanov AV, Hatfield, and Gladyshev. Reduced reliance on the trace element selenium during evolution of mammals. Genome Biol, 9: R62, 2008.

64. Low, Harney, and Berry. Cloning and functional characterization of human selenophosphate synthetase, an essential component of selenoprotein synthesis. J Biol Chem, 270: 21659–21664, 1995.

65. Mahdi, Xu X-M, Carlson, Fradejas, Günter, Braun, Southon, Tessarollo, Hatfield, and Schweizer. Expression of selenoproteins is maintained in mice carrying mutations in SECp43, the tRNA selenocysteine 1 associated protein (Trnau1ap). PLoS One, 10: e0127349, 2015.

66. Mariotti. Selenocysteine Extinctions in Insects. Cham, Switzerland: Springer, 2016, pp. 113–140.

67. Mariotti. SECISearch3 and Seblastian: in-silico tools to predict SECIS elements and selenoproteins. Methods Mol Biol, 1661: 3–16, 2018.

68. Mariotti and Guigó. Selenoprofiles: profile-based scanning of eukaryotic genome sequences for selenoprotein genes. Bioinformatics, 26: 2656–2663, 2010.

69. Mariotti, Lobanov, Guigo, and Gladyshev. SECISearch3 and Seblastian: new tools for prediction of SECIS elements and selenoproteins. Nucleic Acids Res, 41: e149, 2013.

70. Mariotti, Lobanov, Manta, Santesmasses, Bofill, Guigó, Gabaldón, and Gladyshev. Lokiarchaeota marks the transition between the archaeal and eukaryotic selenocysteine encoding systems. Mol Biol Evol, 33: 2441–2453, 2016.

71. Mariotti, Ridge, Zhang, Lobanov, Pringle, Guigo, Hatfield, and Gladyshev. Composition and evolution of the vertebrate and mammalian selenoproteomes. PLoS One, 7: e33066, 2012.

72. Mariotti, Salinas, Gabaldón, and Gladyshev. Utilization of selenocysteine in early-branching fungal phyla. Nat Microbiol, 4: 759–765, 2019.

73. Mariotti, Santesmasses, Capella-Gutierrez, Mateo, Arnan, Johnson, D'Aniello, Yim, Gladyshev, Serras, Corominas, Gabaldón, and Guigó. Evolution of selenophosphate synthetases: emergence and relocation of function through independent duplications and recurrent subfunctionalization. Genome Res, 25: 1256–1267, 2015.

74. Mariotti, Shetty, Baird, Wu, Loughran, Copeland, Atkins, Howard, Sen, Loughran, Copeland, Atkins, and Howard. Multiple RNA structures affect translation initiation and UGA redefinition efficiency during synthesis of selenoprotein P. Nucleic Acids Res, 45: 13004–13015, 2017.

75. Martin-Romero, Kryukov, Lobanov, Carlson, Lee, Gladyshev, and Hatfield. Selenium metabolism in Drosophila. Selenoproteins, selenoprotein mRNA expression, fertility, and mortality. J Biol Chem, 276: 29798–29804, 2001.

76. Mukai, Englert, Tripp, Miller, Ivanova, Rubin, Kyrpides, and Söll. Facile recoding of selenocysteine in nature. Angew Chem Int Ed Engl, 55: 5337–5341, 2016.

77. Novoselov, Rao, Onoshko, Zhi, Kryukov, Xiang, Weeks, Hatfield, and Gladyshev. Selenoproteins and selenocysteine insertion system in the model plant cell system, Chlamydomonas reinhardtii. EMBO J, 21: 3681–3693, 2002.

78. Otero, Romanelli-Cedrez, Turanov, Gladyshev, Miranda-Vizuete, and Salinas. Adjustments, extinction, and remains of selenocysteine incorporation machinery in the nematode lineage. RNA, 20: 1023–1034, 2014.

79. Peng, Lin, Xu Y-Z, and Zhang. Comparative genomics reveals new evolutionary and ecological patterns of selenium utilization in bacteria. ISME J, 10: 2048–2059, 2016.

80. Rajput, Pruitt, and Murphy. RefSeq curation and annotation of stop codon recoding in vertebrates. Nucleic Acids Res, 47: 594–606, 2019.

81. Romagné, Santesmasses, White, Sarangi, Mariotti, Hübler, Weihmann, Parra, Gladyshev, Guigó, and Castellano. SelenoDB 2.0: annotation of selenoprotein genes in animals and their genetic diversity in humans. Nucleic Acids Res, 42: D437–D443, 2014.

82. Romero, Zhang, Gladyshev, and Salinas. Evolution of selenium utilization traits. Genome Biol, 6: R66, 2005.

83. Rother. Prokaryotic selenoprotein biosynthesis and function. In: Selenium, edited by Hatfield, Schweizer, Tsuji, and Gladyshev. Cham, Switzerland: Springer International Publishing, 2016, pp. 47–58.

84. Rother and Quitzke. Selenoprotein synthesis and regulation in archaea. Biochim Biophys Acta Gen Subj, 1862: 2451–2462, 2018.

85. Rother, Resch, Wilting, and Böck. Selenoprotein synthesis in archaea. Biofactors, 14: 75–83, 2001.

86. Rotruck, Pope, Ganther, Swanson, Hafeman, and Hoekstra. Selenium: biochemical role as a component of glutathione peroxidase. Science, 179: 588–590, 1973.

87. Santesmasses, Mariotti, and Gladyshev. Tolerance to selenoprotein loss differs between human and mouse. Mol Biol Evol, 37: 341–354, 2020.

88. Santesmasses, Mariotti, and Guigó. Computational identification of the selenocysteine tRNA (tRNASec) in genomes. PLOS Comput Biol, 13: e1005383, 2017.

89. Santesmasses, Mariotti, and Guigó. Selenoprofiles: a computational pipeline for annotation of selenoproteins. Methods Mol Biol, 1661: 17–28, 2018.

90. Sarangi, Romagné, and Castellano. Distinct patterns of selection in selenium-dependent genes between land and aquatic vertebrates. Mol Biol Evol, 35: 1744–1756, 2018.

91. Schoenmakers, Schoenmakers, and Chatterjee. Mutations in humans that adversely affect the selenoprotein synthesis pathway. In: Selenium. Its Molecular Biology and Role in Human Health, edited by Hatfield, Schweizer, Tsuji, and Gladyshev. Cham, Switzerland: Springer International Publishing, 2016, pp. 523–538.

92. Stock and Rother. Selenoproteins in archaea and Gram-positive bacteria. Biochim Biophys Acta, 1790: 1520–1532, 2009.

93. Taskov, Chapple, Kryukov G, Castellano, Lobanov A, Korotkov K, Guigó, and Gladyshev. Nematode selenoproteome: the use of the selenocysteine insertion system to decode one codon in an animal genome? Nucleic Acids Res, 33: 2227–2238, 2005.

94. Tsuji, Carlson, Anderson, Seifried, Hatfield, and Howard. Dietary selenium levels affect selenoprotein expression and support the interferon-γ and IL-6 immune response pathways in mice. Nutrients, 7: 6529–6549, 2015.

95. Tujebajeva, Copeland, Xu, Carlson, Harney, Driscoll, Hatfield, and Berry. Decoding apparatus for eukaryotic selenocysteine insertion. EMBO Rep, 1: 158–163, 2000.

96. Turner and Stadtman. Purification of protein components of the clostridial glycine reductase system and characterization of protein A as a selenoprotein. Arch Biochem Biophys, 154: 366–381, 1973.

97. White, Romagné, Müller, Erlebach, Weihmann, Parra, Andrés, and Castellano. Genetic adaptation to levels of dietary selenium in recent human history. Mol Biol Evol, 32: 1507–1518, 2015.

98. Wilting, Schorling, Persson, and Böck. Selenoprotein synthesis in archaea: identification of an mRNA element of Methanococcus jannaschii probably directing selenocysteine insertion. J Mol Biol, 266: 637–641, 1997.

99. Xu X-M, Carlson, Irons, Mix, Zhong, Gladyshev, and Hatfield. Selenophosphate synthetase 2 is essential for selenoprotein biosynthesis. Biochem J, 404: 115–120, 2007.

100. Xu X-M, Carlson, Mix, Zhang, Saira, Glass, Berry, Gladyshev, and Hatfield. Biosynthesis of selenocysteine on its tRNA in eukaryotes. PLoS Biol, 5: e4, 2007.

101. Yuan, Palioura, Salazar, Su, O'Donoghue, Hohn, Cardoso, Whitman, and Soll. RNA-dependent conversion of phosphoserine forms selenocysteine in eukaryotes and archaea. Proc Natl Acad Sci U S A, 103: 18923–18927, 2006.

102. Zaremba-Niedzwiedzka, Caceres, Saw, Bäckström, Juzokaite, Vancaester, Seitz, Anantharaman, Starnawski, Kjeldsen, Stott, Nunoura, Banfield, Schramm, Baker, Spang, and Ettema TJG. Asgard archaea illuminate the origin of eukaryotic cellular complexity. Nature, 541: 353–358, 2017.

103. Zhang. Prokaryotic selenoproteins and selenoproteomes. In: Selenium. Its Molecular Biology and Role in Human Health, edited by Hatfield DL, Schweizer U, Tsuji PA, and Gladyshev VN. Cham, Switzerland: Springer International Publishing, 2016, pp. 141–150.

104. Zhang, Fomenko, and Gladyshev. The microbial selenoproteome of the Sargasso Sea. Genome Biol, 6: R37, 2005.

105. Zhang and Gladyshev. An algorithm for identification of bacterial selenocysteine insertion sequence elements and selenoprotein genes. Bioinformatics, 21: 2580–2589, 2005.

106. Zhang and Gladyshev. Trends in selenium utilization in marine microbial world revealed through the analysis of the global ocean sampling (GOS) project. PLoS Genet, 4: e1000095, 2008.

107. Zhang and Gladyshev. General trends in trace element utilization revealed by comparative genomic analyses of Co, Cu, Mo, Ni, and Se. J Biol Chem, 285: 3393–3405, 2010.

108. Zhang and Gladyshev. dbTEU: a protein database of trace element utilization. Bioinformatics, 26: 700–702, 2010.

109. Zhang, Romero, Salinas, and Gladyshev. Dynamic evolution of selenocysteine utilization in bacteria: a balance between selenoprotein loss and evolution of selenocysteine from redox active cysteine residues. Genome Biol, 7: R94, 2006.

110. Zhao, Bohleber, Schmidt, Seeher, Howard, Braun, Arndt, Reuter, Wende, Birchmeier, Fradejas-Villar, and Schweizer. Ribosome profiling of selenoproteins in vivo reveals consequences of pathogenic Secisbp2 missense mutations. J Biol Chem [Epub ahead of print]; DOI: jbc.RA119.009369, 2019.

111. Zhu S-Y, Li X-N, Sun X-C, Lin, Li, Zhang, and Li J-L. Biochemical characterization of the selenoproteome in Gallus gallus via bioinformatics analysis: structure-function relationships and interactions of binding molecules. Metallomics, 9: 124–131, 2017.

112. Zinoni, Birkmann, Stadtman, and Böck. Nucleotide sequence and expression of the selenocysteine-containing polypeptide of formate dehydrogenase (formate-hydrogen-lyase-linked) from Escherichia coli. Proc Natl Acad Sci U S A, 83: 4650–4654, 1986.

113. Zinoni, Heider, Bock, and Böck. Features of the formate dehydrogenase mRNA necessary for decoding of the UGA codon as selenocysteine. Proc Natl Acad Sci U S A, 87: 4660–4664, 1990.