Abstract
Recent research on genome evolution of large DNA viruses has highlighted a number of incredibly dynamic processes that can facilitate rapid adaptation. The genomes of amphibian-like ranaviruses - double-stranded DNA viruses infecting amphibians, reptiles, and fish (family Iridoviridae) - were examined to assess variation in genome content and evolutionary processes. The viruses studied were closely related, but their genome content varied considerably, with 29 genes identified that were not present in all of the major clades. Twenty-one genes had evidence of recombination, while a virus isolated from a captive reptile appeared to be a mosaic of two divergent parents. Positive selection was also found to be acting on more than a quarter of Ranavirus genes and was found most frequently in the Spanish common midwife toad virus, which has had a severe impact on amphibian host communities. Efforts to resolve the root of this group by inclusion of an outgroup were inconclusive, but a set of core genes were identified, which recovered a well-supported species tree.
Introduction
Large DNA viruses have traditionally been viewed as somewhat evolutionarily inactive, exhibiting low substitution rates and cospeciation alongside their hosts. 1 This may have sprung from an overemphasis on the effect of the molecular mechanisms of replication (eg, the specific enzymes utilized and the resultant error rates, which ultimately determine mutation rates) and a lack of consideration of other biological factors that determine substitution rates such as generation time, life cycle, and selection. 2 Indeed, although phylogenetic concordance between hosts and viruses has been observed frequently, in some cases it has been shown to stem from preferential host switching rather than cospeciation. 3 Several lines of evidence from recent studies show that large DNA virus genomes can be highly dynamic. Genome evolution via lateral gene transfer occurs frequently with examples of the acquisition of genetic material from eukaryotes, bacteria, and bacteriophages.4–8 Double-stranded DNA viruses exhibit considerable variation in substitution rates. 2 Viruses such as the variola virus (a poxvirus causing smallpox) appear to have substitution rates on par with RNA viruses, and there is tentative evidence for high rates in other virus groups too.9,10 Recombination between viruses is also now known to be extremely widespread in some large DNA viruses.11,12
Nucleocytoplasmic large DNA viruses form a monophyletic group of seven virus families (Asfarviridae, Ascoviridae, Iridoviridae, Marseilleviridae, Mimiviridae, Phycodnaviridae, and Poxviridae). 13 Nucleocytoplasmic large DNA viruses share a common ancestor and a core set of genes, and their origins are thought to be concomitant with eukaryogenesis.4,14 They are differentiated from other large DNA viruses by their capacity to establish replication factories in the cytoplasm of host cells. 15
The family Iridoviridae consists of five genera that infect vertebrates and invertebrates. 16 They have received increased attention recently due to the impact on amphibian populations by members of the genus Ranavirus.17,18 Ranaviruses have a very broad host range - infecting amphibians, reptiles, and fish – and are thought to have undertaken a number of host jumps. 19 They can have severe impacts on individual host species or entire amphibian communities.20,21 The amphibian-like ranaviruses (ALRVs) 19 are considered emerging pathogens, and humans are thought to have contributed to their geographic distribution through the movement of hosts, which prompted their inclusion on the World Organization for Animal Health's list of notifiable pathogens.22,23 Humans may also have affected Ranavirus evolution; virus strains isolated from aquaculture have increased virulence compared to wild relatives,24,25 while the cohousing of diverse amphibian species through trade may present enhanced opportunities for recombination among divergent viruses.26–28
ALRVs with sequenced whole genomes are geographically widespread and have been isolated over a 50-year period. Genomes range from 100 to 130 kb in size and contain approximately 100 genes with low functional annotation rates.18,29 The ALRVs form a distant sister clade to grouper iridovirus-like ranaviruses (GIV-like; pathogens of marine fish) and are frequently divided into three groups: Ambystoma tigrinum virus-like (ATV-like), common midwife toad virus-like (CMTV-like), and Frog virus 3-like (FV3-like). 30 This division of the ALRVs is suggestive of underlying systematic relationships but has not yet been explicitly defined. There has been a gradual increase in the number of published Ranavirus whole genomes' but genomes have generally been sequenced one at a time applying a variety of approaches. There have been few comparative genomic analyses within this group with even fewer attempts to examine the group's evolutionary dynamics. However, a study examining Ranavirus genes for evidence of positive selection associated with host-switching found recombination in a high proportion of studied genes as well as positive selection in a number of newly acquired genes that might be associated with host switching events. 31 Another recent study focused on closely related, monophyletic isolates of ATV in North America and found little variation in genome content but did find evidence of positive selection and recombination among the regional isolates. 32
Methods
Data
Sixteen published ALRV genomes were downloaded from NCBI nucleotide database (full GenBank [.gbk] format and the nucleotide sequence only format [FASTA]) (Table 1). These viruses were isolated in the USA, Europe, China, and Australia over a 50-year period from 1965 to 2013 (Table 1). The same files were also downloaded for the GIV-like ranaviruses for use as outgroups and in some comparative analyses (Table 1).
Isolation history and source of data for Ranavirus genomes used in comparative genomics analyses.
For some isolates, the year of collection is provided in the GenBank record or accompanying publication. For others, the year of collection is estimated from other information in the accompanying publication, and this is indicated in the table by appending (approximately) to the year.
Comparative genomics
Genome contiguity was assessed by aligning complete virus genomes with NUCmer v3.1, which is a fast aligner for multiple, closely related genomes, using a minimum cluster length of 100 nucleotides. The output was visualized with Mummerplot. 33 To identify orthologous genes, ALRV genomes were first aligned using the Mugsy whole genome aligner (version 1, release 2.3) with default settings 34 and putative orthologs were clustered using Mugsy-Annotator 0.5 35 (coverage cutoff = 0.8, identity = 0.8, and query coverage = 0.8) with GenBank files containing original annotations for each virus supplied as input. Mugsy-Annotator is a command line program, which uses a whole genome multiple alignment and the genomes' associated features to identify annotation anomalies such as fragmentation, inconsistent starts, and missing features. Orthologous genes are grouped in clusters, and the algorithm requires loci to encode functional proteins (ie, no premature stop codons) for cluster membership. The sequence indices generated by Mugsy-Annotator were then used to extract ortholog sequences from whole genomes using the EMBOSS program Extractseq. 36 Sequences were then aligned with reference to translated amino acid sequences using TranslatorX, 37 which was run locally from the command line (with MAFFT as the aligner, amino acid sequences translated from the nucleotide sequences, and default settings used for other parameters).
The amount of variation in amino acid and nucleotide alignments was summarized with AMAS, 38 a python command line utility, which allows the computation of basic statistics from alignments. Alignments were checked for the presence or absence of sequences from each of the 16 isolates to generate a matrix (Supplementary Table 1; 16 viruses and 130 orthologs), which was used to summarize the number of genomes containing the ortholog as well as a summary of its occurrence across the virus phylogeny. In addition, the occurrence of each ortholog in the genomes of the GIV-like ranaviruses was assessed by a BLASTP search (Blast 2.2.29+, e-value cutoff at 1 × 10−10, four threads, and otherwise default BLASTP settings) against the published coding regions of GIV-like viruses. Orthologs found in all 16 ALRVs as well as both GIV-like viruses were classed as Ranavirus-core. Orthologs were then processed through a published pipeline to find functional annotations. 39
After removing stop codons, orthologs were assessed for recombination with GARD,40,41 which is implemented in HyPhy and uses a genetic algorithm to find recombination breakpoints in multiple sequence alignments. GARD was run in parallel in an MPI environment with five threads using the HKY85 substitution model. Alignments were split at breakpoints identified by GARD for further analysis. Prunier 42 identifies likely occurrences of lateral gene transfer by analyzing the lack of congruence between the topologies of a reference species tree and a given gene tree. All orthologous genes were screened for such conflict. A phylogeny generated from an ALRV-core gene set after removing genes with evidence of recombination (tree construction method described below) was supplied as the species tree. Prunier called RAxML to generate gene trees from nucleotide alignments for each orthologous gene using the GTR model of nucleotide substitution and 100 bootstraps using the rapid bootstrapping algorithm.
Each parent alignment and any split alignments (generated with reference to the GARD output using a custom script) were converted to interleaved phylip format using Prank v.140603. 43 These alignments were used to construct phylogenetic trees by maximum likelihood using RAxML 44 ; applying the GTR model of nucleotide substitution with rate variation modeled by a gamma distribution with four categories. Ten maximum-likelihood trees were generated on distinct starting trees in RAxML's rapid hill-climbing mode, and 100 bootstrap replicates were calculated and annotated on the best maximum-likelihood tree.
The adaptive version of Branch-Site REL 45 (implemented in HyPhy) was used to detect episodic selection. Branch-Site REL uses random effects models to allow variation in substitution rates between branches across a phylogeny in order to overcome shortcomings in previous branch-site models of selective pressure that can lead to erroneous inferences. 45 Each alignment was analyzed for positive selection using the random effects branch-site model, applying the universal genetic code and allowing alpha to vary from branch to branch and omega to vary among branch-site combinations. The RAxML trees were provided as input, and all branches were tested before applying the default Holm–Bonferroni correction for multiple comparisons. Positive selection was also analyzed with BUSTED, which uses the Branch-Site REL framework to test for evidence of episodic selection by estimating the proportion of selected codons across all branches of the phylogenetic tree instead of the branch-specific method of Branch-Site REL. 46 Again, the universal genetic code was used.
Phylogenetics
To address the evolutionary relationships among ALRVs, a phylogenetic tree including GIV as an outgroup was constructed. The results of Ranavirus genome content comparisons (described above) were used to select a set of orthologs conserved among ALRVs and GIV-like ranaviruses. Orthologs with evidence of recombination were removed from this core set. Nucleotide sequences were aligned with reference to amino acid sequences with TranslatorX. 37 Nucleotide and amino acid alignments were each concatenated separately using phyutility, 47 and all columns containing gaps were removed using trimAl (nogaps option). 48 The final multiple sequence alignments were used to construct species trees in Mr. Bayes v3.2.2 49 and RAxML, 44 using each ortholog as a partition and applying GTR + gamma (four rate categories) across all partitions. Markov chain Monte Carlo (MCMC) chains (two runs and four chains each) were run for 250,000 generations in Mr. Bayes. Twenty starting trees were generated in RAxML and support values were generated from 100 bootstrap replicates.
In order to obtain the maximum amount of data possible for the reconstruction of phylogenetic relationships among ALRVs, nucleotide sequences of the ALRV-core orthologs (only those present in all sixteen ALRVs) without evidence of recombination were used in phylogeny reconstruction. Alignments were concatenated and processed following the same pipeline described above (using 500,000 generations for MCMC chains). This tree was used as the species tree in Prunier analyses.
Results
Genome content
A total of 130 orthologous clusters were found in ALRV genomes. The attempted functional annotation of these orthologs conducted here resulted in only 40 annotations (Supplementary Table 2). The majority of genes were shorter than 1000 nucleotide bases in length (Supplementary Fig. 1). The 26 genes identified previously 29 as core genes that are conserved across the whole family (Iridoviridae) were among the orthologous clusters and were each found in all sixteen ALRV genomes. An additional 35 orthologs (a total of 61) were found in all viruses in the genus (Ranavirus-core genes; ALRVs and GIV-like viruses). A further four orthologs (total = 65) were present in all 16 ALRVs, but a total of 101 orthologs occurred in at least one member of all three major clades of the ALRVs.
Orthologous genes with evidence of recombination.

Phylogenetic relationships of amphibian-like ranaviruses (ALRVs) based on 51 ALRV-core genes having removed genes with evidence of recombination. Maximum likelihood and Bayesian trees were constructed from a concatenated nucleotide alignment 44,940 nucleotide bases in length, which was partitioned by loci.
Twenty-nine orthologs were therefore considered clade-specific within the ALRV. Each major clade had four clade-specific orthologs. There were also 12 orthologs in common between the FV3-like and CMTV-like clades that were not found in ATV-likes, and there was one ortholog that was lost in CMTV-likes (ie, present in ATV- and FV3-likes) compared with four orthologs lost in FV3-likes. The rate of functional annotation was very low among these accessory genes: the only orthologs with some available indication of function were a putative integrase-like protein, which was absent in the ATV-likes, a putative capsid maturation protease that occurred only in ATV-likes (and only in the fish pathogens of this group - European sheatfish virus [ESV] and epizootic haematopoietic necrosis virus [EHNV]), and a novel US-22 family protein in the CMTV-likes.
In addition, 38 singletons - defined as novel open reading frames present in only one genome - were found in ALRV genomes. Thirty of these singletons occurred in the two fish pathogens of the ATV-like clade (25 in ESV and five in EHNV), which have larger genomes than the other ALRVs (Table 1). Singletons were also found in ADRV2010 (2), CGSIV (2), CMTV_NL (1), STIV (1), and TFV (2).
Removing genes with signs of recombination resulted in a set of 51 orthologous genes that were found in all 16 ALRVs (ALRV-core; Supplementary Table 2). The final concatenated alignment contained 16 taxa and 44,940 nucleotide bases. Reconstruction of phylogenetic relationships of ALRVs using this 51-gene set resulted in a well-supported phylogeny (Fig. 1). When rooted on ATV-like viruses, the tree returned ATV-like, CMTV-like, and FV3-like viruses as three monophyletic clades with tortoise ranavirus 1 (ToRV1) intermediate to the ATV-likes and CMTV-like/FV3-like clade.
Systematics and genome arrangement
After removing genes with evidence of recombination, a set of 39 orthologous genes present in the GIV-like viruses and all 16 ALRVs (Supplementary Table 2) were retained for use in phylogeny reconstruction. The concatenated nucleotide alignment contained 17 taxa and 39,498 nucleotide bases. The concatenated amino acid alignment contained 13,166 residues. In spite of the large amount of data, phylogenetic evidence for the root of the ALRVs was inconclusive - there was support for ATV on its own or as part of a clade of ATV-like viruses as the root with the topology sensitive to the method followed for tree construction (Supplementary Fig. 2). It was therefore not possible to determine whether ATV-like viruses should be considered monophyletic or paraphyletic.

Alternative hypotheses for the evolutionary history of the ALRVs accounting for phylogenetic signal and genome arrangement. Genome segments involved in major rearrangements are color filled: black for the 30 kb rearrangement and the combined black and gray segments for the 88 kb rearrangement (see text). Orientation of genome segments is indicated by arrows and position relative to horizontal axis. (A) Ancestral ATV-like arrangement and paraphyly of ATV-likes. (B) Ancestral cMTV-like arrangement and monophyly of ATV-likes.
An examination of ALRV genome arrangement, using whole genome alignments, did not resolve the ALRV root either. Genome contiguity is very consistent among ALRVs, but there have been two major rearrangements during ALRV evolution (Fig. 2). These rearrangements support the division of the ALRVs into three groups but do not resolve whether there is a systematic basis for all three (ie, groupings which correspond to monophyletic clades). CMTV-like viruses are distinguished from ATV-like viruses by an approximately 30 kb inversion. This inversion bisects an annotated open reading frame at one end in ATV. A steroid oxidoreductase-like protein occurs as a full-length gene in CMTV-like viruses but occurs as a truncated version in ATV (ortholog identifier = C367o). However, this gene is present at full length in EHNV and in the same orientation as CMTV-likes but absent all together in ESV. Therefore, two alternative hypotheses remain concerning the root and ancestral genome arrangement of ALRVs (Fig. 2).
A second major inversion of approximately 88 kb -containing the older 30 kb rearrangement described earlier - occurred on the branch to the ancestor of all FV3-like viruses (Fig. 2). Again, this inversion bisects an open reading frame at one end when the original annotation of the CMTV_SP virus is used as a reference. There were a number of additional smaller rearrangements, the largest an approximately 13 kb long inversion in CGSIV relative to the very closely related ADRV isolates.
Recombination and selection
Twenty-one orthologous genes were found with evidence of recombination (P < 0.05) using GARD (Table 2). Genes affected by recombination include neurofilament triplet H1-like protein genes, a myristoylated membrane protein gene, and a D5 family NTPase/ATPase, but two-thirds lacked functional annotation (Supplementary Table 2). Prunier revealed 36 orthologous genes where there was significant conflict between the gene tree and the species tree provided. Phylogenies constructed from individual ortholog alignments were used to assess the phylogenetic position of ToRV1 across its genome. Moving along the genome, small, spatially clustered groups of genes supported equivalent phylogenetic positions for ToRV1 -either FV3-like or a sister group to the FV3-like/CMTV-like clade - but the supported position alternated (Fig. 3).

ToRV1 as a mosaic virus. Colored blocks represent putative genes (the position above or below the line along with the respective arrows indicates orientation). Colors represent ToRV1′s position in the gene tree (within the FV3-like clade, the sister taxon to the FV3-like/cMTV-like clade [root], or unresolved). Tick marks are positioned at 10 kbp intervals.
Twenty-eight genes were found with evidence of episodic selection using the branch-site method (Table 3). Again, functional annotation was lacking, but eight were annotated and three of the 29 were members of the Iridoviridae-core group. Most of the selected genes were present in all clades, but this may merely reflect lower power to detect these effects with fewer taxa. Two genes only found in the FV3/CMTV clades (functional annotation lacking for both) were also shown to have undergone positive selection.
Amphibian-like ranavirus orthologous genes under positive selection.
Branch-Site REL identifies the branches in a phylogeny where selection has occurred. Among the 28 genes “with evidence of selection, the terminal branches leading to ATV and ESV (five genes), TFV (six genes), and CMTV_SP (seven genes) featured most frequently. Other terminal branches where positive selection was detected were EHNV (three genes); ToRV1, STIV, and GGRV (all two); CGSIV and RGV (one). In terms of selection on ancestral branches, the ATV-likes and the viruses isolated form Chinese giant salamanders were found within the affected clades most frequently. Annotated orthologs that have experienced selection included a nuclear calmodulin-binding protein gene, neurofilament triplet H1-like protein genes, myristoylated membrane protein genes, and an ATPase-dependent protease.
The alignment-wide approach using BUSTED identified many of the same genes as Branch-Site REL (Table 3): 24 of the 28 genes identified by Branch-Site REL were also significant for positive selection using BUSTED, while there were six additional genes identified by BUSTED as candidates for having undergone episodic positive selection (including DNA polymerase).
Discussion
The traditional view of large DNA viruses was that they were slow evolving and their genomes were lacking in dynamic changes. This study suggests how dynamic genome change in ranaviruses is linked to important aspects of their biology and adds to work on other virus families that challenge this traditional view.9,11 The ALRVs of the family Iridoviridae are important, emerging pathogens of amphibians, fish, and reptiles. Their genomes have been subjected to widespread recombination and positive selection and have a considerable accessory genome. Although the viruses examined here are closely related, they exhibit a number of processes leading to dynamic change; their genome content varies systematically and individually, there is frequent recombination, and many genes have been the target of positive selection.
ALRV genomes were examined for the presence or absence of functional coding sequences to establish core gene sets as well as their accessory genomes. Some of the variation in genome content maps to the virus phylogeny with groups of genes that are lineage specific. ALRVs are highly variable in their host range and virulence,20,21,50,51 and variation in genome content is considered an important factor in explaining observed phenotypic differences between virus types in such key traits. 52 Unfortunately, this study found a similar, low rate of functional information for Ranavirus genes to previous studies; less than one-third of orthologs returned functional information from homology searches.19,53 This lack of functional annotation precludes predictions about how such lineage-specific genes might impact host-pathogen interactions, but gene knockout approaches will continue to be useful methods to investigate these roles. 54
Recombination was also found to be widespread in Ranavirus genomes and appeared to involve viruses separated by large geographic distances (based on existing patchy knowledge of Ranavirus geographic distribution). This result confirms and extends previous findings: Abrams et al. 31 found evidence of widespread recombination when they examined a subset of the viruses studied here in positive selection analyses, while Epstein and Storfer 32 also showed that regional isolates of ATV have been recombining.
In addition to assessing orthologs independently for evidence of recombination, the phylogenetic relationships of ToRV1 were compared across its genome. Phylogenetic analysis of each gene along the genome revealed two alternative positions for ToRV1, either at the base of the FV3-like clade or as a sister taxon to the FV3-like/CMTV-like clade. Genes returning each position were spatially clustered, with short sections of the genome supporting one position or the other, suggesting that ToRV1 is a mosaic virus. Of course, alternative explanations could be a sequencing library made up of two divergent viruses arising via lab contamination or coinfection. 55 These alternative hypotheses could be assessed - though not distinguished - with access to the original short reads. This virus was isolated from a captive, exotic reptile originating from the pet trade and as such can be considered at higher risk of exposure to diverse, geographically distant viruses resulting in a genuine mosaic virus or the type of mixed infection that might lead to an error in genome assembly.
There have been a number of attempts to use phylogenetics to address Ranavirus phylogeography and host jumps,19,30,56 but the apparent mosaic nature of ToRV1 shows that this can be very challenging. Recombination within genes can result in inaccurate inferences when those genes are used as candidates for phylogeny reconstruction. 57 Ranavirus genes with evidence of recombination include neurofilament triplet H1-like protein genes, which have been used previously for phylogenetics. 58
The 26 genes conserved across the family Iridoviridae have been frequently used in Ranavirus phylogenetics as a more robust alternative to candidate gene approaches, but six of these were shown here to be recombinant among ALRVs (Table 2). Stöhr et al. 30 classed ToRV1 as CMTV-like based on its genome arrangement - which matches viruses of that clade - but considered it an intermediate virus, given its placement in an FV3-like clade in a phylogenetic tree constructed from amino acid sequences of 17 genes. However, the phylogeny constructed here from 51 ALRV-core genes after removing recombinant loci provides strong support for the placement of ToRV1 as the outgroup to a CMTV-like/FV3-like clade. This remains consistent with Ranavirus genome arrangements: the most recent common ancestor of ToRV1 and all CMTV-like and FV3-like ranaviruses shared the genome arrangement of CMTV-likes, with a major genome rearrangement occurring in an ancestor of FV3-likes following divergence of CMTV-likes and FV3-likes.
The finding that positive selection on Ranavirus genes is common (acting on ~20% of the genome) agrees with other recent studies.31,32 Selection on eIFalpha – a gene thought to be involved in immune evasion - has been shown previously in all studied strains of Ambystomatid salamander viruses and was found under selection again here in the Spanish CMTV (CMTV_SP; a virulent virus shown to have had a community-level impact 20 ) and RGV (a virus isolated from an aquaculture facility 59 ). Genes sharing homology with myristoylated membrane proteins were also found to have undergone positive selection on multiple branches. Myristoylation can play a key role in the envelopment of large DNA viruses and anchoring of virus particles to cell membranes facilitating entry to host cells.60,61
In total, seven genes (more than any other virus studied) were found under selection along the terminal branch, leading to the Spanish isolate of CMTV, which is notable for the observations of its severe impacts on entire amphibian communities. 20 A putative nuclear calmodulin-binding protein was among these. Calmodulin is a calcium receptor that plays a critical role in a variety of cell responses to virus infection. 62 The potential capacity of this protein to modify conserved host pathways makes it an interesting target in efforts to understand the host range of this virus isolate.
As discussed, this study follows a number of others in attempting to use phylogenetics to reconstruct the evolutionary history of ALRVs. Jancovich et al. 19 proposed two alternate patterns of host switching consistent with their analysis of phylogenetic data, and the root of the ALRV was key to distinguish between them. Unfortunately, the analysis presented here failed to identify the root with any confidence since the topology and support was sensitive to the data supplied and the method applied. There was some support for ATV alone as the outgroup (which would mean that ATV-like viruses are paraphyletic) but monophyly of ATV-likes (ATV, EHNV and ESV) was also supported. The monophyly of ATV-likes with ATV at the root of this clade is significant as it suggests that the ancestor of ALRVs was an amphibian pathogen and there was a jump back into fish.
This study also highlighted some interesting results surrounding viruses from aquaculture. There was evidence of frequent positive selection in the tiger frog virus (TFV), which was isolated from an aquaculture facility in China. 53 These findings may therefore support the hypothesis that such facilities provide suitable environments for the rapid evolution of increased virulence. 50 Ranavirus isolates from aquaculture facilities in North America have already been shown to have increased virulence compared to related viruses from the wild, and the potential for spillback of viruses from industry to the wild should be considered a significant conservation threat. 50
Author Contributions
Conceived and designed the experiments: SJP. Analyzed the data: SJP. Wrote the article: SJP. The author reviewed and approved the final article.
Supplementary Material
Footnotes
Acknowledgments
The author thanks Richard Nichols for helpful discussions and also thanks Florent Lasalle for advice and sharing custom scripts to run HYPHY programs in batch. Finally, the author thanks four anonymous reviewers who provided helpful feedback, which improved the article.
