Abstract
Bacteria occupy a wide range of habitats, from psychrophilic through mesophilic to thermophilic. Each habitat imposes distinct environmental restrictions on survival, and microorganisms adapt to a specific habitat through phenotypic and genotypic changes. We analyzed 16S rRNA gene sequences from the bacterial domain in silico, computing nucleotide composition with Mega 5.2 software and evaluating its significance by analysis of variance in the Statistical Package for the Social Sciences (SPSS) version 16.0. The analysis revealed a habitat-specific bias in the occurrence of the four nucleotides (A, T, C, and G) in the 16S rRNA gene. This finding is supported by Duncan's multiple range test at p=0.05 and by the clustering of bacterial species from the same habitat group in a neighbor-joining tree of 150 bacterial species from psychrophilic, mesophilic, and thermophilic habitats (50 from each). The results on the probability of substitution (transition and transversion) in 16S rRNA gene sequences suggest a habitat-specific selection pressure, possibly acting at the level of the replication and repair processes, that results in a decreasing frequency of adenine and thymine in the order psychrophilic>mesophilic>thermophilic species, and an increasing frequency of cytosine and guanine in the order psychrophilic<mesophilic<thermophilic species of bacteria.
1. Introduction
In the present study, we selected the 16S rRNA gene to explore whether replication-associated and repair-associated mechanisms could underlie an adaptive mutation strategy at the genetic level among bacterial species inhabiting different habitats, because 16S rRNA gene sequences have been derived directly from the genomic DNA of various organisms. Reports on the correlation between optimum growth temperature and the G+C content of genomic DNA are inconsistent (Galtier and Lobry, 1997; Nakashima et al., 2003; Zheng and Wu, 2010). The 16S rRNA gene has been regarded as the most important target for studying bacterial ecology and for bacterial identification in taxonomy; for that purpose it has been sequenced from a large number of bacterial species, and these sequences are available in public databases (Benson et al., 2012). The gene has regions that are conserved throughout the bacterial domain (Weisburg et al., 1991), which are used as universal primer sites (forward and reverse) to amplify it from genomic DNA (Baker et al., 2003). It also has variable regions, which can evolve a habitat-specific bias in nucleotide composition as part of the adaptation strategy of bacteria. This study presents an exhaustive evaluation of the habitat-specific bias in the nucleotide composition of the 16S rRNA gene of 150 bacterial species (representing 150 different genera), 50 each from psychrophilic, mesophilic, and thermophilic habitats.
2. Materials and Methods
2.1. Sequence data
We retrieved the 16S rRNA gene sequences of 150 bacterial species (50 each from the psychrophilic, mesophilic, and thermophilic groups) from the NCBI GenBank database (Benson et al., 2012). Each species represents a different genus, has been published, and has been deposited in various culture collection centers throughout the world. The retrieved sequences were aligned both separately within the psychrophilic, mesophilic, and thermophilic groups and together for all 150 species, using the alignment procedure in Mega 5.2 (Tamura et al., 2011).
2.2. Statistical analysis
The average frequency of occurrence of each nucleotide (A, T, C, and G) in the 16S rRNA gene was calculated in silico with the nucleotide composition function of Mega 5.2, separately for the 50 species each of psychrophiles, mesophiles, and thermophiles (Tamura et al., 2011). For each of the four nucleotides, the mean (±SE) across the 50 species of each group was then calculated, and the group means were compared by analysis of variance (ANOVA) in the Statistical Package for the Social Sciences (SPSS) version 16.0. Differences between the mean values of each nucleotide among the three bacterial groups were tested for significance by Duncan's multiple range test at p=0.05 (Covert et al., 2004).
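The composition and ANOVA steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Mega 5.2 or SPSS procedure used in the study: the function names (composition, mean_se, one_way_anova_f) and the toy percentages are hypothetical, and Duncan's multiple range test is not implemented.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def composition(seq):
    """Percentage of A, T, C and G in one sequence; gaps and ambiguity codes are skipped."""
    counts = Counter(b for b in seq.upper().replace("U", "T") if b in "ATCG")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {b: 100.0 * counts[b] / total for b in "ATCG"}


def mean_se(values):
    """Mean and standard error (sample SD / sqrt(n)) of a list of values."""
    return mean(values), stdev(values) / sqrt(len(values))


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical adenine percentages for three species per group (NOT study data).
adenine = {
    "psychrophilic": [26.1, 25.8, 26.4],
    "mesophilic": [25.2, 25.0, 25.5],
    "thermophilic": [23.9, 24.3, 23.6],
}
for habitat, values in adenine.items():
    m, se = mean_se(values)
    print(f"{habitat}: {m:.2f} ± {se:.2f}")
print("F =", round(one_way_anova_f(list(adenine.values())), 2))
```

In practice, composition would be applied to each of the 50 sequences in a group to obtain the per-species percentages that feed mean_se and one_way_anova_f. Converting F into a p-value requires the F distribution, which a statistics package would supply.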
2.3. Probability of substitution (transition and transversion ratio)
The probabilities of substitution of one base by other in the 16S rRNA gene were measured in silico computing the MCL transition
2.4. Phylogenetic relatedness
The phylogenetic relatedness among the 150 bacterial species (50 each of psychrophiles, mesophiles, and thermophiles) was evaluated by constructing a neighbor-joining tree from their 16S rRNA gene sequences in Mega 5.2 (Tamura et al., 2011).
3. Results and Discussion
We found a habitat-specific bias in the nucleotide composition of the 16S rRNA gene among bacterial species from psychrophilic, mesophilic, and thermophilic habitats. Alignment of the 16S rRNA gene sequences of psychrophiles, mesophiles, and thermophiles, both separately and together (150 different species), showed conserved regions shared by all 150 bacterial species.
3.1. Habitat-specific bias in the nucleotide composition of the 16S rRNA gene
Statistical analysis of the 16S rRNA gene sequences of bacterial species from the psychrophile, mesophile, and thermophile groups using Mega 5.2 and ANOVA revealed a correlation between the nucleotide composition of the variable regions of the 16S rRNA gene and the habitat of the bacteria. The frequencies of occurrence of the four nucleotides (A, T, C, and G) in the 16S rRNA gene differed among psychrophiles, mesophiles, and thermophiles. For the purines (A and G), the mean (±SE) frequency of adenine decreased from psychrophilic to mesophilic to thermophilic species, whereas that of guanine increased in the same order. For the pyrimidines (T and C), the mean frequency of thymine decreased from psychrophilic to mesophilic to thermophilic species, whereas that of cytosine increased in the same order.
The differences in the frequency of occurrence of each nucleotide in the 16S rRNA gene among bacterial species from the three habitats were significant when tested using Duncan's multiple range test at p=0.05 (Table 1). This result agrees with earlier reports of increased G+C content in thermophilic bacterial species compared with mesophilic species (Galtier and Lobry, 1997; Wang et al., 2006). The G:C pair has three hydrogen bonds, whereas the A:T pair has two, which makes the G:C pair more thermally stable than the A:T pair (Nakashima et al., 2003).
Mean values in column 3 followed by superscript letters are significantly different as determined by SPSS at p=0.05 according to Duncan's multiple range test (DMRT).
3.2. Probability of substitution (transition and transversion ratio)
Two types of substitution, transition and transversion, are recognized in nucleotide sequences (Ina, 1998; Strandberg and Salter, 2004). A transition is the substitution of a purine by a purine (A↔G) or of a pyrimidine by a pyrimidine (T↔C), whereas a transversion is the substitution of a purine by a pyrimidine (A↔T, A↔C, G↔T, and G↔C) or of a pyrimidine by a purine (T↔A, T↔G, C↔A, and C↔G) (Collins and Jukes, 1994; Yang and Yoder, 1999).
Among the transitions between purines, the probability of substitution of G by A decreases from psychrophiles (9.74) to mesophiles (9.73) to thermophiles (8.28), whereas for A by G mesophiles score higher (12.18) than both psychrophiles (11.67) and thermophiles (11.79). Among the transitions between pyrimidines, the probability of substitution of T by C increases from psychrophiles (14.77) to mesophiles (16.68) to thermophiles (17.44), whereas for C by T mesophiles again score higher (13.92) than both psychrophiles (13.38) and thermophiles (11.87). Among the transversions, the probability of substitution of A by T and of G by T decreases from psychrophiles (5.17) to mesophiles (4.63) to thermophiles (4.37). The probability of substitution of A by C and of G by C is higher in thermophiles (6.42) than in psychrophiles (5.7), with mesophiles lowest (5.55). The probability of substitution of T by A and of C by A decreases from psychrophiles (6.53) to mesophiles (6.02) to thermophiles (5.99), and that of T by G and of C by G is higher in thermophiles (8.52) than in psychrophiles (7.82), with mesophiles lowest (7.54). Taken together, these values indicate an increase in G+C content in thermophiles and in A+T content in psychrophiles.
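The transition/transversion definitions used in this section can be made concrete with a short Python sketch. It only classifies and counts observed differences between two aligned sequences. It does not reproduce the substitution probabilities estimated in Mega 5.2, and the function names and example sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(a, b):
    """Classify a substitution between two different bases as transition or transversion."""
    a, b = a.upper(), b.upper()
    if a not in PURINES | PYRIMIDINES or b not in PURINES | PYRIMIDINES:
        raise ValueError("bases must be A, C, G or T")
    if a == b:
        raise ValueError("identical bases are not a substitution")
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_substitutions(seq1, seq2):
    """Count (transitions, transversions) between two aligned sequences of equal length.

    Columns containing a gap or an ambiguity code are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT" or a == b:
            continue
        if substitution_type(a, b) == "transition":
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Hypothetical aligned fragments (NOT study data): A/G and C/T differ by
# transitions, T/A by a transversion.
print(count_substitutions("AGCT", "GGTA"))  # (2, 1)
```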
The results for the probability of substitution are consistent with the increased G+C content in thermophiles compared with mesophiles and psychrophiles (Fig. 1). Among the transitions, the probability increases for T by C and decreases for G by A from psychrophiles to mesophiles to thermophiles; among the transversions, it decreases for T by A, C by A, A by T, and G by T in the same order. These patterns point to a habitat-specific selection pressure that accounts for the lower G+C content and higher A+T content of psychrophiles compared with mesophiles and thermophiles, and for the higher G+C content and lower A+T content of thermophiles compared with mesophiles and psychrophiles.

Fig. 1. The probability of substitution of one base (on y-axis) to another base (on x-axis) in the 16S rRNA gene sequence.
3.3. Phylogenetic relatedness
Comparison of 16S rRNA gene sequences provides a basis for evaluating the phylogenetic relationships among species of eubacteria, because the gene has conserved regions that exist throughout the bacterial domain. Phylogenetic and evolutionary relatedness is deduced by measuring the identity between the nucleotide sequences of the conserved and variable regions of the 16S rRNA gene of different bacterial species (Ludwig and Schleifer, 1994).
The neighbor-joining tree constructed from the 16S rRNA gene sequences of the 150 bacterial species from the psychrophile, mesophile, and thermophile groups showed clustering of species inhabiting the same habitat (Fig. 2). This clustering suggests a habitat-specific selection pressure that may act at the genetic level during DNA replication and repair as the species adapt to different habitats. The psychrophiles formed a separate cluster and in a few places clustered with mesophiles, but not with thermophiles. The mesophiles formed a separate cluster and shared clustering with both psychrophiles and thermophiles. The thermophiles formed a separate cluster and in a few places clustered with mesophiles, but not with psychrophiles.

Fig. 2. Neighbor-joining tree based on almost-complete 16S rRNA gene sequences showing the relationship between 150 different bacterial species from psychrophilic (P), mesophilic (M), and thermophilic (T) habitats (50 from each). Numbers at nodes are levels of bootstrap support (percentages) based on a neighbor-joining analysis of 1,000 resampled datasets by using the Tamura 3-parameter method. Scale bar=0.05 substitutions per nucleotide position.
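The Tamura 3-parameter distance named in the tree legend corrects the observed proportions of transitions (P) and transversions (Q) for G+C content bias, which is why it suits sequences whose G+C content differs between habitats. A minimal Python sketch of the pairwise distance follows. It assumes the standard form d = -h ln(1 - P/h - Q) - (1/2)(1 - h) ln(1 - 2Q) with h = 2θ(1 - θ), where θ is the G+C fraction averaged over the two sequences. The function name and example sequences are hypothetical, and the exact options used in Mega 5.2 (for example, gap handling) are not stated in the text.

```python
from math import log

TRANSITION_PAIRS = {frozenset("AG"), frozenset("CT")}


def t92_distance(seq1, seq2):
    """Tamura 3-parameter distance between two aligned sequences of equal length.

    Columns containing a gap or an ambiguity code are skipped (an assumption).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")
    diffs = [frozenset((a, b)) for a, b in pairs if a != b]
    if not diffs:
        return 0.0  # identical sequences
    p = sum(1 for d in diffs if d in TRANSITION_PAIRS) / n      # transitions
    q = sum(1 for d in diffs if d not in TRANSITION_PAIRS) / n  # transversions
    gc = sum((a in "GC") + (b in "GC") for a, b in pairs) / (2 * n)
    h = 2 * gc * (1 - gc)
    # math.log raises ValueError if the sequences are too divergent for the
    # correction to be defined (log of a non-positive number).
    return -h * log(1 - p / h - q) - 0.5 * (1 - h) * log(1 - 2 * q)


# Hypothetical aligned fragments (NOT study data) differing by one transition.
print(round(t92_distance("AGCTAGCTAG", "GGCTAGCTAG"), 4))
```

A matrix of such pairwise distances is the input a neighbor-joining algorithm needs. The tree building and bootstrap resampling themselves are not sketched here.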
4. Conclusion
The habitat-specific bias in the average frequency of occurrence of each of the four nucleotides, determined with Mega 5.2 and ANOVA in SPSS version 16.0, together with the probability of substitution (transition and transversion) in the 16S rRNA gene among 150 bacterial species (50 each from the psychrophilic, mesophilic, and thermophilic groups), provides insight into a habitat-specific selection pressure that possibly acts at the genetic level during the replication and repair processes. The clustering of bacterial species from the same habitat in the neighbor-joining tree also supports the hypothesis of a habitat-specific evolutionary selection pressure that drives favorable changes in the base composition of the genetic material. This pressure results in a decreasing frequency of adenine and thymine, and an increasing frequency of cytosine and guanine, from psychrophilic through mesophilic to thermophilic species of bacteria.
Footnotes
Acknowledgments
Financial support in the form of senior research fellowship from the University Grants Commission to H.R., senior research fellowship from the University Grants Commission to A.K., junior research fellowship from the Council of Scientific and Industrial Research to L.T., and a grant under R&D program from the University of Delhi is gratefully acknowledged.
Author Disclosure Statement
The authors declare that there are no conflicting financial interests.
