Abstract
microRNAs (miRNAs) are a class of small noncoding RNAs, approximately 20–23 nt long, that are evolutionarily conserved and play vital roles in various biological processes by degrading target mRNAs or repressing their translation. The Felis catus (cat) genome sequence has been published, but the report gave only the number of miRNAs in the genome, without further details on them. This paper describes an in silico comparative approach that uses all known vertebrate pre-miRNA sequences as queries, and reports 405 putative miRNAs from the cat genome. We determine the identity values of the pre-miRNAs and mature miRNAs as well as their statistical sequence characteristics. Interestingly, among the 405 miRNAs, 90, 53 and 50 mature sequences showed 100% identity to cattle, human and dog, respectively. Further, 6 miRNAs whose identity with the query sequence is <85% were validated using the MiPred algorithm. We also identify 25 miRNA clusters in cat based on their homologs in other vertebrates. Most importantly, based on identities among pre-miRNAs, mature miRNAs, miRNA families and clusters, we observe that cat miRNAs are more similar to those of cattle than to those of humans. Our results may therefore add a new dimension to studies of the evolution of the cat.
Introduction
miRNAs are a class of conserved small noncoding regulatory RNAs that are not translated into protein and are involved in the regulation of gene expression. They bind to the 3′ untranslated regions (UTRs) of their target mRNAs and either inhibit translation or cause degradation of the target. 1 Studies have shown that miRNAs are also involved in the control of organ development, stem cell differentiation and developmental timing.2,3 Several reports reveal roles for miRNAs in diseases such as cancer and infections. 4 Recent studies have begun to reveal how the miRNA network, or an individual miRNA, influences cellular processes. 5
By March 2009, 9539 hairpin sequence entries had been registered in miRBase.6–8 Sequence analysis has shown that some mature miRNAs are phylogenetically conserved among species of the same kingdom, particularly in the first 7–8 residues at the 5′ end. 6 Some mature miRNA sequences are even conserved between animals and plants; for example, mir-854 has been identified in C. elegans, mouse, human and plants. 9
There are two major approaches to miRNA discovery: computational identification and cDNA cloning. Because some miRNAs are expressed at low levels and the expression of many miRNAs is spatiotemporally specific, they are difficult to find by cDNA cloning, whereas computational approaches can predict miRNAs that are expressed specifically or at low abundance. The hairpin sequences of precursor miRNAs are phylogenetically diverse. In addition, the genomic locations of miRNA precursor genes and the folding structures of miRNAs have been used to identify previously unknown miRNAs. Three characteristics allow miRNA genes to be identified computationally: i. miRNAs are generally derived from 70–100 nt precursor transcripts with an extended stem-loop structure; ii. miRNAs are usually conserved between the genomes of related species; and iii. miRNAs display a characteristic pattern of evolutionary divergence.10–20 Additionally, genomic mapping of known miRNAs has enabled the identification of orthologous miRNAs in other species where genomic annotations are lacking.21,22
Experimental evidence reveals that miRNAs play important roles in a variety of diseases, such as cancer, diabetes, viral infection and cardiac disease, as well as in stem cell biology.14–20,23 Some miRNAs are arranged in the genome as clusters, in which multiple miRNAs lie in the same orientation and are transcribed as a single polycistronic transcript; such miRNAs may function synchronously and cooperatively. Studies have provided evidence for miRNA regulation of many essential cancer-related genes, including the oncogenes BCL2, RAS and MYC and the tumor suppressor p53.22,24–28
The sequence of the Felis catus (cat) genome was published recently. 29 Although 179 cat miRNAs were reported in that study, their sequences and characteristics were not provided. The Ensembl database has 845 entries of predicted miRNAs; however, this data set is generated automatically from clustered BLAST hits, without filtering on percent identity, homology or gap criteria. In the present study, we used a computational approach to iteratively filter the BLAST hit sequences, further validated the sequences with less than 85% identity using MiPred, and thus report 405 putative miRNAs in the cat genome together with their statistical characteristics.
Materials and Methods
To search for potentially conserved miRNAs, the sequences of previously identified pre-miRNAs (miRNA genes) of all known animals were obtained from the miRBase Sequence Database, release 13.0 (March 2009), 30 and all vertebrate sequences were retained. This downloaded set contained 7383 miRNA sequences, both predicted and experimentally validated. We performed a BLAST search (expect value 0.01, mismatch penalty −2) using all previously reported vertebrate pre-miRNAs as queries against the cat genome assembly (2.8X) Build 0.1 (release 01/14/2009). 31 All hits were downloaded in FASTA format for further analysis. The hits were carefully examined and iteratively filtered to remove duplicate hits of the same miRNA gene, recognized by similar genome coordinates; from each set of duplicates, the hit with the best values of identity, alignment length, bit score and gaps between query and hit was retained, and these optimal values differ from one miRNA to another. For hits with similar bit scores, the minimal folding energy was also considered. Hits with identity <85% were screened with the MiPred algorithm (Fig. 1). 32 Clusters were identified by comparing the newly reported miRNAs with previously identified miRNA clusters in miRBase. 33 Sequence characteristics were analyzed by determining the base frequencies of A, G, C, U, A + U and G + C in the hit sequences. Independently, a statistical analysis of all vertebrate miRNA gene sequences was performed and compared with the iterated putative pre-miRNA sequences. The minimal folding energy (MFE) of the secondary structure was obtained using m-FOLD 34 for all putative pre-miRNA genes. The adjusted minimal folding energy (AMFE) and the minimal folding energy index (MFEI) were calculated as previously described by Zhang. 35
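The duplicate-removal step above could be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the `Hit` record and its field names are hypothetical, hits for the same query miRNA are treated as duplicates when their genome coordinates overlap, and the ranking order (bit score, then identity, alignment length, fewer gaps and finally lower MFE) is an assumption, since the text does not state how the criteria were weighted. The example coordinates are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    """One BLAST hit in the cat genome (hypothetical field names)."""
    mirna: str       # name of the query pre-miRNA
    chrom: str
    start: int
    end: int
    identity: float  # percent identity to the query
    aln_len: int
    bitscore: float
    gaps: int
    mfe: float       # minimal folding energy, kcal/mol (negative)


def rank_key(h: Hit) -> tuple:
    # Higher tuples rank better. The final term makes a more negative
    # (more stable) MFE break ties between otherwise equal hits.
    return (h.bitscore, h.identity, h.aln_len, -h.gaps, -h.mfe)


def overlaps(a: Hit, b: Hit) -> bool:
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def deduplicate(hits: Iterable[Hit]) -> List[Hit]:
    """Keep the best-ranked hit at each locus for each query miRNA."""
    kept: List[Hit] = []
    # Visit hits best-first, so the first hit kept at a locus is its best one.
    for h in sorted(hits, key=rank_key, reverse=True):
        if not any(k.mirna == h.mirna and overlaps(k, h) for k in kept):
            kept.append(h)
    return kept


hits = [
    Hit("mir-21", "chrA1", 100, 170, 98.6, 72, 120.0, 1, -30.1),
    Hit("mir-21", "chrA1", 105, 172, 98.6, 68, 120.0, 0, -33.4),
    Hit("mir-21", "chrB2", 5000, 5071, 90.0, 71, 95.0, 2, -28.0),
]
best = deduplicate(hits)
```

Here the two overlapping chrA1 hits collapse to the one with the longer alignment, while the chrB2 hit is kept as a separate locus.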

Schematic representation of the miRNA gene search procedure to identify homologs to reported miRNAs.
Results and Discussion
Pre-miRNA and mature miRNAs of cat
All sequences from the BLAST hits were stored and iterated as described above; the resulting sequences are referred to as putative pre-miRNA sequences. In this way, we identify 405 putative pre-miRNAs, of which 87 have 100% identity, 168 have 95%–99.9% identity, 112 have 90%–94.9% identity and 32 have 85%–89.9% identity (Fig. 2, Suppl. Table 1, 2). The hits with less than 85% identity were screened using MiPred 32 to predict whether any of them could be considered putative pre-miRNA sequences. We found that 6 sequences with less than 85% identity could be considered real miRNAs (Fig. 2, Suppl. Table 1, 2).
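The identity classes used above, and the rule that only hits below 85% identity are passed to MiPred, can be expressed as a small helper. This is a sketch that assumes identities are percentages; the function names are illustrative, and MiPred itself is not called here.

```python
from collections import Counter
from typing import Dict, Iterable

# Identity classes used in the text, from highest to lowest lower bound.
CLASSES = (
    (100.0, "100%"),
    (95.0, "95-99.9%"),
    (90.0, "90-94.9%"),
    (85.0, "85-89.9%"),
    (0.0, "<85%"),
)


def identity_class(identity: float) -> str:
    """Return the class label for a percent identity value."""
    for lower, label in CLASSES:
        if identity >= lower:
            return label
    raise ValueError(f"identity must be non-negative, got {identity}")


def tally(identities: Iterable[float]) -> Dict[str, int]:
    """Count hits per identity class, listing empty classes as 0."""
    counts = Counter(identity_class(x) for x in identities)
    return {label: counts.get(label, 0) for _, label in CLASSES}


def needs_mipred(identity: float) -> bool:
    # Hits below 85% identity are not accepted directly; they are
    # passed to MiPred for real/pseudo pre-miRNA classification.
    return identity < 85.0
```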

Percent identity of pre-miRNAs and mature miRNAs. Percent identity of Felis catus hits relative to their query sequences.
Statistical analysis of pre-miRNA sequence of Felis catus and other known vertebrates.
Length is given in nucleotides.
Comparison of MFE, AMFE, MFEI of Felis catus and other known vertebrates.
MFE and AMFE are given in kcal/mol.
The mature miRNA sequences were compared with their query sequences, and we found that 357 miRNAs have 100% identity, 20 have 95%–99.9%, 19 have 90%–94.9%, 5 have 85%–89.9% and 4 have less than 85% identity (Fig. 2). The numbers of miRNAs in each combination of pre- and mature miRNA identity classes were used to construct a matrix in an Excel spreadsheet (Suppl. Table 3). An overview of the matrix showed that 87 (21.48%) miRNAs had 100% identity in both the pre- and mature sequences (Fig. 3). We also find that 156 (38.51%) miRNAs have 100% identity in the mature sequence and 95%−99.9% identity in the pre-miRNA sequence (Fig. 3). Further, among the 6 pre-miRNA sequences with identity <85%, 3 (0.74%) had 100% identity in the mature sequence. Overall, the 357 mature miRNAs with 100% identity suggest high conservation at the mature miRNA level (Fig. 3).
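The pre- versus mature-identity matrix (Suppl. Table 3) is a cross-tabulation of two identity classes per miRNA. A minimal sketch, assuming each miRNA is available as a (pre-miRNA identity, mature identity) pair; the helper names are hypothetical. The `share` helper shows how percentages such as 87/405 = 21.48% are obtained.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Lower bounds of the identity classes used in the text, highest first.
CLASSES = ((100.0, "100%"), (95.0, "95-99.9%"), (90.0, "90-94.9%"),
           (85.0, "85-89.9%"), (0.0, "<85%"))


def identity_class(identity: float) -> str:
    for lower, label in CLASSES:
        if identity >= lower:
            return label
    raise ValueError(f"identity must be non-negative, got {identity}")


def pre_vs_mature(pairs: Iterable[Tuple[float, float]]) -> Dict[Tuple[str, str], int]:
    """Count miRNAs per (pre-miRNA class, mature class) cell."""
    return dict(Counter((identity_class(pre), identity_class(mat)) for pre, mat in pairs))


def share(count: int, total: int) -> float:
    """Percentage of the total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


matrix = pre_vs_mature([(100.0, 100.0), (97.0, 100.0), (100.0, 100.0), (80.0, 100.0)])
```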

Histogram showing number of pre-miRNAs as a subset of mature miRNAs based on their percent identity. X-axis denotes the different percent identities of mature miRNA, y-axis denotes the same for pre-miRNAs and z-axis shows the number of miRNAs.
Identity with other vertebrates
We next examined the identity pattern of the putative cat miRNAs across all the vertebrate species studied, focusing in particular on the species whose query sequences gave the BLAST hits with maximum percent identity. For this, we constructed a matrix of the number of pre-miRNAs per query species, binned by percent identity (Suppl. Table 4). We found that 96 pre-miRNAs have their best match in Bos taurus (cattle), 67 in Homo sapiens (human), 54 in Canis familiaris (dog) and so on (Fig. 4A, Suppl. Table 4). Of these, 29 miRNAs have 100% identity to cattle, compared with 12 to human, 17 to dog and 5 to mouse (Fig. 4A). Further, 42 miRNAs have 95%–99.9% identity to cattle, compared with 35 to human, 29 to dog and 14 to mouse. Hence, our miRNA identity analysis suggests that more cat miRNAs are most similar to cattle than to human.
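The per-species counts above come from recording, for each cat miRNA, which vertebrate query gave its best hit. A minimal sketch, assuming hits are available as (cat miRNA, query species, percent identity) tuples; the miRNA names are hypothetical, species are written as miRBase-style prefixes (bta for cattle, hsa for human, cfa for dog), and ties are resolved by keeping the first species seen, which the text does not specify.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def best_species(hits: Iterable[Tuple[str, str, float]]) -> Dict[str, Tuple[str, float]]:
    """Map each cat miRNA to (query species, identity) of its best hit."""
    best: Dict[str, Tuple[str, float]] = {}
    for mirna, species, identity in hits:
        # Strict '>' keeps the first species seen when identities tie.
        if mirna not in best or identity > best[mirna][1]:
            best[mirna] = (species, identity)
    return best


def species_counts(best: Dict[str, Tuple[str, float]], min_identity: float = 0.0) -> Counter:
    """Count best-hit species, optionally only at or above an identity cutoff."""
    return Counter(sp for sp, identity in best.values() if identity >= min_identity)


hits = [
    ("cat-mir-a", "bta", 100.0),
    ("cat-mir-a", "hsa", 98.0),
    ("cat-mir-b", "hsa", 96.0),
    ("cat-mir-b", "cfa", 96.0),
    ("cat-mir-c", "bta", 93.0),
]
best = best_species(hits)
```

Calling `species_counts(best, 100.0)` restricts the tally to the 100% identity column, which is how per-species counts at a single identity level can be read off.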

Comparison of percent identity of pre-miRNAs of cat with other vertebrates.
We next asked whether the identity pattern between cat and other vertebrates is also similar at the mature miRNA level. To investigate this, we examined mature miRNA identity across the vertebrates by constructing a matrix similar to that described for pre-miRNAs (Suppl. Table 5). Our results show that, at 100% identity, 90 mature miRNA sequences were identical to cattle, followed by 53 to human, 50 to dog and 30 to mouse (Fig. 4B). Below 100% identity, the numbers of mature miRNAs show no distinct pattern, because most miRNAs shift into the 100% identity class at the mature sequence level. Therefore, as with the pre-miRNAs, the mature miRNA analysis indicates that cattle are closer to cat than other vertebrates are.

Histogram showing the distribution of mature miRNAs by mature miRNA identity across vertebrate species. X-axis shows mature miRNA identity (%), y-axis shows the vertebrate species studied and z-axis shows the number of miRNAs.
Statistical sequence characteristics
Sequence characteristics of pre- and mature miRNAs have been reported in plants and, very recently, in animals. A statistical comparison with the rest of the reported animal miRNAs is important for understanding how base composition varies across the animal kingdom. Further, these statistical values could help in understanding miRNAs and in developing computational methods that predict more true positive hits. 36 We performed a detailed analysis of the sequence characteristics of both cat and all known vertebrate miRNA sequences (Table 1). Cat pre-miRNAs range from 56 to 146 nucleotides (nt) in length, with an average of 86.62 ± 14.4 nt, which is consistent with the other known vertebrates (Table 1). The mean base composition of pre-miRNA sequences in cat and other vertebrates is identical. The frequency of U in cat is relatively high, at 28.3% (Fig. 5A); a similarly high frequency of U has been reported for all other known vertebrate miRNAs. 36 Owing to the high frequency of A + U, the G + C frequency is generally much lower in cat, as it is in other vertebrates (Table 1). 37 A + U-rich composition is understood to decrease the stability of the pre-miRNA secondary structure, so that it is easily processed to mature miRNA by the RNA-induced silencing complex (RISC). 38 Further, the G/C and U/A ratios are 1.16 ± 0.24 and 1.22 ± 0.1 for vertebrates and 1.18 ± 0.36 and 1.22 ± 0.03 for cat, respectively; these values are again very similar, and the vertebrate ratios are consistent with the previous study. 20 We further analyzed the frequency of each nucleotide at each position of the cat mature miRNAs. U is predominant at positions 1, 6, 9, 13, 17 and 21 (Fig. 5A). In plants, cytosine is generally the dominant nucleotide at position 19; however, this has not been reported in any animal. 38
Instead, this is the first report of guanine being dominant (31.8%) at position 19 in cat, followed by uracil (28.4%), cytosine (19.51%) and adenine (15.81%). We also find that the G + C frequency is higher than the A + U frequency at positions 11 and 19 and, interestingly, equal to it at position 15, which is unique. Earlier studies have shown that uracil is the predominant nucleotide at the 5′ end of mature miRNAs, and on this basis it has been proposed that the 5′ end may play an important role in the biogenesis of mature miRNA through recognition of the targeted miRNA precursors by RISC. 38 Consistent with this, uracil is predominantly present at the 5′ end of cat miRNAs (Fig. 5B), adding further support to the existing studies.
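Position-specific base frequencies, such as the dominance of G at position 19 or of U at the 5′ end, come from a column-wise count over mature sequences anchored at their 5′ ends. A minimal sketch; the choice of denominator (only sequences long enough to reach a given position) is an assumption, since the text does not state which denominator was used, and the short example sequences are invented.

```python
from collections import Counter
from typing import Dict, List, Sequence


def positional_frequencies(seqs: Sequence[str]) -> List[Dict[str, float]]:
    """Percent A/C/G/U at each position, counting from the 5' end.

    Element i of the result describes position i + 1. Each position is
    computed over the sequences long enough to contain it.
    """
    rna = [s.upper().replace("T", "U") for s in seqs]
    profile: List[Dict[str, float]] = []
    for i in range(max(len(s) for s in rna)):
        column = Counter(s[i] for s in rna if len(s) > i)
        n = sum(column.values())
        profile.append({b: 100.0 * column.get(b, 0) / n for b in "ACGU"})
    return profile


def dominant_base(profile: List[Dict[str, float]], position: int) -> str:
    """Most frequent base at a 1-based position."""
    freqs = profile[position - 1]
    return max(freqs, key=freqs.get)


profile = positional_frequencies(["UAGCU", "UGGAU", "CAGUA"])
```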

Graph showing base frequency across the length of the mature miRNA sequences. A, U, G, C frequency (%) was plotted against the length of mature miRNA sequences.

End-base frequency in cat miRNAs.
It has previously been demonstrated that pre-miRNAs have lower MFE than other noncoding RNAs. 38 Therefore, MFE is considered one of the important factors for identifying miRNA genes. 32 We find that the average MFE of cat pre-miRNAs is 35.6 ± 9.8 kcal/mol. AMFE and MFEI were calculated to normalize the MFE values for sequence length and, in the case of MFEI, also for G + C content (Table 2). Altogether, these statistical features could play a significant role in further understanding the secondary structure formation of pre-miRNAs.
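The normalized folding-energy measures can be computed from the MFE, the sequence length and the G + C content. The sketch below assumes the definitions commonly attributed to Zhang et al. (AMFE = MFE / length × 100; MFEI = AMFE / (G + C)%, with G + C expressed as a percentage); the MFE itself must come from a folding program such as m-FOLD and is simply passed in. Because MFE is negative, AMFE and MFEI come out negative here, whereas published tables often quote their magnitudes. The same `composition` helper yields the A + U, G + C, G/C and U/A values discussed above. The 10-nt sequence and its MFE are toy values, not real data.

```python
from collections import Counter
from typing import Dict


def composition(seq: str) -> Dict[str, float]:
    """Percent A, C, G, U, A+U and G+C in one RNA sequence."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq)
    comp = {b: 100.0 * counts.get(b, 0) / len(seq) for b in "ACGU"}
    comp["A+U"] = comp["A"] + comp["U"]
    comp["G+C"] = comp["G"] + comp["C"]
    return comp


def base_ratios(seq: str) -> Dict[str, float]:
    """G/C and U/A ratios (undefined if the sequence lacks C or A)."""
    comp = composition(seq)
    return {"G/C": comp["G"] / comp["C"], "U/A": comp["U"] / comp["A"]}


def amfe(mfe: float, length: int) -> float:
    """Adjusted MFE: MFE per 100 nt, in kcal/mol."""
    return mfe / length * 100.0


def mfei(mfe: float, seq: str) -> float:
    """MFE index: AMFE divided by the G+C percentage."""
    return amfe(mfe, len(seq)) / composition(seq)["G+C"]


seq = "GCGCAUAUAU"  # toy 10-nt sequence, not a real pre-miRNA
```

With a toy MFE of −20 kcal/mol, this sequence gives AMFE = −200 and, at 40% G + C, MFEI = −5.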
microRNA family and cluster
miRNAs are usually grouped into gene families. A typical miRNA family contains several mature miRNAs with identical sequences, which may be located anywhere in the genome of an organism. 39 In most cases, at least one member of a family has been experimentally identified, while the others are identified by in silico sequence homology analysis. miRNAs belonging to the same family often differ at either end, but it is not known whether these differences affect miRNA function. The region from the 2nd to the 7th nucleotide (6 nt) at the 5′ end of the mature miRNA sequence is generally referred to as the “seed region”. 21 Seed regions are mostly conserved among miRNAs of the same family; however, the physiological significance of such multiple miRNAs with conserved seed sequences is still unclear. Of the 405 miRNAs we identified in cat, 200 belong to 57 miRNA gene families (Suppl. Table 6). A matrix was constructed to analyze the distribution of miRNA families across vertebrates (Suppl. Table 7). Among the 57 cat miRNA families, 38 are from cattle, 24 from human, 22 from dog and 15 from mouse, which further supports the observed closeness between cat and cattle.
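Extracting the seed region (nucleotides 2–7) and grouping miRNAs that share it is straightforward. The text does not state how the 57 families were assigned, so grouping by identical seed is shown here only as a simple illustrative proxy, not as the authors' family assignment; the example uses the human let-7a, let-7b and miR-21 mature sequences.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def seed(mature: str, start: int = 2, end: int = 7) -> str:
    """Nucleotides start..end (1-based, inclusive) of a mature miRNA."""
    return mature.upper().replace("T", "U")[start - 1:end]


def group_by_seed(mature_seqs: Mapping[str, str]) -> Dict[str, List[str]]:
    """Group miRNA names by their shared seed sequence."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, seq in mature_seqs.items():
        groups[seed(seq)].append(name)
    return dict(groups)


families = group_by_seed({
    "let-7a": "UGAGGUAGUAGGUUGUAUAGUU",
    "let-7b": "UGAGGUAGUAGGUUGUGUGGUU",
    "miR-21": "UAGCUUAUCAGACUGAUGUUGA",
})
```

let-7a and let-7b differ near their 3′ ends but share the seed GAGGUA, which illustrates the end variation within a family described above.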
miRNA genes are well known to form clusters within the genome. 13 A cluster is defined as a group of miRNA genes that lie in the same orientation and are transcribed as one polycistronic transcriptional unit. Using a threshold distance of 500–1000 bp between miRNA genes, we identify 32 clusters in cat (Suppl. Table 8). An earlier study 29 reported that mir-let-7a-1 and mir-125a were missing from cat miRNA clusters 24 and 6, respectively. However, we found that both mir-let-7a-1 and mir-125a are in fact present in the respective clusters (Suppl. Table 8). That study identified only about 179 miRNAs, in contrast to 405 in our study, which could be due to the incomplete sequencing of the Felis catus genome in 2007. 29 It is also important to point out that some clusters reported in other vertebrates are not seen in the cat (data not shown).
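Cluster calling of this kind amounts to chaining same-strand miRNA loci whose neighbours lie within the distance threshold. A minimal sketch, assuming loci are given as hypothetical `Locus` records and that `max_gap` (defaulting here to 1000 bp, the upper end of the range used above) is measured from the furthest end reached so far to the start of the next locus; only groups of two or more miRNAs are reported, and the example coordinates are invented.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass(frozen=True)
class Locus:
    """Genomic position of one pre-miRNA (hypothetical record)."""
    name: str
    chrom: str
    strand: str  # "+" or "-"
    start: int
    end: int


def find_clusters(loci: List[Locus], max_gap: int = 1000) -> List[List[str]]:
    """Chain same-chromosome, same-strand loci separated by <= max_gap bp."""
    clusters: List[List[str]] = []
    ordered = sorted(loci, key=lambda x: (x.chrom, x.strand, x.start))
    for _, group in groupby(ordered, key=lambda x: (x.chrom, x.strand)):
        current: List[Locus] = []
        for locus in group:
            # Start a new chain when the gap to the furthest end so far is too large.
            if current and locus.start - max(x.end for x in current) > max_gap:
                if len(current) > 1:
                    clusters.append([x.name for x in current])
                current = []
            current.append(locus)
        if len(current) > 1:
            clusters.append([x.name for x in current])
    return clusters


loci = [
    Locus("mir-a", "chr1", "+", 1000, 1080),
    Locus("mir-b", "chr1", "+", 1500, 1580),
    Locus("mir-c", "chr1", "+", 5000, 5080),
    Locus("mir-d", "chr1", "-", 1100, 1180),
    Locus("mir-e", "chr2", "+", 100, 180),
]
```

With the default threshold only mir-a and mir-b form a cluster; mir-d is excluded despite lying between them because it is on the opposite strand. Tightening `max_gap` to 300 bp breaks that pair apart, which shows how sensitive the cluster count is to the chosen threshold.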
Conclusion
Many in silico miRNA prediction methods have been developed to support experimental miRNA discovery. These approaches generally involve genome-wide prediction of hairpin structures, filtering or scoring of the hairpins by their similarity to known miRNA hairpins, and experimental validation of the putative candidates. 40 Different criteria have been proposed to reduce false positives and increase the specificity of prediction. 41 In the present study, we identify 405 putative miRNAs in cat using a comparative computational approach. Using MiPred, we validate 6 miRNAs whose identity is <85%; including them does not affect the standard deviations of the sequence characteristics. The statistical sequence characteristics of cat miRNAs clearly lie within the range of other vertebrates. As mentioned earlier, experimental evidence is needed to confirm whether such low-identity sequences are genuine miRNAs. Because this prediction method relies on homology, it may miss novel miRNAs that have not yet been identified in other species. However, evolutionary conservation is considered an important feature in hairpin sequence analyses and is therefore often used to identify conserved noncoding sequences in different genomes. 39 Phylogenetic shadowing has been used for the combined selection and filtering of miRNA candidates. 42 Statistical sequence characteristics could be important for further understanding the folding of pre-miRNAs into secondary structures and for comparing the sequence characteristics of miRNAs among organisms. Further, comparative homolog analysis at the pre-miRNA, mature miRNA and miRNA family levels revealed that cat miRNAs are much closer to those of cattle than to those of human, followed by dog and mouse. The conservation of 357 mature sequences suggests that seed regions have remained unchanged during the evolution of the cat genome.
Such conservation implies that the corresponding target mRNA regions could also be conserved, which further indicates that the cat genome could be more similar to that of cattle than to that of humans. Therefore, our results add a new dimension to studies of the evolution of the cat.
Footnotes
Acknowledgement
We thank the Cat Genome Sequencing Group for their efforts and for making the data publicly available.
Disclosure
This manuscript has been read and approved by all authors. This paper is unique and is not under consideration by any other publication and has not been published elsewhere. The authors and peer reviewers of this paper report no conflicts of interest. The authors confirm that they have permission to reproduce any copyrighted material.
Supplementary Table Legends
Suppl Table 1. Table showing the number of pre- and mature miRNAs in cat.
Suppl Table 2. Detailed list of predicted miRNAs from cat.
Suppl Table 3. Matrix showing distribution of total miRNAs between pre-miRNA and mature identity percent.
Suppl Table 4. Matrix showing distribution of total miRNAs between pre-miRNA identity and vertebrates.
Suppl Table 5. Matrix showing distribution of total miRNAs between mature identity and vertebrates.
Suppl Table 6. List of family of miRNAs.
Suppl Table 7. Matrix showing distribution of total miRNAs between family miRNAs and vertebrates.
Suppl Table 8. Detailed list of predicted clustered miRNAs.
