Abstract
In this study, 222 genome survey sequences were generated for Trypanosoma rangeli strain P07 isolated from an opossum (Didelphis albiventris) in Minas Gerais State, Brazil. T. rangeli sequences were compared by BLASTX (Basic Local Alignment Search Tool X) analysis with the assembled contigs of Leishmania braziliensis, Leishmania infantum, Leishmania major, Trypanosoma brucei, and Trypanosoma cruzi. Results revealed that 82% (182/222) of the sequences were associated with predicted proteins described, whereas 18% (40/222) of the sequences did not show significant identity with sequences deposited in databases, suggesting that they may represent T. rangeli-specific sequences. Among the 182 predicted sequences, 179 (80.6%) had the highest similarity with T. cruzi, 2 (0.9%) with T. brucei, and 1 (0.5%) with L. braziliensis. Computer analysis permitted the identification of members of various gene families described for trypanosomatids in the genome of T. rangeli, such as trans-sialidases, mucin-associated surface proteins, and major surface proteases (MSP or gp63). This is the first report identifying sequences of the MSP family in T. rangeli. Multiple sequence alignments showed that the predicted MSP of T. rangeli presented the typical characteristics of metalloproteases, such as the presence of the HEXXH motif, which corresponds to a region previously associated with the catalytic site of the enzyme, and various cysteine and proline residues, which are conserved among MSPs of different trypanosomatid species. Reverse transcriptase–polymerase chain reaction analysis revealed the presence of MSP transcripts in epimastigote forms of T. rangeli.
Introduction
T. cruzi is sympatric with other trypanosomatids, including Trypanosoma rangeli, which can infect humans but does not cause disease (Hoare 1972). T. rangeli has been detected in vertebrate and invertebrate hosts of T. cruzi in different regions of Brazil (Miles et al. 1983, Steindel et al. 1991, Coura et al. 1996, Ramirez et al. 2002). T. cruzi and T. rangeli are distinct protozoans in terms of morphological, biological, and biochemical aspects. However, patients infected with T. rangeli produce antibodies that recognize T. cruzi antigens in immunological tests (Vásquez et al. 1997, Cuba-Cuba 1998), a fact that makes the specific diagnosis of the chronic phase of Chagas' disease even more difficult. Thus, the development of new techniques for the correct identification of these trypanosomatids is of utmost importance.
The analysis of parasite DNA sequences has become a useful tool for the establishment of different aspects of the biology of these organisms, for the identification of cell function-related genes, for the selection of new drug targets, and for the identification of antigens that can be used for diagnosis or the development of vaccines. In the genome survey sequencing (GSS) approach, random sequences of a given organism are generated and used for the investigation of coding regions based on their similarity with sequences deposited in public databases, and this approach does not require complete sequencing of the organism's genome. In view of the diverse characteristics of protozoans, such as relatively compact genomes and a high gene density, the GSS approach has been applied successfully to the discovery of genes in Cryptosporidium parvum, Leishmania braziliensis, Leishmania major, T. cruzi, and Trypanosoma vivax (Liu et al. 1999, Porcel et al. 2000, Akopyants et al. 2001, Laurentino et al. 2004, Guerreiro et al. 2005).
Despite the importance of T. rangeli in the epidemiology of American trypanosomiasis, little is known about its genome organization and few DNA sequences of the parasite are available in databases. This knowledge would facilitate the development of new tools for the differential diagnosis of infections caused by T. cruzi and T. rangeli. Here we describe the in silico analysis of 222 T. rangeli genomic sequences generated by the GSS approach. Bioinformatic analysis permitted the identification of 40 parasite sequences distinct from sequences available in databanks and of members of the major surface protease (MSP or gp63) family, a gene family not yet reported in this parasite.
Materials and Methods
Parasite strain and culture conditions
T. rangeli strain P07 was isolated from the blood culture of an opossum (Didelphis albiventris) captured in Minas Gerais, Brazil (Ramirez et al. 2002). Parasite samples maintained for approximately 2 months in culture were cryopreserved in liquid nitrogen. The parasite strain was thawed and cultured at 28°C in liver infusion tryptose medium supplemented with 3% (v/v) human urine (Ferreira et al. 2007). Parasite density was determined by counting in a hemocytometer.
DNA and RNA purification
Genomic DNA was isolated from T. rangeli P07 by alkaline lysis of a 40 mL culture of epimastigotes in the exponential phase of growth (Lages-Silva et al. 2001). Plasmid DNA was extracted by the alkaline lysis method (Sambrook et al. 1989). Briefly, selected clones were inoculated into 2 mL LB (Luria-Bertani) medium (2% tryptose, 0.5% yeast extract, and 8.5 mM NaCl) containing 50 μg/mL ampicillin and incubated for 18 h at 37°C under constant shaking. Total RNA was purified from approximately 10^7 epimastigote forms of T. rangeli (P07) using the TRIzol® reagent (Invitrogen, Carlsbad, CA). Aliquots containing 10 μg of total RNA were analyzed in a 1.2% agarose formaldehyde/MOPS (3-(N-morpholino) propanesulfonic acid) gel stained with ethidium bromide (Sambrook et al. 1989) to assess RNA integrity.
Library construction
Approximately 500 ng genomic DNA from T. rangeli P07 was digested with BamHI and BglII restriction endonucleases according to the manufacturer's instructions (Gibco BRL, Rockville, MD). The DNA samples were electrophoresed through a 0.7% agarose gel, and the bands corresponding to DNA fragments of 2–4 kb were recovered from the gel and purified using the GFX PCR DNA kit (Amersham Biosciences, Freiburg, Germany). The inserts were ligated into pUC18 linearized by digestion with BamHI using T4 DNA ligase (Invitrogen) and transformed into Escherichia coli DH10B by electroporation. The clones obtained were selected by incubation on LB medium supplemented with 1.5% bacteriological agar, 50 μg/mL ampicillin, 80 μg/mL X-gal, and 80 μg/mL IPTG (isopropyl β-D-thiogalactopyranoside). A total of 384 clones were individually organized on four microtiter plates containing 100 μL LB medium and 50 μg/mL ampicillin and incubated for 16 h at 37°C. The recombinant clones were identified based on the positions that they occupied on the different plates. The quality of the library was evaluated by double digestion of plasmid DNA with HindIII and EcoRI (Gibco BRL), and the digestion products were analyzed on a 1.2% agarose gel.
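The double digestion and size selection described above can be illustrated in silico. The following is a minimal sketch, not the procedure used in this study: it assumes a linear template and complete digestion, and the toy sequence and function names are hypothetical. It relies only on the known recognition sites of BamHI (G^GATCC) and BglII (A^GATCT), which leave compatible 5′-GATC overhangs, so that fragments from either enzyme can be ligated into BamHI-linearized pUC18.

```python
import re

# Recognition sites; both enzymes cut after the first base of the site.
ENZYMES = {"BamHI": "GGATCC", "BglII": "AGATCT"}
CUT_OFFSET = 1


def cut_positions(seq, enzymes=ENZYMES):
    """Return sorted cut coordinates for all enzymes on a linear sequence."""
    seq = seq.upper()
    cuts = set()
    for site in enzymes.values():
        # A lookahead is used so that overlapping sites are also reported.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + CUT_OFFSET)
    return sorted(cuts)


def digest(seq, enzymes=ENZYMES):
    """Return fragment lengths after a complete digest of a linear molecule."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def size_select(fragments, low=2000, high=4000):
    """Keep fragments within the gel-excised size window (2-4 kb by default)."""
    return [length for length in fragments if low <= length <= high]


# Hypothetical toy template carrying one BamHI and one BglII site.
toy = "A" * 10 + "GGATCC" + "T" * 20 + "AGATCT" + "C" * 5
toy_fragments = digest(toy)
```

With real genomic sequence, `size_select(digest(sequence))` would list the fragments expected to fall within the 2–4 kb window excised from the gel.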
Nucleic acid analysis
The nucleotide sequences were analyzed in an automatic ABI 3100 sequencer (Applied Biosystems, Foster City, CA) using the BigDye kit (Applied Biosystems) and the M13 forward primer. The reaction was carried out in an Eppendorf Mastercycler Gradient thermocycler under the following conditions: 96°C for 2 min, followed by 40 cycles at 96°C for 15 s, 50°C for 15 s, and 60°C for 4 min. The chromatograms generated by sequencing of the clones of the genome library were processed with the PHRED program. The sequences obtained were deposited in the GSS database of GenBank (accession numbers FI104298 to FI104302, FI111014 to FI111027, FI569251 to FI569456, and GS815965 to GS815968).
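PHRED assigns each base call a quality value Q related to the estimated error probability P by Q = −10 log10(P). The sketch below shows how such values can be converted and summarized for a read. The cutoff of Q ≥ 20 is an illustrative assumption; the quality criteria actually applied to the reads are not stated in this report.

```python
def error_probability(q):
    """Convert a Phred quality value into the estimated base-call error probability."""
    return 10 ** (-q / 10)


def high_quality_fraction(quals, min_q=20):
    """Fraction of bases in a read whose quality is at least min_q (assumed cutoff)."""
    if not quals:
        return 0.0
    return sum(1 for q in quals if q >= min_q) / len(quals)


def expected_errors(quals):
    """Expected number of miscalled bases in a read, summed over all positions."""
    return sum(error_probability(q) for q in quals)


# Hypothetical quality values for a four-base read.
example_quals = [30, 30, 10, 25]
example_fraction = high_quality_fraction(example_quals)
```

A quality of 20 corresponds to an estimated error rate of 1 in 100 bases, and a quality of 30 to 1 in 1000.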
Reverse transcriptase–polymerase chain reaction (PCR) was performed with MSP-specific primers (Tr-MSP-Forw: 5′-CGTTGTCCGATTGAAGGTTT-3′; Tr-MSP-Rev: 5′-TCAGTGACCCACGACAACAT-3′) and 0.2 μg of T. rangeli total RNA using the SuperScript® One-Step RT-PCR System with Platinum® Taq DNA Polymerase kit (Invitrogen). Amplification products were analyzed in ethidium bromide-stained 1.5% agarose gels.
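Basic properties of the two primers can be checked with a few lines of code. The sketch below computes length, GC content, and a rough melting temperature using the Wallace rule, Tm = 2(A+T) + 4(G+C). The Wallace rule is an approximation chosen here for illustration only; it is not stated to be the method used for primer design in this study.

```python
TR_MSP_FORW = "CGTTGTCCGATTGAAGGTTT"
TR_MSP_REV = "TCAGTGACCCACGACAACAT"


def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees Celsius (Wallace rule approximation)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement, e.g. to locate the reverse primer site on the sense strand."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


for name, primer in (("Tr-MSP-Forw", TR_MSP_FORW), ("Tr-MSP-Rev", TR_MSP_REV)):
    print(name, len(primer), gc_percent(primer), wallace_tm(primer))
```

Both primers are 20 nucleotides long, with GC contents of 45% and 50% and Wallace estimates of 58°C and 60°C, respectively.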
Sequence analysis
The process of in silico analysis of T. rangeli sequences comprised the following steps: (i) search for similarity with sequences deposited in nonredundant protein databases; (ii) comparison with genomes of other trypanosomatids whose genomes have been completely sequenced or are in the phase of annotation: T. cruzi clone CL Brener, T. brucei strain TREU927/4, L. major Friedlin, Leishmania infantum JPCM5, and L. braziliensis M2904 (all available at
Local alignments of T. rangeli sequences against the available complete genomes of other organisms were performed by BLAST (Basic Local Alignment Search Tool) searches (Altschul et al. 1997). The alignments were considered to be valid when presenting E-values ≤1.0 × 10^−12 for nucleotide analysis and ≤1.0 × 10^−7 for proteins. The annotation and graphical output of the sequences generated were performed using in-house–developed Perl scripts to analyze and format the results. Individual analysis of the results permitted the exclusion of alignments of low-complexity regions. Multiple alignments of protein sequences were performed using the ClustalW program (Thompson et al. 1994).
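The E-value criteria above can be applied programmatically. The following sketch is written in Python rather than Perl and is not the in-house script used in this study. It assumes BLAST results in the 12-column tab-delimited format produced by current BLAST+ releases (query, subject, ..., E-value in column 11, bit score in column 12), which may differ from the output handled originally. The clone and protein identifiers are hypothetical.

```python
import csv
import io

PROTEIN_EVALUE_CUTOFF = 1.0e-7      # threshold stated for protein comparisons
NUCLEOTIDE_EVALUE_CUTOFF = 1.0e-12  # threshold stated for nucleotide comparisons


def best_hits(tabular_text, cutoff):
    """Map each query to its (subject, E-value) best hit at or below the cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue  # alignment not considered valid
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def make_row(query, subject, evalue):
    """Build one tabular line; columns not used here hold placeholder values."""
    fields = [query, subject, "50.0", "100", "0", "0",
              "1", "300", "1", "100", str(evalue), "100"]
    return "\t".join(fields)


# Hypothetical results for two clones.
SAMPLE = "\n".join([
    make_row("clone_A", "Tc_protein_1", 2e-30),
    make_row("clone_A", "Tb_protein_1", 5e-12),
    make_row("clone_B", "Lb_protein_1", 1e-3),
])

hits = best_hits(SAMPLE, PROTEIN_EVALUE_CUTOFF)
```

In this example, `clone_A` is assigned to its lowest E-value subject, whereas `clone_B` has no valid alignment and would be counted among the sequences without significant similarity.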
Results
Library validation
Plasmid DNA of the recombinant clones was digested with BamHI and BglII. Sixty-nine of the 384 plasmids analyzed were discarded because they did not contain an insert. Plasmid DNA of the remaining 315 clones was sequenced. Of these, 15 (4.8%) also lacked an insert, 6 (1.9%) possessed small inserts (less than 100 bp), and 41 (13%) yielded sequences of low quality; these clones were not included in this study. After searching for similarity with other sequences in databases, 31 (9.8%) sequences were excluded because they were present more than once in our database. Thus, 222 valid nonredundant sequences were analyzed. These sequences presented a mean GC content of 51.9% and a mean insert size of 1.5 kb.
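The clone accounting above can be verified with simple arithmetic. The sketch below uses only the counts reported in this section and expresses each exclusion category as a percentage of the 315 sequenced clones; the variable names are illustrative.

```python
total_clones = 384
no_insert_by_digestion = 69
sequenced = total_clones - no_insert_by_digestion  # clones submitted to sequencing

# Clones excluded after sequencing, as reported in the text.
excluded = {
    "no insert": 15,
    "insert < 100 bp": 6,
    "low quality": 41,
    "redundant": 31,
}

valid = sequenced - sum(excluded.values())  # nonredundant sequences analyzed

# Percentages relative to the number of sequenced clones.
percent = {reason: round(100 * n / sequenced, 1) for reason, n in excluded.items()}

print(sequenced, valid, percent)
```

The totals reproduce the 315 sequenced clones and the 222 valid nonredundant sequences.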
Similarity with sequences deposited in public databases
The similarity between the T. rangeli sequences generated and sequences of complete trypanosomatid genomes deposited in databases (

Distribution of the 222 valid Trypanosoma rangeli sequences according to best hits (lowest E-values) with genomic sequences of trypanosomatids available in databases (
With respect to the evolutionary conservation of the 182 predicted proteins of T. rangeli in the genomes of other trypanosomatids, 109 (59.9%) were shared with T. brucei, T. cruzi, and Leishmania spp., 30 (16.5%) were observed in T. cruzi and T. brucei, 11 (6.1%) were shared with T. cruzi and Leishmania spp., and 31 (17.0%) were shared only with T. cruzi. Only one sequence (0.5%) was shared with T. brucei and Leishmania and not with T. cruzi.
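The grouping above amounts to a set-based classification of each sequence according to the genomes in which it has a valid hit. The following sketch assumes that hits are recorded with the species abbreviations Tb, Tc, Lm, Li, and Lb, and that the three Leishmania species are pooled as Leishmania spp. The function name and labels are illustrative; the counts are those reported in the text.

```python
LEISHMANIA = {"Lm", "Li", "Lb"}


def conservation_class(hit_species):
    """Label a sequence by the groups (T. cruzi, T. brucei, Leishmania spp.) it hits."""
    hit_species = set(hit_species)
    groups = set()
    if "Tc" in hit_species:
        groups.add("T. cruzi")
    if "Tb" in hit_species:
        groups.add("T. brucei")
    if hit_species & LEISHMANIA:
        groups.add("Leishmania spp.")
    return " + ".join(sorted(groups)) if groups else "no hit"


# Counts reported in the text for the 182 sequences with predicted proteins.
REPORTED_COUNTS = {
    conservation_class({"Tb", "Tc", "Lm"}): 109,
    conservation_class({"Tb", "Tc"}): 30,
    conservation_class({"Tc", "Li"}): 11,
    conservation_class({"Tc"}): 31,
    conservation_class({"Tb", "Lb"}): 1,
}

total_with_hits = sum(REPORTED_COUNTS.values())
```

The five categories are mutually exclusive and together account for all 182 sequences with predicted proteins.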
The Cluster of Orthologous Groups (
Table 2 shows the distribution of protein-coding sequences of T. rangeli according to predicted function and the number of representatives showing significant similarity with the sequences of other trypanosomatids. Considering the similarities in the predicted proteins shared by the five trypanosomatid species, we observed that 13 sequences of the genomic library of T. rangeli presented sequence identity with the MSPs of other trypanosomatids. Next are tyrosine aminotransferase, protein kinase, and heat shock protein, with four representatives in the T. rangeli sequences obtained. Among the predicted sequences shared only by species of the genus Trypanosoma, trans-sialidase had the largest number of representatives in the T. rangeli GSS (10 sequences), followed by elements related to non-LTR (long-terminal repeat) retrotransposons (retrotransposon hot spot elements, 7), mucin-associated surface protein (4), and ubiquitin hydrolase (2).
Tb, Trypanosoma brucei; Tc, Trypanosoma cruzi; Lm, Leishmania major; Li, Leishmania infantum; Lb, Leishmania braziliensis.
Identification and expression of the MSP sequences (gp63) in the genome of T. rangeli
BLASTX analyses of 13 predicted T. rangeli MSPs revealed the highest sequence identity with T. cruzi MSPs (38–65% identity) and the lowest sequence identity with L. infantum (29–40% identity). Only one clone (accession number FI111021) had significant and exclusive identity with the MSPs of T. cruzi, and sequence FI111014 had significant sequence identity only with parasites of the genus Trypanosoma (Table 3). In addition to the significant sequence identity, other characteristics suggest that these sequences correspond to members of the MSP family, including the presence of the HEXXH motif, which contains two histidines and one glutamic acid conserved in all trypanosomatid species studied and corresponds to a region previously associated with the catalytic site of Leishmania MSP (Fig. 2A).
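A search for the HEXXH motif in a predicted protein sequence can be expressed as a simple pattern match, in which H and E are fixed and X stands for any residue. The sketch below is illustrative only; the peptide used in the example is hypothetical and is not one of the T. rangeli sequences.

```python
import re

# H-E-X-X-H: two histidines and one glutamic acid at fixed positions.
# A lookahead with a capture group reports overlapping occurrences as well.
HEXXH = re.compile(r"(?=(HE..H))")


def find_hexxh(protein):
    """Return (1-based position, matched residues) for every HEXXH occurrence."""
    protein = protein.upper()
    return [(match.start() + 1, match.group(1)) for match in HEXXH.finditer(protein)]


# Hypothetical peptide containing a single motif.
example_hits = find_hexxh("MKTAVHEALHALGF")
print(example_hits)
```

Applied to translated GSS reads, such a search would need to be run on each candidate reading frame, since the motif can only be detected in the frame that encodes the protein.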

Multiple sequence alignment analysis and expression of T. rangeli major surface proteases (MSPs). (
MSP, major surface protease.
Expression of T. rangeli MSP transcripts was assessed by reverse transcriptase-PCR using a pair of primers designed from a sequence that codes for the putative catalytic site of MSPs (GenBank accession number FI111023). The predicted 212 bp product was detected in samples containing T. rangeli RNA and no amplification was observed in a negative control sample (Fig. 2B). An additional band of approximately 160 bp was observed in the sample from the P07 strain (Fig. 2B, lane 1), suggesting the occurrence of polymorphisms within the coding regions of T. rangeli MSP genes.
Discussion
Diseases caused by trypanosomatids are collectively considered to be neglected diseases because few resources are invested by the private sector in the development of new tools for the treatment and diagnosis of these diseases. In the case of Chagas' disease the situation is slightly more complicated because the serological diagnosis of chagasic infection might be confused with infection by a nonpathogenic organism, that is, T. rangeli, generating a marked socioeconomic impact in countries where the disease is endemic. Recent studies have estimated approximately 2600 documented human single or mixed infections by T. rangeli in Latin America (reviewed in Guhl and Vallejo 2003). The occurrence of T. rangeli infections in Brazil is considered rare (Coura et al. 1996); however, new cases of human infection with this parasite have been described recently (De Sousa et al. 2008). Within this context, methods based on the detection of parasite nucleic acids would provide an alternative for the development of new strategies for the differential diagnosis of infections caused by T. cruzi and T. rangeli. However, the development of such methods requires the generation of various sequences, and although the genomic sequence of T. cruzi has been determined (El-Sayed et al. 2005a), the number of T. rangeli sequences available in GenBank remains very small. In this study, 222 high-quality sequences obtained from the genome of T. rangeli P07 were generated. Only 12 (5.4%) of these sequences have been previously sequenced and are available in the nucleotide databank of GenBank. The mean GC content of the sequences generated was 51.9%, a value closer to that of T. cruzi (51%) than to those reported for T. brucei (46.4%) and L. major (59.7%) (El-Sayed et al. 2005b). In silico analysis also demonstrated high conservation between the predicted coding regions of T. rangeli and the sequences of other trypanosomatids, especially T. cruzi, with 80.6% of the T. rangeli sequences presenting the best hit with this parasite. The high conservation of coding regions between T. rangeli sequences and other trypanosomatids agrees with the results of a previous comparative analysis between genomic sequences of T. brucei, T. cruzi, and L. major (El-Sayed et al. 2005b). In addition, the high percentage of T. rangeli sequences presenting the best hit with T. cruzi compared with the small number of sequences presenting the best hit with T. brucei (0.9%) and Leishmania (0.5%) supports the hypothesis that T. rangeli is phylogenetically more related to T. cruzi (Stercoraria) than to T. brucei (Salivaria), as demonstrated by ribosomal RNA sequence analysis of various species of the family Trypanosomatidae (Hughes and Piontkivska 2003).
Comparative analysis between the genomes of T. cruzi, T. brucei, and L. major has shown that the genome of these organisms consists of 32%, 26%, and 12% species-specific sequences, respectively, most of them apparently being members of surface antigen families (El-Sayed et al. 2005b). The T. rangeli sequences that did not show similarity to sequences deposited in GenBank accounted for 18% of the sequences generated (40 sequences). Subsequent studies may indicate whether these sequences correspond to coding regions of the parasite genome and, ultimately, whether they correspond to genes associated with the survival strategy of this parasite in its vertebrate and invertebrate hosts. Further, species-specific sequences may serve as targets for the detection of parasite DNA by PCR.
In this study, various sequences encoding predicted proteins in T. rangeli that showed restricted identity to T. brucei and T. cruzi were detected, including trans-sialidases, mucin-associated surface protein, MSPs, and dispersed gene family-1. In T. brucei and T. cruzi, these sequences are mainly found in subtelomeric regions and seem to be associated with specific strategies of host cell invasion (El-Sayed et al. 2005b).
MSPs or gp63 represent a class of surface glycoproteins with metalloprotease activity, which were first described in Leishmania (Russell and Wilhelm 1986), and subsequently in T. brucei (El-Sayed and Donelson 1997), T. cruzi (Cuevas et al. 2003), and the so-called lower trypanosomatids (reviewed in Santos et al. 2006). In this study, we identified for the first time in T. rangeli 13 sequences corresponding to MSPs described in other trypanosomatids. Some of the T. rangeli MSPs isolated present the HEXXH motif, which is characteristic of the catalytic site of metalloproteases and contains two histidines and one glutamic acid that are conserved among all trypanosomatid MSP sequences studied and are essential for proteolytic activity (McGwire and Chang 1996). The presence of MSP transcripts in the epimastigote forms of T. rangeli opens new possibilities to study the function of this gene family in a nonpathogenic trypanosome. The various cysteine and proline residues found in the T. rangeli MSP sequences are related to the formation of the three-dimensional structure of the enzyme. In Leishmania spp., T. brucei, and T. cruzi, the MSP genes are polymorphic and are arranged in tandem. The GSS approach does not permit determination of the genomic organization of individual genes. However, the high percentage of MSP sequences analyzed (5.85%) may reflect the process of construction of our GSS library, which involved the cloning of BamHI/BglII restriction fragments and can lead to the overrepresentation of tandemly repeated genes. Additionally, the fact that, in most cases, T. rangeli MSPs rescued distinct sequences from the trypanosomatid genomes studied suggests that T. rangeli MSPs are also polymorphic.
Analysis of genomic sequences and the determination of the presence of predicted surface proteins are relevant for the understanding of the host–parasite relationship in T. rangeli infection, especially the life cycle of the parasite in the vertebrate host, which is a fundamental but still poorly understood aspect in the biology of T. rangeli parasitism.
Acknowledgments
The authors thank Dr. Angela Kaysel Cruz, Departamento de Biologia Celular e Molecular e Bioagentes Patogênicos, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Brazil, for permitting the use of the sequencing facilities, and Tânia Paula Aquino Defina for technical assistance in processing DNA sequences. This study was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, grants 479184/2008-9 and 301375/2005-4) and Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG, grant APQ-00135-08). This study was taken in part from the thesis to be submitted by K.A.M.F. for the partial fulfillment of the degree of Doctor of Philosophy, Curso de Pós-Graduação em Medicina Tropical e Infectologia, Universidade Federal do Triângulo Mineiro, Brazil. All experiments were conducted according to Brazilian laws.
Disclosure Statement
No competing financial interests exist.
