Genome Survey Sequence Analysis and Identification of Homologs of Major Surface Protease (gp63) Genes in Trypanosoma rangeli

Abstract

In this study, 222 genome survey sequences were generated for Trypanosoma rangeli strain P07 isolated from an opossum (Didelphis albiventris) in Minas Gerais State, Brazil. T. rangeli sequences were compared by BLASTX (Basic Local Alignment Search Tool X) analysis with the assembled contigs of Leishmania braziliensis, Leishmania infantum, Leishmania major, Trypanosoma brucei, and Trypanosoma cruzi. Results revealed that 82% (182/222) of the sequences matched previously described predicted proteins, whereas 18% (40/222) of the sequences did not show significant identity with sequences deposited in databases, suggesting that they may represent T. rangeli-specific sequences. Of the 222 sequences, 179 (80.6%) had the highest similarity with T. cruzi, 2 (0.9%) with T. brucei, and 1 (0.5%) with L. braziliensis. Computer analysis permitted the identification of members of various gene families described for trypanosomatids in the genome of T. rangeli, such as trans-sialidases, mucin-associated surface proteins, and major surface proteases (MSP or gp63). This is the first report identifying sequences of the MSP family in T. rangeli. Multiple sequence alignments showed that the predicted MSP of T. rangeli presented the typical characteristics of metalloproteases, such as the presence of the HEXXH motif, which corresponds to a region previously associated with the catalytic site of the enzyme, and various cysteine and proline residues, which are conserved among MSPs of different trypanosomatid species. Reverse transcriptase–polymerase chain reaction analysis revealed the presence of MSP transcripts in epimastigote forms of T. rangeli.

Introduction

Protozoans of the family Trypanosomatidae comprise a large group of species, some of them being the causative agents of diseases of worldwide importance. For example, Leishmania spp. are responsible for a group of diseases known as leishmaniasis, Trypanosoma brucei causes sleeping sickness or African trypanosomiasis, and Trypanosoma cruzi is the causative agent of Chagas' disease, a condition that affects approximately 10 million people, especially in Latin America (Schofield et al. 2006).

T. cruzi is sympatric with other trypanosomatids, including Trypanosoma rangeli, which can infect humans but does not cause disease (Hoare 1972). T. rangeli has been detected in vertebrate and invertebrate hosts of T. cruzi in different regions of Brazil (Miles et al. 1983, Steindel et al. 1991, Coura et al. 1996, Ramirez et al. 2002). T. cruzi and T. rangeli are distinct protozoans in terms of morphological, biological, and biochemical aspects. However, patients infected with T. rangeli produce antibodies that recognize T. cruzi antigens in immunological tests (Vásquez et al. 1997, Cuba-Cuba 1998), a fact that makes the specific diagnosis of the chronic phase of Chagas' disease even more difficult. Thus, the development of new techniques for the correct identification of these trypanosomatids is of utmost importance.

The analysis of parasite DNA sequences has become a useful tool for the establishment of different aspects of the biology of these organisms, for the identification of cell function-related genes, for the selection of new drug targets, and for the identification of antigens that can be used for diagnosis or the development of vaccines. In the genome survey sequencing (GSS) approach, random sequences of a given organism are generated and used for the investigation of coding regions based on their similarity with sequences deposited in public databases, and this approach does not require complete sequencing of the organism's genome. In view of the diverse characteristics of protozoans, such as relatively compact genomes and a high gene density, the GSS approach has been applied successfully to the discovery of genes in Cryptosporidium parvum, Leishmania braziliensis, Leishmania major, T. cruzi, and Trypanosoma vivax (Liu et al. 1999, Porcel et al. 2000, Akopyants et al. 2001, Laurentino et al. 2004, Guerreiro et al. 2005).

Despite the importance of T. rangeli in the epidemiology of American trypanosomiasis, little is known about its genome organization and few DNA sequences of the parasite are available in databases. This knowledge would facilitate the development of new tools for the differential diagnosis of infections caused by T. cruzi and T. rangeli. Here we describe the in silico analysis of 222 T. rangeli genomic sequences generated by the GSS approach. Bioinformatic analysis permitted the identification of 40 parasite sequences distinct from sequences available in databanks and of members of the major surface protease (MSP or gp63) family, a gene family not yet reported in this parasite.

Materials and Methods

Parasite strain and culture conditions

T. rangeli strain P07 was isolated from the blood culture of an opossum (Didelphis albiventris) captured in Minas Gerais, Brazil (Ramirez et al. 2002). Parasite samples maintained for approximately 2 months in culture were cryopreserved in liquid nitrogen. The parasite strain was thawed and cultured at 28°C in liver infusion tryptose medium supplemented with 3% (v/v) human urine (Ferreira et al. 2007). Parasite density was determined by counting in a hemocytometer.

DNA and RNA purification

Genomic DNA was isolated from T. rangeli P07 by alkaline lysis of a 40 mL culture of epimastigotes in the exponential phase of growth (Lages-Silva et al. 2001). Plasmid DNA was extracted by the alkaline lysis method (Sambrook et al. 1989). Briefly, selected clones were inoculated into 2 mL LB (Luria-Bertani) medium (2% tryptose, 0.5% yeast extract, and 8.5 mM NaCl) containing 50 μg/mL ampicillin and incubated for 18 h at 37°C under constant shaking. Total RNA was purified from approximately 10⁷ epimastigote forms of T. rangeli (P07) using the TRIzol® reagent (Invitrogen, Carlsbad, CA). Aliquots containing 10 μg of total RNA were analyzed in a 1.2% agarose formaldehyde/MOPS (3-(N-morpholino) propanesulfonic acid) gel stained with ethidium bromide (Sambrook et al. 1989) to assess RNA integrity.

Library construction

Approximately 500 ng genomic DNA from T. rangeli P07 was digested with BamHI and BglII restriction endonucleases according to the manufacturer's instructions (Gibco BRL, Rockville, MD). The DNA samples were electrophoresed through 0.7% agarose gel and the bands corresponding to DNA fragments of 2–4 kb were recovered from the gel and purified using the GFX PCR DNA kit (Amersham Biosciences, Freiburg, Germany). The inserts were ligated into pUC18 linearized by digestion with BamHI using T4 DNA ligase (Invitrogen) and transformed into Escherichia coli DH10B by electroporation. The clones obtained were selected by incubation on LB medium supplemented with 1.5% bacteriological agar, 50 μg/mL ampicillin, 80 μg/mL X-gal, and 80 μg/mL IPTG (isopropyl β-D-thiogalactopyranoside). A total of 384 clones were individually organized on four microtiter plates containing 100 μL LB medium and 50 μg/mL ampicillin and incubated for 16 h at 37°C. The recombinant clones were identified based on the positions that they occupied on the different plates. The quality of the library was evaluated by double digestion of plasmid DNA with HindIII and EcoRI (Gibco BRL) and the digestion products were analyzed on 1.2% agarose gel.
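The digestion and size-selection step can be illustrated in silico. The following is a minimal sketch, not the procedure used in this study: it assumes a linear molecule and complete digestion, and the toy sequence is invented for illustration. The recognition sites (BamHI G^GATCC, BglII A^GATCT) are standard; both enzymes leave compatible 5′-GATC overhangs, which is why BamHI/BglII fragments can be ligated into BamHI-linearized pUC18.

```python
import re

# Recognition site and top-strand cut offset within the site:
# BamHI cuts G^GATCC, BglII cuts A^GATCT (both leave 5'-GATC overhangs).
ENZYMES = {"BamHI": ("GGATCC", 1), "BglII": ("AGATCT", 1)}


def cut_positions(seq, enzymes=ENZYMES):
    """Return sorted top-strand cut coordinates for all enzymes (linear DNA)."""
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        # A lookahead pattern also reports overlapping occurrences.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def fragment_lengths(seq, enzymes=ENZYMES):
    """Lengths of the fragments produced by complete digestion."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def size_select(lengths, low=2000, high=4000):
    """Keep fragments inside the excised gel window (bp, inclusive)."""
    return [n for n in lengths if low <= n <= high]


# Toy sequence, invented for illustration (not T. rangeli DNA).
toy = "A" * 10 + "GGATCC" + "C" * 20 + "AGATCT" + "T" * 5
toy_fragments = fragment_lengths(toy)
```

The default window in `size_select` mirrors the 2–4 kb range recovered from the gel; the toy sequence is far shorter and only demonstrates where the cuts fall.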

Nucleic acid analysis

The nucleotide sequences were analyzed in an automatic ABI 3100 sequencer (Applied Biosystems, Foster City, CA) using the BigDye kit (Applied Biosystems) and the M13 forward primer. The reaction was carried out in an Eppendorf Mastercycler Gradient thermocycler under the following conditions: 96°C for 2 min, followed by 40 cycles at 96°C for 15 s, 50°C for 15 s, and 60°C for 4 min. The chromatograms generated by sequencing of the clones of the genome library were processed with the PHRED program. The sequences obtained were deposited in the GSS database of GenBank (accession numbers FI104298 to FI104302, FI111014 to FI111027, FI569251 to FI569456, and GS815965 to GS815968).
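PHRED assigns each base call a quality score Q related to the estimated error probability P by Q = −10 log₁₀(P). The sketch below shows this conversion and one simple way of selecting a high-quality read segment. The threshold of Q ≥ 20 and the function names are assumptions made for illustration; the quality cutoff applied in this study is not stated.

```python
import math


def phred_to_error(q):
    """Base-call error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def longest_high_quality_window(quals, min_q=20):
    """Return (start, end) of the longest run with quality >= min_q.

    The end index is exclusive; (0, 0) is returned when no base qualifies.
    min_q=20 (1% error) is an assumed threshold, not taken from the paper.
    """
    best = (0, 0)
    start = None
    # A sentinel below any valid score closes a run that reaches the read end.
    for i, q in enumerate(list(quals) + [-1]):
        if q >= min_q and start is None:
            start = i
        elif q < min_q and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best
```

For example, a read with per-base qualities `[5, 25, 30, 10, 22, 35, 40, 8]` would be trimmed to positions 4 through 6 under this rule.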

Reverse transcriptase–polymerase chain reaction (PCR) was performed with MSP-specific primers (Tr-MSP-Forw: 5′-CGTTGTCCGATTGAAGGTTT-3′; Tr-MSP-Rev: 5′-TCAGTGACCCACGACAACAT-3′) and 0.2 μg of T. rangeli total RNA using the kit SuperScript® One-Step RT-PCR System with Platinum® Taq DNA Polymerase (Invitrogen). Amplification products were analyzed in ethidium bromide-stained 1.5% agarose gels.
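Basic properties of the two primers can be checked with a few lines of code. This is only a rough sketch: the Wallace rule, Tm = 2(A+T) + 4(G+C), is a coarse approximation for short oligonucleotides and is not the method used to design these primers; the function names are illustrative.

```python
PRIMERS = {
    "Tr-MSP-Forw": "CGTTGTCCGATTGAAGGTTT",
    "Tr-MSP-Rev": "TCAGTGACCCACGACAACAT",
}


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Approximate melting temperature (degrees C) by the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement, i.e. the strand the reverse primer anneals to."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


summary = {
    name: (len(seq), gc_fraction(seq), wallace_tm(seq))
    for name, seq in PRIMERS.items()
}
```

The reverse complement of Tr-MSP-Rev is the sequence expected at the 3′ end of the amplicon on the forward strand, which is useful when locating the product within FI111023.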

Sequence analysis

The process of in silico analysis of T. rangeli sequences comprised the following steps: (i) search for similarity with sequences deposited in nonredundant protein databases; (ii) comparison with genomes of other trypanosomatids whose genomes have been completely sequenced or are in the phase of annotation: T. cruzi clone CL Brener, T. brucei strain TREU927/4, L. major Friedlin, Leishmania infantum JPCM5, and L. braziliensis M2904 (all available at www.genedb.org); (iii) BLASTN (Basic Local Alignment Search Tool N; search of a nucleotide database using a nucleotide query) comparison of the sequences generated with T. rangeli sequences deposited in the nucleotide databank of GenBank.

Local alignments of T. rangeli sequences against the available complete genomes of other organisms were performed by BLAST (Basic Local Alignment Search Tool) searches (Altschul et al. 1997). The alignments were considered to be valid when presenting E-values ≤1.0 × 10⁻¹² for nucleotide analysis and ≤1.0 × 10⁻⁷ for proteins. The annotation and graphical output of the sequences generated were performed using in-house–developed PERL scripts to analyze and format the results. Individual analysis of the results permitted the exclusion of alignments of low-complexity regions. Multiple alignments of protein sequences were performed using the ClustalW program (Thompson et al. 1994).
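The E-value filtering and best-hit selection described above can be sketched as follows. This is not the in-house PERL code used in the study. It assumes that hits are available in the 12-column tab-separated layout produced by BLAST+ (`-outfmt 6`, with the E-value in the 11th column), which may differ from the output format originally parsed, and all sequence identifiers in the sample are hypothetical.

```python
import csv
import io

# Cutoffs stated in the text: <= 1e-12 (nucleotide), <= 1e-7 (protein).
MAX_EVALUE = {"nucleotide": 1.0e-12, "protein": 1.0e-7}


def best_hits(tabular_text, search_type):
    """Return {query_id: (subject_id, evalue)} for the lowest E-value hit
    per query, considering only hits that pass the cutoff."""
    cutoff = MAX_EVALUE[search_type]
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical hits; identifiers and values are invented for illustration.
SAMPLE = "\n".join([
    "gss_001\tTc_protA\t55.0\t200\t90\t0\t1\t600\t1\t200\t1e-40\t160",
    "gss_001\tTb_protB\t39.0\t200\t122\t0\t1\t600\t1\t200\t1e-15\t80",
    "gss_002\tLm_protC\t30.0\t90\t63\t0\t1\t270\t1\t90\t1e-03\t35",
])
sample_best = best_hits(SAMPLE, "protein")
```

In this sample, `gss_001` is assigned to its T. cruzi hit and `gss_002` is left without a significant match. Exclusion of low-complexity alignments, which was done by individual inspection in the study, is not modeled here.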

Results

Library validation

Plasmid DNA of the recombinant clones was digested with BamHI and BglII. Sixty-nine of the 384 plasmids analyzed were discarded because they did not contain the insert. Plasmid DNA of the remaining 315 clones was sequenced. Of these, 15 (4.8%) additional clones also did not contain the insert, 6 (1.9%) possessed small inserts (less than 100 bp), and 41 (13%) provided sequences of low quality; these clones were not included in this study. After searching for similarity with other sequences in databases, 31 (9.8%) sequences were excluded because they were present more than once in our database. Thus, 222 valid nonredundant sequences were analyzed. These sequences presented a mean GC content of 51.9% and a mean insert size of 1.5 kb.
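The clone accounting above can be retraced with simple arithmetic. The sketch below uses only the counts given in the text; the variable names are illustrative, and percentages are computed relative to the 315 sequenced clones.

```python
picked = 384                 # clones arrayed on microtiter plates
no_insert_by_digest = 69     # discarded after restriction analysis
sequenced = picked - no_insert_by_digest

# Clones or sequences excluded after sequencing, as listed in the text.
excluded = {
    "no insert": 15,
    "insert < 100 bp": 6,
    "low-quality sequence": 41,
    "redundant": 31,
}
valid = sequenced - sum(excluded.values())


def pct(n, total):
    """Percentage rounded to one decimal place."""
    return round(100 * n / total, 1)


shares = {reason: pct(n, sequenced) for reason, n in excluded.items()}
```

Here `sequenced` evaluates to 315 and `valid` to 222, in agreement with the numbers reported.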

Similarity with sequences deposited in public databases

The similarity between the T. rangeli sequences generated and sequences of complete trypanosomatid genomes deposited in databases (www.genedb.org) was analyzed using BLASTX (Basic Local Alignment Search Tool X), considering the best hit (lowest E-value) of sequence alignments. Among the 222 sequences selected, 179 (80.6%) showed the best hit with T. cruzi, 2 (0.9%) with T. brucei, and 1 (0.5%) with L. braziliensis (Fig. 1). Forty (18%) sequences showed no significant identity with the sequences available in the database, suggesting that they are specific for T. rangeli (Fig. 1). In addition, the search for T. rangeli sequences present in our library and already available in GenBank revealed 12 sequences (5.4%), indicating that 210 of the sequences generated were new sequences of the parasite.

FIG. 1.

Distribution of the 222 valid Trypanosoma rangeli sequences according to best hits (lowest E-values) with genomic sequences of trypanosomatids available in databases (www.genedb.org). Hits presenting the lowest E-values were considered for analysis.

With respect to the evolutionary conservation of the 182 predicted proteins of T. rangeli in the genome of other trypanosomatids, 109 (59.9%) were shared with T. brucei, T. cruzi, and Leishmania spp., 30 (16.5%) were observed in T. cruzi and T. brucei, 11 (6.1%) were shared with T. cruzi and Leishmania spp., and 31 (17.0%) were shared only with T. cruzi. Only one sequence (0.5%) was shared with T. brucei and Leishmania and not with T. cruzi.
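The grouping of sequences by the set of organisms in which a significant match was found can be expressed compactly. The following is an illustrative sketch: the species codes follow the abbreviations used in Table 2, the three Leishmania species are pooled as in the text, and the example records are invented.

```python
from collections import Counter

LEISHMANIA = {"Lm", "Li", "Lb"}


def conservation_class(hits):
    """Label a sequence by the organisms with a significant match.

    hits: iterable of species codes (Tb, Tc, Lm, Li, Lb).
    """
    groups = {"Leishmania" if code in LEISHMANIA else code for code in hits}
    return " + ".join(sorted(groups)) or "no hit"


# Invented example records: sequence name -> species with a significant hit.
toy_hits = {
    "seq_a": {"Tb", "Tc", "Lm", "Li", "Lb"},
    "seq_b": {"Tc"},
    "seq_c": {"Tb", "Tc"},
    "seq_d": {"Tc", "Lb"},
    "seq_e": set(),
}
toy_counts = Counter(conservation_class(h) for h in toy_hits.values())

# Counts reported in the text for the 182 predicted proteins.
REPORTED = {
    "Leishmania + Tb + Tc": 109,
    "Tb + Tc": 30,
    "Leishmania + Tc": 11,
    "Tc": 31,
    "Leishmania + Tb": 1,
}
reported_total = sum(REPORTED.values())
```

The five reported categories sum to 182, so every predicted protein falls into exactly one class.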

The Cluster of Orthologous Groups (www.ncbi.nlm.nih.gov/cog) and Gene Ontology (www.geneontology.org) databases were used for the functional classification of the sequences. Based on BLASTX searches performed in the assembled trypanosomatid genome sequences, 49.5% (90/182) of the T. rangeli GSSs were classified as hypothetical proteins and 50.5% (92/182) as predicted protein products. Most of the predicted proteins are involved in cell communication, metabolism, and cell growth and maintenance (Table 1).

Table 1.

Distribution of the Sequences Found According to Predicted Function by Comparison with Databases^a

Predicted proteins	Number of sequences
Cell communication	29
Cell growth and maintenance	13
Metabolism	15
Protein metabolism	3
RNA metabolism	5
Protein transport	5
Unknown function	25
Hypothetical proteins	90

^a The Cluster of Orthologous Groups (www.ncbi.nlm.nih.gov/cog) and Gene Ontology (www.geneontology.org) databases were used for functional classification of the sequences.

Table 2 shows the distribution of protein-coding sequences of T. rangeli according to predicted function and the number of representatives showing significant similarity with the sequences of other trypanosomatids. Considering the similarities in the predicted proteins shared by the five trypanosomatid species, we observed that 13 sequences of the genomic library of T. rangeli presented sequence identity with the MSPs of other trypanosomatids. These were followed by tyrosine aminotransferase, protein kinase, and heat shock protein, each with four representatives in the T. rangeli sequences obtained. Among the predicted sequences shared only by species of the genus Trypanosoma, trans-sialidase had the largest number of representatives in the T. rangeli GSS (10 sequences), followed by elements related to non-LTR (long-terminal repeat) retrotransposons (retrotransposon hot spot elements, 7), mucin-associated surface protein (4), and ubiquitin hydrolase (2).

Table 2.

Distribution of Trypanosoma rangeli Genome Survey Sequences and Number of Sequences Showing High Percentage of Similarity with Other Trypanosomatids

Sequences of Trypanosoma rangeli		Representatives with similarity in databases
Predicted protein/sequence name	Number (GenBank accession number)	Tb	Tc	Lm	Li	Lb
Amino acid transporter	2 (FI569273 and FI569437)	2	2	2	2	2
Calpain-like cysteine peptidase	1 (FI569398)	1	1	1	1	1
CCR4-associated factor	1 (FI569268)	1	1	1	1	1
Chaperone DNAJ protein	1 (FI569253)	1	1	1	1	1
Citrate synthase	1 (FI569333)	—	1	—	—	—
Condensin, subunit 1	1 (FI569364)	1	1	1	1	1
Dispersed gene family protein 1	2 (FI569442 and GS815964)	—	2	—	1	1
Dynein heavy chain	1 (FI569376)	1	1	1	1	1
Elongation factor 2	1 (FI569414)	1	1	1	1	1
ER (Endoplasmic reticulum) lumen retaining receptor protein	1 (FI569310)	1	1	1	1	1
Expression site-associated gene	1 (FI569327)	1	1	—	—	—
Fatty acid elongase	1 (FI569303)	1	1	1	1	1
Geranyl transferase type II	1 (FI569423)	1	1	1	1	—
Glycosyl transferase-like protein	2 (FI569455 and FI569379)	2	2	2	2	2
Guanine nucleotide-binding protein	1 (FI569390)	1	1	1	—	1
Heat shock protein 40	1 (FI569313)	1	1	1	1	1
Heat shock protein 70	3 (FI569450, FI569432, and FI569365)	3	3	3	3	3
Hexose transporter	3 (FI569412, FI569361, and FI569351)	3	3	3	3	3
Kinesin	3 (FI569267, FI569454, and FI569337)	2	3	2	2	2
KU80	1 (FI569399)	1	1	1	1	1
Major surface protease	13 (FI111027, FI111026, FI111025, FI111024, FI111023, FI111022, FI111021, FI111020, FI111019, FI111018, FI111017, FI111016, and FI111014)	12	13	11	11	10
Meiosis recombination protein	1 (FI569353)	1	1	—	—	—
Membrane-bound acid phosphatase	3 (FI569320, FI569428, and FI569401)	3	3	3	3	3
Mitotic cyclin	1 (FI569326)	1	1	—	—	—
Mucin-associated surface protein	4 (FI569258, FI569377, FI569335, and GS815965)	1	4	—	—	—
Phosphatidylinositol 3-related kinase	1 (FI569370)	1	1	—	—	—
Protein kinase	4 (FI569325, FI569443, FI569381, and FI569348)	3	4	2	2	2
Protein phosphatase	1 (FI569431)	—	1	1	1	1
Pyrroline-5-carboxylate synthetase	1 (FI569307)	—	1	1	1	1
Retrotransposon hot spot (RHS element)	7 (FI569324, FI569312, FI569456, FI569425, FI569408, FI569299, and FI569368)	3	7	—	—	—
RNA-binding protein	1 (FI569323)	1	1	1	1	1
RNA-editing protein	1 (FI569255)	1	1	1	1	1
rRNA methyltransferase	1 (FI569298)	1	1	1	1	1
Serine/threonine-protein kinase	3 (FI569316, FI569270, and FI569343)	2	3	2	2	2
Succinyl-CoA:3-ketoacid-coenzyme A transferase	1 (FI569391)	1	1	1	1	1
Syntaxin, putative	1 (FI569280)	1	1	1	1	1
Thioredoxin	1 (FI569438)	1	1	1	1	1
Transcription modulator/accessory	1 (FI569300)	1	1	1	1	1
Trans-sialidase	10 (FI569266, FI569449, FI569434, FI569413, FI569409, FI569404, FI569402, FI569388, FI569372, and FI569344)	6	10	—	—	—
Tyrosine aminotransferase	4 (FI569367, FI569285, FI569309, and FI569366)	1	4	4	4	4
Ubiquitin hydrolase	2 (FI569331 and FI569451)	2	2	2	2	2
Vacuolar protein sorting complex	1 (FI569293)	1	1	1	1	1
Hypothetical proteins	90	71	89	58	59	54
Total	182	140	181	116	117	111

Tb, Trypanosoma brucei; Tc, Trypanosoma cruzi; Lm, Leishmania major; Li, Leishmania infantum; Lb, Leishmania braziliensis.

Identification and expression of the MSP sequences (gp63) in the genome of T. rangeli

BLASTX analyses of 13 predicted T. rangeli MSPs revealed the highest sequence identity with T. cruzi MSPs (38–65% identity) and the lowest sequence identity with L. infantum (29–40% identity). Only one clone (accession number FI111021) had significant and exclusive identity with the MSPs of T. cruzi, and sequence FI111014 had significant sequence identity only with parasites of the genus Trypanosoma (Table 3). In addition to the significant sequence identity, other characteristics suggest that these sequences correspond to members of the MSP family, including the presence of the HEXXH motif, which contains two histidines and one glutamic acid conserved in all trypanosomatid species studied and corresponds to a region previously associated with the catalytic site of Leishmania MSP (Fig. 2A).
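Locating the HEXXH motif in a predicted protein sequence amounts to a simple pattern search. The sketch below is illustrative only: the peptide is invented, and it assumes that the nucleotide sequence has already been translated in the reading frame of the alignment, a step that BLASTX performs internally and that is not reproduced here.

```python
import re

# H-E-any-any-H; the lookahead also reports overlapping occurrences.
HEXXH = re.compile(r"(?=(HE[A-Z]{2}H))")


def find_hexxh(protein):
    """Return (0-based position, pentapeptide) for each HEXXH occurrence."""
    return [(m.start(), m.group(1)) for m in HEXXH.finditer(protein.upper())]


# Invented peptide for illustration (not a T. rangeli sequence).
toy_peptide = "MKTLLVAHEVAHALGFSHQW"
toy_motifs = find_hexxh(toy_peptide)
```

In the toy peptide the motif is found once, starting at position 7 (`HEVAH`). A match alone does not establish metalloprotease identity, which is why the motif is considered here together with sequence similarity and the conserved cysteine and proline residues.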

FIG. 2.

Multiple sequence alignment analysis and expression of T. rangeli major surface proteases (MSPs). (A) Comparison between the amino acid sequences of the predicted MSPs of T. rangeli (GenBank number FI111023), Trypanosoma cruzi (XP_821023), Trypanosoma brucei (XP_846998), Leishmania braziliensis (XP_001567219), Leishmania major (AAC39120), and Leishmania infantum (XP_001463697) was performed with ClustalW software. The FI111023 sequence of T. rangeli shows the characteristic residues of the catalytic site of MSPs (highlighted in gray). (B) Ethidium bromide-stained agarose gel showing the amplification products of reverse transcriptase–polymerase chain reaction with a primer pair (Tr-MSP-Forw and Tr-MSP-Rev) specific for an MSP sequence and total RNA obtained from P07 (lane 1) and SO29 (lane 2) strains of T. rangeli as templates. A reaction containing parasite RNA as a template without the reverse transcriptase was performed as a negative control (lane 3). Unincorporated oligonucleotides are seen as diffuse bands below 100 bp. MM, 100 bp molecular marker (Invitrogen). *, residues or nucleotides in column are identical in all sequences in the alignment; :, conserved substitutions have been observed; •, semi-conserved substitutions are observed.

Table 3.

Trypanosoma rangeli Predicted Major Surface Protease Sequences and Their Similarities with Major Surface Proteases of Other Trypanosomatids

	Percentage of similarity of MSP proteins from trypanosomatid databanks and respective GeneDB identification numbers
GenBank number of predicted Trypanosoma rangeli MSP sequences	Trypanosoma brucei	Trypanosoma cruzi	Leishmania major	Leishmania infantum	Leishmania braziliensis
FI111014	32 (Tb11.02.5640)	38 (Tc00.1047053506587.100)	—	—	—
FI111016	39 (Tb08.29O9.350)	60 (Tc00.1047053510281.20)	34 (LmjF10.0465)	30 (LinJ10.0780)	30 (LbrM31_V2.2260)
FI111017	39 (Tb11.02.5640)	52 (Tc00.1047053505965.10)	31 (LmjF31.2000)	29 (LinJ31.2360)	—
FI111018	54 (Tb08.29O9.350)	55 (Tc00.1047053511257.60)	34 (LmjF31.2000)	31 (LinJ31.2360)	32 (LbrM31_V2.2260)
FI111019	43 (Tb11.02.5630)	54 (Tc00.1047053508475.30)	41 (LmjF10.0480)	40 (LinJ10.0810)	42 (LbrM10_V2.0590)
FI111020	54 (Tb08.29O9.350)	57 (Tc00.1047053508545.40)	30 (LmjF31.2000)	31 (LinJ31.2360)	31 (LbrM31_V2.2260)
FI111021	—	65 (Tc00.1047053506921.10)	—	—	—
FI111022	35 (Tb08.29O9.350)	54 (Tc00.1047053506289.140)	32 (LmjF31.2000)	31 (LinJ31.2360)	30 (LbrM31_V2.2260)
FI111023	59 (Tb08.29O9.350)	55 (Tc00.1047053506435.370)	35 (LmjF10.0480)	36 (LinJ10.0780)	42 (LbrM31_V2.2260)
FI111024	39 (Tb08.29O9.350)	61 (Tc00.1047053510281.20)	35 (LmjF10.0460)	29 (LinJ31.2360)	34 (LbrM10_V2.1690)
FI111025	59 (Tb08.29O9.350)	60 (Tc00.1047053507993.350)	37 (LmjF31.2000)	35 (LinJ31.2360)	35 (LbrM31_V2.2260)
FI111026	39 (Tb08.29O9.350)	52 (Tc00.1047053506289.170)	35 (LmjF10.0460)	29 (LinJ31.2360)	37 (LbrM10_V2.0560)
FI111027	58 (Tb08.29O9.350)	56 (Tc00.1047053508545.40)	31 (LmjF31.2000)	31 (LinJ31.2360)	32 (LbrM31_V2.2260)

MSP, major surface protease.

Expression of T. rangeli MSP transcripts was assessed by reverse transcriptase-PCR using a pair of primers designed from a sequence that codes for the putative catalytic site of MSPs (GenBank accession number FI111023). The predicted 212 bp product was detected in samples containing T. rangeli RNA and no amplification was observed in a negative control sample (Fig. 2B). An additional band of approximately 160 bp was observed in the sample from the P07 strain (Fig. 2B, lane 1), suggesting the occurrence of polymorphisms within the coding regions of T. rangeli MSP genes.

Discussion

Diseases caused by trypanosomatids are collectively considered to be neglected diseases because few resources are invested by the private sector in the development of new tools for the treatment and diagnosis of these diseases. In the case of Chagas' disease the situation is slightly more complicated because the serological diagnosis of chagasic infection might be confused with infection by a nonpathogenic organism, that is, T. rangeli, generating a marked socioeconomic impact in countries where the disease is endemic. Recent studies have estimated approximately 2600 documented human single or mixed infections by T. rangeli in Latin America (reviewed in Guhl and Vallejo 2003). The occurrence of T. rangeli infections in Brazil is considered rare (Coura et al. 1996); however, new cases of human infection with this parasite have been described recently (De Sousa et al. 2008). Within this context, methods based on the detection of parasite nucleic acids would provide an alternative for the development of new strategies for the differential diagnosis of infections caused by T. cruzi and T. rangeli. However, the development of such methods requires the generation of various sequences, and although the genomic sequence of T. cruzi has been determined (El-Sayed et al. 2005a), the number of T. rangeli sequences available in GenBank remains very small. In this study, 222 high-quality sequences obtained from the genome of T. rangeli P07 were generated. Only 12 (5.4%) of these sequences have been previously sequenced and are available in the nucleotide databank of GenBank. The mean GC content of the sequences generated was 51.9%, a value closer to that of T. cruzi (51%) than to those reported for T. brucei (46.4%) and L. major (59.7%) (El-Sayed et al. 2005b). In silico analysis also demonstrated high conservation between the predicted coding regions of T. rangeli and the sequences of other trypanosomatids, especially T. cruzi, with 80.6% of the T. rangeli sequences presenting the best hit with this parasite. The high conservation of coding regions between T. rangeli sequences and other trypanosomatids agrees with the results of a previous comparative analysis between genomic sequences of T. brucei, T. cruzi, and L. major (El-Sayed et al. 2005b). In addition, the high percentage of T. rangeli sequences presenting the best hit with T. cruzi compared with the small number of sequences presenting the best hit with T. brucei (0.9%) and Leishmania (0.5%) supports the hypothesis that T. rangeli is phylogenetically more closely related to T. cruzi (Stercoraria) than to T. brucei (Salivaria), as demonstrated by ribosomal RNA sequence analysis of various species of the family Trypanosomatidae (Hughes and Piontkivska 2003).

Comparative analysis between the genomes of T. cruzi, T. brucei, and L. major has shown that the genome of these organisms consists of 32%, 26%, and 12% species-specific sequences, respectively, most of them apparently being members of surface antigen families (El-Sayed et al. 2005b). The T. rangeli sequences that did not show similarity to sequences deposited in GenBank accounted for 18% of the sequences generated (40 sequences). Subsequent studies may indicate whether these sequences correspond to coding regions of the parasite genome and, ultimately, whether they correspond to genes associated with the survival strategy of this parasite in its vertebrate and invertebrate hosts. Further, species-specific sequences may serve as targets for the detection of parasite DNA by PCR.

In this study, various sequences encoding predicted proteins in T. rangeli that showed restricted identity to T. brucei and T. cruzi were detected, including trans-sialidases, mucin-associated surface protein, MSPs, and dispersed gene family-1. In T. brucei and T. cruzi, these sequences are mainly found in subtelomeric regions and seem to be associated with specific strategies of host cell invasion (El-Sayed et al. 2005b).

MSPs or gp63 represent a class of surface glycoproteins with metalloprotease activity, which were first described in Leishmania (Russell and Wilhelm 1986), and subsequently in T. brucei (El-Sayed and Donelson 1997), T. cruzi (Cuevas et al. 2003), and the so-called lower trypanosomatids (reviewed in Santos et al. 2006). In this study, we identified for the first time in T. rangeli 13 sequences corresponding to MSPs described in other trypanosomatids. Some of the T. rangeli MSPs isolated present the HEXXH motif, which is characteristic of the catalytic site of metalloproteases and contains two histidines and one glutamic acid that are conserved among all trypanosomatid MSP sequences studied and are essential for proteolytic activity (McGwire and Chang 1996). The presence of MSP transcripts in the epimastigote forms of T. rangeli opens new possibilities to study the function of this gene family in a nonpathogenic trypanosome. The various cysteine and proline residues found in the T. rangeli MSP sequences are related to the formation of the three-dimensional structure of the enzyme. In Leishmania spp., T. brucei, and T. cruzi, the MSP genes are polymorphic and are arranged in tandem. The GSS approach does not permit determination of the genomic organization of individual genes. However, the high percentage of MSP sequences analyzed (5.85%) may reflect the process of construction of our GSS library, which involved the cloning of BamHI/BglII restriction fragments and can lead to the overrepresentation of tandemly repeated genes. Additionally, the fact that, in most cases, T. rangeli MSPs rescued distinct sequences from the trypanosomatid genomes studied suggests that T. rangeli MSPs are also polymorphic.

Analysis of genomic sequences and the determination of the presence of predicted surface proteins are relevant for the understanding of the host–parasite relationship in T. rangeli infection, especially the life cycle of the parasite in the vertebrate host, which is a fundamental but still poorly understood aspect in the biology of T. rangeli parasitism.

Acknowledgments

The authors thank Dr. Angela Kaysel Cruz, Departamento de Biologia Celular e Molecular e Bioagentes Patogênicos, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Brazil, for permitting the use of the sequencing facilities, and Tânia Paula Aquino Defina for technical assistance in processing DNA sequences. This study was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, grants 479184/2008-9 and 301375/2005-4) and Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG, grant APQ-00135-08). This study was taken in part from the thesis to be submitted by K.A.M.F. for the partial fulfillment of the degree of Doctor of Philosophy, Curso de Pós-Graduação em Medicina Tropical e Infectologia, Universidade Federal do Triângulo Mineiro, Brazil. All experiments were conducted according to Brazilian laws.

Disclosure Statement

No competing financial interests exist.

References

Akopyants, Clifton, Martin, Pape et al. A survey of the Leishmania major Friedlin strain V1 genome by shotgun sequencing: a resource for DNA microarrays and expression profiling. Mol Biochem Parasitol, 2001; 113:337–340.

Altschul, Madden, Schäffer, Zhang et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 1997; 25:3389–3402.

Coura, Fernandes, Arboleda, Barrett et al. Human infection by Trypanosoma rangeli in the Brazilian Amazon. Trans R Soc Trop Med Hyg, 1996; 90:278–279.

Cuba-Cuba. Review of the biologic and diagnostic aspects of Trypanosoma (Herpetosoma) rangeli. Rev Soc Bras Med Trop, 1998; 31:207–220.

Cuevas, Cazzulo, Sanchez. gp63 homologues in Trypanosoma cruzi: surface antigens with metalloprotease activity and a possible role in host cell infection. Infect Immun, 2003; 71:5739–5749.

De Sousa, da Silva Fonseca, Dos Santos, Dos Santos Pereira et al. Trypanosoma rangeli Tejera, 1920, in chronic Chagas' disease patients under ambulatory care at the Evandro Chagas Clinical Research Institute (IPEC-Fiocruz, Brazil). Parasitol Res, 2008; 103:697–703.

El-Sayed, Donelson. African trypanosomes have differentially expressed genes encoding homologues of the Leishmania GP63 surface protease. J Biol Chem, 1997; 272:26742–26748.

El-Sayed, Myler, Bartholomeu, Nilsson et al. The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science, 2005a; 309:409–415.

El-Sayed, Myler, Blandin, Berriman et al. Comparative genomics of trypanosomatid parasitic protozoa. Science, 2005b; 309:404–409.

Ferreira, Lemos-Junior, Lages-Silva, Ramirez et al. Human urine stimulates in vitro growth of Trypanosoma cruzi and Trypanosoma rangeli. Parasitol Res, 2007; 101:1383–1388.

Guerreiro, Souza, Wagner, De Souza et al. Exploring the genome of Trypanosoma vivax through GSS and in silico comparative analysis. OMICS, 2005; 9:116–128.

Guhl, Vallejo. Trypanosoma (Herpetosoma) rangeli Tejera, 1920: an updated review. Mem Inst Oswaldo Cruz, 2003; 98:435–442.

Hoare. Herpetosoma from man and other mammals. The Trypanosomes of Mammals: A Zoological Monograph. Hoare. Oxford: Blackwell Scientific Publications, 1972; 288–323.

Hughes, Piontkivska. Phylogeny of Trypanosomatidae and Bodonidae (Kinetoplastida) based on 18S rRNA: evidence for paraphyly of Trypanosoma and six other genera. Mol Biol Evol, 2003; 20:644–652.

Lages-Silva, Crema, Ramirez, Macedo et al. Relationship between Trypanosoma cruzi and human chagasic megaesophagus: blood and tissue parasitism. Am J Trop Med Hyg, 2001; 65:435–441.

Laurentino, Ruiz, Fazelinia, Myler et al. A survey of Leishmania braziliensis genome by shotgun sequencing. Mol Biochem Parasitol, 2004; 137:81–86.

Liu, Vigdorovich, Kapur, Abrahamsen. A random survey of the Cryptosporidium parvum genome. Infect Immun, 1999; 67:3960–3969.

McGwire, Chang. Posttranslational regulation of a Leishmania HEXXH metalloprotease (gp63). The effects of site-specific mutagenesis of catalytic, zinc binding, N-glycosylation, and glycosyl phosphatidylinositol addition sites on N-terminal end cleavage, intracellular stability, and extracellular exit. J Biol Chem, 1996; 271:7903–7909.

Miles, Arias, Valente, Naiff et al. Vertebrate hosts and vectors of Trypanosoma rangeli in the Amazon Basin of Brazil. Am J Trop Med Hyg, 1983; 32:1251–1259.

Porcel, Tran, Tammi, Nyarady et al. Gene survey of the pathogenic protozoan Trypanosoma cruzi. Genome Res, 2000; 10:1103–1107.

Ramirez, Lages-Silva, Alvarenga-Franco, Matos et al. High prevalence of Trypanosoma rangeli and Trypanosoma cruzi in opossums and triatomids in a formerly-endemic area of Chagas disease in Southeast Brazil. Acta Trop, 2002; 84:189–198.

Russell, Wilhelm. The involvement of the major surface glycoprotein (gp63) of Leishmania promastigotes in attachment to macrophages. J Immunol, 1986; 136:2613–2620.

Sambrook, Fritsch, Maniatis. Molecular Cloning: A Laboratory Manual, 2nd ed. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, 1989.

Santos, Branquinha, D'Avila-Levy. The ubiquitous gp63-like metalloprotease from lower trypanosomatids: in the search for a function. An Acad Bras Cienc, 2006; 78:687–714.

Schofield, Jannin, Salvatella. The future of Chagas disease control. Trends Parasitol, 2006; 22:583–588.

Steindel, Pinto, Toma, Mangia et al. Trypanosoma rangeli (Tejera, 1920) isolated from a sylvatic rodent (Echimys dasythrix) in Santa Catarina Island, Santa Catarina State: first report of this trypanosome in southern Brazil. Mem Inst Oswaldo Cruz, 1991; 86:73–79.

Thompson, Higgins, Gibson. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res, 1994; 22:4673–4680.

Vásquez, Krusnell, Orn, Sousa et al. Serological diagnosis of Trypanosoma rangeli infected patients. A comparison of different methods and its implications for the diagnosis of Chagas' disease. Scand J Immunol, 1997; 45:322–330.