Getting Useful Information from RNA-Seq Contaminants: A Case of Study in the Oil-Collecting Bee Tetrapedia diversipes Transcriptome

To the Editor:

RNA-Seq is a straightforward technique widely used in studies of gene expression, especially for nonmodel species. This approach yields a comprehensive data set of expressed genes and their frequencies without the need for species-specific probes or a reference genome. Because plant and animal species constantly interact with each other and RNA-Seq is not a species-specific approach, it is highly probable that genes from other species will be found in a transcriptome data set. Indeed, as reported herein, analysis of the Tetrapedia diversipes transcriptome revealed contaminant transcripts from plants and parasites. Deeper exploration of the plant contaminant data set proved to be a useful source of information about the biology and behavior of this bee.

The oil-collecting bee T. diversipes is a solitary species native to the Neotropical region. This species is bivoltine, that is, it has two main reproductive generations during the year: one in the hot and wet season (generation one, G1) and the second in the cold and dry season (generation two, G2). The developmental cycle from egg to adult differs significantly between the two generations because the prepupal larvae from G2 enter diapause (Alves-dos-Santos et al., 2006). To understand the differences between the reproductive generations, we used RNA-Seq to sequence the transcriptome of female foundresses from G1 and G2.

Nine T. diversipes females from each reproductive generation were collected in front of trap nests in the city of São Paulo (Brazil), in an area close to a small secondary semideciduous forest containing many native and ornamental plants (Alves-dos-Santos et al., 2006). To identify the contaminant transcripts, the complete assembled transcriptomes of G1 and G2 T. diversipes foundresses were searched with BLAST against the UniRef database (August 2015) using the Annocript program (v1.2) (Musacchia et al., 2015). Scripts in R (v3.1.3), bash, and Python (v2.7.9), together with manual checking, were then used to identify and select contaminant transcripts from plants (pipeline and scripts available at https://github.com/nat2bee/trans-contamination/tree/master).
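The selection step can be illustrated with a minimal sketch. It is not the published pipeline (which is available at the repository above), and it is written for Python 3 rather than the v2.7.9 used in the study. The input layout, the transcript identifiers, the function names, and the rule that a family is the first lineage rank ending in "aceae" are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical annotation rows: (transcript_id, lineage of the best UniRef hit).
# The real Annocript output has a different and richer layout.
ANNOTATIONS = [
    ("TD_G1_0001", "Eukaryota; Metazoa; Arthropoda; Insecta; Hymenoptera; Apidae"),
    ("TD_G1_0002", "Eukaryota; Viridiplantae; Streptophyta; Malpighiales; Euphorbiaceae"),
    ("TD_G1_0003", "Eukaryota; Viridiplantae; Streptophyta; Caryophyllales; Amaranthaceae"),
    ("TD_G1_0004", "Bacteria; Proteobacteria"),
    ("TD_G1_0005", "Eukaryota; Viridiplantae; Streptophyta; Malpighiales; Euphorbiaceae"),
]


def split_lineage(lineage):
    """Split a ';'-separated lineage string into a list of clean rank names."""
    return [rank.strip() for rank in lineage.split(";") if rank.strip()]


def plant_contaminants(annotations, plant_clade="Viridiplantae"):
    """Return {transcript_id: family} for hits whose lineage contains plant_clade."""
    selected = {}
    for transcript_id, lineage in annotations:
        ranks = split_lineage(lineage)
        if plant_clade in ranks:
            # Assumption: the family is the first rank ending in "aceae";
            # None is stored when no such rank is present.
            family = next((r for r in ranks if r.endswith("aceae")), None)
            selected[transcript_id] = family
    return selected


plants = plant_contaminants(ANNOTATIONS)
family_counts = Counter(plants.values())
```

Here `plants` keeps only the three plant-derived transcripts, and `family_counts` tallies them per family in the same way Table 1 does. In practice such a filter would still need the manual checking mentioned above.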

From the transcriptomes of G1 and G2 female foundresses, 857 and 538 transcripts, respectively, were identified as plant contaminants. Contaminant transcripts from G1 matched 28 plant families, almost 50% of which (13) were found exclusively in G1. In G2, 19 families were identified, four of which were found only in G2 (Table 1). These results indicate that the richness of plants visited by G1 females is greater than that of plants visited by G2 females, which may be related to the floral bloom during spring. Our data corroborate an earlier study on the diversity of pollen stored in T. diversipes nests (Menezes et al., 2012).

Table 1.

Classification, Numbers, and Proportion of the Contaminant Transcripts from Plants Found in the Transcriptome of Tetrapedia diversipes Foundresses from Generations One (G1) and Two (G2)

Plant family	No. of G1 contaminant transcripts	% of G1 contaminant transcripts	No. of G2 contaminant transcripts	% of G2 contaminant transcripts
Aizoaceae	1	0.13	—	—
Amaranthaceae	255	31.95	1	0.20
Arecaceae	6	0.75	—	—
Asteraceae	2	0.25	—	—
Brassicaceae	11	1.38	6	1.19
Cactaceae	1	0.13	—	—
Caryophyllaceae	2	0.25	—	—
Chenopodiaceae	3	0.38	—	—
Cleomaceae	2	0.25	5	0.99
Cucurbitaceae	8	1.00	4	0.79
Euphorbiaceae	329	41.23	331	65.67
Fabaceae	16	2.00	10	1.98
Lamiaceae	1	0.13	—	—
Lentibulariaceae	2	0.25	—	—
Lythraceae	—	—	1	0.20
Malvaceae	19	2.38	25	4.96
Moraceae	—	—	6	1.19
Musaceae	3	0.38	—	—
Myrtaceae	4	0.50	44	8.73
Nelumbonaceae	2	0.25	6	1.19
Oleaceae	2	0.25	—	—
Onagraceae	—	—	3	0.60
Pedaliaceae	30	3.76	—	—
Phrymaceae	33	4.14	—	—
Poaceae	1	0.13	1	0.20
Rhizophoraceae	—	—	1	0.20
Rosaceae	14	1.75	17	3.37
Rubiaceae	6	0.75	—	—
Rutaceae	5	0.63	10	1.98
Salicaceae	18	2.26	22	4.37
Solanaceae	11	1.38	8	1.59
Vitaceae	11	1.38	3	0.60
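The family richness figures quoted in the text can be re-derived from Table 1 with a short Python sketch. The counts below are copied from the table, with 0 standing for an empty cell; the variable and function names are illustrative. One observation from this check: the percentages in Table 1 appear to be relative to the number of transcripts assigned to a family in each column (798 for G1 and 504 for G2), not to the 857 and 538 plant contaminant transcripts reported in the text.

```python
# Transcript counts per plant family from Table 1 as (G1, G2); 0 means no hit.
TABLE_1 = {
    "Aizoaceae": (1, 0), "Amaranthaceae": (255, 1), "Arecaceae": (6, 0),
    "Asteraceae": (2, 0), "Brassicaceae": (11, 6), "Cactaceae": (1, 0),
    "Caryophyllaceae": (2, 0), "Chenopodiaceae": (3, 0), "Cleomaceae": (2, 5),
    "Cucurbitaceae": (8, 4), "Euphorbiaceae": (329, 331), "Fabaceae": (16, 10),
    "Lamiaceae": (1, 0), "Lentibulariaceae": (2, 0), "Lythraceae": (0, 1),
    "Malvaceae": (19, 25), "Moraceae": (0, 6), "Musaceae": (3, 0),
    "Myrtaceae": (4, 44), "Nelumbonaceae": (2, 6), "Oleaceae": (2, 0),
    "Onagraceae": (0, 3), "Pedaliaceae": (30, 0), "Phrymaceae": (33, 0),
    "Poaceae": (1, 1), "Rhizophoraceae": (0, 1), "Rosaceae": (14, 17),
    "Rubiaceae": (6, 0), "Rutaceae": (5, 10), "Salicaceae": (18, 22),
    "Solanaceae": (11, 8), "Vitaceae": (11, 3),
}

# Families with at least one transcript in each generation.
g1_families = {fam for fam, (g1, _) in TABLE_1.items() if g1 > 0}
g2_families = {fam for fam, (_, g2) in TABLE_1.items() if g2 > 0}

# Set differences give the families exclusive to one generation.
only_g1 = g1_families - g2_families
only_g2 = g2_families - g1_families
shared = g1_families & g2_families

# Column totals of family-assigned transcripts.
g1_total = sum(g1 for g1, _ in TABLE_1.values())
g2_total = sum(g2 for _, g2 in TABLE_1.values())


def percent(count, total):
    """Share of a column total, rounded to two decimals as in Table 1."""
    return round(100.0 * count / total, 2)


# Families ranked by transcript count within each generation.
g1_ranked = sorted(g1_families, key=lambda fam: TABLE_1[fam][0], reverse=True)
g2_ranked = sorted(g2_families, key=lambda fam: TABLE_1[fam][1], reverse=True)

print(len(g1_families), len(only_g1), len(g2_families), len(only_g2))
print(g1_ranked[:2], g2_ranked[:1])
```

The first `print` reports the richness and exclusivity counts (28 and 13 for G1, 19 and 4 for G2), and the second lists the dominant families in each generation.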

Table 1 presents all plant families identified among the contaminant transcripts in each generation and their proportion in the data set. These findings agree with previous ecological studies (Alves-dos-Santos et al., 2006; Menezes et al., 2012), especially regarding the use of Euphorbiaceae as the main pollen source for larval feeding. Furthermore, because Amaranthaceae and Euphorbiaceae are the two main families visited during G1 and Euphorbiaceae is the main source during G2, the data support the hypothesis that T. diversipes is not a truly polylectic species but has preferences for specific families. As an oil source, this bee is known to use plants from the Malpighiaceae family, but it is not clear whether other families are also visited (Alves-dos-Santos et al., 2006). In the present data set, transcripts from the Cucurbitaceae and Solanaceae families were found in both generations, which suggests that these families may also be visited for oil collection.

Therefore, as described here, contaminant transcripts can be a useful source of information not only for the study of insect–plant interactions but also for analyses of other associations such as parasitism and symbiosis. These data are usually neglected in transcriptomic studies, but the present results indicate that contaminant transcripts from any transcriptomic data set can be valuable for answering different biological questions.

Nevertheless, in this type of analysis it is important to keep in mind that the public databases used for identification through BLAST are incomplete, and transcript identification may be deficient in some cases. In addition, when the transcripts come from highly conserved genes, the identification of a taxonomic group may be compromised. Thus, the reported approach is recommended for use in association with ecological observations or as a general and comparative tool.
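The caveat about conserved genes can be made concrete with a small sketch that flags transcripts whose taxonomic assignment is ambiguous. This check was not part of the reported analysis. The hit lists, the identifiers, the function name, and the 0.9 score ratio are hypothetical choices made for illustration.

```python
# Hypothetical BLAST hits: transcript_id -> list of (taxonomic group, bit score).
HITS = {
    "TD_G2_0101": [("Viridiplantae", 410.0), ("Viridiplantae", 395.0), ("Metazoa", 120.0)],
    "TD_G2_0102": [("Viridiplantae", 300.0), ("Metazoa", 295.0), ("Fungi", 290.0)],
}


def is_ambiguous(hits, min_ratio=0.9):
    """Return True if a hit from another group scores close to the best hit.

    min_ratio is an arbitrary illustrative threshold: a competing hit counts
    as "close" when its bit score is at least min_ratio times the best score.
    """
    best_group, best_score = max(hits, key=lambda hit: hit[1])
    return any(
        group != best_group and score >= min_ratio * best_score
        for group, score in hits
    )


flags = {transcript_id: is_ambiguous(hits) for transcript_id, hits in HITS.items()}
```

In this toy input the first transcript is clearly plant-derived, whereas the second matches plants, animals, and fungi almost equally well, as a highly conserved gene would, and is therefore flagged.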

Acknowledgments

The authors would like to thank Isabel Alves-dos-Santos for support during bee collection, Susy Coelho for technical assistance, FAPESP (São Paulo Research Foundation, process numbers 2013/12530-4 and 2012/18531-0) for financial support, and the reviewers for their suggestions. This work was developed in the Research Center on Biodiversity and Computing (BioComp) of the Universidade de São Paulo (USP), supported by the USP Provost's Office for Research.

Author Disclosure Statement

The authors declare that no conflicting financial interests exist.

References

Alves-dos-Santos, Naxara SRC, and Patrício EFLRA. (2006). Notes on the Morphology of Tetrapedia diversipes Klug 1810 (Tetrapediini, Apidae), an Oil-collecting bee. Braz J Morphol Sci, 23, 425–430.

Menezes, Gonçalves-Esteves, Bastos EMAF, Augusto, and Gaglianone. (2012). Nesting and use of pollen resources by Tetrapedia diversipes Klug (Apidae) in Atlantic Forest areas (Rio de Janeiro, Brazil) in different stages of regeneration. Rev Bras Entomol, 56, 86–94.

Musacchia, Basu, Petrosino, Salvemini, and Sanges. (2015). Annocript: a flexible pipeline for the annotation of transcriptomes also able to identify putative long noncoding RNAs. Bioinformatics, 31, 2199–2201.