Abstract
The advent of next generation sequencing technologies is providing new insight into HIV-1 diversity and evolution, which has created the need for bioinformatics tools that can be applied to the characterization of viral quasispecies. Here we present Nautilus, a bioinformatics package for the analysis of HIV-1 targeted deep sequencing data. The DeepHaplo module determines the nucleotide base frequency and read depth at each position and computes haplotype frequencies based on the linkage among polymorphisms within the same next generation sequencing read. The Motifs module computes the frequency of variants in the setting of their sequence context and mapping orientation, which allows polymorphisms and haplotypes to be validated when strand bias is suspected. Both modules are accessed through a user-friendly GUI, which runs on Mac OS X (version 10.7.4 or later), and are based on Python, Java, and R scripts. Nautilus is available from
Here we present Nautilus, a bioinformatics package for the analysis of HIV-1 targeted deep sequencing (TDS) data. The program consists of a graphical user interface (GUI) with two modules: DeepHaplo and Motifs. Using an alignment file in the SAM format¹⁰ as input, DeepHaplo computes the nucleotide base frequency and read depth at each position, and presents the results in tabular and graphic formats (Fig. 1a–f). To facilitate the visualization of different facets of the data, results are shown with or without alignment gaps, and on linear or logarithmic scales. A novel feature of DeepHaplo is the implementation of a hash algorithm (Supplementary Fig. S1; Supplementary Data are available online at

Read depth and frequencies of single nucleotide variants and haplotypes can be computed by the DeepHaplo module.
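
The per-position counting and the read-level linkage of polymorphisms described above can be illustrated with a minimal Python sketch. This is not the Nautilus source code: the function names (`aligned_bases`, `iter_reads`, `base_counts`, `frequencies`, `haplotype_counts`) and the toy reads in `TOY_SAM` are hypothetical, and the dictionary-based tally is only assumed to resemble the hash algorithm mentioned in the text. The sketch reads plain SAM text lines, records deletions as gaps (`-`), and ignores inserted bases.

```python
import re
from collections import Counter, defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def aligned_bases(pos, cigar, seq):
    """Return {reference position: base} for one read; '-' marks a deleted base."""
    bases, ref, query = {}, pos, 0
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":      # consumes both reference and read
            for i in range(length):
                bases[ref + i] = seq[query + i]
            ref += length
            query += length
        elif op == "D":      # deletion: reference only, recorded as a gap
            for i in range(length):
                bases[ref + i] = "-"
            ref += length
        elif op == "N":      # skipped region: reference only, not counted
            ref += length
        elif op in "IS":     # insertion or soft clip: read only (ignored here)
            query += length
        # H and P consume neither sequence
    return bases


def iter_reads(sam_lines):
    """Yield the position-to-base map of each mapped read, skipping header lines."""
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, pos, cigar, seq = int(fields[1]), int(fields[3]), fields[5], fields[9]
        if flag & UNMAPPED or cigar == "*":
            continue
        yield aligned_bases(pos, cigar, seq)


def base_counts(sam_lines):
    """Tally the bases (and gaps) observed at every reference position."""
    counts = defaultdict(Counter)
    for bases in iter_reads(sam_lines):
        for ref_pos, base in bases.items():
            counts[ref_pos][base] += 1
    return counts


def frequencies(counter, include_gaps=True):
    """Return (read depth, {base: frequency}), optionally omitting alignment gaps."""
    kept = {b: n for b, n in counter.items() if include_gaps or b != "-"}
    depth = sum(kept.values())
    freqs = {b: n / depth for b, n in kept.items()} if depth else {}
    return depth, freqs


def haplotype_counts(sam_lines, positions):
    """Count haplotypes over `positions`, using only reads that cover all of them."""
    haplotypes = Counter()  # hash table keyed by the linked bases of one read
    for bases in iter_reads(sam_lines):
        if all(p in bases for p in positions):
            haplotypes["".join(bases[p] for p in positions)] += 1
    return haplotypes


# Hypothetical toy alignment: five reads against a reference named "ref".
TOY_SAM = [
    "@SQ\tSN:ref\tLN:4",
    "r1\t0\tref\t1\t60\t4M\t*\t0\t0\tACGT\t*",
    "r2\t16\tref\t1\t60\t4M\t*\t0\t0\tACGA\t*",
    "r3\t0\tref\t2\t60\t3M\t*\t0\t0\tTGT\t*",
    "r4\t0\tref\t1\t60\t1M1D2M\t*\t0\t0\tAGT\t*",
    "r5\t0\tref\t1\t60\t4M\t*\t0\t0\tACGT\t*",
]
```

Calling `frequencies(base_counts(TOY_SAM)[2], include_gaps=False)` gives the depth and base frequencies at position 2 with gaps omitted, and `haplotype_counts(TOY_SAM, (2, 4))` tallies the base combinations observed at positions 2 and 4 within single reads. Dividing each haplotype count by the total yields haplotype frequencies.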
DeepHaplo uses the mapping orientation encoded in the bitwise FLAG value of the SAM file¹⁰ to compute the frequencies of nucleotide bases at each position, and of haplotypes, separately for each orientation. This feature, combined with the analysis performed by the Motifs module, allows polymorphisms and haplotypes to be validated when strand bias is suspected. In Motifs, the positions to be interrogated are identified through a user-defined threshold for minor allele frequency (MAF), and the frequency of variants at each position is computed for the forward and reverse orientations. Motifs also counts the forward and reverse reads supporting a given variant within the sequence context surrounding the candidate variant, because sequence context (e.g., homopolymers) has been shown to strongly influence strand bias.¹² Figure 2a shows a real case of a polymorphic position where the variants are equally supported by reads in both orientations (compare the blue and red bars), whereas Fig. 2b shows a position where the A variant is observed only in reads in the reverse orientation, likely reflecting a sequencing artifact.

The Motifs module provides information about the frequency of single nucleotide variants based on mapping orientation and the sequence context surrounding the putatively polymorphic position.
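
The orientation-aware counting described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Motifs implementation: the names `read_bases`, `strand_counts`, and `strand_specific`, the `flank` parameter, and the toy reads are hypothetical. The only facts taken from the SAM format are that FLAG bit 0x10 marks a read mapped to the reverse strand, bit 0x4 marks an unmapped read, and SEQ is stored in reference orientation, so the flanking context can be read directly from the aligned bases.

```python
import re
from collections import Counter, defaultdict

REVERSE = 0x10   # SAM FLAG bit: read mapped to the reverse strand
UNMAPPED = 0x4   # SAM FLAG bit: segment unmapped


def read_bases(pos, cigar, seq):
    """Map reference positions to read bases ('-' for deleted bases)."""
    bases, ref, query = {}, pos, 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in "M=X":
            for i in range(length):
                bases[ref + i] = seq[query + i]
            ref, query = ref + length, query + length
        elif op == "D":
            for i in range(length):
                bases[ref + i] = "-"
            ref += length
        elif op == "N":
            ref += length
        elif op in "IS":
            query += length
    return bases


def strand_counts(sam_lines, position, flank=2):
    """Count forward and reverse reads per base and per sequence context.

    The context is the variant plus `flank` reference positions on each
    side; reads that do not cover the whole window are left out of the
    context tally but still counted per base.
    """
    by_base = defaultdict(Counter)
    by_context = defaultdict(Counter)
    window = range(position - flank, position + flank + 1)
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & UNMAPPED or fields[5] == "*":
            continue
        bases = read_bases(int(fields[3]), fields[5], fields[9])
        if position not in bases:
            continue
        strand = "reverse" if flag & REVERSE else "forward"
        by_base[bases[position]][strand] += 1
        if all(p in bases for p in window):
            by_context["".join(bases[p] for p in window)][strand] += 1
    return by_base, by_context


def strand_specific(by_base):
    """List the bases supported by reads in one orientation only."""
    return [b for b, c in by_base.items()
            if c["forward"] == 0 or c["reverse"] == 0]


# Hypothetical toy alignment: the A at position 3 appears on reverse reads only.
TOY_SAM = [
    "f1\t0\tref\t1\t60\t5M\t*\t0\t0\tACGTA\t*",
    "f2\t0\tref\t1\t60\t5M\t*\t0\t0\tACGTA\t*",
    "r1\t16\tref\t1\t60\t5M\t*\t0\t0\tACGTA\t*",
    "r2\t16\tref\t1\t60\t5M\t*\t0\t0\tACATA\t*",
    "r3\t16\tref\t2\t60\t4M\t*\t0\t0\tCATA\t*",
]
```

With `by_base, by_context = strand_counts(TOY_SAM, 3, flank=1)`, the G variant is supported in both orientations while `strand_specific(by_base)` reports the A variant, mirroring the kind of contrast drawn between Fig. 2a and Fig. 2b. A real analysis would also need a minimum read count before calling a variant strand-specific; that threshold is left out of this sketch.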
In summary, Nautilus is a new suite of bioinformatics tools that supports the analysis of TDS data and facilitates the application of next generation sequencing (NGS) to the characterization of HIV-1 populations and evolution. Nautilus runs on Mac OS X (version 10.7.4 or later) and is based on Python, Java, and R scripts (required packages are stated in the accompanying user manual). It is freely available from
Acknowledgments
This work was supported in part by an Interagency Agreement (Y1-AI-2642-12) between the U.S. Army Medical Research and Materiel Command and the National Institute of Allergy and Infectious Diseases. This work was also supported by a cooperative agreement (W81XWH-07-2-0067) between the Henry M. Jackson Foundation for the Advancement of Military Medicine and the U.S. Department of Defense.
The opinions expressed in this article are those of the authors and do not represent the official views of the U.S. Department of Health and Human Services, the National Institute of Allergy and Infectious Diseases, the U.S. Department of Defense, or the Department of the Army.
Author Disclosure Statement
No competing financial interests exist.
