Evaluation of Automatic Analysis of Ultradeep Pyrosequencing Raw Data to Determine Percentages of HIV Resistance Mutations in Patients Followed-Up in Hospital

Abstract

A major obstacle to using next generation sequencing (NGS) technology in clinical routine practice is reliable data analysis. Thousands of sequences need to be aligned and validated, to exclude sequencing artifacts and generate accurate results. We compared two analysis pipelines for Roche 454 ultradeep pyrosequencing (UDPS) raw data generated from HIV-1 clinical samples: a commercial and fully automated Web-based software NGS HIV-1 Module (SmartGene, Zug, Switzerland) vs. the Amplicon Variant Analyzer software (AVA, 454 Life Sciences; Roche). Results were also compared to those obtained with Sanger sequencing. HIV-1 reverse transcriptase and protease genes from 34 plasma samples were submitted to Sanger sequencing and GS Junior UDPS. Raw UDPS data (sff files) from all samples were analyzed with AVA 2.7 software plus manual review of the alignments and the fully automated SmartGene NGS HIV-1 Module prototype (SMG). Results obtained with both analysis pipelines showed good correlation (85.0%). Divergent results were mainly observed at homopolymer positions, such as K101, where the frame-aware alignment and error corrections of the automated approach were more efficient and more accurate, both in terms of detecting and quantifying drug resistance mutations. Our study shows that NGS data can easily be analyzed via a fully automated analysis pipeline, here the SmartGene NGS HIV-1 Module, thus minimizing the need for manual review of alignments by the user, otherwise essential to ensure accurate results. Such automated analysis pipelines may facilitate the adoption of NGS platforms in the routine clinical laboratory.

Introduction

HIV resistance mutations have been demonstrated against all existing antiretroviral molecules.^1–3 Genotypic resistance tests use conventional sequencing techniques based on the Sanger method to detect resistant mutants exceeding 20% of the population. Recently, several next generation sequencing (NGS) techniques, including ultradeep pyrosequencing (UDPS, Roche 454 technology⁴), became available. NGS enables analysis of resistant variants below the usual threshold of traditional sequencing techniques, with a quantification range from 1% (or less) to 100%.^5,6 The question obviously arises as to whether these variants, observable only by NGS technology, are associated with treatment failure, especially in at least two important clinical situations: in naive subjects who are potential candidates for first-line treatment and in treated subjects with increasing viral load (VL) for whom selection of resistant variants is ongoing.

The transition of NGS technologies into routine practice will require the advent of commercial tests but also data-analysis pipelines that are reliable and easy to use. These data analysis pipelines ensure clustering, alignment, review for errors, and translation into amino acids for thousands of reads per sample. Resistance mutations are to be retained and further assessed in a quantitative manner (as a percentage of the total reads).⁷ In the case of UDPS, due to the lack of frame awareness in the alignments in the AVA software provided by Roche and the frequent shifts due to homopolymers, these analytical steps require much expertise and time, a major obstacle toward using this sequencing technology in routine practice.

This study compares the results of an analysis of UDPS raw data generated from HIV-1 clinical samples with 454 technology (Roche Diagnostics, France), by a new commercial and fully automated Web-based software provided by SmartGene (Switzerland; www.smartgene.com), with the results obtained through more manual analysis involving the AVA software package provided by Roche. This study should be seen as an important step toward the advent of routine UDPS practice in virology laboratories.

Materials and Methods

Samples

Thirty-four plasma samples were randomly chosen for our study (see Table 1 for details).^5,8 Seventeen corresponded to baseline plasma of HIV-infected individuals naive of antiretroviral treatment (median VL = 121,221 cp/ml) and 17 corresponded to plasma of individuals who failed a first antiretroviral therapy (median VL = 5,914 cp/ml). For these latter cases, virological failure (VF) was defined as a plasma VL >1,000 copies/ml or two consecutive VL >500 cp/ml at least 6 months after treatment initiation. In 13 cases, samples were matched pairs from the same patient at prefailure and postfailure time points.

Table 1.

Virological Characteristics of Randomly Chosen Individuals

		Baseline			Virological failure
Ind.	Subtype	Sample ID	Viral load (cp/ml)	ART	Sample ID	Viral load (cp/ml)
1	B	1N	125,416	ZVD 3TC ddI NFV	1F	5,914
2	B	—	—	ZVD 3TC ABC NFV	2F	8,435
3	B	3N	20,103	ZVD 3TC ABC LPV/r	3F	1,170
4	B	4N	45,028	ZVD 3TC NVP	4F	6,572
5	B	5N	121,221	ZVD 3TC NFV	5F	2,414
6	B	6N	119,112	3TC TDF LPV/r	6F	97,597
7	B	7N	49,268	ZVD 3TC ABC	7F	88,025
8	02_AG	8N	198,233	—	—	—
9	B	9N	38,779	ZVD 3TC ABC	9F	9,132
10	D	10N	192,100	FTC TDF LPV/r	10F	12,796
11	B	11N	782,200	FTC TDF ATZ/r	11F	2,155
12	Recombinant	—	—	ZVD 3TC NVP	12F	10,371
13	02_AG	—	—	3TC TDF EFV	13F	20,132
14	B	14N	15,857	ZVD 3TC	14F	2,438
15	B	—	—	ddI d4T NFV	15F	1,693
16	B	16N	586,670	ZVD 3TC ABC	16F	2,667
17	B	17N	6,741	ZVD 3TC	17F	1,812
18	B	18N	4,193,590	3TC TDF EFV	18F	3,306
19	B	19N	2,516	—	—	—
20	B	20N	161,700	—	—	—
21	B	21N	173,100	—	—	—

For each individual, plasma samples available at baseline (N) and/or virological failure (F) were used for HIV-1 RT and protease sequencing.
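The median viral loads quoted in the Samples paragraph can be checked directly against Table 1. The short sketch below transcribes the tabulated values by hand (the list names are illustrative, and the value for sample 16N is taken as 586,670 cp/ml, which is an assumed reading of the table) and recomputes the group sizes and medians.

```python
from statistics import median

# Viral loads (cp/ml) transcribed by hand from Table 1, in order of individual.
# 16N is taken as 586,670 (assumed reading of the tabulated value).
baseline_vl = [
    125416, 20103, 45028, 121221, 119112, 49268, 198233, 38779, 192100,
    782200, 15857, 586670, 6741, 4193590, 2516, 161700, 173100,
]
failure_vl = [
    5914, 8435, 1170, 6572, 2414, 97597, 88025, 9132, 12796,
    2155, 10371, 20132, 2438, 1693, 2667, 1812, 3306,
]

baseline_median = median(baseline_vl)
failure_median = median(failure_vl)

print(f"baseline: n={len(baseline_vl)}, median={baseline_median:,} cp/ml")
print(f"failure:  n={len(failure_vl)}, median={failure_median:,} cp/ml")
```

With 17 samples in each group, the median is simply the ninth value in sorted order, so it is insensitive to the exact magnitude of the highest viral loads.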

Sanger and Ultradeep pyrosequencing (UDPS)

Viral RNA was extracted from 1 ml plasma using the High Pure Viral RNA kit (Roche). The reverse transcriptase (RT) and protease genes were amplified and population sequencing was performed as previously described.⁸ For 454 UDPS analysis, amplicon preparation, emPCR, and UDPS were performed as previously described.⁸ Raw data are available in GenBank under accession numbers SRP033482 and SRP026411 and in the HIV LANL database (www.hiv.lanl.gov/content/sequence/HIV/NextGenArchive/).

Computational analysis

For each sample, raw data (sff files) obtained after “Amplicon” Signal Processing, applied to ensure high-quality reads, were submitted to the two analysis pipelines, one using Amplicon Variant Analyzer (AVA 2.7) software (454 Life Sciences; Roche) and manual review of the alignments and the second using SmartGene NGS HIV-1 Module prototype (SMG).

AVA 2.7 pipeline

Using the multiplex identifier (MID) patterns, all reads of a sample were grouped and aligned in AVA 2.7, and variant frequencies were calculated for each nucleotide position relative to the HxB2 reference HIV-1 strain sequence (accession no. AF033819). All alignments were carefully reviewed for alignment errors, and analysis was then focused on positions involved in drug resistance (drug resistance mutations, DRMs) as defined by the 2012 ANRS HIV drug resistance algorithm v22 (www.hivfrenchresistance.org). Since AVA reports mutations relative to the nucleic acid reference sequence, a variant editing step was then performed to convert the mutated nucleotide positions into amino acid mutations, which were subsequently associated manually with ANRS resistance patterns without any cut-off for detection.
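The conversion from a nucleotide variant to an amino acid mutation can be illustrated with a minimal sketch. This is not the procedure used in the study, which was manual; the function names are hypothetical, positions are counted from the first base of the gene's coding sequence (an assumption made here to avoid reference-specific coordinates), and the only example codon taken from the text is GAT at RT position 67.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def locate(nt_position):
    """Map a 1-based nucleotide position within a gene to (codon number, offset 0-2)."""
    codon_index, offset = divmod(nt_position - 1, 3)
    return codon_index + 1, offset

def amino_acid_change(ref_codon, codon_number, offset, alt_base):
    """Return a mutation label such as 'D67N', or None if the change is synonymous."""
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return None
    return f"{ref_aa}{codon_number}{alt_aa}"

# First base of codon 67 is gene-relative nucleotide 199.
codon_number, offset = locate(199)
print(amino_acid_change("GAT", codon_number, offset, "A"))  # D67N
print(amino_acid_change("GAT", 67, 2, "C"))                 # None (GAC also encodes D)
```

A variant list expressed at the nucleotide level therefore has to be regrouped by codon before resistance patterns can be assigned, which is the step that becomes error-prone when the alignment itself is shifted.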

SMG pipeline

The SmartGene NGS HIV-1 Module is based on the proprietary IDNS (Integrated Database Network System) technology of SmartGene and is accessible via a secure web interface; users can view the progress of the analysis. It provides a web-based interface to upload *.sff files from a local directory. The module then performs the following steps in an automated manner: (1) demultiplexing and grouping of reads with regard to their MID, (2) a quality filtering to remove low-quality fragments, and (3) establishment of a work list on the basis of the corresponding MID showing the number of reads for each sample. The user can then select the analysis pipeline to be used and the appropriate cut-off for ambiguous bases/background. Then a subsequent fully automated analysis including the following steps is performed: grouping reads using the MID patterns provided by the user with the sample information, exclusion of reads unsuitable for further analysis (too short, too many errors, cannot be aligned), generation of a frame-aware nucleic acid alignment (using a proprietary alignment method of combined global and local alignments developed by SmartGene) with fixed (HxB2, accession no. K03455) or variable reference sequences, detection and correction of homopolymer-related errors, translation of the corrected alignments into amino acid sequences, and determination of mutations and respective frequencies. The mutation cut-off (0.5–20%) above which mutations shall be interpreted for resistance using one of the embedded HIV resistance algorithms is then selected. The resistance profile is then computed along with mutation frequencies as percentage values. The final results are summarized as a list of all mutations per sample with their respective frequencies. Related SmartGene patents are EP05700367 and EP07816282.
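The last two steps, expressing each mutation as a percentage of the reads and retaining those above a user-selected cut-off, can be sketched as follows. This is only an illustration of the arithmetic, not SmartGene's implementation: the function names and read counts are hypothetical, and treating the cut-off as inclusive is an assumption, since the text does not specify the boundary behavior.

```python
def mutation_frequencies(mutant_read_counts, total_reads):
    """Express each mutation as a percentage of the reads covering the position."""
    return {mut: 100.0 * n / total_reads for mut, n in mutant_read_counts.items()}

def apply_cutoff(frequencies, cutoff_percent=0.5):
    """Keep mutations at or above the cut-off (allowed range 0.5-20%, as in the text)."""
    if not 0.5 <= cutoff_percent <= 20:
        raise ValueError("cut-off must lie between 0.5% and 20%")
    return {mut: f for mut, f in frequencies.items() if f >= cutoff_percent}

# Hypothetical counts for one sample with 1,000 reads covering each position.
counts = {"M184V": 510, "K101E": 28, "D67N": 3}
freqs = mutation_frequencies(counts, total_reads=1000)

technical = apply_cutoff(freqs, 0.5)        # "technical" cut-off
interpretational = apply_cutoff(freqs, 20)  # Sanger-like threshold

print(sorted(technical))
print(sorted(interpretational))
```

In practice the read coverage differs from one position to another, so the denominator would have to be taken per position rather than per sample.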

Results

Correlation between AVA and SMG NGS pipelines

Levels of HIV drug resistance mutations located in protease and RT regions were quantified from reads generated by 454 UDPS using both pipelines.

A combination of the two pipelines detected 270 DRMs among the 34 samples. Overall, the results obtained with both NGS analysis pipelines showed a 77.4% correlation in terms of detection and quantification of DRMs, with some minor differences (Table 2). We observed DRMs detected only with AVA (13.3%; 35/270) and DRMs detected only with SMG (9.3%; 25/270). However, most differences between the pipelines were found only in cases of mutations at a very low level (0.5–3.3%).
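The proportions above follow from simple counting, which the sketch below reproduces from the stated counts (270 DRMs in total, 35 seen only with AVA, 25 seen only with SMG). Recomputed this way, the values are about 13.0%, 9.3%, and 77.8%; the small differences from the reported 13.3% and 77.4% may reflect rounding or a slightly different count, which the text does not specify.

```python
total_drms = 270
ava_only = 35
smg_only = 25
both = total_drms - ava_only - smg_only  # DRMs reported by both pipelines

for label, n in [("AVA only", ava_only), ("SMG only", smg_only), ("both", both)]:
    print(f"{label}: {n}/{total_drms} = {100 * n / total_drms:.1f}%")
```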

Table 2.

Comparison of Quantitated HIV-1 Drug Resistance Mutations (DRMs) Detected with AVA 2.7 (Roche) Software and the SmartGene NGS HIV-1 Module (SMG) for Patients Before Treatment (N) and at Treatment Failure (F)

	Baseline							Virological failure
					AVA		SMG					AVA		SMG
IND	Sample ID	Viral load (cp/ml)	DRMs	SS	% of DRMs	Number of reads	% of DRMs	Sample ID	Viral load (cp/ml)	DRMs	SS	% of DRMs	Number of reads	% of DRMs
1	1N	125,416	NRTI-M41L NRTI-D67N NRTI-L74I NRTI-L210W NRTI-T215C NRTI-T215S NRTI-T215Y NRTI-T215N NNRTI-E138G NNRTI-V179T PI-L10I PI-M36I PI-I47V PI-G48V PI-D60E PI-L63P PI-A71T	+ + − + − − + − − + + + − + + + +	99.4 21.0 0.3 99.9 2.1 0.3 89.8 2.6 2.2 97.9 99.6 100.0 0.0 99.6 98.5 100.0 85.0	532 1,849 1,849 1,297 1,297 1,297 1,297 1,297 3,188 2,656 473 473 473 473 473 473 473	100.0 96.1 0.0 100.0 0.6 0.0 96.9 2.5 2.1 98.1 100.0 100.0 0.5 99.5 100.0 100.0 100.0	1F	5,914	NRTI-M41L NRTI-D67N NRTI-M184V NRTI-L210W NRTI-T215Y NNRTI-E138G NNRTI-V179T PI-L10I PI-M36I PI-G48V PI-D60E PI-L63P PI-A71I PI-V82A	+ + + + + − + + + + + + + +	99.4 51.3 99.9 99.9 98.1 0.0 99.8 100.0 100.0 100.0 100.0 99.6 76.6 99.6	317 905 1,701 1,113 1,112 1,700 1,700 470 470 470 470 470 470 470	100.0 98.9 100.0 100.0 100.0 0.5 100.0 100.0 100.0 100.0 100.0 99.0 100.0 100.0
6	6N	119,112	NRTI-M41L NRTI-L210W NRTI-T215D NRTI-K219Q NNRTI-V106I PI-I15V PI-G16E PI-D60E PI-I62V PI-L63P PI-V77I	− + + − − − + − + + +	4.0 98.4 98.2 0.3 0.3 5.0 87.1 0.8 97.3 99.8 99.6	2,153 1,285 1,285 1,285 6,790 1,688 1,688 1,688 1,689 1,689 1,689	3.8 99.6 100.0 0.6 0.0 4.5 97.4 1.1 98.3 100 100	6F	97,597	NRTI-M41L NRTI-L210W NRTI-T215D NRTI-K219Q NNRTI-V179T PI-I15V PI-G16E PI-D60E PI-I62V PI-L63P PI-V77I PI-V82F	+ + + − − − + − + + + −	14.6 98.8 99.1 0.0 0.0 10.1 98.4 1.5 98.3 99.7 96.6 0.0	570 1,250 1,250 1,250 897 856 856 856 856 856 856 856	15.4 78.2 100.0 0.7 3.3 10.0 100 1.7 97.5 98.3 95.8 0.8
10	10N	19,200	NRTI-T215A NNRTI-K101E NNRTI-E138G PI-G16E PI-K20R PI-M36I PI-I50V PI-I62V PI-A71V PI-V77I	nd nd nd + + + − − − −	0.0 0.4 0.3 99.9 16.8 99.7 0.3 8.5 0.0 4.7	1,225 1,782 3,831 775 775 775 775 775 775 775	0.9 1.5 0.0 100 16.3 100.0 0.0 8.9 0.9 2.7	10F	12,796	NRTI-D67N NRTI-M184V NRTI-M184I NNRTI-K101E PI-G16E PI-K20R PI-M36I PI-V77I	− + − − + − + −	0.4 50.3 2.2 6.8 98.8 93.5 99.9 2.0	4,860 4,834 4,834 7,419 2,562 2,562 2,561 2,560	1.1 51.0 3.1 2.8 100.0 95.2 100.0 1.3
11	11N	782,200	NRTI-M184I NNRTI-V90I NNRTI-A98S NNRTI-E138A PI-V11I PI-M36I PI-M46I PI-I50V PI-I62V PI-L63P PI-V77I PI-I85V PI-L89M	− − − − − − − − + + + − −	0.3 4.5 3.1 0.5 0.4 2.3 0.0 0.0 94.2 98.0 99.0 0.4 0.6	2,469 1,727 1,727 3,194 706 706 706 706 706 706 706 706 706	0.0 3.9 3.0 0.0 0.0 2.3 0.7 1.0 94.1 100.0 100.0 1.2 0.0	11F	2,155	NRTI-M184V NNRTI-K101E PI-G16E PI-M46I PI-M46L PI-I62V PI-L63P PI-A71V PI-G73S PI-V77I PI-I85V PI-N88S	+ − − − − + + − − + − +	99.7 0.0 0.0 5.8 24.4 99.7 100.0 8.7 18.9 99.5 0.0 67.4	2,979 1,684 774 774 774 774 774 774 774 774 774 774	100.0 0.8 1.0 7.5 23.8 100.0 100.0 13.3 15.3 100.0 1.0 71.4
13	13N		nd					13F	41,115	NRTI-D67N NRTI-M184V NNRTI-V90I NNRTI-L100I NNRTI-G190S PI-V11I PI-K20I PI-M36I PI-H69K PI-L89M	− − − − + + + + + +	0.3 3.8 45.0 0 99.6 99.8 99.6 100.0 99.4 99.6	2,571 2,854 2,571 2,571 1,034 1,831 1,831 1,831 1,830 1,830	0.0 5.2 48.4 1.3 92.6 100.0 100.0 100.0 100.0 100.0

The mutations also detected by Sanger sequencing (SS) are labeled “+”, or “−” if not detected. nd, not determined. The cut-off of mutation detection in the SMG module is 0.5%.

Whereas the AVA analysis pipeline reported DRMs detected at very low frequency levels, the SMG pipeline has a DRM detection threshold set at 0.5% (which is above the overall mean error rate of 0.21 ± 0.07% determined in our experiments⁸). If only DRMs >0.5% were analyzed, both NGS pipelines displayed a good correlation (85%).
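The effect of the detection threshold on agreement between pipelines can be illustrated with the Table 2 values for sample 6F. The sketch below is a simple per-DRM comparison of detection calls and is not necessarily how the 85% figure was derived, since the calculation is not described; the function name is hypothetical and the percentages are transcribed by hand from the table.

```python
# DRM frequencies (%) for sample 6F, transcribed from Table 2: (AVA, SMG).
sample_6f = {
    "M41L": (14.6, 15.4), "L210W": (98.8, 78.2), "T215D": (99.1, 100.0),
    "K219Q": (0.0, 0.7), "V179T": (0.0, 3.3), "I15V": (10.1, 10.0),
    "G16E": (98.4, 100.0), "D60E": (1.5, 1.7), "I62V": (98.3, 97.5),
    "L63P": (99.7, 98.3), "V77I": (96.6, 95.8), "V82F": (0.0, 0.8),
}

def discordant_calls(frequencies, cutoff_percent):
    """List DRMs called as detected by one pipeline but not by the other."""
    return [
        drm for drm, (ava, smg) in frequencies.items()
        if (ava >= cutoff_percent) != (smg >= cutoff_percent)
    ]

for cutoff in (0.5, 1.0):
    diff = discordant_calls(sample_6f, cutoff)
    agree = len(sample_6f) - len(diff)
    print(f"cut-off {cutoff}%: {agree}/{len(sample_6f)} concordant, discordant: {diff}")
```

For this sample, all calls that disagree at the 0.5% cut-off involve mutations that SMG quantified at 3.3% or less, in line with the observation that most differences concern very low-level variants.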

Differences between AVA and SMG NGS pipelines

Among all 34 analyzed samples, discrepancies on DRM detection and quantification between the AVA and SMG analysis pipelines were observed for the following 14 DRMs: PI resistance mutations L10V, I15V, G16E, M46I, I47V, I50V, and V82F; NRTI resistance mutations K65R, D67N, and K219Q; and nonnucleoside reverse transcriptase inhibitor (NNRTI) resistance mutations L100I, V179T, K101E, and Y188H.

For individual 1, the mutation D67N was quantified at levels of 21% by AVA vs. 96.1% by SMG at baseline (sample 1N) and 51.3% by AVA vs. 98.9% by SMG at failure (sample 1F) (Table 2). The GAT codon at position 67 is located at the 3′ end of a 7A stretch, where deletion of a nucleotide was often found. This led to an alignment shift that was visible in the AVA software but corrected by the SMG module, explaining the difference in quantification of this D67N mutation.
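The mechanism can be shown with a toy example. The sketch below uses a hypothetical 15-nucleotide reference in which a GAT codon directly follows a run of seven A residues (it is not the actual HIV-1 sequence), and a deliberately simple run-length correction that is not SmartGene's proprietary algorithm: when a read differs from the reference only in the length of a homopolymer, by a single base, the reference length is restored.

```python
from itertools import groupby

def runs(seq):
    """Run-length encode a sequence: 'AAAG' -> [('A', 3), ('G', 1)]."""
    return [(base, len(list(group))) for base, group in groupby(seq)]

def correct_homopolymers(read, ref, min_run=3):
    """Restore reference run lengths where a read is off by one base in a homopolymer.

    Reads whose sequence of runs differs from the reference (e.g. substitutions)
    are returned unchanged; a real pipeline would need a proper alignment.
    """
    read_runs, ref_runs = runs(read), runs(ref)
    if [base for base, _ in read_runs] != [base for base, _ in ref_runs]:
        return read
    fixed = []
    for (base, n_read), (_, n_ref) in zip(read_runs, ref_runs):
        if n_ref >= min_run and abs(n_read - n_ref) == 1:
            n_read = n_ref  # treat the one-base difference as a sequencing error
        fixed.append(base * n_read)
    return "".join(fixed)

def codons(seq):
    """Split a sequence into complete codons, reading in frame from position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

REF = "ACAAAAAAAGATTCA"   # hypothetical: ACA AAA AAA GAT TCA
READ = "ACAAAAAAGATTCA"   # same fragment with one A missing from the 7A stretch

print(codons(READ)[3])                              # ATT: frame shifted, GAT is lost
print(codons(correct_homopolymers(READ, REF))[3])   # GAT: frame restored
```

Such a rule would also erase a genuine one-base insertion or deletion, which is one reason why any automated correction still benefits from expert review of problematic loci.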

The main differences observed in other samples were DRMs detected at levels >0.5% with the SMG pipeline that AVA, including manual review, did not detect. Typical examples were positions L100 (sample 13F), K101 (sample 11F), and V179 (sample 6F) in the RT region. The codons at positions 100 (Fig. 1) and 101 (Fig. 2) are located within or near homopolymeric regions, which are sensitive to insertion/deletion sequencing errors and therefore resulted in alignment shifts. Thus, detection of these mutations is very difficult even with manual review.

FIG. 1.

Example of alignment of reverse transcriptase (RT) sequences obtained by 454 pyrosequencing for sample 13F. Alignments of nucleotide sequences surrounding position 100 (codon 100 is underlined) obtained by the SMG pipeline (left panel) and AVA software (right panel) were chosen to show the presence of insertions/mutations at position 100.

FIG. 2.

Example of alignment of RT sequences obtained by 454 pyrosequencing for sample 11F. Alignments of nucleotide sequences surrounding position 101 (codon 101 is underlined) obtained by the SMG pipeline (left panel) and AVA software (right panel) were chosen to show the presence of insertions/mutations at position 101.

In Figs. 1 and 2, the sequence alignments obtained from AVA revealed the presence of mutated codons. Indeed, in both cases, there was an insertion of A, leading to a change in the codon at the position of interest (100 or 101). The homopolymer error correction applied in the SMG module and the improved frame-aware sequence alignment resulted in the correct detection of DRMs at these positions for a significant number of reads. Concerning the V179T mutation, which is actually not located in a homopolymeric region (ATAGTTATC), AVA did not detect it, whereas the SMG pipeline accounted for the V179T mutation in 3.3% of the total reads (sample 6F). Careful reviewing of all the reads in this region enabled detection of this mutation suggesting that the AVA alignment process may have eliminated the corresponding reads (data not shown).

Comparison of Sanger sequencing and UDPS data

The UDPS results of both pipelines were compared to Sanger sequencing results (Table 2). While the majority of DRMs present at a level >20% by both pipelines were also detected by Sanger sequencing, some exceptions were noticed. In sample 10F (Table 2), the mutation K20R was detected at 93.5% on the VF plasma sample by UDPS. Surprisingly, Sanger sequencing performed prior to the UDPS did not detect this mutation in sample 10F. In this and in other cases, Sanger sequencing and UDPS amplicons were independently generated; when the UDPS amplicon was resubmitted to Sanger sequencing, the K20R mutation was indeed retrieved (data not shown). By contrast, some DRMs present at levels below 20% were also detected by Sanger sequencing, although somewhat inconsistently. For instance, a K101R mutation detected at 6–7% by both NGS pipelines was also observed by Sanger sequencing (data not shown).

Discussion

Our study showed that NGS data generated with Roche 454 technology can be made suitable for a fully automated analysis pipeline, here the SmartGene NGS HIV-1 Module.

Bulk population sequencing by the Sanger method is currently the gold standard for HIV genotypic testing. Our results indicated that UDPS could provide equally reliable results with increased sensitivity. The two analysis approaches, AVA 2.7 (plus manual validation by a user with recognized expertise in HIV) and the SmartGene analysis pipeline, showed an overall good correlation of results. However, some discrepancies with Sanger sequencing were observed. These differences could be explained by the use of several polymerases harboring different proofreading activities for amplicon preparation as well as by selection bias, especially on samples with low viral load.

NGS technologies are as efficient as the current genotyping tests for HIV drug resistance evaluation and provide additional information, i.e., detection of low-frequency variants. Low-cost, laboratory scale UDPS-based HIV genotyping methods have already been described.⁹ However, there is a need for an automated analysis of HIV DRMs when using NGS technologies in routine practice for genotypic resistance testing.

Establishment of adequate thresholds is critical to discriminate true variants from sequencing errors. Indeed, while the AVA pipeline has the potential to detect mutations of less than 0.5% frequency, there is debate about the reliability, reproducibility, and clinical relevance of detecting such low-level mutations.^10,11 The threshold for detection of low-frequency DRMs is generally determined with regard to the frequency of error occurrence. The error rate in UDPS depends on the number of initial template molecules (i.e., viral load of the sample), the PCR amplification and amplification strategies, the pyrosequencing reaction itself, the signal detection, the read coverage of the studied position, and errors by bioinformatic processes, especially in homopolymeric regions.^12–14 Several studies indicated that the main source of errors on HIV population detection was dependent on the type of polymerase used for amplicon preparation.^15–17
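The link between error rate, read coverage, and detection threshold can be made concrete with a simple binomial model. The sketch below is an illustration under strong assumptions that were not tested in this study: errors are taken to be independent between reads and to occur at a single uniform rate at the position of interest, here the overall mean error rate of 0.21% reported in the Results. The function name is hypothetical.

```python
from math import comb

def prob_errors_reach_cutoff(coverage, error_rate, min_variant_reads):
    """P(at least `min_variant_reads` erroneous reads) under a binomial error model."""
    below = sum(
        comb(coverage, k) * error_rate**k * (1 - error_rate) ** (coverage - k)
        for k in range(min_variant_reads)
    )
    return 1.0 - below

# A 0.5% cut-off corresponds to 5 reads at 1,000x coverage and 25 reads at 5,000x.
p_1000 = prob_errors_reach_cutoff(1000, 0.0021, 5)
p_5000 = prob_errors_reach_cutoff(5000, 0.0021, 25)

print(f"coverage 1,000: {p_1000:.4f}")
print(f"coverage 5,000: {p_5000:.6f}")
```

Under this model, the chance that errors alone push a position above a fixed percentage cut-off falls as coverage increases, which is why coverage has to be considered alongside the cut-off. Position-specific error rates, such as those in homopolymeric regions, would need to be modeled separately.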

Different approaches to improve the accuracy of the detected low-frequency DRMs have been developed based on flowgram analysis^18,19 or read analysis.^20–22 However, they remain difficult for a nonexpert to use and are not suited for clinical routine practice. Since the AVA software does not generate frame-aware alignments, visual review of the alignments is essential to exclude alignment errors and homopolymer problems. Such a review is a time-consuming and tedious process, but is necessary to avoid erroneous detection of mutations or missed mutations, especially in a minority of reads.

The SmartGene pipeline, a completely automated, frame-aware alignment process, detected all mutations above the cut-off of 0.5%. A complex, proprietary homopolymer correction algorithm, which is part of the SMG pipeline, significantly reduces the problems encountered with the AVA software. The reliability of the qualitative and quantitative detection of DRMs flanked by homopolymeric regions is important, since there are at least 17 protease inhibitor (PI), nucleoside reverse transcriptase inhibitor (NRTI), or NNRTI DRMs located around homopolymeric regions.⁹ One example is the NRTI K65R mutation, which has a different genetic background region in subtype B vs. C viruses. In the latter subtype, the RT KKK nucleotide template leads to the spurious detection of K65R induced by UDPS.^23,24 For the NNRTI K103N mutation, it has also been demonstrated that the estimated UDPS error rate was higher than for other DRMs due to its position near homopolymeric regions.²⁵

The SMG pipeline reduces the necessary hands-on time; an alignment visualization tool facilitates the review of problematic loci. In addition to their detection and quantification, this pipeline offers the ability to interpret drug resistance mutations according to current drug resistance algorithms of Stanford and ANRS (http://hivdb.stanford.edu/; http://hivfrenchresistance.org/). The threshold used for resistance interpretation can be adjusted by the user: a “technical” cut-off (minimum 0.5%) or an “interpretational” threshold, e.g., 20% in consideration for genotypic resistance test results. In addition, the SmartGene pipeline also detects and quantifies all other mutations with regard to a reference sequence of choice (HxB2 or American Consensus B or others) for further analysis and research.

Although the use of NGS for HIV genotypic resistance testing is becoming more standard, it should be remembered that the clinical relevance of low-frequency drug-resistant mutations on treatment is not well defined and could vary according to their amount, their nature, and the targeted therapeutic class.^26–28 In our sample panel, none of the low-frequency DRMs detected by one of the pipelines would have impacted the response to the prescribed regimen. An L100I NNRTI resistance mutation was detected by SMG at a level of only 1.3% in the postfailure sample (sample 13F). This low-frequency mutation could have been selected by efavirenz, but was not fully responsible for this NNRTI resistance since other high-frequency NNRTI mutations were detected (G190S). However, it would be interesting to extend the comparison between these two pipelines to an analysis of NGS data generated from naive patients who failed first-line ART and did not harbor baseline low-frequency DRMs detected by AVA.

Thus, NGS combined with an automated bioinformatic analysis pipeline now offers a powerful and promising tool for the clinical diagnosis of drug-resistant viruses, which could potentially be applied to other available NGS platforms. The supervision of such interpretations should nevertheless remain in the hands of an expert virologist.

Footnotes

Acknowledgment

The authors are grateful to Thierry Lombardot, PhD, former employee with SmartGene, for his expert help in bioinformatics and data analysis.

Author Disclosure Statement

No competing financial interest.

References

1. Kuritzkes: Drug resistance in HIV-1. Curr Opin Virol, 2011; 1:582–589.

2. Tang and Shafer: HIV-1 antiretroviral resistance: Scientific principles and clinical applications. Drugs, 2012; 72:e1–25.

3. Wensing, Calvez, Gunthard, et al.: 2014 Update of the drug resistance mutations in HIV-1. Top Antivir Med, 2014; 22:642–650.

4. Capobianchi, Giombini, and Rozera: Next-generation sequencing technology in clinical virology. Clin Microbiol Infect, 2013; 19:15–22.

5. Bellecave, Recordon-Pinson, Papuchon, et al.: Detection of low-frequency HIV type 1 reverse transcriptase drug resistance mutations by ultradeep sequencing in naive HIV type 1-infected individuals. AIDS Res Hum Retroviruses, 2014; 30:170–173.

6. Hedskog, Mild, Jernberg, et al.: Dynamics of HIV-1 quasispecies during antiviral treatment dissected using ultra-deep pyrosequencing. PLoS One, 2010; 5:e11345.

7. Beerenwinkel, Günthard, Roth, and Metzner: Challenges and opportunities in estimating viral genetic diversity from next-generation sequencing data. Front Microbiol, 2012; 3:329.

8. Vandenhende M-A, Bellecave, Recordon-Pinson, et al.: Prevalence and evolution of low frequency HIV drug resistance mutations detected by ultra deep sequencing in patients experiencing first line antiretroviral therapy failure. PLoS One, 2014; 9:e86771.

9. Dudley, Chin, Bimber, et al.: Low-cost ultra-wide genotyping using Roche/454 pyrosequencing for surveillance of HIV drug resistance. PLoS One, 2012; 7:e36494.

10. Gianella, Delport, Pacold, et al.: Detection of minority resistance during early HIV-1 infection: Natural variation and spurious detection rather than transmission and evolution of multiple viral variants. J Virol, 2011; 85:8359–8367.

11. Li and Kuritzkes: Clinical implications of HIV-1 minority variants. Clin Infect Dis, 2013; 56:1667–1674.

12. Larsen, Chen, Maust, et al.: Improved detection of rare HIV-1 variants using 454 pyrosequencing. PLoS One, 2013; 8:e76502.

13. Margulies, Egholm, Altman, et al.: Genome sequencing in microfabricated high-density picolitre reactors. Nature, 2005; 437:376–380.

14. Brodin, Mild, Hedskog, et al.: PCR-induced transitions are the major source of error in cleaned ultra-deep pyrosequencing data. PLoS One, 2013; 8:e70388.

15. Di Giallonardo, Zagordi, Duport, et al.: Next-generation sequencing of HIV-1 RNA genomes: Determination of error rates and minimizing artificial recombination. PLoS One, 2013; 8:e74249.

16. Shao, Boltz, Spindler, et al.: Analysis of 454 sequencing error rate, error sources, and artifact recombination for detection of low-frequency drug resistance mutations in HIV-1 DNA. Retrovirology, 2013; 10:18.

17. Vandenbroucke, Van Marck, Verhasselt, et al.: Minor variant detection in amplicons using 454 massive parallel pyrosequencing: Experiences and considerations for successful applications. BioTechniques, 2011; 51:167–177.

18. Quince, Lanzen, Curtis, et al.: Accurate determination of microbial diversity from 454 pyrosequencing data. Nat Methods, 2009; 6:639–641.

19. Quince, Lanzen, Davenport, and Turnbaugh: Removing noise from pyrosequenced amplicons. BMC Bioinform, 2011; 12:38.

20. Astrovskaya, Tork, Mangul, et al.: Inferring viral quasispecies spectra from 454 pyrosequencing reads. BMC Bioinform, 2011; 12(Suppl 6):S1.

21. Iyer, Bouzek, Deng, et al.: Quality score based identification and correction of pyrosequencing errors. PLoS One, 2013; 8:e73015.

22. Zagordi, Bhattacharya, Eriksson, and Beerenwinkel: ShoRAH: Estimating the genetic diversity of a mixed sample from next-generation sequencing data. BMC Bioinform, 2011; 12:119.

23. Varghese, Wang, Babrzadeh, et al.: Nucleic acid template and the risk of a PCR-induced HIV-1 drug resistance mutation. PLoS One, 2010; 5:e10992.

24. Recordon-Pinson, Papuchon, Reigadas, et al.: K65R in subtype C HIV-1 isolates from patients failing on a first-line regimen including d4T or AZT: Comparison of Sanger and UDP sequencing data. PLoS One, 2012; 7:e36549.

25. Nicot, Saliou, Raymond, et al.: Minority variants associated with resistance to HIV-1 nonnucleoside reverse transcriptase inhibitors during primary infection. J Clin Virol, 2012; 55:107–113.

26. Li, Paredes, Ribaudo, et al.: Low-frequency HIV-1 drug resistance mutations and risk of NNRTI-based antiretroviral treatment failure: A systematic review and pooled analysis. JAMA, 2011; 305:1327–1335.

27. Li, Paredes, Ribaudo, et al.: Impact of minority nonnucleoside reverse transcriptase inhibitor resistance mutations on resistance genotype after virologic failure. J Infect Dis, 2013; 207:893–897.

28. Cozzi-Lepri, Noguera-Julian, Di Giallonardo, et al.: Low-frequency drug-resistant HIV-1 and risk of virological failure to first-line. J Antimicrob Chemother, 2015; 70:930–940.