Short Communication: HIV-DRLink: A Tool for Reporting Linked HIV-1 Drug Resistance Mutations in Large Single-Genome Data Sets Using the Stanford HIV Database

Abstract

The prevalence of HIV-1 drug resistance is increasing worldwide and monitoring its emergence is important for the successful management of populations receiving combination antiretroviral therapy. It is likely that pre-existing drug resistance mutations linked on the same viral genomes are predictive of treatment failure. Because of the large number of sequences generated by ultrasensitive single-genome sequencing (uSGS) and other similar next-generation sequencing methods, it is difficult to assess each sequence individually for linked drug resistance mutations. Several software programs exist to report the frequencies of individual mutations in large data sets, but they provide no information on linkage of resistance mutations. In this study, we report the HIV-DRLink program, a research tool that provides resistance mutation frequencies as well as their genetic linkage by parsing and summarizing the Sierra output from the Stanford HIV Database. The HIV-DRLink program should only be used on data sets generated by methods that eliminate artifacts due to polymerase chain reaction recombination, for example, standard single-genome sequencing or uSGS. HIV-DRLink is exclusively a research tool and is not intended to inform clinical decisions.

Introduction

The advent of combination antiretroviral therapy (cART) changed HIV/AIDS from a deadly disease in most individuals to one that can be managed with lifelong treatment.¹ However, owing to the high genetic diversity of HIV-1, low-frequency drug resistance mutations exist in patients even before cART initiation,^2–4 and these low-frequency mutations may lead to the emergence of drug-resistant viral rebound and treatment failure. Recently, acquired and transmitted drug resistance have become quite prevalent in some parts of the world,^5,6 constituting a major barrier to successful treatment of HIV-1.

The Stanford HIV Database (Stanford HIVdb) is a reliable and accurate tool for interpreting HIV drug resistance in population genotypes⁷ (https://hivdb.stanford.edu). Although important, the detection of individual pre-existing drug resistance mutations does not capture resistance mutations that are linked on the same viral genomes, which were recently shown, in one study, to be associated with cART failure.⁸

One difficulty in the interpretation of linked mutations is that bulk polymerase chain reaction (PCR) may result in artifactual recombination.⁹ Therefore, linked mutations can only be identified in sequences obtained using methods that eliminate PCR recombination, such as single-genome sequencing (SGS),¹⁰ ultrasensitive single-genome sequencing (uSGS),¹¹ or other similar next-generation sequencing (NGS) methods,^12,13 in conjunction with pipelines that omit sequences resulting from PCR recombination and possible nucleotide mixtures. In brief, uSGS uses primer IDs like many other NGS approaches^14–17; however, the Illumina adapters are added by ligation rather than by PCR, significantly reducing the bias and recombination that are inherent to amplification with long PCR primers.

Although uSGS can only obtain sequence reads up to ∼500 base pairs from amplicons up to 1 Kb in length, it is possible to link protease inhibitor (PI) resistance mutations to reverse transcriptase (RT) mutations by obtaining 250 base pair reads from one end of the 1 Kb amplicon and 250 base pair reads from the other end.

Significant new changes to the PacBio platform and chemistry allow for much longer sequence reads (up to the full-length HIV genome) with much higher accuracy than previously possible with this approach.¹⁸ PacBio technology may allow for future investigations of linkage between PI, RT, integrase (IN), and even Env drug resistance mutations^19,20 when applied to single-genome amplicons. Although many programs exist for the analysis of HIV sequencing data including their assembly, base-calling, and mutation frequencies,^21–28 these programs do not detect linkage of drug resistance mutations on the same viral genomes.

In this study, we describe a tool called HIV-DRLink that can quickly process thousands of HIV-1 sequences using the Stanford HIVdb server to report not only the frequencies of single drug resistance mutations in the population but also the frequency of mutations linked on the same viral genomes, which may be predictive of cART failure. HIV-DRLink is intended as a research tool that parses the output of Stanford HIVdb to report linked HIV-1 drug resistance mutations. It should only be used for analyzing high-quality sequences, from any platform, that are free of artifactual recombination, such as those obtained by uSGS. HIV-DRLink is not intended for informing clinical decisions.

Materials and Methods

HIV-DRLink description

HIV-DRLink is based on the Stanford HIVdb genotypic resistance interpretation program using the Stanford command line program “Sierra Web Service 2.0: 2016—present” (https://hivdb.stanford.edu/page/webservice). Therefore, the first step in the HIV-DRLink pipeline is submission of the data set, in FASTA format, to the Stanford HIVdb. Alignment of sequences and specific information in the sequence headers are not required. For large-scale sequence data, users must download and install the Python client SierraPy from Stanford HIVdb (https://hivdb.stanford.edu/page/webservice). The output of SierraPy is a JavaScript Object Notation (JSON) file. HIV-DRLink is then used to parse the output file to calculate the frequencies of linked and individual drug resistance mutations in a sample population.

The output of HIVdb Sierra JSON files can be extensive. To simplify the output, a GraphQL query is used to select only the gene names (PR, RT, or IN), the mutation types (e.g., primary), and the specific mutations. The GraphQL query used in the pipeline is stored in a plain text file called “simple_mutations.gql”, which reads:

inputSequence {
  header
},
mutations {
  gene {name}
  primaryType
  text
}

Two steps are used to run the pipeline:

Step 1: Submit fasta formatted sequences to Stanford HIVdb using the following command line after the Python client SierraPy is installed locally:

sierrapy fasta input_file.fasta simple_mutations.gql -o output.json

Here, output.json is an example; the output file can be given any name.

Step 2: Run HIV-DRLink.pl on the output file, output.json, from step 1 using the following command line:

HIV-DRLink.pl output.json

It should be noted that mutations or polymorphisms in nondrug resistance positions are ignored, and thus the sequences with the same patterns of drug resistance mutations may not be identical at nonresistance sites. The results of HIV-DRLink are reported in a tab delimited text file. HIV-DRLink.pl is available at the GitHub code repository at (https://github.com/Wei-Shao/HIV-DRLink).
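The parsing step can be illustrated with a minimal Python sketch. It is not the HIV-DRLink.pl implementation, which is written in Perl. The sketch assumes that the SierraPy output is a JSON list with one record per sequence, shaped by the simple_mutations.gql query, and that mutations at nonresistance positions carry a primaryType of "Other". The names summarize_linkage and sort_key, and the sample records, are hypothetical.

```python
import json
import re
from collections import Counter

# Assumption: Sierra labels mutations at nonresistance positions as "Other".
NON_DRM_TYPE = "Other"
GENE_ORDER = {"PR": 0, "RT": 1, "IN": 2}


def sort_key(label):
    """Order labels such as 'RT:M184V' by gene, then by codon position."""
    gene, text = label.split(":")
    position = int(re.search(r"\d+", text).group())
    return GENE_ORDER.get(gene, len(GENE_ORDER)), position, text


def summarize_linkage(records):
    """Count linked DRM patterns and individual DRMs across all sequences."""
    pattern_counts = Counter()
    mutation_counts = Counter()
    for record in records:
        # Keep drug resistance mutations only; ignore other polymorphisms.
        drms = {
            f"{m['gene']['name']}:{m['text']}"
            for m in record.get("mutations", [])
            if m.get("primaryType") != NON_DRM_TYPE
        }
        if not drms:
            continue  # a sequence without DRMs contributes no pattern
        pattern_counts[tuple(sorted(drms, key=sort_key))] += 1
        mutation_counts.update(drms)

    total = len(records)

    def with_percent(counts):
        return {k: (n, round(100 * n / total, 2)) for k, n in counts.items()}

    return with_percent(pattern_counts), with_percent(mutation_counts)


# Hypothetical records mimicking the assumed output shape.
example = json.loads("""[
  {"inputSequence": {"header": "seq1"}, "mutations": [
    {"gene": {"name": "RT"}, "primaryType": "NRTI", "text": "M41L"},
    {"gene": {"name": "RT"}, "primaryType": "Other", "text": "V35T"},
    {"gene": {"name": "RT"}, "primaryType": "NRTI", "text": "T215Y"}]},
  {"inputSequence": {"header": "seq2"}, "mutations": [
    {"gene": {"name": "RT"}, "primaryType": "NRTI", "text": "T215Y"},
    {"gene": {"name": "RT"}, "primaryType": "NRTI", "text": "M41L"}]},
  {"inputSequence": {"header": "seq3"}, "mutations": [
    {"gene": {"name": "RT"}, "primaryType": "NNRTI", "text": "K103N"}]},
  {"inputSequence": {"header": "seq4"}, "mutations": [
    {"gene": {"name": "RT"}, "primaryType": "Other", "text": "V35T"}]}
]""")

patterns, singles = summarize_linkage(example)
for pattern, (count, percent) in patterns.items():
    print(", ".join(pattern), count, percent, sep="\t")
```

In this sketch, seq1 and seq2 fall into the same pattern even though seq1 carries an additional nonresistance polymorphism, which mirrors the behavior described above for sequences that differ only at nonresistance sites.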

Meta sequence data

The pipeline was tested for speed and performance using HIV-1 subtype B sequences from Los Alamos HIV Sequence Database (https://www.hiv.lanl.gov/content/sequence/HIV/mainpage.html), including 500 pol (RT only) sequences and 200 full-length pol sequences (nucleotide positions 2045 to 5200). Because the sequences from the Los Alamos Database typically result from bulk sequencing, not single-genome sequencing, they are used for proof of principle only and, thus, reported mutations may not actually be genetically linked on the same viral genomes. For the pipeline speed and performance test, 23,781 sequences of the RT encoding fragment of pol from all subtypes were downloaded from Los Alamos HIV Sequence Database.

Clinical sequence data

Although HIV-DRLink can be used to report linkage of drug resistance mutations in sequences obtained by technologies that omit artifactual recombination, Illumina MiSeq-based uSGS data were used here to validate the pipeline on sequences obtained from a clinical sample.¹¹ The clinically derived sequences were obtained from GenBank (Accession Nos. KY810858–KY812454). After filtering to remove low-quality reads, the paired end fastq files were used for bioinformatics processing to generate HIV-1 sequences of 404 bases in length that covered RT from codons 59 to 131 and from 166 to 226.¹¹

All sequences used in this manuscript were obtained from published papers and public databases. No additional IRB approval was needed.

Results

Testing the accuracy of HIV-DRLink on sequences obtained from Los Alamos HIV Database

Table 1 gives the results of an HIV-DRLink run on 500 patient-derived HIV sequences obtained from the Los Alamos database. As stated in the methods, although the training data set contains sequences generated by bulk PCR and sequencing and, therefore, true linkage cannot be determined with such data, it is used here only to assess the ability of the pipeline to accurately report drug resistance mutations and to assess its rate of processing large data sets.

Table 1.

Drug Resistance Frequencies of HIV-1 Subtype B Reverse Transcriptase Sequences

				RT DRM
DRM pattern no.	DRM pattern	No. of sequences with same pattern	DRM_pattern%	M41L	K65R	D67N	K70R	L74I	L100I	K101E	K103N	K103S	Y181C	M184I	M184V	Y188C	L210W	T215F	T215Y	K219E	K219Q
1	M41L, K70R, Y181C, M184I, L210W, T215Y	1	0.2	+			+						+	+			+		+
2	M41L, D67N, T215Y	1	0.2	+		+													+
3	K101E	1	0.2							+
4	M184V	1	0.2												+
5	L210W	1	0.2														+
6	M41L, M184V, L210W, T215Y	1	0.2	+											+		+		+
7	D67N, K70R, T215Y	1	0.2			+	+												+
8	Y188C	1	0.2													+
9	K70R	1	0.2				+
10	M41L, M184I, L210W, T215Y	1	0.2	+										+			+		+
11	M41L, D67N, M184I, L210W, T215Y	1	0.2	+		+								+			+		+
12	D67N, K70R, M184V, K219E	1	0.2			+	+								+					+
13	M41L, D67N, K70R, L74I, T215F, K219Q	2	0.4	+		+	+	+										+			+
14	M41L, D67N, L100I, K103N, T215Y	2	0.4	+		+			+		+								+
15	M41L, D67N, K70R, M184V, L210W, T215F, K219Q	2	0.4	+		+	+								+		+	+			+
16	D67N, K70R, M184V, K219Q	2	0.4			+	+								+						+
17	M41L, D67N, K70R, T215F, K219Q	2	0.4	+		+	+											+			+
18	M41L, D67N, M184V, L210W, T215Y	2	0.4	+		+									+		+		+
19	M41L, K103S, M184I, L210W, T215Y	2	0.4	+								+		+			+		+
20	M41L, D67N, K70R, M184V, T215F, K219Q	2	0.4	+		+	+								+			+			+
21	K65R, M184V	2	0.4		+										+
22	M41L, M184V, T215Y	2	0.4	+											+				+
23	K70R, M184V	2	0.4				+								+
24	D67N, K70R, T215F, K219Q	3	0.6			+	+											+			+
25	M41L, T215Y	4	0.8	+															+
26	M41L, D67N, K101E, L210W, T215Y	4	0.8	+		+				+							+		+
27	M41L, D67N, L210W, T215Y	4	0.8	+		+											+		+
28	K103N	7	1.4								+
	Total (% based on 500 total input seq)	56 (11.2)	DRM total (%)	33 (6.6)	2 (0.4)	29 (5.8)	19 (3.8)	2 (0.4)	2 (0.4)	5 (1)	9 (1.8)	2 (0.4)	1 (0.2)	5 (1)	17 (3.4)	1 (0.2)	19 (3.8)	11 (2.2)	26 (5.2)	1 (0.2)	13 (2.6)

Bulk DNA sequences from Los Alamos HIV Sequence Database (https://www.hiv.lanl.gov/content/sequence/HIV/mainpage.html).

DRM, drug resistance mutation; RT, reverse transcriptase.

The first column in Table 1 shows the number assigned to each drug resistance pattern; the second column shows the specific pattern identified, and the third column shows the number of sequence variants that share that particular pattern. Among the 500 sequences retrieved from the Los Alamos HIV Database, 56 had at least one drug resistance mutation (Table 1, bottom row).

Although some sequences had a single resistance mutation, for example, pattern 3 had only K101E, some others had two or more resistance-conferring mutations, for example, pattern 15 had M41L, D67N, K70R, M184V, L210W, T215F, and K219Q. The percentages of each resistance pattern in the population are shown in the fourth column, ranging from 0.2% to 1.4%. The remainder of the columns show the presence of each individual drug resistance mutation with the last row providing the frequency of each in the total population.
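The column layout described above can be sketched in Python as follows. This is an illustration of the report format rather than the HIV-DRLink.pl code. The function name format_report is hypothetical, and the two example patterns and their counts are taken from rows 21 and 28 of Table 1.

```python
import re


def format_report(pattern_counts, total):
    """Render pattern counts as tab-delimited rows with one '+' column per DRM."""
    # One column per DRM seen in any pattern, ordered by codon position.
    columns = sorted(
        {m for pattern in pattern_counts for m in pattern},
        key=lambda m: (int(re.search(r"\d+", m).group()), m),
    )
    header = ["DRM pattern no.", "DRM pattern",
              "No. of sequences with same pattern", "DRM pattern%"] + columns
    lines = ["\t".join(header)]

    # Rows run from the rarest to the most common pattern.
    ordered = sorted(pattern_counts.items(), key=lambda item: item[1])
    for number, (pattern, count) in enumerate(ordered, start=1):
        marks = ["+" if m in pattern else "" for m in columns]
        percent = f"{100 * count / total:.1f}"
        lines.append("\t".join(
            [str(number), ", ".join(pattern), str(count), percent] + marks))

    # Last row: how often each individual DRM occurs in the whole population.
    totals = [sum(c for p, c in pattern_counts.items() if m in p) for m in columns]
    lines.append("\t".join(
        ["", "Total", str(sum(pattern_counts.values())), "DRM total (%)"]
        + [f"{t} ({100 * t / total:.1f})" for t in totals]))
    return "\n".join(lines)


# Two patterns from Table 1 (rows 21 and 28), out of 500 input sequences.
report = format_report({("K65R", "M184V"): 2, ("K103N",): 7}, total=500)
print(report)
```

Percentages are computed against the total number of input sequences, not against the number of sequences that carry a resistance mutation, which matches the "% based on 500 total input seq" convention in the table.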

Although Table 1 shows the drug resistance patterns in RT only, our program can reveal linkage of mutations in protease (PR) and IN and other genes without additional input options or parameters. In addition to the 500 sequences already described, we downloaded an additional 200 bulk pro-pol sequences from the Los Alamos Database to test in the pipeline for analysis of linked mutations in PR–RT–IN.

Supplementary Table S1 gives an HIV-DRLink output file demonstrating that, as in the first set of data, some sequences had only one resistance mutation (e.g., pattern 1 included only S147R in IN), whereas others had “linked” mutations. For example, pattern 6 had M46I and N88D in PR, M41L and T215Y in RT, and G163K in IN, and pattern 34 had V32I, L33F, M46I, I47V, I54L, and I84V in PR, M41L, D67N, K70R, M184V, T215F, and K219Q in RT, and G140S and Q148H in IN. As already stated, because the training data used were generated by bulk sequencing, the mutational patterns described do not report true linkage of mutations on single genomes but demonstrate the accuracy of the program to report patterns of drug resistance mutations in a hypothetical large data set.

Drug resistance detection in clinical samples using HIV-DRLink

To evaluate true linkage of drug resistance mutations on single HIV-1 genomes, we tested the pipeline on plasma HIV-1 RNA sequences obtained by uSGS¹¹ (sequences available at KY810858–KY812454). The plasma sample was obtained from an HIV-infected donor with viremic failure on ART. uSGS yielded 1,597 high-quality single-genome pol sequences covering RT codons from 59 to 131 and from 166 to 226.¹¹

Table 2 gives the 12 different resistance patterns that were detected using the HIV-DRLink program, all of which had linked mutations. Some of the patterns were rare (e.g., patterns 1 to 3 each comprised 0.06% of the population), whereas pattern 12, with four linked mutations, comprised 73% of the population. HIV-DRLink also calculated the levels of individual drug resistance mutations in the sample. For example, 21.54% of the sequences had the D67N mutation and 99% had T215Y.

Table 2.

Frequencies of Linked Drug Resistance in HIV-1 Subtype B Reverse Transcriptase Sequences

				RT DRM
DRM pattern no.	DRM pattern	No. of sequences with same pattern	DRM pattern%	D67N	K101E	Y188C	G190A	L210W	T215F	T215Y
1	L210W, T215Y	1	0.06					+		+
2	K101E, G190A, L210W, T215F	1	0.06		+		+	+	+
3	K101E, Y188C, G190A, L210W, T215Y	1	0.06		+	+	+	+		+
4	D67N, K101E, G190A, L210W	2	0.13	+	+		+	+
5	K101E, L210W, T215Y	2	0.13		+			+		+
6	G190A, L210W	2	0.13				+	+
7	K101E, G190A, T215Y	2	0.13		+		+			+
8	K101E, G190A, L210W	11	0.69		+		+	+
9	D67N, G190A, L210W, T215Y	16	1.00	+			+	+		+
10	G190A, L210W, T215Y	60	3.76				+	+		+
11	D67N, K101E, G190A, L210W, T215Y	326	20.41	+	+		+	+		+
12	K101E, G190A, L210W, T215Y	1,173	73.45		+		+	+		+
	Total (% based on 1,597 total input seq)	1,597 (100)	DRM total (%)	344 (21.54)	1,518 (95.05)	1 (0.06)	1,594 (99.81)	1,595 (99.87)	1 (0.06)	1,581 (99)

Clinical sequences analyzed by uSGS.¹¹

uSGS, ultrasensitive single-genome sequencing.
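As a consistency check, the individual mutation totals in the last row of Table 2 can be recomputed from the pattern counts. The Python sketch below does this with values transcribed from the table; the variable names are illustrative only.

```python
from collections import Counter

# DRM pattern -> number of sequences, transcribed from Table 2.
table2 = {
    ("L210W", "T215Y"): 1,
    ("K101E", "G190A", "L210W", "T215F"): 1,
    ("K101E", "Y188C", "G190A", "L210W", "T215Y"): 1,
    ("D67N", "K101E", "G190A", "L210W"): 2,
    ("K101E", "L210W", "T215Y"): 2,
    ("G190A", "L210W"): 2,
    ("K101E", "G190A", "T215Y"): 2,
    ("K101E", "G190A", "L210W"): 11,
    ("D67N", "G190A", "L210W", "T215Y"): 16,
    ("G190A", "L210W", "T215Y"): 60,
    ("D67N", "K101E", "G190A", "L210W", "T215Y"): 326,
    ("K101E", "G190A", "L210W", "T215Y"): 1173,
}

total_sequences = sum(table2.values())

# A mutation's total is the sum of the counts of every pattern that contains it.
per_mutation = Counter()
for pattern, count in table2.items():
    for mutation in pattern:
        per_mutation[mutation] += count

for mutation, count in per_mutation.items():
    print(f"{mutation}\t{count}\t{100 * count / total_sequences:.2f}%")
```

The pattern counts sum to 1,597, so every sequence in this sample carried one of the 12 patterns, and the recomputed totals (for example, 344 sequences with D67N) agree with the last row of the table.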

Speed of HIV-DRLink

To evaluate HIV-DRLink speed and performance, 23,781 pol sequences from the Los Alamos HIV Sequence Database were submitted to the Stanford HIVdb through the Python client SierraPy (https://hivdb.stanford.edu/page/webservice). Obtaining the list of mutations for each sequence from the Stanford HIVdb took ∼40 minutes, whereas extracting the data and producing the final result with HIV-DRLink took ∼1 minute.

Discussion

We developed and applied a program called HIV-DRLink that is capable of reporting linked and unlinked HIV-1 drug resistance mutations in large data sets of single-genome sequences, making use of the very well annotated and maintained Stanford HIV Drug Resistance Database (https://hivdb.stanford.edu). Although other programs that report mutation frequencies are available,^25,28 they only calculate the frequencies of individual resistance mutations in the HIV-1 population and do not report their linkage. One recent study claimed that mutations linked on the same HIV-1 genomes were associated with virologic failure even when single mutations were not,⁸ highlighting the need for a program to investigate the significance of single versus linked drug resistance mutations in other cohorts of people living with HIV. HIV-DRLink was developed to parse the outputs of Stanford HIVdb to identify linked mutations from data generated by uSGS¹¹ and other similar methods,^12,13 where PCR recombination and mixed nucleotides have been eliminated or shown to be minimal.

To test the accuracy and speed of HIV-DRLink, HIV-1 pol sequences from the Los Alamos HIV Sequence Database were queried and submitted to Stanford HIVdb. It should be noted that most sequences stored in the Los Alamos HIV and Stanford databases are from population sequencing and, therefore, each likely represents a mixture of genomes. Such data sets are used here only to assess the accuracy and speed of the program, not to directly assess linkage of mutations on single genomes. The results show that HIV-DRLink accurately and rapidly reported the frequency of drug resistance mutations by parsing the results from the Stanford HIVdb. HIV-DRLink was further tested on uSGS data obtained from a clinical specimen where each sequence was known to have originated from a single viral template¹¹ and the program reported accurate results in <1 minute.

In conclusion, we developed a tool, HIV-DRLink, that works in conjunction with the Stanford HIVdb to rapidly report linked and unlinked HIV-1 drug resistance mutations in large data sets generated by SGS methods, including the uSGS NGS approach, that eliminate PCR-based recombination and nucleotide mixtures. HIV-DRLink is a necessary tool to further investigate the effect of single versus linked pre-existing drug resistance mutations on the outcome of ART.

Footnotes

Acknowledgments

We acknowledge that a very important part of the pipeline is to obtain and parse information from Stanford HIVdb (https://hivdb.stanford.edu). We thank Mr. Philip Tzou and Dr. Robert Shafer of Stanford HIV Drug Resistance Database for very valuable discussions. We thank Connie Kinna, Anne Arthur, Valerie Turnquist, and Sue Toms for administrative support.

Author Disclosure Statement

No competing financial interests exist.

Funding Information

We acknowledge the funding sources for this study from NCI CCR, the Office of AIDS Research, NIH, NCI intramural funding to M.F.K., and NCI contract no. HHSN261200800001E. J.M.C. was a research professor of the American Cancer Society and was supported by Leidos contract 13XS110.

Supplementary Material

Supplementary Table S1

References

1. Jones, Cremin, Abdullah, et al.: Transformation of HIV from pandemic to low-endemic levels: A public health approach to combination prevention. Lancet, 2014; 384:272–279.

2. Coffin: HIV population dynamics in vivo: Implications for genetic variation, pathogenesis, and therapy. Science, 1995; 267:483–489.

3. Gupta, Gregson, Parkin, et al.: HIV-1 drug resistance before initiation or re-initiation of first-line antiretroviral therapy in low-income and middle-income countries: A systematic review and meta-regression analysis. Lancet Infect Dis, 2018; 18:346–355.

4. Sapozhnikov, Young, Patel, Chiampas, Vaughn, Badowski: Prevalence of HIV-1 transmitted drug resistance in the incarcerated population. HIV Med, 2017; 18:756–763.

5. Phillips, Cambiano, Miners, et al.: Effectiveness and cost-effectiveness of potential responses to future high levels of transmitted HIV drug resistance in antiretroviral drug-naive populations beginning treatment: Modelling study and economic analysis. Lancet HIV, 2014; 1:e85–e93.

6. WHO: HIV drug resistance report 2019. World Health Organization, 2019; p. 68.

7. Tang, Liu, Shafer: The HIVdb system for HIV-1 genotypic resistance interpretation. Intervirology, 2012; 55:98–101.

8. Boltz, Shao, Bale, et al.: Linked dual-class HIV resistance mutations are associated with treatment failure. JCI Insight, 2019; 4:e130118.

9. Shao, Boltz, Spindler, et al.: Analysis of 454 sequencing error rate, error sources, and artifact recombination for detection of low-frequency drug resistance mutations in HIV-1 DNA. Retrovirology, 2013; 10:18.

10. Palmer, Kearney, Maldarelli, et al.: Multiple, linked human immunodeficiency virus type 1 drug resistance mutations in treatment-experienced patients are missed by standard genotype analysis. J Clin Microbiol, 2005; 43:406–413.

11. Boltz, Rausch, Shao, et al.: Ultrasensitive single-genome sequencing: Accurate, targeted, next generation sequencing of HIV-1 RNA. Retrovirology, 2016; 13:87.

12. Jabara, Jones, Roach, Anderson, Swanstrom: Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID. Proc Natl Acad Sci U S A, 2011; 108:20166–20171.

13. Zhou, Jones, Mieczkowski, Swanstrom: Primer ID validates template sampling depth and greatly reduces the error rate of next-generation sequencing of HIV-1 genomic RNA populations. J Virol, 2015; 89:8540–8555.

14. Casbon, Osborne, Brenner, Lichtenstein: A method for counting PCR template molecules with application to next-generation sequencing. Nucleic Acids Res, 2011; 39:e81.

15. Fu, Hu, Wang, Fodor: Counting individual DNA molecules by the stochastic attachment of diverse labels. Proc Natl Acad Sci U S A, 2011; 108:9026–9031.

16. Kinde, Wu, Papadopoulos, Kinzler, Vogelstein: Detection and quantification of rare mutations with massively parallel sequencing. Proc Natl Acad Sci U S A, 2011; 108:9530–9535.

17. Liang, Mo, Dong, et al.: Theoretical and experimental assessment of degenerate primer tagging in ultra-deep applications of next-generation sequencing. Nucleic Acids Res, 2014; 42:e98.

18. Wenger, Peluso, Rowell, et al.: Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol, 2019; 37:1155–1162.

19. Huang, Raley, Jiang, et al.: Towards better precision medicine: PacBio single-molecule long reads resolve the interpretation of HIV drug resistant mutation profiles at explicit quasispecies (haplotype) level. J Data Mining Genomics Proteomics, 2016; 7:182.

20. Van Duyne, Kuo, Pham, Fujii, Freed: Mutations in the HIV-1 envelope glycoprotein can broadly rescue blocks at multiple steps in the virus replication cycle. Proc Natl Acad Sci U S A, 2019; 116:9040–9049.

21. Verbist, Thys, Reumers, et al.: VirVarSeq: A low-frequency virus variant detection pipeline for Illumina sequencing using adaptive base-calling accuracy filtering. Bioinformatics, 2015; 31:94–101.

22. Yang, Charlebois, Macalalad, Henn, Zody: V-Phaser 2: Variant inference for viral populations. BMC Genomics, 2013; 14:674.

23. Zagordi, Bhattacharya, Eriksson, Beerenwinkel: ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data. BMC Bioinformatics, 2011; 12:119.

24. Brumme, Poon: Promises and pitfalls of Illumina sequencing for HIV resistance genotyping. Virus Res, 2017; 239:97–105.

25. Houssaini, Assoumou, Miller, Calvez, Marcelin, Flandre: Scoring methods for building genotypic scores: An application to didanosine resistance in a large derivation set. PLoS One, 2013; 8:e59014.

26. Huber, Metzner, Geissberger, et al.: MinVar: A rapid and versatile tool for HIV-1 drug resistance genotyping by deep sequencing. J Virol Methods, 2017; 240:7–13.

27. Noguera-Julian, Edgil, Harrigan, Sandstrom, Godfrey, Paredes: Next-generation human immunodeficiency virus sequencing for patient management and drug resistance surveillance. J Infect Dis, 2017; 216:S829–S833.

28. SahBandar, Samonte, Telan, et al.: Ultra-deep sequencing analysis on HIV drug-resistance-associated mutations among HIV-infected individuals: First report from the Philippines. AIDS Res Hum Retroviruses, 2017; 33:1099–1106.
