Mexican HIV-1 Protease Sequence Diversity

Abstract

Protease, one of three enzymes encoded within the HIV pol gene, cleaves the viral Gag and Gag-Pol polyproteins into mature viral proteins and is a target of current antiretroviral therapy. Analyses of protease diversity in Latin America have been lacking despite extensive study of protease inhibitor resistance mutations. We studied the diversity of 777 Mexican protease sequences and found that all were subtype B except one (CRF02_AG). Phylogenetic analysis suggested the existence of six different clades with geospecific contributions. Thirty-three percent of sites were conserved, 25% had conservative substitutions, and 41% exhibited physicochemical changes. The most conserved regions were those surrounding the active site, most of the flap domain, and a region between the 60's loop and the C-terminal triad. A single sequence exhibited an active site mutation (T26S). Variable sites were mapped onto a crystallographic structure, providing further insight into the distribution and functional relevance of variable sites among Mexican isolates.

During 2017, the World Health Organization (WHO) reported 36.9 million people living with HIV worldwide, 3.4 million in the Americas, and 230,000 ± 25,000 in Mexico. That same year, there were 1.8 million new HIV infections worldwide, 160,000 in the Americas, and 15,000 ± 1500 in Mexico, as well as 940,000, 56,000, and 4000 ± 1000 HIV-related deaths, respectively.¹ The introduction of antiretroviral therapy (ART) has been crucial in decreasing HIV morbidity and mortality, to the point that the life expectancy of recently infected individuals is close to that of the general population.² First-line adult ART currently relies on reverse transcriptase inhibitors, whereas the preferred second-line regimen consists of a nucleoside reverse-transcriptase inhibitor (NRTI) backbone plus a ritonavir-boosted protease inhibitor (PI).³

HIV protease is encoded within the pol region and belongs to the aspartic acid protease family.⁴ It is responsible for the proteolytic cleavage of the Gag polyprotein into matrix, capsid, nucleocapsid, and p6 proteins, as well as for processing the Pol region of the Gag-Pol polyprotein into reverse transcriptase (p51 and p66), protease, and integrase. Processing of the Env polyprotein into gp120 and gp41 relies on cellular proteases, not on the viral protease. The protease is a homodimer of two 99-amino-acid monomers of approximately 11 kDa each (about 22 kDa for the dimer).⁵ As in all aspartic acid proteases, its catalytic residues (D25, T26, G27) are highly conserved.⁶,⁷ Protease sequence diversity is mostly a consequence of ART-driven selection for antiretroviral resistance mutations (ARVms), which lower drug affinity and enable viral replication despite continued ART. As such, genetic variation in regions encoding viral enzymes poses a major challenge for HIV prevention and treatment.⁸

Here, we present an analysis of the sequence diversity present in integrated HIV-1 sequences from 777 Mexican individuals, including 35 generated by our laboratory from patients residing in the state of San Luis Potosí.⁹ In addition to our local sequences, 742 Mexican nucleotide sequences were retrieved with the "Geography Search Interface" tool of the Los Alamos National Laboratory (LANL) HIV sequence database.¹⁰ Sequences bearing premature stop codons, indels, or evidence of APOBEC3 hypermutation were excluded, as were those with 7% or more nucleotide ambiguity. Antiretroviral resistance mutations, frameshift mutations, indels, and stop codons were identified with the Stanford HIVdb program for genotypic resistance interpretation. Alignments were prepared with Clustal Omega v1.2.1 and reformatted to a uniform HXB2 reference frame with a locally developed, freely accessible tool (Sequence Unanimity Reformatting tool).
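The sequence-level exclusion criteria above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes unaligned, in-frame protease-coding nucleotide strings, treats a length that is not a multiple of three as evidence of a frameshifting indel (in-frame indels would need an alignment to detect), and does not attempt APOBEC3 hypermutation screening. All function names are hypothetical.

```python
# Minimal sketch of the sequence exclusion criteria (hypothetical helpers).
# Assumes unaligned, in-frame protease-coding nucleotide sequences.
# APOBEC3 hypermutation screening is NOT implemented here.

IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")  # IUPAC codes standing for more than one base
STOP_CODONS = {"TAA", "TAG", "TGA"}


def ambiguity_fraction(seq: str) -> float:
    """Fraction of positions carrying an IUPAC ambiguity code."""
    seq = seq.upper()
    return sum(base in IUPAC_AMBIGUOUS for base in seq) / len(seq)


def has_stop_codon(seq: str) -> bool:
    """True if any codon in reading frame 1 is a stop codon.

    The protease-coding region is internal to pol, so any in-frame
    stop codon within it is premature.
    """
    seq = seq.upper()
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq) - 2, 3))


def passes_qc(seq: str, max_ambiguity: float = 0.07) -> bool:
    """Apply the length, stop-codon, and ambiguity filters."""
    if not seq or len(seq) % 3 != 0:  # frameshifting indel or empty input
        return False
    if has_stop_codon(seq):
        return False
    # Sequences with 7% or more ambiguous positions are excluded.
    return ambiguity_fraction(seq) < max_ambiguity


if __name__ == "__main__":
    candidates = {
        "ok": "CCTCAGATCACT",          # codons for PQIT
        "stop": "CCTTAGATCACT",        # TAG in frame
        "ambiguous": "NNNCAGATCACT",   # 3 of 12 positions ambiguous
    }
    for name, seq in candidates.items():
        print(name, passes_qc(seq))
```

Keeping each criterion in its own function makes it easy to report why a sequence was dropped, which is useful when auditing a database download of several hundred records.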

Nucleotide sequences were translated into protein sequences with EMBOSS Transeq. HIV-1 consensus sequences for group M subtypes (A1, A2, B, C, D, F1, F2, G, and H), along with the outlier group O consensus, were obtained from the LANL HIV database.¹¹ Phylogenetic trees were inferred from nucleotide sequences with a Bayesian Markov chain Monte Carlo (MCMC) analysis suite (BEAUti and BEAST), using a general time-reversible substitution model with estimated base frequencies, gamma-distributed rate heterogeneity (four categories) plus invariant sites, and three codon-position partitions, under a strict clock with a constant-size coalescent tree prior.¹²
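At its core, translation with a tool such as EMBOSS Transeq is a lookup of each in-frame codon in the standard genetic code. The sketch below illustrates that step under simplifying assumptions: reading frame 1 only, and any codon containing an ambiguity code or gap is emitted as X (Transeq's own handling of ambiguous codons may differ). The helper names are hypothetical.

```python
# Minimal sketch of frame-1 translation with the standard genetic code
# (NCBI translation table 1). Codons with ambiguity codes or gaps -> "X".

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-codon lookup table in TCAG order (first, second, third base).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nt: str) -> str:
    """Translate reading frame 1; a trailing partial codon is ignored."""
    nt = nt.upper().replace("U", "T")
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    # Illustrative codon string encoding the motif PQITLWQRP.
    print(translate("CCTCAGATCACTCTTTGGCAACGACCC"))
```

A 297-nt protease sequence yields the 99-residue protein analyzed in the study.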

The 50 million sampled trees were summarized, after discarding the first 10% as burn-in, into a single maximum clade credibility tree using TreeAnnotator v1.10.4 (A. Rambaut and A. J. Drummond, Institute of Evolutionary Biology, University of Edinburgh), which was then drawn as a radial cladogram with FigTree v1.4.4 (A. Rambaut and A. J. Drummond) to highlight clade diversity and composition. Consensus amino acid frequencies and Shannon entropy were calculated online from amino acid sequences, with entropy increases measured relative to the HXB2 reference.¹³ Entropy levels were arbitrarily classified as high (>0.6), mid (0.2–0.6), or low (<0.2).
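Per-site Shannon entropy is computed from the residue frequencies in each alignment column, H = −Σ p ln p. The sketch below computes it and applies the thresholds stated above (high > 0.6, mid 0.2–0.6, low < 0.2). It assumes the natural logarithm and treats every character, including gaps, as a state; the LANL tool's conventions (logarithm base, gap handling, and its comparison against HXB2) may differ, so values from this sketch are not directly comparable to Figure 4.

```python
import math
from collections import Counter
from collections.abc import Sequence


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (natural log) of the residues in one alignment column."""
    n = len(column)
    # p * ln(1/p) summed over observed residues; equals -sum(p ln p).
    return sum((c / n) * math.log(n / c) for c in Counter(column).values())


def entropy_class(h: float) -> str:
    """Thresholds from the text: high > 0.6, mid 0.2-0.6, low < 0.2."""
    if h > 0.6:
        return "high"
    if h >= 0.2:
        return "mid"
    return "low"


def site_entropies(alignment: Sequence[str]) -> list[float]:
    """Entropy of each column of equal-length aligned protein sequences."""
    return [column_entropy(column) for column in zip(*alignment)]


if __name__ == "__main__":
    alignment = ["PQIT", "PQVT", "PQIS", "PQIT"]  # toy, hypothetical alignment
    for pos, h in enumerate(site_entropies(alignment), start=1):
        print(pos, round(h, 3), entropy_class(h))
```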

Nucleotide sequences produced by our laboratory have been deposited in GenBank with accession nos. KT869026, KT869027, KT869029, KT869030, KT869034–KT869036, KT869041–KT869043, KT869045–KT869048, KT869051–KT869056, KT869059–KT869062, and KT869066–KT869076.
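As a consistency check, the accession ranges listed above can be expanded programmatically; they should yield the 35 locally generated sequences mentioned earlier. The parser below is a hypothetical helper that assumes each entry is an uppercase letter prefix followed by a fixed-width number, with ranges written using an en dash or hyphen.

```python
import re

ACCESSION = re.compile(r"([A-Z]+)(\d+)")


def expand_accessions(spec: str) -> list[str]:
    """Expand a comma-separated accession list containing 'A–B' ranges."""
    expanded = []
    # Split on commas, also swallowing a final "and".
    for item in re.split(r",\s*(?:and\s+)?", spec.strip().rstrip(".")):
        bounds = re.split(r"\s*[–-]\s*", item.strip())
        prefix, first = ACCESSION.fullmatch(bounds[0]).groups()
        last = ACCESSION.fullmatch(bounds[-1]).group(2)
        for number in range(int(first), int(last) + 1):
            expanded.append(f"{prefix}{number:0{len(first)}d}")
    return expanded


DEPOSITED = (
    "KT869026, KT869027, KT869029, KT869030, KT869034–KT869036, "
    "KT869041–KT869043, KT869045–KT869048, KT869051–KT869056, "
    "KT869059–KT869062, and KT869066–KT869076"
)

if __name__ == "__main__":
    accessions = expand_accessions(DEPOSITED)
    print(len(accessions), accessions[0], accessions[-1])
```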

In all, 777 nucleotide (297 bp) and amino acid (99 sites) sequences were analyzed. In the nucleotide phylogeny, 776 of the Mexican sequences formed six different clades within subtype B (clades I–VI in Fig. 1). A single isolate (MX460), a CRF02_AG circulating recombinant form, was more similar to the subtype A and G consensus sequences. This isolate was obtained from a Mexican mestizo patient who had previously resided in the United States. It should be noted, however, that the short 297-nt protease-coding region is barely sufficient to distinguish HIV-1 subtype D sequences from those of subtype B, a limitation that could also hinder the proper identification of epidemiologically relevant sublineages. This warrants the characterization of full-length pol sequences of Mexican origin.
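Subtype assignment in this study relied on Bayesian phylogenetic placement. A much cruder screen, shown here only to illustrate what it means for a sequence to be closer to some consensus references than to others, is pairwise percent identity over aligned, ungapped columns. The sequences in the example are hypothetical toy strings, not LANL consensus sequences, and over a 297-nt fragment such a screen inherits the resolution limits noted above.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return 100 * sum(x == y for x, y in pairs) / len(pairs)


def closest_reference(query: str, references: dict[str, str]) -> str:
    """Name of the reference with the highest percent identity to the query."""
    return max(references, key=lambda name: percent_identity(query, references[name]))


if __name__ == "__main__":
    # Toy, hypothetical aligned sequences (not real consensus sequences).
    refs = {"B-like": "AAAAAAAAAA", "AG-like": "AAAAACCCCC"}
    print(closest_reference("AAAACCCCCC", refs))
```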

FIG. 1.

Radial phylogram highlighting clades of Mexican HIV-1 protease-encoding nucleotide sequences. Phylogenetic trees reconstructed from nucleotide sequences by Bayesian Markov chain Monte Carlo analysis were reformatted with FigTree to illustrate six clades (I–VI). The root of the tree harbors the group O consensus and other HIV-1 group M subtype consensus sequences (A1, A2, C, F1, F2, G, and H). Sample MX460 is identified as a CRF02_AG recombinant, forming a subclade with the A and G consensus sequences.

As shown in the map in Figure 2 and summarized in Table 1, the most extensively sampled states (Mexico City and the State of Mexico) were represented in all clades. Other states contributing substantially to individual clades include Morelos, Nuevo Leon, and Oaxaca (clade I); Jalisco, Oaxaca, and Veracruz (clade II); Jalisco, Oaxaca, and Puebla (clade III); Jalisco, Nuevo Leon, and Veracruz (clade IV); Jalisco and Morelos (clade V); and Jalisco and San Luis Potosí (clade VI).

FIG. 2.

Map depicting the Mexican states for which HIV protease sequences were available at the time of the study (gray); numbers in parentheses indicate the number of sequences available.

Table 1.

Percent Contribution of Each Mexican State to the Total Number of Sequences Composing Each Clade (I–VI)

Clade	BC	CH	CS	CX	EM	GR	HG	JC	MI	MO	NL	OA	PU	SL	SO	TL	VE
Clade I (n = 239)	14 (5.9)	3 (1.3)	9 (3.8)	32 (13.4)	43 (18.0)	9 (3.8)	0 (0.0)	29 (12.1)	1 (0.4)	19 (7.9)	20 (8.4)	18 (7.5)	11 (4.6)	12 (5.0)	3 (1.3)	0 (0.0)	16 (6.7)
Clade II (n = 124)	1 (0.8)	1 (0.8)	5 (4.0)	21 (16.9)	23 (18.5)	4 (3.2)	0 (0.0)	22 (17.7)	0 (0.0)	5 (4.0)	8 (6.5)	10 (8.1)	9 (7.3)	3 (2.4)	2 (1.6)	0 (0.0)	10 (8.1)
Clade III (n = 132)	1 (0.8)	0 (0.0)	7 (5.3)	16 (12.1)	37 (28.0)	4 (3.0)	0 (0.0)	13 (9.8)	0 (0.0)	10 (7.6)	9 (6.8)	15 (11.4)	11 (8.3)	2 (1.5)	1 (0.8)	1 (0.8)	5 (3.8)
Clade IV (n = 115)	3 (2.6)	0 (0.0)	3 (2.6)	16 (13.9)	17 (14.8)	4 (3.5)	0 (0.0)	13 (11.3)	0 (0.0)	7 (6.1)	17 (14.8)	6 (5.2)	7 (6.1)	7 (6.1)	1 (0.9)	0 (0.0)	14 (12.2)
Clade V (n = 79)	3 (3.8)	1 (1.3)	2 (2.5)	17 (21.5)	14 (17.7)	1 (1.3)	1 (1.3)	9 (11.4)	0 (0.0)	11 (13.9)	5 (6.3)	3 (3.8)	3 (3.8)	2 (2.5)	1 (1.3)	0 (0.0)	6 (7.6)
Clade VI (n = 87)	6 (6.9)	1 (1.1)	2 (2.3)	17 (19.5)	15 (17.2)	3 (3.4)	0 (0.0)	8 (9.2)	0 (0.0)	4 (4.6)	5 (5.7)	6 (6.9)	5 (5.7)	8 (9.2)	2 (2.3)	0 (0.0)	5 (5.7)
Total (n = 776)	28	6	28	119	149	25	1	94	1	56	64	58	46	34	10	1	56

Cells show the number of sequences, with the percentage of the clade in parentheses.
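Each percentage in Table 1 is the state's count divided by the clade size, multiplied by 100 and rounded to one decimal place. The short check below reproduces the Mexico City (CX) column; clade sizes are taken from the row labels and sum to the 776 subtype B sequences, since MX460 (CRF02_AG) is not assigned to any clade.

```python
# Clade sizes from the row labels of Table 1.
CLADE_SIZES = {"I": 239, "II": 124, "III": 132, "IV": 115, "V": 79, "VI": 87}

# Mexico City (CX) counts per clade, copied from Table 1.
CX_COUNTS = {"I": 32, "II": 21, "III": 16, "IV": 16, "V": 17, "VI": 17}


def percent_contribution(count: int, clade: str) -> float:
    """Share of a clade's sequences contributed by one state, in percent."""
    return round(100 * count / CLADE_SIZES[clade], 1)


if __name__ == "__main__":
    for clade, count in CX_COUNTS.items():
        print(clade, percent_contribution(count, clade))
    print("sequences in clades:", sum(CLADE_SIZES.values()))
    print("CX total:", sum(CX_COUNTS.values()))
```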

The distribution of amino acid variation among the 777 Mexican sequences is summarized in Figure 3. Most isolates (97%) carried V3I, which is typical of subtype B sequences. Six primary antiretroviral resistance mutations (ARVms) were observed (D30N, L33F, M46L, V82A, N88D, and L90M), as well as 12 secondary ARVms (L10I, L10V, V11I, V11L, K20R, K20M, K20I, K43T, G48R, Q58E, A71T, and A71V). Surveillance drug resistance mutations were identified in 12 sequences (MX071, KC168417, KC168143, KC169717, KC169595, KC168811, KC168469, KC169700, KC169346, KC169134, KC168724, and KC168376). Approximately 33% of amino acid positions (33/99) were conserved, 25% (25/99) exhibited conservative substitutions, and 41% (41/99) showed physicochemical changes.
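The three-way site classification (conserved, conservative substitution, physicochemical change) can be sketched as below. The text does not state which residue grouping was used; this example assumes a simple four-class scheme (nonpolar, polar, acidic, basic), in line with the "polar to acidic" wording used later, and calls a site conservative only if every observed substitution stays within the reference residue's class. A different grouping would give different counts.

```python
from collections.abc import Iterable

# Assumed physicochemical grouping (not necessarily the one used in the study).
GROUPS = {
    "nonpolar": "GAVLIMFWP",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def classify_site(reference_aa: str, observed: Iterable[str]) -> str:
    """Classify one alignment column relative to its reference residue."""
    # Ignore the reference residue itself, gaps, and ambiguous residues (X).
    substitutions = {aa for aa in observed if aa != reference_aa and aa in GROUP_OF}
    if not substitutions:
        return "conserved"
    if all(GROUP_OF[aa] == GROUP_OF[reference_aa] for aa in substitutions):
        return "conservative"
    return "physicochemical change"


if __name__ == "__main__":
    print(classify_site("D", "DDDD"))   # no substitutions
    print(classify_site("L", "LLIV"))   # nonpolar to nonpolar only
    print(classify_site("N", "NNND"))   # polar to acidic, as in N88D
```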

FIG. 3.

Mexican protease amino acid sequence alignment summary. Shading highlights protein domains and functional regions: the N- and C-terminal dimerization regions of the terminal domains; the 10's loop, catalytic residues, 60's loop, and C-terminal triad of the core domains; and the elbow and tip-of-the-flap regions of the flap domain. Substitutions marked with a single or double diamond correspond to primary and secondary antiretroviral drug resistance mutations, respectively.

Individual amino acid sequences differed from the HXB2 reference at 1–17 sites (mean ± standard deviation, 7.4 ± 2.4). Extremely conserved protease regions (i.e., sites at which fewer than eight protein sequences, or less than 1% of the compiled sequences, showed variation) included ⁵LWQRP⁹ in the N-terminal domain, positions E²¹–V³² surrounding the catalytic residues, residues M⁴⁶–V⁵⁶ surrounding the tip-of-the-flap region, ⁷³GTVL⁷⁶ and ⁷⁸GPTP⁸¹ in the central C-terminal core domain, N⁸³–N⁸⁸ surrounding the C-terminal triad, and ⁹⁴G–F⁹⁹ spanning the C-terminal dimerization region. An active site mutation (T26S) was present in a single sequence (KC168702). The tip-of-the-flap and C-terminal triad regions were conserved, consistent with their known functional roles.
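The per-sequence difference counts and the "fewer than eight sequences" conservation threshold can be computed as sketched below, assuming protein sequences already aligned to the 99-residue HXB2 frame. Gaps and X are skipped, the standard deviation is the sample standard deviation (the text does not say which was used), and the function names are hypothetical. With 777 sequences, seven variants correspond to 7/777 ≈ 0.9%, just under the 1% stated above.

```python
import statistics
from collections.abc import Sequence

SKIP = {"-", "X"}  # gaps and ambiguous residues are not counted as differences


def differences(seq: str, reference: str) -> int:
    """Number of sites at which an aligned protein sequence differs from the reference."""
    return sum(a != r for a, r in zip(seq, reference) if a not in SKIP)


def difference_summary(seqs: Sequence[str], reference: str) -> tuple[int, int, float, float]:
    """Min, max, mean, and sample standard deviation of per-sequence differences."""
    counts = [differences(s, reference) for s in seqs]
    return min(counts), max(counts), statistics.mean(counts), statistics.stdev(counts)


def highly_conserved_sites(seqs: Sequence[str], reference: str, max_variants: int = 7) -> list[int]:
    """1-based positions where at most `max_variants` sequences differ from the reference."""
    conserved = []
    for pos, ref_aa in enumerate(reference, start=1):
        variants = sum(1 for s in seqs if s[pos - 1] not in SKIP and s[pos - 1] != ref_aa)
        if variants <= max_variants:
            conserved.append(pos)
    return conserved


if __name__ == "__main__":
    toy = ["PQVT", "PQIT", "PQIS"]  # hypothetical 4-residue alignment
    print(difference_summary(toy, "PQVT"))
    print(highly_conserved_sites(toy, "PQVT", max_variants=1))
```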

The C-terminal triad, located in the core domain, is highly conserved in all known retroviral proteases and is involved in proper folding and dimerization.¹⁴ Only two Mexican sequences (KC168417 and KC169346) carried a mutation in this region (N88D), a drastic physicochemical change from a polar to an acidic residue. Notably, this substitution is classified as a primary ARVm according to Stanford HIV Drug Resistance Database criteria. The tip-of-the-flap region is glycine-rich and highly sensitive to amino acid changes, which can profoundly affect protease activity.¹⁵ A single sequence (DQ631417) carried a mutation altering the properties of this region (G49E).

The most variable protease regions were ¹⁰L–K²⁰, spanning the 10's loop of the core domain; ³³L–K⁴³, spanning the elbow region; and ⁶⁰D–I⁷², surrounding the 60's loop. The amino-terminal half of the elbow region is characterized by a stretch of polar residues, followed by basic residues in the carboxy-terminal half. The most common change in this region was from a polar to an acidic residue (S37D, 105 isolates); S37G (25 isolates) and S37A (7 isolates) were also observed. Non-conservative substitutions within the 60's loop were observed in 48 sequences (at C67 in 6, G68 in 6, and H69 in 36). Positions +12 and +63, located in the N-terminal halves of the core domains, and position +37, in the flap domain, hosted the greatest variety of substitutions (12, 15, and 14 different amino acids, respectively).
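Counting how many different amino acids replace the reference residue at a site, as done for positions +12, +37, and +63, amounts to tallying non-reference residues in an alignment column. A minimal sketch, assuming HXB2-aligned protein sequences and 1-based positions; the helper name and the example data are hypothetical.

```python
from collections import Counter
from collections.abc import Sequence


def substitution_spectrum(seqs: Sequence[str], reference: str, position: int) -> Counter[str]:
    """Tally non-reference residues at a 1-based alignment position."""
    ref_aa = reference[position - 1]
    return Counter(
        s[position - 1] for s in seqs if s[position - 1] not in {ref_aa, "-", "X"}
    )


if __name__ == "__main__":
    # Hypothetical single-column example with a serine reference residue.
    column_seqs = ["S", "D", "D", "G", "A", "S"]
    spectrum = substitution_spectrum(column_seqs, "S", 1)
    print(len(spectrum), spectrum.most_common())
```

The number of distinct substitutions at a site is `len(spectrum)`, and `most_common()` gives their frequencies.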

Shannon entropy analysis further illustrates protein variation (Fig. 4). High entropy was observed in residues surrounding the 10's loop, the elbow region, and the 60's loop, and in the first half of the C-terminal domain. Five positions exhibited high entropy (>0.6): position +12, flanking the 10's loop (12 different substitutions); position +37, within the elbow region of the flap domain (14 substitutions); positions +63 and +71, flanking the 60's loop (15 and 3 substitutions, respectively); and position +93, within the C-terminal domain (a single substitution). Mid-level entropy was observed at 10 sites: position +10, at the interface of the N-terminal and core domains (4 substitutions); positions +16 and +19, in the 10's loop (2 and 6 substitutions, respectively); positions +36 and +39, near the elbow region (4 substitutions each); and positions +64, +67, +69, +70, and +72, in or near the 60's loop (3, 5, 7, 9, and 7 substitutions, respectively).

FIG. 4.

Mexican protease protein sequence Shannon entropy map. The frequency of the most common amino acid at each site in the Mexican protease sequences is shown as a continuous line, and the Shannon entropy of each site is shown as bars. Boxes indicate protein regions, and brackets at the top indicate domains.

The entropy levels were mapped onto the X-ray crystal structure of the HIV-1 protease dimer in complex with the antiretroviral drug saquinavir (PDB ID: 3OXC) using PyMOL (The PyMOL Molecular Graphics System, Version 2.0, Schrödinger, LLC) and are provided as Supplementary Figure S1. In this figure, the protease dimer is represented as a blue cartoon with a surface mesh, saquinavir is shown in gray, and sites of mid and high entropy are shown as solid surfaces colored orange and red, respectively. All sites exhibiting high or mid-level entropy were located on the protein surface and confined to an area between the elbow and the 60's loop. As expected, variation was tolerated in regions not subject to strong selective constraints, in contrast to the catalytic site and other structurally important regions, which remained conserved.
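One common way to map per-site values such as entropy onto a structure is to write them into the B-factor column of the PDB file and then color by B-factor in a viewer; this is not necessarily how Supplementary Figure S1 was produced. The sketch below performs only the file rewrite. It assumes fixed-column PDB records (residue number in columns 23–26, B-factor in columns 61–66), edits only ATOM records so that HETATM records such as the bound inhibitor are left untouched, and applies the same value to both monomers when they share residue numbering; entries that number the second chain differently (for example, 101–199) would need an offset. The function name is hypothetical.

```python
from collections.abc import Iterable, Mapping


def set_bfactors(pdb_lines: Iterable[str], values: Mapping[int, float],
                 default: float = 0.0) -> list[str]:
    """Replace the B-factor field of ATOM records with per-residue values.

    Residue number: columns 23-26 (slice [22:26]).
    B-factor:       columns 61-66 (slice [60:66]).
    HETATM records and all other lines are returned unchanged.
    """
    out = []
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) >= 66:
            resi = int(line[22:26])
            value = values.get(resi, default)
            line = f"{line[:60]}{value:6.2f}{line[66:]}"
        out.append(line)
    return out


if __name__ == "__main__":
    # One ATOM record assembled field by field to respect PDB column widths.
    atom = ("ATOM  " + "    1" + "  N  " + " PRO" + " A" + "   1" + "    "
            + "  -2.555" + "   9.253" + "  34.411" + "  1.00" + " 30.60"
            + "           N")
    print(set_bfactors([atom], {1: 0.75})[0])
```

The rewritten file can then be loaded in PyMOL and colored by B-factor, for example with its `spectrum` command.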

It remains unknown whether this distribution of variable sites provides HIV with a biological advantage related to the processing of the Gag and Gag-Pol polyproteins and of Nef, to immune escape, or to interference with (or enhancement of) interactions with other viral or host proteins. However, recent evidence has shown that residues influencing PI docking are not necessarily located in or near the active site and may instead lie in distal regions such as the dimerization interface or the flap.¹⁶ Although previous publications have addressed the frequency of PI ARVms, a study of the structural distribution of mutations among Latin American isolates had been lacking. Protease diversity in Mexico and other parts of Latin America is likely to expand in the near future owing to current human migration dynamics.

This study represents the first attempt to characterize the extent of protease diversity in Mexican HIV-1 isolates and provides a structural overview of its distribution. Unprecedented efforts in genomics, structural virology, and pharmacology have helped turn HIV infection into a manageable chronic disease with sustained control of viral replication.

Integrating these results with other structural, biophysical, molecular dynamics, and virologic data should further our understanding of HIV evolution, structure, and function, and support future research into vaccine development and antiretroviral therapies targeting HIV proteins.

Footnotes

Acknowledgments

The authors thank the physicians and patients attended by CAPASITS San Luis Potosí for providing the samples and understanding the scope and importance of this study.

Author Disclosure Statement

No competing financial interests exist.

Funding Information

This project was funded by the Mexican National Science and Technology Council through grants CONACYT I0017 (CB-2011-01) no. 167374 and SSA/IMSS/ISSSTE-CONACYT (FONSALUD-2009-01) no. 115226.

Supplementary Material

Supplementary Figure S1

References

1. World Health Organization: HIV/AIDS data and statistics. Available at https://www.who.int/hiv/data/en/ (2019), accessed September 26, 2019.

2. Weissberg, Mubiru, Kambugu, et al.: Ten years of antiretroviral therapy: Incidences, patterns and risk factors of opportunistic infections in an urban Ugandan cohort. PLoS One, 2018; 13:e0206796.

3. WHO: Consolidated Guidelines on the Use of Antiretroviral Drugs for Treating and Preventing HIV Infection: Recommendations for a Public Health Approach. Geneva: World Health Organization, 2016, p. 480.

4. Brik, Wong: HIV-1 protease: Mechanism and drug discovery. Org Biomol Chem, 2003; 1:5–14.

5. Meher, Patel: Structural and dynamical aspects of HIV-1 protease and its role in drug resistance. Adv Protein Chem Struct Biol, 2013; 92:299–324.

6. Kohl, Emini, Schleif, et al.: Active human immunodeficiency virus protease is required for viral infectivity. Proc Natl Acad Sci U S A, 1988; 85:4686–4690.

7. Pearl, Taylor: A structural model for the retroviral proteases. Nature, 1987; 329:351–354.

8. Vasudevachari, Zhang, Imamichi, Falloon, Salzman: Emergence of protease inhibitor resistance mutations in human immunodeficiency virus type 1 isolates from patients and rapid screening procedure for their detection. Antimicrob Agents Chemother, 1996; 40:2535–2541.

9. Hernandez-Sanchez, Guerra-Palomares, Ramirez-GarciaLuna, Arguello, Noyola, Garcia-Sepulveda: Prevalence of drug resistance mutations in protease, reverse transcriptase, and integrase genes of North Central Mexico HIV isolates. AIDS Res Hum Retroviruses, 2018; 34:498–506.

10. Los Alamos National Laboratory: HIV sequence database, distribution of HIV-1 sequences. Available at https://www.hiv.lanl.gov/components/sequence/HIV/geo/geo.comp, accessed September 26, 2019.

11. Los Alamos National Laboratory: HIV sequence alignments. Available at https://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html, accessed September 26, 2019.

12. Suchard, Lemey, Baele, Ayres, Drummond, Rambaut: Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol, 2018; 4:vey016.

13. Los Alamos National Laboratory: Entropy: Shannon entropy-two. Available at https://www.hiv.lanl.gov/components/sequence/HIV/geo/geo.comp, accessed September 26, 2019.

14. Louis, Ishima, Torchia, Weber: HIV-1 protease: Structure, dynamics, and inhibition. Adv Pharmacol, 2007; 55:261–298.

15. Shao, Everitt, Manchester, Loeb, Hutchison, Swanstrom: Sequence requirements of the HIV-1 protease flap region determined by saturation mutagenesis and kinetic analysis of flap mutants. Proc Natl Acad Sci U S A, 1997; 94:2243–2248.

16. Mobaraki, Hemmateenejad, Weikl, Sakhteman: On the relationship between docking scores and protein conformational changes in HIV-1 protease. J Mol Graph Model, 2019; 91:186–193.
