Abstract
Polymorphisms occurring at the p6 gag protein of HIV-1 have been previously found to have an impact on viral fitness and antiretroviral (ARV) resistance, mainly on subtype B genomes. We compared p6 gag variability in a large group of 165 subtype F gag-pol sequences, with 36 subtype B sequences from the same study source, and identified sites of gag-pol coevolution under ARV selection pressure. Subtype-specific differences in the frequency of point mutations, insertions, and deletions previously associated with ARV resistance were found. Also, in our dataset of subtype F genomes a strong association between mutation P5L in the p1/p6 cleavage region of gag and the nelfinavir (NFV) resistance mutation N88DPR was found with no impact on the preference for any of the NFV resistance pathways.
HIV-1 subtype F is one of the most underrepresented HIV-1 subtypes, accounting for less than 1% of infections worldwide, and is quite restricted in its geographic distribution to central Africa, South America, and Eastern Europe. 1
Despite its low frequency, subtype F genomic segments are present in more than 10 CRFs, and in numerous URFs formed as a result of recombination events between subtype F and subtypes B, C, and D (Los Alamos HIV Sequence Database–Circulating Recombinant Forms,
HIV-1 intersubtype diversity has been shown to affect the pattern of drug resistance mutations selected in the pol gene upon treatment failure with the classic antiretroviral drugs targeting the protease (PR) and reverse transcriptase (RT). 6 In particular, we and others have identified subtype F-specific polymorphisms in the viral PR that are considered minor resistance mutations in subtype B, but their impact on ARV resistance is still a matter of study. 7,8
Although HIV-1 resistance is usually linked to an accumulation of mutations in the pol gene relevant for protein–drug interactions, other mutations, mainly in the transframe region outside and upstream of pol, have also been linked to drug resistance. This region encodes the p6 gag protein involved in pol packaging, particle size determination, and budding. 9 Proteolytic cleavage of the gag-pol polypeptide by the HIV-1 protease is essential for maturation and infectivity, and amino acid substitutions in the p1/p6 cleavage sites can participate in the development of resistance to protease inhibitors (PIs). 10 Although p6 gag is usually 52 amino acids long, a high degree of variability in length as well as in amino acid sequence has been reported among different HIV-1 subtypes. 11–13 In subtype B viruses, duplications of the PTAPP domain located at the 5′ region of p6 gag (Tsg101 binding site) were found to be selected in ARV-experienced patients, with an increase in infectivity and resistance to nucleoside analog RT inhibitors. 14 Recently, these duplications were also observed to occur more frequently in ARV-experienced patients than in naive individuals infected with subtype C and F viruses. 13 Unlike the PTAP duplications, which have been studied in detail, other changes in the p6 gag protein still need to be characterized to help understand the relevance of polymorphic or conserved sites in biological processes of the HIV-1 viral cycle, and in the selection and persistence of ARV resistance patterns.
Our aim was to characterize the polymorphisms occurring at the p6 gag protein, and identify gag-pol coevolutionary changes that may affect the resistance mutation patterns in subtype F PR pol genomes present in a large dataset of BF recombinant strains from Argentina.
We analyzed data generated from Genotypic ARV Resistance Tests performed at the Laboratory of Cellular Biology and Retroviruses at the National Pediatric Hospital “Prof. J.P. Garrahan” in Buenos Aires, Argentina. A total of 201 sequences comprising p6 gag, PR pol, and RT pol (codons 1–220) were obtained from free HIV-1 RNA extracted from plasma samples of vertically infected pediatric patients. Only 21 individuals were treatment naive, and the rest were under highly active antiretroviral therapy (HAART), with 55% of the individuals, infected with either subtype B or subtype F, receiving nelfinavir (NFV) as part of their current or previous HAART. Sequences were aligned using the Gene Cutter Tool (
Amino acid changes at p6 gag were characterized as point mutations, insertions, deletions, or duplications. A sequence logo was generated for all p6 gag sequences using WebLogo 15,16 (available at
where s is the number of symbols (20 for proteins) and f(b,i) are the fractions of each amino acid at position i. The third term is a small sample correction, where n is the number of sequences in the alignment. The maximum value of Rsequence is 4.32 for proteins, and the minimum value is zero.
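The per-position information content described above can be sketched in a few lines of Python. This is a minimal illustration, not the WebLogo implementation: it assumes the commonly used formulation Rsequence(i) = log2(s) + Σ f(b,i) log2 f(b,i) − e(n), with the small sample correction e(n) = (s − 1)/(2 ln 2 · n). The function names, the treatment of gaps (ignored), and the clamping of negative values to zero are choices made for this sketch.

```python
import math
from collections import Counter


def information_content(column, s=20):
    """Information content (bits) of one alignment column.

    Assumed formulation: R = log2(s) - H - e(n), where
    H = -sum f(b,i) * log2 f(b,i) and e(n) = (s - 1) / (2 * ln(2) * n).
    Gap characters ('-') are ignored; n counts the remaining residues.
    """
    residues = [aa for aa in column if aa != "-"]
    n = len(residues)
    if n == 0:
        return 0.0
    counts = Counter(residues)
    # Shannon entropy of the observed amino acid fractions f(b,i)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Small sample correction (the "third term" in the text)
    correction = (s - 1) / (2 * math.log(2) * n)
    # Clamp at zero so the value stays within the stated range (sketch choice)
    return max(0.0, math.log2(s) - entropy - correction)


def alignment_information(sequences, s=20):
    """Apply information_content to every column of equal-length sequences."""
    return [information_content(col, s) for col in zip(*sequences)]
```

For a fully conserved column the value approaches log2(20) ≈ 4.32 bits as the number of sequences grows, and a column in which all 20 amino acids are equally frequent yields zero, in line with the range stated above.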
Two Sample Logos 17 (available at
Associations between changes at p6 gag and ARV resistance mutations for subtype B and subtype F sequences were analyzed independently by pairwise nonparametric tests. Each resistance-associated amino acid site was coded as a binary variable according to the presence or absence of ARV resistance-conferring residues. Site-to-site pairwise associations were evaluated on 2×2 contingency tables by Fisher's exact two-tailed test. To avoid an inflated type I error rate (frequency of false positive tests), two approaches were implemented sequentially: (1) the number of tests was narrowed according to their a priori power under the alternative hypothesis; and (2) two multiple comparison procedures (MCPs) were applied alternatively. The Bonferroni method was applied to control the per family error rate (PFER, expected number of false positive tests), and consequently the family wise error rate (FWER, probability of at least one false positive test). The Benjamini and Hochberg procedure was followed to ensure an upper limit to the false discovery rate (FDR, fraction of false positives among all significant tests).
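The testing scheme described above can be illustrated with a short, self-contained Python sketch. It is not the code used in the study: the function names are hypothetical, the two-tailed p value follows the common convention of summing the probabilities of all tables that are no more likely than the observed one, and the a priori power filter of step (1) is not reproduced. Any counts or p values passed to these functions in the usage note are invented for illustration only.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating point ties
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def bonferroni(pvalues, alpha=0.05):
    """Flag tests significant at alpha / m (controls PFER and FWER)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Flag tests significant under the step-up FDR procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k_max = rank  # largest rank meeting the threshold
    significant = [False] * m
    for i in order[:k_max]:
        significant[i] = True
    return significant
```

As a usage example with invented values, `benjamini_hochberg([0.001, 0.02, 0.03, 0.2])` flags the first three tests, whereas `bonferroni` on the same list flags only the first, which shows why the two procedures can be usefully compared on the same family of tests.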
p6 gag sequence analysis of plasma virus from 201 HIV-1 vertically infected pediatric patients showed a high degree of diversity, as can be observed in the sequence logo representation of the alignment (Fig. 1a). Numerous insertions, deletions, and point mutations were identified, mainly in the PTAPP motif and in the region between amino acids 20 and 35, while toward the carboxyl end of the protein a high degree of amino acid conservation was observed. To determine polymorphic differences between subtype F and subtype B sequences from our dataset, we compared the amino acid composition with two sample logos. Figure 1b shows that subtype F sequences are enriched in certain amino acids at 11 positions throughout the p6 gag protein, differing from those prevalent in subtype B sequences: N3, A12, G16, I21, P26, Q30, K31, E33, G34, A39, and K42. Next, we looked for changes in p6 gag that could be associated with ARV treatment by comparing subtype F sequences from naive and ARV-experienced individuals (Fig. 1c). Interestingly, we found that subtype F sequences from ARV-experienced patients were enriched in leucine at position 5 of p6 gag, while sequences from naive patients were enriched in proline. The frequency of the P5L/T/I mutation was 0% in the naive and 37% in the ARV-experienced group. For subtype B, the frequency of this mutation was similar in both groups (33% in naive versus 23% in ARV experienced, data not shown), although few sequences were available for comparison.

Fig. 1. Amino acid diversity in 165 subtype F and 36 subtype B p6 gag sequences.
Next, we analyzed the frequency of duplications, insertions, and deletions in the p6 gag sequences to identify other changes that could be associated with subtype or ARV treatment (Table 1). Of the six changes found in our dataset, deletion of the amino acid at position 22, duplication of P37, and a premature stop codon at position 50 were present in almost all subtype F sequences from both the naive and ARV-experienced groups, indicating subtype-specific polymorphisms independent of ARV exposure. In subtype B, these changes were observed in 0–30% of the sequences, and mainly in the ARV-experienced group, although differences were not statistically significant. Of the remaining three changes, insertions in the KQE motif, and between amino acids S25 and P26, were present in 0–11% of the sequences, and mainly in the ARV-experienced group. Interestingly, duplications in the PTAPP motif (previously linked with ARV resistance in subtype B) doubled in frequency in subtype F p6 gag sequences from ARV-experienced individuals, suggesting that the mechanisms leading to insertions in the PTAPP motif under ARV selection pressures are conserved across different subtypes.
aa, amino acid; ARV, antiretroviral; NS, not statistically significant.
To evaluate the association between p6 gag polymorphisms and ARV-associated resistance mutations in PR and RT, we performed pairwise comparisons in the whole dataset of 201 p6 gag-PR-RT sequences. Two point mutations in p6 gag were found to be significantly associated with mutations in HIV PR: (1) P5L/T/I with the secondary mutation N88D and (2) Q26P with PR polymorphisms at positions 36 and 63. While the latter three mutations are typical of subtype F genomes, the former association seems to be conditioned by ARV treatment, since N88D is selected by NFV as a compensatory mutation of the D30N resistance pathway. The mutation P5L (also referred to as P453L or P5′L) in the p1/p6 cleavage region was strongly associated with the NFV resistance mutation N88DPR only in subtype F HIV-1 genomes (p<0.0001). Of note, the same trend was seen between P5L/T/I and the major NFV resistance mutation D30NPR, although it did not reach statistical significance (p=0.2982), since sequences carrying the D30NPR mutation alone do not select for the P5L/T/I mutation in p6 gag. Interestingly, the mutations at position P5 appeared as natural polymorphisms in subtype B sequences from naive individuals but were exclusively associated with treatment in subtype F sequences.
To determine whether the P5L mutation was present before NFV treatment, we looked for pretreatment samples among the individuals who carried subtype F sequences with D30NPR+N88DPR+P5L gag mutations. Of the three samples available, none carried the mutations, suggesting that the selection of P5L gag is conditioned by NFV selection pressure. To determine whether the presence of the “compensatory” mutation (P5L gag) favored the D30N+N88D NFV resistance mutation pathway over the other pathways described for this protease inhibitor (L90M or N88S), we compared the percentage of each of the three pathways in a larger number of PR sequences from our database, including 42 subtype B and 275 subtype F (all the latter belonging to BF recombinant pol HIV-1 genes from Argentine strains) (Fig. 2). Of note, this larger database includes the 201 p6 gag-PR pol-RT pol sequences previously investigated. We found no statistically significant difference between the D30N pathway and the L90M pathway in relation to the HIV-1 subtype in PR. However, D30N is mostly selected in the presence of the N88D mutation in BF recombinants, occurring as the sole NFV mutation in only 7% of the HIV-1 PR BF sequences, but in 22% of subtype B sequences. Despite the interplay between N88D and P5L, the majority of the NFV-resistant HIV-1 strains carry the L90M mutation independently of the HIV-1 subtype (54% in subtype BF and 44% in subtype B), and around 35% of the strains select for the D30N+N88D mutations, with only five cases carrying both D30N+L90M mutations. The N88S mutation was observed in subtype BF only, at a low percentage (3%). The similar frequency of the NFV resistance pathways in both HIV-1 subtypes suggests that the presence of P5L gag in HIV-1 subtype F genomes does not confer a selective or evolutionary advantage over subtype B genomes that lack this mutation in the presence of the protease inhibitor.

Fig. 2. Percentage of the nelfinavir (NFV) resistance mutation pathways in subtypes B and BF.
Amino acid changes in HIV-1 often occur as a result of natural variation or of selective pressures, leading to interclade diversity, immune escape, and viral resistance. In both cases, amino acid changes can have consequences for the phenotypic properties of the viral strains, antiretroviral drug susceptibility, and interaction with host cellular components. We describe for the first time sequence diversity at the p6 gag region in a large dataset of subtype F gag-pol genomes. A comparison between polymorphic sites in subtype F and subtype B strains showed specific patterns of gag-pol coevolution and, in subtype F genomes, a strong association between the P5L/T/I mutations at p6 gag and the D30N+N88D NFV resistance mutation pathway.
Previous analyses of p6 gag amino acid changes showed intersubtype differences in the frequency of polymorphisms, as well as duplications of specific motifs, or differences in the length of the protein, indicating a natural flexibility of p6 gag with an unknown effect on protein function and cleavage efficacy. However, most studies focused on subtype B sequences, which represent only 18% (36 of 201) of our dataset. As a result of natural intersubtype divergence, subtype F strains were found to differ from subtype B in at least 11 amino acid positions. One of these mutations (Q26P) was strongly associated with subtype F polymorphic mutations at positions 36 and 63 of the PR gene, indicating coevolution of these sites. Interestingly, in subtype G and CRF02_AG sequences, a proline at position 26 is very frequent and has a role in restoring the loss of an MAPK phosphorylation site in p6, and the presence of a glutamine at this position was associated with the P5L/T mutation. 18 This phenomenon was not observed in our dataset of subtype F sequences, but it highlights the importance of these sites in HIV-1 adaptability.
Of the changes involving insertions and deletions in p6 gag, three occurred at similar frequency in both subtypes, and the other three were characteristic of subtype F sequences, indicating subtype-specific differences in gag evolution and/or subtype-specific genetic adaptation to ARV selective pressures. In particular, we found a higher proportion of insertions in the KQE motif in subtype F than in subtype B strains, in contrast to previous observations in which KQE duplications were seen only in subtype B. 11 Duplications in the PTAPP motif were recognized early among patients failing ARV therapy and were found to confer phenotypic resistance to NRTIs and PIs such as amprenavir, 14,19 and a high viral replication capacity, suggesting a selective advantage for viruses carrying PTAPP duplications. The higher frequency of these duplications among the ARV-experienced group of both subtype B and F sequences supports the importance of these changes in the development of ARV resistance across different subtypes.
The selection and evolution of gag and PR are believed to significantly interfere with each other, a phenomenon known as “gag-PR coevolution.” In PI-resistant viruses, mutations in gag cleavage or noncleavage sites have been shown to improve replication capacity and fitness, acting as compensatory polymorphisms that participate in the accumulation of mutations in response to PI selective pressure and facilitate gag processing by a mutant protease. 10,20 A strong association between mutation P5L in the p1/p6 cleavage region and the NFV resistance mutation N88DPR was evident in our dataset of subtype F genomes, probably due to the high proportion of individuals treated with the protease inhibitor NFV. Recently, this association of mutations was investigated in subtype B strains through bioinformatic analysis and in vitro experiments. 21 By testing mutant HIV-1 viruses, the authors showed that the P5L gag cleavage site mutation has the potential to improve the replication capacity and gag processing of viruses with D30N/N88D, but has little effect on NFV susceptibility. Unlike in subtype B p6 gag sequences, our results show that in subtype F the P5L gag mutation is not a naturally occurring polymorphism, as it was absent in the ARV-naive population and also in pretreatment samples from individuals who later selected the mutation. Therefore, the fact that all the subtype F sequences with the N88DPR mutation also carry the P5L gag mutation suggests that the fitness-compensating effect of P5L gag on the D30N/N88D double mutants is stronger in subtype F than in subtype B. Despite this observation, we did not find subtype-specific differences in the proportion of strains selecting for the D30N+N88D, L90M, or N88S NFV resistance pathways, indicating a mild effect in vivo.
In conclusion, our results provide novel information about p6 gag sequence diversity in the underrepresented HIV-1 subtype F, and analyze the interplay between naturally occurring mutations and gag/pol coevolution. Understanding HIV-1 evolution is crucial for a better interpretation of the biological significance of amino acid changes in the context of a specific HIV-1 subtype and ARV therapy.
Footnotes
Acknowledgments
This work was supported in part by CONICET and the “Fundación A. J. Roemmers.” Andres H. Rossi was supported by a fellowship from “Fundación Garrahan.” The authors gratefully thank Carmen Gálvez and Natalia Beltramone for technical assistance.
Author Disclosure Statement
No competing financial interests exist.
