HLA Supertypes Contribute in HIV Type 1 Cytotoxic T Lymphocyte Epitope Clustering in Nef and Gag Proteins

Abstract

Induction of HIV-1-specific cytotoxic T lymphocyte (CTL) responses largely depends upon the presentation of CTL epitopes to the CD8⁺ T cells aided by a large number of different HLA class I alleles. Although several studies showed the clustering pattern of HIV-1 CTL epitopes, the underlying reason for this tendency remains unresolved. Moreover, the hypothesis that the CTL epitope clusters tend to coincide with the conserved and hydrophobic regions of HIV-1 proteins has been challenged in recent times. The present study aims to characterize and compare the HIV-1 CTL epitope clusters in terms of restricting HLA alleles, hydrophobicity, and sequence conservation in a proteome-wide manner by including a large number of experimentally validated CTL epitopes from the HIV Molecular Immunology Database. CTL epitope cluster distribution analysis in a proteome-wide manner revealed that only two HIV-1 proteins, namely Nef and Gag, have significant cluster-forming capacity where their epitope localization coincides with the hydrophobic and conserved regions. Furthermore, analyses of proteasomal cleavage sites and HLA anchoring motif frequencies in the epitope-dense regions highlighted the role of specific HLA supertypes such as HLA B*07, HLA B*58, HLA A*02, and HLA A*03 in selecting the hydrophobic and conserved amino acid positions within Nef and Gag proteins to be presented as epitopes. Based on our results, we hypothesize that the cluster-forming tendency of HIV-1 CTL epitopes is not a proteome-wide feature confined to Nef and Gag proteins. Their cluster-forming tendency largely depends on the host HLA alleles that contribute significantly in selecting functionally constrained hydrophobic regions within the HIV-1 proteome.

Introduction

I n recent years there has been a paradigm shift toward cytotoxic T lymphocytes (CTL)-based vaccine design for HIV-1 largely due to the increasing body of evidence indicating the important role of CTL in controlling HIV infection. CTLs recognize viral antigen in combination with appropriate major histocompatibility complex (MHC) molecules and exert their effect by killing the infected cells either by lysis or inducing apoptosis. Antiviral CTL activity has been shown to be correlated with clearance of virus particles during the acute phase of infection, whereas decline in CTL activity is associated with disease progression.^1,2 Furthermore, Goulder and Watkins have suggested that HIV infection progresses when viral mutants escape CD8⁺ T cell recognition.³

With the advancement of modern immunological techniques, a large number of HIV-1 CTL epitopes and restricting HLA alleles have been identified as well as deposited in various databases including the HIV molecular immunology database (www.hiv.lanl.gov/content/immunology). This provides an opportunity for detailed characterization of CTL epitopes in terms of their conservation, HLA allele binding capacity, and immunogenicity. Previous analyses revealed that HLA class I-restricted epitopes tend to form overlapping clusters known as the epitope-rich/dense region whereas the regions deficient of any epitope clusters are called the epitope-poor regions.⁴ Penciolelli et al. have identified a region of four short overlapping clusters in the Nef protein of HIV-1 that was found to be multi-HLA restricted.⁵ In another study, Currier et al. analyzed CTL epitope distribution patterns in the Gag and Nef proteins of HIV-1 from subtype A-infected subjects.⁶

Although several studies, including the one conducted by Yusim et al., indicated the cluster-forming tendency of HIV-1 CTL epitopes,⁷ the underlying reasons for this tendency remain largely unknown. Despite the contribution of different HLA alleles in selecting the overlapping amino acid positions in HIV-1 proteins is gradually being revealed, no association has been drawn between HLA alleles and the HIV-1 CTL epitope clustering pattern. A few HLA alleles such as B*57, B*27, B*52, and A*25 have been proposed as protective alleles whereas B*35 has been implicated as a risk allele.^8

–13 In recent years, this phenomenon signifying the involvement of HLA alleles in the control of HIV-1 has been reconfirmed by whole genome association studies.^14,15 Among the three HLA class I molecules, HLA-B is considered to be the most important factor for restricting HIV disease progression since HLA-B-restricted epitopes appear to be immunodominant and are more conserved than epitopes restricted by HLA-A and C.^16
–18

Furthermore, the conservation and hydrophobic pattern of the CTL epitope clusters within HIV-1 proteome have not been established unambiguously. Hughes and Hughes proposed that CTL epitopes tend to be more conserved and hydrophobic than the remaining portion of the source proteins.¹⁹ In harmony with this hypothesis, Hughes and Silva and Lucchiari-Hartz et al. concluded that the Nef-CTL epitope clusters tend to coincide with hydrophobic and conserved regions, whereas the noncluster regions are predominantly hydrophilic.^20,21 In contrast to the notion that HIV CTL epitopes show a tendency to cluster and coincide with conserved and hydrophobic regions of the proteins, a more recent study based on computationally predicted epitopes revealed that the distribution of CTL epitopes and hydrophobic amino acids in 99% of the HIV-1 protein sequences follows a random pattern and was indistinguishable from the distribution of CTL epitopes in proteins from other proteomes such as hepatitis C virus (HCV), influenza, and three eukaryote proteomes (human, Drosophila, and yeast).²²

The current study involved systematic mining of the thousands of experimentally validated CTL epitope maps deposited in the HIV molecular immunology database (www.hiv.lanl.gov/content/immunology) to evaluate the CTL epitope clusters with respect to restricting HLA alleles, hydrophobicity, and sequence conservation. Finally, in order to examine the underlying reason for the clustering tendency, we investigated the relationship between epitope cluster distribution with proteasomal cleavage sites and HLA supertypes anchoring motifs.

Materials and Methods

Construction of the HIV-1 CTL epitope dataset and retrieval of HIV-1 protein sequences

The complete list of all experimentally verified HIV CTL/CD8⁺ epitopes, mapped to within a region of 14 amino acids or less, including the best defined epitopes and their relevant information including amino acid position and restricting HLA alleles, was obtained from the HIV molecular immunology database (www.hiv.lanl.gov/content/immunology). Out of 1,232 CTL epitopes from 12 different source proteins listed in the 2010HIV molecular immunology database, over 1,000 were associated with human subjects and were used for analysis. Among the epitopes, 974 were experimentally found to be restricted by one or more HLA alleles. Each epitope is presented in its appropriate amino acid location relative to the sequence of reference strain HXB2 (GenBank accession number K03455). Premade protein alignments (one protein sequence per patients) of nine individual HIV proteins (Env, Gag, Nef, Pol, Rev, Tat, Vif, Vpr, and Vpu) representing a wide spectrum of HIV sequences including A–K subtypes were retrieved from the 2009 web alignment archive of the HIV sequence database (www.hiv.lanl.gov/content/sequence/HIV/mainpage.html). As the amino acid positions of the listed epitopes were assigned with respect to the reference strain HXB2, it was made sure that the reference strain was included in each alignment to make the amino acid position of the sequences comparable within the whole alignment.

Identification of the Epitope-Dense and Epitope-Poor regions by the cumulative binomial probability method

The regions within each HIV-1 source protein have been identified as Epitope-Dense or Epitope-Poor by the cumulative binomial probability (CBP) method as described by Schmid et al. ²² Briefly, this method counts the number of epitope C-terminals (e) within a window of a fixed size (w) and calculates the probability (P) of finding one or more C-terminal(s) within the same window of size (w). The calculated probability P is the cumulated binomial probability given as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}P = \mathop\sum_{i = e}^w ( {^w}C_i ) *f^i* ( 1 - f ) ^{w - i}\end{align*}\end{document}

where f is the average density of C-terminals and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}({^w} C_i) = \frac {w!} {i! (w - i) !} \end{align*}\end{document}

is the binomial operator. However, in our case, the average density of C-terminals (f) was calculated based on all HIV proteins, which assisted the comparison among the proteins. The region with calculated probability less than 0.05 (p<0.05) was declared as the Epitope-Dense region. To calculate the Epitope-Poor region, the calculation was repeated but in this case the number of non-C-terminals was considered where f was redefined as the average number of non C-terminal positions; e is the number of non C-terminal positions within the window. If it was decided that a region was to be neither dense nor poor, we declared the region as average.

Assessment of the sequence variability within the HIV-1 proteome by calculating Shannon entropy scores

The entropy in each HIV-1 source protein reflects the variability of amino acids in each position. Shannon entropy for each position in the premade protein alignment was calculated by the formula described by Yusim et al. ⁷ Briefly the prealigned sequences of each HIV-1 protein were obtained and the Shannon entropy E^N in each position was calculated as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}E^N = - \mathop\sum_{i}p_i^N \log ( p_i^N )\end{align*}\end{document}

where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$p_i^N$$\end{document} is the frequency of the i-th amino acid in the N-th position. A certain position was chosen if valid amino acids were present in at least 60% of the sequences in that position. Once a position was selected, the gaps were not considered in the entropy calculation. To estimate distribution pattern of amino acids within regions of varying degrees of entropy, an algorithm based on Mahalanobis distance was developed (see Supplementary Text file S1 for details; Supplementary Data are available online at WWW.liebertpub.com/aid). Briefly, we calculated the average and variances of entropy values in a sliding window of length 9. The population average and variance values were calculated from the combined values from each protein. The threshold value χ was chosen from the Chi-square table ensuring 95% confidence level. Once the test is satisfied, the region can either be high or low (entropy). Therefore, for the regions that satisfied the criteria, the window average was compared with the population average and if the window average was higher than the population average, the region was declared a “High entropy region” and otherwise it was declared a “Low entropy region.” The regions that did not satisfy the criterion were declared as a “Neutral or Average entropy region.” The Pearson correlation coefficient was calculated between the arrays consisting of the amino acid positions, classified as “Epitope dense” and “Low entropy.”

Determination of the hydrophobicity pattern of HIV-1 proteins

The hydrophobicity value for each position within each HIV-1 source protein was calculated using the algorithm by Abraham and Leo.²³ Details of the hydrophobicity calculation are given in Supplementary Text file S2. The final value of hydrophobicity at a position is essentially a smoothed value of hydrophobicity scores of individual amino acids within the selected window around the position. The calculation was performed on the sequences of a certain source protein prior to their alignment, while after the alignment the positions of each sequence were mapped onto those in the aligned sequence (inserting gaps between positions). In the set of aligned sequences, the hydrophobicity value of a certain position was taken as the median of all hydrophobicity values in that position. The distribution of the individual amino acid positions within the estimated high or low hydrophobic regions was identified by the Mahalanobis distance method, as described above (see Supplementary Text file S1 for details). The Pearson correlation coefficient was calculated as previously (described in the previous section) except in this case the high hydrophobic amino acid positions were used instead of low entropy regions.

Proteasomal cleavage site prediction

The immunoproteasomal cleavage sites were predicted in a proteome-wide manner for HIV-1 through the employment of NetChop 3.0 (www.cbs.dtu.dk/services/NetChop), as described by Yusim et al. ⁷ Briefly, the NetChop tool, an artificial neural network-based program that utilizes a training dataset composed of 1,260 publicly available MHC class I ligands, relies on a flanking region of eight amino acid residues on either side of a potential position to predict its C-terminal cleavage probability in protein sequences.²⁴ The program assigns a value from 0 to 1 to each position in a protein where higher values indicate a higher probability of the cleavage. Now in order to predict the cleavage sites within individual HIV-1 proteins, the premade alignments for each protein were used as inputs and for each site of the alignment the site-specific prediction scores were calculated as the medians of the proteasomal cleavage site (PCS) scores from all protein sequences in the alignment. The distribution of scores both for poor and dense regions were compared using the Mann–Whitney statistical test. An alternative approach was also applied to capture the enrichment of C-terminal cleavage sites that involved counting the number of predicted and nonpredicted cleavage sites in both epitope-poor and epitope-dense regions based on NetChop recommendation (0.5), followed by the calculation of p value by Fisher's exact test.

Calculating the anchoring motif density

To further investigate the clustering property of HIV-1 proteins, the densities of B and F anchoring motif residues for each individual HLA supertypes were calculated in epitope-dense and nondense regions for each protein. Aligned proteins were used for this purpose in a manner similar to the “Proteasomal cleavage site” prediction. For this analysis, the anchoring motif data, specificity of HLA B and F pockets, and different www.syfpeithi.de/) and Sidney et al.²⁵ The anchoring motifs in proteins were identified by the HLA motif scanner tool hosted in the HIV database (www.hiv.lanl.gov/content/immunology/motif_scan/motif_scan).

Statistics

All statistical tests were performed with GraphPad Prism software, version 5.0, or with the R Package for Statistical Computing (www.r-project.org).

Results

CBP method reveals the CTL epitope-dense regions within the HIV proteome

The epitope-dense/poor regions for the 12 HIV-1 proteins were defined using the CBP method. Among the four proteins (Gag-p24, Nef, RT, and gp160) that exhibited a relatively higher length of epitope-dense regions (>100 aa), p24 was the most successful with two epitope-dense regions spanning 207 amino acids cumulatively (Fig. 1a). For Nef, the total length of the epitope-dense region (117 amino acids) was distributed into two unequal regions (Fig. 1b), while those of RT and gp160 were found to be uniformly dispersed into three and five regions, respectively (Fig. 1c and d). Two separate regions with a collective length of 79 amino acids within the Gag-p17 were identified as epitope dense (Fig. 1e). For the remaining proteins (Vpr, Vif, Tat, Rev, Protease, and Integrase) except Vpu (no epitope-dense region was identified) the epitope-dense regions were less than 50 amino acids as revealed by the CBP method (Supplementary Fig. S1A–F; Supplementary Data are available online at www.liebertpub.com/aid).

FIG. 1.

Epitope-dense regions with HLA allele hit count of HIV-1 proteins. This figure shows the number of unique HLA alleles along with the estimated dense amino acid positions for five different HIV-1 proteins: Gag-p24 (a), Nef (b), RT (c), gp160 (d), and Gag-p17 (e). The x-axis represents the amino acid positions of each protein whereas the y-axis corresponds to the number of unique HLA alleles restricting the amino acid positions. The horizontal gray bars indicate the cytotoxic T lymphocyte (CTL) epitope-dense amino acid positions or regions within individual proteins.

CTL epitope-dense regions of Nef, p24, and p17 coincide with the low entropy and hydrophobic regions

To assess the sequence variability within the HIV-1 proteome, the prealigned sequences of individual proteins were used to calculate the Shannon entropy for each protein. Four proteins—Integrase, Gag-p24, Protease, and Reverse transcriptase—displayed relatively lower mean entropy scores, 0.12, 0.15, 0.14, and 0.16, respectively, compared to the rest of the HIV-1 proteins including Nef (0.42) and p17 (0.40) as previously shown by Yusim et al. ⁷ The possible correlation between regional entropy (the entropy scores averaged over a window size of nine amino acid) and epitope density was revealed by Pearson correlation. Although a negative correlation was found for six proteins, namely, Nef, gp160, Integrase, RT, Gag-p24, and Gag-p17, strong negative correlations were observed for Nef (0.51), p24 (0.47), and p17 (0.62), indicating the conserved nature of their CTL epitope-dense region whereas gp160, RT, and Integrase showed a very weak negative correlation. All the remaining proteins showed a positive correlation with the positional entropy values. However, when the Pearson correlation was calculated between regional hydrophobic scores and epitope-dense regions, an opposite pattern appeared. For instance, relatively higher positive correlation coefficient values were found for Nef (0.57), Gag-p17 (0.52), and Gag-p24 (0.5) implying the hydrophobic nature of the epitope-dense regions of these proteins. Protease, Integrase, and gp160 showed an intermediary positive correlation, whereas for the rest of the proteins the correlation values were either negative or too low to be significant. Figure 2 shows the overlapping pattern of epitope-dense regions with low entropy and high hydrophobic positions within Nef, p24, and p17.

FIG. 2.

Comparison of the overlapping patterns of epitope-dense regions with low entropy and high hydrophobic regions within three HIV-1 proteins (Nef, p17, and p24). The distributions of amino acid positions identified as epitope dense, low entropy, and high hydrophobic of Nef (a), p17 (b), and p24 (c) are indicated by different bars (lower: epitope dense; middle: low entropy; and upper: high hydrophobic). The Nef represents the best overlapping pattern as its epitope-dense positions (lower) overlapped with low entropy (middle) and high hydrophobic (upper) regions. Gag-p17 also showed a reasonable overlapping pattern among the concerned regions. In the case of p24 the entire protein was found to be conserved and hydrophobic as reflected by the middle and upper color.

HLA B58, HLA A02, HLA B07, and HLA A03 supertypes were most successful in restricting the amino acid positions of epitope-dense regions of Nef, p24, and p17

The immunogenic property of the identified epitope-dense regions was examined through the analysis of the HLA alleles, restricting the dense region amino acid positions, in terms of HLA class and supertypes. A total of 40 and 75 unique class A and B alleles were reported, respectively, as restricting HLAs for the whole HIV-1 proteome. Unlike HLA A and B the number of class C-restricting alleles is lower. The total numbers of unique class A and B alleles recognizing the overlapping epitopes in the dense regions of each HIV-1 protein were plotted (Fig. 3a). The highest number of class B alleles was found to be associated with the dense region of Gag-p24 (58) followed by Nef (47) and RT (41), while the dense region epitopes of Gag-p17 were restricted by 21 class B alleles. In the case of class A alleles, the highest number was found for both gp160 and RT while the Nef, Gag-p24, and Gag-p17 dense regions were restricted by 17, 15, and 18 individual class A alleles. The individual epitope-restricting HLA alleles of class A and B were grouped in supertypes by their anchoring motif specificity as described by Sidney et al.¹⁹ The unique alleles for class A and B were found to be distributed in four (with one unclassified) and six (with 15 unclassified) HLA supertypes, respectively.

FIG. 3.

Total number of amino acid positions targeted by unique HLA class I A/B alleles and supertypes in the epitope-dense region of HIV-1 proteins. (a) Bar diagram indicating the number of unique alleles of both class A (black bars) and B (white bars) associated with the CTL epitope-dense regions of individual HIV-1 proteins. The primary y-axis corresponds to the number of unique HLA alleles, while the secondary one stands for the total amino acid length of the dense regions of HIV-1 proteins, which is shown as a gray line. The predominance of HLA B alleles was found for most of the HIV-1 proteins except for Tat. A clear bias toward B alleles was observed for Nef and p24. (b) The number of amino acid positions restricted per unique alleles of an HLA supertype as bar diagrams. This analysis was significant for the identification of important HLA supertypes that restrict the significant amino acid position within the whole HIV-1 proteome. It revealed that HLA B*58, A*02, B*07, and A*03 were the most successful supertypes in targeting large amino acid regions of the HIV-1 proteome (greater than 100 amino acids per unique HLA alleles).

Figure 3b, showing the total number amino acid position covered by each supertype, demonstrates supertypes that successfully target relatively higher numbers of amino acid positions. For instance, HLA B*58, HLA A*02, HLA B*07, and HLA A*03 supertypes cover regions spanning an over 100 amino acid position within the HIV-1 proteome while the rest of the supertypes including HLA A*24, B*08, B*27, B*44, B*62, and A*01 along with two unclassified allele groups were found to be associated with regions of less than 100 amino acids.

Figure 4 illustrates the contribution of HLA supertypes in restricting the amino acid position in dense regions of Nef, Gag-p24, Gag-p17, gp160, and RT. For all five proteins, namely Nef, Gag-p24, Gag-p17, gp160, and RT, significant numbers of amino acid positions were restricted by HLA B*07 supertypes (40%, 24%, 27%, 40%, and 45%, respectively) and HLA A*03 (41%, 46%, 27%, 26%, and 42%, respectively) (Fig. 4). HLA B*58 and A*02 were also found to be major restricting supertypes for Nef and Gag-p24. Apart from A*03 and B*07, HLA A*01 and B*08 were also found to be associated with a significant portion of the Gag-p17 dense region (Fig. 4).

FIG. 4.

Percentage of dense regions covered by different class A and B HLA supertypes in five HIV-1 proteins. Pie charts show the proportion of amino acid positions covered (as percentage) by each class B and A HLA supertypes for Nef, p24, p17, gp160, and RT. Among the class B supertypes, HLA B*07 was most predominant for all five proteins, while HLA A*03 covers the highest portion of the dense regions among the A supertypes for Nef, p17, and p24.

Estimation of PCS and anchoring motif density within the epitope-dense regions of selected HIV-1 proteins

We opted to investigate the distribution of the proteasomal cleavage sites within individual HIV-1 proteins on the basis of the hypothesis that the densities of the PCSs are higher in the epitope-dense regions compared to nondense regions. A two-sided nonparametric Mann–Whitney test revealed no significant difference in the distribution of PCS scores between the epitope-dense and epitope-poor regions of any of the HIV-1 proteins (as indicated by high p-value) implying the uniform distribution of the PCSs throughout the HIV-1 proteome. Performing Fisher's exact test after counting the number of predicted and nonpredicted C-termini cleavage sites in poor and dense regions based on the NetChop recommended score (0.5), as an alternative method, yielded the same outcome.

To further investigate the underlying reason for the observed epitope clustering tendency, the B and F anchoring motif densities in the epitope-dense and epitope-poor regions in five HIV proteins, namely Nef, Gag-p24, Gag-p17, RT, and gp160, were investigated. The amino acids identified as anchoring motifs were grouped into nine classes according to their physical properties as described by Sidney et al.²⁵ In the case of Nef the anchoring motif frequency percentages in the dense region were significantly higher for aliphatic, aromatic, basic, hydrophobic, large hydrophobic, and proline categories compared to the poor region while for p24 higher frequencies were found for all amino acid classes in the dense region (Fig. 5a and b). This analysis was also supported by the t test as significant differences were observed between the B and F pocket anchoring motif frequencies of poor and dense regions for both Nef (<0.04 and <0.02, respectively) and Gag-p24 (<0.04 for both). For the rest of the proteins including gp160, RT, and p17 no significant disparities were observed between the anchoring motif frequencies of the dense and poor region. Although for Gag-p17 the mean frequencies of F and B pocket motifs were not statistically different, the dense region frequencies of some motif categories were much higher than that of the poor region. For instance, the frequency of the aliphatic anchoring motif in the dense region was 2-fold higher compared to that of the poor region. Furthermore, aromatic and proline motifs were absent in the Gag-p17-poor region whereas the frequency of the F pocket basic motif was almost 2-fold higher in the dense region than the poor region (Fig. 5c).

FIG. 5.

B and F anchoring motif densities within the epitope poor and dense regions of six HIV-1 proteins. The figure corresponds to the B (black bars) and F (gray bars) anchoring motif densities categorized into nine types according to the physicochemical properties of the amino acid residues in the epitope poor and dense positions of six HIV-1 proteins: Nef (a), p24 (b), and p17 (c). Here the densities of the motifs were expressed as the frequency of occurrence (percentage of motif residues) within the dense and poor regions of the selected set of proteins.

Discussion

The current study aimed to investigate the clustering pattern of HIV-1 CTL epitopes and any possible correlation of regional hydrophobicity and entropy scores with the observed CTL epitope clusters. Although the cumulative lengths of the dense regions in four proteins—Nef, p24, gp160, and RT—were found to be relatively higher, the individual regions for gp160 and RT were much smaller and were distributed uniformly throughout the entire length of the protein and thus weakened the compactness of the cluster. Moreover, relatively higher lengths of both these proteins mask the true nature of the cluster as their epitope-dense region was found to be proportional to protein length (Supplementary Fig. S2). In contrast, p17 presented an opposite phenomena in which the cumulative dense region length (79 aa) is quite significant considering its small size, reflecting the compactness of the epitope cluster. Our analysis disclosed a strong negative correlation between the low entropy and epitope-dense regions for Nef and p17 in harmony with the previous findings of Yusim et al.⁷ In contrast to Yusim et al.,⁷ we observed no statistically strong correlation between the regional entropy pattern and epitope-dense region for gp160 (0.21), indicating epitopes are not concentrated in conserved regions. A random distribution pattern of epitopes throughout gp160, together with its high antigenic variability, may account for the observed weak correlation for gp160 in the current study.

However, unlike gp160, p24 shows a high level of sequence conservation and large epitope-dense region in almost its entire length, explaining the positive correlation estimated. The correlation of epitope-dense positions with hydrophobic regions Nef, p17, and p24 was strong positive, indicating their hydrophobic nature, while the rest of the proteins, including gp160, showed a weak correlation. It was evident that the epitope-dense regions of the Nef and Gag proteins were both hydrophobic and conserved in nature. This phenomenon was not observed throughout the whole HIV-1 proteome and hence it diminishes the general applicability of the hypothesis that all the HIV CTL epitopes were conserved and hydrophobic in nature. But at the same time our evidence differs from the results obtained by Schmid et al., which concluded that CTL epitopes and hydrophobic amino acids throughout the HIV-1 proteome follow a random distribution pattern for Nef and Gag proteins.²²

Next we investigated the underlying reason for the observed high CTL epitope clustering pattern of Nef, Gag-p24, and p17 by studying different aspects of CTL epitope processing and presentation including the PCSs and anchoring motifs as well as the role of HLA supertypes in selecting the HIV-1 CTL epitopes to be presented. A recent study showed that although for most of the proteins the fractions of unique HLA A and HLA B-restricted positions are equivalent in the total HIV-1 proteome, Gag-p24 and Nef seemed to be preferentially targeted by HLA B alleles.¹⁸ In the current study we observed a bias toward the HLA B alleles for the epitope-dense regions of most of the HIV-1 proteins except Tat, which harbors an identical number of unique restricting class A and B alleles (Fig. 3a). In accordance with previous findings, a major bias toward class B alleles was observed for p24 and Nef for which the number of restricting B alleles was almost 4-fold and 3-fold higher compared to class A alleles, respectively, while for p17, only minor differences between the numbers of unique A and B alleles were found.

To provide a plausible explanation for the clustering pattern of Nef and Gag proteins we hypothesized that the high density of the PCSs in the epitope-dense regions may account for the clustering pattern. However, comparison of the PCS scores in the dense and poor regions among individual proteins yielded no statistical significance, imparting the uniform distribution pattern of proteasomal cleavage sites within the HIV-1 proteome as opposed to the findings of Lucchiari-Hartz et al.²¹ Next the anchoring motif densities of five individual proteins (Nef, p24, p17, RT, and gp160) were examined where we compared the frequencies of different B and F anchoring motif categories within the epitope-dense and epitope-poor regions. A significant variation for both F and B motifs between the dense and poor regions of both Nef and p24 signifies the higher concentration of anchoring residues in dense regions, which may explain the high epitope clustering tendency of these proteins. Apart from the comparison of overall motif densities between the dense and poor regions, different motif categories consisting of residues with similar physicochemical properties were compared. For Nef, B pocket proline, aromatic, aliphatic, and small amino acids frequencies were present and were 2-fold higher in the dense region compared to the poor region. Two-fold or higher frequencies were observed for all the F pocket motif types except acidic and proline in the Nef dense region while the greatest difference in frequency was observed for aromatic followed by hydrophobic and basic F motif residues. Both Nef and p24 showed a sharp inclination toward HLA B supertypes, especially for HLA B*07 and B*58 that prefer proline and small motif residues in their B pocket, respectively. Both these supertypes prefer aromatic, aliphatic, or hydrophobic residues in their F pocket.²⁵

Among the HLA A supertypes, HLA A*03, preferring small and basic residues in the B and F pocket, respectively, was the most dominant in restricting CTL epitopes for both Nef and p24. HLA A*02, which favors B pocket small/aliphatic and F pocket aliphatic/small-hydrophobic residues,²⁵ also significantly contributed to restricting the amino acid position in the p24 dense region. All these B and F-type motif residues preferred by the successful HLA supertypes (B*07, B*58, A*03, and A*02) were identified with significantly higher frequency within the Nef and p24 dense region, which underscores the role of these HLA supertypes, in particular the HLA B alleles, in preferentially selecting the amino acid positions in the dense region for antigen presentation.

Although no significant difference was shown between the poor and dense regions in terms of anchoring motif frequencies for p17, a major variation of frequencies in several motifs was observed. For instance, the average frequencies of F pocket aliphatic, hydrophobic, and aromatic motifs preferred by both B*07 and B*08 and the B pocket aliphatic motif preferred by A*03/01/02 were much higher (over 2-fold) in the dense region in comparison to the poor region. The lower B pocket proline frequency was spotted, which certainly limits the epitope presenting capacity of the B*07 supertype. However, as the B anchoring motif is not yet established for B*08, the F pocket proline residues together with unidentified B pocket residues may account for the HLA B allele coverage within the p17 dense region. All these findings point toward the high density of B and F anchoring motifs preferred by the HLA supertypes, which restricts major portions within the dense region of Nef, p24, and p17 and thus ultimately serves as a contributing factor toward the cluster-forming tendency of these proteins. The high density of B and F pocket motif residues specially preferred by the HLA B supertypes in Nef and p24 not only explains the bias toward certain HLA B supertypes but also highlights their role in selecting large numbers of conserved and hydrophobic overlapping amino acid positions to be presented to Tc cells.

One of the focal points of the current study was to establish a rigorous definition of epitope clusters composed of experimental validated CTL epitopes and characterize those clusters in terms of hydrophobicity and amino acid variability. To achieve this, we employed some of the methods that have been previously described by Yusim et al.⁷ For instance, correlation between positional entropy scores and epitope distribution and correlation of epitope distribution with proteasome C-terminal cleavage prediction scores from the NetChop prediction program was previously done by Yusim et al.⁷ However, we argue that doing the same analyses with a larger epitope dataset could reveal some interesting aspects of epitope clustering, which was precisely our rationale behind those analyses. The conclusions drawn from the current study regarding clustering patterns to some extent match the findings of Yusim et al.⁷ in the sense that we also found nonrandom clustering patterns in three HIV-1 proteins in contrast to Schmid et al.²²

However, unlike Yusim et al.⁷ we did not observe this clustering pattern in the rest of the HIV-1 proteome. Analyzing a larger dataset as well as employing stringent criteria to qualify a set of epitopes as a cluster could account for the difference. For example, in this study 974 experimentally validated epitopes, found to be restricted by one or more HLA alleles, were included for analyses out of 1,232 CTL epitopes listed in the HIV molecular immunology database. Although we closely followed the methodology of Yusim et al.⁷ to some extent, we employed new approaches to investigating the important role of HLA alleles in selecting the preferred residues as anchoring motifs that perfectly correlates with the observed clustering phenomena of some HIV-1 proteins, namely Nef, p24, and p17. We believe this approach is original and the previous studies including that of Yusim et al.⁷ did not address this research question.

These findings could have an important impact on designing an HLA-based vaccine because the HLA-HIV epitope interaction has been shown not only to be the major modulating factor in controlling the infection but also the source of primary selective pressure inducing rapid viral evolution.^15,26 An HIV-1 vaccine should ideally encompass the class I-restricted CTL epitopes based on the HLA profiles of the target population and may be designed for a population of a specific geographic location for which the vaccine epitopes are restricted by the most frequent HLA supertypes for that population.^27,28 Therefore specific HLA supertypes contributing significantly in selecting conserved amino acid positions within Nef and Gag proteins appear to be crucial in designing HLA-based vaccines.

Footnotes

Acknowledgments

The authors thank Dr. Karina Yusim, Professor Dr. Peter J. Delves, and Professor Dr. M. Sayedul Islam for their helpful review, suggestions, and guidance.

Authors' contributions: S.C.: study design, planning, dataset construction, clustering analysis, writing; R.C.: development of the algorithm for cluster analysis, entropy and hydrophobicity calculation, writing; M.T.R.: PCS frequencies calculation; A.K.: writing and reviewing; A.R. and M.S.: anchoring motif frequency calculation. All authors read and approved the manuscript.

Author Disclosure Statement

No competing financial interests exist.

References

Borrow

, Lewicki

, Hahn

, Shaw

, Oldstone

. Virus-specific CD8+ cytotoxic T-lymphocyte activity associated with control of viremia in primary human immunodeficiency virus type 1 infection. J Virol, 1994; 68:6103–6110.

Carmichael

, Jin

, Sissons

, Borysiewicz

. Quantitative analysis of the human immunodeficiency virus type 1 (HIV-1)-specific cytotoxic T lymphocyte (CTL) response at different stages of HIV-1 infection: Differential CTL responses to HIV-1 and Epstein-Barr virus in late disease. J Exp Med, 1993; 177:249–256.

Goulder

, Watkins

. Impact of MHC class I diversity on immune control of immunodeficiency virus replication. Nat Rev Immunol, 2008; 8:619–630.

Zhang

, Cornette

, Berzofsky

, DeLisi

. The organization of human leucocyte antigen class I epitopes in HIV genome products: Implications for HIV evolution and vaccine design. Vaccine, 1997; 15:1291–1302.

Culmann-Penciolelli

, Lamhamedi-Cherradi

, Couillin

, Guegan

, Levy

, Guillet

, Gomard

. Identification of multirestricted immunodominant regions recognized by cytolytic T lymphocytes in the human immunodeficiency virus type 1 Nef protein. J Virol, 1994; 68:7336–7343.

Currier

, Visawapoka

, Tovanabutra

, Mason

, Birx

, McCutchan

, Cox

. CTL epitope distribution patterns in the Gag and Nef proteins of HIV-1 from subtype A infected subjects in Kenya: Use of multiple peptide sets increases the detectable breadth of the CTL response. BMC Immunol, 2006; 7:8.

Yusim

, Kesmir

, Gaschen

, Addo

, Altfeld

, Brunak

, Chigaev

, Detours

, Korber

. Clustering patterns of cytotoxic T-lymphocyte epitopes in human immunodeficiency virus type 1 (HIV-1) proteins reveal imprints of immune evasion on HIV-1 global variation. J Virol, 2002; 76:8757–8768.

Carrington

, Apple

, Park

, Muñoz

, Saah

, Goedert

, Winkler

, O'Brien

, Rinaldo

, Detels

, Blattner

, Phair

, Erlich

, Mann

. Influence of combinations of human major histocompatibility complex genes on the course of HIV−1 infection. Nat Med, 1996; 2:405–411

Migueles

, Sabbaghian

, Shupert

, Bettinotti

, Marincola

, Martino

, Hallahan

, Selig

, Schwartz

, Sullivani

, Connors

. HLA B*5701 is highly associated with restriction of virus replication in a subgroup of HIV-infected long term nonprogressors. Proc Natl Acad Sci USA, 2000; 97:2709–2714.

10.

Carrington

, Nelson

, Martin

, Kissner

, Vlahov

, Goedert

, Kaslow

, Buchbinder

, Hoots

, O'Brien

. HLA and HIV-1: Heterozygote advantage and B*35-Cw*04 disadvantage. Science, 1999; 283:1748–1752.

11.

Flores-Villanueva

, Yunis

, Delgado

, Vittinghoff

, Buchbinder

, Leung

, Uglialoro

, Clavijo

, Rosenberg

, Kalams

, Braun

, Boswell

, Walker

, Goldfeld

. Control of HIV-1 viremia and protection from AIDS are associated with HLA-Bw4 homozygosity. Proc Natl Acad Sci USA, 2001; 98,9:5140–5145.

12.

Carrington

, O'Brien

. The influence of HLA genotype on AIDS. Annu Rev Med, 2003; 54:535–551.

13.

Kiepiela

et al. Dominant influence of HLA-B in mediating the potential co-evolution of HIV and HLA. Nature, 2004; 432:769–775.

14.

Fellay

, Shianna

, Ge

, Colombo

, Ledergerber

, Weale

, Zhang

, Gumbs

, Castagna

, Cossarizza

et al. A whole-genome association study of major determinants for host control of HIV-1. Science, 2007; 317:944–947.

15.

Pereyra

, Jia

, McLaren

, Telenti

, de Bakker

, Walker

, Ripke

, Brumme

, Pulit

, Carrington

et al. The major genetic determinants of HIV-1 control affect HLA class I peptide presentation. Science, 2010; 330:1551–1557.

16.

Kiepiela

, Leslie

, Honeyborne

, Ramduth

, Thobakgale

, Chetty

, Rathnavalu

, Moore

, Pfafferott

, Hilton

et al. Dominant influence of HLA-B in mediating the potential co-evolution of HIV and HLA. Nature, 2004; 432:769–775.

17.

Bihl

, Frahm

, Di Giammarino

, Sidney

, John

, Yusim

, Woodberry

, Sango

, Hewitt

, Henry

et al. Impact of HLA-B alleles, epitope binding affinity, functional avidity, and viral coinfection on the immunodominance of virus-specific CTL responses. J Immunol, 2006; 176:4094–4101.

18.

Costa

AIF

, Rao

, LeChenadecb

, Baarlea

, Kesmirb

. HLA-B molecules target more conserved regions of the HIV-1 proteome. AIDS, 2010; 24:211–215.

19.

Hughes

, Hughes

. Self peptides bound by HLA class I molecules are derived from highly conserved regions of a set of evolutionarily conserved proteins. Immunogenetics, 1995; 41:257–262.

20.

da Silva

, Hughes

. Conservation of cytotoxic T lymphocyte (CTL) epitopes as a host strategy to constrain parasite adaptation: Evidence from the nef gene of human immunodeficiency virus 1 (HIV-1) Mol Biol Evol, 1998; 15:1259–1268.

21.

Lucchiari-Hartz

, Lindo

, Hitziger

, Gaedicke

, Saveanu

, van Endert

, Greer

, Eichmann

, Niedermann

. Differential proteasomal processing of hydrophobic and hydrophilic protein regions: Contribution to cytotoxic T lymphocyte epitope clustering in HIV-1-Nef. Proc Natl Acad Sci USA, 2003; 100:7755–7760.

22.

Schmid

, Kesmir

, de Boer

. The distribution of CTL epitopes in HIV-1 appears to be random, and similar to that of other proteomes. BMC Evol Biol, 2009; 9:184.

23.

Abraham

, Leo

. Extension of the fragment method to calculate amino acid zwitterion and side chain partition coefficients. Proteins, 1987; 2:130–152.

24.

Nielsen

, Lundegaard

, Lund

, Kesmir

. The role of the proteasome in generating cytotoxic T-cell epitopes: Insights obtained from improved predictions of proteasomal cleavage. Immunogenetics, 2005; 57:33–41.

25.

Sidney

, Peters

, Frahm

, Brander

, Sette

. HLA class I supertypes: A revised and updated classification. BMC Immunol, 2008; 9:1.

26.

Rolland

, Tovanabutra

, deCamp

, Frahm

, Gilbert

, Sanders-Buell

, Heath

, Magaret

, Bose

, Bradfield

et al. Genetic impact of vaccination on breakthrough HIV-1 sequences from the STEP trial. Nat Med, 2011; 17:366–371.

27.

Wilson

, Pedersen

, Kunich

, Wilkins

, Mann

, Mazzara

, Tartaglia

, Celum

, Sheppard

. Cross-clade envelope glycoprotein 160-specific CD8+ cytotoxic T lymphocyte responses in early HIV type 1 clade B infection. AIDS Res Hum Retroviruses, 1998; 14:925–937.

28.

Haynes

. HIV vaccines: Where we are and where we are going. Lancet, 1996; 348:933–937.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.03 MB

0.00 MB

0.15 MB

0.05 MB

HLA Supertypes Contribute in HIV Type 1 Cytotoxic T Lymphocyte Epitope Clustering in Nef and Gag Proteins

Abstract

Introduction

Materials and Methods

Construction of the HIV-1 CTL epitope dataset and retrieval of HIV-1 protein sequences

Identification of the Epitope-Dense and Epitope-Poor regions by the cumulative binomial probability method

Assessment of the sequence variability within the HIV-1 proteome by calculating Shannon entropy scores

Determination of the hydrophobicity pattern of HIV-1 proteins

Proteasomal cleavage site prediction

Calculating the anchoring motif density

Statistics

Results

CBP method reveals the CTL epitope-dense regions within the HIV proteome

CTL epitope-dense regions of Nef, p24, and p17 coincide with the low entropy and hydrophobic regions

HLA B*58, HLA A*02, HLA B*07, and HLA A*03 supertypes were most successful in restricting the amino acid positions of epitope-dense regions of Nef, p24, and p17

Estimation of PCS and anchoring motif density within the epitope-dense regions of selected HIV-1 proteins

Discussion

Footnotes

Acknowledgments

Author Disclosure Statement

References

Supplementary Material

HLA B58, HLA A02, HLA B07, and HLA A03 supertypes were most successful in restricting the amino acid positions of epitope-dense regions of Nef, p24, and p17