Abstract
Human immunodeficiency virus type 1 (HIV-1) circulating recombinant form (CRF) 02_AG is a major recombinant variant in different geographic areas and is predominant in West and Central Africa. Of particular interest is the increased frequency of CRF02_AG in patients living in Italy. In the present study, phylogenetic analyses were performed on gag, pol (integrase), and env (gp120 and gp41) gene sequences from 34 CRF02_AG-infected patients living in Italy. Thirty out of 34 (89.4%) patients were from western Africa, 3/34 (8.8%) were born in Italy, and 1/34 (2.9%) was from Cuba. Phylogenetic analysis revealed the presence of a well-supported clade (aLRT score>0.75) of sequences only in gp120 and gp41 trees. Evolutionary rate estimation showed a faster evolution for the gp120 gene with respect to the gag, integrase, and gp41 genes. This finding was confirmed by the analysis of interpatient variability. Intrapatient variability was greater in gp120 gene sequences; 10/19 (52.6%; p<0.001) patients had a ratio of dN/dS>1 as compared with gag, integrase, and gp41 gene sequences with dN/dS ratios<1. In summary, phylogenetic analyses of CRF02_AG strains offer a perspective on intrapatient and interpatient variability among CRF02_AG-infected patients living in Italy. In addition, divergent phylogenetic relationships were observed among different genomic regions.
H
HIV-1 strains are differentiated into four major groups (M, N, O, and P). Within the M group, different subtypes (A–D, F–H, J, and K), 55 circulating recombinant forms (CRFs), and a growing number of unique recombinant forms (URFs) have been reported. 1,2 The prevalence and geographic distribution of groups, subtypes, and CRFs are rapidly evolving due to increased migration flow, international travel, and global trade. 3,4 Subtype C is responsible for the majority of HIV-1 infections worldwide (48%), followed by subtype A (12%), subtype B (11%), CRF02_AG (8%) and CRF01_AE (5%), subtype G (5%), and subtype D (2%), while subtypes F, H, J, and K cause fewer than 1% of infections. 2 In particular, the CRF02_AG variant is present in different geographic areas, and is predominant in West and Central Africa. 3
The molecular epidemiology of HIV-1 subtypes in Italy was recently updated, and revealed a modified picture since the initial spread of the epidemic. 5,6 Of particular interest was the increased frequency of CRFs in patients living in Italy, 7,8 mainly consequent to an increased immigration flow from African countries. Among HIV-1 CRFs, CRF02_AG is the most frequent in Italy. 5 In the present study, a series of CRF02_AG genotypes was examined by phylogenetic analyses of four genomic regions. Evolutionary rates and interpatient variability were estimated for each sequence dataset. Moreover, intrapatient variability was assessed in a subset of patients with multiple time point sampling.
In the period 2001–2011, 1,806 baseline genotypic resistance tests were performed from as many patients referred to the Institute of Infectious Diseases, University of Pavia, Fondazione IRCCS Policlinico San Matteo Pavia, Italy. Demographic data, including country of origin, were collected from all CRF02_AG patients. All patients were naive to integrase inhibitors, entry inhibitors (T20), and CCR5 ligand inhibitors (maraviroc) at all sampling time points. Administered treatments included RT and protease (PR) inhibitors only.
The study was approved by the Institutional Review Board of the Fondazione IRCCS Policlinico San Matteo according to guidelines on the use of biological specimens for scientific purposes and in keeping with Italian law (art.13 D.Lgs 196/2003), and after having obtained informed written consent from the patients.
Virus strain subtyping was initially based on full-length protease sequences and partial RT sequences (codons 41 to 237) performed for the evaluation of RT and PR inhibitor resistance. Sequences were analyzed using the HIV-1 subtyping tool at the Monitoring Service of the Department of Molecular Biology of the University of Siena, Siena, Italy (
Stored peripheral blood mononuclear cell (PBMC) samples from patients infected with CRF02_AG were used for this study. Proviral sequence analysis of the CRF02_AG strains was performed on the genes gag, pol (integrase), and env (gp120 and gp41) not submitted to antiviral drug pressure in these patients. After viral DNA extraction using the automatic Easy Mag extractor (Biomerieux, Lyon, France), parts of the gag gene 10 (36–281 aa; p17 and p24 partial region), pol gene 11 (716–982 aa; integrase), and env gene V3 domain (254–378 aa; gp120) 12 and (511–683 aa; gp41) were amplified by nested polymerase chain reaction (PCR) as previously reported. For the amplification of the gp41 gene the following primers were used: gp41-1 5′-TAGGAGCTTT GTTCCTTGGGTTCTTG-3′ and gp41-2 5′-GGTGAATAT CCCTGCCTAACTCTATT-3′ in the first-round PCR and gp41-3 5′-GGTTCTTGGGAGCAGCAGGAA-3′ and gp41-4 5′-CCTACCAAGCCTCCTACTATCA-3′ in the nested PCR.
All amplifications were carried out in a 100 μl reaction volume containing 10 μl of extracted DNA, 2 μl AmpliTaq Gold DNA Polymerase enzyme (Applied Biosystems, Foster City, CA), and 200 μM (each) dNTPs and 10 μl 10×PCR reaction buffer. The thermal profile was 10 min at 94°C for DNA polymerase activation, followed by 50 cycles consisting of 1 min at 95°C, 1 min at 52°C, 1 min and 10 s at 72°C, and a final 7 min extension at 72°C. Subsequently, the nested PCR was performed using 3 μl from the first PCR and the following thermal profile: 10 min at 94°C for DNA polymerase activation, followed by 30 cycles consisting of 1 min at 95°C, 1 min at 50°C, 1 min and 10 s at 72°C, and a final 7 min extension at 72°C. β2-Microglobulin was amplified in parallel as a housekeeping gene standard.
13
Direct sequencing of PCR products was performed using nested PCR primers and the BigDye Terminator v1.1 Cycle Sequencing kit (Applied Biosystems, Foster City, CA) with automatic sequencer (ABI PRISM 3100 genetic analyzer DNA Sequencer, Applied Biosystems, Foster City, CA). The sequence data were assembled and edited using Sequencher version 5.0 (Gene Codes Corporation, Ann Arbor, MI) and compared with available HIV complete genome sequences using the Blast program (
Sequences were aligned using the Clustal W program integrated in the MEGA package, version 5.0.
14
The best-fitting nucleotide substitution model was estimated using the Modeltest program and a hierarchical likelihood ratio for each of the analyzed genome regions. Maximum likelihood phylogenetic trees were constructed by using the online PhyML program.
15
The reliability of specific clusters in the inferred tree was evaluated by using the SH-like approximate likelihood ratio test (aLRT).
15
Cluster was defined as ≥3 sequences showing the node of tree with an aLRT score>0.75. Phylogenetic trees were drawn with FigTree v.1.4.0 (
HIV-1 reference sequences (including CRF02_AG sequences) were obtained from the Los Alamos HIV sequence database. Data sets of 88 complete genome CRF02_AG reference sequences from different geographic regions were compiled. Additional HIV-1 CRF02_AG sequences from other geographic regions (33 gag sequences, 171 integrase sequences, and 143 gp120 and gp41 sequences) were also included in each alignment. Breakpoints of recombination could be misleading in the interpretation of evolutionary dynamics among CRF02_AG strains. Thus, to avoid discordant results, recombination breakpoints were excluded from the analysis of the genomic regions as suggested by Abecasis et al.
16
The evolutionary rates for HIV-1 CRF02_AG strains were estimated for each sequence dataset by using Bayesian Evolutionary Analysis Sampling Trees (BEAST) software version 1.7.4.
17
The substitution rates were generated using a general-time-reversible substitution model with gamma distribution and invariable rates among sites, with Markov chain Monte Carlo (MCMC) runs of 100,000,000 steps sampled every 10,000 steps and analyzed with Tracer v.1.5 with a discarded burn-in of 20%. Different coalescent priors (constant population size, exponential growth, logistic growth, Bayesian skyline plot, and skyride) were tested using both strict and relaxed molecular clock models. Different clock models were compared by calculating the Bayes factor (BF).
18
The difference (in loge space) of marginal likelihood between any two models is loge BF. Evidence against the null model (i.e., the one with a lower marginal likelihood) was indicated by 6>(2 loge BF)>2 (positive), 10>(2 loge BF)>6 (strong), and (2 loge BF)>10 (very strong). BF calculations were performed with Tracer v.1.5 software (
Interpatient variability was calculated as the mean proportion of nucleotide identity between all sequence pairs. Only the baseline virus sequence was taken into account in the divergence estimation. To estimate the intrapatient variability, the nonsynonymous-to-synonymous substitution (dN/dS) ratio and the mean proportion of nucleotide differences between the baseline virus sequence and the more recent virus sequence for each patient were used. The ratio of dN to dS substitutions is an indication of selection pressure; values<1 indicate negative selection, values=1 indicate neutral evolution, and values>1 indicate positive selection. The rates of synonymous (dS) and nonsynonymous (dN) substitutions were calculated using the SNAP program implemented in the Los Alamos HIV Databases.
During the 10-year study period, 1,806 patients underwent HIV-1 subtyping. Among these, 214 (13.4%) were infected with HIV non-B subtype. Of these, 34/214 (15.8%) were infected with the CRF02_AG subtype and were included in the study. Thirty out of 34 (89.4%) were from western Africa (23 from Cote D'Ivoire, three from Cameroon, two from Togo, one from Ghana, and one from Burkina Faso), 3/34 (8.8%) were born in Italy, and 1/34 (2.9%) was from Cuba. We assumed that HIV-1 infection was acquired in the country of origin. The three patients born in Italy reported sexual intercourse with Ivorian sex workers.
Four phylogenetic trees were constructed with sequences from different HIV-1 genes. In the gag tree, sequences from patients living in Italy were dispersed among sequences from other countries. Only one cluster including sequences from native Italian and Ivorian patients showed an aLRT score>0.75 (Fig. 1). Likewise, in the integrase tree a single cluster with an aLRT score>0.75 was composed of sequences from two native Italian and three Ivorian patients that were different from those observed in the gag tree. The remaining sequences observed were intermixed with other sequences. In the gp120 tree 21/34 (61.8%) sequences were included in four clusters supported by an aLRT value>0.75 (Fig. 2). All of these clusters included sequences from patients from Western Africa and one native Italian patient. Of note, the III-gp120 cluster was composed only of sequences from Ivorian patients. Similarly, in the gp41 tree, 30/34 (88.2%) sequences belonged to four different clusters (Fig. 2). None of the sequences obtained in this study clustered with sequences from other countries.

Maximum likelihood analysis of CRF02_AG gag and CRF02_AG integrase sequences. Each tree shows the relationship between Italian sequences and reference sequences from other countries. Sequence labels include the number of patients, year of sample, and country of origin. The three-letter abbreviation country codes are as follows: Burkina Faso (BUR), Ivory Coast (CIV), Cameroon (CMR), Ghana (GHA), Italy (ITA), and Togo (TOG). Branch lengths were scaled in nucleotide substitutions per site as indicated by the bar at the bottom. Approximate likelihood ratio test (aLRT) SH-like p-values for supported clades (p>0.75) are also indicated.

Maximum likelihood analysis of CRF02_AG V3(gp120) and CRF02_AG gp41 sequences. Each tree shows the relationship between Italian sequences and reference sequences from other countries. Sequence labels include the number of patients, year of sample, and country of origin. The three-letter abbreviation country codes are as follows: Burkina Faso (BUR), Ivory Coast (CIV), Cameroon (CMR), Cuba (CUB), Ghana (GHA), Italy (ITA), and Togo (TOG). Branch lengths were scaled in nucleotide substitutions per site as indicated by the bar at the bottom. Approximate likelihood ratio test (aLRT) SH-like p-values for supported clades (p>0.75) are also indicated.
Notably, a close phylogenetic relationship was observed in four pairs of patients (#17 and #26, #29 and #39, #5 and #38, and #2 and #27). The sequences from each pair (all Ivorian) were situated on the same branch for all trees analyzed. However, a potential epidemiologic correlation in these patients could not be established.
Substitution rates were estimated as the median of substitutions per site per year of infection for each sequence dataset. Bayesian MCMC analyses under a skyline for the prior tree were used to estimate the rate of evolution. Analysis of the BF showed that the model using a lognormal relaxed clock fit the dataset better than models assuming an exponential relaxed clock or a strict clock. With the lognormal relaxed clock model, the estimated mean rate of evolution was 1.05×10−3 substitutions per site per year (s/s/y) (range 8.32×10−5–2.13×10−3) for the gag gene, 4.05×10−4 s/s/y (range 3.59×10−5–9.68×10−4) for integrase, 1.67×10−3 s/s/y (range 1.41×10−4–3.66×10−3) for gp120, and finally 1.34×10−3 s/s/y (range 2.30×10−4–2.77×10−3) for gp41.
The interpatient variability was estimated in each gene alignment by comparison with sequences from the baseline sample. Nucleotide identities ranged from 84.7% to 97.9% (mean 90.6%) in the gag sequences, 89.3% to 98.1% (mean 94.7%) in the integrase sequences, 50.4% to 100% (mean 73.6%) in the gp120 sequences, and 85.0% to 95.4% (mean 90.1%) in the gp41 sequences.
The intrapatient variability over time (range 1–10 years) was assessed in 19/34 (55.9%) subjects (Table 1). The median intrapatient nucleotide distance measured on the basis of pairwise comparison was higher (5.59%; p<0.05) in gp120 sequences with respect to gag (1.77%), integrase (1.28%), and gp41 (1.82%) sequences. The majority of nucleotide changes within gag, integrase, and gp41 genes involved synonymous substitutions, resulting in a ratio of dN/dS<1, while in the gp120 sequence 10/19 (52.6%; p<0.001) patients had a ratio of dN/dS>1.
Significant dN/dS values are reported in bold.
NA, not available.
Over the past decade, the epidemiology of HIV in Europe has been investigated at length because of its implications in public health, research, and clinical aspects. 2,9 In the present study, we investigated the phylogenetic relationship between HIV-1 CRF02_AG strains in patients presently living in Italy, the majority of whom were immigrants from the Ivory Coast. Four genomic regions not subjected to drug selective pressure in the study group were analyzed.
Phylogenetic analyses showed that CRF02_AG strains from Ivorian patients, from other African patients, and from native Italian patients were dispersed among the sequences from Africa, Europe, and the United States, with only single well-supported clusters in the gag and integrase trees. The same diversity observed within the gag gene was previously reported in CRF02_AG strain sequences from Cameroonian patients, 19 while no information about the phylogenetic evolution of the integrase gene in CRF02_AG strains is available. However, a recent study analyzing pol sequences of CRF02_AG strains from Cameroonian patients living in Italy 7 showed the same intermixing of Italian sequences observed in this study for integrase gene sequences. However, in the gp120 and gp41 trees four well-supported clusters were observed. However, in the gp120 tree the sequences outside the clusters were interspersed with other sequences from different countries, while in the gp41 tree, almost all sequences were included in the four clusters observed.
This finding suggests the closest evolutionary origin for gp41 gene sequences with respect to the other genes analyzed. It may be possible that these clusters were determined by their role in viral replication and pathogenesis of the env gene. In fact, the gp41 is a complex transmembrane protein that contains various well-defined conserved domains that play a role in fusion pore formation. 20 Mutations in this region have been observed to alter the fusion activity. 21 In addition, the results of this study could be explained by the occurrence of convergent evolution in sequences of the env gene as also suggested by others. 22 Nevertheless, CRF02_AG of African origin is broadly distributed across west/central Africa and has apparently been circulated stably there for many years. Thus, the CRF02_AG subtype likely underwent multiple recombination events during its evolution. 23 The entire env gene is sequenced and studied in comparison with sequences from other genomic regions, such as the pol and gag regions. 3,24 In this study, gp120 and gp41 gene sequences have been analyzed separately, and a different pattern of evolution was observed. In accordance with previous observations 3,25 gp120 gene sequences showed higher variability and evolutionary rates, while the gp41 gene sequences were observed to evolve with a rate similar to the gag gene sequences. The lower variation of gp41 gene with respect to gp120 gene has been also described by Steckbeck and collegues. 26
In a subset of patients, an analysis of sequential samples was performed and intrapatient variability was calculated. As expected, our estimates of intrahost evolution based on nucleotides were observed to be similar to those of previous studies on other HIV-1 subtypes. 27,28 Indeed, some clustering might be expected because the people who chose to move to Italy may have common roots in Africa, either a shared social or family link. Of note, the gag and gp41 sequences exhibited similar nucleotide diversity and dN/dS ratios over time. The lower variation in the gp41 is likely due to more strict requirements for structure and function due to large conformational rearrangements that occur during the fusion process. All genomic regions selected for the analyses were not under drug pressure selection; thus, it could be hypothesized that the gag and gp41 sequences were under negative selection. Most likely, the accumulation of nucleotide changes occurred before the introduction of the strains in Italy, including the three native Italian patients who acquired their CRF02_AG infection from Ivorian sexual partners. Nevertheless, closer surveillance will be necessary to better evaluate the evolution of HIV subtypes in Italy as a common route of migration in European regions.
In summary, this study offers a perspective on intrapatient and interpatient variability among CRF02_AG-infected patients living in Italy. In addition, divergent phylogenetic relationships were highlighted between different genomic regions. Thus, it is particularly important to explore evolutionary reconstruction among CRF HIV-1 strains.
Sequence Data
The sequences reported in this study have been submitted to the GenBank database under accession numbers KF922511–KF922646 and KJ612138–KJ612379.
Footnotes
Acknowledgments
The authors thank Daniela Sartori for manuscript editing and Laurene Kelly for revision of the English. The work was supported by the Ministero della Salute, Ricerca Corrente Grant 80207.
Author Disclosure Statement
No competing financial interests exist.
