Abstract
The dynamic HIV-1 epidemic has resulted in the emergence of several different subtypes and recombinant forms that may differ in biological properties. A recombinant form of CRF02_AG and subsubtype A3 (A3/02) was recently described based on env sequencing and was associated with faster disease progression rates compared with its parental strains. Here, we performed near full-length sequencing of the A3/02 variant to characterize the recombination patterns of a potentially novel and more pathogenic circulating recombinant form of HIV-1 in Guinea-Bissau. HIV-1 proviral DNA was extracted from blood samples of individuals infected with the A3/02 recombinant form, and recombination patterns were investigated for the six samples that were successfully amplified and sequenced. We found that all six near full-length genomes were recombinant forms composed of CRF02_AG and A3 with a recombination hot-spot in the C2 region of env. However, the recombination patterns in the rest of the genome differed between samples. Two samples displayed similar recombination profiles, indicative of a homogeneous recombinant form circulating in the population in Guinea-Bissau, whereas the remaining four samples represented unique recombinant forms. The characterization of five different recombination profiles indicated a high frequency of recombination. The recombination breakpoint in the C2 region was identified as the principal common feature shared between sequences, suggesting that this region may have an impact on disease progression rate. Since novel recombinant forms may have characteristics associated with a higher potential of spread in the human population, this study highlights the importance of continuous screening and surveillance of the HIV-1 epidemic.
Introduction
The heterogeneous nature of HIV-1 has resulted in its classification into three groups, of which the major (M) group is the largest, composed of nine different subtypes (A–D, F–H, J, K), more than 70 CRFs, and a large number of URFs. 9 Subtypes and CRFs tend to have distinct geographic distributions, with subtype B being the most prevalent subtype in North America, Europe, and Australia and subtype C being the most prevalent in South and East Africa and India. 10 However, in parts of the world in which more than one subtype is prevalent, new recombinant forms may arise. In fact, recombinant forms (CRFs and URFs) are now estimated to account for approximately 20% of HIV-1 infections worldwide, 11 with CRF02_AG being the most prevalent variant in several West African countries, accounting for 39–83% of infections. 12 –14 In Guinea-Bissau, West Africa, three major HIV-1 variants have been described: CRF02_AG, subsubtype A3, and a recombinant form of CRF02_AG and A3 referred to as A3/02. 13,14
Increasing evidence suggests that viral variants may differ in viral load, 15 –17 disease progression rate, 13,18 –24 chemokine receptor use, 25,26 and vertical transmission rate. 27 We recently reported an accelerated rate of disease progression to AIDS and AIDS-related death for individuals infected with A3/02 compared with individuals infected with A3. 13 In comparison with results from other studies, it was also suggested that infection with A3/02 was associated with the fastest progression to AIDS of all subtypes described to date. The A3/02 recombinant form was characterized based on the C2-V3 region of env, with all variants sharing a recombination breakpoint in the C2 region. We hypothesize that the A3/02 recombinant may represent a new CRF in Guinea-Bissau. To investigate the recombination pattern of the entire genome, we analyzed the near full-length genomes of six samples collected from individuals infected with the A3/02 recombinant.
Materials and Methods
Sample selection and DNA amplification
We previously identified 19 individuals infected with a recombinant form of HIV-1 CRF02_AG and subsubtype A3 (referred to as A3/02 13 ) in a large cohort of police officers in Guinea-Bissau. 28,29
Plasma-depleted blood cell pellets (including HIV-infected lymphocytes) or frozen whole blood samples were available from eight of the 19 individuals. DNA was extracted from blood samples using the QIAamp DNA Blood Mini kit (Qiagen, Stockholm, Sweden) according to the manufacturer's instructions. Eluted DNA was spectrophotometrically quantified using a Nanodrop 2000c (Thermo Scientific, Stockholm, Sweden). Details on polymerase chain reaction (PCR) conditions and on amplification and sequencing primers (modified from 30,31 ) are provided in Supplementary Methods and Supplementary Tables S1 and S2 (Supplementary Data are available online).
In brief, HIV-1 proviral DNA was amplified in a nested approach with a starting input of 250–750 ng DNA in the first round. The first PCR, using outer primers FLOF_623 and FLOR_9632, yielded a near full-length fragment (approximately 9 kb). Nested PCR was performed in two separate reactions, with primers FLIF_634 and FLIR_4980 in one reaction and primers FLIF_4650 and FLIR_9459 in the other (Fig. 1). This resulted in two overlapping fragments covering the near full-length genome of HIV-1 (positions 634–9459 according to Hxb2; GenBank accession number K03455). The second fragment was difficult to amplify and sometimes required an additional nesting step using primers FL8_2F and FL18_2R.
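The amplicon sizes above can be checked with simple interval arithmetic. The sketch below is illustrative only: it assumes that the numbers in the primer names (FLOF_623, FLOR_9632, FLIF_634, FLIR_4980, FLIF_4650, FLIR_9459) mark inclusive Hxb2 fragment boundaries, which the primer names suggest but which is an assumption here. It reports each fragment's length, the overlap between the two nested fragments, and their combined coverage.

```python
# Hypothetical interval arithmetic for the nested-PCR fragments. ASSUMPTION:
# the numbers in the primer names are inclusive Hxb2 fragment boundaries.

def length(start: int, end: int) -> int:
    """Length of an inclusive interval [start, end]."""
    return end - start + 1

def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of positions shared by two inclusive intervals (0 if disjoint)."""
    lo = max(a[0], b[0])
    hi = min(a[1], b[1])
    return max(0, hi - lo + 1)

outer = (623, 9632)        # FLOF_623 / FLOR_9632 (first-round product)
fragment_1 = (634, 4980)   # FLIF_634 / FLIR_4980
fragment_2 = (4650, 9459)  # FLIF_4650 / FLIR_9459

# Combined coverage is only a single contiguous span if the fragments overlap.
assert overlap(fragment_1, fragment_2) > 0
combined = (min(fragment_1[0], fragment_2[0]), max(fragment_1[1], fragment_2[1]))

print("outer product:", length(*outer), "bp")
print("fragment 1:", length(*fragment_1), "bp")
print("fragment 2:", length(*fragment_2), "bp")
print("overlap:", overlap(fragment_1, fragment_2), "bp")
print("combined coverage:", length(*combined), "bp")
```

Under these assumptions the first-round product is 9,010 bp, in line with the approximately 9 kb stated above, and the two inner fragments share 331 bp, which is what allows them to be joined into one 8,826-bp genome sequence.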

Location and size of the amplified fragments in relation to Hxb2. Schematic illustration of the fragments amplified in the various PCRs. The numbers refer to the position in reference sequence Hxb2 (GenBank accession number K03455). ORF, open reading frame.
All amplifications were initially performed using the Expand Long Template PCR System (Roche Diagnostic Systems, Branchburg, NJ) according to the manufacturer's instructions. When amplification failed, it was attempted again using the PrimeSTAR GXL DNA polymerase kit (TaKaRa Bio, Shiga, Japan) according to the manufacturer's instructions. Amplified fragments were molecularly cloned using the InsTAclone cloning system (Thermo Scientific) and TOP10 cells (Invitrogen, Stockholm, Sweden), and the cloned inserts were amplified with Platinum Taq DNA polymerase High Fidelity (Invitrogen) according to the manufacturer's instructions using conventional M13 primers (−20 and −24). Each near full-length genome, composed of two overlapping fragments, was sequenced using a total of 38 overlapping sequencing primers (Supplementary Table S1) and the BigDye Terminator v.1.1 Cycle Sequencing Kit (Applied Biosystems, Stockholm, Sweden).
Sequence analysis
The 38 sequence reads of each near full-length genome were assembled into a consensus sequence in CodonCode Aligner v.1.5.2 (CodonCode Corporation, Dedham, MA) and manually edited. Most of the genome was covered by at least two overlapping sequence reads, but a few short regions were determined from a single high-quality read (individual single-read regions were 4–261 nucleotides long, totaling at most 278 nucleotides over the entire genome). Sequences were aligned with the same set of subtype reference sequences previously used to determine the subtype in env, including sequences of subsubtypes A1 and A3 and CRF02_AG 13 (Supplementary Table S3). Since the objective was to investigate the recombination pattern, sequences with stop codons and non-codon-based insertions/deletions were allowed to remain in the analyses. Insertions/deletions and regions that were difficult to align were removed from the multiple alignments prior to subsequent analysis.
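Summaries of single-read coverage, such as the 4–261-nucleotide regions described above, can be derived from a per-position coverage profile of the assembled reads. As a minimal sketch, assuming read placements on the consensus are available as hypothetical 0-based, half-open intervals (no particular CodonCode Aligner export format is assumed), the regions covered by exactly one read and their total length can be computed with a difference array:

```python
# Minimal sketch: find consensus regions supported by exactly one read.
# Read coordinates are hypothetical, 0-based, half-open [start, end).

def depth_profile(reads: list[tuple[int, int]], genome_length: int) -> list[int]:
    """Count how many reads cover each consensus position."""
    # Difference array: +1 where a read starts, -1 just after it ends.
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for pos in range(genome_length):
        running += diff[pos]
        depth.append(running)
    return depth

def single_read_regions(depth: list[int]) -> list[tuple[int, int]]:
    """Return maximal half-open runs where coverage depth is exactly 1."""
    regions, run_start = [], None
    for pos, d in enumerate(depth):
        if d == 1 and run_start is None:
            run_start = pos
        elif d != 1 and run_start is not None:
            regions.append((run_start, pos))
            run_start = None
    if run_start is not None:  # close a run that reaches the end
        regions.append((run_start, len(depth)))
    return regions

# Toy example: three overlapping reads on a 100-nt consensus (not study data).
reads = [(0, 40), (30, 70), (35, 100)]
depth = depth_profile(reads, 100)
regions = single_read_regions(depth)
lengths = [end - start for start, end in regions]
print(regions, "lengths:", lengths, "total:", sum(lengths))
```

For this toy input the two single-read regions are the 30-nt stretches at either end of the consensus, 60 nt in total; the same per-region lengths and per-genome total correspond to the two figures quoted for the real assemblies.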
Phylogenetic analyses
To verify that the near full-length sequences originated from the same individuals as the env sequences used for subtype analyses, 13 the sequences were aligned with the env dataset, and a maximum-likelihood (ML) phylogenetic tree was constructed with Garli v.2.0 32 using the inferred model, GTR+I+G. Statistical support for internal branches was determined by the ML-based approximate likelihood ratio test (aLRT) Shimodaira–Hasegawa (SH)-like branch support, as implemented in PhyML. 33,34 Statistical support for branches was categorized as follows: SH 0.800–0.849, moderate support; SH 0.850–0.899, strong support; SH 0.900–1.000, very strong support. To investigate the phylogenetic relationship of the near full-length genomes, we also constructed ML trees with SH support as described above, using a dataset composed of the six query sequences and the reference sequences of subsubtypes A1 and A3 and CRF02_AG (Supplementary Table S3).
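The SH-like support categories described above can be applied mechanically to node support values exported from PhyML. The sketch below is illustrative only: it assumes support values have already been parsed into floats between 0 and 1, treats the lower bound of each band as inclusive, and uses "not supported" as a hypothetical label for values below 0.800.

```python
# Hypothetical helper mapping an SH-like aLRT value to support categories.
# ASSUMPTIONS: lower band bounds are inclusive; values < 0.800 are labeled
# "not supported" (a label chosen for this sketch).

def sh_support_category(sh: float) -> str:
    """Classify an SH-like branch support value."""
    if not 0.0 <= sh <= 1.0:
        raise ValueError(f"SH-like support must lie in [0, 1], got {sh}")
    if sh >= 0.900:
        return "very strong"
    if sh >= 0.850:
        return "strong"
    if sh >= 0.800:
        return "moderate"
    return "not supported"

for value in (1.0, 0.87, 0.81, 0.75):
    print(value, sh_support_category(value))
```

Checking the thresholds from the top down keeps each band a simple lower bound, so no value between 0 and 1 can fall into a gap between categories.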
Next, the recombination pattern of the near full-length sequences was analyzed in SimPlot v.3.5 35 using reference sequences of subsubtypes A1 and A3 and CRF02_AG (Supplementary Table S3). We used a 140 base pair (bp) sliding window moved along the alignment in 20 bp increments, an empirical transition/transversion ratio, and the F84 model of evolution. For each analysis, 500 bootstrap replicates were generated. A potential recombination breakpoint between subtypes was considered when the percentage of permuted trees supporting a given subtype exceeded 70%. All regions longer than 100 bp that were supported by more than 70% of the permuted trees were confirmed phylogenetically by constructing ML trees with SH support as described above.
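The decision rules of the BootScan step (140 bp windows, 20 bp steps, a 70% threshold, and a 100 bp minimum region length) can be sketched as post-processing of per-window support values. This is not SimPlot's implementation: it assumes the bootstrap support (percentage of permuted trees) for each reference subtype has already been computed per window, and the data layout, function names, and the convention that each window represents the 20 bp tile around its midpoint are all hypothetical choices for illustration. Coordinates are alignment columns, not Hxb2 positions.

```python
from __future__ import annotations  # permits "str | None" hints on older Python 3

# Parameters as described above; names are hypothetical.
WINDOW = 140      # sliding-window size (bp)
STEP = 20         # window increment (bp)
THRESHOLD = 70.0  # % of permuted trees needed to call a subtype
MIN_REGION = 100  # regions must be longer than this (bp) to go to ML confirmation


def window_starts(alignment_length: int) -> list[int]:
    """0-based start column of every complete window."""
    return list(range(0, alignment_length - WINDOW + 1, STEP))


def call_windows(support: list[dict[str, float]]) -> list[str | None]:
    """Best-supported subtype per window, or None if no subtype exceeds 70%."""
    calls = []
    for window in support:
        subtype, value = max(window.items(), key=lambda item: item[1])
        calls.append(subtype if value > THRESHOLD else None)
    return calls


def merge_regions(calls: list[str | None], starts: list[int]) -> list[tuple]:
    """Merge runs of identical calls into (subtype, start, end) regions.

    Each window stands for the STEP-bp tile centred on its midpoint, so the
    tiles of consecutive windows abut without overlapping (end is exclusive).
    """
    merged: list[list] = []
    for call, start in zip(calls, starts):
        tile_start = start + WINDOW // 2 - STEP // 2
        tile_end = tile_start + STEP
        if merged and merged[-1][0] == call:
            merged[-1][2] = tile_end
        else:
            merged.append([call, tile_start, tile_end])
    return [tuple(region) for region in merged]


def supported_regions(regions: list[tuple]) -> list[tuple]:
    """Keep subtype-assigned regions longer than MIN_REGION bp."""
    return [r for r in regions if r[0] is not None and r[2] - r[1] > MIN_REGION]


def candidate_breakpoints(regions: list[tuple]) -> list[tuple]:
    """Report (left subtype, right subtype, gap start, gap end) at each switch."""
    return [
        (left[0], right[0], left[2], right[1])
        for left, right in zip(regions, regions[1:])
        if left[0] != right[0]
    ]


# Synthetic support values: 7 A3 windows, 1 ambiguous window, 6 CRF02_AG windows.
support = (
    [{"A1": 5.0, "A3": 90.0, "CRF02_AG": 5.0}] * 7
    + [{"A1": 15.0, "A3": 40.0, "CRF02_AG": 45.0}]
    + [{"A1": 5.0, "A3": 7.0, "CRF02_AG": 88.0}] * 6
)
starts = window_starts(400)  # 14 windows for a 400-column toy alignment
regions = supported_regions(merge_regions(call_windows(support), starts))
print(regions)
print(candidate_breakpoints(regions))
```

In this synthetic example the ambiguous window leaves a 20-column gap between an A3 region and a CRF02_AG region, and the candidate breakpoint is reported as lying somewhere within that gap; in the study, such candidate regions were then confirmed with ML trees rather than accepted from the sliding-window signal alone.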
The analyses described above included only A-like subtypes. To ensure that regions in which subtype assignment failed did not belong to non-A-like subtypes, these regions were aligned with reference sequences of all major subtypes (downloaded from the Los Alamos HIV Sequence Database 9 ) and ML phylogenetic trees were constructed as described above.
Ethics
The study was approved by the ethics committees of the Ministry of Health of Guinea-Bissau, the University of Lund, Sweden, and the Karolinska Institutet, Stockholm, Sweden. Study participants were counseled and provided informed oral consent.
Results
This study included samples from eight of 19 HIV-1-positive individuals who were previously described as being infected with a recombinant form of CRF02_AG and subsubtype A3 (A3/02). 13 Full-length amplification and sequencing were successful for samples from six study participants; three were men and three were women (Supplementary Table S4). The years of HIV-1 diagnosis ranged from 1993 to 2006, and the age at diagnosis ranged from 21 to 52 years. The time from estimated date of infection to collection of samples included in this study ranged from 2 to 11 years. The six study participants did not differ from the remaining 13 individuals infected with the A3/02 recombinant in terms of gender, age at seroconversion, time to AIDS, time to AIDS-related death, or date of seroconversion (data not shown).
Near full-length sequencing was performed on proviral DNA, and sequences were aligned with reference sequences of CRF02_AG, A1, and A3. A short fragment of the nef gene of sample DL5918 (positions 8937–9065 according to Hxb2, GenBank accession number K03455) deviated substantially from the other sequences, and this region was therefore replaced by gaps in this sample. We first verified that the near full-length proviral DNA sequences used in this study clustered with the partial env (C2-V3) sequences obtained from viral RNA of the same individuals in a previous study. 13 ML phylogenetic trees were constructed, and all six near full-length sequences clustered with their previously described env sequences, verifying that no sample contamination or mix-up had occurred.
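Replacing a region such as the DL5918 nef fragment (Hxb2 8937–9065) with gaps requires translating Hxb2 numbering into alignment columns, because gaps in the Hxb2 row shift the column indices. A minimal sketch of that step, assuming the alignment is held as plain Python strings with "-" as the gap character and using a toy alignment rather than the study data:

```python
# Minimal sketch: mask an Hxb2-numbered region in one aligned sequence.
# Hxb2 positions are 1-based and count only non-gap residues of the Hxb2 row.

def hxb2_to_column(hxb2_row: str, position: int) -> int:
    """Return the 0-based alignment column of a 1-based Hxb2 position."""
    count = 0
    for column, base in enumerate(hxb2_row):
        if base != "-":
            count += 1
            if count == position:
                return column
    raise ValueError(f"Hxb2 position {position} is beyond the alignment")

def mask_region(query_row: str, hxb2_row: str, start: int, end: int) -> str:
    """Replace the alignment columns spanning Hxb2 start..end (inclusive) with gaps."""
    first = hxb2_to_column(hxb2_row, start)
    last = hxb2_to_column(hxb2_row, end)
    return query_row[:first] + "-" * (last - first + 1) + query_row[last + 1:]

# Toy alignment (not real HIV-1 sequence); the query carries a 2-nt insertion
# relative to Hxb2, so the Hxb2 row has gaps in columns 3-4.
hxb2 = "ACG--TACGT"
query = "ACGTTTACGA"
print(mask_region(query, hxb2, 3, 5))
```

Masking between the columns of the two boundary positions also blanks any insertion columns inside the region, which is usually what is wanted when a whole stretch is excluded. The same `hxb2_to_column` mapping could be used to cut out a subregion for separate analysis, for example everything up to Hxb2 position 4009.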
Next, we investigated the recombination pattern of the six near full-length genomes. BootScan analyses were performed to identify potential recombination breakpoints, which were subsequently confirmed using ML phylogeny. Illustrations of the recombination patterns are presented in Fig. 2. Most genomic regions could be assigned a specific subtype supported by SH-values >0.800. However, in some cases, the SH-value did not reach 0.800 despite a clear segregation between subtypes (Fig. 2). Because some fragments were short, some regions could not be assigned to any subtype. Although A-like subtypes are by far the most prevalent subtypes in Guinea-Bissau, other subtypes have also been described in the country. 13,14 To determine whether the unknown regions represented non-A-like subtypes, these regions were aligned with reference sequences of all major subtypes and ML phylogenetic trees were constructed. Most of the regions that could not be assigned a specific subtype in the previous analyses were here confirmed to be A-like (clustering with A-like sequences supported by SH >0.800). However, some regions did not contain enough phylogenetic signal to allow subtype assignment (Fig. 2).

Recombination pattern of six HIV-1 A3/02 recombinant forms identified in Guinea-Bissau. Regions whose subtype could not be determined are shown in light gray as unknown. Most of these regions were found to be subtype "A-like", whereas no specific subtype could be assigned to a few regions [indicated by a double cross (‡)]. CRF02_AG is shown in gray and subsubtype A3 in dark gray. Regions indicated by an asterisk (*) clustered with the assigned subtype, but the SH-value for statistical support was <0.800. The scale at the bottom of the figure represents the nucleotide positions according to reference sequence Hxb2 (GenBank accession number K03455). The recombination patterns were illustrated using the recombinant HIV-1 drawing tool v.2.1.0. 9
The previously described recombination breakpoint in the C2-V3 region of env between subsubtype A3 and CRF02_AG 13 was confirmed in all sequences. The estimated recombination breakpoint positions were obtained from BootScan analyses and a recombination hot-spot was identified in the 3′ end of the C2 region (nucleotides 6812–7109 in Hxb2, GenBank accession number K03455). Despite the shared breakpoint position between A3 and CRF02_AG in the C2 region for all sequences, the length of the 5′ located A3 fragment differed between sequences and most sequences had unique recombination profiles. However, a similar pattern of recombination was observed in sequences DL3039 and DL5918 in which a fragment of vif and a fragment of env were assigned subsubtype A3, with the remainder of the genome belonging to CRF02_AG, suggesting a common ancestry of the two recombinant forms. In contrast, the other four full-length genomes (DL3234, DL3773, DL4186, and DL5308) had variable recombination patterns indicative of URFs.
Since the A3/02 sequences consisted mainly of CRF02_AG, they all clustered with the CRF02_AG reference sequences in the phylogenetic analysis. However, the larger the fraction of a sequence composed of A3, the further that sequence was placed from the CRF02_AG cluster, between the subtype-specific clusters of reference sequences. This was most evident for sequence DL3773 (the sequence with the largest fraction of the genome composed of subsubtype A3). Consistent with the recombination profiles, sequences DL3039 and DL5918 clustered closely together in the phylogenetic tree of near full-length sequences with high statistical support (SH=1.0, Fig. 3A).
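The relationship between A3 content and tree placement described above can be made quantitative by computing the fraction of each genome assigned to each subtype from its recombination profile. The sketch below is illustrative only: the segment list is hypothetical (not one of the six study profiles), coordinates are assumed to be inclusive Hxb2 positions, and regions of unknown subtype are simply left out of the denominator.

```python
# Hypothetical recombination profile: (subtype, start, end) with inclusive
# Hxb2 coordinates. These segments are illustrative, not data from this study.
from collections import defaultdict

def subtype_fractions(segments: list[tuple[str, int, int]]) -> dict[str, float]:
    """Fraction of the subtype-assigned genome length attributed to each subtype."""
    totals: dict[str, int] = defaultdict(int)
    for subtype, start, end in segments:
        totals[subtype] += end - start + 1
    assigned = sum(totals.values())
    return {subtype: length / assigned for subtype, length in totals.items()}

profile = [
    ("CRF02_AG", 1, 600),
    ("A3", 601, 800),
    ("CRF02_AG", 801, 1000),
]
fractions = subtype_fractions(profile)
print({subtype: round(value, 2) for subtype, value in fractions.items()})
```

Ranking sequences by their A3 fraction would then give a simple way to check whether sequences with more A3 sit further from the CRF02_AG cluster, as was observed for DL3773.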

Maximum-likelihood phylogenetic analyses of the near full-length
To further investigate the relationship between the genomes, we constructed ML trees of specific subregions. Samples DL3039, DL5918, and DL4186 had very similar recombination profiles, but differed in their subtype composition in vif. Although it could not be fully supported by an SH >0.900, the vif gene of DL3039 and DL5918 appeared to be partly composed of subsubtype A3. We therefore set out to investigate the possibility of a shared ancestry for samples DL3039, DL5918, and DL4186. Separate analyses of vif did not support a common ancestry for the three samples but rather confirmed a recombinant subtype composition in the vif gene of DL3039 and DL5918 but not DL4186 (Fig. 3B).
To investigate the genetic relationship in a region in which all samples were mainly composed of CRF02_AG, phylogenetic analyses of the gag-pol region were conducted (Fig. 3C). We analyzed the region up to the breakpoint position where sample DL3234 recombined to A3 instead of CRF02_AG (position 4009 according to Hxb2, GenBank accession number K03455, Fig. 2). The results of this analysis confirmed a close relationship between samples DL3039 and DL5918. The distinct clustering of these two sequences apart from the remaining sequences suggested that DL3039 and DL5918 shared a common ancestry and have evolved separately from the other CRF02_AG-composed sequences. Thus, our results indicate that DL3039 and DL5918 could represent a CRF in Guinea-Bissau. However, characterization of the same recombinant form in a third epidemiologically unlinked individual is still needed before conclusively reporting the emergence of a new second-generation CRF.
Discussion
In this study, we describe the recombination pattern of near full-length HIV-1 genomes isolated from six individuals living in Guinea-Bissau. The individuals were infected with HIV-1 recombinant strains of subsubtype A3 and CRF02_AG. In a previous study, these recombinants were identified by sequencing of the HIV-1 env C2-V3 region. 13 Here, near full-length sequencing revealed that all sequences were recombinant forms of CRF02_AG and A3 and confirmed the recombination breakpoint in the HIV-1 env C2-V3 region. While exact mapping of the breakpoint positions is difficult, our results showed that the breakpoint was located within the C2 region, which has previously been found to be a recombination hot-spot. 36 –38
When comparing the near full-length genomes, five different recombination patterns were identified among the six sequenced genomes. Two sequences, which were very similar in subtype composition, were composed of CRF02_AG in most parts of the genome, but parts of vif and env were composed of subsubtype A3. Our results suggest that these two sequences may represent a common recombinant form circulating in Guinea-Bissau. We know of no direct epidemiological link between the two individuals infected by these viruses. Both were members of the national police force of Guinea-Bissau at the time of sampling, as were all participants in the cohort. Both were women, which makes a direct epidemiological link less likely. Theoretically, they may both have been heterosexually infected by the same individual (most likely a man), but we have no evidence or indication of that. Still, further characterization of this recombinant form in a third epidemiologically unlinked individual is required before the variant may be denoted as a CRF. The remaining four sequences showed unique recombination patterns over the entire genome, thus representing distinct URFs.
We acknowledge that the sample size in this study was limited. Still, among the six genomes sequenced from the 19 individuals previously found to be infected with the C2-V3 A3/02 recombinant form, five different recombination patterns were identified. It is possible that one or more CRFs could have been identified by including more individuals; however, no further samples were available for this study. CRF02_AG and subsubtype A3 are the most prevalent variants in Guinea-Bissau, and it is therefore not surprising that A3/02 recombinant forms have emerged. However, the high frequency of different URFs found here suggests that dual infections (superinfections and coinfections 39 ) are common in this population. The rate of dual infection varies widely among studies, most likely due to differences in the populations evaluated, frequencies of antiretroviral therapy, length of follow-up, and detection methods used. 40 However, in a recent study of the general population in Uganda, the rate of superinfection was found to be comparable to the rate of primary infection. 41 Given that dual infections may be common in the general population, they could have an impact on the HIV-1 epidemic, in part due to an increased probability of the emergence of novel recombinants. Indeed, recombinant forms (CRFs and URFs) are now estimated to account for approximately 20% of infections worldwide. 11 Studies from several European countries previously dominated by HIV-1 subtype B infections have described an increased spread of non-B subtypes and novel recombinant forms. 42 –45 Our data do not allow us to determine whether the individuals included in this study were themselves dually infected by CRF02_AG and A3, or whether they were infected by a recombinant form that emerged in another individual.
In three of the six included individuals, the env sequences examined in our previous study 13 and the near full-length sequences obtained in this study were derived from samples collected at different times after infection (sampling separated by 4–6 years). However, although the env and near full-length sequences clustered together and shared the same breakpoint in the C2-V3 region, it is not possible to determine whether these individuals were dually infected or infected by a recombinant variant that emerged in another infected individual.
We previously reported that infection with the A3/02 recombinant was associated with more aggressive disease progression than infection with the parental A3 strain. Whether the recombinant form emerged within these individuals or emerged in another individual and outcompeted the two parental strains, this recombinant form appears to have properties that result in faster disease progression. The analysis of these near full-length genomes identified the C2 recombination breakpoint as the principal common feature shared between the recombinant forms. Further studies are required to elucidate the mechanisms by which this recombination breakpoint may influence disease progression rate.
It is tempting to speculate that the breakpoint itself, the specific composition of A3 in the first part of the fragment and CRF02_AG in the second part of the fragment, or the emergence of certain motifs in the amino acid sequence when the two parental strains are combined into a new recombinant form, may be important in modulating the rate of disease progression through functional, structural, or conformational modification. A switch in coreceptor use from CCR5 to CXCR4 is associated with disease progression and the frequency of CXCR4 use has been found to differ between subtypes. 25,26,46 –48 Coreceptor tropism is usually predicted based on the V3 region, but other env regions have also been implicated as predictors of coreceptor use. Coetzer et al. recently demonstrated that the C2 region alone was sufficient for CCR5 restriction, despite the fact that the rest of env (including the V3 region) is associated with CXCR4 use. 49 Thus, it is possible that the recombination breakpoint that we identify in the C2 region of env could modulate the rate of disease progression by directly influencing coreceptor tropism.
Our characterization of five different recombination patterns composed of CRF02_AG and subsubtype A3 provides further evidence for the continuous and dynamic evolution of HIV diversity and for the recently reported emergence of URFs and CRFs. 50 Our results highlight the importance of surveillance of the HIV-1 epidemic in general, and of the emergence and further spread of new URFs in particular, as these may represent precursors of CRFs with a higher potential of spread in the population.
Sequence Data
The genomic sequences included in this study have been deposited in GenBank and have been assigned the following accession numbers: KR067667–KR067672.
Footnotes
Acknowledgments
This work was supported by grants from the Swedish Research Council. The members of the SWEGUB CORE group are Babetida N'Buna, Antonio Biague, Ansu Biai, Cidia Camara, Joakim Esbjörnsson, Marianne Jansson, Sara Karlson, Jacob Lopatko Lindman, Patrik Medstrand, Fredrik Månsson, Hans Norrgren, Angelica A. Palm, Gülsen Özkaya Sahin, and Zacarias José da Silva.
Author Disclosure Statement
No competing financial interests exist.
References
Supplementary Material
