Abstract
HIV-1 evolution generates substantial genetic diversity among isolates, with the greatest diversity found in areas where multiple strains cocirculate. A heterogeneous HIV-1 genetic pool has previously been found in Cyprus, prompting us to study the dynamics of local HIV-1 infection by characterizing strains isolated during 2007–2009 from 74 subjects, representing 88% of the known living HIV-1-infected population. According to the European HIV Resistance guidelines, 53 of these subjects were newly diagnosed therapy-naive patients and 21 were chronic patients. Near full-length genome sequences were amplified by RT-nested PCR from diluted RNA of all HIV-1-seropositive subjects and sequenced using a newly designed assay. No resistance mutations to protease, reverse transcriptase, or integrase inhibitors were found among the newly diagnosed therapy-naive patients. Phylogenetic analyses identified subtype B as the main subtype (48.6%), followed by subtype A (18.9%), subtype C (10.8%), CRF02_AG (8.1%), CRF11_cpx (2.7%), and (sub)subtype F1 and CRF37_cpx (1.4% each). Six HIV-1 isolates (8.1%) could not be classified into any pure (sub)subtype or circulating recombinant form (CRF). Complete phylogenetic and bootscanning analyses revealed that these isolates carry new, unique recombinant patterns that are distinct from all CRFs and unique recombinant forms (URFs) reported so far; two of the six isolates share the same mosaic pattern. Consistent with earlier epidemiological studies, this study expands the HIV-1 sequence database and reveals a high degree of HIV-1 diversity in Cyprus.
Introduction
Molecular surveillance of HIV-1 diversity has revealed significant heterogeneity in the geographic distribution of HIV-1 variants worldwide and in the evolution of HIV-1 over the years. 7 Detailed local HIV-1 sequence data are essential for monitoring the HIV epidemic, for maintaining sensitive sequence-based diagnostics, 10,11 and for aiding vaccine design. 12–14 During 1986–2006, the genetic diversity of HIV-1 in Cyprus was very broad, with circulating genetic forms of group M clades A, B, C, D, F1, CRF01_AE, CRF02_AG, and CRF04_cpx, 15–17 as well as a novel URF (D/G) and an unclassified variant originating from sub-Saharan Africa. 17 In the present prospective study, we characterized near full-length genome sequences of HIV-1 strains collected in Cyprus during 2007–2009 and identified five naturally occurring unique recombinant forms that differ from the recombinants reported to date.
Materials and Methods
Study subjects
Blood samples were obtained from 74 consenting HIV-1-infected patients at the Cyprus Reference AIDS Clinic of Larnaca National Hospital in a 3-year period, 2007 to 2009. Sixty study subjects were diagnosed from 2007 to 2009 and 14 before 2007. Fifty-three patients represent 85% of all newly diagnosed therapy-naive HIV-1-infected people in this period, categorized according to the European HIV Resistance (EHR) guidelines. 18 The majority of study subjects were Greek-Cypriots, although a number reported traveling or living abroad in the past. All blood samples were processed at the Laboratory of Biotechnology and Molecular Virology of the University of Cyprus on the same day of sampling. A description of the clinical profile of each patient is presented in Table 1.
Table 1 footnotes:
Laboratory code of each study subject.
F, female; M, male.
Date (month/year; −, unknown month) of the first known positive HIV antibody test.
Country of birth of the study subjects; U.K., United Kingdom.
MSM, men who have sex with men; HSX, heterosexual contact; MCT, mother-to-child transmission; TR, transfusion; OHPC, origin from a high-prevalence country; IVDU, intravenous drug user; U, unknown.
Information provided by the study subjects. N/A, not available; HBV, hepatitis B virus; HCV, hepatitis C virus.
PCR amplification and sequencing of the near full-length genome
Patients' blood (16 ml) was collected in CPT tubes (Becton Dickinson, Annapolis, MD), and peripheral blood mononuclear cells (PBMCs) and plasma were isolated using the CPT Vacutainer procedure. HIV-1 RNA was extracted from 200 μl of plasma, and genomic DNA from about 10⁷ uncultured PBMCs, using the QIAamp Viral RNA Mini Kit and the QIAamp DNA Blood Mini Kit (Qiagen, Valencia, CA), respectively. Three regions were amplified from each sample by reverse transcriptase-nested polymerase chain reaction (RT-nested PCR) using two dilutions of HIV-1 RNA (1:2 and 1:5) (Fig. 1A–C): approximately 1722 bp of the gag region; 1461 bp of the pol region encoding protease and reverse transcriptase (Pr and RT); and 5588 bp spanning the RNase H, integrase, vif, vpr, and vpu genes, the first exons of tat and rev, gp160, and the 5′-end of nef. The dilution giving the best amplification band was selected. For samples showing multiple peaks on sequencing, a 1:10 dilution of HIV-1 RNA was used to obtain amplification from single genomes; alternatively, nested PCR of PBMC genomic DNA was performed instead. Primers used in the RT-nested PCR and/or nested PCR reactions for the first two regions have been described elsewhere. 16 For the third region, the first-round primers were 2621 19 and 9181, 16 and the second-round primers were 3241 19 and JL88.new (5′-TAAGTCATTGGTCTTABAGGYACYTG, positions 9013 to 9038) (Fig. 1B and C). Primer positions correspond to the HXB2 strain (accession number K03455). Reagents and thermocycling profiles for the amplification of the first two regions have been described elsewhere. 16
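Because JL88.new contains IUPAC ambiguity codes (B = C/G/T; Y = C/T), it is in fact a mixture of oligonucleotides. The short Python sketch below, using hypothetical helper names, expands the degenerate positions and checks that the primer lengths given in this section agree with their stated HXB2 intervals. It is a consistency check only, not part of the published assay.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes -> bases each code can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer: str) -> list[str]:
    """List every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def matches_span(primer: str, start: int, end: int) -> bool:
    """True if the primer length equals the inclusive HXB2 interval start..end."""
    return len(primer) == end - start + 1


jl88_new = "TAAGTCATTGGTCTTABAGGYACYTG"  # HXB2 9013-9038
primer_7859 = "GTCTGGTATAGTGCAACAGC"      # HXB2 7859-7878
variants = expand_primer(jl88_new)
print(matches_span(jl88_new, 9013, 9038), matches_span(primer_7859, 7859, 7878), len(variants))
# prints: True True 12
```

With one B and two Y positions, JL88.new corresponds to 3 × 2 × 2 = 12 distinct sequences in the synthesized mix.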

Fig. 1. Schematic representation of the assay designed in this study for amplifying the near full-length genome of HIV-1 M-group subtypes.
For amplification of the third region, the first-round RT-PCR used 8 μl of diluted plasma RNA, 20 pmol of each primer, and 1X SuperScript III One-Step RT-PCR System with Platinum Taq High Fidelity (Invitrogen Corp., San Diego, CA) in a 50-μl volume. RT-nested PCR amplifications were carried out in an Eppendorf Mastercycler (Eppendorf, Hamburg, Germany). The thermocycling conditions were reverse transcription at 52°C for 1 h and denaturation at 94°C for 2 min, followed by 40 cycles of 94°C for 15 s, 52°C for 30 s, and 68°C for 7 min, and a final elongation at 68°C for 7 min. The second-round PCR used 3 μl of the first-round product, 20 pmol of each primer, and 1X Platinum PCR SuperMix (Invitrogen Corp., San Diego, CA) in a 50-μl volume, with denaturation at 94°C for 2 min, followed by 45 cycles of 94°C for 20 s, 50°C for 30 s, and 72°C for 6 min, and a final elongation at 72°C for 30 min. Nested PCR amplification from DNA used the same conditions as the RT-nested PCR, except that approximately 60–100 ng of patient PBMC DNA replaced RNA in the first round and the reverse transcription step was omitted.
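The two cycling programs of the third-region amplification can be captured as simple data. The sketch below (the Step and Stage classes are hypothetical, introduced only for illustration) sums the programmed hold times of each round; it ignores temperature ramping, so actual instrument run times will be longer.

```python
from dataclasses import dataclass


@dataclass
class Step:
    temp_c: float  # hold temperature in degrees Celsius
    seconds: int   # hold duration


@dataclass
class Stage:
    steps: list[Step]
    cycles: int = 1  # how many times the steps are repeated


def hold_time_minutes(program: list[Stage]) -> float:
    """Sum programmed hold times in minutes (ramping is ignored)."""
    total_s = sum(stage.cycles * sum(s.seconds for s in stage.steps) for stage in program)
    return total_s / 60


first_round = [
    Stage([Step(52, 3600)]),                                        # reverse transcription
    Stage([Step(94, 120)]),                                         # initial denaturation
    Stage([Step(94, 15), Step(52, 30), Step(68, 420)], cycles=40),
    Stage([Step(68, 420)]),                                         # final elongation
]
second_round = [
    Stage([Step(94, 120)]),
    Stage([Step(94, 20), Step(50, 30), Step(72, 360)], cycles=45),
    Stage([Step(72, 1800)]),
]
print(hold_time_minutes(first_round), hold_time_minutes(second_round))  # 379.0 339.5
```

By this count, the first-round program holds for about 6.3 h and the second-round program for about 5.7 h.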
Amplified products from the second-round PCR were purified using the QIAquick PCR purification kit (Qiagen, Valencia, CA). Their lengths were checked by 1% agarose gel electrophoresis, and their concentrations were quantified by UV absorbance spectrophotometry using the NanoDrop ND-1000 Spectrophotometer (NanoDrop Technologies, Wilmington, DE). For each sample, the DNA sequences encoding the gag region, the pol (Pr and RT) region, the RNase H, integrase, vif, vpr, and vpu genes, the first exons of tat and rev, gp160, and the 5′-end of nef were determined by direct sequencing in separate reactions, using the second-round PCR product as template and the sequencing primers shown in Fig. 1D. Reagents, sequencing primers, and thermocycling profiles for each sequencing reaction are described elsewhere, 17 with the following modifications: primer JL88.new was used instead of JL88, 17 and primer 7859 (5′-GTCTGGTATAGTGCAACAGC, positions 7859 to 7878) was used to generate contigs that could not be sequenced with primer 8530. 17 Amplicons were sequenced on an ABI 3130 Genetic Analyzer (Applied Biosystems, Foster City, CA). The sequence data were analyzed and edited using Sequencing Analysis 5.2 software (Applied Biosystems, Foster City, CA) and Primer Premier 5 (Premier Biosoft International, Palo Alto, CA). The sequence fragments of the three amplified regions were assembled manually to obtain the near full-length sequence of each sample.
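The fragments were assembled manually in this study. Purely to illustrate what that step accomplishes, the sketch below joins fragments that are already ordered 5′ to 3′ by their longest exact suffix/prefix overlap. The function names and the default 20-nt minimum overlap are assumptions; unlike careful manual editing, this greedy approach tolerates no mismatches or ambiguity codes in the overlap and can be misled by repeats.

```python
def merge_pair(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two fragments using the longest exact suffix/prefix overlap."""
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first, down to min_overlap.
    for k in range(max_len, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError("no overlap of at least min_overlap bases")


def assemble(fragments: list[str], min_overlap: int = 20) -> str:
    """Stitch fragments, already ordered 5' to 3', into a single contig."""
    contig = fragments[0]
    for frag in fragments[1:]:
        contig = merge_pair(contig, frag, min_overlap)
    return contig


print(assemble(["ATGCGTAC", "GTACCTTA", "CTTAGGAT"], min_overlap=4))  # ATGCGTACCTTAGGAT
```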
Determination of drug-resistance-associated mutations in pol (protease and reverse transcriptase)
Genotypic resistance mutations to antiretroviral drugs targeting Pr and RT were determined using an in-house genotypic assay for the Pr and RT genes of M-group strains, described elsewhere. 16 Genotypic resistance was defined as the presence of at least one resistance-related amino acid substitution as specified by the International AIDS Society (IAS)-USA 20 and the Stanford HIV Drug Resistance Database. 21 The possible impact of transmitted drug resistance on therapeutic response was assessed using the Stanford drug resistance algorithm. 22
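Under the definition above, detecting listed substitutions amounts to comparing the query with the reference residue at each watched position. A minimal sketch, assuming the query protease has already been aligned to HXB2 so that positions coincide. The watchlist holds only the minor PI-associated substitutions reported in the Results (which the study reported separately from major resistance mutations) and stands in for the complete IAS-USA and Stanford lists; mixed residues are not handled, and the function name is hypothetical.

```python
def call_substitutions(ref: str, query: str, watchlist: dict[int, str]) -> list[str]:
    """Report watched substitutions in HXB2-style notation (e.g. 'M36I').

    ref and query are amino-acid strings aligned to the same length; watchlist
    maps 1-based positions to the residues that should be flagged there.
    """
    if len(ref) != len(query):
        raise ValueError("ref and query must be aligned to the same length")
    hits = []
    for pos in sorted(watchlist):
        wild_type, observed = ref[pos - 1], query[pos - 1]
        if observed != wild_type and observed in watchlist[pos]:
            hits.append(f"{wild_type}{pos}{observed}")
    return hits


# Minor PI-associated substitutions named in the Results (illustrative subset).
MINOR_PI = {10: "I", 20: "MR", 36: "I", 63: "PFSAV", 71: "VT", 77: "I"}

# Toy 99-residue protease: "G" everywhere except the wild-type residues above.
ref = ["G"] * 99
for pos, wt in {10: "L", 20: "K", 36: "M", 63: "L", 71: "A", 77: "V"}.items():
    ref[pos - 1] = wt
query = ref.copy()
query[35], query[62], query[19] = "I", "P", "Q"  # M36I, L63P, and unwatched K20Q
print(call_substitutions("".join(ref), "".join(query), MINOR_PI))  # ['M36I', 'L63P']
```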
Phylogenetic analysis
The Molecular Evolutionary Genetics Analysis v.4 (MEGA v.4) software was used for multiple DNA sequence alignment and phylogenetic tree construction. 23 Patients' viral nucleotide sequences encoding near full-length genome regions were aligned against the corresponding sequences of characterized strains of all known HIV-1 subtypes and CRFs obtained from the Los Alamos database. 24 Pairwise distance matrices were calculated using the Kimura two-parameter model with a transition/transversion ratio of 2.0, and a neighbor-joining (NJ) phylogenetic tree was constructed. The robustness of the phylogenetic clustering was tested by bootstrap analysis with 1000 replicates, and bootstrap values above 70% were considered adequate for subtype assignment. Subtype assignments were confirmed using the REGA algorithm. 25
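For reference, the Kimura two-parameter distance between two aligned sequences is d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of compared sites that differ by a transition and by a transversion, respectively. The Python sketch below implements this formula with pairwise deletion of gapped or ambiguous sites. It estimates P and Q from each pair directly and therefore does not use the fixed 2.0 transition/transversion ratio mentioned above; it is an illustration, not MEGA's implementation.

```python
import math

VALID = frozenset("ACGT")
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites with a gap or ambiguity code in either sequence are skipped
    (pairwise deletion).
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))  # 0.1116
```

The logarithms are undefined when 1 − 2P − Q ≤ 0 (saturated pairs), in which case math.log raises ValueError.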
GenBank accession numbers for control reference sequences in phylogenetic analyses are as follows: A1-DQ676872, A1-AB253429, A1-AF004885, A1-AB253421; A2-AF286238, A2-AF286237; B-K03455, B-AY331295, B-AY173951, B-AY423387; C-U52953, C-U46016, C-AY772699, C-AF067155; D-AY253311, D-U88824, D-K03454, D-AY371157; F1-AF077336, F1-AF075703, F1-AF005494, F1-AJ249238; F2-AJ249236, F2-AJ249237, F2-AY371158, F2-AF377956; G-AF061641, G-AF084936, G-U88826, G-AY612637; H-AF190128, H-AF005496, H-AF190127; J-AF082395, J-AF082394, J-EF614151; K-AJ249235, K-AJ249239; 01AE-U54771, 01AE-AB220944; 02AG-L39106, 02AG-AY271690; 03AB-AF414006, 03AB-AF193276; 04cpx-AF119820, 04cpx-AF049337, 04cpx-AF119819; 05DF-AF076998, 05DF-AF193253, 05DF-AY227107; 06cpx-AF064699, 06cpx-AY535659, 06cpx-AB286851; 07BC-EF368372, 07BC-EF368370, 07BC-AF286226; 08BC-AY008715, 08BC-AF286229; 09cpx-AJ866553, 09cpx-AY093605; 10CD-AF289548, 10CD-AF289549; 11cpx-AF492624, 11cpx-AF492623, 11cpx-AJ291718; 12BF-AF408629, 12BF-AF385936, 12BF-AY781128; 13cpx-DQ845388, 13cpx-DQ845387, 13cpx-AF460972; 14BG-AF450096, 14BG-AF450097; 1501B-DQ354120, 1501B-AF516184, 1501B-AF530576; 16A2D-AY945736, 16A2D-AF286239; 17BF-EU58182, 17BF-EU581827, 17BF-EU581828; 18cpx-AF377959, 18cpx-AY586541; 19cpx-AY588971, 19cpx-AY588970, 19cpx-AY894994; 20BG-AY586545, 20BG-AY586544; 21A2D-AY945737, 21A2D-AF457051; 2201A1-AY371159; 23BG-AY900571, 23BG-AY900572; 24BG-AY900574, 24BG-AY900575; 25cpx-EU693240, 25cpx-EU697906, 25cpx-EU697908; 27cpx-AJ404325, 27cpx-AM851091; 28BF-DQ085872, 28BF-DQ085873, 28BF-DQ085874; 29BF-DQ085876, 29BF-DQ085871; 31BC-EF091932, 31BC-AY727527; 3206A1-AY535660; 3301B-DQ366659, 3301B-DQ366662; 3401B-EF165541; 35AD-EF158043, 35AD-EF158040; 36cpx-EF087995, 36cpx-EF087994; 37cpx-EF116594, 37cpx-AF377957; 39BF-EU735534, 39BF-EU735536, 39BF-EU735535; 40BF-EU735538, 40BF-EU735540, 40BF-EU735539; 42BF-EU170155; 4302G-EU697904, 4302G-EU697907, 4302G-EU697909; N-AY532635, N-AJ006022, N-AJ271370; O-L20587, O-L20571, O-AY169812, O-AJ302647; CPZ-U42720, CPZ-DQ373066, and CPZ-AF103818.
Phylogenetic characterization of clusters
Patients' viral nucleotide sequences encoding near full-length genome regions were aligned against near full-length genome sequences of HIV-1 strains previously characterized by the Laboratory of Biotechnology and Molecular Virology. 17 MEGA v.4 was used for multiple DNA alignment and phylogenetic tree construction, 23 as described above. Genetic distances were calculated between HIV-1 strains forming phylogenetic clusters, which were defined by a genetic distance of <5%.
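The <5% criterion can be applied to a distance matrix as a single-linkage grouping: any two sequences closer than the threshold end up in the same cluster. The sketch below uses a small union-find for this. The three within-pair distances are those reported in the Results for clusters 2, 3, and 10; all other pairwise distances are hypothetical placeholders (0.10) included only so that the example runs, and the function name is our own.

```python
from itertools import combinations


def clusters_below(names: list[str], dist: dict[frozenset, float],
                   threshold: float = 0.05) -> list[set[str]]:
    """Single-linkage groups of sequences joined by any distance < threshold."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] < threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    # Singletons are not clusters.
    return [g for g in groups.values() if len(g) > 1]


names = ["CY208", "CY209", "CY213", "CY231", "CY037", "CY038"]
reported = {("CY208", "CY209"): 0.021, ("CY213", "CY231"): 0.009,
            ("CY037", "CY038"): 0.056}
# Hypothetical: every other pair gets 0.10 purely so the example is complete.
dist = {frozenset(p): reported.get(p, 0.10) for p in combinations(names, 2)}
print(clusters_below(names, dist))
```

With a strict 5% threshold, the CY037–CY038 pair (5.6%) is not grouped, which matches the exception noted for cluster 10 in the Results.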
Analysis for intersubtype recombination
To explore putative recombination patterns in the sequences, we performed bootscanning analysis using SimPlot, version 3.5.1. 26 Similarity and bootscanning analyses of each query sequence were run against a reference set of all HIV-1 M-group pure subtypes (A–K) and CRF02_AG obtained from the Los Alamos database; GenBank accession numbers for these reference sequences are listed above. Bootscanning was performed with a sliding window of 400 nucleotides overlapped by 50 nucleotides to define the recombinant structure. Subregion NJ tree analyses were carried out using MEGA v.4 23 to confirm the subtype origin of each gene fragment. Bootstrap analysis (1000 replicates) was used to estimate the reliability of the constructed trees, and a bootstrap value of ≥70% was considered definitive. To examine potential relationships between possible recombinant sequences and previously characterized HIV-1 sequences, a BLAST search was performed with default settings using the HIV BLAST tool of the Los Alamos HIV Sequence Database. 27,28
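SimPlot's bootscan builds a bootstrapped tree in every window, which is beyond a short example. The simpler similarity scan below conveys only the windowing idea: for each 400-nt window it reports which reference is most identical to the query. We assume that the 50-nt figure is the step between successive windows (if it instead denotes the overlap, the step would be 350 nt); the function names and the two reference sequences are hypothetical toy inputs.

```python
def window_starts(length: int, window: int = 400, step: int = 50) -> list[int]:
    """0-based start positions of all full-length sliding windows."""
    return list(range(0, length - window + 1, step))


def identity(a: str, b: str) -> float:
    """Fraction of ungapped aligned sites at which a and b agree."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def similarity_scan(query: str, refs: dict[str, str],
                    window: int = 400, step: int = 50) -> list[tuple[int, str, float]]:
    """For each window, return (start, best-matching reference, identity)."""
    results = []
    for start in window_starts(len(query), window, step):
        q = query[start:start + window]
        scores = {name: identity(q, seq[start:start + window]) for name, seq in refs.items()}
        best = max(scores, key=scores.get)
        results.append((start, best, scores[best]))
    return results


# Toy aligned references and a query that switches from refA to refB at 400.
ref_a = "ACGT" * 200
ref_b = "TGCA" * 200
query = ref_a[:400] + ref_b[400:]
scan = similarity_scan(query, {"refA": ref_a, "refB": ref_b})
print(scan[0], scan[-1])  # (0, 'refA', 1.0) (400, 'refB', 1.0)
```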
Results
Clinical and epidemiological features of the study patients
The study group consisted of 74 HIV-1-seropositive patients representing 88% of the HIV-1-infected individuals diagnosed from 2007 to 2009. Each study subject is identified by a laboratory registration number assigned in chronological order of sampling. The general demographic, clinical, and epidemiological characteristics of all study subjects are summarized in Table 2. Of the 74 study subjects, 54 (73%) were male and 20 (27%) were female. Their median age was 36 years (interquartile range [IQR], 31 to 46). The median CD4 lymphocyte count was 361 cells/mm³ (IQR, 174 to 655), and the median plasma viral load was 4.5 log HIV-1 RNA copies/ml (IQR, 3.7 to 5.2). Forty-seven HIV-1-seropositive subjects were Cypriots (63.5%); 12 were from sub-Saharan Africa (16.2%), eight from Eastern Europe (10.8%), four from Western Europe (5.4%), and one each from North Africa, Asia, and Australia (1.4% each). Thirty-three study subjects (44.6%) were infected through heterosexual contact and 31 (41.9%) through homosexual/bisexual contact; one each was infected by transfusion, by intravenous drug use (IVDU), and by mother-to-child transmission (1.4% each); four originated from high-prevalence countries (5.4%); and three did not know their route of HIV-1 transmission (4.1%). Four patients were coinfected with hepatitis B virus (HBV) (5.4%) and five with hepatitis C virus (HCV) (6.8%). Seventeen subjects (23.0%) were infected in Cyprus, 17 (23.0%) in Western Europe, 14 (18.9%) in sub-Saharan Africa, four (5.4%) in Eastern Europe, two (2.7%) in the United States, one (1.4%) in Australia, and one (1.4%) in Japan. For 18 study subjects (24.3%), the country in which the infection was most likely acquired is unknown. The study subjects include four epidemiologically linked heterosexual couples and two homosexual couples (Table 1).
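The percentages in this paragraph follow directly from the counts over the 74 study subjects. A small helper such as the one below (hypothetical, for checking only) makes the rounding to one decimal place explicit.

```python
def pct(count: int, total: int = 74) -> float:
    """Percentage of the cohort, rounded to one decimal place."""
    return round(100 * count / total, 1)


origin = {"Cyprus": 47, "sub-Saharan Africa": 12, "Eastern Europe": 8,
          "Western Europe": 4, "North Africa": 1, "Asia": 1, "Australia": 1}
assert sum(origin.values()) == 74  # every subject is counted exactly once
for region, n in origin.items():
    print(f"{region}: {n} ({pct(n)}%)")
```

For example, pct(8) gives 10.8 and pct(31) gives 41.9.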
Table 2 footnote: IQR, interquartile range; MSM, men who have sex with men; HSX, heterosexual contact; MCT, mother-to-child transmission; TR, transfusion; OHPC, origin from a high-prevalence country; IVDUs, intravenous drug users; U, unknown; HBV, hepatitis B virus; HCV, hepatitis C virus.
Sequence analysis of near full-length HIV-1 genome
Plasma and uncultured PBMCs from all subjects were HIV-1 positive by RT-nested PCR and/or nested PCR in the gag, pol (Pr and RT), RNase H, integrase, vif, vpr, and vpu genes, gp160, and the 5′-end of nef. Successful amplification from strains of such extensive genetic diversity, as shown by the phylogenetic analysis (Fig. 2), demonstrates that the designed protocol (Fig. 1B and C) is suitable for diverse M-group strains. All second-round PCR products were then sequenced using the sequencing primers for each region (Fig. 1D), and the near full-length genome of each HIV-1 strain was assembled from the overlapping sequenced fragments. Sequence analysis showed that the near full-length sequences from all 74 study subjects had the expected gene structure (Fig. 1A) and encoded the expected nine open reading frames.

Fig. 2. Neighbor-joining phylogenetic tree for 74 HIV-1 near full-length genome sequences from patients in Cyprus and representative reference sequences from the nine known HIV-1 subtypes.
Genotypic drug resistance
No major protease inhibitor (PI)-associated mutations 29 were observed in the 53 newly diagnosed therapy-naive study subjects, but minor PI-associated mutations 29 (L10I, K20M/R, M36I, L63P/F/S/A/V, A71V/T, and V77I) were observed in 52 of them (98%). The substitution M36I was found in all non-B samples (51%), while K20R and L10I were observed in two (20%) and three (30%) of the subtype A sequences, respectively. The substitutions L63P, A71V, and V77I were seen in 12 (46%), 6 (23%), and 13 (88%) of the subtype B sequences, respectively. No nucleoside or nonnucleoside RT inhibitor (NRTI/NNRTI)-associated mutations 29 were found in the untreated, newly diagnosed study subjects.
Virus subtyping
The molecular epidemiological relationships among the HIV-1 near full-length genome sequences (HXB2 positions 790 to 8930; Fig. 1A and D) were examined by phylogenetic analysis. An NJ phylogenetic tree was constructed using the near full-length sequences from the 74 study subjects (Fig. 2). In addition to the sequences from Cyprus, the analysis included previously sequenced HIV-1 isolates from diverse global locations, obtained from the Los Alamos database 24 as described in Materials and Methods and encompassing all nine known subtypes (A through K), all CRFs, and non-M-group HIV-1 isolates. In this tree (Fig. 2), the Cypriot sequences were assigned to four distinct (sub)subtypes (A, B, C, and F1) and three CRFs (CRF02_AG, CRF11_cpx, and CRF37_cpx) within group M: subtype A, 14 sequences (18.9%); subtype B, 36 sequences (48.6%); subtype C, eight sequences (10.8%); subtype F1, one sequence (1.4%); CRF02_AG, six sequences (8.1%); CRF11_cpx, two sequences (2.7%); and CRF37_cpx, one sequence (1.4%) (Table 2). Six sequences (8.1%), from subjects CY225, CY257, CY212, CY191, CY200, and CY217, did not cluster with any pure subtype or CRF; they are scattered throughout the tree, reflecting their varied structure (Fig. 2). The CY191 sequence branches close to the subtype F1 reference sequences with a bootstrap value of 93%, whereas the CY200 sequence is closest to CRF19_cpx variants with a bootstrap value of 71%.
Analysis with the REGA algorithm indicated a recombinant pattern between subtypes A and F1 for CY191 and a mosaic pattern between subtype B and CRF02_AG for CY200 (data not shown). The CY212 sequence clusters near the branch of the B/F CRFs and subtype F reference sequences with a bootstrap value of 99%, whereas the CY217 sequence is very close to CRF02_AG strains with a bootstrap value of 97%. The REGA algorithm indicated a recombinant pattern between subtypes B and F1 for CY212 and a mosaic pattern between subtype D and CRF02_AG for CY217 (data not shown). The sequences of CY225 and CY257 branch within the subtype B reference sequences, forming a small cluster with a bootstrap value of 100%; REGA analysis of both sequences indicated a recombinant pattern between subtypes B and A (data not shown). The six unclassified sequences therefore appear to represent unique recombinants.
Phylogenetic evidence of epidemiological clusters
The near full-length nucleotide sequences (HXB2 positions 790 to 8930; Fig. 1A and D) were further analyzed phylogenetically to identify molecular epidemiological relationships between the study sequences and the previously defined near full-length sequences of the 1986–2006 Cyprus cohort. 17 An NJ phylogenetic tree was constructed from the 74 near full-length study sequences and 70 previously defined sequences of HIV-1 isolates from Cyprus, 17 encompassing subtypes A, B, C, and CRF02_AG (Fig. 3). In this tree (Fig. 3), 64 of the study sequences fell, together with the 70 previously defined Cypriot sequences, into four distinct clades: subtype A, 32 sequences; subtype B, 81 sequences; subtype C, 12 sequences; and CRF02_AG, nine sequences. The four sequences classified above as subtype F1, CRF11_cpx, and CRF37_cpx, and the six unclassified sequences, formed separate clades and did not cluster with any previously characterized sequence (Fig. 3).

Fig. 3. Neighbor-joining phylogenetic tree for 74 HIV-1 near full-length genome sequences from patients in Cyprus and 70 previously defined sequences of HIV-1 isolates from Cyprus, 17 including four known HIV-1 subtypes.
The study group included four heterosexual couples. In the first, the woman was from Sri Lanka (CY208) and the man, from Cyprus (CY209), was infected in Cyprus; in the second, the woman was from Latvia (CY213), where she was infected, and the man, from Cyprus (CY231), was infected by her in Cyprus; the third couple (CY227–CY228), from Cyprus, was infected either in Ethiopia or in Dubai; and the fourth couple (CY235–CY236), also from Cyprus, was infected in Cyprus. The cohort also includes two homosexual couples from Cyprus: one (CY232–CY241) was infected in Cyprus, and in the other, the first partner (CY195) was infected either in Greece or in the United Kingdom and the second partner (CY196) was infected in Cyprus (Table 1). In the phylogenetic tree (Fig. 3), the sequences of each of the four heterosexual couples cluster very closely, with bootstrap values of 100%: cluster 1 (CY235–CY236), cluster 2 (CY208–CY209), and cluster 3 (CY213–CY231) in subtype A, and cluster 7 (CY227–CY228) in subtype C. The branch topologies in the tree (Fig. 3) reconfirm these relationships.
The diversity between the near full-length sequences is 3.1% for cluster 1, 2.1% for cluster 2, and 0.9% for cluster 3, all in subtype A (Table 3). In cluster 7 (subtype C), the CY227 sequence clusters closely with the previously defined sequence CY176, 17 derived from a female patient from Ethiopia who was infected there before 2006. The median diversity between the near full-length sequences of cluster 7 is 2.0% (1.7–2.1%), which may indicate a link with Ethiopian subtype C strains (Table 3). The sequences of the homosexual couples CY195–CY196 and CY232–CY241 do not form clusters, and the partners probably acquired their infections from different sources.
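Median cluster diversities such as the 2.0% (1.7–2.1%) reported for cluster 7 summarize all pairwise distances among the cluster members. The sketch below computes the median, minimum, and maximum; the helper name is hypothetical, and the assignment of the three values to particular pairs is an assumption chosen only to be consistent with the reported summary.

```python
from itertools import combinations
from statistics import median


def cluster_summary(members: list[str],
                    dist: dict[frozenset, float]) -> tuple[float, float, float]:
    """Median, minimum and maximum pairwise distance among cluster members."""
    values = [dist[frozenset(pair)] for pair in combinations(members, 2)]
    return median(values), min(values), max(values)


# Hypothetical pair-to-value assignment consistent with "2.0% (1.7-2.1%)".
cluster7 = {
    frozenset(("CY227", "CY228")): 0.021,
    frozenset(("CY227", "CY176")): 0.017,
    frozenset(("CY228", "CY176")): 0.020,
}
print(cluster_summary(["CY227", "CY228", "CY176"], cluster7))  # (0.02, 0.017, 0.021)
```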
Table 3 footnotes: Nucleotide distances were calculated for the 12 HIV-1 clusters using the Kimura two-parameter method.
NA, not applicable.
As shown in Fig. 3, two more clusters exist within subtype A. One is cluster 4 (CY057–CY058), described elsewhere, 17 a heterosexual couple from Georgia who were infected there; the diversity between their near full-length sequences is 2.9% (Table 3). The other is cluster 5, which includes the sequences of a three-member family: the parents (CY111–CY112), described previously, 17 and their daughter (CY185) (Table 1). The median diversity between the near full-length sequences of this cluster is 2.2% (1.5–2.4%) (Table 3). The CRF02_AG clade contains one cluster, cluster 6 (CY240–CY265). According to the epidemiological data, both subjects originated from Bulgaria: CY265 is a 28-year-old man infected in Cyprus, and CY240 is a 20-year-old woman with an unknown place of infection (Table 1). Both reported heterosexual contact with an anonymous partner as their risk factor, and because the diversity between their near full-length sequences was very low (1.7%; Table 3), they may have been heterosexual partners. Within subtype B there are five clusters, 8 through 12. Cluster 8 includes near full-length sequences from three homosexual men in their late 30s (CY216, CY246, and CY267). CY246, originating from the United Kingdom, was infected there; CY267 and CY216, both from Cyprus, were infected in Cyprus and in either Greece or Cyprus, respectively (Table 1). The median diversity between their near full-length sequences is very low, 1.4% (1.1–1.4%), suggesting that they may have been homosexual partners.
Cluster 9 includes near full-length sequences from two Cypriot heterosexual men (CY190 and CY194), neither of whom knew the place of infection (Table 1). Given the very low diversity between their near full-length sequences (0.3%), they may have acquired the infection from a common heterosexual partner. Cluster 10 includes near full-length sequences from a previously described heterosexual couple (CY037 and CY038) who were infected in the United Kingdom. 17 The diversity between their near full-length sequences is 5.6%, and the branch lengths in the tree indicate that, although they are epidemiologically linked, they do not cluster very closely.
Cluster 11 includes near full-length sequences from two men: a homosexual man (CY225) who acquired HIV-1 infection in 2008 and does not know the place of infection, and a heterosexual man (CY257) who was infected in Greece in 2009 (Table 1). The diversity between their near full-length sequences is 2.5% (Table 3), and together they form a distinct branch within subtype B (Fig. 3). Cluster 12 includes near full-length sequences from a previously described heterosexual couple from Cyprus (CY088 and CY089) who were infected in Cyprus. 17 The diversity between the near full-length sequences of this couple is 1.3%, confirming the previous data. The relatively low genetic diversity (<5%) within each of the 12 clusters in subtypes A, B, C, and CRF02_AG, with the exception of cluster 10, supports the conclusion that these isolates were derived from epidemiologically linked individuals.
Evidence of mosaicism
For a more detailed comparison with the HIV-1 M-group subtypes, the six unclassified sequences (CY191, CY200, CY212, CY217, CY225, and CY257) were analyzed by similarity and bootscanning analysis. The similarity plots and bootscans showed that each genome consisted of segments clustering alternately with references of at least two pure subtypes, or of one pure subtype and CRF02_AG (Fig. 4): subtypes F1 and A1 for CY191; CRF02_AG and subtypes G and B for CY200; subtypes B and F1 for CY212; CRF02_AG and subtype D for CY217; and subtypes B and A1 for both CY225 and CY257. In the bootscan analysis, bootstrap values of ≥70% were observed in almost all regions of the genome, supporting their classification as recombinant strains. The subtype assignment of each segment derived from the bootscan analyses was confirmed with NJ trees, in most cases with a bootstrap value of ≥70% supporting the relationship with the subtype references (Fig. 4).

Fig. 4. Recombination analysis of the near full-length genomes of the CY191, CY200, CY212, CY217, CY225, and CY257 isolates in comparison with reference strains of all HIV-1 M-group pure subtypes, as described in Materials and Methods. The upper left diagram in each scheme shows the gene regions and the recombination breakpoints of the isolate as determined by informative analysis. The illustration was created according to HXB2 numbering using the Recombinant Drawing tool available on the Los Alamos HIV Sequence Database website. 28 The middle left panel presents the similarity plot, and the bottom left panel shows the bootscan analysis, performed as described in Materials and Methods. The y-axis indicates the percent identity of the query sequence to a set of reference sequences in the similarity plot and the bootstrap value in the bootscan diagram; the x-axis shows the nucleotide position in the HXB2 genome. The dotted line indicates the 70% (significant) bootstrap value. The right panel of each scheme presents the phylogenetic analysis of the interbreakpoint segments of each variant, as defined by the similarity plot and the bootscan analysis. Trees were constructed by the neighbor-joining method with the Kimura two-parameter distance model and bootstrap analysis (1000 replicates). All reference HIV-1 pure (sub)subtypes and CRF02_AG of group M were used to construct the trees and are denoted by different colors. *, bootstrap value ≥70%; •, query sequence of each variant. Loci of genome segments are based on HXB2 numbering. The divergence between any two sequences is obtained by summing the horizontal branch lengths, using the scale at the lower left.
Sequence CY191 contained six breakpoints (Fig. 4). The beginning of the genome (HXB2 positions 790–3565) was subtype F1, with a bootstrap value of 99%, shifting to subtype A1 (bootstrap value 100%) in the 3′-end of RT and the 5′-end of integrase in pol (positions 3565–4325). The third subregion, in integrase (positions 4325–4890), clustered with subtype F1 with a bootstrap value of 78%. It was followed by a short, poorly resolved fragment (260 nucleotides [nt]) spanning the 3′-end of integrase and the 5′-end of Vif (Fig. 4), which clustered with subtype G reference sequences with a low bootstrap value of 35% in the NJ tree and was considered unclassified. From the 5′-end of Vif to the middle of the first exon of Tat (positions 5150–5945), the genome was subtype A1, with a bootstrap value of 97%. The next region (positions 5945–6060), on the 3′-side of the first exons of Tat and Rev, was considered unclassified because it was poorly resolved between the G and F1 references and clustered with subtype G in the NJ tree with a bootstrap value of only 57%. After this region, the genome shifted back to subtype F1 (positions 6060–8930). Three regions were analyzed by NJ, and all three clustered with subtype F1, with bootstrap values of 99% for positions 6060–7400, 91% for positions 7400–7985, and 68% for positions 7985–8930. A previous study reported a unique F/A recombinant near full-length sequence from a Cameroonian isolate, 30 but its recombinant pattern differs from that of CY191, an isolate originating from the DRC.
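Breakpoint counts such as the six reported for CY191 follow from the segment boundaries. The sketch below encodes the CY191 segments listed above as HXB2 intervals, checks that they tile the analyzed region without gaps, and counts subtype switches. The Segment class and the "U" label for unclassified regions are our own illustrative conventions, and segment length is taken as end minus start, as in the 260-nt figure above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: int    # HXB2 position where the segment begins
    end: int      # HXB2 position where the next segment begins
    subtype: str  # "U" marks an unclassified region


def count_breakpoints(segments: list[Segment]) -> int:
    """Check that segments tile the genome contiguously; count subtype switches."""
    for prev, nxt in zip(segments, segments[1:]):
        if prev.end != nxt.start:
            raise ValueError(f"gap or overlap between {prev} and {nxt}")
    return sum(prev.subtype != nxt.subtype for prev, nxt in zip(segments, segments[1:]))


cy191 = [
    Segment(790, 3565, "F1"), Segment(3565, 4325, "A1"), Segment(4325, 4890, "F1"),
    Segment(4890, 5150, "U"), Segment(5150, 5945, "A1"), Segment(5945, 6060, "U"),
    Segment(6060, 8930, "F1"),
]
unclassified = [s.end - s.start for s in cy191 if s.subtype == "U"]
print(count_breakpoints(cy191), unclassified)  # 6 [260, 115]
```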
Sequence CY200 contained 13 breakpoints (Fig. 4). Most of the genome was CRF02_AG, with three subtype G fragments, two subtype B fragments, and three unclassified regions. The beginning of the genome (positions 790–1695) was CRF02_AG, with a bootstrap value of 76%, shifting to subtype G (bootstrap value 71%) at the 3′-end of gag (positions 1695–2060). The third subregion, at the 3′-end of gag and the 5′-end of pol (positions 2060–2570), clustered with CRF02_AG with a bootstrap value of 91%, followed by a short fragment (245 nt) clustering within subtype B with a bootstrap value of 82%. From the 5′-end to the middle of RT in pol (positions 2815–3230), the genome was CRF02_AG, with a bootstrap value of 74%. The next region (positions 3230–3700), at the 3′-end of RT in pol, clustered with subtype B in the bootscan analysis but fell between the B and D references in the NJ tree, with a bootstrap value of 63%. The region at positions 3700–4050, in the RNase H region of pol, was poorly resolved: it clustered outside the group of subtype D and B reference sequences with a bootstrap value of 65% in the NJ tree and was considered unclassified. The genome then shifted back to subtype G (positions 4050–4395), with a bootstrap value of 95%, followed by a fragment (420 nt) that clustered within subtype B with a bootstrap value of 99%. From the 3′-end of pol (integrase) through Vif to the middle of Vpr (positions 4815–5665), the genome was CRF02_AG, with a bootstrap value of 81%.
The next region (position 5665–6415), from the 3′-end of Vpr through the 5′-end of env (gp120), clustered within subtype G with a bootstrap value of 99%, followed by a 205-nt fragment (position 6415–6620) with a bootstrap value of 81%. The following fragment clustered in the NJ tree within subtype D with a bootstrap value of only 19% and was considered unclassified. The genome then shifted back to CRF02_AG (position 7025–8930). Three subregions were analyzed with the NJ method, and all three clustered with CRF02_AG, with bootstrap values of 99% for position 7025–8120, 62% for position 8120–8470, and 96% for position 8470–8930.
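The same interval bookkeeping can serve as a consistency check on the CY200 description, comparing the breakpoint count and the quoted fragment lengths against the coordinates. In this hypothetical sketch, the 205-nt fragment at 6415–6620 is labeled "?" because the text does not name its lineage, 3230–3700 is labeled B following the bootscan result, and the unclassified fragment before 7025 is assumed to begin at 6620. The count of 13 depends on "?" differing from its neighbors.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int, str]  # (start, end, lineage), half-open HXB2 interval

# CY200 mosaic transcribed from the text; "U" = reported as unclassified,
# "?" = lineage not named in the text (assumption: distinct from neighbors).
CY200: List[Segment] = [
    (790, 1695, "CRF02_AG"), (1695, 2060, "G"), (2060, 2570, "CRF02_AG"),
    (2570, 2815, "B"), (2815, 3230, "CRF02_AG"), (3230, 3700, "B"),
    (3700, 4050, "U"), (4050, 4395, "G"), (4395, 4815, "B"),
    (4815, 5665, "CRF02_AG"), (5665, 6415, "G"), (6415, 6620, "?"),
    (6620, 7025, "U"), (7025, 8930, "CRF02_AG"),
]

# Fragment lengths quoted in the text, keyed by (start, end).
STATED_LENGTHS: Dict[Tuple[int, int], int] = {
    (2570, 2815): 245, (4395, 4815): 420, (6415, 6620): 205,
}

def count_breakpoints(mosaic: List[Segment]) -> int:
    """Count adjacent segment pairs whose lineage labels differ."""
    return sum(1 for a, b in zip(mosaic, mosaic[1:]) if a[2] != b[2])

def length_mismatches(mosaic: List[Segment],
                      stated: Dict[Tuple[int, int], int]) -> List[Tuple[int, int]]:
    """Return intervals whose coordinate span disagrees with the stated length."""
    spans = {(s, e): e - s for s, e, _ in mosaic}
    return [k for k, n in stated.items() if spans.get(k) != n]

print(count_breakpoints(CY200))                  # 13
print(length_mismatches(CY200, STATED_LENGTHS))  # []
```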
Sequence CY212 contained six breakpoints (Fig. 4). The beginning of the genome, position 790–1310, was subtype B with a bootstrap value of 67%, shifting to subtype F1 with a bootstrap value of 67% from the 3′-end of gag to the 5′-end of pol (RNase) (position 1310–4070). The third subregion, in integrase (position 4070–4875), clustered with subtype B with a bootstrap value of 99%, followed by a short fragment spanning the 3′-end of integrase and the 5′-end of Vif (585 nt) that clustered with subtype F1 with a bootstrap value of 99%. The region at position 5460–6240, from the 3′-end of Vif through the 3′-end of Vpu, clustered with subtype B reference sequences with a bootstrap value of 70% in the NJ tree. The fragment from the 3′-end of Vpu to the 5′-end of env (gp120) (position 6240–6520) was an area of poor resolution (Fig. 4): it clustered outside the clade of subtype F and K sequences with a bootstrap value of 90% in the NJ tree and was considered unclassified. The genome then shifted back to subtype B (position 6520–8930) with a bootstrap value of 87%.
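Given a mosaic such as CY212's, looking up the lineage assigned to a particular HXB2 position is a binary search over the segment start positions. The minimal sketch below uses Python's standard `bisect` module; the arrays and the `lineage_at` helper are hypothetical, transcribed from the coordinates above.

```python
from bisect import bisect_right

# Hypothetical encoding of the CY212 mosaic: sorted segment start positions
# (HXB2) and the lineage of each segment; the last segment ends at 8930.
STARTS = [790, 1310, 4070, 4875, 5460, 6240, 6520]
LABELS = ["B", "F1", "B", "F1", "B", "unclassified", "B"]
GENOME_END = 8930

def lineage_at(pos: int) -> str:
    """Return the lineage assigned to HXB2 position `pos` in CY212."""
    if not STARTS[0] <= pos < GENOME_END:
        raise ValueError(f"position {pos} outside the analyzed region")
    # bisect_right counts starts <= pos; the segment index is one less.
    return LABELS[bisect_right(STARTS, pos) - 1]

print(lineage_at(4000))  # F1
print(lineage_at(6300))  # unclassified
print(lineage_at(6520))  # B
```

Because `bisect_right` places a position equal to a start value after that value, a boundary position such as 6520 is assigned to the segment that begins there, matching the half-open convention.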
Sequence CY217 contained four breakpoints (Fig. 4). Most of the CY217 genome was CRF02_AG, with one subtype D fragment inserted in env (gp120) and one unclassified fragment in pol (integrase) (Fig. 4). The beginning of the genome, position 790–4310, was CRF02_AG with a bootstrap value of 97%. Position 4310–4760, in the middle of integrase, was an area of poor resolution (Fig. 4): it clustered with subtype G reference sequences with a low bootstrap value of 44% in the NJ tree and was considered unclassified. The third subregion, from the 3′-end of pol (integrase) to the 5′-end of env (gp120) (position 4760–6315), clustered with CRF02_AG with a bootstrap value of 100%, followed by a fragment at position 6315–7665 that clustered within subtype D with a bootstrap value of 91%. The last region, position 7665–8930, clustered with CRF02_AG reference sequences with a bootstrap value of 99% in the NJ tree. Two previous studies in Cameroon reported recombinant forms of CRF02_AG with subtype D. In the first, a near full-length sequence consisted of a fragment in the 5′-end of gag that clustered with subtype D reference sequences, with the rest of the sequence clustering with CRF02_AG reference sequences. 30 In the second, two regions of an isolate (accession number DQ057054) were characterized, gag (p17) and env (gp41): gag (p17) clustered with subtype D and env (gp41) with CRF02_AG reference sequences. 31 Neither shares the recombinant pattern found in this study.
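A statement such as "most of the CY217 genome was CRF02_AG" can be checked by confirming that the segments tile the analyzed region without gaps and then summing segment lengths per lineage. The sketch below assumes the same half-open interval convention, an analyzed region of 790–8930, and "U" for unclassified; the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int, str]  # (start, end, lineage), half-open HXB2 interval

CY217: List[Segment] = [
    (790, 4310, "CRF02_AG"),
    (4310, 4760, "U"),          # unclassified (G, 44% bootstrap in NJ)
    (4760, 6315, "CRF02_AG"),
    (6315, 7665, "D"),
    (7665, 8930, "CRF02_AG"),
]

def check_tiling(mosaic: List[Segment], start: int = 790, end: int = 8930) -> None:
    """Raise ValueError if segments leave gaps, overlap, or miss the region ends."""
    if mosaic[0][0] != start or mosaic[-1][1] != end:
        raise ValueError("mosaic does not span the analyzed region")
    for (_, e1, _), (s2, _, _) in zip(mosaic, mosaic[1:]):
        if e1 != s2:
            raise ValueError(f"gap or overlap at {e1}/{s2}")

def lineage_coverage(mosaic: List[Segment]) -> Dict[str, int]:
    """Total nucleotides assigned to each lineage."""
    totals: Dict[str, int] = {}
    for s, e, lab in mosaic:
        totals[lab] = totals.get(lab, 0) + (e - s)
    return totals

check_tiling(CY217)
cov = lineage_coverage(CY217)
print(cov)                                            # {'CRF02_AG': 6340, 'U': 450, 'D': 1350}
print(round(cov["CRF02_AG"] / sum(cov.values()), 2))  # 0.78
```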
In the phylogenetic trees of full-length genome sequences, CY225 and CY257 clustered together, and both grouped with subtype B variants (Figs. 2 and 3). Mosaic-structure analysis of both sequences with SimPlot v.3.5.1 agreed with the NJ analysis of short segments (Fig. 4): CY225 and CY257 shared identical mosaic structures, consisting predominantly of a subtype B genome with one subtype A1 fragment in env (gp120). The fragment clustering in subtype A1 (position 6315–7710) was analyzed by NJ in two regions, position 6315–6580 and position 6580–7710, with bootstrap values of 95% and 99%, respectively. The rest of the genome was analyzed by NJ in shorter fragments, all of which clustered with subtype B, with bootstrap values of 67% for position 790–2605, 99% for position 2605–2970, 64% for position 2970–3910, 99% for position 3910–4095, 55% for position 4095–4985, 55% for position 4985–5190, 70% for position 5190–6315, and 100% for position 8470–8930. Both sequences also clustered together in partial trees of all subtype B and A1 fragments along the genome (data not shown). Because these breakpoints (positions 6315 and 7710) do not coincide with those of CRF03_AB, 32 these sequences define a new URF.
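The statement that CY225 and CY257 share identical mosaic structures amounts to two checks: the same lineages in the same order, and segment boundaries at matching positions. The sketch below expresses this in Python. It is an illustration, not SimPlot's method; the 50-nt tolerance for breakpoint positions and the shifted comparison mosaic are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int, str]  # (start, end, lineage), half-open HXB2 interval

def same_mosaic(a: List[Segment], b: List[Segment], tol: int = 50) -> bool:
    """True if a and b list the same lineages in the same order and every
    segment boundary differs by at most `tol` nt (tol is an assumed value)."""
    if [lab for _, _, lab in a] != [lab for _, _, lab in b]:
        return False
    return all(
        abs(s1 - s2) <= tol and abs(e1 - e2) <= tol
        for (s1, e1, _), (s2, e2, _) in zip(a, b)
    )

# Structures as reported for CY225 and CY257 (B / A1 insert in env gp120 / B).
CY225 = [(790, 6315, "B"), (6315, 7710, "A1"), (7710, 8930, "B")]
CY257 = [(790, 6315, "B"), (6315, 7710, "A1"), (7710, 8930, "B")]
# Hypothetical comparison mosaic with the A1 insert starting 185 nt upstream.
SHIFTED = [(790, 6130, "B"), (6130, 7710, "A1"), (7710, 8930, "B")]

print(same_mosaic(CY225, CY257))    # True
print(same_mosaic(CY225, SHIFTED))  # False
```

Comparing boundaries with a tolerance, rather than requiring exact equality, reflects the fact that breakpoint positions inferred from bootscanning are approximate.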
The unclassified regions were of uncertain origin because they were too short to yield high bootstrap values. A BLAST search of each near full-length sequence was performed for comparative phylogenetic analysis against HIV-1 sequences in the Los Alamos HIV Sequence Database. No similar recombinant patterns were found in the database, supporting the conclusion that all five recombinant forms are URFs. Notably, the crossover sites within the CY191, CY200, CY212, CY217, CY225, and CY257 recombinant forms fall in the genomic regions gag, pol (RT), pol (RNase), pol (integrase), Vif, Vpr, and the 5′-end of env (gp120) of HIV-1, which have been characterized as recombination hotspots. 33 –35
Discussion
Our analysis aimed to characterize the near full-length genome sequences of 74 HIV-1 seropositives, representing 88% of the known living HIV-1-infected population in Cyprus in the period 2007–2009. Compared with the data already published by Kostrikis et al. 15 and Kousiappa et al., 17 the present study offers more detailed information on the dynamics of the local HIV-1 infection. Phylogenetic analysis of the near full-length genomes clearly indicated that subtype B is the dominant subtype, followed by subtypes A and C and CRF02_AG, genetic forms that dominate the global epidemic. 36 Strains of other genetic forms were also observed, such as subtype F1, CRF11_cpx, and CRF37_cpx. Six unclassified isolates from different HIV-1-seropositive individuals were further characterized by complete phylogenetic and bootscanning analyses. Each had a recombinant pattern distinct from all CRFs and URFs reported so far in the literature. Two of the isolates shared the same mosaic pattern; as a result, five new URFs were identified.
Sequence CY191 had a unique mosaic pattern comprising segments of subtypes F1 and A1 and two short unclassified regions. This new URF was probably generated by recombination between two parental viruses, one from the subtype F1 lineage and the other from subtype A1, in the DRC, where the infection was contracted (Table 1). The HIV-1 epidemic in the DRC is characterized by high genetic diversity, with a large number of cocirculating HIV-1 subtypes, recombinant viruses, and unclassified strains. 37,38 Sequence CY200 had a unique mosaic pattern comprising segments of CRF02_AG and subtypes G and B and short unclassified regions. This new URF was probably generated by recombination between two parental viruses, one belonging to CRF02_AG and the other to subtype B, in Cyprus, where the infection was contracted (Table 1). Sequence CY212 had a unique mosaic pattern comprising segments of subtypes F1 and B and one short unclassified region. This new URF was probably generated by recombination between two parental viruses, one from the subtype F1 lineage and the other from subtype B. The place of infection and the risk group of this patient are unknown (Table 1). This new unique HIV-1 genetic form has a recombinant structure different from those of the eight B/F intersubtype CRFs identified in South America, but it probably shares a common origin with other CRF_BFs (CRF28_BF, CRF29_BF, and CRF40_BF) circulating in Brazil. 39 Sequence CY217 had a unique mosaic pattern comprising segments of CRF02_AG and subtype D and one short unclassified region. This new URF was probably generated by recombination between parental viruses belonging to the CRF02_AG lineage and subtype D, in Cameroon, where the infection was contracted.
The CY217 fragments clustering with subtype D or CRF02_AG in NJ trees were similar to subtype D or CRF02_AG reference sequences, respectively, from Cameroonian isolates identified by BLAST (data not shown). The HIV-1 epidemic in Cameroon is likewise characterized by high genetic diversity, with a large number of cocirculating HIV-1 subtypes, recombinant viruses, and unclassified strains. 30,40 CY217 has a recombinant structure different from those of the two D/CRF02_AG intersubtype URFs identified in Cameroon. 30,31 Sequences CY225 and CY257 had the same unique mosaic pattern comprising segments of subtypes B and A1. This new URF was probably generated by recombination between two parental viruses, one belonging to subtype B and the other to subtype A1, perhaps in Greece, where the CY257 infection was contracted (Table 1). The place of infection and the risk group of CY225 are unknown (Table 1). These two patients are not epidemiologically related. In NJ trees of pol (PR and RT) sequences, both sequences clustered together in the subtype B lineage with a sequence from Greece (accession number GQ399849) 41 (data not shown). At least two epidemiologically unrelated patients with a similar HIV-1 variant are required to formally designate the variant as a new CRF, but because only the pol (PR and RT) region has been sequenced from the Greek isolate, we cannot yet classify this unique recombinant form as a circulating variant.
The genomic structures of the five recombinants differ from those of all previously identified CRFs and URFs worldwide. Until now, only one URF (D/G), originating from the DRC in Sub-Saharan Africa, had been reported in Cyprus. 17 These data reconfirm our previous findings of a polyphyletic and evolving HIV-1 infection on the island. 17 The presence of five new URFs increases the genetic complexity of the HIV-1 epidemic in Cyprus. Such complexity is unexpected given the country's small area, population size, and transmission patterns, but it can be explained by the flow of many foreign nationals into Cyprus. The number of non-B subtypes entering Cyprus has increased over the past 3 years owing to the large number of immigrants from Africa, where non-B strains predominate. 36 Africa is well known to have the greatest HIV-1 genetic diversity in the world, 42 and in this study 18.9% of the HIV-1 infections, including strains CY191 and CY217, are linked to Africa.
Recombinants are a considerable and dynamic element of the global HIV-1 epidemic and will undoubtedly influence the development of an effective vaccine. The ongoing evolution of HIV-1, the explosive worldwide emergence of recombinants, and the appearance of new URFs and variants highlight the importance of continued molecular epidemiological surveillance of global HIV-1 diversity. 43 Because of these factors, the implementation of antiretroviral drug therapies 18 and the design of an effective vaccine have become increasingly important. 12 –14 Proper follow-up of the HIV-1 epidemic in a specific geographic region is fundamental to establishing effective prevention campaigns and reliable screening, diagnostic, and patient-monitoring assays. 43 No primary mutations conferring resistance to PIs or NRTIs/NNRTIs were observed in the isolates from newly diagnosed patients, but a high rate of minor PI mutations was seen. These results do not diminish the need for routine resistance testing before the initiation of antiretroviral therapy for HIV patients in Cyprus, following the European and IAS-USA guidelines, which recommend resistance testing in chronically infected therapy-naive patients when the regional prevalence of resistance is ≥10% and ≥5%, respectively. 18,44 Therefore, resistance-associated mutations in untreated individuals should be monitored more actively, in order to recognize as early as possible any significant change that may affect their future clinical management, to plan and optimize first-line regimens, and to estimate the prevalence of resistance over time.
In conclusion, we have characterized the mosaic structures and analyzed the phylogenetic relationships of five new HIV-1 intersubtype unique recombinant forms identified in a larger phylogenetic study of near full-length coding sequences of HIV-1 strains in Cyprus. These results reconfirm the high genetic diversity of HIV-1 in Cyprus, which derives from the introduction of multiple non-B genetic forms from other countries as well as from local evolutionary processes. Further work is needed to examine the associations of the newly identified genetic forms in Cyprus with their biological properties, pathogenesis, transmission, immune responses, and response and resistance to antiretroviral drugs. Ongoing surveillance of HIV-1 subtypes and recombinant forms in Cyprus may have important implications for HIV-1 vaccine development and new drug design and can provide insights into the evolutionary history of the virus.
Sequence Data
GenBank accession numbers for the near full-length sequences obtained in this study are as follows: JF683736–JF683809.
Footnotes
Acknowledgments
We thank all participating subjects and staff at the Larnaca General Hospital AIDS Clinic, the Cyprus Ministry of Health, and the Cyprus National Bioethics Committee for valuable assistance. This work was supported by grants from the Cyprus Research Promotion Foundation (PENEK/0308/15), the University of Cyprus, and the Birch Biomedical Research LLC (3416-25017) awarded to L.G. Kostrikis.
Author Disclosure Statement
No competing financial interests exist.
