Abstract
HIV-1 provirus is flanked by one long terminal repeat (LTR) at each end. The 5′ LTR plays important roles in the HIV-1 life cycle; in particular, it controls HIV-1 transcription. However, the HIV-1 sequence database contains only 810 5′ LTR entries, or 0.085% (810/949,484) of all entries. In this study, we collected plasma samples from HIV-1-infected patients in Shenzhen, China, and obtained 219 5′ LTR sequences. We also identified recombination in the LTR region: the recombinants (LS13145, LS11614, LS14862, and LS14863) carry a CRF01_AE segment inserted at HXB2 positions 482–630 (149 bp) within a subtype C 5′ LTR backbone. Furthermore, recombination altered many predicted transcription factor binding sites (TFBSs). Further investigation of 5′ LTR diversity and characteristics will deepen our understanding of HIV-1 transmission, evolution, and the basic mechanisms of transcriptional regulation.
Introduction
HIV-1
Research on HIV-1 has mainly focused on the internal genomic regions, whereas studies of the 5′ LTR region remain relatively few. For example, in the HIV sequence database (
Materials and Methods
In our previous study, 83 5′ LTR reference sequences were obtained following four recognized principles for reference classification 8 (Supplementary Dataset S1); here, they are used to subtype the 5′ LTR regions of clinical isolates. Plasma samples were collected from HIV-1-infected patients in Shenzhen, China, and stored at −80°C until use. The demographic information of all patients is summarized in Table 1. All patients provided informed consent.
Table 1. Sociodemographic Characteristics of Participants
IDU, injecting drug users.
Viral RNA was extracted using the QIAamp® Viral RNA Mini Kit (52904; QIAGEN) according to the manufacturer's instructions. Mimicking the natural reverse transcription process of HIV-1, the two partial LTRs at the 5′ and 3′ termini of the genomic RNA (gRNA) were amplified separately. After sequencing, the two fragments were assembled through their shared R region to obtain the complete LTRs. 3 Primers were designed in the U3-R and U5-R regions. The nested PCR primers and their positions relative to HXB2 (K03455) are shown in Supplementary Table S1.
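The assembly through the R region amounts to joining two fragments at their shared end sequence. The sketch below illustrates this under simplifying assumptions: both fragments are already base-called, in the same orientation, and the R region matches exactly. The function `assemble_ltr`, its `min_overlap` parameter, and the toy sequences are hypothetical; the study itself assembled sequences in ContigExpress, and real reads would need mismatch-tolerant alignment.

```python
def assemble_ltr(u3_r: str, r_u5: str, min_overlap: int = 20) -> str:
    """Join two LTR fragments through their longest exact end overlap.

    u3_r: fragment whose 3' end is the R region (from the gRNA 3' terminus).
    r_u5: fragment whose 5' end is the R region (from the gRNA 5' terminus).
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    u3_r, r_u5 = u3_r.upper(), r_u5.upper()
    # Try the longest candidate overlap first so the whole shared R region is used.
    for k in range(min(len(u3_r), len(r_u5)), min_overlap - 1, -1):
        if u3_r[-k:] == r_u5[:k]:
            return u3_r + r_u5[k:]
    raise ValueError(f"no exact overlap of at least {min_overlap} bases")


# Toy fragments (hypothetical, not real HIV-1 sequence) sharing a 12-base "R".
toy_r = "GGGTCTCTCTGG"
u3_r_fragment = "ACGTACGTTT" + toy_r
r_u5_fragment = toy_r + "CTAACTAGGG"
ltr = assemble_ltr(u3_r_fragment, r_u5_fragment, min_overlap=10)
print(ltr, len(ltr))
```

Searching from the longest overlap downward avoids joining on a short, spurious repeat when the full R region is available.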
Reverse transcription and the first round of PCR were performed in a 25 μL reaction volume using the PrimeScript™ One-Step RT-PCR Kit (TAKARA, RR055A). Cycling conditions were as follows: initial incubation at 50°C for 32 min and then at 94°C for 3 min, followed by 30 cycles of 94°C for 30 s, 55°C for 30 s, and 72°C for 1 min, and a final extension at 72°C for 5 min. The second round of PCR was performed using Premix Taq™ (RR902Q; TAKARA) in a 50 μL reaction volume. Cycling conditions were as follows: initial incubation at 94°C for 3 min, followed by 35 cycles of 94°C for 30 s, 60°C for 30 s, and 72°C for 1 min, and a final extension at 72°C for 10 min. The PCR products were directly sequenced, and the sequences were assembled and edited with ContigExpress.
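Encoding the two cycling programs as data makes them easy to check and reuse. The sketch below is illustrative only: the `Step` and `Program` classes are hypothetical, and `nominal_minutes` sums the programmed hold times while ignoring instrument ramp rates, so the real run time will be longer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    temp_c: float  # block temperature in °C
    seconds: int   # programmed hold time


@dataclass
class Program:
    pre: List[Step]    # one-off steps before cycling
    cycle: List[Step]  # steps repeated in every cycle
    n_cycles: int
    post: List[Step]   # final extension

    def nominal_minutes(self) -> float:
        """Total programmed hold time in minutes; ramp time is ignored."""
        def hold(steps: List[Step]) -> int:
            return sum(s.seconds for s in steps)
        total_s = hold(self.pre) + self.n_cycles * hold(self.cycle) + hold(self.post)
        return total_s / 60


# One-step RT-PCR (first round), values as given in the Methods.
first_round = Program(
    pre=[Step(50, 32 * 60), Step(94, 3 * 60)],
    cycle=[Step(94, 30), Step(55, 30), Step(72, 60)],
    n_cycles=30,
    post=[Step(72, 5 * 60)],
)
# Nested (second-round) PCR.
second_round = Program(
    pre=[Step(94, 3 * 60)],
    cycle=[Step(94, 30), Step(60, 30), Step(72, 60)],
    n_cycles=35,
    post=[Step(72, 10 * 60)],
)
print(first_round.nominal_minutes(), second_round.nominal_minutes())
```

Excluding ramping, the first round amounts to 100 min of programmed hold time and the second round to 83 min.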
Recombination can dramatically change genome content and confound phylogenetic relationships. 9,10 To investigate whether recombination events exist among the amplified sequences, we performed a systematic recombination analysis with RDP4. 11 The highest acceptable p value was set to .05, sequences were treated as linear, and all other parameters were left at the RDP4 defaults. To ensure reliability, an HIV-1 5′ LTR sequence was considered recombinant only when the recombination signal was supported by at least four methods with a p value ≤.05 after Bonferroni correction. The inferred breakpoint positions were checked manually using the recombination signal analysis implemented in RDP4.
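The acceptance rule (at least four methods with Bonferroni-corrected p ≤ .05) can be written as a small decision function. This is a sketch of the rule only, assuming raw per-method p values and the number of tests are already known; RDP4 applies its multiple-comparison correction internally, so the function names and the example p values below are hypothetical and are not RDP4 output.

```python
from typing import Dict


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p * n_tests)


def is_recombinant(raw_p: Dict[str, float], n_tests: int,
                   alpha: float = 0.05, min_methods: int = 4) -> bool:
    """Acceptance rule: at least min_methods methods with corrected p <= alpha.

    raw_p maps a detection method (e.g. "RDP", "GENECONV") to its uncorrected
    p value; methods that detected no signal are simply left out.
    """
    supporting = [m for m, p in raw_p.items() if bonferroni(p, n_tests) <= alpha]
    return len(supporting) >= min_methods


# Hypothetical p values for one candidate sequence, corrected for 100 tests.
example = {"RDP": 1e-6, "GENECONV": 2e-5, "BootScan": 3e-4,
           "MaxChi": 1e-3, "Chimaera": 4e-4}
print(is_recombinant(example, n_tests=100))
```

With 100 tests, four of the five hypothetical methods pass (MaxChi's corrected p is 0.1), so the candidate is accepted; with 1,000 tests only two pass and it is rejected.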
To confirm the subtypes of clinical isolates from Shenzhen, we constructed a maximum likelihood (ML) phylogenetic tree with MEGA X software. 12 The best-fitting nucleotide substitution model was selected using the model selection function in MEGA X. Tree topologies were searched using subtree-pruning-and-regrafting level 3 (SPR level 3), and the initial tree was made automatically (Default-NJ/BioNJ). The confidence of each node was assessed using the bootstrap test with 500 replicates, and bootstrap support values of ≥70% were considered significant. The final ML trees were visualized using iTOL v6. 13
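For readers unfamiliar with the bootstrap test, the sketch below shows how pseudo-replicate alignments are generated by resampling sites with replacement, together with the ≥70% cutoff used here. It assumes the standard nonparametric (Felsenstein) bootstrap; MEGA X performs this internally, tree inference on each replicate is omitted, and the function names and toy alignment are hypothetical.

```python
import random
from typing import List


def bootstrap_replicates(alignment: List[str], n_reps: int = 500,
                         seed: int = 1) -> List[List[str]]:
    """Nonparametric bootstrap: resample alignment columns with replacement.

    Each replicate keeps the same sequences and the same number of sites.
    A tree is then inferred from every replicate; a node's support is the
    percentage of replicate trees that contain the same clade.
    """
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_reps):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        replicates.append(["".join(seq[c] for c in cols) for seq in alignment])
    return replicates


def is_significant(support_percent: float, threshold: float = 70.0) -> bool:
    """Apply the >=70% bootstrap cutoff used in this study."""
    return support_percent >= threshold


toy_alignment = ["ACGT", "ACGA", "TCGA"]  # hypothetical 3-taxon, 4-site alignment
reps = bootstrap_replicates(toy_alignment, n_reps=500)
print(len(reps), reps[0])
```

Resampling whole columns, rather than individual bases, keeps each site's pattern across taxa intact, which is what the support values are meant to measure.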
PROMO is a virtual laboratory for identifying putative transcription factor binding sites (TFBSs) in DNA sequences from a species or group of species of interest. We used PROMO to identify characteristic TFBSs of different subtypes and recombinants within the 5′ LTRs.
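PROMO predicts sites with TRANSFAC-derived weight matrices and a dissimilarity threshold. As a much simpler stand-in that conveys the idea of site scanning, the sketch below searches a sequence for exact matches to IUPAC consensus strings. It is an assumption-laden illustration, not PROMO's algorithm: `scan_sites` and `consensus_to_regex` are hypothetical names, only the given strand is scanned, and the AP-1 consensus is used purely as an example motif.

```python
import re
from typing import Dict, Set, Tuple

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> str:
    return "".join(IUPAC[base] for base in consensus.upper())


def scan_sites(seq: str, motifs: Dict[str, str]) -> Set[Tuple[str, int, str]]:
    """Return (factor, 0-based start, matched site) for exact consensus hits.

    Only the given strand is scanned; the lookahead lets overlapping hits count.
    """
    seq = seq.upper()
    hits = set()
    for factor, consensus in motifs.items():
        pattern = re.compile("(?=(%s))" % consensus_to_regex(consensus))
        for m in pattern.finditer(seq):
            hits.add((factor, m.start(), m.group(1)))
    return hits


# AP-1 is often summarized by the consensus TGA(C/G)TCA, i.e. TGASTCA.
print(sorted(scan_sites("AATGACTCAGGTGAGTCAT", {"AP-1": "TGASTCA"})))
```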
Results
Plasma samples were collected from HIV-1-infected patients in Shenzhen, China. After amplification and assembly, we obtained a total of 219 5′ LTR sequences, which were deposited in GenBank (Supplementary Table S2 and Supplementary Dataset S2). We then used the RDP4 recombination detection tool to investigate whether any recombination events exist among these 5′ LTRs. Recombination analysis of the 219 5′ LTR sequences together with the 83 reference sequences revealed a new recombinant form of the LTR.
The recombinant carries a CRF01_AE segment inserted at HXB2 positions 482–630 (149 bp) within a subtype C 5′ LTR backbone (Fig. 1a). This recombinant form comprises four sequences, LS13145, LS11614, LS14862, and LS14863, and its recombination signal was supported by seven methods (RDP, GENECONV, BootScan, MaxChi, Chimaera, SiScan, and 3Seq).

Fig. 1. Recombination characteristics of the 5′ LTR recombinants. The shared recombination pattern of the four 5′ LTR recombinants (LS13145, LS11614, LS14862, and LS14863) was confirmed by subregion trees.
To confirm recombination in these sequences (LS13145, LS11614, LS14862, and LS14863), we extracted the parental fragments and constructed subregion phylogenetic trees with the reference sequences. The major parent clustered with subtype C, and the minor parent clustered with CRF01_AE (Fig. 1b).
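Extracting parental fragments requires translating HXB2 coordinates into alignment columns, because gaps shift positions in a multiple alignment. The sketch below assumes a gapped alignment held as a name-to-sequence dictionary that includes HXB2 (K03455), with the HXB2 row beginning at HXB2 position 1; the function names are hypothetical. For the recombinants described here, the inserted segment would be HXB2 482–630 (630 − 482 + 1 = 149 bp), and the subtype C backbone would be everything outside it.

```python
from typing import Dict, Tuple


def hxb2_to_columns(hxb2_aligned: str, hxb2_start: int = 1) -> Dict[int, int]:
    """Map HXB2 nucleotide positions to 0-based alignment columns.

    Columns where the HXB2 row has a gap ('-') carry no HXB2 position.
    """
    mapping, pos = {}, hxb2_start
    for col, base in enumerate(hxb2_aligned):
        if base != "-":
            mapping[pos] = col
            pos += 1
    return mapping


def split_at_breakpoints(alignment: Dict[str, str], ref: str, start: int,
                         end: int) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split each aligned sequence into the segment spanning HXB2 start..end
    (inclusive) and the concatenated flanking backbone outside it.

    Gap-in-HXB2 columns lying just outside the boundaries go to the backbone.
    """
    cols = hxb2_to_columns(alignment[ref])
    lo, hi = cols[start], cols[end] + 1  # Python slice bounds over columns
    inserted = {name: seq[lo:hi] for name, seq in alignment.items()}
    backbone = {name: seq[:lo] + seq[hi:] for name, seq in alignment.items()}
    return inserted, backbone


# Hypothetical toy alignment; HXB2 has a gap in column 2.
toy = {"HXB2": "AC-GTA", "query": "ACTGTA"}
inserted, backbone = split_at_breakpoints(toy, "HXB2", start=3, end=4)
print(inserted, backbone)
```

Each of the two resulting sub-alignments can then be used to build a separate subregion tree with the reference sequences.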
Based on the recombination analysis, we constructed an ML phylogenetic tree to identify the 5′ LTR subtypes of HIV-1 strains circulating in China. The best-fitting nucleotide substitution model, as determined by the model selection function in MEGA X, was GTR+G+I. Although the 5′ LTR region (634 bp) is much shorter than the internal regions of the HIV-1 genome, the phylogenetic tree clearly showed distinct subtype clusters (Fig. 2). Among the 219 amplified sequences, 61 were CRF01_AE (27.85%), 3 were subtype G (1.37%), 146 were subtype C (66.67%), and 9 were subtype B (4.11%) (Fig. 2).
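As a quick consistency check, the subtype proportions can be re-derived from the reported counts. The helper name `subtype_shares` is hypothetical; the counts are those given above.

```python
from typing import Dict


def subtype_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of sequences per subtype, rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


counts = {"CRF01_AE": 61, "G": 3, "C": 146, "B": 9}  # from the Results
print(sum(counts.values()), subtype_shares(counts))
```

The counts sum to 219, and the rounded shares (27.85%, 1.37%, 66.67%, and 4.11%) match the reported values.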

Fig. 2. Phylogenetic analysis based on 5′ LTR sequences of HIV-1 isolates in China. The ML phylogenetic tree was built using the 219 amplified HIV-1 5′ LTR sequences together with the 83 HIV-1 5′ LTR reference sequences. Colored ranges represent different groups, and black dots represent the 83 reference sequences. ML, maximum likelihood.
We further used PROMO to predict TFBS differences within the 5′ LTRs of the four recombinants identified in China (LS13145, LS11614, LS14862, and LS14863). The results showed that recombination altered 226 TFBSs (Supplementary Table S3). The altered sites include binding sites for several important transcription factors, such as AP-1, YY1, and Pax-6.
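One way to operationalize a "changed" TFBS is to compare the predicted site sets of a recombinant and its parental backbone and keep the sites present in only one of them. The exact criterion behind the count of 226 is not described here, so this sketch is an assumption: the (factor, position) keying, the function name `compare_sites`, and the toy factors and positions are all hypothetical.

```python
from typing import Dict, Set, Tuple

# A predicted site keyed by (transcription factor, start position). Positions
# must be in one shared coordinate system (e.g. HXB2) to be comparable.
Site = Tuple[str, int]


def compare_sites(backbone: Set[Site], recombinant: Set[Site]) -> Dict[str, Set[Site]]:
    """Classify sites that differ between a parental backbone and a recombinant."""
    return {
        "lost": backbone - recombinant,    # predicted only in the backbone
        "gained": recombinant - backbone,  # predicted only in the recombinant
    }


# Hypothetical factors and positions, for illustration only.
backbone_sites = {("TF1", 500), ("TF2", 520), ("TF3", 380)}
recombinant_sites = {("TF3", 380), ("TF4", 540)}
diff = compare_sites(backbone_sites, recombinant_sites)
n_changed = len(diff["lost"]) + len(diff["gained"])
print(diff, n_changed)
```

Reporting lost and gained sites separately, rather than only their total, shows whether recombination mainly removes or introduces potential regulatory sites.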
Discussion
In this study, we obtained 219 5′ LTR sequences, providing more original data for subsequent research. We then performed recombination and phylogenetic analyses of these LTRs and identified a new recombinant form of the LTR. Finally, we found that recombination altered a large number of TFBSs in the recombinants. This study has a limitation: at present, phylogenetic analysis of the LTR alone cannot readily distinguish LTRs derived from subtype C, CRF07_BC, or CRF08_BC genomes. The best current solution is to combine LTR data with other genomic regions (gag, pol, and env) to determine LTR subtypes. Overall, our study improves the understanding of the HIV-1 LTR, draws attention to this understudied region, and may contribute to AIDS prevention efforts.
Sequence Data
The 5′ LTR sequences LS10213–LS9041 were deposited in GenBank under accession numbers OP938322–OP938540.
Footnotes
Authors' Contributions
L.L. and L.J. designed the research. X.G., M.L., and L.J. performed the analysis. H.L., B.Z., Y.W., C.W., Y.L., J.H., X.W., T.L., and J.L. collected the sequences. X.G., M.L., L.J., and L.L. contributed to writing the article.
Ethics Statement
The study involving human participants was reviewed and approved by the ethics committee of the Beijing Institute of Microbial Epidemiology. All participants signed written informed consent before sample donation.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This study was supported by grants from the National Natural Science Foundation of China (31900157) and the State Key Laboratory of Pathogen and Biosecurity (AMMS).
Supplementary Material
Supplementary Dataset S1
Supplementary Dataset S2
Supplementary Table S1
Supplementary Table S2
Supplementary Table S3
References