The Epidemic Dynamics of Four Major Lineages of HIV-1 CRF01_AE Strains After Their Introduction into China

Abstract

The HIV-1 CRF01_AE epidemic in China has been driven by multiple lineages of HIV-1 viruses introduced in the 1990s and is still expanding, so it is important to investigate the epidemic status of these lineages. In this study, we downloaded all available CRF01_AE sequences from China (n = 2,931) and their associated epidemiological information from the Los Alamos HIV database. We identified 11 distinct clusters of CRF01_AE strains in China, of which 4 major clusters, together accounting for 80.0% (1,793/2,241) of Chinese CRF01_AE strains, had caused a real epidemic. Clusters 1 and 2 circulated among heterosexuals and injecting drug users in southern and southwestern China, while Clusters 3 and 4 were predominant among homosexuals in eastern and central China and in northeastern China, respectively. CRF01_AE strains detected in heterosexuals showed the most complex lineage composition, underscoring the important role of heterosexual transmission in the occurrence of multiple CRF01_AE lineages. Furthermore, epidemic history reconstruction using the birth-death susceptible-infected-removed package revealed that the four clusters had gone through varying epidemic stages: Clusters 2 and 3 were near the peak of their local epidemics, while Clusters 1 and 4 were still at a very early stage. The future epidemic status of the CRF01_AE clusters will be determined mainly by the effectiveness of prevention and control. Our study provides new insights into the epidemic dynamics of CRF01_AE in China.

Introduction

CRF01_AE, initially named subtype E, is the first defined circulating recombinant form (CRF) of HIV-1.^1,2 It is also one of the most widely spread HIV-1 CRFs in the world and is predominantly epidemic in Asia.³ In China, CRF01_AE was first detected among commercial sex workers who returned to Yunnan province from Thailand in 1994 and was then found in injecting drug users (IDUs) from Guangxi province in 1996.^4,5 In our previous work, we found three clusters of CRF01_AE strains circulating among sexual contacts and IDUs in Guangxi province based on gag, pol, and env fragments from 349 HIV-1-infected patients.⁶ Recently, An et al. reported two distinct clusters of CRF01_AE strains dominant among men who have sex with men (MSM) in Liaoning province based on 38 nearly full-length genomes (NFLGs) of HIV-1.⁷ Ye et al. further documented four major clusters in mainland China from 408 gag gene fragments,⁸ and Feng et al. found seven small clusters circulating in China based on 102 NFLGs.⁹ Abubakar et al. speculated that there were at least five independent introductions and five independent autochthonous clades of CRF01_AE in China based on 1,957 global CRF01_AE gag p17 sequences in the public data set.¹⁰ All of these studies support the view that the epidemic of CRF01_AE in China was driven by multiple lineages of HIV-1 strains introduced in the 1990s, although they disagree on the exact number of lineages because of differences in the number of sequences used in the phylogenetic analysis. To give a comprehensive map of the CRF01_AE lineages in China, we used all the CRF01_AE sequences from China available in the public database for our analysis.

The network of contacts that provides the opportunity for transmission is important for viral spread, and the transmission mode can affect viral evolution, for example, through the bottleneck effect in patients who contract the virus by sexual transmission.^11,12 Because HIV is an RNA virus characterized by fast evolution, its evolutionary and ecological processes occur on the same timescale, so the genomic data of epidemic viruses preserve information on the dynamics of the infected host population and can be used to reconstruct the population history.¹³ A typical coalescent model has been reported to be unable to distinguish prevalence from incidence,¹⁴ which makes it difficult to obtain more detailed information from the inference result. The birth-death model is an alternative to the coalescent that is widely used for fast-evolving viruses such as HIV and HCV.¹⁵ Stadler et al.¹⁶ compared inferences for an HIV-1 data set from the United Kingdom obtained with the coalescent skyline plot and the birth-death model. They found that periods with a basic reproductive number R0 > 1 coincided with an increase in the effective population size Ne inferred by the coalescent skyline, but periods with R0 < 1 were not reflected in the coalescent skyline because Ne showed no obvious decrease, indicating that the birth-death model may provide more information for an overall evaluation of public prevention efforts. We therefore chose the birth-death model to provide a different lens on the epidemic status of each lineage.

To clarify the epidemic status and reconstruct the population dynamics history of HIV-1 CRF01_AE lineages in China, we downloaded all of the CRF01_AE sequences from China and their associated epidemiological information available in the public database. After identifying the main CRF01_AE clusters in China, we tracked the population dynamics history of the major clusters with the birth-death susceptible-infected-removed (BD-SIR) model in BEAST v2.10.

Materials and Methods

Sequence data set and quality control

To make use of as much of the available genomic data as possible, we downloaded all CRF01_AE sequences (one sequence per patient) from China and representative foreign strains (from 10 countries: Central African Republic, Cameroon, Vietnam, Japan, Finland, France, India, the United States, Afghanistan, and the United Kingdom) from the Los Alamos HIV database (www.hiv.lanl.gov/content/index). Only sequences with a recorded sampling date were collected; the sampling dates in our data set range from 1992 to 2013. All of the sequences were aligned with the HIV-1 subtype reference sequences using BioEdit V7.2 and subsequently checked manually to ensure that the codons were correctly aligned. Sequences with apparent problems (e.g., unusually high similarity suggestive of contamination) were deleted from our data sets. All of the retrieved sequences were imported into the RDP 3.0 software to check for recombinants.¹⁷ Fragments representing known recombinants that were not pure CRF01_AE were also excluded, which left a final data set of 2,931 sequences.
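The similarity screen mentioned above can be illustrated with a minimal sketch. It assumes aligned sequences of equal length and flags pairs whose identity reaches a cutoff; the function names, the toy sequences, and the 0.995 threshold are hypothetical choices for illustration and are not taken from the study.

```python
from itertools import combinations


def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical bases over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        matches += a == b
    return matches / compared if compared else 0.0


def flag_near_identical(alignment: dict[str, str], threshold: float = 0.995):
    """Return (id_a, id_b, identity) for every pair at or above the threshold.

    The threshold is a hypothetical value chosen for this example only.
    """
    flagged = []
    for (id_a, seq_a), (id_b, seq_b) in combinations(alignment.items(), 2):
        identity = pairwise_identity(seq_a, seq_b)
        if identity >= threshold:
            flagged.append((id_a, id_b, identity))
    return flagged


# Toy aligned fragments (invented for illustration, not real HIV-1 data).
toy_alignment = {
    "patient_A": "ATGGGTGCGAGAGCGTCAGT",
    "patient_B": "ATGGGTGCGAGAGCGTCAGT",
    "patient_C": "ATGGGCGCGAGGGCGTCAAT",
}
suspect_pairs = flag_near_identical(toy_alignment)
```

Flagged pairs would still need manual review, since closely linked transmission pairs can also yield near-identical sequences.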

The phylogenetic analysis and identification of CRF01_AE lineages in China

We first used all the full-length env, gag, and pol gene sequences to construct phylogenetic trees and determine the number of major lineages of CRF01_AE. We then selected three sequences from each major cluster as reference sequences to represent that lineage (strains with NFLG sequences were preferred). For shorter sequences (e.g., the V3 loop of env, p17 of gag, and the protease region of pol), we trimmed the reference sequences to the length of these fragments before constructing phylogenetic trees to determine their lineages. We adopted this strategy to ensure the accuracy of the phylogenetic analysis and of the identification of CRF01_AE lineages in China. The general time reversible model with gamma-distributed among-site rate heterogeneity was selected by Modeltest using hierarchical likelihood ratio tests and the standard Akaike information criterion.¹⁸ Phylogenetic trees were constructed using the neighbor-joining and maximum likelihood (ML) methods in the MEGA 6.0 program.¹⁹ We also constructed ML trees with the RAxML software.²⁰ The reliability of the tree topology was evaluated by bootstrap analysis with 1,000 replicates. For patients with more than one fragment in the data sets, we combined the phylogenetic results from the different fragments to determine their lineage and counted each patient once in the final statistical table. The 2,931 sequences in our data set belonged to 2,241 patients. Patients whose HIV-1 sequence fragments gave discordant phylogenetic results or ambiguous lineage information were not assigned to a specific lineage.
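The per-patient rule described above (combine the calls from all fragments, count each patient once, and leave discordant or ambiguous patients unassigned) can be sketched as follows. This is only an illustration under stated assumptions: the patient identifiers, the cluster calls, and the helper name `consensus_lineage` are hypothetical, and `None` stands for an ambiguous call.

```python
from collections import Counter
from typing import Optional


def consensus_lineage(calls: list[Optional[str]]) -> Optional[str]:
    """Return the shared cluster label, or None if any call is missing,
    ambiguous (None), or discordant with the others."""
    if not calls or any(call is None for call in calls):
        return None
    return calls[0] if len(set(calls)) == 1 else None


# Hypothetical per-fragment lineage calls (e.g., from gag, pol, and env trees).
fragment_calls = {
    "P1": ["Cluster 1", "Cluster 1"],  # concordant fragments
    "P2": ["Cluster 2"],               # single fragment
    "P3": ["Cluster 3", "Cluster 4"],  # discordant, left unassigned
    "P4": [None],                      # ambiguous, left unassigned
}

patient_lineage = {
    patient: consensus_lineage(calls) for patient, calls in fragment_calls.items()
}

# Each patient contributes at most one count to the final table.
lineage_counts = Counter(
    label for label in patient_lineage.values() if label is not None
)
```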

The population dynamics history analysis of the four main clusters

The BD-SIR model in BEAST v2.10 incorporates a classic epidemiological model, the SIR model, into the birth-death process,^21,22 which is itself a classical population dynamics model that accounts for changing population composition.²³ In the SIR model, the whole population is divided into three compartments: S (susceptible) denotes hosts who can potentially be infected by the pathogen, I (infected) denotes hosts who have already been infected, and R (removed) denotes hosts removed from the population because of death, treatment, or a change of behavior. Because the phylogenetic results for gag, pol, and env were generally consistent, we used the cluster information based on env to reconstruct the population history of each lineage for convenience. We chose BD-SIR (serial) as the tree prior in BEAST v2.10, then assigned the prior distributions of the parameters and set the length of the Markov chain Monte Carlo (MCMC) run so that the effective sample size (ESS) was sufficiently large (>200) after running the generated XML file. Finally, we used RStudio to draw trajectories of the numbers of susceptible, infected, and removed individuals, from which we obtained information on the viral outbreak in each lineage.
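To make the three compartments concrete, the sketch below integrates a plain deterministic SIR model with a simple Euler scheme. It is not the BD-SIR model itself, which is stochastic and is inferred jointly with the phylogeny in BEAST; the function name and all parameter values (population size, R0, removal rate) are hypothetical and chosen only for illustration.

```python
def simulate_sir(population, initial_infected, r0, removal_rate, years, dt=0.01):
    """Euler integration of a deterministic SIR model.

    Returns a list of (time, S, I, R) tuples. The transmission rate is
    derived as r0 * removal_rate, which is an assumption of this sketch.
    """
    transmission_rate = r0 * removal_rate
    s = population - initial_infected
    i = float(initial_infected)
    r = 0.0
    trajectory = [(0.0, s, i, r)]
    steps = int(round(years / dt))
    for step in range(1, steps + 1):
        new_infections = transmission_rate * s * i / population * dt
        new_removals = removal_rate * i * dt
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        trajectory.append((step * dt, s, i, r))
    return trajectory


# Hypothetical parameters, not estimates from this study.
trajectory = simulate_sir(
    population=1000.0, initial_infected=1.0, r0=3.0, removal_rate=0.3, years=40.0
)

# The infected compartment peaks when S has dropped to about population / r0.
peak_time, s_at_peak, peak_infected, _ = max(trajectory, key=lambda row: row[2])
```

In this toy run the number of infected individuals stops growing once the susceptible pool falls to roughly one third of the population, which is the sense in which a lineage can be described as before, near, or past its epidemic peak.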

Results

Identification of the number of CRF01_AE lineages in China

The ML trees were constructed with MEGA 6 and RAxML based on full-length gag, pol, and env gene sequences with sufficient genetic information, and the results were similar. The ML trees constructed with MEGA 6 and RAxML are shown in Figure 1 and Supplementary Fig. S1 (Supplementary Data are available online at www.liebertpub.com/aid), respectively. Combining these phylogenetic results, we found four large distinct clusters (more than 30 sequences), some small clusters (varying from 2 to 15 sequences), and many sporadic strains of CRF01_AE in China. If, as in a previous study,⁹ any distinct cluster of at least three sequences with a high bootstrap value is counted as one lineage of CRF01_AE in China, there were 11 lineages in China (including all seven clusters identified in that study). According to our data, only four major clusters of CRF01_AE (Cluster 1 to Cluster 4) caused a real epidemic; the other clusters were still restricted to small numbers. The strains belonging to Cluster 1 had a close phylogenetic relationship with Vietnamese strains, and the strains in the other major clusters neighbored Thai strains in the phylogenetic trees. These results were in accordance with our own and other previous studies.^8–10,24 In this study, we mainly focused on the transmission dynamics of the four major CRF01_AE lineages in China, while the other lineages and sporadic strains were grouped together as "others." The designations and reference strains of the four major CRF01_AE lineages were the same as in our previous study.²⁴

FIG. 1.

HIV-1 CRF01_AE strains from China grouped into four major distinct clusters. Maximum likelihood trees constructed with MEGA 6 from all env (A), gag (B), and pol (C) sequences of worldwide representative samples of HIV-1 CRF01_AE are shown. Bootstrap values >70 are shown at the corresponding nodes. The branches belonging to CRF01_AE sequences isolated from China are in gray. The 11 distinct clusters of CRF01_AE strains identified in China are indicated. Except for Cluster 1, which included both Chinese and Vietnamese strains in a single cluster, the clusters contained only Chinese strains. Cluster 2 is the distinct cluster that we designated CRF01_AE-v in a previous study.

The geographic and risk population distribution of four major CRF01_AE lineages in China

Our data set contained HIV-1 sequences from 2,241 patients in China (counting one sequence per patient). The four major clusters of CRF01_AE strains made up 80.0% (1,793/2,241) of these, and Cluster 2, which we designated CRF01_AE-v, had the highest proportion (28.1%, 629/2,241).⁶ We extracted the transmission routes of the strains in China from the database and the original articles; detailed information on geographic sources and risk groups is shown in Supplementary Table S1. Although all four clusters were detected in more than 10 provinces in China, each cluster retained distinct characteristics. The geographic distribution of the four major CRF01_AE lineages in China is illustrated in Figure 2. Cluster 1 (n = 407), which fell within the phylogenetic group composed of CRF01_AE strains from Vietnam, was circulating in Guangxi province, southern China, and Shanghai city, eastern China. Cluster 2 (n = 629), which was first found in Guangxi province, was predominant in southern and southwestern China. Cluster 3 (n = 541) was dominant in Anhui province, central China, and in major eastern cities. Cluster 4 (n = 216) was epidemic in Jilin and Liaoning provinces, northeastern China. According to the recorded risk group information, Clusters 1 and 2 were primarily transmitted among heterosexuals and IDUs, while Clusters 3 and 4 were prevalent in homosexuals. Only 38.6% and 28.3% of the sequences in Clusters 1 and 2, respectively, had risk group information available, so these results should be interpreted with caution. In contrast, 62.3% and 53.7% of the sequences in Clusters 3 and 4, respectively, had risk group information available, making their risk group distributions more reliable. Most of these homosexuals were MSM. Together, these results show that the four major lineages of CRF01_AE strains had developed specific geographic and risk group distributions in the roughly 20 years since their introduction into China.
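The proportions quoted above follow directly from the cluster sizes reported in this section. The short check below recomputes them; only the variable names are introduced here, and the counts are those given in the text.

```python
# Cluster sizes and patient total as reported in this section.
cluster_counts = {
    "Cluster 1": 407,
    "Cluster 2": 629,
    "Cluster 3": 541,
    "Cluster 4": 216,
}
total_patients = 2241

major_total = sum(cluster_counts.values())
major_share = round(100 * major_total / total_patients, 1)

# Percentage of all patients falling in each major cluster.
cluster_shares = {
    name: round(100 * count / total_patients, 1)
    for name, count in cluster_counts.items()
}

# Patients in small clusters or carrying sporadic strains.
other_patients = total_patients - major_total

print(f"Major clusters: {major_total}/{total_patients} = {major_share}%")
for name, share in cluster_shares.items():
    print(f"{name}: {share}%")
```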

FIG. 2.

Geographic distributions of the four major distinct phylogenetic clusters of CRF01_AE in China. The distribution of the four major CRF01_AE clusters in most of the provinces and cities across China is illustrated based on the data in Supplementary Table S1.

Epidemic status of the four major CRF01_AE lineages in China

We applied the BD-SIR method to explore the epidemiological dynamics of the four HIV-1 CRF01_AE clusters in China. We used a previously reported evolutionary rate (6.59 × 10⁻³) as the prior;⁹ the evolutionary rate estimated in our study was 8.1 × 10⁻³, which is similar to the previously reported value. Bayesian estimates of the epidemiological parameters and the times to the most recent common ancestors of the clusters are summarized in Supplementary Table S2. The estimated trajectories of the susceptible (S), infected (I), and removed (R) individuals in the four local epidemics are shown in Figure 3.

FIG. 3.

SIR trajectories, incidence, and prevalence of four HIV-1 CRF01_AE lineages in China. The overall SIR dynamics (a) show the epidemic stage that each cluster went through; S, I, and R represent susceptible, infected, and removed individuals, respectively. The incidence and prevalence over time are shown on the right (b). The vertical axis in each diagram denotes the number of individuals.

The estimated mean basic reproduction ratio R0 of the four clusters (Supplementary Table S2) ranged from 3.06 (95% HPD: 3.03–3.08) in Cluster 3 to 5.05 (95% HPD: 5.01–5.09) in Cluster 2. The median rate of becoming noninfectious ranged from 0.20 to 0.44 per year, indicating that the average infectious period lasts about 2–5 years. Cluster 2 turned out to be the most informative cluster; it was sampled from a large population with the most susceptible individuals.
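The step from the rate of becoming noninfectious to the infectious period can be made explicit. The sketch below assumes a constant rate, so that the mean infectious period is its reciprocal; the function name is hypothetical, and the two rates are the range limits quoted above.

```python
def mean_infectious_period(become_noninfectious_rate: float) -> float:
    """Mean infectious period in years, assuming a constant rate per year."""
    if become_noninfectious_rate <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / become_noninfectious_rate


# Range of median rates reported in the text (per year).
reported_rate_range = (0.20, 0.44)

# A higher rate means a shorter infectious period, so sort the results.
infectious_periods = sorted(
    mean_infectious_period(rate) for rate in reported_rate_range
)
shortest_period, longest_period = infectious_periods
```

The reciprocal of 0.44 is about 2.3 years and the reciprocal of 0.20 is 5 years, which matches the 2–5 year range stated in the text.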

Figure 3 shows the posterior medians of the epidemic time series and suggests that the four local epidemics, corresponding to the four genetic clusters, had gone through varying epidemic stages by the most recent sampling dates in our data set. The results suggest that Clusters 2 and 3 were just before the peak of their local epidemics, and an explosive outbreak could be expected as they approach the peak in the near future. These dynamics can also be seen in the plots of the average effective reproduction ratio R over time (Fig. 4): R for Clusters 2 and 3 fell close to one in 2008 and below one after 2010, respectively. In contrast, Clusters 1 and 4 were at an early stage of their epidemics, with a high R0 and a large susceptible but small infected population. The effective reproduction ratios of Clusters 1 and 4 remained above one throughout.
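The role of the threshold R = 1 can be illustrated with a small sketch. It assumes the textbook SIR relation in which the effective reproduction ratio equals R0 multiplied by the susceptible fraction, which may differ in detail from the BD-SIR parameterization, and it uses an invented susceptible series rather than any estimate from this study. The function names are hypothetical.

```python
from typing import Optional, Sequence


def effective_ratio(r0: float, susceptible: float, population: float) -> float:
    """Effective reproduction ratio under a simple SIR assumption."""
    return r0 * susceptible / population


def first_time_below_one(
    times: Sequence[float], r_eff: Sequence[float]
) -> Optional[float]:
    """Return the first time point at which the ratio drops below one,
    or None if it stays at or above one throughout."""
    if len(times) != len(r_eff):
        raise ValueError("times and r_eff must have the same length")
    for time_point, ratio in zip(times, r_eff):
        if ratio < 1.0:
            return time_point
    return None


# Invented illustration: years since first sample and susceptible counts.
times = [0, 1, 2, 3, 4, 5]
susceptible_series = [999, 900, 600, 340, 320, 200]
r_series = [effective_ratio(3.0, s, 1000.0) for s in susceptible_series]

crossing_time = first_time_below_one(times, r_series)
still_growing = first_time_below_one(times[:4], r_series[:4])
```

A lineage whose ratio never crosses below one in the sampled interval, as in the truncated series above, would be read as still growing.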

FIG. 4.

Reconstructed effective reproduction ratio over time. The black curve represents the median effective reproduction ratio for each cluster. The gray curves show the 95% HPD interval, while the gray straight line marks the value at which the effective reproduction ratio equals 1.

Discussion

In this study, we found 11 distinct clusters of CRF01_AE strains in China, of which four major clusters, together accounting for 80.0% (1,793/2,241) of strains, had caused a real epidemic. Clusters 1 and 2 were epidemic in southern and southwestern China and prevalent among heterosexuals and IDUs, while Clusters 3 and 4 were dominant among homosexuals in eastern and central China and in northeastern China, respectively.

The pre-existing complexity of the epidemic (multiple sources of introduction from diverse localities) is the main reason for the continuous, extensive geographical dispersal across the viral phylogeny.²⁵ Multiple lineages of CRF01_AE strains were introduced into China during the early to mid-1990s.^7–9,24 Some of these strains expanded locally and then spread into other regions (founder effect), whereas others failed to initiate a local expansion, which may explain why four major distinct clusters, some small clusters, and many sporadic strains of CRF01_AE were found in China. The effect of differential sampling cannot be excluded. For example, there were only two CRF01_AE strains from northwestern China. Some areas, such as Guangxi province, Beijing city, Guangdong province, Yunnan province, and Liaoning province, were subject to more extensive epidemiologic investigation than other regions; however, these areas are also among the locations most affected by the HIV epidemic in China. Because of this sampling bias, our data set cannot directly represent the true distribution, but it is still valuable for understanding the overall epidemic trend in China. By applying the criteria of previous studies to all reported CRF01_AE sequences, we found the largest number of lineages reported to date. Although the exact number of lineages in our study differs from previous reports because of differences in the number of sequences and the criteria used in the phylogenetic analysis, all the studies support the view that the epidemic of CRF01_AE in China was driven by multiple lineages of HIV-1 strains. Indeed, apart from the four major lineages, which accounted for 80% of all strains, the other lineages had not really caused an epidemic. The epidemic caused by the four major lineages was also observed in previous studies with fewer sequences. Hence, we consider the phylogenetic signal in our data set to be reliable. It is likely that continued evolution and diversification within and across the four major CRF01_AE lineages will be observed as the number of HIV samples increases.

The network of contacts that provides the opportunity for transmission is important for HIV spread, and the transmission mode can affect HIV evolution.^11,12 In our compiled data set with recorded transmission mode information, the HIV-1 CRF01_AE strains detected in heterosexuals covered almost all the lineages found in China (including the four major clusters), whereas those in homosexuals almost all belonged to Clusters 3 and 4 and those in IDUs mostly belonged to Clusters 1 and 2. This implies an important role of heterosexual transmission in the occurrence of multiple CRF01_AE lineages in China. Owing to the high prevalence of CRF01_AE and subtype B in MSM, the new recombinant viruses CRF55_01B and CRF59_01B were recently reported to circulate among MSM in China.^26,27 Exploring the evolutionary characteristics of the CRF01_AE strains is therefore also meaningful for understanding the source of the parental strains of these recombinant viruses.

From the epidemic history reconstruction for each of the four major lineages, we infer that they had gone through different epidemic stages by the most recent sampling date. The SIR trajectories showed that Clusters 2 and 3 had gone through a larger part of their epidemic processes than Clusters 1 and 4. Clusters 1 and 4 were far from the peak of their local epidemics, as there was no obvious flow from susceptible to infected individuals during the sampled interval. Cluster 1 was intermingled with Vietnamese strains, as shown previously⁶ and in this article; it may still have been in a run-in period and had not caused a large outbreak. Cluster 4 was a lineage that emerged much later in China and was still in its initial epidemic period in northeastern China. In contrast, Clusters 2 and 3, two indigenous CRF01_AE lineages in China, showed a clear sign of outbreak. Our results indicate that Cluster 2, which is dominated by the CRF01_AE-v variant that we designated previously, was sampled from a large population with the most susceptible individuals. This reveals the seriousness of the epidemic in the corresponding areas. The effective reproduction ratios of Clusters 1 and 4 remained above one throughout the sampled interval, indicating a potential outbreak, while those of Clusters 2 and 3 approached one in 2008 and fell below one in 2010, respectively. A previous study¹⁴ of five clusters of HIV-1 subtype B in the United Kingdom found that one cluster was at the end of its epidemic, whereas in our analysis the four CRF01_AE clusters were before or near the peak of their local epidemics.

Our study is the most extensive report to date clarifying the epidemic status of multiple lineages of CRF01_AE strains in China, using a large sequence data set (n = 2,241) covering 22 provinces and cities of China. We found 11 CRF01_AE clusters in China, of which four major distinct clusters had resulted in an epidemic, each with its own geographic and risk population characteristics. We are the first to document the current epidemic stage and to project the future trend of the four major clusters of CRF01_AE strains in China, and our findings call for effective prevention and control. These results should help epidemiologists understand the epidemic status of CRF01_AE in China. Continuous monitoring of HIV-1 CRF01_AE variants is important for understanding the dissemination dynamics of HIV and helps to inform effective intervention programs.

Footnotes

Acknowledgment

This work was supported by the Key National Science and Technology Program in the 12th Five-Year Period [grant number 2012ZX10001-002].

Sequence Data

The GenBank accession numbers of the sequences used in this study are listed in Supplementary Table S3.

Author Disclosure Statement

No competing financial interests exist.

References

1. Carr, Salminen, Koch, et al.: Full-length sequence and mosaic structure of a human immunodeficiency virus type 1 isolate from Thailand. J Virol, 1996; 70:5935–5943.

2. …, Takebe, Luo, et al.: Wide distribution of two subtypes of HIV-1 in Thailand. AIDS Res Hum Retroviruses, 1992; 8:1471–1472.

3. Lau, Wang, Saksena: Emerging trends of HIV epidemiology in Asia. AIDS Rev, 2007; 9:218–229.

4. Chen, Young, Subbarao, et al.: HIV type 1 subtypes in Guangxi Province, China, 1996. AIDS Res Hum Retroviruses, 1999; 15:81–84.

5. Cheng, Zhang, Capizzi, Young, Mastro: HIV-1 subtype E in Yunnan, China. Lancet, 1994; 344:953–954.

6. Zeng, Sun, Liang, et al.: Emergence of a new HIV type 1 CRF01_AE variant in Guangxi, Southern China. AIDS Res Hum Retroviruses, 2012; 28:1352–1356.

7. An, Han, Xu, et al.: Reconstituting the epidemic history of HIV strain CRF01_AE among men who have sex with men (MSM) in Liaoning, northeastern China: Implications for the expanding epidemic among MSM in China. J Virol, 2012; 86:12402–12406.

8. Ye, Xin, Yu, et al.: Phylogenetic and temporal dynamics of human immunodeficiency virus type 1 CRF01_AE in China. PLoS One, 2013; 8:e54238.

9. Feng, He, Hsi, et al.: The rapidly expanding CRF01_AE epidemic in China is driven by multiple lineages of HIV-1 viruses introduced in the 1990s. AIDS, 2013; 27:1793–1802.

10. Abubakar, Meng, Zhang, Xu: Multiple independent introductions of HIV-1 CRF01_AE identified in China: What are the implications for prevention? PLoS One, 2013; 8:e80487.

11. Derdeyn, Hunter: Viral characteristics of transmitted HIV. Curr Opin HIV AIDS, 2008; 3:16–21.

12. Leigh Brown, Lycett, Weinert, Hughes, Fearnhill, Dunn: Transmission network parameters estimated from HIV sequences for a nationwide epidemic. J Infect Dis, 2011; 204:1463–1469.

13. Grenfell, Pybus, Gog, et al.: Unifying the epidemiological and evolutionary dynamics of pathogens. Science, 2004; 303:327–332.

14. Kuhnert, Stadler, Vaughan, Drummond: Simultaneous reconstruction of evolutionary history and epidemiological dynamics from viral sequences with the birth-death SIR model. J R Soc Interface, 2014; 11:20131106.

15. Bouckaert, Heled, Kuhnert, et al.: BEAST 2: A software platform for Bayesian evolutionary analysis. PLoS Comput Biol, 2014; 10:e1003537.

16. Stadler, Kuhnert, Bonhoeffer, Drummond: Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). Proc Natl Acad Sci U S A, 2013; 110:228–233.

17. Martin, Lemey, Lott, Moulton, Posada, Lefeuvre: RDP3: A flexible and fast computer program for analyzing recombination. Bioinformatics, 2010; 26:2462–2463.

18. Posada, Crandall: MODELTEST: Testing the model of DNA substitution. Bioinformatics, 1998; 14:817–818.

19. Tamura, Stecher, Peterson, Filipski, Kumar: MEGA6: Molecular evolutionary genetics analysis version 6.0. Mol Biol Evol, 2013; 30:2725–2729.

20. Stamatakis: RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 2014; 30:1312–1313.

21. Marques: Kendall's generalized birth and death process and the need for construction of stochastic diffusion processes. Arq Fac Hig Saude Publica Univ Sao Paulo, 1965; 19:135–141.

22. Rannala, Yang: Probability distribution of molecular evolutionary trees: A new method of phylogenetic inference. J Mol Evol, 1996; 43:304–311.

23. Kermack, McKendrick: A contribution to the mathematical theory of epidemics. Proc Royal Soc, 1927; 115:22.

24. Zeng, Sun, Li, et al.: Reconstituting the epidemic history of mono lineage of HIV-1 CRF01_AE in Guizhou province, Southern China. Infect Genet Evol, 2014; 26:139–145.

25. Paraskevis, Pybus, Magiorkinis, et al.: Tracing the HIV-1 subtype B mobility in Europe: A phylogeographic approach. Retrovirology, 2009; 6:49.

26. Zhao, Cai, Zheng, et al.: Origin and outbreak of HIV-1 CRF55_01B among MSM in Shenzhen, China. J Acquir Immune Defic Syndr, 2014; 66:e65–e67.

27. Zhang, Han, An, et al.: Identification and characterization of a novel HIV-1 circulating recombinant form (CRF59_01B) identified among men-who-have-sex-with-men in China. PLoS One, 2014; 9:e99693.
