Abstract
Phylogeography can improve the understanding of local and worldwide HIV epidemics, including the migration of subepidemics across national borders. We analyzed HIV-1 sequences sampled from Mexico and San Diego, California to determine the relatedness of these epidemics. We sampled the HIV epidemics in (1) Mexico by downloading all publicly available HIV-1 pol sequences from antiretroviral-naive individuals in GenBank (n = 100) and generating similar sequences from cohorts of injection drug users and female sex workers in Tijuana, Mexico (n = 27) and (2) in San Diego, California by pol sequencing well-characterized primary (n = 395) and chronic (n = 267) HIV infection cohorts. Estimates of population structure (F ST), genetic distance cluster analysis, and a cladistic measure of migration events (Slatkin–Maddison test) were used to assess the relatedness of the epidemics. Both a test of population differentiation (F ST = 0.06; p < 0.01) and a cladistic estimate of migration events (84 migrations, p < 0.01) indicated that the Tijuana and San Diego epidemics were not freely mixing. A conservative cluster analysis identified 72 clusters (two or more sequences), with two clusters containing both Mexican and San Diego sequences (permutation p < 0.01). Analysis of this very large dataset of HIV-1 sequences suggested that the HIV-1 epidemics in San Diego, California and Tijuana, Mexico are distinct. Larger epidemiological studies are needed to quantify the magnitude and associations of cross-border mixing.
Introduction
Recent investigations of the spread of the HIV-1 subtype B epidemic in Europe and the emergence of HIV-1 in east Africa have applied phylogeographic approaches to better understand HIV transmission dynamics. 2,3 For example, Gray et al. in 2009 4 evaluated HIV-1 sequences of p24 (gag protein) and gp41 (envelope glycoprotein) coding regions in clades A and D in relation to geographic data and found that although the three major urban centers of the Democratic Republic of Congo (Kinshasa, Lubumbashi, and Kisangani) are located in the same country, the HIV epidemic in each city had evolved separately and was more closely related to nearby cities in other countries that share transportation networks. These accessibility networks, or networks within which individuals can easily access other individuals or services, correlated better with phylogenetic HIV structure than structural risk factors such as migration, population growth, and armed conflict. 4,5 In contrast, Paraskevis et al. in 2009 3 analyzed 1337 pol sequences sampled from across Europe using a cladistic measure of gene flow and concluded that countries acted either as HIV-1 sources or sinks or had significant bidirectional migration patterns. Based on phylogeographic analysis, they proposed that intervention strategies targeting tourists, travelers, and migrants would be the key to decreasing the spread of HIV across Europe. 3
The border crossing between San Diego, California and Tijuana, Baja California, Mexico is the busiest in the world, with 5–6 million registered northward crossings monthly, 6 while easy access to street drugs and a thriving red light district in Tijuana (>9000 registered female sex workers) attract growing numbers of U.S. nationals southward. 7–10 The prevalence of HIV-1 in Tijuana was estimated at 0.54% in 2006, 11 which is higher than both the CDC-estimated prevalence in U.S. adolescents and adults in 2006 of 0.45% 12 and the national estimate of HIV prevalence in Mexico of 0.3% in 2006. 8 The higher HIV prevalence in Tijuana is characteristic of cities with a large migrant population that tends to engage in riskier behaviors. 13 To better understand the relatedness of the HIV epidemics in San Diego and Mexico, we studied the epidemics on both sides of the border using a phylogeographic approach.
Materials and Methods
Study population
We obtained samples from the epidemic in (1) Mexico by downloading all publicly available HIV-1 pol sequences in GenBank on September 27, 2009 (n = 100 between years 2003 and 2005) and generating similar sequences from cohorts of injection drug users (IDU) and female sex workers (FSW) in Tijuana (n = 27 between years 2004 and 2006) and (2) in San Diego by pol sequencing of well-characterized primary (n = 395 between years 1996 and 2006) and chronic HIV infection cohorts (n = 267 between years 2006 and 2007). Given the published descriptions of the sampled cohorts, all sampled sequences were believed to originate from antiretroviral-naive individuals except for 42 Mexican sequences downloaded from GenBank. HIV risk factors and location of sampling were unavailable for the majority of Mexican sequences downloaded from GenBank. Location of sampling was identified to be northeast Mexico for 32 sequences, and three sequences were noted to be from FSW and eight from IDU.
Viral sequencing
The pol region of HIV-1 was sequenced from all of the samples from the Tijuana cohorts by published methods using the Viroseq (Abbott, Chicago, IL) platform. 14 Sequences were deposited in GenBank with accession numbers GU304608–GU304632.
Resistance profiling
All sequences from subtype B antiretroviral treatment (ART)-naive individuals (n = 741) were uploaded into the Calibrated Population Tool on the Stanford University HIV Drug Resistance Database (accessed April 20, 2010) and resistance analysis of the sequences was performed. 15
Phylogenetic analysis
Subtyping of each sequence was performed using Subtype Classification Using Evolutionary Algorithms (SCUEAL). 16 All non-B sequences in the sample were excluded in the subsequent phylogeographic analyses. Sequences were aligned using HIV-specific alignment tools in HyPhy 17 (sequence length = 453 nucleotides). A maximum likelihood tree was estimated using a TN93 model 18 of sequence evolution using PhyML, 19 and the among-site rate variation approximated with a 4 class gamma distribution. We further analyzed the relatedness of the epidemics in San Diego and Mexico (including Mexican sequences from GenBank and our sequences from Tijuana) using (1) estimates of population structure (F ST statistics 20 ), (2) the identification of clusters based on the synonymous codon-based maximum likelihood MG94xREV model 21 with genetic distances of < 0.5%, and (3) cladistic measures of migration events. 22 A subanalysis was then performed comparing sequences from San Diego and Tijuana alone. All analyses were performed using the HyPhy Package. 23 The alignment and tree in NEXUS format can be downloaded from
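To make the nucleotide-based TN93 distance concrete, the following is a minimal sketch of the pairwise Tamura-Nei (1993) distance formula. It is not the HyPhy or PhyML implementation used in this study. The function name `tn93_distance`, the estimation of base frequencies from the two sequences being compared, and the skipping of sites with gaps or ambiguous nucleotides are assumptions made for illustration.

```python
import math


def tn93_distance(seq1, seq2):
    """Pairwise TN93 distance between two aligned nucleotide sequences.

    Illustrative assumptions: base frequencies are estimated from the two
    sequences themselves, and sites with gaps or ambiguity codes are skipped.
    Returns math.nan when the distance is undefined.
    """
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        return math.nan
    n = len(pairs)

    # Base frequencies pooled over both sequences.
    counts = {base: 0 for base in "ACGT"}
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    freq = {base: c / (2 * n) for base, c in counts.items()}

    # Proportions of sites showing each class of difference.
    p_ag = sum({a, b} == {"A", "G"} for a, b in pairs) / n      # purine transitions
    p_ct = sum({a, b} == {"C", "T"} for a, b in pairs) / n      # pyrimidine transitions
    q = sum((a in "AG") != (b in "AG") for a, b in pairs) / n   # transversions

    g_r = freq["A"] + freq["G"]  # purine frequency
    g_y = freq["C"] + freq["T"]  # pyrimidine frequency
    try:
        k1 = 2 * freq["A"] * freq["G"] / g_r
        k2 = 2 * freq["C"] * freq["T"] / g_y
        k3 = 2 * (g_r * g_y
                  - freq["A"] * freq["G"] * g_y / g_r
                  - freq["C"] * freq["T"] * g_r / g_y)
        w1 = 1 - p_ag / k1 - q / (2 * g_r)
        w2 = 1 - p_ct / k2 - q / (2 * g_y)
        w3 = 1 - q / (2 * g_r * g_y)
        return -k1 * math.log(w1) - k2 * math.log(w2) - k3 * math.log(w3)
    except (ZeroDivisionError, ValueError):
        # A base class is absent or the sequences are too divergent.
        return math.nan
```

Because the formula corrects for multiple substitutions at the same site, the result is slightly larger than the raw proportion of differing sites.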
Results
Among the sequences from ART-naive individuals, 92 (12.4%) had a resistance-associated mutation to a nucleoside reverse transcriptase inhibitor and 91 (12.3%) had one to a nonnucleoside reverse transcriptase inhibitor, similar to reported rates of transmitted drug resistance in the region. 24–26 Notably, no baseline resistance-associated mutations were observed among the sequences from Tijuana.
Most of the sequences analyzed were subtype B (773/789), including all Mexican sequences; 16 of the San Diego sequences were non-subtype B, including three subtype A, five subtype C, two unclassified strains, and six unique recombinant forms. Once the non-subtype B sequences were excluded, the population structure of the remaining sequences was assessed using measures of population differentiation. Mean pairwise synonymous codon-based genetic distance was 4.54% within the San Diego and Mexico regions and 4.55% between regions. A simple nucleotide-based distance measure (TN93 27 ) yielded larger values: 5.6% and 6.1%, respectively. When only the sequences from San Diego (662) and Tijuana (27) were analyzed, the within-region synonymous diversity was 4.55% (TN93 diversity 5.6%), while the between-region diversity was 4.93% (TN93 6.1%). Population differentiation as inferred from F ST statistics 28 was sensitive to the choice of genetic distance. The test using nucleotide TN93 distance indicated a significantly different population structure between all sequences from Mexico and San Diego (F ST = 0.07, p < 0.01) and between Tijuana and San Diego (F ST = 0.08, p < 0.01). However, when codon-based synonymous distances were used, the Mexico vs. San Diego comparison was not significant (F ST = 0.01, p = 0.62). For the San Diego vs. Tijuana comparison, the populations remained structured (F ST = 0.08, p = 0.02), suggesting that the epidemics are not freely mixing.
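The relationship between within-region and between-region diversity and F ST can be illustrated with a short sketch. It assumes one common distance-based estimator, F ST = 1 - (mean within-population distance) / (mean between-population distance), with within-population pairs pooled across regions and significance assessed by permuting region labels. The estimator and permutation scheme actually used in the HyPhy analysis are not specified here, and the names `fst_from_distances` and `fst_permutation_test` are hypothetical.

```python
import random
from itertools import combinations


def fst_from_distances(dist, labels):
    """F_ST as 1 - mean(within-population distance) / mean(between-population distance).

    `dist` is a symmetric matrix (list of lists) of pairwise genetic distances
    and `labels` gives the population of each sequence. Pooling all
    within-population pairs is an illustrative assumption.
    """
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    if not within or not between:
        raise ValueError("need both within- and between-population pairs")
    mean_between = sum(between) / len(between)
    if mean_between == 0:
        raise ValueError("mean between-population distance is zero")
    return 1.0 - (sum(within) / len(within)) / mean_between


def fst_permutation_test(dist, labels, n_perm=999, seed=0):
    """Return (observed F_ST, one-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = fst_from_distances(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between sequence and region
        if fst_from_distances(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

For instance, inserting the TN93 diversities reported above for San Diego and Tijuana (5.6% within, 6.1% between) into this ratio gives 1 - 5.6/6.1 ≈ 0.08. This agrees with the reported value, although the agreement does not by itself establish which estimator was used.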
A cladistic measure of gene flow 29 also indicated that the Mexico and San Diego epidemics were not freely mixing, but nonetheless suggested that a number of cross-border transmissions have occurred (84 inferred migrations, p < 0.01). When the Tijuana and San Diego epidemics were compared, they were also found to be not freely mixing with only 14 inferred migrations (p < 0.01).
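The logic of a Slatkin-Maddison style test can be sketched as follows: count the minimum number of location changes ("migrations") needed to explain the tip locations on a fixed tree using Fitch parsimony, then compare that count with the counts obtained after randomly reassigning locations to tips. This is a simplified illustration and not the HyPhy implementation. The nested-tuple tree representation, the restriction to strictly bifurcating trees, and the function names are assumptions.

```python
import random


def fitch_migrations(tree, location):
    """Return (possible ancestral locations, minimum number of migrations).

    `tree` is a strictly bifurcating tree written as nested 2-tuples whose
    leaves are tip names (strings); `location` maps each tip name to a region.
    The recursion depth grows with tree height, so very deep trees may need an
    iterative traversal instead.
    """
    if isinstance(tree, str):
        return {location[tree]}, 0
    left_set, left_count = fitch_migrations(tree[0], location)
    right_set, right_count = fitch_migrations(tree[1], location)
    shared = left_set & right_set
    if shared:
        return shared, left_count + right_count
    # No shared location: one migration is needed at this node.
    return left_set | right_set, left_count + right_count + 1


def slatkin_maddison(tree, location, n_perm=999, seed=0):
    """Return (observed migrations, permutation p-value).

    The p-value is the share of label permutations that need as few or fewer
    migrations than observed; small values indicate compartmentalization.
    """
    rng = random.Random(seed)
    _, observed = fitch_migrations(tree, location)
    tips = sorted(location)
    states = [location[tip] for tip in tips]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(states)
        _, count = fitch_migrations(tree, dict(zip(tips, states)))
        if count <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Under this scheme, fewer inferred migrations than expected by chance indicate that the regions are not freely mixing, while the migrations that remain are consistent with some cross-border transmission.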
Finally, a very conservative clustering analysis (clusters defined as sequences with a synonymous genetic distance < 0.5% 30 ) identified 72 clusters with ≥ 2 sequences (range 2 to 35 sequences), but only two contained sequences derived from both Mexico and San Diego; this degree of separation is statistically significant (p < 0.01 permutation test).
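A distance-threshold clustering of this kind can be sketched as below. The sketch assumes single-linkage clustering (any two sequences closer than the threshold are joined, and clusters are the resulting connected components) and a permutation test that shuffles region labels across all sequences. The exact linkage rule and permutation scheme of the original analysis are not stated in the text, so both are assumptions, as are the function names.

```python
import random


def find_clusters(dist, threshold=0.005):
    """Group sequences linked by pairwise distance < threshold (0.005 = 0.5%).

    Returns a list of clusters (lists of sequence indices) with >= 2 members.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [members for members in groups.values() if len(members) >= 2]


def count_mixed(clusters, labels):
    """Number of clusters containing sequences from more than one region."""
    return sum(1 for c in clusters if len({labels[i] for i in c}) > 1)


def mixed_cluster_test(clusters, labels, n_perm=999, seed=0):
    """Return (observed mixed clusters, permutation p-value).

    The p-value is the share of label permutations with as few or fewer
    mixed-region clusters than observed.
    """
    rng = random.Random(seed)
    observed = count_mixed(clusters, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_mixed(clusters, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A low p-value under this test means that clusters span both regions less often than would be expected if region labels were unrelated to cluster membership.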
Discussion
Similar to previous work, we found that most Mexican sequences were subtype B and that transmitted drug resistance was present, although, surprisingly, not among our sequences from Tijuana. 31 We then applied a phylogeographic approach to the analysis of 773 subtype B HIV-1 pol sequences derived from 127 Mexican and 646 San Diego subjects.
To analyze the relatedness of the epidemics, we utilized synonymous distances as they are preferred for clustering analysis because they mitigate the confounding effect of convergent evolution due to immune and drug resistance selection. Furthermore, our estimation of genetic distance was performed using maximum likelihood methods, which can better handle the numerous ambiguous nucleotides present in bulk HIV sequences. We found that the San Diego and Mexican epidemics were distinct using cladistic measures of gene flow (Slatkin-Maddison test) and clustering analysis, but when using F ST statistics, a more conservative distance-based approach, the Mexican and San Diego populations were not found to be phylogenetically distinct.
If the San Diego and Mexican epidemics were not distinct, the close proximity of San Diego and Tijuana would suggest that the subepidemics of these two cities would be freely mixing as well; however, when the population structure of the Tijuana and San Diego sequences was analyzed, these subepidemics were found to be phylogenetically distinct by all measures. Therefore these data suggest that even with the high levels of cross-border migration of people between San Diego and Tijuana, the HIV epidemics in each city have remained distinct, although interestingly, this was not demonstrated for the sampled HIV epidemics between San Diego and Mexico as a whole when using F ST statistics. Potential explanations for this are that either the networks of HIV-positive individuals in San Diego have closer ties to other parts of Mexico than Tijuana, or that the lack of mixing observed between the Tijuana and San Diego epidemics was due to differences in the HIV risk factors of the sampled individuals.
Early evidence suggested that the Mexican HIV epidemic had its roots in the U.S. epidemic, as 100% of the HIV cases registered in Mexico in 1983 were from people who had lived in the United States. 32,33 Today, although Mexico has the third highest absolute number of HIV cases in the Americas after the United States and Brazil, Mexico has only the twenty-third highest prevalence in the Americas. 8 However, one-third of these infections are from individuals living in border states. 34 Pertinent to San Diego, the neighboring Mexican state of Baja California has the second highest cumulative HIV incidence in Mexico, second only to the federal district (Mexico City), 8 with recent research estimating that 1 in 116 Tijuana residents between the ages of 15 and 49 are HIV infected. 35 Taken together, there is ample evidence that a separate more dynamic Mexican HIV subepidemic is occurring along its northern border with the United States. 36
The net flux of Mexicans migrating into the United States annually surpasses 400,000. 37 Migrants, who are separated from family and other supports, 38,39 tend to engage in higher HIV risk behaviors, 40,41 and those returning may carry the virus back home. In a study of male IDUs in Tijuana, those who had been deported from the United States to Mexico were four times more likely to be HIV positive than other male IDUs. 42 In a study of FSW in Tijuana and Ciudad Juarez, two-thirds had clients from the United States, and those who did were more likely to engage in other higher risk sexual behaviors, to have active syphilis, and to inject drugs. 43,44 More than one-third of male IDUs in Tijuana had homosexual encounters, but these individuals tended not to self-identify as men who have sex with men (MSM) and had larger numbers of female partners than non-MSM. 45 Finally, highlighting the bidirectionality of social networks across borders, nearly 75% of San Diego MSM and nearly 50% of MSM from Tijuana reported partners from across the border. 46 Although many of the factors presented above suggest that the U.S. and Mexican epidemics, and in particular the San Diego and Tijuana subepidemics, should be closely related, our data suggest otherwise. Our work shows that even with the tremendous amount of cross-border migration, the social networks of the majority of HIV-positive individuals in San Diego are not mixing with the social networks of HIV-positive individuals in Tijuana.
Limitations of this study include the possibility that sampling error affected the findings. For example, the bulk of the sequences from San Diego in this study came from MSM, whereas the bulk of the sequences from Tijuana came from cohorts of FSW and IDUs, 47,48 and the HIV risk factors for non-Tijuana Mexican sequences were unknown. It is possible, therefore, that the observed separation in HIV transmission networks arose because the networks sampled were distinct by HIV risk factor and not because of a national border. If more sequences from IDUs and non-MSM Hispanic men were obtained from San Diego County, more sequences with HIV risk factor data were obtained from throughout Mexico, or more sequences from self-identified MSM were obtained from Tijuana, we might find that the epidemics are more interconnected.
Further efforts to better elucidate the relationships of the San Diego and Tijuana HIV subepidemics will need to focus on improving sampling of self-identified MSM social networks south of the border and Hispanic heterosexual and IDU networks north of the border. Future efforts to control the transmission of HIV in the border area may benefit from understanding how these transmission networks are related, so that suitable interventions can be targeted appropriately.
Acknowledgments
We thank Jean Carr for use of her previously published pol sequences from Mexico. This project was supported by Grants 5K01DA020364-05 (KCB), R01MHMH65849 (SAS), AI69432 (ACTG), AI080353 (R21), MH62512 (HNRC), MH083552 (Clade), AI077304 (Dual Infection), AI36214 (UCSD Center for AIDS Research), AI047745 (Dynamics), AI074621 (Transmission), AI07384 (AIDS Training Grant), AI080193 (ARRA) from the National Institutes of Health, DMS0714991 (NSF), the International AIDS Vaccine Initiative, and the California HIV/AIDS Research Program RN07-SD-702 (DMS).
Author Disclosure Statement
No competing financial interests exist.
