Abstract
The presence of non-B HIV subtypes in the USA has been documented during the epidemic, although the timing of early introductions of different subtypes remains uncertain. Subtype C, the most common HIV variant worldwide, was first reported in the USA in 1996–97, after subtype C had expanded greatly in sub-Saharan Africa. In this study, we report a patient with subtype C infection acquired by mother-to-child transmission, born in the USA in 1990 to a Washington, D.C. resident who never traveled outside the USA, demonstrating that subtype C was present in the USA much earlier. Comparative analysis of the sequence from this patient and subtype C sequences in the USA and elsewhere suggests that multiple independent introductions of this subtype into the USA have taken place, many of which are traced to sub-Saharan or East Africa. These data indicate that subtype C HIV was already present in the USA years earlier than previously reported, and during the early period of subtype C expansion.
Subtype C, which is the most common HIV variant worldwide, representing over 50% of all HIV infections, is present mainly in sub-Saharan Africa and South Asia. 6 Although the most common non-B subtype in the USA, subtype C is still relatively infrequent, 1 representing <2% of the overall U.S. HIV epidemic. Subtype C was first reported in the USA in samples obtained after 1996, 7–9 after the subtype C epidemic had undergone a large expansion into sub-Saharan countries, including South Africa; 10,11 these sampling times suggested that subtype C introduction to the USA occurred during a later part of its expansion.
In this study, we report that subtype C HIV was already being transmitted in the USA before 1990, as documented in a patient with subtype C infection acquired by mother-to-child transmission in 1990. We investigated the HIV sequence from this subtype C-infected child to determine the origin and relationship with other reports of subtype C infection diagnosed in the USA and elsewhere.
The index patient, an African American female, was born in Washington, D.C. in 1990 by normal spontaneous vaginal delivery. Neither the mother nor the child received transfusions and no perinatal HIV testing was performed. Five years later, the mother was diagnosed with HIV when she developed complications of AIDS. The child was first tested for HIV at that time and found to be HIV positive. The mother died shortly after her diagnosis; she had no history of intravenous drug use and presumably acquired HIV via heterosexual transmission. The mother served in the military, but was never stationed outside Washington, D.C. and never traveled outside of the USA. The identity of the father is unknown.
After diagnosis, the child was treated with multiple mono- and dual-antiretroviral therapy regimens, and a number of combination antiretroviral regimens, but seldom achieved viral suppression. Her clinic notes at age 9 (without laboratory documentation available) report evidence of lamivudine resistance from HIV genotyping; clinic evaluations noted no injecting drug use or other HIV exposure risks except mother-to-child transmission. By age 12, she had acquired numerous drug resistance-conferring mutations (DRMs) to nucleoside reverse transcriptase inhibitors (NRTIs), including the multidrug resistance mutation, Q151M, as well as mutations conferring resistance to non-nucleoside reverse transcriptase inhibitors (NNRTIs) and protease inhibitors (PIs, see Supplementary Table S1; Supplementary Data are available online at
In 2014, she enrolled in a study of evaluation and management of treatment-experienced patients with virologic failure at the NIH Clinical Center (NCT01976715). At enrollment, her HIV RNA level was 117,951 copies/mL plasma, and CD4 T cell count was 14 cells/μL (2%). HIV genotyping (TRUGENE) in the protease (amino acids 3–99), reverse transcriptase (amino acids 31–247), and the entire integrase genes was performed and the virus was noted as subtype C 12 with extensive genotypic resistance, including NRTI resistance mutations D67N, T69D, V75I, V77L, T116Y, Q151M, and a complex of mutations at position 215, including 215T, A, I, V; NNRTI resistance mutations 101Q, 181C, and 190A, and PI mutation 90M; all identified previously in this patient.
The participant reported no other potential transmission risks and had no history of injection drug use. In investigating potential therapeutic options, we retrieved sequential genotypes obtained from previous rebound viremia periods. Most of these DRMs were already present by 2002 (see Supplementary Table S1), indicating longstanding persistence of drug resistance.
We studied additional subtype C sequences to investigate potential phylogenetic relationships of this sequence. The pro-pol sequence of the child's virus, designated NIH HIV infected pediatric individual (NIH-C), was compared with other HIV-1 pro-pol sequences using phylogenetic and Bayesian techniques. We constructed a dataset of subtype C sequences from the USA and worldwide (Total n = 392, sequence length 910 nucleotides (nt), trimmed to correspond to commercial genotyping positions nt 2,262–2,549 of protease, and nt 2,661–3,282 of reverse transcriptase). The set consisted of three subsets: (1) all unique subtype C sequences reported to GenBank with the USA as country of origin and confirmed dates of isolation (n = 280;
From the top 200 sequences closely related to the patient's sequence, all unique sequences of 910 nt in length with isolation dates reported in the GenBank entry were selected and considered closely related to the patient sequence (n = 80); all of these sequences had identity values in the BLAST algorithm ranging from 84% to 93% and E value less than 92%. (3) Other well-curated subtype C sequences reported worldwide with known dates and locations of isolation, including a well-described set of sequences from Ethiopia and Israel (n = 29) and a number of subtype C reference sequences were retrieved (
All these sequences, together with the HIV-1 reference sequence HXB2 used as an outgroup (GenBank accession number K03455), were aligned and subjected to (1) maximum likelihood phylogenetic analysis (MEGA 6), 13 with tree reconstructions performed using the entire available sequence as well as a reduced sequence with all resistance sites removed; and (2) Bayesian evolutionary analysis by sampling trees (BEAST) after removing resistance mutations, using the HKY + G substitution model with an uncorrelated lognormal relaxed molecular clock, a constant-size coalescent tree prior, and a chain length of 600,000,000 steps. The effective sample size of the posterior was >500. All sequence identification numbers are provided in Supplementary Table S2.
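The step of removing resistance sites before tree building can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `drop_codons`, the toy sequence, and the choice of first codon are hypothetical, and the codon set below contains only the reverse transcriptase positions named in this report, not a complete DRM list. The sketch assumes an in-frame nucleotide sequence whose first codon number is known.

```python
# Reverse transcriptase codons named in this report as carrying resistance
# mutations. Illustrative only: the full list used in the analysis is not
# given in the text.
RT_DRM_CODONS = {67, 69, 75, 77, 101, 116, 151, 181, 190, 215}


def drop_codons(seq: str, first_codon: int, codons_to_drop: set[int]) -> str:
    """Remove whole codons from an in-frame nucleotide sequence.

    seq            -- nucleotide string whose length is a multiple of 3
    first_codon    -- amino acid number encoded by the first codon of seq
    codons_to_drop -- amino acid numbers whose codons should be removed
    """
    if len(seq) % 3 != 0:
        raise ValueError("sequence must be in frame (length divisible by 3)")
    kept = []
    for i in range(0, len(seq), 3):
        codon_number = first_codon + i // 3
        if codon_number not in codons_to_drop:
            kept.append(seq[i:i + 3])
    return "".join(kept)


# Toy fragment covering RT codons 65-68 (hypothetical sequence).
toy = "AAACCCGGGTTT"
reduced = drop_codons(toy, first_codon=65, codons_to_drop=RT_DRM_CODONS)
print(reduced)  # codon 67 (GGG) is removed
```

Applying the same codon set to every sequence keeps the alignment in register, which is why whole codons rather than single nucleotides are dropped here.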
Maximum likelihood analyses revealed (Fig. 1A) that the phylogeny of subtype C was diverse, with only a few branches having substantial bootstrap support (>70%, Fig. 1A, red circle). The child's sequence, denoted NIH-C (indicated by arrow), was located within a cluster of 44 sequences originating from sub-Saharan Africa (most from South Africa and Malawi) and Europe, as well as 21 of the 280 U.S. sequences studied (7.5% of U.S. sequences). The pairwise distance between the NIH-C sequence and the closest sequence in the cluster (from South Africa) was 6.7%. Identical phylogenetic branching was obtained in the presence or absence of DRM codons in the sequences.
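As an illustration of what a pairwise distance between two aligned sequences measures, the sketch below computes the uncorrected proportion of differing sites (p-distance). The text does not state which distance model was used, so this is an assumption; the function name `p_distance` and the toy sequences are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected pairwise distance between two aligned sequences.

    Only columns where both sequences carry an unambiguous base (A, C, G, T)
    are compared; gaps and ambiguity codes are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy example: one mismatch in eight comparable sites.
print(p_distance("ACGTACGT", "ACGTACGA"))  # 0.125
```

Under this uncorrected measure, a distance of 6.7% over a 910-nt alignment would correspond to roughly 61 differing sites (0.067 × 910 ≈ 61); a model-corrected distance would imply a slightly different count.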
Bayesian analysis (BEAST, Fig. 1B) estimated the time to the most recent common ancestor of the NIH-C sequence (18.6 years) at c. 1973, only a few years after the estimated emergence of subtype C in the 1960s and early 1970s. 14,15 In addition, none of the branches of the other early U.S. subtype C sequences 2–4 appeared to share a recent common ancestor with the NIH-C sequence, suggesting that each branch was the product of an independent subtype C introduction to the USA.

Fig. 1. The NIH-C sequence is related to subtype C sequences from Southern Africa, rather than to most of those reported from the USA. Sequences (n = 393) were assembled as described and subjected to phylogenetic and BEAST analyses.
In general, the spread of HIV out of Africa in the early years of the epidemic was characterized by the existence of distinct founder viruses that established specific subtypes in various parts of the world, such as subtype B in the USA and western Europe, subtype AE in Southeast Asia, and subtype A in Eastern Europe and countries of the former Soviet Union. 16 The introduction and spread of other variants within these regions after the founder viruses arrived have not been well characterized, and additional analyses provide useful insight into the spread of HIV within populations. Subtype C, the most common subtype worldwide, is diverse 11,12 and chiefly concentrated in sub-Saharan Africa, 17 where it underwent a broad and complex expansion in the 1990s with multiple introductions into South Africa. 10,11 Subtype C was reported in the USA in the mid-to-late 1990s, 2–4,14 which was years after the introduction of other non-B subtypes, already present in the USA in 1988–1992. 18
In this study, we report that subtype C was actually present in the USA in the late 1980s before the large expansion of subtype C in sub-Saharan Africa; the variant we identified was genetically most closely related to other variants in sub-Saharan Africa. The individual reported here was diagnosed at age 5 and the most likely route of infection was mother-to-child transmission. It is, in a sense, not surprising that subtype C was identified in the District of Columbia, as the Washington metropolitan area has been a popular destination for African immigrants.
Phylogenetic analyses of HIV subtype C sequences identified in the USA revealed several distinct branches, but no strong genetic support linking these variants, suggesting that multiple independent subtype C introductions occurred. Detailed characterization of these events has not been described; it is therefore not clear whether these were sporadic events or continuous transmissions of these individual variants.
There has been, in general, a lack of extensive viral sampling for analysis from this early period; new efforts to analyze large numbers of contemporary sequences made available through commercial genotyping sources are now in progress, 19 which will yield fine structure data on the current spread of HIV subtypes that can be analyzed together with these early sequences. Such efforts have critical epidemiologic value for quickly characterizing new outbreaks, including point source outbreaks, 20,21 and are essential for tracking the contemporary spread of HIV in populations so that prevention efforts can be targeted to key populations in the setting of limited public health resources. 22 Subtype analysis is also essential to characterize strains of HIV circulating in regions in preparation for vaccine efforts. An abundance of circulating viruses may have a critical impact on vaccine design; subtype C viruses elicit cross-clade responses, 23,24 which may be useful in the design and composition of vaccines where multiple subtypes circulate. 25–27 Subtypes may also influence response to antiretroviral therapy, and a detailed understanding of subtype distribution in the context of the prevalence of transmitted drug resistance will be useful in informing therapeutic choices for individuals initiating antiretroviral therapy. 28,29 Continuous molecular surveillance of HIV remains a clear public health imperative.
Acknowledgments
We are grateful to Ed Tramont, Donald S. Burke, Nelson Michael, and Sheila Peel for insightful discussions.
This project has been funded, in part, with federal funds from the National Cancer Institute, National Institutes of Health, under contract no. HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.
Author Disclosure Statement
No competing financial interests exist.
