Abstract
A 4-year-old child born to an HIV-1 seronegative mother was diagnosed with HIV-1, the main risk factor being transmission from the child's father, who was seroconverting at the time of the child's birth. In the context of a forensic investigation, we aimed to identify the source of infection of the child and the date of the transmission event. Samples were collected from the father and child at two time points about 4 years after the child's birth. Partial segments of three HIV-1 genes (gag, pol, and env) were sequenced, and maximum likelihood (ML) and Bayesian methods were used to determine the direction and estimate the date of transmission. Neutralizing antibodies were determined using a single cycle assay. Bayesian trees displayed a paraphyletic–monophyletic topology in all three genomic regions, with the father's host label at the root, which is consistent with father-to-son transmission. ML trees found similar topologies in gag and pol and a monophyletic–monophyletic topology in env. Analysis of the time of the most recent common ancestor of each HIV-1 gene population indicated that the child was infected shortly after the father. Consistent with the infection history, both father and son developed broad and potent HIV-specific neutralizing antibody responses. In conclusion, the direction of transmission implicated the father as the source of transmission. Transmission occurred during the seroconversion period when the father was unaware of the infection and was likely accidental. This case shows how genetic, phylogenetic, and serological data can contribute to the forensic investigation of HIV transmission.
Introduction
W
Methods
Subjects
Subject CC1 (child) was born on 8th April 2009 to an HIV seronegative mother and was diagnosed with HIV-1 infection when he was 4 years old, in January 2013 (Supplementary Fig. S1; Supplementary Data are available online at
Blood samples were collected from both father and son at two different time points, 20th March 2013 and 12th December 2013.
Amplification, cloning, and sequencing
Genomic DNA was extracted from peripheral blood mononuclear cells using the Wizard Genomic DNA Purification Kit (Promega) as per the manufacturer's instructions. PCR amplification of the p17 (gag), integrase (pol), and C2 V3 (env) regions of HIV-1, cloning, and sequencing were performed as previously described. 7 At least 10 clones were sequenced for each genomic region and each subject. Sequences have been assigned GenBank accession numbers MG273182–MG273261.
Phylogenetic analyses
Multiple sequence alignment of the derived HIV-1 sequences was performed using MAFFT v7 under the L-INS-i algorithm 8 with manual editing of aligned sequences performed in MEGA 7.0.21. 9 Sequence subtyping was performed using REGA HIV subtyping tool version 3. 10 HIV subtyping indicated the query sequences to be HIV-1 subtype G. Thus, for the database controls (DBC), we retrieved HIV-1 subtype G sequences from the Los Alamos National Laboratory HIV database (LANL HIV DB) using the HIV BLAST tool and the geography (Portugal) search interface (
The pairwise sequence diversity in each gene fragment from CC1 and CC2 was determined independently for each sampling time (20 March 2013 and 12 December 2013) using the Kimura 2-parameter model as implemented in MEGA 7.0.21. 9
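As an illustration of the diversity measure, the Kimura 2-parameter distance between two aligned sequences is d = −½ ln[(1 − 2P − Q)√(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences. The following is a minimal sketch, not the MEGA implementation; the function names are hypothetical, and sites with gaps or ambiguity codes are simply skipped per pair (an assumption about gap handling).

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguity codes for this pair
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent (saturation).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def mean_pairwise_diversity(sequences):
    """Average K2P distance over all pairs of clones from one host and time point."""
    pairs = list(combinations(sequences, 2))
    return sum(k2p_distance(a, b) for a, b in pairs) / len(pairs)
```

Calling `mean_pairwise_diversity` on the aligned clones of one subject at one sampling date gives a single diversity value per gene fragment.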
Maximum likelihood (ML) phylogenetic reconstruction was performed using PhyML 3.1 11 as implemented in Seaview 4.5.4 12 with a bio-neighbor-joining (BioNJ) starting tree, and tree optimization parameters: nearest neighbor interchange and subtree pruning and regrafting heuristic search. Branch supports for the ML trees were inferred based on the approximate likelihood ratio test (aLRT). 13
Initial Bayesian inference using Markov-chain Monte Carlo (MCMC) sampling as implemented in MrBayes 3.2.6 14 was used to compare trees with the ML results. Two independent runs of four coupled chains per run were performed for 5 × 10^6 generations with trees sampled every 1,000 generations to produce 5,000 posterior tree samples. The burn-in was set at 10% of the initial posterior tree samples, and convergence of chains was assumed for effective sample size (ESS) values >200 for all the posterior parameters as viewed in Tracer 1.6 (
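The sampling scheme and convergence criterion can be sketched as follows. This is a simplified illustration, not the Tracer algorithm: the ESS estimator below divides the number of samples by an autocorrelation time summed up to the first non-positive lag, and both function names are hypothetical.

```python
def discard_burn_in(samples, fraction=0.10):
    """Drop the first `fraction` of the samples as burn-in."""
    n_burn = int(len(samples) * fraction)
    return samples[n_burn:]


def effective_sample_size(trace):
    """Simple ESS estimate: N / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(trace)
    mean = sum(trace) / n
    dev = [v - mean for v in trace]
    var = sum(d * d for d in dev) / n
    if var == 0:
        return float(n)
    tau = 1.0
    for lag in range(1, n):
        rho = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0:
            break  # truncate at the first non-positive autocorrelation
        tau += 2 * rho
    return n / tau


# 5 x 10^6 generations sampled every 1,000 generations -> 5,000 samples per run;
# a 10% burn-in leaves 4,500 samples per run.
n_samples = 5_000_000 // 1_000
kept = discard_burn_in(list(range(n_samples)))
```

Under the criterion in the text, a parameter trace would be accepted when its ESS after burn-in exceeds 200.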
For the determination of time to the most recent common ancestor (tMRCA) and evolutionary rates of the viral sequences, a Bayesian MCMC approach was performed using BEAST 1.8.4. 16 Prior specifications were set in BEAUti 1.8.4. 16 Analysis of the tMRCA was performed using strict and uncorrelated lognormal relaxed molecular clock models, and logistic growth and skygrid dynamic population size as tree priors. Because the skygrid model failed to describe the dataset, only the logistic tree prior was carried on throughout the analysis with the logistic growth rate fixed at 0.01. Normally distributed priors were specified for the root height with a mean of 4 years for the son and 5 years for the father based on the epidemiological data (Supplementary Fig. S1), with a standard deviation of 2 years to allow for uncertainty and variance in these estimates. Because the mother was HIV negative and could not have infected the son, the son's prior was truncated at 4.68 years, corresponding to the time of his birth, after which he could have been infected. Three independent MCMC chains with random seed numbers were run sufficiently long for each dataset to ensure convergence with ESS >200 for all parameters as viewed in Tracer 1.6 (
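The truncation bound follows from the interval between the child's birth (8 April 2009) and the last sampling date (12 December 2013). The sketch below reproduces that arithmetic and draws from a normal prior truncated at the bound by simple rejection sampling. It only illustrates the prior's shape and is not how BEAST evaluates priors; the function names and the lower bound of zero are assumptions.

```python
import random
from datetime import date

BIRTH = date(2009, 4, 8)          # birth of CC1 (from the text)
LAST_SAMPLE = date(2013, 12, 12)  # most recent sampling date (from the text)


def years_between(start, end):
    """Elapsed time in years, using an average year length of 365.25 days."""
    return (end - start).days / 365.25


def sample_truncated_normal(mean, sd, upper, rng, lower=0.0):
    """Rejection sampler for a normal prior restricted to [lower, upper]."""
    while True:
        x = rng.gauss(mean, sd)
        if lower <= x <= upper:
            return x


upper_bound = round(years_between(BIRTH, LAST_SAMPLE), 2)  # 4.68 years
rng = random.Random(42)
son_prior_draws = [sample_truncated_normal(4.0, 2.0, upper_bound, rng)
                   for _ in range(1000)]
```

Every draw for the son is at most 4.68 years before the last sample, so no prior mass is placed on infection before birth.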
Interpretation of phylogenetic topology
In a recent work, we showed that the shape (topology) of a phylogeny computed from independent samples from two infected persons is related to how those two persons are related to one another in a transmission network. 18 We determine the tree topology and root label by propagating host tip labels (CC1 and CC2) toward the root until the parent of the current node has a different label. Thus, we find the smallest set of internal nodes in the phylogeny such that all tips descending from each of these nodes share the same host label. We refer to each such part of the tree as a clade. The tree topology in our method is determined by the number of clades for each label and the label at the root. If each host has only one clade, we call this topology monophyletic/monophyletic (MM); if one host has more than one clade and the other has only one (that is, all of the tips of one host are clustered with one another and embedded in the broader tree of the other host), we call this topology paraphyletic/monophyletic (PM); if both hosts have multiple clades, we call this topology polyphyletic/paraphyletic (PP). We have shown that the PM and PP topologies often arise in direct transmission cases where a direct transmission event forms one (PM) or multiple (PP) clades that are then sampled in the recipient.
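The clade-counting rule can be sketched on a toy rooted tree written as nested tuples whose leaves are host labels. This is an illustrative sketch only, not the published implementation: the tree representation and function names are hypothetical, exactly two hosts are assumed, and the root label is approximated here with Fitch parsimony, a stand-in that returns no label when the root is ambiguous.

```python
from collections import Counter


def _walk(node):
    """Return (host if subtree is single-host else None, clade counts, Fitch set)."""
    if isinstance(node, str):          # a tip: its own host label
        return node, Counter(), {node}
    results = [_walk(child) for child in node]
    fitch_sets = [r[2] for r in results]
    common = set.intersection(*fitch_sets)
    fitch = common if common else set.union(*fitch_sets)
    labels = {r[0] for r in results}
    if len(labels) == 1 and None not in labels:
        return labels.pop(), Counter(), fitch   # label propagates toward the root
    clades = Counter()
    for label, sub_clades, _ in results:
        if label is not None:
            clades[label] += 1         # a maximal single-host subtree ends here
        clades.update(sub_clades)
    return None, clades, fitch


def classify(tree):
    """Classify a two-host tree as 'MM', 'PM', or 'PP' and report the root label."""
    label, clades, fitch = _walk(tree)
    if label is not None or len(clades) != 2:
        raise ValueError("expected tips from exactly two hosts")
    counts = sorted(clades.values())
    if counts == [1, 1]:
        topology = "MM"
    elif counts[0] == 1:
        topology = "PM"
    else:
        topology = "PP"
    root = next(iter(fitch)) if len(fitch) == 1 else None
    return topology, root


# Toy example: CC1 tips form one clade nested within CC2 tips.
pm_tree = (("CC2", ("CC2", ("CC1", "CC1"))), "CC2")
```

For `pm_tree`, `classify` returns the PM topology with CC2 at the root, the pattern the text associates with CC2 as the donor.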
Selective pressure analysis
The HYPHY package hosted on the Datamonkey open server 19 was used for analysis of selective pressure. The single-likelihood ancestor counting (SLAC), mixed-effects model of evolution (MEME), fixed-effects likelihood (FEL), and relaxed-effects likelihood (REL) methods were used with statistical significance at p < .1 (SLAC, MEME, and FEL) and a Bayes factor (BF) cutoff value of 50 (REL).
Antibody neutralization
Plasma antibody neutralization was assessed in TZM-bl cells with seven env-pseudoviruses from a panel of global HIV-1 reference isolates as described. 20
Results
Transmission linkage and direction
ML and Bayesian reconstruction inferred similar phylogenetic trees for all datasets, with the sequences from CC1 and CC2 forming a strongly supported transmission cluster within subtype G (Fig. 1 and Supplementary Fig. S2). For the env dataset, CC1 and CC2 taxa displayed a monophyletic–monophyletic (MM) topological relationship in the ML tree (Fig. 1C). However, the Bayesian posterior probability (pp) for tree topology favored a paraphyletic–monophyletic (PM) topology over an MM topology (pp = 0.84), with CC2 inferred at the root (pp = 0.82). In addition, based on the env dataset, there was strong support for the transmission of a single lineage of virus between CC2 and CC1 (pp = 0.97). For the gag fragment, there was even stronger evidence for a PM topology (pp = 0.98) with CC2 again inferred at the root (pp = 0.98), which was consistent with the inferred ML phylogenetic tree (Fig. 1A). Transmission of a single lineage of HIV-1 between CC2 and CC1 was strongly supported for gag (pp = 1.00). For the pol dataset, again, there was strong evidence in the ML tree for a PM topology with CC2 paraphyletic with respect to CC1 (aLRT = 0.97) (Fig. 1B). Altogether, the Bayesian and ML tree analyses support the hypothesis that the father infected his son on a single occasion.

ML tree reconstructions for gag
Evaluation of time of transmission
Assuming that CC2, like CC1, was also infected by a single virus lineage, we can estimate a most recent bound for when each individual was infected from the tMRCA of the individual HIV-1 populations. The estimated tMRCA consistently showed that CC1 most probably became HIV-1 infected at a later date than CC2 for all analyzed genomic fragments (mean tMRCA gag CC1 = March 2011, CC2 = November 2008; pol CC1 = April 2010, CC2 = November 2008; env CC1 = August 2010, CC2 = November 2007) (Fig. 2). Interestingly, CC2's tMRCA agrees well with symptoms of acute HIV-1 infection in early 2009.
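A tMRCA expressed as a root height (years before the most recent sample) can be converted to a calendar date as sketched below. This assumes heights are measured backward from the last sampling date, 12 December 2013, and uses an average year of 365.25 days; the function name is hypothetical.

```python
from datetime import date, timedelta

LAST_SAMPLE = date(2013, 12, 12)  # most recent sampling date (from the text)


def height_to_date(height_years, reference=LAST_SAMPLE):
    """Convert a root height in years before `reference` to a calendar date."""
    # date arithmetic uses whole days; the fractional day is dropped.
    return reference - timedelta(days=height_years * 365.25)


# A height equal to the truncation bound (4.68 years) falls at the child's birth.
bound_date = height_to_date(4.68)
```

Applied to each sample of a posterior root-height distribution, the same conversion gives a distribution of calendar dates.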

Violin plots of the posterior distribution of estimated dates of the most recent common ancestor (tMRCA) for the gag, pol, and env sequences of CC1 and CC2. Standard boxplots are inserted in each violin plot. The dashed line indicates the birth of CC1, where the normally distributed prior of CC1 was truncated. The posterior probability (pp) in each genomic comparison shows the probability that CC2 was infected after CC1. The tMRCA is given as calendar year. tMRCA, time to the most recent common ancestor.
Evolutionary trends and immune pressure assessments
Using an uncorrelated lognormal relaxed clock, the within-host median evolutionary rates were higher in CC1 than CC2 (env: 21 and 5.0; gag: 24 and 4.1; pol: 2.9 and 2.4 × 10−3 median substitutions/site per year, respectively). Evaluating the full posterior distributions of these estimates showed that they differed significantly in env and gag, but not in pol (pp = 0.013; pp = 0.012; pp = 0.65, respectively). CC1 had many more codons affected by potential selective forces in gag and pol relative to CC2; most codons were negatively selected in CC1 and positively selected in CC2 (Supplementary Table S1). In contrast, similar levels of positive selection were detected in env. Consistent with this antibody selection footprint, plasma samples from CC1 and CC2 had similar high titers of neutralizing antibodies against HIV-1 isolates (Supplementary Fig. S3).
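One common way to compare two posterior distributions obtained from independent analyses is to estimate the probability that a draw from one exceeds a draw from the other. The sketch below does this over all pairs of samples. It is an assumption that this resembles the comparison used here, the function name is hypothetical, and the input values are toy numbers, not data from this study.

```python
def prob_greater(samples_a, samples_b):
    """Estimate P(A > B) from two independent sets of posterior samples."""
    wins = sum(1 for a in samples_a for b in samples_b if a > b)
    return wins / (len(samples_a) * len(samples_b))


# Toy values only (illustrative, not from this study).
toy_a = [3.0, 4.0, 5.0]
toy_b = [1.0, 2.0, 3.5]
pp_toy = prob_greater(toy_a, toy_b)  # 8 of the 9 pairs have a > b
```

A value near 0 or 1 indicates that the two posterior distributions barely overlap, whereas a value near 0.5 indicates that they are not distinguishable.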
Discussion
We applied genetic, phylogenetic, and serological analyses to the investigation of an unusual father-to-son HIV-1 transmission case. The initial suspicion of sexual abuse prompted a criminal investigation, which eventually led to no criminal charges. Because of the lack of anal and genital lesions in the child, HIV-1 transmission by rape was excluded. It has previously been shown that the fluid from skin blisters and similar vesicular body fluids can carry a high load of infectious virions. 21 Moreover, exposure to bleeding skin lesions has been proposed as the most probable transmission route in a similar case of father-to-child HIV-1 transmission. 22 Therefore, we hypothesized that infection of the child might have occurred during the first days of life by accidental contact with the infectious fluid exuding from the father's skin blisters.
The Bayesian phylogenetic analysis showed a PM topology with the father inferred at the root in all three genomic regions. This topology has been strongly associated with direction of transmission, where the root host label typically indicates the donor. 7 Hence, our phylogenetic results are consistent with a father-to-son transmission, implicating the father as the child's source of infection. The estimated dates of the tMRCA of the individual HIV-1 populations agreed with the acute HIV-1 infection of the father in early 2009 and infection of the child shortly after birth. However, we were unable to amplify HIV-1 DNA from biopsy samples collected from one skin blister of the father. This negative result may be due to severe nucleic acid degradation in the paraffin-embedded tissue samples or to the absence of HIV-infected cells in the particular blister we analyzed. Hence, at this time, it is impossible to provide a definitive explanation for the transmission route in this case.
Interestingly, we found that the env ML tree showed an MM topology, while the gag and pol ML trees as well as all Bayesian analyses showed PM trees. This reiterates the importance of evaluating more than one genomic region, as identical sampling artefacts are unlikely to occur across several regions. In addition, it shows that evaluating a single tree is not enough, even if it is the ML estimate, because many nearly as good trees that also plausibly explain the observed sequence data may differ in their topology. It is thus better to analyze the overall phylogenetic patterns in a full posterior tree sample. However, it should be noted that limited sampling can render PM trees MM and that the paraphyletic signal deteriorates over time, especially in rapidly evolving regions such as C2 V3 in env. 18 This may explain the MM topology in env in our case, since the time lag between the putative transmission event and sampling was about 4 years.
Median evolutionary rates in env C2 V3, a major antibody neutralizing domain, and gag, a target for cytotoxic T lymphocytes, were significantly higher in the child than in the father. This is consistent with the virus adaptation to the developing immune control in the child and with viral suppression due to effective ART in the father. 23 Early initiation of ART in the father may also explain the differences in selective pressure seen in pol in both patients. Remarkably, father and child exhibited a similarly potent and broad neutralizing antibody response providing additional support for a similar infection time, which should be around the birth of the child. 24
In conclusion, PM phylogeny, root host label, timing analysis, selection analysis, and neutralizing antibody profiling all supported father-to-son HIV-1 transmission shortly after the birth of the child, when the father was seroconverting and was unaware of his HIV status. Consequently, the case was not taken to court.
Footnotes
Acknowledgments
Research reported in this publication was supported by the NIAID/NIH under award number R01AI087520, by European Funds through grant ‘Bio-Molecular and Epidemiological Surveillance of HIV Transmitted Drug Resistance, Hepatitis Coinfections and Ongoing Transmission Patterns in Europe (BEST HOPE)’ (project funded through HIVERA: Harmonizing Integrating Vitalizing European Research on HIV/Aids, grant 249697) and by grants PTDC/SAU-EPI/122400/2010, VIH/SAU/0029/2011, and PTDC/DTP-EPI/7066/2014 from Fundação para a Ciência e Tecnologia (FCT), Portugal. Global Health and Tropical Medicine Center was funded through FCT (UID/Multi/04413/2013). Inês Bártolo was supported by a postdoc fellowship (SFRH/BPD/76225/2011) from FCT, Portugal. A.A. was supported by Fundação para a Ciência e Tecnologia (FCT) through Investigador FCT Program.
Author Disclosure Statement
There are no conflicts of interest to report.
References
