Abstract
Here we present new sequence data from HIV-1 subtype C-infected long-term survivors (LTS) from Karonga District, Malawi. Gag and env sequence data were produced from nine individuals each of whom has been HIV-1 positive for more than 20 years. We show that the three amino acid deletion in gag p17 previously described from these LTS is not real and was a result of an alignment error. We find that the use of dried blood spots for DNA-based studies is limited after storage for 20 years. We also show some unlikely amino acid changes in env C2-V3 in LTS over time and different patterns of genetic divergence among LTS. Although no clear association between mutations and survival could be shown, amino acid changes that are present in more than one LTS may, in the future, be shown to be important.
The natural history of HIV-1 infection and disease progression has been well established in adults, with 10 years being the median time from initial infection to the development of AIDS in the absence of therapeutic intervention. 1,2 However, this can vary widely among individuals: rapid progressors develop AIDS symptoms in as little as 6 months, whereas other individuals, such as long-term survivors (LTS), progress to AIDS over a much longer period of time. 3 LTS are those individuals characterized as having survived for >10 years without antiretroviral therapy (ART) but who also show a steady decline in the number of circulating CD4 cells. 4
Some studies have shown that viral factors play an important role in the survival of LTS, for example, viruses that contain defects in particular HIV-1 genes, such as env, gag, nef, vpu, vif, rev, and tat. 5 Currently there is very little information on LTS in sub-Saharan Africa. 6 McCormack et al. 7 described a three amino acid deletion at the end of gag p17 found in 15 LTS from Karonga District, Malawi. In each LTS the deletion was observed in sequences dating from the late 1990s but was not present in any sequences dating from the 1980s. It was also described in two-thirds of the other HIV-1-positive individuals from Karonga District included in that study from the late 1990s, and it was suggested that the deletion could be associated with longer survival and onward transmission. 7 In this work we sought to further characterize the viral factors involved in long-term survival in Karonga District, Malawi. We find that the previous observation of a three amino acid deletion 7 was, in fact, erroneous, and we describe our follow-up study of the 38 long-term survivors.
Thirty-eight HIV-positive individuals were seen in Karonga District, Malawi, in both the late 1980s and late 1990s. 2 Seventeen of them were still alive in 2004. 7 Fourteen of these were sought again in 2010 (three were not sought as they had refused to participate the previous two times), when it was found that three had died, one had left the region, and one refused to provide a sample. Of the nine individuals seen, eight were infected with subtype C and one (LTS30) was infected with the unclassifiable strain that was described for this region. 8 No amplification was possible from dried blood spots (DBS) from this latter individual prior to 2004. Five individuals had begun ART (one in 2005, two in 2006, one in 2008, and one in 2009). Four individuals had not begun ART and have thus been HIV positive without treatment for a minimum of 21 years (although one of these has also now been referred) (Table 1).
Table 1. Summary of Sequence Data, CD4 Counts, and ART Information Available for Those LTS Found to Be Still Present in Karonga District in 2004 and in 2010
Table footnotes: Had died by 2010. Had left the Karonga District by 2010. Unclassifiable subtype.
ART, antiretroviral therapy; LTS, long-term survivors.
DNA was reextracted from DBS collected between 1986 and 1989 from LTS and the wider population with a view to cloning polymerase chain reaction (PCR) products to explore evidence of the three amino acid deletion in that time period. DBS were also available from 10 of the LTS seen in 2004 and cell pellets from 9 in 2010. Plasma or cell pellets from 100 individuals, randomly chosen from samples collected between 2008 and 2010 in existing studies in the District, were used to explore the frequency of the three amino acid deletion in the 2008–2010 time period. Proviral DNA was extracted from 200 μl of cell pellet or plasma using the QIAamp DNA Blood Mini Kit (Qiagen) and from the DBS using the QIAamp DNA Micro Kit (Qiagen). Nested PCR and sequencing of a 750-bp region of gag p17p24 and a 500-bp region of env C2V3 were carried out as previously described. 9 Amplification from the DBS from the 1980s was largely unsuccessful. The DBS had been frozen at −20°C for over 20 years and exposed to a number of freeze-thaw events, and it is highly probable that this led to fragmentation of the DNA present. Previous studies found −20°C suitable for long-term storage of DBS (6 years), 10–13 but our work suggests an upper limit to the length of time such samples can be stored successfully in this way for DNA-based studies. Cloning was carried out using the Topo TA cloning kit (Biosciences). Automatic sequencing in both directions was carried out by Eurofins Genetic Services Ltd or by LGC Genomics. Sequence chromatograms were examined and manually edited in Seqman (DNAStar Inc.).
The three amino acid deletion was not found in any gag sequences produced from LTS in 2004 and 2010 or in over 100 sequences produced from blood samples collected in 2008–2010. Furthermore, none of 50 sequences from 50 clones produced from the 1998 sample from LTS2 (which had previously shown the deletion) contained the deletion. We then reexamined all of the raw data from the sequences used in McCormack et al. 7 and the deletion was not found in any of the sequences. We suggest that an alignment error was made at an early stage of the multiple alignment assembly of the relevant sequences in the original study. This serves as a stark reminder of the dangers of such errors when handling large numbers of sequences. All affected sequences from McCormack et al. 7 that were submitted to GenBank have been reexamined and the correct sequences redeposited.
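One way to screen an alignment for candidate in-frame deletions before interpreting them is sketched below. It is a minimal illustration and not the procedure used in this study: the function name `inframe_gap_runs` and the toy sequences are hypothetical, and the sketch assumes nucleotide sequences that are already aligned, with `-` as the gap character. Any run it flags would still need to be checked against the raw chromatograms and an independent realignment.

```python
import re

def inframe_gap_runs(aligned_seq, min_codons=1):
    """Return (start, length) for each gap run whose length is a multiple of 3.

    `aligned_seq` is one row of a nucleotide alignment using '-' for gaps.
    Start positions are 0-based alignment columns.
    """
    runs = []
    for match in re.finditer(r"-+", aligned_seq):
        length = match.end() - match.start()
        if length % 3 == 0 and length // 3 >= min_codons:
            runs.append((match.start(), length))
    return runs

# Hypothetical toy alignment rows (not data from this study).
reference = "ATGGGTGCGAGAGCGTCAGTATTA"
query     = "ATGGGTGCG---------GTATTA"

# A 9-nucleotide gap corresponds to an apparent three amino acid deletion.
print(inframe_gap_runs(query))
```

A gap run reported here only says that the alignment contains an apparent deletion; as the reexamination above shows, it may reflect the alignment rather than the virus.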
Multiple alignments of all sequential env (74) and gag (65) sequences from subtype C-infected LTS along with 40 control sequences were assembled and optimized in MacClade 4 (Sinauer Associates). Phylogenetic trees were reconstructed under the GTR+gamma model of DNA substitution implemented by RAxML 7.0.3 14 with all parameters optimized by RAxML. Confidence levels in the groupings in the phylogeny were assessed using 1000 bootstrap replicates as part of the RAxML phylogeny reconstruction. The subtype C ancestral sequence derived in previous work 15 was employed as the out-group for both gag and env trees. Both gene trees showed that for most individuals the sequences from the different time points grouped together (8/10 for env and 10/11 for gag) but only half grouped with significant bootstrap support (Fig. 1a and b). Sequences from LTS21 formed multiple clusters on both gene trees, which is consistent with the pattern seen in McCormack et al. 7

Fig. 1. Maximum likelihood trees generated from (a) env and (b) gag sequences.
To further explore this, additional gag and env consensus sequences were produced from DNA reextracted from the DBS collected in 1989 and 1999 for this individual (LTS21), with additional env sequences produced from the 2010 DNA sample. In gag the 1990s sequences were ancestral to the 2010 sequences and the two 1989 sequences grouped distantly from them (Fig. 1b). In env the 2010 sequences showed further variation, with two sequences grouping away from the 1999 sequences (Fig. 1a). The average genetic distance between the eight env sequences (across all time points) was 12%, which was higher than the 8.8% genetic distance between all other LTS sequences from all individuals at all the different time points. The genetic distance between two of the sequences collected in 2010 was even higher (17.5%) than that between the sequences collected in the 1980s and 1990s (7.8%) from the same individual.
Although sample mislabeling is very unlikely, as the individual's name was written on the filter paper as well as a unique identifier, we cannot exclude the possibility. Superinfection is also possible. The sequences came from a female who was 21 when she was first identified as being HIV-1 positive. She has maintained a very low CD4 count for the past 6 years, 47 cells/mm3 in 2004 and 32 cells/mm3 in 2010. At both visits she was described as being healthy and showing no signs of AIDS and she refused ART on both occasions. Divergent env sequences were also found among sequences from the 2010 sample of LTS1, a female who was 37 when she was first identified as being HIV-1 positive. Her CD4 count was 586 cells/mm3 in 2004 and had fallen slightly to 449 cells/mm3 in 2010 and at that time she had not been referred for ART.
The BLOSUM62 matrix was used to assess the likelihood of amino acid substitutions between sequential sequences of env for each LTS, with indels and shared deletions relative to the other sequences also noted. A graphic representation of these observed amino acid substitutions within each LTS sample from one time period to another shows the large amount of change that was apparent across most of env C2-V3 in all individuals (Fig. 2). There are a number of positions that changed within nearly all of the LTS, e.g., HXB2 positions 268–269 showed substitutions in eight LTS (asterisks on Fig. 2). Some of these changes are less likely mutations, e.g., in six individuals (LTS8, 9, 10, 12, 20, and 22) there was a change from glycine to glutamic acid or vice versa. In one individual (LTS17) there was a deletion at position 269 within the 1980s and one of the 1990s env sequences, which became a lysine in a second 1990s sequence and in the 2004 sequence. Mutations at positions 11, 24, and 25 within the V3 loop have been associated with a change in coreceptor usage, with a shift from negatively charged amino acids to positively charged amino acids being suggested to result in a switch from CCR5 to CXCR4 usage. Only LTS17 showed evidence of a change to a positive charge in this region by 2004, but this individual had left the district by 2010 and so we do not have any additional information on this person. A large amount of change was seen just after the V3 loop in the C3 region. Indeed, the high degree of genetic divergence seen between multiple env sequences from the same time point in some LTS rendered comparisons of pairwise genetic distances between time points meaningless (even when we excluded the individuals with possible superinfection mentioned above). For example, two consensus sequences from LTS17 from 1999 showed a genetic distance of 4%. The population of viruses within individuals is currently being explored further by sequencing multiple clones.
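The check on V3 positions 11, 24, and 25 described above can be sketched as follows. This is an illustrative sketch only, not the authors' method: the function names and both sequences are hypothetical, histidine is treated as neutral, and the code assumes an ungapped V3 loop in which position 1 is the first cysteine, so any insertion or deletion would shift the numbering.

```python
# Side-chain charges assumed for this sketch (histidine treated as neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def v3_site_charges(v3_seq, positions=(11, 24, 25)):
    """Map each 1-based V3 position to (residue, charge)."""
    return {
        pos: (v3_seq[pos - 1], CHARGE.get(v3_seq[pos - 1], 0))
        for pos in positions
    }

def has_positive_site(v3_seq, positions=(11, 24, 25)):
    """True if any of the given positions carries a positively charged residue."""
    return any(charge > 0 for _, charge in v3_site_charges(v3_seq, positions).values())

# Hypothetical 35-residue V3 loop sequences (not data from this study).
early = "CTRPNNNTRKSIRIGPGQTFYATGDIIGDIRQAHC"
late  = "CTRPNNNTRKRIRIGPGQTFYATGKIIGDIRQAHC"

print(v3_site_charges(early))
print(has_positive_site(early), has_positive_site(late))
```

A positive residue at one of these sites is only suggestive of a coreceptor switch; confirming CXCR4 usage would require phenotypic data.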

Fig. 2. A graphic representation of observed amino acid substitutions that have occurred in env within survivors over time. Substitutions are color coded by likelihood according to the BLOSUM62 matrix, with green being the most likely and red the least (green > yellow > orange > red); blue indicates an insertion in one of the sequences relative to the other, pale gray indicates a shared deletion, and dark gray indicates an ambiguous site, e.g., a stop codon. The sequence collected at the earliest available time point was compared to all available env sequences from subsequent time points. The comparisons are labeled by the LTS number and the years being compared. Those labeled with an a or b refer to multiple sequences generated from the same time point. Asterisks mark positions 268 and 269 in env using HXB2 numbering.
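The color coding described in this caption can be sketched as a lookup of each substitution's BLOSUM62 score followed by binning. The sketch below is an assumption-laden illustration: only a handful of BLOSUM62 entries are embedded, the score thresholds for each color are hypothetical (the cutoffs used for the figure are not stated), and the names `BLOSUM62_SUBSET` and `substitution_color` are invented for this example.

```python
# A few BLOSUM62 scores (the matrix is symmetric); a real analysis would load all of it.
BLOSUM62_SUBSET = {
    frozenset("IV"): 3,
    frozenset("KR"): 2,
    frozenset("DE"): 2,
    frozenset("ST"): 1,
    frozenset("KE"): 1,
    frozenset("GD"): -1,
    frozenset("GE"): -2,
    frozenset("IG"): -4,
}

def substitution_color(old, new):
    """Bin a substitution by its BLOSUM62 score. Thresholds are assumed, not from the paper."""
    if old == new:
        return None  # no substitution at this site
    score = BLOSUM62_SUBSET[frozenset((old, new))]  # KeyError if the pair is not embedded
    if score >= 2:
        return "green"
    if score >= 0:
        return "yellow"
    if score >= -2:
        return "orange"
    return "red"

# Glycine <-> glutamic acid scores -2 in BLOSUM62, a comparatively unlikely change.
print(substitution_color("G", "E"))
```

Using a `frozenset` key makes the lookup independent of direction, which matches comparisons where a change "or vice versa" is treated the same way.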
Comparing amino acid substitutions in gag sequences showed higher numbers of substitutions within the gag p17 domain when compared to p24 as might be expected and, using the BLOSUM62 matrix as a reference, most of these substitutions were changes that were more likely to occur (data not shown). Different patterns of genetic divergence in gag, calculated using pairwise distances, were apparent among the LTS over time. Two LTS (LTS5 and LTS8) showed an overall trend of an increase in genetic divergence in gag over time as can be seen in Fig. 3. Both LTS10 and LTS20 showed a general increase in genetic divergence from 1989 to 2004; however, in 2010 the amount of genetic divergence from 1989 decreased for LTS20 and plateaued in LTS10. Three of these individuals had begun ART between 2004 and 2010 and one has since been referred. Two individuals (LTS9 and LTS22) who were alive in 2004 but had died by 2010 showed very different patterns of divergence. LTS9 showed an increase in divergence from 1989 to 1999, which decreased in 2004, while in LTS22 there was a linear increase in divergence from 1989 to 2004 (Fig. 3).
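The comparison of each time point with the earliest available sequence can be sketched as below. This is a minimal sketch under stated assumptions: it uses the uncorrected proportion of differing sites (p-distance), whereas the distance measure actually used in this work is not specified here; the function names and the toy sequences are hypothetical, and the sequences are assumed to be aligned to equal length.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns with a gap or N in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared

def divergence_from_baseline(samples):
    """samples maps year -> aligned sequence; returns {year: distance to the earliest year}."""
    baseline_year = min(samples)
    baseline = samples[baseline_year]
    return {
        year: p_distance(baseline, seq)
        for year, seq in sorted(samples.items())
        if year != baseline_year
    }

# Hypothetical toy sequences for one individual (not data from this study).
toy = {
    1989: "ATGGGTGCGAGAGCGTCAGT",
    1999: "ATGGGTGCGAGGGCGTCAGT",
    2004: "ATGGGCGCGAGGGCGTCGGT",
    2010: "ATGGGTGCGAGGGCGTCGGT",
}

for year, dist in divergence_from_baseline(toy).items():
    print(year, round(dist, 3))
```

In this toy series divergence rises to 2004 and then falls in 2010, the kind of non-monotonic pattern described for LTS20; with real data a model-corrected distance may be preferable to the raw proportion.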

Fig. 3. The genetic divergence seen in gag over time in LTS5, 8, 9, 10, and 20. All sequences from one individual from different time points were compared to the sequence generated from the earliest time point available. LTS9 and 22 had died before 2010. LTS5 had begun antiretroviral therapy (ART) in 2008, LTS8 in 2005, and LTS20 in 2006, and LTS10 had been referred for ART in 2010, as indicated by the vertical lines.
In summary, we present new sequence data from HIV-1 subtype C-infected long-term survivors from Malawi, Africa. This work highlights some of the pitfalls associated with sequence analysis. We show that the three amino acid deletion in gag p17 previously described from these LTS 7 was the result of an alignment error. Extreme caution must be used while making inferences from sequence data as some errors may be very well hidden. However, data on long-term survivors infected with subtype C virus are important to accumulate. We show many amino acid changes in env C2-V3 in LTS over time. Although no clear association between mutations and survival could be shown, amino acid changes that are present in a number of LTS may in the future be shown to be important for survival, but future work in this regard will require data from virus and host.
Sequence Data
All sequences have been deposited into GenBank, accession numbers JN393505–393554, and raw data and alignments employed in this work are available from the authors on request.
Acknowledgments
This material is based upon works supported by Science Foundation Ireland under Grant 07/RFP/EEEOBF424 and by an IRCSET EMBARK Scholarship to Ishla Seager (RS/2007/203). The Karonga Prevention Study is funded primarily by the Wellcome Trust, with contributions from LEPRA. Permission for the study was received from the National Health Sciences Research Committee, Malawi, and the Ethics Committee of the London School of Hygiene and Tropical Medicine, United Kingdom.
Author Disclosure Statement
No competing financial interests exist.
