Potential Control of Human Immunodeficiency Virus Type 1 asp Expression by Alternative Splicing in the Upstream Untranslated Region

Abstract

The negative-sense asp open reading frame (ORF) positioned opposite to the human immunodeficiency virus type 1 (HIV-1) env gene encodes the 189 amino acid, membrane-associated ASP protein. Negative-sense transcription, regulated by long terminal repeat sequences, has been observed early in HIV-1 infection in vitro. All subtypes of HIV-1 were scanned to detect the negative-sense asp ORF and to identify potential regulatory sequences. A series of highly conserved upstream short open reading frames (sORFs) was identified. This potential control region from HIV-1_NL4-3, containing six sORFs, was cloned upstream of the reporter gene EGFP. Expression by transfection of HEK293 cells indicated that the introduction of this sORF region inhibits EGFP reporter expression; analysis of transcripts revealed no significant changes in levels of EGFP mRNA. Reverse transcriptase–polymerase chain reaction analysis (RT-PCR) further demonstrated that the upstream sORF region undergoes alternative splicing in vitro. The most abundant product is spliced to remove sORFs I to V, leaving only the in-frame sORF VI upstream of asp. Sequence analysis revealed the presence of typical splice donor- and acceptor-site motifs. Mutation of the highly conserved splice donor and acceptor sites modulates, but does not fully relieve, inhibition of EGFP production. The strong conservation of asp and its sORFs across all HIV-1 subtypes suggests that the asp gene product may have a role in the pathogenesis of HIV-1. Alternative splicing of the upstream sORF region provides a potential mechanism for controlling expression of the asp gene.

Introduction

Until recently, it was believed that retroviruses relied on positive-sense transcription, with variable splicing of transcripts providing the range of viral proteins required for survival and replication. However, an emerging body of data suggests that negative-sense transcription is also important in retroviruses.

Negative-strand open reading frames (ORFs) opposite to the env gene have been described in human immunodeficiency virus type 1 (HIV-1) (Miller, 1988; Vanhee-Brossollet et al., 1996), human T-cell leukaemia virus type 1 (HTLV-1) (Larocca et al., 1989), HTLV-2 (Halin et al., 2009), and feline immunodeficiency virus (Briquet et al., 2001). The HTLV-1 HBZ protein modulates viral transcription (Gaudray et al., 2002) and enhances viral persistence (Arnold et al., 2006). Two alternatively spliced variants of the HTLV-1 HBZ RNA, which encode different HBZ isoforms, have been characterized (Cavanagh et al., 2006); both isoforms downregulate Tax-mediated viral transcription (Lamasson et al., 2007; Yoshida et al., 2008). The antisense protein of HTLV-2, APH-2, also suppresses Tax-mediated viral transcription (Halin et al., 2009).

A negative-sense ORF, asp, positioned opposite to the gp120/gp41 junction of the env gene in HIV-1 produces both RNA and protein products (Vanhee-Brossollet et al., 1996) (see Fig. 1). The asp ORF is highly conserved among all strains and subtypes of HIV-1 but absent from HIV-2 sequences (Miller, 1988; Bukrinsky and Etkin, 1990; Briquet and Vaquero, 2002) and encodes a highly hydrophobic protein with two transmembrane helices. In vitro studies indicate that the ASP protein is recognized by antibodies present in the sera of HIV+ individuals (Vanhee-Brossollet et al., 1996); however, the HIV-1 ASP protein has not yet been detected in vivo.

FIG. 1.

Location and organization of asp and its associated sORFs. (A) The HIV-1 genome with the negative-sense gene asp and (B) its associated sORFs indicated in the negative-sense orientation with nucleotide position numbers indicated below from NL4-3. Locations of primers used to amplify the sORF region are indicated. (C) Four alternative transcripts were detected for the sORF region: unspliced, spliced variant 1 (containing sORF VI only), spliced variant 2 (containing sORFs I, II, and VI), and spliced variant 3 (containing part of sORF VI). Splice donor and acceptor motifs shown with 100% matches to consensus underlined. sORF, short open reading frames; HIV-1, human immunodeficiency virus type 1.

Endogenously expressed asp RNA has been shown to inhibit replication of HIV-1, but not HIV-2 (Tagieva and Vaquero, 1997). Recent in vitro studies confirm localization of ASP to the plasma membrane (Clerc et al., 2011), while earlier studies suggested that the ASP protein is also present in viral particles released from HIV-1-infected cells (Briquet and Vaquero, 2002). These data suggest that asp could play a pivotal role in the life cycle of HIV-1; however, a functional role for the ASP protein is yet to be established. The presence of a repeated (PxxP) amino acid sequence motif, which is associated with signaling kinase–SH3 domain interaction in HIV-1 Nef (Greenway et al., 2003) and Hepatitis E ORF-3 protein (Korkaya et al., 2001), may provide a clue to the function of ASP.

Early studies conducted by Bukrinsky and Etkin (1990) detected three polyadenylated negative-sense RNA transcripts (1.6, 1.1, and 1.0 kb) in acutely infected H9 cells; these transcripts were present early in infection (day 3) but were not detected later (on days 5 or 7). Michael et al. (1994) confirmed the presence of a negative-sense transcript in tissue cultures and in peripheral blood mononuclear cells (PBMCs) isolated from infected patients; sequence analysis predicted a full-length transcript of 2.3 kb. More recently, Landry et al. (2007) provided further evidence for negative-sense transcription in HIV-1, identifying an alternative poly-A signal that would produce a 4.1-kb transcript; however, this longer negative-sense transcript was not detected by northern blot analysis.

The mechanism of regulation of asp expression remains unclear to date. In vitro studies have confirmed that transcription of the negative-sense ORF occurs early in infection, controlled by long terminal repeat (LTR) sequences (Peeters et al., 1996; Pereira et al., 2000; Bentley et al., 2004). The role of the transcriptional activator, Tat, in negative-sense transcription is yet to be fully elucidated. Both Michael et al. (1994) and Bentley et al. (2004) suggested that Tat simultaneously upregulates positive-sense transcription and downregulates negative-sense transcription, so it may switch off expression of negative-sense genes soon after infection. However, more recent studies by Landry et al. (2007) suggest that Tat acts to upregulate negative-sense transcription.

In this study, we report the strong conservation of a series of short open reading frames (sORFs) upstream of the asp gene and demonstrate the ability of this sORF region to regulate Asp expression in a reporter gene system. We detected alternative splicing of this sORF region, associated with varying levels of inhibition of the downstream reporter gene. Together, these data suggest that this upstream sORF region could play a role in regulating expression of HIV-1 asp.

Materials and Methods

Bioinformatic analyses

All sequences were scanned for large ORFs, opposite env, using DNAMAN® (Lynnon BioSoft). Sequences included 31 HIV-1 complete sequences spanning subtypes A (SE6594, UG037), B (NL4-3, OYI, MN, ACH1, BRU, CAM1), C (92BR025, ETH2220), D (94UG114, ELI, NDK), E (92NG083, 93TH253), F1 (FIN9363, VI850), F2 (CM53657, MP257), G (DRCBL, HH8793, X558), H (V1991, V1997), J (SE9173, SE9280), K (MP535), N (DJO 0131, YBF30, YBF100), and U (83CD00323). Upstream DNA sequences equivalent to nts 8742 to 7932 of HIV-1 (NL4-3) were collected and subjected to multiple sequence alignment (Vector NTI Advance™; Invitrogen). The consensus sequence was assessed for the location and number of sORFs, sequence similarity to the consensus, and strength of initiation codon using DNAMAN.
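The negative-strand ORF scan described above was performed with DNAMAN; the underlying idea can be illustrated with a minimal Python sketch. The function names, the `min_codons` threshold, and the toy sequence in the usage note are assumptions made for illustration and do not reproduce the software or parameters used in this study.

```python
# Minimal sketch: list ATG-initiated ORFs on the negative strand of a DNA sequence.
# All names and the min_codons threshold are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for each ATG...stop ORF on the given strand.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    min_codons counts the ATG plus sense codons (stop codon excluded).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # resume scanning after the stop
    return sorted(orfs)


def negative_strand_orfs(plus_strand, min_codons=2):
    """Scan the reverse complement of a positive-sense sequence for ORFs."""
    return find_orfs(reverse_complement(plus_strand), min_codons)
```

For example, `negative_strand_orfs("GGCTATTTCATGG")` scans the reverse complement `CCATGAAATAGCC` and reports the single toy ORF `ATG AAA TAG` in frame 2. Coordinates are relative to the reverse-complemented sequence and would need to be mapped back to genome numbering.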

A total of 27 HIV-1 subtype B sequences were collected to examine the conservation of splice acceptor and donor motifs. Sequences included MBC925, 1027-03, TWCYS, 01UYTRA1179, WC1PR, MBCC98, WC10P-6, WC10C-11, PCM034, S61G7, CANA1, CANB6, 85US Ba-L, HIVMCK1, HIV2132, BH10, NL4-3, 1058-11, 5157-86, US4, PCM013, 02HNsc11, 05AR163052, BZ167, 1001-09, PCM001, and HIVOYI.

Cell culture

HEK293 cells were maintained in Dulbecco's Modified Eagle Medium (DMEM) supplemented with 10% fetal bovine serum (Invitrogen), 2 mM L-glutamine, 100 U/mL penicillin G, and 100 μg/mL streptomycin at 5% CO₂ and 37°C. For transfection, cells were seeded at a density of 1.6×10⁵ cells/mL in 2.0 cm² culture dishes.

Reporter gene constructs

The pEGFP (sORF I-VI wt) construct was derived from pNL4-3 DNA by subcloning the sORF segment (position 8798 to 7980 comprising the start codon of sORF I to the stop codon of sORF VI, respectively) into the pGEM®-T Easy vector system (Promega) with the addition of an AgeI site by the polymerase chain reaction (PCR) primers, forward: 5′-CTACCGGTCTCCAGGCAAGAATCC-3′ and reverse: 5′-CTACCGGTCTTGCCACCCATCTTATAGC-3′ (AgeI site shown in bold). The sORF segment was cloned into the AgeI site of the pEGFP-N1 vector (Clontech) such that the stop codon of sORF VI sits 63 bp upstream of the ATG codon of the reporter EGFP.

The parental plasmid, pEGFP-N1, was used as positive control in all transfection experiments. The negative control plasmid, pE-N1, was kindly provided by Dr. D. Purcell (Department of Microbiology and Immunology, The University of Melbourne, Australia) and is derived from pEGFP-N1 by excision of the reporter EGFP gene with AgeI and NotI, end filling by Klenow and recircularization.

Site-directed mutagenesis was used to knock out the ATG codon of each sORF sequentially, starting at sORF VI. Mutagenesis was carried out using the QuikChange® II Site-Directed Mutagenesis Kit (Stratagene) according to the manufacturer's instructions. The following primers (and their complementary primers) were used to mutate the ATG codons of each sORF, with the mutated sequence in bold. sORF I: 5′-GCTTATAGAGCTATTCGCCACCTACCTAGAAGAATAAGACAGG-3′; sORF II: 5′-GCTCAATGCCACAGCCCTAGCAGTAGCTGAGGGG-3′; sORF III: 5′-GGAGAGAGAGACAGGAGAACAGATCCCTTCGATTAGTGAACGG-3′; sORF IV: 5′-GGCAGGGATATTCACCCTTATCGTTTCAGACCC-3′; sORF V: 5′-GGCTGTGGTATATAAAATTATTCCTAATGATAGTAGGAGGC-3′; sORF VI: 5′-GGAACAGATTTGGAATAACCTGACCTGGATGGAGTGG-3′. The SA 1 and SD 1 sites were mutated using the following primers along with their complementary primers (mutated sequence shown in bold). SD 1: 5′-GAGCTATTCGCCACATATCTAGAAGAATAAGACAGG-3′; SA 1: 5′-GAAATAACATGACTTGGATGGAGTGGGACAGG-3′.

Spliced variants 1, 2, and 3 were cloned individually into the parental plasmid, pEGFP-N1, using the In-Fusion™ Advantage PCR Cloning Kit (Clontech) according to the manufacturer's instructions. The following sets of primers were used to amplify and linearize the parental plasmid, pEGFP-N1, for variants 1 and 2: 5′-GATCCACCGGTCGCC-3′ and 5′-AATTCGAAGCTTGAGCTCG-3′, and for variant 3: 5′-GCTCGAGATGTGAGTCCGGTAGC-3′ and 5′-CCATGGTGAGCAAGGGCG-3′. The following primers were used to amplify the sequence unique to each spliced variant: for spliced variant 1: 5′-CTCAAGCTTCGAATTCCACCCATCTTATAGCAAAATCC-3′ and 5′-TTGGAATAACATGACACCTAGAAGAATAAGACAGGCTTGG-3′, spliced variant 2: 5′-CTCAAGCTTCGAATTCCACCCATCTTATAGCAAAATCC-3′ and 5′-TTGGAATAACATGACCTCTTGATTGTAACGAGGATTGTG-3′, and spliced variant 3: 5′-GCTACCGGACTCAGATCTCGAGC-3′ and 5′-CGCCCTTGCTCACCATGG-3′. Individual reactions for variants 1 and 2 were in-fused with the common sequence, which was amplified with the following primers for splice variant 1: 5′-CTTATTCTTCTAGGTGTCATGTTATTCCAAATCTGTTCCAG-3′ and 5′-GGCGACCGGTGGATCGGATCAACAGCTCCTGGGG-3′ and for splice variant 2: 5′-CGTTACAATCAAGAGGTCATGTTATTCCAAATCTGTTCCAG-3′ and 5′-GGCGACCGGTGGATCGGATCAACAGCTCCTGGGG-3′. All plasmid constructs were sequenced to check integrity.

Transient transfections and reporter gene assays

HEK293 cells were transfected with 1 μg DNA using the calcium phosphate ProFection® transfection system (Promega) according to the manufacturer's instructions. In all transfection experiments, pEGFP-N1 and pE-N1 were used as positive and negative controls, respectively. Transfected cells were harvested 48 hours after transfection by lysis with 1× Reporter lysis buffer (Promega), and EGFP fluorescence was read with the FLUOStar OPTIMA microplate reader (BMG Labtech) at excitation and emission wavelengths of 485 and 520 nm, respectively. For normalization, each sample was assayed for total protein using the Bradford method as described by the manufacturer (Sigma), and the absorbance was measured using the FLUOStar OPTIMA microplate reader (BMG Labtech). EGFP was normalized according to total cell protein (EGFP fluorescence/mg of protein); each result represents the calculated mean±SE of six transfected samples.
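The normalization and summary statistics described above can be sketched in Python as follows. This is a minimal illustration only: the function names are hypothetical, the standard error is assumed to be the sample standard deviation divided by √n, and no background subtraction is applied because none is specified in the text.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_egfp(fluorescence, protein_mg):
    """EGFP fluorescence per mg of total cell protein (Bradford estimate)."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return fluorescence / protein_mg


def mean_and_se(values):
    """Return (mean, standard error) for replicate normalized readings.

    Assumption: SE = sample standard deviation / sqrt(n).
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are required")
    return mean(values), stdev(values) / sqrt(n)
```

With six transfected samples per construct, `mean_and_se` would be called on the six values returned by `normalized_egfp`.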

Transcript analysis

Transfected HEK293 cells were harvested for total RNA at 48 hours after transfection using the SV Total RNA isolation system (Promega) according to the manufacturer's instructions. RNA samples were treated with RQ1 RNase-free DNase (Promega) prior to cDNA synthesis, according to the manufacturer's instructions. cDNA was synthesized from total RNA (5 μg) using the Transcriptor first-strand cDNA synthesis kit (Roche) according to the manufacturer's instructions, primed by oligo (dT). Samples of the cDNA were then PCR-amplified using primers (forward: 5′-GCTACCGGACTCAGATCTCGAGC-3′ and reverse: 5′-CCTGGCTGTGGAAAGATACC-3′, except for spliced variants 1, 2, and 3, where the reverse primer was replaced with 5′-CGCCCTTGCTCACCATGG-3′). Reactions were established in the presence of 1.25 U Taq DNA polymerase (New England Biolabs), 1× ThermoPol buffer, 20 μM dNTP, and 20 μM of each primer. PCR conditions were as follows: initial denaturation of 94°C for 1 min followed by 30 cycles of denaturation (94°C for 30 s), annealing (62°C for 30 s), and extension (72°C for 1 min) with a final extension of 72°C for 5 min. RT-PCR amplification reactions were controlled for DNA contamination (RNA sample with no RT), auto-priming (cDNA synthesis in the presence of RT with no primer), and general transcript integrity with glyceraldehyde 3-phosphate dehydrogenase (GAPDH)-specific primers (5′-TGCACCACCAACTGCTTAGC-3′ and 5′-GGCATGGACTGTGGTCATGAG-3′). Amplified products were cloned in pGEM-T Easy (Promega) prior to sequencing.

Northern analysis was conducted using 1 μg total RNA, blotted onto a nylon membrane, probed with DIG-labeled EGFP and GAPDH cDNA, and then detected by a luminescence assay according to the manufacturer's instructions (Roche). Message abundance was measured using the densitometry tool of the Dolphin-DOC image system (Wealtec), and the total EGFP message was normalized against total GAPDH message. EGFP message abundance is presented as transcript relative to GAPDH message and represents the calculated mean±SE of three samples.

Quantitative real-time PCR analysis

TaqMan® RT-PCR assays (Applied Biosystems) were carried out to determine the relative amounts of each spliced variant within the single mRNA pool. TaqMan probes were designed to span exon junctions, thus targeting each species. Primers and TaqMan probes for unspliced variant are forward: 5′-TCTCTCCACCTTCTTCTTCTATTCCTT-3′, reverse: 5′-CAGACCCACCTCCCAATCC-3′, and probe: 5′-CTGTCGGGTCCCCTC-3′; for spliced variant 1 are forward: 5′-GCGTCCCAGAAGTTCCACAAT-3′, reverse: 5′-GCCTTGGAATGCTAGTTGGAGTAAT-3′, and probe: 5′-CAAGAGTCATGTTATTCC-3′; for spliced variant 2 are forward: 5′-CCCTGTCTTATTCTTCTAGGTCATGTT-3′, reverse: 5′-TGCCTTGGAATGCTAGTTGGA-3′, and probe: 5′-TCTGTTCCAGAGATTTATTA-3′; and for spliced variant 3 are forward: 5′-AAATCCTTTCCAAGCCCTGTCT-3′, reverse: 5′-CCACTGCTGTGCCTTGGAAT-3′, and probe: 5′-TCTAGAGATTTATTACTCCAACTAG-3′. Individual reactions were established according to the TaqMan Fast Universal PCR Master Mix, without an AmpErase UNG (Applied Biosystems), along with the appropriate primer/probe set and cDNA template (as described previously) in a total volume of 20 μL. Isolated spliced variants, cloned into pGEM-T Easy (as described previously), were used as standards and used to check primer/probe specificity along with a –RT sample to ensure no DNA contamination. The concentrations of starting material were measured using the Quant-iT™ dsDNA HS assay with the Qubit™ fluorometer (Life Technologies) and copy number was calculated. The thermal cycling protocol consisted of an initial melt at 95°C for 20 s followed by 40 cycles of 95°C for 3 s and 60°C for 30 s using the ABI 7500 Fast Real-Time PCR System (Applied Biosystems). The Ct values were calculated using SDS 2.4 software to generate calibration curves and calculate copy numbers of target transcripts.
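The copy-number and calibration-curve calculations referred to above were performed with the instrument software. The following Python sketch illustrates the standard arithmetic under stated assumptions: an average mass of 660 g/mol per base pair of double-stranded DNA, a linear fit of Ct against log10(copy number), and efficiency defined as 10^(−1/slope) − 1. The function names are hypothetical and the sketch is not the SDS implementation.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 660.0    # assumed average mass of one dsDNA base pair


def copies_per_microlitre(conc_ng_per_ul, length_bp):
    """Convert a dsDNA concentration (ng/uL) to template copies per uL."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    return grams_per_ul * AVOGADRO / (length_bp * BP_MASS_G_PER_MOL)


def fit_standard_curve(log10_copies, ct_values):
    """Least-squares line Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def efficiency_percent(slope):
    """Amplification efficiency; a slope of about -3.32 corresponds to 100%."""
    return (10 ** (-1.0 / slope) - 1.0) * 100.0


def copies_from_ct(ct, slope, intercept):
    """Interpolate the copy number of an unknown from its Ct value."""
    return 10 ** ((ct - intercept) / slope)
```

A dilution series of each cloned standard would supply `log10_copies` and `ct_values`; the fitted slope and intercept are then used to convert the Ct of each transcript species in the cDNA sample to a copy number.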

Results

Asp and its sORFs are highly conserved

Multiple sequence alignment of the region upstream of the asp ORF (nts 8742 to 7932 in HIV-1 NL4-3) in 31 HIV-1 sequences, representing all subtypes, revealed a series of well-conserved sORFs that displayed nucleotide sequence conservation >80% with reference to the consensus sequence (Fig. 1B). The major infecting HIV-1 subtypes (B, C, and D) displayed 5–7 sORFs (typically 6 sORFs), while fewer sORFs (3–5) were observed in the rarer subtypes (H, K, and N).

The sequences of sORFs I, V, and VI were highly conserved among all subtypes, with strong conservation of the initiation codon (refer to supplementary data). Nucleotide changes in the sORF VI ATG initiation codon were observed in only 6 sequences of the 31 sequences examined: B (CAM1), G (X558), N (YBF30, YBF100, DJO 0131), and U (83CD00323). Three of these (CAM1, 83CD00323, and X558) displayed an alternate initiation codon further downstream, so they possessed a truncated sORF VI. The sequences of sORFs II, III, IV, and V were less well conserved (between 63% and 96%).

The translation initiation context of each sORF was assessed as weak, adequate, or strong by comparison to the Kozak consensus sequence (Kozak, 1986). The sequence context of sORFs I (26/31 sequences), V (21/31), and VI (25/31) is typically adequate, consistent with other sORF-regulated systems, such as that of the yeast GCN4 (Hinnebusch, 1997). Interestingly, sORFs II and IV display a weak initiation codon context in most HIV subtypes, but a strong initiation context is observed in B and C clade strains (Table 1).
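A weak/adequate/strong classification of this kind can be sketched in Python as follows. The rule used here is a commonly quoted simplification of the Kozak consensus (a purine at position −3 and a G at position +4, with the A of the ATG as +1) and is an assumption; the exact criteria applied by DNAMAN in this study are not stated. The function name is hypothetical.

```python
def kozak_strength(seq, atg_index):
    """Classify the initiation context of the ATG starting at atg_index.

    Assumed rule: 'strong' if there is a purine at -3 and a G at +4,
    'adequate' if only one of the two is present, 'weak' if neither.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    # Position -3 is three bases upstream of the A; +4 follows the G of ATG.
    minus3 = seq[atg_index - 3] if atg_index >= 3 else ""
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else ""
    hits = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return ("weak", "adequate", "strong")[hits]
```

For instance, the context `GCCACCATGG` has an A at −3 and a G at +4 and is classified as strong, whereas `GCCTCCATGT` has neither and is classified as weak.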

Table 1.

Nucleotide Sequence Conservation of Short Open Reading Frames and Initiation Codon Strength

		sORF
Subtype	Strain	I	II	III	IV	V	VI
A	SE6594	90	90	87	95	86	89
A	UG037	90	88	88	90	85	92
B	NL4-3	90	88	91	90	88	87
B	OYI	81	83	90	95	89	91
B	MN	95	88	86	95	87	89
B	ACH1	95	83	88	95	87	90
B	BRU	95	90	91	90	88	87
B	CAM1	100	88	86	95	90	89
C	92BR025	76	79	94	95	85	91
C	ETH2220	67	90	94	95	84	92
D	94UG114	81	90	90	95	83	88
D	ELI	81	83	90	95	87	88
D	NDK	81	88	90	95	88	86
E	92NG083	86	90	81	95	88	95
E	93TH253	89	73	77	95	82	83
F	FIN9363	86	96	90	95	84	94
F	VI850	86	92	94	95	88	94
F	CM53657	95	88	87	95	80	57
F	MP257	86	88	92	95	81	53
G	DRCBL	86	90	89	95	83	94
G	HH8793	86	90	86	95	85	93
G	X558	76	85	86	90	87	93
H	V1991	90	94	-	-	-	-
H	V1997	90	94	92	95	84	91
J	SE9173	90	98	86	95	85	93
J	SE9280	90	98	87	95	85	93
K	MP535	81	96	90	95	88	91
N	DJO 0131	57	75	75	81	63	67
N	YBF30	57	81	77	81	65	66
N	YBF100	57	77	83	81	63	66
U	83CD003Z3	90	98	85	87	84	95

The percentage homology of the sORF sequence to the consensus sequence (generated from all 31 sequences) is indicated by the numerical value in each box. The presence of an initiation codon is indicated by numerical values in bold type. Kozak initiation strength of each sORF is indicated by shading; dark, medium, and unshaded boxes represent strong, adequate, and weak initiation codons, respectively.

Note that the strength of sORF initiation codons is typically adequate, whereas sORFs II and IV are typically strong in subtype B. sORFs, short open reading frames.

Asp sORFs inhibit expression of a downstream reporter gene

The region nt 8798 to 7980 from pNL4-3, containing six sORFs (sORF I–VI wt), was cloned upstream of the reporter gene EGFP in the plasmid pEGFP-N1 and transfected into HEK293 cells. EGFP expression in cells transfected with the pEGFP-(sORF I-VI wt) construct was compared to cells transfected with the base plasmid, pEGFP-N1, and pE-N1 (lacking the EGFP gene).

The presence of the upstream sORF region reduced EGFP expression by at least 95% (p≤0.01, n=6) (Fig. 2A), similar to levels of inhibition observed in CEM and HEK293T cell lines (unpublished data: N. Vardarli and S. Yap). Northern blot analysis indicated no significant difference in abundance of the reporter EGFP transcript (normalized to GAPDH transcript) in pEGFP-N1 and pEGFP-(sORF I-VI wt) transfected cells (p=0.345, n=3) (Fig. 2B).

FIG. 2.

EGFP expression is affected by upstream sORFs. The presence of the upstream sORF region (construct I–VI wt) reduced EGFP expression by 95% in comparison to the parental plasmid. Mutation of all sORFs (construct NIL) results in a 45% increase in EGFP expression compared to the I-VI wt (p<0.01). There is no effect on the abundance of transcript. (A) EGFP assay of HEK293 cells transfected with constructs containing sORFs I–VI upstream of the reporter EGFP and the initiation codons of all sORFs knocked out (construct NIL). Plasmids pEGFP-N1 and pE-N1 were used as positive and negative controls, respectively. All data represent the mean±SE of six individual experiments. (B) Transcript abundance, consistent across all constructs. All data represent the mean±SE of three individual experiments.

The initiator AUG codon of each sORF was mutated, producing a construct without active sORFs (construct NIL). In the absence of active sORFs, a 45% increase in expression compared to the wild-type sORF construct (p<0.01, n=6) (Fig. 2A) was observed; however, transcript abundance was unaffected (Fig. 2B). This suggests that events occurring at the level of translation, rather than transcription, are associated with modulation of gene expression by the asp sORF region.

Alternative splicing of the sORF region and conservation of splice sites

Primers situated immediately up- and downstream of the sORF region (see Fig. 1B) were used to detect the sORF transcript in transfected HEK293 cells. The lack of nonspecific amplification was confirmed by the absence of product from cells transfected with the pEGFP-N1 and pE-N1 plasmids. An amplification product corresponding to the full-length construct transcript (890 bp) was amplified from the cDNA pool (Fig. 3E, fourth lane); however, it was not the most abundant product. At least two shorter amplimers were also detected in the same cDNA pool: an abundant product (240 bp) and a less abundant product (480 bp). All amplified products were recovered, cloned, and sequenced.

FIG. 3.

Analysis of spliced variants. Spliced variant 1 increased EGFP expression by 22% (p=0.021) in comparison to the wild type, while spliced variant 2 decreased EGFP expression by 55% (p<0.01) and spliced variant 3 produced a fourfold increase in EGFP expression (p<0.01). Transcript abundance was consistent for all constructs. Reverse transcriptase polymerase chain reaction (RT-PCR) analysis of spliced variants 1 and 2 reveals full transcript within variant 1 and suggests that spliced variant 2 is further processed to produce spliced variant 3. (A) EGFP assay of HEK293 cells transfected with constructs containing sORFs I–VI upstream of the reporter EGFP and constructs containing spliced variant 1 (sORF VI), spliced variant 2 (sORFs I, II, and VI), and spliced variant 3 (sORF VI_alt). All data represent the mean±SE of six individual experiments. (B) Transcript abundance, consistent across all constructs. All data represent the mean±SE of three individual experiments. RT-PCR analysis performed on RNA samples from HEK293 cells. (C) Analysis of constructs containing spliced variant 1 (sORF VI alone), spliced variant 2 (sORFs I, II, and VI), (D) spliced variant 3 (sORF VI_alt), and (E) I-VI wt (initial spliced products indicated with a star). Samples were assessed for DNA contamination (-RT) and RNA priming (-OLI); no contamination was detected. Lanes marked + depict the cDNA samples. The cDNA samples show the amplification products from a sole product (240-bp amplimer) in spliced variant 1. Spliced variant 2 depicts two products: the major spliced variant 2 (480-bp amplimer), indicated with a star, and a sub-spliced product, spliced variant 3 (240-bp amplimer). GAPDH controls shown below, indicating integrity of the transcript pool. PCR, polymerase chain reaction; GAPDH, glyceraldehyde 3-phosphate dehydrogenase.

Alignment of the sequences of the two shorter amplimers with the full-length transcript revealed that splicing had occurred within the transcript (data not shown). The two spliced sequences used the same splice acceptor SA 1 (immediately before sORF VI, NL4-3 nt. position 8095) but alternative splice donor sites, SD 1 and SD 2 (NL4-3 nt. positions 8745 and 8540, respectively) as depicted in Figure 1C. The 240-bp splice product (spliced variant 1) resulted from the removal of sORFs I to V inclusive (using SD 1 and SA 1), while the 480-bp splice product (spliced variant 2) resulted from the removal of sORFs III to V inclusive (using SD 2 and SA 1).

Spliced variants 1 and 2 were cloned separately into the plasmid pEGFP-N1, and these constructs were transfected into HEK293 cells; EGFP expression, transcript abundance, and transcript integrity were analyzed as described earlier. Spliced variant 1, which contains only sORF VI, produced a slight increase (22%) in EGFP expression compared to the wild type (p=0.021, n=6) (Fig. 3A), but transcript abundance did not change (Fig. 3B). In contrast, spliced variant 2, which contains sORFs I, II, and VI, produced a 55% decrease in EGFP expression compared to the wild type (p<0.01, n=6).

Transcript RT-PCR analysis (Fig. 3C) showed that the predominant species produced in spliced variant 1 transfected cells was the expected 240-bp spliced product, containing sORF VI alone (Fig. 3C, lane 2). However, an analysis of spliced variant 2-transfected cells (Fig. 3C, lane 5) revealed two amplified products: the expected 480-bp spliced variant 2, containing sORFs I, II, and VI, and a smaller, less abundant product (214 bp). All amplified products were again recovered, cloned, and sequenced as detailed previously.

Alignment of the sequences of the three products (data not shown) revealed that splicing within the transcript of spliced variant 2, involving SD 1, and an alternative splice acceptor, SA 2 (NL4-3 nt. position 8069) 26-bp downstream of SA 1, had produced a third product (spliced variant 3, Fig. 1C) of similar size to spliced variant 1. This splicing event results in the complete removal of sORFs I and II and the ATG initiation codon of sORF VI, presenting an alternative, in-frame sORF ATG codon (weak Kozak sequence context) 69 bp further downstream (NL4-3 nt. position 8022); this alternative sORF is designated sORF VI_alt. Spliced variant 3 was not detected in any of the six colonies screened in the initial RT-PCR and cloning experiments involving spliced variant 1, suggesting that splicing to produce variant 3 is an infrequent event.
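The relationship between the splice-site coordinates and the observed amplimer sizes can be checked with a short calculation. The Python sketch below assumes that the excised segment is simply the difference between the donor and acceptor coordinates (numbering decreases along the negative-sense transcript) and that the unspliced amplimer is 890 bp, as reported above; the function name and this coordinate convention are assumptions.

```python
FULL_LENGTH_BP = 890  # unspliced amplimer size reported in the text

# NL4-3 coordinates of the splice sites reported in the text.
DONORS = {"SD 1": 8745, "SD 2": 8540}
ACCEPTORS = {"SA 1": 8095, "SA 2": 8069}


def spliced_amplimer_size(donor_pos, acceptor_pos, full_length=FULL_LENGTH_BP):
    """Expected amplimer size after removing the donor-to-acceptor segment.

    Assumption: the excised length equals donor_pos - acceptor_pos.
    """
    removed = donor_pos - acceptor_pos
    if removed <= 0:
        raise ValueError("donor must lie upstream of acceptor (higher number)")
    return full_length - removed
```

Under these assumptions, SD 1/SA 1 gives 240 bp and SD 1/SA 2 gives 214 bp, matching the reported sizes of spliced variants 1 and 3. The same subtraction gives 445 bp for SD 2/SA 1 against the roughly 480-bp amplimer reported for spliced variant 2, so the gel-based estimate or the coordinate convention may differ slightly for that product.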

Analysis of expression showed that spliced variant 3, which contains only the sORF VI_alt ATG initiation codon, produced a fourfold increase in EGFP expression compared to the wild type (p<0.01, n=6) with no change in transcript abundance (Fig. 3A, B, respectively). Transcript RT-PCR analysis (Fig. 3D, lane 2) confirmed that the predominant species produced in variant 3-transfected cells was of the size expected for the product containing sORF VI_alt.

Analysis of the HIV-1_NL4-3 sequence revealed typical splice donor (at nt 8745 and 8540) and splice acceptor (at nt 8095 and 8069) sequences. SD 1 (NL4-3 nt. position 8745) was 100% conserved across 27 HIV-1 subtype B sequences examined, while SD 2 (NL4-3 nt. position 8540) of variant 2 was slightly less well conserved (86 and 93% conservation of the two critical nucleotides, respectively) (Fig. 4A). SA 1 was 100% conserved across all 27 HIV-1 B clade sequences examined (Fig. 4B). Analysis of the SA 2 site showed that conservation of the vital A and G nucleotides at positions 13 and 14 was poor (30% and 65%, respectively). There is potential for production of a fourth spliced product, from the moderately conserved SD 2 and the poorly conserved SA 2, but this product was never detected in our experiments.
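Per-position conservation values of the kind reported above can be computed from aligned motif sequences as sketched below in Python. The function name is hypothetical, and the motifs in the usage note are toy sequences rather than data from this study.

```python
def position_conservation(aligned_motifs, reference):
    """Percentage of sequences matching the reference at each motif position.

    aligned_motifs: equal-length, pre-aligned motif strings (one per isolate).
    reference: motif of the reference strain (for example NL4-3).
    """
    if not aligned_motifs:
        raise ValueError("no sequences supplied")
    if any(len(m) != len(reference) for m in aligned_motifs):
        raise ValueError("all motifs must have the same length as the reference")
    n = len(aligned_motifs)
    return [
        100.0 * sum(m[i].upper() == ref.upper() for m in aligned_motifs) / n
        for i, ref in enumerate(reference)
    ]
```

For the toy input `["CAGGTAAGT", "CAGGTAAGT", "CAGATAAGT", "CAGGTGAGT"]` with reference `"CAGGTAAGT"`, the fourth and sixth positions score 75% and all others 100%. Applied to the 27 subtype B sequences, the entries at the critical donor or acceptor positions would give the percentages quoted in the text.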

FIG. 4.

Conservation of splice donor and acceptor motifs. Essential sequence components of splice donor 1 and splice acceptor 1 are 100% conserved across 27 HIV-1 subtype B sequences. (A) The splice donor motif, with position numbers (with respect to the splice site) indicated below. The splice site is indicated by the vertical line and essential components of the splice site are indicated in italics. The conservation of each residue for both SD 1 and SD 2 is expressed as a percentage of the 27 HIV-1 subtype B sequences randomly selected and examined for motif conservation. The SD 1 motif is more highly conserved than SD 2 in the sequences examined. (B) The splice acceptor motif, with position numbers (NL4-3) indicated below. The splice site is indicated by the vertical line and essential components of the splice site are indicated in italics. The critical residues of the splice acceptor (highlighted) were both 100% conserved across the 27 HIV-1 subtype B sequences examined for SA 1, utilized by variants 1 and 2. SA 2, utilized by variant 3, exhibits a lower degree of conservation.

Real-time PCR was used to determine the relative levels of alternatively spliced and unspliced transcripts in the cDNA pool obtained from HEK293 cells transfected with the pEGFP-(sORF I-VI wt) construct. TaqMan probes specific to each species (unspliced product and spliced variants 1, 2, and 3) were used. All four primer/probe sets amplified with efficiencies between 90% and 110% (Fig. 5A). The probe specificity was confirmed using positive control DNA; the absence of amplification in the –RT samples indicated that the samples were not contaminated with DNA. Copy numbers indicate that variants 1 and 3 are the most abundant products (Fig. 5B), present in a 3:2 ratio (p<0.01, n=3). In combination, these two transcripts are eightfold more abundant than the unspliced transcript, consistent with our RT-PCR experiments (Fig. 3E, Lane 4). Spliced variant 2 was the least abundant of all the transcripts, equivalent to 6% of the unspliced transcript (p<0.01, n=3).

FIG. 5.

Quantitative real-time PCR analysis of transcript abundance. Real-time PCR analysis confirms that spliced variants 1 and 3 are the most abundant products present at a ratio of 3:2 (p<0.01) and, in combination, are eightfold more abundant than the unspliced transcripts. Spliced variant 2 was the least abundant of all the transcripts. (A) The efficiencies of each primer/probe with all four sets depicting efficiencies between 90% and 110%. (B) Copy numbers of each transcript (unspliced and spliced variants 1, 2, and 3) within the same cDNA sample. Samples were assessed for DNA contamination (-RT) and primer specificity; no contamination was detected. Copy numbers obtained for each transcript were significantly different (p<0.01). All data represent the mean±SE of triplicate samples.

Effects of mutating the SA 1 and SD 1 motifs

Constructs in which the SD 1 and/or SA 1 sites had been mutated were transfected into HEK293 cells alongside the pEGFP-(sORF I-VI wt) construct and controls (pEGFP-N1, pE-N1); analysis of reporter gene expression, transcript abundance, and transcript integrity was performed as previously described.

Mutation of the conserved splice donor (SD 1) produced a 28% increase in EGFP expression compared to the wild type (p=0.003, n=6) (Fig. 6A); transcript abundance did not change (Fig. 6B). RT-PCR analysis (Fig. 6C, seventh lane) showed that both the full-length, unspliced transcript and spliced variant 2 (produced using the alternative splice donor site, SD 2, at nt 8540) were present (confirmed by the PCR product sequence analysis, Fig. 6C, Lane 7, starred). Note that spliced variant 3 was not detected due to the absence of SD 1.

FIG. 6.

Effects of mutating the major splice donor and acceptor motifs. Mutation of splice donor 1 increased EGFP expression by 28% (p=0.003) in comparison to the wild type. Mutation of splice acceptor 1 (with or without disruption of splice donor 1) decreased EGFP expression to ∼17% of wild-type levels. Transcript abundance is consistent for all constructs. RT-PCR analysis of the splice donor 1 mutant reveals splicing to produce spliced variant 2, while mutation of splice acceptor 1, with or without mutation of splice donor 1, abolishes splicing activity. (A) EGFP assay of HEK293 cells transfected with constructs containing sORFs I–VI upstream of the reporter EGFP with mutation of the SD 1 and SA 1 motifs. All data represent the mean±SE of six individual experiments. (B) Transcript abundance, consistent across all constructs. All data represent the mean±SE of three individual experiments. (C, D) RT-PCR analysis performed on RNA samples from HEK293 cells: (C) constructs with and without mutation of the SD 1 and SA 1 motifs and (D) construct containing mutation of both SD 1 and SA 1, along with the pEGFP-N1 and pE-N1 parental plasmid controls. Samples were assessed for DNA contamination (-RT) and RNA priming (-OLI); no contamination was detected. Lanes marked + depict the cDNA samples. Lanes marked I–VI wt show the amplification products from the alternative transcripts detected: the major products, spliced variants 1 and 3 (240-bp amplimer), and the less abundant product, spliced variant 2 (480-bp amplimer), are indicated with a star; the unspliced transcript is indicated by the 890-bp amplimer. The lane marked SD 1 (construct in which the SD 1 motif was mutated) displays increased use of SD 2. The lanes marked SA 1 (construct in which the SA 1 motif was mutated) and SD 1 & SA 1 (construct in which both motifs were mutated) contain only the 840-bp amplification product of the full-length transcript, indicating loss of splicing activity. GAPDH controls, shown below, indicate the integrity of the transcript pool.

Similar results were obtained when SA 1 was disrupted in either the presence or absence of SD 1. Production of EGFP was reduced to ∼17% of wild-type levels (Fig. 6A), and transcript abundance was again unaffected (Fig. 6B). The predominant cDNA/mRNA species in each instance was the full-length, unspliced transcript; no splice products were detected (Fig. 6C, lanes 4 and 10).

Discussion

The expression of negative-sense genes in retroviruses was contentious until negative-sense transcription in HTLV-1 was shown to produce a protein of functional importance, HBZ (Gaudray et al., 2002; Arnold et al., 2006). While HBZ has been shown to be a transcription factor, no functional role has yet been attributed to HIV-1 ASP.

While the conservation of the asp ORF in HIV-1 has been well established (Miller, 1988), this work describes the presence and high-level conservation of a series of sORFs upstream of the asp ORF across all subtypes of HIV-1. Upstream sORFs are typically associated with translational regulation of the downstream gene product; examples include many genes for which tight regulation of expression is critical (Davuluri et al., 2000; Morris et al., 2000; Suzuki et al., 2000). Here, we have demonstrated that the sORF region upstream of HIV-1 asp regulates the expression of a downstream reporter gene without any change in transcript abundance. Regulation was abolished by mutation of the sORF AUG initiation codons, suggesting a translational mechanism of regulation.
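The idea of a leader containing several short ORFs, and of removing them by mutating their initiation codons, can be illustrated with a simple scan for ATG-to-stop reading frames. This is a sketch under stated assumptions: the function name, the minimum-length threshold, and the toy leader sequence are all invented for illustration, and this is not the in silico procedure used in this study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_upstream_sorfs(leader, min_codons=2):
    """Return (start, end, frame) for each ATG...stop ORF lying wholly in `leader`.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    Every ATG is treated as a potential start, so nested in-frame ATGs
    are reported as separate, overlapping sORFs.
    """
    leader = leader.upper()
    sorfs = []
    for start in range(len(leader) - 2):
        if leader[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the ATG until the first in-frame stop.
        for pos in range(start + 3, len(leader) - 2, 3):
            if leader[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3  # codons before the stop
                if n_codons >= min_codons:
                    sorfs.append((start, pos + 3, start % 3))
                break
    return sorfs


# Hypothetical leader sequence, invented for illustration (not HIV-1 sequence).
toy_leader = "GGATGAAATAGCCATGGCTTGACC"
wild_type = find_upstream_sorfs(toy_leader)

# Mimic mutation of the initiation codons (ATG -> ACG) in the toy leader.
start_mutant = find_upstream_sorfs(toy_leader.replace("ATG", "ACG"))

print(wild_type)
print(start_mutant)
```

In the toy leader the scan reports two sORFs, and none once both ATG codons are altered, which mirrors the logic of the AUG mutation experiment without reproducing its data.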

In addition, we have shown that the asp sORF region undergoes alternative splicing in vitro, producing at least three different spliced variants in addition to the unspliced transcript; the most abundant of these retains only the sORF immediately preceding asp, sORF VI. Michael et al. (1994) and Bukrinsky and Etkin (1990) reported negative-sense asp transcripts of sizes 2.3 kb and 1.6 kb, respectively. These can now be explained as the unspliced and predominant spliced (utilizing SA 1 and SD 1) forms, differing by the size of the 0.65-kb intron that we have detected. The importance of SA 1 and SD 1 is supported by the high conservation of these typical splice acceptor and donor sequences. While conservation of SD 1 could be explained by high-level conservation of the positive-sense strand, as required to maintain Pro (NL4-3 Env Pro814) in this position within Env gp41, this is not the case for SA 1. The inclusion of any nucleotide base at position X of the ACX nucleotide sequence would maintain Thr at this position within gp41 (NL4-3 Env Thr597), suggesting that another selective pressure is responsible for the high-level conservation observed here. SD 2 is only moderately conserved and SA 2 is poorly conserved, so these sites may be of little importance.
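The degeneracy argument for ACX can be checked directly against the standard genetic code. The sketch below is illustrative only: it includes just the codons needed, and the CAX family (His/Gln) is added purely as a contrasting example of a position where the third base does matter; it is not a site discussed in this study.

```python
# Subset of the standard genetic code (DNA alphabet) needed for this illustration.
CODON_TABLE = {
    "ACA": "Thr", "ACC": "Thr", "ACG": "Thr", "ACT": "Thr",
    # Contrast case, not from the study: the third base changes the amino acid.
    "CAA": "Gln", "CAG": "Gln", "CAC": "His", "CAT": "His",
}


def third_position_outcomes(prefix):
    """Map each choice of third base to the amino acid encoded by prefix + base."""
    return {base: CODON_TABLE[prefix + base] for base in "ACGT"}


def is_fourfold_degenerate(prefix):
    """True when every third-position base yields the same amino acid."""
    return len(set(third_position_outcomes(prefix).values())) == 1


print(third_position_outcomes("AC"))
print(is_fourfold_degenerate("AC"))
print(is_fourfold_degenerate("CA"))
```

Because every ACX codon encodes Thr, the protein-coding constraint on the positive-sense strand leaves position X free to vary, so its conservation points to a constraint other than the gp41 amino acid sequence.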

Our experiments have shown that mutation of the predominantly utilized splice acceptor, SA 1, abolishes splicing activity of the sORF region, drastically reducing reporter expression. Splicing may have a role in relieving the severe inhibitory effect of the sORFs, in a similar manner to the regulation of estrogen receptor α expression by alternative splicing of upstream sORFs reported by Kos et al. (2002).

Spliced variant 1 (containing sORF VI alone) produces a higher level of expression than spliced variant 2 (containing sORFs I, II, and VI), which could be related to the number of upstream sORFs. However, spliced variant 3 (sORF VI_alt) produces much higher levels of expression than either spliced variant 1 or 2, suggesting specific inhibition associated with sORF VI. This inhibition is also observed in the wild-type pool of unspliced and mixed spliced variants. The mechanism by which ribosomes translate these transcripts is not known and is the subject of further work in our laboratory.

Transcription of the negative-sense gene is restricted to the early phase of infection by T-cell transcription factors acting on the negative-sense promoter (Michael et al., 1994; Bentley et al., 2004). This study indicates that translation may also be strongly inhibited by the presence of multiple sORFs upstream of asp; inhibition may be modulated by alternative splicing of the upstream transcript. Such multi-modal regulation would permit a combination of regulatory signal inputs to precisely control asp expression, limiting ASP production to a particular phase in the virus replication cycle and/or in response to host signaling events. The potential for intron sequences to control expression of proviral and host cell genes via siRNA or miRNA pathways, activities that are becoming increasingly apparent (Yeung et al., 2007; Klase et al., 2009; Schopman et al., 2011), must also be considered.

Most studies on asp and its expression to date, including this work, have used gene or specific region (e.g., upstream region, LTR) constructs. It is now crucial to extend these investigations into HIV-1-infected cell systems. However, the transient expression (Landry et al., 2007) of asp regulated by transcription from the 3′-LTR, the potential for alternative splicing of the transcript, and sORF inhibition of translation, as well as the lack of a known function for ASP, present great challenges.

Conclusions

The strong conservation of asp and its sORFs across all HIV-1 subtypes suggests that the asp gene product may have a tightly regulated role in the pathogenesis of HIV-1. We have shown that the asp sORF region inhibits downstream gene expression and may be alternatively spliced. This may provide a potential mechanism for the control of expression of the asp gene.

Footnotes

Acknowledgments

The authors wish to thank Dr. D. Purcell (Department of Microbiology and Immunology, The University of Melbourne, Australia) for providing the pE-N1 negative control plasmid and Mr. Zoon Chan (The School of Applied Sciences and Engineering, Monash University, Gippsland, Australia) for assistance in cloning of spliced variants 1 and 2. We also wish to acknowledge the assistance of Dr. Karen Laurie and Ms. Louise Carolan (The Victorian Infectious Diseases Reference Laboratory, Melbourne, Australia) with the quantitative RT-PCR experiments.

This work was supported by The School of Applied Sciences and Engineering, Monash University, Gippsland campus.

Authors' Contributions

M.S.B. completed in silico analyses and performed the majority of the experimental work; K.E.B. performed preliminary RT-PCR analysis; N.J.D. assisted with data interpretation and experimental design; J.A.M. assisted with experimental design, research supervision, and data analysis; all authors participated in drafting of the article and approved the final article.

Author Disclosure Statement

The authors declare that they have no competing financial interests.

References

Arnold, Yamamoto, Li, Phipps A.J., Younis, Lairmore M.D., Green P.L. 2006. Enhancement of infectivity and persistence in vivo by HBZ, a natural antisense coded protein of HTLV-1. Blood, 107:3976–3982.

Bentley, Deacon, Sonza, Zeichner, Churchill. 2004. Mutational analysis of the HIV-1 LTR as a promoter of negative sense transcription. Arch Virol, 149:2277–2294.

Briquet, Richardson, Vanhee-Brossollet, Vaquero. 2001. Natural antisense transcripts are detected in different cell lines and tissues of cats infected with feline immunodeficiency virus. Gene, 267:157–164.

Briquet, Vaquero. 2002. Immunolocalization Studies of an Antisense Protein in HIV-1-Infected Cells and Viral Particles. Virology, 292:177–184.

Bukrinsky M.I., Etkin A.F. 1990. Plus Strand of the HIV Provirus DNA is Expressed at Early Stages of Infection. AIDS Res Hum Retroviruses, 6:425–426.

Cavanagh, Landry, Audet, Arpin-Andre, Hivin, Pare, Thete, Wattel, Marriott S.J., Mesnard, Barbeau. 2006. HTLV-1 antisense transcripts initiating in the 3′ LTR are alternatively spliced and polyadenylated. Retrovirology, 3:1–15.

Clerc, Laverdure, Torresilla, Landry, Borel, Vargas, Arpin-Andre, Gay, Briant, Gross, Barbeau, Mesnard J-M. 2011. Polarized expression of the membrane ASP protein derived from HIV-1 antisense transcription in T cells. Retrovirology, 8:74.

Davuluri, Suzuki, Sugano, Zhang M.Q. 2000. CART classification of human 5′ UTR sequences. Genome Res, 10:1807–1816.

Gaudray, Gachon, Basbous, Biard-Piechaczyk, Devaux, Mesnard. 2002. The Complementary Strand of the Human T-Cell Leukemia Virus Type 1 RNA Genome Encodes a bZIP Transcription Factor That Down-Regulates Viral Transcription. J Virol, 76:12813–12822.

Greenway A.L., Holloway, McPhee D.A., Ellis, Cornall, Lidman. 2003. HIV-1 Nef control of cell signalling molecules; multiple strategies to promote virus replication. J Biosci, 28:323–335.

Halin, Douceron, Clerc, Journo, Ling Ko, Landry, Murphy E.L., Gessain, Lemasson, Mesnard J-M., Barbeau, Mahieux. 2009. Human T-cell leukaemia virus type 2 produces a spliced antisense transcript encoding a protein that lacks a classic bZIP domain but still inhibits Tax2-mediated transcription. Blood, 114:2427–2438.

Hinnebusch A.G. 1997. Translational Regulation of Yeast GCN4. J Biol Chem, 272:21661–21664.

Klase, Winograd, Davis, Carpio, Hildreth, Heydarian, Fu, McCaffrey, Meiri, Ayash-Rashkovsky, Gilad, Bentwich, Kashanchi. 2009. HIV-1 TAR miRNA protects against apoptosis by altering cellular gene expression. Retrovirology, 6:18.

Korkaya, Jameel, Gupta, Tyagi, Kumar, Zafrullah, Mazumdar, Lal S.K., Xiaofang, Sehgal, Das S.R., Sahal. 2001. The ORF3 Protein of Hepatitis E Virus Binds to Src Homology 3 Domains and Activates MAPK. J Biol Chem, 276:42389–42400.

Kos, Denger, Reid, Gannon. 2002. Upstream Open Reading Frames Regulate the Translation of the Multiple mRNA Variants of the Estrogen Receptor α. J Biol Chem, 277:37131–37138.

Kozak. 1986. Point Mutations Define a Sequence Flanking the AUG Initiator Codon That Modulates Translation by Eukaryotic Ribosomes. Cell, 44:283–292.

Lemasson, Lewis M.R., Polakowski, Hivin, Cavanagh M.H., Thébault, Barbeau, Nyborg J.K., Mesnard J.M. 2007. Human T-cell leukaemia virus type 1 (HTLV-1) bZIP protein interacts with the cellular transcription factor CREB to inhibit HTLV-1 transcription. J Virol, 81:1543–1553.

Landry, Halin, Lefort, Audet, Vaquero, Mesnard J-M., Barbeau. 2007. Detection, characterization and regulation of antisense transcripts in HIV-1. Retrovirology, 4:71.

Larocca, Chao L.A., Seto M.H., Brunck T.K. 1989. Human T-cell Leukaemia Virus minus strand transcription in infected T-cells. Biochem Biophys Res Commun, 163:1006–1013.

Michael N.L., Vahey M.T., Darcy, Ehrenberg P.K., Mosca J.D., Rappaport, Redfield R.R. 1994. Negative-Strand RNA Transcripts Are Produced in Human Immunodeficiency Virus Type 1-Infected Cells and Patients by a Novel Promoter Down-regulated by Tat. J Virol, 68:979–987.

Miller R.H. 1988. Human Immunodeficiency Virus May Encode a Novel Protein on the Genomic DNA Plus Strand. Science, 239:1420–1422.

Morris D.R., Geballe A.P. 2000. Upstream Open Reading Frames as Regulators of mRNA Translation. Mol Cell Biol, 20:8635–8642.

Peeters, Lambert P.F., Deacon N.J. 1996. A Fourth Sp1 Site in the Human Immunodeficiency Virus Type 1 Long Terminal Repeat Is Essential for Negative-Sense Transcription. J Virol, 70:6665–6672.

Pereira L.A., Bentley, Peeters, Churchill M.J., Deacon N.J. 2000. A compilation of cellular transcription factor interactions with the HIV-1 LTR promoter. Nucleic Acids Res, 28:663–668.

Schopman N.C.T., Willemsen, Poi Liu, Bradley, van Kampen, Baas, Berkhout, Haasnoot. 2011. Deep sequencing of virus-infected cells reveals HIV-encoded small RNAs. Nucleic Acids Res, 2011:1–14.

Suzuki, Ishihara, Sasaki, Nakagawa, Hata, Tsunoda, Watanabe, Ota, Isogai, Suyama, Sugano. 2000. Statistical analysis of the 5′ untranslated region of human mRNA using “oligo-capped” cDNA libraries. Genomics, 64:286–297.

Tagieva N.E., Vaquero. 1997. Expression of naturally occurring antisense RNA inhibits human immunodeficiency virus type 1 heterologous strain replication. J Gen Virol, 78:2503–2511.

Vanhee-Brossollet, Thoreau, Serpente, D'Auriol, Levy, Vaquero. 1996. A Natural Antisense RNA Derived from the HIV-1 env Gene Encodes a Protein Which Is Recognised by Circulating Antibodies of HIV+ Individuals. Virology, 206:196–202.

Yeung M.L., Benkirane, Jeang K-T. 2007. Small non-coding RNAs, mammalian cells, and viruses: regulatory interactions? Retrovirology, 4:74.

Yoshida, Satou, Yasunaga, Fujisawa, Matsuoka. 2008. Transcriptional control of spliced and unspliced human T-cell leukaemia virus type 1 bZIP factor (HBZ) gene. J Virol, 82:9359–9368.