Identification and Characterization of the HERV-K (HML-8) Group of Human Endogenous Retroviruses in the Genome

Abstract

Human endogenous retroviruses (HERVs) can be vertically transmitted in a Mendelian fashion, are stably maintained in the human genome, and are estimated to constitute ∼8% of the genome. HERVs affect human physiology and pathology through their provirus-encoded proteins or through the regulatory effects of their long terminal repeat (LTR) elements. Characterization of the genomic distribution is an essential step to understanding the relationships between endogenous retrovirus expression and diseases. However, the poor characterization of human MMTV-like (HML)-8 prevents a detailed understanding of the regulation of the expression of this family in humans and its impact on the host genome. In light of this, the definition of an accurate and updated HERV-K HML-8 genomic map is urgently needed. In this study, we report the results of a comprehensive analysis of HERV-K HML-8 sequence presence and distribution within the genomes of human and other hominoids, with a detailed description of the different structural and phylogenetic aspects characterizing the group. A total of 40 proviruses and 5 solo LTR elements were characterized in human, including a detailed description of provirus structure, integration time, potentially regulated genes, transcription factor-binding sites, and primer-binding site features. In addition, 9 chimpanzee sequences, 8 gorilla sequences, and 10 orangutan sequences belonging to the HML-8 subgroup were identified. The integration time results showed that the HML-8 elements were integrated into the primate lineage between approximately 35 and 42 million years ago (mya), during primate evolutionary speciation. Overall, the results clarified the composition of the HML-8 group, providing an exhaustive background for subsequent functional studies.

Introduction

Germ cell infections caused by exogenous retroviruses and incorporation into host DNA occurred millions of years ago, leading to vertical transmission in a Mendelian fashion and stable maintenance of human endogenous retroviruses (HERVs) within the human genome.^1,2 HERVs are estimated to constitute ∼8% of human DNA.^1,3 Two hypotheses have been proposed to explain their persistence in the human genome during evolution: a parasitic hypothesis and a symbiotic hypothesis. In the parasitic hypothesis, HERVs are neutral and rather difficult to eliminate.^4,5 In contrast, the symbiotic hypothesis holds that they have been preserved by positive selection.⁶

HERVs originate as integrated proviruses. A typical HERV provirus includes the gag, pro, pol, and env genes, which are flanked by two long terminal repeats (LTRs) that act as promoters.⁷ The gag gene encodes the structural protein that forms the core of the virion; pro encodes the viral protease; pol encodes reverse transcriptase and integrase; and env encodes the glycoprotein complex that controls receptor-mediated fusion and entry.⁸

Even when most gag, pro, pol, and env remain, they are usually inactive due to the accumulation of substitutions, deletions, and insertions. Most HERVs exist in the form of solo LTRs produced by homologous recombination between the 5′ and 3′ LTRs. These LTR elements have been shown to influence gene regulation by providing regulatory elements such as enhancers, promoters, splicing sites, and polyadenylation sites for various host genes.^9,10

The classification of HERVs has long been controversial.¹¹ One system is based on the transfer RNA (tRNA) molecule that acts as the primer for retroviral minus-strand DNA synthesis. For example, the HERV-K element is considered to use lysine-tRNA. Nonetheless, this naming method is currently considered incomplete. Another system is based on phylogenetic analysis of the highly conserved pol sequence.^12,13 Phylogenetically, HERVs can be divided into three classes: Class I consists of gammaretrovirus-like elements, Class II of betaretrovirus-like elements, and Class III of spumaretrovirus-like elements.^14,15 As such, HERV-Ks, acquired by the human species between 3 and 6 million years ago (mya), belong to Class II, the betaretrovirus-like supergroup.¹⁶ The groups were initially named human MMTV-like (HML)-1 to HML-6, followed by the definition of HML-7 to HML-10.^17–20 By 2011, two proviruses belonging to the HML-2 branch were identified as a new group, HML-11.²¹ Currently, HERV-Ks are divided into subfamilies HML-1 through HML-11.

HERVs affect human physiology and pathology mainly in two ways. One aspect is based on the effect of the provirus-encoded protein on the host. The most typical physiological function of HERVs is that Env is highly expressed and involved in the formation of the placenta. Syncytin-1 and syncytin-2 are the Env proteins of HERV-W and HERV-FRD, respectively, which maintain the fusion trophoblast cell layer and their connection with the cytotrophoblast layer.^22,23 Several proteins encoded by HERV-K are related to cancers and other diseases, such as germ cell tumors, teratocarcinoma, ovarian cancer, prostate cancer, melanoma, rheumatoid arthritis, and amyotrophic lateral sclerosis.^24–27 Np9 promotes the growth of myeloid and lymphoblastic leukemia cells by activating the Notch1, ERK, and AKT pathways through the upregulation of β-catenin.²⁸ Moreover, the expression of HERV-K Env protein is significantly higher in breast cancer tissue than in normal breast tissue, which is related to disease progression and poor prognosis.²³

In addition, many studies have shown that host genes regulate endogenous retrovirus (ERV) expression. TRIM28 is a nuclear protein that mediates gene silencing through heterochromatin; TRIM28 expression in dendritic cells can silence endogenous retroviruses and prevent excessive T cell priming.²⁹ WEE1 kinase is a key regulator of the G2/M checkpoint. WEE1 inhibition increases ERV expression by relieving SETDB1/H3K9me3 repression through downregulating FOXM1.³⁰

The other aspect is based on LTR elements. They can act as transcriptional regulatory elements to interfere with the expression of upstream and downstream genes. In fact, there are many examples of HERV LTRs acting as promoters or transcription factor-binding sites for genes.^31–34 The LTR of HERV-E has been revealed to be located upstream of the pancreatic amylase gene in the reverse direction, regulating the expression of the amylase gene and providing promoter activity.³⁵ HERV-K LTR has tissue-specific enhancer activity and can be used as the main promoter of the galactopancreatic amylase gene in the human colon and small intestine.³⁶

The transcriptional activation of HERV LTRs also has harmful effects on the body. An in vitro model of human mammary epithelial cell transformation revealed 5′ LTR promoter activity in tumorigenic cells, suggesting that the cellular environment of cancer cells is a key component for inducing the activity of the LTR promoter.³⁷ Additionally, HERV-W LTR downregulated the expression of the GABBR1 gene in schizophrenia.³⁸ Two members of the HERV-I family induced AZFa gene microdeletions in azoospermia patients.³⁹

For the HERV-K group, characterization of the genomic distribution is an essential step to understanding the relationships between endogenous retrovirus expression and diseases. For HML-8, one study has shown polymorphism of the HERV-K11 gene, suggesting that the polymorphisms may be individual-specific.⁴⁰ However, no comprehensive characterization of HML-8 is currently available, which prevents a detailed understanding of the regulation of the expression of this family in humans and its impact on the host genome. In light of this, the definition of a precise and updated HERV-K HML-8 genomic map is urgently needed.

Materials and Methods

HML-8 identification and localization in the human genome (hg38)

To confirm HML-8 provirus and solo LTR localization in the human genome, we selected the Genome Reference Consortium assembly GRCh38/hg38 (released December 2013) as the human background sequence and used the assembled MER11A-HERVK11-MER11A sequence as the query to identify HML-8. A traditional BLAT search⁴¹ in the UCSC Genome Browser database⁴² was used. DNA BLAT works by keeping an index of the entire genome in memory. The index consists of all overlapping 11-mers stepped by 5, except for those heavily involved in repeats (http://genome.ucsc.edu/cgi-bin/hgBlat). Additionally, the HML-8 orthologous loci were identified through comparative localization of the harboring genomic region in the following hominoid genome assemblies in the UCSC Genome Browser: chimpanzee (Pan troglodytes, assembly January 2018-Clint_PTRv2/panTro6), gorilla (Gorilla gorilla gorilla, assembly August 2019-Kamilah_GGO_v0/gorGor6), and orangutan (Pongo pygmaeus abelii, assembly January 2018-Susie_PABv2/ponAbe3).
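The indexing idea described above (11-mers sampled every 5 bases) can be illustrated with a minimal sketch. This is not BLAT's implementation: the function names and the toy sequence are hypothetical, and repeat masking and alignment extension are omitted.

```python
from collections import defaultdict


def build_kmer_index(sequence, k=11, step=5):
    """Map each sampled k-mer to the target offsets where it starts."""
    index = defaultdict(list)
    # Sample one k-mer every `step` bases; samples overlap because step < k.
    for start in range(0, len(sequence) - k + 1, step):
        index[sequence[start:start + k]].append(start)
    return dict(index)


def candidate_hits(query, index, k=11):
    """Return (query_offset, target_offset) pairs for exact k-mer matches."""
    hits = []
    # Every k-mer of the query is looked up, so a match is found whenever
    # one of them lines up with a sampled target position.
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], []):
            hits.append((q, t))
    return hits


# Hypothetical 32-base target, used only to exercise the functions.
toy_target = "ACGTACGTTAGCCGATAGGCTTAACGGATCCA"
toy_index = build_kmer_index(toy_target)
toy_hits = candidate_hits(toy_target[5:16], toy_index)
```

In a real search, clusters of such seed hits would then be extended into full alignments; the sketch stops at seed lookup.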

Element distribution prediction and chromosome mapping

To evaluate whether HML-8 is randomly distributed in human chromosomes, we predicted its expected distribution according to the formula e = Cl × n/Tl (where e represents the expected number of integrations in the chromosome, Cl represents the length of the chromosome, n represents the total identified number of HML-8 loci in the human genome, and Tl is the sum length of all chromosomes).⁴³ The comparison of the actual number of HML-8 loci with the expected elements in the chromosome was analyzed through a chi-square (χ²) test.
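As a minimal sketch of this calculation, the following code computes the expected counts from e = Cl × n/Tl and the chi-square statistic comparing them with observed counts. The chromosome names, lengths, and counts below are hypothetical toy values, not the hg38 figures used in the study.

```python
def expected_counts(chrom_lengths, n_loci):
    """Expected loci per chromosome: e = Cl * n / Tl."""
    total_length = sum(chrom_lengths.values())  # Tl
    return {
        chrom: length * n_loci / total_length
        for chrom, length in chrom_lengths.items()
    }


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all chromosomes."""
    return sum(
        (observed.get(chrom, 0) - e) ** 2 / e
        for chrom, e in expected.items()
    )


# Hypothetical toy genome with two chromosomes and eight loci.
toy_lengths = {"chrA": 100, "chrB": 300}
toy_observed = {"chrA": 4, "chrB": 4}

toy_expected = expected_counts(toy_lengths, n_loci=sum(toy_observed.values()))
toy_statistic = chi_square_statistic(toy_observed, toy_expected)
```

The statistic would then be compared with a chi-square distribution with (number of chromosomes − 1) degrees of freedom to obtain a p-value.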

Structural characterization

All 40 HML-8 proviral elements were characterized in detail by comparison with the Dfam reference MER11A-HERVK11-MER11A, using multiple alignments performed with MEGA 7 and subsequent analysis in the BioEdit software platform.^44,45 All the deletions were annotated.

Phylogenetic analyses

To confirm the assignment of the identified HML-8 elements, maximum likelihood (ML) phylogenetic trees were constructed using MEGA 7.⁴⁶ Of the 40 identified proviral elements, the 3 proviral sequences longer than 80% of the HML-8 reference were used to construct a near-full-length phylogenetic tree. According to the model selection function of MEGA 7, the best-fitting model of nucleotide substitution for the near-full-length proviruses was GTR+G+I. For the four coding regions (gag, pro, pol, and env), we selected sequences longer than 90% of the corresponding section of the HML-8 reference to construct subregion phylogenetic trees. The best-fitting models of nucleotide substitution were HKY+G+I for gag, GTR+G for pro and pol, and HKY+G for env.

Tree topologies were searched using the nearest neighbor interchange (NNI) procedure. For the hominoids, we selected sequences longer than 70% of the provirus reference length to construct a near-full-length phylogenetic tree. The best-fitting models were GTR+G for chimpanzees and gorillas, and GTR+G+I for orangutans. The confidence of each node in the phylogenetic trees was determined using the bootstrap test with 500 replicates. The final ML trees were visualized using iTOL.⁴⁷
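The length screening used to choose sequences for tree building can be sketched as a simple threshold filter. The coverage values below are the "Match+mismatch/full length" percentages of the six longest proviruses in Table 1; the function name is illustrative only.

```python
# Coverage of the HML-8 reference (%), taken from Table 1 (six longest loci).
coverage_pct = {
    "11q22.1": 84.89,
    "19p12": 81.93,
    "10p11.1": 81.89,
    "1q25.3": 76.60,
    "9p21.1": 76.26,
    "5p13.1": 72.38,
}


def screen_by_coverage(coverage, threshold_pct):
    """Keep loci whose coverage of the reference exceeds the threshold."""
    return [locus for locus, pct in coverage.items() if pct > threshold_pct]


near_full_length = screen_by_coverage(coverage_pct, 80)
```

With the 80% threshold, the filter keeps the three loci used for the near-full-length human tree.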

Estimation of the integration time of HML-8

To estimate the time of integration, we assumed a substitution rate of 0.2% per nucleotide per million years for the human genome and used this rate to convert the divergence of each HML-8 sequence into an age.⁴⁸ The integration time was calculated with the formula T = D%/0.2%, where T is the estimated time of integration (in million years) and D% is the percentage of divergent nucleotides. The divergence values were estimated by comparing the gag, pro, pol, and env regions of each HML-8 internal element with the corresponding generated consensus sequence. The final age of each sequence was expressed as the age derived from the mean divergence of its analyzed regions.
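A minimal sketch of the age calculation T = D%/0.2% follows, assuming that the divergences of the available regions are averaged before conversion (as the mean divergence column of Table 4 suggests). The function name is hypothetical, and regions that were not analyzed are passed as None.

```python
SUBSTITUTION_RATE_PCT = 0.2  # percent divergence per million years


def integration_time(divergences):
    """Estimate age (million years) from per-region divergence fractions.

    `divergences` holds fractions such as 0.10 for gag, pro, pol, and env;
    None marks a region that was not analyzed.
    """
    values = [d for d in divergences if d is not None]
    if not values:
        raise ValueError("at least one region divergence is required")
    mean_divergence_pct = 100 * sum(values) / len(values)  # D%
    return mean_divergence_pct / SUBSTITUTION_RATE_PCT     # T = D% / 0.2%


# Locus 11p11.12 in Table 4: only pro and pol were analyzed, both 0.1.
age_11p11_12 = integration_time([None, 0.1, 0.1, None])
```

Because the divergences printed in Table 4 are rounded to two decimals, recomputing from the table can differ slightly from the published ages for other loci.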

Functional prediction of cis-regulatory elements and enrichment analysis

Noncoding regions typically lack biological function annotation. To examine the biological importance of HML-8 solo LTRs, we analyzed their potential association with nearby genes using the Genomic Regions Enrichment of Annotations Tool (GREAT) against hg38.⁴⁹ The association rule was as follows: basal plus extension, 5,000 bp upstream, 1,000 bp downstream, 1,000,000 bp maximum extension, with curated regulatory domains included.
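The basal part of this association rule can be sketched as follows. This is a simplified illustration, not GREAT's implementation: it computes only the strand-aware basal domain around a transcription start site and the maximum-extension distance check, and it omits the extension toward neighboring genes and the curated domains. All names and coordinates are hypothetical.

```python
def basal_domain(tss, strand, upstream=5_000, downstream=1_000):
    """Return (start, end) of the basal regulatory domain around a TSS."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 0), end


def within_max_extension(position, tss, max_extension=1_000_000):
    """True if a region midpoint is close enough to be associated at all."""
    return abs(position - tss) <= max_extension


plus_domain = basal_domain(100_000, "+")
minus_domain = basal_domain(100_000, "-")
```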

After filtering regulatory genes by GREAT, we used the WEB-based Gene SeT AnaLysis Toolkit (WebGestalt)⁵⁰ to perform functional enrichment analysis, which is crucial for interpreting the list of genes of interest. WebGestalt can use three well-established and complementary methods for enrichment analysis: overrepresentation analysis (ORA), gene set enrichment analysis (GSEA), and network topology-based analysis (NTA). The enrichment method used in the current work was ORA. The parameters for the enrichment analysis were as follows: minimum number of IDs in the category, 5; maximum number of IDs in the category, 2,000; false discovery rate (FDR) method, Benjamini–Hochberg; and significance level, top 10.
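To make the ORA step concrete, the sketch below shows the two standard calculations behind it: a hypergeometric upper-tail p-value for the overlap between a gene list and a category, and the Benjamini–Hochberg adjustment. It is a generic illustration with hypothetical function names and toy numbers, not WebGestalt's code.

```python
from math import comb


def ora_p_value(population, category_size, list_size, overlap):
    """P(X >= overlap) when drawing list_size genes from the population."""
    upper = min(category_size, list_size)
    tail = sum(
        comb(category_size, k) * comb(population - category_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(population, list_size)


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy inputs for illustration only.
toy_p = ora_p_value(population=10, category_size=3, list_size=2, overlap=1)
toy_fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```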

In silico examination of the conserved transcription factor-binding sites

The transcription factor-binding sites of the HML-8 LTR consensus sequence were predicted through the JASPAR database. The taxon was “vertebrates,” and the species was Homo sapiens. We selected chromatin immunoprecipitation sequencing data in JASPAR to predict transcription factors with a relative profile score threshold ≥95%. The constructed HML-8 LTR consensus sequence alignment and annotation were performed using Geneious software.⁵¹
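The relative profile score threshold can be illustrated with a small sketch. It assumes the common definition of a relative score, (score − minimum)/(maximum − minimum) for a position weight matrix, and uses a hypothetical three-column matrix rather than a real JASPAR profile.

```python
def relative_score(pwm, site):
    """Scale a site's PWM score to the range 0..1."""
    if len(pwm) != len(site):
        raise ValueError("site length must equal the number of PWM columns")
    score = sum(column[base] for column, base in zip(pwm, site))
    best = sum(max(column.values()) for column in pwm)
    worst = sum(min(column.values()) for column in pwm)
    return (score - worst) / (best - worst)


def passes_threshold(pwm, site, threshold=0.95):
    """Apply a relative profile score cutoff (95% in this study)."""
    return relative_score(pwm, site) >= threshold


# Hypothetical log-odds matrix: one dict of base scores per position.
toy_pwm = [
    {"A": 2, "C": -1, "G": -1, "T": -2},
    {"A": -2, "C": 1, "G": 0, "T": -1},
    {"A": -1, "C": -1, "G": 3, "T": -3},
]
```

For this toy matrix, the best-scoring site "ACG" passes the cutoff, whereas "AGG", which differs at one position, does not.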

Primer-binding site features representation

Primer-binding site (PBS) features of three near-full-length proviruses (LTR length >80%) and the HML-8 reference sequence were analyzed using MEGA 7 and BioEdit. The degree of conservation at each position was represented by a logo built with WebLogo (http://weblogo.berkeley.edu).⁵² The PBS type was then identified with tRNAdb (http://trna.bioinf.uni-leipzig.de).⁵³

Results

HML-8 element identification, localization, and actual distribution in hg38

First, the whole HML-8 element distribution was displayed based on Ensembl (Fig. 1A). In total, we characterized 40 HERV-K HML-8 proviruses and 5 solo LTR elements. Each HML-8 element was screened out and named according to the genomic locus of insertion (Tables 1 and 2). The average length of these proviruses was 4,875 bp. Among them, 6 sequences were longer than 70% of the full-length HML-8 reference sequence (10,485 bp), 16 sequences were 40%–70% of the reference length, and the remaining 18 sequences were <40% of the reference length. The lengths of the five solo LTRs were ∼75% of that of the MER11A, the LTR for HML-8. The nucleotide sequence of each element is shown in Supplementary Dataset S1.

FIG. 1.

Chromosomal distribution of HML-8 loci. (A) All HML-8 elements (grey arrows) have been visualized on the human karyotype. The number of HML-8 proviral elements (B) and solo LTRs (C) integrated into each human chromosome was depicted and compared with the expected number of random insertion events based on chromosomal length. The expected number of sequences in the chromosome was marked with black and the actual detected number of sequences was marked with grey. HML, human MMTV-like; LTR, long terminal repeat.

Table 1.

HML-8 Provirus Distribution

Locus	Chromosome	Strand	Position start	Position end	Length (bp)	Match+mismatch (bp)/full length (bp), %	Range, %	Qgap (bp)/[match+mismatch+Qgap (bp)], %	Insertion or deletion^a	Intergenic/intron/exon	Gene including the region
11q22.1	chr11	−	101211829	101220990	9,162	84.89	80–90	3.06	NA	Intergenic	NA
19p12	chr19	−	23847813	23857118	9,306	81.93	80–90	5.22	NA	Intron	RP11-255H23.4
10p11.1	chr10	+	39042793	39051551	8,759	81.89	80–90	3.22	NA	Intergenic	NA
1q25.3	chr1	+	181246378	181254691	8,314	76.60	70–80	12.20	Deletion	Intergenic	NA
9p21.1	chr9	+	31770623	31778838	8,216	76.26	70–80	3.20	NA	Intergenic	NA
5p13.1	chr5	+	40104283	40112103	7,821	72.38	70–80	5.94	NA	Intergenic	NA
3p12.3	chr3	+	79052990	79060047	7,058	65.12	60–70	17.40	Deletion	Intron	ROBO1
9q32	chr9	−	112392518	112399562	7,045	65.11	60–70	24.03	Deletion	Intron	HSDL2
2p14	chr2	+	63989923	63996557	6,635	62.47	60–70	19.26	Deletion	Intron	VPS54
4q13.2	chr4	+	69191894	69198741	6,848	60.99	60–70	28.68	Deletion	Exon and intron	LOC105377267
11q13.2	chr11	+	67698607	67705411	6,805	60.24	60–70	31.01	Deletion	Intergenic	NA
12p11.1	chr12	+	34523565	34530133	6,569	59.16	50–60	11.41	Deletion	Intergenic	NA
Xp11.21	chrX	+	56942432	56948457	6,026	56.97	50–60	4.55	NA	Intergenic	NA
11p11.12	chr11	+	50440166	50446225	6,060	56.27	50–60	4.96	NA	Intergenic	NA
1p13.3	chr1	+	109702701	109708522	5,822	54.43	50–60	30.95	Deletion	Intron	GSTM2
1p33	chr1	+	46901185	46906302	5,118	46.70	40–50	38.63	Deletion	Intergenic	NA
11q11	chr11	+	54760333	54765370	5,038	46.06	40–50	6.61	NA	Intergenic	NA
8q11.1	chr8	+	46582940	46588016	5,077	45.03	40–50	20.00	Deletion	Intergenic	NA
12q23.3	chr12	+	105309242	105313830	4,589	42.28	40–50	7.76	NA	Intron	C12orf75-AS1
4q31.1	chr4	+	139632779	139637626	4,848	42.16	40–50	45.38	Deletion	Intergenic	NA
3q13.13	chr3	+	111525471	111529919	4,449	41.63	40–50	30.35	Deletion	Intergenic	NA
5q35.1	chr5	+	172397741	172402147	4,407	40.97	40–50	25.79	Deletion	Intron	SH3PXD2B
Xp21.1	chrX	+	34798563	34802767	4,205	39.38	30–40	23.90	Deletion	Intergenic	NA
19p11	chr19	+	24321247	24325713	4,467	36.82	30–40	5.09	NA	Intergenic	NA
8p23.1	chr8	+	8129952	8133947	3,996	36.07	30–40	26.76	Deletion	Intron	NA
22q11.21	chr22	+	19934889	19938756	3,868	35.86	30–40	7.78	NA	Exon and intron	TXNRD2
8p11.1	chr8	+	43894893	43898752	3,860	35.18	30–40	37.50	Deletion	Intergenic	NA
6q11.1	chr6	+	61275132	61278506	3,375	31.24	30–40	3.70	NA	Intergenic	NA
6p11.2	chr6	+	58400580	58403961	3,382	31.03	30–40	4.46	NA	Intron	XXbac-BPG55C20.7
7p12.1	chr7	+	50578797	50582145	3,349	27.32	20–30	30.55	Deletion	Intergenic	NA
Xp11.4	chrX	+	41645168	41648067	2,900	26.07	20–30	12.93	Deletion	Intron	CASK
2q34	chr2	+	213446560	213449125	2,566	24.21	20–30	5.05	NA	Intron	SPAG16
Yq11.222	chrY	+	17684937	17687515	2,579	23.55	20–30	13.61	Deletion	Intergenic	NA
14q32.11	chr14	+	90951671	90954075	2,405	22.42	20–30	8.06	NA	Intron	RPS6KA5
1p35.1	chr1	+	33066078	33068373	2,296	21.75	20–30	44.40	Deletion	Intergenic	NA
4q32.3	chr4	+	164750444	164752633	2,190	20.80	20–30	5.50	NA	Intergenic	NA
19q11	chr19	+	27580191	27582278	2,088	18.93	10–20	5.25	NA	Intergenic	NA
Yq11.23	chrY	+	25204027	25205492	1,466	13.74	10–20	2.17	NA	Intron	TTTY17C
5q14.1	chr5	+	81899820	81900981	1,162	9.96	0–10	11.00	Deletion	Intergenic	NA
1p21.1	chr1	+	106159132	106160005	874	7.85	0–10	5.73	NA	Intergenic	NA

^aInsertion or deletion in the HML-8 proviral sequence compared with MER11A-HERVK11-MER11A.

HML, human MMTV-like; NA, not available.
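The summary statistics reported for the proviruses (mean length and the three length classes) can be re-derived from the length and coverage columns of Table 1. The sketch below copies those two columns in table order; the variable and function names are illustrative.

```python
# (length in bp, % of reference covered) for the 40 proviruses in Table 1.
TABLE1 = [
    (9162, 84.89), (9306, 81.93), (8759, 81.89), (8314, 76.60),
    (8216, 76.26), (7821, 72.38), (7058, 65.12), (7045, 65.11),
    (6635, 62.47), (6848, 60.99), (6805, 60.24), (6569, 59.16),
    (6026, 56.97), (6060, 56.27), (5822, 54.43), (5118, 46.70),
    (5038, 46.06), (5077, 45.03), (4589, 42.28), (4848, 42.16),
    (4449, 41.63), (4407, 40.97), (4205, 39.38), (4467, 36.82),
    (3996, 36.07), (3868, 35.86), (3860, 35.18), (3375, 31.24),
    (3382, 31.03), (3349, 27.32), (2900, 26.07), (2566, 24.21),
    (2579, 23.55), (2405, 22.42), (2296, 21.75), (2190, 20.80),
    (2088, 18.93), (1466, 13.74), (1162, 9.96), (874, 7.85),
]


def length_classes(rows):
    """Count proviruses covering >70%, 40-70%, and <40% of the reference."""
    long_ = sum(1 for _, pct in rows if pct > 70)
    mid = sum(1 for _, pct in rows if 40 <= pct <= 70)
    short = sum(1 for _, pct in rows if pct < 40)
    return long_, mid, short


mean_length = sum(length for length, _ in TABLE1) / len(TABLE1)
classes = length_classes(TABLE1)
```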

Table 2.

HML-8 Solo Long Terminal Repeat Tracks Distribution

No.	Locus	Chromosome	Strand	Position start	Position end	Length (bp)	Percentage of MER11A in length, %	Match+mismatch/full length, %	Range, %	Qgap (bp)/[match+mismatch+Qgap (bp)], %	Insertion or deletion^a	Intergenic/intron/exon	Gene including the region
(1)	2q24.2	chr2	−	160986569	160987534	966	75.12	9.07	0–10	14.40	Deletion	Intergenic	NA
(2)	Yq12	chrY	+	56926587	56927558	972	74.88	9.04	0–10	13.97	Deletion	Intron	NM_001394354.1
(3)	Xq28	chrX	+	155740067	155741038	972	74.88	9.04	0–10	13.97	Deletion	Intron	ENSG00000168939
(4)	2p13.3	chr2	−	68628859	68629823	965	74.80	9.03	0–10	13.52	Deletion	Intergenic	NA
(5)	1p32.1	chr1	−	59558297	59559357	1,061	74.41	8.98	0–10	14.52	Deletion	Intron	ENSG00000172456

^aInsertion or deletion in the HML-8 solo LTR sequence compared with MER11A.

LTR, long terminal repeat.

To assess whether the HML-8 integrations are randomly distributed in the human genome, we compared the expected number of integrations with the detected number of HML-8 loci on each chromosome. The results showed that the observed number of HML-8 integration events differed from the expected number on each chromosome (Fig. 1B, C). For the proviral elements, the number of HML-8 insertions on chromosomes 2, 3, 6, 7, 10, 13, 14, 15, 16, 17, 18, 20, and 21 was lower than expected, and no loci were detected on chromosomes 13, 15, 16, 17, 18, 20, and 21. On chromosomes 1, 4, 5, 8, 9, 11, 12, 19, 22, X, and Y, the actual numbers identified were higher than the expected numbers (Fig. 1B).

Solo LTRs were detected only on chromosomes 1, 2, X, and Y (Fig. 1C). However, the differences between observed and expected counts were not statistically significant according to the chi-square test. The uneven per-chromosome counts therefore suggest, but do not statistically establish, a nonrandom distribution of HML-8 proviruses and solo LTRs among human chromosomes. Furthermore, all 40 identified proviral elements and 5 solo LTRs were analyzed to determine their locations in intergenic regions, introns, and exons (Tables 1 and 2).

The results showed that 25 proviral elements (62.5%) were located in intergenic regions, 13 (32.5%) were located in introns, and 2 (5%) spanned both introns and exons (Table 1). Of the solo LTRs, two (40%) were located in intergenic regions and the remaining three (60%) were located in introns (Table 2). Previous work by Brady et al⁵⁴ demonstrated that the accumulation of HML-2 proviruses in introns and intergenic regions is not a result of integration preference but of selection against proviruses that integrate into exons and genic regions. The apparent bias toward insertions in intergenic regions and introns observed in this study probably has the same explanation. The proviruses in genes and their relative transcriptional orientation are presented in Supplementary Tables S1 and S2.

Structural characterization

HML-8 sequences showed a typical proviral structure, with the gag, pro, pol, and env genes flanked by 5′ LTR and 3′ LTR sequences. According to the annotation information summarized in the Dfam database (https://www.dfam.org/family/DF0000189/features), the complete HML-8 contains four open reading frames. Specifically, these structures are located in the 5′ LTR (from nucleotide 1 to 1,266), the gag gene (nucleotides 1,422–3,530), the pro gene (nucleotides 3,341–4,345), the pol gene (nucleotides 4,303–7,032), the env gene (nucleotides 6,890–9,217), and the 3′ LTR (from nucleotide 9,220 to 10,485) sequences.
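The reference coordinates listed above can be held in a small table to derive the length of each region and the overlap between neighboring regions. This is only a bookkeeping sketch; the names are illustrative, and the coordinates are those quoted from the Dfam annotation in the text.

```python
# (name, start, end) on the 10,485-bp HML-8 reference, 1-based inclusive.
REGIONS = [
    ("5' LTR", 1, 1266),
    ("gag", 1422, 3530),
    ("pro", 3341, 4345),
    ("pol", 4303, 7032),
    ("env", 6890, 9217),
    ("3' LTR", 9220, 10485),
]


def region_length(start, end):
    """Length of an inclusive 1-based interval."""
    return end - start + 1


def overlap_length(a, b):
    """Number of bases shared by two (name, start, end) regions."""
    shared = min(a[2], b[2]) - max(a[1], b[1]) + 1
    return max(shared, 0)


lengths = {name: region_length(start, end) for name, start, end in REGIONS}
# Overlap between each region and the next one along the provirus.
neighbor_overlaps = {
    f"{a[0]}/{b[0]}": overlap_length(a, b)
    for a, b in zip(REGIONS, REGIONS[1:])
}
```

The overlaps between adjacent coding regions (gag/pro, pro/pol, pol/env) follow directly from the quoted coordinates.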

To describe the structure of each HML-8 provirus, we aligned the 40 HML-8 sequences and annotated the position of each retroviral component and of the deletions (Fig. 2). In general, the LTRs at both ends of all the proviruses were truncated or missing to varying degrees. We obtained three relatively complete proviral sequences (11q22.1, 19p12, 10p11.1), each accounting for 80%–90% of the complete reference sequence length. However, their LTR structures are still incomplete. The integrity of the six separate regions is summarized in Table 3.

FIG. 2.

Structural characterization of HML-8 proviruses. Each HML-8 provirus nucleotide sequence has been compared with the Dfam consensus reference. LTR, gag, pro, pol, and env regions were annotated. Black lines represent deleted parts.

Table 3.

The Integrity of Six Separate Regions Relative to the Corresponding Reference Sections

Provirus No.	Locus	Provirus regions^a	5′LTR, %	gag, %	pro, %	pol, %	env, %	3′ LTR, %
(1)	11q22.1	chr11 101211829 101220990	38.55	98.96	98.41	99.89	98.97	44.71
(2)	19p12	chr19 23847813 23857118	36.73	97.34	96.02	95.60	96.82	38.39
(3)	10p11.1	chr10 39042793 39051551	39.42	98.06	100.00	99.19	97.94	24.80
(4)	1q25.3	chr1 181246378 181254691	49.84	64.72	66.07	93.33	98.02	36.89
(5)	9p21.1	chr9 31770623 31778838	72.12	98.96	99.30	99.60	67.31	0.00
(6)	5p13.1	chr5 40104283 40112103	0.00	92.89	98.21	99.05	90.46	18.96
(7)	3p12.3	chr3 79052990 79060047	0.00	94.97	98.41	65.60	90.16	28.67
(8)	9q32	chr9 112392518 112399562	37.36	98.77	43.48	49.19	98.54	33.25
(9)	2p14	chr2 63989923 63996557	0.00	93.27	76.82	63.08	90.21	32.23
(10)	4q13.2	chr4 69191894 69198741	28.20	99.95	98.71	15.35	97.55	38.47
(11)	11q13.2	chr11 67698607 67705411	73.93	69.75	0.00	56.63	98.67	9.64
(12)	12p11.1	chr12 34523565 34530133	0.00	39.02	98.41	99.30	79.12	18.25
(13)	Xp11.21	chrX 56942432 56948457	0.00	59.89	98.71	98.68	60.95	0.00
(14)	11p11.12	chr11 50440166 50446225	0.00	57.94	100.00	98.94	59.28	0.00
(15)	1p13.3	chr1 109702701 109708522	73.14	99.38	94.93	16.01	65.25	0.00
(16)	1p33	chr1 46901185 46906302	39.26	97.11	88.46	10.29	60.82	0.00
(17)	11q11	chr11 54760333 54765370	0.00	0.00	13.23	99.27	90.21	8.93
(18)	8q11.1	chr8 46582940 46588016	0.00	49.64	100.00	65.60	54.73	0.00
(19)	12q23.3	chr12 105309242 105313830	0.00	0.00	0.00	78.35	90.89	27.01
(20)	4q31.1	chr4 139632779 139637626	32.94	74.30	0.00	14.95	88.62	0.00
(21)	3q13.13	chr3 111525471 111529919	0.00	0.00	81.99	35.53	99.05	37.20
(22)	5q35.1	chr5 172397741 172402147	0.00	0.00	19.20	63.41	90.89	36.81
(23)	Xp21.1	chrX 34798563 34802767	0.00	0.00	14.43	63.81	90.21	26.62
(24)	19p11	chr19 24321247 24325713	0.00	80.70	99.50	51.65	0.00	0.00
(25)	8p23.1	chr8 8129952 8133947	0.00	0.00	0.00	65.82	89.56	13.35
(26)	22q11.21	chr22 19934889 19938756	70.46	97.44	88.16	0.00	0.00	0.00
(27)	8p11.1	chr8 43894893 43898752	0.00	0.00	34.03	42.01	89.73	25.83
(28)	6q11.1	chr6 61275132 61278506	0.00	80.04	97.91	31.58	0.00	0.00
(29)	6p11.2	chr6 58400580 58403961	0.00	78.14	98.41	31.72	0.00	0.00
(30)	7p12.1	chr7 50578797 50582145	0.00	0.00	8.06	61.68	55.88	0.00
(31)	Xp11.4	chrX 41645168 41648067	0.00	0.00	0.00	17.40	88.92	27.41
(32)	2q34	chr2 213446560 213449125	0.00	0.00	0.00	0.00	87.93	39.89
(33)	Yq11.222	chrY 17684937 17687515	0.00	0.00	0.00	6.70	88.49	30.25
(34)	14q32.11	chr14 90951671 90954075	0.00	0.00	0.00	0.00	60.74	75.36
(35)	1p35.1	chr1 33066078 33068373	0.00	0.00	34.13	41.83	42.53	0.00
(36)	4q32.3	chr4 164750444 164752633	0.00	0.00	0.00	0.00	77.19	30.96
(37)	19q11	chr19 27580191 27582278	0.00	81.41	47.66	0.00	0.00	0.00
(38)	Yq11.23	chrY 25204027 25205492	0.00	0.00	0.00	52.86	0.00	0.00
(39)	5q14.1	chr5 81899820 81900981	0.00	0.00	0.00	0.00	30.07	27.73
(40)	1p21.1	chr1 106159132 106160005	0.00	0.00	0.00	23.99	13.57	0.00

^aChromosome: start-end (strand). Positions refer to the human genome sequence, assembly GRCh38/hg38.

Phylogenetic analyses

To confirm the classification of the newly identified sequences and characterize the phylogenetic relationships within the HML-8 group, we used the three proviral sequences longer than 80% of the HML-8 reference to generate an ML phylogenetic tree; the analysis also included the reference sequences of all Dfam HERV-K group members (HML-1 to -10) and some representative exogenous betaretroviruses. The results showed that all three proviruses clustered with the Dfam HML-8 group reference sequence, with 100% bootstrap support (Fig. 3A).

FIG. 3.

Phylogenetic analysis of the HML-8 near-full-length proviruses and four subregions by Maximum Likelihood method. Phylogenetic analyses of HML-8 proviruses (A), gag (B), pro (C), pol (D), and env (E) together with references. The two intragroup clusters of the env gene (a and b) were annotated and depicted with grey and light grey background colors, respectively. The resulting phylogeny was tested by using the bootstrap method with 500 replicates. The length of branches indicates the number of substitutions per site.

In addition, we constructed ML subregion trees for 12 gag, 15 pro, 10 pol, and 15 env elements (Fig. 3B–E). In each region, the HML-8 sequences clustered together and were clearly separated from the other HERV-K groups (HML-1 to -7, -9, -10). Interestingly, within the main phylogenetic group of the env region, we observed two distinct clusters. They were supported by bootstrap values (86% and 76%, respectively) and were named type a and type b. The type a sequences included the Dfam HML-8 env reference, whereas the type b elements were more divergent from the group reference. The solo LTRs of HML-1 to -10 could not be used as reference sequences because they differ greatly in both length and base composition, so we did not construct a phylogenetic tree of the HML-8 solo LTRs.

Estimated time of integration

Because the LTRs of the identified proviruses were mostly deleted, we estimated the HML-8 provirus age based on the gag, pro, pol, and env regions. Each region longer than 90% of the corresponding reference section was selected to calculate the integration time. The ancestral sequences of the four regions were generated with MEGA 7, following the majority rule on the multiple alignments of all corresponding elements. The T value was estimated by the relation T = D%/0.2%, where 0.2% represents the human genome neutral mutation rate expressed in substitutions/nucleotide/million years. For each region of a provirus, the final T value was calculated. Details on the period of provirus formation are provided in Table 4. Overall, the results showed that the majority of HML-8 elements found in the human genome were integrated into the primate lineage between 23.5 and 52 mya. The average time of integration was 37.1 mya.

Table 4.

Estimated Time of HML-8 Elements' Integration

Locus	Provirus regions^a	gag divergence from consensus	pro divergence from consensus	pol divergence from consensus	env divergence from consensus	Mean divergence	T = D%/0.2%	Age gene vs. consensus (million years)
11q22.1	chr11 101211829–101220990	0.07	0.08	0.05	0.05	0.062	0.30875	30.88
19p12	chr19 23847813–23857118	0.06	0.06	0.05	0.07	0.061	0.305	30.5
10p11.1	chr10 39042793–39051551	0.1	0.1	0.08	0.08	0.092	0.46125	46.13
1q25.3	chr1 181246378–181254691	NA	NA	0.05	0.05	0.048	0.24	24
9p21.1	chr9 31770623–31778838	0.06	0.07	0.05	NA	0.062	0.308333333	30.83
5p13.1	chr5 40104283–40112103	0.13	0.08	0.08	0.08	0.094	0.4675	46.75
3p12.3	chr3 79052990–79060047	0.1	0.05	NA	0.07	0.073	0.363333333	36.33
9q32	chr9 112392518–112399562	0.06	NA	NA	0.05	0.056	0.28	28
2p14	chr2 63989923–63996557	0.11	NA	NA	0.07	0.089	0.445	44.5
4q13.2	chr4 69191894–69198741	0.08	0.09	NA	0.08	0.082	0.4117	41.17
11q13.2	chr11 67698607–67705411	NA	NA	NA	0.05	0.052	0.26	26
12p11.1	chr12 34523565–34530133	NA	0.1	0.1	NA	0.097	0.485	48.5
Xp11.21	chrX 56942432–56948457	NA	0.08	0.08	NA	0.078	0.3875	38.75
11p11.12	chr11 50440166–50446225	NA	0.1	0.1	NA	0.1	0.5	50
1p13.3	chr1 109702701–109708522	0.06	0.07	NA	NA	0.064	0.3175	31.75
1p33	chr1 46901185–46906302	0.09	NA	NA	NA	0.093	0.465	46.5
11q11	chr11 54760333–54765370	NA	NA	0.1	0.09	0.097	0.4825	48.25
8q11.1	chr8 46582940–46588016	NA	0.05	NA	NA	0.047	0.235	23.5
12q23.3	chr12 105309242–105313830	NA	NA	NA	0.06	0.061	0.305	30.5
3q13.13	chr3 111525471–111529919	NA	NA	NA	0.05	0.048	0.24	24
5q35.1	chr5 172397741–172402147	NA	NA	NA	0.07	0.065	0.325	32.5
Xp21.1	chrX 34798563–34802767	NA	NA	NA	0.08	0.077	0.385	38.5
19p11	chr19 24321247–24325713	NA	0.1	NA	NA	0.104	0.52	52
22q11.21	chr22 19934889–19938756	0.06	NA	NA	NA	0.057	0.285	28.5
6q11.1	chr6 61275132–61278506	NA	0.08	NA	NA	0.083	0.415	41.5
6p11.2	chr6 58400580–58403961	NA	0.09	NA	NA	0.09	0.45	45

^aChromosome: start-end (strand). Positions refer to the human genome sequence, assembly GRCh38/hg38.
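The range and average quoted for the human elements can be checked directly against the age column of Table 4. The short sketch below copies the 26 ages in table order; the variable names are illustrative.

```python
# Age (million years) of each provirus, from the last column of Table 4.
AGES_MY = [
    30.88, 30.5, 46.13, 24, 30.83, 46.75, 36.33, 28, 44.5,
    41.17, 26, 48.5, 38.75, 50, 31.75, 46.5, 48.25, 23.5,
    30.5, 24, 32.5, 38.5, 52, 28.5, 41.5, 45,
]

youngest = min(AGES_MY)
oldest = max(AGES_MY)
mean_age = sum(AGES_MY) / len(AGES_MY)

print(f"range: {youngest}-{oldest} mya, mean: {mean_age:.1f} mya")
```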

Comparative identification of orthologous insertions in hominoids

To identify HML-8 sequences in other hominoids that are orthologous to the human elements, and to assess the period of the germline “capture” events in the evolutionary lineage, we performed BLAT searches in the genome sequences of chimpanzees, gorillas, and orangutans using the HML-8 consensus sequence as a query, constructed phylogenetic trees together with the human sequences, and estimated the integration time.

A total of 9 chimpanzee sequences, 8 gorilla sequences, and 10 orangutan sequences were identified as belonging to the HML-8 subgroup. These hominoid sequences, together with the three human HML-8 proviral sequences, were used to construct phylogenetic trees to confirm the phylogenetic relationships between human and other hominoids. The results showed that, except for chr19: 19066083–19080391 (gorilla) and chr19: 23582963–23597406 (chimpanzee), all proviruses clustered with the HML-8 consensus (Fig. 4A). In addition, we constructed separate chimpanzee, gorilla, and orangutan phylogenetic trees, which showed the HML-8 orthologous insertions in nonhuman primates (Fig. 4B–D).

FIG. 4.

Phylogenetic analysis of the HML-8 near-full-length proviruses for human and hominoids by Maximum Likelihood method. (A) Phylogenetic analyses of HML-8 proviruses of human and hominoids together with references. Phylogenetic analyses of HML-8 proviruses of chimpanzees (B), gorillas (C), and orangutans (D). The resulting phylogeny was tested by using the bootstrap method with 500 replicates. The length of branches indicates the number of substitutions per site.

The integration time of HML-8 elements in chimpanzees, gorillas, and orangutans was also estimated (worksheets chimpanzees, gorillas, and orangutans in Supplementary Table S3). Integration times ranged from 15 to 52.33 mya in chimpanzees (average 35.86 mya), from 18.5 to 70.13 mya in gorillas (average 39.9 mya), and from 22.5 to 92.8 mya in orangutans (average 41.4 mya), which is consistent with the estimates in human. Taken together, most HML-8 elements entered the primate lineage between approximately 35 and 42 mya, during primate evolutionary speciation.

Functional prediction of cis-regulatory elements and enrichment analysis

The GREAT analysis results are shown in Table 5, which lists the associations between each solo LTR and its putatively regulated gene(s). A total of eight genes were predicted. Of the five solo LTRs, one was not associated with any gene, and the other four were each associated with two genes (Fig. 5A). The absolute distances between the LTRs and the transcription start sites (TSSs) of these eight genes were between 5 and 500 kb (Fig. 5B, C).

FIG. 5.

The genes associated with solo LTRs and GO analysis. (A) The number of associated genes per LTR. (B) Binned by orientation and distance to TSS. (C) Binned by absolute distance to TSS. Each Biological Process (D), Cellular Component (E), and Molecular Function (F) category is represented by a grey, black, and light grey bar, respectively. The height of the bar represents the number of IDs in the gene list and also in the category. TSS, transcription start site; GO, gene ontology.

Table 5.

The Associations Between Each Solo Long Terminal Repeat and the Gene(s) It Putatively Regulates According to the Association Rule Used

Locus	Provirus regions^a	Gene symbol	Full name	Related diseases
2q24.2	chr2 160986569–160987534	RBMS1 (−493258)	RNA-Binding Motif Single-Stranded Interacting Protein 1	Diffuse glomerulonephritis, arthrogryposis multiplex congenita 2, neurogenic type
2q24.2	chr2 160986569–160987534	TANK (−173245)	TRAF Family Member Associated NFKB Activator	Nipah virus encephalitis, open-angle glaucoma
Yq12	chrY 56926587–56927558	None	None	None
Xq28	chrX 155740067–155741038	TMLHE (−127617)	Trimethyllysine Hydroxylase, Epsilon	Autism X-linked 6, branched-chain keto acid dehydrogenase kinase deficiency
Xq28	chrX 155740067–155741038	SPRY3 (−27259)	Sprouty RTK Signaling Antagonist 3	Colorblindness, partial, protan series, Legius syndrome
2p13.3	chr2 68628859–68629823	PROKR1 (−14248)	Prokineticin Receptor 1	Hirschsprung disease 1, Kallmann syndrome
2p13.3	chr2 68628859–68629823	APLF (+161780)	Aprataxin And PNKP-Like Factor	Inhalation anthrax, anthrax disease
1p32.1	chr1 59558297–59559357	HOOK1 (−256034)	Hook Microtubule Tethering Protein 1	Water-clear cell adenoma, spermatogenic failure
1p32.1	chr1 59558297–59559357	FGGY (+261846)	FGGY Carbohydrate Kinase Domain Containing	Lateral sclerosis, spastic paraplegia 7, autosomal recessive

^a Chromosome: start–end (strand). Positions refer to the human genome assembly GRCh38/hg38.
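The counts and distance ranges summarized from Table 5 can be rechecked directly from its rows. The following Python sketch tallies the number of associated genes per solo LTR and bins the absolute distances to the TSS. The gene symbols and signed distances are copied from Table 5; the bin boundaries (5, 50, and 500 kb) and the helper name `distance_bin` are assumptions chosen to match the 5–500 kb range quoted in the text.

```python
from collections import Counter

# (solo LTR locus, gene symbol, signed distance to TSS in bp), from Table 5.
associations = [
    ("2q24.2", "RBMS1", -493258),
    ("2q24.2", "TANK", -173245),
    ("Xq28", "TMLHE", -127617),
    ("Xq28", "SPRY3", -27259),
    ("2p13.3", "PROKR1", -14248),
    ("2p13.3", "APLF", 161780),
    ("1p32.1", "HOOK1", -256034),
    ("1p32.1", "FGGY", 261846),
]
# All five solo LTRs, including the one with no associated gene (Yq12).
all_ltrs = ["2q24.2", "Yq12", "Xq28", "2p13.3", "1p32.1"]

# Number of associated genes per solo LTR.
genes_per_ltr = Counter(locus for locus, _, _ in associations)
counts = {locus: genes_per_ltr.get(locus, 0) for locus in all_ltrs}


def distance_bin(distance_bp: int) -> str:
    """Assign an absolute distance to an assumed bin (boundaries in bp)."""
    d = abs(distance_bp)
    if d < 5_000:
        return "0-5 kb"
    if d < 50_000:
        return "5-50 kb"
    if d < 500_000:
        return "50-500 kb"
    return ">500 kb"


bins = Counter(distance_bin(d) for _, _, d in associations)

print(counts)
print(dict(bins))
```

With the Table 5 values, four LTRs carry two genes each, Yq12 carries none, and all eight distances fall in the 5–50 kb or 50–500 kb bins.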

To analyze the biological classification of the genes associated with solo LTRs, gene ontology (GO) Slim summaries were generated using WebGestalt. Biological process (BP) analysis revealed that these genes were mainly enriched in metabolic processes and biological regulation (Fig. 5D). Cellular component (CC) analysis showed that these genes were significantly enriched in the cytosol, and molecular function (MF) analysis showed enrichment in protein binding (Fig. 5E, F).

All eight unique Entrez gene IDs were annotated to the selected functional categories and were present in the reference list, so all eight were used for the enrichment analysis. Based on the parameters set in the Materials and Methods section, 10 enriched categories were identified for BPs (Fig. 6A): carnitine biosynthetic process, positive regulation of DNA ligation, regulation of DNA ligation, amino acid betaine biosynthetic process, positive regulation of protein deubiquitination, single-strand break repair, regulation of protein deubiquitination, carnitine metabolic process, pentose metabolic process, and negative regulation of fibroblast growth factor receptor signaling pathway. In the bar chart of enrichment ratios, categories with an FDR ≤0.05 are drawn in a darker shade than those with an FDR >0.05. The volcano plot in Figure 6B shows the log2 of the FDR versus the enrichment ratio for all the functional categories in the database, highlighting the degree to which the significant categories emerge from the background.
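For readers unfamiliar with overrepresentation analysis (ORA), the enrichment ratio and the underlying test can be illustrated with a small Python sketch. It assumes the usual hypergeometric formulation of ORA; the function name `ora` and all input numbers are hypothetical and are not taken from the WebGestalt output of this study. Multiple-testing correction to obtain an FDR is not included.

```python
from math import comb


def ora(k: int, n: int, K: int, N: int) -> tuple[float, float]:
    """Overrepresentation of one category.

    k: genes of interest annotated to the category
    n: genes of interest in total
    K: reference genes annotated to the category
    N: reference genes in total
    Returns (enrichment ratio, one-sided hypergeometric p value).
    """
    expected = n * K / N          # overlap expected by chance
    ratio = k / expected          # observed / expected
    # P(X >= k) when drawing n genes without replacement from N.
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    p_value = tail / comb(N, n)
    return ratio, p_value


# Hypothetical example: 1 of 8 input genes falls in a category that
# contains 10 of 20,000 reference genes.
ratio, p_value = ora(k=1, n=8, K=10, N=20000)
print(f"enrichment ratio = {ratio:.1f}, p = {p_value:.4f}")
```

The example shows why very small gene lists can yield large enrichment ratios: a single hit in a small category is already far above the chance expectation, which is one reason such results call for cautious interpretation.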

FIG. 6.

Enrichment result categories binned by Biological Process. (A) The bar chart plots the enrichment results vertically with the bar width equal to the enrichment ratio in ORA. (B) Customizable volcano plot. The inset shows an initial layout for comparison. ORA, overrepresentation analysis.

In the volcano plot, the size and color of each dot are proportional to the number of overlaps (for ORA). The significantly enriched categories are labeled, with label positions set automatically by a force field-based algorithm. The enrichment results for CCs and MFs are illustrated in Supplementary Figures S1 and S2. Notably, these results are entirely speculative, and future research is needed to confirm any of the implied associations between solo LTRs and nearby genes.

In silico examination of the conserved transcription factor-binding sites

Specific base insertions in HML-8 LTRs may influence the complexity of LTR transcriptional regulation.²¹ A complete view of the putative transcription factor-binding sites within the HML-8 LTR is shown in Figure 7A. A total of 33 human transcription factor-binding sites were predicted, which included those of 21 transcription factors: HIF1A, SP1, SP2, SP4, STAT3, GATA2, GATA3, GATA4, MXI1, SOX10, RBPJ, KLF1, KLF5, KLF7, KLF12, PRDM1, CREB1, THAP1, MAZ, ZEB1, and THAP1. The motifs are indicated for both the sense and antisense strands.

FIG. 7.

In silico examination of the conserved TFBSs and logos representing the PBSs of HML-8. (A) Forward arrows indicate the sense strand, and reverse arrows indicate the antisense strand. Different transcription factors are marked with different colors. (B) PBS nucleotide sequence. PBS, primer-binding site; TFBSs, transcription factor-binding sites. Created at http://weblogo.berkeley.edu/logo.cgi

PBS features of HML-8 sequences

Traditionally, HERVs have been named according to the tRNA that binds their reverse transcriptase (RT) enzyme and PBS.¹² Thus, HERV-K is named after lysine tRNA. In the three proviral sequences and one consensus sequence of HML-8 analyzed, the PBS was located at approximately nucleotides 3–20 downstream of the 5′ LTR. To summarize the general variation of the PBS sequence within the HML-8 group, we generated a logo in which the letter height is proportional to the nucleotide conservation at each position (Fig. 7B). The results showed that the TGG starting nucleotides were the most conserved among the 18 bases analyzed. However, none of the PBSs corresponded to lysine tRNA, confirming the relatively low taxonomic value of this feature (Supplementary Table S4).
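The relationship between letter height and conservation in a sequence logo can be made concrete with a short Python sketch that computes the information content of each alignment column. The aligned fragments in `toy_pbs` are hypothetical and are not the HML-8 PBS sequences, and the calculation omits the small-sample correction that logo generators may apply.

```python
import math
from collections import Counter


def information_content(column: tuple[str, ...]) -> float:
    """Information content in bits of one DNA alignment column (max 2)."""
    counts = Counter(column)
    total = sum(counts.values())
    # Shannon entropy of the observed nucleotide frequencies.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2.0 - entropy


# Hypothetical aligned PBS fragments (illustration only).
toy_pbs = ["TGGCGC", "TGGCAC", "TGGTGC", "TGGCGA"]

# zip(*toy_pbs) iterates over alignment columns.
bits = [information_content(col) for col in zip(*toy_pbs)]

for position, value in enumerate(bits, start=1):
    print(f"position {position}: {value:.3f} bits")
```

In this toy alignment, the fully conserved TGG start reaches the 2-bit maximum, whereas positions with one variant among four sequences score lower, mirroring how a logo draws conserved positions taller.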

Discussion

Since the discovery that HERV-K family members play physiological and pathological roles in the human body, great attention has been dedicated to further understanding their impact on hosts. One remaining challenge is the lack of a complete and updated description of the HERV-K sequences in the human genome, of information on their genomic background, and of detailed knowledge of individual HERV-K members. To date, the identification and characterization of the HML-1 to -7 and HML-9 to -10 groups have been carried out.^{21,43,55–61} Among HML groups, HML-2 is the best characterized; its full genomic characterization has been performed, and its time of integration and subtypes have been investigated.²¹ The precise identification of these subtypes has led to studies of their roles in physiological processes, tumors, or neurological diseases.

In the present study, we completed the identification and characterization of HML-8 proviruses and solo LTRs in human DNA, providing the first exhaustive description of this group. Following the approach used in previous studies,^59,60 we characterized a total of 40 HERV-K HML-8 proviruses and 5 solo LTR elements. The chromosomal distribution of these proviruses and solo LTRs revealed a nonrandom integration pattern. In addition, the 40 proviral elements and 5 solo LTRs were enriched outside transcription units in the human genome (Tables 1 and 2),^54,62 being mainly distributed within intergenic regions and introns. The reason may be that integration of an HERV provirus into a transcription unit is harmful and therefore subject to negative selection and elimination during evolution.^{14,54,63–66}

Overall, the structural characterization revealed that only three HML-8 members (7.5%) retained an almost complete proviral structure. The majority of HML-8 elements show a defective proviral structure, with the complete loss of 5′ LTR (70%) and 3′ LTR (40%) sequences. Deletions of more than 40% of the reference sequence length in the gag, pro, pol, and env genes were found in 52.5%, 50%, 55%, and 30% of the elements, respectively.

Phylogenetic analysis showed that 3 sequences of HML-8 near-full-length proviruses as well as 12 sequences of gag, 15 elements of pro, 10 elements of pol, and 15 sequences of env form a unique monophyletic cluster distinctly separate from other HML groups and supported by the maximum bootstrap value. The phylogenetic tree of env regions revealed the presence of two well-supported clusters, identified here as types a and b and comprising 7 and 8 members, respectively.

For the estimation of integration time, the traditional approach based on the divergence between the two LTRs of the same provirus was not applicable because too few LTR sequences were available. Therefore, we estimated the age of the HML-8 proviruses using the gag, pro, pol, and env regions. The results indicated that the main period of HML-8 integration occurred between 23.5 and 52 mya.

To identify HML-8 orthologous sequences in different hominoids, we screened 9 chimpanzee sequences, 8 gorilla sequences, and 10 orangutan sequences belonging to the HML-8 subgroup. The results showed that almost all hominoid sequences clustered with the HML-8 consensus. The integration time analysis indicated that HML-8 sequences were integrated into the primate lineage between approximately 35 and 42 mya, during primate evolutionary speciation. This result suggests that HML-8 integrated earlier than HML-2, HML-6, HML-7, and HML-10, but later than HML-5.^{21,43,55–61,67}

A total of eight genes were identified as being potentially regulated by the five solo LTRs. GO analyses showed that these genes were mainly enriched in metabolic processes and biological regulatory processes, which indicated that these components may be involved in basic BPs. Through the prediction of transcription factors acting on HML-8 elements by JASPAR, many transcription factors were found to bind to the HML-8 sequence, indicating that HML-8 is likely to play a regulatory role in the expression of adjacent genes. It should be noted that these results are entirely prediction based. Experimental validation studies are required to confirm the associations between the LTRs and these genes.

Similar to previous work,^68,69 we further examined the PBSs of three proviral sequences and a consensus sequence. None of the PBSs corresponded to lysine tRNA. This confirms that the traditional nomenclature of HERVs according to the tRNA that binds their RT enzyme and PBS¹² is imprecise.

Conclusion

In summary, the present exhaustive characterization of HML-8 composition and the genomic context of its insertion constitutes the first description of this poorly investigated group of elements. These elements can be activated in different tissues both under physiological conditions and in human disease development. Our study could contribute to better defining their real impact and contributions to the human genome.

Ethics Approval and Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

Availability of Data and Materials

All data generated or analyzed during this study are included in this published article.

Footnotes

Acknowledgment

The authors thank Mingyue Chen for her help in data analysis.

Authors' Contributions

Research design: L.L. and C.Y. Performed the analysis: L.J., X.G., X.Z., H.L., J.H., X.W., Y.L., T.L., B.Z., Y.W., and J.L. Contributed to the composition of the article: M.L., L.J., and L.L.

Author Disclosure Statement

The authors declare that they have no competing interests.

Funding Information

This study was supported by NSFC (31900157).

Supplementary Material

Supplementary Dataset S1

Supplementary Figure S1

Supplementary Figure S2

Supplementary Table S1

Supplementary Table S2

Supplementary Table S3

Supplementary Table S4

References

1. Bannert, Kurth. Retroelements and the human genome: New perspectives on an old relation. Proc Natl Acad Sci U S A, 2004; 101(Suppl 2):14572–14579; doi: 10.1073/pnas.0404838101

2. Boller, Schönfeld, Lischer, et al. Human endogenous retrovirus HERV-K113 is capable of producing intact viral particles. J Gen Virol, 2008; 89(Pt 2):567–572; doi: 10.1099/vir.0.83534-0

3. Li, Gu, Wang, et al. Evolutionary analyses of the human genome. Nature, 2001; 409(6822):847–849.

4. Doolittle, Sapienza. Selfish genes, the phenotype paradigm and genome evolution. Nature, 1980; 284(5757):601–603.

5. Weinberg. Origins and roles of endogenous retroviruses. Cell, 1980; 22(3):643–644.

6. Temin HM. Origin and General Nature of Retroviruses. In: The Retroviridae. (Levy JA. ed.) Springer US: Boston, MA, 1992; pp. 1–18.

7. Bannert, Kurth. The evolutionary dynamics of human endogenous retroviral families. Annu Rev Genomics Hum Genet, 2006; 7:149–173; doi: 10.1146/annurev.genom.7.080505.115700

8. Johnson. Origins and evolutionary consequences of ancient endogenous retroviruses. Nat Rev Microbiol, 2019; 17(6):355–370; doi: 10.1038/s41579-019-0189-2

9. Gogvadze, Buzdin. Retroelements and their impact on genome evolution and functioning. Cell Mol Life Sci, 2009; 66(23):3727–3742; doi: 10.1007/s00018-009-0107-2

10. Xue, Zeng, Jia, et al. Identification of the distribution of human endogenous retroviruses K (HML-2) by PCR-based target enrichment sequencing. Retrovirology, 2020; 17(1):10; doi: 10.1186/s12977-020-00519-z

11. Blomberg, Benachenhou, Blikstad, et al. Classification and nomenclature of endogenous retroviral sequences (ERVs): Problems and recommendations. Gene, 2009; 448(2):115–123; doi: 10.1016/j.gene.2009.06.007

12. Cohen, Larsson. Human endogenous retroviruses. Bioessays, 1988; 9(6):191–196.

13. Jern, Sperber, Blomberg. Use of endogenous retroviral sequences (ERVs) and structural markers for retroviral phylogenetic inference and taxonomy. Retrovirology, 2005; 2:50.

14. Medstrand, van de Lagemaat, Mager. Retroelement distributions in the human genome: Variations associated with age and proximity to genes. Genome Res, 2002; 12(10):1483–1495.

15. Vargiu, Rodriguez-Tome, Sperber, et al. Classification and characterization of human endogenous retroviruses; mosaic forms are common. Retrovirology, 2016; 13:7; doi: 10.1186/s12977-015-0232-y

16. Sverdlov. Retroviruses and primate evolution. Bioessays, 2000; 22(2):161–171.

17. Medstrand, Blomberg. Characterization of novel reverse transcriptase encoding human endogenous retroviral sequences similar to type A and type B retroviruses: Differential transcription in normal human tissues. J Virol, 1993; 67(11):6778–6787; doi: 10.1128/JVI.67.11.6778-6787.1993

18. Andersson, Medstrand, Yin, et al. Differential expression of human endogenous retroviral sequences similar to mouse mammary tumor virus in normal peripheral blood mononuclear cells. AIDS Res Hum Retroviruses, 1996; 12(9):833–840; doi: 10.1089/aid.1996.12.833

19. Medstrand, Mager, Yin, et al. Structure and genomic organization of a novel human endogenous retrovirus family: HERV-K (HML-6). J Gen Virol, 1997; 78(Pt 7):1731–1744; doi: 10.1099/0022-1317-78-7-1731

20. Andersson, Lindeskog, Medstrand, et al. Diversity of human endogenous retrovirus class II-like sequences. J Gen Virol, 1999; 80(Pt 1):255–260; doi: 10.1099/0022-1317-80-1-255

21. Subramanian, Wildschutte, Russo, et al. Identification, characterization, and comparative genomic distribution of the HERV-K (HML-2) group of human endogenous retroviruses. Retrovirology, 2011; 8:90; doi: 10.1186/1742-4690-8-90

22. Blond, Lavillette, Cheynet, et al. An envelope glycoprotein of the human endogenous retrovirus HERV-W is expressed in the human placenta and fuses cells expressing the type D mammalian retrovirus receptor. J Virol, 2000; 74(7):3321–3329.

23. Blaise, de Parseval, Benit, et al. Genomewide screening for fusogenic human endogenous retrovirus envelopes identifies syncytin 2, a gene conserved on primate evolution. Proc Natl Acad Sci U S A, 2003; 100(22):13013–13018; doi: 10.1073/pnas.2132646100

24. Garcia-Montojo, Doucet-O'Hare, Henderson, et al. Human endogenous retrovirus-K (HML-2): A comprehensive review. Crit Rev Microbiol, 2018; 44(6):715–738; doi: 10.1080/1040841X.2018.1501345

25. Arru, Galleri, Deiana, et al. HERV-K modulates the immune response in ALS patients. Microorganisms, 2021; 9(8):1784; doi: 10.3390/microorganisms9081784

26. Mameli, Erre, Caggiu, et al. Identification of a HERV-K env surface peptide highly recognized in rheumatoid arthritis (RA) patients: A cross-sectional case-control study. Clin Exp Immunol, 2017; 189(1):127–131; doi: 10.1111/cei.12964

27. Arru, Mameli, Deiana, et al. Humoral immunity response to human endogenous retroviruses K/W differentiates between amyotrophic lateral sclerosis and other neurological diseases. Eur J Neurol, 2018; 25(8):1076-e84; doi: 10.1111/ene.13648

28. Chen, Meng, Gan, et al. The viral oncogene Np9 acts as a critical molecular switch for co-activating β-catenin, ERK, Akt and Notch1 and promoting the growth of human leukemia stem/progenitor cells. Leukemia, 2013; 27(7):1469–1478; doi: 10.1038/leu.2013.8

29. Chikuma, Yamanaka, Nakagawa, et al. TRIM28 expression on dendritic cells prevents excessive T cell priming by silencing endogenous retrovirus. J Immunol, 2021; 206(7):1528–1539; doi: 10.4049/jimmunol.2001003

30. Guo, Xiao, Wu, et al. WEE1 inhibition induces anti-tumor immunity by activating ERV and the dsRNA pathway. J Exp Med, 2022; 219(1):e20210789; doi: 10.1084/jem.20210789

31. Bannert, Hofmann, Block, et al. HERVs new role in cancer: From accused perpetrators to cheerful protectors. Front Microbiol, 2018; 9:178; doi: 10.3389/fmicb.2018.00178

32. Babaian, Mager. Endogenous retroviral promoter exaptation in human cancer. Mob DNA, 2016; 7:24; doi: 10.1186/s13100-016-0080-x

33. Thompson, Macfarlan, Lorincz. Long terminal repeats: From parasitic elements to building blocks of the transcriptional regulatory repertoire. Mol Cell, 2016; 62(5):766–776; doi: 10.1016/j.molcel.2016.03.029

34. Fuentes, Swigut, Wysocka. Systematic perturbation of retroviral LTRs reveals widespread long-range effects on human gene regulation. Elife, 2018; 7:e35989; doi: 10.7554/eLife.35989

35. Samuelson, Wiebauer, Snow, et al. Retroviral and pseudogene insertion sites reveal the lineage of human salivary and pancreatic amylase genes from a single gene during primate evolution. Mol Cell Biol, 1990; 10(6):2513–2520; doi: 10.1128/mcb.10.6.2513-2520.1990

36. Dunn, Medstrand, Mager. An endogenous retroviral long terminal repeat is the dominant promoter for human beta1,3-galactosyltransferase 5 in the colon. Proc Natl Acad Sci U S A, 2003; 100(22):12841–12846.

37. Montesion, Bhardwaj, Williams, et al. Mechanisms of HERV-K (HML-2) transcription during human mammary epithelial cell transformation. J Virol, 2018; 92(1):e01258-17; doi: 10.1128/JVI.01258-17

38. Hegyi. GABBR1 has a HERV-W LTR in its regulatory region—A possible implication for schizophrenia. Biol Direct, 2013; 8:5; doi: 10.1186/1745-6150-8-5

39. Kamp, Hirschmann, Voss, et al. Two long homologous retroviral sequence blocks in proximal Yq11 cause AZFa microdeletions as a result of intrachromosomal recombination events. Hum Mol Genet, 2000; 9(17):2563–2572; doi: 10.1093/hmg/9.17.2563

40. Cakmak Guner, Karlik, Marakli, et al. Detection of HERV-K6 and HERV-K11 transpositions in the human genome. Biomed Rep, 2018; 9(1):53–59; doi: 10.3892/br.2018.1096

41. Kent. BLAT—The BLAST-like alignment tool. Genome Res, 2002; 12(4):656–664.

42. Kent, Sugnet, Furey, et al. The human genome browser at UCSC. Genome Res, 2002; 12(6):996–1006.

43. Grandi, Pisano, Pessiu, et al. HERV-K(HML7) integrations in the human genome: Comprehensive characterization and comparative analysis in non-human primates. Biology (Basel), 2021; 10(5):439; doi: 10.3390/biology10050439

44. Hall. BioEdit: A user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser, 1999; 41:95–98.

45. Storer, Hubley, Rosen, et al. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob DNA, 2021; 12(1):2; doi: 10.1186/s13100-020-00230-y

46. Kumar, Stecher, Tamura. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Mol Biol Evol, 2016; 33(7):1870–1874; doi: 10.1093/molbev/msw054

47. Letunic, Bork. Interactive Tree Of Life (iTOL) v5: An online tool for phylogenetic tree display and annotation. Nucleic Acids Res, 2021; 49(W1):W293–W296; doi: 10.1093/nar/gkab301

48. Lebedev, Belonovitch, Zybrova, et al. Differences in HERV-K LTR insertions in orthologous loci of humans and great apes. Gene, 2000; 247(1–2):265–277; doi: 10.1016/s0378-1119(00)00062-7

49. McLean, Bristor, Hiller, et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol, 2010; 28(5):495–501; doi: 10.1038/nbt.1630

50. Liao, Wang, Jaehnig, et al. WebGestalt 2019: Gene set analysis toolkit with revamped UIs and APIs. Nucleic Acids Res, 2019; 47(W1):W199–W205; doi: 10.1093/nar/gkz401

51. Kearse, Moir, Wilson, et al. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics, 2012; 28(12):1647–1649; doi: 10.1093/bioinformatics/bts199

52. Crooks, Hon, Chandonia, et al. WebLogo: A sequence logo generator. Genome Res, 2004; 14(6):1188–1190; doi: 10.1101/gr.849004

53. Jühling, Mörl, Hartmann, et al. tRNAdb 2009: Compilation of tRNA sequences and tRNA genes. Nucleic Acids Res, 2009; 37(Database Issue):D159–D162; doi: 10.1093/nar/gkn772

54. Brady, Lee, Ronen, et al. Integration target site selection by a resurrected human endogenous retrovirus. Genes Dev, 2009; 23(5):633–642; doi: 10.1101/gad.1762309

55. Flockerzi, Burkhardt, Schempp, et al. Human endogenous retrovirus HERV-K14 families: Status, variants, evolution, and mobilization of other cellular sequences. J Virol, 2005; 79(5):2941–2949; doi: 10.1128/JVI.79.5.2941-2949.2005

56. Mayer, Meese. The human endogenous retrovirus family HERV-K(HML-3). Genomics, 2002; 80(3):331–343; doi: 10.1006/geno.2002.6839

57. Seifarth, Baust, Murr, et al. Proviral structure, chromosomal location, and expression of HERV-K-T47D, a novel human endogenous retrovirus derived from T47D particles. J Virol, 1998; 72(10):8384–8391; doi: 10.1128/JVI.72.10.8384-8391.1998

58. Lavie, Medstrand, Schempp, et al. Human endogenous retrovirus family HERV-K(HML-5): Status, evolution, and reconstruction of an ancient betaretrovirus in the human genome. J Virol, 2004; 78(16):8788–8798; doi: 10.1128/JVI.78.16.8788-8798.2004

59. Pisano, Grandi, Cadeddu, et al. Comprehensive characterization of the human endogenous retrovirus HERV-K(HML-6) group: Overview of structure, phylogeny, and contribution to the human genome. J Virol, 2019; 93(16):e00110-19; doi: 10.1128/JVI.00110-19

60. Grandi, Cadeddu, Pisano, et al. Identification of a novel HERV-K(HML10): Comprehensive characterization and comparative analysis in non-human primates provide insights about HML10 proviruses structure and diffusion. Mob DNA, 2017; 8:15; doi: 10.1186/s13100-017-0099-7

61. Jia, Liu, Yang, et al. Comprehensive identification and characterization of the HERV-K (HML-9) group in the human genome. Retrovirology, 2022; 19(1):11; doi: 10.1186/s12977-022-00596-2

62. Skaletsky, Kuroda-Kawaguchi, Minx, et al. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature, 2003; 423(6942):825–837.

63. Smit. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev, 1999; 9(6):657–663.

64. van de Lagemaat, Medstrand, Mager. Multiple effects govern endogenous retrovirus survival patterns in human gene introns. Genome Biol, 2006; 7(9):R86.

65. Maksakova, Romanish, Gagnier, et al. Retroviral elements and their hosts: Insertional mutagenesis in the mouse germ line. PLoS Genet, 2006; 2(1):e2.

66. Cutter, Good, Pappas, et al. Transposable element orientation bias in the Drosophila melanogaster genome. J Mol Evol, 2005; 61(6):733–741; doi: 10.1007/s00239-004-0243-0

67. Hanke, Hohn, Bannert. HERV-K(HML-2), a seemingly silent subtenant—But still waters run deep. APMIS, 2016; 124(1–2):67–87; doi: 10.1111/apm.12475

68. Jern, Stoye, Coffin. Role of APOBEC3 in genetic diversity among endogenous murine leukemia viruses. PLoS Genet, 2007; 3(10):2014–2022; doi: 10.1371/journal.pgen.0030183

69. Kuraguchi, Ohene-Baah, Sonkin, et al. Genetic mechanisms in Apc-mediated mammary tumorigenesis. PLoS Genet, 2009; 5(2):e1000367; doi: 10.1371/journal.pgen.1000367
