Abstract
Abstract
Sarcoidosis is a multifactorial systemic disease characterized by granulomatous inflammation and greatly impacting on global public health. The etiology and mechanisms of sarcoidosis are not fully understood. Recent high-throughput biological research has generated vast amounts of multi-omics big data on sarcoidosis, but their significance remains to be determined. We sought to identify novel candidate regions, and genes consistently altered in heterogeneous omics studies so as to reveal the underlying molecular mechanisms. We conducted a comprehensive integrative literature analysis on global data on sarcoidosis, including genomic, transcriptomic, proteomic, and phenomic studies. We performed positional integration analysis of 38 eligible datasets originating from 17 different biological layers. Using the integration interval length of 50 kb, we identified 54 regions reaching significance value p ≤ 0.0001 and 15 regions with significance value p ≤ 0.00001, when applying more stringent criteria. Secondary literature analysis of the top 20 regions, with the most significant accumulation of signals, revealed several novel candidate genes for which associations with sarcoidosis have not yet been established, but have considerable support for their involvement based on omic data. These new plausible candidate genes include NELFE, CFB, EGFL7, AGPAT2, FKBPL, NRC3, and NEU1. Furthermore, annotated data were prepared to enable custom visualization and browsing of these sarcoidosis related omics evidence in the University of California Santa Cruz (UCSC) Genome Browser. Further multi-omics approaches are called for sarcoidosis biomarkers and diagnostic and therapeutic innovation. Our approach for harnessing multi-omics data and the findings presented herein reflect important steps toward understanding the etiology and underlying pathological mechanisms of sarcoidosis.
Introduction
S
Sarcoidosis is thought to result from both genetic susceptibility and specific environmental triggers (Chen and Moller, 2011). Familial clustering (Rybicki et al., 2001), increased concordance in monozygotic twins (Sverrild et al., 2008), and large regional and racial variations in the disease severity and prevalence suggest a strong genetic component to the disease. Various studies have confirmed genetic predisposition to sarcoidosis, with allelic variation in human leukocyte antigen HLA-DBR1 as a major susceptibility factor (Rossman et al., 2003).
In recent years, high-throughput technologies have facilitated comprehensive analyses at various biological levels, including genomic, transcriptomic, and proteomic levels. Consequently, many novel genetic risk loci and susceptibility genes for sarcoidosis have been proposed, indicating the complex genetic and pathophysiological architecture of the disease (Hofmann et al., 2008).
Interpretation and functional characterization of omics results are challenging, due to low reproducibility of identified findings and a high possibility of false-positive results. In addition, current research in sarcoidosis has mainly been focused on alterations at each molecular level separately. Integration of significant alterations detected at various biological layers, as well as an integration of data from multiple studies conducted at the same biological level, can identify novel susceptibility factors more reliably by reducing the rate of false-positive results and by recognizing previously undetected alterations with low effects on individual levels.
We have previously reported an initial integrative analysis of alterations detected at different omics levels for sarcoidosis, where locations of genes with altered transcriptomic or proteomic profiles were plotted along the chromosomal maps and inspected for co-location with regions associated with sarcoidosis in linkage studies (Maver et al., 2009). Since then, a number of omics studies on sarcoidosis have been published, but only a few have conducted integration of omics data. Global mRNA and microRNA (miRNA) expression was investigated by Maertzdorf et al. (2012), while Crouser et al. (2012) investigated integrated transcriptomic miRNA data from two different tissues, reporting that expression profiles of blood and diseased tissues are very different. Rosenbaum et al. (2015) investigated gene expression in three different tissues, showing that common pathogenetic mechanism contributes to sarcoidosis in different sites. However, studies were conducted on a relatively small number of patients, and a comprehensive integration of heterogeneous omics data in sarcoidosis is still lacking.
The aim of the present study was to review the literature for global data on genomic, transcriptomic (mRNA and miRNA), and proteomic alterations associated with sarcoidosis. Next, we performed a comprehensive positional integratomic analysis using Integratomics tool (Maver and Peterlin, 2011). We systematically collected a large number of studies carried out on various biological levels and in different tissues, to highlight the alterations consistently identified as significant. Using this approach, we identified novel candidate genomic regions and genes that may be involved in the pathogenesis of sarcoidosis.
Materials and Methods
PubMed database was searched up to April 2017 using the following keywords: sarcoidosis, and transcriptome* or proteome* or microarray or expression arrays or profiling or genome-wide or linkage. Furthermore, we searched Gene Expression Omnibus (GEO) (Edgar et al., 2002) and ArrayExpress public repositories (Kolesnikov et al., 2015) using the keyword sarcoidosis for raw expression data from transcriptomic (mRNA) and miRNAomic studies. Studies investigating biological alterations on the whole genomic, transcriptomic, miRNAomic, or proteomic level were included in the integrative analysis. Studies investigating defined sets of genes (data obtained by Immunochip (Fischer et al., 2015)) or specific genomic areas were excluded from further analysis to avoid bias due to a more frequent investigation of specific regions.
Data on statistically significant alterations (p-values <0.05), such as single-nucleotide polymorphisms (SNPs) associated with sarcoidosis, regions of genomic linkage disequilibrium in sarcoidosis, differentially expressed (DE) mRNA or miRNAs, and DE proteins, were extracted from each dataset, supplementary material, or published article. p-Values before correction for multiple testing were considered. For each type of the study, specific value of −log10p was assigned to each significant signal, except in the case where significance values (p-values) were not available (nine proteomic studies and one microRNA study); in such cases, all of the genes were assigned arbitrary value 1 and have been further treated equally as −log10p-values.
Genome-wide association studies
We included statistically significant polymorphisms (p-values below 0.05) of six studies, including European (German), African-American, and European American populations. Where total data from screening stage were not available, polymorphisms selected for validation stage were included (Fischer et al., 2012; Hofmann et al., 2008).
Linkage analysis
Genomic coordinates of markers were obtained from University of California Santa Cruz (UCSC) Genome browser site (http://genome.ucsc.edu) (Schurmann et al., 2001). Next, the length of the susceptibility regions was calculated taking into account the resolution of each study. The resolution was calculated by dividing the length of the human genome assembly by the number of markers investigated. Altogether 21 regions from three studies (Fischer et al., 2010; Iannuzzi et al., 2005; Schurmann et al., 2001) were significantly associated with sarcoidosis and were included into the integrative analysis.
Transcriptomics
Raw gene expression data were obtained from GEO repository using GEOquery package for R (Bioconductor) (Davis and Meltzer, 2007). We included studies comparing expression profiles between patients with clearly defined sarcoidosis and healthy individuals. Studies that compared expression profiles between different forms of sarcoidosis, such as the study by Lockstone et al. (2010) (GSE19976), where self-limiting pulmonary sarcoidosis was compared to progressive fibrotic pulmonary sarcoidosis, were excluded from analysis. Study with GEO accession number GSE2657 was excluded due to a low number of investigated sarcoid samples (n = 2). In the study of Bloom et al. (2013) (GSE42834) training set of patients and controls was included in the integration analysis.
Finally, data from 11 studies containing 12 transcriptomic datasets were included in the integration analysis. Among them, six datasets were obtained from studies investigating changes in blood (GSE34608, GSE19314, GSE18781, GSE1907, GSE42834, and GSE37912) and one dataset from each of the following studies: expression in lung tissue (GSE16538), skin tissue (GSE32887), CD4+ T-cells isolated from peripheral whole blood samples (GSE56998), lacrimal gland tissue (GSE58331), and anterior orbit tissue (GSE58331). All of the studies were performed using expression profiling by an array.
MicroRNA
Data from four sources containing five datasets were included in the analysis. Data were extracted from GEO database (GSE26409, GSE31568, and GSE34608) and from supplementary material in one case (Crouser et al., 2012). Three studies investigated expression in blood, one expression in lung tissue, and one in peripheral blood mononuclear cells (PBMCs). Appropriate annotations were searched using UCSC Genome browser (Kent et al., 2002), where necessary.
Proteomics
We have included data from 13 studies, investigating changes in proteomic profiles in serum, bronchoalveolar lavage fluid (BALF), alveolar macrophages, and BALF exosomes. Altogether 201 unique genes coding for DE proteins were collected using HGNC (HUGO Gene Nomenclature Committee) nomenclature.
Phenotype associations
Phenotype gene panels were generated using Phenotype Panel Generator (www.kimg.eu/generator/), by entering clinical signs and symptoms using human phenotype ontology (HPO) nomenclature (Kohler et al., 2017). HPOs associated with sarcoidosis were obtained from Online Mendelian Inheritance in Man (OMIM) database (MIM identifier:181000) and by additional literature search. Genes with matching phenotypic traits were scored higher. Data included 482 genes associated with clinical phenotypes of sarcoidosis.
Analytical steps were performed in the R statistical environment, version 3.3.3 using Bioconductor version 3.4, and integration analysis of heterogeneous omics data was performed using Integratomics tool, based on genomic localization of identified alterations (Maver and Peterlin, 2011). Reactome pathway enrichment analysis for top-ranked genes (Table 2) was performed using the Reactome pathway analysis tool (https://reactome.org/PathwayBrowser/).
Finally, genes located in the top regions were searched in PubMed database to exclude genes previously associated with sarcoidosis.
Annotation data included in the integration analysis was visualized in UCSC Genome Browser (Kent et al., 2002) (Custom Tracks), using hg19 genome assembly. We designed Browser Extensible Data (BED) file format of the input data to enable a custom view of altered regions across the genome. Furthermore, we designed BED file, where each tissue was treated as a separate level, to gain additional insights into transcriptomic (mRNA and miRNA) and proteomic alterations in different tissues.
Results
Literature search
In the present study, we systematically collected heterogeneous omics data investigating sarcoidosis, including genomic (Genome-wide association studies [GWASs] and linkage), transcriptomic (mRNA and miRNA), and proteomic datasets, obtained from several different tissues. Altogether, 38 multi-omics datasets eligible for the inclusion were extracted from 35 studies, which resulted in 147,719 statistically significant signals. We obtained 4098 significantly altered SNPs from GWASs, 21 regions from linkage analyses, 142,608 signals from transcriptomic (mRNA) studies, 763 from miRNA studies, and 229 genes encoding DE proteins from proteomic studies. In addition, 482 genes associated with the clinical phenotype of sarcoidosis were included as a separate dataset (Supplementary Table S2) (Table 1).
BALF, bronchoalveolar lavage fluid; PBMCs, peripheral blood mononuclear cells; GWAS, genome-wide association study; HPO, human phenotype ontology; BAL, bronchoalveolar.
Finally, 17 various study types were included into integration analysis, when taking into account tissue heterogeneity of data: GWAS, linkage analysis, blood transcriptome, lung transcriptome, skin transcriptome, bronchoalveolar (BAL) cell transcriptome, CD4 cell transcriptome, lacrimal gland transcriptome and anterior orbit transcriptome, miRNA expression in blood, miRNA expression in lung tissue, miRNA expression in PBMCs, BALF proteomics, serum proteomics, soluble proteome and membrane-associated fraction of alveolar macrophage proteomics, and BALF exosome proteomics. Data on study characteristics, including first author, year of publication, PMID and GEO accession numbers, the method used, investigated tissue, number of cases and controls, and available data on clinical phenotype of the patients, are presented in Supplementary Table S1.
Integration analysis
Positional integration was conducted using Integratomics tool, based on the positional integratomic approach, mapping heterogenic genomic data to the coordinate system of the human genome. We performed analysis using integration interval length of 50 kb and 1000 permutations. A permutation test was also used to estimate significance values of accumulation of signals from studies of the similar type.
Initially, following layers were included in the integration: GWAS, linkage, transcriptomics (mRNA), transcriptomics (miRNA), proteomics, and phenotype associations. Genome-wide distribution of significance of integrated regions is presented in Figure 1. The most significant peak of the signals was observed in HLA region on chr6. We identified 54 regions (with integration score of at least 4, p-value ≤0.0001), containing 176 potential candidate genes for sarcoidosis (Supplementary Table S3). When applying more stringent criteria, we identified 15 regions with significance values below 0.00001, containing 62 genes. Among them, seven regions (46.7%) were located at chr6, two regions at chr11, and one region at each of the chromosomes 1, 3, 5, 9, 12, and 14.

The genome-wide distribution of significance values for integration regions. Each dot represents 50 kb region. The height of the dots represents the integration score (−log10p-value).
Overview of the most significant regions
The two most significant regions were located at chr6 (chr6:31900000–31949999 and chr6:32800000–32849999) with −log10p-value of 7.5 and detected alterations at four different biological layers.
In the first region, containing Ensembl genes MIR1236, NELFE, DOM3Z (DXO), SKIV2L, CFB, C2, and STK19, we detected alterations at GWAS, linkage, transcriptomic (mRNA), and proteomic level. GWAS signals of this region mainly accumulated in NELFE, SKIV2L, and STK19 genes, while the linkage signal was spread across the entire 50 kb region. With extended analysis by separate tissues, significant transcriptional (mRNA) signals were detected in blood (by five independent studies), in lungs, BAL, lacrimal gland, anterior orbit, and skin tissue. The most significant peak of accumulation of transcriptomic signals was observed in the region of CFB and C2 genes. Proteomic signals were contributed by the CFB gene, the product of which was altered in BALF and serum of sarcoidosis patients.
The second highest ranked region contained PSMB8, PSMB9, TAP1, and TAP2 genes and detected alterations at GWAS, linkage, transcriptomics (mRNA), and phenotypic level. The detected GWAS signal was located in TAP2 gene, and the linkage signal was extended across the entire 50 kb region. Transcriptomic (mRNA) signals were detected in six various tissues; in blood by four independent studies and by one study in each of the following tissues: lungs, BAL, lacrimal gland, anterior orbit, and skin. Phenotype association signal was observed in PSMB8 gene.
Another region located at chr6 and detected significant alterations at four biological levels followed. The region was partly overlapping with the first region and contained an additional C4A gene. The phenotype signal in this region was attributed to the SKIV2L gene, which is known to be associated with hepatomegaly, a possible clinical feature of sarcoidosis.
The region at chr9, containing EGFL7, MIR126, and AGPAT2 genes, reached an integration score of 5.9. The region was highly ranked due to linkage and transcriptomic (mRNA and miRNA) signals and additional gene-phenotype associations. Transcriptomic (mRNA) changes in blood and lacrimal gland were accumulated in the region of EGFL7 and MIR126 genes, which further overlapped with the alterations in the expression of miRNA in blood, detected by two independent studies. Significant accumulation of transcriptomic signals was detected in AGPAT2 gene region, where signals from four different tissues, including blood (detected by five studies), CD4 cells, lungs, and anterior orbit, overlapped. The known association of AGPAT2 gene with hepatomegaly and splenomegaly contributed to the additional high scoring of this region.
The accumulation of significant alterations at the four highest ranked regions is presented in Supplementary Figures S1–S3.
Secondary literature analysis
Novel candidate genes were searched for in regions with the highest integration values. We excluded genes previously investigated in association with sarcoidosis, such as TNF (Xie et al., 2014), LTA (Song et al., 2014), LTB, C4A (Wennerstrom et al., 2012), PSMB8, PSMB9, and TAP2 (Foley et al., 1999). We identified several novel candidate genes, including MIR1236, NELFE, C2, CFT, EGFL7, MIR126, AGPAT2, FKBP, NRC3, and NEU1. Reactome enrichment analysis was performed for 42 genes (Supplementary Table S4). Results showed that genes were enriched in 350 pathways; top 50 enriched pathways are presented in Supplementary Table S5. The main biological processes and metabolic pathways included immune system, transport of small molecules, metabolism of proteins, cell cycle, and programmed cell death.
Data availability
We have made the data available for browsing in the form of UCSC browser custom track set. The data are available for browsing at http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Keli&hgS_otherUserSessionName=sarcoidosis.
Discussion
Sarcoidosis is a multifactorial systemic inflammatory disease of unknown etiology with a heterogeneous clinical course, characterized by the presence of noncaseating epithelium granulomas in one or more organs. Investigation of its largely unexplained etiology and underlying pathological mechanisms remains a challenge and, therefore, requires the use of new approaches, such as the integration of multi-omics data.
In recent years, technological advances at various omics levels enabled the discovery of novel biomarkers and candidate genes for pathological mechanisms of sarcoidosis, on global genomics, transcriptomics, and proteomic levels. Interpretation of large amounts and different types of omics data, commonly generated using hypothesis-free approaches, is challenging, due to high risk of false-positive or false-negative findings and a low number of biological or technical replicates.
Currently, there is a lack of integration studies published on sarcoidosis, investigating global biological alterations on different biological layers. In 2009 our team published the first integration study on sarcoidosis, based on the plotting of transcriptomic and proteomic signals to the corresponding chromosomal positions (Maver et al., 2009). Recently, Rivera et al. (2016) performed integrative analyses of genetic, transcription, and pathway modeling on different phenotypic forms of sarcoidosis; however, only genes associated with immune response were included in the analysis.
In the present study, we conducted a comprehensive search of published genomic, transcriptomic (mRNA and miRNA), and proteomic data, aiming to search for novel candidate regions and genes associated with sarcoidosis, which were significantly altered at various biological levels simultaneously. These regions present candidate genes with greater reliability than single layer analysis and enable more comprehensive understanding of pathological processes between molecular systems (Maver and Peterlin, 2011).
A total of 38 eligible datasets, containing 17 different biological layers, were included in the integration analysis. By positional integration analysis of the heterogeneous omics data, we identified 15 regions with an integration score of at least 5, containing 62 genes, and 54 regions with an integration score of at least 4, containing 176 potential candidate genes for sarcoidosis. Most of these genes were located at HLA region at chr6p21, which is in agreement with previously reported susceptibility regions and pathological processes associated with sarcoidosis (Rossman et al., 2003).
Among genes detected in top regions, we observed genes with a well-supported role in sarcoidosis. This includes TNF, LTA, TAP2, PSMB8, PSMB9, and C4A, which support the feasibility of integrative analysis in detecting true positive biological signals. For example, TNF gene, encoding pro-inflammatory cytokine, has known important role in the pathogenesis of the disease and the formation and maintenance of granuloma, as well as in the immune response (Xie et al., 2014). We have confirmed that TNF gene is significantly altered on genomic (GWAS, linkage), transcriptomic (mRNA) level, and using gene–phenotype associations.
Top 20 regions (with the highest integration scores) were selected for more precise investigation of the genes. The most significant accumulation of signals was observed at chr6:31900000–31949999 region, containing MIR1236, NELFE, DOM3Z, SKIV2L, CFB, C2, and STK19 genes and at chr6:32800000–32849999 region, containing PSMB8, PSMB9, TAP1, and TAP2 genes.
MIR1236 is an essential part of NELF complex, which negatively regulates the elongation of transcription by RNA polymerase II. The gene has previously not been associated with sarcoidosis. It has been shown that miR-1236 is upregulated in primary human lymphatic endothelial cells in response to IL-1β and may function as negative regulator of VEGFR-3 signaling during inflammatory lymphangiogenesis (Jones et al., 2012). Findings on lymphangiogenesis were previously reported in patients with pulmonary sarcoidosis in the study by Kambouchner et al. (2011), where sarcoid granulomas were found to be more strongly associated with lymphatic vessels than with blood vessels. miR-1236 may also be involved in ulcerative colitis inflammatory activity by downregulating the expression of RORγ (Zhang et al., 2015). Recently, it has been shown that miR-1236 regulates hypoxia-induced epithelial-mesenchymal transition and metastasis through repressing HDAC3 and SENP1 expression (Chen et al., 2016).
In the identified region, we observed significant changes at four levels, including GWAS, linkage, transcriptomic (mRNA), and proteomic level. When we visualized integration results for MIR1236 gene region in the UCSC genome browser, we observed that differential expression of mRNA was detected in blood by three independent transcriptomic studies (Maertzdorf et al., 2012; Rosenbaum et al., 2009; Zhou et al., 2012) and in lacrimal gland tissue by one study (Rosenbaum et al., 2015). Furthermore, in the close proximity of the MIR1236 gene differential expression of mRNA was also observed in skin, anterior orbit, lung tissue, and in BALF (Supplementary Fig. S1).
One of the most common ocular manifestations in patients with sarcoidosis is uveitis (20–50%) (Jamilloux et al., 2014). Complement component 2 (C2) and factor B (CFB) are regulators of the complement system, activation of which has been reported to play an essential role in the pathogenesis of ocular autoimmunity (Manickam et al., 2011). C2/CFB locus has previously been associated with susceptibility to macular degeneration and noninfectious intermediate and posterior uveitis, particularly in patients with Vogt–Koyanagi–Harada disease, which is also characterized by granulomatous inflammation (Yang et al., 2016). Moreover, treatment with CFB antibody has been shown to result in the suppression of uveitis in animal models (Manickam et al., 2011).
In addition to linkage and several transcriptomic signals of this region, alterations of CFB protein profile in patients with sarcoidosis were detected by two independent proteomic studies, conducted in serum and in BAL (Landi et al., 2015; Sabounchi-Schütt et al., 2003).
TAP1 gene codes for the membrane-associated protein, a member of the superfamily of ATP-binding cassette transporters, which is involved in the transport of antigens from the cytoplasm to the endoplasmic reticulum for association with major histocompatibility complex (MHC) class I molecules. Pathogenic homozygous variants in TAP1 gene represent an established cause of autosomal recessive bare lymphocyte syndrome (BLS), type1 (OMIM:604571). Patients with BLS may present with granulomatous skin lesions and bronchiectasis, clinical features commonly observed in sarcoidosis patients (Gadola et al., 2000). Although the TAP1 gene has previously been investigated in sarcoidosis, the conclusive association with sarcoidosis has not yet been established (Foley et al., 1999).
Furthermore, genes of this 50-kb region were previously associated with susceptibility to other autoimmune disorders; PSMB8 with hypersensitivity pneumonitis (Camarena et al., 2010), PSMB9 with psoriasis and vitiligo (Casp et al., 2003), and TAP1 and TAP2 with vitiligo (Casp et al., 2003).
PSMB8 and PSMB9 genes are part of class II region of the MHC and part of multicatalytic proteasome complex. In the input study of Gharib et al. (2016), authors reported that immunoproteasomes (PSMB8 and PSMB9) were overexpressed in BAL cells from patients with sarcoidosis. It was previously shown that patients with active sarcoidosis but not healthy controls demonstrate immunoproteasome in the lung tissue and in granulomas, suggesting the role of the proteasomal system in the mediation and sustaining of putative immune response in sarcoidosis (Sixt et al., 2014). In addition, pathogenic variants of PSMB8 gene represent an established cause of autosomal recessive autoinflammation, lipodystrophy, and dermatosis syndrome (OMIM:256040).
The high integration score of this region is supported by significant alterations at genomic (GWAS and linkage), transcriptomic, and phenotypic level. Clinical symptoms and signs, associated with PSMB8 gene, included elevated erythrocyte sedimentation rate, arthralgia, hepatomegaly, and splenomegaly. Differential expression of mRNA in this genomic region was detected in blood, BAL, lacrimal gland, anterior orbit, lung, and skin tissues (Supplementary Fig. S2).
The region at chr9 (chr9:139550000–139599999), containing EGFL7, MIR126, and AGPAT2 genes, reached integration score of 5.9. These three genes have not previously been investigated in sarcoidosis. EGFL7 (epidermal growth factor-like domain 7) gene is mostly expressed in endothelial cells and is involved in endothelial cell activation during inflammation. In was shown that pro-inflammatory cytokine TNFα (tumor necrosis factor alpha) strongly represses EGFL7 expression in endothelial cells in vitro. In addition, it was shown that EGFL7 expression is strongly repressed in mouse lung endothelial cells during lipopolysaccharide (LPS)- and TNFα-induced inflammation in vivo (Pinte et al., 2016). The seventh intron of the EGFL7 gene contains a miRNA site for miR-126. miR-126 may play roles in ulcerative colitis inflammatory activity by downregulating the expression of IKBA, an important inhibitor of NF-kappa B signaling pathway signaling pathway (Feng et al., 2012).
The genomic region containing EGFL7, MIR126, and AGPAT2 genes was significantly altered at four biological levels, including genomic (linkage), transcriptomic (mRNA), miRNAomic, and phenotypic level (Supplementary Fig. S3). When inspecting the expression profile, using UCSC genome browser, we observed that mRNA differential expression was detected in blood by six independent input studies and by one study in each of the following tissues: lacrimal gland, CD4 blood cells, lung tissue, and anterior orbit. Differential expression of miRNA was detected in blood by two independent studies. Furthermore, on the phenotypic level, we observed a signal in AGPAT2 gene, which is associated with splenomegaly and hepatomegaly, clinical signs frequently observed in sarcoidosis patients.
FKBPL gene is located at chr6 and encodes a protein with similarity to the immunophilin protein family, which plays a role in immunoregulation and basic cellular processes involving protein folding and trafficking. It was demonstrated that secreted FKBPL has potent antiangiogenic activity (Valentine et al., 2011). The 50-kb region was significantly altered at genomic (GWAS and linkage) and transcriptomic (mRNA) level, including lacrimal gland, anterior orbit, skin, and BAL tissues. The gene was previously not associated with sarcoidosis, but was recently identified as significant in patients with Löfgren syndrome by Rivera et al. in the Immunochip analysis (Rivera et al., 2016).
In addition, we detected several other plausible candidate genes reaching high integration scores, which are presented in Table 2. LST1 and NRC3 genes, located at chr6, may play a role in autoimmune inflammation and in infectious disease (Mulcahy et al., 2006). APOA1 gene, located at chr11, was significantly alerted at transcriptomic (mRNA), proteomic, and phenotypic level. The significant alteration was detected by four independent proteomic studies conducted in three different tissues: serum (Sabounchi-Schutt et al., 2004), BALF (Landi et al., 2015; Wattiez et al., 2000), and BALF exosomes (Martinez-Bravo et al., 2017).
Integration score represents −log10p estimates of p-values obtained through permutation.
Partly overlapping regions.
Partly overlapping regions.
NEU1 gene (neuraminidase 1 gene), located at chr6, encodes an enzyme that removes terminal sialic acid from glycoproteins. The gene was previously not associated with sarcoidosis, but was reported to be involved in pathophysiological mechanism associated with pulmonary fibrosis (Luzina et al., 2016). Previous data indicate that elevated expression of NEU1 sialidase promotes pulmonary collagen deposition, lymphocytosis, and fibrosis. We showed that NEU1 was significantly altered at genomic (GWAS, linkage), transcriptomic, and phenotypic level. Expression of mRNA was significant in several tissues, including blood, lung tissue, and BAL.
SLC44A4 gene, located in the same region, was reported in association with sarcoidosis in the input GWAS by Adrianto et al. (2012). Recently, the gene has also been implicated in fibrotic idiopathic interstitial pneumonia in genome-wide imputation study (Fingerlin et al., 2016).
Our study has some limitations. First, sarcoidosis is a disease with a heterogeneous clinical presentation, characterized by diverse signs and symptoms, affecting various organ systems. We did not limit our search on specific types of the disease, which might vary in their genetic susceptibility, genomic distributions, and cellular activities (Rivera et al., 2016). Second, data from some of the GWAS were not fully available, and consequently, SNPs selected for validation stage (based on their value of significance) were included in the analysis.
Conclusions
We systematically analyzed and integrated multi-omics data, genomic, transcriptomic, proteomic, and phenotypic, on sarcoidosis. Using the positional integratomic approach of omics data, we identified novel candidate regions and revealed several genes that might be involved in the pathogenesis of sarcoidosis. We suggest that our approach for harnessing multi-omics data and the findings presented herein reflect an important step in the understanding of etiology and underlying pathological mechanisms of sarcoidosis.
Footnotes
Author Disclosure Statement
The authors declare that no competing financial interests exist.
Acknowledgment
This study was funded by the Slovenian Research Agency (ARRS), grant no. P4-0220 and P3-0326.
Abbreviations Used
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
