Abstract
OBJECTIVE:
This research aims to identify key biomarkers and characterize immunological infiltration in idiopathic pulmonary fibrosis (IPF) through bioinformatics analysis.
METHODS:
From the GEO database, 12 gene expression profiles were obtained. The LIMMA package in Bioconductor was used to identify differentially expressed genes (DEGs), and functional enrichment analyses were performed. A protein-protein interaction (PPI) network was constructed using STRING and Cytoscape, and a modular analysis was performed. Immunological infiltration of lung tissue was compared between the IPF and healthy groups using the CIBERSORTx method.
RESULTS:
A total of 11,130 differentially expressed genes (7,492 up-regulated and 3,638 down-regulated) were found. According to functional enrichment analysis, the selected up-regulated DEGs were mainly involved in the progression of pulmonary fibrosis, whereas the selected down-regulated DEGs help maintain the relative stability of the intracellular microenvironment. KEGG enrichment analysis revealed that up-regulated DEGs were primarily enriched in the PI3K-Akt signaling pathway, whereas down-regulated DEGs were associated with cancer pathways. Analysis of the PPI network identified the most significant modules, which involved 8 hub genes. IPF lung tissue had a greater proportion of memory B cells, plasma cells, follicular helper T cells, regulatory T cells, gamma delta T cells, M0 macrophages and resting mast cells, and a relatively lower proportion of resting CD4 memory T cells, resting NK cells and neutrophils.
CONCLUSION:
This research identifies hub genes and differences in immunological infiltration between IPF and healthy lung tissue.
Keywords
Introduction
IPF is a chronic, maladaptive response to minor injuries, accompanied by proliferation of fibroblasts and excessive deposition of extracellular matrix, resulting in aberrant remodeling of the lungs. It is a progressive and fatal lung condition [1]. Patients with IPF respond poorly to therapy and have a poor prognosis. They often die within two to three years of diagnosis, and around 40% of patients survive for five years [2, 3]. To date, the pathophysiology of idiopathic pulmonary fibrosis has not been fully elucidated, and no effective therapies have yet emerged clinically to alleviate patients’ symptoms [4]. Many researchers have conducted extensive studies aiming to reduce mortality among IPF patients and improve their quality of life, but progress has been slow. The pathogenic mechanism of IPF is still unclear, and few clinical indicators exist for diagnosis and prognosis, which is the fundamental reason for the poor outcome of IPF treatment [5]. Inaccurate diagnosis and limited clinical treatment lead to poor prognosis and high mortality, which makes the development of new and effective treatments for IPF an urgent problem for researchers. An in-depth understanding of the pathogenesis of IPF and the identification of related biomarkers and mechanisms set the direction for future research and drug development for idiopathic pulmonary fibrosis, and may lead to emerging therapies for its effective treatment [6]. With the rapid development and wide application of bioinformatics, researchers have discovered many differentially expressed genes in diseases. At present, screening the large number of differentially expressed genes for those that play a decisive role in disease is a trend in the study of disease pathogenesis [7].
Differentially expressed genes and regulatory pathways in idiopathic pulmonary fibrosis can be examined through bioinformatic analysis of gene expression patterns in diseased lung tissue, which is of great significance for the accurate diagnosis and effective treatment of idiopathic pulmonary fibrosis. Therefore, screening for effective biomarkers of IPF is an urgent task.
The aim of this research is to screen for abnormally expressed genes and key biomarkers of immunological infiltration in the lung tissues of patients with IPF, so as to provide effective targets for the accurate diagnosis and precise treatment of IPF. Figure 1 depicts how this research was conducted.

Data analysis pipeline for the identification of clinically relevant genes using microarray datasets.
Microarray data
We downloaded files of 12 original gene expression profiles of idiopathic pulmonary fibrosis from the GEO database. Table 1 displays the key details of the 12 datasets.
The main features of 12 selected datasets included in this analysis
We compared gene expression levels between IPF lung tissue and normal lung tissue using the linear models for microarray data (LIMMA; http://www.bioconductor.org/packages/release/bioc/html/limma.html) package of Bioconductor to identify differentially expressed genes.
As selection criteria, we used the P values obtained from a t-test for each gene symbol, requiring an adjusted P value < 0.05 and |log FC| > 2.
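The selection rule above can be illustrated with a minimal Python sketch. It assumes the adjusted P values and log fold changes have already been computed by the differential expression tool. The gene names, the values, and the `classify` helper are hypothetical and are not part of the original analysis, which was performed with LIMMA in R.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    symbol: str    # gene symbol (placeholder names below)
    log_fc: float  # log fold change, IPF vs. normal
    adj_p: float   # adjusted P value as reported by the DE tool


def classify(gene: GeneStat, lfc_cutoff: float = 2.0, p_cutoff: float = 0.05) -> str:
    """Return 'up', 'down', or 'ns' (not significant) for one gene."""
    if gene.adj_p >= p_cutoff or abs(gene.log_fc) <= lfc_cutoff:
        return "ns"
    return "up" if gene.log_fc > 0 else "down"


# Hypothetical rows for illustration only; not values from the study.
table = [
    GeneStat("GENE_A", 3.1, 0.001),
    GeneStat("GENE_B", -2.6, 0.010),
    GeneStat("GENE_C", 2.4, 0.200),  # large change, but not significant
    GeneStat("GENE_D", 0.8, 0.001),  # significant, but small change
]

labels = {g.symbol: classify(g) for g in table}
up = [s for s, lab in labels.items() if lab == "up"]
down = [s for s, lab in labels.items() if lab == "down"]
print("up-regulated:", up)
print("down-regulated:", down)
```

Both conditions must hold at once, so a gene with a large fold change but a non-significant adjusted P value is not counted as a DEG.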
For each dataset, volcano plots were generated with Bioconductor (http://bioconductor.org/biocLite.R). A heatmap of the DEGs was created in R using the heatmap package (https://bioconductor.org/packages/release/bioc/html/heatmaps.html). The UpSetR package was used to create an UpSet plot, and DEGs that overlapped across datasets were retained for further analysis in order to identify relevant DEGs.
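As a rough illustration of retaining overlapping DEGs, the sketch below counts how many datasets report each gene and keeps those that reach a minimum count. The text does not state how many datasets a gene had to appear in, so `min_datasets` is an assumption left to the reader. The dataset and gene names are placeholders.

```python
from collections import Counter
from typing import Dict, Set


def overlapping_degs(deg_sets: Dict[str, Set[str]], min_datasets: int) -> Set[str]:
    """Return genes reported as DEGs in at least `min_datasets` datasets."""
    # Each dataset contributes a gene at most once because its DEGs are a set.
    counts = Counter(gene for genes in deg_sets.values() for gene in genes)
    return {gene for gene, n in counts.items() if n >= min_datasets}


# Placeholder DEG lists; real input would be one set per GEO dataset.
deg_sets = {
    "dataset_1": {"GENE_A", "GENE_B", "GENE_C"},
    "dataset_2": {"GENE_A", "GENE_C", "GENE_D"},
    "dataset_3": {"GENE_A", "GENE_B", "GENE_D"},
}

in_all = overlapping_degs(deg_sets, min_datasets=len(deg_sets))
in_two = overlapping_degs(deg_sets, min_datasets=2)
print(sorted(in_all))
print(sorted(in_two))
```

Requiring a gene in every dataset is the strictest choice. A lower threshold keeps more candidates at the cost of weaker agreement between datasets.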
GO and KEGG enrichment analysis of DEGs
Functional enrichment of the screened DEGs was performed with the DAVID database (GO, Gene Ontology). After KEGG pathway enrichment analysis, we used the Reactome database (https://reactome.org/) to visualize the DEGs for idiopathic pulmonary fibrosis. We used P < 0.05 as the screening criterion for significant enrichment.
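The idea behind an enrichment P value can be sketched as an over-representation test: given a gene universe, a term annotated to some of those genes, and a DEG list, how likely is an overlap at least as large as the one observed? The sketch below uses a plain hypergeometric tail probability. This is an assumption for illustration and does not reproduce DAVID's own scoring. All counts are hypothetical.

```python
from math import comb


def enrichment_p(universe: int, in_term: int, n_degs: int, overlap: int) -> float:
    """P(X >= overlap) when n_degs genes are drawn without replacement
    from `universe` genes, `in_term` of which are annotated to the term."""
    upper = min(in_term, n_degs)
    total = comb(universe, n_degs)
    tail = sum(
        comb(in_term, i) * comb(universe - in_term, n_degs - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 genes in the universe, 200 annotated to a term,
# 500 DEGs, 15 of which carry the annotation (5 would be expected by chance).
p_example = enrichment_p(universe=20000, in_term=200, n_degs=500, overlap=15)
print(f"enrichment P = {p_example:.3g}")
print("significant at P < 0.05:", p_example < 0.05)
```

When many terms are tested, the raw P values are usually corrected for multiple testing before the 0.05 cutoff is applied. That step is not shown here.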
Construction of PPI network and identification of hub genes
PPI networks were predicted using STRING, which was also used to identify any potential associations. In addition, modules of the PPI network were screened using the Molecular Complex Detection (MCODE) plugin of Cytoscape version 3.6.1.
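To give a feel for how highly connected nodes stand out in a PPI network, the sketch below ranks proteins by degree, that is, by their number of distinct interaction partners. This is a simpler stand-in chosen for illustration and is not the MCODE algorithm, which scores densely connected regions. The edge list uses placeholder protein names rather than STRING output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def degree_ranking(edges: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Rank nodes of an undirected network by number of distinct neighbours."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Sort by degree (descending), then by name so ties are deterministic.
    return sorted(
        ((node, len(nbrs)) for node, nbrs in neighbours.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Placeholder interactions; ("P2", "P1") duplicates ("P1", "P2") on purpose.
edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P4", "P5"), ("P2", "P1"),
]

ranking = degree_ranking(edges)
for node, degree in ranking:
    print(node, degree)
```

Storing neighbours in sets means that a duplicated or reversed edge does not inflate a node's degree.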
Immunological infiltration by CIBERSORTx analysis
To estimate the proportions of 22 immunological cell types, the CIBERSORTx program was applied to the previously gathered, standardized gene expression data. These immunological cells included, in various states, B cells, plasma cells, T cells, NK cells, monocytes, dendritic cells, mast cells, macrophages, neutrophils, and eosinophils. We used the vioplot package in R version 3.6.0 to compare the level of infiltration of each immunological cell type between patients with idiopathic pulmonary fibrosis and the healthy group, and the samples were filtered for a P value < 0.05.
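The group comparison for a single cell type can be sketched as follows. The text does not name the statistical test behind the comparison, so the exact two-sided permutation test on the difference in mean proportion used here is an assumption. The proportions are invented for illustration, and full enumeration is only practical for very small groups.

```python
from itertools import combinations
from typing import Sequence


def exact_permutation_p(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Two-sided exact permutation P value for the difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    hits = 0
    n_splits = 0
    # Enumerate every way of relabelling n_a of the pooled samples as group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # tolerance for floating-point rounding
            hits += 1
        n_splits += 1
    return hits / n_splits


# Hypothetical proportions of one cell type; not values from the study.
healthy = [0.1, 0.2, 0.3]
ipf = [0.4, 0.5, 0.6]

p_value = exact_permutation_p(healthy, ipf)
print(f"P = {p_value:.2f}")
```

With three samples per group there are only 20 possible relabellings, so the smallest achievable two-sided P value is 0.1. Cohorts of realistic size would need a sampled permutation test or a rank-based test instead.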
Results
Identifying DEGs
DEGs were identified following batch correction and standardization of the microarray data for the 12 datasets. In total, 11,130 differentially expressed genes were found, including 7,492 up-regulated and 3,638 down-regulated genes. The UpSetR map shows the genes that overlapped across the 12 datasets (Fig. 2). A volcano plot was used to check the accuracy of each dataset’s findings (Fig. 3). Figure 4 displays a heatmap of DEG expression.

DEGs were identified from 12 gene expression profiling datasets based on fold change >2 and adjusted p value <0.05. UpSetR map showing overlapping genes.

The volcano plot. The red points represent upregulated genes, the black points represent nonsignificant genes, and the green points represent downregulated genes.

Heatmap of differentially expressed genes in lung tissue.
Based on GO functional enrichment analysis, the selected up-regulated DEGs were mainly involved in processes related to pulmonary fibrosis, including the formation of extracellular matrix, extracellular structure organization, and the collagen catabolic process (Fig. 5A). In addition, the selected down-regulated DEGs were associated with maintaining a relatively stable intracellular microenvironment, with terms including plasma membrane part, intrinsic component of plasma membrane, integral component of plasma membrane, regulation of multicellular organismal process, regulation of localization, tube development, regulation of transport, cell surface, anatomical structure morphogenesis, and response to organic substance (Fig. 5B).

Gene Ontology (GO) analyses of DEGs. (A) The selected up-regulated DEGs. (B) The selected down-regulated DEGs.
According to KEGG pathway enrichment analysis, the up-regulated DEGs associated with IPF were mainly enriched in the PI3K-Akt signaling pathway, focal adhesion, and human papillomavirus infection, among others (Fig. 6A); the down-regulated DEGs were associated with cancer pathways and neuroactive ligand-receptor interaction, among others (Fig. 6B).

Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses of DEGs. (A) The selected up-regulated DEGs. (B) The selected down-regulated DEGs.
We used STRING to examine the PPI network of the DEGs. Using Cytoscape’s MCODE plugin, we identified the most important modules, which include COL1A1, COL3A1, COL1A2, MMP2, BDNF, CAV1, CDH5 and S100A12 (Fig. 7).
Immunological infiltration analyses
Due to methodological constraints, the immunological infiltration landscape in IPF has not been entirely uncovered. We compared the immunological infiltration of 22 immunological cell subpopulations between IPF and healthy lung tissues using the CIBERSORTx algorithm. Figure 8 presents the findings from 103 healthy volunteers and 103 IPF patients in GSE150910. Memory B cells, plasma cells, follicular helper T cells, regulatory T cells, gamma delta T cells, M0 macrophages and resting mast cells were generally present in higher proportions in IPF lung tissue than in normal tissue, whereas resting CD4 memory T cells, resting NK cells and neutrophils were generally present in lower proportions (Fig. 8, P < 0.05).

(A) PPI network analysis of the identified up-regulated DEGs. (B) PPI network analysis of the identified down-regulated DEGs.

The landscape of immunological infiltration between IPF and normal controls.
With the development of Internet and information technology, researchers use bioinformatics to study various diseases, including IPF. IPF is a fatal chronic disease characterized by abnormal lung remodeling and ECM deposition [8]. It is an irreversible lung disease that eventually leads to irreversible damage to lung function and even death [9]. However, the mechanism of IPF remains unclear, and no effective treatment exists to delay its progression [10]. At present, the efficacy of drugs for treating idiopathic pulmonary fibrosis is very limited, and they can only prolong the life of patients for a short time [11]. The life span of patients with IPF depends to a large extent on the timing of clinical diagnosis and treatment; early diagnosis and early treatment can better prolong the life of patients with IPF [12]. Bioinformatics technology has provided us with a better understanding of the mechanism of IPF.
In this work, we analyzed 12 microarray datasets using bioinformatics and, by comparing gene expression profiles between patients with IPF and healthy groups, identified key target genes associated with the disease. Biological processes and signaling pathways were examined for the 11,130 DEGs identified and screened by bioinformatics analysis, including 7,492 up-regulated and 3,638 down-regulated genes. Subsequently, PPI network analysis of the DEGs was performed to learn more about the biological basis of IPF. We found that these DEGs were involved in several immunological responses and in immunological cell chemotaxis; we therefore used the CIBERSORTx algorithm to examine immunological cell infiltration in IPF.
We further analyzed the gene annotation and signaling pathway enrichment of the selected up-regulated and down-regulated DEGs. The results revealed that up-regulated DEGs associated with IPF participate in the progression of pulmonary fibrosis through terms such as extracellular matrix, extracellular matrix organization, extracellular region, extracellular region part, extracellular matrix structural constituent, collagen catabolic process, and collagen metabolic process; down-regulated DEGs associated with IPF maintain the relative stability of the intracellular microenvironment through terms such as plasma membrane part, intrinsic component of the plasma membrane, cell surface, regulation of transport, regulation of localization, and response to organic substance.
KEGG pathway enrichment analysis showed that the up-regulated DEGs associated with IPF were linked to the PI3K-Akt signaling pathway, focal adhesion, human papillomavirus infection, and protein digestion and absorption, while the down-regulated DEGs associated with IPF were linked to the cAMP signaling pathway and neuroactive ligand-receptor interaction. We analyzed PPI networks to identify the most crucial hub genes: COL1A1, COL3A1, COL1A2 and MMP2 among the up-regulated DEGs, and BDNF, CAV1, CDH5 and S100A12 among the down-regulated DEGs.
Type I collagen, encoded by COL1A1, is a member of the collagen family and is involved in the regulation of intercellular adhesion and differentiation [13]. Collagen is the main structural protein of the extracellular matrix; it is the basic unit of collagen fibrils, is formed by juxtaposed macromolecular chains, and is essential for the development of fibrosis [14]. In addition, previous research has shown that COL1A1 is crucial for the development and spread of many cancers [15]. Our research sheds light on the possible involvement of COL1A1 in IPF and its use as an immune-related biomarker.
COL3A1 is a member of the collagen family. It is an essential ECM protein that was first identified in 1971. Type III collagen, a fibrillar collagen, is present in extensible connective tissues such as the skin, lung and uterus. COL3A1 encodes the pro-alpha1 chains of type III collagen, which is often present alongside type I collagen [16]. Type III collagen has several significant physiological roles.
Previous studies have shown that COL3A1 expression is increased in a number of malignancies, which may worsen the survival of cancer patients. In a number of malignancies, COL3A1 has been shown to promote tumor progression and drug resistance [16].
BDNF is a protein-coding gene that encodes a member of the nerve growth factor family of proteins [17]. BDNF is an essential signaling protein that triggers signaling cascades downstream of NTRK2 [18]. It supports the survival and differentiation of certain neuronal populations in the peripheral and central nervous systems during growth and development. It participates in pathfinding, axonal development, and the control of dendritic shape and growth, and it is a major regulator of synaptic plasticity and transmission at adult synapses throughout the CNS. Its versatility is highlighted by the fact that BDNF promotes a variety of adaptive neuronal responses, such as long-term potentiation, long-term depression, specific types of short-term synaptic plasticity, and homeostatic control of intrinsic neuronal excitability. Previous studies have shown that this gene may be involved in inhibiting the proliferation, migration and invasion of non-small cell lung cancer [19].
S100A12 is highly expressed in monocytes, but our analysis showed that it was down-regulated in the lung tissues of patients with idiopathic pulmonary fibrosis. By binding to AGER, S100A12 may stimulate immunological cells and activate the MAPK and NF-kappa-B signaling pathways, which leads to the production of pro-inflammatory cytokines and increased expression of the cell adhesion molecules VCAM1 and ICAM1. The majority of AT1 cells express AGER, and patients with idiopathic pulmonary fibrosis have considerably lower levels of AGER. Furthermore, TGFB1 and TNF-alpha promote the loss of AGER in pulmonary fibrosis. Studies have revealed that S100A12 may suppress lung fibroblast migration via RAGE-p38 MAPK signaling. Previous studies have suggested that monocytes may differentiate in IPF. During pulmonary fibrosis, mononuclear cells are recruited into the lungs in response to tissue damage, where they can develop into long-lived macrophages that eventually produce TGF-β, CCL18, CHI3L1, and MMPs, which cause fibroblast activation, myofibroblast differentiation, and remodeling of the ECM. According to one analysis, S100A12 mRNA levels dropped drastically during the monocyte-to-macrophage transition. Some researchers have hypothesized that TGFB1 and TNF-alpha may both limit the production of S100A12 during fibrosis, or that the differentiation of monocytes may be the reason for low S100A12 expression [20].
Finally, we employed the CIBERSORTx method to analyze immunological cell infiltration in IPF, and the findings revealed a substantial difference between the IPF and healthy groups. Studies suggest that infiltrating immunological cells can clear senescent alveolar epithelial cells and play a key part in the onset and progression of IPF [21]. B cells have been reported to be abnormal in IPF lung tissues [22]. Studies have implicated plasma cells as crucial effector cells in the initiation and development of pulmonary fibrosis [23]. Follicular helper T cells have been recognized as a new type of helper T cell, and marked pathogenic activation and proliferation of these cells have been found in the peripheral blood of individuals with idiopathic pulmonary fibrosis [24]. Studies have shown that Tregs can promote the release of TGF-β1 and the deposition of collagen during lung injury [25].
Previous research has demonstrated the critical role that the immunological microenvironment plays in the onset and progression of disease. As a result, immunological biomarkers have attracted considerable interest in diagnosis. In this work, we showed that combining microarray datasets into a larger sample may depict the biological aspects of IPF more accurately and efficiently. The biological roles and pathways of the identified genes provide a more thorough molecular basis for understanding the development of IPF. The results of the bioinformatic and immunological infiltration analyses suggest that COL1A1, COL3A1, COL1A2, MMP2, BDNF, CAV1, CDH5 and S100A12 are novel predictive biomarkers for IPF, and that fibroblasts play a key role in this regard. Nonetheless, more research is required to clarify the roles of the key genes and immunological infiltration profiles in the progression of IPF, and to validate the connection between them.
In this study, we used bioinformatics to screen differentially expressed genes, some of which have been shown to be involved in IPF and some of which have not yet been studied. These genes may provide a reference for subsequent studies on the mechanism and drug therapy of idiopathic pulmonary fibrosis, but they need to be verified by ex vivo experiments and clinical data. Unfortunately, due to the limitations of our current research conditions, we were unable to conduct a more in-depth study. We look forward to conducting more in-depth studies in the future when conditions permit.
Conclusions
The current investigation has identified genes that may serve as biomarkers for IPF and has indicated an imbalanced immunological response at the transcriptome level. These genes include COL1A1, COL3A1, POSIN, COL1A2, MMP2, BDNF, CAV1, CDH5, CRIA1 and S100A12. The identified genes are involved in various molecular functions, biological processes, cellular components, and signalling pathways that may be closely associated with the pathogenesis of IPF. Furthermore, the study suggests that the formation and progression of IPF might be influenced by memory B cells, plasma cells, T cells, M0 macrophages, resting mast cells and eosinophils. This study provides additional insight into the prognosis and personalised care of IPF.
Funding
Self-financed Scientific Research Project of Guangxi Zhuang Autonomous Region Health and Family Planning Commission (Z20180969).
Conflicts of interest
The authors declare that they have no conflicts of interest to report regarding the present study.
Data availability
The data are available at NCBI GEO under accessions GSE92592, GSE83717, GSE52463, GSE134692, GSE99621, GSE2052, GSE150910, GSE10667, GSE24206, GSE110147, GSE53845 and GSE17978.
