Abstract
BACKGROUND:
Pulmonary metastasis is the most frequent cause of death in osteosarcoma (OS) patients. Recently, several bioinformatics studies specific to pulmonary metastatic osteosarcoma (PMOS) have been applied to identify genetic alterations. However, the interpretation and reliability of the results obtained were limited for the independent database analysis.
OBJECTIVE:
The expression profiles and key pathways specific to PMOS remain to be comprehensively explored. Therefore, in our study, three original datasets of GEO database were selected.
METHODS:
Initially, three microarray datasets (GSE14359, GSE14827, and GSE85537) were downloaded from the GEO database. Differentially expressed genes (DEGs) between PMOS and nonmetastatic osteosarcoma (NMOS) were identified and mined using DAVID. Subsequently, GO and KEGG pathway analyses were carried out for DEGs. Corresponding PPI network of DEGs was constructed based on the data collected from STRING datasets. The network was visualized with Cytoscape software, and ten hub genes were selected from the network. Finally, survival analysis of these hub genes also used the TARGET database.
RESULTS:
In total, 569 upregulated and 1238 downregulated genes were filtered as DEGs between PMOS and NMOS. Based on the GO analysis result, these DEGs were significantly enriched in the anatomical structure development, extracellular matrix, biological adhesion, and cell adhesion terms. Based on the KEGG pathway analysis result, these DEGs were mainly enriched in the pathways in cancer, PI3K-Akt signaling, MAPK signaling, focal adhesion, cytokine-cytokine receptor interaction, and IL-17 signaling. Hub genes (ANXA1 and CXCL12) were significantly associated with overall survival time in OS patient.
CONCLUSION:
Our results may provide new insight into pulmonary metastasis of OS. However, experimental studies remain necessary to elucidate the biological function and mechanism underlying PMOS.
Introduction
Osteosarcoma (OS) is most common type of malignant bone cancer. The pulmonary tissue is the most common site for metastatic spread of the disease, accounting for more than 75% of metastases. Pulmonary metastasis is the major cause of OS leading deaths [1, 2, 3, 4]. Hence, novel therapeutic targets for pulmonary metastasis in osteosarcoma (PMOS) are urgently needed.
Many reports demonstrated that abnormal genes contribute to the pathogenesis [5, 6, 7, 8] and metastasis of OS [9, 10, 11, 12, 13]. Recently, several bioinformatics studies specific to pulmonary metastasis have been applied to identify genetic alterations in PMOS. However, DEGs in these reports are mostly based on tissue-cell mixed samples or independent database analysis, limiting the interpretation and reliability of the results obtained. The expression profiles and key pathways specific to PMOS remain to be comprehensively explored. Therefore, in our study, three original datasets of GEO database were selected according to strict screening criteria: microarray datasets that contain both PMOS and nonmetastatic osteosarcoma (NMOS) in human tissue samples. We mined and analyzed data from these databases, anticipating the identification of gene signatures specific to PMOS and creating a foundation for further studies.
Materials and methods
Data sources
Microarray data were downloaded from the publicly available GEO database. The screening indicators were as follows: data that contain both PMOS and nonmetastatic osteosarcoma (NMOS), human tissue samples. Datasets that did not meet these indicators were excluded. After careful view, three datasets were identified for further analysis: GSE14359 [14], GSE14827 [15, 16], and GSE85537 [17] (Table 1). Raw data and probe annotation files were downloaded for further analysis.
Summary of microarray datasets
Summary of microarray datasets
GEO, Gene expression Omnibus; PMID, PubMed unique identifier.
The GEO2R online analysis tool (
GO and KEGG analyses
The GO analysis include the classified type of BP, MF, and CC. GO and KEGG pathway analyses of DEGs used DAVID tools (
Construction of PPI network
The DEGs that were previously identified were mapped to the STRING database (
Identification of hub genes
CytoHubba was used to calculate the node degree for each protein. The top 10 genes were identified as hub genes.
Survival analysis
The correlation between key genes and survival time was assessed with the survival function in the R package using clinical information from the TARGET database. Patients in this series matrix were divided into high and low expression groups based on the expression level of mRNA. In overall survival analysis of each hub gene, GraphPad Prism software (v.8,
Volcano plot of DEGs. Note: Yellow, blue, and red color represent the relatively down, non-different and up expression of genes, respectively. X: 
GO analysis of DEGs. Note: (A) Go annotation of DEGs in three parts: BP, MF, and CC. The horizontal axis represents the name of the GO terms and the vertical axis represents the number of genes. (B) The top 20 of enrichment terms of DEGs. The horizontal axis represents the rich factor value and the vertical axis represents the name of the GO terms. The size of the dot represents the number of genes involved in the terms. The color changes from red to green represent changes in significance from high to low.
KEGG pathway analysis of DEGs. Note: (A) KEGG pathway annotation of DEGs in six parts: Cellular Processes, Human Diseases, Environmental Information Processing, Organismal Systems, Metabolism, and Genetic Information Processing. The horizontal axis represents the number of genes and the vertical axis represents the name of the pathways. (B) The top 20 of enrichment pathways of DEGs. The horizontal axis represents the rich factor value and the vertical axis represents the name of the pathways. The size of the dot represents the number of genes involved in the pathways. The color changes from red to green represent changes in significance from high to low.
Identification of DEGs
After a careful screening, a total of 340 DEGs were identified from GSE14359, including 268 upregulated and 72 downregulated genes; 67 DEGs were identified from GSE14827, including 5 upregulated and 62 downregulated genes; 1426 DEGs were identified from GSE85537, including 304 upregulated and 1122 downregulated genes. All DEGs were identified by comparing PMOS samples with NMOS samples. 26 DEGs was found to be differentially expressed in at least two profiles. Unfortunately, zero DEGs was found to be differentially expressed in all three profiles. Therefore, we harvested 1807 genes after summing up the three databases and removing the duplicates. Subsequently, volcano plot analysis of these DEGs was performed (Fig. 1).
Functional enrichment analysis
GO and KEGG pathway analyses were carried out to explore the functions of these 1807 DEGs. GO analysis indicated that DEGs were enriched in numerous functional terms (Fig. 2A). BP terms included cellular process, biological regulation, regulation of BP, metabolic processes, response to stimulus, multicellular organismal processes, developmental processes, signaling, localization, and positive regulation of BP. MF terms included binding, catalytic activity, molecular function regulator, transcription regulator activity, and transporter activity. CC terms included cell, organelle, membranes, extracellular regions, and protein-containing complexes. A bubble chart for these terms was organized into the following: anatomical structure development, developmental processes, multicellular organism development, multicellular organismal processes, extracellular matrix, system development, anatomical structure morphogenesis, collagen-containing extracellular matrix, animal organ development, cellular developmental processes, cell development, biological adhesion, cell adhesion, cell differentiation, regulation of developmental processes, extracellular regions, extracellular matrix organization, plasma membrane regions, regulation of multicellular organismal processes, and extracellular structure organization as the top 20 enrichment GO terms (Fig. 2B).
KEGG analysis indicated that DEGs were enriched in numerous pathways (Fig. 3A). The bubble chart for these pathways was organized into the following: cancer, PI3K-Akt signaling, MAPK signaling, cytokine-cytokine receptor interaction, IL-17 signaling, focal adhesion, AGE-RAGE signaling in diabetic complications, ECM-receptor interaction, Rap1 signaling, proteoglycans in cancer, protein digestion and absorption, ovarian steroidogenesis, relaxin signaling, calcium signaling, TNF signaling, Chagas disease (American trypanosomiasis), cAMP signaling, platelet activation, transcriptional misregulation in cancers, and glutamatergic synapse (Fig. 3B).
PPI network construction
PPI network was constructed using the STRING database to examine interactions among DEGs. A visualized network was created by Cytoscape with confidence
PPI network. Note: Pink nodes represent upregulated genes, blue nodes represent downregulated genes.
The identified top 10 genes, which were evaluated by degree of connectivity in the PPI network, were as follows: complement C3 (C3, degree
Top ten hub genes with higher degree of connectivity
Top ten hub genes with higher degree of connectivity
Top 10 hub genes. Note: 10 identified hub genes: C3, GNG11, PPBP, ANXA1, GPR17, CXCL8, CXCL12, ADCY1, EGFR, AGTR2. Red (high degree value), yellow (low degree value).
Modulation of ANXA1 (
Overall survival analysis of the hub genes. Note: Gene changes of ANXA1 (A) and CXCL12 (B) were significantly correlated with the overall survival time of OS patients (
Previous studies are mostly based on a single dataset or tissue-cell mixed datasets, limiting the interpretation of the results in PMOS [18, 19, 20, 21]. In the present study, we strictly integrated three datasets according to the screening criteria to identify key genes correlated with PMOS. By comparing the three mRNA profiles downloaded from the GEO database, 569 upregulated and 1238 downregulated DEGs were screened among 20 PMOS tissue samples and 31 NMOS tissue samples. These DEGs were associated with many GO terms significantly enriched in various KEGG pathways. GO and KEGG analyses indicated that DEGs were closely associated with tumor progression and chemoresistance [22, 23, 24, 25, 26]. Subsequently, PPI network was constructed to investigate the interrelationship among the DEGs. The top 10 hub genes were ranked based on the degree of connectivity. C3, PPBP, EGFR, and AGTR2 were upregulated in PMOS, whereas GNG11, ANXA1, GPR17, CXCL8, CXCL12, and ADCY1 were downregulated. Finally, a survival curve was generated to assess relationships between the expression level of hub genes and prognosis of OS patients. Low circulating levels of ANXA1 and CXCL12 were linked to significantly worse survival.
ANXA1, a member of the annexin superfamily, is a calcium-dependent phospholipid-binding protein. The role of ANXA1 in tumor development and metastasis has been documented in many cancers [27, 28, 29, 30, 31, 32, 33], but its role and underlying mechanism in OS are poorly understood. We show for the first time a clinically positive correlation between ANXA1 and overall survival rate.
CXCL12 (also known as SDF-1), a member of the CXC subfamily of chemokines, is secreted by stromal cells from a variety of tissues [34]. The involvement of CXCL12 is implicated in tumor progression of a variety of cancers and immune diseases [35, 36, 37], but its role in the metastasis of OS is ambiguous. An interaction between CXCL12 and its two receptors, CXCR4 and CXCR7, drives the metastasis of OS cells to the lung [38, 39, 40, 41]. Interestingly, CXCL12 is weakly, or even negatively, expressed in lung metastases from OS [42]. CXCL12 positively correlated with better longtime survival and lower prevalence of metastasis [43]. High expression of CXCL12 in OS tissues predicted longer survival, which is consistent with our current results. Osteoblasts and OS cells express the specific chemokine axis (CXCR4, CXCR7, and CXCL12) might prevent tumor cells from entering the circulation [44, 45]. Recently, the levels of CXCL4, CXCL6, and CXCL12 in plasma in OS patients were significantly higher compared with that in the controls. However, the higher circulating levels of CXCL4 and CXCL6, but not CXCL12, were associated with a poorer outcome for OS patients [46]. Certainly, the CXCL12-CXCR4/CXCR7 axis is not the sole regulator of systemic spread but could be crucial for metastatic cancer development. Moreover, PI3K pathways are common important downstream pathways regulated by ANXA1 [47] and the CXCL12-CXCR4/CXCR7 axis [48].
Conclusion
The present study identified genes that may be useful diagnostic biomarkers and treatment goals for PMOS patients. The results will support studies of mechanisms and effective therapies for PMOS. However, due to limitations in the present study, additional work is required to elucidate molecular mechanisms involved in this challenging disease. Necessarily, further research is required to verify the mechanisms of ANXA1 and CXCL12 in PMOS.
Author contributions
Data collection: Yinan Chai; Data analysis: Lihan Xu, Rui He; Manuscript writing: Liangjun Zhong; Conception and design: Yuying Wang.
Data availability
The data used to perform the analyses described herein are publicly available from the GEO (
Ethical considerations
There was no need to obtain ethical approval from the hospital since all data in the manuscript was based on bioinformatics, and GEO and TARGET have obtained informed consent from all participants.
Funding
This work was supported by the Natural Science Foundation of Zhejiang Province (Grant number: LY20H060003).
Footnotes
Conflict of interest
The authors declare that there are no competing interests associated with the manuscript.
