Abstract
Integrated analysis of accumulated data is an effective way to obtain reliable potential diagnostic molecular of cervical lymph node metastases (LNM) in papillary thyroid carcinoma (PTC). The benefits of prophylactic lymph node dissection (PLND) for these clinically node-negative (cN0) patients remained considerable controversies. Hence, elucidation of the mechanisms of LNM and exploration of potential biomarkers and prognostic indicators are essential for accurate diagnosis of LNM in PTC patients. Up to date, advanced microarray and bioinformatics analysis have advanced an understanding of the molecular mechanisms of disease occurrence and development, which are necessary to explore genetic changes and identify potential diagnostic biomarkers. In present study, we performed a comprehensive analysis of the differential expression, biological functions, and interactions of LNM-related genes. Two publicly available microarray datasets GSE60542 and GSE129562 were available from Gene Expression Omnibus (GEO) database. Differentially expressed genes between clinically node-positive (cN1) and cN0 PTC samples were screened by an integrated analysis of multiple gene expression profile after gene reannotation and batch normalization. Our results identified 48 differentially expressed genes (DEGs) genetically associated with LNM in PTC patients. Gene ontology (GO) analyses revealed the changes in the modules were mostly enriched in the regulation of MHC class II receptor activity, the immune receptor activity, and the peptide antigen binding. Kyoto encyclopedia of genes and genomes (KEGG) enrichment analysis of DEGs displayed the intestinal immune network for IgA production, staphylococcus aureus infection, and cell adhesion molecules (CAMs). To screen core genes related to LNM of PTC from the protein-protein interaction network, top 10 hub genes were identified with highest scores. Our results help us understand the exact mechanisms underlying the metastasis of cervical LNM in PTC tissues and pave an avenue for the progress of precise medicine for individual patients.
Keywords
Introduction
Papillary thyroid carcinoma (PTC) is exceeding malignancies with a rise in novel cases worldwide, which is largely due to the widespread utilization of high-resolution ultrasound and fine-needle aspiration (FNA) biopsy [1–3]. Thyroidectomy couple with therapeutic neck dissection are preferred for patients with known lymph node metastasis in preoperative examination such as Ultrasound (US), Computed Tomography (CT) and Magnetic Resonance Imaging (MRI). However, the capability of these examination to recognize lymph node metastasis is common restricted for the overlying thyroid gland [4, 5]. Up to date, unfortunately, there remains considerable controversy about the benefits of prophylactic lymph node dissection (PLND) for these clinically node-negative (cN0) patients. Some insist PLND will be helpful to reduce the recurrence of PTC and avoid the possibility of secondary surgery, while others argue PLND increased risk of recurrent laryngeal nerve damage and permanent hypoparathyroidism [6, 7]. Contrast-enhanced ultrasonography (CEUS) is a potential tool to investigate the microvascular flow of different tissues/organs, which displays the microvascularity of cancer tissue and achieves better temporal and spatial resolution than the convention Doppler techniques [8]. Ki-67 is a crucial biomarker that affects cell proliferation and a fundamental indicator for evaluating the recurrence and metastasis of various malignant tumors, including breast cancer, colorectal cancer, lung cancer, and thyroid cancer [9]. Moreover, p53 protein is a tumor suppressor, which can promote the transformation and excessive proliferation of malignant tumor cells [10]. Although extensive efforts have been made, studies providing LNM gene expression profiles of PTC are relatively rare. Thus, it is a pressing task to identify molecular biomarkers and explore underlying mechanisms involved in the LNM in PTC patients, which could help develop novel diagnostic and individual thera-peutic strategies.
Microarray techniques and bioinformatic analysis have been widely adopted to recognize the differentially expressed genes (DEGs) and biological functional pathways [11, 12]. However, solid results may be difficult to be acquired because of small sample size and high false-positive rate in independent microarray analysis. In addition, different microarray platforms, various statistical methods and smaller sample sizes may lead to unconvincing results from these investigations. Accordingly, further comprehensive analysis is needed to identify more reliable biomarkers to overcome these inconsistencies. In our research, two mRNA microarray datasets from the Gene Expression Omnibus (GEO) were picked out for subsequent analysis. DEGs between positive-node (N1) status and negative-node (N0) status from patients with PTC were screened to determine key biomarkers. Subsequently, possible mechanisms underlying metastasis of lymph node were investigated through Ontology (GO) enrichment, Kyoto encyclopedia of genes and genomes (KEGG) pathway enrichment and protein-protein interaction (PPI) network analysis. In summary, a total of 48 DEGs and top 10 hub genes were identified, which may be prospective diagnosis and therapy biomarkers and therapeutic targets for LNM in PTC patients, specifically in deciding to perform PLND in the absence of clinical LNM.
Materials and methods
Our research was conducted according to Declaration of Helsinki and approved by approved by the Ethics Committee of the Affiliated Hospital of Jiangsu University.
Microarray data
GEO (http://www.ncbi.nlm.nih.gov/geo) is a public functional genomics data repository of high throughout gene expression data, chips and microarrays [13]. Two gene expression datasets GSE60542 and GSE129562 were downloaded from GEO. The probes were converted into the corresponding gene symbol according to the annotation information. The GSE60542 contained 19 positive-node PTC samples and 14 negative-node PTC samples. GSE129562 is involved in 5 present-node tissues and 3 absent-node tissues in PTC.
Identification of DEGs
GEO datasets were often based on different platforms, and samples were processed on different days, by different groups even by different people, we integrated all samples from the two datasets to improve the number of samples (21 positive-node PTC samples and 17 negative-node PTC samples). Batch normalization was applied to provide a balance between discovery of statistically significant genes and limitations of false-positives using sva and limma R package. Probe sets without corresponding gene symbols or genes with more than one probe set were removed or averaged, respectively. DEGs were screened by R package limma (http://bioconductor.org/packages/release/bioc/html/limma.html) [14]. Genes from each sample with the following criteria were retained: (1) a |log2 (foldchange)|>0.5 and (2) P < 0.05. The integrated dysregulated gene lists were saved for following analysis.
Kyoto encyclopedia of genes and genomes and gene ontology enrichment analyses of DEGs
The identified DEGs were annotated by Gene Ontology and Kyoto encyclopedia of genes and genomes (KEGG) analysis using “clusterProfiler” Bioconductor package [15], which integrates biological data and analysis tools, and provides a comprehensive set of functional annotation of genes to extract biological information. GO is a major bioinformatics tool to annotate genes and analyze biological process of these genes [16]. KEGG is used to understanding high-level functions and biological systems from large-scale molecular datasets generated by high-throughput experimental technologies [17]. P value <0.05 was considered as statistical significance.
Protein-protein interaction network construction and module analysis
Protein-protein interaction (PPI) networks were predicted using the Search Tool for the Retrieval of Interacting Gene (STRING) online database (http://string-db.org; version 10.5) [18]. Each node in the PPI network represents a gene, protein or other molecule, and the connections between the nodes represent the interactions between these biomolecules. The most closely associated nodes may indicate hub genes with important physiological regulatory functions [19]. The analysis of functional interactions between proteins may provide insight into the mechanism of the occurrence or metastasis of the disease. In current investigation, the PPI network of DEGs was constructed via string database, and the interactions with comprehensive scores>0.4 was considered statistically significant.
Statistical analysis
Quantitative data are exhibited as mean±standard deviation (SD). Student’s t-test or one-way analysis of variance (ANOVA) were used for statistical analysis of all cell line experiments. Gene expression correlations were assessed using Spearman’s correlation. P-value <0.05 was considered statistically significant.
Results
Convergence of gene expression signatures across different experiments from GEO database
In present investigation, two datasets GSE60542 and GSE129562 were available from GEO database. The GSE60542 dataset included 19 positive-LNM and 14 negative-LNM in PTC samples. The GSE129562 dataset consisted of 5 present-LNM tissues and 3 absent-LNM in PTC tissues. The data from the two microarray datasets were batch normalized and then merged to increase sample number. Next, limma package was applied to screen differentially expressed genes according to the threshold of |log2 (foldchange)| and P < 0.05. Total 48 DEGs were finally selected out, comprising 14 up-regulated genes and 34 down-regulated genes (Table 1). Volcano plot and heat maps for DEGs, illustrating the trends in DEG expression, were visualized in Fig. 1.
All 48 differentially expressed genes (DEGs) were screen from two profile datasets
All 48 differentially expressed genes (DEGs) were screen from two profile datasets

(a, b) Volcano of DEGs among GSE60542 and GSE129562 after batch normalization. (b) Heatmap of 48 DEGs available from GEO database. The red color represents the upregulated genes, the green color represents the downregulated genes, and the black color represents genes without any change.
We then performed GO enrichment analysis of DEGs in LNM of PTC samples using “clusterProfiler” Bioconductor package. The GO enriched function of DEGs were mainly involved in immune receptor activity (5 genes), G protein-coupled receptor binding (4 genes), and MHC class II receptor activity (3 genes) (Table 2; Fig. 2a). Furthermore, 25 KEGG pathways were overrepresented in the DEGs. Most of the DEGs were predominantly enriched in intestinal immune network for IgA production (6 genes), staphylococcus aureus infection (5 genes), and cell adhesion molecules (CAMs) (5 genes) (Table 3; Fig. 2b).
Gene ontology analysis of differentially expressed genes of cervical LNM in PTC tissues
Gene ontology analysis of differentially expressed genes of cervical LNM in PTC tissues

(a) Gene Ontology analysis was used to obtain significant functional analysis. (b) KEGG analysis were used to obtain significant enriched KEGG terms. The color represents the P value.
Kyoto encyclopedia of genes and genomes pathways of differentially expressed genes of cervical LNM in PTC tissues
All DEGs were screened using the STRING database to further investigate their properties and the interactions among them. The PPI network was constructed based on STRING database to provide deep insights into the mechanisms underlying metastasis of cervical LNM in PTC tissues. The PPI network consists of 20 nodes and 31 edges, in which the nodes correspond to genes and the edges represent the links between genes, and top 10 genes with the most connections in PPI relationship pairs were graphed in barplot diagram, including SELL, CX3CR1, FCGR2B, CD38, IL15, HLA-DRA, CCL8, CD79A, HLA-DMA and HLA-DPA1 (Table 4; Fig. 3).
Functional roles of top 10 hub genes
Functional roles of top 10 hub genes

(a) Protein-protein interaction (PPI) network constructed among DEGs with the criterion of interaction score >0.4. (b) Top 10 genes in PPI relationship pairs were graphed in barplot diagram. The x-axis represents the number of connections.
PTC is associated with an excellent clinical outcome, as a result, the 5-year survival rate for PTC is approximately 98% and is one of the indolent diseases among cancers [20, 21]. Although the mortality rate is very low, recurrent and metastatic diseases can occur in approximately 20–30% of cases, which are mainly related to metastatic lymph nodes [22]. The role of routine lymph node dissection in the treatment for PTC remains controversial. In the preset, contrast-enhanced ultrasound (CEUS) is a common method to detect LNM in clinical practice, which usually presents centripetal or asynchronous perfusion, hyperenhancement, heterogeneous enhancement, perfusion defects and ring-enhancing margins in these metastasis LNM in PTC patients [23]. The quantitative CEUS has the higher sensitivity to detect the subtle hemodynamic changes of the lesions because metastatic LNMs often secret vascular growth factors to form lots of twisted, irregular diameter and incomplete vascular walls of immature micro-vessels [24]. However, their value in clinical application is still controversial attributed to inconsistent and unsatisfactory results [25, 26]. It is generally accepted therapeutic neck dissection is recommended to remove known metastatic lymph nodes identified either pre- or intraoperatively because this surgical management can reduce the incidence of PTC persistence/recurrence [27]. However, there are no randomized controlled trails to support the benefit of routine prophylactic LND for PTC patients with no pre-or intraoperative evidence of nodal metastasis [28, 29]. Therefore, due to the debates of current surgical treatment of PTC and prophylactic LND as well as the fuzziness in the literature, molecular biomarkers that allow risk stratification of the presence of LNM will be of great clinical value. US-guided fine-needle aspiration biopsy (FNAB) is widespread utilized to improve the diagnostic accuracy through cytologic and genomic detection [30]. More importantly, the capability to genomic detection in cytologic specimens is not subordinate to that in pathologic specimens [31].
Microarray is a promising and popular method for large-scale gene expression profiling, greatly facilitating the analysis of thousands of dysregulated key genes responsible for carcinogenesis [12, 32–34]. In the present study, we identified biomarkers involved in metastatic lymph nodes of PTC patients by analyzing gene expression data downloaded from GEO databases. Since the datasets collected for two-dataset analysis were based on different platforms, and the samples were processed on different dates or in different groups, we used the R package to remove or minimize batch effects [35]. A total of 48 DEGs were identified to associate with LNM between clinically node-positive (cN1) and node-negative (cN0) PTC samples. GO annotation result of DEGs demonstrated immune receptor activity, G protein-coupled receptor binding, and MHC class II receptor activity were chiefly enriched. As the largest family of membrane protein receptors, G protein coupled receptors can be activated by a variety of ligands and play corresponding signal transduction functions, thus participating in crucial physiological processes in organisms, such as the regulation of the immune system, the regulation of the autonomic nervous system and the maintenance of homeostasis [36]. Upon G protein coupled receptor is stimulated, G protein related signal pathways including cAMP pathway, phosphatidylinositol signal pathway, and MAPK/Erk signal pathway [37–39], which will largely affect cell survival and cause adverse stress reactions. Taken together, these observations imply that the activation of G protein coupled receptor may be a major contributor to metastatic lymph nodes. KEGG enrichment analysis of DEGs showed many of these genes were mapped to intestinal immune network for IgA production, staphylococcus aureus infection, and cell adhesion molecules (CAMs), which suggested a critical role of immune and inflammatory responses in metastatic of lymph nodes. The directional migration of lymphocytes is caused by the interaction between the lymphocyte homing receptor and vascular address on endothelial cells [40, 41]. In the PPI network identified in the present study, top 10 hub genes were highlighted as the most significant hub genes, with multiple interactions in the network.
Limitations
It is undeniable that there are still several limitations in our research. All DEGs retrieved from online databases, so this result has not been verified by clinical research. Also, although we integrated two GEO datasets to enlarge sample number, the total sample number was still not big enough to restrict the discovery of more DEGs in metastasis of lymph node. Therefore, further validation studies could lead to additional insights into the development of LNM as well as the identification of additional functional biomarker candidates for improved clinical diagnostic of LNM in PTC patients.
Conclusion
In summary, our comprehensive bioinformatic analysis has identified numerous useful molecular targets for the future investigation of the mechanisms and selection of biomarkers for LNM in PTC patients. Several important biological processes and pathways as well as the hub genes functioning in these processes may provide novel insights into the development and progression of metastasis of lymph node. Furthermore, further molecular biological exploration will be performed to confirm the function of the identified genes associated with LNM in PTC patients. Last but not the least, pre-operative hub genes analysis via minimally invasive US-FNA biopsy can guide the surgical management of PTC patients.
Footnotes
Acknowledgments
We greatly acknowledged the Zhenjiang Social Development Fund (SH2018035), Zhenjiang Key Research and Development Plan Fund (SH2016036) and the Doctoral Start-up Fund of Affiliated Hospital of Jiangsu University (jdfyRC2017013).
Disclosure
All authors report no conflicts of interest in this work.
