Protein Network Analysis of the Fifth Chromosome of Zebrafish

Abstract

In the past 20 years, zebrafish has gradually become an important animal model for studying the function of human genes. At the same time, identification of zebrafish reference genome sequences and >10,000 protein-coding genes indicates that they are at least 75% homologous to human genes, further validating its utility as a research model for growth and developments (GDs). However, the molecular mechanism of zebrafish GDs has almost no molecular interactions, so a protein–protein interaction (PPI) network is highly desirable. The project extracted 58 genes on the fifth chromosome of the model organism, and studied and modeled its GDs mechanisms, which encompassed 296 interactions between the 58 proteins involved in GDs. This includes not only accurate predictions of PPI but also those molecular interactions collected from the literature and experimentally derived. These molecules then interact, modularize, look for central genes, analyze, and predict their GDs processes, and hope to help scholars study the process of GDs, providing hypotheses and help.

1. Introduction

The zebrafish was first identified as a genetically managed organism in the 1980s. The development of zebrafish is divided into six stages: cleavage stage, blastocyst stage, gastrula stage, division stage, forming stage, and incubation period (Fu and Wang, 2013). During the first 24 hours, zebrafish development begins with activation of the egg and then continues to occur in subsequent fertilized eggs, cleavage, blastocysts, gastrointestinal and schizophrenia. Therefore, they are involved in the main stages of pattern formation: cell proliferation, cell differentiation, axial determination, primary genital production, and the appearance of primary organ systems to establish a basic vertebrate body plan.

Zebrafish attracts the attention of many researchers because of their small size, low cost of breeding, large-scale breeding, and many advantages. After more than 30 years of research application and system development, there are ∼20 zebrafish strains. The zebrafish gene has relevant data in the National Center of Biotechnology Information (NCBI) database for query and download, which is convenient for research. The zebrafish cell marker technology, tissue transplantation technology, mutation technology, haplotype technology, transgenic technology, gene activity inhibition technology, etc. have matured, and there are thousands of zebrafish embryo mutants, which are the molecular mechanisms for studying embryonic development (Webb and Miller, 2000). As for these good resources, some can also be used as a model of human disease. Zebrafish has become one of the most important vertebrate developmental biology models, and its use in other disciplines has shown great potential. Since the zebrafish gene has 87% similarity to the human gene, which means that the results obtained by doing drug experiments on it are also applicable to the human body in most cases, it is valued by biologists.

Zebrafish can be applied to cell lineage analysis, mutation identification, saturation mutagenesis analysis, gene transfer research, and gene linkage map construction in developmental biology, and its in vitro fertilization, embryo body transparency, and rapid cell division make it more comparable with large mice (Kanehisa et al., 2004). They have a unique advantage. The zebrafish is transparent and observable. The early embryos of zebrafish are transparent and grow in vitro. The development of embryos at various stages can be well observed through a microscope. Today, scientists have also developed zebrafish that are transparent throughout adulthood (Yang et al., 2014). Cannono said, “This is very exciting, because under the microscope you can observe the development of their organs and systems, you can see their blood flowing, their heart is beating.”

Systematic application of gene screening results in a large number of phenotypic traits of mutation. These mutations, when driven to homozygotes, can produce defects in the pathology of various organ systems similar to human diseases. These studies have made important contributions to our understanding of the development of vertebrates. Using zebrafish, one can study the basic problems of life sciences and reveal the molecular mechanisms of embryonic and tissue development (Isogai et al., 2001).

2. Methods

2.1. Data source

Data were obtained from the NCBI Gene Expression Omnibus (www.ncbi.nlm.nih.gov/geo). Due to the high homology between human and zebrafish, from the 47,036 zebrafish genes 58 representative zebrafish growth and developments (GDs) related genes of the fifth chromosome of zebrafish were selected.

2.2. Identification of GDs

After obtaining raw data, the data were normalized using python software, and then a one-sample t-test was performed to identify the GDs (Van Houcke et al., 2015). The confidence interval is chosen to be 99% as the cutoff criterion (Table 1).

Table 1.

Results of Confidence Interval on Growth and Development from Gene Expression Omnibus Databases

Interval	Number
Num = 0	10
0 < Num <10	9
10 < Num <100	11
100 < Num <1000	13
Num >1000	15

2.3. The functional enrichment analysis of the GDs

Kyoto Gene and Genomic Encyclopedia (KEGG) is a database for genome deciphering. It is a well-recognized comprehensive database that systematically analyzes gene function and genomic information databases, which helps researchers conduct research on genes and expressions as a whole, including various biochemical pathways. In this study, the KEGG database was used to study the enrichment analysis of GDs to find biochemical pathways that may affect the GDs of zebrafish (Yang et al., 2014). Database for Annotation, Visualization and Integrated Discovery (DAVID) was used to perform KEGG pathway enrichment analysis with p value <0.025 and gene count >2.

2.4. Protein–protein interaction network construction

Since proteins rarely function independently, it is important to understand the interaction of these proteins by studying the larger functional groups of proteins (Kanehisa et al., 2010). In this study, the STRING online tool was used to analyze the protein–protein interactions (PPIs) of GDs with a composite score >0.15 cutoff criteria. We abandon the relationship of node degrees ≤5, and then use the Cytoscape software to build the network. From previous studies, most of the PPI networks obtained were subject to scale-free attribution. Therefore, the node degree of the network is analyzed and used to obtain the central protein in the PPI network, and the connectivity degree ≥20 is selected as the central node, and it is predicted that it may play an important role in the aging research progress of zebrafish (Kimmel et al., 1995).

2.5. Network module analysis of the GDs

The nodes and edges of the PPI network are very complex, and we need to use the Cluster ONE Cytoscape plugin for enrichment analysis. Before running Cluster ONE to disclose the rich functional modules of the PPI network, we set the parameter minimum size >5 and the minimum density <0.05. We also performed gene ontology (GO) functional enrichment analysis on modular genes to analyze gene function at the molecular level (Panula et al., 2006). In addition, we also used DAVID to perform the KEGG pathway enrichment analysis of the best enrichment module, and also performed K-means clustering.

2.6. Training support vector machine model

To predict functional PPI, we trained a support vector machine (SVM) model based on the integration of four different types of data sets as described in Section 2 (Venkat et al., 2017). The SVM model was evaluated using 10-fold cross-validation, with 10 cross-validations repeated 100 times, and the same number of negative samples as the positive samples was randomly selected to train the SVM model each time (Nakayama et al., 2006). To show the performance of the training model against the baseline data set, the average of 10 cross-validations was used as the final output of the model. It can be seen that our well-trained models are highly accurate, indicating the reliability of our predicted PPI.

2.7. Results and discussion

2.7.1. Normalized GDs

For the data after preprocessing, according to the information obtained by the known database, normalization of python is performed, and the minimum–maximum normalization process, also called the deviation standardization, is a linear transformation of the original data, so that the result values are mapped to between [0, 1]. The conversion function is as follows: $X^{*} = \frac{x - min}{max - min} .$

In this project, this type was used to first quantify the protein–protein correlation of zebrafish GDs, and then do a single-sample t-test. The one-sample test is also called the t-test of the comparison of the sample mean and the population mean (Kanehisa et al., 2008). N samples were taken in the population for normal distribution, and it is judged whether the population mean μ is the same as the known population mean μ(). We screened out the highly relevant protein genes we wanted. The results are as follows.

2.7.2. KEGG pathways analysis

To further understand the function of GDs, DAVID was applied to identify the KEGG pathway of significant dysregulation (Cherkassky and Ma, 2004). The pathways obtained from p values <0.05 and gene counts >2 from all aging-related genes are given in Table 2, respectively. According to the enrichment results, focal adhesion, inositol phosphate metabolism, extracellular matrix (ECM)–receptor interaction, and mammalian target of rapamycin (mTOR) signaling pathway were significantly enriched. There are also some pathways that we might have predicted before, such as autophagy, and we have reason to believe that these protein interactions can lead to or predict factors in the GDs of zebrafish that we have not considered before (Srihari and Leong, 2012).

Table 2.

The Kyoto Gene and Genomic Encyclopedia Pathways of Growth and Development

No. term ID	Term description	Matching proteins in your network (labels)	False discovery rate
dre04115	p53 signaling pathway	atm, atr, cdk2, chek2, mdm2, mdm4, ptenb, tp53	9.18E-10
dre04218	Cellular senescence	atm, atr, cdk2, chek2, mdm2, nbn, ptenb, tp53	2.94E-07
dre04110	Cell cycle	atm, atr, cdc25, cdk2, chek2, mdm2, tp53	9.79E-07
dre04068	FoxO signaling pathway	atm, cdk2, ins, mdm2, ptenb	0.00057
dre04010	MAPK signaling pathway	cdc25, fgf5, ins, ntrk2b, tp53	0.0161
dre04070	Phosphatidylinositol signaling system	MTMR4, cds2, ptenb	0.0161
dre04350	TGF-β signaling pathway	bmpr1ba, fsta, spaw	0.0161
dre04914	Progesterone-mediated oocyte maturation	cdc25, cdk2, ins	0.0161
dre03440	Homologous recombination	atm, nbn	0.0174
dre04140	Autophagy—animal	MTMR4, ins, ptenb	0.0259
dre04210	Apoptosis	atm, dab2ipb, tp53	0.0259
dre04310	Wnt signaling pathway	ruvbl1, tp53, wnt11	0.0259
dre04340	Hedgehog signaling pathway	gas1a, gas1b	0.0259
dre04150	mTOR signaling pathway	ins, ptenb, wnt11	0.0272
dre04512	ECM–receptor interaction	lamc3, tnc	0.0435
dre00562	Inositol phosphate metabolism	MTMR4, ptenb	0.0454
dre04510	Focal adhesion	lamc3, ptenb, tnc	0.0459

ECM, extracellular matrix; FoxO, Forkhead box O; MAPK, mitogen-activated protein kinase; mTOR, mammalian target of rapamycin; TGF, transforming growth factor.

2.7.3. PPI network construction

The STRING tool is used to obtain the PPI relationship of GDs. When the comprehensive score was >0.4, a total of 324 PPI relationships were obtained. After filtering out the nodes with degrees ≤5, we finally constructed a network with 48 nodes and 296 edges (Fig. 1). We calculated the connectivity of each node of the PPI network and output as shown in Figure 2.

FIG. 1.

A network constructed with 48 nodes and 296 edges on STRING.

FIG. 2.

The connectivity of each node of the protein–protein interaction network.

Tumor protein p53 (Tp53), phosphatase and tensin homolog B (Ptenb), cyclin-dependent kinase 2 (Cdk2), tumor protein p53 binding protein 1(Tp53bp1), serine/threonine kinase (Atr), checkpoint kinase 1 (Chek1). A node with a high connectivity ≥20 is considered a central gene (Sui et al., 1999).

2.7.4. Module analysis

PPI network enrichment is one of the main methods for studying and identifying functional proteins. In this study, eight important modules (p < 1 × 10 − 3) were enriched by the Cluster ONE plug-in with a minimum parameter ≥5 and a minimum density of <0.05. The most important enrichment modules A (p = 0.1 × 10 − 2), B (p = 0.9083 × 10 − 4), and C (p = 0.12 × 10 − 2) are shown in Figure 3. According to Figure 3, it is obvious that module A may be the best module because it has 38 nodes and 210 edges, whereas module B has 10 nodes and 44 edges, and module C has 7 nodes and 21 edges (Alexandrov et al., 2018).

FIG. 3.

A network constructed with 38 nodes on Cytoscape.

o further investigate functional changes during tumor progression, we performed a GO function annotation in module A (Gilks, 2004). Module A's GO enrichment score is 11.28. Therefore, module A may be the most suitable module for further functional analysis. There are 38 genes in module A (Fig. 3), which coincide with the central genes we hypothesized previously. The hypothetical central genes are also labeled in Figure 1. Module A may, therefore, be the most suitable module for further functional analysis. Module A has 38 genes that are enriched in biological processes such as cell aging, cell cycle, and regulation of metabolic processes, and then the results of the analysis by KEGG enrichment are given in Table 3 (Jeong et al., 2001).

Table 3.

The Results of the Kyoto Gene and Genomic Encyclopedia Enrichment Analysis in Module A

No. term ID	Term description	Matching proteins in your network
dre04115	p53 signaling pathway	atm, atr, cdk2, chek2, mdm2, mdm4, ptenb, tp53
dre04218	Cellular senescence	atm, atr, cdk2, chek2, mdm2, nbn, ptenb, tp53
dre04110	Cell cycle	atm, atr, cdc25, cdk2, chek2, mdm2, tp53
dre04068	FoxO signaling pathway	atm, cdk2, ins, mdm2, ptenb
dre04010	MAPK signaling pathway	cdc25, fgf5, ins, ntrk2b, tp53
dre04070	Phosphatidylinositol signaling system	MTMR4, cds2, ptenb
dre04350	TGF-β signaling pathway	bmpr1ba, fsta, spaw
dre04914	Progesterone-mediated oocyte maturation	cdc25, cdk2, ins
dre03440	Homologous recombination	atm, nbn
dre04140	Autophagy—animal	MTMR4, ins, ptenb
dre04210	Apoptosis	atm, dab2ipb, tp53
dre04310	Wnt signaling pathway	ruvbl1, tp53, wnt11
dre04340	Hedgehog signaling pathway	gas1a, gas1b
dre04150	mTOR signaling pathway	ins, ptenb, wnt11
dre04512	ECM–receptor interaction	lamc3, tnc
dre00562	Inositol phosphate metabolism	MTMR4, ptenb
dre04510	Focal adhesion	lamc3, ptenb, tnc

2.7.5. Prediction of PPI on SVM

According to the recent article on the interaction network of corn proteins and the concepts and experimental operations proposed by a scholar, we found that SVM can be used to predict PPI, and the correct rate can reach an ideal credibility. Therefore, we carry out a series of classification and sorting of data (Worsley et al., 1997). Our data are not only from NCBI but also from BioGrid, DIP, IntAct, etc., while some of the predicted PPIs are derived from coexpression, and some are from experimental determination. Some are determined by the database, and some are automatically detected. Based on these data, some processing is performed on the data. The predicted protein interactions and the proportions after training the data are as given in Table 4.

Table 4.

The Number of Molecular Interactions Deposited in Growth and Development and the Number of Proteins Involved

	Predicted physics PPIs	Predicted functional PPIs	Experimental determined PPIs	Databased annotated PPIs	Total
Number of proteins	50	576	72	54	767
Number of interactions	42	467	56	49	834

PPIs, protein–protein interactions.

3. Results

This project introduces the GDs mechanism of the fifth chromosome NC_007116.7 of zebrafish, which covers 296 interactions between 58 proteins related to aging. This includes not only accurate predictions of PPI but also those molecular interactions obtained from literature collection and experiments (Bowen et al., 2009). Then we networked to find the central genes (Tp53, ptenb, cdk2, tp53bp1, atr, and chek1), which coincides with the important genes derived from the subsequent module enrichment. In the process of module A analysis of zebrafish aging, functional enrichment such as biological regulation, biological process, and molecular metabolic process is obtained, which indicates that it plays an important role in the GDs of zebrafish. At the same time, the SVM is used to train the obtained data to obtain the accuracy of the experiment, which further proves our prediction (Dooley and Zon, 2000).

4. Discussion

Zebrafish has become a popular creature in the study of gene function in vertebrates. Almost transparent embryos of this species and the ability to accelerate gene research by gene knockdown or overexpression have led to the widespread use of zebrafish in detailed studies of vertebrate gene function and the increasing study of human genetic diseases (Irizarry et al., 2003). However, to effectively mimic human genetic diseases, it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. In this study, we selected 58 representatives from the NCBI (https://www.ncbi.nlm.nih.gov) 47,036 zebrafish genes based on the homology of zebrafish to mice and humans of aging related genes; sexual zebrafish aging related genes. Most of these genes are rich in biological processes and biological regulation, and some genes are rich in cellular metabolic processes.

In addition, we also used the STRING tool to obtain the PPI relationship of the DEG and obtained a network with 87 nodes and 767 edges (Ritchie et al., 2007). In the network, Tp53, ptenb, cdk2, tp53bp1, atr, and chek1 are selected as hub nodes because their connectivity is ≥20. Finally, we performed a module analysis of the PPI network. Module A containing 38 genes has been shown to be closely related to zebrafish GDs. Through GO analysis, the genes in module A are enriched in biological regulation, biological processes, molecular metabolic processes, etc., indicating that telomere wear and cell aging play an important role in the GDs of zebrafish. At the same time, to predict the accuracy, we used SVM to train the predicted PPI, and obtained higher accuracy, which further verified our hypothesis.

5. Conclusion

As a result of the study, we confirmed the GDs mechanism of zebrafish and mTOR signaling pathway, ECM–receptor interaction, inositol phosphate metabolism, and focal adhesion molecular metabolism (Huang et al., 2007). In addition, we also pointed out that genes such as Tp53, ptenb, cdk2, tp53bp1, atr, and chek1 may play an important role in GDs, and they are predicted source genes.

Footnotes

Acknowledgments

The research was partly supported by the program for Professor of Special Appointment (Eastern Scholar, 15HJPY-MS02) at Shanghai Institutions of Higher Learning and National Natural Science Foundation of China (Grant No. 61775139).

Author Disclosure Statement

The authors declare there are no competing financial interests.

References

Alexandrov

, Golyandina

, Holloway

, et al. 2018. Two-exponential models of gene expression patterns for noisy experimental data. J. Comput. Biol. 25, 1220–1230.

Bowen

N.J.

, Walker

L.D.

, Matyunina

L.V.

, et al. 2009. Gene expression profiling supports the hypothesis that human ovarian surface epithelia are multipotent and capable of serving as ovarian cancer initiating cells. BMC Med. Genomics. 2, 71.

Cherkassky

, and Ma

2004. Practical selection of SVM parameters and noise estimation for SVM regression. Neural Netw. 17, 113–126.

Dooley

, and Zon

L.I.

2000. Zebrafish: A model system for the study of human disease. Curr. Opin. Genet. Dev. 10, 252–256.

L.J.

, and Wang

2013. Investigation of the hub genes and related mechanism in ovarian cancer via bioinformatics analysis. J. Ovarian Res. 6, 92.

Gilks

C.B.

2004. Subclassification of ovarian surface epithelial tumors based on correlation of histologic and molecular pathologic data. Int. J. Gynecol. Pathol. 23, 200–205.

Huang

D.W.

, Sherman

B.T.

, Tan

, et al. 2007. The DAVID Gene Functional Classification Tool: A novel biological module-centric algorithm to functionally analyze large gene lists. Genome. Biol. 8, R183.

Irizarry

R.A.

, Hobbs

, Collin

, et al. 2003. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 4, 249–264.

Isogai

, Horiguchi

, and Weinstein

B.M.

2001. The vascular anatomy of the developing zebrafish: An atlas of embryonic and early larval development. Dev. Biol. 230, 278–301.

10.

Jeong

, Mason

S.P.

, Barabasi

A.L.

, et al. 2001. Lethality and centrality in protein networks. Nature. 411, 41–42.

11.

Kanehisa

, Araki

, Goto

, et al. 2008. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 36, D480–D484.

12.

Kanehisa

, Goto

, Furumichi

, et al. 2010. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 38, D355–D360.

13.

Kanehisa

, Goto

, Kawashima

, et al. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res. 32, D277–D280.

14.

Kimmel

C.B.

, Ballard

W.W.

, Kimmel

S.R.

, et al. 1995. Stages of embryonic development of the zebrafish. Dev. Dyn. 203, 253–310.

15.

Nakayama

, Nakayama

, Kurman

R.J.

, et al. 2006. Sequence mutations and amplification of PIK3CA and AKT2 genes in purified ovarian serous neoplasms. Cancer Biol. Ther. 5, 779–785.

16.

Panula

, Sallinen

, Sundvik

, et al. 2006. Modulatory neurotransmitter systems and behavior: Towards zebrafish models of neurodegenerative diseases. Zebrafish. 3, 235–247.

17.

Ritchie

M.E.

, Silver

, Oshlack

, et al. 2007. A comparison of background correction methods for two-colour microarrays. Bioinformatics. 23, 2700–2707.

18.

Srihari

, and Leong

H.W.

2012. Temporal dynamics of protein complexes in PPI networks: A case study using yeast cell cycle dynamics. BMC Bioinformatics. 13(Suppl. 17), S16.

19.

Sui

, Tokuda

, Ohno

, et al. 1999. The concurrent expression of p27(kip1) and cyclin D1 in epithelial ovarian tumors. Gynecol. Oncol. 73, 202–209.

20.

Van Houcke

, De Groef

, Dekeyster

, et al. 2015. The zebrafish as a gerontology model in nervous system aging, disease, and repair. Ageing Res. Rev. 24, 358–368.

21.

Venkat

P.S.

, Narayanan

K.R.

, Datta

, et al. 2017. A Bayesian network-based approach to selection of intervention points in the mitogen-activated protein kinase plant defense response pathway. J. Comput. Biol. 24, 327–339.

22.

Webb

S.E.

, and Miller

A.L.

2000. Calcium signalling during zebrafish embryonic development. Bioessays. 22, 113–123.

23.

Worsley

S.D.

, Ponder

B.A.

, Davies

B.R.

, et al. 1997. Overexpression of cyclin D1 in epithelial ovarian cancers. Gynecol. Oncol. 64, 189–195.

24.

Yang

, Han

, Yuan

, et al. 2014. Gene co-expression network analysis reveals common system-level properties of prognostic genes across cancer types. Nat. Commun. 5, 3231.