Identifying Interactions Between Kinases and Substrates Based on Protein

Abstract

Protein phosphorylation is an important post-translational modification that plays a critical role in many biological processes in eukaryotes. Identifying kinase–substrate interactions helps to elucidate the mechanisms of many diseases. Many computational algorithms for identifying kinase–substrate interactions have been proposed; however, most of them rely mainly on local protein sequence information. In this article, we propose a new computational method that predicts kinase–substrate interactions based on the protein–protein interaction (PPI) network. Unlike existing methods, it uses the PPI network to measure kinase–kinase and substrate–substrate similarities. These pairwise similarities are then adjusted under the assumption that the similarity between two kinases, or between two substrates, is more reliable if they belong to the same cluster. Finally, a bi-random walk is used to predict potential kinase–substrate interactions. The experimental results show that our method outperforms the compared state-of-the-art algorithms. Furthermore, a case study demonstrates that it is effective in predicting potential kinase–substrate interactions.

1. Introduction

Protein phosphorylation is an important post-translational modification in living organisms. In the phosphorylation process, a kinase catalyzes the transfer of a phosphate group from adenosine triphosphate to an amino acid residue of the substrate. Phosphorylation plays an important role in the regulation of many cellular processes, such as cell metabolism, gene expression, and cellular signal transduction (Cohen, 2002; Olsen et al., 2006, 2010). Abnormal regulation between kinases and substrates may cause serious diseases, such as rheumatoid arthritis (Chen et al., 2017) and diabetes (Cohen, 2001; Lan et al., 2018). Therefore, identifying interactions between substrates and kinases helps to understand the mechanisms of cellular processes and provides a basis for drug-target research (Gan et al., 2019).

Several experimental methods have been developed to infer phosphorylation sites and their corresponding kinases. They can be classified into two categories: low-throughput techniques (Lin et al., 2003; Salinas et al., 2004; Aponte et al., 2009) and high-throughput techniques (Villen et al., 2007; Han et al., 2010; Lin et al., 2010). However, both types of experimental methods are costly and time consuming.

To overcome these limitations, many computational methods have been proposed to identify kinase–substrate interactions. Linding et al. (2007) developed a computational approach to predict site-specific kinase–substrate interactions by integrating motif information and the network context of kinases and phosphoproteins. Dang et al. (2008) presented an algorithm to identify kinase–substrate relationships based on conditional random fields. Zhou et al. (2004) developed a web server, GPS, to predict phosphorylation sites by using a substitution matrix and the Markov cluster algorithm. Zou et al. (2013) proposed a computational framework, PKIS, to identify kinase–substrate interactions. It uses the composition of monomer spectrum encoding strategy to encode local sequence features and predicts interactions with a support vector machine. Patrick et al. (2014) developed a Bayesian network model that integrates cellular context to predict kinase–substrate interactions. Fan et al. (2014) proposed a computational method for predicting kinase-specific phosphorylation sites based on functional information and random forest. Li et al. (2015b) presented a kernel-based method to address the kinase identification problem by using supervised Laplacian regularized least squares. Song et al. (2017) developed a bioinformatics tool to infer kinase-specific substrates and their associated phosphorylation sites by combining protein sequence and functional features.

In addition, some methods use biological network information to improve performance. Song et al. (2012) developed a web server, iGPS, which extends the GPS algorithm by integrating protein–protein interaction (PPI) network information. Damle and Mohanty (2014) proposed a computational method, PhosNetConstruct, to identify kinase–substrate relationships by analyzing domain-specific phosphorylation networks. Li et al. (2015a) proposed a network-based method for predicting kinase–substrate interactions based on sequence similarity. Moreover, Qin et al. (2016) presented a computational method for inferring the interactions between kinases and substrates based on a protein domain network. These computational methods have achieved considerable success. However, phosphorylation is a complex biological process that is involved in various biological mechanisms. In addition, machine learning-based algorithms require a large number of negative samples, which are hard to obtain from biological data. Moreover, some methods use only local information instead of global information, which may result in false positives (FP).

In this article, we propose a novel computational method, Predict Kinase-Substrate Interaction by using Bi-random Walk (KSIBW), to predict kinase–substrate interactions based on the assumption that similar substrates tend to be related to similar kinases. First, the reliability of the PPI network is improved based on a local topological feature and a biological feature. Then, the kinase–kinase and substrate–substrate similarities are calculated by using a shortest-path method. In addition, these similarities are adjusted based on the assumption that the similarity between two kinases, or between two substrates, is more reliable if they are in the same cluster. Finally, the bi-random walk algorithm is employed to predict potential kinase–substrate interactions. The experimental results show that our method outperforms the compared state-of-the-art algorithms.

2. Materials and Methods

2.1. Data resources

In this article, the human kinase–substrate interactions are obtained from the Phospho.ELM 9.0 database (Dinkel et al., 2010). After removing redundant data, 216 kinases, 724 substrates, and 1256 kinase–substrate interactions remain. The human PPI data are obtained from the InWeb_IM database (Li et al., 2017), which was compiled from eight source databases: DIP (Xenarios et al., 2002), BIND (Bader et al., 2003), WikiPathways (Kelder et al., 2011), IntAct (Orchard et al., 2013), Reactome (Croft et al., 2013), BioGRID (Stark et al., 2006), NetPath (Kandasamy et al., 2010), and MatrixDB (Launay et al., 2014). It contains 14,684 human proteins and 625,641 interactions. In InWeb_IM, each PPI has a confidence score (CF), which is calculated based on the reproducibility of the interaction data between different publications.

2.2. Kinase–kinase and substrate–substrate similarity measure

To enhance the reliability of the PPI network, the new edge clustering coefficient (NECC; Ma et al., 2017) is used to weight the PPI network. The PPI network can be described as an undirected graph G(V,E); each node $v \in V$ denotes a protein and each edge $(u,v) \in E$ denotes the interaction between nodes u and v. The NECC is defined as

\begin{align*} \mathrm{NECC}(u,v) = \frac{N_{u,v} + \mathrm{ECC}(u,v)}{2\left\vert Z_{u,v} \right\vert + 1} \tag{1} \end{align*}

$Z_{u,v}$ represents the set of all common neighbors of u and v. d_u and d_v denote the degrees of nodes u and v, respectively. Then, the final weight of the edge is calculated as follows:

\begin{align*} \mathrm{W}(u,v) = \alpha \times \mathrm{CF}(u,v) + (1 - \alpha) \times \mathrm{NECC}(u,v) \tag{4} \end{align*}

where CF(u,v) denotes the CF of edge (u,v) obtained from InWeb_IM (Li et al., 2017). The parameter α is used to trade off the NECC and the CF between two nodes.
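As a minimal sketch of Equation (4), assuming the CF and NECC values of an edge have already been computed, the weighting can be written as below. The function name `edge_weight` and the example values are illustrative only; the default α = 0.9 is the value reported as best in Section 3.4.

```python
def edge_weight(cf: float, necc: float, alpha: float = 0.9) -> float:
    """Combine the confidence score and the NECC as in Eq. (4)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * cf + (1.0 - alpha) * necc


# Hypothetical edge with CF = 0.8 and NECC = 0.4 (values are made up).
w = edge_weight(cf=0.8, necc=0.4, alpha=0.9)
```

With α close to 1 the weight is dominated by the literature-derived confidence score, and the topological term acts as a small correction.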

To calculate the similarity between two proteins u and v, we find the shortest path between them. The similarity is then calculated as the product of the edge weights along that path:

\begin{align*} \mathrm{sim}(u,v) = \prod_{(i,j) \in SP_{u,v}} \mathrm{W}(i,j) \tag{5} \end{align*}

where $SP_{u,v}$ denotes the set of edges on the shortest path between u and v.
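The following sketch illustrates Equation (5) on a toy network. The text does not state which edge cost defines the "shortest" path, so this example makes an explicit assumption: each edge costs -log(W), so that Dijkstra's algorithm returns the path with the largest product of weights. The function name `path_similarity` and the protein labels are hypothetical.

```python
import heapq
import math


def path_similarity(weights, source, target):
    """Product of edge weights along the path found by Dijkstra's algorithm.

    Assumption (not from the paper): edge cost is -log(W), so the cheapest
    path maximizes the product in Eq. (5). Weights must lie in (0, 1].
    """
    # Build an undirected adjacency list from {(u, v): W} entries.
    adj = {}
    for (u, v), w in weights.items():
        cost = -math.log(w)
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))

    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return math.exp(-cost)  # convert summed -log(W) back to a product
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, step in adj.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return 0.0  # unreachable proteins get similarity 0


# Toy weighted PPI network; protein names are placeholders.
toy = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.5, ("D", "E"): 0.7}
sim_ac = path_similarity(toy, "A", "C")  # two-hop path A-B-C beats the direct edge
sim_ad = path_similarity(toy, "A", "D")  # different components
```

Under a hop-count definition of the shortest path, the direct edge A–C would be selected instead, so the choice of cost changes the resulting similarity.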

2.3. Clustering kinase and substrate

Based on the assumption that two kinases or two substrates are more similar if they belong to the same cluster, we further refine the kinase–kinase and substrate–substrate similarities by clustering the kinase similarity network and the substrate similarity network. ClusterONE (Nepusz et al., 2012) is used to cluster the kinase similarity network K, and the similarity between kinases that belong to the same cluster is then increased. Assuming that two kinases k_i and k_j are in the same cluster C, the similarity between k_i and k_j is reinforced as

\begin{align*} \mathrm{sim}'(k_i, k_j) = \gamma \times \mathrm{sim}(k_i, k_j) \tag{6} \end{align*}

where γ is the reinforcing factor.
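A minimal sketch of the reinforcement in Equation (6) is given below. It assumes the clusters are already available as sets of node indices (for example, exported from a ClusterONE run); the function name `reinforce` and the toy matrix are illustrative. Because ClusterONE can return overlapping clusters, the sketch scales each pair from the original matrix, so a pair that shares several clusters is reinforced only once. That choice is an assumption, not something the text specifies.

```python
def reinforce(sim, clusters, gamma=1.2):
    """Scale sim[i][j] by gamma when i and j share at least one cluster."""
    out = [row[:] for row in sim]  # copy so the input stays unchanged
    for cluster in clusters:
        members = sorted(cluster)
        for i in members:
            for j in members:
                if i != j:
                    # Always scale from the original value, never from `out`.
                    out[i][j] = gamma * sim[i][j]
    return out


# Toy similarity matrix for three kinases; kinases 0 and 1 form one cluster.
sim = [[1.0, 0.5, 0.2],
       [0.5, 1.0, 0.4],
       [0.2, 0.4, 1.0]]
adjusted = reinforce(sim, clusters=[{0, 1}], gamma=1.2)
```

The default γ = 1.2 mirrors the best-performing value reported in Section 3.5.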

2.4. Construction of kinase and substrate heterogeneous network

Based on these two similarity matrices, the kinase similarity network and the substrate similarity network are constructed. For the kinase similarity network K, let k_i and k_j represent two different kinases. If the similarity between k_i and k_j is 0, there is no edge between these two kinases. Otherwise, the two kinases are connected by an edge whose weight is their similarity value. The substrate similarity network S is constructed in the same way.

Let I denote the kinase–substrate association network. e_ij denotes the edge of I and the initial value of e_ij is set to 1 if there is a known interaction between kinase k_i and substrate s_j; otherwise 0. Based on the association network, the kinase–substrate heterogeneous network is constructed by conjoining kinase similarity network and substrate similarity network. An example of kinase–substrate heterogeneous network is shown in Figure 1.

FIG. 1.

Illustration of the kinase–substrate heterogeneous network. The triangle and the circle represent the substrate and the kinase, respectively. The solid line shows the similarity between the two proteins. The dotted line shows the kinase–substrate interactions. Two similarity networks are bridged by known kinase–substrate interactions.

2.5. Predicting kinase–substrate interactions based on bi-random walk

Bi-random walk is an extension of the random walk, which is widely used in disease gene identification (Lan et al., 2015), drug repositioning (Luo et al., 2016), and phenome-genome association prediction (Xie et al., 2015).

The kinase similarity matrix and the substrate similarity matrix are each normalized by using Laplacian normalization. The normalized substrate similarity matrix S_n is calculated as follows:

\begin{align*} S_n = \mathrm{D}_s^{-\frac{1}{2}} \times S \times \mathrm{D}_s^{-\frac{1}{2}} \tag{7} \end{align*}

where D_s represents the diagonal matrix of the substrate similarity matrix S and D_s(i,i) is the sum of the ith row of S.

The normalized kinase similarity matrix K_n is calculated as follows:

\begin{align*} K_n = \mathrm{D}_k^{-\frac{1}{2}} \times K \times \mathrm{D}_k^{-\frac{1}{2}} \tag{8} \end{align*}

where D_k represents the diagonal matrix of the kinase similarity matrix K and D_k(i,i) is the sum of the ith row of K.
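The symmetric normalization used in Equations (7) and (8) can be sketched as follows. The example uses plain Python lists to stay dependency-free; the function name `laplacian_normalize` is illustrative, and leaving rows with a zero sum as zeros is an assumption made here to avoid division by zero.

```python
import math


def laplacian_normalize(m):
    """Return D^{-1/2} M D^{-1/2}, where D(i,i) is the sum of row i of M."""
    row_sums = [sum(row) for row in m]
    # Isolated nodes (row sum 0) keep all-zero rows and columns.
    inv_sqrt = [1.0 / math.sqrt(s) if s > 0 else 0.0 for s in row_sums]
    n = len(m)
    return [[inv_sqrt[i] * m[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


# Toy symmetric similarity matrix with row sums 2 and 4.
k = [[1.0, 1.0],
     [1.0, 3.0]]
k_n = laplacian_normalize(k)
```

Each entry is divided by the square roots of the row sums of both endpoints, so the normalized matrix stays symmetric whenever the input is symmetric.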

Unlike the normalization above, the normalized matrix $I_n$ of the kinase–substrate interaction matrix I is defined as follows:

\begin{align*} RW(0) = I_n = I(i,j) / \sum\nolimits_{i,j=0}^{n} \mathrm{I}(i,j) \tag{9} \end{align*}

After obtaining the normalized matrices S_n, K_n, and I_n, the bi-random walk is employed to identify kinase–substrate interactions by walking on the kinase similarity network and the substrate similarity network simultaneously. Because different networks may have different topological structures, the optimal number of steps may differ between the two networks. We therefore limit the number of walking steps with two parameters, l and r, which represent the maximum number of steps on the substrate network and the kinase network, respectively. The bi-random walk procedure is formalized as follows:

Left walk in the substrate similarity network:

\begin{align*} RW_L(t) = \beta \times S_n \times RW(t-1) + (1 - \beta) \times I_n \tag{10} \end{align*}

Right walk in the kinase similarity network:

\begin{align*} RW_R(t) = \beta \times RW(t-1) \times K_n + (1 - \beta) \times I_n \tag{11} \end{align*}

The left and right predicted results are integrated to acquire the final output:

\begin{align*} RW(t) = \frac{RW_L(t) + RW_R(t)}{2} \tag{12} \end{align*}

where $RW_L(t)$ and $RW_R(t)$ represent the predicted scores of kinase–substrate interactions obtained by walking on the substrate similarity network and the kinase similarity network at step t, respectively. $RW(t)$ denotes the final predicted score at step t. The parameters l, r, and β are set to 2, 2, and 0.3 in our experiment, respectively.
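The iteration in Equations (10) to (12) can be sketched as below, again with plain Python lists. Two points are assumptions of this sketch rather than statements from the text: the score matrix is arranged with substrates as rows and kinases as columns, so that the products S_n × RW and RW × K_n are dimensionally consistent; and a walk is simply skipped once its step limit (l or r) is exceeded, with the average taken over the walks actually performed. All function names and the toy matrices are illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def restart_mix(walked, i_n, beta):
    """Return beta * walked + (1 - beta) * I_n, element by element."""
    return [[beta * w + (1.0 - beta) * p for w, p in zip(w_row, p_row)]
            for w_row, p_row in zip(walked, i_n)]


def bi_random_walk(s_n, k_n, i_n, l=2, r=2, beta=0.3):
    """Iterate Eqs. (10)-(12); rows index substrates, columns index kinases."""
    rw = [row[:] for row in i_n]  # RW(0) = I_n
    for t in range(1, max(l, r) + 1):
        walks = []
        if t <= l:  # left walk on the substrate network, Eq. (10)
            walks.append(restart_mix(matmul(s_n, rw), i_n, beta))
        if t <= r:  # right walk on the kinase network, Eq. (11)
            walks.append(restart_mix(matmul(rw, k_n), i_n, beta))
        # Eq. (12): average the walks taken at this step.
        rw = [[sum(vals) / len(walks) for vals in zip(*rows)]
              for rows in zip(*walks)]
    return rw


# Toy input: 2 substrates, 3 kinases (all values are made up).
s_n = [[0.5, 0.5], [0.5, 0.5]]
k_n = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
i_n = [[0.5, 0.0, 0.0], [0.0, 0.25, 0.25]]
scores = bi_random_walk(s_n, k_n, i_n)
```

In the toy input, the second substrate has no known interaction with the first kinase, yet it receives a positive score because it is similar to a substrate that does. This is how the method proposes new candidate pairs.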

3. Experiments and Results

3.1. Evaluation metrics

In this article, 10-fold cross-validation and a de novo test are used to evaluate the performance of different algorithms. In the 10-fold cross-validation, known kinase–substrate interactions are randomly divided into 10 subsets. In each cross-validation trial, nine subsets are used as the training set and the remaining subset is treated as the test set. After completing the test on the data set, the predicted scoring matrix is generated. We then rank the unknown kinase–substrate interactions and the test set based on the predicted scores. For each threshold, a test-set interaction is considered a true positive (TP) if its predicted score is greater than the threshold; otherwise, it is considered a false negative (FN). An unknown kinase–substrate interaction is treated as an FP if its predicted score is greater than the threshold and as a true negative (TN) if its predicted score is less than the threshold. By choosing various thresholds, we obtain different true positive rates (TPR) and false positive rates (FPR), which are calculated as follows:

\begin{align*} \mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \tag{13} \end{align*}

\begin{align*} \mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} \tag{14} \end{align*}

The receiver operating characteristic (ROC) curve is drawn based on the calculated TPR and FPR. Then the area under the curve (AUC) is calculated to evaluate the performance of different algorithms.
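As an illustration of how TPR, FPR, and the AUC relate, the sketch below sweeps the threshold over a small set of made-up scores and integrates the resulting ROC curve with the trapezoidal rule. The function names and the data are hypothetical. The sketch counts a score equal to the threshold as predicted positive (≥ rather than >), which is a convenience assumption, so that every sample is counted at some threshold.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the threshold downward."""
    pos = sum(labels)            # held-out known interactions
    neg = len(labels) - pos      # unknown pairs, treated as negatives
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up prediction scores; label 1 marks a held-out known interaction.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
auc_value = auc(roc_points(scores, labels))
```

In this toy case, eight of the nine positive–negative pairs are ranked correctly, which matches the computed area of 8/9.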

3.2. Comparison with network-based method

To evaluate the performance of KSIBW, we compare it with the Hetesim-SEQ algorithm (Li et al., 2015b). Like KSIBW, Hetesim-SEQ is a network-based method for predicting kinase–substrate interactions. For both KSIBW and Hetesim-SEQ, we use the same data set obtained from Phospho.ELM, and 10-fold cross-validation is used for performance evaluation. The experimental results of KSIBW and Hetesim-SEQ are shown in Figure 2. KSIBW achieves an AUC of 0.842, which is higher than that of Hetesim-SEQ (AUC = 0.802).

FIG. 2.

The ROC curves for predicting kinase–substrate interactions with different methods.

3.3. Comparison with different predictors by de novo test

To evaluate the power of our method for predicting new kinase–substrate interactions, we perform de novo test experiments. In each round of the de novo test, we delete all known kinase–substrate interactions of one kinase i. The remaining kinase–substrate interactions are treated as the training set. We compare KSIBW with four state-of-the-art methods for predicting kinase–substrate interactions: GPS (Zhou et al., 2004), iGPS (Song et al., 2012), NetworKIN (Linding et al., 2007), and PhosphoPICK (Patrick et al., 2014). Since these methods are available only as web servers, we submit the data set to the corresponding web server for testing. We take six kinase groups (Atypical, CAMK, CMGC, Other, STE, and TK) as examples to illustrate the predictive performance of the different methods. Figure 3 shows the ROC curves of the different methods in these kinase groups. As Figure 3 shows, KSIBW performs better than the other four algorithms on these kinase groups. For example, for the Atypical group, KSIBW achieves an AUC of 0.813, which is higher than GPS (AUC = 0.469), iGPS (AUC = 0.575), NetworKIN (AUC = 0.689), and PhosphoPICK (AUC = 0.471).
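The de novo split described above can be sketched as follows, assuming the known interactions are stored as a set of (kinase, substrate) pairs. The function name `de_novo_split` and the identifiers K1, S1, and so on are placeholders, not entries from Phospho.ELM.

```python
def de_novo_split(interactions, kinase):
    """Hide every known interaction of one kinase; the rest form the training set."""
    train = {(k, s) for k, s in interactions if k != kinase}
    held_out = {(k, s) for k, s in interactions if k == kinase}
    return train, held_out


# Placeholder interaction set with three kinases.
known = {("K1", "S1"), ("K1", "S2"), ("K2", "S1"), ("K3", "S3")}
train, held_out = de_novo_split(known, "K1")
```

After the split, the selected kinase has no known substrates in the training set, so any score it receives must come from its similarity to other kinases.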

FIG. 3.

The ROC curves for kinase group Atypical, CAMK, CMGC, Other, STE, and TK with different methods.

3.4. Effect of PPI weight parameter α

To enhance the reliability of the PPI network, the NECC and the CF are combined to weight the edge between two proteins in the PPI network. The parameter α trades off the NECC against the CF. To evaluate the effect of different α values on the performance of the algorithm, we vary α from 0.1 to 0.9 and calculate the corresponding AUC. As shown in Figure 4, the AUC reaches its highest value (0.836) when α is 0.9. Note that the similarity networks are not clustered when testing the effect of α.

FIG. 4.

The performance of KSIBW at different values of α.

3.5. Effect of the similarity reinforcing factor γ

The parameter γ is a reinforcing factor that is used to enhance the similarity of proteins that belong to the same cluster. To evaluate the effect of different γ values on the performance of KSIBW, we vary γ from 1.1 to 1.9. The corresponding result is shown in Figure 5. As Figure 5 shows, the AUC is 0.842 when γ is 1.2, which is larger than the result without clustering (AUC = 0.836). This demonstrates that enhancing the similarity within a cluster can improve the performance of KSIBW. However, the performance of the algorithm declines gradually when γ is greater than 1.2. A possible reason is that artificially increasing the similarity within clusters too much may weaken the interactions between different clusters.

FIG. 5.

The performance of KSIBW at different values of γ.

3.6. Case studies

To further validate the ability of KSIBW to predict unknown kinase–substrate interactions, a case study is conducted. All known kinase–substrate interactions are used as the training set and the unknown kinase–substrate pairs are used as the test set. We employ KSIBW to predict potential kinase–substrate interactions and obtain the prediction score of every kinase–substrate pair in the test set.

We take TP53 as an example to illustrate the ability of KSIBW to detect unknown kinase–substrate interactions. TP53 is a transcription factor that controls the initiation of the cell cycle. It is closely associated with many human cancers and plays roles in the cell cycle, apoptosis, genomic stability, and inhibition of angiogenesis, among others. We analyze the top 10 predicted kinases for TP53. The detailed information is listed in Table 1. Checking against the PhosphoNET database shows that three of the predictions are recorded there. For example, the serine site in position 20 of substrate TP53 is phosphorylated by kinase PLK3, ATR phosphorylates TP53 at the serine site in position 15, and TP53 associates with kinase CDK2 through the serine site in position 26. In addition, although some of the predicted kinase–substrate interactions are not present in PhosphoNET, they are supported by the published literature. For example, TP53 has been found to be regulated by FLT3 (Irish et al., 2007), and JAK2 has been reported to negatively regulate TP53 in myeloproliferative neoplasms (Nakatake et al., 2012). The remaining predictions are new candidate kinase–substrate interactions that merit validation by biological experiments.

Table 1.

Top 10 Prediction Results of TP53

Rank	Substrate	Predicted kinase	Evidence
1	TP53	PLK3	PhosphoNET
2	TP53	ATR	PhosphoNET
3	TP53	CDK2	PhosphoNET
4	TP53	FLT3	PMID: 17105820
5	TP53	TLK1	Unknown
6	TP53	LYN	Unknown
7	TP53	MAPK8	Unknown
8	TP53	JAK2	PMID: 21785463
9	TP53	PTK6	Unknown
10	TP53	MKNK1	Unknown

4. Conclusion

Protein phosphorylation is an important post-translational modification. It plays an important role in cell metabolism, gene expression, and cellular signal transduction. Predicting the relationships between substrates and their specific kinases helps to understand the mechanisms of cellular processes. Because experimental methods are time consuming and laborious, many computational methods have been developed to identify kinase–substrate interactions. However, most of those computational methods focus on local protein sequence information, which is not sufficient for accurate prediction.

In this article, we propose a computational method to predict kinase–substrate interactions based on bi-random walk. First, the reliability of the PPI network is improved based on a local topological feature and a biological feature. Then, the kinase–kinase and substrate–substrate similarities are calculated by using a shortest-path method. In addition, the similarities are adjusted based on the assumption that the similarity between two kinases, or between two substrates, is more reliable if they are in the same cluster. Finally, the bi-random walk algorithm is employed to predict potential kinase–substrate interactions. We evaluate our method by 10-fold cross-validation and de novo prediction. The experimental results show that our algorithm achieves a higher AUC than the compared state-of-the-art algorithms. Furthermore, a case study shows the effectiveness of our method for predicting potential kinase–substrate interactions.

Acknowledgments

This study is supported in part by the National Natural Science Foundation of China under Grant Nos. 61702122, 61751314, and 31560317; Natural Science Foundation of Guangxi 2017GXNSFDA198033 and AB17195055; Director Open Fund of Qinzhou City Key Laboratory of Advanced Technology of Internet of Things IOT2017A04; Doctor foundation of Guangxi University XBZ180479.

Author Disclosure Statement

The authors declare that no competing financial interests exist.

