Identification of Disease-Associated MicroRNAs Via Locality-Constrained Linear Coding-Based Ensemble Learning

ABSTRACT

Clinical trials indicate that the dysregulation of microRNAs (miRNAs) is closely associated with the development of diseases. Therefore, predicting miRNA-disease associations is significant for studying the pathogenesis of diseases. Since traditional wet-lab methods are resource-intensive, cost-saving computational models can be an effective complement to biological experiments. In this work, an ensemble learning model based on locality-constrained linear coding (ILLCEL) is proposed to predict miRNA-disease associations. ILLCEL adopts miRNA sequence similarity, miRNA functional similarity, disease semantic similarity, and interaction profile similarity obtained by locality-constrained linear coding (LLC) as prior information. Next, features and similarities extracted from multiple perspectives are input to the ensemble learning framework to improve the comprehensiveness of the prediction. Notably, the introduction of hypergraph-regular terms improves the accuracy of prediction by describing complex associations between samples. The results under fivefold cross-validation indicate that ILLCEL achieves superior prediction performance. In case studies, known associations are accurately predicted and novel associations are verified in HMDD v3.2, miRCancer, and existing literature. It is concluded that ILLCEL can serve as a powerful tool for inferring potential associations.

1. INTRODUCTION

Researchers have discovered that microRNAs (miRNAs) are associated with the pathogenesis of many human diseases, such as cancer, heart disease, and allergic diseases. It is estimated that each miRNA controls the expression of tens of genes, and the expression of each gene is in turn synergistically regulated by multiple miRNAs. The development of diseases is attributed to the abnormal expression of miRNAs. In tumor cells, the low expression of some miRNAs leaves the translation of their target mRNAs uninhibited. Conversely, the overexpression of some miRNAs results in the strong repression of mRNAs (Deng and Huang, 2014). Thus, predicting miRNA-disease associations (MDAs) with computational models facilitates the identification of biomarkers and the discovery of specific drugs (Cheng et al., 2005).

So far, researchers have proposed various computational methods to predict MDAs. The first type comprises network-based methods. For example, Gu et al. (2016) presented a network consistency projection-based model (NCPMDA) that predicts disease-associated miRNAs without the need for negative samples. Considering the complex associations between data nodes, Yu et al. (2020) constructed a three-layer heterogeneous network based on unbalanced random walk to identify MDAs (TCRWMDA), in which lncRNA is introduced as an intermediate node to predict associations between networks effectively. Zhong et al. (2022) proposed a prediction model named GRPAMDA, which constructs a graph random propagation network and uses an attention network to strengthen the neighborhood features of data nodes.

The second type comprises matrix-based methods. Matrix completion (MC) can improve prediction accuracy by making full use of known information to compensate for missing entries. Accordingly, Chen et al. (2021) added similarity-based neighborhood constraints to MC to recover unknown association pairs (NCMCMDA). To better preserve the experimentally validated MDAs, Zheng et al. (2022) integrated non-negative matrix factorization and MC into the prediction framework (NMFMC). Aiming to increase the sparsity, Cui et al. (2019) introduced the $L_{2,1}$-norm into traditional collaborative matrix factorization (RCMF). Given that graph regularized matrix factorization (GRMF) aids in learning the manifold of data nodes, Gao et al. (2019) introduced the $L_{2,1}$-norm on the basis of RCMF to predict MDAs (DNSGRMF). However, complex relationships between samples are not taken into account by GRMF.

Machine learning-based models, which can flexibly handle large-scale data, are also widely used to explore MDAs. To describe more complex relationships between samples and to fuse multiple types of information, Ding et al. (2020) combined hypergraphs and multiple kernel learning (MKL), and predicted associations using Laplace support vector machines (HGBLM). Nevertheless, the method is not sufficiently sensitive to similarity values. To preserve local information in the process of constructing the similarity network, Qu et al. (2018) proposed a locality-constrained linear coding (LLC) approach and used label propagation to obtain the final association scores (LLCMDA). However, the simple integration of similarities in LLCMDA and the lack of information sources limit its prediction performance.

To fuse similarity information in the miRNA (disease) space, Zhou et al. (2021) proposed a model based on MKL (DAEMKL), in which the features of the fused similarity networks learned by the regression model are input into an autoencoder to predict MDAs.

In this study, we design an ensemble learning model based on locality-constrained linear coding to predict MDAs (ILLCEL). The method achieves superior prediction performance under fivefold cross-validation (CV) and also exhibits reliable predictive performance in case studies. The main contributions of ILLCEL are as follows:

Instead of deriving similarity directly from the known association matrix alone, interaction profile (IP) similarity is calculated based on LLC to preserve local information.

Rather than relying on a single source of prior information, the similarity features of miRNAs (diseases) are considered from different perspectives.

Compared with the traditional graph structure, the introduction of hypergraph-regular terms describes more complex associations between samples and improves the prediction accuracy.

2. MATERIALS AND METHODS

2.1. Human MDAs

The HMDD v2.0 database (Li et al., 2014), which contains 5430 experimentally validated associations between 495 miRNAs and 383 diseases, is used to obtain the original association information. As shown in Equation (1), the sparse binary matrix $M_{D}$ is utilized to represent MDAs: $M_{D}(m_{i}, d_{j}) = \begin{cases} 1, & \text{if miRNA } m_{i} \text{ is related to disease } d_{j}, \\ 0, & \text{otherwise.} \end{cases}$ (1)
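As a minimal sketch of Equation (1), the binary association matrix can be assembled from a list of known miRNA-disease pairs. The function name `build_association_matrix` and the toy identifiers below are hypothetical and serve only as an illustration; they are not taken from HMDD.

```python
import numpy as np

def build_association_matrix(pairs, mirnas, diseases):
    """Return the binary matrix M_D of Equation (1) (rows: miRNAs, columns: diseases)."""
    row = {m: i for i, m in enumerate(mirnas)}
    col = {d: j for j, d in enumerate(diseases)}
    M_D = np.zeros((len(mirnas), len(diseases)), dtype=int)
    for m, d in pairs:
        M_D[row[m], col[d]] = 1  # known association
    return M_D

# Toy identifiers (illustrative only, not real data).
mirnas = ["m1", "m2", "m3"]
diseases = ["d1", "d2"]
pairs = [("m1", "d1"), ("m1", "d2"), ("m3", "d2")]
M_D = build_association_matrix(pairs, mirnas, diseases)
```

Each row of the resulting matrix is the interaction profile of one miRNA, and each column is the interaction profile of one disease.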

2.2. miRNA space

In miRNA space, IP features, sequence features, IP similarity, sequence similarity, and functional similarity are calculated to enrich the prior information.

First, the associations between miRNAs and all diseases can be regarded as miRNA feature descriptors. To make full use of the known association information, miRNA feature vectors are obtained based on IP. Here, IP features extracted from the association matrix are fed into the LLC to obtain the IP similarity (Saffari and Ebrahimi-Moghadam, 2015). The specific equation is as follows:

where $t_{i}$ is the descriptor of the i-th sample, $D$ represents a dictionary matrix, $μ$ is the regularization parameter, and $⨀$ is the element-wise multiplication. $S_{i}$ is a local adapter vector, which indicates the distance between samples, and the equation is as follows: $S_{i, j} = \exp \left(\frac{\| M_{D}(i, :) - M_{D}(j, :) \|_{2}}{σ}\right).$ (3)

In the association matrix, $M_{D} (i, :)$ and $M_{D} (j, :)$ refer to the i-th and j-th rows, respectively. $σ$ is the kernel bandwidth parameter (Qu et al., 2018).

The Lagrangian function of Equation (2) is calculated and minimized to obtain the IP similarity $M S_{1}$ containing local information. The specific equation is as follows: $M S_{1} = [z_{m 1}, z_{m 2}, \dots, z_{m n}] .$ (4)
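The following is a minimal sketch of how such locality-constrained codes could be computed. The objective of Equation (2) is not reproduced in this text, so the sketch assumes the standard LLC form, that is, minimizing $\|t_{i} - D^{T} z_{i}\|^{2} + μ \|S_{i} ⨀ z_{i}\|^{2}$ subject to the entries of $z_{i}$ summing to 1, together with its well-known closed-form solution. It also assumes that each sample is encoded over all other samples as the dictionary. The function name `llc_codes` and the default parameter values are hypothetical.

```python
import numpy as np

def llc_codes(X, mu=1e-2, sigma=1.0):
    """Encode every row of X over the remaining rows (assumed standard LLC closed form)."""
    n = X.shape[0]
    Z = np.zeros((n, n))
    for i in range(n):
        idx = np.array([j for j in range(n) if j != i])  # assumption: exclude the sample itself
        B = X[idx]                                       # dictionary atoms, one per row
        shifted = B - X[i]
        dist = np.linalg.norm(shifted, axis=1)
        s = np.exp(dist / sigma)                         # local adapter, Equation (3)
        C = shifted @ shifted.T                          # local covariance of the atoms
        A = C + mu * np.diag(s ** 2)                     # locality-penalized system matrix
        w = np.linalg.solve(A, np.ones(len(idx)))
        Z[i, idx] = w / w.sum()                          # enforce the sum-to-one constraint
    return Z

# Toy interaction profiles (illustrative only).
X = np.array([[1, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 1],
              [0, 1, 1, 1]], dtype=float)
Z = llc_codes(X)
```

Because the local adapter grows with distance, atoms far from the encoded sample are penalized more heavily, which is how the coding preserves local information.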

Second, since available information sources cannot directly reflect the associations between miRNAs, miRNA sequences containing attribute features are transformed into numerical vectors to characterize miRNAs. In miRBase, 495 miRNA sequence features are obtained (Kozomara and Griffiths-Jones, 2014). Next, the sequence alignment results are calculated based on the Needleman-Wunsch algorithm, and then the miRNA sequence similarity matrix $M S_{2}$ is constructed (Vladimir, 2008).

Finally, according to the similarity calculated by Wang et al. (2010), we denote the functional similarity as $M S_{3}$. $M S_{3} (m_{i}, m_{j})$ represents the similarity score between miRNAs $m_{i}$ and $m_{j}$. Since the self-similarity of each miRNA is 1, every diagonal element of the matrix $M S_{3}$ is 1.

2.3. Disease space

We first extract the disease IP features. Then, the IP similarity based on IP features, disease semantic similarity 1, and disease semantic similarity 2 are calculated.

To begin with, IP features are obtained based on $M_{D}$. After converting the IP features into an encoding matrix, the IP similarity network $D S_{1}$ consisting of refined features is obtained: $D S_{1} = [z_{d 1}, z_{d 2}, \dots, z_{d n}].$ (5)

Next, the associations between different diseases are obtained from the MeSH database (Lipscomb, 2000). Here, disease Ds is represented by a directed acyclic graph (DAG). In $DAG(Ds) = (Ds, T(Ds), E(Ds))$, $T(Ds)$ represents the set of nodes, including Ds itself and all its ancestor nodes. The links between adjacent nodes are described by $E(Ds)$. The semantic contribution value of node $d_{s}$ to Ds is calculated as follows: $D1_{Ds}(d_{s}) = \begin{cases} 1, & \text{if } d_{s} = Ds, \\ \max \{Δ \times D1_{Ds}(d_{s}') \mid d_{s}' \in \text{children of } d_{s}\}, & \text{if } d_{s} \neq Ds, \end{cases}$ (6)

where $Δ$ is a semantic contribution factor with a value of 0.5. As the number of levels increases in the DAG, the semantic contribution value of an ancestor node decreases gradually. Therefore, the semantic score of disease Ds can be expressed by the following equation: $DV1(Ds) = \sum_{t \in T(Ds)} D1_{Ds}(t).$ (7)

Since the more DAG nodes two diseases share, the stronger their similarity, the disease semantic similarity $D S_{2}$ is defined as shown in Equation (8). $D S_{2}(Ds_{i}, Ds_{j}) = \frac{\sum_{t \in T(Ds_{i}) \cap T(Ds_{j})} \left(D1_{Ds_{i}}(t) + D1_{Ds_{j}}(t)\right)}{DV1(Ds_{i}) + DV1(Ds_{j})},$ (8)

where t represents the overlap part of $D s_{i}$ and $D s_{j}$ . $D 1_{D s_{i}} (t)$ and $D 1_{D s_{j}} (t)$ are the semantic contribution scores of t to disease $D s_{i}$ and $D s_{j}$ , respectively.
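The computation of Equations (6) and (8) can be sketched on a toy DAG as follows. The sketch assumes that the semantic score $DV1$ of a disease is the sum of the contribution values over all nodes in its DAG. The function names and the three-level toy hierarchy are hypothetical and are not taken from MeSH.

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Contribution D1 of the disease itself and each of its ancestors (Equation (6))."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for p in parents.get(node, []):
            value = delta * contrib[node]
            if value > contrib.get(p, 0.0):  # keep the maximum over all children
                contrib[p] = value
                frontier.append(p)
    return contrib

def semantic_similarity(a, b, parents, delta=0.5):
    """Disease semantic similarity DS2 (Equation (8))."""
    ca = semantic_contributions(a, parents, delta)
    cb = semantic_contributions(b, parents, delta)
    shared = set(ca) & set(cb)
    numerator = sum(ca[t] + cb[t] for t in shared)
    # Assumption: the semantic score DV1 is the sum of all contribution values.
    return numerator / (sum(ca.values()) + sum(cb.values()))

# Toy hierarchy (illustrative only): A and B share the parent C, whose parent is root.
parents = {"A": ["C"], "B": ["C"], "C": ["root"]}
sim_ab = semantic_similarity("A", "B", parents)
sim_aa = semantic_similarity("A", "A", parents)
```

In this toy case, A and B each have a semantic score of 1.75 and share the nodes C and root, so their similarity is 1.5/3.5, while the similarity of a disease with itself is 1.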

Finally, in the same layer of the DAG, the contribution of different disease terms to the disease semantic values may be different. Hence, the disease semantic similarity 2, denoted $D S_{3}$, is calculated as follows: $D S_{3}(Ds_{i}, Ds_{j}) = \frac{\sum_{t \in T(Ds_{i}) \cap T(Ds_{j})} \left(D2_{Ds_{i}}(t) + D2_{Ds_{j}}(t)\right)}{DV2(Ds_{i}) + DV2(Ds_{j})},$ (9) $D2_{Ds}(d_{s}) = - \log \left(\frac{\text{the number of DAGs including } d_{s}}{\text{the number of diseases}}\right).$ (10)

2.4. Hypergraph learning

In traditional graph structures, the multivariate geometric structure between data cannot be described (Zhang et al., 2018). By contrast, hypergraph learning is able to describe higher order relationships between objects. In the hypergraph $HG = (V, E, T)$, V and E denote the sets of vertices and hyperedges, respectively, and T denotes the weights of the hyperedges. In particular, the hyperedges are built by connecting each vertex to its K nearest neighbors via the k-Nearest Neighbor (KNN) algorithm. Next, the relationships between the vertices are represented by the matrix $G_{H}$.

The degrees of vertex and hyperedge are defined as:

Next, the diagonal matrices formed from the vertex degrees $d_{v}$ and the hyperedge degrees $d_{h}$ are denoted by $D_{v}$ and $D_{h}$, respectively. Then, the diagonal matrix $A_{m}$ is used to denote the hyperedge weights.

Finally, the hypergraph Laplacian matrix L is expressed as: $L = I - Θ,$ (14) $Θ = D_{v}^{-1/2} G_{H} A_{m} D_{h}^{-1} G_{H}^{T} D_{v}^{-1/2},$ (15)

where I is the identity matrix.
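A minimal sketch of the construction in Equations (14) and (15) is given below. The degree definitions are not reproduced in this text, so the sketch assumes the usual ones: the vertex degree is the weighted number of hyperedges containing the vertex, and the hyperedge degree is the number of vertices it contains. It further assumes that each hyperedge contains a vertex together with its K most similar vertices and that all hyperedge weights equal 1. The function name `hypergraph_laplacian` is hypothetical.

```python
import numpy as np

def hypergraph_laplacian(S, k=2):
    """Normalized hypergraph Laplacian L = I - Theta built from a similarity matrix S."""
    n = S.shape[0]
    G = np.zeros((n, n))                      # rows: vertices, columns: hyperedges
    for e in range(n):
        order = np.argsort(-S[e])             # most similar vertices first
        neighbors = order[order != e][:k]     # K nearest neighbors of vertex e
        G[e, e] = 1.0
        G[neighbors, e] = 1.0
    w = np.ones(n)                            # assumption: unit hyperedge weights
    d_v = G @ w                               # vertex degrees
    d_h = G.sum(axis=0)                       # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    theta = Dv_inv_sqrt @ G @ np.diag(w) @ np.diag(1.0 / d_h) @ G.T @ Dv_inv_sqrt
    return np.eye(n) - theta

# Toy similarity matrix (illustrative only).
S = np.array([[1.0, 0.9, 0.2, 0.1],
              [0.9, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.8],
              [0.1, 0.2, 0.8, 1.0]])
L = hypergraph_laplacian(S, k=1)
```

With K = 1, the toy example yields two groups of mutually nearest vertices, and the resulting Laplacian is block diagonal.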

2.5. Ensemble learning

To improve the diversity and reliability of the model, the features and similarities considered from multiple perspectives are input into the ensemble learning. In the miRNA space, $Z_{i}$ stands for the i-th feature matrix, which is projected onto the prediction matrix $P_{m}$ by the projection matrix $F_{i}$. The function is represented as follows: $\begin{matrix} \min_{F_{i}} \sum_{i = 1}^{n} \| Z_{i} F_{i}^{T} - P_{m} \|_{F}^{2} \\ \text{s.t. } F_{i} \geq 0, \end{matrix}$ (16)

where $\| \cdot \|_{F}^{2}$ is the squared Frobenius norm.

Next, the $l_{1, 2}$-norm regularization term is introduced to ensure the smoothness of $F_{i}$. In addition, the prediction matrix $P_{m}$ needs to approximate $M_{D}$. Thus, Equation (16) is transformed into: $\begin{matrix} \min_{F_{i}, P_{m}} \| P_{m} - M_{D} \|_{F}^{2} + α \sum_{i = 1}^{n} \| Z_{i} F_{i}^{T} - P_{m} \|_{F}^{2} + β \sum_{i = 1}^{n} \| F_{i} \|_{1, 2}^{2} \\ \text{s.t. } F_{i} \geq 0, \end{matrix}$ (17)

where $α$ is the trade-off parameter that weights the error between the projected features $Z_{i} F_{i}^{T}$ and the prediction matrix $P_{m}$. The contribution of $F_{i}$ is controlled by the regularization factor $β$.

Considering the higher order relationships of the samples, the hypergraph-regular term is introduced. The final objective function is shown as follows: $\begin{matrix} \min_{F_{i}, P_{m}, ω} \| P_{m} - M_{D} \|_{F}^{2} + α \sum_{i = 1}^{n} \| Z_{i} F_{i}^{T} - P_{m} \|_{F}^{2} + \sum_{i = 1}^{m} ω_{i}^{λ} \mathrm{tr}(P_{m}^{T} L_{i} P_{m}) + β \sum_{i = 1}^{n} \| F_{i} \|_{1, 2}^{2} \\ \text{s.t. } F_{i} \geq 0, \sum_{i = 1}^{m} ω_{i} = 1, \end{matrix}$ (18)

where $ω_{i}$ is the weight used to control the contribution of the i-th hypergraph-regular term, and $\mathrm{tr}(\cdot)$ denotes the trace. The exponential parameter $λ$ is used to measure the degree of influence of different similarities on the prediction results.

The Lagrangian function of Equation (18) is expressed as follows: $\begin{matrix} Lf = \| P_{m} - M_{D} \|_{F}^{2} + α \sum_{i = 1}^{n} \| Z_{i} F_{i}^{T} - P_{m} \|_{F}^{2} + \sum_{i = 1}^{m} ω_{i}^{λ} \mathrm{tr}(P_{m}^{T} L_{i} P_{m}) + β \sum_{i = 1}^{n} \| F_{i} \|_{1, 2}^{2} \\ - δ \left(\sum_{i = 1}^{m} ω_{i} - 1\right) - \sum_{i = 1}^{n} \mathrm{tr}(Γ_{i} F_{i}), \end{matrix}$ (19)

where $δ$ and $Γ_{i}$ are the Lagrange multipliers for the sum-to-one and non-negativity constraints, respectively.

After calculating the partial derivatives of Equation (19), $P_{m}$, $ω_{i}$, and $F_{i}$ can be expressed as follows:

$ω_{i} = \frac{\left(\frac{1}{\mathrm{tr}(P_{m}^{T} L_{i} P_{m})}\right)^{\frac{1}{λ - 1}}}{\sum_{j = 1}^{m} \left(\frac{1}{\mathrm{tr}(P_{m}^{T} L_{j} P_{m})}\right)^{\frac{1}{λ - 1}}},$ (21) $F_{i} = F_{i} ⨀ \sqrt{\frac{F_{i} (α Z_{i}^{T} Z_{i} + β e e^{T})^{+} + α (P_{m}^{T} Z_{i})^{-}}{F_{i} (α Z_{i}^{T} Z_{i} + β e e^{T})^{-} + α (P_{m}^{T} Z_{i})^{+}}},$ (22)

where e is an all-ones column vector. Taking the matrix R as an example, its positive and negative parts are calculated as follows: $R^{+} = \frac{|R| + R}{2},$ (23) $R^{-} = \frac{|R| - R}{2}.$ (24)
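As an illustration, the weight update of Equation (21) and the positive and negative parts of Equations (23) and (24) can be sketched as follows. The function names are hypothetical, and the sketch assumes that every trace term is strictly positive so that the reciprocal is defined.

```python
import numpy as np

def positive_part(R):
    """R^+ of Equation (23): keeps positive entries, zeros elsewhere."""
    return (np.abs(R) + R) / 2

def negative_part(R):
    """R^- of Equation (24): magnitudes of negative entries, zeros elsewhere."""
    return (np.abs(R) - R) / 2

def update_weights(P_m, laplacians, lam=16.0):
    """Weights of the hypergraph-regular terms following Equation (21)."""
    # Assumption: each trace is strictly positive.
    traces = np.array([np.trace(P_m.T @ L @ P_m) for L in laplacians])
    scores = (1.0 / traces) ** (1.0 / (lam - 1.0))
    return scores / scores.sum()

# Toy inputs (illustrative only).
R = np.array([[1.5, -2.0], [0.0, 3.0]])
P_demo = np.eye(2)
weights = update_weights(P_demo, [0.5 * np.eye(2), 2.0 * np.eye(2)], lam=3.0)
```

A hypergraph whose regular term has a smaller trace, meaning that the current prediction is smoother on it, receives a larger weight, and a larger $λ$ makes the weights more uniform.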

After obtaining the prediction matrix $P_{m}$ in the miRNA space, the prediction score $P_{d}$ in the disease space is obtained in the same way. To obtain comprehensive association information, the association matrices from both sides are integrated. The final prediction matrix P is shown in Equation (25): $P = \frac{P_{m} + {P_{d}}^{T}}{2}.$ (25)

The flowchart of ILLCEL is shown in Figure 1.

FIG. 1.

The flowchart of ILLCEL. ILLCEL, the framework of the locality-constrained linear coding-based ensemble learning model.

3. RESULTS AND DISCUSSION

3.1. Performance evaluation

To test the performance of ILLCEL, we use fivefold CV. Specifically, all known MDAs are randomly divided into five groups, four of which are used for training and one for testing. To reduce the bias caused by the random division of samples, fivefold CV is repeated 50 times and the average value is reported. The evaluation metric is the area under the receiver operating characteristic (ROC) curve (AUC), which typically ranges from 0.5 to 1 (Fawcett, 2006). The horizontal and vertical axes of the ROC curve denote the False Positive Rate (FPR) and True Positive Rate (TPR), respectively. The equations of TPR and FPR are shown as follows: $TPR = \frac{TP}{TP + FN},$ (26) $FPR = \frac{FP}{TN + FP},$ (27)

where TP and FP are the numbers of samples predicted to be positive that are actually positive and actually negative, respectively. Similarly, TN and FN are the numbers of samples predicted to be negative that are actually negative and actually positive, respectively.
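The evaluation described above can be sketched as follows: the samples are ranked by predicted score, TPR and FPR are accumulated according to Equations (26) and (27) as the decision threshold is lowered, and the area under the resulting curve is obtained with the trapezoidal rule. The function name `roc_auc` is hypothetical, and the sketch assumes that no two samples share the same score.

```python
import numpy as np

def roc_auc(labels, scores):
    """Area under the ROC curve, assuming binary labels and no tied scores."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")   # highest score first
    ranked = labels[order]
    tp = np.cumsum(ranked == 1)                  # true positives at each threshold
    fp = np.cumsum(ranked == 0)                  # false positives at each threshold
    tpr = np.concatenate(([0.0], tp / tp[-1]))   # Equation (26)
    fpr = np.concatenate(([0.0], fp / fp[-1]))   # Equation (27)
    # Trapezoidal rule over the ROC curve.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# Toy labels and scores (illustrative only).
auc_demo = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

In the toy example, three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75.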

3.2. Parameter selection

To obtain superior prediction performance, four parameters require adjustment, that is, the regularization parameters $α$ and $β$, the exponential parameter $λ$, and the number of neighbors K. First, all combinations of the parameters $α, β \in \{10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}\}$ and $λ \in \{2^{2}, 2^{3}, 2^{4}, 2^{5}\}$ are taken into account. After conducting the grid search, the four-slice diagram is plotted in Figure 2. It can be seen that our model has the optimal AUC value at $α = 10^{0}$, $β = 10^{-2}$, and $λ = 2^{4}$.

FIG. 2.

The influence of $λ$ , $α$ , and $β$ on the prediction results.

Next, the effect of K on the prediction results in hypergraph learning is verified. Specifically, AUC values are calculated for K from 5 to 30 in steps of 5 while keeping the remaining parameters constant. As shown in Figure 3, the optimal value is obtained when $K = 15$ and the AUC value remains stable as K increases. Therefore, K is set to 15 to improve the efficiency of our method.

FIG. 3.

The influence of K on the prediction results.

3.3. Methods comparison

To evaluate the prediction performance of the model, ILLCEL is compared with NCPMDA (Gu et al., 2016), NCMCMDA (Chen et al., 2021), NMFMC (Zheng et al., 2022), TCRWMDA (Yu et al., 2020), and GRPAMDA (Zhong et al., 2022). As shown in Figure 4, the AUC values of ILLCEL, NCPMDA, NCMCMDA, NMFMC, TCRWMDA, and GRPAMDA are 0.9533, 0.8778, 0.9085, 0.9165, 0.9209, and 0.9396, respectively. ILLCEL thus achieves the highest AUC, exceeding the next best method (GRPAMDA) by 0.0137.

FIG. 4.

AUC values of different prediction methods. AUC, area under the curve.

Since ILLCEL performs prediction in miRNA space and disease space independently and takes the mean value as output, the predictions from the miRNA (disease) side alone are compared with those of ILLCEL while keeping the rest of the framework unchanged. As shown in Figure 5, our method outperforms the single-space predictions, which indicates that more MDAs can be mined by gathering valid biological information in both spaces.

FIG. 5.

Prediction results from the miRNA (disease) side.

3.4. Case studies

To further evaluate the prediction performance of ILLCEL, Kidney Neoplasms is analyzed in this section. First, the association matrix and the prediction matrix are sorted in descending order by association score to select miRNAs with higher association scores. Second, the accuracy of ILLCEL in predicting known MDAs can be demonstrated by comparing the prediction matrix with the original association matrix. Finally, HMDD v3.2 and miRCancer are used to validate the reliability of the model in predicting unknown associations. For convenience, HMDD v3.2 (Huang et al., 2019) and miRCancer (Xie et al., 2013) are denoted by H and M, respectively, in this section.

As one of the top 10 cancers in the world, renal tumor is a common urological tumor whose incidence and mortality rate are increasing year by year. Kidney neoplasms comprise various types, such as renal cell carcinoma (RCC) and clear cell RCC (ccRCC) (Su et al., 2020). In Table 1, 9 of the top 20 predicted miRNAs are known associations with kidney neoplasms that ILLCEL recovers. Among the remaining 11 predicted miRNAs, miR-200a, miR-155, miR-375, and miR-203 can be confirmed in H or M. It has been indicated that miR-429 can inhibit cancer cell proliferation and migration by targeting AKT1 in RCC.

Table 1.
Predicted MicroRNAs for Kidney Neoplasms

Rank	miRNA	Evidence	Rank	miRNA	Evidence
1	hsa-mir-1	Known	11	hsa-mir-23b	Known
2	hsa-mir-200	Known	12	hsa-mir-9	PMID: 30532596
3	hsa-mir-141	Known	13	hsa-mir-155	H
4	hsa-mir-215	Known	14	hsa-mir-99a	PMID: 23173671
5	hsa-mir-192	Known	15	hsa-mir-125b	PMID: 25155155
6	hsa-mir-15a	Known	16	hsa-mir-375	M
7	hsa-mir-21	Known	17	hsa-mir-146a	PMID: 21975861
8	hsa-mir-200c	Known	18	hsa-mir-203	H, M
9	hsa-mir-200a	H	19	hsa-mir-20a	PMID: 34360679
10	hsa-mir-429	PMID: 31814979	20	hsa-mir-200b	PMID: 31130475

In addition, the growth and migration of RCC are inhibited by the dysregulation of miR-99a (Cui et al., 2012) and miR-200b (Li et al., 2019). Significantly, miR-20a can be used as a biomarker for identifying RCC (Oto et al., 2021). The expression of miR-9 (Xie et al., 2018) tends to be downregulated in ccRCC compared to normal renal tissue, while the expression levels of miR-125b (Fu et al., 2014) and miR-146a (Ichii et al., 2012) are enhanced.

As a type of intracranial tumor, brain neoplasms arise from the uncontrolled growth of abnormal cells. Among them, medulloblastoma (MB) is the most common malignant brain tumor in children. It has been shown that miR-9 can exert a great effect on MB by regulating HES1. Furthermore, the expression of miRNA target genes is positively associated with the accumulation of miRNAs in the cerebellum (Dubuc et al., 2012). In Table 2, the top 20 miRNAs associated with brain neoplasms are selected. Among them, 11 known associations have been successfully confirmed. In addition, 4 novel predicted associations can be found in H or M. It has been documented that the development of gliomas is closely associated with the dysregulation of miR-125b, miR-92a, and miR-145. In glioblastoma, miR-29a can exert a significant inhibitory effect on cells by regulating the platelet-derived growth factor (PDGF) pathway (Yang et al., 2019).

Table 2.

Predicted MicroRNAs for Brain Neoplasms

Rank	miRNA	Evidence	Rank	miRNA	Evidence
1	hsa-mir-21	Known	11	hsa-mir-32	Known
2	hsa-mir-1	Known	12	hsa-mir-92a	PMID: 31378903
3	hsa-mir-9	Known	13	hsa-mir-92b	Known
4	hsa-mir-22	Known	14	hsa-let-7a	H
5	hsa-mir-92	Known	15	hsa-mir-145	PMID: 23390502
6	hsa-mir-222	Known	16	hsa-mir-16	M
7	hsa-mir-326	Known	17	hsa-mir-34a	Known
8	hsa-mir-221	H, M	18	hsa-mir-150	unconfirmed
9	hsa-mir-129	Known	19	hsa-mir-106b	H
10	hsa-mir-125b	PMID: 30131528	20	hsa-mir-29a	PMID: 31482267

4. CONCLUSION

Currently, identifying miRNAs associated with complex diseases is a critical research topic in the biomedical field. In this article, an ensemble learning model based on locality-constrained linear coding (ILLCEL) is proposed to predict MDAs. Specifically, features and similarities from different bioinformatics sources are fed into the ensemble learning. To preserve the local information, LLC converts the IP feature descriptors extracted from the known association matrix into an encoding matrix. Considering the high-order relationships between samples, the hypergraph-regular term is introduced into the model. After integrating the prediction matrices in miRNA space and disease space, unknown associations are identified. The AUC value of ILLCEL under fivefold CV is 0.9533, which is superior to the comparison methods. In the case studies, known associations have been accurately predicted, and most of the novel associations are confirmed.

However, ILLCEL still has some shortcomings. First, the multisource bioinformation is not reasonably integrated when input into the ensemble learning, which may affect the prediction accuracy. Second, a more reasonable strategy needs to be designed to select the optimal combination of parameters.

Footnotes

AUTHORS' CONTRIBUTIONS

Y.S.: writing-review and editing. J.-X.L.: review and editing. J.W.: data collection. Y.-L.G.: visualization. B.-X.G.: formal analysis.

AUTHOR DISCLOSURE STATEMENT

The authors declare they have no conflicting financial interests.

FUNDING INFORMATION

This work was supported by the National Natural Science Foundation of China (Grant No. 62172254).

References

Chen, Sun, Zhao. NCMCMDA: MiRNA–disease association prediction through neighborhood constraint matrix completion. Brief Bioinformatics, 2021; 22(1):485–496.

Cheng, Byrom, Shelton, et al. Antisense inhibition of human MiRNAs and indications for an involvement of MiRNA in cell growth and apoptosis. Nucleic Acids Res, 2005; 33(4):1290–1297; doi: 10.1093/nar/gki200

Cui, Zhou, Zhao, et al. MicroRNA-99a induces G1-Phase cell cycle arrest and suppresses tumorigenicity in renal cell carcinoma. BMC Cancer, 2012; 12(1):1–11.

Cui, Liu, Gao, et al. RCMF: A robust collaborative matrix factorization method to predict MiRNA-disease associations. BMC Bioinformatics, 2019; 20(25):1–10.

Deng S-P, Huang D-S. SFAPS: An R Package for structure/function analysis of protein sequences based on Informational Spectrum Method. Methods, 2014; 69(3):207–212; doi: 10.1016/j.ymeth.2014.08.004

Ding, Jiang, Tang, et al. Identification of human MicroRNA-disease association via Hypergraph Embedded Bipartite Local Model. Comput Biol Chem, 2020; 89:107369; doi: 10.1016/j.compbiolchem.2020.107369

Dubuc, Mack, Unterberger, et al. The epigenetics of brain tumors. Cancer Epigenet Methods Protoc, 2012; 863:139–153.

Fawcett. An introduction to ROC analysis. Pattern Recogn Lett, 2006; 27(8):861–874.

Fu, Liu, Pan, et al. Tumor MiR-125b predicts recurrence and survival of patients with clear-cell renal cell carcinoma after surgical resection. Cancer Sci, 2014; 105(11):1427–1434.

Gao, Cui, Gao, et al. Dual-network sparse graph regularized matrix factorization for predicting MiRNA–disease associations. Mol Omics, 2019; 15(2):130–137; doi: 10.1039/C8MO00244D

Gu, Liao, Li, et al. Network consistency projection for human MiRNA-disease associations inference. Sci Rep, 2016; 6(1):1–10; doi: 10.1038/srep36054

Huang, Shi, Gao, et al. HMDD v3.0: A database for experimentally supported human microRNA–disease associations. Nucleic Acids Res, 2019; 47(D1):D1013–D1017; doi: 10.1093/nar/gky1010

Ichii, Otsuka, Sasaki, et al. Altered expression of MicroRNA MiR-146a correlates with the development of chronic renal inflammation. Kidney Int, 2012; 81(3):280–292.

Kozomara, Griffiths-Jones. MiRBase: Annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res, 2014; 42(D1):D68–D73.

Li, Qiu, Tu, et al. HMDD v2.0: A database for experimentally supported human microRNA and disease associations. Nucleic Acids Res, 2014; 42(D1):D1070–D1074; doi: 10.1093/nar/gkt1023

Li, Guan, Liu, et al. MicroRNA-200b is downregulated and suppresses metastasis by targeting LAMA4 in renal cell carcinoma. EBioMedicine, 2019; 44:439–451.

Lipscomb. Medical Subject Headings (MeSH). Bull Med Libr Assoc, 2000; 88(3):265–266.

Oto, Herranz, Plana, et al. Identification of MiR-20a-5p as Robust Normalizer for urine microRNA studies in renal cell carcinoma and a profile of dysregulated microRNAs. Int J Mol Sci, 2021; 22(15):7913.

Qu, Zhang, Lyu, et al. LLCMDA: A novel method for predicting MiRNA gene and disease relationship based on locality-constrained linear coding. Front Genet, 2018; 9:576.

Saffari, Ebrahimi-Moghadam. Label propagation based on local information with adaptive determination of number and degree of neighbor's similarity. Neurocomputing, 2015; 153:41–53.

Su, Jiang, Chen, et al. MicroRNA-429 inhibits cancer cell proliferation and migration by targeting AKT1 in renal cell carcinoma. Mol Clin Oncol, 2020; 12(1):75–80.

Vladimir. The Needleman-Wunsch Algorithm for Sequence Alignment. Lecture given at the 7th Melbourne Bioinformatics Course, 2008; pp. 1–46.

Wang, Wang, Lu, et al. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics, 2010; 26(13):1644–1650; doi: 10.1093/bioinformatics/btq241

Xie, Ding, Han, et al. MiRCancer: A microRNA–cancer association database constructed by text mining on literature. Bioinformatics, 2013; 29(5):638–644; doi: 10.1093/bioinformatics/btt014

Xie, Lv, Liu, et al. Identification and validation of a four-miRNA (MiRNA-21-5p, MiRNA-9-5p, MiR-149-5p, and MiRNA-30b-5p) prognosis signature in clear cell renal cell carcinoma. Cancer Manage Res, 2018; 10:5759.

Yang, Dodbele, Park, et al. MicroRNA-29a inhibits glioblastoma stem cells and tumor growth by regulating the PDGF pathway. J Neurooncol, 2019; 145:23–34.

Yu, Shen, Zhong, et al. Three-layer heterogeneous network combined with unbalanced random walk for miRNA-disease association prediction. Front Genet, 2020; 10:1316.

Zhang, Yue, Tang, et al. SFPEL-LPI: Sequence-based feature projection ensemble learning for predicting LncRNA-protein interactions. PLoS Comput Biol, 2018; 14(12):e1006616.

Zheng, Zhang, Wan. MiRNA-disease association prediction via non-negative matrix factorization based matrix completion. Signal Process, 2022; 190:108312.

Zhong, Li, You, et al. Predicting miRNA–disease associations based on Graph Random Propagation Network and Attention Network. Brief Bioinformatics, 2022; 23(2).

Zhou, Yin, Jiao, et al. Predicting miRNA-disease associations through deep autoencoder with Multiple Kernel Learning. IEEE Trans Neural Netw Learn Syst, 2021; 1–10.