Multi-View Enhanced Tensor Nuclear Norm and Local Constraint Model for Cancer Clustering and Feature Gene Selection

Abstract

The analysis of multi-omics cancer data can effectively promote cancer research. This article focuses on clustering cancer samples and identifying feature genes through the analysis of multi-view cancer omics data, to reveal the correlation between cancers and genes. Our proposed solution, the Multi-View Enhanced Tensor Nuclear Norm and Local Constraint (MVET-LC) model, uses the consistency and complementarity of omics data to support biological research. To make full use of multi-view data, the model combines an enhanced tensor nuclear norm with a local constraint. First, we introduce the enhanced partial sum of tensor nuclear norm, which significantly increases the flexibility of the tensor nuclear norm. Second, we incorporate total variation regularization into the MVET-LC model, which enables MVET-LC to exploit the relationship between the tensor data structure and the sparse data while attending to the feature details of the tensor data. The resulting optimization problem is solved iteratively with the alternating direction method of multipliers. Experiments show that the proposed model outperforms the comparison models.

1. INTRODUCTION

Cancer is one of the most severe threats to human life. Modern sequencing technologies have generated a large amount of cancer sequencing data, and the exploration of cancer genomics data can provide powerful support for tumor prediction, diagnosis, and treatment. Existing work (Liao et al., 2003) indicates that only a few genes play a significant role in biological processes. It is therefore essential to discover the genes that affect biological processes.

Many models have been used in biological data analysis, such as robust principal component analysis (RPCA) (Liu et al., 2014), which divides the observed data into a low-rank component and a sparse component. Because rank minimization is discontinuous and non-convex, some works (Bouwmans et al., 2018; Vaswani et al., 2018) use nuclear norm minimization as a convex relaxation of rank minimization. However, Oh et al. (2016) pointed out that the nuclear norm can introduce errors, and proposed the partial sum of singular values (PSSV) to address this problem.

In practical application, multidimensional data are ubiquitous. High-dimensional magnetic resonance imaging enables the imaging of unknown lesions to be detected more efficiently and earlier. He et al. (2016a) applied low-rank tensors to sparse sampling and reconstruction of medical images. In the field of biology (Hu et al., 2019), feature factor selection using tensor robust principal component analysis (TRPCA) was first introduced by Hu et al. (Yu et al., 2019). A method called t-product-based sparse regularization tensor robust principal component analysis (t-STRPCA) was proposed to select feature genes, and the clustering of biological data samples has proved to be meaningful (Yang et al., 2020). Zhao et al. (2020) defined a low-rank weighted model based on the tensor framework for tumor genomics sample clustering, which is conducive to disease prevention and treatment. In addition, in literature (Zhao et al., 2021), a new model, hypergraph regularized tensor robust principal component analysis (HTRPCA), is used to preserve the complex geometric information between multiple samples.

A new model, called outlier-robust tensor principal component analysis (OR-TPCA), has been introduced for addressing tensor-related problems (Zhou and Feng, 2017). OR-TPCA can detect outliers better than TRPCA and can provide a more accurate recovery of the tensor subspace. Braman (2010) and Kilmer and Martin (2011) were the first to introduce tensor singular value decomposition (T-SVD), which extends matrix operations to tensor operations while avoiding the loss of inherent information caused by flattening the tensor. However, T-SVD may have some inevitable deviations. Liu et al. (2018) showed that the tensor low-rank signal remains in the core tensor acquired by singular value decomposition.

Liu et al. (2018) proposed improved robust tensor principal component analysis (IRTPCA). In fact, T-SVD minimizes all singular values simultaneously with the same strategy, resulting in a rank less than the target rank (Oh et al., 2013; Vaswani et al., 2018). The weighting strategy mentioned in Gu et al. (2014), Hosono et al. (2016), and Peng et al. (2014) is meaningful in this respect: it uses the prior knowledge that the larger singular values contain the main information to shrink each singular value to a different degree, thereby generalizing the nuclear norm. Furthermore, the partial sum of tensor nuclear norm (PSTNN) has become increasingly prevalent, based on the research concepts presented in Gu et al. (2014), Jiang et al. (2020), Oh et al. (2016), and Zhang and Peng (2019).

Building upon the theoretical background, we introduce a novel tensor nuclear norm (TNN) called the enhanced partial sum of tensor nuclear norm (EPSTNN). EPSTNN incorporates the first N singular values, preventing issues of rank deficiency. In addition, we apply total variation (TV) regularization (He et al., 2016b; Sun et al., 2019; Wang et al., 2018) to enforce constraints on the low-rank component. The following are the contributions of our study:

1. A new model, Multi-View Enhanced Tensor Nuclear Norm and Local Constraint (MVET-LC), is proposed to effectively perform feature gene selection and clustering experiments on multi-view cancer data sets.
2. EPSTNN is introduced to enhance the ability of MVET-LC to distinguish between the relative significance of internal data information, resulting in its capability to effectively extract low-rank structures from multiview data.
3. TV regularization is integrated into MVET-LC to constrain the low-rank component, equipping the model with a better ability to capture the finer details of the internal data structures.

2. MATERIALS AND MODEL

2.1. Tensor preliminaries

In this article, we adopt Euler script letters to represent tensors, for example, $ℳ \in ℛ^{n_{1} \times n_{2} \times n_{3}}$ , and $\bar{ℳ}$ denotes the Fourier transformed tensor of $ℳ$ . $ℳ^{(i)}$ is the ith frontal slice of $ℳ$ .

Definition 2.1. (T-SVD) (Kilmer and Martin, 2011; Lu et al., 2020) A tensor $ℳ \in ℛ^{n_{1} \times n_{2} \times n_{3}}$ can be factorized as $ℳ = S_{1} * S * S_{2}^{T}$ , where every frontal slice of $S \in ℛ^{n_{1} \times n_{2} \times n_{3}}$ is diagonal, and $S_{1} \in ℛ^{n_{1} \times n_{1} \times n_{3}}$ and $S_{2} \in ℛ^{n_{2} \times n_{2} \times n_{3}}$ are orthogonal.

Definition 2.2. (Tensor average rank) (Lu et al., 2020) The tensor average rank of $ℳ$ , denoted $\operatorname{rank}_{a} (ℳ)$ , is defined as follows: $\operatorname{rank}_{a} (ℳ) = \frac{1}{n_{3}} \operatorname{rank} (\operatorname{bcirc} (ℳ)),$ (1)

where $\operatorname{bcirc} (ℳ)$ is the block circulant matrix of $ℳ$ .

Definition 2.3. (TNN) (Lu et al., 2020) The TNN of $ℳ$ is symbolized as $| | ℳ | |_{*}$ . The TNN can be expressed as the summation of the singular values of a tensor. In recent studies, it has been shown that TNN can be expressed as Equation (2) and serves as a convex relaxation of the tensor average rank, as demonstrated in Lu et al. (2020).
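As an illustration of Definitions 2.2 and 2.3, the following minimal NumPy sketch computes the tensor average rank through the block circulant matrix and the TNN through the Fourier-domain frontal slices. It assumes the form of the TNN given in Lu et al. (2020), namely the average of the matrix nuclear norms of the frontal slices after a Fourier transform along the third dimension. The function names `bcirc`, `tensor_average_rank`, and `tensor_nuclear_norm` are illustrative and are not taken from the original work.

```python
import numpy as np


def bcirc(M):
    """Block circulant matrix of an n1 x n2 x n3 tensor.

    Block (i, j) holds the frontal slice M[:, :, (i - j) mod n3].
    """
    n1, n2, n3 = M.shape
    B = np.zeros((n1 * n3, n2 * n3), dtype=M.dtype)
    for i in range(n3):          # block row
        for j in range(n3):      # block column
            B[i * n1:(i + 1) * n1, j * n2:(j + 1) * n2] = M[:, :, (i - j) % n3]
    return B


def tensor_average_rank(M):
    """rank_a(M) = rank(bcirc(M)) / n3, as in Equation (1)."""
    return np.linalg.matrix_rank(bcirc(M)) / M.shape[2]


def tensor_nuclear_norm(M):
    """Assumed TNN: mean nuclear norm of the Fourier-domain frontal slices."""
    n3 = M.shape[2]
    M_bar = np.fft.fft(M, axis=2)  # Fourier transform along the third dimension
    return sum(np.linalg.norm(M_bar[:, :, i], ord="nuc") for i in range(n3)) / n3
```

Because the discrete Fourier transform block-diagonalizes the block circulant matrix, `tensor_nuclear_norm(M)` should agree with the matrix nuclear norm of `bcirc(M)` divided by `n3`, which gives a quick numerical check of the sketch.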

2.2. EPSTNN

Although the TNN minimization problem can be easily solved, it treats all singular values equally. However, large singular values carry vital information and should receive smaller penalties so that this information is not lost or destroyed. Therefore, PSTNN has been proposed to approximate the rank (Jiang et al., 2020; Zhang and Peng, 2019), as follows:

where $\| \cdot \|_{p = N}$ is the PSSV (Oh et al., 2013; Oh et al., 2016). For a matrix of size $m \times n$ , the PSSV is $\| \cdot \|_{p = N} = \sum_{i = N + 1}^{\min (m, n)} σ_{i} (\cdot)$ , that is, the sum of all singular values except the N largest.
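The matrix PSSV can be sketched in a few lines of NumPy. The function name `pssv` is illustrative. The sketch relies on `numpy.linalg.svd` returning singular values in descending order.

```python
import numpy as np


def pssv(X, N):
    """Partial sum of singular values: sum of sigma_i for i = N+1, ..., min(m, n)."""
    s = np.linalg.svd(X, compute_uv=False)  # singular values, largest first
    return float(s[N:].sum())
```

With `N = 0` the PSSV reduces to the ordinary nuclear norm, and with `N = min(m, n)` it is zero, so the N largest singular values are never penalized.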

As noted above, TNN treats all singular values equally, although large singular values should receive smaller penalties. We therefore propose a new nonconvex proxy, EPSTNN. EPSTNN orders the N largest singular values according to their magnitudes and assigns weights to them. Generally, the larger singular values are related to the main projection directions, so the weights of the larger values should be increased to reduce their shrinkage. Figure 1 is a diagram of EPSTNN. EPSTNN widens the differences in internal structure information, which is conducive to the accurate retention of necessary data, as follows:

FIG. 1.

Illustrations of enhanced partial sum of tensor nuclear norm.

$\| \cdot \|_{w = \mathrm{top}N}$ denotes the weighting operation on the N largest singular values, and $\| \cdot \|_{w = \mathrm{top}N} = \sum_{i = 1}^{N} w_{i} σ_{i} (\cdot)$ . In addition, $\| \cdot \|_{p = N} = \sum_{i = N + 1}^{\min (m, n)} σ_{i} (\cdot)$ (Zhang and Peng, 2019). $σ_{i} (\cdot)$ is the ith largest singular value, and $w_{i}$ is the weight applied to the ith singular value.

To solve the minimization problem based on EPSTNN, the enhanced partial sum of the singular value thresholding operator $ℰ (\cdot)$ is proposed. Let $Y = Y_{1} + Y_{2}$ , where $Y_{1} = U_{Y 1} S_{Y 1} {V_{Y 1}}^{H}$ and $Y_{2} = U_{Y 2} S_{Y 2} {V_{Y 2}}^{H}$ . Here $U_{Y 1}$ and $V_{Y 1}$ are the singular vector matrices associated with the N largest singular values, and $U_{Y 2}$ and $V_{Y 2}$ correspond to the remaining $(\min (n_{1}, n_{2}) - N)$ singular values. The operator is defined as $ℰ_{N, w, τ} (Y) = U_{Y 1} T_{w} [S_{Y 1}] {V_{Y 1}}^{H} + U_{Y 2} T_{τ} [S_{Y 2}] {V_{Y 2}}^{H} .$ (5)
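The split $Y = Y_{1} + Y_{2}$ and the structure of Equation (5) can be sketched as follows for a single matrix. This is a sketch under stated assumptions, not the authors' operator: it assumes that $T_{τ}$ is ordinary soft thresholding, as in the PSSV thresholding of Oh et al. (2016), and it leaves $T_{w}$ as a hypothetical user-supplied callable `shrink_top`. When `shrink_top` is omitted the leading singular values are kept unchanged, which corresponds to PSSV thresholding rather than EPSTNN.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise soft thresholding: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def partial_svt(Y, N, tau, shrink_top=None):
    """Threshold the top-N and the remaining singular values separately.

    shrink_top is a hypothetical stand-in for T_w; None leaves the N largest
    singular values untouched (assumption, PSSV-style behaviour).
    """
    U, s, Vh = np.linalg.svd(Y, full_matrices=False)
    top = s[:N] if shrink_top is None else shrink_top(s[:N])
    rest = soft_threshold(s[N:], tau)       # assumed form of T_tau
    Y1 = (U[:, :N] * top) @ Vh[:N, :]       # part spanned by the N leading pairs
    Y2 = (U[:, N:] * rest) @ Vh[N:, :]      # part spanned by the remaining pairs
    return Y1 + Y2
```

For example, `partial_svt(Y, N=1, tau=2.0)` keeps the largest singular value of `Y` and shrinks every other singular value by 2, setting those below 2 to zero.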

$T_{w} [x]$ and $T_{τ} [x]$ are as follows:

2.3. TV regularization

It has been mentioned in Hu et al. (2019) and Liu et al. (2013) that low-rank and sparse tensors can be obtained from the original observation tensor. The sparse tensor data are regarded as important genes, so feature genes can be extracted from the sparse data. To achieve high accuracy in feature gene extraction from cancer data, TV is introduced in this article. The definition of TV is as follows:

Let $D (ℳ) = [D_{1} (ℳ), D_{2} (ℳ), D_{3} (ℳ)]$ denote the three-dimensional difference operator, where $D_{1} (ℳ)$ , $D_{2} (ℳ)$ , and $D_{3} (ℳ)$ are first-order difference operators along the three directions of the tensor. Then TV is defined as the $l_{1}$-norm of the difference results, as follows: $\| ℳ \|_{TV} = \| D (ℳ) \|_{1} .$ (8)
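Equation (8) can be illustrated with a short NumPy sketch. It assumes forward differences without periodic boundary handling, which is one common choice; the boundary convention used in the original work is not stated here. The function name `tv_norm` is illustrative.

```python
import numpy as np


def tv_norm(M):
    """Anisotropic TV of a 3-way tensor: l1-norm of first-order differences
    taken along each of the three directions (forward differences assumed)."""
    return float(sum(np.abs(np.diff(M, axis=ax)).sum() for ax in range(3)))
```

A constant tensor has zero TV, while abrupt changes between neighbouring entries increase it, which is why the TV term favours locally smooth low-rank components.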

2.4. MVET-LC model and its solution

Advances in sequencing technology have allowed more gene expression data to be detected. These gene expression data contain a lot of meaningful biological information. Some genomics data are close to some low-dimensional subspaces (Bartenhagen et al., 2010; Bertoni and Valentini, 2005; Liu et al., 2014; Liu et al., 2013). To take into account the internal structure of the subspace in cancer omics data and to accurately extract feature data, we propose a data processing model called MVET-LC: $\min_{D, ℋ} \| D \|_{\mathrm{EPSTNN}} + λ_{1} \| D \|_{TV} + λ_{2} \| ℋ \|_{1}, \quad \text{s.t.} \ G = D + ℋ .$ (9)

$D$ is a low-rank tensor, and $ℋ$ is sparse. By introducing the definition of TV and the auxiliary variables $Z$ and $ℒ$ , formula (9) can be rewritten as $\min_{Z, ℒ, D, ℋ} \| Z \|_{\mathrm{EPSTNN}} + λ_{1} \| ℒ \|_{1} + λ_{2} \| ℋ \|_{1}, \quad \text{s.t.} \ Z = D, \ ℒ = D (D), \ G = Z + ℋ .$ (10) Here $D (D)$ denotes the difference operator of Equation (8) applied to the low-rank tensor $D$ .

Using the alternating direction method of multipliers, we form the augmented Lagrangian function of Equation (10), where $X_{1}$ , $X_{2}$ , and $X_{3}$ are the Lagrange multipliers and $μ$ is the penalty parameter: $\begin{matrix} L (Z, ℒ, D, ℋ) = \| Z \|_{\mathrm{EPSTNN}} + λ_{1} \| ℒ \|_{1} + λ_{2} \| ℋ \|_{1} + \langle X_{1}, Z - D \rangle + \langle X_{2}, ℒ - D (D) \rangle \\ + \langle X_{3}, G - Z - ℋ \rangle + \frac{μ}{2} (\| Z - D \|_{F}^{2} + \| ℒ - D (D) \|_{F}^{2} + \| G - Z - ℋ \|_{F}^{2}) . \end{matrix}$ (11)

Next, we decompose Equation (11) into multiple subproblems to solve iteratively. Algorithm 1 outlines the iterative solution process for MVET-LC (Table 1).

Table 1.

Solving Multi-View Enhanced Tensor Nuclear Norm and Local Constraint by Alternating Direction Method of Multipliers

Algorithm 1
Input: $G \in ℛ^{n_{1} \times n_{2} \times n_{3}}$
Output: $D$ , $ℋ$
While not converged do
1. Update $Z^{k + 1}$ by (12)
2. Update $D^{k + 1}$ by (13)
3. Update $ℒ^{k + 1}$ by (14)
4. Update $ℋ^{k + 1}$ by (15)
5. Update ${X_{1}}^{k + 1}$ by ${X_{1}}^{k + 1} = {X_{1}}^{k} + μ^{k} (Z^{k + 1} - D^{k + 1})$
6. Update ${X_{2}}^{k + 1}$ by ${X_{2}}^{k + 1} = {X_{2}}^{k} + μ^{k} (ℒ^{k + 1} - D (D^{k + 1}))$
7. Update ${X_{3}}^{k + 1}$ by ${X_{3}}^{k + 1} = {X_{3}}^{k} + μ^{k} (G - Z^{k + 1} - ℋ^{k + 1})$
8. Update $μ^{k + 1}$ by $μ^{k + 1} = min (ρ μ^{k}, μ_{max})$
End while

Update $Z$ :

Update $D$ :

Update $ℒ$ :

Update $ℋ$ :

In Algorithm 1, the main time cost of MVET-LC is the update of $Z$ and $D$ , while a simple linear system can optimize the remaining variables. The total overhead of MVET-LC is $O (2 n_{1} n_{2} n_{3} \log (n_{3}) + n_{(1)} n_{(2)}^{2} n_{3})$ , where $n_{(1)} = \max (n_{1}, n_{2})$ and $n_{(2)} = \min (n_{1}, n_{2})$ .
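The inexpensive steps of Algorithm 1 can be sketched as follows. The closed forms for $ℒ$ and $ℋ$ below are derived here by completing the square in Equation (11) as written, which yields elementwise soft thresholding; they are an assumption in the sense that the article's own Equations (14) and (15) are not reproduced in this text. The multiplier and penalty updates follow steps 5 to 8 of Algorithm 1. All function names are illustrative, and `diff_D` stands for the difference result $D (D^{k + 1})$ supplied by the caller.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise soft thresholding, the proximal operator of tau * l1-norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def update_L(diff_D, X2, lam1, mu):
    # argmin_L lam1*||L||_1 + <X2, L - diff_D> + (mu/2)*||L - diff_D||_F^2
    return soft_threshold(diff_D - X2 / mu, lam1 / mu)


def update_H(G, Z, X3, lam2, mu):
    # argmin_H lam2*||H||_1 + <X3, G - Z - H> + (mu/2)*||G - Z - H||_F^2
    return soft_threshold(G - Z + X3 / mu, lam2 / mu)


def update_multipliers(X1, X2, X3, Z, D, L, diff_D, G, H, mu):
    # Steps 5-7 of Algorithm 1: dual ascent on the three constraints.
    return (X1 + mu * (Z - D),
            X2 + mu * (L - diff_D),
            X3 + mu * (G - Z - H))


def update_mu(mu, rho, mu_max):
    # Step 8 of Algorithm 1: grow the penalty parameter up to a cap.
    return min(rho * mu, mu_max)
```

The $Z$ and $D$ updates are omitted from this sketch because they depend on the EPSTNN thresholding operator and on the solver chosen for the linear system in $D$.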

3. EXPERIMENTS AND DISCUSSION

Some genomics data are close to low-dimensional subspaces, so the low-rank tensor $D$ is naturally used for sample clustering. Moreover, feature genes are important characteristic genes in cancer research. After the MVET-LC model decomposition, the differentially expressed tensor $ℋ$ is used in the feature selection experiment.

The proposed model, MVET-LC, is applied to multiview data for feature selection and clustering experiments. The framework of MVET-LC is illustrated in Figure 2. RPCA (Oh et al., 2016), TRPCA (Lu et al., 2016), the model by Lu et al. (2020) (referred to in this article as t-TRPCA), OR-TPCA (Zhou and Feng, 2017), IRTPCA (Liu et al., 2018), PSTNN (Zhang and Peng, 2019), and EPSTNN (proposed in this article) are chosen as comparison models.

FIG. 2.

The framework of the Multi-View Enhanced Tensor Nuclear Norm and Local Constraint model.

3.1. Data sets

Three data sets (SRLL, PCCEH, and UCU) were constructed from 12 types of cancer data from the TCGA website: pancreatic adenocarcinoma (PAAD), cholangiocarcinoma (CHOL), colon adenocarcinoma (COAD), esophageal carcinoma (ESCA), head and neck squamous cell carcinoma (HNSC), stomach adenocarcinoma (STAD), rectum adenocarcinoma (READ), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), uterine corpus endometrial carcinoma (UCEC), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), and uterine carcinosarcoma (UCS). As shown in Figure 3, the three dimensions of each three-dimensional tensor data set represent genes, data types, and samples. Feature selection and clustering experiments were then performed on these multiview data sets using the MVET-LC framework.

FIG. 3.

Schematic diagram of tensor cancer genomics data.

SRLL is the combination of STAD, READ, LIHC, and LUAD, PCCEH is the combination of PAAD, CHOL, COAD, ESCA, and HNSC, and UCU is the combination of UCEC, CESC, and UCS. Table 2 lists the detailed information.

Table 2.

Descriptions of the Data Sets

Data sets	Samples	Genes	Classes
SRLL	1276	17,462	4
PCCEH	1055	16,951	5
UCU	737	17,243	3

3.2. Feature selection and analysis

We use the aforementioned models to extract feature genes from multiview data and compare and analyze the extraction results. First of all, according to Hu et al. (2019), we sort the genes extracted from each model according to the gene relevance score (GRS). Generally, genes with higher correlation scores have a greater impact on tissue lesions. We selected the top 500 genes and searched for them on the GeneCards website for further analysis.

Figure 4 presents, for each model and each of the three data sets, the highest GRS (HRS) and the average GRS (ARS) of the selected genes. HRS is the GRS of the top-ranked gene, that is, the most significant gene in the feature selection process, and reflects whether an algorithm can effectively select the most important key genes. ARS is the average GRS of the selected genes and therefore reflects their overall quality. MVET-LC can effectively select key genes with high GRS values (high HRS). In addition, some genes extracted by MVET-LC that have not yet been proved to be pathogenic may be associated with cancers, which provides a new path for tumor analysis.

FIG. 4.

The results of extraction of feature genes.

Table 3 shows the top two genes selected by the MVET-LC model with the highest GRS in each data set. We then focused on the relationship between the SMAD4 gene and the SRLL data set.

Table 3.

The Top Two Genes with the Highest GRS Are Selected in Each Data Set

Data set	Gene abbreviation	Gene official name	Related diseases
SRLL	SMAD4	SMAD Family Member 4	Myhre Syndrome and Juvenile Polyposis Syndrome
SRLL	MSH2	MutS Homolog 2	Lynch Syndrome I and Mismatch Repair Cancer Syndrome 2
PCCEH	KRAS	KRAS Proto-Oncogene, GTPase	Oculoectodermal Syndrome and Noonan Syndrome 3
PCCEH	MET	MET Proto-Oncogene, Receptor Tyrosine Kinase	Renal Cell Carcinoma, Papillary, 1 and Deafness, Autosomal Recessive 97
UCU	EGFR	Epidermal Growth Factor Receptor	Inflammatory Skin And Bowel Disease, Neonatal, 2 and Lung Cancer
UCU	MUC1	Mucin 1, Cell Surface Associated	Tubulointerstitial Kidney Disease, Autosomal Dominant, 2 and Syringoma

SMAD4 is classified as a tumor suppressor gene (Howe et al., 2002). Researchers found that selective deletion of SMAD4 in T cells can lead to carcinogenesis in the gastrointestinal tract of mice (Kim et al., 2006). Therefore, SMAD4 signaling in T cells is necessary to inhibit gastrointestinal tumors. A recent study by Jiang et al. (2019) investigated the association between BRAF and SMAD4 mutations and resistance to neoadjuvant chemoradiotherapy in patients with locally advanced rectal cancer.

The SMAD family of genes are candidate tumor suppressor genes in LIHC (Yu et al., 2016). SMAD4 was found to be commonly mutated through genome sequencing of lung tumors in patients diagnosed with squamous cell carcinoma (Liu et al., 2015). According to Bian et al. (2015), the prognosis of patients with LUAD can be predicted based on the immunohistochemical status of P53Mut, P16, and SMAD4. This case analysis indicates that the genes selected by the MVET-LC model have biological significance.

3.3. Clustering results and analysis

Clustering analysis of multiview cancer samples is beneficial for finding potential correlation and complementarity information in complex data. Also, it can provide support for the study of the similarity among cancers and the discovery of new cancer subtypes. The evaluation metrics used are as follows: accuracy (ACC), normalized mutual information (NMI), F-measure (F-M), and adjusted Rand index (ARI).

We have $C_{1} = \{C_{1}^{1}, C_{1}^{2}, \dots, C_{1}^{k}\}$ and $C_{2} = \{C_{2}^{1}, C_{2}^{2}, \dots, C_{2}^{k}\}$ denoting predicted subtype class labels and actual subtype class labels, respectively, and NMI is defined as

where $M I (\cdot)$ represents mutual information and $H (\cdot)$ represents information entropy.

F-M is defined as

ARI is defined as

where a_i and b_j are the number of samples in class $C_{1}^{i}$ and class $C_{2}^{j}$ , respectively.
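For reference, the ARI can be computed from the contingency counts of the two labelings. The sketch below uses the standard Hubert and Arabie form of the index, which is assumed to match the definition intended here up to notation; `a` and `b` hold the class sizes $a_{i}$ and $b_{j}$ described above. The function name `adjusted_rand_index` is illustrative.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(pred, true):
    """Adjusted Rand index between predicted and actual class labels."""
    n = len(pred)
    if n != len(true):
        raise ValueError("label sequences must have the same length")
    if n < 2:
        return 1.0  # no pairs to compare; treated as perfect agreement
    n_ij = Counter(zip(pred, true))  # samples shared by class i and class j
    a = Counter(pred)                # a_i: size of predicted class i
    b = Counter(true)                # b_j: size of actual class j
    sum_ij = sum(comb(c, 2) for c in n_ij.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)   # chance-level agreement
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single class in both labelings
    return (sum_ij - expected) / (max_index - expected)
```

The index does not depend on how the classes are named, so a clustering that recovers the true groups under permuted labels still scores 1, while values near 0 indicate chance-level agreement.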

Table 4 and Figure 5 show the results of the aforementioned models on the three data sets. To show the experimental results more intuitively, we visualized the clustering results of the MVET-LC model in Figure 6. Different colors represent different diseases in the data sets.

FIG. 5.

The clustering outcomes for the SRLL, PCCEH, and UCU data sets.

FIG. 6.

Visualization of clustering effect.

Table 4.

The Results of the Models on the Three Data Sets

Data sets	Metrics (%)	RPCA	TRPCA	t-TRPCA	OR-TPCA	IRTPCA	PSTNN	EPSTNN	MVET-LC
SRLL	F-M	73.88	76.78	78.36	76.96	74.75	76.69	76.89	80.56
SRLL	ARI	58.97	62.65	61.14	62.87	65.91	63.49	67.01	67.58
PCCEH	F-M	61.87	72.49	72.16	68.78	71.19	72.69	76.82	80.32
PCCEH	ARI	67.13	72.18	76.02	78.45	80.03	76.91	78.99	79.22
UCU	F-M	75.86	80.19	80.85	82.53	82.41	83.89	86.92	87.39
UCU	ARI	63.46	68.63	69.08	74.18	73.44	74.59	75.82	77.14

Bold values show the best results.

ARI, adjusted Rand index; EPSTNN, enhanced partial sum of tensor nuclear norm; F-M, F-measure; IRTPCA, improved robust tensor principal component analysis; MVET-LC, Multi-View Enhanced Tensor Nuclear Norm and Local Constraint; OR-TPCA, outlier-robust tensor principal component analysis; PSTNN, partial sum of tensor nuclear norm; RPCA, robust principal component analysis; TRPCA, tensor robust principal component analysis; t-TRPCA, the model by Lu et al. (2020).

The specific analysis and discussion are as follows:

Table 4 shows that the matrix model RPCA scores lower than every tensor-based model on each data set and metric. A likely reason is that RPCA destroys the geometric structure information of three-dimensional data when processing multiview cancer genomics data, and flattening the data into a matrix also prevents it from detecting outliers effectively. Tensor methods avoid these problems and handle multiview data directly.

In general, the experimental outcomes obtained from EPSTNN and the approach built on EPSTNN (MVET-LC) are superior to those from PSTNN and the models constructed based on TNN (TRPCA, t-TRPCA, OR-TPCA, and IRTPCA). EPSTNN is exploited to widen the discrimination between the data. Data with different information can be identified more clearly, which increases the probability of discovering new cancer subtypes. Meanwhile, EPSTNN can enhance the precision of low-rank structure extraction, thereby facilitating the identification of genes that are more closely linked to cancer.

MVET-LC with TV has the advantage of preserving the spatial structure of multiview data sets compared with other models. Meanwhile, owing to the added TV term, MVET-LC pays more attention to local features that are easily overlooked in the data. In Table 4 and Figure 5, the comprehensive evaluation indices show that the performance of MVET-LC is satisfactory.

4. CONCLUSION

Our research has presented a new model, MVET-LC, designed specifically for the analysis of cancer data. EPSTNN is utilized in this model to capture complex internal structures of the data, whereas TV is added as a constraint term for the low-rank tensor to preserve local features. Our experiments have shown that the MVET-LC model performs better than other models.

In the future, our research will continue to focus on the analysis of multiview cancer data, with particular emphasis on low-rank structure extraction and sparse component separation.

Footnotes

AUTHORS' CONTRIBUTIONS

Writing—review and editing (equal) by Q.Q., S.-S.Y., and J.S. Writing—review and editing (lead) by J.-X.L.

AUTHOR DISCLOSURE STATEMENT

The authors declare they have no conflicting financial interests.

FUNDING INFORMATION

This study was supported in part by the National Natural Science Foundation of China under Grant Nos. 61872220 and 62172254.

References

1. Bartenhagen, Klein H-U, Ruckert, et al. Comparative study of unsupervised dimension reduction techniques for the visualization of microarray gene expression data. BMC Bioinformatics, 2010; 11(1):567; doi: 10.1186/1471-2105-11-567

2. Bertoni, Valentini. Random projections for assessing gene expression cluster stability. In: Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005, vol. 141. IEEE: Montreal, QC, Canada; 2005; pp. 149–154.

3. Bian, Li, Xu, et al. Clinical outcome and expression of mutant P53, P16, and Smad4 in lung adenocarcinoma: A prospective study. World J Surg Oncol, 2015; 13(1):1–8.

4. Bouwmans, Javed, Zhang, et al. On the applications of robust PCA in image and video processing. Proc IEEE, 2018; 106(8):1427–1457; doi: 10.1109/JPROC.2018.2853589

5. Braman. Third-order tensors as linear operators on a space of matrices. Linear Algebra Appl, 2010; 433(7):1241–1253; doi: 10.1016/J.LAA.2010.05.025

6. Gu, Zhang, Zuo, et al. Weighted nuclear norm minimization with application to image denoising. In: Proceedings of the IEEE Conference Computer Vision and Pattern Recognition, Columbus, Ohio, USA; 2014; pp. 2862–2869.

7. He, Liu, Christodoulou, et al. Accelerated high-dimensional MR imaging with sparse sampling using low-rank tensors. IEEE Trans Med Imaging, 2016a; 35(9):2119–2129; doi: 10.1109/TMI.2016.2550204

8. He, Zhang, et al. Total-variation-regularized low-rank matrix factorization for hyperspectral image restoration. IEEE Trans Geosci Remote Sens, 2016b; 54(1):178–188; doi: 10.1109/TGRS.2015.2452812

9. Hosono, Ono, Miyata. Weighted tensor nuclear norm minimization for color image denoising. In: IEEE International Conference on Image Processing, Phoenix, Arizona, USA; 2016; pp. 3081–3085.

10. Howe, Shellnut, Wagner, et al. Common deletion of SMAD4 in juvenile polyposis is a mutational hotspot. Am J Hum Genet, 2002; 70(5):1357–1362.

11. Hu, Liu J-X, Gao Y-L, et al. Differentially expressed genes extracted by the tensor robust principal component analysis (TRPCA) method. Complexity, 2019; 2019:1–13; doi: 10.1155/2019/6136245

12. Jiang, Wang, et al. Mutation in BRAF and SMAD4 associated with resistance to neoadjuvant chemoradiation therapy in locally advanced rectal cancer. Virchows Arch, 2019; 475(1):39–47.

13. Jiang T-X, Huang T-Z, Zhao X-L, et al. Multi-dimensional imaging data recovery via minimizing the partial sum of tubal nuclear norm. J Comput Appl Math, 2020; 372:112680; doi: 10.1016/J.CAM.2019.112680

14. Kilmer, Martin. Factorization strategies for third-order tensors. Linear Algebra Appl, 2011; 435(3):641–658; doi: 10.1016/J.LAA.2010.09.020

15. Kim B-G, Li, Qiao, et al. Smad4 signalling in T cells is required for suppression of gastrointestinal cancer. Nature, 2006; 441(7096):1015–1019.

16. Liao, Boscolo, Yang Y-L, et al. Network component analysis: Reconstruction of regulatory signals in biological systems. Proc Natl Acad Sci U S A, 2003; 100(26):15522–15527; doi: 10.1073/PNAS.2136632100

17. Liu, Cho S-N, Akkanti, et al. ErbB2 pathway activation upon Smad4 loss promotes lung tumor growth and metastasis. Cell Rep, 2015; 10(9):1599–1613.

18. Liu, Wang, Zheng, et al. Robust PCA based method for discovering differentially expressed genes. BMC Bioinformatics, 2013; 14(8):1–10.

19. Liu, Xu, Zheng, et al. RPCA-based tumor classification using gene expression data. IEEE/ACM Trans Comput Biol Bioinform, 2014; 12(4):1–1.

20. Liu, Chen, Zhu. Improved robust tensor principal component analysis via low-rank core matrix. IEEE J Sel Top Signal Process, 2018; 12(6):1378–1389; doi: 10.1109/JSTSP.2018.2873142

21. Lu, Feng, Chen, et al. Tensor Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Tensors via Convex Optimization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, USA; 2016; pp. 5249–5257.

22. Lu, Feng, Chen, et al. Tensor robust principal component analysis with a new tensor nuclear norm. IEEE Trans Pattern Anal Mach Intell, 2020; 42(4):925–938; doi: 10.1109/TPAMI.2019.2891760

23. Oh T-H, Kim, Tai Y-W, et al. Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision. In: Proceedings of the IEEE International Conference on Computer Vision. Australia; 2013; pp. 145–152.

24. Oh T-H, Tai Y-W, Bazin J-C, et al. Partial sum minimization of singular values in robust PCA: Algorithm and applications. IEEE Trans Pattern Anal Mach Intell, 2016; 38(4):744–758; doi: 10.1109/TPAMI.2015.2465956

25. Peng, Suo, Dai, et al. Reweighted low-rank matrix recovery and its application in image restoration. IEEE Trans Cybern, 2014; 44(12):2418–2430; doi: 10.1109/TCYB.2014.2307854

26. Sun, Yang J-G, Long, et al. Infrared small target detection via spatial-temporal total variation regularization and weighted tensor nuclear norm. IEEE Access, 2019; 7:56667–56682; doi: 10.1109/ACCESS.2019.2914281

27. Vaswani, Bouwmans, Javed, et al. Robust subspace learning: Robust PCA, robust subspace tracking, and robust subspace recovery. IEEE Signal Process Mag, 2018; 35(4):32–55; doi: 10.1109/MSP.2018.2826566

28. Wang, Peng, Zhao, et al. Hyperspectral image restoration via total variation regularized low-rank tensor decomposition. IEEE J Sel Top Appl Earth Obs Remote Sens, 2018; 11(4):1227–1243; doi: 10.1109/JSTARS.2017.2779539

29. Yang H-J, Zhao Y-Y, Liu J-X, et al. Sparse Regularization Tensor Robust PCA Based on t-product and Its Application in Cancer Genomic Data. In: 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE: Seoul, Korea (South); 2020; pp. 2131–2138.

30. Yu, Lin, Zhou, et al. MiR-144 suppresses cell proliferation, migration, and invasion in hepatocellular carcinoma by targeting SMAD4. Onco Targets Ther, 2016; 9:4705.

31. Yu, Gao, Liu, et al. Robust hypergraph regularized non-negative matrix factorization for sample clustering and feature selection in multi-view gene expression data. Hum Genomics, 2019; 13(Suppl 1):46.

32. Zhang, Peng. Infrared small target detection based on partial sum of the tensor nuclear norm. Remote Sens, 2019; 11(4); doi: 10.3390/RS11040382

33. Zhao Y-Y, Jiao C-N, Wang M-L, et al. HTRPCA: Hypergraph regularized tensor robust principal component analysis for sample clustering in tumor omics data. Interdiscip Sci Comput Life Sci, 2021:1–12; doi: 10.1007/S12539-021-00441-8

34. Zhao Y-Y, Wang M-L, Wang, et al. Tensor Robust Principal Component Analysis with Low-Rank Weight Constraints for Sample Clustering. In: 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Seoul, South Korea; 2020; pp. 397–401.

35. Zhou, Feng. Outlier-Robust Tensor PCA. In: Computer Vision and Pattern Recognition. IEEE: Honolulu, HI, USA; 2017; pp. 3938–3946.