Sparse Inverse Covariance Estimation with L 0 Penalty for Network Construction with Omics Data

Abstract

Constructing coexpression and association networks with omics data is crucial for studying gene–gene interactions and underlying biological mechanisms. In recent years, learning the structure of a Gaussian graphical model from high-dimensional data using L₁ penalty has been well-studied and many applications in bioinformatics and computational biology have been found. However, besides the problem of biased estimators with LASSO, L₁ does not always choose the true model consistently. Based on our previous work with L₀ regularized regression (Liu and Li, 2014), we propose an L₀ regularized sparse inverse covariance estimation (L₀RICE) for structure learning with the efficient alternating direction (AD) method. The proposed method is robust and has the oracle property. The proposed method is applied to omics data including data, from next-generation sequencing technologies. Novel procedures for network construction and high-order gene–gene interaction detection with omics data are developed. Results from simulation and real omics data analysis indicate that L₀ regularized structure learning can identify high-order correlation structure with lower false positive rate and outperform graphical lasso by a large margin.

1. Introduction

With advances in biotechnologies, massive omics data are increasingly available in the public domain. How to mine those big data and extract useful information efficiently poses significant computational challenges. Genetic networks constructed from omics data are crucial for studying gene–gene interactions and exploring the functional mechanisms of genes. Even though there are many methods proposed for microarray and various omics data (Liu et al., 2014), there are fewer methods for large-scale network construction with integrated omics data including next generation sequencing (NGS) data. One main reason is that different types of omics data have different formats, and they are not always normally distributed. For instance, NGS data are counts of short reads, which follow Poisson or negative binomial, but not normal distributions. Many standard approaches can not be applied directly. Sparse inverse correlation matrix with L₁ penalty for Gaussian graphical models was originally proposed for normally distributed data and has found many applications for studying interactions in coexpressopn networks, brain connectivity, and social networks (Fan and Liu, 2013; Kiiveri, 2011; Kim et al., 2012; Chaibub et al., 2010).

The inverse of a covariance matrix is also referred to as a concentration or a precision matrix. Mathematically, given m independent random samples \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ { \bf x}_1 , { \bf x}_2 , \ldots , { \bf x}_m \} $$ \end{document} from an n-dimensional Gaussian distribution with x_i ∼ N(μ, Σ), our goal is to estimate the inverse covariance matrix Σ⁻¹. The nonzero elements of Σ⁻¹ correspond to conditional dependence between two random variables. When the sample size m ≪ n, the number of features, the estimated covariance matrix \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\hat{ \sigma}$$ \end{document} is not guaranteed to be positive definite anymore, and the inverse of the covariance matrix cannot be estimated directly. Regularized methods with appropriate penalty must be used to estimate the sparse inverse Φ = Σ⁻¹. Theoretically, it is appealing to regularize the model with L₀ penalty |Φ|₀, where |Φ|₀ represents the number of nonzero elements in Ω. However, because of the computationally challenging and lack of efficient method for L₀ optimization, the popular approach is to optimize the convex relaxation |Ω|₁, where |Φ|₁ indicates the element-wise L₁ norm of the matrix (Banerjee et al., 2008; Friedman et al., 2008; Yuan and Lin, 2007). Several improved algorithms have been developed for L₁ regulation, including alternating linearization (Scheinberg et al., 2010), greedy methods (Johnson et al., 2012), and quadratic approximation (Hsieh et al., 2014). However, L₁ regularized methods are asymptotically biased, and they do not always choose the true model consistently (Zou, 2006). It has been shown that L₁ optimization can perform infinitely worse than L₀ in some cases, and optimal L₁ solutions are usually inferior to the results of L₀ from classical stepwise regression (Lin et al., 2010). In addition, L₁ with cross-validation tends to include too many spurious variables.

Alternating direction method of multipliers (ADMM) was originally proposed by Hestenes (1969). It has attracted a lot of attentions and found many applications in large-scale optimization recently (Boyd et al., 2011; Scheinberg et al., 2010; Yuan 2012; Ouyang et al., 2013; Goldstein et al., 2014; Chen and Cheng, 2014). In this article, we propose an L₀ regularized sparse inverse covariance estimation (L₀RICE) for structure learning. The proposed approach directly solves the original L₀ optimization under the framework of ADMM. The regularized parameter ρ will be chosen with the stability selection criteria (Liu et al., 2010; Liu et al., 2014; Meinshausen and Bählmann, 2010) and is quite robust with different values of choices. Even though the sparse inverse covariance matrix was originally designed for normally distributed data, it only takes the sample covariance matrix as an input without knowing the data type. Therefore, we extend it to nonparametric correlation measures and apply it to detect high-order interactions with different types of omics data. We first calculate the covariance (or more generally similarity) matrix from the data, and then detect the high-order correlations from the inverse covariance matrix with L₀RICE algorithm. We demonstrate less bias and lower false-discovery rate (FDR) in correlation estimation with L₀ regularization through simulation. We also apply the proposed approach to high-order gene–gene interaction and network construction with several types of omics data. The results indicate that the proposed approach can identify biologically important genes and networks with real data application.

2. Methods

Consider a random variable \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$X = [ X_1 , \ldots , X_n ] ^T$$ \end{document} of n dimensions with X ∼ N(μ, Σ). Let G = (V, E) be a Markov network representing the dependence structure of N(μ, Σ), where the set of vertices V are the n variables in X, and an edge between X_i and X_j in set E indicates their conditional dependence given the rest of the variables, which corresponds to a nonzero entry in the inverse covariance matrix Σ⁻¹. Therefore, learning the structure of graphical models is equivalent to learning the structure of Σ⁻¹. Given m iid random samples \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ { \bf x}_1 , { \bf x}_2 , \ldots , { \bf x}_m \} $$ \end{document} , the sample covariance matrix can be estimated with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*}S = \frac { 1 } { m - 1 } \mathop\sum\limits_ { i = 1 } ^m ( { \bf x } _i - \hat { \mu } ) ( { \bf x } _i - \hat { \mu } ) ^T , \qquad { \rm where } \ \hat { \mu } = \frac { 1 } { m - 1 } \mathop\sum\limits_ { i = 1 } ^ { m } { \bf x } _i.\end{align*} \end{document}

Let Φ = Σ⁻¹ be the positive semidefinite matrix, with the notation \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Phi \succ 0$$ \end{document} . Given a regularized parameter ρ > 0, the negative of the L₀-penalized MLE will be the error function to minimize for estimating Φ: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} {{{ \arg \min} } E ( \Phi ) \atop { \Phi \succ 0}} = \ {{{ \arg \min} } \atop { \Phi \succ 0}}\, \left\{ - \log \det \Phi + tr ( S \Phi ) + { \rho \over 2} \parallel \Phi \parallel_0 \right\} , \tag{1} \end{align*} \end{document}

where ||Φ||₀ = ∑_i,j I(Φ_ij ≠ 0) is the number of nonzero elements in Φ, because I(Φ_ij) = 1 if and only if Φ_ij ≠ 0. If we define \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$ \frac { 0 } { 0 } = 0$$ \end{document} , we have \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\parallel \Phi \parallel_0 = \sum\nolimits_ { i , j } I ( \phi_ { ij } \neq 0 ) = \sum\nolimits_ { i , j } \frac { \phi_ { ij } ^2 } { \phi_ { ij } ^2 } $$ \end{document} . Thus, Equation (1) is equivalent to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} {{{ \arg \min}\, } E ( \Phi ) \atop { \Phi \succ 0}} = \ {{{ \arg \min} } \atop { \Phi \succ 0}} \left\{ - \log \det \Phi + tr ( S \Phi ) + { \rho \over 2} \mathop\sum\limits_{i , j}{ \phi_{ij}^2 \over \phi_{ij}^2} \right\} , \tag{2} \end{align*} \end{document}

which, in turn, is equivalent to the following system: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \min_ { \Phi \succ 0 } E ( \Phi ) & = { { \arg \min } \atop { \Phi \succ 0 } } \left\{ - \log \det \Phi + tr ( S \Phi ) + \frac { \rho } { 2 } \mathop\sum\limits_ { i , j } \frac { \phi_ { ij } ^2 } { \omega_ { ij } ^2 } \right\} & \tag{3} \\ \Omega & = \Phi , & \tag{4} \end{align*} \end{document}

where Ω = (ω_ij)_n×n. The reason for translating the original problem into the system of Equations (3) and (4) is to reformulate the original problem into a series of convex optimizations.

2.1. ADMM-based L₀RICE methods

The alternating direction method of multipliers (ADMM) aims to solve the convex optimization problem: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \min_{ \Phi} E ( \Phi ) = f ( \Phi ) + g ( \Phi ) ,\end{align*} \end{document}

through solving an equivalent constraint problem: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \min_{ \Phi , \Theta} f ( \Phi ) + g ( \Theta ) , \qquad subject \ to \ \Phi = \Theta.\end{align*} \end{document}

The main idea of ADMM is to split the original large-scale optimization problem into two by first solving for Φ approximately with fixed Θ, and then updating Θ with fixed Φ. The dual variables are also updated accordingly. In such a way, we can decompose a large-scale optimization problem into multiple smaller suboptimization problems, and solve the original problem efficiently.

Equation (3) will be decomposed into two parts when we deal with the L₀ regularized sparse inverse covariance estimation. \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} f ( \Phi ) & = - \log \det \Phi + tr ( S \Phi ) , \tag{5}\\ g ( \Theta , \Omega ) & = \frac { \rho } { 2 } \mathop\sum\limits_ { i , j } \frac { \theta_ { ij } ^2 } { \omega_ { ij } ^2 } = \frac { \rho } { 2 } \parallel \frac { \Theta_ { \circ } } { \Omega } \parallel_F^2 \tag{6} \end{align*} \end{document}

where ∘ indicates element-wise division, and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Phi \succ 0$$ \end{document} . Both f (Φ) and g(Θ, Ω) are convex functions, when Ω is fixed. Given Ω = Φ in Equation (4), the ADMM form of Equation (3) becomes \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \min_{ \Phi , \Theta , \succ 0} f ( \Phi ) + g ( \Theta , \Omega ) , \qquad subject \ to \ \Theta = \Phi. \tag{7}\end{align*} \end{document}

The augmented Lagrangian is \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} L_ { \Omega } ( \Phi , \Theta , U ) & = f ( \Phi ) + g ( \Theta , \Omega ) + U^T ( \Phi - \Theta ) + \frac { 1 } { 2 } \parallel \Phi - \Theta \parallel_F^2 \\ & = f ( \Phi ) + g ( \Theta , \Omega ) + U^T ( \Phi - \Theta ) + \frac { 1 } { 2 } \parallel \Phi - \Theta \parallel_F^2 + \frac { 1 } { 2 } \parallel U \parallel_F^2 - \frac { 1 } { 2 } \parallel U \parallel_F^2 \\ & = f ( \Phi ) + g ( \Theta , \Omega ) + \frac { 1 } { 2 } \parallel \Phi - \Theta + U \parallel_F^2 - \frac { 1 } { 2 } \parallel U \parallel_F^2 , & ( 8 ) \end{align*} \end{document}

where Lagrange multiplier U is the dual variable, and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$ \frac { 1 } { 2 } \parallel \Phi - \Theta \parallel_2^2$$ \end{document} is the augmented quadratic term for the robustness of the optimization. Without the quadratic term, L_Ω becomes a standard Lagrangian. The augmented Lagrangian and standard Lagrangian are equivalent, since for any feasible Φ, the quadratic term in the augmented Lagrangian is zero. Dropping the \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$ \frac { 1 } { 2 } \parallel U \parallel_F^2$$ \end{document} in Equation (8), and combining Equation (8) and Equation (3), we have the following optimization problem: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*}\min_ { \Phi , \Theta , \Omega \succ 0 } f ( \Phi ) + g ( \Theta , \Omega ) + \frac { 1 } { 2 } \parallel \Phi - \Theta + U \parallel_F^2 , \qquad subject \ to \ \Omega = \Phi. \tag { 9 } \end{align*} \end{document}

L₀RICE algorithm under the ADMM framework for Equation (9) is as follows:

L ₀ RICE Algorithm:
Given Φ⁰ = Θ⁰ = Ω⁰,
for k = 0, 1, …do,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Phi^{ k + 1 } = { { \arg \min } \atop { \Phi \succ 0 } } = ( f ( \Phi ) + \frac { 1 } { 2 } \parallel \Phi - \Theta^k + U^k \parallel_F^2 )$$ \end{document}
Ω^k+1 = Φ^k+1
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$ \Theta^ { k + 1 } = { { \arg \min } \atop { \Theta \succ 0 } } ( g ( \Theta , \Omega^ { k + 1 } ) + \frac { 1 } { 2 } \parallel \Phi^ { k + 1 } - \Theta + U^k \parallel_F^2)$$ \end{document}
U^k⁺¹ = U^k + (Φ^k+1− Θ^k+1)
end for

The proof of the convergence and convergence rate of the general ADMM algorithm can be found in Eckstein and Bertsekas (1992) and the appendix of Goldfarb (2013).

2.1.1. Algorithm for Φ update

Given Θ, the suboptimization problem for Φ can be found analytically. Let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} E_ { \Phi } = - \log \det \Phi + tr ( S \Phi ) + \frac { 1 } { 2 } \parallel \Phi - \Theta^k + U^k \parallel_F^2.\end{align*} \end{document}

The first order derivative \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \frac { \partial E_ { \Phi } } { \partial \Phi } = - \Phi^ { - 1 } + S + \Phi - \Theta^k + U^k = 0.\end{align*} \end{document}

We have \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*}\Phi - \Phi^{- 1} = \Theta^k - U^k - S. \tag{10}\end{align*} \end{document}

Based on Equation (10), the algorithm to find the optimal Φ is as follows (Boyd et al., 2011):

1. Eigenvalue decomposition: Θ^k − U^k − S = PΛP^T, where Λ = diag( \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\lambda_1 , \ldots , \lambda_n$$ \end{document} ) and P^TP = PP^T = 1.

2. Form a diagonal matrix D with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} D_ { ii } = \frac { \lambda_i + \sqrt { \lambda_i^2 + 4 } } { 2 } .\end{align*} \end{document}

3. Φ^k+1 = PDP^T.

2.1.2. Algorithm for Θ update

As shown in the L₀RICE algorithm, after we update Ω = Φ^k+1, we need to solve the suboptimal problem for Θ. Let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} E_ { \Theta } = g ( \Theta , \Phi^ { k + 1 } ) & + \frac { 1 } { 2 } \parallel \Phi^ { k + 1 } - \Theta + U^k \parallel_F^2 = \frac { \rho } { 2 } \parallel \frac { \Theta_ { \circ } } { \Phi^ { k + 1 } } \parallel_F^2 + \frac { 1 } { 2 } \parallel \Phi^ { k + 1 } - \Theta + U^k \parallel_F^2 , \\ & \frac { \partial E ( \Theta ) } { \partial \Theta } = \frac { \rho \Theta_ { \circ } } { ( \Phi^ { k + 1 } ) ^2_ { \circ } } - ( \Phi^ { k + 1 } - \Theta + U^k ) = 0 , \end{align*} \end{document}

where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$( \Phi^{k + 1} ) ^2_{ \circ} = \Phi^{k + 1}_{ \circ}. \Phi^{k + 1}$$ \end{document} involves the element-wise multiplication. Therefore, we have the Θ update: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \Theta^ { k + 1 } = \frac { ( \Phi^ { k + 1 } ) ^2_ { \circ } } { \rho + ( \Phi^ { k + 1 } ) ^2_ { \circ } } \left( \Phi^ { k + 1 } + U^ { k } \right) . \tag { 11 } \end{align*} \end{document}

The L₀ regularization is enforced at the Θ optimization step, where different penalties are imposed with different \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Phi_{ij}^{k + 1}$$ \end{document} value. Since the the regularization term is \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$ \frac { \rho } { ( \Phi^ { k + 1 } ) _ { ij } ^2 } $$ \end{document} , smaller Φ_ij has a larger penalty, and the penalty goes to ∞ when Φ_ij goes to zero, so Θ_ij becomes zero, as shown in Equation (11) (Liu and Li, 2014).

2.1.3. Stopping criteria and ρ selection

The algorithm will stop when both ||U^k⁺¹ −U^k|| _F = ||Φ^k+1 − Θ^k+1||_F≤ ɛ and ||Θ^k+1 − Θ^k|| _F ≤ ɛ, where ||U^k⁺¹ − U^k|| _F controls the convergence of the dual variable U, while ||Θ^k+1 − Θ^k|| _F measures the convergence of primal variable Θ and Φ. The tuning parameter ρ acts as a cutoff point in Equation (11). When \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\phi_{ij}^2 < \rho , \theta_{ij}$$ \end{document} becomes smaller, but θ_ij will increase and θ_ij → Φ_ij when \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\phi_{ij}^2 > \rho$$ \end{document} . Unlike L₁, which reduces all estimated parameters by ρ, L₀ regularization shrinks the small parameters to zero, while keeping the large parameters unaltered. The larger the ρ, the sparser the inverse matrix. Stability selection (Liu et al., 2010; Meinshausen, 2010) is used for determining ρ. It seeks ρ leading to the most stable sets of edges. Given a data X, stability selection first draws q sub samples \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${ \bf x}_1 , { \bf x}_2 , \ldots , { \bf x}_q$$ \end{document} of size l (1 < l < m), where we set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$l = \frac { 2 } { 3 } m$$ \end{document} in this article. We then estimate one separate network Aⁱ(ρ) for each sub-sample X_i and a fixed regularization parameter ρ. Stability selection then defines the average fraction of disagreements over all edges of the subsampled graphs as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} D ( \rho ) = \frac { \sum_ { j < k } \bar { \phi } _ { jk } ( \rho ) ( 1 - \bar { \phi } _ { jk } ( \rho ) ) } { ( ^q_2 ) } , \quad { \rm where } \quad \bar { \phi } _ { jk } = \frac { 1 } { q } \mathop\sum\limits_ { i = 1 } ^q \phi_ { jk } ^i ( \rho ) .\end{align*} \end{document}

The optimal \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\hat{ \rho}$$ \end{document} is then chosen as: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \hat{ \rho} = \min \{ \rho: \max_{0 < t < \rho}D ( t ) \le \alpha \} , \qquad { \rm where} \ \alpha = 0.05.\end{align*} \end{document}

The final network is constructed using the whole data X and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\hat{\rho}$$ \end{document} . In addition, when 0 ≤|Φ_ij|≤ 1, users can also pick their own ρ based on their problem of study. Our suggestion is to choose a ρ in the range of 0.0025 ≤ ρ ≤ 0.1, corresponding to 0.05 ≤ |Φ_ij| ≤ 0.32.

3. Computational Results

3.1 Simulations

3.1.1. Moderate-size networks

Data were simulated from two popular network models similar to Zhang and Mallick (2013), including a small world and a band 4 network. The covariance matrix Σ for the small world network is a Gaussian AR(1) process with element (covariance for i and j) σ_ij = 0.7^|i-j|. It corresponds to a small world network with bandwidth 1 (brand 1), where each node only connects to its direct neighbors. The element for the band 4 network Φ is defined as Φ_ij = 2I(|i − j| = 0) −0.5I(|i − j| = 1) −0.8I(|i − j| = 2) + 0.2I(|i − j| = 3) + 0.3I(|i − j| = 4), where I(.) is an indicator function, so the covariance matrix Σ = Φ⁻¹ is a Gaussian AR(4) process. Note that roughly half of the edges had the partial correlation coefficient of 0.1 or 0.15 in Φ. Therefore, it is harder to detect the structure of the band 4 than the small world network. The mean \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\mu^T = [ 2 , \ldots , 2 ]$$ \end{document} , and the normal data is generated with X ∼ N(μ, Σ). The sample size of the data is n = 100, with different dimensions m = 100, 200, and 500. The optimal ρ is determined by stable selection. Stability selection seeks the ρ leading to the most stable set of edges through subsampling. The simulation is performed 100 times, and the performance of L₀RICE is compared with the L₁RICE program in C downloaded online (Liu et al., 2009). The performance is measured by the area under the ROC curve (AUC), the false positive rate (FPR), and the false negative rate (FNR). FPR is defined as the rate the models predict an edge when there is no edge, while FNR is the rate of predicting no edge when there is an edge on the network.

The computational results on Table 1 demonstrate that L₀RICE performs better than L₁RICE across most categories and different network sizes, especially with the more difficult band 4 network. While both L₁ and L₀ perform very well and similarly for band 1 network with large off-diagonal values, L₀RICE performs substantially better with the more difficult band 4 network, particularly in controlling the FPR. L₀RICE achieves an AUC of 0.789, 0.784, and 0.769, while L₁RICE only reaches an AUC of 0.73, 0.738, and 0.735 with network sizes of 100, 200, and 500, respectively. In addition, L₀ has smaller FPR and FNR values compared to L₁RICE for the same network size. The FPRs with L₀RICE are substantially smaller than L₁RICE: the FPRs for L₀ are 8.13%, 7.06%, and 5.33%, while the FPRs for L₁ are 13.3%, 11.1%, and 10.3% with network sizes of m = 100, 200, and 500, respectively.

Table 1.

Performance Comparison over 100 Simulations and Different Sample Size (m = 100, 200, 500), Where Values in the Parenthesis are the Standard Deviations

	Band 1			Band 4
L₀RICE	AUC	FPR (%)	FNR (%)	AUC	FPR (%)	FNR (%)
m = 100	0.999 (±0.002)	0.02 (±0.12)	0.21 (±0.48)	0.789 (±0.008)	8.13 (±0.57)	34.0 (±1.40)
200	0.998 (±0.002)	0.076 (±0.20)	0.28 (±0.42)	0.784 (±0.008)	7.06 (±1.38)	36.1 (±1.05)
500	0.997 (±0.002)	0.062 (±0.05)	0.65 (±0.38)	0.769 (±0.003)	5.33 (±0.073)	40.9 (±0.62)

L₁RICE	AUC	FPR (%)	FNR (%)	AUC	FPR (%)	FNR (%)
m = 100	0.99 (±0.003)	1.52 (±0.22)	0.33 (±0.67)	0.73 (±0.006)	13.3 (±0.45)	41.3 (±1.19)
200	0.99 (±0.003)	1.21 (±0.07)	0.45 (±0.53)	0.738 (±0.004)	11.1 (±0.23)	41.3 (±0.94)
500	0.99 (±0.002)	1.08 (±0.09)	0.81 (±0.44)	0.735 (±0.003)	10.3 (±0.14)	42.1 (±0.52)

3.1.2. Large-size networks

Another set of simulations with a larger number of nodes (m = 1000, 1500, 2000, and 3000) and various sample sizes (n = 100, 200, 500) is performed with the range-dependent network model, motivated by the structure of protein–protein interaction (PPI) networks in biology (Higham and Przulj, 2011). In this model, the links between nodes i and j are inserted with probability 0.9 × 0.2^|i−^j|−1, then the Φ is defined as the Laplacian matrix (Belkin and Niyogi, 2003). A small value 0.1 is added to the diagonal to make sure that Φ is positive definite. The covariance matrix is Σ = Φ⁻¹. Depending on ρ and network size, the computational time for L₀RICE (MATLAB) is 3–12 min, which is slightly less efficient than graphical lasso in C (35–3 min). However, graphical lasso is less stable than L₀RICE. It does not always converge particularly for large-scale networks. The true and estimated Φs are then dichotomized into adjacency matrices with 0/1 entries, where 0 indicates no connection and 1 represents a nonzero correlation. The performance is then evaluated by comparing the estimated and the true structures of adjacency matrices with ROC curves in Figure 1. The optimal ρs for L₀RICE are 0.05, 0.05, 0.04, and 0.03 and for L₁RICE are 0.25, 0.2, 0.15, and 0.14 for the sample sizes of 1000, 1500, 2000, and 3000, respectively.

FIG. 1.

Performance comparison of L0RICE and L1RICE with different sample sizes and numbers of nodes.

Figure 1 shows that L₀RICE outperformed L₁RICE under different sample sizes and numbers of nodes (variables). With n = 100 and m = 1000, the AUC of L₀RICE is 0.928, while it is 0.906 for L₁RICE. In addition, the AUCs for n = 200 and m = 1500 are 0.932 and 0.875 for L₀ and L₁, respectively. Finally, the AUCs of L₀RICE are 0.970 and 0.969 for m = 2000, and 3000, respectively, while the AUCs of L₁RICE are 0.912 and 0.905 with the same number of nodes, indicating both L₀RICE and L₁RICE are robust and their performance does not deteriorate much when the number of nodes increases from 2000 to 3000.

3.1.3. Simulations with count data

Next generation sequencing (NGS)–based omics data are positive counts. They are not normally distributed and the variance is associated with their mean count. One approach to deal with such data is to compute the distribution-free nonparametric correlations for different data types, including Kendall's τ-rank correlation and Spearman's rank-order correlation (r) (Liu et al., 2012). Those correlations are only based on rank information and are more robust than the parametric approach. They can be applied to any data type. After we have τ and r, we can construct the covariance (similarity) matrix S = (s_ij) = (sin(πr/2)), or = (sin(πτ/2)). To simulate data that mimics actual NGS data, we generated count data from network structures similar to Liu et al. (2014). The count x_ij has the Poisson distribution Pois(τ_ij) with mean τ_ij, where {log(τ_ij),i < j} has the normal distribution N(μ, Σ) with mean μ and covariance matrix Σ. The network structure is represented in Σ and the sparse inverse matrix Φ = Σ⁻¹. We simulate the more difficult band 4 network with Φ defined the same way as in previous simulation. The generated count data have sample size 100 and three different network sizes: 100, 200, and 500. The partial correlations for some nodes can be small in such networks, so it is a difficult problem for structure detection. The tuning parameter ρ is again determined with a stability selection approach. The pairwise correlation is determined with Spearman's rank correlation. The computational results with L₀RICE and L₁RICE are reported in Table 2.

Table 2.

Performance Comparison over 100 Simulations and Different Sample Sizes (m = 100, 200, 500) with Simulated Count Data, Where Values in Parentheses Are the Standard Deviations

	L₀RICE			L₁RICE
L₀RICE	AUC	FPR (%)	FNR (%)	AUC	FPR (%)	FNR (%)
m = 100	0.742 (±0.009)	5.76 (±0.79)	46.1 (±1.73)	0.714 (±0.006)	10.40 (±0.38)	47.0 (±1.12)
200	0.743 (±0.006)	5.99 (±0.513)	45.4 (±0.513)	0.728 (±0.004)	7.49 (±0.29)	47.3 (±0.812)
500	0.753 (±0.003)	4.16 (±0.053)	45.1 (±0.531)	0.726 (±0.008)	10.9 (±0.525)	46.2 (±1.24)

Both L₀RICE and L₁RICE still perform reasonably well, and L₀RICE outperforms L₁RICE regardless of the number of nodes. In particular, the false positive rates (FPR) are 5.76%, 5.99%, and 4.16% with L₀RICE, while the FPRs with L₁RICE are 10.40%, 7.49%, and 10.9% for node sizes of 100, 200, and 500, respectively, indicating L₀RICE has lower false positive rates and clearly outperforms L₁RICE substantially.

3.2. Analysis of a metagenomic dataset

The metagenomic dataset was collected by our group (Tong et al., 2013). There are in total 299 samples from 76 inflammatory bowel disease (IBD) subjects, including Crohn's disease (CD), and 223 non-IBD subjects. Out of the 299 samples, 285 (72 IBDs and 213 controls) have metagenomic count data with a total of 5648 OTUs available. We merged OTUs at the species level, and then species with high abundances were selected for further study after discarding those species with less than two reads on average. This leads to 173 remaining features. Data were then normalized by the total count of each individual. The class y is a binary vector with y_k = 0 for non-IBD, and y_k = 1 for IBD samples. Because of the compositional nature of metagenomic data, Pearson correlation is not appropriate (Kurtz et al., 2015). Therefore, nonparametric Spearman's rank correlation (r) is used to measure the pairwise associations among species and between disease status and each species. The partial correlation is then estimated from the inverse of S = (s_ij) = (sin(πr/2)). The optimal tuning parameter is ρ* = 0.1 by stability selection. The network for interactions among species and direct and indirect associations between disease status (IBD/nonIBD) and species are presented in Figure 2.

FIG. 2.

Network for high-order interactions among species and direct and indirect interactions between IBD status and species: red, negative association; black, positive association.

Figure 2 shows that IBD is negatively (in red) associated with Faecalibacteriumprausnitzii and positively associated with Dialisterinvisus. Through Faecalibacteriumprausnitzii, IBD also has indirect negative correlations with Coprococcuseutactus, Eubacteriumrectale, and Ruminococcusobeum, and an indirect positive association with Ruminococcusgnavus. While Dialisterinvisus is less studied in IBD, Faecalibacteriumprausnitzii is a well-known beneficial human commensal anti-inflammatory bacterium (Laval et al., 2015; Cao et al., 2014; Stoll et al., 2014). The bacterial count of Faecalibacterium prausnitzii in IBD patients is 348, which is significantly lower than that in control (555), indicating the abundance of Faecalibacterium prausnitzii was decreased in IBD patients compared with healthy controls. In addition, the impacts of Faecalibacteriumprausnitzii on IL-23/Th17/IL-17 pathway markers have been determined in human monocytes and a rat model of colitis. It has been shown that Faecalibacteriumprausnitzii increases plasma anti-Th17 cytokines (IL-10 and IL-12) and suppresses IL-17 levels in both plasma and colonic mucosa, and protects the colon mucosa against the development of IBD by its metabolites, suggesting a promising potential for the use of Faecalibacteriumprausnitzii in the treatment of IBD (Zhang et al., 2014; Varela et al., 2013). Even though there are few studies looking into the interactions of multiple species, the direct and indirect interactions between Faecalibacteriumprausnitzii and other species on the network must be studied to explore their effects on IBD systematically.

3.3. microRNA-mRNA network with TCGA NGS data

The standard primary treatment for ovarian cancer is surgery followed by chemotherapy. The rationale for surgery as the primary treatment is to remove as much tumor as possible. If tumor nodules have invaded vital organs, surgeons may not be able to remove them without compromising the patient's life. It has been shown that leaving tumor nodules larger than 1 cm (defined as suboptimal debulking) is associated with reduced chemosensitivity and poor survival. If the tumor cannot be effectively removed by surgery, it will be better for a patient to be first treated with chemotherapy to partially shrink the tumor and then have surgery. However, the doctor does not know if there is a suboptimal debulking until he performs the surgery. Biomarkers may help physicians decide which patients should undergo surgery and who should be treated with chemotherapy first. We are trying to identify mRNA and microRNA markers with integrated analysis of microRNA and mRNA expression from RNA-seq data in TCGA with 367 samples, 13911 genes, and 342 microRNAs. There are 294 suboptimal and 73 optimal debulking samples. The data was normalized with log 2 transformation and quantile normalization, and then 18 microRNAs and 528 genes were identified using t-tests with P < 0.01. The pairwise correlations between microRNA and mRNA and among mRNA genes were then calculated with Spearman's rank correlation. The inverse of the pairwise correlation matrix represents the partial correlations between microRNAs and their target mRNAs. The optimal ρ is set to 0.04 with stability selection. We also incorporate the microRNA target prediction information into the analysis. Two databases, TargetScan and miRBase, were used to validate the true targets. Only the microRNA–mRNA pairs presented in both our network and the target databases were selected as the edges of the final network. Subnetwork associated with suboptimal debulking in ovarian cancer is shown in Figure 3.

FIG. 3.

Suboptimal debulking associated subnetwork for microRNA and mRNA interactions with TCGA ovarian cancer NGS data.

Figure 3 represents suboptimal debulking-associated interactions between mcroRNAs and mRNAs. Fourteen (14) mRNA genes (in purple) on the subnetwork are connected with each other and are the targets of six microRNAs. Those 14 mRNAs are on the ECM-receptor interaction and focal adhesion KEGG pathways with the FDR q values of 0.007 and 0.05, respectively. Enrichment analysis with STRING demonstrates that those 14 mRNAs involve several biological processes including extracellular matrix organization (q < 1e − 5) and skeletal system development (q < 1e − 4). Five of 14 mRNA genes including VCAN, COL11A1, COL5A2, POSTN, and THBS2 also appeared in the 10 gene signatures associated with metastasis, recurrence, and poor survival of ovarian cancer (Cheon and Orsulic, 2014). Six microRNAs regulated the 14 mRNAs and related pathways including miR-202, miR-493, miR-605, miR-758, miR-377, and miR-22. Among the six microRNAs, the expressions of miR-202, miR-22, and miR-493 are known to be associated with ovarian cancer, while miR-605, miR-758m, and miR-377 are associated with other types of cancers (Wan et al., 2014; Gadducci et al., 2014). The identified microTRNAs and mRNAs provide candidate biomarkers for further exploring their associations with suboptimal debulking.

4. Conclusions and Remarks

We propose a L₀-based sparse inverse covariance estimation for network structure learning with the efficient alternating direction method of multipliers (ADMM), and extended the method to various data types with the nonparametric rank correlation coefficients. The developed method approximates L₀ through solving a series of L₂ convex optimizations. It is very efficient, even when the dimension of the matrix is large. The computational time is similar between L₁ and L₀. We evaluated the performance of the method with simulations and multiomics data. The proposed method outperformed L₁ sparse inverse covariance estimation in almost all of the simulations. Application to metagenomic count data identified IBD associated species efficiently. When applied to multiomics data integration, biologically important genes and subnetworks correlated with optimal debulking were selected. The method learns the high-order correlation structure with lower FPR.

Footnotes

Acknowledgments

This work was partially supported by the National Science Foundation (NSF) grants DMS-222381 (Z.L.) and DMS-1220772 (S.L.). The funders had no role in the preparation of the article.

Author Disclosure Statement

The authors declare that no competing financial interests exist.

References

Banerjee

, Ghaoui

L.E.

, and d'Aspremont

2008. Model selection through sparse maximum likelihood estimation. J. Mach. Learn. Res., 9, 485176.

Belkin

, and Niyogi

2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396.

Boyd

, Parikh

, Chu

, et al. 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3, 1172.

Cao

, Shen

, and Ran

Z.H.

2014. Association between Faecalibacterium prausnitzii reduction and inammatory bowel disease: A meta-analysis and systematic review of the literature. Gastroenterol. Res. Pract., 2014, 872725.

Chaibub

, Keller

, Attie

, and Yandell

2010. Causal graphical models in systems genetics: A unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Ann. Appl. Stat., 4, 320179.

Chen

D.-Q.

, and Cheng

L.-Z.

2014. Fast linearized alternating direction minimization algorithm with adaptive parameter selection for multiplicative noise removal. J. Comput. Appl. Math. Arch., 257, 29–45.

Cheon

D.J.

, and Orsulic

2014. Ten-gene biomarker panel: A new hope for ovarian cancer?. Biomark Med. 8, 523–526.

Eckstein

, and Bertsekas

D.P.

1992. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathe. Program., 55, 293–318.

Fan

, and Liu

2013. Statistical analysis of big data on pharmacogenomics. Adv. Drug Deliv. Rev., 65, 987–1000.

10.

Friedman

, Hastie

, and Tibshirani

2008. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9, 432–441.

11.

Gadducci

, Sergiampietri

, Lanfredini

, and Guiggi

2014. Micro-RNAs and ovarian cancer: The state of art and perspectives of clinical research. Gynecol. Endocrinol., 30, 266–271.

12.

Goldfarb

, Ma

, and Scheinberg

2013. Fast alternating linearization methods for minimizing the sum of two convex functions. Mathe. Program., 141, 349–382.

13.

Goldstein

, ÓDonoghue

, Setzer

, and Baraniuk

2014. Fast alternating direction optimization methods. SIAM J. Imag. Sci., 7, 1588–1623.

14.

Hestenes

M.R.

1969. Multiplier and gradient methods. J. Optim. Theory Applica., 4, 302–320.

15.

Higham

D.J.

, and Przulj

2011. Random graph models and their application to proteinprotein interaction networks. In Balding

, Girolami

, and Stumpf

, eds. Handbook of Statistical Systems Biology. Wiley, New York.

16.

Hsieh

C.-J.

, Sustik

, Dhillon

, and Ravikumar

2014. QUIC: Quadratic approximation for sparse inverse covariance estimation. J. Mach. Learn. Res., 15, 2911–2947.

17.

Johnson

C.C.

, Jalali

, and Ravikumar

P.D.

2012. High-dimensional sparse inverse covariance estimation using greedy methods. JMLR W&CP, 22, 574–582.

18.

Kiiveri

H.T.

2011. Multivariate analysis of microarray data: Differential expression and differential connection. BMC Bioinform. 12, 42.

19.

Kim

, Jiang

, Teng

S.M.

, et al. 2012. Using biologically interrelated experiments to identify pathway genes in Arabidopsis. Bioinformatics, 28, 815–822.

20.

Kurtz

Z.D.

, Müller

C.L.

, Miraldi

E.R.

, et al. 2015. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput. Biol. 11, e1004226.

21.

Laval

, Martin

, Natividad

, et al. 2015. Lactobacillus rhamnosus CNCM I-3690 and the commensal bacterium Faecalibacterium prausnitzii A2-165 exhibit similar protective effects to induced barrier hyper-permeability in mice. Gut Microbes, 6, 1–9.

22.

Lin

, Foster

D.P.

, and Ungar

L.H.

2010. A risk ratio comparison of l0 and l1 penalized regressions. Technical Report, University of Pennsylvania. http://gosset.wharton.upenn.edu/research/risk ratio.pdf

23.

Liu

, Han

, Yuan

, et al. 2012. High dimensional semiparametric gaussian copula graphical models. Technical Report, Johns Hopkins University, 2012.

24.

Liu

, Roeder

, and Wasserman

2010. Stability approach to regularization selection for high dimensional graphical models. Adv. Neural Inform. Process. Syst., 24, 1432–1440.

25.

Liu

, Ji

, and Ye

2009. SLEP: Sparse learning with efficient projections. Arizona State University. www.public.asu.edu/jye02/Software/SLEP

26.

Liu

, and Li

2014. Efficient regularized regression for variable selection with L0 penalty. http://arxiv.org/abs/1407.7508

27.

Liu

, Sun

, Braun

, et al. 2014. Multilevel regularized regression for simultaneous taxa selection and network construction with metagenomic count data. Bioinformatics. 31, 1067–1074.

28.

Meinshausen

, and Bählmann

2010. Stability selection. J. R. Stat. Soc. Ser. B Stat. Methodol., 72, 417–473.

29.

Ouyang

, He

, Tran

, and Gray

2013. Stochastic alternating direction method of multipliers. JMLR W&CP, 28, 80.

30.

Scheinberg

, Ma

, and Goldfarb

2010. Sparse inverse covariance selection via alternating linearization methods, 2101–2109. In Advances in Neural Information Processing Systems, Vol. 23. MIT Press, Red Hook, NY.

31.

Stoll

M.L.

, Kumar

, Morrow

C.D.

, et al. 2014. Altered microbiota associated with abnormal humoral immune responses to 16 commensal organisms in enthesitis-related arthritis. Arthritis Res. Ther., 16, 486.

32.

Tong

, et al. 2013. A modular organization of the human intestinal mucosal microbiota and its association with inammatory bowel disease. PLoS One, 8, e80702.

33.

Varela

, Manichanh

, Gallart

, et al. 2013. Colonisation by Faecalibacterium prausnitzii and maintenance of clinical remission in patients with ulcerative colitis. Aliment Pharmacol. Ther., 38, 151–161.

34.

Wan

W.N.

, Zhang

Y.Q.

, Wang

X.M.

, et al. 2014. Down-regulated miR-22 as predictive biomarkers for prognosis of epithelial ovarian cancer. Diagn. Pathol., 9, 178.

35.

Yuan

, and Lin

2007. Model selection and estimation in the Gaussian graphical model. Biometrika, 94, 19.

36.

Yuan

X.M.

2012. Alternating direction method of multipliers for covariance selection models. J. Sci. Comput., 51, 261–273.

37.

Zhang

, and Mallick

B.K.

2013. Inferring gene networks from discrete expression data. Biostatistics. 14, 708–722.

38.

Zhang

, Qiu

, Zhang

, et al. 2014. Faecalibacterium prausnitzii inhibits interleukin-17 to ameliorate colorectal colitis in rats. PLoS One, 9, e109146.

39.

Zou

2006. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc., 101, 1418–1429.

Sparse Inverse Covariance Estimation with L 0 Penalty for Network Construction with Omics Data

Abstract

Abstract

1. Introduction

2. Methods

2.1. ADMM-based L0RICE methods

2.1.1. Algorithm for Φ update

2.1.2. Algorithm for Θ update

2.1.3. Stopping criteria and ρ selection

3. Computational Results

3.1 Simulations

3.1.1. Moderate-size networks

3.1.2. Large-size networks

3.1.3. Simulations with count data

3.2. Analysis of a metagenomic dataset

3.3. microRNA-mRNA network with TCGA NGS data

4. Conclusions and Remarks

Footnotes

Acknowledgments

Author Disclosure Statement

References

2.1. ADMM-based L₀RICE methods