Constructing coexpression and association networks with omics data is crucial for studying gene–gene interactions and underlying biological mechanisms. In recent years, learning the structure of a Gaussian graphical model from high-dimensional data using L1 penalty has been well-studied and many applications in bioinformatics and computational biology have been found. However, besides the problem of biased estimators with LASSO, L1 does not always choose the true model consistently. Based on our previous work with L0 regularized regression (Liu and Li, 2014), we propose an L0 regularized sparse inverse covariance estimation (L0RICE) for structure learning with the efficient alternating direction (AD) method. The proposed method is robust and has the oracle property. The proposed method is applied to omics data including data, from next-generation sequencing technologies. Novel procedures for network construction and high-order gene–gene interaction detection with omics data are developed. Results from simulation and real omics data analysis indicate that L0 regularized structure learning can identify high-order correlation structure with lower false positive rate and outperform graphical lasso by a large margin.
1. Introduction
With advances in biotechnologies, massive omics data are increasingly available in the public domain. How to mine those big data and extract useful information efficiently poses significant computational challenges. Genetic networks constructed from omics data are crucial for studying gene–gene interactions and exploring the functional mechanisms of genes. Even though there are many methods proposed for microarray and various omics data (Liu et al., 2014), there are fewer methods for large-scale network construction with integrated omics data including next generation sequencing (NGS) data. One main reason is that different types of omics data have different formats, and they are not always normally distributed. For instance, NGS data are counts of short reads, which follow Poisson or negative binomial, but not normal distributions. Many standard approaches can not be applied directly. Sparse inverse correlation matrix with L1 penalty for Gaussian graphical models was originally proposed for normally distributed data and has found many applications for studying interactions in coexpressopn networks, brain connectivity, and social networks (Fan and Liu, 2013; Kiiveri, 2011; Kim et al., 2012; Chaibub et al., 2010).
The inverse of a covariance matrix is also referred to as a concentration or a precision matrix. Mathematically, given m independent random samples \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\{ { \bf x}_1 , { \bf x}_2 , \ldots , { \bf x}_m \} $$
\end{document} from an n-dimensional Gaussian distribution with xi ∼ N(μ, Σ), our goal is to estimate the inverse covariance matrix Σ−1. The nonzero elements of Σ−1 correspond to conditional dependence between two random variables. When the sample size m ≪ n, the number of features, the estimated covariance matrix \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\hat{ \sigma}$$
\end{document} is not guaranteed to be positive definite anymore, and the inverse of the covariance matrix cannot be estimated directly. Regularized methods with appropriate penalty must be used to estimate the sparse inverse Φ = Σ−1. Theoretically, it is appealing to regularize the model with L0 penalty |Φ|0, where |Φ|0 represents the number of nonzero elements in Ω. However, because of the computationally challenging and lack of efficient method for L0 optimization, the popular approach is to optimize the convex relaxation |Ω|1, where |Φ|1 indicates the element-wise L1 norm of the matrix (Banerjee et al., 2008; Friedman et al., 2008; Yuan and Lin, 2007). Several improved algorithms have been developed for L1 regulation, including alternating linearization (Scheinberg et al., 2010), greedy methods (Johnson et al., 2012), and quadratic approximation (Hsieh et al., 2014). However, L1 regularized methods are asymptotically biased, and they do not always choose the true model consistently (Zou, 2006). It has been shown that L1 optimization can perform infinitely worse than L0 in some cases, and optimal L1 solutions are usually inferior to the results of L0 from classical stepwise regression (Lin et al., 2010). In addition, L1 with cross-validation tends to include too many spurious variables.
Alternating direction method of multipliers (ADMM) was originally proposed by Hestenes (1969). It has attracted a lot of attentions and found many applications in large-scale optimization recently (Boyd et al., 2011; Scheinberg et al., 2010; Yuan 2012; Ouyang et al., 2013; Goldstein et al., 2014; Chen and Cheng, 2014). In this article, we propose an L0 regularized sparse inverse covariance estimation (L0RICE) for structure learning. The proposed approach directly solves the original L0 optimization under the framework of ADMM. The regularized parameter ρ will be chosen with the stability selection criteria (Liu et al., 2010; Liu et al., 2014; Meinshausen and Bählmann, 2010) and is quite robust with different values of choices. Even though the sparse inverse covariance matrix was originally designed for normally distributed data, it only takes the sample covariance matrix as an input without knowing the data type. Therefore, we extend it to nonparametric correlation measures and apply it to detect high-order interactions with different types of omics data. We first calculate the covariance (or more generally similarity) matrix from the data, and then detect the high-order correlations from the inverse covariance matrix with L0RICE algorithm. We demonstrate less bias and lower false-discovery rate (FDR) in correlation estimation with L0 regularization through simulation. We also apply the proposed approach to high-order gene–gene interaction and network construction with several types of omics data. The results indicate that the proposed approach can identify biologically important genes and networks with real data application.
2. Methods
Consider a random variable \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$X = [ X_1 , \ldots , X_n ] ^T$$
\end{document} of n dimensions with X ∼ N(μ, Σ). Let G = (V, E) be a Markov network representing the dependence structure of N(μ, Σ), where the set of vertices V are the n variables in X, and an edge between Xi and Xj in set E indicates their conditional dependence given the rest of the variables, which corresponds to a nonzero entry in the inverse covariance matrix Σ−1. Therefore, learning the structure of graphical models is equivalent to learning the structure of Σ−1. Given m iid random samples \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\{ { \bf x}_1 , { \bf x}_2 , \ldots , { \bf x}_m \} $$
\end{document}, the sample covariance matrix can be estimated with
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}S = \frac { 1 } { m - 1 } \mathop\sum\limits_ { i = 1 } ^m ( { \bf x } _i - \hat { \mu } ) ( { \bf x } _i - \hat { \mu } ) ^T , \qquad { \rm where } \ \hat { \mu } = \frac { 1 } { m - 1 } \mathop\sum\limits_ { i = 1 } ^ { m } { \bf x } _i.\end{align*}
\end{document}
Let Φ = Σ−1 be the positive semidefinite matrix, with the notation \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Phi \succ 0$$
\end{document}. Given a regularized parameter ρ > 0, the negative of the L0-penalized MLE will be the error function to minimize for estimating Φ:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
{{{ \arg \min} } E ( \Phi ) \atop { \Phi \succ 0}} = \ {{{
\arg \min} } \atop { \Phi \succ 0}}\, \left\{ - \log \det
\Phi + tr ( S \Phi ) + { \rho \over 2}
\parallel \Phi \parallel_0 \right\} , \tag{1}
\end{align*}
\end{document}
where ||Φ||0 = ∑i,jI(Φij ≠ 0) is the number of nonzero elements in Φ, because I(Φij) = 1 if and only if Φij ≠ 0. If we define \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$ \frac { 0 } { 0 } = 0$$
\end{document}, we have \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\parallel \Phi \parallel_0 = \sum\nolimits_ { i , j } I ( \phi_ { ij } \neq 0 ) = \sum\nolimits_ { i , j } \frac { \phi_ { ij } ^2 } { \phi_ { ij } ^2 } $$
\end{document}. Thus, Equation (1) is equivalent to
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
{{{ \arg \min}\, } E ( \Phi ) \atop { \Phi \succ 0}} = \ {{{
\arg \min} } \atop { \Phi \succ 0}} \left\{ - \log \det \Phi +
tr ( S \Phi ) + { \rho \over 2} \mathop\sum\limits_{i , j}{
\phi_{ij}^2 \over \phi_{ij}^2} \right\} , \tag{2}
\end{align*}
\end{document}
which, in turn, is equivalent to the following system:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
\min_ { \Phi \succ 0 } E ( \Phi ) & = { { \arg \min } \atop {
\Phi \succ 0 } } \left\{ - \log \det \Phi + tr ( S \Phi ) + \frac
{ \rho } { 2 } \mathop\sum\limits_ { i , j } \frac
{ \phi_ { ij } ^2 } { \omega_ { ij } ^2 } \right\} & \tag{3} \\
\Omega & = \Phi , & \tag{4} \end{align*}
\end{document}
where Ω = (ωij)n×n. The reason for translating the original problem into the system of Equations (3) and (4) is to reformulate the original problem into a series of convex optimizations.
2.1. ADMM-based L0RICE methods
The alternating direction method of multipliers (ADMM) aims to solve the convex optimization problem:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
\min_{ \Phi} E ( \Phi ) = f ( \Phi ) + g ( \Phi ) ,\end{align*}
\end{document}
through solving an equivalent constraint problem:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
\min_{ \Phi , \Theta} f ( \Phi ) + g ( \Theta ) , \qquad subject \
to \ \Phi = \Theta.\end{align*}
\end{document}
The main idea of ADMM is to split the original large-scale optimization problem into two by first solving for Φ approximately with fixed Θ, and then updating Θ with fixed Φ. The dual variables are also updated accordingly. In such a way, we can decompose a large-scale optimization problem into multiple smaller suboptimization problems, and solve the original problem efficiently.
Equation (3) will be decomposed into two parts when we deal with the L0 regularized sparse inverse covariance estimation.
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
f ( \Phi ) & = - \log \det \Phi + tr ( S \Phi ) , \tag{5}\\
g ( \Theta , \Omega ) & = \frac { \rho } { 2 }
\mathop\sum\limits_ { i , j } \frac { \theta_ { ij } ^2 } {
\omega_ { ij } ^2 } = \frac { \rho } { 2 } \parallel \frac {
\Theta_ { \circ } } { \Omega }
\parallel_F^2 \tag{6}
\end{align*}
\end{document}
where ∘ indicates element-wise division, and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Phi \succ 0$$
\end{document}. Both f (Φ) and g(Θ, Ω) are convex functions, when Ω is fixed. Given Ω = Φ in Equation (4), the ADMM form of Equation (3) becomes
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
\min_{ \Phi , \Theta , \succ 0} f ( \Phi ) + g ( \Theta , \Omega )
, \qquad subject \ to \ \Theta = \Phi. \tag{7}\end{align*}
\end{document}
where Lagrange multiplier U is the dual variable, and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$ \frac { 1 } { 2 } \parallel \Phi - \Theta \parallel_2^2$$
\end{document} is the augmented quadratic term for the robustness of the optimization. Without the quadratic term, LΩ becomes a standard Lagrangian. The augmented Lagrangian and standard Lagrangian are equivalent, since for any feasible Φ, the quadratic term in the augmented Lagrangian is zero. Dropping the \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$ \frac { 1 } { 2 } \parallel U \parallel_F^2$$
\end{document} in Equation (8), and combining Equation (8) and Equation (3), we have the following optimization problem:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}\min_ { \Phi , \Theta , \Omega \succ 0 } f ( \Phi )
+ g ( \Theta , \Omega ) + \frac { 1 } { 2 } \parallel \Phi -
\Theta + U \parallel_F^2 , \qquad subject \ to \ \Omega = \Phi.
\tag { 9 } \end{align*}
\end{document}
L0RICE algorithm under the ADMM framework for Equation (9) is as follows:
The proof of the convergence and convergence rate of the general ADMM algorithm can be found in Eckstein and Bertsekas (1992) and the appendix of Goldfarb (2013).
2.1.1. Algorithm for Φ update
Given Θ, the suboptimization problem for Φ can be found analytically. Let
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
E_ { \Phi } = - \log \det \Phi + tr ( S \Phi ) + \frac { 1 } { 2
} \parallel \Phi - \Theta^k + U^k \parallel_F^2.\end{align*}
\end{document}
The first order derivative
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
\frac { \partial E_ { \Phi } } { \partial \Phi } = - \Phi^ { - 1
} + S + \Phi - \Theta^k + U^k = 0.\end{align*}
\end{document}
We have
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}\Phi - \Phi^{- 1} = \Theta^k - U^k - S. \tag{10}\end{align*}
\end{document}
Based on Equation (10), the algorithm to find the optimal Φ is as follows (Boyd et al., 2011):
1. Eigenvalue decomposition: Θk − Uk − S = PΛPT, where Λ = diag(\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\lambda_1 , \ldots , \lambda_n$$
\end{document}) and PTP = PPT = 1.
2. Form a diagonal matrix D with
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
D_ { ii } = \frac { \lambda_i + \sqrt { \lambda_i^2 + 4 } } { 2 }
.\end{align*}
\end{document}
3. Φk+1 = PDPT.
2.1.2. Algorithm for Θ update
As shown in the L0RICE algorithm, after we update Ω = Φk+1, we need to solve the suboptimal problem for Θ. Let
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
E_ { \Theta } = g ( \Theta , \Phi^ { k + 1 } ) & + \frac { 1 } {
2 } \parallel \Phi^ { k + 1 } - \Theta + U^k \parallel_F^2 = \frac
{ \rho } { 2 } \parallel \frac { \Theta_ { \circ } } { \Phi^ { k
+ 1 } } \parallel_F^2 + \frac { 1 } { 2 } \parallel \Phi^ { k + 1
} - \Theta + U^k \parallel_F^2 , \\ & \frac { \partial E ( \Theta
) } { \partial \Theta } = \frac { \rho \Theta_ { \circ } } { (
\Phi^ { k + 1 } ) ^2_ { \circ } } - ( \Phi^ { k + 1 } - \Theta +
U^k ) = 0 , \end{align*}
\end{document}
where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$( \Phi^{k + 1} ) ^2_{ \circ} = \Phi^{k + 1}_{ \circ}. \Phi^{k + 1}$$
\end{document} involves the element-wise multiplication. Therefore, we have the Θ update:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
\Theta^ { k + 1 } = \frac { ( \Phi^ { k + 1 } ) ^2_ { \circ } } {
\rho + ( \Phi^ { k + 1 } ) ^2_ { \circ } } \left( \Phi^ { k + 1 }
+ U^ { k } \right) . \tag { 11 } \end{align*}
\end{document}
The L0 regularization is enforced at the Θ optimization step, where different penalties are imposed with different \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\Phi_{ij}^{k + 1}$$
\end{document} value. Since the the regularization term is \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$ \frac { \rho } { ( \Phi^ { k + 1 } ) _ { ij } ^2 } $$
\end{document}, smaller Φij has a larger penalty, and the penalty goes to ∞ when Φij goes to zero, so Θij becomes zero, as shown in Equation (11) (Liu and Li, 2014).
2.1.3. Stopping criteria and ρ selection
The algorithm will stop when both ||Uk+1 −Uk|| F = ||Φk+1 − Θk+1||F≤ ɛ and ||Θk+1 − Θk|| F ≤ ɛ, where ||Uk+1 − Uk|| F controls the convergence of the dual variable U, while ||Θk+1 − Θk|| F measures the convergence of primal variable Θ and Φ. The tuning parameter ρ acts as a cutoff point in Equation (11). When \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\phi_{ij}^2 < \rho , \theta_{ij}$$
\end{document} becomes smaller, but θij will increase and θij → Φij when \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\phi_{ij}^2 > \rho$$
\end{document}. Unlike L1, which reduces all estimated parameters by ρ, L0 regularization shrinks the small parameters to zero, while keeping the large parameters unaltered. The larger the ρ, the sparser the inverse matrix. Stability selection (Liu et al., 2010; Meinshausen, 2010) is used for determining ρ. It seeks ρ leading to the most stable sets of edges. Given a data X, stability selection first draws q sub samples \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \bf x}_1 , { \bf x}_2 , \ldots , { \bf x}_q$$
\end{document} of size l (1 < l < m), where we set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$l = \frac { 2 } { 3 } m$$
\end{document} in this article. We then estimate one separate network Ai(ρ) for each sub-sample Xi and a fixed regularization parameter ρ. Stability selection then defines the average fraction of disagreements over all edges of the subsampled graphs as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
D ( \rho ) = \frac { \sum_ { j < k } \bar { \phi } _ { jk } ( \rho
) ( 1 - \bar { \phi } _ { jk } ( \rho ) ) } { ( ^q_2 ) } , \quad
{ \rm where } \quad \bar { \phi } _ { jk } = \frac { 1 } { q }
\mathop\sum\limits_ { i = 1 } ^q \phi_ { jk } ^i ( \rho )
.\end{align*}
\end{document}
The optimal \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\hat{ \rho}$$
\end{document} is then chosen as:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
\hat{ \rho} = \min \{ \rho: \max_{0 < t < \rho}D ( t ) \le
\alpha \} , \qquad { \rm where} \ \alpha = 0.05.\end{align*}
\end{document}
The final network is constructed using the whole data X and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\hat{\rho}$$
\end{document}. In addition, when 0 ≤|Φij|≤ 1, users can also pick their own ρ based on their problem of study. Our suggestion is to choose a ρ in the range of 0.0025 ≤ ρ ≤ 0.1, corresponding to 0.05 ≤ |Φij| ≤ 0.32.
3. Computational Results
3.1 Simulations
3.1.1. Moderate-size networks
Data were simulated from two popular network models similar to Zhang and Mallick (2013), including a small world and a band 4 network. The covariance matrix Σ for the small world network is a Gaussian AR(1) process with element (covariance for i and j) σij = 0.7|i-j|. It corresponds to a small world network with bandwidth 1 (brand 1), where each node only connects to its direct neighbors. The element for the band 4 network Φ is defined as Φij = 2I(|i − j| = 0) −0.5I(|i − j| = 1) −0.8I(|i − j| = 2) + 0.2I(|i − j| = 3) + 0.3I(|i − j| = 4), where I(.) is an indicator function, so the covariance matrix Σ = Φ−1 is a Gaussian AR(4) process. Note that roughly half of the edges had the partial correlation coefficient of 0.1 or 0.15 in Φ. Therefore, it is harder to detect the structure of the band 4 than the small world network. The mean \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mu^T = [ 2 , \ldots , 2 ]$$
\end{document}, and the normal data is generated with X ∼ N(μ, Σ). The sample size of the data is n = 100, with different dimensions m = 100, 200, and 500. The optimal ρ is determined by stable selection. Stability selection seeks the ρ leading to the most stable set of edges through subsampling. The simulation is performed 100 times, and the performance of L0RICE is compared with the L1RICE program in C downloaded online (Liu et al., 2009). The performance is measured by the area under the ROC curve (AUC), the false positive rate (FPR), and the false negative rate (FNR). FPR is defined as the rate the models predict an edge when there is no edge, while FNR is the rate of predicting no edge when there is an edge on the network.
The computational results on Table 1 demonstrate that L0RICE performs better than L1RICE across most categories and different network sizes, especially with the more difficult band 4 network. While both L1 and L0 perform very well and similarly for band 1 network with large off-diagonal values, L0RICE performs substantially better with the more difficult band 4 network, particularly in controlling the FPR. L0RICE achieves an AUC of 0.789, 0.784, and 0.769, while L1RICE only reaches an AUC of 0.73, 0.738, and 0.735 with network sizes of 100, 200, and 500, respectively. In addition, L0 has smaller FPR and FNR values compared to L1RICE for the same network size. The FPRs with L0RICE are substantially smaller than L1RICE: the FPRs for L0 are 8.13%, 7.06%, and 5.33%, while the FPRs for L1 are 13.3%, 11.1%, and 10.3% with network sizes of m = 100, 200, and 500, respectively.
Performance Comparison over 100 Simulations and Different Sample Size (m = 100, 200, 500), Where Values in the Parenthesis are the Standard Deviations
Band 1
Band 4
L0RICE
AUC
FPR (%)
FNR (%)
AUC
FPR (%)
FNR (%)
m = 100
0.999 (±0.002)
0.02 (±0.12)
0.21 (±0.48)
0.789 (±0.008)
8.13 (±0.57)
34.0 (±1.40)
200
0.998 (±0.002)
0.076 (±0.20)
0.28 (±0.42)
0.784 (±0.008)
7.06 (±1.38)
36.1 (±1.05)
500
0.997 (±0.002)
0.062 (±0.05)
0.65 (±0.38)
0.769 (±0.003)
5.33 (±0.073)
40.9 (±0.62)
L1RICE
AUC
FPR (%)
FNR (%)
AUC
FPR (%)
FNR (%)
m = 100
0.99 (±0.003)
1.52 (±0.22)
0.33 (±0.67)
0.73 (±0.006)
13.3 (±0.45)
41.3 (±1.19)
200
0.99 (±0.003)
1.21 (±0.07)
0.45 (±0.53)
0.738 (±0.004)
11.1 (±0.23)
41.3 (±0.94)
500
0.99 (±0.002)
1.08 (±0.09)
0.81 (±0.44)
0.735 (±0.003)
10.3 (±0.14)
42.1 (±0.52)
3.1.2. Large-size networks
Another set of simulations with a larger number of nodes (m = 1000, 1500, 2000, and 3000) and various sample sizes (n = 100, 200, 500) is performed with the range-dependent network model, motivated by the structure of protein–protein interaction (PPI) networks in biology (Higham and Przulj, 2011). In this model, the links between nodes i and j are inserted with probability 0.9 × 0.2|i−j|−1, then the Φ is defined as the Laplacian matrix (Belkin and Niyogi, 2003). A small value 0.1 is added to the diagonal to make sure that Φ is positive definite. The covariance matrix is Σ = Φ−1. Depending on ρ and network size, the computational time for L0RICE (MATLAB) is 3–12 min, which is slightly less efficient than graphical lasso in C (35–3 min). However, graphical lasso is less stable than L0RICE. It does not always converge particularly for large-scale networks. The true and estimated Φs are then dichotomized into adjacency matrices with 0/1 entries, where 0 indicates no connection and 1 represents a nonzero correlation. The performance is then evaluated by comparing the estimated and the true structures of adjacency matrices with ROC curves in Figure 1. The optimal ρs for L0RICE are 0.05, 0.05, 0.04, and 0.03 and for L1RICE are 0.25, 0.2, 0.15, and 0.14 for the sample sizes of 1000, 1500, 2000, and 3000, respectively.
Performance comparison of L0RICE and L1RICE with different sample sizes and numbers of nodes.
Figure 1 shows that L0RICE outperformed L1RICE under different sample sizes and numbers of nodes (variables). With n = 100 and m = 1000, the AUC of L0RICE is 0.928, while it is 0.906 for L1RICE. In addition, the AUCs for n = 200 and m = 1500 are 0.932 and 0.875 for L0 and L1, respectively. Finally, the AUCs of L0RICE are 0.970 and 0.969 for m = 2000, and 3000, respectively, while the AUCs of L1RICE are 0.912 and 0.905 with the same number of nodes, indicating both L0RICE and L1RICE are robust and their performance does not deteriorate much when the number of nodes increases from 2000 to 3000.
3.1.3. Simulations with count data
Next generation sequencing (NGS)–based omics data are positive counts. They are not normally distributed and the variance is associated with their mean count. One approach to deal with such data is to compute the distribution-free nonparametric correlations for different data types, including Kendall's τ-rank correlation and Spearman's rank-order correlation (r) (Liu et al., 2012). Those correlations are only based on rank information and are more robust than the parametric approach. They can be applied to any data type. After we have τ and r, we can construct the covariance (similarity) matrix S = (sij) = (sin(πr/2)), or = (sin(πτ/2)). To simulate data that mimics actual NGS data, we generated count data from network structures similar to Liu et al. (2014). The count xij has the Poisson distribution Pois(τij) with mean τij, where {log(τij),i < j} has the normal distribution N(μ, Σ) with mean μ and covariance matrix Σ. The network structure is represented in Σ and the sparse inverse matrix Φ = Σ−1. We simulate the more difficult band 4 network with Φ defined the same way as in previous simulation. The generated count data have sample size 100 and three different network sizes: 100, 200, and 500. The partial correlations for some nodes can be small in such networks, so it is a difficult problem for structure detection. The tuning parameter ρ is again determined with a stability selection approach. The pairwise correlation is determined with Spearman's rank correlation. The computational results with L0RICE and L1RICE are reported in Table 2.
Performance Comparison over 100 Simulations and Different Sample Sizes (m = 100, 200, 500) with Simulated Count Data, Where Values in Parentheses Are the Standard Deviations
L0RICE
L1RICE
L0RICE
AUC
FPR (%)
FNR (%)
AUC
FPR (%)
FNR (%)
m = 100
0.742 (±0.009)
5.76 (±0.79)
46.1 (±1.73)
0.714 (±0.006)
10.40 (±0.38)
47.0 (±1.12)
200
0.743 (±0.006)
5.99 (±0.513)
45.4 (±0.513)
0.728 (±0.004)
7.49 (±0.29)
47.3 (±0.812)
500
0.753 (±0.003)
4.16 (±0.053)
45.1 (±0.531)
0.726 (±0.008)
10.9 (±0.525)
46.2 (±1.24)
Both L0RICE and L1RICE still perform reasonably well, and L0RICE outperforms L1RICE regardless of the number of nodes. In particular, the false positive rates (FPR) are 5.76%, 5.99%, and 4.16% with L0RICE, while the FPRs with L1RICE are 10.40%, 7.49%, and 10.9% for node sizes of 100, 200, and 500, respectively, indicating L0RICE has lower false positive rates and clearly outperforms L1RICE substantially.
3.2. Analysis of a metagenomic dataset
The metagenomic dataset was collected by our group (Tong et al., 2013). There are in total 299 samples from 76 inflammatory bowel disease (IBD) subjects, including Crohn's disease (CD), and 223 non-IBD subjects. Out of the 299 samples, 285 (72 IBDs and 213 controls) have metagenomic count data with a total of 5648 OTUs available. We merged OTUs at the species level, and then species with high abundances were selected for further study after discarding those species with less than two reads on average. This leads to 173 remaining features. Data were then normalized by the total count of each individual. The class y is a binary vector with yk = 0 for non-IBD, and yk = 1 for IBD samples. Because of the compositional nature of metagenomic data, Pearson correlation is not appropriate (Kurtz et al., 2015). Therefore, nonparametric Spearman's rank correlation (r) is used to measure the pairwise associations among species and between disease status and each species. The partial correlation is then estimated from the inverse of S = (sij) = (sin(πr/2)). The optimal tuning parameter is ρ* = 0.1 by stability selection. The network for interactions among species and direct and indirect associations between disease status (IBD/nonIBD) and species are presented in Figure 2.
Network for high-order interactions among species and direct and indirect interactions between IBD status and species: red, negative association; black, positive association.
Figure 2 shows that IBD is negatively (in red) associated with Faecalibacteriumprausnitzii and positively associated with Dialisterinvisus. Through Faecalibacteriumprausnitzii, IBD also has indirect negative correlations with Coprococcuseutactus, Eubacteriumrectale, and Ruminococcusobeum, and an indirect positive association with Ruminococcusgnavus. While Dialisterinvisus is less studied in IBD, Faecalibacteriumprausnitzii is a well-known beneficial human commensal anti-inflammatory bacterium (Laval et al., 2015; Cao et al., 2014; Stoll et al., 2014). The bacterial count of Faecalibacterium prausnitzii in IBD patients is 348, which is significantly lower than that in control (555), indicating the abundance of Faecalibacterium prausnitzii was decreased in IBD patients compared with healthy controls. In addition, the impacts of Faecalibacteriumprausnitzii on IL-23/Th17/IL-17 pathway markers have been determined in human monocytes and a rat model of colitis. It has been shown that Faecalibacteriumprausnitzii increases plasma anti-Th17 cytokines (IL-10 and IL-12) and suppresses IL-17 levels in both plasma and colonic mucosa, and protects the colon mucosa against the development of IBD by its metabolites, suggesting a promising potential for the use of Faecalibacteriumprausnitzii in the treatment of IBD (Zhang et al., 2014; Varela et al., 2013). Even though there are few studies looking into the interactions of multiple species, the direct and indirect interactions between Faecalibacteriumprausnitzii and other species on the network must be studied to explore their effects on IBD systematically.
3.3. microRNA-mRNA network with TCGA NGS data
The standard primary treatment for ovarian cancer is surgery followed by chemotherapy. The rationale for surgery as the primary treatment is to remove as much tumor as possible. If tumor nodules have invaded vital organs, surgeons may not be able to remove them without compromising the patient's life. It has been shown that leaving tumor nodules larger than 1 cm (defined as suboptimal debulking) is associated with reduced chemosensitivity and poor survival. If the tumor cannot be effectively removed by surgery, it will be better for a patient to be first treated with chemotherapy to partially shrink the tumor and then have surgery. However, the doctor does not know if there is a suboptimal debulking until he performs the surgery. Biomarkers may help physicians decide which patients should undergo surgery and who should be treated with chemotherapy first. We are trying to identify mRNA and microRNA markers with integrated analysis of microRNA and mRNA expression from RNA-seq data in TCGA with 367 samples, 13911 genes, and 342 microRNAs. There are 294 suboptimal and 73 optimal debulking samples. The data was normalized with log 2 transformation and quantile normalization, and then 18 microRNAs and 528 genes were identified using t-tests with P < 0.01. The pairwise correlations between microRNA and mRNA and among mRNA genes were then calculated with Spearman's rank correlation. The inverse of the pairwise correlation matrix represents the partial correlations between microRNAs and their target mRNAs. The optimal ρ is set to 0.04 with stability selection. We also incorporate the microRNA target prediction information into the analysis. Two databases, TargetScan and miRBase, were used to validate the true targets. Only the microRNA–mRNA pairs presented in both our network and the target databases were selected as the edges of the final network. Subnetwork associated with suboptimal debulking in ovarian cancer is shown in Figure 3.
Suboptimal debulking associated subnetwork for microRNA and mRNA interactions with TCGA ovarian cancer NGS data.
Figure 3 represents suboptimal debulking-associated interactions between mcroRNAs and mRNAs. Fourteen (14) mRNA genes (in purple) on the subnetwork are connected with each other and are the targets of six microRNAs. Those 14 mRNAs are on the ECM-receptor interaction and focal adhesion KEGG pathways with the FDR q values of 0.007 and 0.05, respectively. Enrichment analysis with STRING demonstrates that those 14 mRNAs involve several biological processes including extracellular matrix organization (q < 1e − 5) and skeletal system development (q < 1e − 4). Five of 14 mRNA genes including VCAN, COL11A1, COL5A2, POSTN, and THBS2 also appeared in the 10 gene signatures associated with metastasis, recurrence, and poor survival of ovarian cancer (Cheon and Orsulic, 2014). Six microRNAs regulated the 14 mRNAs and related pathways including miR-202, miR-493, miR-605, miR-758, miR-377, and miR-22. Among the six microRNAs, the expressions of miR-202, miR-22, and miR-493 are known to be associated with ovarian cancer, while miR-605, miR-758m, and miR-377 are associated with other types of cancers (Wan et al., 2014; Gadducci et al., 2014). The identified microTRNAs and mRNAs provide candidate biomarkers for further exploring their associations with suboptimal debulking.
4. Conclusions and Remarks
We propose a L0-based sparse inverse covariance estimation for network structure learning with the efficient alternating direction method of multipliers (ADMM), and extended the method to various data types with the nonparametric rank correlation coefficients. The developed method approximates L0 through solving a series of L2 convex optimizations. It is very efficient, even when the dimension of the matrix is large. The computational time is similar between L1 and L0. We evaluated the performance of the method with simulations and multiomics data. The proposed method outperformed L1 sparse inverse covariance estimation in almost all of the simulations. Application to metagenomic count data identified IBD associated species efficiently. When applied to multiomics data integration, biologically important genes and subnetworks correlated with optimal debulking were selected. The method learns the high-order correlation structure with lower FPR.
Footnotes
Acknowledgments
This work was partially supported by the National Science Foundation (NSF) grants DMS-222381 (Z.L.) and DMS-1220772 (S.L.). The funders had no role in the preparation of the article.
Author Disclosure Statement
The authors declare that no competing financial interests exist.
References
1.
BanerjeeO., GhaouiL.E., and d'AspremontA.2008. Model selection through sparse maximum likelihood estimation. J. Mach. Learn. Res., 9, 485176.
2.
BelkinM., and NiyogiP.2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396.
3.
BoydS., ParikhH., ChuE., et al.2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3, 1172.
4.
CaoY., ShenJ., and RanZ.H.2014. Association between Faecalibacterium prausnitzii reduction and inammatory bowel disease: A meta-analysis and systematic review of the literature. Gastroenterol. Res. Pract., 2014, 872725.
5.
ChaibubN., KellerM., AttieA., and YandellB.2010. Causal graphical models in systems genetics: A unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Ann. Appl. Stat., 4, 320179.
6.
ChenD.-Q., and ChengL.-Z.2014. Fast linearized alternating direction minimization algorithm with adaptive parameter selection for multiplicative noise removal. J. Comput. Appl. Math. Arch., 257, 29–45.
7.
CheonD.J., and OrsulicS.2014. Ten-gene biomarker panel: A new hope for ovarian cancer?. Biomark Med. 8, 523–526.
8.
EcksteinJ., and BertsekasD.P.1992. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathe. Program., 55, 293–318.
9.
FanJ., and LiuH.2013. Statistical analysis of big data on pharmacogenomics. Adv. Drug Deliv. Rev., 65, 987–1000.
10.
FriedmanJ., HastieT., and TibshiraniR.2008. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9, 432–441.
11.
GadducciA., SergiampietriC., LanfrediniN., and GuiggiI.2014. Micro-RNAs and ovarian cancer: The state of art and perspectives of clinical research. Gynecol. Endocrinol., 30, 266–271.
12.
GoldfarbD., MaS., and ScheinbergK.2013. Fast alternating linearization methods for minimizing the sum of two convex functions. Mathe. Program., 141, 349–382.
13.
GoldsteinT., ÓDonoghueB., SetzerS., and BaraniukR.2014. Fast alternating direction optimization methods. SIAM J. Imag. Sci., 7, 1588–1623.
14.
HestenesM.R.1969. Multiplier and gradient methods. J. Optim. Theory Applica., 4, 302–320.
15.
HighamD.J., and PrzuljN.2011. Random graph models and their application to proteinprotein interaction networks. In BaldingD., GirolamiM., and StumpfM., eds. Handbook of Statistical Systems Biology. Wiley, New York.
16.
HsiehC.-J., SustikM., DhillonI., and RavikumarP.2014. QUIC: Quadratic approximation for sparse inverse covariance estimation. J. Mach. Learn. Res., 15, 2911–2947.
17.
JohnsonC.C., JalaliA., and RavikumarP.D.2012. High-dimensional sparse inverse covariance estimation using greedy methods. JMLR W&CP, 22, 574–582.
18.
KiiveriH.T.2011. Multivariate analysis of microarray data: Differential expression and differential connection. BMC Bioinform. 12, 42.
19.
KimK., JiangK., TengS.M., et al.2012. Using biologically interrelated experiments to identify pathway genes in Arabidopsis. Bioinformatics, 28, 815–822.
20.
KurtzZ.D., MüllerC.L., MiraldiE.R., et al.2015. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput. Biol. 11, e1004226.
21.
LavalL., MartinR., NatividadJ., et al.2015. Lactobacillus rhamnosus CNCM I-3690 and the commensal bacterium Faecalibacterium prausnitzii A2-165 exhibit similar protective effects to induced barrier hyper-permeability in mice. Gut Microbes, 6, 1–9.
LiuZ., and LiG.2014. Efficient regularized regression for variable selection with L0 penalty. http://arxiv.org/abs/1407.7508
27.
LiuZ., SunF., BraunJ., et al.2014. Multilevel regularized regression for simultaneous taxa selection and network construction with metagenomic count data. Bioinformatics. 31, 1067–1074.
28.
MeinshausenN., and BählmannP.2010. Stability selection. J. R. Stat. Soc. Ser. B Stat. Methodol., 72, 417–473.
29.
OuyangH., HeN., TranL., and GrayA.2013. Stochastic alternating direction method of multipliers. JMLR W&CP, 28, 80.
30.
ScheinbergK., MaS., and GoldfarbD.2010. Sparse inverse covariance selection via alternating linearization methods, 2101–2109. In Advances in Neural Information Processing Systems, Vol. 23. MIT Press, Red Hook, NY.
31.
StollM.L., KumarR., MorrowC.D., et al.2014. Altered microbiota associated with abnormal humoral immune responses to 16 commensal organisms in enthesitis-related arthritis. Arthritis Res. Ther., 16, 486.
32.
TongM., et al.2013. A modular organization of the human intestinal mucosal microbiota and its association with inammatory bowel disease. PLoS One, 8, e80702.
33.
VarelaE., ManichanhC., GallartM., et al.2013. Colonisation by Faecalibacterium prausnitzii and maintenance of clinical remission in patients with ulcerative colitis. Aliment Pharmacol. Ther., 38, 151–161.
34.
WanW.N., ZhangY.Q., WangX.M., et al.2014. Down-regulated miR-22 as predictive biomarkers for prognosis of epithelial ovarian cancer. Diagn. Pathol., 9, 178.
35.
YuanM., and LinY.2007. Model selection and estimation in the Gaussian graphical model. Biometrika, 94, 19.
36.
YuanX.M.2012. Alternating direction method of multipliers for covariance selection models. J. Sci. Comput., 51, 261–273.
37.
ZhangL., and MallickB.K.2013. Inferring gene networks from discrete expression data. Biostatistics. 14, 708–722.
38.
ZhangM., QiuX., ZhangH., et al.2014. Faecalibacterium prausnitzii inhibits interleukin-17 to ameliorate colorectal colitis in rats. PLoS One, 9, e109146.
39.
ZouH.2006. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc., 101, 1418–1429.