Context-Specific Nested Effects Models

Abstract

Advances in systems biology have made clear the importance of network models for capturing knowledge about complex relationships in gene regulation, metabolism, and cellular signaling. A common approach to uncovering biological networks involves performing perturbations on elements of the network, such as gene knockdown experiments, and measuring how the perturbation affects some reporter of the process under study. In this article, we develop context-specific nested effects models (CSNEMs), an approach to inferring such networks that generalizes nested effects models (NEMs). The main contribution of this work is that CSNEMs explicitly model the participation of a gene in multiple contexts, meaning that a gene can appear in multiple places in the network. Biologically, the representation of regulators in multiple contexts may indicate that these regulators have distinct roles in different cellular compartments or cell cycle phases. We present an evaluation of the method on simulated data as well as on data from a study of the sodium chloride stress response in Saccharomyces cerevisiae.

1. Introduction

Cellular processes such as gene regulation, metabolism, and signaling form a complex interplay of molecular interactions. A primary means of uncovering the details of these processes is the analysis of measured responses of cells to perturbation experiments. We present context-specific nested effects models (CSNEMs), which are graphical models for analyzing screens of high-dimensional phenotypes from gene perturbations. In this setting, the perturbation consists of knocking out, knocking down, or otherwise disabling the activity of a gene, via the use of deletion mutants, RNA interference, CRISPR/Cas9, or other techniques. The high-dimensional phenotype may be a transcriptomic, proteomic, metabolomic, or similar multidimensional profile of measurements. Such profiles provide indirect information about the pathways that connect the gene that is perturbed in an experiment to the effects observed in a phenotype. This poses a challenge for determining functional relationships since the precise mechanisms by which the perturbation relates to the phenotype must be inferred using computational and statistical methods, expert knowledge, or a combination of both.

Related work on inferring networks from gene expression data includes methods based on statistical dependencies between expression measurements (Friedman et al., 2000; Irrthum et al., 2010), which are used to construct networks of probable interactions between the genes measured in the expression profile. Other work on using phenotypic data uses clustering of phenotypic profiles, or the similarity between profiles, to construct networks among the perturbation genes (Piano et al., 2002; Ohya et al., 2005). The rationale behind these approaches is that genes that produce similar phenotypes when perturbed are likely to be functionally related (Markowetz and Spang, 2007).

The CSNEM approach is a generalization of the nested effects model (NEM) (Markowetz et al., 2005). In the NEM approach, a network structure among the perturbed elements of the cell is inferred from the nested structure of phenotypic profiles. The general idea is that perturbation of a gene that is further upstream in a signaling pathway would affect more elements than perturbation of a gene further downstream. For example, Figure 1A shows an NEM in which Hog1 is upstream of Cka2. The table underneath the graph represents the differential expressions of the high-dimensional phenotypes observed in the screen, with rows corresponding to single-gene knockouts and each column corresponding to an effect: one dimension of a phenotype, such as a particular transcript in a transcriptomic phenotype. In the table of effect measurements in the figure, a “1” indicates that a perturbation changed the response of the effect, and a “0” indicates that it did not. The deletion of Hog1 would affect e₁, e₂, e₃, and e₄ because they are all downstream of it. The deletion of Cka2, however, would only affect e₃ and e₄. Therefore, the nesting of the effects of the deletion of Cka2 within the effects of the deletion of Hog1 places the former downstream of the latter.

FIG. 1.

(A) An example of effect nesting in an NEM and (B) a partial intersection of effects as captured by a CSNEM. The table underneath each graph represents the differential expressions of the high-dimensional phenotypes observed in the screen, with rows corresponding to single-gene knockouts and each column corresponding to an effect, one dimension of a phenotype, where a “1” indicates that a perturbation changed the response of the effect, and a “0” indicates that it did not. CSNEM, context-specific nested effects model; NEM, nested effects model.

Such nesting of effects, however, does not always occur. The protein product of a gene may interact with those of other genes in a multitude of ways, and one might imagine a situation where two genes interact with each other upstream of a subset of the effects, but additionally have other roles independent of each other. This is the case in Figure 1B, where Cka2 and Hog1 interact as before upstream of effects e₁, e₂, e₃, and e₄, but Cka2 additionally affects e₅ and e₆ independent of Hog1. In such a case, the phenotype induced by the perturbation of each gene includes effects downstream of the common pathway, but each perturbation also shows unique effects; rather than being nested, the effects show a partial intersection. The example in Figure 1 is based on a pattern we identified in our application of CSNEM learning to experiments studying sodium chloride (NaCl) stress response in Saccharomyces cerevisiae.
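The distinction between nesting and partial intersection can be illustrated with a minimal Python sketch. The effect sets follow the description of Figure 1 in the text; the function name `relation` and its return labels are hypothetical and serve only as illustration.

```python
def relation(effects_x, effects_y):
    """Classify how the effect sets of two perturbations relate."""
    x, y = set(effects_x), set(effects_y)
    if x == y:
        return "identical"
    if y < x:
        return "y nested in x"
    if x < y:
        return "x nested in y"
    return "partial intersection" if x & y else "disjoint"


# Effects changed by each deletion, as described for Figure 1.
hog1 = {"e1", "e2", "e3", "e4"}
cka2_fig1a = {"e3", "e4"}
cka2_fig1b = {"e3", "e4", "e5", "e6"}

nested = relation(hog1, cka2_fig1a)    # pattern an NEM can represent
partial = relation(hog1, cka2_fig1b)   # pattern that motivates a CSNEM
```

In the first case Cka2 can be placed downstream of Hog1; in the second, no single ordering of the two genes explains both profiles.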

In the CSNEM approach, we address this issue by explicitly considering the possibility that one gene may have multiple contexts of interaction. The model can be equivalently viewed either as a single graph model, where multiple nodes may represent multiple roles of the same gene, or as a mixture of multiple NEMs, where each NEM describes a different subset of the effects. Notably, mixtures of NEMs have been used for analyzing single-cell expression data (Siebourg-Polster et al., 2015). In that work, the mixture is used to account for variation of gene activation states across different cells.

In contrast, in a CSNEM, the mixture represents different patterns of interaction among the same sets of genes across different subsets of the measured effects. The effect pattern in Figure 1B can alternatively be accounted for by the introduction of a hidden node downstream of both Hog1 and Cka2, an approach explored by Sadeh et al. (2013), where they introduce a statistical test to infer a partially resolved NEM. In fact, Sadeh et al. showed that the presence of a hidden node downstream of a pair of genes is consistent with every possible configuration of effect responses. Their method aims to characterize all possible NEM models that are consistent with the data, and as a result, it never rejects the possibility of a hidden node existing downstream of any pair of genes. In contrast, in our approach, we aim to find a single parsimonious network model that optimally fits the data. We show how to cast the problem of learning a CSNEM as a modified version of NEM learning, evaluate the ability of this approach to recover a ground-truth network on simulated data, and present an application to the salt stress pathway in yeast.

2. Background: Nested Effects Models

Tresch and Markowetz (2008) formulated NEMs as a special case of effects models. In an effects model, there is a set of actions $A$ , and a set of effects $ℰ$ , and we wish to model which effects change in response to each action. In earlier work on NEMs (Markowetz et al., 2005), the actions and effects are, respectively, referred to as S-genes (S for signaling) and E-genes (E for effects). The actions correspond to perturbation experiments, whereas the effects correspond to the high-dimensional phenotype measured in the experiment. A general effects model can be represented by a binary matrix F, where $F_{a e} = 1$ if action a leads to a response (or change) in effect e, and 0 otherwise.

Let $n_{A}$ and $n_{ε}$ represent the number of actions and effects, respectively. An NEM is made up of a directed graph G, the nodes of which are the actions $A$ , and an $n_{A} \times n_{ε}$ binary matrix $Θ$ of attachments, in which $Θ_{a e} = 1$ if effect e is attached to action a, and 0 otherwise. A modeling constraint is that each effect is attached to at most one action.

The NEM is interpreted as follows: action a causes a response in effect e if and only if either e is attached directly to a, or there is a directed path in G from a to the action to which e is attached. Mathematically, this can be formulated in terms of matrix multiplication. Since what matters is which actions are reachable from other actions in G, we can work with $Γ$ , the $n_{A} \times n_{A}$ accessibility matrix of G. $Γ_{i j}$ is 1 if i = j or there is a directed path from i to j in G, and 0 otherwise. Using $Γ$ , we can express the effects matrix F of an NEM as $F = Γ Θ$ . Figure 2 shows an example of a graph G, its corresponding matrix $Γ$ , an attachment matrix $Θ$ , and the resultant effect matrix F.

FIG. 2.

(A) A graph G, (B) the accessibility matrix $Γ$ , (C) an attachment matrix $Θ$ , and (D) the resultant effect matrix $F = Γ Θ$ .
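As a minimal sketch of the construction $F = Γ Θ$ , the following Python computes the accessibility matrix of a small graph and the resulting effect matrix. The three-action chain, the attachments, and the function names are hypothetical; they are not the example shown in Figure 2.

```python
def accessibility(adj):
    """Accessibility matrix of a 0/1 adjacency matrix (Warshall's algorithm).

    Entry [i][j] is 1 if i == j or there is a directed path from i to j.
    """
    n = len(adj)
    gamma = [[1 if i == j or adj[i][j] else 0 for j in range(n)] for i in range(n)]
    for m in range(n):  # allow paths that pass through node m
        for i in range(n):
            for j in range(n):
                if gamma[i][m] and gamma[m][j]:
                    gamma[i][j] = 1
    return gamma


def effect_matrix(gamma, theta):
    """F = Gamma Theta. Each effect is attached to at most one action,
    so every entry of the product is already 0 or 1."""
    n_a, n_e = len(theta), len(theta[0])
    return [[sum(gamma[a][b] * theta[b][e] for b in range(n_a))
             for e in range(n_e)] for a in range(n_a)]


# Hypothetical chain of actions 0 -> 1 -> 2 and four effects.
adj = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]
theta = [[1, 0, 0, 0],   # effect 0 attached to action 0
         [0, 1, 0, 0],   # effect 1 attached to action 1
         [0, 0, 1, 0]]   # effect 2 attached to action 2; effect 3 unattached
gamma = accessibility(adj)
F = effect_matrix(gamma, theta)
```

Each row of `F` contains the rows below it, which is the nesting that gives the model its name.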

2.1. Likelihood computation

The problem of inferring an NEM from a data set D can be viewed as that of maximizing a likelihood. In this section, we review how the likelihood of an NEM is framed to illustrate how the likelihood of a CSNEM relates to it.

Supposing that we have some data consisting of measurements of the observable effects subjected to each action included in the model, and assuming data independence, for a general effects model, the log-likelihood of the model is $log L (F) = log ℙ (D | F) = \sum_{(a, e) \in A \times ℰ} log ℙ (D_{a e} | F_{a e}),$ (1)

where $ℙ (D_{a e} | F_{a e})$ is the probability of the data observed for effect e under action a, given that $F_{a e}$ indicates whether we expect e to respond to a. When the observed phenotype is, for example, gene expression data, a typical indicator of a response in effect e is differential expression of effect e between the experimental condition a and a control, such as a wild-type (WT) phenotype.

Let $R \in ℝ^{n_{ε} \times n_{A}}$ be a matrix of log-likelihood ratios such that $R_{e a} = \log \frac{ℙ (D_{a e} | F_{a e} = 1)}{ℙ (D_{a e} | F_{a e} = 0)}$ , and let N represent the null model predicting no effect response to any action. Tresch and Markowetz (2008) showed that the log-likelihood of an effects model F is then $\log L (F) = \mathrm{tr} (F R) + \underbrace{\log L (N)}_{\text{constant w.r.t. data}},$ (2)

where $\mathrm{tr} (\cdot)$ is the trace of a matrix. The above holds for any effects model in general. Since $F = Γ Θ$ in an NEM, maximizing the likelihood of an NEM amounts to maximizing $\mathrm{tr} (Γ Θ R)$ .
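The trace term has a simple reading: it sums the log-likelihood ratios over exactly those action-effect cells in which the model predicts a response. The sketch below checks this on a small example; the matrices and function names are hypothetical.

```python
def matmul(X, Y):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def score(F, R):
    """tr(F R) written as a sum of R[e][a] over cells with F[a][e] = 1."""
    return sum(F[a][e] * R[e][a] for a in range(len(F)) for e in range(len(R)))


# Hypothetical model with 2 actions and 3 effects.
F = [[1, 1, 0],
     [0, 1, 0]]
# Hypothetical log-likelihood ratios, indexed [effect][action].
R = [[2.0, -1.0],
     [1.5, 0.5],
     [-3.0, -2.0]]

via_trace = trace(matmul(F, R))   # log L(F) - log L(N)
via_cells = score(F, R)
```

Both computations give 4.0 here, the sum of the three log-likelihood ratios at the cells where `F` is 1.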

Computationally, maximizing this expression is difficult because it is a search over a discrete but exponentially large space of all possible $Γ$ and $Θ$ matrices. Early work on NEMs reduces some of the complexity of this search by observing that, since $Θ$ has at most one nonzero entry for each effect across all actions by construction, and since $\mathrm{tr} (Γ Θ R) = \mathrm{tr} (R Γ Θ)$ , one can marginalize over all possible values of $Θ$ , assuming that they are equally likely a priori, yielding a marginal likelihood proportional to $\prod_{e \in ℰ} \sum_{a \in A} \exp ({(R Γ)}_{e a})$ . This reduces the task to the search for a $Γ$ that maximizes this marginal likelihood; an exhaustive search is feasible for $n_{A} \leq 5$ (Markowetz et al., 2005). For larger graphs, however, the problem remains computationally prohibitive, and multiple algorithms for learning NEM structure efficiently have been presented in the literature (Fröhlich et al., 2007; Markowetz et al., 2007), most of which have been implemented in the nem R package (Fröhlich et al., 2008). Other approaches to computing the likelihood have also been explored, such as the factor graph optimization approach by Vaske et al. (2009).
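The marginal likelihood and an exhaustive search over accessibility matrices can be sketched as follows for two actions. This is an illustration under stated assumptions, not the implementation in the nem R package: the data matrix, the candidate list, and the function name `log_marginal` are hypothetical.

```python
import math


def log_marginal(R, gamma):
    """Sum over effects of log sum_a exp((R Gamma)[e][a]).

    R is n_E x n_A (log-likelihood ratios) and gamma is the n_A x n_A
    accessibility matrix. The value equals the log marginal likelihood
    up to an additive constant.
    """
    n_a = len(gamma)
    total = 0.0
    for row in R:
        # (R Gamma)[e][a]: evidence for attaching effect e to action a.
        scores = [sum(row[b] * gamma[b][a] for b in range(n_a)) for a in range(n_a)]
        top = max(scores)  # subtract the maximum for numerical stability
        total += top + math.log(sum(math.exp(s - top) for s in scores))
    return total


# Hypothetical data: effect 0 responds to action 0 only, effect 1 to both.
R = [[3.0, -3.0],
     [3.0, 3.0]]

# All four accessibility matrices on two actions (diagonal fixed at 1).
candidates = {
    "no edge": [[1, 0], [0, 1]],
    "0 -> 1": [[1, 1], [0, 1]],
    "1 -> 0": [[1, 0], [1, 1]],
    "0 <-> 1": [[1, 1], [1, 1]],
}
scores = {name: log_marginal(R, g) for name, g in candidates.items()}
best = max(scores, key=scores.get)
```

Because the effects of action 1 are nested within those of action 0, the structure placing action 0 upstream of action 1 receives the highest score.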

In this work, we show how learning a CSNEM can be cast as a more complex NEM learning problem. To solve the NEM learning problem, we use MC-EMiNEM, a method that does not attempt to optimize a marginal likelihood, as many of the above approaches do, but maximizes the log posterior $log ℙ (Γ, Θ | D) = log L (Γ Θ) + \sum_{(i, j) \in A \times A} log ℙ (Γ_{i, j}) + log ℙ (Θ),$ (3)

where $log ℙ (Γ_{i, j})$ is an edge-wise prior on the structure of the actions graph and $ℙ (Θ)$ is a prior on the attachment matrix. MC-EMiNEM uses Monte Carlo (MC) sampling and expectation maximization within MC steps to search for the $Γ$ and $Θ$ that are optimal with respect to this posterior (Niederberger et al., 2012). MC-EMiNEM is available as a part of the nem R package on Bioconductor (Froehlich et al., 2019).

3. Methods: Context-Specific Nested Effects Models

As briefly mentioned in the introduction, the motivation for developing CSNEMs is that there are cases in which phenotype effects are not nested, as in the example in Figure 1. In CSNEMs, we account for situations such as the partial overlap in Figure 1 by allowing an action in the graph to be represented by more than one node, and we call the different nodes that correspond to the same action different contexts of the action. Mathematically, this enables the model to represent relationships that are not representable by an NEM. Biologically, different contexts in a CSNEM may correspond to participation in different pathways, either due to physical separation, such as localization of molecules, or temporal separation, such as participation in different stages of the cell cycle.

The CSNEM in Figure 1B is presented as a single NEM-like graph with multiple contexts for the Cka2 node. Note that the same diagram can also be viewed as a pair of NEMs: one containing Hog1 and Cka2, which applies to effects e₁, e₂, e₃, and e₄, and another containing only Cka2, which applies to the effects e₅ and e₆. This view of a CSNEM as a mixture of NEMs is most useful in understanding our approach to learning a CSNEM from data.

3.1. The likelihood of a k-CSNEM

We define a k-CSNEM as a mixture of k NEMs, where the response of each effect e is governed by one of k NEMs, each of which can have a different graph G relating the actions $A$ . A k-CSNEM is therefore parameterized by k accessibility matrices $Γ^{1}, \dots, Γ^{k}$ , each of which is $n_{A} \times n_{A}$ and by a vector $θ$ , each coordinate of which takes one of $k n_{A} + 1$ values, specifying attachment to one of the $n_{A}$ actions in one of the k NEMs, or the absence of attachment.

The parameter $θ$ partitions the space of effects by assigning each effect to one of the k NEMs (or to none of them). As a matter of convention, we represent attachment of effect $e \in ℰ$ to an action $a \in A$ in mixture member $i \in \{1, \dots, k\}$ by $θ_{e} = (i - 1) n_{A} + a$ (we slightly abuse notation, treating actions as natural numbers $1, \dots, n_{A}$ here), and let $θ_{e} = 0$ if the effect is not attached to any action in any NEM.
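The indexing convention for $θ$ can be made concrete with a short sketch. The helper names `encode` and `decode` are hypothetical; both use the 1-based numbering of mixture members and actions given above.

```python
def encode(member, action, n_a):
    """theta_e for attachment to `action` in mixture member `member` (both 1-based)."""
    return (member - 1) * n_a + action


def decode(theta_e, n_a):
    """Inverse of encode: (member, action), or None when theta_e = 0 (unattached)."""
    if theta_e == 0:
        return None
    return (theta_e - 1) // n_a + 1, (theta_e - 1) % n_a + 1


n_a = 3
code = encode(2, 1, n_a)      # action 1 in the second NEM
where = decode(code, n_a)
```

With three actions, the values 1 to 3 address the first NEM, 4 to 6 the second, and so on, for a total of $k n_{A} + 1$ possible values including 0.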

In relation to the CSNEM, let us combine the mixture of NEMs into one structure by defining the block diagonal matrix $Γ$ made of blocks $Γ^{i}$ , define $Θ \in \{0, 1\}^{k | A | \times | ℰ |}$ by $Θ_{a e} = 1$ iff $θ_{e} = a$ , and let $Ψ$ be a block matrix made up of k appended $| A | \times | A |$ identity matrices: $Γ = \begin{bmatrix} Γ^{1} & 0 & \cdots & 0 \\ 0 & Γ^{2} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & Γ^{k} \end{bmatrix}, \quad Ψ = \underbrace{\begin{bmatrix} I_{| A |} & I_{| A |} & \cdots & I_{| A |} \end{bmatrix}}_{k \text{ copies}} .$ (4)

In Section 6, we show that given these definitions, the log-likelihood of the CSNEM can be written as $\log L (Γ^{1, \dots, k}, θ) = \mathrm{tr} (Γ Θ (R Ψ)) + \log L (N) .$ (5)

That is, the likelihood of a k-CSNEM is equal to the likelihood of an NEM with $k | A |$ actions for the data matrix $R Ψ$ , subject to the constraint that $Γ$ is block diagonal as in Equation (4). We can consequently use any NEM learner to learn a k-CSNEM mixture, as long as it supports constraining $Γ$ to be block-diagonal. Analogously to Equation (3), we can obtain a posterior probability for the CSNEM by introducing priors for $Γ$ and $Θ$ , and applying MC-EMiNEM to maximize that posterior. The block-diagonal constraint can be enforced using the edge-wise prior on the structure of $Γ$ , by setting the priors on edges that would violate block-diagonality to zero.
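The reduction to an NEM with $k | A |$ actions can be sketched in a few lines of Python. The example mirrors the pattern described for Figure 1B with two actions (Hog1, Cka2) and six effects; the log-odds values of plus or minus 1 and all function names are hypothetical, and the sketch only evaluates the trace term and builds the edge-wise prior rather than performing any learning.

```python
def block_diag(blocks):
    """Block-diagonal matrix built from k square blocks of equal size."""
    n, k = len(blocks[0]), len(blocks)
    out = [[0] * (k * n) for _ in range(k * n)]
    for i, block in enumerate(blocks):
        for r in range(n):
            for c in range(n):
                out[i * n + r][i * n + c] = block[r][c]
    return out


def csnem_score(gammas, theta, R):
    """tr(Gamma Theta (R Psi)): the log-likelihood minus the constant log L(N)."""
    k, n_a = len(gammas), len(gammas[0])
    n = k * n_a
    G = block_diag(gammas)
    # Theta[a][e] = 1 iff theta_e == a + 1 (rows are 0-based, theta is 1-based).
    T = [[1 if t == a + 1 else 0 for t in theta] for a in range(n)]
    # R Psi places k copies of R side by side.
    RP = [list(row) * k for row in R]
    return sum(G[a][b] * T[b][e] * RP[e][a]
               for a in range(n) for b in range(n) for e in range(len(theta)))


def edge_prior(k, n_a, p_edge):
    """Edge-wise prior: p_edge inside a block, 0 for edges that cross blocks."""
    n = k * n_a
    return [[p_edge if r // n_a == c // n_a else 0.0 for c in range(n)]
            for r in range(n)]


# Rows: effects e1..e6; columns: Hog1, Cka2 (hypothetical log-odds).
R = [[1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [1.0, 1.0], [-1.0, 1.0], [-1.0, 1.0]]
chain = [[1, 1], [0, 1]]      # Hog1 -> Cka2
isolated = [[1, 0], [0, 1]]   # no edges

# 2-CSNEM: e1, e2 at Hog1 and e3, e4 at Cka2 in NEM 1; e5, e6 at Cka2 in NEM 2.
two_contexts = csnem_score([chain, isolated], [1, 1, 2, 2, 4, 4], R)
# Plain NEM (k = 1): e5 and e6 must also sit below Hog1.
single_nem = csnem_score([chain], [1, 1, 2, 2, 2, 2], R)
prior = edge_prior(k=2, n_a=2, p_edge=0.2)
```

The two-context model scores higher than the plain NEM because it does not force e₅ and e₆ to respond to the Hog1 deletion.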

3.2. Compact visualization and identifiability of a k-CSNEM

Having obtained k NEMs and the corresponding partitioning of the effect set, a single graph can be composed by merging all action nodes across the graphs that have the same ancestors (are reachable from the same set of actions). Figure 3 provides an example: Figure 3A shows three graphs that describe the structures of three NEMs that compose a mixture, and Figure 3B shows the result of merging them. Note that Hog1 is reachable from no nodes but itself in all three NEMs. Consequently, in the compact CSNEM, there is only one version of Hog1. In contrast, Cka2 is reachable from Hog1 in one of the NEMs and is only reachable from itself in the others, which is why it has two contexts in the CSNEM. Similarly, Ckb12 is reachable from both Hog1 and Cka2 in one of the three NEMs, but not the others, and has two contexts as well. To keep track of the various contexts, we append the list of genes from which a context is reachable when displaying the graph. For example, the context of Cka2 that is reachable from Hog1 is labeled “Cka2 [Hog1],” whereas the context that is not reachable from other nodes is labeled simply “Cka2.” This is particularly helpful when viewing graphs with many nodes and many contexts.

FIG. 3.

Building a CSNEM from a mixture of NEMs. (A) Three NEMs that compose a mixture. (B) A single graph obtained by an edge-preserving merge of the three NEMs. (C) The corresponding action-set graph.

The merged graph in Figure 3B preserves the edges that were present in the mixture of NEMs, but it is not necessarily a unique maximizer of the likelihood; rather, it is a member of an equivalence class of equally likely CSNEMs. What characterizes the equivalence class is the set of inclusive ancestries of the nodes in the CSNEM. The inclusive ancestry of a node is a set of actions; this set contains the action at the node and all actions from which it is reachable. For example, the inclusive ancestry of the Cka2 node in the leftmost NEM in Figure 3A is $\{\mathrm{Hog1}, \mathrm{Cka2}\}$ , whereas the inclusive ancestry of the Cka2 node in the middle NEM is simply $\{\mathrm{Cka2}\}$ . The set of inclusive ancestries for the example in Figure 3 is therefore $\{\{\mathrm{Hog1}\}, \{\mathrm{Cka2}\}, \{\mathrm{Ckb12}\}, \{\mathrm{Hog1}, \mathrm{Cka2}\}, \{\mathrm{Hog1}, \mathrm{Cka2}, \mathrm{Ckb12}\}\}$ . This set of sets can be uniquely graphically represented in an action-set graph, an inverted Hasse diagram of the sets ordered by inclusion (Hasse diagrams are arranged with the smallest elements at the bottom), an example of which is in Figure 3C. Any two CSNEMs with identical sets of inclusive ancestries necessarily have the same set of unique accessibility matrix columns $\{Γ_{\cdot a}^{i} : i \in \{1, \dots, k\}, a \in A\}$ , and consequently, have the same likelihood for likelihood-maximizing attachments $Θ$ .
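Inclusive ancestries can be read directly off the columns of the accessibility matrices, as the following sketch shows. The two mixture members used here are hypothetical stand-ins chosen to reproduce the set of ancestries stated above; they are not the three graphs of Figure 3A, and the label format for contexts with several ancestors is an assumption.

```python
def inclusive_ancestries(gammas, names):
    """Set of inclusive ancestry sets over all nodes of all mixture members.

    gammas[i][b][a] == 1 iff a is reachable from b (or a == b) in member i,
    so column a of a member lists the inclusive ancestry of node a.
    """
    found = set()
    for gamma in gammas:
        for a in range(len(names)):
            found.add(frozenset(names[b] for b in range(len(names)) if gamma[b][a]))
    return found


def context_label(action, ancestry):
    """Label such as 'Cka2 [Hog1]'; plain 'Cka2' when there are no other ancestors."""
    others = sorted(ancestry - {action})
    return action if not others else f"{action} [{', '.join(others)}]"


names = ["Hog1", "Cka2", "Ckb12"]
chain = [[1, 1, 1],      # hypothetical member: Hog1 -> Cka2 -> Ckb12
         [0, 1, 1],
         [0, 0, 1]]
isolated = [[1, 0, 0],   # hypothetical member with no edges
            [0, 1, 0],
            [0, 0, 1]]

ancestries = inclusive_ancestries([chain, isolated], names)
label = context_label("Cka2", frozenset({"Hog1", "Cka2"}))
```

Nodes with equal ancestry sets collapse into a single element of the set, which is the merging step that yields the compact CSNEM.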

The characterization of equivalence classes in terms of inclusive ancestry sets relates to previous results about NEM identifiability: for transitively closed $Γ$ , cycles form fully connected components that can be merged into single nodes (Markowetz et al., 2007). All nodes in such connected components have identical ancestry sets, yielding a graph that is isomorphic to the action-set graph, with the (often singleton) set of actions S at a node in the NEM being the set difference of the set of actions A in the corresponding node in the action-set graph and the union of the sets of actions at its parent nodes, $S = A \setminus \bigcup \mathrm{pa} (A)$ . This can also be extended to the case of nontransitive $Γ$ and the result on identifiability of nontransitive NEMs up to cycle reversals (Tresch and Markowetz, 2008): Figure 4 illustrates how a pair (A, B) of equivalent nontransitive NEM cycle structures between nodes X, Y, Z is represented by a unique action-set graph (C). Ancestry sets and action-set graphs can therefore be used to consistently represent equivalence classes of both CSNEMs and NEMs.

FIG. 4.

The action-set graph of a nontransitive NEM cycle structure. (A) A nontransitive cycle NEM and its corresponding effect matrix. (B) The reversed cycle and its corresponding effect matrix. (C) The action-set graph that corresponds to both graphs.

Note that while the set of ancestries characterizes the likelihood equivalence class, the posterior maximized by a Bayesian method such as MC-EMiNEM would be, for example, higher for CSNEMs with fewer edges in $Γ$ under a sparsifying edge prior.

4. Results

We have introduced the CSNEM model and shown how the CSNEM likelihood can be viewed as the likelihood of an NEM with $k n_{A}$ actions learned from a modified differential expression log-likelihood ratio matrix $R Ψ$ . Below, we use this transformation in conjunction with an existing NEM learning approach, MC-EMiNEM, to learn CSNEMs and evaluate the ability of this approach to recover a CSNEM from data that are generated by a known multiple-context model in simulation. Finally, we present the results of learning a CSNEM from the results of knockout experiments on S. cerevisiae cells under NaCl stress and discuss the biological significance of some patterns of context specificity that are identified in the CSNEM.

4.1. Evaluation on simulated data

We performed simulations to evaluate our ability to infer CSNEMs from data. We generated data from mixtures of NEMs of varying size: we varied the size of the NEMs in the mixture to contain $n_{A} = 3, 5, 10$ , or 20 actions, and we varied the number of NEMs in the generating model from $j = 1$ to $j = 5$ , inclusive, with $j = 1$ being equivalent to a simple NEM model. The number of effects $n_{ε}$ was fixed at 1000. We generated 30 mixtures corresponding to each configuration of j and $n_{A}$ , resulting in a total of 600 generated models. To generate each mixture, first, we generated j random directed graphs $G_{1}, \dots, G_{j}$ of $n_{A}$ nodes, by drawing each of the possible $n_{A}^{2}$ edges of the graph with a probability of $0.2$ for graphs of size $n_{A} < 20$ and a probability of 0.04 for graphs of size $n_{A} = 20$ (with the higher edge density of 0.2 for 20 nodes, all nodes become reachable from all other nodes, yielding degenerate effect patterns where each effect is either affected by all actions, or by none).

Next, for each effect, with probability 0.3, we attach it nowhere; otherwise, we attach it uniformly at random to one of the $n_{A} \times j$ nodes across all of these graphs. Given these graphs and effect attachments, we infer which effects are reachable from each node and compute the $n_{A} \times n_{ε}$ binary effect matrix $F^{T}$ , where $F_{a s}^{T} = 1$ if and only if effect s is reachable from action a in any one of the j graphs. Next, we generate a log-odds matrix that represents a noisy measurement of this effect matrix by drawing from $\log \frac{\mathrm{Beta} (β, 1)}{\mathrm{Beta} (1, β)}$ for each “true” cell and from $\log \frac{\mathrm{Beta} (1, β)}{\mathrm{Beta} (β, 1)}$ for each “false” cell, with $β = 10$ . This process generates the log-odds matrix R that we use as input to our learning method. Additionally, to examine the effect of noise in the measurement of effects on model inference, we generated log-odds matrices using $β = 1, 2, 5$ from the first 10 generating mixtures with $n_{A} = 20, j = 1, 3, 5$ .
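The generation procedure can be sketched in Python as follows. One point is an assumption of this sketch: the log-odds for a cell are taken to be the log of the ratio of two independent Beta draws, which is one reading of the expression above. The function names, the seed handling, and the small floor that guards against a zero draw are likewise hypothetical.

```python
import math
import random


def accessibility(adj):
    """Accessibility matrix: 1 on the diagonal and wherever a directed path exists."""
    n = len(adj)
    gamma = [[1 if i == j or adj[i][j] else 0 for j in range(n)] for i in range(n)]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if gamma[i][m] and gamma[m][j]:
                    gamma[i][j] = 1
    return gamma


def simulate(n_a, j, n_e, density, beta, p_unattached=0.3, seed=0):
    """Return (gammas, F, R): j accessibility matrices, the binary effect
    matrix F[action][effect], and noisy log-odds R[effect][action]."""
    rng = random.Random(seed)
    gammas = []
    for _ in range(j):
        adj = [[1 if rng.random() < density else 0 for _ in range(n_a)]
               for _ in range(n_a)]
        gammas.append(accessibility(adj))
    F = [[0] * n_e for _ in range(n_a)]
    R = [[0.0] * n_a for _ in range(n_e)]
    for e in range(n_e):
        if rng.random() >= p_unattached:
            node = rng.randrange(n_a * j)  # uniform over all j * n_a nodes
            gamma, a = gammas[node // n_a], node % n_a
            for b in range(n_a):
                F[b][e] = gamma[b][a]      # action b reaches the attachment point
        for b in range(n_a):
            # Assumed reading: ratio of two independent Beta draws.
            hi = max(rng.betavariate(beta, 1.0), 1e-300)
            lo = max(rng.betavariate(1.0, beta), 1e-300)
            R[e][b] = math.log(hi / lo) if F[b][e] else math.log(lo / hi)
    return gammas, F, R


gammas, F, R = simulate(n_a=5, j=3, n_e=50, density=0.2, beta=10.0, seed=1)
```

Larger values of `beta` push the two Beta distributions toward 1 and 0, so the sign of each log-odds entry agrees with the true cell more reliably; at `beta = 1` both draws are uniform and the matrix carries no signal.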

Since in real-world applications, we usually do not know how many contexts are truly needed to describe a process under study, we sweep through values of k ranging from 1 to 8 and learn a k-CSNEM for each value of k from each generated log-odds matrix. CSNEMs were learned using the MC-EMiNEM implementation in the nem R package, with the learned network taken from the end of a 20,000 sample chain, the empirical Bayes step performed every 5000 steps, an acceptance sparsity prior of 0.5, and $k n_{A}$ edges changed in every MCMC step [see Niederberger et al. (2012) for details on how these settings are used in MC-EMiNEM]. The edge-wise prior for permissible edges was set to 0.2.

We evaluate each k-CSNEM learned from each log-odds matrix both in terms of the ability of the CSNEM to accurately model which effects are differentially expressed in response to each action and in terms of the relationships inferred among actions. In the former case, we use the F-measure to quantify how well the effect matrix F of the learned CSNEM matches that of the generating CSNEM, with the following interpretation: if an effect responds to an action in both the learned and the generating model, it is a true positive; if it does not respond in the learned model but does in the generating model, it is a false negative; if it does not respond in either model, it is a true negative; and if it responds in the learned model but not the generating model, it is a false positive. Figure 5A shows the F-measures for learning the effect matrix across our simulations for the almost-noiseless case of $β = 10$ . Figure 5C shows the F-measures for learning the effect matrix of a 20-action network from log-odds matrices generated with varying settings of $β$ .
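The effect-matrix comparison can be sketched as below. The text does not state which F-measure is used; this sketch assumes the balanced F1 score (the harmonic mean of precision and recall), and the function name and example matrices are hypothetical.

```python
def effect_f_measure(learned, generating):
    """F1 score between two binary effect matrices of equal shape."""
    tp = fp = fn = 0
    for row_l, row_g in zip(learned, generating):
        for l, g in zip(row_l, row_g):
            if l and g:
                tp += 1        # response in both models
            elif l and not g:
                fp += 1        # response only in the learned model
            elif g and not l:
                fn += 1        # response only in the generating model
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


generating = [[1, 1, 0],
              [0, 1, 0]]
learned = [[1, 0, 0],
           [0, 1, 1]]
f1 = effect_f_measure(learned, generating)
```

Here two cells are true positives, one is a false positive, and one is a false negative, so precision and recall are both 2/3. True negatives do not enter the score, which keeps it informative when most cells are 0.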

FIG. 5.

Box plots of simulation F-measures. Each plot represents an aggregate of results from 30 random simulation replicates. Grid rows correspond to the number of contexts in the generating model, the x-axis in each of the grid cells indicates the number of contexts in the learned model, and the y-axis represents: (A) the F-measure of recovering the generating model's effect matrix from the learned model across different sizes of action sets (grid columns) from log-odds matrices generated with $β = 10$ , (B) the F-measure of recovering ancestry relationships, (C) the F-measure of learning the effect matrix of a 20-action network from log-odds matrices generated with varying settings of β (grid columns), and (D) the F-measure of learning the effect matrix from 10-action networks of varying density (grid columns) with log-odds generated using $β = 10$ .

To compare the learned graph structures with the generating graph structures, we must first determine which contexts in the learned model correspond to which contexts in the generating model. For each action a in each model, we obtain a list of contexts that are distinguishable in terms of which actions are ancestors of the action a. We then match each of these contexts in each model to their best match in the other model. Each ancestor that the two contexts in the best match have in common counts as a true positive, each ancestor that appears in the context from the learned model but not in the context from the generating model counts as a false positive, and each ancestor that appears in the context from the generating model but not in the context from the learned model counts as a false negative. We use these counts to summarize agreement between the structures of two CSNEMs in terms of an F-measure, which we call the pairwise ancestry F-measure. Figure 5B shows the pairwise ancestry F-measures across our simulations.
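A possible implementation of this comparison is sketched below. Several details are assumptions of the sketch rather than statements from the text: the best match is chosen by Jaccard similarity of ancestor sets, counts are accumulated over matches in both directions, ancestor sets exclude the action itself, and the F-measure is the balanced F1 score. All names and the example models are hypothetical.

```python
def jaccard(s, t):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    return 1.0 if not (s or t) else len(s & t) / len(s | t)


def pairwise_ancestry_f(learned, generating):
    """Both arguments map each action to a list of ancestor sets,
    one set per distinguishable context of that action."""
    tp = fp = fn = 0
    for action in generating:
        lc, gc = learned[action], generating[action]
        # Match every context to its most similar context in the other model.
        pairs = [(l, max(gc, key=lambda g: jaccard(l, g))) for l in lc]
        pairs += [(max(lc, key=lambda l: jaccard(l, g)), g) for g in gc]
        for l, g in pairs:
            tp += len(l & g)   # ancestors shared by the matched contexts
            fp += len(l - g)   # ancestors only in the learned context
            fn += len(g - l)   # ancestors only in the generating context
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Generating model: Cka2 has one context below Hog1 and one without ancestors.
generating = {"Hog1": [set()], "Cka2": [{"Hog1"}, set()]}
# Learned model: a plain NEM that only found the context below Hog1.
learned = {"Hog1": [set()], "Cka2": [{"Hog1"}]}
score = pairwise_ancestry_f(learned, generating)
```

In this example the missing ancestor-free context of Cka2 is matched to the only learned context, which contributes one false positive and lowers the score to 0.8.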

When the learned model is a plain NEM (k = 1), we see that as the generating model has more contexts, the recovery of both the effect and the ancestry pattern worsens (with the exception of the 10 actions case, examined below). This confirms that a CSNEM is necessary when multiple contexts are indeed in play in the generating system. When the learned model has multiple contexts, even when the number of contexts in the learned model exceeds the number of contexts in the generating model, the approach does not seem to be susceptible to overfitting. This pattern holds as we increase noise (decrease β) in data generation.

At $n_{A} = 10$ , the NEM appears to recover the effect patterns well even when there are multiple contexts in the generating models, and we hypothesize that this is because of high connectivity in those ground-truth networks: the average in-degree and out-degree of a node is the product of the edge density and one less than the number of actions. We generated 20 mixtures for varying edge densities (0.04, 0.1, 0.2, 0.5) with j = 1, 3, 5 contexts and $n_{A} = 10$ nodes and examined the effect-matrix F-measures across densities (Fig. 5D). Denser networks are perfectly recovered by single-context NEMs; this is likely because denser networks are more likely to lead to fully connected transitive reductions, reducing the number of unique response patterns of effects, yielding data that are easier to capture in a simple NEM model. When the generating models are not too dense, CSNEMs are better than NEMs at recovering the effect patterns generated from multiple contexts.

4.2. Application to NaCl stress response in S. cerevisiae

We apply our method to the exploration of NaCl stress response pathways in S. cerevisiae. We consider data obtained from a WT strain and 28 knockout strains. Transcript abundances were measured by microarray for each strain before NaCl treatment and 30 minutes after 0.7 M NaCl treatment. The data collection was described in detail in previous work (Berry and Gasch, 2008; Lee et al., 2011).

We are interested in how the gene knockouts change the cells' response to stress. Therefore, the actions $A$ in our model correspond to the knockouts. Since we use microarray data, the observations $ℰ$ correspond to transcripts. The change in response is quantified as a change in log-fold-change. For each strain, we have the log-fold-change of transcript abundances in the sample 30 minutes after NaCl treatment compared with the abundances in the sample before treatment. We then consider the difference between the log-fold-change in each knockout strain and that in the WT strain. To obtain the log-odds matrix R, we use an empirical Bayes method to obtain log-posterior-odds of differential expression (Lönnstedt and Speed, 2002; Smyth, 2004), which is implemented in the limma R package (Smyth, 2005). Figure 6 shows the 3-CSNEM that was learned from the data, and Figure 7 shows the NEM learned from the data. The MC-EMiNEM settings used for learning both these models are the same as those used for learning in the simulation experiments.
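The quantity on which differential expression is assessed, the change in log-fold-change, can be written out as a short sketch. The function name and the abundance values are hypothetical, and the empirical Bayes step that converts these differences into log-posterior-odds (performed with limma in the text) is not reproduced here.

```python
def delta_log_fold_change(ko_before, ko_after, wt_before, wt_after):
    """Per-transcript difference between the knockout and WT log-fold-changes.

    All inputs are log-scale abundances for the same transcripts, measured
    before and after NaCl treatment.
    """
    return [(ka - kb) - (wa - wb)
            for kb, ka, wb, wa in zip(ko_before, ko_after, wt_before, wt_after)]


# Two hypothetical transcripts.
wt_before, wt_after = [5.0, 7.0], [8.0, 7.5]
ko_before, ko_after = [5.0, 7.0], [6.0, 7.5]
delta = delta_log_fold_change(ko_before, ko_after, wt_before, wt_after)
```

The first transcript is induced less strongly in the knockout than in the WT strain (a difference of -2.0), whereas the second responds identically in both strains (a difference of 0.0).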

FIG. 6.

The 3-CSNEM network learned from Saccharomyces cerevisiae NaCl stress knockout microarray data. Action nodes and action-action edges are colored according to the NEM member in the mixture from which they came, in cyan, magenta, or yellow. Nodes that were merged because of identical ancestors in multiple mixture members are colored according to subtractive color mixing (cyan and magenta make blue, cyan and yellow make green, magenta and yellow make red, and all three make black). Effects are colored and grouped according to the actions to which they are attached. Where the number of effects in a group is less than 10, the effects are listed. Where it is 10 or more, the number of effects in the group is shown. Action-action edges are solid and action-effect edges are dashed.

FIG. 7.

The NEM network learned from S. cerevisiae NaCl stress knockout microarray data. Action nodes are in black and effect nodes are in blue. Effects are grouped according to the actions to which they are attached. Where the number of effects in a group is less than 10, the effects are listed. Where it is 10 or more, the number of effects in the group is shown.

The inferred network captures many known and several new features of the yeast stress-responsive signaling network. The Hog1 kinase is a master regulator of the osmotic stress response (Nadal and Posas, 2015). The CSNEM network correctly places Hog1 at the top of the hierarchy in paths with known coregulators. For example, the network captures paths containing Hog1 and CK2 complex subunits Cka2 and Ckb1/2; Hog1 is known to interact physically with Cka2, and the two kinases regulate an overlapping set of genes (Chasman et al., 2014). The network also correctly predicts that the transcription factor Msn2 is regulated by Hog1, Pde2, and Snf1, all known regulators of Msn2 (Rep et al., 2000; Mayordomo et al., 2002; Lee et al., 2008; Petrenko et al., 2013); yet a separate branch represents only Pde2 and Msn2, consistent with Pde2 playing a more significant role in regulating this transcription factor during salt stress (Chasman et al., 2014). Another example is seen in YGR122W, a poorly characterized protein required for processing the transcriptional repressor Rim101: the CSNEM correctly puts YGR122W and Rim101 in the same paths, with at least one regulatory branch shared with Hog1 control.

The CSNEM naturally produces groups of effects where each group comprises those effects (i.e., transcripts) that are reachable from contexts of actions in the graph. We examined the groups of effects in terms of Gene Ontology (GO) enrichments. Figure 8 shows a comparison of these enrichments to those obtained from grouping effects by the attachments from a learned NEM. The figure also shows a coarser split of the effects into groups based on CSNEM contexts: if an action was merged from two or more contexts in the single-network CSNEM representation, all the effects attached to it are considered reachable from both (or all three) contexts from which the action was merged. Each column in the figure corresponds to a GO term and each row corresponds to a combination of contexts or an action. A point in the figure indicates that the set of effects reachable from the context(s) or action was found to be significantly enriched for the GO term. Significance was defined according to a hypergeometric test with the Benjamini–Hochberg method used to control the false discovery rate at 0.05; only groups of five or more effects were considered for enrichment analysis.
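The enrichment procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names, the toy gene sets, and the choice to pool all group-term tests into a single Benjamini–Hochberg correction are assumptions made for the example.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N items are drawn without replacement from M, n of them annotated."""
    total = comb(M, N)
    return sum(comb(n, x) * comb(M - n, N - x) for x in range(k, min(n, N) + 1)) / total

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def enriched_terms(groups, annotations, background, alpha=0.05, min_size=5):
    """groups: name -> set of effects; annotations: GO term -> set of effects."""
    tests = []
    for name, effects in groups.items():
        effects = effects & background
        if len(effects) < min_size:      # only groups of five or more effects are tested
            continue
        for term, members in annotations.items():
            members = members & background
            overlap = len(effects & members)
            p = hypergeom_sf(overlap, len(background), len(members), len(effects))
            tests.append((name, term, p))
    # ASSUMPTION: one correction over all (group, term) tests.
    adjusted = benjamini_hochberg([p for _, _, p in tests])
    return [(g, t, q) for (g, t, _), q in zip(tests, adjusted) if q <= alpha]

# Toy data (made up for illustration only).
background = set(range(100))
annotations = {"termA": set(range(10)), "termB": set(range(50, 100))}
groups = {"g1": set(range(8)) | {60, 61}, "small": {1, 2, 3}}
hits = enriched_terms(groups, annotations, background)
print([(g, t) for g, t, _ in hits])
```

In this toy case the group `small` is skipped by the size filter, and only the heavily overlapping pair survives the correction.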

FIG. 8.

Comparison of effect group GO term enrichments. Columns correspond to GO terms and rows correspond to actions in the NEM and CSNEM, and to possible combinations of contexts of the 3-CSNEM. A point indicates that a GO term was found to be significantly enriched. Points are colored by knockout in the NEM and CSNEM plots and by context in the context-membership plot. GO, Gene Ontology.

A key advantage of our approach is that regulators can be represented in multiple pathways, capturing regulators that may have distinct roles in different cellular compartments or cell cycle phases. In fact, several of the GO terms for which the CSNEM effect groups are enriched are associated with subcellular localization and include transcripts encoding proteins localized to the nucleus, nucleolus, plasma membrane, endoplasmic reticulum, mitochondria, peroxisome, and cytoskeleton. The coarser split of effects by contexts also shows that there are clear divisions of localization across contexts in the CSNEM.

An interesting example of the benefits of the CSNEM approach is seen in its ability to capture the disparate signaling roles of the phosphatase Cdc14, a key regulator of mitotic progression in dividing cells (Weiss, 2012). Inactive Cdc14 is tethered to the nucleolus during much of the cell cycle but released upon mitosis to other subcellular regions where it dephosphorylates cyclins and other targets (Wurzenberger and Gerlich, 2011). Separate from its role in the cell cycle, Cdc14 was recently linked to the stress response in yeast (Breitkreutz et al., 2010; Chasman et al., 2014), although its precise role is not clear.

The CSNEM network places Cdc14 in multiple pathways that capture the distinct functions of the phosphatase. One path represents an isolated connection of Cdc14 to a group of genes regulated by the cell cycle network. Many of these genes are known to be regulated by Cdc14 during normal cell cycle progression. But consistent with a second role in the stress response, Cdc14 is also nested in a path regulated by Snf1, a kinase that responds to both nutrient/energy restriction and osmotic stress resulting from salt treatment (Ye et al., 2008). The Snf1-Cdc14 pathway is connected to 31 effectors that include genes induced by stress and related to glucose metabolism. Work from the Gasch laboratory previously showed through genetic analysis that Snf1 and Cdc14 function, at least in part, in the same pathway during the response to salt stress (Chasman et al., 2014). Yet both Cdc14 and Snf1 have other functions in the cell, leading to the regulation of only partially overlapping gene sets. Thus, the CSNEM approach successfully captured this complex regulatory distinction for Cdc14 and Snf1.

5. Discussion

We have introduced CSNEMs, a generalization of NEMs, which can explicitly model the different interactions that genes may have in different contexts. We have shown how a CSNEM can be viewed as a mixture of NEMs and that the task of learning such a mixture can be cast as a single NEM-learning task with a modified data matrix and a constrained action graph structure in which actions are replicated k times. In particular, we took the approach of using a hard mixture where effects and actions are assigned to different contexts. A natural avenue for future investigation would be the exploration of soft-mixture approaches, which may prove more scalable for larger numbers of contexts and actions.

Applying our method to simulated data has shown that learning CSNEMs leads to good recovery of the effect patterns and ancestry relations that were present in the generating model. The results also show that a CSNEM is necessary when the generating model truly has multiple contexts, but slight overestimation or underestimation of the number of contexts does not seem to lead to overfitting. In practice, the correct number of contexts that a learned model should have is not known, and optimal selection of k is still an open problem that we plan to explore in future work. Existing approaches to model selection, such as a search for a plateau in likelihood or the use of model complexity measures such as the Akaike information criterion (AIC), point to possible solutions for this problem.
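The two model-selection heuristics mentioned above could look roughly as follows. This is only a sketch under stated assumptions: the log-likelihood values and parameter counts are invented for illustration, and the text does not specify how the free parameters of a CSNEM should be counted, so that count is left to the caller.

```python
def plateau_k(loglik_by_k, tol):
    """Smallest k after which one more context gains less than tol in log-likelihood."""
    ks = sorted(loglik_by_k)
    for k, k_next in zip(ks, ks[1:]):
        if loglik_by_k[k_next] - loglik_by_k[k] < tol:
            return k
    return ks[-1]

def aic_k(loglik_by_k, n_params_by_k):
    """k minimizing AIC = 2 * (number of free parameters) - 2 * log-likelihood."""
    return min(loglik_by_k, key=lambda k: 2 * n_params_by_k[k] - 2 * loglik_by_k[k])

# Hypothetical values, not results from the paper.
loglik = {1: -1200.0, 2: -1000.0, 3: -900.0, 4: -895.0, 5: -893.0}
n_params = {1: 50, 2: 100, 3: 150, 4: 200, 5: 250}  # ASSUMPTION: caller-defined count
print(plateau_k(loglik, tol=10.0), aic_k(loglik, n_params))
```

Both heuristics depend on a tuning choice (the tolerance, or the parameter count), which is one reason the selection of k remains open.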

Our analysis of a CSNEM network learned from S. cerevisiae NaCl stress knockout microarray data revealed that the CSNEM does recover known regulatory patterns and, moreover, captures known patterns of context specificity in the genes under study. Analysis of GO term enrichments of the effects reachable from CSNEM nodes shows that many effect groups are associated with subcellular localization, a pattern even more evident in examining a coarser division of the effects, based on mixture contexts. We believe that localization may be one source of context specificity that is relevant in many applications. The main motivation for developing CSNEMs was the observation that effect nesting may not be an appropriate assumption for some settings because of the context-specific nature of interactions that some genes can have, and perhaps, more explicit modeling of contexts of interaction can lead to more faithful representations of the underlying biology.

6. Proofs

In this section, we derive Equation (5).

Recall that given k contexts, $Γ^{1}, \dots, Γ^{k}$ are $| A | \times | A |$ accessibility matrices, and recall that $θ$ represents the attachment of effect $e \in ℰ$ to an action $a \in A$ in mixture member $i \in \{1, \dots, k\}$ by $θ_{e} = (i - 1) | A | + a$, with $θ_{e} = 0$ for effects that are not attached to any action. We can then define the partition of $ℰ$ into k sets $ℰ_{1}, \dots, ℰ_{k}$ as $ℰ_{i} = \{e \in ℰ \mid \exists a \in A : θ_{e} = (i - 1) | A | + a\} \text{ for } i \in \{1, \dots, k\}.$ (6)

Let us define a mapping of effect indices $ζ : \{1, \dots, k\} \times \{1, \dots, | ℰ_{i} |\} \to ℰ$ such that $ζ (i, j) = e$ when effect e is the jth member of partition $ℰ_{i}$, and let us introduce two inverses to the index mapping $ζ$: $[κ (e) = i] \equiv [e \in ℰ_{i}]$ and $[η_{i} (e) = j] \equiv [ζ (i, j) = e]$.
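As a concrete reading of the attachment encoding and of Equation (6), the following sketch encodes and decodes $θ_{e}$ and groups effects by mixture member. The function names and the toy attachment vector are hypothetical; indices are 1-based to match the notation above.

```python
def encode(i, a, n_actions):
    """theta for an effect attached to action a (1-based) in mixture member i (1-based)."""
    return (i - 1) * n_actions + a

def decode(theta, n_actions):
    """Inverse of encode; returns None for an unattached effect (theta == 0)."""
    if theta == 0:
        return None
    i, a = divmod(theta - 1, n_actions)
    return i + 1, a + 1

def partition_effects(theta, n_actions, k):
    """Group effect names by mixture member, following Equation (6)."""
    parts = {i: [] for i in range(1, k + 1)}
    for effect, t in theta.items():
        decoded = decode(t, n_actions)
        if decoded is not None:          # unattached effects fall in no E_i
            parts[decoded[0]].append(effect)
    return parts

# Toy example: |A| = 3 actions, k = 2 contexts.
theta = {"e1": 1, "e2": 5, "e3": 0, "e4": 3, "e5": 4}
parts = partition_effects(theta, n_actions=3, k=2)
print(parts)
```

Note that effects with $θ_{e} = 0$ belong to none of the sets, so the sets cover only the attached effects.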

Given this partition of the effect space, the likelihood of a CSNEM is defined as the product of the NEM likelihoods per partition: $L (Γ^{1, \dots, k}, θ) = \prod_{i = 1}^{k} L (Γ^{i}, Θ^{i}),$ (7)

where $Θ^{i}$ is a matrix in $\{0, 1\}^{| A | \times | ℰ_{i} |}$ with $Θ_{a j}^{i} = 1$ iff $θ_{ζ (i, j)} = (i - 1) | A | + a$, and 0 otherwise.

Let $R^{i}$ be a matrix in $ℝ^{| ℰ_{i} | \times | A |}$, where $R_{j a}^{i} = R_{ζ (i, j), a}$ (i.e., $R^{i}$ is a selection of effects from R based on the partition $ℰ_{i}$). Then, we can apply Equation (2) to express the log-likelihood of each NEM component as $\log L (Γ^{i}, Θ^{i}) = \sum_{a \in A} \sum_{j = 1}^{| ℰ_{i} |} {(Γ^{i} Θ^{i})}_{a j} R_{j a}^{i} + \log L (N).$ (8)

With $Γ, Θ$, and $Ψ$ defined as in Section 3.1, the log-likelihood of the CSNEM can be written as $\log L (Γ^{1, \dots, k}, θ) \overset{\text{by (7), (8)}}{=} \sum_{i = 1}^{k} \sum_{a \in A} \sum_{j = 1}^{| ℰ_{i} |} {(Γ^{i} Θ^{i})}_{a j} R_{j a}^{i} + \log L (N),$ (9)

$= \sum_{a \in A} \sum_{e \in ℰ} {(Γ^{κ (e)} Θ^{κ (e)})}_{a, η_{κ (e)} (e)} R_{e a} + \log L (N),$ (11)

$= \sum_{a \in A} \sum_{b \in A} \sum_{e \in ℰ} Γ_{a b}^{κ (e)} Θ_{b, η_{κ (e)} (e)}^{κ (e)} R_{e a} + \log L (N),$ (12)

$= \sum_{e \in ℰ} \sum_{a \in A} \sum_{b = | A | (κ (e) - 1) + 1}^{| A | κ (e)} Γ_{[(κ (e) - 1) | A | + a], b} Θ_{b e} R_{e a} + \log L (N),$ (13)

$= \sum_{e \in ℰ} \sum_{a \in A} \sum_{b = 1}^{| A | k} Γ_{[(κ (e) - 1) | A | + a], b} Θ_{b e} R_{e a} + \log L (N),$ (14)

$= \sum_{e \in ℰ} \sum_{a \in A} {(Ψ Γ Θ)}_{a e} R_{e a} + \log L (N),$ (17)

$= \sum_{a \in A} {(Ψ Γ Θ R)}_{a a} + \log L (N),$ (18)

$= \mathrm{tr} (Ψ Γ Θ R) + \log L (N),$ (19)

$= \mathrm{tr} (Γ Θ (R Ψ)) + \log L (N).$ (20)

Here, we get Equation (14) because $Θ_{b e}$ is necessarily 0 for $b < (κ (e) - 1) | A | + 1$ or $b > κ (e) | A |$. Similarly, we get Equation (16) because ${(Γ Θ)}_{x e}$ is necessarily 0 for $x < (κ (e) - 1) | A | + 1$ or $x > κ (e) | A |$, which follows from the block-diagonal structure of $Γ$ and from the fact that $Θ$ can have at most one nonzero entry in each column.
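The end points of this derivation can be checked numerically on a small random instance. The sketch below compares the per-context sum of Equation (9) with the trace form of Equation (20), omitting the shared $\log L (N)$ term. It assumes that $Γ$ is the block-diagonal matrix built from $Γ^{1}, \dots, Γ^{k}$, that $Θ_{b e} = 1$ iff $θ_{e} = b$, and that $Ψ$ consists of k identity matrices of size $| A |$ placed side by side; the last point is an assumption here, since $Ψ$ is defined in Section 3.1 and not restated in this section. Under that assumption, ${(Ψ Γ Θ)}_{a e} = \sum_{i = 1}^{k} {(Γ Θ)}_{[(i - 1) | A | + a], e}$. All function names are hypothetical.

```python
import random

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def mixture_sum(gammas, theta, R):
    """Sum over contexts of sum_{a,j} (Gamma^i Theta^i)_{aj} R^i_{ja}, as in Equation (9)."""
    n = len(gammas[0])
    total = 0.0
    for i, gamma in enumerate(gammas, start=1):
        members = [e for e, t in enumerate(theta) if (i - 1) * n < t <= i * n]
        if not members:
            continue
        # Theta^i is |A| x |E_i|; R^i selects the rows of R for this partition.
        theta_i = [[1 if theta[e] == (i - 1) * n + a else 0 for e in members]
                   for a in range(1, n + 1)]
        R_i = [R[e] for e in members]
        total += trace(matmul(matmul(gamma, theta_i), R_i))
    return total

def stacked_trace(gammas, theta, R):
    """tr(Gamma Theta (R Psi)) with the stacked matrices, as in Equation (20)."""
    n, k, m = len(gammas[0]), len(gammas), len(theta)
    size = n * k
    # Block-diagonal Gamma built from the per-context accessibility matrices.
    big_gamma = [[0] * size for _ in range(size)]
    for i, gamma in enumerate(gammas):
        for a in range(n):
            for b in range(n):
                big_gamma[i * n + a][i * n + b] = gamma[a][b]
    # Theta has one row per replicated action and one column per effect.
    big_theta = [[1 if theta[e] == row + 1 else 0 for e in range(m)] for row in range(size)]
    # ASSUMPTION: Psi is k identity matrices side by side (|A| x k|A|).
    psi = [[1 if col % n == a else 0 for col in range(size)] for a in range(n)]
    return trace(matmul(matmul(big_gamma, big_theta), matmul(R, psi)))

# Random toy instance; the matrices are arbitrary 0/1 with unit diagonal,
# which is enough for the algebraic identity (transitivity is not needed).
random.seed(0)
n, k, m = 3, 2, 7
gammas = [[[1 if a == b else random.randint(0, 1) for b in range(n)] for a in range(n)]
          for _ in range(k)]
theta = [random.randint(0, n * k) for _ in range(m)]
R = [[random.uniform(-2, 2) for _ in range(n)] for _ in range(m)]
lhs = mixture_sum(gammas, theta, R)
rhs = stacked_trace(gammas, theta, R)
print(abs(lhs - rhs) < 1e-9)
```

Agreement on random instances is a sanity check of the index bookkeeping, not a substitute for the derivation.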

Acknowledgment

We thank anonymous reviewers for many constructive comments.

Author Disclosure Statement

The authors declare they have no competing financial interests.

Funding Information

This research was supported by NIH/NLM grant T15 LM0007359, NIH/NIAID grant U54 AI117954, and NIH/NIGMS grant R01 GM083989.

References

Berry, D.B., and Gasch, A.P. 2008. Stress-activated genomic expression changes serve a preparative role for impending stress in yeast. Mol. Biol. Cell, 19, 4580–4587.

Breitkreutz, Choi, Sharom, J.R., et al. 2010. A global protein kinase and phosphatase interaction network in yeast. Science, 328, 1043–1046.

Chasman, Ho, Y.-H., Berry, D.B., et al. 2014. Pathway connectivity and signaling coordination in the yeast stress-activated signaling network. Mol. Syst. Biol. 10, 759.

Friedman, Linial, Nachman, et al. 2000. Using Bayesian networks to analyze expression data. J. Comput. Biol. 7, 601–620.

Froehlich, Markowetz, Tresch, et al. 2019. nem: (Dynamic) Nested Effects Models and Deterministic Effects Propagation Networks to Reconstruct Phenotypic Hierarchies. R Package Version 2.60.0.

Fröhlich, Beißbarth, Tresch, et al. 2008. Analyzing gene perturbation screens with nested effects models in R and bioconductor. Bioinformatics, 24, 2549–2550.

Fröhlich, Fellmann, Sültmann, et al. 2007. Large scale statistical inference of signaling pathways from RNAi and microarray data. BMC Bioinformatics, 8, 1.

Irrthum, Wehenkel, Geurts, et al. 2010. Inferring regulatory networks from expression data using tree-based methods. PLoS One, 5, e12776.

Lee, M.V., Topper, S.E., Hubler, S.L., et al. 2011. A dynamic model of proteome changes reveals new roles for transcript alteration in yeast. Mol. Syst. Biol. 7, 514.

Lee, Cho, B.-R., Joo, H.-S., et al. 2008. Yeast Yak1 kinase, a bridge between PKA and stress-responsive transcription factors, Hsf1 and Msn2/Msn4. Mol. Microbiol. 70, 882–895.

Lönnstedt and Speed. 2002. Replicated microarray data. Stat. Sinica. 31–46.

Markowetz, Bloch, and Spang. 2005. Non-transcriptional pathway features reconstructed from secondary effects of RNA interference. Bioinformatics, 21, 4026–4032.

Markowetz, Kostka, Troyanskaya, O.G., et al. 2007. Nested effects models for high-dimensional phenotyping screens. Bioinformatics, 23, i305–i312.

Markowetz and Spang. 2007. Inferring cellular networks – a review. BMC Bioinformatics, 8, S5.

Mayordomo, Estruch, and Sanz. 2002. Convergence of the target of rapamycin and the Snf1 protein kinase pathways in the regulation of the subcellular localization of Msn2, a transcriptional activator of STRE (Stress Response Element)-regulated genes. J. Biol. Chem. 277, 35650–35656.

Nadal and Posas. 2015. Osmostress-induced gene expression – a model to understand how stress-activated protein kinases (SAPKs) regulate transcription. FEBS J. 282, 3275–3285.

Niederberger, Etzold, Lidschreiber, et al. 2012. MC EMiNEM maps the interaction landscape of the Mediator. PLoS Comput. Biol. 8, e1002568.

Ohya, Sese, Yukawa, et al. 2005. High-dimensional and large-scale phenotyping of yeast mutants. Proc. Natl. Acad. Sci. U.S.A. 102, 19015–19020.

Petrenko, Chereji, R.V., McClean, M.N., et al. 2013. Noise and interlocking signaling pathways promote distinct transcription factor dynamics in response to different stresses. Mol. Biol. Cell, 24, 2045–2057.

Piano, Schetter, A.J., Morton, D.G., et al. 2002. Gene clustering based on RNAi phenotypes of ovary-enriched genes in C. elegans. Curr. Biol. 12, 1959–1964.

Rep, Krantz, Thevelein, J.M., et al. 2000. The transcriptional response of Saccharomyces cerevisiae to osmotic shock. Hot1p and Msn2p/Msn4p are required for the induction of subsets of high osmolarity glycerol pathway-dependent genes. J. Biol. Chem. 275, 8290–8300.

Sadeh, M.J., Moffa, and Spang. 2013. Considering unknown unknowns: Reconstruction of nonconfoundable causal relations in biological networks. J. Comput. Biol. 20, 920–932.

Siebourg-Polster, Mudrak, Emmenlauer, et al. 2015. NEMix: Single-cell nested effects models for probabilistic pathway stimulation. PLoS Comput. Biol. 11, e1004078.

Smyth, G.K. 2004. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. B. 3, 3.

Smyth, G.K. 2005. Limma: Linear models for microarray data, 397–420. In Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, New York.

Tresch and Markowetz. 2008. Structure learning in nested effects models. Stat. Appl. Genet. Mol. B. 7.

Vaske, C.J., House, Luu, et al. 2009. A factor graph nested effects model to identify networks from genetic perturbations. PLoS Comput. Biol. 5, e1000274.

Weiss, E.L. 2012. Mitotic exit and separation of mother and daughter cells. Genetics, 192, 1165–1202.

Wurzenberger and Gerlich, D.W. 2011. Phosphatases: Providing safe passage through mitotic exit. Nat. Rev. Mol. Cell Bio. 12, 469–482.

Ye, Elbing, and Hohmann. 2008. The pathway by which the yeast protein kinase Snf1p controls acquisition of sodium tolerance is different from that mediating glucose regulation. Microbiology, 154, 2814–2826.