Pathway Realizability in Chemical Networks

Abstract

The exploration of pathways and alternative pathways that have a specific function is of interest in numerous chemical contexts. A framework for specifying and searching for pathways has previously been developed, but a focus on which of the many pathway solutions are realizable, or can be made realizable, is missing. Realizable here means that there actually exists some sequencing of the reactions of the pathway that will execute the pathway. We present a method for analyzing the realizability of pathways based on the reachability question in Petri nets. For realizable pathways, our method also provides a certificate encoding an order of the reactions, which realizes the pathway. We present two extended notions of realizability of pathways, one of which is related to the concept of network catalysts. We exemplify our findings on the pentose phosphate pathway. Furthermore, we discuss the relevance of our concepts for elucidating the choices often implicitly made when depicting pathways. Lastly, we lay the foundation for the mathematical theory of realizability.

1. INTRODUCTION

Large Chemical Reaction Networks (CRNs) are fundamental to numerous scientific, industrial, and societal challenges. Applications include the analysis of metabolic networks and their regulation in health and biotechnology; optimization of chemical synthesis processes; modeling of molecular ion fragmentation in mass spectrometry; investigation of hypotheses concerning the origins of life; and environmental monitoring of pollutants. Subnetworks with specific properties, often referred to as pathways—such as synthetic routes to target molecules or metabolic subsystems—are of particular interest. Thus, the ability to define and identify pathways within a CRN is a central objective in chemical modeling, exploration, and design.

CRNs can be modeled as directed hypergraphs (Zeigarnik, 2000; Müller et al., 2022; Andersen et al., 2019, 2020), where vertices represent molecules and directed hyperedges represent reactions. By considering pathways in CRNs as sets of reactions with integer multiplicities, Andersen et al. (2019) formally defined pathways as integer hyperflows in hypergraphs. The integer hyperflow model for pathways is analogous to flux balance analysis (FBA), another method for pathway discovery. Both approaches enforce mass conservation and typically employ linear constraints to identify pathways. However, they differ in several respects; see Andersen et al. (2019) for a detailed discussion. Notably, FBA yields flux distributions, whereas integer hyperflows provide pathways as sets of reactions with integer stoichiometric coefficients, facilitating a mechanistic understanding of the pathway. Additionally, Andersen et al. (2019) introduced the concept of a chemical transformation motif in a CRN, offering a flexible framework for querying reaction networks for pathways. A chemical transformation motif specifies a pathway by prescribing the input and output compounds, allowing intermediate products that must be consumed entirely. Computationally, finding and enumerating pathways that fulfill a chemical transformation motif can be addressed via Integer Linear Programming (ILP) (Andersen et al., 2019). Although ILP is NP-hard in general and even in the restricted context of finding integer hyperflows in CRNs (Andersen et al., 2012), current ILP solvers perform well for many practically relevant networks and pathways (Andersen et al., 2019).

While integer hyperflows specify reactions and their multiplicities, they do not determine the sequence in which these reactions occur to achieve the overall chemical transformation. Indeed, a valid sequencing may not exist. Figure 6 illustrates such a scenario: No ordering of reactions e₁ and e₂ in the integer hyperflow renders it executable—essentially, molecules C or D must be present prior to their production. We introduce the term realizable for integer hyperflows where the corresponding chemical transformation can be executed by some sequence of constituent reactions. To address this, we develop a framework that converts integer hyperflows into corresponding Petri nets, enabling the application of Petri net methodologies to express and determine the realizability of integer hyperflows. Petri nets have been extensively employed to model various aspects of metabolic networks (Baldan et al., 2010).

For realizable integer hyperflows, we introduce the concept of a realizability certificate, which specifies an execution order for the reactions along the pathway. Determining an explicit sequence not only enhances mechanistic understanding but is also essential for studies where individual atom identities are important, such as computing atom traces (Andersen et al., 2014). We also explore methods to extend non-realizable integer hyperflows into realizable ones. One approach involves scaling the integer hyperflow, while another entails borrowing additional molecules that are subsequently returned. This latter method is closely related to the concept of a “network catalyst” (see, e.g., Braakman and Smith, 2013; Morowitz et al., 2008). An algorithmic approach to deciding realizability through borrowing thus serves as a crucial foundation for future computational treatments of higher-level chemical motifs like autocatalysis and hypercycles (Eigen, 1971; Eigen and Schuster, 1977; Szathmáry, 1988, 2013). Finally, we apply our methodology to the non-oxidative phase of the pentose phosphate pathway (PPP) to demonstrate its utility and to explore potential catalysts within the network. The PPP is a well-known example that underscores the importance of simplicity in solution finding (Noor et al., 2010; Meléndez-Hevia and Isidoro, 1985).

The primary focus of our article is the formal definition and exploration of the realizability of pathways. It is noteworthy that conventional representations of pathways in the life sciences literature often reside between the two extremes of integer hyperflows and realizability certificates. We believe that our formalization of these concepts can raise awareness of the implicit choices made when depicting pathways. This perspective is further elaborated in Section 5.

The remainder of this article is organized as follows. Section 2 presents the notation and definitions for directed hypergraphs, integer hyperflows, and Petri nets, with terminology following Andersen et al. (2019). Section 3 defines the realizability problem, outlines our method for converting integer hyperflows into Petri nets, and introduces realizability certificates. In Section 4, we investigate methods for rendering non-realizable integer hyperflows realizable, either by scaling the hyperflow or by borrowing molecules. Section 5 discusses the implications of integer hyperflows and realizability certificates in pathway depiction. Section 6 examines the mathematical properties of pathway realizability.

2. PRELIMINARIES

2.1. Chemical reaction networks and pathways

In this article, we follow Andersen et al. (2019) and model CRNs as directed hypergraphs. A directed hypergraph $H$ = (V, E) has a set V of vertices representing the molecules. Reactions are represented as directed hyperedges E, where each edge e = (e⁺, e⁻) is an ordered pair of multisets of vertices, that is, e⁺, e⁻⊆ V.¹ We call e⁺ the tail of the edge e, and e⁻ the head. In the interest of conciseness, we will refer to directed hypergraphs simply as hypergraphs, directed hyperedges simply as edges, and CRNs as networks. For a multiset Q and an element q, we use m_q(Q) to denote its multiplicity, that is, the number of occurrences of q in Q. When denoting multisets, we use the notation {{…}}, for example, Q ={{a, a, b}} is a multiset with m_a(Q) = 2 and m_b(Q) = 1. For a vertex v ∈ V and a set of edges A, we use $δ_{A}^{+} (v)$ and $δ_{A}^{-} (v)$ to denote, respectively, the set of out-edges and in-edges of v contained in A, that is, the edges in A that have v in their tail and v in their head, respectively. We note that hypergraph modeling is equivalent to the more common modeling via a bipartite species-reactions graph (Fagerberg et al., 2013). Figure 1 shows a directed hypergraph in (a) and its equivalent bipartite graph in (b). The hypergraph modeling can be said to provide a slightly stronger distinction between molecules and reactions, and it forms the basis of the definition of hyperflows in Andersen et al. (2019), on which we build.

FIG. 1.

A directed hypergraph in (a) and the corresponding bipartite graph in (b).

To model pathways, Andersen et al. (2019) define the extended hypergraph. Given a hypergraph $H = (V, E)$, the extended hypergraph is $\bar{H} = (V, \bar{E})$ with $\bar{E} = E \cup E^{-} \cup E^{+}$, where $E^{-} = \{ e_{v}^{-} = (\emptyset, \{\{v\}\}) \mid v \in V \}, \quad E^{+} = \{ e_{v}^{+} = (\{\{v\}\}, \emptyset) \mid v \in V \}$ (1)

The hypergraph $\bar{H}$ has additional “half-edges” $e_{v}^{-}$ and $e_{v}^{+}$ for each v ∈ V. These explicitly represent potential input and output channels to and from $H$, that is, what are called exchange reactions in metabolic networks. An example of an extended hypergraph is shown in Figure 2.

FIG. 2.

Example of an extended hypergraph. It has vertices {A, B, C, D}, edges {e₁, e₂, e₃, e₄}, and a half-edge to and from each vertex. An edge e is represented by a box with arrows to (from) each element in e⁻ (e⁺).

An integer hyperflow is an integer-valued function f on the extended network, $f : \bar{E} \to ℕ_{0}$ , which satisfies the following flow conservation constraint on each vertex v ∈ V: $\sum_{e \in δ_{\bar{E}}^{+} (v)} m_{v} (e^{+}) f (e) - \sum_{e \in δ_{\bar{E}}^{-} (v)} m_{v} (e^{-}) f (e) = 0$ (2)

Note in particular that $f (e_{v}^{-})$ is the input flow for vertex v and $f (e_{v}^{+})$ is its output flow. We will for the remainder of the article refer to integer hyperflows simply as flows. An example of a flow is shown in Figure 3.

FIG. 3.

Example flow f on the extended hypergraph from Figure 2. Vertex D has been omitted as it has no in- or out-flow. Edges leaving or entering D have also been omitted as they have no flow. The flow on an edge is represented by an integer. For example, the half edge into B has flow f( $e_{B}^{-}$ ) = 2, the half edge leaving B has flow f( $e_{B}^{+}$ ) = 1, and edge e₁ has flow f(e₁) = 2.
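The flow conservation constraint of Eq. (2) can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the toy network (one reaction A + B → 2 C), the names `extend` and `is_flow`, and the encoding of half-edges as `("in", v)` and `("out", v)` are hypothetical and do not come from the article or from any existing tool.

```python
from collections import Counter

# Hypothetical toy network (not from the article): e1 turns A + B into 2 C.
# Each edge is a pair (tail multiset e+, head multiset e-), as in Section 2.1.
VERTICES = ["A", "B", "C"]
EDGES = {"e1": (Counter({"A": 1, "B": 1}), Counter({"C": 2}))}


def extend(vertices, edges):
    """Add an input half-edge (empty, {{v}}) and an output half-edge ({{v}}, empty) per vertex."""
    ext = dict(edges)
    for v in vertices:
        ext[("in", v)] = (Counter(), Counter({v: 1}))
        ext[("out", v)] = (Counter({v: 1}), Counter())
    return ext


def is_flow(vertices, ext_edges, f):
    """Check Eq. (2): at every vertex, weighted out-flow equals weighted in-flow."""
    if any(value < 0 for value in f.values()):
        return False
    for v in vertices:
        # Edges that do not touch v contribute 0 because Counter returns 0 for missing keys.
        consumed = sum(tail[v] * f.get(e, 0) for e, (tail, _) in ext_edges.items())
        produced = sum(head[v] * f.get(e, 0) for e, (_, head) in ext_edges.items())
        if consumed != produced:
            return False
    return True


EXT = extend(VERTICES, EDGES)
balanced = {("in", "A"): 1, ("in", "B"): 1, "e1": 1, ("out", "C"): 2}
unbalanced = {("in", "A"): 1, ("in", "B"): 1, "e1": 1, ("out", "C"): 1}
print(is_flow(VERTICES, EXT, balanced), is_flow(VERTICES, EXT, unbalanced))
```

In the second assignment one molecule of C is produced but never leaves the network, so the constraint fails at C.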

2.2. Petri nets

Petri nets are an alternative method to analyze CRNs. Each molecular species in the network forms a place in the Petri net and each reaction corresponds to a transition (Koch, 2010; Reddy et al., 1993, 1996). The stoichiometric matrix commonly used in chemistry has an equivalent in Petri net terminology, called the incidence matrix (Koch, 2010). In Section 3, we will describe a transformation of a flow to a Petri net. The following notation for Petri nets (with the exception of arc weights) follows Esparza (1998).

A net is a triple (P, T, W) with a set of places P, a set of transitions T, and an arc weight function W : (P × T) ∪ (T × P) → ℕ₀. A marking on a net is a function M : P → ℕ₀ assigning a number of tokens to each place. With M_∅ we denote the empty marking, that is, M_∅(p) = 0, ∀p ∈ P. A Petri net is a pair (N, M₀) of a net N and an initial marking M₀. For all x ∈ P ∪ T, we define the pre-set as ^•x = {y ∈ P ∪ T ∣ W(y, x) > 0} and the post-set as x^• = {y ∈ P ∪ T ∣ W(x, y) > 0}. We say that a transition t is enabled by the marking M if W(p, t) ≤ M(p), ∀p ∈ P. When a transition t is enabled it can fire, resulting in a marking $M'$ where $M'$(p) = M(p) − W(p, t) + W(t, p), ∀p ∈ P. Such a firing is denoted by $M \overset{t}{\to} M'$. A firing sequence σ is a sequence of firing transitions σ = t₁t₂…t_n. Such a firing sequence gives rise to a sequence of markings $M_{0} \overset{t_{1}}{\to} M_{1} \overset{t_{2}}{\to} M_{2} \overset{t_{3}}{\to} \dots \overset{t_{n}}{\to} M_{n}$, which is denoted by $M_{0} \overset{σ}{\to} M_{n}$. In Figure 4, we present an example of a firing sequence, which in this instance is the sequence σ = t₁ t₂ t₃.

FIG. 4.

Example firing sequence. Here P = {p₁, p₂, p₃, p₄, p₅}, T = {t₁, t₂, t₃}, W = {(p₁, t₁) ↦1, (p₂, t₁) ↦ 1, (t₁, p₃) ↦1, (p₃, t₂) ↦ 1, (t₂, p₄) ↦ 1, (p₄, t₃) ↦ 1, (t₃, p₅) ↦ 1, (t₃, p₁) ↦ 1}, and the initial marking M₀ = {p₁↦1, p₂↦1, p₃↦0, p₄↦0, p₅↦0} which is depicted in (a). The firing sequence that leads to (d) is σ = t₁ t₂ t₃, which is illustrated through (a) to (d).
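The enabling and firing rules can be replayed on the net of Figure 4. The sketch below uses the places, arc weights, and initial marking exactly as listed in the caption; the function names `enabled`, `fire`, and `run` are illustrative choices, not notation from the article.

```python
# Places, arc weights W, and initial marking M0 as given in the caption of Figure 4.
PLACES = ["p1", "p2", "p3", "p4", "p5"]
W = {("p1", "t1"): 1, ("p2", "t1"): 1, ("t1", "p3"): 1, ("p3", "t2"): 1,
     ("t2", "p4"): 1, ("p4", "t3"): 1, ("t3", "p5"): 1, ("t3", "p1"): 1}
M0 = {"p1": 1, "p2": 1, "p3": 0, "p4": 0, "p5": 0}


def enabled(marking, t):
    """t is enabled if W(p, t) <= M(p) for every place p."""
    return all(W.get((p, t), 0) <= marking[p] for p in PLACES)


def fire(marking, t):
    """Return M' with M'(p) = M(p) - W(p, t) + W(t, p)."""
    if not enabled(marking, t):
        raise ValueError(f"transition {t} is not enabled")
    return {p: marking[p] - W.get((p, t), 0) + W.get((t, p), 0) for p in PLACES}


def run(marking, sigma):
    """Fire the transitions of sigma in order and return the final marking."""
    for t in sigma:
        marking = fire(marking, t)
    return marking


final = run(M0, ["t1", "t2", "t3"])
print(final)  # {'p1': 1, 'p2': 0, 'p3': 0, 'p4': 0, 'p5': 1}
```

After σ = t₁ t₂ t₃ the token on p₁ has been restored and one token resides on p₅, while p₂ has been consumed.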

3. REALIZABILITY OF FLOWS

Andersen et al. (2019) described a method (summarized in Section 2.1) to specify pathways in CRNs and then proceeded to use ILP to enumerate pathway solutions fulfilling the specification. In this article, we focus on assessing the realizability of such a pathway solution and on determining a specific order of reactions that proves its realizability. To this end, we map flows into Petri nets and rephrase the question of realizability as a particular reachability question in the resulting Petri net.

3.1. Flows as Petri nets

We convert a hypergraph $H$ = (V, E) to a net N = (P, T, W) by using the vertices V as the places P and the edges E as the transitions T, and by defining the weight function from the incidence information as follows: for each vertex/place v ∈ V and edge/transition $e = (e^{+}, e^{-}) \in E$ let $W (v, e) = m_{v} (e^{+})$ and W(e, v) = m_v(e⁻). This conversion also works for extended hypergraphs, where the half-edges result in transitions with either an empty pre-set or post-set. The transitions corresponding to input reactions are thus always enabled.

Given a flow, we would like to constrain the Petri net such that it yields only firing sequences for that particular flow. We therefore further convert the extended hypergraph $\bar{H}$ into an extended net $(V \cup V_{E} \cup V_{T}, \bar{E}, W \cup W_{E})$ by adding for each edge $e \in \bar{E}$ an “external place” $v_{e} \in V_{E}$ with connectivity W (v_e, e) = 1 and for each edge $e^{+} \in E^{+}$ adding a “target place” v_e⁺ ∈ V_T with connectivity $W (e^{+}, v_{e^{+}}) = 1$ . In the following, we will denote the extended Petri net again by N. We then proceed by translating the given flow f of $\bar{H}$ into an initial marking M₀ on the extended net. To this end, we set M₀(v) = 0 for v ∈ V ∪ V_T and M₀ (v_e) = f (e) for places $v_{e} \in V_{E}$ . Additionally, we set the target marking denoted by M_T to M_T (v) = 0 for v ∈ V ∪ V_E and $M_{T} (v_{e^{+}}) = f (e^{+})$ for places $v_{e^{+}} \in V_{T} .$

Transitions in (N, M₀) therefore can fire at most the number of times specified by the flow. Furthermore, any firing sequence $M_{0} \overset{σ}{\to} M_{T}$ ending in the target marking must use each transition exactly the number of times specified by the flow. As an example, the flow in Figure 3 is converted to the Petri net in Figure 5.

FIG. 5.

The flow from Figure 3 converted to a Petri net with its initial marking. Places are circles, transitions are rectangles, and tokens are black dots. Arrows indicate pairs of places and transitions for which the weight function W is non-zero (in this example, all non-zero weights are equal to one). The target marking is M_T(A_T) = 1, M_T(B_T) = 1, M_T(C_T) = 1 and M_T(p) = 0 for all p ∈ P ∖{A_T, B_T, C_T}. We have omitted the part of the net that corresponds to the omitted part of Figure 3.
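The construction of Section 3.1 can be written down directly. The following Python sketch is one possible encoding, not the authors' implementation: the toy network, the function name `flow_to_petri_net`, and the place labels `("ext", e)` and `("tgt", v)` for external and target places are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy network and flow (not from the article): e1 turns A + B into 2 C.
VERTICES = ["A", "B", "C"]
EDGES = {"e1": (Counter({"A": 1, "B": 1}), Counter({"C": 2}))}
FLOW = {("in", "A"): 1, ("in", "B"): 1, "e1": 1, ("out", "C"): 2}


def flow_to_petri_net(vertices, edges, f):
    """Return (W, M0, MT): arc weights, initial marking, and target marking."""
    ext = dict(edges)
    for v in vertices:
        ext[("in", v)] = (Counter(), Counter({v: 1}))    # input half-edge
        ext[("out", v)] = (Counter({v: 1}), Counter())   # output half-edge
    W = {}
    M0 = {v: 0 for v in vertices}
    MT = {v: 0 for v in vertices}
    for e, (tail, head) in ext.items():
        for v, m in tail.items():
            W[(v, e)] = m                 # W(v, e) = m_v(e+)
        for v, m in head.items():
            W[(e, v)] = m                 # W(e, v) = m_v(e-)
        W[(("ext", e), e)] = 1            # external place v_e feeds transition e
        M0[("ext", e)] = f.get(e, 0)      # e may fire at most f(e) times
        MT[("ext", e)] = 0
        if isinstance(e, tuple) and e[0] == "out":
            W[(e, ("tgt", e[1]))] = 1     # target place counts the output
            M0[("tgt", e[1])] = 0
            MT[("tgt", e[1])] = f.get(e, 0)
    return W, M0, MT


W, M0, MT = flow_to_petri_net(VERTICES, EDGES, FLOW)
print(M0[("ext", "e1")], MT[("tgt", "C")], W[("e1", "C")])
```

Every transition has exactly one external place in its pre-set, which is what bounds the number of firings by the flow.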

3.2. Realizability of flows

We are interested in whether a given pathway, represented by a flow f on an extended hypergraph $\bar{H} = (V, \bar{E})$ , is realizable in the following sense: Given the input molecules specified by the input flow, is there a sequence of reactions that respects the flow, which in the end produces the output molecules specified by the output flow? In the light of the construction of (N, M₀) from $(\bar{H}, f)$ , this question translates into a reachability problem on a Petri net.

Definition 3.1. A flow f on $\bar{H}$ is realizable if there is a firing sequence $M_{0} \overset{*}{\to} M_{T}$ on the Petri net (N, M₀) constructed from $(\bar{H}, f)$ .

Figure 6 shows that not all flows f on $\bar{H}$ are realizable. In this example, it is impossible to realize the flow as long as there is no flow entering either C or D. For the flow in Figure 3, on the other hand, such a firing sequence exists. The firing sequences corresponding to a realizable flow are not unique in general. For instance, the Petri net in Figure 5, constructed from the flow in Figure 3, can reach the target marking M_T in essentially two different ways. Modulo the firing of input/output transitions, the two firing subsequences are e₁e₁e₂ and e₁e₂e₁. For a chemical example of a realizable flow, see Figure 7, which shows a flow for the formose reaction.

FIG. 6.

Example of a flow that is not realizable. Observe that the flow is indeed viable, as it fulfils the flow conservation constraint. However, there is no input flow to either C or D, and therefore in the corresponding Petri net it is not possible to fire either e₁ or e₂, which is necessary for the flow to be realized. If C or D were borrowed, the resulting flow with this borrowing would be realizable.

FIG. 7.

An example of a flow for the formose reaction which is realizable. The input compound Formald is marked with green and Glycoald which is both an input and output compound is marked with turquoise.
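For small flows, Definition 3.1 can be tested by exhaustive search over markings. The sketch below is a naive depth-first search, not the algorithm used by LoLA; the function names `build`, `freeze`, and `find_firing_sequence`, as well as both toy networks, are hypothetical. The search terminates because every firing removes a token from an external place.

```python
from collections import Counter


def build(vertices, edges, f):
    """Extended net of Section 3.1 as (pre, post, M0, MT), all given as Counters."""
    ext = dict(edges)
    for v in vertices:
        ext[("in", v)] = (Counter(), Counter({v: 1}))
        ext[("out", v)] = (Counter({v: 1}), Counter())
    pre, post, M0, MT = {}, {}, Counter(), Counter()
    for e, (tail, head) in ext.items():
        pre[e] = Counter(tail) + Counter({("ext", e): 1})  # external place v_e
        post[e] = Counter(head)
        M0[("ext", e)] = f.get(e, 0)
        if isinstance(e, tuple) and e[0] == "out":
            post[e][("tgt", e[1])] = 1                     # target place
            MT[("tgt", e[1])] = f.get(e, 0)
    return pre, post, M0, MT


def freeze(marking):
    """Hashable form of a marking that ignores places without tokens."""
    return frozenset((p, n) for p, n in marking.items() if n > 0)


def find_firing_sequence(pre, post, M0, MT):
    """Return a firing sequence from M0 to MT, or None if MT is unreachable."""
    target, seen = freeze(MT), set()

    def dfs(marking, seq):
        state = freeze(marking)
        if state == target:
            return seq
        if state in seen:
            return None
        seen.add(state)
        for t in pre:
            if all(marking[p] >= n for p, n in pre[t].items()):
                nxt = Counter(marking)
                nxt.subtract(pre[t])
                nxt.update(post[t])
                found = dfs(nxt, seq + [t])
                if found is not None:
                    return found
        return None

    return dfs(Counter(M0), [])


# Hypothetical realizable flow: A + B -> 2 C.
ok = find_firing_sequence(*build(
    ["A", "B", "C"],
    {"e1": (Counter({"A": 1, "B": 1}), Counter({"C": 2}))},
    {("in", "A"): 1, ("in", "B"): 1, "e1": 1, ("out", "C"): 2}))

# Hypothetical non-realizable flow: C -> D and D -> C with no input at all.
stuck = find_firing_sequence(*build(
    ["C", "D"],
    {"e1": (Counter({"C": 1}), Counter({"D": 1})),
     "e2": (Counter({"D": 1}), Counter({"C": 1}))},
    {"e1": 1, "e2": 1}))
print(ok, stuck)
```

The second example satisfies flow conservation but has no enabled transition in the initial marking, so the search returns `None`.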

3.3. Realizability certificate

In order to introduce realizability certificates that describe the causal order of the reactions needed to make the pathway realizable, we need some established terminology.

Definition 3.2 (Occurrence Net [Goltz and Reisig, 1983]). A net K = (P_K, T_K, F_K) with F_K ⊆ (P_K × T_K) ∪ (T_K × P_K) is an occurrence net iff

1. $\forall x, y \in P_{K} \cup T_{K} {x F}_{K}^{+} y \Rightarrow \neg ({yF}_{K}^{+} x)$ ( $F_{K}^{+}$ denoting the transitive closure of $F_{K}$ );

2. $\forall p \in P_{K} |^{•} p | \leq 1 \land | p^{•} | \leq 1$ .

“Occurrence net” is also defined in Genrich and Stankiewicz-Wiechno (1980) and Best and Merceron (1982), but is used with a different meaning in other sources [see, e.g., Nielsen et al. (1981)].

Definition 3.3 (Process [Goltz and Reisig, 1983] (adapted)). Let N = (P_N, T_N, W_N, M₀) be a Petri net and M a reachable marking in N. A process is a pair (K, q) of an occurrence net K = (P_K, T_K, F_K) and a mapping q : K → N which starts in M and satisfies the following properties:

1. q(P_K) ⊆ P_N and q(T_K) ⊆ T_N;

2. If C := {x ∈ P_K ∣ ^•x = ∅} then M(p) = ∣ q⁻¹ (p) ∩ C ∣ for all p ∈ P_N;

3. W_N (p, q(t)) = ∣ q⁻¹(p) ∩ ^•t ∣ and W_N (q(t), p) = ∣ q⁻¹(p) ∩ t^• ∣ for all t ∈ T_K and p ∈ P_N.

A process is thus an occurrence net that maps back to a Petri net, such that it respects the transitions, places and weight function of the Petri net. Furthermore, the process starts at the marking M in the net.

Definition 3.4. A realizability certificate for $(\bar{H}, f)$ is a process for the Petri net (N, M₀) constructed from $(\bar{H}, f)$ that leads from the initial marking M₀ to the target marking M_T.

A realizability certificate exists if and only if the target marking M_T is reachable from the initial marking M₀ (Goltz and Reisig, 1983, Theorem 3.6).

A realizability certificate can be constructed from the initial marking using an algorithm exemplified in Goltz and Reisig (1983). Furthermore, given a Petri net with its initial marking and a target marking, the Petri net tool A Low Level Analyzer (LoLA) (Schmidt, 2000) can compute a so-called witness path, which is an object isomorphic to a realizability certificate, or report that the target marking is unreachable and hence that no realizability certificate exists. Reachability in general Petri nets is decidable but computationally hard (Mayr, 1981; Reutenauer, 1990). However, in practical cases, LoLA performs well; in particular, in our use cases, it normally finishes in less than 10 minutes. In this article, we used LoLA to produce the underlying certificate for our figures. For an example of a realizability certificate, see Figure 8, which is a certificate for the flow in Figure 3. For a more chemical example, see Figure 9, which is a certificate for the flow in Figure 7. To draw realizability certificates more concisely, we have omitted $q^{-1}(v)$ for all $v \in V_{E} \cup V_{T}$, that is, the places in the occurrence net that correspond to the external or target places in the Petri net, as well as $q^{-1}(v_{e}, e)$ for all $v_{e} \in V_{E}$ and $q^{-1}(e^{+}, v_{e^{+}})$ for all $v_{e^{+}} \in V_{T}$, that is, the arcs leaving the external places or entering the target places in the Petri net. We have also omitted transitions whose corresponding edges have no flow and places corresponding to vertices with neither in- nor out-flow.

FIG. 8.

A realizability certificate for the flow in Figure 3.

FIG. 9.

A realizability certificate for the flow in Figure 7. The input compounds are marked with green and the output compounds are marked with blue.

A realizability certificate is a directed acyclic graph (DAG) by Def. 3.2 (1); hence it has a topological sorting (Cormen et al., 2009), that is, a linear ordering of the vertices such that for every edge (u, v), u comes before v in the ordering. Such a topological sorting of a realizability certificate produces one possible firing sequence of its transitions, which realizes the flow.
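Extracting a firing sequence from a certificate amounts to a topological sort. The following sketch uses Python's standard `graphlib` module on a hand-written, hypothetical occurrence net (it is not one of the certificates shown in the figures); the `name#n` labels for individual occurrences of places and transitions are an ad hoc convention.

```python
from graphlib import TopologicalSorter

# Hypothetical certificate, written as node -> set of direct predecessors.
# Transition e1 consumes one A and produces one B and one D; e2 and e3 then
# consume B and D independently, so their relative order is not fixed.
CERTIFICATE = {
    "A#1": {"in_A#1"},
    "e1#1": {"A#1"},
    "B#1": {"e1#1"},
    "D#1": {"e1#1"},
    "e2#1": {"B#1"},
    "e3#1": {"D#1"},
    "C#1": {"e2#1"},
    "E#1": {"e3#1"},
    "out_C#1": {"C#1"},
    "out_E#1": {"E#1"},
}
TRANSITIONS = {"in_A#1", "e1#1", "e2#1", "e3#1", "out_C#1", "out_E#1"}


def firing_sequence(certificate, transitions):
    """Topologically sort the certificate and keep only the transition occurrences."""
    order = TopologicalSorter(certificate).static_order()
    return [node.split("#")[0] for node in order if node in transitions]


sigma = firing_sequence(CERTIFICATE, TRANSITIONS)
print(sigma)
```

Any topological order is acceptable; here both orders of e2 and e3 yield valid firing sequences, which reflects that a certificate only fixes the causal, not the temporal, order.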

Finally, we note that a realizability certificate is formulated in the Petri net literature such that it gives an individual token interpretation, where individual tokens are distinguishable (van Glabbeek, 2005). Such a property is an advantage (actually, a necessity) if one is to do atom tracing (Andersen et al., 2014) of stable isotope atoms through the pathway.

4. EXTENDED REALIZABILITY

We have demonstrated above that flows may not be realizable. In this section, we study various means by which non-realizable flows may be made realizable.

Definition 4.1 (Scaled-Realizable). A flow f on an extended hypergraph $\bar{H} = (V, \bar{E})$ is scaled-realizable, if there exists an integer k ≥ 1 such that the resulting flow k ⋅ f is realizable.

Asking whether a flow f is scaled-realizable corresponds to asking whether k copies of f can be realized concurrently. This is of interest because, in practice, a pathway typically does not run just once but many times. Therefore, even if the flow is not realizable, it is meaningful to consider whether the scaled flow is. Figure 10 shows an example of a flow that is not realizable itself but is scaled-realizable by a factor of 2. The flow represents an alternative formose reaction. That this flow is indeed scaled-realizable is shown by the realizability certificate in Figure 11.

FIG. 10.

An example of a flow for the formose reaction which is not realizable but is scaled-realizable by a factor of 2. The input compound Formald is marked with green and Glycoald, which is both an input and output compound, is marked with turquoise. The SMILES strings for all molecule identifiers are listed in the Appendix (Table A1).

FIG. 11.

A realizability certificate for the flow in Figure 10 when scaled by a factor of 2, making it scaled-realizable. The input compounds are marked with green and the output compounds are marked with blue. The SMILES strings for all molecule identifiers are listed in the Appendix (Table A1).

However, not all flows are scaled-realizable. A counterexample is the flow presented in Figure 6: No integer scaling can alleviate the fact that firing e₁ or e₂ requires C or D to be present at the outset. We note that Thm. 3 from Sec. 6 provides an easily checkable condition which, if true, implies that a flow is not scaled-realizable.
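Scaled-realizability can be explored by brute force for small flows. The Python sketch below is an illustration under stated assumptions: both networks are hypothetical, the names `realizable` and `min_scaling_factor` are invented here, and the checker relies on the observation (assumed here, not stated in the article) that input transitions can be fired first and output transitions last without loss of generality, so that only the order of the proper reactions has to be searched. A result of `None` only means that no factor up to the chosen bound works.

```python
from collections import Counter


def realizable(edges, inputs, counts):
    """Search for an order in which every edge e fires counts[e] times.

    Assumes the flow satisfies Eq. (2), so leftover tokens equal the output flow.
    """
    seen = set()

    def search(marking, remaining):
        if not any(remaining.values()):
            return True
        state = (frozenset(marking.items()), frozenset(remaining.items()))
        if state in seen:
            return False
        seen.add(state)
        for e, (tail, head) in edges.items():
            if remaining[e] > 0 and all(marking[v] >= n for v, n in tail.items()):
                nxt = Counter(marking)
                nxt.subtract(tail)
                nxt.update(head)
                if search(+nxt, {**remaining, e: remaining[e] - 1}):
                    return True
        return False

    return search(+Counter(inputs), {e: counts.get(e, 0) for e in edges})


def min_scaling_factor(edges, inputs, counts, max_k):
    """Smallest k <= max_k such that k * f is realizable, or None if none is found."""
    for k in range(1, max_k + 1):
        scaled_in = {v: k * n for v, n in inputs.items()}
        scaled_counts = {e: k * n for e, n in counts.items()}
        if realizable(edges, scaled_in, scaled_counts):
            return k
    return None


# Hypothetical flow: A is split into B and C, which merge back into 2 A.
SPLIT_MERGE = {"b": ({"A": 1}, {"B": 1}),
               "c": ({"A": 1}, {"C": 1}),
               "r": ({"B": 1, "C": 1}, {"A": 2})}
# Hypothetical cyclic flow without any input: C -> D and D -> C.
CYCLE = {"e1": ({"C": 1}, {"D": 1}), "e2": ({"D": 1}, {"C": 1})}

k_split = min_scaling_factor(SPLIT_MERGE, {"A": 1}, {"b": 1, "c": 1, "r": 1}, 4)
k_cycle = min_scaling_factor(CYCLE, {}, {"e1": 1, "e2": 1}, 4)
print(k_split, k_cycle)  # 2 None
```

With a single A, only one of b and c can fire and r never becomes enabled; with two copies of the flow, both can fire and r restores the two tokens on A.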

Definition 4.2 (Borrow-Realizable). Let f be a flow on an extended hypergraph $\bar{H} = (V, \bar{E})$ and let b be a function b : V → ℕ₀. Set $f' (e_{v}^{-}) = b (v) + f (e_{v}^{-})$ and $f' (e_{v}^{+}) = b (v) + f (e_{v}^{+})$ for all v ∈ V, and $f'$ (e) = f (e) for all e ∈ E. Then f is borrow-realizable if there exists a function b such that $f'$ is realizable.

We call b the borrowing function and say that $f'$ is the flow f where v ∈ V has been borrowed b(v) times. This models that molecules required for reactions in the pathway can be acquired from the environment (and returned afterwards). Formally, this is specified by an additional input and output flow b(v) for each species v. Furthermore, for a borrowing function b we define $|b| = \sum_{v \in V} b(v)$, that is, the total count of molecules borrowed. The idea of borrowing tokens in the corresponding Petri net setting was proposed by Desel (1998, Proposition 10), together with a theorem which implies that $f'$ is realizable for some b with sufficiently large ∣b∣. That is, every flow is in fact borrow-realizable.
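A smallest borrowing function can be found by enumeration when the network is small. The sketch below is a hypothetical illustration, not the method used in the article: the network (A + C → D, D → B + C), the names `realizable` and `min_borrowing`, and the search bound are assumptions. As in Definition 4.2, b(v) is added to both the input and the output flow of v; in the simplified checker, which assumes that inputs can fire first and outputs last, this amounts to adding b to the initial tokens.

```python
from collections import Counter
from itertools import combinations_with_replacement


def realizable(edges, inputs, counts):
    """Search for an order in which every edge e fires counts[e] times.

    Assumes flow conservation, so leftover tokens equal the output flow.
    """
    seen = set()

    def search(marking, remaining):
        if not any(remaining.values()):
            return True
        state = (frozenset(marking.items()), frozenset(remaining.items()))
        if state in seen:
            return False
        seen.add(state)
        for e, (tail, head) in edges.items():
            if remaining[e] > 0 and all(marking[v] >= n for v, n in tail.items()):
                nxt = Counter(marking)
                nxt.subtract(tail)
                nxt.update(head)
                if search(+nxt, {**remaining, e: remaining[e] - 1}):
                    return True
        return False

    return search(+Counter(inputs), {e: counts.get(e, 0) for e in edges})


def min_borrowing(edges, inputs, counts, species, max_total):
    """Return a borrowing function b with minimal |b| <= max_total, or None."""
    for total in range(max_total + 1):
        for combo in combinations_with_replacement(sorted(species), total):
            b = Counter(combo)
            # f' has input flow f(e_v^-) + b(v); the same b(v) is returned as output.
            if realizable(edges, Counter(inputs) + b, counts):
                return dict(b)
    return None


# Hypothetical flow: e1 needs C, which is only produced after e1 has fired.
EDGES = {"e1": ({"A": 1, "C": 1}, {"D": 1}),
         "e2": ({"D": 1}, {"B": 1, "C": 1})}
b = min_borrowing(EDGES, {"A": 1}, {"e1": 1, "e2": 1}, ["A", "B", "C", "D"], 2)
print(b)  # {'C': 1}
```

Borrowing one D would work equally well in this toy example; the enumeration simply returns the first minimal solution in alphabetical order. The borrowed C is consumed by e1 and regenerated by e2, which is the behaviour of a network catalyst.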

The combinatorics underlying the non-oxidative phase of the PPP has been studied in a series of works focusing on simplifying principles that explain the structure of metabolic networks (see, e.g., Noor et al., 2010; Meléndez-Hevia and Isidoro, 1985). An example of a simple flow from the PPP that is not scaled-realizable is shown in Figure 12. Here, the production of glyceraldehyde (Glyald) is dependent on the presence of Hex-2-ulose (2Hex), which depends on fructose 1-phosphate (F1P), which in turn depends on Glyald. By Thm. 3, this cycle of dependencies implies that firing is impossible unless one of the molecules in this cycle is present at the outset, which cannot be achieved by scaling. As illustrated in Figure 13 and proven by the existence of the realizability certificate, the flow is borrow-realizable with just one borrowing, namely of the compound Glyald. Thus Glyald can be seen as a network catalyst for this pathway.

FIG. 12.

Example of a flow for the pentose phosphate pathway that is not scaled-realizable. The flow is borrow-realizable. The input compound is marked with green and the output compound is marked with blue. The SMILES strings for all molecule identifiers are listed in the Appendix (Table A1).

FIG. 13.

A realizability certificate for the flow in Figure 12 where the molecule Glyald is borrowed in order to make it borrow-realizable. The input compounds are marked with green, the output compounds are marked with blue, and the borrowed compound is marked with purple. The SMILES strings for all molecule identifiers are listed in the Appendix (Table A1).

5. REPRESENTATIONS OF PATHWAYS

We have described two ways of modeling pathways: flows and realizability certificates. The realizability certificate defines a causal order in the pathway and explicitly expresses which individual molecule is used when and for which reaction. A realizability certificate uniquely determines a corresponding flow. Flows, on the other hand, do not specify the order of the reactions or which one of multiple copies of a molecule is used in which reaction. A flow therefore may correspond to multiple different realizability certificates, each representing a different mechanism.

We want to point out that commonly used representations of pathways in the life science literature fall in between these two extremes; see Figure 14 for an example. In this example, the order of reactions is not fully resolved: for instance, is F6P produced before E4P or after? Indeed, some unspecified choice of borrowing is needed to set the pathway in motion. Additionally, the semantics of a molecule identifier appearing in several places is unclear: for instance, are the three appearances of G3P interchangeable in the associated reactions or do they signify different individual instances of the same type of molecule? In the former case, the figure corresponds to a much larger number of different realizability certificates than in the latter case. The answers to these questions have important consequences for investigations where the identity of individual atoms matters, such as atom tracing.

FIG. 14.

Example of a pathway drawing for the cyclic non-oxidative glycolysis (NOG) pathway. Recreated from Bogorad et al. (2013, Fig. 2a).

Furthermore, when there is a choice between different pathway suggestions, avoiding borrow-realizable pathways often gives simpler depictions. However, this introduces a bias among the possible pathways, which may be unwanted, as borrow-realizable solutions are usually equally simple in chemical terms. We note that the need for borrowing in pathways is usually not discussed in the literature. Additionally, there has been a lack of computational methods to systematically look for borrow-realizable pathways, even if they could equally likely form part of what happens in nature. For instance, the PPP is usually depicted in a form that gives rise to a realizable flow depicted in Figure 15, with a realizability certificate shown in Figure 16. It could just as well be described by the equally simple and chemically realistic borrow-realizable pathway depicted in Figure 12.

FIG. 15.

A flow for the pentose phosphate pathway, which is realizable. The input compound is marked with green and the output compound is marked with blue. The SMILES strings for all molecule identifiers are listed in the Appendix (Table A1).

FIG. 16.

A realizability certificate for the flow in Figure 15. The input compounds are marked with green and the output compounds are marked with blue. The SMILES strings for all molecule identifiers are listed in the Appendix (Table A1).

We believe that our focus on the realizability of pathways may help raise awareness of the choices one often subconsciously makes when creating pathway illustrations.

6. MATHEMATICAL PROPERTIES OF REALIZABILITY

In this section, we take the first steps towards a mathematical theory of the realizability of flows. We begin with a result on realizable flows and prove that if the König representation of the flow-induced subhypergraph of the extended hypergraph and flow f does not have any cycles, then f is realizable.

Definition 6.1 (Flow-induced Subhypergraph). The flow-induced subhypergraph of an extended hypergraph $\bar{H} = (V, \bar{E})$ and a flow f is the directed hypergraph $\bar{H}[f] = (V', E')$, with $\begin{array}{l} E' = \{ e \in E \mid f(e) \neq 0 \} \\ V' = \{ v \mid v \in e^{+} \lor v \in e^{-},\ e \in E,\ f(e) \neq 0 \} \end{array}$ (3)

Definition 6.2 (König Representation [Andersen et al., 2020]). The König representation of a directed hypergraph $H = (V, E)$ is the directed multigraph $K(H) = (V', E')$, where $V' = V \cup E$ and $\begin{array}{l} E' = \{\{ (v, e) \mid e = (e^{+}, e^{-}) \in E,\ v \in e^{+} \}\} \\ \quad \cup\ \{\{ (e, v) \mid e = (e^{+}, e^{-}) \in E,\ v \in e^{-} \}\} \end{array}$

In short, the König representation of a hypergraph arises simply by considering both the circles and boxes of its visualization (in the style of e.g., Fig. 2) as nodes and the arrows as edges.

Lemma 1. If $K (\bar{H} [f])$ has no cycles, then f is realizable.

Proof. Since $K (\bar{H} [f])$ is a DAG, it has a topological sort. Order the nodes of $H$ on a line according to this. Create nodes for input (output) “half-edges” making them full hyperedges and put these new nodes first (last) in the topological sort. Put the number of tokens specified by the flow on the new input nodes. Create a firing sequence by moving a sweepline across the topological sort and fire a hyperedge when the last node in its source (multi)set is passed. Fire it the number of times specified by the multiplicity of the edge. By the definition of a topological sort, the following holds for any node v:

(i) When the sweepline reaches v, v has received all its tokens in the flow.

(ii) Node v only needs to supply tokens after the sweepline has reached v.

(iii) If v is the last node in the sources of a hyperedge, the hyperedge can fire (i.e., there are still enough tokens on every node in its source).

Here (i) and (iii) are proven together by induction on the sweepline movements and (ii) is true by construction of the firing sequence. □
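The premise of Lemma 1 is easy to test computationally. The following sketch builds the König representation of the flow-induced subhypergraph and checks it for cycles with Python's standard `graphlib` module; the function name `koenig_is_acyclic` and the three example flows are hypothetical. Multiplicities are ignored because they do not affect the existence of cycles.

```python
from graphlib import CycleError, TopologicalSorter


def koenig_is_acyclic(edges, f):
    """True if the Koenig graph of the flow-induced subhypergraph has no cycle.

    edges maps an edge name to (tail vertices, head vertices); f maps names to flow.
    """
    preds = {}
    for e, (tail, head) in edges.items():
        if f.get(e, 0) == 0:
            continue  # Definition 6.1 keeps only edges with non-zero flow
        # Arcs (v, e) for v in the tail and (e, v) for v in the head.
        preds.setdefault(("edge", e), set()).update(("vertex", v) for v in tail)
        for v in head:
            preds.setdefault(("vertex", v), set()).add(("edge", e))
    try:
        tuple(TopologicalSorter(preds).static_order())
        return True
    except CycleError:
        return False


# Hypothetical linear pathway A -> B -> C.
CHAIN = {"e1": (["A"], ["B"]), "e2": (["B"], ["C"])}
# Hypothetical pathway in which C is consumed by e1 and regenerated by e2.
LOOP = {"e1": (["A", "C"], ["D"]), "e2": (["D"], ["B", "C"])}

print(koenig_is_acyclic(CHAIN, {"e1": 1, "e2": 1}))  # True: Lemma 1 applies
print(koenig_is_acyclic(LOOP, {"e1": 1, "e2": 1}))   # False: Lemma 1 is silent
print(koenig_is_acyclic(LOOP, {"e1": 1, "e2": 0}))   # True: e2 carries no flow
```

Lemma 1 gives a sufficient condition only: a `False` result does not imply that the flow is non-realizable, merely that realizability has to be decided by other means.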

There exist flows requiring arbitrary scaling factors:

Theorem 1. For any integer k > 1, there exists a flow which is not scaled-realizable for any integer i < k but is scaled-realizable for all integers i ≥ k.

Proof. One family of such flows is represented by Figure 17, which fulfils the statement for k = 4: This flow is not scaled-realizable for i < 4, as all of B, C, D and E need to hold a token for r to fire in the corresponding Petri net. Therefore, at least 4 tokens must be input to A. To prove that the flow is scaled-realizable for any integer i ≥ 4, input i tokens to A and output i − 4 of them from A, such that 4 tokens still reside on A. Fire the sequence bcder. There are now 4 tokens on A again. Repeat the firing of the sequence bcder until it has been fired i times, then output the remaining 4 tokens on A. Clearly, the construction of Figure 17 generalizes to any k > 1. If one would like to avoid the unbounded size of the hyperedge r in the family, a binary tree structure can be added on both sides of r (first merging k nodes into one, then expanding this into k nodes connected to A). □

FIG. 17. A flow which is not scaled-realizable for any integer k < 4 but is for any integer k ≥ 4.
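The statement of Thm. 1 can be checked by brute force for small parameters. The sketch below assumes a net whose shape is inferred from the proof text alone: a place A, places X_1, ..., X_k, one transition moving a token from A to each X_j, a transition r consuming one token from every X_j and producing k tokens on A, and input and output half-edges on A, all with flow 1. The function name `scaled_realizable` and this encoding are illustrative assumptions; the search is exponential and only meant for tiny instances.

```python
from functools import lru_cache


def scaled_realizable(k, i):
    """Exhaustively test whether the assumed k-family flow is realizable when scaled by i."""
    n_places = k + 1  # place 0 is A, places 1..k are X_1..X_k

    def vec(pairs):
        v = [0] * n_places
        for place, count in pairs:
            v[place] += count
        return tuple(v)

    # Each transition is (tokens consumed, tokens produced).
    transitions = [
        (vec([]), vec([(0, 1)])),  # input half-edge: puts a token on A
        (vec([(0, 1)]), vec([])),  # output half-edge: removes a token from A
    ]
    transitions += [(vec([(0, 1)]), vec([(j, 1)])) for j in range(1, k + 1)]
    transitions.append((vec([(j, 1) for j in range(1, k + 1)]), vec([(0, k)])))  # r

    @lru_cache(maxsize=None)
    def search(marking, remaining):
        # remaining[t] = how many firings of transition t are still owed.
        if not any(remaining):
            return True  # every transition fired exactly i times
        for t, (pre, post) in enumerate(transitions):
            if remaining[t] and all(m >= p for m, p in zip(marking, pre)):
                new_marking = tuple(m - p + q for m, p, q in zip(marking, pre, post))
                new_remaining = remaining[:t] + (remaining[t] - 1,) + remaining[t + 1:]
                if search(new_marking, new_remaining):
                    return True
        return False

    return search((0,) * n_places, (i,) * len(transitions))
```

Under these assumptions, `[scaled_realizable(3, i) for i in range(1, 5)]` evaluates to `[False, False, True, True]`, matching the threshold behaviour stated in the theorem for k = 3.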

There also exist flows not scaled-realizable for any factor:

Theorem 2. The flow in Figure 18 is not scaled-realizable for any integer k ≥ 1.

FIG. 18. Simple flow that is not scaled-realizable.

Proof. Assume that the flow is scaled-realizable for some factor s. In the firing sequence that realizes the flow, consider the point in time just before the k’th firing of r. Then at most s + (k − 1) tokens on A have been available. For r to fire the k’th time, at least 2k tokens on A must have been available (in order to make the necessary firings of r). Hence s + (k − 1) ≥ 2k ⇔ s − 1 ≥ k. Taking k = s gives s − 1 ≥ s, a contradiction, so s executions of r are not possible. □

We now give an easily checkable condition which if true implies that a flow is not scaled-realizable for any factor. In short, the condition is that at least one vertex of the flow cannot be reached during a graph traversal from its source set.

In more detail: Consider a directed hypergraph $H = (V, E)$. The set R(H, S) of vertices reachable from S in $H$ is the smallest set satisfying (i) S ⊆ R(H, S) and (ii) if e⁺ ⊆ R(H, S) for some e ∈ E, then e⁻ ⊆ R(H, S). It can be computed using the traversal procedure specified in Alg. 1. Setting $w = \max_{e \in E} |e^{+}|$, Alg. 1 runs in O(w |E|²) time, since checking the condition of the “while” loop requires O(w |E|) time, and in every iteration, F shrinks by one edge.
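A traversal of this kind can be sketched in a few lines of Python. This is an illustrative reading of the description above, not a transcription of Alg. 1: hyperedges are assumed to be given as (sources, targets) pairs of vertex collections, multisets are viewed as sets, and the function name `reachable` is hypothetical.

```python
def reachable(edges, sources):
    """Return the set of vertices reachable from `sources`.

    edges:   iterable of (e_plus, e_minus) pairs of vertex collections.
    sources: iterable of start vertices (the set S).
    """
    reached = set(sources)
    # F: the edges that have not been followed yet.
    remaining = [(set(e_plus), set(e_minus)) for e_plus, e_minus in edges]
    while True:
        # Look for an unused edge whose whole source set is already reached.
        usable = next((e for e in remaining if e[0] <= reached), None)
        if usable is None:
            break
        reached |= usable[1]      # all targets of the edge become reachable
        remaining.remove(usable)  # F shrinks by one edge per iteration
    return reached
```

Each pass scans at most |E| edges and compares at most w source vertices per edge, and there are at most |E| passes, in line with the O(w |E|²) bound stated above.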

Theorem 3. Given a flow f on an extended hypergraph $\bar{H} = (V, \bar{E})$, let $\bar{H}[f] = (V', E')$ be the flow-induced subhypergraph and let S be the source set such that $S = \{v \in V \mid f(e_{v}^{-}) \neq 0\}$ (4)

If there exists a vertex $v \in V'$ that is not returned by the traversal procedure on the graph $\bar{H} [f]$ and source set S, then the flow f is not scaled-realizable.

Proof. The flow-induced subhypergraph $\bar{H}[f]$ has as edges the internal edges of $\bar{H}$ on which there is flow, and as vertices the vertices of $\bar{H}$ that have either in- or outflow, without regard to the half-edges. The source set S contains the vertices of $\bar{H}$ with input flow according to f. The traversal specified in Alg. 1 corresponds to having an infinite amount of flow into the vertices in S and no restrictions on the number of times an edge can be followed. Therefore, if a vertex is not reachable from S by Alg. 1, it is also not reachable in the stricter case, where the search is restricted by the flow specification. Note that the omission of the edges on which there is no flow, as well as of the vertices for which all internal edges entering or leaving them have no flow, is crucial in order to let Alg. 1 mimic the operations of a scaled flow. Otherwise, there might be ways of visiting the vertices that would not be possible when considering only the paths represented by the flow specification. Moreover, observe that the omission of vertices which only have in- and outflow does not affect the result of the algorithm, as these would be trivially visited. □

We remark that Thm. 3 only provides a sufficient condition for determining non-scaled-realizable flows and not a necessary condition. This follows from the flow in Figure 18: During graph traversal, this flow will have all its vertices visited, but by Thm. 2 it is not scaled-realizable for any factor.

The property of being scaled-realizable is closed under addition of the scaling factors:

Theorem 4. If a flow f is scaled-realizable for an integer k and an integer l, then it is also scaled-realizable for k + l.

Proof. Create a realizability certificate for (k + l) ⋅ f as the disjoint union of the realizability certificate for $f' = k \cdot f$ and the realizability certificate for $f'' = l \cdot f$. □

The family of flows from the proof of Thm. 1 has the following interesting property.

Definition 6.3 (Monotone Scaled-Realizable). A flow f is monotone scaled-realizable iff it is scaled-realizable for all integers j ≥ k, where k is the smallest factor for which it is scaled-realizable.

A natural question now arises whether all scaled-realizable flows are also monotone scaled-realizable. We did a computer-based search for counter-examples, but found none.

In more detail, we generated several pseudo-random directed hypergraphs in which we found a large number of different flows using the software package MØD (Andersen et al., 2016, 2018), which can execute flow queries on hypergraphs via ILP (Andersen et al., 2019). We tested these flows for realizability and, among the flows not directly realizable, we looked at those which were scaled-realizable with a smallest scale factor k = 2 or k = 3. If the lowest factor was k = 2, we tested whether the flow was also scaled-realizable for factor j = 3. If the lowest factor was k = 3, we tested whether the flow was also scaled-realizable for the factors j with 3 < j ≤ 5. If so, Thm. 4 implies that the flow is monotone scaled-realizable, since every integer j ≥ 2 is a sum of 2s and 3s, and every integer j ≥ 3 is a sum of 3s, 4s, and 5s. If not, we would have found a counter-example. Among the 1688 scaled-realizable flows studied, we found them all to be monotone scaled-realizable.
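The arithmetic behind the use of Thm. 4 in this test can be made concrete with a short Python sketch. It computes which scaling factors up to a bound follow from a set of known factors by repeated addition. The function name `factors_covered` and the bound are illustrative assumptions, and the finite check illustrates the argument without replacing it.

```python
def factors_covered(known, limit):
    """Factors in 1..limit that are sums of (possibly repeated) known factors."""
    covered = [False] * (limit + 1)
    for j in range(1, limit + 1):
        # j is covered if it is known itself, or if j - a is covered
        # for some known factor a (closure under addition, Thm. 4).
        covered[j] = j in known or any(covered[j - a] for a in known if j - a >= 1)
    return {j for j in range(1, limit + 1) if covered[j]}
```

For example, `factors_covered({2, 3}, 50)` contains every integer from 2 to 50 and `factors_covered({3, 4, 5}, 50)` every integer from 3 to 50, whereas `factors_covered({3, 4}, 10)` misses 5, which is why the factor 5 had to be tested explicitly when the lowest factor was 3.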

We thus close this section with the following conjecture:

Conjecture 1. All scaled-realizable flows are monotone scaled-realizable.

7. CONCLUSION

We introduced here a concept of realizability of a pathway given as a flow by converting the flow to a Petri net. The question of realizability can then be rephrased as a question of reachability in the Petri net, leading to notions of realizable, scaled-realizable, and borrow-realizable flows. The method is essential if one is interested in finding alternative realizable pathways to those already known by chemists. Reachability in Petri nets and equivalent formal systems is an active field of research (see, e.g., Alaniz et al. (2022) and the references therein). Many of the relevant reachability problems $M \overset{*}{\to} M'$ are hard for arbitrary markings. It remains a relevant question for future work to see if restrictions imposed by chemistry, in particular conservation of mass, suffice to make the problems easier.

An interesting direction for future research is extending the framework to allow for atom tracing in CRNs. While current Petri net methods allow us to track individual tokens/molecules (van Glabbeek, 2005), full atom tracing requires enumerating all possible firing sequences, that is, all witness paths, which existing Petri net tools do not currently provide. On the other hand, atom-atom mapping, that is, how atoms rearrange during reactions, is already available through an existing graph transformation framework MØD (Andersen et al., 2016, 2018). Such a combination of witness path enumeration and atom-atom mapping is crucial for tracking isotopic labels and understanding reaction mechanisms and would significantly enhance the model’s applicability in systems chemistry, metabolic engineering, and synthetic biology.

Footnotes

AUTHORS’ CONTRIBUTIONS

J.L.A.: Conceptualization, methodology, software, writing—original draft, writing—review and editing, supervision. S.B.: Methodology, writing—original draft, writing—review and editing, visualization. R.F.: Conceptualization, methodology, writing—review and editing, supervision. C.F.: Conceptualization. D.M.: Conceptualization, methodology, writing—review and editing, supervision. P.F.S.: Conceptualization, writing—review and editing.

AUTHOR DISCLOSURE STATEMENT

The authors have no conflict of interest to declare.

FUNDING INFORMATION

This work is supported by the Novo Nordisk Foundation, grant NNF19OC0057834 and by the Independent Research Fund Denmark, Natural Sciences, grant DFF-0135-00420B.

Appendix

1. When comparing a multiset M and a set S, we view M as a set, i.e., M ⊆ S holds if every element in M is an element of S.

References

1. Alaniz, Fu, Gomez, et al. Reachability in restricted chemical reaction networks. Technical report, arXiv, 2022.

2. Andersen. MedØlDatschgerl (MØD). 2018. Available from: http://mod.imada.sdu.dk

3. Andersen, Banke, Fagerberg, et al. On the realisability of chemical pathways. In Bioinformatics Research and Applications. (Guo Xuan, Mangul Serghei, Patterson Murray, Zelikovsky Alexander, eds.) Springer Nature Singapore: Singapore; 2023, pp. 409–419. ISBN 978-981-99-7074-2.

4. Andersen, Flamm, Merkle, et al. 50 Shades of rule composition: From chemical reactions to higher levels of abstraction. In Formal Methods in Macro-Biology, volume 8738 of Lecture Notes in Computer Science. (Fages, Piazza, eds.) Springer International Publishing: Berlin; 2014, pp. 117–135. ISBN 978-3-319-10397-6.

5. Andersen, Flamm, Merkle, et al. A software package for chemically inspired graph transformation. In Graph Transformation: 9th International Conference, ICGT 2016, in Memory of Hartmut Ehrig, Held as Part of STAF 2016, Vienna, Austria, July 5-6, 2016, Proceedings. (Echahed Rachid, Minas Mark, eds.) Springer International Publishing: Cham; 2016, pp. 73–88. ISBN 978-3-319-40530-8; doi: 10.1007/978-3-319-40530-8_5

6. Andersen, Flamm, Merkle, et al. Chemical transformation motifs: modelling pathways as integer hyperflows. IEEE/ACM Trans Comput Biol Bioinform, 2019; 16(2):510–523; doi: 10.1109/TCBB.2017.2781724; ISSN 1545-5963.

7. Andersen, Flamm, Merkle, et al. Defining autocatalysis in chemical reaction networks. J Syst Chem, 2020; 8:121–133; ISSN 2571-7715. http://www.nls-publishers.com/shop/journal/journal+of+systems+chemistry+2020%2c+volume+8. TR: https://arxiv.org/abs/2107.03086

8. Andersen, Flamm, Merkle, et al. Maximizing output and recognizing autocatalysis in chemical reaction networks is NP-complete. J Syst Chem, 2012; 3(1).

9. Baldan, Cocco, Marin, et al. Petri nets for modelling metabolic pathways: A survey. Nat Comput, 2010; 9(4):955–989; doi: 10.1007/s11047-010-9180-6

10. Best, Merceron. Discreteness, k-density and d-continuity of occurrence nets. In Theoretical Computer Science. (Cremers Armin B., Kriegel Hans-Peter, eds.) Springer Berlin Heidelberg: Berlin, Heidelberg; 1982, pp. 73–83. ISBN 978-3-540-39421-1.

11. Bogorad, Lin T-S, Liao. Synthetic non-oxidative glycolysis enables complete carbon conservation. Nature (London), 2013; 502(7473):693–697.

12. Braakman, Smith. The compositional and evolutionary logic of metabolism. Phys Biol, 2013; 10(1):011001.

13. Cormen, Leiserson, Rivest, et al. Introduction to algorithms. 3rd Edition. The MIT Press: Cambridge, MA; 2009.

14. Desel. Basic linear algebraic techniques for place/transition nets. In Lectures on Petri Nets I: Basic Models. Springer; 1998, pp. 257–308.

15. Eigen, Schuster. The hypercycle: A principle of natural self-organization. Die Naturwissenschaften; 1977.

16. Eigen. Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften, 1971; 58(10):465–523.

17. Esparza. Decidability and complexity of Petri net problems: an introduction. In Lectures on Petri Nets I: Basic Models. Springer; 1998, pp. 374–428.

18. Fagerberg, Flamm, Merkle, et al. On the complexity of reconstructing chemical reaction networks. Math Comput Sci, 2013; 7(3):275–292.

19. Genrich, Stankiewicz-Wiechno. A dictionary of some basic notions of net theory. In Net Theory and Applications. (Brauer, ed.) Springer Berlin Heidelberg: Berlin, Heidelberg; 1980, pp. 519–531. ISBN 978-3-540-39322-1.

20. Goltz, Reisig. The non-sequential behaviour of Petri nets. Information and Control, 1983; 57(2–3):125–147; doi: 10.1016/S0019-9958(83)80040-0; ISSN 00199958.

21. Koch. Petri nets – a mathematical formalism to analyze chemical reaction networks. Mol Inform, 2010; 29(12):838–843; doi: 10.1002/minf.201000086; ISSN 1868-1751.

22. Mayr. Persistence of vector replacement systems is decidable. Acta Informatica, 1981; 15(3):309–318.

23. Meléndez-Hevia, Isidoro. The game of the pentose phosphate cycle. J Theor Biol, 1985; 117(2):251–263; doi: 10.1016/S0022-5193(85)80220-4; ISSN 0022-5193.

24. Morowitz, Copley, Smith. Core metabolism as a self-organized system, chapter 20. In Protocells. The MIT Press; 2008. ISBN 9780262182683; 0262182688.

25. Müller, Flamm, Stadler. What makes a reaction network “chemical”? J Cheminform, 2022; 14(1):63.

26. Nielsen, Plotkin, Winskel. Petri nets, event structures and domains, part I. Theoretical Computer Science, 1981; 13(1):85–108; doi: 10.1016/0304-3975(81)90112-2; ISSN 0304-3975. Special Issue Semantics of Concurrent Computation.

27. Noor, Eden, Milo, et al. Central carbon metabolism as a minimal biochemical walk between precursors for biomass and energy. Mol Cell, 2010; 39(5):809–820; doi: 10.1016/j.molcel.2010.08.031; ISSN 10972765.

28. Reddy, Liebman, Mavrovouniotis. Qualitative analysis of biochemical reaction systems. Comput Biol Med, 1996; 26(1):9–24.

29. Reddy, Mavrovouniotis, Liebman. Petri net representations in metabolic pathways. Proceedings. International Conference on Intelligent Systems for Molecular Biology. 1993; 328–336.

30. Reutenauer. The mathematics of Petri nets. Prentice-Hall Inc.: USA; 1990. ISBN 0135618878.

31. Schmidt. LoLA: a low level analyser. In Application and Theory of Petri Nets 2000, volume 1825 of Lecture Notes in Computer Science. (Nielsen, Simpson, eds.) Springer Berlin Heidelberg; 2000, pp. 465–474. ISBN 978-3-540-67693-5; doi: 10.1007/3-540-44988-4_27

32. Szathmáry. A hypercyclic illusion. J Theor Biol, 1988; 134(4):561–563.

33. Szathmáry. On the propagation of a conceptual error concerning hypercycles and cooperation. J Syst Chem, 2013; 4(1):1.

34. van Glabbeek. The individual and collective token interpretations of Petri nets. In CONCUR 2005 – Concurrency Theory. (Abadi, de Alfaro, eds.) Springer Berlin Heidelberg: Berlin, Heidelberg; 2005, pp. 323–337. ISBN 978-3-540-31934-4.

35. Zeigarnik. On hypercycles and hypercircuits in hypergraphs. Discrete Math. Chemistry, 2000; 51:377–383; doi: 10.1090/dimacs/051/28