Cayley Graphs of Semigroups Applied to Atom Tracking in Chemistry

Abstract

While atom tracking with isotope-labeled compounds is an essential and sophisticated wet-lab tool to, for example, illuminate reaction mechanisms, there exists only a limited number of formal methods to approach the problem. Specifically, when large (bio-)chemical networks with stereospecific reactions are considered, rigorous techniques are indispensable. We present an approach using the right Cayley graph of a monoid to track atoms concurrently through sequences of reactions and predict their potential location in product molecules. This can be used not only to systematically build hypotheses about, or reject, reaction mechanisms (we will use the ANRORC mechanism “Addition of the Nucleophile, Ring Opening, and Ring Closure” as an example) but also to infer naturally occurring subsystems of (bio-)chemical systems. Our results include the analysis of the carbon traces within the tricarboxylic acid cycle and the inference of subsystems based on projections of the right Cayley graph onto a set of relevant atoms.

1. Introduction

Traditionally, atom tracking is used in chemistry to understand the underlying reactions and interactions behind some chemical or biological system. In practice, atoms are usually tracked using isotopic labeling experiments. In a typical isotopic labeling experiment, one or several atoms of some educt molecule of the chemical system we wish to examine are replaced by an isotopic equivalent (e.g., ¹²C is replaced with ¹³C). These compounds are then introduced to the system of interest, and the resulting product compounds are examined, for example, by mass spectrometry (Chahrour et al., 2015) or nuclear magnetic resonance (Deev et al., 2019). By determining the positions of the isotopes in the product compounds, information about the underlying reactions might then be derived. From a theoretical perspective, characterizing a formal framework to track atoms through reactions is an important step to understand the possible behaviors of a chemical or biological system.

In this contribution, we introduce such a framework based on concepts rooted in semigroup theory. Semigroup theory can be used as a tool to analyze biological systems such as metabolic and gene regulatory networks (Egri-Nagy and Nehaniv, 2008; Nehaniv et al., 2015). In particular, Krohn–Rhodes theory (Rhodes et al., 2010) was used to analyze biological systems by decomposing a semigroup into simpler components. The networks are modeled as state automata (or ensembles of automata), and their characteristic semigroup, that is, the semigroup that characterizes the transition function of the automaton (Mikolajczak, 1991), is then decomposed using Krohn–Rhodes decompositions or, if that is not computationally feasible, the holonomy decomposition variant (Egri-Nagy and Nehaniv, 2015). The result is a set of symmetric natural subsystems and an associated hierarchy between them, which can then be used to reason about the system. In Andersen et al. (2019), algebraic structures were employed for modeling atom tracking: graph transformation rules are iteratively applied to sets of undirected graphs (molecules) to generate the hyperedges (the chemical reactions) of a directed hypergraph (the chemical reaction network) (Andersen et al., 2013, 2016). A semigroup is defined by using the (partial) transformations that naturally arise from modeling chemical reactions as graph transformations. Utilizing this particular semigroup, the so-called pathway tables can be constructed, detailing the orbit of single atoms through different pathways to help with the design of isotopic labeling experiments.

In this work, we show that we can gain a deeper understanding of the analyzed system by considering how atoms move in relation to each other. To this end, we briefly introduce useful terminology from graph transformation theory and semigroup theory in Section 2. In Section 3, we show how the possible trajectories of a subset of atoms can be intuitively represented as the (right) Cayley graph (Dénes, 1966) of the associated semigroup of a chemical network. Moreover, we define natural subsystems of a chemical network in terms of reversible atom configurations and show how they naturally relate to the strongly connected components of the corresponding Cayley graph. We show the usefulness of our approach in Section 4.2 by using the constructions defined in Section 3 to differentiate chemical pathways, based on the atom trajectories derived from each pathway. We then show how the Cayley graph additionally provides a natural handle for the analysis of cyclic chemical systems such as the tricarboxylic acid (TCA) cycle (Harvey and Ferrier, 2010).

2. Preliminaries

2.1. Graphs

In this contribution, we consider directed as well as undirected connected graphs $G = (V, E)$ with vertex set V(G) = V and edge set E(G) = E. A graph is vertex or edge labeled if its vertices or edges are equipped with a labeling function, respectively. If it is both vertex and edge labeled, we simply call the graph labeled. We write l(x) for the vertex labels $(x \in V (G))$ and edge labels $(x \in E (G))$ .

Given two (un)directed graphs G and $G'$ and a bijection $φ : V (G) \to V (G')$ , we say that $φ$ is edge-preserving if $(v, u) \in E (G)$ if and only if $(φ (v), φ (u)) \in E (G')$ . Additionally, if G and $G'$ are labeled, $φ$ is label-preserving if $l (v) = l (φ (v))$ for any $v \in V (G)$ and $l (v, u) = l (φ (v), φ (u))$ for any $(v, u) \in E (G)$ . The bijection $φ$ is an isomorphism if it is edge-preserving and, in the case that G and $G'$ are labeled, label-preserving. If $G = G'$ , then $φ$ is also an automorphism.

Given a (directed) graph G, we call G (strongly) connected if there exists a path from any vertex u to any vertex v. We call the subgraph H of G a (strongly) connected component if H is a maximal (strongly) connected subgraph.

Since the motivation of this work is rooted in chemistry, sometimes it is more natural to talk about the undirected labeled graphs as molecules, their vertices as atoms (with labels defining the atom type), and their edges as bonds (whose labels distinguish single, double, triple, and aromatic bonds, for instance), while still using common graph terminology for mathematical precision.

2.2. Graph transformations

As molecules are modeled as undirected labeled graphs, it is natural to think of chemical reactions as graph transformations, where a set of educt graphs are transformed into a set of product graphs. We model such transformations using the double pushout (DPO) approach. For a detailed overview of the DPO approach and its variations, see Habel et al. (2001). Here, we will use DPO as defined in the study of Andersen et al. (2016) that specifically describes how to model chemical reactions as rules in a DPO framework.

A rule p describing a transformation of a graph pattern L into a graph pattern R is denoted as a span $L \xleftarrow{l} K \xrightarrow{r} R$ , where K is the subgraph of L remaining unchanged during rewriting and l and r are the subgraph morphisms from K to L and R, respectively. The rule p can be applied to a graph G if and only if (1) L can be embedded in G (i.e., L is subgraph monomorphic to G) and (2) the graphs D and H exist such that the diagram depicted in Figure 1 commutes.

FIG. 1.

A direct derivation.

The graphs D and H are unique if they exist (Habel et al., 2001). The graph H is the resulting graph obtained by rewriting G with respect to the rule p. We call the application of p on G to obtain H via the map $m : L \to G$ , a direct derivation and denote it as $G \Rightarrow^{p, m} H$ or $G \Rightarrow^{p} H$ , if m is not important. We note that m is not necessarily unique, that is, there might exist a different map $m'$ such that $G \Rightarrow^{p, m'} H$ .

For a DPO rule p to model chemistry, we follow the modeling in Andersen et al. (2013) and impose three additional conditions that p must satisfy. (1) All graph morphisms must be injective (i.e., they describe subgraph isomorphisms). (2) The restriction of graph morphisms l and r to the vertices must be bijective, ensuring that atoms are conserved through a reaction. (3) Changes in charges and edges (chemical bonds) must conserve the total number of electrons.

In the above framework, a chemical reaction is a direct derivation $G \Rightarrow^{p, m} H$ , where each connected component of G and H corresponds to the educt and product molecules, respectively. Conditions (1) and (2) ensure that l and r, and by extension $l'$ and $r'$ , are bijective mappings when restricted to the vertices. As a consequence, we can track each atom through a chemical reaction modeled as a direct derivation by the map $l'^{- 1} \circ r'$ . We note that like m, $l'$ and $r'$ might not be unique for a given direct derivation $G \Rightarrow^{p} H$ . We define the set of all such maps $l'^{- 1} \circ r'$ for all possible maps $l'$ and $r'$ obtained from $G \Rightarrow^{p} H$ as $t r (G \Rightarrow^{p} H)$ . An example of a direct derivation representing a chemical reaction is depicted in Figure 2.

FIG. 2.

An example of a direct derivation. The mappings l, r, $l'$ , and $r'$ are implicitly given by the depicted positions of the atoms. Given a chemical network, each hyperedge directly corresponds to such a direct derivation.

2.3. Chemical networks

We consider a directed hypergraph where each edge $e = (e^{+}, e^{-})$ is a pair of subsets of vertices. Moreover, we let $Y_{e} = e^{+} \cup e^{-}$ denote the set of vertices contained in the start-vertex $e^{+}$ or the end-vertex $e^{-}$ of e. In short, a chemical network $C N$ is a hypergraph where each vertex is a connected graph representing a molecule and each hyperedge is a rule application corresponding to a chemical reaction. Hence, every hyperedge e of $C N$ corresponds to a set of direct derivations transforming the ingoing vertices of e into its outgoing vertices. For a given set of edges E of CN, let $D$ be the set of all direct derivations that can be obtained from E. Then, $t r (E) = ⋃_{G \Rightarrow^{p} H \in D} t r (G \Rightarrow^{p} H)$ and $t r (C N) = t r (E (C N))$ .

2.4. Semigroups and transformation semigroups

A semigroup is a pair $(S, \circ)$ , where S is a set and $\circ : S \times S \to S$ an associative binary operator on S. We often write ab for the product $a \circ b$ . A semigroup that contains an identity element 1 (i.e., $s 1 = s = 1 s$ for all $s \in S$ ) is a monoid. The order of a semigroup S is its cardinality $| S |$ . A subset $A \subseteq S$ is said to generate S, or is called a generating set for S, in symbols $⟨ A ⟩ = S$ , if all elements of S can be expressed as a finite product of elements in A.

Given a nonempty finite set X, a transformation on X is an arbitrary map $f : X \to X$ that assigns to every element $x \in X$ some element $f (x) \in X$ . The identity transformation on X is denoted $1_{X}$ . If $X = \{1, \dots, n\}$ , we often use the notation $(i_{1}, i_{2}, \dots, i_{n})$ for the transformation $f (j) = i_{j}$ , $1 \leq j \leq n$ . Note that the elements $i_{1}, i_{2}, \dots, i_{n}$ need not be pairwise distinct. Let T be the set of all possible transformations on X. If $S \subseteq T$ and S is closed under function composition $\circ$ , then $(S, \circ)$ forms a semigroup, also called a transformation semigroup. A transformation monoid is a transformation semigroup that contains the identity $1_{X}$ . To emphasize that S is a collection of transformations on X, we will use the notation $(X, S)$ for transformation semigroups and say that S acts on X. Given a tuple $\bar{z} = (z_{1}, z_{2}, \dots, z_{n})$ of n distinct elements of X and a transformation semigroup $(X, S)$ , the orbit of $\bar{z}$ is defined as $O (\bar{z}, S) = \{(s (z_{1}), s (z_{2}), \dots, s (z_{n})) \mid s \in S\}$ . In what follows, we use the notation $y \in t = (i_{1}, i_{2}, \dots, i_{n})$ to indicate that $y = i_{j}$ for some j, $1 \leq j \leq n$ .

Given a transformation semigroup $(X, S)$ with generating set A, in symbols $S = ⟨ A ⟩$ , we will employ the (right) Cayley graph $C a y (S, A)$ of S and A with vertex set S and edge set $E (C a y (S, A)) = \{(s, s a) \mid s \in S, a \in A\}$ . In addition, every edge $(s, s a)$ of $C a y (S, A)$ obtains the label $l_{a}$ , that is, the unique label associated with the generator a in A. Similarly, the projected Cayley graph $P C a y (S, A, \bar{z})$ is defined for tuples $\bar{z}$ : it has vertex set $O (\bar{z}, S)$ , and for all $s \in O (\bar{z}, S)$ and all $a \in A$ , there is an edge $(s, s a)$ with label $l_{a}$ . The free semigroup $Σ^{+}$ is the semigroup of all nonempty finite strings over the alphabet $Σ$ with concatenation as the associative binary operator. Adding the empty string ε results in the free monoid $Σ^{*} = Σ^{+} \cup \{ε\}$ .
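The definitions of this section can be illustrated with a minimal sketch, assuming that a transformation on $X = \{1, \dots, n\}$ is stored as a 1-indexed Python tuple and that products act from the right (sa means "first s, then a"). The function names and the toy generators are hypothetical and are not taken from the implementation referenced in this article. Edges arising from the identity are not generated, since the identity is not listed among the generators.

```python
from collections import deque


def act(state, a):
    """Right action: move every position in `state` by the transformation `a`.

    `a` is a tuple with a[j - 1] = a(j); positions are 1-indexed.
    """
    return tuple(a[i - 1] for i in state)


def explore(start, generators):
    """Breadth-first closure of `start` under right multiplication.

    Returns the reached vertices and the labeled edges (s, label, sa).
    """
    vertices = {start}
    edges = []
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for label, a in generators.items():
            sa = act(s, a)
            edges.append((s, label, sa))
            if sa not in vertices:
                vertices.add(sa)
                queue.append(sa)
    return vertices, edges


def cayley_graph(generators, n):
    """Right Cayley graph of the monoid generated on X = {1, ..., n}."""
    return explore(tuple(range(1, n + 1)), generators)


def projected_cayley_graph(generators, z):
    """Projected Cayley graph: the same exploration started from the tuple z."""
    return explore(tuple(z), generators)


# Hypothetical toy generators on X = {1, 2, 3}: a swap and a constant map.
toy = {"a": (2, 1, 3), "b": (3, 3, 3)}
elements, edges = cayley_graph(toy, 3)
orbit, _ = projected_cayley_graph(toy, (1, 2))
print(sorted(elements))  # the monoid elements, including the identity
print(sorted(orbit))     # the orbit of the tuple (1, 2)
```

Starting the exploration at the identity yields the generated monoid as the vertex set, while starting it at a tuple $\bar{z}$ yields the orbit of $\bar{z}$ directly, without enumerating the full monoid first.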

3. Chemical Networks and Their Algebraic Structures

3.1. Characteristic monoids

Assume we are given a chemical network $C N$ , that is, a hypergraph modeling some chemistry. As we are interested in tracking the possible movements of atoms in CN, we are inherently interested in the reactions of CN, that is, in its edge set $E (C N)$ . Indeed, atoms can only reconfigure to construct new molecules under the execution of some reaction. We will refer to the execution of a reaction as an event. The possible reconfigurations of atoms caused by a single event are given by the set of atom maps $t r (C N)$ constituting a set of (partial) transformations on $X = ⋃_{M \in V (C N)} V (M)$ . Note that the vertex $M \in V (C N)$ corresponds to an entire molecule for which $V (M)$ denotes the set of atoms (i.e., labeled vertices). A transformation t on X describes the position (i.e., in what molecule and where in the molecule the atom is found) of each atom in X when X is transformed by t. In what follows, we will sometimes refer to such transformations on X as atom states, as the transformations encapsulate the “state” of the network, that is, the position of each atom. To track the possible movement of atoms through a chemical network, we must consider sequences of events.

Definition 1 (Event Traces): Let $Σ$ be an alphabet containing a unique identifier t for each atom map in $t r (C N)$ . Then, an event trace is an element of the free monoid $Σ^{*}$ .

The free monoid $Σ^{*}$ contains all possible sequences of events that can move the atoms of X. Note that $Σ^{*}$ does not track the actual atoms through event traces. For this, we use the following structure.

Definition 2 (Characteristic Monoids): Let the characteristic monoid of $C N$ be defined as the transformation monoid $S (C N) = (X, ⟨ t r (C N) \cup 1_{X} ⟩) .$ Moreover, given a set of edges $E \subseteq E (C N)$ , and the set of atoms $Y \subseteq X$ found in E (i.e., $Y = \cup_{e \in E} Y_{e}$ ), we let the characteristic monoid of E be defined as $S (E) = (Y, ⟨ t r (E) \cup 1_{Y} ⟩) .$

Let $σ : Σ \to t r (C N)$ be the function that maps all identifiers of $Σ$ to their corresponding atom map in $t r (C N)$ . Given an event trace $t = t_{1} t_{2} \dots t_{n} \in Σ^{*}$ , we let the events of t refer to their corresponding transformations in $t r (C N)$ when acting on an element $s \in S (C N)$ , that is, $s t = s σ (t_{1}) σ (t_{2}) \dots σ (t_{n}) \in S (C N)$ . Every event trace $t \in Σ^{*}$ gives rise to a member of $S (C N)$ , in particular the transformation $1_{X} t$ , that represents the resulting atom state obtained from moving atoms according to t. Hence, there is a homomorphism from $Σ^{*}$ to $S (C N)$ , meaning that $S (C N)$ captures all possible movements of atoms through reactions of $C N$ .
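The evaluation of an event trace can be sketched as a left-to-right fold over its events. The snippet below is a minimal illustration under assumed conventions: transformations are 1-indexed tuples, the dictionary `sigma` plays the role of $σ$ , and the two generators are hypothetical toy maps rather than atom maps of an actual network. It also checks the homomorphism property on one pair of traces, namely that concatenating traces corresponds to multiplying their transformations.

```python
from functools import reduce


def compose(s, a):
    """Product sa under the right action: apply s first, then a."""
    return tuple(a[i - 1] for i in s)


def evaluate(trace, sigma, n):
    """Map an event trace (a sequence of identifiers) to the atom state 1_X t."""
    identity = tuple(range(1, n + 1))
    return reduce(lambda s, event: compose(s, sigma[event]), trace, identity)


# Hypothetical alphabet with two events acting on X = {1, 2, 3}.
sigma = {"a": (2, 3, 3), "b": (1, 1, 3)}

u = ["a"]
v = ["b", "a"]

# Concatenation in the free monoid corresponds to composition in the monoid.
lhs = evaluate(u + v, sigma, 3)
rhs = compose(evaluate(u, sigma, 3), evaluate(v, sigma, 3))
print(lhs, rhs)

# The empty trace is mapped to the identity transformation.
print(evaluate([], sigma, 3))
```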

Often, we are only interested in tracking the movement of a small number of atoms. Let $\bar{z}$ be a tuple of distinct elements from X that we want to track. Then, there is again a homomorphism from $Σ^{*}$ to $O (\bar{z}, S (C N))$ . Namely, for a given event trace $t \in Σ^{*}$ , we can track the atoms of $\bar{z}$ as the atom state $1_{\{x \mid x \in \bar{z}\}} t$ corresponding to an element in the orbit $O (\bar{z}, S (C N))$ , if we treat the element as a (partial) transformation. As a result, $O (\bar{z}, S (C N))$ characterizes the possible movements of the atoms in $\bar{z}$ , and we will refer to its elements as atom states similarly to elements in $S (C N)$ as they conceptually represent the same thing.

We note that the above definitions are not unlike some of the core definitions within algebraic automata theory (Mikolajczak, 1991). Here, the possible inputs of an automaton are often defined in terms of strings obtained from the free monoid on the alphabet of the automaton. The characteristic semigroup is then defined as the semigroup that characterizes the possible state transitions. In the same vein, we can view our notion of event traces as the possible “inputs” to our chemical network CN that move some initial configuration of atoms $1_{X}$ . The characteristic monoid of CN then characterizes the possible movements of atoms through event traces.

In what follows, we let $C a y (C N)$ denote the Cayley graph $C a y (S (C N), t r (C N) \cup 1_{X})$ . Similarly, given a tuple of atoms $\bar{z}$ , we let $P C a y (C N, \bar{z})$ denote the projected Cayley graph $P C a y (S (C N), t r (C N) \cup 1_{X}, \bar{z})$ . We note that by Definition 2, $S (C N)$ is generated by the set $t r (C N) \cup 1_{X}$ , and hence, $C a y (C N)$ and $P C a y (C N, \bar{z})$ are well defined. Since the transformation $1_{X}$ will always result in a loop on every vertex of the (projected) Cayley graph and conveys no meaningful information, we will refrain from including any edge arising from $1_{X}$ .

We can illustrate the relation between atom states using the Cayley graph $C a y (C N)$ . More precisely, there exists an edge between two atom states $a, b \in S (C N)$ with label t, if it is possible to move the atoms in a to b using t. It is natural to relate $Σ^{*}$ to $C a y (C N)$ . Namely, any path in $C a y (C N)$ corresponds directly to an event trace in $Σ^{*}$ . Hence, where $Σ^{*}$ encapsulates the “inputs” of the chemical network and $S (C N)$ contains the possible atom states derived from $Σ^{*}$ , the Cayley graph $C a y (C N)$ captures how atom states from $S (C N)$ can be created by event traces.

Example: As an illustrative example, consider the reaction network $C N$ depicted in Figure 3a. For simplicity, we will use reactions r₀ and r₁ involved in the so-called formose reaction. We restrict ourselves to the carbon atoms of all molecules and have labeled each of them with a unique id for easy reference. Here, the underlying set $X = \{1, 2, \dots, 8\}$ corresponds to the eight atoms labeled $1, 2, \dots, 8$ in Figure 3a. From $t r (C N)$ , we get four transformations: $s_{1} = [3, 4, 3, 4, 5, 6, 7, 8]$ , $s_{2} = [4, 3, 3, 4, 5, 6, 7, 8]$ (both obtained from r₀), and $s_{3} = [5, 6, 7, 8, 5, 6, 7, 8]$ , $s_{4} = [5, 6, 8, 7, 5, 6, 7, 8]$ (both obtained from r₁) with the corresponding alphabet $Σ = \{s_{1}, s_{2}, s_{3}, s_{4}\}$ . For a reaction, the corresponding transformations map the atoms of the educt molecules to the atoms of the product molecules, whereas all other atoms are mapped by the identity. The transformations describe how carbon atoms are rearranged into different configurations when an event is fired. s₁ and s₂ describe how the carbon atoms of a glycolaldehyde molecule are arranged in the molecule $p_{0, 0}$ when transformed via the reaction r₀. In the case of s₁, we observe that the carbons are rearranged such that $s_{1} (1) = 3$ and $s_{1} (2) = 4$ . Of course, due to the symmetries in the molecule $p_{0, 0}$ , reaction r₀ also results in the mirrored transformation of s₁, that is, $s_{2} (1) = 4$ and $s_{2} (2) = 3$ . The characteristic monoid of $C N$ , $S (C N)$ , has an order of 9. We illustrate the movement of atoms in $C N$ by its Cayley graph $C a y (C N)$ , which is depicted in Figure 3b. Any path originating from the identity element corresponds to an event trace; for example, we can track the atoms 1 and 2 through the event trace $s_{1} s_{3}$ along the corresponding path and obtain $s_{1} s_{3} (1) = s_{3} (3) = 7$ and $s_{1} s_{3} (2) = s_{3} (4) = 8$ .
Assume now that we were only interested in tracking the carbon atoms found in the glycolaldehyde molecule, that is, $\bar{z} = (1, 2)$ . To this end, we can examine $O (\bar{z}, S (C N))$ , which contains six elements, meaning that there exist six unique atom states for the atoms in a glycolaldehyde molecule. Again, we can study these movements using the projected Cayley graph $P C a y (C N, (1, 2))$ . The resulting graph is depicted in Figure 4a.
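The numbers quoted in this example can be recomputed from the four listed transformations alone. The following is a minimal sketch, assuming the right-action convention of Section 3.1 and a tuple encoding of the transformations; the helper names `act` and `closure` are illustrative and are not part of the referenced implementation.

```python
def act(state, a):
    """Right action: send each position i of `state` to a(i) (1-indexed)."""
    return tuple(a[i - 1] for i in state)


def closure(start, generators):
    """All states reachable from `start` by applying generators repeatedly."""
    seen, frontier = {start}, [start]
    while frontier:
        s = frontier.pop()
        for a in generators:
            t = act(s, a)
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


# Transformations of the formose example as listed in the text.
s1 = (3, 4, 3, 4, 5, 6, 7, 8)
s2 = (4, 3, 3, 4, 5, 6, 7, 8)
s3 = (5, 6, 7, 8, 5, 6, 7, 8)
s4 = (5, 6, 8, 7, 5, 6, 7, 8)
generators = [s1, s2, s3, s4]

identity = tuple(range(1, 9))
monoid = closure(identity, generators)  # elements of S(CN)
orbit = closure((1, 2), generators)     # O((1, 2), S(CN))

print(len(monoid))                  # 9
print(len(orbit))                   # 6
print(act(act((1, 2), s1), s3))     # (7, 8): atoms 1 and 2 after the trace s1 s3
```

The orbit is smaller than the monoid because distinct transformations, for instance $s_{3}$ and $s_{4}$ , agree on the tracked atoms 1 and 2 and therefore coalesce into a single atom state.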

FIG. 3.

(a) A small example using molecules and reactions found in the formose reaction. The carbon atoms of each molecule are labeled with a unique identifier for easy reference. (b) The Cayley graph $C a y (C N)$ of (a) from the example of Section 3.1. From the graph, we observe the longest path from $1_{X}$ has length 2, meaning that any event trace can at most transform $1_{X}$ meaningfully twice. In fact, only two types of event traces are of interest: either the tracked atoms are immediately moved by the reaction r₁ to $p_{0, 2}$ or the atoms of glycolaldehyde are first moved to $p_{0, 0}$ using r₀ and then moved to $p_{0, 2}$ .

FIG. 4.

(a) The projected Cayley graph $P C a y (C N, (1, 2))$ from the example of Section 3.1. As for $C a y (C N)$ , we observe that there are only two types of event traces of interest. However, since we are only tracking the atoms of the glycolaldehyde molecule, some atom states are effectively coalesced compared to $C a y (C N)$ . (b) The projected Cayley graph $P C a y (C N, (1, 2))$ from the example of Section 3.2. The graph shows the natural subsystems of the carbon atoms of a glycolaldehyde molecule. Vertices in the same box are in the same natural subsystem. Note that edges between vertices in the same natural subsystem are not depicted [e.g., one of the eight hidden edges in the top-level subsystem is $(3, 4) \to (1, 2)$ with label s₅].

3.2. Natural subsystems of atom states

At the intersection of group theory and systems biology, works such as Nehaniv et al. (2015) have attempted to formalize the notion of natural subsystems and the hierarchical relations within such systems. Here, natural subsystems are defined as symmetric structures arising from a biological system. Such symmetries manifest as permutation groups of the associated semigroup representing said system. In such a model, the Krohn–Rhodes decomposition or the holonomy decomposition (Egri-Nagy and Nehaniv, 2015) can be used to construct a hierarchical structure on such natural subsystems of the biological system. In terms of atom tracking, however, defining natural subsystems in terms of the permutation groups in $S (C N)$ does not have an immediately useful interpretation. Similarly, the hierarchical structure obtained from methods such as holonomy decomposition is not intuitive to interpret. Instead, when talking about natural subsystems in terms of atom tracking, we are interested in systems of reversible event traces, that is, event traces that do not change the original configuration of atoms. To this end, it is natural to define natural subsystems of $S (C N)$ in terms of Green's relations (Clifford and Preston, 1967). For elements $s_{1}, s_{2} \in S (C N)$ , we define the reflexive transitive relation $⪰_{ℛ}$ as $s_{1} ⪰_{ℛ} s_{2}$ , if there exists an event trace $t \in Σ^{*}$ such that $s_{1} t = s_{2}$ . In addition, we define an equivalence relation $ℛ$ , where s₁ is equivalent to s₂, in symbols $s_{1} ℛ s_{2}$ , whenever $s_{1} ⪰_{ℛ} s_{2}$ and $s_{2} ⪰_{ℛ} s_{1}$ .

Definition 3 (Natural Subsystems): The natural subsystems of $S (C N)$ are the equivalence classes induced by the $ℛ$ -relation.

The equivalence classes correspond to the strongly connected components of the Cayley graph $C a y (C N)$ (Froidure and Pin, 1997). We note that for a tuple of atoms $\bar{z}$ , the natural extension to natural subsystems of the orbit $O (\bar{z}, S (C N))$ is simply the strongly connected components of its projected Cayley graph $P C a y (C N, \bar{z})$ . The $ℛ$ relation is interesting, as the equivalence classes on $S (C N)$ induced by the $ℛ$ relation form pools of reversible event traces. More precisely, let $s_{1} ℛ s_{2}$ for some $s_{1}, s_{2} \in S (C N)$ , where $s_{1} \cdot t_{12} = s_{2}$ and $s_{2} \cdot t_{21} = s_{1}$ for some $t_{12}, t_{21} \in Σ^{*}$ . Then, the event traces $t_{12}$ and $t_{21}$ are reversible, that is, we can re-obtain s₁ as $s_{1} t_{12} t_{21} = s_{1}$ and s₂ as $s_{2} t_{21} t_{12} = s_{2}$ . Additionally, the quotient graph of the equivalence classes of the $ℛ$ relation on the Cayley graph $C a y (C N)$ naturally forms a hierarchical relation on the atom states of $S (C N)$ that has a useful interpretation from the point of view of chemistry as we will see in Section 4.3.

Example: Again, consider the reaction network obtained from the formose reaction depicted in Figure 3a. We will include the transformations obtained from reaction r₂ in addition to the transformations listed in Section 3.1: $s_{5} = [1, 2, 1, 2, 5, 6, 7, 8]$ and $s_{6} = [1, 2, 2, 1, 5, 6, 7, 8]$ (both obtained from r₂). Assume that we are interested in determining how the carbon atoms of a glycolaldehyde molecule can reconfigure into different molecules. The projected Cayley graph $P C a y (C N, (1, 2))$ shows such configurations and is depicted in Figure 4b. Here, the atom states belonging to the same gray box are strongly connected and hence belong to the same natural subsystem. For clarity, we have removed edges between atom states in the same subsystem since any atom state in a subsystem can be transformed into any other state in the same subsystem.

Notably, we observe from Figure 4b that the atoms 1 and 2 in the glycolaldehyde molecule can swap positions. We could of course also realize that such a swap was possible by noticing the symmetries in the glycolaldehyde molecule and the fact that we can convert glycolaldehyde to the $p_{0, 0}$ molecule and vice versa. However, such patterns become immediately obvious from the projected Cayley graph. Finally, we can derive from Figure 4b that it is only possible to leave the original subsystem by applying transformation s₃ or s₄, corresponding to reaction r₁.
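The natural subsystems of this example can be recomputed as the strongly connected components of the projected Cayley graph. The sketch below is a minimal illustration that uses only the six transformations listed in the text; the tuple encoding, the helper names, and the simple mutual-reachability test (chosen here instead of a dedicated linear-time algorithm, which is adequate for such a small graph) are assumptions made for illustration.

```python
GENERATORS = {
    "s1": (3, 4, 3, 4, 5, 6, 7, 8),
    "s2": (4, 3, 3, 4, 5, 6, 7, 8),
    "s3": (5, 6, 7, 8, 5, 6, 7, 8),
    "s4": (5, 6, 8, 7, 5, 6, 7, 8),
    "s5": (1, 2, 1, 2, 5, 6, 7, 8),
    "s6": (1, 2, 2, 1, 5, 6, 7, 8),
}


def act(state, a):
    """Right action of the transformation `a` on a tuple of positions."""
    return tuple(a[i - 1] for i in state)


def projected_cayley_graph(z):
    """Vertices and labeled edges (s, label, sa) reachable from the tuple z."""
    seen, stack, edges = {z}, [z], []
    while stack:
        s = stack.pop()
        for label, a in GENERATORS.items():
            t = act(s, a)
            edges.append((s, label, t))
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen, edges


def reachable(start, edges):
    """Set of vertices reachable from `start` along directed edges."""
    successors = {}
    for s, _, t in edges:
        successors.setdefault(s, set()).add(t)
    seen, stack = {start}, [start]
    while stack:
        for t in successors.get(stack.pop(), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def natural_subsystems(states, edges):
    """Equivalence classes of mutual reachability (strongly connected components)."""
    reach = {s: reachable(s, edges) for s in states}
    return {
        frozenset(t for t in states if t in reach[s] and s in reach[t])
        for s in states
    }


states, edges = projected_cayley_graph((1, 2))
subsystems = natural_subsystems(states, edges)
top = next(c for c in subsystems if (1, 2) in c)

# Labels of the edges that leave the subsystem containing the initial state.
exit_labels = {label for s, label, t in edges if s in top and t not in top}

print(len(states), len(subsystems))
print(sorted(top))
print(sorted(exit_labels))
```

Under these assumptions, the subsystem containing $(1, 2)$ also contains $(2, 1)$ , which is the swap discussed above, and the only labels on edges leaving it are s₃ and s₄.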

4. Results

4.1. Implementation

To test the practicality of the structures introduced in the previous section, we implemented the construction of the projected Cayley graph of a set of atoms in a chemical network. The resulting implementation can be found at https://github.com/Nojgaard/cat. All code is written in Python and uses the software packages MØD (Andersen et al., 2016) and NetworkX (Hagberg et al., 2008) to construct the chemical networks and find the transformations used for the characteristic monoid. All figures in the following sections were constructed with said implementation, and each run finished within seconds on an 8-core Intel Core i9 CPU with 64 GB memory. The most time-consuming part of the implementation was the computation of the transformations obtained from each hyperedge in the chemical network. In contrast, the construction time of the projected Cayley graph proved to be negligible.

4.2. Differentiating pathways

In this section, we explore whether the characteristic monoids of chemical networks can be used to distinguish between two pathways P₁ and P₂, based on the atom states of their respective characteristic monoids. The motivation stems from methods such as isotope labeling. Here, a “labeled” atom is a detectable isotope whose position is known in some initial molecule and can then be detected, along with its exact position, in the product molecules of some pathway. In contrast to Andersen et al. (2019), we will not focus on the orbits of atoms in isolation, as we would lose the ability to reason about atom positions in relation to each other. Moreover, as we will see here, the Cayley graph of the chemical network can be used to identify the exact event at which two pathways split.

Given a chemical network $C N$ , a pathway P is a set of hyperedges (i.e., reactions) from $C N$ equipped with a set of input and output molecules. We think of a pathway as a process that consumes a set of input molecules to construct a set of output molecules, using the reactions specified by P. In our case, a “labeled” atom is a point in $S (C N)$ . Given two pathways P₁ and P₂, we can characterize the possible movement of atoms as the characteristic monoids $S (P_{1})$ and $S (P_{2})$ . In practice, it might not be feasible to track every atom in $C N$ , for example, because we are only able to replace a few atoms with their corresponding detectable isotopes, and hence, it becomes useful to consider the orbits $O (\bar{z}, S (P_{1}))$ and $O (\bar{z}, S (P_{2}))$ , where $\bar{z}$ is the tuple of atoms from the input molecules that we can track. Clearly, of the atom states in $O (\bar{z}, S (P_{1}))$ and $O (\bar{z}, S (P_{2}))$ , we can only expect to observe, for example, in an isotope labeling experiment, the atom states that locate the tracked atoms in the output molecules. As a result, we arrive at the following observation.

Observation 1: Let $Y_{i} \subseteq O (\bar{z}, S (P_{i}))$ , $i \in \{1, 2\}$ , be the atom states we can hope to observe after some isotope labeling experiment. Then, we can always distinguish between P₁ and P₂ if $Y_{1} \cap Y_{2} = \emptyset$ .

Example: Consider the network $C N$ depicted in Figure 5a modeling the creation of the product 4-phenyl-6-aminopyrimidine (denoted P) from the educt 4-(benzyloxy)-6-bromopyrimidine (denoted E) using ammonia. This well-investigated and widely used substitution mechanism (Addition of the Nucleophile, Ring Opening, and Ring Closure [ANRORC]) (Van der Plas, 1978) was proven via isotope labeling to function nontrivially via ring opening and ring closure (with an accompanying carbon replacement). Two possible pathways are modeled: the input molecules for the two pathways are the molecules E, NH₃, NH₂, whereas the output is the single molecule P. The first pathway, $P_{1} = \{r_{3}\}$ , is seemingly correct but wrong; it converts E and an NH₃ molecule directly into P by replacing the Br atom with NH₂. The second pathway consists of the reactions $P_{2} = \{r_{0}, r_{1}, r_{2}, r_{4}\}$ and models the ANRORC mechanism.

FIG. 5.

(a) The chemical network for the creation of P from E using ammonia. The dotted light gray and dark gray lines show the possible atom trajectories for the atoms 2 and 3, respectively. (b) The projected Cayley graph $P C a y (C N, (2, 3))$ .

Assume we wanted to devise a strategy to decide which pathway is executed in reality. By replacing the nitrogen atoms of the E molecule with the isotope ¹³N, we would be able to observe where the atoms are positioned in the produced P molecule. Since we, by assumption, only label the nitrogen atoms of the E molecule, that is, the atoms 2 and 3, we can look at the orbits $O ((2, 3), S (P_{1}))$ and $O ((2, 3), S (P_{2}))$ , which have order 2 and 5, respectively, since P₁ consists of a single reaction whereas P₂ passes through several intermediate atom states. We observe that both orbits only contain a single element locating (2, 3) in the P molecule, namely the element $(14, 15)$ for $O ((2, 3), S (P_{1}))$ and $(14, 13)$ for $O ((2, 3), S (P_{2}))$ . As the possible configurations are different for P₁ and P₂, it is hence always possible to identify whether the P molecule was created by P₁ or P₂.

This fact also becomes immediately obvious by looking at the projected Cayley graph $P C a y (C N, (2, 3))$ depicted in Figure 5b that shows the immediate divergence of atom states of the two pathways.
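The criterion of Observation 1 amounts to a disjointness test on sets of observable atom states. The following minimal sketch applies it to the two observable states reported in the example above; the function name is hypothetical, and the overlapping case at the end is a made-up input that only illustrates a negative outcome.

```python
def always_distinguishable(observable_1, observable_2):
    """Observation 1: the pathways can always be told apart if no observable
    atom state is shared between them."""
    return set(observable_1).isdisjoint(observable_2)


# Observable atom states of the tracked tuple (2, 3) in the product P,
# as reported in the example for pathways P1 and P2.
Y1 = {(14, 15)}
Y2 = {(14, 13)}
print(always_distinguishable(Y1, Y2))

# Hypothetical overlapping case: the shared state (14, 15) gives no guarantee.
print(always_distinguishable({(14, 15)}, {(14, 15), (14, 13)}))
```

Note that disjointness is a sufficient condition; overlapping sets only mean that some experimental outcomes would not identify the pathway.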

4.3. Natural subsystems in the TCA cycle

The citric acid cycle, also known as the TCA cycle or the Krebs cycle, is at the heart of many metabolic systems. The cycle is used by aerobic organisms to release stored energy in the form of ATP by the oxidation of acetyl-CoA into water and CO₂. The details of the TCA cycle can be found in any standard chemistry textbook, for example, Harvey and Ferrier (2010). In Smith and Morowitz (2016), the trajectories of different carbon atoms in the TCA cycle were examined to explain the change of their oxidation states. It is well known that there is an enzymatic differentiation of the two carboxymethyl groups in citrate, which requires a rigorous stereochemical modeling of the graph grammar rules used (Andersen et al., 2017). Ignoring such stereochemical modeling would lead to atom mappings not occurring in nature. We will provide a formal handle to analyze theoretically possible carbon trajectories using the algebraic constructs provided in this article. As we will see, such structures provide intuitive interpretations for the TCA cycle. More precisely, assume that we are interested in answering the following question: What are the possible trajectories of the carbons of an oxaloacetate (OAA) molecule within the TCA cycle while (1) ignoring the enzymatic differentiation of the two carboxymethyl groups in citrate (denoted TCA-) or (2) not ignoring it (denoted TCA-)? To answer this question, we will decompose the characteristic monoid of the TCA cycle into its natural subsystems and examine them using the projected Cayley graph.

In our setting, the TCA cycle is the chemical network $CN$, depicted in Figure 6, giving rise to transformations of the underlying monoid. The network is made up of 13 reactions; however, some of the reactions are not shown for simplicity. Of these 13 reactions, 7 yield exactly 1 transformation each, while the remaining 6 yield 2 possible transformations each, for a total of 19 transformations. The reactions with multiple transformations are due to automorphisms in molecules such as citrate and fumarate. When the enzymatic differentiation of the carboxymethyl groups in citrate is not ignored, only 4 of the 13 reactions yield 2 possible transformations, as the carbon traces to and from citrate are more constrained. In short, while both TCA- and TCA- are modeled by the same network, the obtained transformations differ. More precisely, $|tr(CN)| = 19$ wrt. TCA- (differentiation ignored) and $|tr(CN)| = 17$ wrt. TCA- (differentiation not ignored).
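Both counts can be checked directly from the numbers given above. For the second count we assume, as the stated total suggests, that each of the remaining $13 - 4 = 9$ reactions yields exactly one transformation:

$$
\underbrace{7 \cdot 1 + 6 \cdot 2}_{\text{differentiation ignored}} = 19,
\qquad
\underbrace{9 \cdot 1 + 4 \cdot 2}_{\text{differentiation not ignored}} = 17 .
$$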

FIG. 6.

A (simplified) chemical network modeling the TCA cycle. Note that any molecules not containing carbon atoms are modeled, but not depicted here. Each carbon atom is equipped with a unique ID for easy reference. TCA, tricarboxylic acid.

To start the cycle, an acetyl-CoA molecule is condensed with an OAA molecule, executing a cycle of reactions that ends up regenerating the OAA molecule while expelling two CO₂ and water on the way. When an original atom is expelled from the cycle, we will consider it permanently lost. The carbon atoms of the OAA molecule that we are interested in tracking are annotated with the IDs 4, 5, 6, and 7. Let $\bar{z} = (4, 5, 6, 7)$. The projected Cayley graph $PCay(CN, \bar{z})$ wrt. TCA- (resp. TCA-) consists of 213 (resp. 67) vertices. The full projected Cayley graphs are depicted in Figure 7a and b, respectively. When a carbon atom leaves the TCA cycle, we denote it by "_". For example, the atom state $(\_, 7, 6, \_)$ should be read as follows: the original carbon atoms with IDs 4 and 7 have been expelled, whereas the carbon atoms with IDs 5 and 6 are located at the atoms with IDs 7 and 6, respectively.
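To make the notion of an atom state with expelled atoms concrete, the following Python sketch builds a projected Cayley graph by breadth-first search. It is a minimal illustration, not the implementation used for the figures: the function names, the use of `None` for an expelled atom, the convention that unmentioned atoms stay in place, and the single generator `t1` are all assumptions made for this example.

```python
from collections import deque

EXPELLED = None  # an expelled atom is permanently lost


def apply_transformation(state, g):
    """Move every tracked atom according to g; expelled atoms stay expelled."""
    return tuple(EXPELLED if a is EXPELLED else g.get(a, a) for a in state)


def projected_cayley_graph(start, generators):
    """Map each reachable atom state to its outgoing (label, next_state) edges."""
    edges = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in edges:
            continue
        edges[state] = []
        for label, g in generators.items():
            nxt = apply_transformation(state, g)
            edges[state].append((label, nxt))
            if nxt not in edges:
                queue.append(nxt)
    return edges


def show(state):
    """Render an atom state in the notation of the text, with '_' for expelled."""
    return "(" + ", ".join("_" if a is EXPELLED else str(a) for a in state) + ")"


# Hypothetical generator: shift atoms along three positions and expel the last.
generators = {"t1": {1: 2, 2: 3, 3: EXPELLED}}
graph = projected_cayley_graph((1, 2, 3), generators)
```

Here `show((None, 7, 6, None))` renders the atom state used as an example above, and the toy graph has four vertices, ending in the state where every tracked atom has been expelled.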

FIG. 7.

(a) The projected Cayley graph $P C a y (C N, (4, 5, 6, 7))$ wrt. TCA-. The non-black-colored vertices of the same color correspond to atom states that are part of the same strongly connected component. (b) The projected Cayley graph wrt. TCA-. The non-black-colored vertices of the same color correspond to atom states that are part of the same strongly connected component.

We can find the natural subsystems of $CN$ as the strongly connected components of $PCay(CN, \bar{z})$. In TCA- (resp. TCA-), we find 92 (resp. 51) strongly connected components, of which 8 (resp. only 1) are nontrivial. Any nontrivial strongly connected component must invariably contain at least one tour around the TCA cycle since this is the only way the original atoms of the OAA molecule can be reused to create another OAA molecule. Moreover, any nontrivial strongly connected component represents one or more sequences of reactions that use some of the original atoms of the OAA molecule. To simplify $PCay(CN, \bar{z})$ such that only the information on carbon traces of the atoms of OAA is depicted, we will construct the simplified projected Cayley graph, denoted $SCay(CN, \bar{z})$, as follows: collapse any vertex in $PCay(CN, \bar{z})$ that is part of a trivial strongly connected component and whose atoms are not located in an OAA molecule. Moreover, for any nontrivial strongly connected component, hide the edges between atom states in the same strongly connected component, and finally, only include atom states if the atoms are located in an OAA molecule. The resulting graphs for TCA- and TCA- are depicted in Figure 8. Each box in the figure represents a natural subsystem that contains an atom state where every atom is either expelled or located in an OAA molecule.

When ignoring the stereochemical formation of citrate, $(\_, 5, 6, 7)$ is a grey node in $SCay(CN, \bar{z})$ (i.e., a representative of a strongly connected component of $PCay(CN, \bar{z})$), that is, there is a trajectory where three of the four original carbons of OAA are reused at the same location after a TCA- cycle turnover. However, in TCA-, only $(\_, 5, \_, \_)$ is a representative of a strongly connected component, that is, only the carbon with ID 5 of OAA can be kept at the same location when a multitude of TCA- turnovers are executed. If that carbon changes location, it will leave the TCA cycle after exactly two more turnovers (the natural subsystems reachable from $(\_, 5, \_, \_)$ do not correspond to strongly connected components) via positions $5 \to 6 \to 4 \to \_$ or via $5 \to 6 \to 7 \to \_$. To the best of our knowledge, such investigations have not been executed formally before.
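The decomposition into strongly connected components can be sketched as follows. This is a plain-Python version of Kosaraju's algorithm on a small hypothetical graph of atom states; it is not the code used to produce the figures, and the edges below are invented for illustration. We also assume here that "nontrivial" means a component with more than one vertex.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm; graph maps each vertex to a list of successors."""
    # Pass 1: record vertices in order of depth-first completion.
    order, visited = [], set()
    for root in graph:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(graph[root]))]
        while stack:
            v, successors = stack[-1]
            for w in successors:
                if w not in visited:
                    visited.add(w)
                    stack.append((w, iter(graph.get(w, ()))))
                    break
            else:
                order.append(v)
                stack.pop()
    # Pass 2: search the reversed graph in decreasing completion order.
    reverse = {v: [] for v in visited}
    for v, succs in graph.items():
        for w in succs:
            reverse[w].append(v)
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = set(), [root]
        while stack:
            v = stack.pop()
            component.add(v)
            for w in reverse[v]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        components.append(component)
    return components


# Hypothetical atom states (None marks an expelled atom) and invented edges.
s0 = (4, 5, 6, 7)
s1 = (None, 5, 6, None)
s2 = (None, 6, 5, None)
s3 = (None, None, None, None)
toy_graph = {s0: [s1], s1: [s2, s3], s2: [s1], s3: []}

components = strongly_connected_components(toy_graph)
# Assumption: a component is nontrivial if it has more than one vertex.
nontrivial = [c for c in components if len(c) > 1]
```

In this toy graph the only nontrivial component contains the two states that swap the inner atoms, while the start state and the fully expelled state each form a trivial component.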

FIG. 8.

(a) The oxaloacetate molecule. The carbon atoms are equipped with IDs 4, 5, 6, and 7. (b) The simplified projected Cayley graph $S C a y (C N, (4, 5, 6, 7))$ , when adjusting for stereospecific citrate in $t r (C N)$ . (c) The simplified projected Cayley graph $S C a y (C N, (4, 5, 6, 7))$ when not considering stereospecificity.

Interestingly, $SCay(CN, \bar{z})$, as depicted in Figure 8c, allows us to closely examine each of the possible carbon trajectories of TCA-. For example, the fact that the atom state $(\_, 6, 7, \_)$ is present in $SCay(CN, \bar{z})$ wrt. TCA- means that there exists a sequence of reactions that expels the carbons with IDs 4 and 7, but reuses the carbon atoms with IDs 5 and 6 to create a new OAA molecule, where 5 is located at 6 and 6 is located at 7. Structurally, the atoms 4 and 7 correspond to the outer atoms in the carbon backbone of the OAA molecule, whereas the atoms 5 and 6 correspond to the inner atoms in the carbon backbone. In other words, the presence of $(\_, 6, 7, \_)$ means that there exists a sequence of reactions that expels the outer atoms of the carbon backbone while recycling the inner atoms.

Figure 8c gives us a rough road map to determine exactly what sequence of events must have taken place to end up in the atom state $(\_, 6, 7, \_)$. We start with the atom state $(4, 5, 6, 7)$ and see that there is an edge directly to $(\_, 6, 7, \_)$, meaning that we can expel the two outer atoms in a single cycle. This is, however, not the only way to end up in the atom state $(\_, 6, 7, \_)$. For example, after one cycle, we can expel the carbon with ID 4 and end up in the atom state $(\_, 5, 6, 7)$, that is, all other atoms are still in their original positions. After another cycle, we can end up in the atom state $(\_, 6, 7, \_)$ or $(\_, 6, 5, 4)$. Note that $(\_, 5, 6, 7)$ is part of a nontrivial strongly connected component, meaning that there exists a sequence of reactions in the TCA cycle that ends up in the exact same atom state. That is, the carbon atom with ID 4 stays expelled while all other atoms are kept at their original positions. In contrast, the atom state $(\_, 6, 5, 4)$ is part of a trivial strongly connected component, meaning that any sequence of reactions in the TCA cycle will have to change the atom state.

If any nontrivial strongly connected component in Figure 8c contains more than one vertex, it means that we can swap between atom states after a tour in the TCA cycle. As an example, consider the atom states $(\_, 6, 5, \_)$ and $(\_, 5, 6, \_)$, which are both part of the same strongly connected component. The fact that they are part of the same strongly connected component means that it is possible to swap the inner atoms of the carbon backbone during the TCA cycle. If we are interested in the exact sequence of transformations that leads to the swap, we simply examine the subgraph of $PCay(CN, \bar{z})$ wrt. TCA- corresponding to that natural subsystem of $SCay(CN, \bar{z})$ wrt. TCA-, as illustrated in Figure 9. The figure depicts all possible ways to swap the positions of the atoms with IDs 5 and 6 as the possible paths between $(\_, 5, 6, \_)$ and $(\_, 6, 5, \_)$. Figure 6 shows one such path traversing the TCA cycle without expelling any of the remaining carbon atoms.
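Reading off one sequence of transformations that realizes such a swap amounts to a path search inside the strongly connected component. The sketch below uses breadth-first search to return a shortest sequence of edge labels between two atom states. The intermediate state, the atom IDs 10 and 11, and the labels `r1` to `r4` are hypothetical and do not correspond to the reactions in Figure 9.

```python
from collections import deque


def shortest_trace(edges, source, target):
    """Return a shortest list of edge labels leading from source to target, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        state = queue.popleft()
        if state == target:
            # Walk back to the source, collecting the labels used.
            trace = []
            while parent[state] is not None:
                previous, label = parent[state]
                trace.append(label)
                state = previous
            return trace[::-1]
        for label, nxt in edges.get(state, ()):
            if nxt not in parent:
                parent[nxt] = (state, label)
                queue.append(nxt)
    return None


# Hypothetical component: None marks an expelled atom, 10 and 11 are invented IDs.
a = (None, 5, 6, None)
m = (None, 10, 11, None)
b = (None, 6, 5, None)
component_edges = {
    a: [("r1", m)],
    m: [("r2", b), ("r3", a)],
    b: [("r4", a)],
}

swap = shortest_trace(component_edges, a, b)
swap_back = shortest_trace(component_edges, b, a)
```

Enumerating all paths instead of a single shortest one would give the full picture shown in the figure; the breadth-first variant is chosen here only because it is short and always terminates.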

FIG. 9.

The strongly connected component of $PCay(CN, (4, 5, 6, 7))$ wrt. TCA- containing the states $(\_, 6, 5, \_)$ and $(\_, 5, 6, \_)$.

5. Conclusion

In this work, we have extended the insights provided by Andersen et al. (2019) by showing the natural relationship between event traces, the characteristic monoid, and its corresponding Cayley graph. The projected Cayley graph provides valuable insights into local substructures of reversible event traces.

We see future work on this approach branching in at least two directions. On the one hand, these methods have obvious applications in isotopic labeling design. To this end, it is natural to extend the system to model the actual process of such experiments. For example, when doing isotopic labeling experiments with mass spectrometry, molecules are broken into fragments and the weight of such fragments is deduced to determine the topology of the fragment. Using our model to track where the atoms might end up in such fragments and how this affects their weight seems like a natural next step. On the other hand, a more rigorous investigation of the fundamental properties of the characteristic monoid derived from semigroup theory seems appealing. As we have shown here, understanding such relations might grant insights into the nature of the examined system.

Footnotes

Author Disclosure Statement

The authors declare they have no competing financial interests.

Funding Information

This work was supported by Novo Nordisk Foundation grant NNF19OC0057834 and by the Independent Research Fund Denmark, Natural Sciences, grant DFF-0135-00420B.

References

Andersen, J.L., Flamm, Merkle, et al. 2013. Inferring chemical reaction patterns using rule composition in graph grammars. J. Syst. Chem. 4, 4.

Andersen, J.L., Flamm, Merkle, et al. 2016. A software package for chemically inspired graph transformation, 73–88. In Echahed, R., and Minas, M., eds. Graph Transformation. Presented at the 9th International Conference, ICGT 2016, Proceedings Vol. 9761, LNCS, Springer, Vienna.

Andersen, J.L., Flamm, Merkle, et al. 2017. Chemical graph transformation with stereo-information, 54–69. In de Lara, J., and Plump, D., eds. Graph Transformation. Presented at the 10th International Conference, ICGT. LNCS, Marburg.

Andersen, J.L., Merkle, and Rasmussen, P.S. 2020. Combining graph transformations and semigroups for isotopic labeling design. J. Comput. Biol. 27, 269–287.

Chahrour, Cobice, Malone, et al. 2015. Stable isotope labelling methods in mass spectrometry-based quantitative proteomics. J. Pharm. Biomed. Anal. 113, 2–20.

Clifford, A.H., and Preston, G.B. 1967. The Algebraic Theory of Semigroups, Volume II. American Mathematical Society, Providence, RI.

Deev, S.L., Khalymbadzha, I.A., Shestakova, T.S., et al. 2019. 15N labeling and analysis of 13C–15N and 1H–15N couplings in studies of the structures and chemical transformations of nitrogen heterocycles. RSC Adv. 9, 26856–26879.

Dénes 1966. Connections between transformation semigroups and graphs. (French summary), Theory of Graphs (Internat. Sympos., Rome), 93–101. Gordon and Breach, New York; Dunod, Paris, 1967.

Egri-Nagy, and Nehaniv, C.L. 2008. Hierarchical coordinate systems for understanding complexity and its evolution, with applications to genetic regulatory networks. Artif. Life 14, 299–312.

Egri-Nagy, and Nehaniv, C.L. 2015. Computational holonomy decomposition of transformation semigroups. arXiv preprint, arXiv:1508.06345.

Froidure, and Pin, J.-E. 1997. Algorithms for computing finite semigroups, 112–126. In Cucker, F., and Shub, M., eds. Foundations of Computational Mathematics. Springer, Berlin, Germany.

Habel, Müller, Plump, et al. 2001. Double-pushout graph transformation revisited. Math. Struct. Comp. Sci. 11, 637–688.

Hagberg, A.A., Schult, D.A., Swart, P.J., et al. 2008. Exploring network structure, dynamics, and function using NetworkX, 11–15. In Varoquaux, G., Vaught, T., and Millman, J., eds. Proceedings of the 7th Python in Science Conference, Pasadena, CA.

Harvey, and Ferrier 2010. Biochemistry (Lippincott's Illustrated Reviews Series). Lippincott Williams & Wilkins, Baltimore, MD, Philadelphia, PA.

Mikolajczak 1991. Algebraic and Structural Automata Theory. Elsevier, Amsterdam, the Netherlands.

Nehaniv, C.L., Rhodes, Egri-Nagy, et al. 2015. Symmetry structure in discrete models of biochemical systems: natural subsystems and the weak control hierarchy in a new model of computation driven by interactions. Philos. Trans. A Math. Phys. Eng. Sci. 373, 20140223.

Rhodes, Nehaniv, C.L., and Hirsch, M.W. 2010. Applications of Automata Theory and Algebra: Via the Mathematical Theory of Complexity to Biology, Physics, Psychology, Philosophy, and Games. World Scientific.

Smith, and Morowitz, H.J. 2016. The Origin and Nature of Life on Earth: The Emergence of the Fourth Geosphere. Cambridge University Press, Cambridge, United Kingdom.

Van der Plas, H.C. 1978. The SN(ANRORC) mechanism: a new mechanism for nucleophilic substitution. Acc. Chem. Res. 11, 462–468.