Abstract
The chemical basis of life involves more than simply the presence of biological molecules; biochemical systems embody a complex network of reactions with characteristic topological features. At the same time, chemical complexity is also present in nonbiological contexts, inviting us to clarify the relationship between chemistry and life through comparative studies. This study examines chemical networks from biology (the metabolism of E. coli) and astronomy (gas-phase reactions in dark molecular clouds) to establish common topological features that may be generic for any complex chemical system, as well as clear differences that may be topological signatures of life. The biological and astrochemical networks exhibit different scaling behaviors, and the network motifs found in the two systems show similarities as well as significant differences. The PageRank algorithm was used to quantify the degree to which individual species act primarily as products or reactants; in the metabolic network, these two roles are nearly identical for most species, whereas the astrochemical network shows a clearer partitioning into reactants and products. Key Words: Network theory—Metabolism—Interstellar medium—Astrochemistry. Astrobiology 12, 29–39.
1. Introduction
L
This distinction is of fundamental relevance to our understanding of life on Earth, as well as the search for life elsewhere in the Universe. To imagine what might be possible for biology in other planetary contexts, it can be helpful to cast the features of terrestrial biological systems in more abstract mathematical terms that may still be applicable even if the chemical details of life were dramatically different. In reality, only some of the features of living systems are outcomes of evolution by natural selection, whereas others are constrained by the physics of complex systems. The aim of the present study is to compare a complex biochemical network (E. coli metabolism) with a nonbiological chemical network of similar size (gas-phase reactions in dark molecular clouds) in a search for contrasts and commonalities. These two chemical systems exist in environments that are unimaginably different—one operates at around 310 K in condensed aqueous medium, whereas the other is active at 10 K at lower densities than can be achieved in a laboratory vacuum chamber—and topological elements that are shared between them may be generic features of complex chemical networks over which evolution has little control.
2. Theory
Throughout this study, chemical networks will be represented as directed bipartite graphs (Fig. 1H), meaning that they consist of two different types of nodes (representing reactions and species) and that edges are directional and exist only between nodes of different types. An edge pointing from a species node to a reaction node indicates that the species is a reactant; an edge pointing from a reaction node to a species node designates that species as one product of the reaction. Many previous network-theoretic studies of chemical networks have relied on projection-based methods in which all nodes represent species and edges represent participation in a common reaction (Jeong et al., 2000; Alves et al., 2002; Ma and Zeng, 2003; Arita, 2004; Zhao et al., 2007), or nodes represent reactions and edges denote a shared reactant or product (Wagner and Fell, 2001; Light and Kraulis, 2004; Light et al., 2005; Spirin et al., 2006; Kreimer et al., 2008). It has been shown, however (Montañez et al., 2010; Zhou and Nakhleh, 2011), that such projection schemes introduce statistical artifacts and discard potentially important information about the system's topology.

Illustration of graph properties. (
One important traditional measure of the large-scale structure of networks is the degree distribution, which measures the number of connections that a randomly chosen node makes with others. Many biological networks, including metabolic networks, have been shown to have a power-law degree distribution in which the probability of a node's having degree k scales as P(k)∼k −γ , with (typically) 2<γ<3 (Barabási and Oltvai, 2004). Power-law distributions (Fig. 1A) have certain very special properties that are not shared by other types of distributions. For example, in an exponential system (Fig. 1B) where the degree probability distribution scales as P(k)∼e −αk , rescaling all the k values by a positive constant would require a modification of the decay constant α, whereas the power-law scaling P(k)∼k −γ is unaffected by a rescaling of k. Only power laws exhibit this scale-free behavior. The origin of this power-law scaling is somewhat more controversial—many authors borrow from the physics-oriented literature of self-organized criticality and refer to such networks as “scale-free” (Albert and Barabási, 2002; Barabási, 2009), while others (largely from an engineering background) refer to “scale-rich” networks that are highly optimized for tolerance to defined types of perturbations (Carlson and Doyle, 1999; Tanaka, 2005). It has been observed (Solé and Munteanu, 2004; Jolley and Douglas, 2010) that nonbiological chemical networks, such as those in dark molecular clouds and planetary atmospheres, often follow exponential degree scaling, which leads to a relative scarcity of highly connected nodes relative to the power-law case often seen in biology. One exception to this rule is that models of Earth's atmosphere, in contrast to atmospheric models for other planets in the Solar System, show a power-law degree distribution, which suggests that the same non-equilibrium physics that shapes biological systems may be shaping the planet as a whole (Solé and Munteanu, 2004).
For bipartite graphs, degree may not always be a useful measure. In chemical networks, the degree distribution of species is likely to be very diverse, as some species will participate in many reactions, and others will participate in far fewer. The degree distribution of reactions will be far less interesting—most will have a degree of 2–6, depending on their number of reactants and products. A more useful quantity in bipartite graphs is the strength (Fig. 1H), calculated as the number of paths from a given node to its second-nearest neighbors of the same type (Montañez et al., 2010). Note that this quantity can be larger than the number of second-nearest neighbors (which often corresponds to the degree in projected unipartite graphs) because a node could be connected to one of its second-nearest neighbors by paths through more than one nearest neighbor. This is one example of information that is discarded in the transition from bipartite to projected unipartite graphs.
Another unipartite measure that does not generalize well to bipartite graphs is the transitivity, defined as the probability that, if edges exist between nodes A and B and between nodes B and C, nodes A and C will also be connected. Transitivity can provide a measure of the “cliquishness” of a network (Fig. 1C, 1D) and has been used extensively in the sociological literature (Newman, 2003). Geometrically, a calculation of the transitivity involves counting up the number of triangles in the graph—in a highly transitive network, triangles could be considered a “network motif” (Alon, 2007) that is overrepresented relative to what would be expected in a similar random graph. Bipartite graphs have a transitivity of zero by definition; the presence of triangles would require a connection between two nodes of the same type. By analogy, however, one could count up the number of quadrilaterals in a graph to define a “bipartite transitivity” measure. Furthermore, the directionality of edges allows for the detection of seven different types of quadrilaterals (Fig. 2A), allowing for a more nuanced view of the manner in which a particular network is transitive.

(
Degree distribution and transitivity are both calculated solely from local interactions within the graph; other measures give a more direct measurement of the graph's large-scale structure. One example is the modularity (Newman, 2006), defined as the degree to which the graph can be broken into disjoint modules for which intramodule edges are more prevalent than intermodule edges (Fig. 1E–G). Modularity is traditionally calculated by using community detection algorithms, which attempt to break the graph into successively smaller units until no more viable modules can be identified. Community detection does not require any modifications to work with bipartite chemical graphs; the modules returned are collections of species and reactions that form a self-contained subnetwork that is weakly connected to the rest of the system.
As a final example of a method that yields information about the global structure of a graph, bipartite chemical graphs can also be analyzed with the PageRank algorithm (Page et al., 1999; Langville and Meyer, 2005), which forms the algorithmic basis for the Google search engine. PageRank was initially developed as a method of assessing the “importance” of web pages by simulating the motion of an idealized random walker on a directed graph and ranking nodes based on the amount of time that the random walker spends there. To use a more physical analogy, nodes begin with an equal amount of PageRank and distribute it evenly among their outbound edges, until the distribution of PageRank becomes self-consistent. There are two ways for a node to have a high PageRank—one would be to have a large number of inbound connections (especially from highly ranked nodes), and the other would be for relationships with connecting nodes to be fairly exclusive, so that they do not share their donated PageRank broadly. It should be noted that, although PageRank is an attribute of individual nodes, it is strongly context-dependent—nodes with a similar local environment (i.e., similar degree and strength) can have dramatically different PageRank values, based on their position within the network as a whole (Fig. 3).

Illustration of results of the PageRank algorithm. All graphs are shaded such that nodes with the highest PageRank are black and those with the lowest are white. The direction of edges is indicated by a thicker line at the end toward which the edge is directed. (
Because of the inherent directionality of the PageRank algorithm, it can be evaluated in two different directions to provide different types of information. In the forward direction, nodes receive PageRank from their inbound connections; highly ranked species would be important products (meaning that they are formed in important reactions), and highly ranked reactions would be important consumers (meaning that they consume important species). In the reverse direction, highly ranked species would be important reactants (consumed in important reactions), and highly ranked reactions would be important producers (generating important species).
To avoid becoming caught in traps of cyclically referencing nodes, the idealized random walker of the PageRank algorithm has a finite probability at each step of “teleporting” to a randomly chosen node and resuming her walk from that point. This behavior can be modified somewhat through the usage of a “personalization vector” that defines each node's probability of being chosen as a teleportation destination. When the algorithm is run on a chemical network in the forward direction, this vector should be used to specify an initial set of reactant species—nebular abundances of heavy atoms and H2 in the case of the astrochemical network, or growth media components in the case of the biochemical network. When the algorithm is run in the reverse direction, the personalization vector should specify the final state of the system—either the equilibrium abundances obtained at the end of an astrochemical simulation or the measured components of E. coli biomass.
3. Data Sets
Gas-phase chemical reactions in dark molecular clouds were derived from the UMIST Database for Astrochemistry (UDfA), which contains 2937 gas-phase reactions between 225 distinct molecular species (Woodall et al., 2007). Grain-surface processes were not included, and reactions not viable at 10 K were removed. The ultimate basis for databases such as UDfA lies in studies of dark molecular clouds in which radio and IR spectroscopy are used to identify prevalent chemical species. The success of a chemical model depends on its ability to reproduce estimated abundances of observed chemical species based on the inferred age and initial composition of the cloud. UDfA is designed to be used in (pseudo-) time-dependent simulations and therefore contains rate constant information for all reactions; this rate information is crucial to the correct outcome of astrochemical simulations but was not used in this study.
Reaction rates in the UMIST database have been based, to the extent possible, on experimental measurements. Woodall et al. (2007) reported that ∼30% of rate constants in the 2006 release of UMIST are measured. About 66% are curated from the literature, including in particular the NIST Chemistry Webbook (
When the PageRank algorithm was evaluated with a personalization vector, initial abundances were chosen as specified by Woodall et al. (2007), and final abundances were obtained from the steady-state values at the end of a 108 y simulation by using the UDfA reaction set and physical parameters appropriate for dark molecular clouds (10 K, 104 cm−3, 10 mag visual extinction). A table with relative abundances at the end of the simulation is included in the Supplementary Information (Supplementary Information is available online at
The metabolism of E. coli K-12 was modeled based on the information available in the online
The second (and for the purposes of this study far more important) difference is that EcoCyc contains a significant number of generic reactions, in which the substrate of a particular enzymatic reaction is either poorly characterized or inherently nonspecific. This means that the EcoCyc database could not be used to naïvely build a network model; instead, these nonspecific reactions had to be identified and, in most cases, removed from the network. Many of these nonspecific reactions were in lipid metabolism, where enzymes will act on lipids or precursors without regard to the number of carbon atoms in their fatty acid chains. Because lipids make a significant contribution to the biomass of a bacterial cell, these nonspecific reactions were replaced by a series of specific reaction steps that lead to the synthesis of dipalmitoyl diphosphatidylethanolamine, a major constituent of E. coli's cell membrane (Ganong and Raetz, 1982); a diagram showing the added reactions is included in the Supplementary Information accompanying this article. In addition to species involved in generic reactions, species were removed if they were either completely disconnected from the largest (undirected) connected subgraph component (i.e., not integrated with the rest of the network) or if they were connected only via generic species or currency metabolites (ATP, NADPH, etc.). A list of all species removed from the network and the rationale for their removal are provided in the Supplementary Information accompanying this article. In addition, reaction nodes that were nearest neighbors of removed species (i.e., generic or non-integrated reactions) were also removed. The final EcoCyc-derived metabolic network included 821 species participating in 1289 reactions.
When the EcoCyc network was analyzed with the PageRank algorithm in the forward direction, initial abundances were estimated by using molar ratios in minimal growth media (59%
All network calculations were performed with the NetworkX library in Python (Hagberg et al., 2008).
4. Results
Strength distributions: The strength was calculated for each node in the two chemical networks, and rank-strength graphs are shown in Fig. 4. The x axis of each plot is the node strength, normalized so that the maximum strength of each curve is 1. The y axis is the fraction of nodes with a (normalized) strength greater than the value on the x axis, equivalent to 1 minus the cumulative distribution function (CDF). A steep slope indicates a heterogeneous distribution in which higher-ranking nodes have much higher strength than lower-ranking nodes; a shallow slope indicates a more homogeneous distribution.

Strength distributions in the metabolic (EcoCyc) and astrochemical (UDfA) reaction networks. The strength axis has been normalized to a [0,1] scale for each network to highlight the similar shapes of the strength distributions for the networks. The vertical axis shows the fraction of nodes with a (normalized) strength greater than the given value on the horizontal axis; this can be thought of as 1 minus the CDF. The EcoCyc species nodes show a region that is approximately a straight line on the log-log plot, indicative of power-law scaling over a limited range, whereas the other three sets of nodes show a shape more characteristic of exponential scaling. Non-normalized plots along with power-law and exponential fits for each curve are included in the Supplementary Information accompanying this article.
Quadrilateral motifs: Figure 2B shows the results of counting the number of directed quadrilaterals of each type in the network, normalized to the total number of edges. To determine whether these directed quadrilaterals represent significant network motifs, both graphs were shuffled in order to obtain randomized graphs. The shuffling procedure consists of randomly selecting two directed edges: (nA,nB) points from node A to node B and (nC,nD) points from C to D. As long as nodes A and C are of the same type (as are B and D), these two edges can be removed from the network and replaced by (nA,nD) and (nC,nB). This swap procedure is repeated 10n e times, where n e is the number of edges in the network. This procedure preserves the in and out degree of each node and leaves the strength distribution virtually unchanged while randomizing any larger-scale structure. Any measurement of structure beyond the single-edge scale is significant only if it is not preserved when the graph is shuffled.
Modularity: Community-detection calculations show a qualitative difference between the metabolic and astrochemical reaction networks. The EcoCyc network showed a clustering coefficient of 0.492 [comparable to those reported elsewhere (Newman, 2006)], which decreases to 0 upon shuffling of the network. The UDfA network, on the other hand, shows a clustering value of 0, which is unchanged upon shuffling. The EcoCyc network splits into 12 distinct subnetworks; the largest contains 334 nodes and the smallest 8.
PageRank: The top-ranking species for forward and reverse PageRank evaluations on both networks are shown in Table 1. The identities of the top-ranking species are not surprising; in each case they correspond to highly abundant species that participate in a large number of reactions and form “hubs” in the network. Closer inspection, however, shows an important difference—for the EcoCyc network, the rankings for the forward and reverse algorithms are nearly identical, whereas for UDfA they show larger differences. This intuition can be made more systematic by plotting the forward versus reverse PageRank of each species (Fig. 5). EcoCyc shows a few conjugate pairs of currency metabolites that are located far from the diagonal—ATP/ADP, NAD+/NADH, NADP+/NADPH—but most other species are clustered close to the diagonal (and the origin). UDfA shows several species located far from the diagonal—He+, He, and e− are located above the diagonal and are therefore important reactants; CO, H, H2, O, and OH are located below the diagonal and are stable leaving groups.

PageRank (PR) calculations for chemical networks. (
A node's forward PageRank tends to correlate strongly with its inbound degree and its reverse PageRank with its outbound degree. In fact, plots similar to Fig. 5 can be made comparing the in and out degree of nodes in the UDfA and EcoCyc networks rather than the forward and reverse PageRank values. The results look qualitatively similar (see Supplementary Information), except that construction of a histogram similar to Fig. 5A is complicated by the fact that the in and out degree, unlike PageRank, must take on integer values. One striking difference is the location of He in the UDfA plot; with an in degree much higher than its out degree, He would be located well below the diagonal on an in versus out degree plot but is located well above the diagonal in Fig. 5.
The results described above were initially obtained for the PageRank algorithm in the absence of a non-uniform personalization vector. If a chemically realistic personalization vector is provided as described above, the first major change that is observed is that a strong correlation emerges between initial (final) abundance and forward (reverse) PageRank. In this situation, more information about the network structure can be derived from the “excess PageRank,” in which the contribution to the PageRank that is attributable simply to initial (or final) abundance is removed by subtracting out a linear fit. Once this correction is applied, the distribution of PageRank asymmetry described above is unchanged. There is a positive correlation between PageRank values obtained for individual nodes with and without the personalization vectors; this correlation is fairly weak (R 2∼0.59) for EcoCyc and stronger (R 2∼0.79) for UDfA.
5. Discussion
Strength distributions: Rank-strength graphs for the reaction and species nodes of each chemical network are shown in Fig. 4. Each curve was fitted to both a power-law and an exponential decay curve; the EcoCyc species were more consistent with a power-law distribution with γ=2.06 (p power-law=0.00015 vs. p exponential=0.0030), the UDfA species fit somewhere between the two distribution types (p power-law=0.0013 vs. p exponential=0.0018), and the reaction nodes for both systems clearly fit an exponential distribution more closely (p=0.00094 vs. 0.00010 for EcoCyc; p=0.0002 vs. 4.4×10−5 for UDfA). Note that even the EcoCyc species do not follow a straight line on the log-log plot for the entire distribution; the power law holds up well at an intermediate range but shows deviations at the high and low extremes. Note also that, although the power-law scaling e −γ can take on any real value, strength (and degree) take on discrete integer values; this makes attempts to verify power-law scaling with statistical tests based on continuous distributions (such as a Kolmogorov-Smirnov goodness-of-fit test) inherently problematic. Most studies reporting power-law scaling in networks rely on less-sensitive measures such as the linear regression fits described above.
The scaling observed for species strength is consistent with the published results described in the Theory section above for degree distributions, which suggests that strength distributions can be used to measure features of a bipartite graph similar to the unipartite graph features measured by degree distributions. This result also underscores the importance of using the full bipartite graph when dealing with chemical systems; a projection scheme would have obscured the fact that reactions and species in EcoCyc show qualitatively different strength distributions and possibly imposed a power-law scaling upon a projected reaction subgraph (Montañez et al., 2010).
Quadrilateral motifs: Figure 2B shows clearly that quadrilateral Types 0 and 4 are clear motifs in both the EcoCyc and the UDfA networks. Type 0 corresponds to a catalytic cycle (Fig. 6); Type 4 corresponds to two parallel reactions that share a common substrate and a common product. Type 0 is a much stronger motif in EcoCyc than in UDfA; this may result from the fact that biological systems have means for driving unfavorable reactions at room temperature, whereas the cold, near-vacuum conditions of the ISM require that all non-photochemical reactions be energetically favorable and barrierless, making the formation of closed loops much more difficult. In addition, quadrilateral Types 5 and 6 are antimotifs for both EcoCyc and UDfA, meaning that they are less prevalent in the real chemical networks than in their randomized counterparts. The Type 5 antimotif can be seen as a two-step linear pathway that is folded back on itself through re-use of a by-product. For example, the metabolic example in Fig. 6 is a two-step reduction of methylglyoxal to L-1,2-propanediol via acetol. Both reductions use NADH as a reductant, and the “folding back” of the pathway occurs because of the re-use of NAD+ as a by-product. Likewise, the astrochemical example involves the removal of two hydrogen atoms from

Specific examples of quadrilaterals from the EcoCyc and UDfA networks. The shading scheme is the same as in Fig. 1; in each case the two central species nodes and the reaction nodes attached to them form a directed bipartite quadrilateral, whereas the four species nodes at the edges are included for convenience in understanding the full reactions. The Type 0 motif is a catalytic cycle in which the two central species are constantly recycled, and the Type 4 motif is the result of a single substrate-product pair (ATP/ADP or SO/O) involved in driving multiple reactions. The Type 5 and 6 antimotifs, in contrast, require the species in the quadrilateral to participate in more than one type of reaction and are rare in real chemical networks.
Modularity: As described above, community-detection calculations revealed a significantly modular structure for the biological EcoCyc network and a pronounced lack of modularity for the astrochemical UDfA network. One way to gain insight about the nature of the subnetworks returned during community detection is to calculate the betweenness centrality for each node—the fraction of shortest paths between pairs of nodes that passes through a given node. Most of the subnetworks, particularly the larger ones, contain a single highly connected metabolite (Pi, diphosphate, ammonia, etc.) or a conjugate pair of metabolites (ADP/ATP, NAD+/NADH, NADP+/NADPH, SAM/SAH) with extremely high betweenness centrality values; often more than 0.4. The modularity of the bipartite EcoCyc network is not, however, solely a result of the highly connected tail of the strength distribution—the fact that shuffling abolishes the network's modularity indicates that this feature results from larger-scale structure. It is probably significant that many of the highly connected subnetworks involve conjugate pairs of currency metabolites; ATP and ADP (for example) will still have high degree (and therefore high strength) in a randomly shuffled network but are far less likely to be connected to the same reaction (as reactant and product) than in the native network. This diminishes their ability to function as the nucleus of a highly connected clique, effectively eliminating the network's modularity. If the primary currency metabolites ATP, ADP, NAD+, NADH, NADP+, NADPH, and Pi were removed from the network, the community detection algorithm returned a modularity of 0, consistent with a dominant role for these metabolites in creating a modular network.
PageRank: The high reverse PageRank of He in the UDfA network (Fig. 5, Table 1) deserves some additional comment—one would assume that such a chemically inert species would be unlikely to function as an important reactant. He is so stable, in fact, that He+ is a very powerful oxidant that will take an electron away from a variety of different species in order to fill up the unoccupied valence and form He. Although He+ is a highly reactive species, UDfA only contains one reaction that forms it—the cosmic ray ionization of He. He+ receives a high reverse PageRank because of the large number of reactions in which it participates, whereas He receives a high reverse PageRank because of its exclusive connection to a single reaction that produces a highly ranked species, which gives it a crucial role in connecting the overall network.
Taking the difference between the forward and reverse PageRanks for each species gives a measure of the degree to which a given species is more important as a reactant than as a product (Fig. 5). This shows a particularly striking difference between the EcoCyc and UDfA networks; this difference measure is tightly distributed around zero for the EcoCyc network (σ=0.00122), whereas UDfA gives a significantly broader distribution (σ=0.00484; 2-sample Kolmogorov-Smirnov p=2.5×10−9). This is suggestive of the approach to equilibrium favored in astrochemical systems, which would lead to a clearer partition of reactant and product species as opposed to the far-from-equilibrium nature of biochemical systems in which most metabolites can function in both catabolic and anabolic pathways. Removing some of the distinctive features of metabolic networks increases σ only slightly—to 0.00189 if reversible reactions are made irreversible and 0.00188 if all currency metabolites are removed. This suggests that the differences in PageRank asymmetry between the two networks reflect more subtle features of the network structure.
Although these results are most readily interpreted in terms of kinetics and thermodynamics, it is important to bear in mind that the graph-based calculations do not explicitly include this type of information. Instead, kinetics enters indirectly through the graph topology, because the only reactions that are included in the network are those that were considered kinetically feasible, either based on general physical-chemical principles (UDfA) or because an enzyme exists that is able to catalyze them (EcoCyc), possibly by coupling unfavorable reactions to favorable processes such as ATP hydrolysis.
6. Conclusions
Despite the tremendous differences in their physical environments in terms of density, temperature, solvation, and energy sources, the reaction networks present in a bacterial cell and in dark molecular clouds show a number of striking similarities. Both systems show reaction node strength distributions that follow exponential scaling, which suggests topological similarities in the large-scale ordering of the networks. The same two strong directed quadrilateral motifs can be found in both networks, which suggests that the same fundamental principles of chemical reactivity shape the local connectivity structure of both networks. Examination of the PageRank asymmetries depicted in Fig. 5 shows that both systems contain species that are more important as reactants than as products: He, He+, and e− in UDfA, and ATP and NADPH in EcoCyc. Both networks also contain stable leaving groups that are more important as products than as reactants: CO, H, and H2 in UDfA, and ADP, Pi, CO2, and NADP+ in EcoCyc.
The most interesting features of these two chemical networks, however, are probably their differences. The ultimate explanation for the power-law scaling of EcoCyc's species degree distribution is beyond the scope of this study, and extensive discussions of power-law behavior have been given elsewhere (Newman, 2005). Whatever the reason for the power-law scaling, it remains a telling difference between biological and nonbiological chemical networks. Although the two quadrilateral network motifs revealed by Fig. 2B are common between EcoCyc and UDfA, it should be noted that Type 0, which corresponds to catalytic cycles, plays a larger role in EcoCyc, whereas UDfA is more completely dominated by the parallel pathways represented in Type 4 motifs. Another interesting difference is that, while Types 1, 2, and 3 in UDfA are nearly indifferent to shuffling, which indicates that they are present in roughly the same levels as would be expected in a similar random network, every type of directed quadrilateral is either a motif or an antimotif in EcoCyc, with Types 5 and 6 representing stronger antimotifs than they do in UDfA. This suggests that, although the fundamental physics of complex chemical networks dictate a commanding role for Types 0 and 4, evolutionary pressures led to a less-random character for the distribution of the other quadrilateral types in EcoCyc. Compared to UDfA, EcoCyc also shows more balance between the forward and reverse PageRank values for most nodes (Fig. 5). This could be a result of evolutionary pressure to maintain a highly dynamic system in which metabolic fluxes can be reconfigured by enzymatic regulation and any metabolite can participate in a variety of different pathways, depending on the changing needs of the cell. A system in which most metabolites can clearly be classified as preferentially reactants or products will tend to be more static and to progress toward equilibrium, as is generally the case for nonbiological reaction systems.
The presence of significant network motifs and antimotifs in both the astrochemical and metabolic networks emphasizes the fact that chemical reaction networks are decidedly nonrandom. The randomized chemical networks used to assess the significance of the quadrilateral network motifs exhibit the same degree and strength distributions as the real networks but are dramatically different at larger scales of organization. This has clear implications for network-based models for the origin of life (Kauffman, 1986), which often rely on the presence of phase transitions in random graph models. It is well known that Erdős–Rényi random graphs are not a good approximation of metabolic reaction networks (Jeong et al., 2000) and that models with a power-law distribution are likely to provide a better approximation. The results in this study make it clear that constructing a plausible chemical network will likely require more than simply reproducing the correct degree (or strength) distribution. Chemically realistic random networks will, for example, need to exhibit the types of quadrilateral motif distributions shown in Fig. 2. Networks based on “artificial chemistry,” where in silico “molecules” are related to each other by chemistry-like rules, may be a valuable approach for the generation of hypothetical chemical networks with plausible structure (Benkö et al., 2003; Hintze and Adami, 2008).
It should be emphasized that neither of the chemical networks examined here provides a complete enough description of the physical environment to fully explain why these systems function the way they do in nature. In the case of astrochemical networks, the timescale for chemical equilibration is uncomfortably close to the timescale for gravitational collapse. This is what is meant when simulations based on chemical networks are referred to as “pseudo-time dependent”—true time dependence would require coupling between the physical and chemical evolution of the cloud (Lee et al., 1996). The biological case is even more complicated. Biological reactions occur in the crowded confines of the cell, where local concentrations can be high, often permitting enzymes to operate in both directions under different intracellular conditions and leading to the large number of reversible reactions observed in biochemical networks. An even greater source of additional complexity comes from regulatory mechanisms, by which the cell is able to control both the quantity and activity of catalytic enzymes in order to meet changing physiological needs. The binary nature of the networks examined in this study, where reactions are either present or absent, does not capture this level of regulatory nuance.
It should also be noted that there are many more differences between astrochemical and metabolic reaction networks than the mere presence or absence of biology, and it would be unwise to attribute every observed difference to biological organization. For this reason, extension of this type of analysis to other types of reaction systems could yield further insight into the constraints that shape the structure of chemical networks, both biological and nonbiological. For example, a network analysis of an abiotic, non-equilibrium reaction system such as the Belousov-Zhabotinsky reaction (Field et al., 1972; Mikhailov and Showalter, 2006) could help to determine whether some of the observed features of the biological system are a result of non-equilibrium dynamics or whether they are primarily a response to evolutionary pressures. Planetary atmospheres could provide examples of abiotic systems under somewhat more clement conditions than the ISM, and geochemical systems could provide a case of a system that is not biological in the sense of being acted upon directly by evolution but may be driven by the action of biological systems. Further insight into the topology of bipartite metabolic reaction networks could also be obtained by applying these tools to more sophisticated stoichiometric metabolic models (Feist et al., 2007) or to multiple-organism networks that could allow for the modeling of metabolism at the ecosystem level (Caspi et al., 2010). Such metabolic networks are unlikely to be qualitatively different from EcoCyc; however, they could provide a sense of the robustness of metabolic network features to the inclusion or exclusion of reactions that differ between differently curated databases. The tools of network theory are well suited to comparative studies of diverse types of chemical systems and can provide us with valuable insights into where life fits in the broader landscape of chemical complexity.
Footnotes
Acknowledgments
This research was supported by a grant from the NASA Astrobiology Institute (NNA08CN85A). The authors would also like to thank Aurélien Mazurie and Sarah Hörst for helpful discussions and references to pertinent literature and Aric Hagberg for assistance with the PageRank algorithm.
Author Disclosure Statement
C.J.: No competing financial interests exist. T.D.: No competing financial interests exist.
Abbreviations
CDF, cumulative distribution function; EcoCyc, E. coli encyclopedia; ISM, interstellar medium; UDfA, UMIST Database for Astrochemistry.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
