Abstract

This special issue of the Journal of Computational Biology comprises many of the articles presented at the international workshop titled “Genomic Regulation: Experiments, Computational Modeling, and Philosophy.” This workshop took place on December 4–5, 2017 at Ben-Gurion University of the Negev in Beer Sheva, Israel, and was organized by the Jacques Loeb Centre for the History and Philosophy of the Life Sciences.
This workshop was inspired by discussions with Eric Davidson, who participated in almost all the previous Jacques Loeb Centre international workshops and who passed away in September 2015. Davidson's modeling of developmental gene regulatory networks (GRNs) in sea urchins, in which the combination of mathematical modeling and experimentation was crucial, was a source of inspiration for many of the contributions to this special issue. Davidson's models were based on experimental data; they were not purely computational simulations. He used modeling to test causal-mechanistic theories and to guide experimentation. Davidson's work is also paradigmatic for modeling in biology because, as an experimental developmental biologist, he developed mathematical modeling only at a later stage of his research, when he switched to a systems approach.
Although most of the articles in this special issue deal, at least to some extent, with models of developmental GRNs, the contributions expand these themes in various ways by exploring the roles and transformations of models in biology under different philosophical, historical, and scientific perspectives, including chemistry and evolutionary biology.
As an introduction to this special issue, Michel Morange provides a brief philosophical account of the nature and roles of models in science before turning to models in molecular and cellular biology and their relationships with experimental data. He focuses on molecular biology in general, and on gene regulation in particular. His detailed analysis of pertinent models in molecular biology shows the diversity of models and their functions, and how both have changed over time. According to Morange, the construction of a mathematical or computational model is not always the final step in a research project that started from rough data, but can also be the starting point. He also makes it clear that models are not always a step toward abstraction, but can also be the opposite, a step toward a material representation of an abstract phenomenon. He concludes that there is no universal path of progress in modeling.
The diversity of models is also a topic of interest for Ute Deichmann, who focuses on quantitative modeling in research that aims at understanding fundamental features of development and heredity, starting with Mendel's modeling of the generation of plant hybrids and ending with Eric Davidson's modeling of GRNs. Among other models discussed in the article are D'Arcy Thompson's models of biological form, Alan Turing's models of development, and Watson and Crick's model of the DNA structure. In addition to analyzing the epistemologies of these and other models, she compares their varying fruitfulness, explanatory scope, and reception by relating them to the models' regard, or disregard, for basic principles of biology, in particular biological specificity or biological information and genetic or genomic causality. The marginalization of causality is particularly pertinent in phenomenological models as well as in big-data-only approaches, in which models are often regarded as superfluous. The consequences of this marginalization are discussed.
A major part of this special issue is devoted to the modeling of gene regulation in development in different organisms at different periods of development.
Sorin Istrail gives an overview of his collaboration with Eric Davidson, focusing on the computer science contributions to the study of the regulatory genome that their joint work produced. He presents four inspiring questions that Eric Davidson asked, and the seven follow-up technical problems that were resolved with the methods of computer science. At the center of those technical challenges, unifying their intellectual backbone, stands “Causality.” Their collaboration produced the causality-inferred cisGRN-Lexicon database, containing the cis-regulatory architecture of 600+ transcription-factor-encoding genes and other regulatory genes, in eight species: human, mouse, fruit fly, sea urchin, nematode, rat, chicken, and zebrafish. These cis-regulatory architectures are causality-inferred regulatory regions of genes, derived through the experimental method called “cis-regulatory analysis” (also known as the Davidson criterion). In this research program, causality challenges for computer science show up in two components: (1) how to define data structures that represent the DNA structure data that are causality-inferred by the Davidson criterion, and how to define a versatile software system to host them; and (2) how to identify, by automated text-analysis software, the experimental technical articles that apply the Davidson criterion to the analysis of genes. The cisGRN-Lexicon meta-analysis (Part 1) is presented next. The article ends with some reflections on epistemological and philosophical themes concerning the role of causality, logic, and proof in the emerging elegant mathematical theory and practice of the regulatory genome.
Isabelle Peter and Sorin Istrail discuss the function of the regulatory genome in terms of computational information processing. They illustrate the computational functions of the genome using two cases from the sea urchin embryo wherein regulatory functions were systematically analyzed by experiment and formalized using computational logic models. The first example focuses on the regulatory system of a single gene, endo16, and the second on information processing at the GRN level. Adopting the point of view of computation, they relate regulatory function to the execution of highly complex information processing functions at several levels of organization. Based on the observation that the function of the regulatory genome at the single gene and GRN level can be approximated by computational logic formulas similar to the logic gates used in computer science, they discuss the insights generated by experiments and computational models that illuminate the properties of the regulatory genome. Based on these insights, they raise questions of general importance, such as how the mechanisms of single gene regulation relate to regulatory operations at the network level. They show that even a relatively simple expression pattern requires complex computation of regulatory inputs and that these regulatory functions are integrated at the level of GRNs to compute more complex developmental processes.
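The idea that regulatory function can be approximated by logic formulas similar to logic gates can be illustrated with a toy example. The following Python sketch is only an illustration and is not the endo16 model or any model from the article: the input names (activator_a, activator_b, repressor) and the wiring "(A AND B) AND NOT R" are hypothetical, chosen to show how a cis-regulatory logic function maps combinations of regulatory inputs to an on/off output.

```python
from itertools import product


def target_gene(activator_a: bool, activator_b: bool, repressor: bool) -> bool:
    """Hypothetical cis-regulatory logic: (A AND B) AND NOT R.

    The inputs stand for the presence or absence of regulatory factors;
    the output is the on/off state of the regulated gene.
    """
    return (activator_a and activator_b) and not repressor


def truth_table(fn, n_inputs):
    """Enumerate every combination of Boolean inputs and the resulting output."""
    return {
        inputs: fn(*inputs)
        for inputs in product([False, True], repeat=n_inputs)
    }


table = truth_table(target_gene, 3)

# Print the full input/output table of the toy regulatory function.
for inputs, output in table.items():
    print(inputs, "->", output)
```

In this toy wiring, only one of the eight input combinations switches the gene on, which gives a sense of how even a simple expression pattern can depend on a specific combination of inputs.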
Based on the example of pattern formation in the vertebrate neural tube, James Briscoe explores the roles of in silico replications and models as well as traditional experimental methods in the investigation of developmental processes. He holds that although these methods are important tools, coherent explanations can only be provided if they are supplemented by theory that identifies principles. He points to the fruitfulness of the theory of GRNs, developed by Eric Davidson and his colleagues, as a logical and formal framework for understanding embryonic pattern formation. In this study, the power of experimental data combined with computational simulation is described, and the potential of the GRN approach to provide a mechanistic and causal explanation for a complex set of gene regulatory events in development is illustrated. Briscoe shows that comparisons between different GRNs have been used as a basis for generating general principles. According to him, dynamical systems theory presents a natural formalism for describing and investigating GRNs. He also illuminates the limitations of the GRN framework and the challenges to be met. He makes it clear that development is not yet fully understood and that many questions are still unresolved, for example, how a developing embryo constrains the initial conditions to cope with the effect of nonlinearity.
Focusing mainly on examples from mammalian hematopoiesis, that is, the generation of blood cells, Ellen Rothenberg examines why it is still problematic to apply many of the available modeling approaches to explain and predict several kinds of development. She shows that the GRN theory has been extremely successful in explaining the increasing and irreversible complexity in early embryonic development, but has shortcomings regarding later periods of development, such as the generation of blood cells. Moreover, various types of models, which are all referred to as gene network models, actually seek to achieve very different goals. Among them are correlation-based “omics” models (which are not discussed in this article), deterministic continuous-valued models based on ordinary differential equations, and deterministic Boolean process models of regulatory systems.
She lists elements that would need to be taken into account in future more advanced models, including the importance of cellular history, as manifested, for example, in different chromatin states that may change the accessibility of different genomic DNA, the stochasticity of transcriptional activation, different mechanisms enforcing repression with different reversibilities at different genomic sites in the same cells, and the unknown syntax of gene regulation by multiple enhancers. The need to account for gene regulatory change kinetics in models could thus drive progress in the basic understanding of gene regulation in development.
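The difference between the deterministic Boolean and the deterministic continuous-valued (ordinary differential equation) model types mentioned above can be made concrete with a toy circuit. The following Python sketch is not taken from the article: it assumes a hypothetical pair of mutually repressing genes, x and y, and the parameter values (alpha, n, dt, steps) are arbitrary illustrative choices. The Boolean version updates on/off states synchronously; the ODE version integrates Hill-type repression with a simple forward Euler scheme.

```python
def boolean_step(state):
    """Synchronous Boolean update: each gene is on iff its repressor is off."""
    x, y = state
    return (not y, not x)


def ode_toggle(x, y, alpha=3.0, n=2, dt=0.01, steps=4000):
    """Forward Euler integration of a hypothetical two-gene toggle switch:

        dx/dt = alpha / (1 + y**n) - x
        dy/dt = alpha / (1 + x**n) - y
    """
    for _ in range(steps):
        dx = alpha / (1.0 + y ** n) - x
        dy = alpha / (1.0 + x ** n) - y
        x, y = x + dt * dx, y + dt * dy
    return x, y


# Boolean model: the state (x on, y off) is a fixed point of the update rule.
bool_state = boolean_step((True, False))

# ODE model: starting with x above y, the system settles into a state in
# which x is high and y is low (graded levels rather than on/off values).
x_final, y_final = ode_toggle(2.0, 0.1)
print(bool_state, round(x_final, 3), round(y_final, 3))
```

Both versions agree qualitatively on which gene ends up dominant, but only the ODE version carries information about expression levels and about how quickly the state is reached; neither includes the stochasticity or chromatin-state history discussed above.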
The transition from experiment to modeling is not obvious. The available experimental data are anything but complete, and GRN research without modeling can be of high value, as is shown here in the contribution by Roger Patient. Aiming at a better understanding of how the human stem cells of the bone marrow are programmed, Patient and his collaborators delineate the GRNs that specify these cells during their development. They use the amphibian experimental model because many of the mechanisms employed there are conserved in mammals, including humans. By using known temporal expression patterns and activity inhibition methods for transcription factors and signal receptors, the article deciphers the GRN responsible for the differentiation of mesodermal cells located in the lateral plate of the embryo into definitive adult hemangioblasts, which eventually give rise to hematopoietic stem cells.
In “Chemical Modeling from Dyes to Beta-Blockers: A Brief History,” Anthony S. Travis demonstrates how, from the 1870s, in the hands of the German biomedical researcher Paul Ehrlich, chemical structures and properties of aromatic organic compounds, in particular the nature and locations of substituents, or side chains, of synthetic dyestuffs, inspired highly speculative but useful models. These models were used, successively, for describing the structure of the cell wall, explaining the mechanism of immunity, and suggesting structural features of synthetic drugs. In the early 1900s, John Langley in England redefined the nature of certain side chains, or receptors. In the 1960s, this played a central role in the development of models for the design of drugs, notably beta-blockers that were discovered by James Black at Britain's Imperial Chemical Industries. The receptor concept is still embedded in structure–activity relationship studies for drug development.
Mathematical models have been prevalent in evolutionary biology since its synthesis with population genetics in the late 1920s. Evolutionary theory has been dominated by Darwin's concept of novelty arising from gradual changes and, therefore, most models have dealt with natural selection in populations, not the generation of morphological novelties beyond the level of populations. Consequently, there are no models that integrate and test disparate views of micro- and macroevolution and evo-devo.
In contrast, based on a clear characterization of evolutionary novelty (the generation of new morphologies [similar to that of an invention in economics]) and innovation (the successful introduction of a novelty into an ecosystem), Doug Erwin discusses possible approaches to developing either a formal model or a conceptual framework for novelty and innovation. He concludes that building a model for novelty and innovation is challenging: so far, it is only possible to generate simulation models based on different types of novelty, and here, too, important issues remain unresolved. The construction of a conceptual framework for novelty and innovation would be only the first step in addressing a much richer variety of questions, among them that of the causal relationships between the relevant variables and that of the mechanisms underlying novelty. Although there are a few examples in which the co-option of a GRN to a new developmental address generated a new morphology, Erwin makes it clear that it has not yet been shown whether novelties at different levels are generated in different ways, and to what extent the paces of novelty and successful innovation have changed over time.
Michal Ziv-Ukelson and her collaborators make use of the huge and rapidly growing databases of microbial genomes to further explore the idea that groups of genes that are clustered locally together across many genomes usually express protein products that interact in the same biological pathway. They show that these highly conserved gene blocks are biologically related to operons. In contrast to the operon definition given by Jacob and Monod, their computationally predicted gene blocks could contain genes spanning both strands of the DNA, and could consist of one or more operons, or alternatively just a part of an operon. Based on data mining, they formulate a new variant of the well-known gene cluster discovery problem, where the sought gene blocks are a superset of a set of genes required to comply with user-specified constraints. They develop new techniques that can cope with the rapidly growing genomic databases. The authors envisage further experimental perturbation studies to explore their hypothesis.
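The general notion of a gene block that is conserved across genomes can be sketched in a few lines of Python. This is a deliberately naive illustration and not the authors' algorithm or their new problem variant: the function name conserved_gene_sets, the parameters window and min_genomes, and the toy genomes are all hypothetical. Each genome is assumed to be an ordered list of gene family labels; strand and gene order within a window are ignored, in line with the observation that predicted blocks may span both strands.

```python
from collections import defaultdict


def conserved_gene_sets(genomes, window, min_genomes):
    """Return gene sets that occupy a contiguous window in >= min_genomes genomes.

    genomes: dict mapping a genome name to an ordered list of gene family labels.
    window: number of adjacent genes that make up a candidate block.
    """
    support = defaultdict(set)  # candidate gene set -> genomes containing it
    for name, genes in genomes.items():
        for start in range(len(genes) - window + 1):
            block = frozenset(genes[start:start + window])
            if len(block) == window:  # skip windows with repeated families
                support[block].add(name)
    return {
        block: names
        for block, names in support.items()
        if len(names) >= min_genomes
    }


# Toy data: genes A, B, C are adjacent (in varying order) in three genomes,
# but interrupted by other genes in the fourth.
genomes = {
    "genome_1": ["A", "B", "C", "X", "Y"],
    "genome_2": ["Q", "C", "A", "B", "Z"],
    "genome_3": ["B", "A", "C", "W"],
    "genome_4": ["A", "X", "B", "Y", "C"],
}
blocks = conserved_gene_sets(genomes, window=3, min_genomes=3)
print(blocks)
```

A sliding window of this kind grows expensive on large genome collections and cannot handle blocks of varying size or with insertions, which is part of why dedicated techniques are needed for rapidly growing databases.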
