Abstract
Abstract
Spatial pattern formation of the primary anterior–posterior morphogenetic gradient of the transcription factor Bicoid (Bcd) has been studied experimentally and computationally for many years. Bcd specifies positional information for the downstream segmentation genes, affecting the fly body plan. More recently, a number of researchers have focused on the patterning dynamics of the underlying bcd messenger RNA (mRNA) gradient, which is translated into Bcd protein. New, more accurate techniques for visualizing bcd mRNA need to be combined with quantitative signal extraction techniques to reconstruct the bcd mRNA distribution. Here, we present a robust technique for quantifying gradients with a two-exponential model. This approach (1) has natural, biologically relevant parameters and (2) is invariant to linear transformations of the data arising due to variation in experimental conditions (e.g., microscope settings, nonspecific background signal). This allows us to quantify bcd mRNA gradient variability from embryo to embryo (important for studying the robustness of developmental regulatory networks); sort out atypical gradients; and classify embryos to developmental stage by quantitative gradient parameters.
1. Introduction
1.1. Biology
A
More recently, studies have focused on the underlying dynamics and patterning of the bcd messenger RNA (mRNA) gradient, since the Bcd protein forms via translation from the mRNA. The bcd RNA gradient forms earlier than the protein gradient and exhibits a number of distinct features from the protein pattern. These have been the subject of several mathematical modeling projects, as well as new quantitative experimental projects, to characterize the bcd mRNA gradient (Spirov et al., 2009; Lipshitz, 2009; Kavousanakis et al., 2010; Deng et al., 2010; Little et al., 2011; Cheung et al., 2011; Dalessi et al., 2012; Liu and Niranjan, 2012; Fahmy et al., 2014; Ali-Murthy and Kornberg, 2016).
There are a number of features to bcd RNA patterning making it more complex to study than the Bcd protein pattern. These features require new and more sophisticated techniques in data acquisition and signal processing to extract quantitative data. This article presents and validates a new method for quantitative analysis of spatial profiles of bcd RNA reliably extracted from whole-embryo three-dimensional (3D) scans (confocal microscopy) of fluorescent in situ hybridization (FISH) RNA data.
1.2. Data
Figure 1A shows a sagittal section through the middle of such a whole-embryo scan, with fluorescence intensity proportional to the concentration of bcd mRNA. The data set is 3D, and the RNA transport setting up this 3D pattern has components in the three coordinates: head-to-tail (AP); top-to-bottom (dorsoventral, DV); and inside-to-outside (basal–apical, BA). The gradient is chiefly along the AP direction: biologically, the mRNA spreads posteriorly from a maternal deposition at the anterior end of the embryo. There are, however, concentration differences in the DV direction, and while bcd RNA and protein patterns are most intense in the surface, or cortex, of the embryo, bcd is also found in the interior of the embryo, and there is a concentration gradient in the BA direction. The transport processes establishing these gradients may differ between the different coordinates: AP transport of bcd RNA involves minus-end motors trafficking along microtubules, assisted by proteins such as Staufen (Stau; Weil et al., 2006, 2008; Spirov et al., 2009; Fahmy et al., 2014; Ali-Murthy and Kornberg, 2016); DV “bending” of bcd pattern may reflect geometric asymmetries in the embryo; and BA transport appears to occur at later stages of development, by an unknown mechanism (Bullock and Ish-Horowicz, 2001; Spirov et al., 2009; Fahmy et al., 2014).

Preparation of data for quantitative analysis of sagittal images by 1D SSA.
1.3. Approach
In whole-embryo imaging, variability can arise during tissue fixation and staining with fluorophores, as well as from differences in microscope settings (gain and offset) between measurements of different batches of embryos on different days. Here, we discuss features of the data extraction that are insensitive to such experimental variation.
The aim of our approach is to create a model for apical and basal profiles (see Fig. 1B) with bcd gradients, estimate the model parameters, and show that they can help to obtain biological results, in particular, to compare different ages in the embryo development. We show an example of how data extracted and modeled by this technique can provide new biological insights into bcd RNA gradient formation.
The novelty of the approach consists in consideration of the model parameters, which do not depend on linear transformation of the data and thereby on the nonspecific background signal and the microscope settings. It is very important, since otherwise the comparison results can be caused by the experimental conditions, not by the biology reasons.
1.4. Model
A two-exponential fit of a Bcd protein profile can be well approximated by a single exponential plus a nearly constant background (Houchmandzadeh et al., 2002; Alexandrov et al., 2008). In contrast, while some bcd RNA profiles show such characteristics, many others, especially at early stages, show a much sharper exponential drop in the anterior, plus a constant or even posteriorly rising component through the rest of the embryo (Fig. 2). The transition between components can be readily visible in RNA patterns (and not in protein), as a “kink” around the 20% egg length (%EL)–30%EL position. These different components suggest multiple scales (or mechanisms) in the posterior-ward transport of bcd RNA.

Representative examples of AP profiles of bcd mRNA, illustrating the variety of cases and efficacy of the modeling approach. The raw data are noisy; the smooth pattern is the ESPRIT fit, which is the sum of two exponentials; they are depicted individually: anterior exponential 1, which decreases to zero from the left; trunk shallow exponential 2, which is similar to a straight line. The residual noise is oscillating around zero.
1.5. Technique
We previously applied a signal extraction technique based on singular spectrum analysis (SSA) to quantify Bcd protein gradients (Alexandrov et al., 2008). This demonstrated that SSA could reliably and automatically extract AP Bcd protein gradients. These were the sum of two exponentials, one with a significant decay constant (strong curvature) and one of nearly linear form, capturing the nonspecific background signal. Here, we adapt the SSA technique to the more complex cases of bcd RNA gradients, validating the reliability and effectiveness of the approach. SSA itself is used for signal extraction, and the SSA-related method ESPRIT (Roy and Kailath, 1989; Golyandina and Zhigljavsky, 2013) is used for the estimation of signal parameters.
SSA techniques have proven to be robust to signal extraction from data with substantial experimental variability and intrinsic noise (Golyandina et al., 2001; Alexandrov et al., 2008; Alonso et al., 2005; Golyandina et al., 2012; Golyandina and Zhigljavsky, 2013). The use of SSA for extraction of signals in gene expression data was in Spirov et al. (2012), Golyandina et al. (2012), and Shlemov et al. (2015a,b).
2. Methods
2.1. Two-exponential modeling
2.1.1. Description of the model
We fit the following two-exponential function (of AP distance, x) to bcd mRNA data, to capture the distinct two-component pattern of most bcd RNA gradients (with the “kink,” commonly observed at 20%EL–30%EL):
or
for
One-exponential plus constant background is a special case of Equation (1), with the rate
Note that raw image data are likely of the form
2.1.2. Model characteristics independent of the microscopy gain/offset and background
To remove effects from variability in microscope settings (gain and offset) and the unknown form of nonspecific background (Houchmandzadeh et al., 2002; Myasnikova et al., 2005; Holloway et al., 2006), gradient characteristics can be used, which do not change under a linear transformation of the gradient.
That is, if each profile (gradient) can be represented by the linear transformation:
where
The signal given by Equation (1) has characteristics that approximately satisfy independency from linear transformations if the second (shallow gradient) exponential rate is small enough (
where
One can see that some parameters (e.g., the pre-exponential coefficients C1 and C2) depend on A and/or B, while the rate of the first exponential
Note that similar considerations about parameters independent on linear transformations can be applied to any number of active exponentials (e.g., we can consider both anterior and tail ones and study their rates), while the shallow exponential should be alone. It is important that the noise level directly depends on the scaling A and therefore we do not use it as an embryo characteristic.
2.1.3. Model characteristics of apical and basal profiles
We apply the model to both apical and basal profiles, adding corresponding upper indices to parameter notation. Several characteristics that reflect the relationship between two profiles of the same embryo (therefore, A and B are the same for both profiles) can be added. For clarity, we will write “anterior” or “shallow” in the lower indices instead of 1 and 2 correspondingly.
Thus, the following combinations of the model parameters of the profiles can be considered almost independent of a linear transformation of the intensities, that is, of A and B:
where Equation (3) are characteristics of apical profiles, Equation (4) are characteristics of basal profiles, and characteristics of Equation (5) show relationships between apical and basal profiles.
Hereinafter we use the following model characteristics based on these combinations:
the anterior gradient rates the logarithmic ratio between the anterior gradient pre-exponential coefficients for the apical and basal profiles indicators of nonincrease in the shallow components
Note that these characteristics have sense if
These relationships underlie the quantitative conclusions in this article. We also use these relationships to screen for atypical embryos and outliers, aiding in following the development of the bcd RNA gradient over time and for studying apical–basal differences.
2.1.4. Estimation of the two-exponential model parameters
We use the subspace-based method ESPRIT, motivated by the success of SSA (also a subspace-based method) in smoothing one-dimensional (1D) gene profiles from Drosophila embryos (Alonso et al., 2005; Golyandina et al., 2012). On profiles from different genes, the method proved to be robust to high noise and to variations in embryo characteristics.
The mathematical details of ESPRIT can be found in Supplementary Material. We use the method to estimate the exponential decays in Equation (1):
Since the first exponential is expected to be rapidly decreasing (
2.2. Data
2.2.1. FISH and data acquisition
For approbation of the suggested model, we consider the data set of the confocal images of early Drosophila embryos stained by FISH for bcd mRNA. The data were characterized in Spirov et al. (2009), and it is the largest such data set up to date. The data acquisition and processing are described in Supplementary Material. Computational tools to process midsagittal images are described in Supplementary Material too. Our data set consists of images of about 160 embryos, ranging in stage from unfertilized eggs (not analyzed) to early nuclear cleavage cycle 14A [nc14, same data set as in Spirov et al. (2009)]. In the present study, we analyzed 124 embryos. These were divided into three developmental stages, based on preliminary analysis and biological considerations: Cleavage or preblastoderm (nc1–nc9); Syncytial Blastoderm (nc10–nc13); and Cellularizing Blastoderm (nc14A). The Cleavage stage is long, lasting about 80 minutes (at room temperature), and has highly variable bcd mRNA gradients. For more detailed analysis, we subdivided Cleavage into two subgroups: Early (nc1–nc8) and Late (nc9). The Syncytial Blastoderm stage spans about 45 minutes, and this could be subdivided into two subgroups: nc10–nc12 and nc13. The last stage, early nc14A, is short (15–20 minutes), but highly variable and dynamic. Careful visual inspection allowed us to divide the nc14A embryos into three subgroups: early, mid, and late (Spirov et al., 2009).
2.2.2. Construction of 1D profiles
Raw data from the confocal microscope consist of mRNA intensities as per a small circular area with two-dimensional spatial coordinates. After selecting the regions of interest (ROI chains), two techniques were tested for converting the data into 1D AP profiles. The first (and simplest) technique projects intensities onto an AP axis orthogonal to the DV axis by discarding the DV component of the coordinate (Fig. 1A). This has been used by many groups, see, for example, Surkova et al. (2008) and Houchmandzadeh et al. (2002). The second technique preserves the natural curvilinear coordinates of the embryo, with distance between ROIs calculated by
Regardless of the technique (AP or curvilinear coordinates), the 1D coordinates obtained are not equidistant. Linear interpolation was used to create equidistant points of a given spatial step. A step 0.08%EL–0.1%EL was chosen to generate approximately equal numbers of points for the two techniques. These results obtained by means of AP coordinates appear to be more precise than that obtained by curvilinear coordinates, see Supplementary Material for comparison. Therefore, we can consider only AP coordinates in the article.
3. Results and Discussion
3.1. Model application
Figure 2 demonstrates a set of examples to illustrate the variety of profiles that can be fit by the two-exponential model. These include the typical profiles considered in Spirov et al. (2009), with a rapidly decreasing anterior gradient and slowly decreasing gradient to the posterior (Fig. 2A), and profiles with increasing (Fig. 2B) or flat (Fig. 2C) posterior gradients.
Data are generally too biased and noisy from the terminal regions of the embryo: 0%EL–10%EL and 90%EL–100%EL [Cf. Surkova et al. (2008) and Houchmandzadeh et al. (2002); see Fig. 2]. Processing and analyzing data from 10%EL to 90%EL are sufficient for extracting bcd RNA profiles from nearly all embryos older than nc6. For very early embryos (CleavageEarly stage), gradients have just begun to form from initial terminal locales; in these cases, it may be more appropriate to process the data from 5%EL (Fig. 2D). For uniformity, we will process all the data on 10%EL–90%EL. Figure 2 shows that the two-exponential model suits different types of data very well.
Typically, embryos have a decreasing anterior exponential component and decreasing or close-to-constant shallow posterior gradients (both for the apical and basal profiles). We call these type 1 (typical) embryos. Some embryos, however, show a posteriorly increasing shallow gradient for either apical or basal profiles. We call these type 2 (atypical) embryos. Type 2 profiles are common early in development (Cleavage) and uncommon in later stages. Here, we focus on type 1 embryos, which represent the majority of the data set.
Detection of type 1 can be performed by means of exponential rates of the shallow exponents: (A)
3.2. Model validation
Even within one-developmental stage, the shape of mRNA profiles from embryo to embryo is highly variable. This makes construction of a prototype profile challenging, and complicates understanding of the underlying biological mechanisms. Fortunately, the variability is mostly due to a minority of embryos, and these can be detected using the two-exponential model. Removal of such embryos reduces the variability significantly.
We remove embryos, which were not described by the model with reasonable parameters. First, the pre-exponential coefficients for anterior and shallow exponentials should be positive. Therefore, we assume that (B)
Figure 1 in Supplementary Material shows scatter plots of
Ninety-two embryos satisfy conditions (A)–(C) from the complete data set of 124 embryos; the analysis in the rest of this article is on these 92 embryos. For these 92 embryos, it was checked that the systematic errors in the model are negligible relative to the residual noise or to the profile itself. Thus, we conclude that the profiles suit the considered model with sufficient accuracy (see Supplementary Material for details).
3.3. Model efficacy for finding trends in developmental biology
The parameters from the two-exponential fits are quite variable, both within and between developmental stages (Fig. 3A), as expected from the observed variability in profiles (Section 1).

Log-ratio of two pre-exponential factors
Although the large variability and small sample size do not allow for statistically significant conclusions for all comparisons, several observations can be made. In particular, CleavageEarly has the largest average anterior exponential decay constant of any developmental stage (i.e., the steepest profile). This difference is statistically significant (t-test), but could be rendered insignificant by moderate changes in just one of the six embryos. We therefore combine groups to obtain three age groups (from 7) with larger sample sizes: (1) Cleavage,
Table 1 shows the average values for
One-way analysis of variance (both parametric and nonparametric; Kruskal–Wallis) confirms that both
3.3.1. Potentials of the approach
In Section 3.2, we screened embryos into type 1 using condition (A), that is, posteriorly non-increasing profiles. We can apply the suggested approach to embryos of type 2 with posteriorly increasing profiles (see Fig. 2B). Moreover, the model can be extended to three exponentials (see Fig. 4). With the extension of SSA to fit a three-exponential model, these sorts of patterns can be readily analyzed by the present approach, broadening the use of the technique to allow for the comparison of patterns from different genes [e.g., consider Stau protein (Spirov et al., 2009), which has a sharp rise in the vicinity of the posterior pole]. The article with biological results of application of the suggested approach, which includes a detailed description of the patterning dynamics in terms of the model parameters, is under preparation.

AP profile of the Stau protein [Cf. Spirov et al. (2009, fig. 6)]. The raw data are noisy; the smooth pattern is obtained by the 3-exponential model: anterior exponential 1, decreasing to zero from the left; shallow exponential 2, which is similar to a straight line; posterior exponential 3, increasing from zero to the right. The residual noise is oscillating around zero.
The approach presented here is likely to be an effective tool for quantifying other spatial gradients in developmental biology, which could aid in revealing new features in the patterning dynamics and regulation of critical developmental events, especially where there are large dynamic changes and high variability—that is, in cases where it is difficult to construct a reference or prototype profile. Examples include the dorsal gradient in DV Drosophila patterning (Kanodia et al., 2012, 2011; Reeves et al., 2012) and retinoic acid in vertebrate embryos (Schilling et al., 2012).
4. Conclusions
The new mathematical model described here enables the study of substantial quantitative problems in bcd mRNA gradient formation, including quantification of the between-embryo variability of the gradient; the filtering of atypical gradients; and the classification of embryos on the basis of quantitative gradient parameters. We are using these abilities to quantitatively study the dynamics of bcd mRNA profiles at very early stages of development. Finally, we can also now use the new mRNA gradient model to compare mRNA patterning with the Bcd protein gradient, previously analyzed in Alexandrov et al. (2008).
Footnotes
Acknowledgments
This work has been supported by U.S. NIH grant R01-GM072022 and the Russian Foundation for Basic Research grants 15-04-06480 and 16-04-00821.
Author Disclosure Statement
The authors declare that no competing financial interests exist.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
