Dynamic Modeling of miRNA-mediated Feed-Forward Loops

Abstract

Given the important role of microRNAs (miRNAs) in genome-wide regulation of gene expression, increasing interest is devoted to mixed transcriptional and post-transcriptional regulatory networks analyzing the combinatorial effect of transcription factors (TFs) and miRNAs on target genes. In particular, miRNAs are known to be involved in feed-forward loops (FFLs), where a TF regulates a miRNA and they both regulate a target gene. Different algorithms have been proposed to identify miRNA targets, based on pairing between the 5′ region of the miRNA and the 3′UTR of the target gene, and on correlation between miRNA host genes and target mRNA expression data. Here we propose a quantitative approach that integrates an existing method for mixed FFL identification based on sequence analysis with a differential equation modeling approach that permits us to select active FFLs based on their dynamics. Different models are assessed based on their ability to properly reproduce miRNA and mRNA expression data in terms of identification criteria, namely: goodness of fit, precision of the estimates, and comparison with submodels. In comparison with standard approaches based on correlation, our method improves in specificity. As a case study, we applied our method to adipogenic differentiation gene expression data, providing potential novel players in this regulatory network. Supplementary Material for this article is available at www.liebertonline.com/cmb.

1. Introduction

MicroRNAs (miRNAs) are small (∼22 nt) non-coding RNAs that post-transcriptionally regulate gene expression. They are transcribed as pri-miRNAs, then processed and exported from the nucleus to the cytoplasm in the form of pre-miRNA hairpins, where they are cleaved by the Dicer enzyme and incorporated into the RNA-induced silencing complex (RISC) to allow interaction with target mRNAs via base pairing: binding to the mRNA 3′ UTR decreases the frequency of translation and increases the mRNA degradation rate (Du and Zamore, 2005; Bartel, 2004; Baek et al., 2008; Selbach et al., 2008). MiRNAs are known to be involved in different biological processes (e.g., cell cycle control, cellular growth, differentiation, apoptosis, and embryogenesis) and to play critical roles in human diseases (Jiang et al., 2009). Their important regulatory role has come into focus in the last few years, and major attention has been paid to the identification of miRNAs and their target genes (Lagos-Quintana et al., 2003; Bentwich et al., 2005; Jung et al., 2010; Lagos-Quintana et al., 2001). Different algorithms have been developed for this purpose, based on sequence data, looking for evolutionarily conserved Watson-Crick pairing between the 5′ region of the miRNA and the 3′UTR of the target gene (Griffiths-Jones et al., 2006; Bartel, 2009; Friedman et al., 2009; Lewis et al., 2003, 2005). There is also increasing interest in the dynamic description and quantification of the regulation of gene expression by miRNAs, and several studies have characterized miRNA-mediated degradation rates using models based on ordinary differential equations (Khanin and Vinciotti, 2008; Shimoni et al., 2007; Levine et al., 2007a,b; Vohradsky et al., 2010).

Given the important role of miRNAs in genome-wide regulation of gene expression, increasing interest is devoted to mixed transcriptional and post-transcriptional regulatory networks analyzing the combinatorial effect of transcription factors (TFs) and miRNAs on target genes. In particular, miRNAs are known to be involved in feed-forward loops (FFLs), where a TF regulates a miRNA and they both regulate a target gene (Shimoni et al., 2007; Shalgi et al., 2007; Tsang et al., 2007; Re et al., 2009). The dynamics of FFLs have been extensively studied in transcriptional networks (Mangan and Alon, 2003; Kalir et al., 2005; Kaplan et al., 2008; Macia et al., 2009; Alon, 2007) since this regulatory pattern is overrepresented in biological networks with respect to random networks (Milo et al., 2002; Shen-Orr et al., 2002) and thus represents a basic building block, favored by evolution and playing important functional roles. For example, FFLs involving miRNAs enable fine-tuning of the target gene and noise buffering (Li et al., 2009; Wu et al., 2009). In Tsang et al. (2007), correlation between miRNA host genes and target mRNA was assessed together with conserved 3′UTR motifs to define putative regulatory relationships between a miRNA and a set of target genes sharing the same TF. A quantitative description of the regulatory interactions (e.g., based on differential equation models) was helpful to characterize putative miRNA-mediated FFLs. A similar approach has been adopted elsewhere (Vu and Vohradsky, 2007; Chen et al., 2004, 2005), where differential equations were fitted to expression data for transcriptional networks not involving miRNAs. As regards small RNA-mediated FFLs, a differential equation-based model has been used in Shimoni et al. (2007), but only to simulate the dynamics of a generic circuit using plausible parameter values derived from the literature.

In this work, we propose a general analytical framework based on differential equations to extensively characterize a list of putative miRNA-mediated FFLs. Our approach, when applied to a list of putative FFLs, provides criteria to select active FFLs based on their ability to reproduce dynamic expression data. In this context, we do not use the data to validate the models; on the contrary, three models are used to fit the data and select active FFLs based on the goodness of fit. The first model M1 is borrowed from previous literature (Khanin and Vinciotti, 2008; Shimoni et al., 2007; Levine et al., 2007a,b). Models M2 and M3 are progressively linearized simplifications of model M1, introduced because, as shown in the following, the choice of the most appropriate model strictly depends on the available dataset.

We estimate the significance of our method in comparison with random FFLs obtained by randomly selecting links between miRNAs, TFs, and target mRNA and in comparison with a more standard approach, based on correlation between TF, miRNA, and target mRNA.

2. Models

In the miRNA-mediated FFL circuit (Fig. 1A), a transcription factor TF (X₁) regulates a miRNA (X₂), and they both regulate a target mRNA (X₃). Three models based on ordinary differential equations (ODEs) are examined to describe the miRNA and target mRNA expression kinetics. All models consider X₁ as a forcing function and describe the rate of change of X₂ and X₃ as the balance between their synthesis/transcription (S_i) and degradation (D_i), with the basal expression level (X_ib) as initial condition; the corresponding compartmental model is shown in Figure 1B. Thus, for i = 2,3, the differential equation describing the variables is

\begin{align*} \dot{X}_i (t) = S_i (t) - D_i (t), \qquad X_i (0) = X_{ib} \tag{1} \end{align*}

FIG. 1.

miRNA-mediated FFL. (A) Topological model of the FFL where a TF regulates a miRNA, and they both regulate the target mRNA. TF regulations can be positive or negative, while miRNA regulation of the target gene is negative. (B) Compartmental model of the FFL where S and D represent synthesis and degradation, respectively, and dotted arrows are the regulation processes affecting S and D.

The synthesis is expressed as the sum of a basal term (S_ib), plus a positive (activation) or negative (repression) term (ΔS_i) encoding the effect of the specific TF on the transcription of miRNA and target mRNA. As regards degradation (D_i), for miRNA it is assumed to be a function only of its expression, while for the target mRNA the effect of the miRNA level is also modeled:

\begin{align*} \dot{X}_2 (t) &= S_{2b} + \Delta S_2 [ X_1 (t) ] - D_2 [ X_2 (t) ] \\ \dot{X}_3 (t) &= S_{3b} + \Delta S_3 [ X_1 (t) ] - D_3 [ X_2 (t), X_3 (t) ] \tag{2} \end{align*}

The three models adopt the same description for miRNA degradation (i.e., a first order process with constant rate d₂), while they differ in the functional description assumed for ΔS₂, ΔS₃, and D₃.

(1) Model M1 describes the TF regulation on the miRNA (ΔS₂) and the target mRNA (ΔS₃) by a saturating Michaelis-Menten function, and the miRNA-mediated degradation of the target mRNA (D₃) as the sum of a first order process, with constant rate, with respect to X₃ and a nonlinear term that depends also on X₂, as provided elsewhere (Khanin and Vinciotti, 2008; Shimoni et al., 2007; Levine et al., 2007a,b).

(2) Model M2 assumes TF regulation (ΔS₂, ΔS₃) to be linearly dependent on its level, while the functional description of target mRNA degradation (D₃) has nonlinear dynamics as in M1.

(3) Model M3 is derived from M2 linearizing the miRNA-mediated degradation model (D₃); thus, the kinetics of the whole model is linear.

Since in log scale spot array data are expressed as differences with respect to a basal pre-differentiation state, it is convenient to consider as state variables x_i = X_i − X_ib for i = 1,2,3, where X_ib is the reference, collected at day −3. Considering that at the basal state $\dot{X}_i (t) = 0$ for i = 2,3, it is possible to express the basal transcription rates S_ib as functions of the regulation parameters and the basal expression levels. After some algebra, models M1, M2, and M3 turn out to be as follows:

Model M1:

\begin{align*} \dot{x}_2 (t) &= \frac{\alpha_2 x_1 (t)}{\beta_2 + x_1 (t)} - d_2 x_2 (t), & x_2 (0) &= 0 \\ \dot{x}_3 (t) &= \frac{\alpha_3 x_1 (t)}{\beta_3 + x_1 (t)} - p x_3 (t) - q x_2 (t) - r x_2 (t) x_3 (t), & x_3 (0) &= 0 \tag{3} \end{align*}

Model M2:

\begin{align*} \dot{x}_2 (t) &= a_2 x_1 (t) - d_2 x_2 (t), & x_2 (0) &= 0 \\ \dot{x}_3 (t) &= a_3 x_1 (t) - p x_3 (t) - q x_2 (t) - r x_2 (t) x_3 (t), & x_3 (0) &= 0 \tag{4} \end{align*}

Model M3:

\begin{align*} \dot{x}_2 (t) &= a_2 x_1 (t) - d_2 x_2 (t), & x_2 (0) &= 0 \\ \dot{x}_3 (t) &= a_3 x_1 (t) - d_3 x_3 (t) - s x_2 (t), & x_3 (0) &= 0 \tag{5} \end{align*}

The mathematical derivation of Equations 3, 4, and 5 and the meaning of each parameter in terms of synthesis and degradation rate are detailed in the Supplementary Material.
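As an illustration of how the linear model M3 (Equation 5) can be evaluated numerically, the following is a minimal sketch in Python; the authors worked in Matlab, and nothing here reproduces their code. It assumes that the TF profile x₁ is interpolated linearly between sampling times and integrates with a classical fourth-order Runge-Kutta scheme. The function names `interpolate` and `simulate_m3` and the step size `dt` are hypothetical choices made for this example.

```python
from bisect import bisect_right


def interpolate(times, values, t):
    """Piecewise-linear interpolation of a sampled profile, clamped at the ends."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    k = bisect_right(times, t) - 1
    frac = (t - times[k]) / (times[k + 1] - times[k])
    return values[k] + frac * (values[k + 1] - values[k])


def simulate_m3(times, x1_samples, a2, a3, d2, d3, s, dt=0.01):
    """Integrate model M3 (Equation 5) with classical RK4.

    x1 (the TF profile) acts as forcing function; x2 (miRNA) and x3 (target
    mRNA) start at 0 because they are deviations from the basal state.
    Returns the predicted (x2, x3) pair at each sampling time.
    """
    def rhs(t, x2, x3):
        x1 = interpolate(times, x1_samples, t)
        return a2 * x1 - d2 * x2, a3 * x1 - d3 * x3 - s * x2

    t, x2, x3 = times[0], 0.0, 0.0
    out = [(x2, x3)]
    for t_next in times[1:]:
        steps = max(1, round((t_next - t) / dt))
        h = (t_next - t) / steps
        for _ in range(steps):
            k1 = rhs(t, x2, x3)
            k2 = rhs(t + h / 2, x2 + h / 2 * k1[0], x3 + h / 2 * k1[1])
            k3 = rhs(t + h / 2, x2 + h / 2 * k2[0], x3 + h / 2 * k2[1])
            k4 = rhs(t + h, x2 + h * k3[0], x3 + h * k3[1])
            x2 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
            x3 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
            t += h
        t = t_next  # remove accumulated floating-point drift
        out.append((x2, x3))
    return out
```

With a constant forcing x₁ = 1, the miRNA equation reduces to a first-order response that settles at a₂/d₂, which gives a simple sanity check for the integrator.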

Model identification

A priori identifiability of M1, M2, and M3 (Equations 3, 4, and 5), tested using the software DAISY (Bellu et al., 2007), indicates that all three models are a priori globally identifiable; in other words, it is theoretically possible to estimate the set of unknown parameters θ from the data, at least under ideal conditions (noise-free data, continuous time observations, and error-free model structure).

The parameter estimate $\hat{\bf \theta}$ is obtained by weighted least squares, that is, by minimizing the Weighted Residual Sum of Squares (WRSS)

\begin{align*} WRSS = \sum_{i = 2, 3} \sum_{j = 1}^{N_i} \omega_i (t_j) [ z_i (t_j) - x_i (t_j, {\bf \theta}) ]^2 \tag{6} \end{align*}

where z_i(t_j) is the observed datum at time t_j, x_i(t_j, θ) is the value predicted at time t_j by the model (Equations 3, 4, and 5), ω_i(t_j) is the weight assigned to datum j (the inverse of the variance of the measurement error), and N_i is the number of time points. The external summation takes into account that residuals for both miRNA and target mRNA are simultaneously minimized; thus, miRNA and mRNA time series collected under the same experimental conditions are required for model identification. The measurement error is assumed to be Gaussian with zero mean and a known variance. The variance is experimentally determined by analyzing replicates of each measure. A general model for the error variance is

\begin{align*} v_i (t_j) = \alpha + \beta [ z_i (t_j) ]^{\gamma} \tag{7} \end{align*}

where α, β, and γ are parameters to be estimated from replicates by plotting the mean of each replicate against its variance and fitting on these data the unknown parameters of the error model (Equation 7), as described in Cobelli et al. (2000).
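The cost function of Equation 6 with the error model of Equation 7 can be sketched as follows. This is an illustrative Python fragment, not the authors' implementation; the function names are hypothetical, and taking the absolute value of the datum before raising it to the power γ is an assumption made here so that negative log-ratios do not fail for non-integer γ.

```python
def error_variance(z, alpha, beta, gamma):
    """Equation 7: variance of the measurement error for a datum z.

    abs(z) is an assumption of this sketch (see lead-in); for gamma = 2 it
    makes no difference.
    """
    return alpha + beta * abs(z) ** gamma


def wrss(observed, predicted, alpha, beta, gamma):
    """Inner sum of Equation 6 for one series (miRNA or target mRNA).

    Each squared residual is weighted by the inverse of the error variance.
    """
    total = 0.0
    for z, x in zip(observed, predicted):
        weight = 1.0 / error_variance(z, alpha, beta, gamma)
        total += weight * (z - x) ** 2
    return total
```

The full WRSS of Equation 6 would then be the sum of `wrss(...)` over the miRNA series and the target mRNA series, each with its own error model parameters.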

Since data are affected by a measurement error, $\hat{\bf \theta}$ is also affected by an error, and the a posteriori identifiability of the models assesses the precision with which the parameters are estimated in terms of percentage coefficient of variation (CV)

\begin{align*} CV (\hat{\bf \theta}) = \frac{SD (\hat{\bf \theta})}{\hat{\bf \theta}} \cdot 100 \tag{8} \end{align*}

where $SD (\hat{\bf \theta})$ is the standard deviation of the estimate.
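The text does not state how the standard deviation of the estimate is computed. One common choice for weighted least squares, given here only as an assumption, approximates the covariance of the estimates by the inverse of the Fisher information matrix, built from the sensitivity matrix S of the model outputs with respect to the parameters and the diagonal weight matrix W of Equation 6. The absolute value in the CV is also an assumption, consistent with the positive CVs reported for negative parameter estimates in the tables.

```latex
\begin{align*}
\mathrm{Cov}(\hat{\bf \theta}) &\approx \left( S^{T} W S \right)^{-1}, \qquad
S_{jk} = \left. \frac{\partial x (t_j, {\bf \theta})}{\partial \theta_k} \right|_{{\bf \theta} = \hat{\bf \theta}}, \qquad
W = \mathrm{diag} \left[ \omega (t_j) \right] \\
SD (\hat{\theta}_k) &= \sqrt{ \left[ \mathrm{Cov}(\hat{\bf \theta}) \right]_{kk} }, \qquad
CV (\hat{\theta}_k) = \frac{SD (\hat{\theta}_k)}{| \hat{\theta}_k |} \cdot 100
\end{align*}
```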

FFLs selection

For each model, selection of active FFLs from a large set of putative ones exploits identification results in terms of consistency with the three following criteria:

1. Goodness of fit. A valid model should provide an adequate fit to the data. The goodness of fit can be evaluated on residuals, based both on their whiteness (i.e., residuals should be uncorrelated) and on their amplitude (i.e., the deviation between predicted and observed values should be comparable to the measurement error). To evaluate the whiteness of the residuals, the number of runs (i.e., subsequences of residuals having the same sign) is analyzed for both miRNA and mRNA residual patterns. For the amplitude property, a global measure is provided by WRSS divided by the degrees of freedom (i.e., the difference between the number of data and the number of parameters); since weighted residuals should be independent with unit variance, WRSS should be the outcome of a random variable with a chi-square distribution.

2. Precision of the estimates. FFLs having all parameter estimates with CV < 100% are considered reliable.

3. Comparison with submodels. In order to verify that the FFL model (Fig. 1B) is the optimal description of the circuit, its performance is compared with that of two submodels (Fig. 2) with missing regulatory links: in Submodel 1 the regulatory link between the TF and the target mRNA is missing, while in Submodel 2 the effect of miRNA on target mRNA degradation rate is not considered. Once the two submodels are identified, their performance is assessed versus the original one based on the Akaike Information Criterion (AIC), which implements the principle of parsimony (i.e., selects the model best able to fit the data with the minimum number of parameters):

\begin{align*} AIC = WRSS + 2L \tag{9} \end{align*}

where L is the number of parameters of the model.

FIG. 2.

Submodels with missing regulatory links with respect to the FFL. (A) No effect of TF regulation on target gene. (B) No effect of miRNA on mRNA degradation rate.

The FFL model is selected if its AIC is the lowest compared with submodels.

Summing up, if criteria 1 and 2 are satisfied for a dataset of putative FFL data (i.e., the model satisfactorily reproduces the data with all parameters precisely estimated from them), criterion 3 is applied, and the FFL topology is finally selected as active provided that the complete model turns out to be the optimal model according to the AIC.
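The sequential logic of the three criteria can be summarized in a short sketch. The Python fragment below is illustrative only: the function names and argument layout are hypothetical, the outcome of criterion 1 is passed in as a flag decided elsewhere, and L in Equation 9 is taken to be the number of model parameters.

```python
def aic(wrss_value, n_params):
    """Equation 9: AIC = WRSS + 2L, with L taken as the number of parameters."""
    return wrss_value + 2 * n_params


def is_active_ffl(fit_ok, cvs, full, submodels):
    """Apply criteria 1 to 3 in order and report whether the FFL is selected.

    fit_ok    -- outcome of criterion 1 (goodness of fit)
    cvs       -- percentage CVs of all parameter estimates of the full model
    full      -- (WRSS, number of parameters) of the full FFL model
    submodels -- list of (WRSS, number of parameters), one per submodel
    """
    if not fit_ok:                       # criterion 1
        return False
    if any(cv >= 100 for cv in cvs):     # criterion 2
        return False
    full_aic = aic(*full)                # criterion 3: strictly lowest AIC
    return all(full_aic < aic(*sub) for sub in submodels)
```

For example, a full model with WRSS = 6 and four parameters (AIC = 14) would be preferred over submodels with three parameters and WRSS of 15 and 12 (AIC = 21 and 18).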

3. A Case Study on Adipogenesis

To discuss a practical application of the proposed method, we applied it to miRNA and mRNA expression time series of human multipotent adipose-derived stem cells (hMADS) upon adipogenic differentiation. The initial panel of putative FFLs was selected based on sequence analysis; therefore, it includes also false positive matches and/or FFLs nonactive during adipogenesis.

Data

Two independent cell culture experiments were performed as biological replicates during adipogenic differentiation of human mesenchymal stem cells as previously described (Scheideler et al., 2008; Karbiener et al., 2009). Cells were harvested at the pre-confluent stage as reference (day −3) and at seven subsequent time points during human adipogenic differentiation: days −2 and 0 before, and days 1, 2, 5, 10, and 15 after induction of differentiation. All hybridizations were repeated with reversed dye assignment (dye-swap). Background subtraction as well as global mean and dye-swap normalization were applied. The resulting ratios were log2 transformed, and the independent experiments were averaged. Complete miRNA and mRNA time-series expression data used for this study conform to the MIAME guidelines and are available in the GEO database (GSE29186).

A list of mixed TF/miRNA FFLs was generated by means of a bioinformatic pipeline mainly based on an ab-initio sequence analysis of human and mouse regulatory regions as described in Re et al. (2009) using CircuitsDB (Friard et al., 2010). Briefly, in CircuitsDB, a catalogue of non-redundant promoter regions for protein-coding and miRNA genes in the human and mouse genomes was first constructed (for additional details, see Supplementary Material). In parallel, a catalogue of non-redundant human and mouse 3′-UTR regions for protein-coding genes was defined. A transcriptional regulatory network and, separately, a list of post-transcriptionally regulated genes were then generated for human by looking for conserved overrepresented motifs in the human and mouse promoters and 3′-UTRs previously assembled. The two networks were subsequently combined, looking for mixed feed-forward regulatory loops (i.e., all the possible instances in which a master transcription factor regulates a miRNA and, together with it, a set of joint target coding genes).

After matching the list of 474 miRNA-mediated FFLs obtained using CircuitsDB with the available miRNA and mRNA time series data, the final dataset consisted of 329 putative FFLs (Table S1, Supplementary Material) including 33 TFs, 35 miRNAs, and 184 target mRNAs.

Measurement error

The measurement error models for miRNA and mRNA expression data were derived from the replicates, shown in Figure 3A,B, as the mean of the intensities versus their variance. To better define the dependence of the variance on the intensity, the positive x-axis was divided into intervals, and for each interval, the variance values were averaged as shown in Figure 3C,D. By fitting Equation (7) on these data, the resulting models are

\begin{align*} v_2 (t_j) &= 0.0484 \\ v_i (t_j) &= 0.033 + 0.031 \cdot z_i (t_j)^2, \quad i = 1, 3 \tag{10} \end{align*}

FIG. 3.

Measurement error variance against expression estimated from the replicates for miRNA (A) and mRNA (B) datasets. (C, D) These data are binned, and for each interval, the mean ± standard deviation is represented; the red line shows the fitted measurement error models (Equation 10).

In Equation (10), v₂ refers to the miRNA dataset and v_i to the mRNA dataset (valid for both TFs and target mRNAs).
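The binning step used to derive the error models can be sketched as follows. This Python fragment is a simplified illustration under stated assumptions: the interval boundaries (`edges`) are hypothetical, since the text does not report the ones actually used, and the function names are invented for this example. The second function simply encodes the fitted mRNA model of Equation 10.

```python
def bin_mean_variance(means, variances, edges):
    """Average replicate variances within intensity intervals.

    means, variances -- per-probe mean and variance across replicates
    edges            -- increasing interval boundaries on the intensity axis
    Returns (interval centre, average variance) for each non-empty interval.
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for m, v in zip(means, variances):
        for k in range(n_bins):
            if edges[k] <= m < edges[k + 1]:
                sums[k] += v
                counts[k] += 1
                break
    return [((edges[k] + edges[k + 1]) / 2, sums[k] / counts[k])
            for k in range(n_bins) if counts[k] > 0]


def mrna_variance(z):
    """Equation 10, second line: error variance for TF and target mRNA data."""
    return 0.033 + 0.031 * z ** 2
```

The binned points would then be the data on which Equation 7 is fitted; the weight of each datum in Equation 6 is the reciprocal of the resulting variance, for instance `1.0 / mrna_variance(z)`.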

Implementation

To assess criterion 1 (i.e., whiteness and amplitude of the residuals), statistical tests could not be applied due to the low number (seven) of samples. Thus, conservative empirical thresholds were set to satisfy criterion 1: both miRNA and target mRNA residual time series must have at least 3 runs, and WRSS divided by the degrees of freedom must be lower than 2. All computations were performed in the Matlab environment (Matlab R2010a); further details are supplied in the Supplementary Material.
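The empirical thresholds for criterion 1 translate into a simple check. The sketch below is written in Python for illustration (the original computations were done in Matlab); the function names are hypothetical, and skipping residuals that are exactly zero when counting runs is an assumption of this sketch.

```python
def count_runs(residuals):
    """Number of runs, i.e. maximal subsequences of residuals with the same sign.

    Residuals equal to zero are ignored (an assumption of this sketch).
    """
    signs = [r > 0 for r in residuals if r != 0]
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def criterion_1(mirna_residuals, mrna_residuals, wrss_value, n_data, n_params,
                min_runs=3, max_wrss_per_dof=2.0):
    """Goodness of fit: enough runs in both series and WRSS/dof below threshold."""
    dof = n_data - n_params
    if dof <= 0:
        return False
    return (count_runs(mirna_residuals) >= min_runs
            and count_runs(mrna_residuals) >= min_runs
            and wrss_value / dof < max_wrss_per_dof)
```

With seven time points for each of the two series (14 data) and the five parameters of model M3, the degrees of freedom would be 9, so a WRSS below 18 would pass the amplitude threshold.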

4. Results

When the three criteria were applied to M1, no FFLs were selected as active, essentially because criterion 2 failed, indicating that the functional descriptions built into the model were too complex to be resolved from the available data. Conversely, three FFLs were selected with M2 and 23 with M3, as summarized in Tables 1 and 2, respectively, where estimated parameters and their precision are reported. Two out of the three FFLs selected using M2 were also identified with M3; thus, the total number of active FFLs is 24. It is interesting to notice that most of the selected FFLs (21 out of 24) are incoherent. This type of FFL is known to play a significant role in biological regulation, conferring precision and stability to gene expression regulation (Mangan and Alon, 2003; Wu et al., 2009; Hornstein and Shomron, 2006; Osella et al., 2011). As discussed in Macia et al. (2009), the target gene of incoherent FFLs generally shows a pulser response characterized by a rapid increase/decrease of its concentration followed by the return to a new basal level, while the target gene of coherent FFLs tends to exhibit a grader response characterized by a transient increase/decrease from the initial to the final state. These behaviors were confirmed by our data, as evident from Figure 4, where expression profiles of two incoherent (A) and two coherent (B) FFLs are shown along with the mean target gene expression levels (considering absolute values) between selected incoherent (C) and coherent (D) FFLs. Analyzing the active FFLs from a biological point of view, it was found that, out of the 24 selected FFLs, nine involve TFs and six involve miRNAs (marked with an x in Tables 1 and 2) that are already known from the literature to be regulators of adipogenesis and adipocyte-related functions. A discussion of the results in comparison with the biological literature is available as Supplementary Material.

FIG. 4.

Expression profiles of selected FFLs. TF (green), miRNA (blue), and target mRNA (red) for two incoherent (A) and two coherent (B) FFLs: spots represent experimental data while lines represent the predicted/reconstructed profiles. (C, D) The average absolute value of predicted target mRNA expression for incoherent and coherent FFLs.

Table 1.

Summary of Selected FFLs and Their Estimated Parameters Using Model M2

	TF		miRNA	Target mRNA	a₂ (CV)	a₃ (CV)	d₂ (CV)	p (CV)	q (CV)	r (CV)	C/I
1	hif1a	x	hsa-miR-24	h41	1.22 (74)	2.83 (44)	0.96 (79)	1.10 (67)	2.17 (52)	0.87 (78)	I
2	srf		hsa-miR-100	impdh1	1.40 (41)	0.68 (59)	0.57 (54)	0.91 (27)	0.34 (63)	0.84 (30)	I
3	tcf4		hsa-miR-23a	ndufa7	0.47 (62)	0.77 (44)	0.30 (83)	1.04 (21)	0.66 (42)	1.29 (69)	I
Mean (absolute values)					1.03 (59)	1.43 (49)	0.61 (72)	1.02 (38)	1.06 (52)	0.85 (54)
SE (absolute values)					0.49 (17)	1.22 (9)	0.33 (16)	0.10 (25)	0.98 (11)	0.25 (26)

TF, miRNA, and target mRNA names of selected FFLs using model M2 are reported along with the estimated parameters, their precision in terms of CV, and a flag to distinguish between coherent (C) and incoherent (I) FFLs. TF and miRNA already known to be key regulators of adipogenesis and adipocyte-related functions are marked with an x. SE, standard error.

Table 2.

Summary of Selected FFLs and Their Estimated Parameters Using Model M3

	TF		miRNA		Target mRNA	a₂ (CV)	a₃ (CV)	d₂ (CV)	d₃ (CV)	s (CV)	C/I
1	runx1		hsa-miR-148b		tnfrsf6b	−0.07 (17)	−1.73 (50)	–	8.28 (25)	4.49 (43)	I
2	runx1		hsa-miR-148b		loc51026	−0.07 (17)	2.70 (9)	–	2.85 (1)	2.70 (76)	C
3	runx1		hsa-miR-148b		tmod	−0.07 (17)	−1.79 (37)	–	3.70 (13)	1.44 (76)	I
4	esr1	x	hsa-miR-148b		map1b	0.14 (18)	1.69 (60)	–	2.49 (6)	4.49 (29)	I
5	esr1	x	hsa-miR-148b		tparl	0.15 (18)	−2.00 (48)	–	2.89 (2)	2.78 (42)	C
6	esr1	x	hsa-miR-148b		apt6m8-9	0.15 (18)	5.37 (33)	–	2.04 (19)	2.41 (41)	I
7	esr1	x	hsa-miR-152		apt6m8-9	0.42 (39)	3.94 (19)	0.16 (67)	1.12 (7)	1.55 (35)	I
8	esr1	x	hsa-miR-30c	x	emp1	0.65 (21)	−4.58 (28)	0.09 (46)	2.16 (5)	1.76 (40)	C
9	ets1		hsa-miR-199a*		hke2	−0.22 (17)	−1.60 (80)	0.90 (17)	0.35 (99)	7.52 (73)	I
10	hif1a	x	hsa-miR-199b		crtl1	−2.32 (59)	−4.99 (24)	1.26 (63)	0.97 (96)	2.47 (27)	I
11	hif1a	x	hsa-miR-24		h41	1.21 (34)	6.02 (35)	0.93 (35)	1.87 (16)	4.21 (46)	I
12	hif1a	x	hsa-miR-199a	x	crtl1	−2.17 (30)	−3.05 (45)	1.19 (25)	1.27 (19)	1.25 (63)	I
13	foxm1		hsa-let-7a		nap1l1	−0.02 (3)	−0.89 (29)	0.14 (58)	1.70 (3)	14.10 (52)	I
14	irf1		hsa-miR-29a	x	timm8b	−0.40 (7)	−5.00 (34)	–	2.23 (34)	0.60 (37)	I
15	irf7		hsa-miR-129		hs6st	−0.04 (82)	2.53 (37)	–	2.38 (6)	13.93 (89)	C
16	irf2		hsa-miR-125b		bcl2	−0.33 (16)	6.07 (23)	–	3.27 (3)	1.55 (45)	C
17	myc	x	hsa-miR-202		tnfrsf4	0.08 (42)	0.34 (92)	–	1.20 (95)	4.44 (87)	I
18	myod1		hsa-miR-34a	x	kcnq1	−0.30 (22)	−2.03 (27)	0.14 (35)	1.49 (24)	2.04 (29)	I
19	myod1		hsa-miR-34a	x	scn2b	−0.28 (21)	−2.00 (27)	0.12 (37)	4.61 (5)	0.58 (88)	I
20	ncx		hsa-let-7e	x	nap1l1	0.13 (67)	8.38 (14)	0.08 (87)	3.17 (3)	10.61 (76)	I
21	nfya		hsa-miR-148b		p3	−0.17 (18)	−1.66 (88)	–	4.05 (6)	4.73 (29)	I
22	tcf4		hsa-miR-23a		ndufa7	0.50 (66)	0.84 (54)	0.33 (89)	1.05 (28)	0.60 (75)	I
23	tel2		hsa-miR-199a*		hke2	0.65 (37)	6.91 (22)	0.34 (41)	1.34 (5)	4.41 (30)	I
Mean (absolute values)						0.46 (30)	3.31 (40)	0.47 (50)	2.46 (23)	4.12 (53)
SE (absolute values)						0.63 (21)	2.19 (22)	0.46 (23)	1.66 (31)	3.91 (22)

TF, miRNA, and target mRNA names of selected FFLs using model M3 are reported along with the estimated parameters, their precision in terms of CV, and a flag to distinguish between coherent (C) and incoherent (I) FFLs. TF and miRNA already known to be key regulators of adipogenesis and adipocyte-related functions are marked with an x. When the estimated degradation parameter (d₂) was small and with low precision (i.e., the process was too slow to be determined in the time horizon of the experiment), it was set to 0, and model identification was repeated. SE, standard error.
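The coherent/incoherent flag can be related to the signs of the estimated TF gains. The Python sketch below encodes one reading of that relationship, stated here as an assumption rather than as the authors' rule: since the miRNA represses the target (positive s in Equation 5), the indirect path TF → miRNA → target has the sign of −a₂, and the loop is coherent when the direct path (sign of a₃) agrees with it. Applied to the a₂ and a₃ values listed in Tables 1 and 2, this rule gives the same flags as the C/I column. The function name `ffl_type` is hypothetical.

```python
def ffl_type(a2, a3):
    """Return 'C' (coherent) or 'I' (incoherent) from the signs of a2 and a3.

    Indirect path TF -> miRNA -| target has the sign of -a2;
    direct path TF -> target has the sign of a3.
    """
    if a2 == 0 or a3 == 0:
        raise ValueError("a missing regulatory link does not form an FFL")
    direct_positive = a3 > 0
    indirect_positive = -a2 > 0
    return "C" if direct_positive == indirect_positive else "I"
```

For instance, `ffl_type(-0.07, -1.73)` corresponds to row 1 of Table 2 and `ffl_type(-0.07, 2.70)` to row 2.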

To estimate the significance of the proposed method, 10 sets of 329 random FFLs were generated by choosing at random one miRNA and two mRNAs from the available data. Applying the previously described selection procedure, no FFLs were selected using M2, and an average of 15.6 FFLs, with a standard deviation of 1.5, were selected using M3. By contrast, using a simple correlation analysis to choose FFLs having a correlation coefficient above 0.75 in absolute value for all three links, 12 FFLs were selected from the list of putative FFLs, and 18.6 ± 4.6 were selected from the randomized datasets.
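The correlation-based baseline and the construction of random FFLs can be sketched as follows. This Python fragment is illustrative only: the text does not specify the type of correlation coefficient, so Pearson correlation is assumed, and letting the first of the two randomly drawn mRNAs play the role of the TF is likewise an assumption. All function names are hypothetical.

```python
import random


def pearson(u, v):
    """Pearson correlation coefficient of two equally long series."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (var_u * var_v) ** 0.5


def selected_by_correlation(tf, mirna, target, threshold=0.75):
    """True if all three links exceed the threshold in absolute value."""
    links = [(tf, mirna), (tf, target), (mirna, target)]
    return all(abs(pearson(u, v)) > threshold for u, v in links)


def random_ffl(mirna_names, mrna_names, rng):
    """Draw one miRNA and two distinct mRNAs; returns (TF, miRNA, target)."""
    tf, target = rng.sample(mrna_names, 2)
    return tf, rng.choice(mirna_names), target
```

Calling `random_ffl` 329 times with a seeded `random.Random` instance would produce one randomized set of the kind described above.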

5. Discussion

FFL selection procedure

In this work, we propose a method to select active FFLs from a large set of putative ones based on miRNA and mRNA expression time series, using differential equation-based models and identification criteria. A list of putative mixed transcriptional and post-transcriptional FFLs is generated on the basis of conserved overrepresented motifs in human and mouse promoters and 3′ UTRs. Identification of three alternative dynamic models, able to describe the miRNA and target mRNA dynamic data based on ordinary differential equations (ODEs) using the TF profile as forcing function, provides the basis for the selection of active FFLs. A putative FFL is selected as active if the feed-forward topology (Fig. 1A), associated with a plausible dynamic description, is necessary and sufficient to reproduce the available gene expression profiles; in other words, the model is able to reproduce the data (criterion 1), it outperforms the submodels in terms of the principle of parsimony (criterion 3), and its parameters can be estimated with acceptable precision from the available data (criterion 2).

Comparison of dynamic models

Instead of postulating a univocal description for miRNA and mRNA expression kinetics, three models of different complexity are proposed. Model M1 assumes Michaelis-Menten kinetics for miRNA and target mRNA regulation accomplished by the TF, and models miRNA-mediated degradation of the target mRNA as a first order process with constant rate plus a nonlinear term dependent on miRNA and target mRNA expression. In model M2, linearity is assumed for TF regulation on miRNA and target mRNA, whereas nonlinearity is maintained for miRNA-mediated degradation of the target mRNA. In M3, the miRNA-mediated degradation of the target mRNA is also linearized; thus, the whole model is described by linear kinetics. The different levels of complexity suit different types of gene expression data. The choice of the most appropriate model depends on the range and on the number of time points of the available time series and can be made using the same criteria described for the selection of active FFLs: goodness of fit, precision of the estimates, and principle of parsimony. In particular, to estimate the Michaelis-Menten parameters of model M1, the whole Michaelis-Menten curve should be observable, requiring expression data in an adequate range and sufficiently detailed. If these conditions are not satisfied by the available data, the linearization of the model still provides an adequate fit, while also allowing a more precise estimation of the parameters. That does not mean that the more complex model is invalid, but only that the linearized one is more suitable for the available dataset.

Case study

In our case study, we used the three models on gene expression time series to select active FFLs during human adipogenesis. Since the models showed a comparable ability to reproduce the data, the simplest model, M3, was selected based on the principle of parsimony in 251 out of the 329 analyzed FFLs. Moreover, parameter estimates of model M1 were affected by very high CVs in all FFLs, and those of M2 in all FFLs but 3, indicating that the nonlinear models M1 and M2 were not a posteriori identifiable. Figure 5 shows the effect of linearizing the synthesis mediated by the TF (ΔS₂), that is, of using model M2 instead of M1 (panel A), and of subsequently linearizing the degradation of the target mRNA (D₃), that is, of using model M3 instead of M2 (panel B). In particular, for the analyzed dataset, the Michaelis-Menten curve is in the linear range (Fig. 5A, left panel) and model M1 is not a posteriori identifiable (α₂ and β₂ show high CVs). In this case, X₁ is much lower than the half-saturation constant β₂, so parameters α₂ and β₂ cannot be separately resolved; only their ratio can be estimated. Conversely, using M2, the parameter related to the synthesis mediated by the TF (ΔS₂) is a posteriori identifiable (Fig. 5A, right panel). Similarly, for the nonlinear description of the miRNA-mediated degradation rate (Fig. 5B, left panel), parameter r shows a high CV and thus model M2 is not a posteriori identifiable. However, since rx₃ is much lower than q, the miRNA-mediated degradation rate can reasonably be linearized as in M3 (Fig. 5B, right panel), which simplifies the model, reduces the number of parameters, and gives a fit comparable to that of M2.
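The loss of identifiability in the linear range can be illustrated numerically. The sketch below assumes the standard Michaelis-Menten form αx/(β + x) for the TF-mediated synthesis term; the function names and all numerical values are hypothetical and chosen only so that the input x stays far below β.

```python
def michaelis_menten(x, alpha, beta):
    """Saturable synthesis term alpha * x / (beta + x)."""
    return alpha * x / (beta + x)


def linear_approx(x, alpha, beta):
    """Linear-range approximation, valid when x is much lower than beta."""
    return (alpha / beta) * x


def max_relative_gap(xs, params_a, params_b):
    """Largest relative difference between two Michaelis-Menten curves over xs."""
    return max(
        abs(michaelis_menten(x, *params_a) - michaelis_menten(x, *params_b))
        / michaelis_menten(x, *params_a)
        for x in xs
    )


# Hypothetical TF expression levels, all far below both half-saturation constants.
xs = [0.1 * i for i in range(1, 11)]

# Two parameter pairs (alpha, beta) sharing the same ratio alpha / beta = 0.1.
gap = max_relative_gap(xs, (100.0, 1000.0), (1000.0, 10000.0))
print(f"largest relative gap between the two curves: {gap:.2e}")
```

Because the two curves are practically indistinguishable over the observed range, noisy data cannot separate α from β, whereas their ratio, which plays the role of the single slope parameter of the linearized model, remains well determined.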

FIG. 5.

Comparison between the candidate models. (A) Upper panels: Similarity between models M1 and M2 predictions for miRNA (blue) profile indicates that the Michaelis-Menten function is not necessary. Lower panels: Confirmation that the model prediction of the link between TF and miRNA, postulated as linear for M2, is operating in the linear range for M1. (B) Upper panels: Similarity between M2 and M3 predictions for target mRNA (red) profile suggests that a linear description of target mRNA degradation is sufficient. Lower panel: Confirmation that the miRNA-mediated degradation rate, postulated as linear for M3, is operating in the linear range for M2.

From a biological point of view, out of the 24 selected FFLs, nine involve TFs and six involve miRNAs (marked with an x in Tables 1 and 2) that are already known from the literature to be regulators of adipogenesis and adipocyte-related functions. A discussion of the results in comparison with the biological literature is available as Supplementary Material at www.liebertonline.com/cmb. However, little information is available in the literature regarding miRNA-mediated FFLs involved in adipogenesis, and most datasets, such as the ones presented in El Baroudi et al. (2011), contain mainly information related to cancer. The limited available knowledge about human transcription networks and miRNA-mediated regulation in adipogenesis makes biological validation of regulatory links difficult and, at the same time, highlights the importance of developing algorithms, like the one presented in this work, to predict testable regulation processes.

The significance of our method was estimated in comparison with random FFLs obtained by randomly selecting links between miRNAs, TFs, and target mRNAs. Ten sets of 329 random FFLs (equal to the number of putative FFLs estimated by pairing between the 5′ region of the miRNA and the 3′UTR of the target gene) were generated. When the previously described selection procedure was applied to the randomized sets of FFLs, an average of 15.6 selected FFLs with a standard deviation of 1.5 was obtained. This represents a rough estimate of the number of false-positive FFLs among the 24 selected by our method. Note that if, instead of differential equation-based modeling, FFLs were selected based on correlation between TF, miRNA, and target mRNA, 12 FFLs would have been selected on the original dataset and 18.6 ± 4.6 on the randomized datasets, thus showing the increased specificity achieved by our approach.
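The randomization procedure can be sketched as follows. This is a minimal illustration, assuming that random FFLs are drawn by sampling TF, miRNA, and target uniformly and independently, which may differ from the exact randomization scheme used here. The function names and the `select` callable, which stands in for the full selection procedure, are hypothetical.

```python
import random
import statistics


def random_ffls(tfs, mirnas, targets, n_ffls, rng):
    """Draw n_ffls random (TF, miRNA, target) triplets, ignoring sequence-based links."""
    return [
        (rng.choice(tfs), rng.choice(mirnas), rng.choice(targets))
        for _ in range(n_ffls)
    ]


def null_selection_counts(tfs, mirnas, targets, select,
                          n_sets=10, n_ffls=329, seed=0):
    """Mean and standard deviation of the number of FFLs selected per random set.

    select: callable taking one triplet and returning True if it passes
    the selection procedure (a placeholder for the model-based criteria).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sets):
        ffls = random_ffls(tfs, mirnas, targets, n_ffls, rng)
        counts.append(sum(1 for ffl in ffls if select(ffl)))
    return statistics.mean(counts), statistics.stdev(counts)
```

The mean returned for the randomized sets is the quantity compared with the number of FFLs selected on the original dataset; running the same routine with a correlation-based `select` gives the corresponding baseline.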

The presented method selects triplets that can be explained by a simple FFL, whose effect can be isolated from the rest of the network and described by one of the three proposed models. Thus, the presence of possible additional regulatory links is not excluded by our analysis, but, for the selected FFLs, this scheme provides a minimal plausible description of the regulatory interactions. The approach presented here does not allow the identification of topologies incorporating more than one TF and/or miRNA. More complex topologies will be studied in future work by extending the approach developed here; however, this will require collecting more informative data, e.g., with a tighter sampling schedule.

Acknowledgments

We thank Prof. Gérard Ailhaud, Dr. Christian Dani, and Dr. Ez-Zoubir Amri for hMADS cells. This work was supported by CARIPARO 2008-2010 “Systems Biology approaches to infer gene regulation from gene and protein time series data.”

Disclosure Statement

No competing financial interests exist.

References

Alon. 2007. Network motifs: theory and experimental approaches. Nat. Rev. Genet., 8:450–461.

Baek, Villen, Shin, et al. 2008. The impact of microRNAs on protein output. Nature, 455:64–71.

Bartel D.P. 2009. MicroRNAs: target recognition and regulatory functions. Cell, 136:215–233.

Bartel D.P. 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell, 116:281–297.

Bellu, Saccomani M.P., Audoly, et al. 2007. DAISY: a new software tool to test global identifiability of biological and physiological systems. Comput. Methods Programs Biomed., 88:52–61.

Bentwich, Avniel, Karov, et al. 2005. Identification of hundreds of conserved and nonconserved human microRNAs. Nat. Genet., 37:766–770.

Chen H.C., Lee H.C., Lin T.Y., et al. 2004. Quantitative characterization of the transcriptional regulatory network in the yeast cell cycle. Bioinformatics, 20:1914–1927.

Chen K.C., Wang T.Y., Tseng H.H., et al. 2005. A stochastic differential equation model for quantifying transcriptional regulatory network in Saccharomyces cerevisiae. Bioinformatics, 21:2883–2890.

Cobelli, Foster, Toffolo. 2000. Tracer Kinetics in Biomedical Research: From Data to Model. Kluwer Academic/Plenum, New York.

Du, Zamore P.D. 2005. microPrimer: the biogenesis and function of microRNA. Development, 132:4645–4652.

El Baroudi, Cora, Bosia, et al. 2011. A curated database of miRNA-mediated feed-forward loops involving MYC as master regulator. PLoS ONE, 6:e14742.

Friard, Re, Taverna, et al. 2010. CircuitsDB: a database of mixed microRNA/transcription factor feed-forward regulatory circuits in human and mouse. BMC Bioinform., 11:435.

Friedman R.C., Farh K.K., Burge C.B., et al. 2009. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res., 19:92–105.

Griffiths-Jones, Grocock R.J., van Dongen, et al. 2006. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res., 34:D140–D144.

Hornstein, Shomron. 2006. Canalization of development by microRNAs. Nat. Genet., 38:S20–S24.

Jiang, Wang, Hao, et al. 2009. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res., 37:D98–D104.

Jung C.H., Hansen M.A., Makunin I.V., et al. 2010. Identification of novel non-coding RNAs using profiles of short sequence reads from next generation sequencing data. BMC Genomics, 11:77.

Kalir, Mangan, Alon. 2005. A coherent feed-forward loop with a SUM input function prolongs flagella expression in Escherichia coli. Mol. Syst. Biol., 1:2005.0006.

Kaplan, Bren, Dekel, et al. 2008. The incoherent feed-forward loop can generate non-monotonic input functions for genes. Mol. Syst. Biol., 4:203.

Karbiener, Fischer, Nowitsch, et al. 2009. microRNA miR-27b impairs human adipocyte differentiation and targets PPARgamma. Biochem. Biophys. Res. Commun., 390:247–251.

Khanin, Vinciotti. 2008. Computational modeling of post-transcriptional gene regulation by microRNAs. J. Comput. Biol., 15:305–316.

Lagos-Quintana, Rauhut, Lendeckel, et al. 2001. Identification of novel genes coding for small expressed RNAs. Science, 294:853–858.

Lagos-Quintana, Rauhut, Meyer, et al. 2003. New microRNAs from mouse and human. RNA, 9:175–179.

Levine, Ben Jacob, Levine. 2007a. Target-specific and global effectors in gene regulation by MicroRNA. Biophys. J., 93:L52–L54.

Levine, Zhang, Kuhlman, et al. 2007b. Quantitative characteristics of gene regulation by small RNA. PLoS Biol., 5:e229.

Lewis B.P., Burge C.B., Bartel D.P. 2005. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell, 120:15–20.

Lewis B.P., Shih I.H., Jones-Rhoades M.W., et al. 2003. Prediction of mammalian microRNA targets. Cell, 115:787–798.

Li, Cassidy J.J., Reinke C.A., et al. 2009. A microRNA imparts robustness against environmental fluctuation during development. Cell, 137:273–282.

Macia, Widder, Sole. 2009. Specialized or flexible feed-forward loop motifs: a question of topology. BMC Syst. Biol., 3:84.

Mangan, Alon. 2003. Structure and function of the feed-forward loop network motif. Proc. Natl. Acad. Sci. USA, 100:11980–11985.

Milo, Shen-Orr, Itzkovitz, et al. 2002. Network motifs: simple building blocks of complex networks. Science, 298:824–827.

Osella, Bosia, Corà, et al. 2011. The role of incoherent microRNA-mediated feedforward loops in noise buffering. PLoS Comput. Biol., 7:e1001101.

Re, Cora, Taverna, et al. 2009. Genome-wide survey of microRNA-transcription factor feed-forward regulatory circuits in human. Mol. Biosyst., 5:854–867.

Scheideler, Elabd, Zaragosi L.E., et al. 2008. Comparative transcriptomics of human multipotent stem cells during adipogenesis and osteoblastogenesis. BMC Genomics, 9:340.

Selbach, Schwanhausser, Thierfelder, et al. 2008. Widespread changes in protein synthesis induced by microRNAs. Nature, 455:58–63.

Shalgi, Lieber, Oren, et al. 2007. Global and local architecture of the mammalian microRNA-transcription factor regulatory network. PLoS Comput. Biol., 3:e131.

Shen-Orr S.S., Milo, Mangan, et al. 2002. Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet., 31:64–68.

Shimoni, Friedlander, Hetzroni, et al. 2007. Regulation of gene expression by small non-coding RNAs: a quantitative view. Mol. Syst. Biol., 3:138.

Tsang, Zhu, van Oudenaarden. 2007. MicroRNA-mediated feedback and feedforward loops are recurrent network motifs in mammals. Mol. Cell, 26:753–767.

Vohradsky, Panek, Vomastek. 2010. Numerical modelling of microRNA-mediated mRNA decay identifies novel mechanism of microRNA controlled mRNA downregulation. Nucleic Acids Res., 38:4579–4585.

Vu T.T., Vohradsky. 2007. Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae. Nucleic Acids Res., 35:279–287.

Wu C.I., Shen, Tang. 2009. Evolution under canalization and the dual roles of microRNAs: a hypothesis. Genome Res., 19:734–743.
