Estimation of Ordinary Differential Equation Models for Gene Regulatory Networks Through Data Cloning

Abstract

Ordinary differential equations (ODEs) are widely used for elucidating dynamic processes in various fields. One application of ODEs is to describe the dynamics of gene regulatory networks (GRNs), which is a critical step in understanding disease mechanisms. However, estimation of ODE models for GRNs is challenging because of the inflexibility of the model and noisy data with complex error structures such as heteroscedasticity, correlations between genes, and time dependency. In addition, either a likelihood or a Bayesian approach is commonly used for estimation of ODE models, but both approaches have benefits and drawbacks of their own. Data cloning is a maximum likelihood (ML) estimation method that works through the Bayesian framework. Because it works in the Bayesian framework, it is free from the local optimum problems that are common drawbacks of ML methods. Also, its inference is invariant to the choice of prior distributions, a choice that is a major issue in Bayesian methods. This study proposes an estimation method for ODE models for GRNs through data cloning. The proposed method is demonstrated through simulation and applied to real gene expression time-course data.

1. INTRODUCTION

Ordinary differential equation (ODE) models are widely used for elucidating dynamic processes in various fields such as engineering, physics, chemistry, and biomedical sciences. One application of ODEs is to describe the dynamics of gene regulatory networks (GRNs). From GRNs, we identify genetic pathways and gene interactions, as well as the genetic causes of human diseases. Thus, identification of GRNs is a critical and fundamental step in understanding disease mechanisms and developing new diagnostic methods, drugs, and therapies. However, since GRNs show only connections and/or directions between genes, they are not sufficient for understanding whole gene regulation processes. Hence, the dynamics of GRNs should be addressed, and ODEs are often used for this purpose.

ODEs for GRNs provide information about the change in gene expressions, and they are constructed using functional relationships of genes identified by the GRNs and trajectories observed from time-course gene expression data (Bachmann et al, 2012; Kim, 2016). Once ODEs for GRNs are specified, ODE parameters should be estimated with observed time-course gene expression data. However, the estimation of ODE parameters is an important but challenging problem because of the inflexible nature of ODE models and noisy data. Time-course gene expression data typically have complex error structures such as heteroscedasticity, correlations between genes, and time dependency because they are usually obtained from high-throughput experiments, which are performed on biological samples using RNA-sequencing or microarray technologies at discrete time points. These characteristics of the data exacerbate the difficulties of estimating ODE models for GRNs.

To estimate ODE models, either a likelihood or a Bayesian approach is commonly used. When we simply focus on fitting the ODE solution to observed trajectories, the least squares method can be used (Bates and Watts, 1988; Hemker, 1972; Li et al, 2005; Seber and Wild, 1989). Under independent Gaussian errors with constant variance, least squares estimation is equivalent to maximum likelihood (ML) estimation. Various likelihood estimation methods for ODEs have been developed so far. Himmelblau et al (1967) proposed a method to convert ODEs into a nonlinear system using numerical quadrature, and de Boor and Swartz (1973) introduced a method for approximating ODE solutions using piecewise polynomial functions.

Bock (1981) developed a method for applying numerical optimization to individual time interval segments, which is a partition of the entire time interval. Varah (1982) proposed a two-step estimation method using regression splines. Also, Liang and Wu (2008) used a similar idea to two-step nonparametric regression in the measurement error framework. Ramsay et al (2007) proposed a two-step smoothing method known as generalized profiling method, which jointly takes into account smoothing of data trajectories and estimation of ODE parameters.

However, the likelihood approaches require optimization to estimate ODE models, and this task is not easy. In practice, efficient optimization algorithms are required and their convergence should be guaranteed. Also, since the likelihood surface is sensitive to ODE parameters, these algorithms may fail to attain the global optimum.

In contrast, Bayesian approaches are free from optimization and local optimum problems because they use sampling techniques to obtain the posterior distributions of ODE parameters. Huang et al (2006) and Huang and Wu (2006) estimated the ODE parameters of an HIV dynamic system using a hierarchical Bayesian method based on numerical ODE solutions. Another Bayesian method is to utilize the Gaussian process, which can provide a distribution over both fitted trajectories and ODEs (see Calderhead et al, 2009; Chkrebtii et al, 2016; Wang and Barber, 2014). Bhaumik and Ghosal (2015) extended the two-step smoothing idea into Bayesian estimation, and Huang et al (2020) proposed a one-step generative Bayesian model that directly combines nonparametric regression functions and ODEs. However, for small data sets, the posterior inference could depend heavily on prior information and might not be reliable (Kim, 2016).

To estimate ODE models for GRNs, Cao and Zhao (2008) and Campbell and Chkrebtii (2013) employed the generalized profiling method proposed by Ramsay et al (2007) under the assumptions of independence and constant error variance. Kim and Kim (2018) proposed an ODE model considering the complex error structures of time-course gene expression data and developed an iterative estimation algorithm based on the generalized profiling method. However, this algorithm necessitates solving optimization problems iteratively. Hence, it is not easy to find the global optimum. On the other hand, if we consider Bayesian approaches for ODE models for GRNs, prior information could have a significant impact on the inference because gene experiments typically yield small data sets.

To overcome the drawbacks of both likelihood and Bayesian methods in ODE models for GRNs, this study proposes an estimation method based on data cloning. Data cloning was introduced by Lele et al (2007) and it provides ML estimates (MLEs) and their standard errors (SEs) using the Bayesian framework. It is a full Bayesian model that uses proper prior distributions and the likelihood function for K copies (clones) of original data. It is particularly useful for likelihood inference of complex models such as state-space models, mixed effect models, and nonlinear dynamic models (Lele et al, 2010; Lele et al, 2007). Although data cloning is an ML method, it is free from local optimum problems because it does not require any optimizations. Also, its inference is completely invariant to the choice of prior distributions. Therefore, the method proposed in this study can provide more reliable and accurate inference for ODE models for GRNs.

We introduce some background of this study in Section 2 and we propose an estimation method for ODE models for GRNs based on data cloning in Section 3. The proposed estimation method is verified through a simulation study, and then it is applied to the inference of the ODE model describing a GRN for zebrafish retina cell.

2. PRELIMINARIES

2.1. ODE model for GRN

In general, GRN analyses identify connections between transcription factors and targets as well as describe the dynamics of the identified networks. For the latter, ODEs are often used and they are helpful for understanding the whole gene regulation process (e.g., de Jong, 2002; Endy and Brent, 2001; Hasty et al, 2001; Karlebach and Shamir, 2008). ODEs for GRNs describe change of gene expression levels using their derivatives as follows: $\frac{d x (t)}{d t} = f (x (t) | θ), 0 \leq t \leq T,$ (1)

where $x (t) = {(x_{1} (t), \dots, x_{d} (t))}^{⊤}$ is a vector representing gene expression levels of d genes, f is a $d \times 1$ vector of d ODEs describing the change of $x (t)$ over time t, and $θ$ is a vector of ODE parameters.

Time-course gene expression data are often collected from high-throughput gene experiments performed at discrete time points, and they quantify mRNA abundance for a huge number of genes at a time. Thus, time-course gene expression data typically have complex error structures. Subramaniam and Hsiao (2012) indicated that gene expression data from such experiments usually have heteroscedastic error depending on the expression level. In addition, since gene expression levels are measured for the same biological samples at each time point, the data have correlation between genes as well as their time dependency. ODEs are specified using a GRN and trajectories from time-course gene expression data.

Therefore, ODE models for such data should consider the complex but systematic error structures for more accurate prediction. To consider the complex error structures, Kim and Kim (2018) introduced a model as follows: $y (t) = x (t) + Σ_{t}^{1 ∕ 2} \in_{t}, t = t_{1}, \dots, t_{N},$ (2)

where $y (t)$ is a vector of expression levels of d genes observed at time t, $x (t)$ is a vector of ODE solutions from Equation (1), $Σ_{t}$ is a $d \times d$ diagonal matrix with variance functions $σ_{i}^{2} (x_{i} (t)), i = 1, \dots, d$ as diagonal elements, and $\in_{t}$ is a random vector with $E (\in_{t}) = 0$ .

The variance function $σ_{i}^{2} (x_{i} (t))$ captures the heteroscedastic error pattern for the ith gene through a parameter vector $α_{i} = {(α_{1 i}, α_{2 i})}^{⊤}$ of the variance function for the ith gene. To represent correlation between genes and time dependency in the model, a vector autoregressive (VAR) model for the random vectors can be used. By assuming the Markov property, the VAR model with lag one is defined by $\in_{t_{j}} = B \in_{t_{j - 1}} + δ_{t_{j}}, j = 1, \dots, N$ , where B is a $d \times d$ coefficient matrix and $δ_{t_{1}}, \dots, δ_{t_{N}} \sim^{i i d} M V N (0, Ω)$ .
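As a minimal sketch of the VAR(1) error process described above, the following code simulates the error vectors for d = 3 genes. The values of B and Ω are hypothetical, and drawing the first error vector from the stationary distribution of the process is an assumption made here for illustration; the function names are not from the original study.

```python
import numpy as np


def stationary_cov(B, Omega):
    """Solve V = B V B^T + Omega, the stationary covariance of a VAR(1) process."""
    d = B.shape[0]
    # vec(V) = (I - B kron B)^{-1} vec(Omega); requires all |eigenvalues of B| < 1.
    vec_v = np.linalg.solve(np.eye(d * d) - np.kron(B, B), Omega.reshape(-1))
    V = vec_v.reshape(d, d)
    return 0.5 * (V + V.T)  # remove tiny numerical asymmetry


def simulate_var1(B, Omega, n_times, rng):
    """Simulate eps_{t_j} = B eps_{t_{j-1}} + delta_{t_j} with delta ~ MVN(0, Omega)."""
    d = B.shape[0]
    eps = np.empty((n_times, d))
    # Assumption: the first error vector follows the stationary distribution.
    eps[0] = rng.multivariate_normal(np.zeros(d), stationary_cov(B, Omega))
    for j in range(1, n_times):
        eps[j] = B @ eps[j - 1] + rng.multivariate_normal(np.zeros(d), Omega)
    return eps


# Hypothetical coefficient and covariance matrices for three genes.
B = np.array([[0.5, 0.1, 0.0], [0.0, 0.4, 0.1], [0.1, 0.0, 0.3]])
Omega = np.array([[1.0, 0.3, 0.2], [0.3, 1.0, 0.3], [0.2, 0.3, 1.0]])
V = stationary_cov(B, Omega)
eps = simulate_var1(B, Omega, n_times=30, rng=np.random.default_rng(0))
print(eps.shape)
```

Off-diagonal entries of B and Ω induce the correlation between genes, and the recursion itself induces the time dependency.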

2.2. Data cloning

Data cloning proposed by Lele et al (2007) is an ML estimation method using Bayesian MCMC, and it is particularly useful for estimating complex models such as hierarchical models, state-space models, mixed-effect models, and nonlinear dynamic models. Furthermore, this method is free from sensitivity issues associated with the selection of prior distributions. For a given problem, data cloning requires a full Bayesian model with proper prior distributions for all model parameters. The difference between data cloning and traditional Bayesian methods is that data cloning employs the likelihood function for K copies of the observed data $z = (z_{1}, \dots, z_{N})$ , rather than the likelihood function for the data.

Under suitable regularity conditions, Walker (1969) showed that as the sample size increases, the posterior distribution converges to a multivariate normal (MVN) distribution with the MLE as the mean vector and the inverse of the Fisher information matrix as the covariance matrix. It means that the posterior distribution converges to the asymptotic distribution of the MLE, and for large sample sizes, the likelihood and Bayesian inferences become similar to each other.

If we assume that the K copies are independent of each other, the likelihood function of the K copies is given by ${[L (θ | z)]}^{K}$ , where $θ$ is the parameter vector. In that case, notice that the $θ$ maximizing ${[L (θ | z)]}^{K}$ is exactly the same as the maximum point of $L (θ | z)$ , which is the likelihood function for the observed data. Moreover, the Fisher information matrix for ${[L (θ | z)]}^{K}$ is $K I_{n} (\hat{θ})$ , where $I_{n} (\hat{θ})$ is the Fisher information matrix based on $L (θ | z)$ .

Let $π_{K} (θ | z)$ be the posterior distribution with the K-cloned likelihood function and the prior distribution $π (θ)$ . Lele et al (2010) proved that, under regularity conditions, $π_{K} (θ | z)$ converges to an MVN distribution whose mean vector equals the MLE and whose covariance matrix equals $K^{- 1} I_{n}^{- 1} (\hat{θ})$ as K goes to infinity. The regularity conditions are as follows: (1) $L (\hat{θ} | z) > 0$ and $π (\hat{θ}) > 0$ , where $\hat{θ}$ is the MLE. (2) $π (\cdot)$ is continuous at $\hat{θ}$ and the likelihood function has continuous second derivatives in a neighborhood of $\hat{θ}$ . (3) The likelihood function has a unique mode at the MLE but possibly multiple smaller peaks.

The proof of Lele et al (2010) is not directly derived from the results of Walker (1969) because we cannot have K independent copies of the data set in reality. It considers $π_{K} (θ | z)$ as another distribution defined over the parameter space, not the K-cloned posterior distribution. By doing so, $π_{K} (θ | z)$ can be simply considered as a function of the observed data z and $π (\cdot)$ . Hence, this proof relies on deterministic convergence of a sequence of functions, not the probabilistic convergence employed in Walker (1969) (for the proof, see Lele et al, 2010).

Using this result, known as data cloning, we do not have to solve complex optimizations to obtain the MLEs and their asymptotic SEs. We can generate samples from the K-cloned posterior, $π_{K} (θ | z)$ , using usual MCMC techniques, and the posterior means and variances can be obtained from the sample. Then, for large K, the posterior means approximate the MLEs and K times the posterior variances approximate the asymptotic variances of the MLEs. In addition, this inference is not sensitive to the selection of the prior distributions. Note that data cloning does not improve the efficiency of the estimator because the efficiency depends on the sample size, not on the number of clones K.
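The behavior described above can be checked on a toy model where the K-cloned posterior is available in closed form. The sketch below is not one of the models of this study: it assumes a normal mean with known variance and a conjugate normal prior, and the data values and the two priors are hypothetical. For this model the MLE is the sample mean and the asymptotic variance is σ²/n.

```python
def cloned_posterior_normal_mean(y, sigma2, prior_mean, prior_var, K):
    """Mean and variance of the K-cloned posterior of a normal mean (known variance)."""
    n = len(y)
    ybar = sum(y) / n
    # Cloning the data K times multiplies the likelihood precision by K.
    precision = 1.0 / prior_var + K * n / sigma2
    mean = (prior_mean / prior_var + K * n * ybar / sigma2) / precision
    return mean, 1.0 / precision


y = [1.2, 0.8, 1.9, 1.4, 0.7]  # hypothetical observations
sigma2 = 1.0                   # known error variance (assumed)
mle = sum(y) / len(y)
asymptotic_var = sigma2 / len(y)

for K in (1, 10, 1000):
    # A vague prior and a deliberately misplaced informative prior.
    for prior_mean, prior_var in ((0.0, 100.0), (5.0, 1.0)):
        mean, var = cloned_posterior_normal_mean(y, sigma2, prior_mean, prior_var, K)
        print(K, prior_mean, round(mean, 4), round(K * var, 4))
```

As K grows, the posterior mean approaches the sample mean and K times the posterior variance approaches σ²/n under both priors, which is the prior invariance described in the text.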

3. METHODS

3.1. Bayesian model

In general, ML estimation for the ODE model in Equation (2) requires solving multiple optimization problems iteratively because both ODE parameters and nuisance parameters must be estimated. Since ODEs are inflexible models and gene expression data are noisy, these optimizations are not easy to solve and might fail to find the global optimum. To overcome this problem, we propose an estimation method for the ODE model of Equation (2) through data cloning.

As mentioned in Section 2.2, data cloning requires a full Bayesian model. To construct a full Bayesian model for Equation (2), we first derive the likelihood function for time-course gene expression data. Since the VAR model with lag one and MVN distribution are assumed in the ODE model of Equation (2), the likelihood function is given by $L (θ, α, B, Ω | y) = p (y (t_{1}); θ, α, B, Ω) \prod_{j = 2}^{N} p (y_{t_{j}} | y_{t_{j - 1}}; θ, α, B, Ω),$ (3)

where $y = {(y {(t_{1})}^{⊤}, \dots, y {(t_{N})}^{⊤})}^{⊤}$ is a data vector, $p (y (t_{1}); θ, α, B, Ω)$ is the probability density function (pdf) of the MVN distribution with the mean vector $x (t_{1})$ and covariance matrix $Σ_{t_{1}}^{1 ∕ 2} V a r (\in_{t_{1}}) Σ_{t_{1}}^{1 ∕ 2}$ , and $p (y_{t_{j}} | y_{t_{j - 1}}; θ, α, B, Ω)$ is the conditional pdf of the MVN distribution with the mean vector $x (t_{j}) + Σ_{t_{j}}^{1 ∕ 2} B \in_{t_{j - 1}}$ and covariance matrix $Σ_{t_{j}}^{1 ∕ 2} Ω Σ_{t_{j}}^{1 ∕ 2}$ .
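The likelihood described above (an initial MVN density followed by one-step conditional MVN densities) can be evaluated as in the following sketch. It assumes that the ODE solutions x, the variance-function values sigma2, and the covariance V1 of the first error vector are supplied by the caller; these argument names and the example values are hypothetical.

```python
import numpy as np


def mvn_logpdf(resid, cov):
    """Log density of MVN(0, cov) evaluated at resid."""
    d = resid.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)


def log_likelihood(y, x, sigma2, B, Omega, V1):
    """Log-likelihood of y(t) = x(t) + Sigma_t^{1/2} eps_t with VAR(1) errors.

    y, x, sigma2 are (N, d) arrays: data, ODE solutions, and variance-function
    values. V1 is Var(eps_{t_1}).
    """
    sd = np.sqrt(sigma2)
    eps = (y - x) / sd  # standardized errors eps_{t_j}
    # Initial density: mean x(t_1), covariance Sigma^{1/2} V1 Sigma^{1/2}.
    ll = mvn_logpdf(y[0] - x[0], sd[0][:, None] * V1 * sd[0][None, :])
    for j in range(1, y.shape[0]):
        # Conditional mean x(t_j) + Sigma^{1/2} B eps_{t_{j-1}}.
        mean_j = x[j] + sd[j] * (B @ eps[j - 1])
        cov_j = sd[j][:, None] * Omega * sd[j][None, :]
        ll += mvn_logpdf(y[j] - mean_j, cov_j)
    return ll


# Hypothetical inputs for d = 2 genes at N = 4 time points.
rng = np.random.default_rng(0)
x_demo = np.ones((4, 2))
y_demo = x_demo + 0.1 * rng.standard_normal((4, 2))
B_demo = np.array([[0.3, 0.1], [0.0, 0.2]])
print(log_likelihood(y_demo, x_demo, np.full((4, 2), 0.01), B_demo, np.eye(2), np.eye(2)))
```

Because Σ is diagonal, the products with its square root reduce to elementwise scaling by the standard deviations.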

Now, we need to specify the prior distributions for the ODE parameter vector $θ$ and nuisance parameters $(α, B, Ω)$ . The inference from data cloning is invariant to the choice of prior distributions. As noninformative proper prior distributions, we consider uniform distributions with wide support that covers the domains of the parameters. Meanwhile, as informative proper prior distributions, we consider the following:

where MVLN represents a multivariate log-normal distribution and IW indicates an inverse Wishart distribution. As a prior distribution for the ODE parameter vector $θ$ , we can consider either an MVN distribution or an MVLN distribution. If each transcription factor in the ODEs is known to be an activator or an inhibitor, we can use an MVLN prior distribution for the ODE parameters. In that case, the sign of each ODE parameter should be specified because the log-normal pdf has positive support. In contrast, if the role of the transcription factor is unknown or is to be inferred, we can consider an MVN prior distribution. For the variance function parameter $α$ , we assume an MVLN prior distribution with positive support to describe an exponentially increasing pattern. Also, we can use an MVN prior distribution for the vectorization of the coefficient matrix B , and we can assume an IW prior distribution for the covariance matrix $Ω$ .

From the likelihood function and prior distributions, we can have the posterior distribution for the original data set as follows: $π_{1} (θ, τ | y) = \frac{L (θ, τ | y) π (θ) π (τ)}{\int \int L (θ, τ | y) π (θ) π (τ) d τ d θ},$ (5)

where $τ = {(α^{⊤}, v e c {(B)}^{⊤}, v e c {(Ω)}^{⊤})}^{⊤}$ is the nuisance parameter vector, $L (θ, τ | y)$ is $L (θ, α, B, Ω | y)$ of Equation (3), and $π (τ) = π (α) π (B) π (Ω)$ by assuming independent prior distribution.

3.2. Estimation through data cloning

Suppose that we independently repeat the same gene experiment K times and obtain K independent data sets with the same values as the original time-course data y . Then, the likelihood function for the K-copied data set becomes ${[L (θ, τ | y)]}^{K}$ . Notice that, of course, we cannot have such independently repeated gene experiments in reality, and data cloning does not assume such independent experiments, as mentioned in Section 2.2.

The posterior distribution for the K-cloned data is given by $π_{K} (θ, τ | y) = \frac{{[L (θ, τ | y)]}^{K} π (θ) π (τ)}{\int \int {[L (θ, τ | y)]}^{K} π (θ) π (τ) d τ d θ} .$ (6)

Let $(\hat{θ}, \hat{τ})$ be the MLE of $(θ, τ)$ . Also, suppose that the prior distributions $π (θ)$ and $π (τ)$ are positive over the entire parameter space. Then, since $L (\hat{θ}, \hat{τ} | y) > L (θ, τ | y)$ for all $(θ, τ) \neq (\hat{θ}, \hat{τ})$ in the parameter space, as K increases, we have $\frac{π_{K} (θ, τ | y)}{π_{K} (\hat{θ}, \hat{τ} | y)} = \frac{{[L (θ, τ | y)]}^{K} π (θ) π (τ)}{{[L (\hat{θ}, \hat{τ} | y)]}^{K} π (\hat{θ}) π (\hat{τ})} \to \begin{cases} 0, & \text{if } (θ, τ) \neq (\hat{θ}, \hat{τ}) \\ 1, & \text{if } (θ, τ) = (\hat{θ}, \hat{τ}) \end{cases} .$ (7)

Equation (7) means that as $K \to \infty$ , the posterior distribution $π_{K} (θ, τ | y)$ degenerates at the MLE $(\hat{θ}, \hat{τ})$ , and this degeneracy does not depend on the prior distributions $π (θ)$ and $π (τ)$ .
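Taking logarithms makes the limit in Equation (7) explicit. The shorthand $\ell$ for the log-likelihood is introduced here for illustration only and is not notation from the original derivation.

```latex
\log \frac{π_{K} (θ, τ | y)}{π_{K} (\hat{θ}, \hat{τ} | y)}
  = K \left[ \ell (θ, τ) - \ell (\hat{θ}, \hat{τ}) \right]
  + \log \frac{π (θ) π (τ)}{π (\hat{θ}) π (\hat{τ})},
\qquad \ell (θ, τ) = \log L (θ, τ | y) .
```

For any fixed $(θ, τ) \neq (\hat{θ}, \hat{τ})$ , the bracketed difference is a negative constant and the prior term is finite and does not involve K, so the right-hand side tends to $- \infty$ and the ratio tends to zero. As a hypothetical numerical illustration, a log-likelihood difference of −0.1 combined with a prior ratio of 100 gives a posterior ratio of about $100 e^{- 10} \approx 0.0045$ at K = 100.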

Samples from the posterior distribution $π_{K} (θ, τ | y)$ of Equation (6) can be obtained using usual MCMC techniques, and this study uses the adaptive Metropolis algorithm (Haario et al, 2001). By the data cloning result, for large enough K, the posterior means for the ODE parameters approximate the MLEs of $θ$ and the $\sqrt{K}$ -fold standard deviations of the posterior distribution approximate the asymptotic SEs of the MLEs. Thus, accurate inference for the ODE parameters requires both N and K to be large enough.

It is not easy to increase the sample size due to the cost and time of experiments, but we can easily increase the number of clones K. As shown in Equation (7), when K is large enough, the posterior distribution is almost degenerate, and the posterior means and scaled standard deviations approach the MLEs and their asymptotic SEs. To find an adequate number K, Lele et al (2010) proposed choosing K such that the ratio of the largest eigenvalue of the posterior covariance matrix from the K-cloned data to that from the original data is less than a threshold. We follow their rule in the simulation and real data analysis.
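The following sketch illustrates the sampling step and the eigenvalue-ratio check on a toy model. Several simplifications are assumptions made here: the model is a normal sample with unknown mean and log standard deviation rather than the ODE model, a plain random-walk Metropolis sampler stands in for the adaptive Metropolis algorithm, and the priors, step size, and K = 20 are hypothetical choices.

```python
import numpy as np


def log_cloned_posterior(params, y, K):
    """Log of [L(params | y)]^K times the prior, up to an additive constant."""
    mu, log_sigma = params
    loglik = -y.size * log_sigma - 0.5 * np.sum((y - mu) ** 2) / np.exp(2.0 * log_sigma)
    logprior = -0.5 * (mu ** 2 + log_sigma ** 2) / 100.0  # N(0, 10^2) priors (assumed)
    return K * loglik + logprior


def sample_cloned_posterior(y, K, n_iter=20000, burn_in=5000, seed=0):
    """Random-walk Metropolis draws from the K-cloned posterior."""
    rng = np.random.default_rng(seed)
    step = 0.3 / np.sqrt(K)  # proposal scale shrinks with the posterior width
    current = np.zeros(2)
    current_lp = log_cloned_posterior(current, y, K)
    draws = np.empty((n_iter, 2))
    for i in range(n_iter):
        proposal = current + step * rng.standard_normal(2)
        proposal_lp = log_cloned_posterior(proposal, y, K)
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws[i] = current
    return draws[burn_in:]


def largest_eigenvalue(draws):
    """Largest eigenvalue of the sample covariance matrix of the draws."""
    return np.linalg.eigvalsh(np.cov(draws, rowvar=False)).max()


y = np.random.default_rng(1).normal(2.0, 1.0, size=30)  # toy data
K = 20
draws_1 = sample_cloned_posterior(y, K=1)
draws_K = sample_cloned_posterior(y, K=K)

mle_hat = draws_K.mean(axis=0)                      # approximates the MLE
se_hat = np.sqrt(K) * draws_K.std(axis=0, ddof=1)   # approximates asymptotic SEs
eig_ratio = largest_eigenvalue(draws_K) / largest_eigenvalue(draws_1)
print(mle_hat, se_hat, eig_ratio)
```

When the cloned posterior is close to its normal limit, the eigenvalue ratio is expected to be near 1/K, so a ratio that stays well above 1/K suggests that K is too small or that some parameters are poorly identified.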

4. SIMULATION

To verify the performance of the proposed estimation method, a simulation is performed in this section. In the simulation, we generate time-course data from the model of Equation (2). For the simulation, the following ODEs for three genes are considered: $\begin{matrix} \frac{d x_{1} (t)}{d t} & = θ_{1} - θ_{2} x_{1} (t), \\ \frac{d x_{2} (t)}{d t} & = θ_{3} x_{3} (t) - θ_{4} x_{2} (t), \\ \frac{d x_{3} (t)}{d t} & = θ_{5} \frac{x_{1} (t)}{1 + x_{1} (t)} - θ_{6} x_{3} (t) . \end{matrix}$ (8)

Equation (8) presents nonlinear ODEs for three genes. The specific setting for the model of Equation (2) is as follows: The gene expression levels at the initial time , ODE parameters $θ_{m} = 1, m = 1, \dots, 6$ , the coefficients of the variance function $α_{1 i} = 3$ and $α_{2 i} = 1, i = 1, 2, 3$ , and $\in_{t}, t = t_{1}, \dots, t_{N}$ follow the VAR(1) model described in Section 2.1. For the simulation, we consider N = 30 and N = 50 time points, and the data generation is repeated 200 times for each number of time points. An example of simulated time-course data and the ODE trajectories is shown in Figure 1.
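A minimal sketch of computing the noise-free trajectories of Equation (8) with all six parameters equal to one is given below. It assumes that the second equation involves θ3 (the only one of the six parameters not appearing elsewhere in the system), uses a hand-written fourth-order Runge-Kutta scheme in place of whatever solver was used in the study, and takes hypothetical initial values and a hypothetical time interval [0, 10].

```python
import numpy as np


def grn_rhs(x, theta):
    """Right-hand side of the three-gene ODE system of Equation (8)."""
    x1, x2, x3 = x
    t1, t2, t3, t4, t5, t6 = theta
    return np.array([
        t1 - t2 * x1,
        t3 * x3 - t4 * x2,
        t5 * x1 / (1.0 + x1) - t6 * x3,
    ])


def solve_rk4(rhs, x0, times, theta, substeps=20):
    """Classical RK4 solution evaluated at the observation times."""
    x = np.empty((len(times), len(x0)))
    x[0] = x0
    for j in range(1, len(times)):
        h = (times[j] - times[j - 1]) / substeps
        state = x[j - 1].copy()
        for _ in range(substeps):
            k1 = rhs(state, theta)
            k2 = rhs(state + 0.5 * h * k1, theta)
            k3 = rhs(state + 0.5 * h * k2, theta)
            k4 = rhs(state + h * k3, theta)
            state = state + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        x[j] = state
    return x


theta = np.ones(6)                   # theta_m = 1, as in the simulation setting
times = np.linspace(0.0, 10.0, 30)   # N = 30 time points on a hypothetical interval
x0 = np.array([1.0, 1.0, 1.0])       # hypothetical initial expression levels
x = solve_rk4(grn_rhs, x0, times, theta)
print(x[-1])
```

With these parameter values the system has the steady state (1, 0.5, 0.5), which the trajectories approach as t increases. Observations would then be obtained by adding the scaled VAR(1) errors of Equation (2) to x.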

FIG. 1.

True ODE trajectories and fitted ODE trajectories with data cloning. ODE, ordinary differential equation.

To verify the invariance property for the selection of prior distributions, we consider both informative and noninformative prior distributions. For the informative priors, Equation (4) is considered. To evaluate the accuracy of SE estimates for MLEs, we obtain an SE estimate for MLE of each ODE parameter using the Monte Carlo (MC) method. In the MC method, we estimate the MLE for each simulated data set by directly solving the maximization problem for the likelihood function of Equation (3), and then we obtain the SE estimate by computing standard deviation of 200 MLEs. In addition, we compute the coverage probabilities (CPs) for 90% and 95% confidence intervals (CIs) to investigate the accuracy of statistical inference from the proposed method.

For each simulated time-course data set, we obtain the MLE and its asymptotic SE estimate using the proposed data cloning method. Based on the MLE and its asymptotic SE estimate, we can construct the CI under the asymptotic normality. By counting the number of times that the CI includes the true parameter value among 200 CIs, we can compute CP. In this simulation, the performance of the proposed method is evaluated in terms of bias, SE, and CP.
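The coverage computation described above can be sketched as follows. The function names and the four example estimates are hypothetical; the last function gives the Monte Carlo standard error of a CP estimate under the assumption that the replications are independent.

```python
import math
from statistics import NormalDist


def wald_interval(estimate, se, level):
    """Confidence interval based on asymptotic normality of the MLE."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se


def coverage_probability(estimates, ses, truth, level):
    """Fraction of intervals that contain the true parameter value."""
    hits = 0
    for est, se in zip(estimates, ses):
        lower, upper = wald_interval(est, se, level)
        hits += lower <= truth <= upper
    return hits / len(estimates)


def mc_standard_error(level, n_rep):
    """Binomial standard error of a CP estimate at the nominal level."""
    return math.sqrt(level * (1.0 - level) / n_rep)


# Hypothetical estimates and SEs from four replications, true value 1.
cp = coverage_probability([1.0, 1.2, 0.9, 1.05], [0.05] * 4, truth=1.0, level=0.95)
print(cp, mc_standard_error(0.95, 200), mc_standard_error(0.90, 200))
```

With 200 replications, the Monte Carlo standard error of a CP estimate is about 0.015 at the 95% level and about 0.021 at the 90% level, which gives a scale for judging how far an estimated CP is from its nominal level.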

Table 1 shows the mean of ODE parameter estimates, their biases in absolute value, the mean of their SEs, and 90% and 95% CPs obtained from the proposed data cloning method with informative and noninformative prior distributions, respectively. As shown in Table 1, for all statistics, there are no notable differences between informative and noninformative priors. This demonstrates that the proposed method satisfies the invariance property with respect to the choice of prior distributions.

Table 1.

Simulation Results for the Proposed Data Cloning Method

N	Method	Statistic	$θ_{1}$	$θ_{2}$	$θ_{3}$	$θ_{4}$	$θ_{5}$	$θ_{6}$
30	MC	SE	0.026	0.032	0.061	0.051	0.053	0.066
	DC, Infor. prior	Mean	0.994	0.993	1.002	1.004	1.005	1.004
		|Bias|	0.006	0.007	0.002	0.004	0.005	0.004
		SE	0.022	0.030	0.057	0.049	0.050	0.061
		CP (90%)	0.870	0.860	0.900	0.895	0.900	0.900
		CP (95%)	0.915	0.905	0.950	0.935	0.955	0.940
	DC, NInfor. prior	Mean	0.996	0.995	0.998	0.998	1.003	1.003
		|Bias|	0.004	0.005	0.002	0.002	0.003	0.003
		SE	0.022	0.030	0.058	0.050	0.050	0.061
		CP (90%)	0.870	0.855	0.895	0.900	0.900	0.900
		CP (95%)	0.925	0.910	0.950	0.935	0.945	0.945
50	MC	SE	0.019	0.024	0.048	0.040	0.039	0.049
	DC, Infor. prior	Mean	1.000	1.000	1.003	1.004	1.005	1.004
		|Bias|	0.000	0.000	0.003	0.004	0.005	0.004
		SE	0.018	0.023	0.046	0.038	0.039	0.048
		CP (90%)	0.910	0.910	0.910	0.915	0.900	0.905
		CP (95%)	0.970	0.960	0.955	0.955	0.960	0.965
	DC, NInfor. prior	Mean	0.998	0.998	1.004	1.004	1.005	1.004
		|Bias|	0.002	0.002	0.004	0.004	0.005	0.004
		SE	0.018	0.023	0.046	0.038	0.039	0.049
		CP (90%)	0.900	0.900	0.900	0.900	0.905	0.905
		CP (95%)	0.975	0.955	0.955	0.955	0.960	0.965

CP, coverage probability; DC, data cloning; Infor., informative; MC, Monte Carlo; NInfor., noninformative; SE, standard error.

For both the N = 30 and N = 50 cases, the biases of the ODE parameter estimates are small (at most 0.007) and the estimated asymptotic SEs are close to the SEs from the MC method. For the asymptotic SEs, the differences from the MC SEs decrease slightly as N increases. For accurate statistical inference, the CP should be greater than or equal to the confidence level. As shown in Table 1, although many CP values are less than their confidence level in the N = 30 case, all CP values in the N = 50 case are greater than or equal to their confidence level.

5. REAL DATA ANALYSIS

In this section, we estimate and make inference on the parameters of an ODE model for a retina cell gene network of zebrafish using the method proposed in this study. The zebrafish eye anatomy is similar to that of humans, and its retina cells can restore vision. Thus, the GRN of zebrafish retina cells has been studied to develop treatments for impairment of the human eye (Bibliowicz et al, 2011; Fumitaka et al, 2007; Qin et al, 2009).

Linder and Rempala (2013) proposed a network for three important genes, $Stat3$ , $Socs3b$ , and $Hsp70$ , in the regeneration process of retina cells of zebrafish using an algebraic statistical model, and they hypothesized that $Stat3$ is an activator of the $Socs3b$ and $Hsp70$ genes, and that $Socs3b$ is an inhibitor of $Stat3$ . To illustrate the dynamics of this GRN, the following ODE model was constructed based on their interactions and trajectories obtained from RNA-seq time-course data: $\begin{matrix} \frac{d Stat3 (t)}{d t} & = θ_{1} + θ_{2} Stat3 (t) - θ_{3} Stat3 (t) Socs3b (t), \\ \frac{d Socs3b (t)}{d t} & = θ_{4} + θ_{5} Stat3 (t) - θ_{6} Socs3b (t), \\ \frac{d Hsp70 (t)}{d t} & = θ_{7} + θ_{8} Stat3 (t) - θ_{9} Hsp70 (t) . \end{matrix}$ (9)

In Equation (9), the ODE parameters $θ_{5}$ and $θ_{8}$ describe the role of $Stat3$ as an activator of $Socs3b$ and $Hsp70$ , respectively, and $θ_{3}$ is the ODE parameter related to the hypothesis of $Socs3b$ inhibiting $Stat3$ . We now investigate the hypothesized network through the inference for the ODE parameters $θ_{3}$ , $θ_{5}$ , and $θ_{8}$ . For the inference, we use the proposed method with informative and noninformative prior distributions, respectively.

Table 2 provides the MLE, asymptotic SE, and 95% CI for each ODE parameter, and Figure 2 shows data points and the fitted lines obtained using informative and noninformative prior distributions. As shown in Table 2 and Figure 2, the outcomes of informative and noninformative prior distributions are strikingly similar. As in the synthetic data analysis in Section 4, this confirms that the inference by the proposed method is invariant to the choice of prior distributions.

FIG. 2.

Data points and fitted ODE trajectories.

Table 2.

Estimation of Ordinary Differential Equation Parameters for Zebrafish Retina Cell Data

	Data cloning (Informative prior)			Data cloning (Noninformative prior)
	Estimate	SE	95% CI	Estimate	SE	95% CI
$θ_{1}$	−0.8717	0.3392	−1.537 to −0.207	−0.8605	0.3642	−1.574 to −0.147
$θ_{2}$	−0.0395	0.0606	−0.158 to 0.079	−0.0398	0.0649	−0.167 to 0.087
$θ_{3}$	−0.0087	0.0027	−0.014 to −0.003	−0.0087	0.0027	−0.014 to −0.003
$θ_{4}$	5.2335	1.4602	2.372 to 8.095	5.2079	1.5787	2.114 to 8.302
$θ_{5}$	−0.2031	0.0913	−0.382 to −0.024	−0.2026	0.1000	−0.399 to −0.007
$θ_{6}$	0.1193	0.0296	0.061 to 0.177	0.1182	0.0309	0.058 to 0.179
$θ_{7}$	4.7526	1.4091	1.991 to 7.514	4.6161	1.4821	1.711 to 7.521
$θ_{8}$	0.0301	0.0382	−0.045 to 0.105	0.0319	0.0408	−0.048 to 0.112
$θ_{9}$	0.3149	0.0645	0.188 to 0.441	0.3084	0.0686	0.174 to 0.443

95% CI, 95% confidence interval; SE, standard error.

To assess the hypothesized GRN, we examine the MLEs and 95% CIs for $θ_{3}$ , $θ_{5}$ , and $θ_{8}$ . The estimates of both $θ_{3}$ and $θ_{5}$ are negative and their 95% CIs do not include zero. Because $θ_{3}$ enters Equation (9) with a negative sign, this means that $Socs3b$ might not inhibit $Stat3$ but rather activate it, and $Stat3$ might be an inhibitor of $Socs3b$ . In addition, since the 95% CI for $θ_{8}$ includes zero, the data do not support the hypothesis that $Stat3$ activates $Hsp70$ at the 0.05 significance level. Thus, this inference result contradicts the GRN proposed by Linder and Rempala (2013). This contradiction might be due to misspecification of the hypothesized GRN and/or the ODEs of Equation (9).
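The intervals used in this argument can be reproduced from the estimates and SEs reported in Table 2 (informative prior), assuming they are the usual normal-approximation intervals. The dictionary and function names below are illustrative only.

```python
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96

# (estimate, SE) pairs copied from Table 2, informative prior.
table2_informative = {
    "theta3": (-0.0087, 0.0027),
    "theta5": (-0.2031, 0.0913),
    "theta8": (0.0301, 0.0382),
}


def wald_ci(estimate, se):
    """95% confidence interval under asymptotic normality."""
    return estimate - Z_95 * se, estimate + Z_95 * se


for name, (estimate, se) in table2_informative.items():
    lower, upper = wald_ci(estimate, se)
    excludes_zero = lower > 0.0 or upper < 0.0
    print(f"{name}: {lower:.3f} to {upper:.3f}, excludes zero: {excludes_zero}")
```

The intervals for θ3 and θ5 lie entirely below zero, whereas the interval for θ8 contains zero, in agreement with Table 2.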

6. SUMMARY AND DISCUSSION

ODE models are often used to describe the dynamics of GRNs, which is critical for understanding whole gene regulation processes. For accurate estimation of ODE models for GRNs, we should consider intrinsic attributes of time-course gene expression data such as heteroscedasticity, correlation between genes, and time dependency. However, it is not easy to accurately estimate ODE models with such complex error structures. ML estimation approaches for such models could have local optimum problems due to iterative optimizations, and Bayesian estimation could be sensitive to the choice of prior distributions because the data sets are small. To overcome these problems, this study proposes an estimation method based on data cloning, which is an ML method implemented through Bayesian MCMC. Since it uses the Bayesian framework, it is free from local optimum problems, and its inference is invariant to the choice of prior distributions.

However, similar to Bayesian methods for complex models, the proposed method still incurs a high computing cost for the convergence of MCMC. To reduce the computing cost, we can consider sequential Monte Carlo (SMC) algorithms, also known as particle filters (Del Moral et al, 2006; Doucet et al, 2000). SMC methods combine importance sampling and resampling algorithms. To obtain posterior distributions, they sequentially sample from a sequence of intermediate probability distributions defined on a common space. They are expected to obtain target distributions efficiently.

Footnotes

AUTHORS' CONTRIBUTIONS

Methodology, software, formal analysis, investigation, data curation, and writing—original draft by D.S. Conceptualization, methodology, validation, writing—original draft, writing—review and editing, supervision, project administration, and funding acquisition by J.K.

AUTHOR DISCLOSURE STATEMENT

The authors declare they have no conflicting financial interests.

FUNDING INFORMATION

This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Nos. NRF-2018R1D1A1B07049818 and NRF-2022R1F1A1072444).

References

Bachmann, Raue, Schilling, et al. Predictive mathematical models of cancer signalling pathways. J Intern Med, 2012; 271(2):155–165; doi: 10.1111/j.1365-2796.2011.02492.x

Bates, Watts. Nonlinear Regression Analysis and Its Applications. Wiley: New York; 1988.

Bhaumik, Ghosal. Bayesian two-step estimation in differential equation models. Electron J Stat, 2015; 9(2):3124–3154; doi: 10.1214/15-EJS1099

Bibliowicz, Tittle, Gross. Towards a better understanding of human eye disease: Insights from the zebrafish, Danio rerio. Prog Mol Biol Transl Sci, 2011; 100:287; doi: 10.1016/B978-0-12-384878-9.00007-8

Bock HG. Numerical Treatment of Inverse Problems in Chemical Reaction Kinetics. In: Modelling of Chemical Reaction Systems. (Ebert KH, Deuflhard P, Jager W. eds.) Springer: New York, 1981; pp. 102–125.

Calderhead, Girolami, Lawrence. Accelerating Bayesian inference over nonlinear differential equations with Gaussian processes. Adv Neural Inf Process Syst, 2009; 21:217–224.

Campbell, Chkrebtii. Maximum profile likelihood of differential equation parameters through model based smoothing state estimates. Math Biosci, 2013; 246(2):283–292; doi: 10.1016/j.mbs.2013.03.011

Cao, Zhao. Estimating dynamic models for gene regulation network. Bioinformatics, 2008; 24(14):1619–1624; doi: 10.1093/bioinformatics/btn246

Chkrebtii, Campbell, Calderhead, et al. Bayesian solution uncertainty quantification for differential equations. Bayesian Anal, 2016; 11(4):1239–1267; doi: 10.1214/16-BA1017

de Boor, Swartz. Collocation at Gaussian points. SIAM J Numer Anal, 1973; 10(4):582–606; doi: 10.1137/0710052

de Jong. Modeling and simulation of genetic regulatory systems: A literature review. J Comput Biol, 2002; 9(1):69–105; doi: 10.1089/10665270252833208

Del Moral, Doucet, Jasra. Sequential Monte Carlo samplers. J R Stat Soc Series B, 2006; 68(3):411–436; doi: 10.1111/j.1467-9868.2006.00553.x

Doucet, Godsill, Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Stat Comput, 2000; 10(3):197–208; doi: 10.1023/A:1008935410038

Endy, Brent. Modelling cellular behaviour. Nature, 2001; 409:391–395; doi: 10.1038/35053181

Fumitaka, Sotaro, Tadamichi, et al. Wnt signaling promotes regeneration in the retina of adult mammals. J Neurosci, 2007; 27(15):4210–4219; doi: 10.1523/JNEUROSCI.4193-06.2007

Haario, Saksman, Tamminen. An adaptive Metropolis algorithm. Bernoulli, 2001; 7(2):223–242.

Hasty, McMillen, Isaacs, et al. Computational studies of gene regulatory networks: In numero molecular biology. Nat Rev Genet, 2001; 2:268–279; doi: 10.1038/35066056

Hemker. Numerical Methods for Differential Equations in System Simulation and in Parameter Estimation. In: Analysis and Simulation of Biochemical Systems. (Hemker HC, Hess B. eds.) Elsevier: North Holland, 1972; pp. 59–80.

Himmelblau, Jones, Bischoff. Determination of rate constants for complex kinetics models. Ind Eng Chem Fundam, 1967; 6(4):539–543; doi: 10.1021/i160024a008

Huang, Handel, Song. A Bayesian approach to estimate parameters of ordinary differential equation. Comput Stat, 2020; 35:1481–1499; doi: 10.1007/s00180-020-00962-8

Huang, Liu, Wu. Hierarchical Bayesian methods for estimation of parameters in a longitudinal HIV dynamic system. Biometrics, 2006; 62(2):413–423; doi: 10.1111/j.1541-0420.2005.00447.x

Huang, Wu. A Bayesian approach for estimating antiviral efficacy in HIV dynamic models. J Appl Stat, 2006; 33(2):155–174; doi: 10.1080/02664760500250552

Karlebach, Shamir

. Modelling and analysis of gene regulatory networks. Nat Rev Mol Cell Bio, 2008; 9:770–780; doi: 10.1038/nrm2503

24.

Kim

Validation and selection of ODE models for gene regulatory networks. Chemometr Intell Lab Syst, 2016; 157:104–110; doi: 10.1016/j.chemolab.2016.06.016

25.

Kim

, Kim

. Estimation of dynamic systems for gene regulatory networks from dependent time-course data. J Comput Biol, 2018; 25(9):987–996; doi: 10.1089/cmb.2018.0062

26.

Lele

, Dennis

, Lutscher

. Data cloning: Easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecol Lett, 2007; 10(7):551–563; doi: 10.1111/j.1461-0248.2007.01047.x

27.

Lele

, Nadeem

, Schmuland

. Estimability and likelihood inference for generalized linear mixed models using data cloning. J Am Stat Assoc, 2010; 105(492):1617–1625; doi: 10.1198/jasa.2010.tm09757

28.

, Osborne

, Pravan

. Parameter estimation of ordinary differential equations. IMA J Numer Anal, 2005; 25(2):264–285; doi: 10.1093/imanum/drh016

29.

Liang

, Wu

. Parameter estimation for differential equation models using a framework of measurement error in regression models. J Am Stat Assoc, 2008; 103(484):1570–1583; doi: 10.1198/016214508000000797

30.

Linder

, Rempala

. Algebraic statistical model for biochemical network dynamics inference. J Coupled Syst Multiscale Dyn, 2013; 1(4):1–7; doi: 10.1166/jcsmd.2013.1032

31.

Qin

, Barthel

, Raymond

. Genetic evidence for shared mechanisms of epimorphic regeneration in zebrafish. Proc Natl Acad Sci U S A, 2009; 106(23):9310–9315; doi: 10.1073/pnas.0811186106

32.

Ramsay

, Hooker

, Campbell

, et al. Parameter estimation for differential equations: A generalized smoothing approach (with discussion). J R Stat Soc Series B, 2007; 69(5):741–796; doi: 10.1111/j.1467-9868.2007.00610.x

33.

Seber

, Wild

. Nonlinear Regression. Wiley: New York; 1989.

34.

Subramaniam

, Hsiao

. Gene-expression measurement: Variance-modeling considerations for robust data analysis. Nat Immunol, 2012; 13:199–203; doi: 10.1038/ni.2244

35.

Varah

A spline least squares method for numerical parameter estimation in differential equations. SIAM J Sci Stat Comp, 1982; 3(1):28–46; doi: 10.1137/0903003

36.

Walker

AM.

On the asymptotic behaviour of posterior distributions. J R Stat Soc Series B, 1969; 31(1):80–88; doi: 10.1111/j.2517-6161.1969.tb00767.x

37.

Wang

, Barber

Gaussian Processes for Bayesian Estimation in Ordinary Differential Equations. In: Proceedings of the 31st International Conference on Machine Learning, ICML’14. (Xing EP, Jebara T. eds.) Beijing, China, 2014; pp. 1485–1493.