Using gene expression programming to infer gene regulatory networks from time-series data

doi:10.1016/j.compbiolchem.2013.09.004

Computational Biology and Chemistry

Volume 47, December 2013, Pages 198-206

https://doi.org/10.1016/j.compbiolchem.2013.09.004

Highlights

• Gene expression programming is an effective method for evolving ordinary differential equation (ODE) models from observed time-series data.
• We propose a least mean square method to optimize the parameters of the ODE model.
• Partitioning is used during identification of the system structure to reduce the complexity of the genetic network inference problem.

Abstract

Inference of gene regulatory networks is currently a topic of intensive research in systems biology. In this paper, gene regulatory networks are inferred from time-series microarray data using an evolutionary model. A non-linear differential equation model is adopted. Gene expression programming (GEP) is applied to identify the structure of the model, and least mean square (LMS) is used to optimize the parameters of the ordinary differential equations (ODEs). The proposed method is first verified on synthetic noise-free and noisy time-series data, and its effectiveness is then confirmed on three real time-series expression datasets. Finally, a gene regulatory network is constructed for 12 yeast genes. Experimental results demonstrate that our model can effectively improve the prediction accuracy for microarray time-series data.

Introduction

The increasing availability of high-throughput measurements of transcripts has presented a golden opportunity to infer gene regulatory networks. Measuring gene expression levels under different conditions is useful in medical diagnosis, treatment, and drug design (Huang et al., 2010, Sun and Hurley, 2009). Many gene expression experiments produce time-series data with only a few time points owing to high measurement costs. Accurate prediction of the behavior of gene regulatory networks (GRNs) will also speed up biotechnological projects, as such predictions are quicker and cheaper than lab experiments. It is therefore highly desirable to infer models of gene regulatory networks from gene expression time-series data, and the prediction of gene regulatory networks has become an important research area in bioinformatics.

Recently, dynamic modeling of gene regulatory networks from time-series data has received increasing research interest (De Jong, 2002, Karlebach and Shamir, 2008). Proposed approaches include Boolean networks (Akutsu et al., 1999, Bornholdt, 2008), dynamic Bayesian networks (Ghahramani, 1998, Murphy and Mian, 1999, Liu et al., 2006), neural networks (Lee and Yang, 2008), differential equations (De Jong, 2002, Chen et al., 1999, De Hoon et al., 2002, D’haeseleer et al., 1999), state-space models (Wu et al., 2004), stochastic models (Wang et al., 2008, Wang et al., 2010), and so on. A recent review of gene regulatory network inference based on data integration in dynamic models is given by Hecker et al. (2009). A system of ordinary differential equations (ODEs) is a powerful and flexible model for describing complex relations, so many methods have been proposed to infer genetic regulatory systems using ODEs. For example, Li et al. (2011) proposed a hybrid algorithm that integrates ordinary differential equation models with local dynamic Bayesian networks to infer gene regulatory networks. Vilela et al. (2009) identified neutral biochemical network models from time-series data, using Monte Carlo to optimize the parameters. Zhou et al. (2012) reconstructed GRNs from time-series microarray data using stepwise multiple linear regression. Yang et al. (2012) proposed a flexible neural tree model for gene regulatory network reconstruction and time-series prediction from gene expression profiles. Unfortunately, most reported results on ODEs assume a fixed structure for the equations that describe the gene regulatory network, and the only goal is to optimize parameters and coefficients. The motivation of this paper is therefore to develop a systems biology approach that determines a suitable form for the equations describing the network and reverse engineers the gene regulatory network from time-series data with higher accuracy and better scalability.

In our study, we allow an arbitrary form for the right-hand side of the ODE models. To identify the models, gene expression programming (GEP) is used to evolve the right-hand side of the ODEs from the observed time-series gene expression dataset. GEP is a relatively new evolutionary algorithm that performs well on time-series prediction problems (Zuo et al., 2004). To decrease the complexity of the genetic network inference problem, partitioning (Bongard and Lipson, 2007) is used during identification of the system structure. Each ODE can then be inferred separately, and the search space is reduced substantially. In this paper, two synthetic time-series datasets obtained with the E-cell system (Tomita et al., 1999) and three real microarray datasets, namely a worm gene expression time-series dataset (Yeung et al., 2001), a human cell time-series dataset (Whitfield et al., 2002) and a yeast time-series dataset (Woolf and Wang, 2000, Schneider and Guarente, 1991), are used to test our method. Finally, a gene regulatory network is constructed from the yeast time-series dataset. Experimental results show that our method is capable of effectively improving the prediction accuracy for microarray time-series data.
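As an illustration of how GEP can represent the right-hand side of an ODE, the following minimal sketch decodes a linear GEP gene (a K-expression, read in level order as described by Ferreira, 2001) into an expression tree and evaluates it. The function set, the single-character terminals `a`, `b`, `c`, and the helper names `decode` and `evaluate` are assumptions made for this example and are not taken from the paper.

```python
from collections import deque
import operator

# Hypothetical function set (symbol -> (arity, implementation)); any symbol
# not listed here is treated as a terminal, i.e. a variable name.
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
}


def decode(kexpr):
    """Build an expression tree from a K-expression read in level order."""
    symbols = iter(kexpr)
    root = (next(symbols), [])
    pending = deque([root])
    while pending:
        symbol, children = pending.popleft()
        arity = FUNCTIONS[symbol][0] if symbol in FUNCTIONS else 0
        for _ in range(arity):
            child = (next(symbols), [])
            children.append(child)
            pending.append(child)
    return root  # symbols left unread form the non-coding tail of the gene


def evaluate(node, env):
    """Evaluate a decoded tree for the variable values given in env."""
    symbol, children = node
    if symbol in FUNCTIONS:
        return FUNCTIONS[symbol][1](*(evaluate(c, env) for c in children))
    return env[symbol]


# The gene "-*cabab" decodes to (a * b) - c; the trailing "ab" is unused.
tree = decode("-*cabab")
value = evaluate(tree, {"a": 2.0, "b": 3.0, "c": 1.0})
print(value)  # (2.0 * 3.0) - 1.0 = 5.0
```

In a setting like the one described above, each terminal would stand for the expression level of one gene, and one such gene would be evolved per equation, which is what allows the equations to be searched separately.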

Section snippets

Modeling gene regulation with ordinary differential equation

Ordinary differential equations (ODEs) are among the most popular tools for modeling complex systems in which the basic relationships between the system components are known. In the inverse problem, ODEs are often used to identify the model from observed time-series data.
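To make the inverse problem concrete, the sketch below fits the coefficients of a single equation with a fixed candidate structure to time-series data. It assumes ordinary least squares against central-difference derivative estimates, which may differ from the LMS procedure used in the paper; the toy data, the two candidate terms, and the helper names are hypothetical. Only one equation is fitted, with the other variable taken directly from the data, in the spirit of treating each equation separately.

```python
import math


def finite_differences(series, dt):
    """Central-difference estimate of dx/dt at the interior time points."""
    return [(series[k + 1] - series[k - 1]) / (2 * dt)
            for k in range(1, len(series) - 1)]


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_coefficients(terms, states, derivs):
    """Weights w minimizing sum_k (derivs[k] - sum_j w[j] * terms[j](states[k]))**2."""
    rows = [[term(s) for term in terms] for s in states]
    p = len(terms)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    atb = [sum(r[i] * d for r, d in zip(rows, derivs)) for i in range(p)]
    return solve_linear(ata, atb)  # normal equations (A^T A) w = A^T d


# Toy data (an assumption): x1 = exp(-t), x2 = t * exp(-t), so dx2/dt = x1 - x2.
dt = 0.01
times = [k * dt for k in range(201)]
x1 = [math.exp(-t) for t in times]
x2 = [t * math.exp(-t) for t in times]

derivs = finite_differences(x2, dt)
states = list(zip(x1, x2))[1:-1]  # align states with interior derivative estimates
weights = fit_coefficients([lambda s: s[0], lambda s: s[1]], states, derivs)
print(weights)  # expected to be close to [1.0, -1.0]
```

With noisy measurements the finite-difference step amplifies noise, so in practice the derivative estimates would usually be smoothed or the fit would be carried out on integrated trajectories instead.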

To allow flexibility in the model, we consider the following general form: $\frac{d x_{i}}{d t} = f_{i} (x_{1}, x_{2}, \dots, x_{n}), \quad i = 1, 2, \dots, n,$ where $x_{i}$ is the state variable and $n$ is the number of the observed data time points.
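Once candidate functions $f_{i}$ are available, a system in this general form can be simulated numerically and compared with the observed series. The following minimal sketch uses the classical fourth-order Runge-Kutta scheme; the choice of integrator, the two-variable toy system, and the names `rk4_step` and `simulate` are assumptions for illustration rather than details confirmed by the text above.

```python
import math


def rk4_step(fs, x, h):
    """Advance the state x by one step of size h for dx_i/dt = fs[i](x)."""
    def rhs(state):
        return [f(state) for f in fs]

    k1 = rhs(x)
    k2 = rhs([xi + 0.5 * h * k for xi, k in zip(x, k1)])
    k3 = rhs([xi + 0.5 * h * k for xi, k in zip(x, k2)])
    k4 = rhs([xi + h * k for xi, k in zip(x, k3)])
    return [xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def simulate(fs, x0, h, steps):
    """Return the trajectory [x(0), x(h), ..., x(steps * h)]."""
    trajectory = [list(x0)]
    for _ in range(steps):
        trajectory.append(rk4_step(fs, trajectory[-1], h))
    return trajectory


# Hypothetical two-variable system: dx1/dt = -x1, dx2/dt = x1 - x2.
# With x(0) = (1, 0) the exact solution is x1 = exp(-t), x2 = t * exp(-t).
system = [lambda x: -x[0], lambda x: x[0] - x[1]]
trajectory = simulate(system, [1.0, 0.0], h=0.01, steps=100)

x1_end, x2_end = trajectory[-1]
print(abs(x1_end - math.exp(-1.0)), abs(x2_end - math.exp(-1.0)))
```

The error between such a simulated trajectory and the measured time series is one natural way to score a candidate model structure during evolution.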

In order to identify the system, GEP

Experimental results and discussion

To confirm the effectiveness of the proposed approach, we first applied it to a synthetic genetic network inference problem, considering both noise-free and noisy data. Even in the presence of noise, the proposed approach successfully reverse engineered the network from the synthetic data. The approach was then tested on three real-world gene expression time-series datasets. Specifically, the accuracy of inferring gene regulatory

Conclusion

In conclusion, we have developed a new algorithm that integrates ordinary differential equations with gene expression programming and least mean square for the inference of gene regulatory networks from time-series data. Our method has two advantages: (1) the GEP method succeeds in creating ODE models of gene regulatory networks that are very close to the target system; (2) with partitioning, we can acquire the best model very fast, and each node of the genetic regulatory network can be

Acknowledgements

We thank the anonymous reviewers for their constructive comments and criticisms, which greatly helped to improve the quality of the paper. We also thank Dr. Yong Cong for useful discussions and suggestions.

This research was supported by the National Natural Science Foundation of China (60972131), Scientific Research Foundation for Returned Scholars, Ministry of Education of China (20111139), Science and Technology Support Foundation of Sichuan Province (2011GZ0201), Science and Technology

References (42)

S. Ando. Evolutionary modeling and inference of gene network. Information Sciences (2002)
M. Hecker. Gene regulatory network inference: data integration in dynamic models—A. Biosystems (2009)
W.P. Lee et al. A clustering-based approach for inferring recurrent neural networks as gene regulatory networks. Neurocomputing (2008)
T.F. Liu. Model gene network by semi-fixed Bayesian network. Expert Systems with Applications (2006)
T. Akutsu. Identification of genetic networks from a small number of gene expression patterns under the Boolean network model
J. Bongard et al. Automated reverse engineering of nonlinear dynamical systems. Proceedings of the National Academy of Sciences (2007)
S. Bornholdt. Boolean network models of cellular regulation: prospects and limitations. Journal of the Royal Society Interface (2008)
J.C. Butcher. The numerical analysis of ordinary differential equations: Runge-Kutta and general linear methods (1987)
T. Chen. Modeling gene expression with differential equations
P. D’haeseleer. Linear modeling of mRNA expression levels during CNS development and injury
M. De Hoon. Inferring gene regulatory networks from time-ordered gene expression data of Bacillus subtilis using differential equations
H. De Jong. Modeling and simulation of genetic regulatory systems: a literature review. Journal of Computational Biology (2002)
L. Duan. Distance guided classification with gene expression programming. Advanced Data Mining and Applications (2006)
L. Duan. Mining class contrast functions by gene expression programming. Advanced Data Mining and Applications (2009)
C. Ferreira. Gene expression programming: A new adaptive algorithm for solving problems. arXiv preprint cs/0102027 (2001)
C. Ferreira. Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence (Studies in Computational Intelligence) (2006)
C. Ferreira. Designing neural networks using gene expression programming. Applied Soft Computing Technologies: The Challenge of Complexity (2006)
Z. Ghahramani. Learning dynamic Bayesian networks. Adaptive Processing of Sequences and Data Structures (1998)
H. Huang. Bayesian approach to transforming public gene expression repositories into disease diagnosis databases. Proceedings of the National Academy of Sciences (2010)
V.K. Karakasis et al. Data mining based on gene expression programming and clonal selection
V.K. Karakasis et al. Efficient evolution of accurate classification rules using a combination of gene expression programming and clonal selection. IEEE Transactions on Evolutionary Computation (2008)
