Elsevier

Computational Biology and Chemistry

Volume 47, December 2013, Pages 198-206

Using gene expression programming to infer gene regulatory networks from time-series data

https://doi.org/10.1016/j.compbiolchem.2013.09.004

Highlights

  • Gene expression programming is an effective method for evolving ordinary differential equation (ODE) models from observed time-series data.

  • We propose a least mean square method to optimize the parameters of the ODE model.

  • Partitioning is used during identification of the system structure to reduce the complexity of the genetic network inference problem.

Abstract

Gene regulatory network inference is currently a heavily researched topic in systems biology. In this paper, gene regulatory networks are inferred via an evolutionary model based on time-series microarray data. A non-linear differential equation model is adopted. Gene expression programming (GEP) is applied to identify the structure of the model, and least mean square (LMS) is used to optimize the parameters of the ordinary differential equations (ODEs). The proposed method is first verified on synthetic noise-free and noisy time-series data, and its effectiveness is then confirmed on three real time-series expression datasets. Finally, a gene regulatory network is constructed with 12 yeast genes. Experimental results demonstrate that our model can effectively improve the prediction accuracy of microarray time-series data.

Introduction

The increasing availability of high-throughput transcript measurements has presented a golden opportunity to infer gene regulatory networks. Measuring the levels of gene expression under different conditions is useful in medical diagnosis, treatment, and drug design (Huang et al., 2010; Sun and Hurley, 2009). Many gene expression experiments produce time-series data with only a few time points owing to high measurement costs. Accurate prediction of the behavior of gene regulatory networks (GRNs) will also speed up biotechnological projects, as such predictions are quicker and cheaper than lab experiments. It is therefore highly desirable to infer models of gene regulatory networks from gene expression time-series data, and predicting gene regulatory networks has become an important research area in bioinformatics.

Recently, dynamic modeling of gene regulatory networks from time-series data has received increasing research interest (De Jong, 2002; Karlebach and Shamir, 2008). Proposed models include Boolean networks (Akutsu et al., 1999; Bornholdt, 2008), dynamic Bayesian networks (Ghahramani, 1998; Murphy and Mian, 1999; Liu et al., 2006), neural networks (Lee and Yang, 2008), differential equations (De Jong, 2002; Chen et al., 1999; De Hoon et al., 2002; D’haeseleer et al., 1999), state-space models (Wu et al., 2004), and stochastic models (Wang et al., 2008; Wang et al., 2010), among others. A recent review of gene regulatory network inference based on data integration in dynamical models can be found in Hecker et al. (2009). Systems of ordinary differential equations (ODEs) are a powerful and flexible way to describe complex relations, so many methods have been proposed to infer genetic regulatory systems using ODEs. For example, Li et al. (2011) proposed a hybrid algorithm that integrates ordinary differential equation models with a local dynamic Bayesian network to infer gene regulatory networks. Vilela et al. (2009) identified neutral biochemical network models from time-series data, using a Monte Carlo approach to optimize the parameters. Zhou et al. (2012) reconstructed GRNs from time-series microarray data using stepwise multiple linear regression. Yang et al. (2012) proposed a flexible neural tree model for gene regulatory network reconstruction and time-series prediction from gene expression profiling. Unfortunately, most reported results on ODEs have focused on a fixed structure of the equations describing the gene regulatory network, with the sole goal of optimizing parameters and coefficients. The motivation of this paper is therefore to develop a systems biology approach that determines a suitable form of the equations describing the network and reverse engineers gene regulatory networks from time-series data with higher accuracy and better scalability.

In our study, we allow an arbitrary form for the right-hand side of the ODE models. To identify the models, gene expression programming (GEP) is used to evolve the right-hand side of the ODEs from the observed time-series gene expression dataset. GEP is a new evolutionary algorithm that performs well on time-series prediction problems (Zuo et al., 2004). To reduce the complexity of the genetic network inference problem, partitioning (Bongard and Lipson, 2007) is used during identification of the system structure. Each ODE can then be inferred separately, and the search space shrinks rapidly. In this paper, two synthetic time-series datasets obtained with the E-cell system (Tomita et al., 1999) and three real microarray datasets, namely a worm gene expression time-series dataset (Yeung et al., 2001), a human cell time-series dataset (Whitfield et al., 2002), and a yeast time-series dataset (Woolf and Wang, 2000; Schneider and Guarente, 1991), are used to test our method. Finally, a gene regulatory network is constructed from the yeast time-series dataset. Experimental results show that our method can effectively improve the prediction accuracy of microarray time-series data.
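As a rough illustration of the partitioning idea, the sketch below scores a candidate right-hand side for one gene by integrating only that gene's equation while reading every other variable from the observed series. The function names (`euler_data`, `partitioned_error`), the forward-Euler scheme, and the two-gene system are assumptions made for this example and are not taken from the paper.

```python
from typing import Callable, List, Sequence

RHS = Callable[[Sequence[float]], float]


def euler_data(fs: Sequence[RHS], x0: Sequence[float], dt: float, steps: int) -> List[List[float]]:
    """Generate a hypothetical 'observed' series with forward Euler."""
    data = [list(x0)]
    for _ in range(steps):
        x = data[-1]
        data.append([xi + dt * f(x) for xi, f in zip(x, fs)])
    return data


def partitioned_error(f_i: RHS, i: int, observed: Sequence[Sequence[float]], dt: float) -> float:
    """Sum of squared errors for a candidate f_i, integrating x_i alone.

    At every step the other variables are taken from the observed data,
    so equation i can be scored without candidates for the other equations.
    """
    x_i = observed[0][i]
    err = 0.0
    for t in range(len(observed) - 1):
        state = list(observed[t])
        state[i] = x_i                   # only variable i follows the candidate
        x_i = x_i + dt * f_i(state)      # forward Euler step for equation i
        err += (x_i - observed[t + 1][i]) ** 2
    return err


# Hypothetical two-gene system used only to produce example data.
true_fs = [
    lambda x: -0.5 * x[0] + 0.2 * x[1],
    lambda x: 0.3 * x[0] - 0.4 * x[1],
]
dt = 0.1
observed = euler_data(true_fs, [1.0, 0.5], dt, 30)

# Two candidate right-hand sides for gene 0: the true one and a wrong one.
err_true = partitioned_error(true_fs[0], 0, observed, dt)
err_wrong = partitioned_error(lambda x: -0.5 * x[0], 0, observed, dt)
print(err_true, err_wrong)
```

Because the score for equation i does not depend on candidates for the other equations, each equation can be searched independently, which is what keeps the search space small.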

Section snippets

Modeling gene regulation with ordinary differential equation

Ordinary differential equations (ODEs) are one of the most popular tools for modeling complex systems in which the basic relationships between the system components are known. In the inverse problem, ODEs are often used to identify the model from the observed time-series data.
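One small instance of this inverse problem is fitting the coefficients of an equation whose structure is already fixed. The sketch below applies the classic Widrow-Hoff least mean square (LMS) update to recover two coefficients from a noise-free series. This excerpt does not spell out the paper's exact LMS formulation, so the update rule, the linear two-gene structure, the step size `mu`, the epoch count, and the forward-difference derivative estimates are all assumptions for illustration.

```python
from typing import List, Sequence


def lms_fit(features: Sequence[Sequence[float]], targets: Sequence[float],
            mu: float = 0.1, epochs: int = 5000) -> List[float]:
    """Widrow-Hoff LMS: nudge weights along the error for each sample."""
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        for phi, y in zip(features, targets):
            e = y - sum(wj * pj for wj, pj in zip(w, phi))   # prediction error
            w = [wj + mu * e * pj for wj, pj in zip(w, phi)]  # LMS update
    return w


# Hypothetical noise-free series from dx0/dt = -0.5*x0 + 0.2*x1,
# dx1/dt = 0.3*x0 - 0.4*x1, generated with forward Euler.
dt = 0.1
series = [[1.0, 0.5]]
for _ in range(20):
    x0, x1 = series[-1]
    series.append([x0 + dt * (-0.5 * x0 + 0.2 * x1),
                   x1 + dt * (0.3 * x0 - 0.4 * x1)])

# Assumed structure for gene 0: dx0/dt = a*x0 + b*x1.
features = series[:-1]
targets = [(nxt[0] - cur[0]) / dt for cur, nxt in zip(series, series[1:])]

a, b = lms_fit(features, targets)
print(a, b)
```

With noisy measurements the forward-difference targets would be much less reliable, so a real application would likely need smoothing or a different error measure.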

To allow the flexibility of the model, we consider the following general form:

$$\frac{dx_i}{dt} = f_i(x_1, x_2, \ldots, x_n), \quad i = 1, 2, 3, \ldots, n,$$

where $x_i$ is the state variable and $n$ is the number of the observed data time points.
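To make this general form concrete, here is a minimal sketch that integrates a system dx_i/dt = f_i(x_1, ..., x_n) with a fixed-step fourth-order Runge-Kutta scheme. The two right-hand sides, the initial state, and the choice of integrator are hypothetical and serve only as an example; the excerpt does not say how the paper integrates its models.

```python
from typing import Callable, List, Sequence

RHS = Callable[[Sequence[float]], float]


def rk4_step(fs: Sequence[RHS], x: Sequence[float], dt: float) -> List[float]:
    """Advance the whole system dx_i/dt = f_i(x) by one RK4 step."""
    def deriv(state: Sequence[float]) -> List[float]:
        return [f(state) for f in fs]

    k1 = deriv(x)
    k2 = deriv([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
    k3 = deriv([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
    k4 = deriv([xi + dt * k for xi, k in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def simulate(fs: Sequence[RHS], x0: Sequence[float], dt: float, steps: int) -> List[List[float]]:
    """Return the trajectory [x(0), x(dt), ..., x(steps*dt)]."""
    traj = [list(x0)]
    for _ in range(steps):
        traj.append(rk4_step(fs, traj[-1], dt))
    return traj


# Hypothetical two-gene system: each f_i may depend on every x_j.
fs = [
    lambda x: -0.5 * x[0] + 0.2 * x[1],
    lambda x: 0.3 * x[0] - 0.4 * x[1],
]
trajectory = simulate(fs, [1.0, 0.5], 0.1, 50)
print(trajectory[-1])
```

Any callable can be placed in `fs`, which mirrors the point that the right-hand sides are not restricted to a fixed structure.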

In order to identify the system, GEP

Experimental results and discussion

In this paper, to confirm the effectiveness of the proposed approach, it is first applied to a synthetic genetic network inference problem, considering both noise-free and noisy data. Even in the presence of noise, the proposed approach successfully reverse engineers the network from the synthetic data. Afterwards, the approach is tested on three real-world gene expression time-series datasets. Specifically, the accuracy of inferring gene regulatory

Conclusion

In conclusion, we have developed a new algorithm that integrates ordinary differential equations with gene expression programming and least mean square for the inference of gene regulatory networks from time-series data. Our method has two advantages: (1) GEP can produce ODE models of gene regulatory networks that are very close to the target system; (2) with partitioning, the best model can be acquired very quickly, and each node of the genetic regulatory network can be

Acknowledgements

We thank the anonymous reviewers for their constructive comments and criticisms, which greatly helped to improve the quality of the paper. We also thank Dr. Yong Cong for useful discussions and suggestions.

This research was supported by the National Natural Science Foundation of China (60972131), Scientific Research Foundation for Returned Scholars, Ministry of Education of China (20111139), Science and Technology Support Foundation of Sichuan Province (2011GZ0201), Science and Technology

References (42)

  • M. De Hoon. Inferring gene regulatory networks from time-ordered gene expression data of Bacillus subtilis using differential equations.

  • H. De Jong. Modeling and simulation of genetic regulatory systems: a literature review. Journal of Computational Biology (2002).

  • L. Duan. Distance guided classification with gene expression programming. Advanced Data Mining and Applications (2006).

  • L. Duan. Mining class contrast functions by gene expression programming. Advanced Data Mining and Applications (2009).

  • C. Ferreira. Gene expression programming: A new adaptive algorithm for solving problems. arXiv preprint cs/0102027 (2001).

  • C. Ferreira. Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence (Studies in Computational Intelligence) (2006).

  • C. Ferreira. Designing neural networks using gene expression programming. Applied Soft Computing Technologies: The Challenge of Complexity (2006).

  • Z. Ghahramani. Learning dynamic Bayesian networks. Adaptive Processing of Sequences and Data Structures (1998).

  • H. Huang. Bayesian approach to transforming public gene expression repositories into disease diagnosis databases. Proceedings of the National Academy of Sciences (2010).

  • V.K. Karakasis et al. Data mining based on gene expression programming and clonal selection.

  • V.K. Karakasis et al. Efficient evolution of accurate classification rules using a combination of gene expression programming and clonal selection. IEEE Transactions on Evolutionary Computation (2008).