Elsevier

Biosystems

Volume 80, Issue 2, May 2005, Pages 155-164
Reverse engineering of biochemical equations from time-course data by means of genetic programming

https://doi.org/10.1016/j.biosystems.2004.11.003

Abstract

Increased research aimed at simulating biological systems requires sophisticated parameter estimation methods. All current approaches, including genetic algorithms, need pre-existing equations to be functional. What is needed is a generalized approach that predicts not only parameters but also biochemical equations from observable time-course information alone, together with a computational method that generates arbitrary equations without knowledge of the biochemical reaction mechanisms. We present a technique to predict an equation using genetic programming. Our technique can search the topology and the numerical parameters of a mathematical expression simultaneously. To improve the search for numeric constants, we added numeric mutation to the conventional procedure. As case studies, we predicted two equations of enzyme-catalyzed reactions, those of adenylate kinase and phosphofructokinase. Our numerical experiments showed that our approach could obtain the correct topology and parameters that were close to the originals. The mean errors between given and simulation-predicted time-courses were 1.6 × 10⁻⁵% and 2.0 × 10⁻³%, respectively. Our equation prediction approach can be applied to identify metabolic reactions from observable time-courses.

Introduction

Computational experiments are useful for gaining an understanding of many properties of basic biological systems. In cases where no parameters are available for in silico modeling, techniques to predict parameters are useful. Earlier studies (Gilman and Ross, 1995; Park et al., 1997; Mendes and Kell, 1998; Rodriguez-Acosta et al., 1999; Pinchuk et al., 2000; Hatakeyama et al., 2003; Moles et al., 2003) focused on parameter estimations based on pre-existing human-supplied equations using optimization methods such as genetic algorithms (Holland, 1975; Goldberg, 1989).

To construct a biochemical equation such as the Michaelis–Menten equation, the reaction mechanism must be known and in vitro experimental measurements are required to determine parameters. However, computational models from kinetic equations and parameters measured in vitro may not always represent the in vivo phenomena. Therefore, it is necessary to develop a method to dynamically predict equations that represent the biochemical processes given their time-course measured in vivo.

We have proposed a technique for the dynamic modeling of complicated network structures by combining a genetic algorithm and the S-system (Kikuchi et al., 2003). However, this method also requires the pre-definition of mathematical expressions such as differential equations for a target phenomenon.

A method to automatically create computer programs using genetic programming was proposed by Koza (1992); it is an evolutionary algorithm that applies Darwinian concepts such as selection, mutation, and crossover. The method succeeded in the automatic creation of parameterized topologies such as the design of electric circuits and system controllers (Koza, 2002). Genetic programming can produce a computer program or a mathematical expression that has not been pre-specified by the human observer. The genetic algorithm, on the other hand, can only produce numerical coefficients for pre-existing equations. To represent computer programs, genetic programming employs tree structures as individuals instead of the fixed-length strings used by genetic algorithms. To search an equation topology including variables and numerical parameters, mathematical operations are assigned to the internal nodes, and variables or numerical constants are used for the leaves.
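The tree encoding described above can be illustrated with a minimal sketch. The nested-tuple representation, the operator set, and the names `OPS`, `evaluate`, and `size` are assumptions made for illustration; they are not taken from the paper.

```python
import operator

# Binary operators allowed at internal nodes (an illustrative function set).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": lambda a, b: a / b if b != 0 else 1.0,  # protected division
}

def evaluate(tree, env):
    """Evaluate a tree: (op, left, right) at nodes, names or floats at leaves."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]  # variable leaf, looked up by name
    return float(tree)    # numeric-constant leaf

def size(tree):
    """Count the nodes (operators plus leaves) in a tree."""
    if isinstance(tree, tuple):
        return 1 + size(tree[1]) + size(tree[2])
    return 1

# Michaelis-Menten form vmax * S / (km + S) with vmax = 2.0 and km = 0.5.
mm = ("/", ("*", 2.0, "S"), ("+", 0.5, "S"))
print(evaluate(mm, {"S": 0.5}))  # 1.0
print(size(mm))                  # 7
```

In this encoding, changing an operator or swapping a subtree alters the equation topology, whereas changing a float leaf alters only a numerical parameter.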

Many biological applications of genetic programming have been proposed. These include classification of lymphomas (Hong and Cho, 2004), medical data such as chest pain, Ljubljana breast cancer, dermatology, Wisconsin breast cancer, and pediatric adrenocortical tumors (Bojarczuk et al., 2004), nuclear magnetic resonance spectra (Gray et al., 1998), data mining of DNA chips (Langdon and Buxton, 2004), RNA motif detection (Hu, 2003), mineral identification (Ross et al., 2001), quantitative analysis of pyrolysis mass spectral data (Gilbert et al., 1997), expression-profiling of plants (Kell et al., 2001), extraction of a rule for diagnosing pulmonary embolism (Biesheuvel et al., 2004), searching for discrimination rules in protease proteolytic cleavage activity (Yang et al., 2003), and optimization for the batch bioprocessing of fermentation (Cheema et al., 2002). Koza et al. (2000, 2001, 2003) devised a method to predict metabolic networks involved in the phospholipid cycle and in the synthesis and degradation of ketone bodies. The metabolic networks were encoded as genetic programming individuals, and the topology of the networks and the kinetic parameters were discovered simultaneously. Differential equations for small gene regulatory networks represented by weighted networks were inferred using genetic programming (Sakamoto and Iba, 2001; Ando et al., 2002). However, no differential equations of real metabolic reactions have been reported to date.

We now present a method that uses genetic programming to predict biochemical equations from only time-course information. To improve prediction accuracy, we used the numeric mutation procedure of Evett and Fernandez (1998). As case studies, we predicted two equations of real enzyme-catalyzed reactions, adenylate kinase (Rizzi et al., 1997) and phosphofructokinase (Wright and Albe, 1994). In both cases, the equations were predicted with high accuracy: the mean error between given and simulation-predicted time-courses was 1.6 × 10⁻⁵% and 2.0 × 10⁻³%, respectively. Our experimental results indicate that our method can accurately predict biochemical equations.
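A minimal sketch of a numeric mutation step, in the spirit of the procedure attributed to Evett and Fernandez (1998), might look as follows. It assumes that each numeric constant is resampled uniformly from a window around its old value, and that the window half-width (a "temperature") is proportional to the error of the current best individual. The nested-tuple tree encoding, the function name, and the `scale` value are illustrative assumptions, not details confirmed by this paper.

```python
import random

def numeric_mutation(tree, best_error, scale=0.02, rng=random):
    """Return a copy of `tree` with every float constant resampled.

    Each constant c is redrawn uniformly from [c - T, c + T], where
    T = scale * best_error, so the search window narrows as the best
    individual improves. `scale` is an assumed tuning parameter.
    """
    temperature = scale * best_error
    if isinstance(tree, tuple):  # internal node: (op, left, right)
        op, left, right = tree
        return (op,
                numeric_mutation(left, best_error, scale, rng),
                numeric_mutation(right, best_error, scale, rng))
    if isinstance(tree, float):  # numeric-constant leaf
        return rng.uniform(tree - temperature, tree + temperature)
    return tree                  # variable leaf: left untouched

# Topology is preserved; only the constants 2.0 and 0.5 are perturbed.
tree = ("/", ("*", 2.0, "S"), ("+", 0.5, "S"))
mutated = numeric_mutation(tree, best_error=10.0, rng=random.Random(0))
print(mutated)
```

Because the operators and variables are left untouched, this step refines parameters without disturbing a topology that crossover and mutation have already found.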

Section snippets

Fundamental algorithm of genetic programming

Genetic programming is one of the learning algorithms informed by the evolutionary process. An outline of each genetic programming strategy is presented below; for details, see Koza (1992, 1994), Banzhaf et al. (1998), and Koza et al. (1999, 2000, 2003).
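For orientation, a generic genetic programming loop can be sketched as below. It is a textbook-style illustration with tournament selection, subtree crossover, subtree mutation, and elitism, and is not the configuration used by the authors. All function names, the sum-of-squares fitness, and the parameter values are assumptions.

```python
import random

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-12 else 1.0,  # protected division
}

def random_tree(rng, depth, variables):
    """Grow a random tree: operators at nodes, variables or constants at leaves."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(variables) if rng.random() < 0.5 else rng.uniform(-1.0, 1.0)
    return (rng.choice(list(OPS)),
            random_tree(rng, depth - 1, variables),
            random_tree(rng, depth - 1, variables))

def evaluate(tree, env):
    if isinstance(tree, tuple):
        return OPS[tree[0]](evaluate(tree[1], env), evaluate(tree[2], env))
    return env[tree] if isinstance(tree, str) else tree

def paths(tree, prefix=()):
    """List the index path to every node, root first."""
    found = [prefix]
    if isinstance(tree, tuple):
        found += paths(tree[1], prefix + (1,)) + paths(tree[2], prefix + (2,))
    return found

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, subtree):
    """Return a copy of `tree` with the node at `path` replaced by `subtree`."""
    if not path:
        return subtree
    parts = list(tree)
    parts[path[0]] = put(tree[path[0]], path[1:], subtree)
    return tuple(parts)

def fitness(tree, samples):
    """Sum of squared errors over (env, target) pairs; lower is better."""
    try:
        total = sum((evaluate(tree, env) - y) ** 2 for env, y in samples)
    except OverflowError:
        return float("inf")
    return total if total == total else float("inf")  # map NaN to inf

def tournament(rng, scored, k=3):
    """Pick the fittest of k randomly drawn (fitness, tree) pairs."""
    return min(rng.sample(scored, k), key=lambda pair: pair[0])[1]

def evolve(samples, variables, rng, pop_size=60, generations=20, max_nodes=40):
    population = [random_tree(rng, 3, variables) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        scored = sorted(((fitness(t, samples), t) for t in population),
                        key=lambda pair: pair[0])
        history.append(scored[0][0])
        next_pop = [scored[0][1]]  # elitism: carry the best tree over unchanged
        while len(next_pop) < pop_size:
            a, b = tournament(rng, scored), tournament(rng, scored)
            # Subtree crossover: graft a random subtree of b into a.
            child = put(a, rng.choice(paths(a)), get(b, rng.choice(paths(b))))
            if rng.random() < 0.1:  # subtree mutation
                child = put(child, rng.choice(paths(child)),
                            random_tree(rng, 2, variables))
            if len(paths(child)) > max_nodes:  # reject oversized offspring
                child = a
            next_pop.append(child)
        population = next_pop
    return min(population, key=lambda t: fitness(t, samples)), history

# Usage: fit samples of v = 2 S / (0.5 + S), a Michaelis-Menten-shaped target.
samples = [({"S": s / 10}, 2 * (s / 10) / (0.5 + s / 10)) for s in range(1, 11)]
best, history = evolve(samples, ["S"], random.Random(1))
print(best, history[-1])
```

This sketch scores a tree directly against target values. In the setting described in the paper, candidate equations are compared through given and simulation-predicted time-courses, which would replace the `fitness` function here.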

Experiment 1—reaction of adenylate kinase

As the first example, we predicted an equation for the metabolic adenylate kinase (EC 2.7.4.3) reaction, which allows the production of adenosine 5′-diphosphate (ADP) from adenosine 5′-monophosphate (AMP) and adenosine triphosphate (ATP). It has been modeled as a near-equilibrium reaction in the Saccharomyces cerevisiae mathematical model (Rizzi et al., 1997):

$$r_{\mathrm{ADK}} = r_{\mathrm{ADK}}^{\max}\,[\mathrm{ADP}]^{2}\left(1 - \frac{[\mathrm{ATP}][\mathrm{AMP}]}{K_{\mathrm{eq}}\,[\mathrm{ADP}]^{2}}\right),$$

where [ATP], [ADP], and [AMP] indicate concentrations, and r_ADK represents the flux to produce
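As a small worked example, the rate law can be evaluated numerically. The sketch assumes the near-equilibrium form r = r_max [ADP]² (1 − [ATP][AMP] / (K_eq [ADP]²)). The function name and all parameter values are made up for illustration and are not the values used in the paper or in Rizzi et al. (1997).

```python
def adk_rate(atp, adp, amp, r_max, k_eq):
    """Near-equilibrium rate: r_max * [ADP]^2 * (1 - [ATP][AMP] / (K_eq [ADP]^2))."""
    return r_max * adp ** 2 * (1.0 - (atp * amp) / (k_eq * adp ** 2))

# Illustrative (made-up) values: r_max = 3.0, K_eq = 0.5.
print(adk_rate(atp=1.0, adp=2.0, amp=1.0, r_max=3.0, k_eq=0.5))  # 6.0

# The rate vanishes when [ATP][AMP] / [ADP]^2 equals K_eq.
print(adk_rate(atp=1.0, adp=2.0, amp=2.0, r_max=3.0, k_eq=0.5))  # 0.0
```

The bracketed factor measures the distance from equilibrium, which is why the flux goes to zero when the mass-action ratio reaches K_eq.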

Discussion

The fitness curves decreased faster with genetic programming with numeric mutation than with plain genetic programming (Figs. 3 and 5). Searches performed with plain genetic programming became trapped in local minima. Table 2 summarizes the convergence rate, prediction error, and the number of nodes in the best-evaluated individual. We conducted 50 search trials in each experiment. Using numeric mutations, an equation with the correct topology was obtained in 16% (Experiment 1) and 2%

Conclusions

We illustrated a method that uses genetic programming to predict equations of biochemical reactions based on time-course information. Our numerical experimental results show that our method simultaneously and accurately predicted the equation topologies and numerical parameters. The errors were 1.6 × 10⁻⁵% in the prediction of enzyme reactions with adenylate kinase in the S. cerevisiae model, and 2.0 × 10⁻³% with phosphofructokinase in the D. discoideum model. In both sets of experiments, inclusion

Acknowledgements

We thank Bin Hu of the Institute for Advanced Biosciences, Keio University, for editing this manuscript. This work was supported by a grant from the Ministry of Education, Culture, Sports, Science and Technology, a grant-in-aid from the 21st Century Center of Excellence (COE) program of Keio University, "Understanding and control of life's function via systems biology", a grant from the New Energy and Industrial Technology Development Organization (NEDO) of the Ministry of Economy, Trade and

References (52)

  • P.J. Angeline. Advances in Genetic Programming 2 (1996)
  • W. Banzhaf et al. Genetic Programming: An Introduction (1998)
  • P. Baumann et al. The phosphofructokinase of Dictyostelium discoideum. Biochemistry (1968)
  • J.J. Cheema et al. Genetic programming assisted stochastic optimization strategies for optimization of glucose to gluconic acid fermentation. Biotechnol. Prog. (2002)
  • I. Christian et al. Investigating the influence of depth and degree of genotypic change on fitness in genetic programming
  • A. Cornish-Bowden. Fundamentals of Enzyme Kinetics (2004)
  • J. Eggermont et al. Stepwise adaptation of weights for symbolic regression with genetic programming
  • M. Evett et al. Numeric mutation improves the discovery of numeric constants in genetic programming
  • D.B. Fogel et al. Comparing genetic operators with Gaussian mutations in simulated evolutionary processes using linear systems. Biol. Cybern. (1990)
  • R.J. Gilbert et al. Genetic programming: a novel method for the quantitative analysis of pyrolysis mass spectral data. Anal. Chem. (1997)
  • D.E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning (1989)
  • H.F. Gray et al. Genetic programming for classification and feature selection: analysis of 1H nuclear magnetic resonance spectra from human brain tumour biopsies. NMR Biomed. (1998)
  • M. Hatakeyama et al. A computational model on the modulation of mitogen-activated protein kinase (MAPK) and Akt pathways in heregulin-induced ErbB signalling. Biochem. J. (2003)
  • J.H. Holland. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence (1975)
  • J.H. Hong et al. Lymphoma cancer classification using genetic programming with SNR features
  • Y.J. Hu. GPRM: a genetic programming approach to finding common RNA secondary structure elements. Nucleic Acids Res. (2003)