Reverse engineering of biochemical equations from time-course data by means of genetic programming
Introduction
Computational experiments are useful for gaining an understanding of many properties of basic biological systems. In cases where no parameters are available for in silico modeling, techniques to predict parameters are useful. Earlier studies (Gilman and Ross, 1995, Park et al., 1997, Mendes and Kell, 1998, Rodriguez-Acosta et al., 1999, Pinchuk et al., 2000, Hatakeyama et al., 2003, Moles et al., 2003) focused on parameter estimations based on pre-existing human-supplied equations using optimization methods such as genetic algorithms (Holland, 1975, Goldberg, 1989).
To construct a biochemical equation such as the Michaelis–Menten equation, the reaction mechanism must be known and in vitro experimental measurements are required to determine parameters. However, computational models from kinetic equations and parameters measured in vitro may not always represent the in vivo phenomena. Therefore, it is necessary to develop a method to dynamically predict equations that represent the biochemical processes given their time-course measured in vivo.
We have proposed a technique for the dynamic modeling of complicated network structures by combining a genetic algorithm and the S-system (Kikuchi et al., 2003). However, this method also requires the pre-definition of mathematical expressions such as differential equations for a target phenomenon.
A method to automatically create computer programs using genetic programming was proposed by Koza (1992); it is an evolutionary algorithm that applies Darwinian concepts such as selection, mutation, and crossover. The method succeeded in the automatic creation of parameterized topologies such as the design of electric circuits and system controllers (Koza, 2002). Genetic programming can produce a computer program or a mathematical expression that has not been pre-specified by the human observer. A genetic algorithm, on the other hand, can only produce numerical coefficients for pre-existing equations. To represent computer programs, genetic programming employs tree structures as individuals instead of the fixed-length strings used by genetic algorithms. To search for an equation topology that includes variables and numerical parameters, mathematical operations are assigned to the internal nodes, and variables or numerical constants are used for the leaves.
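The tree encoding described above can be sketched in a few lines of Python. This is only an illustration, not the representation used in the paper: the nested-tuple layout, the `evaluate` and `count_nodes` helpers, the protected division, and the constants 2.0 and 0.5 are all assumptions chosen for the example. The expression shown is a Michaelis–Menten rate law, v = Vmax·S / (Km + S).

```python
import operator

def protected_div(a, b):
    # Assumption: return 1.0 on a near-zero denominator, a common GP convention.
    return a / b if abs(b) > 1e-12 else 1.0

# Internal nodes carry a mathematical operation.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": protected_div,
}

def evaluate(tree, env):
    """Evaluate a tree: tuples are (op, left, right), str is a variable, float is a constant."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]          # leaf: variable looked up by name
    return float(tree)            # leaf: numerical constant

def count_nodes(tree):
    """Count internal nodes and leaves, a usual measure of individual size."""
    if isinstance(tree, tuple):
        return 1 + count_nodes(tree[1]) + count_nodes(tree[2])
    return 1

# Michaelis-Menten rate law with illustrative constants Vmax = 2.0, Km = 0.5.
mm_tree = ("/", ("*", 2.0, "S"), ("+", 0.5, "S"))
rate = evaluate(mm_tree, {"S": 1.5})
```

In this layout both the topology (which operations appear where) and the numerical parameters (the constant leaves) live in the same structure, which is why a single search can vary them together.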
Many biological applications of genetic programming have been proposed. These include the classification of lymphomas (Hong and Cho, 2004), medical data such as chest pain, Ljubljana breast cancer, dermatology, Wisconsin breast cancer, and pediatric adrenocortical tumors (Bojarczuk et al., 2004), and nuclear magnetic resonance spectra (Gray et al., 1998), as well as data mining of DNA chips (Langdon and Buxton, 2004), RNA motif detection (Hu, 2003), mineral identification (Ross et al., 2001), quantitative analysis of pyrolysis mass spectral data (Gilbert et al., 1997), expression profiling of plants (Kell et al., 2001), extraction of a rule for diagnosing pulmonary embolism (Biesheuvel et al., 2004), searching for discrimination rules in protease proteolytic cleavage activity (Yang et al., 2003), and optimization of batch fermentation bioprocessing (Cheema et al., 2002). Koza et al. (2000, 2001, 2003) devised a method to predict metabolic networks involved in the phospholipid cycle and in the synthesis and degradation of ketone bodies. The metabolic networks were encoded as genetic programming individuals, and the method simultaneously discovered the network topology and the kinetic parameters. Differential equations for small gene regulatory networks represented by weighted networks were inferred using genetic programming (Sakamoto and Iba, 2001, Ando et al., 2002). However, the inference of differential equations for real metabolic reactions has not been reported to date.
We now present a method that uses genetic programming to predict biochemical equations from only time-course information. To improve prediction accuracy, we used the numeric mutation procedure of Evett and Fernandez (1998). As case studies, we predicted two equations of real enzyme-catalyzed reactions, adenylate kinase (Rizzi et al., 1997) and phosphofructokinase (Wright and Albe, 1994). In both cases, the equations were predicted with high accuracy: the mean error between given and simulation-predicted time-courses was 1.6 × 10⁻⁵% and 2.0 × 10⁻³%, respectively. Our experimental results indicate that our method can accurately predict biochemical equations.
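As a rough illustration of the two ingredients named above, the following Python sketch shows one way a numeric mutation step and a mean percentage error between time-courses could look. It is loosely modeled on the idea of Evett and Fernandez (1998) of resampling numeric constants within a window around their current values; the nested-tuple tree layout, the function names, the `temperature` parameter, and the exact error formula are assumptions made for this example and are not taken from the paper.

```python
import random

def numeric_mutation(tree, temperature, rng):
    """Return a copy of the tree with every numeric constant resampled.

    Trees are nested tuples (op, left, right); str leaves are variables and
    float leaves are constants. Each constant c is replaced by a value drawn
    uniformly from [c - temperature, c + temperature] (an assumed window rule).
    The topology and the variables are left unchanged.
    """
    if isinstance(tree, tuple):
        op, left, right = tree
        return (op,
                numeric_mutation(left, temperature, rng),
                numeric_mutation(right, temperature, rng))
    if isinstance(tree, (int, float)):
        return rng.uniform(tree - temperature, tree + temperature)
    return tree  # variable leaf

def mean_percent_error(given, predicted):
    """Mean absolute relative error, in percent, between two time-courses.

    Assumes both sequences have the same length and no given value is zero.
    """
    total = sum(abs(p - g) / abs(g) for g, p in zip(given, predicted))
    return 100.0 * total / len(given)

rng = random.Random(0)
tree = ("/", ("*", 2.0, "S"), ("+", 0.5, "S"))
mutated = numeric_mutation(tree, temperature=0.1, rng=rng)
```

A natural design choice, under these assumptions, is to shrink `temperature` as the best error in the population falls, so that the search moves from coarse exploration of the constants to fine tuning.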
Fundamental algorithm of genetic programming
Genetic programming is a learning algorithm inspired by the evolutionary process. An outline of each genetic programming strategy is presented below; for details, see Koza (1992, 1994), Banzhaf et al. (1998), and Koza et al. (1999, 2000, 2003).
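One of the standard genetic programming operators, subtree crossover, can be sketched as follows. This is a generic textbook-style illustration under stated assumptions, not the implementation used in this work: the nested-tuple tree layout and the helper names `subtrees`, `replace_at`, and `crossover` are hypothetical.

```python
import random

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):          # (op, left, right)
        for i in (1, 2):
            yield from subtrees(tree[i], path + (i,))

def replace_at(tree, path, new):
    """Return a copy of the tree with the node at the given path replaced by new."""
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(children)

def crossover(parent_a, parent_b, rng):
    """Replace a random node of parent_a with a random subtree of parent_b."""
    path_a, _ = rng.choice(list(subtrees(parent_a)))
    _, donor = rng.choice(list(subtrees(parent_b)))
    return replace_at(parent_a, path_a, donor)

a = ("+", "x", 1.0)
b = ("*", "y", 2.0)
child = crossover(a, b, random.Random(1))
```

Because trees are immutable tuples here, the parents are never modified, which keeps the operator safe to apply repeatedly within one generation.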
Experiment 1—reaction of adenylate kinase
As the first example, we predicted an equation for the metabolic adenylate kinase (EC 2.7.4.3) reaction, which allows the production of adenosine 5′-diphosphate (ADP) from adenosine 5′-monophosphate (AMP) and adenosine triphosphate (ATP). It has been modeled as a near-equilibrium reaction in the Saccharomyces cerevisiae mathematical model (Rizzi et al., 1997): where [ATP], [ADP], and [AMP] indicate concentrations, and rADK represents the flux to produce
Discussion
The fitness curves decreased faster for genetic programming with numeric mutation than for plain genetic programming (Fig. 3, Fig. 5). Searches performed with plain genetic programming became trapped in local minima. Table 2 summarizes the convergence rate, prediction error, and the number of nodes in the best-evaluated individual. We conducted 50 search trials in each experiment. Using numeric mutations, an equation with the correct topology was obtained in 16% (Experiment 1) and 2%
Conclusions
We illustrated a method that uses genetic programming to predict equations of biochemical reactions based on time-course information. Our numerical experimental results show that our method simultaneously and accurately predicted the equation topologies and numerical parameters. The errors were 1.6 × 10⁻⁵% in the prediction of enzyme reactions with adenylate kinase in the S. cerevisiae model, and 2.0 × 10⁻³% with phosphofructokinase in the D. discoideum model. In both sets of experiments, inclusion
Acknowledgements
We thank Bin Hu of the Institute for Advanced Biosciences, Keio University, for editing this manuscript. This work was supported by a grant from the Ministry of Education, Culture, Sports, Science and Technology, a grant-in-aid from the 21st Century Center of Excellence (COE) program of Keio University, "Understanding and control of life's function via systems biology", a grant from the New Energy and Industrial Technology Development Organization (NEDO) of the Ministry of Economy, Trade and
References (52)
- et al. (2002). Evolutionary modeling and inference of gene network. Inform. Sci.
- et al. (2004). Genetic programming outperformed multivariable logistic regression in diagnosing pulmonary embolism. J. Clin. Epidemiol.
- et al. (2004). A constrained-syntax genetic programming system for discovering classification rules: application to medical data sets. Artif. Intell. Med.
- et al. (1995). Genetic-algorithm selection of a regulatory structure that directs flux in a simple metabolic model. Biophys. J.
- et al. (2004). Optimal sampling time selection for parameter estimation in dynamic pathway modeling. BioSystems
- et al. (1999). It's a noisy business! Genetic regulation at the nanomolar scale. Trends Genet.
- et al. (1996). Model assessment and refinement using strategies from biochemical systems theory: application to metabolism in human red blood cells. J. Theor. Biol.
- et al. (1999). Nonlinear optimization of biotechnological processes by stochastic algorithms: application to the maximization of the production rate of ethanol, glycerol and carbohydrates by Saccharomyces cerevisiae. J. Biotechnol.
- et al. (1994). Carbohydrate metabolism in Dictyostelium discoideum I. Model construction. J. Theor. Biol.
- et al. (2003). Searching for discrimination rules in protease proteolytic cleavage activity using genetic programming with a min–max scoring function. BioSystems
- Advances in Genetic Programming 2
- Genetic Programming—An Introduction
- The phosphofructokinase of Dictyostelium discoideum. Biochemistry
- Genetic programming assisted stochastic optimization strategies for optimization of glucose to gluconic acid fermentation. Biotechnol. Prog.
- Investigating the influence of depth and degree of genotypic change on fitness in genetic programming
- Fundamentals of Enzyme Kinetics
- Stepwise adaptation of weights for symbolic regression with genetic programming
- Numeric mutation improves the discovery of numeric constants in genetic programming
- Comparing genetic operators with Gaussian mutations in simulated evolutionary processes using linear systems. Biol. Cybern.
- Genetic programming: a novel method for the quantitative analysis of pyrolysis mass spectral data. Anal. Chem.
- Genetic Algorithms in Search, Optimization and Machine Learning
- Genetic programming for classification and feature selection: analysis of 1H nuclear magnetic resonance spectra from human brain tumour biopsies. NMR Biomed.
- A computational model on the modulation of mitogen-activated protein kinase (MAPK) and Akt pathways in heregulin-induced ErbB signalling. Biochem. J.
- Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence
- Lymphoma cancer classification using genetic programming with SNR features
- GPRM: a genetic programming approach to finding common RNA secondary structure elements. Nucleic Acids Res.