Elsevier

Neurocomputing

Volume 148, 19 January 2015, Pages 143-149
Discovery scientific laws by hybrid evolutionary model

https://doi.org/10.1016/j.neucom.2012.07.058

Abstract

Constructing a mathematical model is an important task in engineering applications and scientific research. Automatically discovering high-level knowledge, such as laws of natural science, in observed data is an important and difficult task in systematic research, and the authors have obtained some significant results on this problem. In this paper, high-level knowledge modelled by systems of ordinary differential equations (ODEs) is discovered in observed data routinely by a hybrid evolutionary algorithm called HEA-GP. Application examples are used to demonstrate the potential of HEA-GP. The results show that the dynamic models discovered automatically in observed data by computer can sometimes compare with the models discovered by humans. In addition, a prototype KDD Automatic System has been developed that can be used to discover models in observed data automatically.

Introduction

For centuries, scientists have attempted to identify and document analytical laws that underlie physical phenomena in nature [1]. The major impediment to scientific progress in many fields is the inability to make sense of the huge amounts of data that have been collected from a variety of sources. In the field of knowledge discovery in databases (KDD), there have been major efforts in developing automatic methods to find significant and interesting models (or patterns) in complex data and forecast the future based on those data. In general, however, the success of such efforts has been limited in the degree of automation during the process of KDD and in the level of the models discovered by data mining methods. Usually the goals of description and prediction are achieved by performing the following primary data mining tasks: summarization, classification, regression, clustering, dependency modelling, and change and deviation detection [2].

Recently, Schmidt and Lipson [1] demonstrated the discovery of physical laws, from scratch, directly from experimentally captured data with the use of a computational search. They used their approach to detect nonlinear energy conservation laws, Newtonian force laws, geometric invariants, and system manifolds in various synthetic and physically implemented systems without prior knowledge about physics, kinematics, or geometry. Ngan et al. [3] used grammar-based genetic programming for data mining of medical knowledge. In contrast to the methods and models mentioned above, our research focuses on discovering high-level knowledge in complex data modelled by complicated functions and systems of ordinary differential equations (ODEs). Cao and Kang [4] proposed a two-level evolutionary modelling algorithm to approach this task, and numerical experiments were carried out to test the algorithm's effectiveness. In this paper, we run a Hybrid Evolutionary Algorithm (HEA) with GP on time series applications to demonstrate its potential for automatically discovering dynamic models in observed data.

The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 describes HEA with GP. Section 4 gives two examples of the application of HEA-GP. Section 5 presents the discussion, and Section 6 gives some conclusions.

Problem statement

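Suppose a dynamic system can be described by $n$ interrelated functions $x_1(t), x_2(t), \dots, x_n(t)$, and a series of observed data collected at the times $t_i = t_0 + i\Delta t$ $(i = 0, 1, 2, \dots, m-1)$ can be written in the following form:

$$
X = \begin{bmatrix}
x_1(0) & x_2(0) & \cdots & x_n(0) \\
x_1(t_1) & x_2(t_1) & \cdots & x_n(t_1) \\
\vdots & \vdots & & \vdots \\
x_1(t_{m-1}) & x_2(t_{m-1}) & \cdots & x_n(t_{m-1})
\end{bmatrix}
$$

where $t_0$ denotes the starting time (here $t_0 = 0$), $\Delta t$ denotes the interval between two observations, and $x_j(t_i)$ $(j = 1, 2, \dots, n)$ denotes the observed value of variable $x_j$ at the time $t_i$.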
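To make the layout of the observation matrix concrete, the following is a minimal sketch that samples a known ODE system at the times t_i = t_0 + iΔt and stores one row per observation and one column per variable. The harmonic oscillator, the RK4 integrator, and the names `sample_system`, `harmonic`, and `model_error` are illustrative assumptions, not taken from the paper. The RMS mismatch at the end is one common way to score a candidate model against such data; the snippet above does not state which fitness measure HEA-GP actually uses.

```python
import math

def sample_system(f, x0, t0=0.0, dt=0.1, m=50):
    """Integrate dx/dt = f(t, x) with classical RK4.

    Returns (t, X): t[i] = t0 + i*dt and X[i] = [x_1(t_i), ..., x_n(t_i)],
    so X has m rows and n columns like the observation matrix.
    """
    def step(x, k, h):
        # x + h*k, element by element
        return [xi + h * ki for xi, ki in zip(x, k)]

    t = [t0 + i * dt for i in range(m)]
    X = [list(map(float, x0))]
    for i in range(m - 1):
        x = X[-1]
        k1 = f(t[i], x)
        k2 = f(t[i] + dt / 2, step(x, k1, dt / 2))
        k3 = f(t[i] + dt / 2, step(x, k2, dt / 2))
        k4 = f(t[i] + dt, step(x, k3, dt))
        X.append([xi + dt / 6 * (a + 2 * b + 2 * c + d)
                  for xi, a, b, c, d in zip(x, k1, k2, k3, k4)])
    return t, X

def harmonic(t, x):
    # Hypothetical test system (not from the paper): x1' = x2, x2' = -x1.
    return [x[1], -x[0]]

def model_error(candidate, X, t0, dt):
    """RMS mismatch between a candidate model's trajectory and the data X.

    Assumed scoring rule for illustration only.
    """
    _, X_hat = sample_system(candidate, X[0], t0, dt, len(X))
    sq = [(a - b) ** 2
          for row, row_hat in zip(X, X_hat)
          for a, b in zip(row, row_hat)]
    return math.sqrt(sum(sq) / len(sq))

# Build a 50 x 2 observation matrix with t0 = 0 and dt = 0.1.
t, X = sample_system(harmonic, [1.0, 0.0], t0=0.0, dt=0.1, m=50)
print(len(X), len(X[0]))
```

A model search would call something like `model_error` on each candidate right-hand side and prefer candidates with a smaller mismatch; how candidates are generated and evolved is left out here.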

If we denote $x(t) = [x_1(t), x_2(t), \dots, x_n(t)]$, f(t,x)=[f1(t,x

The general non-linear programming problem

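The general non-linear programming (NLP) problem can be expressed in the following form:

$$
\begin{aligned}
\min\ & f(X, Y) \\
\text{s.t.}\ & h_i(X, Y) = 0, \quad i = 1, 2, \dots, k_1, \\
& g_j(X, Y) \le 0, \quad j = k_1 + 1, k_1 + 2, \dots, k, \\
& X^{lower} \le X \le X^{upper}, \quad Y^{lower} \le Y \le Y^{upper}
\end{aligned}
$$

where $X \in R^p$, $Y \in N^q$, and the objective function $f(X, Y)$, the equality constraints $h_i(X, Y)$ and the inequality constraints $g_j(X, Y)$ are usually nonlinear functions which include both the real variable vector $X$ and the integer variable vector $Y$.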

Denoting the domain: $D = \{(X, Y) \mid X^{lower} \le X \le X^{upper},\ Y^{lower} \le Y \le Y^{upper}\}$
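As a rough illustration of this formulation, the sketch below checks membership in the box domain D (with integer-valued Y) and measures how far a point is from satisfying the equality and inequality constraints. The function names, the tolerance `tol`, and the toy problem are assumptions made for this example; this is not the constraint-handling scheme of the paper, whose definition is cut off in the text that follows.

```python
def in_domain(X, Y, X_lower, X_upper, Y_lower, Y_upper):
    """True if (X, Y) lies in the box D and every Y component is an integer."""
    x_ok = all(lo <= x <= hi for x, lo, hi in zip(X, X_lower, X_upper))
    y_ok = all(lo <= y <= hi and float(y).is_integer()
               for y, lo, hi in zip(Y, Y_lower, Y_upper))
    return x_ok and y_ok

def violation(X, Y, eqs, ineqs, tol=1e-6):
    """Total constraint violation at (X, Y).

    eqs   : functions h_i with the requirement h_i(X, Y) = 0 (within tol)
    ineqs : functions g_j with the requirement g_j(X, Y) <= 0
    Returns 0.0 exactly when all constraints are satisfied.
    """
    v = sum(max(0.0, abs(h(X, Y)) - tol) for h in eqs)
    v += sum(max(0.0, g(X, Y)) for g in ineqs)
    return v

# Hypothetical toy problem: one real variable, one integer variable.
def f(X, Y):
    return (X[0] - 1.5) ** 2 + (Y[0] - 2) ** 2

ineqs = [lambda X, Y: X[0] + Y[0] - 3.0]   # g(X, Y) = x + y - 3 <= 0
bounds = ([0.0], [3.0], [0], [3])          # X_lower, X_upper, Y_lower, Y_upper

print(in_domain([1.0], [2], *bounds), violation([1.0], [2], [], ineqs))
print(in_domain([2.0], [2], *bounds), violation([2.0], [2], [], ineqs))
```

Keeping the box check separate from the violation measure mirrors the split between the domain D and the functional constraints; a search procedure could, for example, compare candidates first by violation and then by objective value.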

We introduce the concept of a

The application of HEA-GP

In this section, we provide two examples to illustrate the process of identifying models by genetic programming.

Discussion

In view of the drawbacks of most available methods for modelling complex systems, we consider using HEA with GP to approach the modelling problem. That is, when only limited information about a system is known, we partly replace human intelligence with computational intelligence in some steps of traditional modelling, including the development of assumptions and the construction and calculation of a model, in order to complete the whole modelling task.

The results

Conclusions

We have discussed how genetic programming can be used for generating models of economic processes in a data-driven manner. A framework has been proposed within which the development of an economic model can be formulated as an HEA-GP search. The advantage of the method is that various assumptions regarding model structure can be relaxed, letting the data speak for themselves. The proposed framework follows the modelling approach familiar to economists, where they can specify definitions,

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 71101096 and 61172165), the Natural Science Foundation of Guangdong Province (Nos. S201101000849, S2011010003890 and S2012010008540), Shenzhen Basic Research Project for Development of Science and Technology (Nos. JC201006020807A and JC201105190819A), Research Project of SZIIT (Nos. CXTD2-005 and BC2009014).

References (15)

  • D. Agnelli et al.

    Image classification: an evolutionary approach

    Pattern Recognit. Lett.

    (2002)
  • M. Schmidt et al.

    Distilling freeform natural laws from experimental data

    Science

    (2009)
  • U.M. Fayyad et al.

    Advances in Knowledge Discovery and Data Mining

    (1996)
  • P.S. Ngan, M.L. Wong, K.S. Leung, J.C.Y. Cheng, Using grammar based genetic programming for data mining of medical...
  • H.Q. Cao, L.S. Kang, Z. Michalewicz, Y.P. Chen, A two-level evolutionary algorithm for modeling system of ordinary...
  • M. Mitchell

    An Introduction to Genetic Algorithms

    (1996)
  • J.R. Koza

    Genetic Programming: On the Programming of Computers by Means of Natural Selection

    (1992)
There are more references available in the full text version of this article.

Fei Tang received his M.S. degree from Shenzhen University, China, in 2007. Currently, he is an assistant professor at Shenzhen Institute of Information Technology. His research interests include artificial intelligence and image processing.

Sanfeng Chen was born in 1979 and received the Ph.D. degree from the University of Science and Technology of China, in 2008. She is an assistant professor at Shenzhen Key Lab of Visual Media Processing and Transmission, Shenzhen, China and Shenzhen Institute of Information Technology, Shenzhen, China. Her research interests include artificial intelligence, signal processing and pattern recognition.

Xu Tan was born in 1981 and received his Ph.D. degree in Management Science and Engineering from National University of Defense Technology, China, in 2009. Currently, he is an associate professor at Shenzhen Institute of Information Technology. His research interests include granular computing, intelligent decision and knowledge discovery.

Tao Hu received the Ph.D. degree from Huazhong University of Science and Technology, in 2009. He is an assistant professor at Shenzhen Key Lab of Visual Media Processing and Transmission, Shenzhen Institute of Information Technology, Shenzhen, China. His research interests include machine vision and image processing.

Guangming Lin was born in 1963 and received the Ph.D. degree from the University of New South Wales, in 2003. He is a professor at Shenzhen Key Lab of Visual Media Processing and Transmission, Shenzhen, China and Shenzhen Institute of Information Technology, Shenzhen, China. His research interests include evolutionary algorithms, parallel computing and optimization.

Zuo Kang received the Ph.D. degree from Wuhan University, China, in 2006. He is a professor at Wuhan University. His research interests include evolutionary computation and parallel computing.
