Discovery scientific laws by hybrid evolutionary model
Introduction
For centuries, scientists have attempted to identify and document analytical laws that underlie physical phenomena in nature [1]. The major impediment to scientific progress in many fields is the inability to make sense of the huge amounts of data that have been collected from a variety of sources. In the field of knowledge discovery in databases (KDD), there have been major efforts in developing automatic methods to find significant and interesting models (or patterns) in complex data and forecast the future based on those data. In general, however, the success of such efforts has been limited in the degree of automation during the process of KDD and in the level of the models discovered by data mining methods. Usually the goals of description and prediction are achieved by performing the following primary data mining tasks: summarization, classification, regression, clustering, dependency modelling, and change and deviation detection [2].
Recently, Schmidt and Lipson [1] demonstrated the discovery of physical laws, from scratch, directly from experimentally captured data using a computational search. They used this approach to detect nonlinear energy conservation laws, Newtonian force laws, geometric invariants, and system manifolds in various synthetic and physically implemented systems without prior knowledge of physics, kinematics, or geometry. Ngan et al. [3] used grammar-based genetic programming (GP) for data mining of medical knowledge. In contrast to the methods and models mentioned above, our research focuses on discovering high-level knowledge in complex data modelled by complicated functions and systems of ordinary differential equations (ODEs). Cao and Kang [4] proposed a two-level evolutionary modelling algorithm to approach this task. Numerical experiments were carried out to test the algorithm's effectiveness. In this paper, we apply a Hybrid Evolutionary Algorithm (HEA) with GP to time series applications to demonstrate its potential for automatically discovering dynamic models in observed data.
The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 describes HEA with GP. Section 4 gives two examples of the application of HEA-GP. Section 5 presents the discussion and Section 6 gives some conclusions.
Problem statement
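Suppose a dynamic system can be described by n interrelated functions $x_1(t), x_2(t), \ldots, x_n(t)$, and a series of observed data collected at the times $t_i = t_0 + i\,\Delta t$ $(i = 0, 1, \ldots, m-1)$ can be written in the following form:

$$X = \begin{pmatrix} x_1(t_0) & x_2(t_0) & \cdots & x_n(t_0) \\ x_1(t_1) & x_2(t_1) & \cdots & x_n(t_1) \\ \vdots & \vdots & & \vdots \\ x_1(t_{m-1}) & x_2(t_{m-1}) & \cdots & x_n(t_{m-1}) \end{pmatrix}$$

where $t_0$ denotes the starting time, $\Delta t$ denotes the interval between two observations, and $x_j(t_i)$ denotes the observed value of variable $x_j$ at the time $t_i$.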
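As a minimal sketch of this data layout, the following Python snippet samples a small system at equally spaced times and stores one observation per row. The function name `sample_observations`, the two example variables, and the number of rows `m` are assumptions chosen for illustration; they do not come from the original formulation.

```python
import math

def sample_observations(funcs, t0, dt, m):
    """Return an m x n table whose row i holds x_j(t_i), with t_i = t0 + i*dt."""
    times = [t0 + i * dt for i in range(m)]
    return [[f(t) for f in funcs] for t in times]

# Hypothetical system with n = 2 variables: x_1(t) = cos(t), x_2(t) = sin(t).
X = sample_observations([math.cos, math.sin], t0=0.0, dt=0.1, m=5)
```

Each column of `X` is then the time series of one variable, which is the form a model-identification procedure would consume.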
If we denote ,
The general non-linear programming problem
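The general non-linear programming (NLP) problem can be expressed in the following form:

$$\begin{aligned} \min\ & f(X, Y) \\ \text{s.t.}\ & h_i(X, Y) = 0, \quad i = 1, \ldots, k_1, \\ & g_j(X, Y) \le 0, \quad j = k_1 + 1, \ldots, k, \end{aligned}$$

where $X \in \mathbb{R}^p$, $Y \in \mathbb{N}^q$, and the objective function $f(X, Y)$, the equality constraints $h_i(X, Y)$ and the inequality constraints $g_j(X, Y)$ are usually nonlinear functions which include both real variable vector X and integer variable vector Y.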
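To make the structure of such a problem concrete, the sketch below encodes a toy mixed-integer NLP with one real variable and one integer variable and checks whether a candidate point satisfies the constraints. The names `f`, `equalities`, `inequalities`, and `is_feasible`, the tolerance, and the toy problem itself are illustrative assumptions, not the formulation or solver used in the paper.

```python
def is_feasible(x, y, equalities, inequalities, tol=1e-9):
    """True if every h(x, y) is zero within tol and every g(x, y) is <= tol."""
    return (all(abs(h(x, y)) <= tol for h in equalities)
            and all(g(x, y) <= tol for g in inequalities))

# Toy problem (assumed for illustration): real vector x, integer vector y.
def f(x, y):
    """Nonlinear objective to be minimized."""
    return (x[0] - 1.0) ** 2 + (y[0] - 2) ** 2

equalities = [lambda x, y: x[0] + y[0] - 3.0]   # h(x, y) = 0
inequalities = [lambda x, y: -x[0]]             # g(x, y) <= 0, i.e. x[0] >= 0

candidate_ok = is_feasible([1.0], [2], equalities, inequalities)
candidate_bad = is_feasible([-1.0], [4], equalities, inequalities)
```

An evolutionary search over such a problem would typically evaluate `f` only on feasible candidates or penalize infeasible ones; which strategy applies here is not stated in the excerpt.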
Denoting the domain:
We introduce the concept of a
The application of HEA-GP
In this section, we provide two examples to illustrate the process of identifying models by genetic programming.
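The excerpt does not state how a candidate model is scored, so the following is only a sketch of one common choice in GP-based ODE identification: integrate the candidate right-hand side numerically and sum the squared deviations from the observed series. The functions `rk4_trajectory` and `fitness`, the classical Runge-Kutta integrator, and the synthetic data are all assumptions made for illustration.

```python
import math

def rk4_trajectory(rhs, x0, t0, dt, steps):
    """Integrate dx/dt = rhs(t, x) with classical RK4; returns steps + 1 states."""
    t, x = t0, list(x0)
    traj = [list(x)]
    for _ in range(steps):
        k1 = rhs(t, x)
        k2 = rhs(t + dt / 2, [xi + dt / 2 * k for xi, k in zip(x, k1)])
        k3 = rhs(t + dt / 2, [xi + dt / 2 * k for xi, k in zip(x, k2)])
        k4 = rhs(t + dt, [xi + dt * k for xi, k in zip(x, k3)])
        x = [xi + dt / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        t += dt
        traj.append(x)
    return traj

def fitness(rhs, observed, t0, dt):
    """Sum of squared errors between simulated and observed values (lower is better)."""
    sim = rk4_trajectory(rhs, observed[0], t0, dt, len(observed) - 1)
    return sum((s - o) ** 2
               for srow, orow in zip(sim, observed)
               for s, o in zip(srow, orow))

# Synthetic observations generated from dx/dt = -x with x(0) = 1.
observed = [[math.exp(-0.1 * i)] for i in range(11)]
good = fitness(lambda t, x: [-x[0]], observed, 0.0, 0.1)  # correct structure
bad = fitness(lambda t, x: [x[0]], observed, 0.0, 0.1)    # wrong sign
```

In a GP setting, `rhs` would be built from an evolved expression tree, and candidates with smaller fitness values would be favoured during selection.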
Discussion
Given the drawbacks of most available methods for modelling complex systems, we consider using HEA with GP to approach the modelling problem. That is, when only limited information about a system is known, computational intelligence partly replaces human intelligence in some steps of traditional modelling, including the development of assumptions and the construction and calculation of a model, to complete the whole modelling task.
The results
Conclusions
We have discussed how genetic programming can be used to generate models of economic processes in a data-driven manner. A framework has been proposed within which the development of an economic model can be formulated as an HEA-GP search. The advantage of the method is that various assumptions regarding model structure can be relaxed, letting the data speak for themselves. The proposed framework follows the modelling approach familiar to economists, where they can specify definitions,
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 71101096 and 61172165), the Natural Science Foundation of Guangdong Province (Nos. S201101000849, S2011010003890 and S2012010008540), Shenzhen Basic Research Project for Development of Science and Technology (Nos. JC201006020807A and JC201105190819A), Research Project of SZIIT (Nos. CXTD2-005 and BC2009014).
References (15)
- et al., Image classification: an evolutionary approach, Pattern Recognit. Lett. (2002)
- et al., Distilling free-form natural laws from experimental data, Science (2009)
- et al., Advances in Knowledge Discovery and Data Mining (1996)
- P.S. Ngan, M.L. Wong, K.S. Leung, J.C.Y. Cheng, Using grammar based genetic programming for data mining of medical...
- H.Q. Cao, L.S. Kang, Z. Michalewicz, Y.P. Chen, A two-level evolutionary algorithm for modeling system of ordinary...
- An Introduction to Genetic Algorithms (1996)
- Genetic Programming: On the Programming of Computers by Means of Natural Selection (1992)
Fei Tang received his M.S. degree from Shenzhen University, China, in 2007. Currently, he is an assistant professor at Shenzhen Institute of Information Technology. His research interests include artificial intelligence and image processing.
Sanfeng Chen was born in 1979 and received the Ph.D. degree from the University of Science and Technology of China, in 2008. She is an assistant professor at Shenzhen Key Lab of Visual Media Processing and Transmission, Shenzhen, China and Shenzhen Institute of Information Technology, Shenzhen, China. Her research interests include artificial intelligence, signal processing and pattern recognition.
Xu Tan was born in 1981 and received his Ph.D. degree in Management Science and Engineering from National University of Defense Technology, China, in 2009. Currently, he is an associate professor at Shenzhen Institute of Information Technology. His research interests include granular computing, intelligent decision and knowledge discovery.
Tao Hu received the Ph.D. degree from Huazhong University of Science and Technology, in 2009. He is an assistant professor at Shenzhen Key Lab of Visual Media Processing and Transmission, Shenzhen Institute of Information Technology, Shenzhen, China. His research interests include machine vision and image processing.
Guangming Lin was born in 1963 and received the Ph.D. degree from the University of New South Wales, in 2003. He is a professor at Shenzhen Key Lab of Visual Media Processing and Transmission, Shenzhen, China and Shenzhen Institute of Information Technology, Shenzhen, China. His research interests include evolutionary algorithms, parallel computing and optimization.
Zuo Kang received the Ph.D. degree from Wu Han University, China, in 2006. He is a professor at Wu Han University. His research interests include evolutionary computation and parallel computing.