Abstract
Feature construction is an effective way to overcome poor data representation in many tasks, such as high-dimensional symbolic regression. Genetic Programming (GP) is a good choice for feature construction because of its natural ability to explore the feature space and to detect and combine important features. However, little work has been devoted to enhancing the generalisation performance of GP for high-dimensional symbolic regression through feature construction. This work develops a new feature construction method, genetic programming with embedded feature construction (GPEFC), for high-dimensional symbolic regression. GPEFC keeps track of new, small, informative building blocks on the individuals with the best fitness gain and constructs new features from these building blocks. The newly constructed features augment the terminal set of GP dynamically. A series of experiments was conducted to investigate the learning ability and generalisation performance of GPEFC. The results show that GPEFC can evolve more compact models efficiently and has better learning ability and better generalisation performance than standard GP.
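The idea of augmenting the terminal set with building blocks taken from the individual with the best fitness gain can be illustrated with a minimal sketch. The abstract does not specify how building blocks are identified or how fitness gain is measured, so everything below is an assumption made for illustration: trees are nested tuples, fitness gain is the reduction in error from parent to child, a building block is any subtree of depth at most `max_depth`, and the names `evaluate`, `small_subtrees`, `augment_terminals` and `cf0` are hypothetical. It is not the authors' implementation.

```python
import operator

# Assumed representation: a tree is a terminal name (str) or a tuple (op, left, right).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, row, constructed):
    """Evaluate a tree on one data row (dict mapping original feature names to values)."""
    if isinstance(tree, str):
        if tree in constructed:
            # A constructed feature is evaluated by expanding its stored subtree.
            return evaluate(constructed[tree], row, constructed)
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row, constructed),
                   evaluate(right, row, constructed))


def depth(tree):
    """Depth of a tree; a bare terminal has depth 0."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(tree[1]), depth(tree[2]))


def small_subtrees(tree, max_depth=1):
    """Collect non-terminal subtrees of depth <= max_depth (assumed building blocks)."""
    if isinstance(tree, str):
        return []
    found = [tree] if depth(tree) <= max_depth else []
    return (found
            + small_subtrees(tree[1], max_depth)
            + small_subtrees(tree[2], max_depth))


def augment_terminals(offspring, terminals, constructed, max_depth=1):
    """Add building blocks from the offspring with the largest error reduction.

    offspring: list of (tree, parent_error, child_error) triples.
    terminals and constructed are updated in place; returns the new names.
    """
    best_tree, parent_err, child_err = max(offspring, key=lambda o: o[1] - o[2])
    if parent_err - child_err <= 0:
        return []  # no offspring improved on its parent
    added = []
    for block in small_subtrees(best_tree, max_depth):
        if block in constructed.values():
            continue  # skip blocks that are already constructed features
        name = f"cf{len(constructed)}"
        constructed[name] = block
        terminals.append(name)
        added.append(name)
    return added


# Usage with made-up error values.
terminals = ["x0", "x1", "x2"]
constructed = {}
offspring = [
    (("+", ("*", "x0", "x1"), "x2"), 5.0, 2.0),  # error reduced by 3.0
    (("-", "x1", "x2"), 4.0, 3.5),               # error reduced by 0.5
]
added = augment_terminals(offspring, terminals, constructed)
print(added, terminals)
print(evaluate(("+", "cf0", "x2"), {"x0": 2, "x1": 3, "x2": 1}, constructed))
```

In this sketch later trees can use `cf0` as an ordinary terminal, which is one simple way a constructed feature could let a model stay compact; how GPEFC actually selects and reuses building blocks is described in the paper itself.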
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Chen, Q., Zhang, M., Xue, B. (2017). Genetic Programming with Embedded Feature Construction for High-Dimensional Symbolic Regression. In: Leu, G., Singh, H., Elsayed, S. (eds) Intelligent and Evolutionary Systems. Proceedings in Adaptation, Learning and Optimization, vol 8. Springer, Cham. https://doi.org/10.1007/978-3-319-49049-6_7
DOI: https://doi.org/10.1007/978-3-319-49049-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49048-9
Online ISBN: 978-3-319-49049-6
eBook Packages: Engineering, Engineering (R0)