Generalisation in Genetic Programming for Symbolic Regression: Challenges and Future Directions

Part of the book series: Women in Engineering and Science ((WES))

Abstract

Symbolic regression is a regression analysis technique that finds the structure and the coefficients of a regression model simultaneously. Genetic programming is a leading technique for symbolic regression because it requires no predefined model structure and offers a flexible representation. However, genetic-programming-based symbolic regression (GPSR) often generalises poorly, which hampers its application to scientific and industrial modelling. In recent years, many researchers have recognised this issue and devoted considerable effort to enhancing the generalisation ability of GPSR. This chapter first introduces generalisation in GPSR and then reviews state-of-the-art contributions. It also analyses the challenges in the area and highlights a number of future directions for interested researchers.
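
As a rough illustration of what generalisation means here, the following minimal sketch compares two candidate models on training inputs and on unseen inputs. It is not taken from the chapter: the target function, the tuple-based tree encoding, and the names `evaluate`, `mse`, `compact` and `shortcut` are all assumptions chosen for the example. Each tree encodes both a structure (the operators) and coefficients (the constants), which is what a symbolic regression method searches over.

```python
# Hypothetical target function, assumed only for this illustration.
def target(x):
    return x * x + x

# A candidate model is an expression tree: the string "x", a numeric
# constant, or a tuple (operator, left_subtree, right_subtree).
def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return float(tree)
    op, left, right = tree
    a, b = evaluate(left, x), evaluate(right, x)
    if op == "add":
        return a + b
    if op == "sub":
        return a - b
    if op == "mul":
        return a * b
    raise ValueError(f"unknown operator: {op}")

# Mean squared error of a tree against the target on the given inputs.
def mse(tree, xs):
    return sum((evaluate(tree, x) - target(x)) ** 2 for x in xs) / len(xs)

train_xs = [0.0, 1.0]  # inputs available during the search
test_xs = [2.0, 3.0]   # unseen inputs, used only to measure generalisation

compact = ("add", ("mul", "x", "x"), "x")  # x*x + x, same structure as the target
shortcut = ("mul", 2.0, "x")               # 2*x, agrees with the target at x=0 and x=1 only

for name, tree in [("compact", compact), ("shortcut", shortcut)]:
    print(name, "train MSE:", mse(tree, train_xs), "test MSE:", mse(tree, test_xs))
```

Both trees have zero training error, so a fitness measure based on training error alone cannot tell them apart, yet only `compact` also fits the unseen inputs. That gap between training and test error is the generalisation problem the chapter is concerned with.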

The authors are with the Evolutionary Computation Research Group at the School of Engineering and Computer Science, Victoria University of Wellington.

Correspondence to Bing Xue.

© 2022 Springer Nature Switzerland AG

Cite this chapter

Chen, Q., Xue, B. (2022). Generalisation in Genetic Programming for Symbolic Regression: Challenges and Future Directions. In: Smith, A.E. (eds) Women in Computational Intelligence. Women in Engineering and Science. Springer, Cham. https://doi.org/10.1007/978-3-030-79092-9_13
