
An improved semantic schema modeling for genetic programming

  • Foundations
Soft Computing

Abstract

Considerable research effort has recently been devoted to improving the power of genetic programming (GP) by incorporating semantic awareness. The semantics of a tree describes its behavior during execution, and a reliable theoretical model of GP should account for the behavior of individuals. Schema theory is a theoretical tool for modeling how the population is distributed over sets of similar points in the search space, each set being referred to as a schema. Prior schema theories define schemata at the syntactic level, which raises several major issues, and incorporating semantic awareness into schema theory has scarcely been studied in the literature. In this paper, we present an improved approach for developing the semantic schema in GP. The semantics of a tree is interpreted as the normalized mutual information between its output vector and the target. A new model of the semantic search space is introduced according to this definition of semantics, and the semantic building block space is presented as an intermediate space between the semantic and genotype spaces. An improved approach is provided for representing trees in the building block space. The presented schema is characterized by a Poisson distribution of trees in this space. The corresponding schema theory is developed to predict the expected number of individuals belonging to the proposed schema in the next generation. The suggested schema theory provides new insight into the relation between the syntactic and semantic spaces. It is shown to be efficient in comparison with the existing semantic schema in terms of both generalization and diversity preservation. Experimental results also indicate that the proposed schema is much less computationally expensive than comparable existing work.
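To make the semantics definition above concrete, the following is a minimal sketch of how a normalized mutual information score between a tree's output vector and the target could be computed. The abstract does not state which estimator or normalization the authors use, so everything here is an assumption for illustration: the equal-width histogram estimator, the normalization by the geometric mean of the two entropies, the default of 8 bins, and the function names are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def _bin_indices(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant vector: everything in one bin
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def _entropy(counts, n):
    """Shannon entropy (in nats) of a discrete distribution given by counts."""
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def normalized_mutual_information(output, target, bins=8):
    """Histogram estimate of I(O;T) / sqrt(H(O) * H(T)), clamped to [0, 1].

    Assumed normalization; the paper may define it differently.
    """
    if len(output) != len(target) or not output:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(output)
    o_bins = _bin_indices(output, bins)
    t_bins = _bin_indices(target, bins)
    h_o = _entropy(Counter(o_bins).values(), n)
    h_t = _entropy(Counter(t_bins).values(), n)
    h_joint = _entropy(Counter(zip(o_bins, t_bins)).values(), n)
    if h_o == 0.0 or h_t == 0.0:
        return 0.0  # a constant vector carries no information about the other
    mi = h_o + h_t - h_joint  # I(O;T) = H(O) + H(T) - H(O,T)
    return max(0.0, min(1.0, mi / math.sqrt(h_o * h_t)))


if __name__ == "__main__":
    target = [float(x) for x in range(16)]
    tree_output = [2.0 * t + 3.0 for t in target]  # hypothetical tree output
    print(normalized_mutual_information(tree_output, target))
```

Under this assumed definition, a tree whose output is an invertible transformation of the target scores close to 1 even when its numeric error is large, which is one reason an information-based notion of semantics can differ from an error-based fitness.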


Notes

  1. Schemata and subtrees are shown in infix notation so that they are syntactically comparable with each other.


Author information

Corresponding author

Correspondence to Mohammad Mehdi Ebadzadeh.

Ethics declarations

Conflict of interest

Authors Zahra Zojaji and Mohammad Mehdi Ebadzadeh declare that they have no conflict of interest regarding the publication of this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by A. Di Nola.


About this article


Cite this article

Zojaji, Z., Ebadzadeh, M.M. An improved semantic schema modeling for genetic programming. Soft Comput 22, 3237–3260 (2018). https://doi.org/10.1007/s00500-017-2781-6


  • DOI: https://doi.org/10.1007/s00500-017-2781-6
