Multi-level diversity promotion strategies for Grammar-guided Genetic Programming
Introduction
Grammar-guided Genetic Programming (G3P) may be considered a natural extension of the original paradigm introduced by Koza in the late 1980s [1]. Unlike Genetic Programming (GP), G3P exploits a grammar to ensure that all the individuals in the population are syntactically valid.
While the embryonic idea of using a grammar may be attributed to Koza himself [2], the first line of research that can be sensibly labeled “grammar-guided” dates back to the mid-1990s, with Whigham’s Context-free Grammar Genetic Programming (CFGGP) [3] and Geyer-Schulz’s rule-based expert system [4]. Here, phenotypes are still trees, but they are derived according to an arbitrary context-free grammar, and the genetic operators are designed to preserve this representation.
Grammatical Evolution (GE), probably the best-known G3P approach, was proposed by Ryan, Collins, and O’Neill in 1998 [5]. It encodes individuals as genomes, i.e., unstructured, variable-length sequences of bits grouped in codons, which are eventually interpreted in the context of a user-supplied grammar. More specifically, the integer values of the codons are used to select among the possible derivations in a grammar expressed in Backus–Naur form. This procedure maps the individual, represented as a bit string, to the resulting string of the language defined by the grammar; GE is thus said to adopt an indirect representation of the individuals.
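The codon-based mapping can be illustrated with a minimal Python sketch. The toy grammar, the 8-bit codon size, the wrapping limit, and the function names `bits_to_codons` and `ge_map` are assumptions made for illustration only; they are not taken from the paper or from any GE library. As a simplification, the sketch consumes one codon at every expansion and always expands the leftmost non-terminal.

```python
from typing import Dict, List, Optional

# Hypothetical toy grammar: each non-terminal maps to its list of productions.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["*"]],
    "<var>": [["x"], ["y"]],
}


def bits_to_codons(bits: str, codon_size: int = 8) -> List[int]:
    """Group a bit string into fixed-size codons and read each one as an integer."""
    usable = len(bits) - len(bits) % codon_size  # drop a trailing partial codon
    return [int(bits[i:i + codon_size], 2) for i in range(0, usable, codon_size)]


def ge_map(
    codons: List[int],
    grammar: Dict[str, List[List[str]]],
    start: str = "<expr>",
    max_wraps: int = 2,
) -> Optional[str]:
    """Map codons to a sentence of the grammar, or return None if mapping fails.

    The leftmost non-terminal is expanded with the production whose index is
    `codon % number_of_productions`. Codons are reused (wrapped) at most
    `max_wraps` times; this limit is an assumption of the sketch.
    """
    if not codons:
        return None
    symbols = [start]
    used = 0
    limit = len(codons) * (max_wraps + 1)
    while True:
        # Position of the leftmost non-terminal, if any is left.
        idx = next((i for i, s in enumerate(symbols) if s in grammar), None)
        if idx is None:
            return " ".join(symbols)  # only terminals remain: mapping is complete
        if used >= limit:
            return None  # codons exhausted before the derivation terminated
        options = grammar[symbols[idx]]
        choice = options[codons[used % len(codons)] % len(options)]
        symbols[idx:idx + 1] = choice
        used += 1


print(ge_map([0, 1, 0, 1, 1, 1], GRAMMAR))  # x * y
print(ge_map([2, 3, 4, 5, 7, 9], GRAMMAR))  # x * y (different genotype, same phenotype)
print(ge_map([1, 1, 0, 1, 1, 1], GRAMMAR))  # y (only the first codon differs from the first call)
```

In this toy setting, the second call shows that distinct genotypes can map to the same phenotype, and the third shows that changing a single codon can change every subsequent derivation step.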
The main advantage of G3P is apparent: changing the base grammar allows the very same Evolutionary Algorithm (EA) to be used for virtually any problem without modification. On the other hand, the mapping procedure that characterizes GE has been shown to impair the evolution process [6]. The locality of a representation describes how well small genotypic changes caused by the application of the genetic operators correspond to small changes in the fitness of individuals. It has been widely acknowledged by scholars that “high-locality representations preserve the difficulty of a problem and phenotypically easy problems also remain genotypically easy. Using low-locality representations is equivalent to randomizing the search process” [7]. GE may exhibit remarkably low locality, as the change of a single bit in the genome is likely to affect many different derivations and, eventually, to result in a largely different fitness.
In recent years, the literature has reported several successful applications of GE [8], together with scholarly articles that scrutinize its peculiar evolutionary processes [6], [9], [10], [11]. Among these, a few proposals arose for different mappings that could address the limitations of the original GE mapping, e.g., πGE [12], SGE [13], WHGE [14]. A crucial problem that emerged from such studies is that the mapping may affect population diversity: in particular, the tendency to map different genotypes to the same phenotype may result in many identical individuals and, eventually, may lead to premature convergence [6], [15], [16]. While the lack of diversity is not necessarily a problem per se, it is frequently associated with poor performance. Diversity is not the end goal of an EA, but promoting it can be an important intermediate goal.
In this paper, we address this topic in depth. We first experimentally analyze four G3P approaches (CFGGP, GE, SGE, and WHGE; see Section 3) in order to understand if and how they are affected by lack of diversity. Then, we propose two general strategies for promoting diversity in G3P, one being an adaptation to G3P of an existing diversity promotion strategy, namely deterministic crowding [17]. Both strategies are independent of the problem being tackled and of the details of the fitness function. Since they depend neither on the structure of the solution nor on the actual grammar, the strategies are also independent of the genotype–phenotype mapping. Moreover, the two diversity promotion strategies are not influenced by the characteristics of the EA, such as the selection criteria or the genetic operators. They may be set to operate at a specific level, namely genotype, phenotype, or fitness. Beyond the goal of improving the effectiveness of G3P, and hence further extending its applicability, our study aims at better understanding how diversity promotion may affect EAs based on indirect representations.
We performed a thorough experimental analysis based on benchmark problems and G3P variants that differ in the representation of the individuals. We show that the considered G3P variants do suffer from a lack of diversity, and that diversity promotion always improves search effectiveness: regardless of the G3P variant being used and the problem being tackled, some of the diversity promotion strategies considered here lead to a better final best fitness (on average). The experimental results suggest that similar mechanisms could be beneficial for different EAs.
A brief and preliminary study along the same lines as this paper was presented in [18]. Here, we extend the cited paper in several ways: (a) we provide a much deeper discussion of the diversity promotion strategy proposed in [18] and consider another strategy based on the adaptation of deterministic crowding to G3P; (b) we apply the two strategies to several variants of G3P (CFGGP, GE, SGE, WHGE), instead of GE only; (c) we perform a much deeper experimental evaluation, considering a larger set of benchmark problems and analyzing the results in greater detail.
The remainder of the article is organized as follows. In Section 2, we survey the relevant literature on diversity promotion. In Section 3, we give a common formulation of G3P techniques and then describe in detail the G3P variants considered. In Section 4, we introduce the two strategies for diversity promotion in G3P. In Section 5, we describe the experimental evaluation and discuss the results. Finally, in Section 6, we draw the conclusions.
Related works
The lack of diversity frequently limits the effectiveness of evolutionary algorithms: Holland himself analyzed the issue, discussing the lack of speciation in his seminal works [19]. It is, however, an endemic phenomenon, possibly rooted in the very use of a fitness function instead of a real environment [15]. A set of diverse individuals is not the final goal; yet most scholars agree that enforcing a higher level of diversity within the population may be beneficial for the overall
G3P
In this section, we describe the G3P variants analyzed in the experimental assessment. We considered four variants: Context-free Grammar Genetic Programming (CFGGP) [42], standard Grammatical Evolution (GE) [5], Structured Grammatical Evolution (SGE) [13], [43], and Weighted Hierarchical Grammatical Evolution (WHGE) [14], [44]. We describe these proposals in terms of a common framework, presented in Section 3.1, in which a fixed-size population is evolved iteratively.
All the variants are based on
Strategies for promoting diversity
Here we describe the two strategies for diversity promotion: Partitioned Population G3P (PPG3P) and Deterministic Crowding G3P (DCG3P). Both are multi-level, in the sense that they can be configured to work at the level of the genotype, phenotype, or fitness. The former was presented in [18]; the latter is based on the original idea of deterministic crowding [17], adapted to the case of G3P.
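As background for the second strategy, the sketch below shows generic deterministic crowding with a pluggable distance, so that the same replacement rule can operate at the genotype, phenotype, or fitness level. It is not the DCG3P procedure of this paper, whose details are not given in this excerpt. The `Individual` class, the three distance functions, the `make_children` callback, and the choice of minimizing fitness are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Individual:
    genotype: Tuple[int, ...]
    phenotype: str
    fitness: float  # assumption: lower is better


Distance = Callable[[Individual, Individual], float]


def genotype_distance(a: Individual, b: Individual) -> float:
    """Hamming distance, plus the length difference for unequal genotypes."""
    mismatches = sum(x != y for x, y in zip(a.genotype, b.genotype))
    return mismatches + abs(len(a.genotype) - len(b.genotype))


def phenotype_distance(a: Individual, b: Individual) -> float:
    """Crude 0/1 distance: identical phenotypes or not."""
    return 0.0 if a.phenotype == b.phenotype else 1.0


def fitness_distance(a: Individual, b: Individual) -> float:
    return abs(a.fitness - b.fitness)


def crowding_replace(
    p1: Individual, p2: Individual, c1: Individual, c2: Individual, distance: Distance
) -> Tuple[Individual, Individual]:
    """Pair each child with its closer parent; a child survives if it is not worse."""
    if distance(p1, c1) + distance(p2, c2) <= distance(p1, c2) + distance(p2, c1):
        pairs = [(p1, c1), (p2, c2)]
    else:
        pairs = [(p1, c2), (p2, c1)]
    first, second = (c if c.fitness <= p.fitness else p for p, c in pairs)
    return first, second


def dc_generation(
    population: List[Individual],
    make_children: Callable[[Individual, Individual], Tuple[Individual, Individual]],
    distance: Distance,
    rng: random.Random,
) -> List[Individual]:
    """One generation: shuffle, pair parents, and apply the replacement rule."""
    shuffled = population[:]
    rng.shuffle(shuffled)
    next_population: List[Individual] = []
    for i in range(0, len(shuffled) - 1, 2):
        p1, p2 = shuffled[i], shuffled[i + 1]
        c1, c2 = make_children(p1, p2)  # hypothetical variation step
        next_population.extend(crowding_replace(p1, p2, c1, c2, distance))
    if len(shuffled) % 2 == 1:
        next_population.append(shuffled[-1])  # an unpaired individual is kept as is
    return next_population
```

Because a child competes only against the parent it most resembles under the chosen distance, switching `distance` between the three functions changes which parent each child can displace, which is the sense in which such a scheme can be configured per level.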
Experimental evaluation
We performed experiments in two phases. First, in order to verify experimentally to what degree the lack of diversity is an issue in existing G3P approaches, we performed a set of experiments with SSG3P and measured the diversity at the level of genotype, phenotype, and fitness. Then, we compared the effectiveness of the two considered diversity promotion strategies (PPG3P and DCG3P) against that of SSG3P, taken as a baseline.
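Measuring diversity at three levels can be sketched as follows. This excerpt does not state which diversity measure the paper uses, so the sketch assumes one simple proxy, the fraction of distinct values in the population at each level; the `Individual` tuple and the function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, NamedTuple, Tuple


class Individual(NamedTuple):
    genotype: Tuple[int, ...]
    phenotype: str
    fitness: float


def distinct_ratio(values: Iterable[Hashable]) -> float:
    """Fraction of distinct values: 1.0 means all different, 1/n means all equal."""
    items = list(values)
    return len(set(items)) / len(items) if items else 0.0


def diversity_report(population: List[Individual]) -> Dict[str, float]:
    """Assumed proxy for diversity, computed separately at each level."""
    return {
        "genotype": distinct_ratio(ind.genotype for ind in population),
        "phenotype": distinct_ratio(ind.phenotype for ind in population),
        "fitness": distinct_ratio(ind.fitness for ind in population),
    }


population = [
    Individual((0, 1, 0), "x", 1.0),
    Individual((2, 3, 4), "x", 1.0),
    Individual((0, 0, 1), "y", 2.0),
    Individual((4, 5, 6), "x", 1.0),
]
print(diversity_report(population))
# {'genotype': 1.0, 'phenotype': 0.5, 'fitness': 0.5}
```

In this toy population every genotype is unique, yet only half of the phenotypes and fitness values are distinct, which is the kind of gap between levels that an indirect representation can produce.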
Conclusions
Grammar-guided Genetic Programming (G3P) is a family of Evolutionary Algorithms whose ability to tackle different problems by simply changing the user-provided grammar has resulted in wide adoption and enduring popularity. The most widespread members of this family are based on an indirect representation: individuals are described at three levels, in terms of their genotype, phenotype, and fitness. This representation stimulated criticism and attracted many studies: many of them found
Acknowledgments
We would like to thank the anonymous reviewers for their insightful comments and constructive suggestions.
Declaration of competing interest
No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.asoc.2019.105599.
References (56)
- et al., On the roles of semantic locality of crossover in genetic programming, Inform. Sci. (2013)
- et al., Divergence of character and premature convergence: A survey of methodologies for promoting diversity in evolutionary optimization, Inform. Sci., Vol. 329 (2016)
- Genetic Programming: On the Programming of Computers by Means of Natural Selection, Vol. 1 (1992)
- et al., Grammar-based genetic programming: a survey, Genet. Program. Evol. Mach. (2010)
- P.A. Whigham, Inductive bias and genetic programming, in: First International Conference on Genetic Algorithms in...
- Fuzzy Rule-Based Expert Systems and Genetic Machine Learning, Vol. 3 (1997)
- et al., Grammatical evolution: Evolving programs for an arbitrary language
- A comparative analysis of dynamic locality and redundancy in grammatical evolution
- Representations for Genetic and Evolutionary Algorithms (2006)
- et al., Grammatical evolution