Comparison of a genetic algorithm to grammatical evolution for automated design of genetic programming classification algorithms
Introduction
Data classification is one of the most widely studied domains of research in machine learning. Many real-world tasks can be viewed as classification problems. Classification is the process of associating an object with a class (label) based on the features describing that object. Classification is generally performed by classifier models and usually involves two phases: a learning (training) phase and a testing phase. A classifier model is induced by a classification algorithm during training, and its classification accuracy is evaluated during testing.
Evolutionary algorithms (EAs) are among the methods that have gained prominence in the induction of classifiers, particularly genetic programming (Espejo, Ventura, & Herrera, 2010; Freitas, 2003). Genetic programming (GP) is a population-based algorithm that models Darwin’s theory of evolution (Koza, 1992). GP has proved to be effective in the induction of classifiers for a number of reasons. The tree representation used by GP gives it the flexibility to evolve classifiers that model numerous problems (Espejo et al., 2010). For example, GP can be configured to represent decision trees, association rules or discriminant functions.
GP, like most EAs, is a parameterized algorithm, and it has been shown that the effectiveness of such algorithms depends on their configuration (Eiben & Smit, 2011). Algorithm configuration is a design process that involves determining numerical parameter values, selecting categorical parameters and setting the control flow that would lead to the algorithm finding an optimal (or near-optimal) solution to the problem at hand. According to Kramer and Kacprzyk (2008), considerable algorithm design experience is necessary to manually configure an EA that yields effective results. However, Hutter (2009) argues that even with the necessary experience, manual design is a tedious, non-trivial task susceptible to human bias. Furthermore, the search space of possible parameter values is large, and only a subset of the design decisions is considered during manual design. Parameter values and control flow options are assessed through trial runs in an iterative trial-and-error approach. Montero and Riff (2014) argue that inexperienced designers add more parameters than are necessary during manual design, leading to unnecessarily complex algorithms. They also point out that manual design requires many man-hours and that the algorithm designer should ideally have expert knowledge of the domain being considered, which is not always possible. Parameter control and tuning methods have been proposed in the literature (Dobslaw, 2010; Eiben, Hinterding, & Michalewicz, 1999; Eiben & Smit, 2011), with no method being universally adopted (Karafotias, Hoogendoorn, & Eiben, 2015).
In a previous study, Nyathi and Pillay (2017) showed the effectiveness of using a genetic algorithm (GA) (Goldberg & Holland, 1988) to automate the design of a GP classification algorithm for data classification. This study compares the automated design of GP classification algorithms using a GA with automated design using grammatical evolution (GE) (Ryan, Collins, & Neill, 1998). The GA and GE are each used to search for a GP configuration that should result in GP evolving the best classifier for a specific classification problem. Throughout this paper, the word configuration refers to algorithm parameter values (numerical and/or categorical) and the algorithm control flow. The best classification accuracies of the classifiers evolved using the GA and GE algorithms are compared to each other and to those of manually designed classification algorithms for the considered problem instances. The study finds that automatically designed classifiers significantly outperform manually designed classifiers for the problem instances considered across domains and perform equivalently for a specific domain. In addition, automated design time is shown to be less than manual design time. Hence, the contributions of this study are:
- The study investigates the feasibility of using a GA and GE for automating the design of GP classification algorithms for data classification and shows the effectiveness of automated design over manual design.
- The study compares the performance of using a GA to GE for automated design. It is shown that there is no significant difference in performance between the two EAs and either can be used for the design of GP classification algorithms. Although for the considered datasets GE is found to perform better on binary problems while the GA is found to be better on multiclass problems, the differences are not significant.
- The study also shows that the use of automated design leads to a reduction in man-hours for the design process.
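To make the notion of a configuration concrete, the following minimal sketch shows how a GA might search over GP design decisions. It is an illustration under stated assumptions, not the implementation used in the study: the design decisions in `DESIGN_SPACE`, their value ranges, the GA settings and the function names are all hypothetical, and `placeholder_fitness` is a toy stand-in for running GP with a configuration and measuring classification accuracy.

```python
import random

# Hypothetical design space (assumption): one entry per design decision.
DESIGN_SPACE = {
    "classifier_type": ["arithmetic", "logical", "decision_tree"],  # categorical
    "population_size": [100, 200, 300],                             # numerical
    "max_tree_depth": [4, 6, 8, 10],                                # numerical
    "selection_method": ["tournament", "fitness_proportionate"],    # categorical
    "operator_order": ["crossover_first", "mutation_first"],        # control flow
}
GENES = list(DESIGN_SPACE)


def random_chromosome(rng):
    """One integer gene per design decision, indexing into its option list."""
    return [rng.randrange(len(DESIGN_SPACE[g])) for g in GENES]


def decode(chromosome):
    """Translate a chromosome into a readable GP configuration."""
    return {g: DESIGN_SPACE[g][i] for g, i in zip(GENES, chromosome)}


def placeholder_fitness(config):
    """Toy score (assumption). A real evaluation would configure GP with
    `config`, evolve a classifier and return its accuracy."""
    score = 1.0 if config["selection_method"] == "tournament" else 0.0
    score -= abs(config["max_tree_depth"] - 6) / 10
    score -= abs(config["population_size"] - 200) / 1000
    return score


def evolve_configuration(fitness, generations=20, pop_size=10, seed=0):
    """Generational GA with elitism, tournament selection of size 2,
    uniform crossover and single-gene mutation."""
    rng = random.Random(seed)

    def score(chromosome):
        return fitness(decode(chromosome))

    population = [random_chromosome(rng) for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        offspring = [best[:]]  # elitism: carry the best chromosome over
        while len(offspring) < pop_size:
            parent1 = max(rng.sample(population, 2), key=score)
            parent2 = max(rng.sample(population, 2), key=score)
            child = [rng.choice(pair) for pair in zip(parent1, parent2)]
            if rng.random() < 0.2:  # mutate one randomly chosen gene
                pos = rng.randrange(len(child))
                child[pos] = rng.randrange(len(DESIGN_SPACE[GENES[pos]]))
            offspring.append(child)
        population = offspring
        best = max(population, key=score)
    return decode(best)


best_config = evolve_configuration(placeholder_fitness)
print(best_config)
```

Replacing `placeholder_fitness` with a function that actually trains and tests GP on a set of classification problems would turn the sketch into the kind of meta-level search the paper describes, at a much higher computational cost per evaluation.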
This paper is structured as follows. Section 2 presents the background of the study, outlining GP and the application of GP as a classification algorithm for data classification. The use of a GA and GE in automated design is also discussed in this section. Section 3 presents a brief overview of how GP, the GA and GE are related in the proposed approach. Section 4 outlines the manual design approach of GP classification algorithms, while Section 5 describes the automated design approach using the GA and GE implemented in this study. Experimental settings and a description of the experiments carried out are outlined in Section 6. Section 7 presents the results and their analysis. Finally, Section 8 concludes the study and discusses possible future work.
Section snippets
Classification
Classification is a supervised machine learning method (Pappa & Freitas, 2009). In supervised learning, the features and their corresponding class labels are provided, and the training process entails learning the classes based on the features. During the training phase the classification algorithm has access to the class labels. After training, the evolved classifiers should be able to generalize and assign unseen objects to their respective classes correctly; this is a measure of the predictive
Overview of the proposed approach
In the proposed approach, an evolutionary algorithm, in this case a GA (or GE), is used to make design decisions and evaluate different GP designs. The GA (GE) simulates an algorithm designer as it searches through the GP design space for the best GP configuration. Each member of a GA (GE) population encodes a GP configuration, and a set of classification problems is used to evaluate each GP configuration. The evaluation is carried out by using the GA (GE) evolved GP configuration to configure GP
Manual GP
This section outlines the manual design of the generational genetic programming classification algorithm used in this study. Each individual is a classifier induced by the GP algorithm. Each individual can be one of three types of classifier: an arithmetic tree classifier, a logical tree classifier or a decision tree classifier. The type of classifier is determined by the contents of the function and terminal sets.
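As an illustration of how the function and terminal sets determine the classifier type, the sketch below evaluates a small arithmetic tree classifier. It is a minimal example under assumptions that the text above does not confirm: the function set (`+`, `-`, `*`, protected `/`), the feature naming (`x0`, `x1`), the tuple encoding of trees and the rule that a non-negative output means class 1 are all hypothetical choices made for illustration.

```python
import operator


def protected_div(a, b):
    """Division that returns 1.0 when the denominator is zero."""
    return a / b if b != 0 else 1.0


# Assumed function set for an arithmetic tree classifier.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": protected_div,
}


def evaluate(tree, features):
    """Recursively evaluate a tree given as nested (op, left, right) tuples.
    Terminals are feature names such as 'x0' or numeric constants."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, features), evaluate(right, features))
    if isinstance(tree, str):
        return features[int(tree[1:])]  # 'x1' -> features[1]
    return float(tree)


def classify(tree, features):
    """Assumed binary decision rule: non-negative output means class 1."""
    return 1 if evaluate(tree, features) >= 0 else 0


def accuracy(tree, dataset):
    """Fraction of (features, label) pairs the tree classifies correctly."""
    correct = sum(classify(tree, x) == y for x, y in dataset)
    return correct / len(dataset)


# Example individual: x0 * x1 - 2
tree = ("-", ("*", "x0", "x1"), 2.0)
data = [([1.0, 1.0], 0), ([2.0, 3.0], 1), ([0.5, 2.0], 0), ([4.0, 1.0], 1)]
print(accuracy(tree, data))  # 1.0 on this toy data
```

Swapping the function set for logical operators, or using attribute tests as internal nodes and class labels as leaves, would give the logical tree and decision tree variants respectively.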
Using the algorithm flow presented in Algorithm 1 the following sections
Proposed automated design
In this section the proposed automated design approach is presented. The design decisions considered in this study, together with their possible values, are outlined. A detailed description of the proposed automated design approaches using a GA and GE is also presented in this section.
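For readers unfamiliar with GE, the sketch below illustrates the standard genotype-to-phenotype mapping, in which each integer codon selects a production rule by taking the codon value modulo the number of available productions. It is a generic illustration rather than the grammar or mapping used in this study: the grammar, the configuration fields it generates and the wrapping limit are hypothetical.

```python
# Hypothetical grammar (assumption): each non-terminal maps to a list of
# productions, and each production is a list of symbols.
GRAMMAR = {
    "<config>": [["<type>", " ", "<selection>", " ", "<depth>"]],
    "<type>": [["arithmetic"], ["logical"], ["decision_tree"]],
    "<selection>": [["tournament"], ["fitness_proportionate"]],
    "<depth>": [["depth=4"], ["depth=6"], ["depth=8"]],
}


def map_genotype(codons, start="<config>", max_wraps=2):
    """Expand the leftmost non-terminal first. A codon is consumed only when
    a non-terminal has more than one production. The codon list is reused
    (wrapped) up to max_wraps times; None is returned if codons run out."""
    symbols = [start]
    output = []
    used = 0
    limit = len(codons) * (max_wraps + 1)
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in GRAMMAR:  # terminal symbol: emit as-is
            output.append(symbol)
            continue
        productions = GRAMMAR[symbol]
        if len(productions) == 1:
            choice = productions[0]
        else:
            if used >= limit:
                return None  # mapping failed: no codons left
            codon = codons[used % len(codons)]
            used += 1
            choice = productions[codon % len(productions)]
        symbols = list(choice) + symbols
    return "".join(output)


print(map_genotype([7, 4, 11]))  # logical tournament depth=8
```

Here `7 % 3 = 1` selects `logical`, `4 % 2 = 0` selects `tournament` and `11 % 3 = 2` selects `depth=8`. Because the grammar constrains which strings can be produced, every successful mapping yields a syntactically valid configuration.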
Experimental settings
This section describes the experimental setup used to evaluate the automated design approach. First, the data used in the experiments are presented, followed by a description of the experiments carried out. The performance of classifiers evolved by manual design and automated design is compared using binary and multiclass classification problems.
Results and analysis
This section presents the results obtained from conducting the outlined experiments. The results obtained from applying the autoGA and autoGE algorithms are compared to each other and to those obtained from the manual design approach.
Conclusion
This study investigated the feasibility of automating the design of GP classification algorithms for data classification using a genetic algorithm and grammatical evolution. A GA and GE were used to evolve configurations for GP. The effectiveness of automated design was tested on a varied set of real-world problems selected from the UCI dataset repository and on the NSL-KDD dataset. The automatically designed configurations were used to evolve GP algorithms that produce classifiers that perform
Acknowledgments
The authors would like to thank the reviewers for their helpful comments to improve the paper.
References (90)
- A co-evolving decision tree classification method. Expert Systems with Applications (2008).
- GP-COACH: Genetic programming-based learning of compact and accurate fuzzy rule-based classification systems for high-dimensional problems. Information Sciences (2010).
- A constrained-syntax genetic programming system for discovering classification rules: Application to medical data sets. Artificial Intelligence in Medicine (2004).
- Parameter tuning for configuring and analyzing evolutionary algorithms. Swarm and Evolutionary Computation (2011).
- Depthlimited crossover in GP for classifier evolution. Computers in Human Behavior (2011).
- Inductive data mining based on genetic programming: automatic generation of decision trees from data for process historical data analysis. Computers & Chemical Engineering (2009).
- Building credit scoring models using genetic programming. Expert Systems with Applications (2005).
- A comparison of classification accuracy of four genetic programming-evolved intelligent structures. Information Sciences (2006).
- A multi-objective genetic programming approach to developing Pareto optimal decision trees. Decision Support Systems (2007).
- Improving genetic programming classification for binary and multiclass datasets. Proceedings of the 2013 IEEE symposium on computational intelligence and data mining (CIDM) (2013).
- An investigation into the sensitivity of genetic programming to the frequency of leaf selection during subtree crossover. Proceedings of the 1st annual conference on genetic programming.
- A gender-based genetic algorithm for the automatic configuration of algorithms. Proceedings of the international conference on principles and practice of constraint programming.
- Genetic programming for human oral bioavailability of drugs. Proceedings of the 8th annual conference on genetic and evolutionary computation.
- Genetic algorithm based multiple decision tree induction. Proceedings of the 6th international conference on neural information processing.
- Genetic programming: An introduction.
- Automatic design of decision-tree algorithms with evolutionary algorithms. Evolutionary Computation.
- Legal-tree: a lexicographic multi-objective genetic algorithm for decision tree induction. Proceedings of the 2009 ACM symposium on applied computing.
- Genetic programming for classification with unbalanced data. Proceedings of the European conference on genetic programming.
- UCI repository of machine learning databases.
- A comparison of selection schemes used in evolutionary algorithms. Evolutionary Computation.
- Genetic programming for knowledge discovery in chest-pain diagnosis. IEEE Engineering in Medicine and Biology Magazine.
- A classification of hyper-heuristic approaches. Handbook of metaheuristics.
- Multi-objective genetic programming for feature extraction and data visualization. Soft Computing.
- A study of fitness functions for data classification using grammatical evolution. Proceedings of the pattern recognition association of South Africa and robotics and mechatronics international conference.
- Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research.
- Evolving evolutionary algorithms using evolutionary algorithms. Proceedings of the 9th annual conference companion on genetic and evolutionary computation.
- A parameter tuning framework for metaheuristics based on design of experiments and artificial neural networks. Proceedings of the international conference on computer mathematics and natural computing.
- Generation of VNS components with grammatical evolution for vehicle routing. Proceedings of the European conference on genetic programming.
- Adapting the fitness function in GP for data mining. Proceedings of the European conference on genetic programming.
- Parameter control in evolutionary algorithms. IEEE Transactions on Evolutionary Computation.
- Introduction to evolutionary computing.
- Induction of classification rules with grammar-based genetic programming. Proceedings of the conference on machine intelligence.
- A survey on the application of genetic programming to classification. IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews.
- GPDTI: A genetic programming decision tree induction method to find epistatic effects in common complex diseases. Bioinformatics.
- Grammar-guided evolutionary construction of Bayesian networks. Proceedings of the international work-conference on the interplay between natural and artificial computation.
- A survey of evolutionary algorithms for data mining and knowledge discovery. Advances in evolutionary computing.
- Genetic programming for predicting protein networks. Proceedings of the Ibero-American conference on artificial intelligence.
- Simplifying decision trees learned by genetic programming. 2006 IEEE Congress on Evolutionary Computation (CEC 2006).
- Genetic algorithms.
- Genetic algorithms and machine learning. Machine Learning.
- Data mining: Concepts and techniques.
- Construction and assessment of classification rules.
- Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence.
- Automated design of probability distributions as mutation operators for evolutionary programming using genetic programming. Proceedings of the European conference on genetic programming.
- Automated configuration of algorithms for solving hard computational problems.