Elsevier

Expert Systems with Applications

Volume 104, 15 August 2018, Pages 213-234
Comparison of a genetic algorithm to grammatical evolution for automated design of genetic programming classification algorithms

https://doi.org/10.1016/j.eswa.2018.03.030

Highlights

  • Automated design of Genetic Programming classification algorithms is presented.

  • Automated design uses a genetic algorithm and grammatical evolution.

  • The approach is trained and tested using real-world binary and multi-class data.

  • Grammatical evolution designed classifiers perform better for binary classification.

  • Genetic algorithm designed classifiers perform better for multi-class classification.

Abstract

Genetic Programming (GP) is gaining increased attention as an effective method for inducing classifiers for data classification. However, the manual design of a genetic programming classification algorithm is a non-trivial, time-consuming process. This research investigates the hypothesis that automating the design of a GP classification algorithm for data classification can still lead to the induction of effective classifiers while also reducing the design time. Two evolutionary algorithms, namely a genetic algorithm (GA) and grammatical evolution (GE), are used to automate the design of GP classification algorithms. The GA-designed and GE-designed GP classifiers are compared, in terms of classification performance, to each other and to manually designed GP classifiers on real-world problems. The design times of automated and manual design are also compared on the same set of problems. The automatically designed classifiers were found to outperform manually designed classifiers across problem domains, and automated design time was found to be less than manual design time. For the considered datasets, GE performs better for binary classification while the GA does better for multiclass classification. Overall, the results of the study support the hypothesis.

Introduction

Data classification is one of the most widely studied domains of research in machine learning. Many real-world tasks can be viewed as classification problems. Classification is the process of associating an object with a class (label) based on the features describing that object. Classification is generally performed by classifier models and usually involves two phases: a learning (training) phase and a testing phase. A classifier model is induced by a classification algorithm during training, and its classification accuracy is evaluated during testing.

Evolutionary algorithms (EAs), particularly genetic programming, are among the methods that have gained prominence in the induction of classifiers (Espejo, Ventura, & Herrera, 2010; Freitas, 2003). Genetic programming (GP) is a population-based algorithm that models Darwin’s theory of evolution (Koza, 1992). GP has proved to be effective in the induction of classifiers for a number of reasons. The tree representation used by GP gives it the flexibility to evolve classifiers that model numerous problems (Espejo et al., 2010). For example, GP can be configured to represent decision trees, association rules or discriminant functions.

GP, like most EAs, is a parameterized algorithm, and it has been shown that the effectiveness of such algorithms depends on their configuration (Eiben & Smit, 2011). Algorithm configuration is a design process that involves determining numerical parameter values, selecting categorical parameters and setting the control flow that would lead to the algorithm finding an optimal (or near-optimal) solution to the problem at hand. According to Kramer and Kacprzyk (2008), considerable algorithm design experience is necessary to manually configure an EA that yields effective results. However, Hutter (2009) argues that even with the necessary experience, manual design is a tedious, non-trivial task susceptible to human bias. Furthermore, the search space of possible parameter values is large, and only a subset of the design decisions is considered during manual design. Parameter values and control flow options are evaluated using trial runs in an iterative trial-and-error approach. Montero and Riff (2014) argue that inexperienced designers add more parameters than are necessary during manual design, leading to unnecessarily complex algorithms. They also point out that many man-hours are required for manual design and that, ideally, the algorithm designer should have expert knowledge of the domain being considered; however, this is not always possible. Parameter control and tuning methods have been proposed in the literature (Dobslaw, 2010; Eiben, Hinterding, & Michalewicz, 1999; Eiben & Smit, 2011), with no method being universally adopted (Karafotias, Hoogendoorn, & Eiben, 2015).

In a previous study, Nyathi and Pillay (2017) showed the effectiveness of using a genetic algorithm (GA) (Goldberg & Holland, 1988) to automate the design of a GP classification algorithm for data classification. This study compares the automated design of GP classification algorithms using a GA and grammatical evolution (GE) (Ryan, Collins, & Neill, 1998). The GA and GE are used individually to search for a GP configuration that should result in GP evolving the best classifier for a specific classification problem. Throughout this paper, the word configuration refers to algorithm parameter values (numerical and/or categorical) and the algorithm control flow. The best classification accuracies of the classifiers evolved using the GA and GE algorithms are compared to each other and to those of manually designed classification algorithms for the considered problem instances. The study finds that automatically designed classifiers significantly outperform manually designed classifiers for the problem instances considered across domains and perform equivalently for a specific domain. In addition, automated design time is shown to be less than manual design time. Hence, the contributions of this study are:

  • The study investigates the feasibility of using a GA and GE for automating the design of GP classification algorithms for data classification and shows the effectiveness of automated design over manual design.

  • The study compares the performance of a GA and GE for automated design. It shows that there is no significant difference in performance between the two EAs, so either can be used for the design of GP classification algorithms. For the considered datasets, GE performs better on binary problems and the GA performs better on multiclass problems, but these differences are not significant.

  • The study also shows that the use of automated design leads to a reduction in man-hours for the design process.

This paper is structured as follows. Section 2 presents the background of the study, outlining GP and its application as a classification algorithm for data classification. The use of a GA and GE in automated design is also discussed in this section. Section 3 presents a brief overview of how GP, the GA and GE are related in the proposed approach. Section 4 outlines the manual design approach for GP classification algorithms, while Section 5 describes the automated design approach using the GA and GE implemented in this study. Experimental settings and a description of the experiments carried out are given in Section 6. Section 7 presents the results and their analysis. Finally, Section 8 concludes the study and discusses possible future work.

Section snippets

Classification

Classification is a supervised machine learning method (Pappa & Freitas, 2009). In supervised learning, the features and their corresponding class labels are provided, and the training process entails learning the classes based on the features. During the training phase the classification algorithm has access to the class labels. After training, the evolved classifiers should be able to generalize and assign unseen objects to their respective classes correctly; this is a measure of the predictive

Overview of the proposed approach

In the proposed approach, an evolutionary algorithm, in this case a GA (or GE), is used to make design decisions and evaluate different GP designs. The GA (GE) simulates an algorithm designer as it searches through the GP design space for the best GP configuration. Each member of a GA (GE) population encodes a GP configuration, and a set of classification problems is used to evaluate each GP configuration. The evaluation is carried out by using the GA (GE) evolved GP configuration to configure GP
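The idea of a population whose members each encode a GP configuration can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the design decisions in `DESIGN_SPACE`, the integer-gene encoding, the mutation-only search loop and the stand-in scoring function `toy_run_gp` are all hypothetical. In a real setting, the scoring function would configure GP with the decoded settings, run it on the classification problems and return the resulting accuracy.

```python
import random

# Hypothetical design space: names and values are illustrative only.
DESIGN_SPACE = {
    "population_size": [100, 200, 300],
    "max_tree_depth": [4, 8, 12],
    "selection": ["tournament", "fitness_proportionate"],
    "crossover_rate": [0.6, 0.7, 0.8, 0.9],
}


def decode(chromosome):
    """Map one integer gene per design decision to a concrete GP configuration."""
    return {
        name: values[gene % len(values)]
        for (name, values), gene in zip(DESIGN_SPACE.items(), chromosome)
    }


def random_chromosome(rng):
    """Create a chromosome with one valid index per design decision."""
    return [rng.randrange(len(values)) for values in DESIGN_SPACE.values()]


def toy_run_gp(config):
    """Stand-in for a GP run: returns a made-up score, not a real accuracy."""
    return config["crossover_rate"] - abs(config["max_tree_depth"] - 8) / 100


def evolve(run_gp, generations=5, pop_size=6, seed=0):
    """Search the design space with tournament selection and gene-reset mutation."""
    rng = random.Random(seed)
    options = list(DESIGN_SPACE.values())

    def fitness(chromosome):
        return run_gp(decode(chromosome))

    population = [random_chromosome(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            # Tournament of size 2: the fitter of two random members is the parent.
            a, b = rng.sample(population, 2)
            child = list(a if fitness(a) >= fitness(b) else b)
            # Mutation: reset one randomly chosen gene to a random valid index.
            i = rng.randrange(len(child))
            child[i] = rng.randrange(len(options[i]))
            offspring.append(child)
        population = offspring
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate  # keep the best configuration seen so far
    return decode(best)


if __name__ == "__main__":
    print(evolve(toy_run_gp))
```

Keeping the encoding (`decode`) separate from the scoring function means the same search loop could, in principle, be pointed at any procedure that turns a configuration into a fitness value.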

Manual GP

This section outlines the manual design of the generational Genetic Programming classification algorithm used in this study. Each individual is a classifier induced by the GP algorithm and can be one of three types: an arithmetic tree classifier, a logical tree classifier or a decision tree classifier. The type of classifier is determined by the contents of the function and terminal sets.
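To make the role of the function and terminal sets concrete, the following sketch evaluates a small arithmetic tree classifier. It is an illustration only and rests on assumptions not taken from the paper: trees are written as nested tuples, terminals are feature references such as `"x0"` or numeric constants, division is protected against zero, and a non-negative output is read as class 1 and a negative output as class 0 (a common convention for binary problems).

```python
import operator


def protected_div(a, b):
    """Division that returns 1.0 instead of failing when the divisor is zero."""
    return a / b if b != 0 else 1.0


# Assumed function set for an arithmetic tree classifier.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": protected_div,
}


def evaluate(tree, features):
    """Recursively evaluate a tree given one object's feature values."""
    if isinstance(tree, str):
        # Terminal such as "x2": look up feature number 2.
        return features[int(tree[1:])]
    if isinstance(tree, (int, float)):
        # Terminal that is a numeric constant.
        return float(tree)
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, features), evaluate(right, features))


def classify(tree, features):
    """Assumed convention: output >= 0 means class 1, otherwise class 0."""
    return 1 if evaluate(tree, features) >= 0 else 0


def accuracy(tree, samples):
    """Fraction of (features, label) pairs that the tree classifies correctly."""
    correct = sum(classify(tree, x) == y for x, y in samples)
    return correct / len(samples)


if __name__ == "__main__":
    tree = ("-", "x0", "x1")  # predicts class 1 when x0 >= x1
    data = [([3, 1], 1), ([1, 3], 0), ([2, 2], 0)]
    print(accuracy(tree, data))
```

Swapping the function set for logical operators, or for attribute tests with class labels at the leaves, would give the other two classifier types under the same tree representation.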

Using the algorithm flow presented in Algorithm 1 the following sections

Proposed automated design

In this section the proposed automated design approach is presented. The design decisions, together with their possible values considered in this study, are outlined. A detailed description of the proposed automated design approaches, using a GA and GE, is also presented in this section.
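For readers unfamiliar with GE, the sketch below shows the standard genotype-to-phenotype mapping, in which each integer codon selects a grammar production using the modulo rule. The grammar is a hypothetical toy that derives a two-part GP setting; it is not the grammar used in the study. The wrapping limit (`max_wraps`) and the rule that a codon is consumed only when a non-terminal has more than one production are also assumptions, chosen to match common GE practice.

```python
# Hypothetical toy grammar: each non-terminal maps to a list of productions.
GRAMMAR = {
    "<config>": [["<selection>", " ", "<depth>"]],
    "<selection>": [["tournament"], ["fitness_proportionate"]],
    "<depth>": [["depth=4"], ["depth=8"], ["depth=12"]],
}


def map_genotype(codons, start="<config>", max_wraps=2):
    """Derive a string from integer codons; return None if codons run out."""
    symbols = [start]  # leftmost-first derivation
    output = []
    used = 0
    limit = len(codons) * (max_wraps + 1)  # total codon reads allowed
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in GRAMMAR:
            output.append(symbol)  # terminal: emit as-is
            continue
        productions = GRAMMAR[symbol]
        if len(productions) == 1:
            choice = productions[0]  # single production: no codon consumed
        else:
            if used >= limit:
                return None  # mapping failed: codon budget exhausted
            codon = codons[used % len(codons)]  # wrap around the genome
            choice = productions[codon % len(productions)]
            used += 1
        symbols = list(choice) + symbols
    return "".join(output)


if __name__ == "__main__":
    print(map_genotype([3, 7]))
```

Here codon 3 selects the second `<selection>` production (3 mod 2 = 1) and codon 7 selects the second `<depth>` production (7 mod 3 = 1).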

Experimental settings

This section describes the experimental setup used to evaluate the automated design approach. First, the data used in the experiments is presented, followed by a description of the experiments carried out. The performance of classifiers evolved by manual design and automated design is compared using binary and multiclass classification problems.

Results and analysis

This section presents the results obtained from conducting the outlined experiments. The results obtained from applying the autoGA and autoGE algorithms are compared to each other and to those obtained from the manual design approach.

Conclusion

This study investigated the feasibility of automating the design of GP classification algorithms for data classification using a genetic algorithm and grammatical evolution. A GA and GE were used to evolve configurations for GP. The effectiveness of automated design was tested on a varied set of real-world problems selected from the UCI dataset repository and on the NSL-KDD dataset. The automatically designed configurations were used to evolve GP algorithms that produce classifiers that perform

Acknowledgments

The authors would like to thank the reviewers for their helpful comments to improve the paper.

References (90)

  • P.J. Angeline

    An investigation into the sensitivity of genetic programming to the frequency of leaf selection during subtree crossover

    Proceedings of the 1st annual conference on genetic programming

    (1996)
  • C. Ansótegui et al.

    A gender-based genetic algorithm for the automatic configuration of algorithms

    Proceedings of the international conference on principles and practice of constraint programming

    (2009)
  • F. Archetti et al.

    Genetic programming for human oral bioavailability of drugs

    Proceedings of the 8th annual conference on genetic and evolutionary computation

    (2006)
  • Z. Bandar et al.

    Genetic algorithm based multiple decision tree induction

    Proceedings of the 6th international conference on neural information processing

    (1999)
  • W. Banzhaf et al.

    Genetic programming: An introduction

    (1998)
  • R.C. Barros et al.

    Automatic design of decision-tree algorithms with evolutionary algorithms

    Evolutionary Computation

    (2013)
  • M.P. Basgalupp et al.

    Legal-tree: a lexicographic multi-objective genetic algorithm for decision tree induction

    Proceedings of the 2009 ACM symposium on applied computing

    (2009)
  • U. Bhowan et al.

    Genetic programming for classification with unbalanced data

    Proceedings of the European conference on genetic programming

    (2010)
  • C. Blake et al.

    UCI repository of machine learning databases

    (1998)
  • T. Blickle et al.

    A comparison of selection schemes used in evolutionary algorithms

    Evolutionary Computation

    (1996)
  • C.C. Bojarczuk et al.

    Genetic programming for knowledge discovery in chest-pain diagnosis

    IEEE Engineering in Medicine and Biology Magazine

    (2000)
  • E.K. Burke et al.

    A classification of hyper-heuristic approaches

    Handbook of metaheuristics

    (2010)
  • A. Cano et al.

    Multi-objective genetic programming for feature extraction and data visualization

    Soft Computing

    (2017)
  • T. Chareka et al.

    A study of fitness functions for data classification using grammatical evolution

    Proceedings of the pattern recognition association of South Africa and robotics and mechatronics international conference

    (2016)
  • J. Demšar

    Statistical comparisons of classifiers over multiple data sets

    Journal of Machine Learning Research

    (2006)
  • L.S. Diosan et al.

    Evolving evolutionary algorithms using evolutionary algorithms

    Proceedings of the 9th annual conference companion on genetic and evolutionary computation

    (2007)
  • F. Dobslaw

    A parameter tuning framework for metaheuristics based on design of experiments and artificial neural networks

    Proceedings of the international conference on computer mathematics and natural computing

    (2010)
  • J.H. Drake et al.

    Generation of VNS components with grammatical evolution for vehicle routing

    Proceedings of the European conference on genetic programming

    (2013)
  • J. Eggermont et al.

    Adapting the fitness function in GP for data mining

    Proceedings of the European conference on genetic programming

    (1999)
  • Á.E. Eiben et al.

    Parameter control in evolutionary algorithms

    IEEE Transactions on Evolutionary Computation

    (1999)
  • A.E. Eiben et al.

    Introduction to evolutionary computing

    (2003)
  • P.G. Espejo et al.

    Induction of classification rules with grammar-based genetic programming

    Proceedings of the conference on machine intelligence

    (2005)
  • P.G. Espejo et al.

    A survey on the application of genetic programming to classification

    IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews

    (2010)
  • J.K. Estrada-Gil et al.

    GPDTI: A genetic programming decision tree induction method to find epistatic effects in common complex diseases

    Bioinformatics

    (2007)
  • J.M. Font et al.

    Grammar-guided evolutionary construction of Bayesian networks

    Proceedings of the international work-conference on the interplay between natural and artificial computation

    (2011)
  • A.A. Freitas

    A survey of evolutionary algorithms for data mining and knowledge discovery

    Advances in evolutionary computing

    (2003)
  • B. Garcia et al.

    Genetic programming for predicting protein networks

    Proceedings of the Ibero-American conference on artificial intelligence

    (2008)
  • A.L. Garcia-Almanza et al.

    Simplifying decision trees learned by genetic programming

    Proceedings of the 2006 IEEE congress on evolutionary computation (CEC 2006)

    (2006)
  • D.E. Goldberg

    Genetic algorithms

    (2006)
  • D.E. Goldberg et al.

    Genetic algorithms and machine learning

    Machine Learning

    (1988)
  • J. Han et al.

    Data mining: Concepts and techniques

    (2011)
  • D.J. Hand

    Construction and assessment of classification rules

    (1997)
  • J.H. Holland

    Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence

    (1992)
  • L. Hong et al.

    Automated design of probability distributions as mutation operators for evolutionary programming using genetic programming

    Proceedings of the European conference on genetic programming

    (2013)
  • F. Hutter

    Automated configuration of algorithms for solving hard computational problems

    (2009)