Genetic programming for stacked generalization

https://doi.org/10.1016/j.swevo.2021.100913

Abstract

In machine learning, ensemble techniques are widely used to improve the performance of both classification and regression systems. They combine the models generated by different learning algorithms, typically trained on different data subsets or with different parameters, to obtain more accurate models. Ensemble strategies range from simple voting rules to more complex and effective stacked approaches, which rely on a meta-learner, i.e., an additional learning algorithm trained on the predictions provided by the single algorithms making up the ensemble. This paper exploits some of the most recent genetic programming (GP) advances in the context of stacked generalization (a.k.a. stacking). In particular, we investigate how the evolutionary demes despeciation initialization technique, ϵ-lexicase selection, geometric semantic operators, and the semantic stopping criterion can be effectively used to improve the performance of GP-based systems for stacked generalization. The experiments, performed on a broad set of synthetic and real-world regression problems, confirm the effectiveness of the proposed approach.

Introduction

Ensemble learning (EL) is a sub-field of machine learning (ML) inspired by humans’ natural tendency to seek and weigh others’ opinions before making important decisions. Under this perspective, EL consists of combining several individual models, called base learners, so as to produce a model (the ensemble) that is expected to solve a given task better than any of the base learners [26]. Stacked generalization, or simply stacking, consists of training an ensemble from the combined predictions of several, ideally heterogeneous, base learners. More specifically, it consists of training the base learners to solve the underlying task and subsequently training a meta-learner on their predictions [38].
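
As a concrete illustration of this two-stage scheme, the following minimal sketch (our own example, assuming scikit-learn is available; the synthetic dataset, base learners, and linear meta-learner are illustrative choices, not the paper’s setup) trains heterogeneous base learners and fits a meta-learner on their out-of-fold predictions:

```python
# Minimal stacking sketch: base learners -> out-of-fold predictions -> meta-learner.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVR

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base_learners = [LinearRegression(), RandomForestRegressor(random_state=0), SVR()]

# Out-of-fold predictions avoid leaking the training targets into the meta-features.
meta_X_tr = np.column_stack(
    [cross_val_predict(m, X_tr, y_tr, cv=5) for m in base_learners]
)

# Refit each base learner on the full training set for test-time predictions.
for m in base_learners:
    m.fit(X_tr, y_tr)
meta_X_te = np.column_stack([m.predict(X_te) for m in base_learners])

meta_learner = LinearRegression().fit(meta_X_tr, y_tr)
print("stacked R^2:", meta_learner.score(meta_X_te, y_te))
```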

In this paper, we study Genetic Programming (GP) as a meta-learner in the context of regression problems. As a case study, we have chosen four different kinds of base learners: Multi-Linear Regression (MLR), Multi-Layer Perceptron Regression (MLPR), Random Forests Regression (RFR), and Support-Vector Regression (SVR). This choice was mainly motivated by the desired properties of the ensembles: the expected diversity of the underlying base learners and, particular to stacked generalization, the convenience of not having to commit to a single type of base learner. In this context, we decided to consider four types of ML algorithms that differ not only in their complexity but also in their regression-estimation principles. In such a way, we allow GP to automatically evolve computer programs whose terminal set is made of the combined predictions of four heterogeneous base learners. The objective is to improve the ensemble’s generalization ability on a given Supervised ML (SML) task. For the sake of simplicity, we will refer to this kind of approach as stacking-GP (S-GP). Our motivation lies in the intrinsic properties of GP: we expect that, with properly chosen operators and hyper-parameters, GP is capable of combining base learners in a highly non-linear fashion, which could better exploit their outputs and achieve superior generalization ability.
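
To make this concrete, the sketch below (an illustration under our own assumptions: we map the four base learners to scikit-learn’s LinearRegression, MLPRegressor, RandomForestRegressor, and SVR, with illustrative hyper-parameters) builds one prediction vector per base learner, i.e. the raw material over which S-GP would evolve programs:

```python
# Build the S-GP terminal set: one out-of-fold prediction vector per base learner.
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

def build_terminals(X, y):
    """Return a dict mapping terminal names to out-of-fold prediction vectors."""
    learners = {
        "y_MLR": LinearRegression(),
        "y_MLPR": MLPRegressor(max_iter=2000, random_state=0),
        "y_RFR": RandomForestRegressor(random_state=0),
        "y_SVR": SVR(),
    }
    return {name: cross_val_predict(m, X, y, cv=5) for name, m in learners.items()}
```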

The idea that has inspired S-GP is not new. To our knowledge, the first related work dates back to 2006 [16], when GP was used to combine 10 Artificial Neural Networks into an ensemble. Since then, several other important contributions have been proposed [4], [9], [13], [18]. Nevertheless, we consider that research on S-GP is still much in demand, mainly for the two following reasons. First, the majority of S-GP contributions are assessed on classification problems, whereas few of the previous works provide a systematic benchmark on regression problems. The second reason has to do with recent methodological achievements in the GP field: Geometric Semantic Operators, the Semantic Stopping Criterion, ϵ-Lexicase Selection (ϵ-LS) and the Evolutionary Demes Despeciation Algorithm (EDDA) are among the numerous recent methods that deserve to be attentively studied in the context of S-GP.

The paper is organized as follows: Section 2 introduces the necessary theoretical background; Section 3 discusses existing works on ensemble learning, focusing on stacking and on the use of GP for ensembles; Section 4 describes the research hypothesis and the proposed approach; Section 5 presents the studied test problems and our experimental framework; and Section 6 reports the obtained results. Finally, Section 7 concludes the work and proposes ideas for future research.

Theoretical background

This section recalls the recent contributions in the field of GP that we considered in this study: Geometric Semantic Genetic Programming, the Evolutionary Demes Despeciation Algorithm, ϵ-Lexicase Selection for regression, and the Semantic Stopping Criterion.
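
Of these, ϵ-Lexicase Selection is the most compact to state procedurally. The sketch below is our own illustrative implementation of the common static-ϵ variant for regression, where each case’s ϵ is the median absolute deviation (MAD) of the population’s errors on that case; it is a sketch of the general technique, not code from the paper:

```python
# epsilon-lexicase selection sketch for regression (selects one parent per call).
import numpy as np

def epsilon_lexicase_select(errors, rng):
    """Select one parent index. `errors` is (n_individuals, n_cases) absolute errors."""
    n_ind, n_cases = errors.shape
    # Static per-case epsilon: MAD of the population's errors on each case.
    med = np.median(errors, axis=0)
    eps = np.median(np.abs(errors - med), axis=0)
    candidates = np.arange(n_ind)
    # Filter the pool case by case, in random order, keeping near-elite individuals.
    for c in rng.permutation(n_cases):
        best = errors[candidates, c].min()
        candidates = candidates[errors[candidates, c] <= best + eps[c]]
        if candidates.size == 1:
            return int(candidates[0])
    return int(rng.choice(candidates))

# Usage sketch: errors[i, j] = |prediction of individual i on case j - target j|.
rng = np.random.default_rng(0)
parent = epsilon_lexicase_select(rng.random((50, 20)), rng)
```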

Previous and Related GP Uses as a Meta-Learning Technique

The idea of using GP as an automatic EL technique is not new. To our knowledge, the first evidence comes from 2006 [16], when GP was used to combine 10 ANNs into an ensemble. In [9], the authors compared an equivalent approach against three ensemble approaches based on Genetic Algorithms (GAs). The experimental results, involving four synthetic and one real-world symbolic regression tasks, confirmed the preeminence of the GP-based ensemble not only over the best base learner, but also over the three GA-based approaches.

Individuals’ Representation in S-GP

In the context of S-GP, individuals’ representation is similar to traditional GP; the only difference is the set of terminals T, which is composed of the predicted outputs (ŷ) of the base learners trained to solve a given SML problem. In our experiments, the set of base learners is made of Multi-Linear Regression (MLR), Multi-Layer Perceptron Regression (MLPR), Random Forests Regression (RFR) and Support-Vector Regression (SVR); thus, the formal definition of our terminal set is T = {ŷ_MLR, ŷ_MLPR, ŷ_RFR, ŷ_SVR}.
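
For illustration, the toy sketch below (our own encoding, not the authors’ implementation) evaluates an S-GP individual represented as a nested-tuple expression tree whose leaves are the terminal predictions, showing how GP can combine base-learner outputs non-linearly:

```python
# Toy S-GP individual: an expression tree whose leaves are base-learner predictions.
import numpy as np

def evaluate(node, terminals):
    """Recursively evaluate a tree; string leaves name entries of `terminals`."""
    if isinstance(node, str):  # leaf: a terminal such as "y_RFR"
        return terminals[node]
    op, left, right = node
    a, b = evaluate(left, terminals), evaluate(right, terminals)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    # Protected division: a common GP convention, not necessarily the authors' choice.
    return np.where(np.abs(b) > 1e-9, a / np.where(np.abs(b) > 1e-9, b, 1.0), 1.0)

# Example individual: (y_RFR * y_SVR + y_MLR) - y_MLPR
tree = ("-", ("+", ("*", "y_RFR", "y_SVR"), "y_MLR"), "y_MLPR")
preds = {k: np.random.rand(5) for k in ("y_MLR", "y_MLPR", "y_RFR", "y_SVR")}
print(evaluate(tree, preds))
```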

Experimental Environment

This section states the objectives of the study and presents the experimental parameters and configurations.

Experimental Results

This section presents the experimental results and discusses the main findings.

Conclusion

This paper presents a study of Genetic Programming (GP) in the context of Stacked Generalization. More specifically, we have explored GP’s role as the meta-learning algorithm that blends, in an evolutionary fashion, the combined outputs of other Supervised Machine Learning (SML) methods, such as Multi-Linear Regression, Multi-Layer Perceptron, Random Forests and Support-Vector Machines. The contribution of this work is three-fold. First, we assess the impact of recent scientific achievements in the GP field (geometric semantic operators, EDDA, ϵ-lexicase selection, and the semantic stopping criterion) on stacked generalization.

CRediT authorship contribution statement

Illya Bakurov: Conceptualization, Methodology, Software, Writing - original draft. Mauro Castelli: Conceptualization, Methodology, Validation, Writing - review & editing, Supervision, Funding acquisition. Olivier Gau: Methodology, Software. Francesco Fontanella: Methodology, Validation, Writing - review & editing, Project administration. Leonardo Vanneschi: Methodology, Validation, Writing - review & editing, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia) by the projects PTDC/CCI-INF/29168/2017, DSAIPA/DS/0022/2018 and DSAIPA/DS/0113/2019. Mauro Castelli also acknowledges the financial support from the Slovenian Research Agency (research core funding No. P5-0410).

References (39)

  • D.H. Wolpert, Stacked generalization, Neural Netw. (1992)
  • Statlib datasets archive: Boston house prices dataset, 1980, (http://lib.stat.cmu.edu/datasets/boston). Accessed:...
  • Genetic programming and evolvable machines (8): Genetic programming for computational pharmacokinetics in drug...
  • Least angle regression, lasso and forward stagewise in r: Diabetes dataset, 2013,...
  • N. Acosta-Mendoza et al., Learning to assemble classifiers via genetic programming, Int. J. Pattern Recognit. Artific. Intell. (2014)
  • R. Alhalaseh et al., Stacked generalization concept for electrical load prediction, 2019 4th International Conference on Smart and Sustainable Technologies (SpliTech) (2019)
  • U. Bhowan et al., Reusing genetic programming for ensemble selection in classification of unbalanced data, IEEE Trans. Evolut. Comput. (2014)
  • L. Breiman, Stacked regressions, Mach. Learn. (1996)
  • L. Buitinck et al., API design for machine learning software: experiences from the scikit-learn project, ECML PKDD Workshop: Languages for Data Mining and Machine Learning (2013)
  • V. Bukhtoyarov et al., Comprehensive evolutionary approach for neural network ensemble automatic design (2010)
  • M. Castelli et al., An efficient implementation of geometric semantic genetic programming for anticoagulation level prediction in pharmacogenetics, Portuguese Conference on Artificial Intelligence (2013)
  • M. Castelli et al., Pruning techniques for mixed ensembles of genetic programming models, Genetic Program. (2018)
  • B.P. Evans, Population-based Ensemble Learning with Tree Structures for Classification (2019)
  • O. Flasch et al., Learning model-ensemble policies with genetic programming, Tech. Rep. (2015)
  • I. Gonçalves et al., Unsure when to stop? Ask your semantic neighbors, Proceedings of the Genetic and Evolutionary Computation Conference (2017)
  • I. Gonçalves et al., Semantic learning machine: a feedforward neural network construction algorithm inspired by geometric semantic genetic programming, 17th Portuguese Conference on Artificial Intelligence, EPIA 2015, Coimbra, Portugal (2015)
  • U. Johansson et al., Building neural network ensembles using genetic programming, The 2006 IEEE International Joint Conference on Neural Network Proceedings (2006)
  • S. Karakatič et al., Building boosted classification tree ensemble with genetic programming, Proceedings of the Genetic and Evolutionary Computation Conference Companion (2018)
  • P. Kordík et al., Building predictive models in two stages with meta-learning templates optimized by genetic programming, 2014 IEEE Symposium on Computational Intelligence in Ensemble Learning (CIEL) (2014)