Genetic programming for stacked generalization
Introduction
Ensemble learning (EL) is a sub-field of machine learning (ML) inspired by humans’ natural tendency to seek and weigh others’ opinions before making important decisions. From this perspective, EL consists of combining several individual models, called base learners, so as to produce a model (the ensemble) that is expected to solve a given task better than any of the base learners alone [26]. Stacked generalization, or simply stacking, consists of training an ensemble from the combined predictions of several, ideally heterogeneous, base learners. More specifically, the base learners are first trained to solve the underlying task, and a meta-learner is subsequently trained on their predictions [38].
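As a concrete illustration of the standard stacking recipe just described, the sketch below uses scikit-learn's StackingRegressor with a linear meta-learner; the estimator choices and synthetic data are illustrative assumptions, not the configuration studied in this paper.

```python
# Minimal sketch of stacked generalization; models and data are illustrative.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# Heterogeneous base learners: their cross-validated predictions become the
# meta-learner's training inputs.
base_learners = [
    ("mlr", LinearRegression()),
    ("rfr", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("svr", SVR()),
]
stack = StackingRegressor(estimators=base_learners, final_estimator=Ridge())
stack.fit(X, y)
print(stack.predict(X[:3]).shape)  # (3,)
```

Note that StackingRegressor trains the meta-learner on out-of-fold predictions of the base learners, which mitigates the leakage that would arise from reusing in-sample predictions.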
In this paper, we study Genetic Programming (GP) as a meta-learner in the context of regression problems. As a case study, we have chosen four different kinds of base learners: Multi-Linear Regression (MLR), Multi-Layer Perceptron Regression (MLPR), Random Forests Regression (RFR), and Support-Vector Regression (SVR). This choice was mainly motivated by the desired properties of the ensemble: the expected diversity of the underlying base learners and, particular to stacked generalization, the convenience of not having to choose a single appropriate type of base learner. In this context, we decided to consider four types of ML algorithms that differ not only in their complexity, but also in their regression estimation principles. In this way, we allow GP to automatically evolve computer programs whose terminal set is the combined predictions of four heterogeneous base learners. The objective is to improve the ensemble’s generalization ability on a given Supervised ML (SML) task. For the sake of simplicity, we will refer to this kind of approach as stacking-GP (S-GP). Our motivation relies on the intrinsic properties of GP: we expect that, with properly chosen operators and hyper-parameters, GP is capable of combining base learners in a highly non-linear fashion, which could better exploit their outputs and achieve superior generalization ability.
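The data-preparation step implied above, collecting the four base learners' predictions to serve as GP terminals, can be sketched as follows; the hyper-parameters and the use of out-of-fold predictions via cross_val_predict are our assumptions, not the authors' exact protocol.

```python
# Hypothetical sketch: out-of-fold predictions of the four base learner types
# (MLR, MLPR, RFR, SVR) collected as GP terminals; hyper-parameters are
# assumptions, not the authors' configuration.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

X, y = make_regression(n_samples=150, n_features=4, noise=0.2, random_state=1)

base_learners = {
    "y_mlr": LinearRegression(),
    "y_mlpr": MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=1),
    "y_rfr": RandomForestRegressor(n_estimators=50, random_state=1),
    "y_svr": SVR(),
}

# Each column of `terminals` is one base learner's cross-validated prediction;
# a GP meta-learner would evolve programs over these columns instead of X.
terminals = np.column_stack(
    [cross_val_predict(m, X, y, cv=5) for m in base_learners.values()]
)
print(terminals.shape)  # (150, 4)
```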
The idea that has inspired S-GP is not new. To our knowledge, the first related work dates back to 2006 [16], when GP was used to combine 10 Artificial Neural Networks into an ensemble. Since then, several other important contributions have been proposed [4], [9], [13], [18]. Nevertheless, we consider that research in S-GP is still very much in demand, mainly for the two following reasons. First, the majority of S-GP contributions are assessed on classification problems, whereas few of the previous works provide a concise benchmark over regression problems. The second reason has to do with recent methodological achievements in the GP field: Geometric-Semantic Operators, the Semantic Stopping Criterion, ε-Lexicase Selection (ε-LS), and the Evolutionary Demes Despeciation Algorithm (EDDA) are among the numerous recent methods that deserve to be attentively studied in the context of S-GP.
The paper is organized as follows: Section 2 introduces the necessary theoretical background; Section 3 discusses existing work on ensemble learning, focusing on stacking and on the use of GP for ensembles; Section 4 describes the research hypothesis and the proposed approach; Section 5 presents the studied test problems and our experimental framework; and Section 6 reports the obtained results. Finally, Section 7 concludes the work and proposes ideas for future research.
Theoretical background
This section recalls the recent contributions in the field of GP that we consider in this study: Geometric Semantic Genetic Programming, the Evolutionary Demes Despeciation Algorithm, ε-Lexicase Selection for Regression, and Semantic Stopping Criteria.
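One of the recalled techniques, ε-Lexicase Selection for regression, can be sketched as follows; the MAD-based ε and random tie-breaking follow the common formulation in the literature, while the error values used here are illustrative.

```python
# Sketch of ε-lexicase parent selection for regression: training cases are
# visited in random order, and only candidates within ε (the median absolute
# deviation of errors on that case) of the best error survive each step.
import numpy as np

rng = np.random.default_rng(0)

def epsilon_lexicase_select(error_matrix):
    """error_matrix[i, j] = absolute error of individual i on training case j."""
    candidates = np.arange(error_matrix.shape[0])
    cases = rng.permutation(error_matrix.shape[1])
    for j in cases:
        errs = error_matrix[candidates, j]
        # ε is the median absolute deviation of errors on this case.
        eps = np.median(np.abs(errs - np.median(errs)))
        candidates = candidates[errs <= errs.min() + eps]
        if len(candidates) == 1:
            return candidates[0]
    return rng.choice(candidates)

# 4 individuals, 3 cases: individual 2 clearly dominates on every case.
errors = np.array([[0.9, 0.8, 0.7],
                   [0.2, 0.9, 0.6],
                   [0.1, 0.1, 0.1],
                   [0.5, 0.4, 0.9]])
print(epsilon_lexicase_select(errors))  # 2
```

Because selection pressure is applied case by case rather than on aggregated error, ε-LS tends to preserve specialists that excel on subsets of the training data.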
Previous and Related GP Uses as a Meta-Learning Technique
The idea of using GP as an automatic EL technique is not new. To our knowledge, the first evidence comes from 2006 [16], when GP was used to combine 10 ANNs into an ensemble. In [9], the authors compared an equivalent approach against 3 ensemble approaches based on Genetic Algorithms (GAs). The experimental results, involving four synthetic and one real-world symbolic regression tasks, confirmed the preeminence of the GP-based ensemble not only over the best base learner, but also over the 3
Individuals’ Representation in S-GP
In the context of S-GP, the individuals’ representation is similar to traditional GP; the only difference is the terminal set T, which is composed of the predicted outputs (ŷ) of the base learners trained to solve a given SML problem. In our experiments, the set of base learners is made of Multi-Linear Regression (MLR), Multi-Layer Perceptron Regression (MLPR), Random Forests Regression (RFR), and Support-Vector Regression (SVR); thus, the formal definition of our terminal set is given by T = {ŷ_MLR, ŷ_MLPR, ŷ_RFR, ŷ_SVR}.
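A hypothetical S-GP individual over such a terminal set might look as follows; the particular expression is invented for illustration and is not an evolved program from the paper.

```python
# Hypothetical illustration of an S-GP individual: a program whose terminals
# are the base learners' predictions (ŷ_MLR, ŷ_MLPR, ŷ_RFR, ŷ_SVR) rather
# than the original input features.
import numpy as np

# One row per observation, one column per base learner's prediction.
y_mlr, y_mlpr, y_rfr, y_svr = np.array(
    [[2.0, 1.8, 2.1, 1.9],
     [5.0, 4.6, 5.3, 4.9]]
).T

# An evolved program could combine the terminals non-linearly, e.g.:
#   (ŷ_RFR + ŷ_SVR) / 2 + 0.1 * (ŷ_MLR - ŷ_MLPR)
ensemble_prediction = (y_rfr + y_svr) / 2 + 0.1 * (y_mlr - y_mlpr)
print(ensemble_prediction)  # approximately [2.02, 5.14]
```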
Experimental Environment
This section states the study’s objectives and presents the experimental parameters and configurations.
Experimental Results
This section presents the experimental results and discusses the main findings.
Conclusion
This paper presents a study of Genetic Programming (GP) in the context of Stacked Generalization. More specifically, we have explored GP’s role as the meta-learning algorithm that blends, in an evolutionary fashion, the combined outputs of other Supervised Machine Learning (SML) methods, such as Multi-Linear Regression, Multi-Layer Perceptron, Random Forests, and Support-Vector Machines. The contribution of this work is three-fold. First, we assess the impact of recent scientific achievements in
CRediT authorship contribution statement
Illya Bakurov: Conceptualization, Methodology, Software, Writing - original draft. Mauro Castelli: Conceptualization, Methodology, Validation, Writing - review & editing, Supervision, Funding acquisition. Olivier Gau: Methodology, Software. Francesco Fontanella: Methodology, Validation, Writing - review & editing, Project administration. Leonardo Vanneschi: Methodology, Validation, Writing - review & editing, Project administration, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia) by the projects PTDC/CCI-INF/29168/2017, DSAIPA/DS/0022/2018 and DSAIPA/DS/0113/2019. Mauro Castelli also acknowledges the financial support from the Slovenian Research Agency (research core funding No. P5-0410).
References (39)
- Stacked generalization. Neural Netw. (1992)
- Statlib datasets archive: Boston house prices dataset, 1980, http://lib.stat.cmu.edu/datasets/boston. Accessed:...
- Genetic programming for computational pharmacokinetics in drug... Genetic Programming and Evolvable Machines (8)
- Least angle regression, lasso and forward stagewise in R: Diabetes dataset, 2013,...
- Learning to assemble classifiers via genetic programming. Int. J. Pattern Recognit. Artif. Intell. (2014)
- Stacked generalization concept for electrical load prediction. 2019 4th International Conference on Smart and Sustainable Technologies (SpliTech) (2019)
- Reusing genetic programming for ensemble selection in classification of unbalanced data. IEEE Trans. Evolut. Comput. (2014)
- Stacked regressions. Mach. Learn. (1996)
- API design for machine learning software: experiences from the scikit-learn project. ECML PKDD Workshop: Languages for Data Mining and Machine Learning (2013)
- Comprehensive evolutionary approach for neural network ensemble automatic design (2010)
- An efficient implementation of geometric semantic genetic programming for anticoagulation level prediction in pharmacogenetics. Portuguese Conference on Artificial Intelligence
- Pruning techniques for mixed ensembles of genetic programming models. Genetic Program.
- Population-based ensemble learning with tree structures for classification
- Learning model-ensemble policies with genetic programming. Tech. Rep.
- Unsure when to stop? Ask your semantic neighbors. Proceedings of the Genetic and Evolutionary Computation Conference
- Semantic learning machine: a feedforward neural network construction algorithm inspired by geometric semantic genetic programming. 17th Portuguese Conference on Artificial Intelligence, EPIA 2015, Coimbra, Portugal
- Building neural network ensembles using genetic programming. The 2006 IEEE International Joint Conference on Neural Network Proceedings
- Building boosted classification tree ensemble with genetic programming. Proceedings of the Genetic and Evolutionary Computation Conference Companion
- Building predictive models in two stages with meta-learning templates optimized by genetic programming. 2014 IEEE Symposium on Computational Intelligence in Ensemble Learning (CIEL)