Cognitive Systems Research

Volume 65, January 2021, Pages 23-39

Adaptive sampling for active learning with genetic programming

https://doi.org/10.1016/j.cogsys.2020.08.008

Abstract

Active learning is a machine learning paradigm in which the learner decides which inputs to use for training. It has been introduced to Genetic Programming (GP) essentially through dynamic data sampling, which is used to address known issues such as computational cost, over-fitting, and imbalanced databases. Traditional dynamic sampling for GP provides the algorithm with a new sample periodically, often at each generation, without considering the state of the evolution. As a result, individuals may not have enough time to extract the hidden knowledge. An alternative approach is to use information about the learning state to adapt the periodicity of the training data change. In this work, we propose an adaptive sampling strategy for classification tasks based on the state of solved fitness cases throughout learning. It is a flexible approach that can be applied with any dynamic sampling method. We implemented several sampling algorithms extended with deterministic and adaptive control of the re-sampling frequency, and applied them with GP to the KDD intrusion detection and Adult income prediction problems. The experimental study demonstrates how controlling the sampling frequency preserves the power of dynamic sampling while offering possible improvements in learning time and quality. We also demonstrate that adaptive sampling can be an alternative to multi-level sampling. This work opens several promising extension paths.

Introduction

Evolutionary Algorithms (EA) (Pétrowski and Ben Hamida, 2017, Simon, 2013, Yu and Gen, 2010) are meta-heuristics suited to a wide range of problems, such as complex optimization, identification, machine learning, and adaptation problems. Applied to machine learning, Evolutionary Algorithms, and especially Genetic Programming (GP) (Koza, 1992), have proven very effective on a wide range of supervised and unsupervised learning problems. However, their flexibility and expressiveness come with two major flaws: an excessive computational cost and problematic parameter setting.

In the field of supervised learning, a lack of data may lead to unsatisfactory learners. This is no longer an issue given the numerous data sources and high data volumes that we witness in the era of Big Data. Nonetheless, this abundance aggravates the computational problem of GP and precludes its application to data-intensive problems. There have been various research efforts on improving GP when applied to large datasets, including hardware solutions, such as parallelization, and algorithmic solutions. The most affordable are software-based solutions, which do not require any specific hardware configuration. Sampling is the mainstream approach in this category: it reduces processing time by reducing the data while keeping the relevant records.

A complete review of sampling methods used with GP is published in Hmida, Ben Hamida, Borgi, and Rukoz (2016b), extended with a discussion of their ability to deal with large datasets. Sampling methods can be classified with regard to three properties: re-sampling frequency, sampling scheme (or strategy), and sampling quantity. The sampling strategy defines how records are selected from the input database. The sampling quantity defines how many records are needed by the algorithm. The sampling frequency defines when the sampling technique is applied throughout the training process. The last property is the focus of this study.
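These three properties can be read as the fields of a sampler configuration. The sketch below is purely illustrative: the name SamplingConfig and its field names are our own shorthand, not the paper's.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class SamplingConfig:
    # How records are selected from the database (sampling scheme/strategy).
    strategy: Callable[[Sequence], Sequence]
    # How many records each sample contains (sampling quantity).
    quantity: int
    # Every how many generations the sample is renewed (re-sampling frequency).
    frequency: int
```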

According to the re-sampling frequency, machine learning algorithms use either a unique sample or a renewable one; these settings are called static and dynamic sampling respectively. On the one hand, in static sampling for GP, like Historical Subset Selection (Gathercole & Ross, 1994) and bagging/boosting (Iba, 1999, Paris et al., 2003), a representative training set must be selected beforehand. With large datasets, this poses the problem of combining the downsizing and data coverage objectives. On the other hand, dynamic sampling creates a new sample each generation according to its selection strategy. Consequently, GP individuals may not have enough time to learn from the sampled data, and the population might waste good material for solving difficult cases in the current training set. Moreover, re-sampling at each GP iteration can be computationally expensive, especially when using sophisticated sampling strategies.

We propose, in this paper, an extension to dynamic sampling techniques in which sample renewal is controlled through a parameter that adapts the sampling to the learning process. This extension aims to preserve the original sampling strategy while enhancing learning robustness and/or learning time.

After studying the effect of the re-sampling frequency on training quality and learning time, we propose two predicates that implement adaptive sampling based on the status of resolved fitness cases. These predicates are tested and compared with two deterministic variation rules defined by functions with increasing and decreasing patterns. The objective of this study is to demonstrate that controlling the sampling frequency with deterministic or adaptive functions does not degrade the results; on the contrary, in some cases it improves quality and learning time.

This paper is organized as follows. The next section gives an overview of adaptive sampling in active machine learning. In Section 3, we present the background of this work in GP and the design decisions needed to add dynamic sampling to the GP engine. Section 4 reviews the sampling methods for active learning with GP that are involved in the experimental study. In Section 5, we study the effect of varying the sampling frequency on the genetic learners. Section 6 introduces the novel sampling approach and explains how it can extend dynamic sampling methods. Then, Section 7 presents an experimental study giving a proof of concept of adaptive sampling, and Section 8 traces its effect on the learning process through a discussion of the recorded results. The main results in this section are compared with results of three multi-level dynamic sampling methods published in Hmida, Ben Hamida, Borgi, and Rukoz (2016a) to demonstrate how adaptive sampling can be an alternative to hierarchical sampling. Finally, we give some conclusions and propose further developments.

Section snippets

Related works: adaptive sampling

In this paper, we are mainly interested in sampling methods that aim to reduce the size of the original training dataset by substituting for it a much smaller representative subset, thus reducing the evaluation cost of the learning algorithm. Two major classes of sampling techniques can be laid out: static sampling, where the training set is selected independently of the training process and remains unmodified along the evolution, and active sampling, also known as active learning, which can be defined as …

Genetic programming engine

Like any EA, GP evolves a population of individuals throughout a number of generations. A generation is in fact an iteration of the main loop, as described in Fig. 1. Each individual represents a complete mathematical expression or a small computer program. Standard GP uses a tree representation of individuals, built from a function set for the nodes and a terminal set for the leaves. When GP is applied to a classification problem, each individual is a candidate classifier. Thus, the objective is to …
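For readers unfamiliar with this loop, the sketch below outlines one possible generational GP cycle. It is a minimal illustration, not the authors' implementation; gp_loop and the select, crossover, and mutate operators are placeholder names.

```python
import random

def gp_loop(population, fitness, sample, n_generations, select, crossover, mutate):
    """Minimal generational GP loop: evaluate on the current fitness cases,
    select parents, and breed the next population (illustrative sketch)."""
    for gen in range(n_generations):
        scores = [fitness(ind, sample) for ind in population]  # evaluate on the sample
        parents = select(population, scores)                   # e.g. tournament selection
        offspring = []
        while len(offspring) < len(population):
            a, b = random.sample(parents, 2)
            offspring.append(mutate(crossover(a, b)))          # subtree crossover + mutation
        population = offspring
    return population
```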

Active sampling with GP

To select a training subset S from the database B, many approaches have been proposed, either for static or for active sampling. For static sampling, the database is partitioned before the learning process, based essentially on some criteria or features in the data. This sampling strategy is not discussed in this paper. For active sampling, we identify basically five main approaches used with GP: stochastic sampling, weighted sampling, data-topology based sampling, balanced sampling and incremental …
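As an illustration of the stochastic family, a uniform random draw in the spirit of Random Subset Selection (Gathercole & Ross, 1994) might look like the following sketch; the function name and parameters are ours, not the original code.

```python
import random

def random_subset_selection(database, sample_size, rng=random):
    """Stochastic sampling sketch: draw a uniform random subset S from the
    database B each time it is invoked (illustrative only)."""
    return rng.sample(database, sample_size)
```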

The sampling frequency feature

Sampling frequency (f) is a main parameter of any active sampling technique. It defines how often the training subset is changed across the learning process. When f = 1, the training sample is extracted at each generation, and the sampling approach is considered a generation-wise sampling technique. Most of the sampling techniques applied with GP belong to this category; this is the case of the techniques described in Section 4. When f is set to 1, individuals in the current population have …
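Concretely, the frequency parameter can be read as a gate on sample renewal, as in the following sketch (the function and parameter names are ours; gen is the current generation index).

```python
def maybe_resample(gen, f, database, sampler, current_sample):
    """Renew the training sample only every f generations; f = 1 degenerates
    to generation-wise sampling (illustrative sketch)."""
    if gen % f == 0:
        return sampler(database)
    return current_sample
```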

The proposed sampling approach

Three main approaches are possible to control any EA parameter: deterministic, adaptive and self-adaptive (Eiben, Michalewicz, Schoenauer, & Smith, 2007). Deterministic control uses a deterministic rule to alter the EA parameter along the evolution, whereas adaptive control uses feedback from the search state to define a strategy for updating the parameter value. With self-adaptive control, the parameter is encoded within the chromosome and evolves with the population.

The …
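The first two control styles can be sketched as follows. Both functions are our own illustrations under assumed schedules and thresholds; the paper's actual predicates are based on the status of resolved fitness cases but may differ in detail.

```python
def deterministic_f(gen, f0=1, step=10):
    # Deterministic control: f follows a fixed, pre-defined schedule of the
    # generation number (here an increasing pattern; the schedule is assumed).
    return f0 + gen // step

def should_resample(n_solved_cases, sample_size, threshold=0.9):
    # Adaptive control: use feedback from the search state, here the fraction
    # of fitness cases already solved by the population (threshold is assumed).
    return n_solved_cases / sample_size >= threshold
```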

Cartesian Genetic Programming

Cartesian Genetic Programming (CGP) (Miller & Thomson, 2000) is a GP variant where individuals represent graph-like programs. It is called "Cartesian" because it uses a two-dimensional grid of computational nodes implementing directed acyclic graphs. Each graph node encodes a function from the function set; the arguments of the encoded function are provided by the inputs of the node, and its output carries the result.
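To make the representation concrete, a toy decoder for a single-row CGP genotype is sketched below. It assumes binary functions and ignores CGP details such as levels-back constraints and multiple outputs; all names are ours.

```python
def eval_cgp(genotype, functions, inputs, output_addr):
    """Evaluate a single-row CGP genotype: each gene is (fn_index, in1, in2),
    where in1/in2 address either program inputs or earlier nodes, and
    output_addr selects the value returned as the program's result."""
    values = list(inputs)              # addresses 0..len(inputs)-1 are the inputs
    for fn_idx, a, b in genotype:      # feed-forward pass over the node grid
        values.append(functions[fn_idx](values[a], values[b]))
    return values[output_addr]
```

For instance, with functions = [lambda x, y: x + y, lambda x, y: x * y], the genotype [(0, 0, 1), (1, 2, 0)] with output_addr = 3 computes (x0 + x1) * x0 on inputs (x0, x1).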

CGP shows several advantages over other GP approaches. Unlike trees, there are …

Results and discussion

The experimental study is organised in two parts. The aim of the first set of experiments is to study the impact of varying the sampling frequency on the learning time and the performance indicators; the set of fixed values chosen for f in this study is given in Section 7.5. The second set of experiments studies the efficiency of the proposed sampling-frequency control strategies. Results are discussed and then compared with some hierarchical sampling results published in previous works.

For each …

Conclusion

This work proposes a new form of active learning with Genetic Programming based on adaptive sampling. Its main objective is to extend some known dynamic sampling techniques with an adaptive frequency control that takes the state of the learning process into account. After a study of the impact of sampling frequency variation on the performance of the derived models and on learning time, we proposed an increasing and a decreasing deterministic pattern, and two adaptive patterns for …

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (41)

  • X.B. Li et al. Adaptive data reduction for large-scale transaction data. European Journal of Operational Research (2008)
  • L. Luo et al. Sampling-based adaptive bounding evolutionary algorithm for continuous optimization problems. Information Sciences (2017)
  • Atlas, L. E., Cohn, D., & Ladner, R. (1990). Training connectionist networks with queries and selective sampling. In...
  • Balkanski, E. & Singer, Y. (2018a). The adaptive complexity of maximizing a submodular function. In: I. Diakonikolas,...
  • Balkanski, E. & Singer, Y. (2018b). Approximation guarantees for adaptive sampling. In: J.G. Dy, A. Krause (eds.)...
  • CGP. (2009). Cartesian GP website....
  • D. Cohn et al. Improving generalization with active learning. Machine Learning (1994)
  • Curry, R. & Heywood, M. I. (2004). Towards efficient training on large datasets for genetic programming. In Advances in...
  • D. Deschrijver et al. Adaptive sampling algorithm for macromodeling of parameterized S-parameter responses. IEEE Transactions on Microwave Theory and Techniques (2011)
  • A.E. Eiben et al. Parameter control in evolutionary algorithms
  • A.A. Freitas. Data mining and knowledge discovery with evolutionary algorithms (2002)
  • Y. Fu et al. A survey on instance selection for active learning. Knowledge and Information Systems (2013)
  • Gathercole, C. (1998). An investigation of supervised learning in genetic programming. Thesis, University of...
  • C. Gathercole et al. Dynamic training subset selection for supervised learning in genetic programming
  • C. Gathercole et al. Small populations over many generations can beat large populations over few generations in genetic programming
  • Gonçalves, I. & Silva, S. (2013). Balancing learning and overfitting in genetic programming with interleaved sampling...
  • L. Haitao et al. A survey of adaptive sampling for global metamodeling in support of simulation-based complex engineering design. Structural and Multidisciplinary Optimization (2018)
  • S. Harding et al. Implementing Cartesian genetic programming classifiers on graphics processing units using GPU.NET
  • Hmida, H., Ben Hamida, S., Borgi, A., & Rukoz, M. (2016a). Hierarchical data topology based selection for large scale...
  • Hmida, H., Ben Hamida, S., Borgi, A., & Rukoz, M. (2016b). Sampling methods in genetic programming learners from large...