Applied Soft Computing

Volume 52, March 2017, Pages 760-770

Hybridizing Cartesian Genetic Programming and Harmony Search for adaptive feature construction in supervised learning problems

https://doi.org/10.1016/j.asoc.2016.09.049

Highlights

  • We present a new iterative feature construction approach for supervised learning models based on the meta-heuristic Harmony Search (HS) algorithm and Cartesian Genetic Programming.

  • We propose a novel method to incorporate soft information about the relevance of the constructed features in the HS algorithm so as to enhance its convergence.

  • The performance of the proposed scheme is assessed over datasets from the literature, with promising results that support its suitability to deal with legacy datasets.

Abstract

The advent of the so-called Big Data paradigm has motivated a flurry of research aimed at enhancing machine learning models by following very diverse approaches. In this context, this work focuses on the automatic construction of features in supervised learning problems, which differs from the conventional selection of features in that new characteristics with enhanced predictive power are inferred from the original dataset. In particular, this manuscript proposes a new iterative feature construction approach based on a self-learning meta-heuristic algorithm (Harmony Search) and a solution encoding strategy (namely, Cartesian Genetic Programming) suited to represent combinations of features by means of constant-length solution vectors. The proposed feature construction algorithm, coined as Adaptive Cartesian Harmony Search (ACHS), incorporates modifications that allow exploiting the estimated predictive importance of intermediate solutions and, ultimately, attaining a better convergence rate in its iterative learning procedure. The performance of the proposed ACHS scheme is assessed and compared to that rendered by the state of the art in a toy example and three practical use cases from the literature. The excellent performance figures obtained in these problems shed light on the widespread applicability of the proposed scheme to supervised learning with legacy datasets composed of already refined characteristics.

Introduction

Predictive analytics is broadly conceived as the family of supervised machine learning models aimed at inferring unknown outcomes of a system from a set of observed variables or features [1]. Although supervised learning models date back several decades, predictive analytics has nowadays regained momentum by virtue of the ever-growing availability of data in most fields of knowledge. Hot topics such as Intelligent Systems [2] and Big Data [3], [4] evince the increasing relevance of predictive modeling across disciplines and the subsequent need to enhance and innovate throughout all of its constituent processing steps [5]: (1) data preparation and cleansing, with different strategies to impute missing and/or illegal data depending on their alphabet; (2) novelty/outlier detection; (3) feature processing, where the original dataset is processed/transformed/filtered so as to describe the essential features of the data and reduce the complexity of the subsequent predictive model; (4) model selection, where diverse alternatives characterized by different controlling parameters, training algorithms, discriminative capability and generalization properties have so far been reported in the literature; (5) model tuning; and (6) model performance assessment when predicting a set of unseen examples. Only by thoroughly elaborating on each of these steps is a good predictive model generated.
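To make steps (3)-(6) of this pipeline concrete, the following is a minimal sketch using scikit-learn as an illustrative toolkit; the paper does not prescribe this library, and the dataset, scaler, reducer and model choices below are our assumptions:

```python
# Minimal sketch of pipeline steps (3)-(6), assuming scikit-learn.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),          # (3) feature processing
    ("reduce", PCA(n_components=5)),      # (3) dimensionality reduction
    ("model", RandomForestClassifier()),  # (4) model selection
])
# (5) model tuning via cross-validated grid search
search = GridSearchCV(pipe, {"model__n_estimators": [50, 200]}, cv=5)
search.fit(X_tr, y_tr)
# (6) performance assessment on unseen examples
print("held-out accuracy:", search.score(X_te, y_te))
```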

This manuscript gravitates on the third processing step as enumerated above: feature processing. The literature has been especially prolific in this regard, with de facto classifications depending on the selective or constructive nature of the feature processing approach at hand. On the one hand, feature selection schemes essentially select a subset of the original features by following different strategies (filter, wrapper or embedded methods). Interestingly for the scope of this manuscript, meta-heuristically empowered feature selection schemes have lately come onto the scene in a diversity of scenarios [6], [7], [8], [9], [10], [11], [12], [13], with particular emphasis on Energy applications [14], [15] and Bioinformatics [16], [17]. On the other hand, feature extraction/construction or dimensionality reduction algorithms transform the original dataset into a feature space of fewer dimensions, which can be done by resorting to elements from linear statistics [18] or newer findings in the field of non-linear manifold learning and low-dimensional embedding [19].

This research work focuses on the second category, specifically on the construction of features via wrapper methods. This class of methods is of paramount utility when dealing with legacy datasets, i.e. datasets whose constituent features result from raw information preprocessed through application-specific signal processing stages. In such situations there is no access to the original data from which the features were extracted, which jeopardizes the adoption of embedded schemes with known potential in highly multidimensional datasets (e.g. deep learning). The scope is also placed on the readability of the constructed features, which is not only useful for assessing mathematical properties therefrom (e.g. trends, correlations), but also becomes a requirement in application scenarios where supervision by a higher-level entity and/or the preservation of privacy are crucial, such as risk assessment in banking and insurance, the diagnosis of diseases and the personalized prescription of medical treatments. From a technical perspective, this sought explicitness of the constructed feature set can be provided by Evolutionary Programming [20], a branch of Evolutionary Computation that aims at iteratively refining computer programs based on a measure of their quality or fitness. In the context of mathematical programs, a program stands for a combination or function of different variables (features) based on an alphabet of operator functions (e.g. +, −, ×, ÷). Such programs can be represented as tree structures, which can in turn be evolved via evolutionary crossover and mutation towards regions of progressively higher optimality as measured by the fitness function at hand. When put in the context of feature construction, each evolved program represents a combination of features (i.e. a newly constructed feature), whereas the fitness function is given by the performance of the wrapped predictive model when trained with the evolved feature set. Indeed, this has been the technical approach followed by a number of contributions in which the good performance of Evolutionary Programming has been evinced in diverse practical applications of predictive modeling (see [21], [22], [23], [24], [25], [26], [27], [28], [29], [30] and the comprehensive survey in [31]).
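As a toy illustration of this wrapper principle (our construction, not the authors' code), the sketch below evaluates one hand-written candidate program combining three WINE features; its fitness is the cross-validated accuracy of a wrapped 1-NN classifier trained on the augmented feature set:

```python
# Wrapper fitness of a single candidate "program" (constructed feature).
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

def constructed_feature(X):
    # Example program over features 0, 6 and 9 with operators {x, +, /}
    return (X[:, 0] * X[:, 6]) / (1.0 + X[:, 9])

def fitness(X, y):
    # Wrapper fitness: cross-validated accuracy of the model trained on
    # the original features augmented with the constructed one.
    X_aug = np.column_stack([X, constructed_feature(X)])
    return cross_val_score(KNeighborsClassifier(1), X_aug, y, cv=5).mean()

print("wrapper fitness:", fitness(X, y))
```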

The work presented in this paper takes a step further in the state of the art of the above field by proposing a novel wrapper approach based on the combination of Cartesian Genetic Programming [32] and Harmony Search (hereafter denoted as HS, [33]). On the one hand, Cartesian Genetic Programming makes it possible to encode (represent) programs by means of strings of integers, which numerically encode the operators that relate variables to each other, their connections to the set of input features and the resulting output features fed to the model. On the other hand, Harmony Search is a meta-heuristic solver that has been widely shown to outperform other bio-inspired optimization algorithms in many applications [34]. In this manuscript we propose to blend these two techniques together to yield a feature construction wrapper that, in addition, exploits information about the predictive relevance of the produced feature set so as to enhance the convergence properties of the overall search process. The performance of the derived feature construction scheme is evaluated over four supervised learning problems – namely, the well-known WINE dataset, leaf-based plant classification (LEAF, [35]), classification of radar returns from the ionosphere (IONOSPHERE, [36]) and vehicle type recognition (VTR, [37]) – with results that dominate the best scores obtained to date. To the best of the authors' knowledge, this is the first contribution in the literature hybridizing Cartesian Genetic Programming with Harmony Search for feature construction in supervised learning.
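As a rough illustration of how a constant-length integer string can represent such a program, consider the following sketch. The triplet layout (operator, input a, input b), the protected division and the single-output convention are our simplifying assumptions, not the exact encoding of [32]:

```python
# Hedged sketch of a CGP-style genotype: a constant-length integer vector
# in which each node is a triplet (operator, input_a, input_b).
import numpy as np

OPS = {0: np.add, 1: np.subtract, 2: np.multiply,
       3: lambda a, b: a / (np.abs(b) + 1e-9)}  # protected division

def decode(genotype, X, n_inputs):
    """Evaluate a genotype on data X (columns = original features)."""
    values = [X[:, d] for d in range(n_inputs)]  # nodes 0..n_inputs-1 = inputs
    for op, a, b in zip(*[iter(genotype)] * 3):  # consume triplets
        values.append(OPS[op % len(OPS)](values[a % len(values)],
                                         values[b % len(values)]))
    return values[-1]  # last node is the constructed feature's output

X = np.random.rand(5, 4)
genotype = [2, 0, 1, 3, 4, 2]  # node 4 = x0*x1, node 5 = node4/x2 (protected)
print(decode(genotype, X, n_inputs=4))
```

Because every genotype has the same length regardless of the program it encodes, standard vector-based meta-heuristics such as HS can operate on it directly.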

The rest of the paper is structured as follows: Section 2 formally poses the construction of explicit features in supervised learning scenarios as a mathematical optimization problem. Next, Section 3 and the subsections therein delve into the proposed algorithmic approach by outlining its overall working procedure and detailing each of its constituent modules. Experimental results over the four considered datasets are presented and discussed in Sections 4 and 5; finally, Section 6 ends the paper by drawing conclusions and sketching several lines of future research.

Section snippets

Feature construction as an optimization problem

Mathematically speaking, a supervised learning problem starts from a set of available data instances $X = \{\mathbf{x}_n\}_{n=1}^{N}$, with $N$ denoting the number of instances or examples, $x_n^d$ the $d$-th feature of example $n$, and $D \doteq |\mathbf{x}_n|\ \forall n \in \{1, \ldots, N\}$ the number of features or dimensionality. Since we deal with supervised learning, samples in $X$ are associated with a value of the target variable to be predicted, all of which are collected in the label vector $\mathbf{y} \doteq \{y_n\}_{n=1}^{N}$. The goal of a supervised learning algorithm is to infer
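The preview is cut short at this point, so the following is only a plausible sketch of how the optimization objective named in the section title could be written with the notation just defined; the mapping $K(\cdot)$, the candidate space $\mathcal{K}$ and the score $f(\cdot)$ are our notational assumptions, not the paper's:

```latex
% Hedged sketch (our notation): feature construction as an optimization
% problem over a space K of candidate feature-construction programs.
\[
  K^{*} \;=\; \operatorname*{arg\,max}_{K \in \mathcal{K}}\;
  f\!\left( M_{\theta},\, \{ (K(\mathbf{x}_n),\, y_n) \}_{n=1}^{N} \right),
\]
% where K maps the D original features of each example to D' constructed
% features, M_theta is the wrapped predictive model and f(.) scores its
% cross-validated predictive performance on the transformed dataset.
```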

Proposed feature construction approach

In order to tackle the above problem in a computationally efficient fashion, we propose a novel feature construction algorithm whose overall working procedure is illustrated in Fig. 1 and algorithmically described in Algorithm 1. The proposed scheme blends together elements from wrapper and embedded methods for feature processing. On the one hand, the setup relies on a predictive learning model Mθ capable of internally estimating the relevance of each input variable when predicting the target
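Algorithm 1 is not reproduced in this preview, so the Python sketch below only mirrors the high-level loop as described above: Harmony Search improvising constant-length integer (CGP) genotypes, with memory consideration biased by the estimated relevance of the features each harmony produces. The parameter names (hms, hmcr, par) and the relevance-weighted rule are our assumptions, not the authors' implementation:

```python
# Toy sketch of an ACHS-style loop: HS over integer CGP genotypes.
import random

def achs(fitness, relevance, genome_len, n_symbols,
         hms=20, hmcr=0.9, par=0.3, iters=200):
    """Harmony Search over constant-length integer genotypes."""
    memory = [[random.randrange(n_symbols) for _ in range(genome_len)]
              for _ in range(hms)]
    scores = [fitness(h) for h in memory]
    for _ in range(iters):
        weights = relevance(memory)          # soft info on feature relevance
        new = []
        for g in range(genome_len):
            if random.random() < hmcr:       # memory consideration, biased
                allele = random.choices(memory, weights=weights)[0][g]
                if random.random() < par:    # pitch adjustment
                    allele = random.randrange(n_symbols)
            else:                            # random improvisation
                allele = random.randrange(n_symbols)
            new.append(allele)
        s = fitness(new)
        worst = min(range(hms), key=scores.__getitem__)
        if s > scores[worst]:                # replace worst harmony
            memory[worst], scores[worst] = new, s
    best = max(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]

# Toy usage: maximize the allele sum, relevance proportional to score.
best, score = achs(fitness=sum,
                   relevance=lambda mem: [sum(h) + 1 for h in mem],
                   genome_len=6, n_symbols=4)
print(best, score)
```

In the actual scheme the fitness would be the wrapped model's cross-validated score on the decoded feature set, and the relevance weights would come from the model's internal variable-importance estimates.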

Cases of study and learning models

As already mentioned in the introduction, the performance of the proposed ACHS approach has been experimentally assessed over four different datasets:

  • WINE dataset: this is a relatively small dataset consisting of N = 178 samples and D = 13 original features [61]. The data correspond to the results of a chemical analysis of wines grown in the same region but derived from three different cultivars. The aim is to classify among 3 different classes of wine. This first well-known dataset serves as

Results obtained with the WINE dataset

We trained two different ACHS-based supervised learning models to obtain D = 3 features with enhanced predictive power when classifying the samples within this dataset. For the first one, we used RF to measure feature importance and to classify the data (the ACHS1 model). The second option was to use 1-NN for classification purposes and ReliefF to compute feature importance (ACHS2). Both ACHS1 and ACHS2 were compared with the results obtained with Principal Component Analysis (PCA) and the
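The snippet is cut off above; ReliefF is omitted from the sketch below because it is not part of scikit-learn's core API. What follows is a hedged reconstruction of the two ingredients we can reproduce with standard tooling: the PCA baseline with 3 constructed (linear) features classified by 1-NN, and the Random Forest importances that ACHS1 consumes as relevance information. The splits, estimator settings and resulting scores are our assumptions, not the paper's reported figures:

```python
# Hedged reconstruction of the PCA baseline and the RF relevance signal.
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# PCA baseline: 3 constructed (linear) features, classified by 1-NN.
X_pca = PCA(n_components=3).fit_transform(X_std)
print("1-NN on 3 PCA features:",
      cross_val_score(KNeighborsClassifier(1), X_pca, y, cv=10).mean())

# Feature relevance as consumed by ACHS1 (Random Forest importances).
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_std, y)
print("RF importances:", rf.feature_importances_.round(3))
```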

Conclusions

This manuscript has delved into a novel feature construction framework for supervised learning problems. The proposed scheme, coined as ACHS, blends together (1) a heuristic wrapper that relies on Cartesian Genetic Programming and the Harmony Search solver; and (2) the predictive relevance of the constructed features produced by the model wrapped by the former. The solution encoding convention provided by Cartesian Genetic Programming is shown to conveniently match the constant-length encoding

Acknowledgements

This work has been funded in part by the Basque Government under the ELKARTEK program (BID3A project, grant ref. KK-2015/0000080). The authors would also like to thank the anonymous referees for their constructive comments and recommendations.

References (66)

  • Z.W. Geem et al., Parameter-setting-free harmony search algorithm, Appl. Math. Comput. (2010)
  • D. Weyland, A critical analysis of the harmony search algorithm – how not to solve Sudoku, Oper. Res. Perspect. (2015)
  • M. Saka et al., Metaheuristics in structural optimization and discussions on harmony search algorithm, Swarm Evol. Comput. (2016)
  • M. El-Abd, An improved global-best harmony search algorithm, Appl. Math. Comput. (2013)
  • M. Mahdavi et al., An improved harmony search algorithm for solving optimization problems, Appl. Math. Comput. (2007)
  • K. Kira et al., A practical approach to feature selection
  • E. Siegel, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (2013)
  • M. Negnevitsky, Artificial Intelligence: A Guide to Intelligent Systems (2005)
  • S. Lohr, The Age of Big Data (2012)
  • F. Provost et al., Data science and its relationship to big data and data-driven decision making, Big Data (2013)
  • J. Han et al., Data Mining: Concepts and Techniques (2011)
  • B. Xue et al., A survey on evolutionary computation approaches to feature selection, IEEE Trans. Evol. Comput. (2016)
  • J. Yang et al., Feature subset selection using a genetic algorithm
  • K. Drozdz et al., Feature set reduction by evolutionary selection and construction
  • B. Xue et al., A comprehensive comparison on evolutionary feature selection approaches to classification, Int. J. Comput. Intell. Appl. (2015)
  • S. Salcedo-Sanz et al., A novel harmony search algorithm for one-year-ahead energy demand estimation using macroeconomic variables
  • T. Jirapech-Umpai et al., Feature selection and classification for microarray data analysis: evolutionary methods for identifying predictive genes, BMC Bioinf. (2005)
  • M. Banerjee et al., Evolutionary rough feature selection in gene expression data, IEEE Trans. Syst. Man Cybern. C: Appl. Rev. (2007)
  • J.P. Cunningham et al., Linear dimensionality reduction: survey, insights, and generalizations, J. Mach. Learn. Res. (2015)
  • J.A. Lee et al., Nonlinear Dimensionality Reduction (2007)
  • X. Yao et al., Evolutionary programming made faster, IEEE Trans. Evol. Comput. (1999)
  • K. Krawiec, Genetic programming-based construction of features for machine learning and knowledge discovery tasks, Genet. Program. Evolvable Mach. (2002)
  • H. Guo et al., Feature generation using genetic programming with application to fault classification, IEEE Trans. Syst. Man Cybern. B: Cybern. (2005)