Hybridizing Cartesian Genetic Programming and Harmony Search for adaptive feature construction in supervised learning problems
Introduction
Predictive analytics is broadly conceived as the family of supervised machine learning models aimed at inferring unknown outcomes of a system from a set of observed variables or features [1]. Although supervised learning models date back several decades, predictive analytics has nowadays regained momentum by virtue of the ever-growing availability of data in most fields of knowledge. Hot topics such as Intelligent Systems [2] and Big Data [3], [4] evince the increasing relevance of predictive modeling across disciplines and the subsequent need to enhance and innovate throughout all of its constituent processing steps [5]: (1) data preparation and cleansing, with different strategies to impute missing and/or illegal data depending on their alphabet; (2) novelty/outlier detection; (3) feature processing, where the original dataset is processed/transformed/filtered so as to capture the essential characteristics of the data and reduce the complexity of the subsequent predictive model; (4) model selection, for which diverse alternatives have been reported in the literature, characterized by different controlling parameters, training algorithms, discriminative capability and generalization properties; (5) model tuning; and (6) assessment of model performance when predicting a set of unseen examples. Only by thoroughly elaborating on each of the above steps can a good predictive model be generated.
This manuscript gravitates on the third processing step enumerated above: feature processing. The literature has been especially prolific in this regard, with de facto classifications depending on the selective or constructive nature of the feature processing approach at hand. On the one hand, feature selection schemes essentially select a subset of the original features by following different strategies (filter, wrapper or embedded methods). Interestingly for the scope of this manuscript, meta-heuristically empowered feature selection schemes have lately come onto the scene in a diversity of scenarios [6], [7], [8], [9], [10], [11], [12], [13], with particular emphasis on Energy applications [14], [15] and Bioinformatics [16], [17]. On the other hand, feature extraction/construction or dimensionality reduction algorithms transform the original dataset into a feature space of fewer dimensions, which can be done by resorting to elements from linear statistics [18] or to newer findings in the field of non-linear manifold learning and low-dimensional embedding [19].
This research work focuses on this second category, specifically on the construction of features via wrapper methods. This class of methods is of paramount utility when dealing with legacy datasets, i.e. datasets whose compounding features result from raw information preprocessed through application-specific signal processing stages. In such situations there is no access to the original data from which such features were extracted, hence jeopardizing the adoption of embedded schemes with known potential in highly multidimensional datasets (e.g. deep learning). The scope is also placed on the readability of the constructed features, which is not only useful for assessing mathematical properties thereof (e.g. trends, correlations), but also becomes a requirement in certain application scenarios where supervision by a higher-level entity and/or the preservation of privacy are crucial, such as risk assessment in bank insurance, the diagnosis of diseases and the personalized prescription of medical treatments. From a technical perspective, this sought explicitness of the constructed feature set can be provided by Evolutionary Programming [20], a branch of Evolutionary Computation that aims at iteratively refining computer programs based on a measure of their quality or fitness. In the context of mathematical programs, this term stands for a combination or function of different variables (features) based on an alphabet of operator functions (e.g. +, −, ×, ÷). Such programs can be represented as tree structures, which can in turn be evolved via evolutionary crossover and mutation processes towards regions of progressively higher optimality as measured by the fitness function at hand. When put in the context of feature construction, each evolved program represents a combination of features (i.e. a newly constructed feature), whereas the fitness function is given by the performance of the wrapped predictive model when trained with the evolved feature set.
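For intuition, such a tree-shaped program combining original features through arithmetic operators can be represented and evaluated with just a few lines of code. The sketch below is purely illustrative (the node convention and protected division are assumptions, not the representation used in this work):

```python
import numpy as np

# Minimal, hypothetical sketch of an evolved mathematical program such as
# (x0 + x2) * x1 represented as an expression tree. A node is either a
# feature index (leaf) or a tuple (operator, left child, right child).
def evaluate(node, X):
    """Evaluate an expression tree over the rows of X (shape N x D)."""
    if isinstance(node, int):            # leaf: original feature index
        return X[:, node]
    op, left, right = node               # internal node: (operator, child, child)
    ops = {'+': np.add, '-': np.subtract, '*': np.multiply,
           '/': lambda a, b: np.divide(a, np.where(b == 0, 1.0, b))}  # protected division
    return ops[op](evaluate(left, X), evaluate(right, X))

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
program = ('*', ('+', 0, 2), 1)          # constructed feature: (x0 + x2) * x1
print(evaluate(program, X))              # prints [ 8. 50.]
```

In a wrapper setting, the vector returned by `evaluate` would become one column of the transformed dataset fed to the predictive model whose validation score drives the search.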
Indeed, this has been the technical approach followed by a number of contributions from the research community, in which the good performance of Evolutionary Programming has been evinced in diverse practical applications of predictive modeling (see [21], [22], [23], [24], [25], [26], [27], [28], [29], [30] and the comprehensive survey in [31]).
The work presented in this paper advances the state of the art in the above field by proposing a novel wrapper approach based on the combination of Cartesian Genetic Programming [32] and Harmony Search (hereafter denoted as HS, [33]). On the one hand, Cartesian Genetic Programming makes it possible to encode (represent) programs as strings of integers, which numerically encode the operators that relate variables to each other, their connections to the set of input features and the resulting output features fed to the model. On the other hand, Harmony Search is a meta-heuristic solver that has been widely shown to outperform other bio-inspired optimization algorithms in many applications [34]. In this manuscript we propose to blend these two techniques together to yield a feature construction wrapper that, in addition, exploits information about the predictive relevance of the produced feature set so as to enhance the convergence properties of the overall search process. The performance of the derived feature construction scheme is evaluated over four supervised learning problems – namely, the well-known WINE dataset, leaf-based plant classification (LEAF, [35]), classification of radar returns from the ionosphere (IONOSPHERE, [36]) and vehicle type recognition (VTR, [37]) – with results that dominate the best scores obtained to date. To the best of the authors’ knowledge, this is the first contribution in the literature hybridizing Cartesian Genetic Programming with Harmony Search for feature construction in supervised learning.
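To give a flavor of the Cartesian encoding, each gene triplet can be read as (operator index, first input, second input), where inputs refer either to original features or to previously decoded nodes. The toy decoder below is an illustrative convention only, not the exact genotype layout used in this work:

```python
import operator

# Toy Cartesian Genetic Programming decoder (illustrative convention, not the
# paper's exact genotype layout). Each node is a triplet of integers:
# (operator index, first input index, second input index); inputs may refer
# to original features (indices 0..n_inputs-1) or to earlier decoded nodes.
OPS = [operator.add, operator.sub, operator.mul]

def decode(genotype, inputs):
    values = list(inputs)                      # nodes 0..n_inputs-1: original features
    for op_idx, a, b in genotype:
        values.append(OPS[op_idx](values[a], values[b]))
    return values[-1]                          # last node yields the output feature

# Two original features x0 = 3, x1 = 4; the genotype builds (x0 + x1) * x0
genotype = [(0, 0, 1),   # node 2 = x0 + x1
            (2, 2, 0)]   # node 3 = node2 * x0
print(decode(genotype, [3, 4]))  # prints 21
```

Because every genotype is a fixed-length string of bounded integers, it can be manipulated directly by an integer-valued meta-heuristic such as HS without any tree-specific crossover or mutation operators.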
The rest of the paper is structured as follows: Section 2 formally poses the construction of explicit features in supervised learning scenarios as a mathematical optimization problem. Next, Section 3 and the subsections therein delve into the proposed algorithmic approach by outlining its overall working procedure and detailing each of its compounding modules. Experimental results over the four considered datasets are presented and discussed in Sections 4 and 5. Finally, Section 6 ends the paper by drawing conclusions and sketching several lines of future research.
Section snippets
Feature construction as an optimization problem
Mathematically speaking, a supervised learning problem departs from a set of available data instances X ≐ {x1, …, xN}, with N denoting the number of instances or examples, xn,d the d-th feature of example n, and D ≐ |xn| ∀n ∈ {1, …, N} the number of features or dimensionality. Since we deal with supervised learning, samples in X are associated with a value of the target variable to be predicted, which are all collected in the label vector y. The goal of a supervised learning algorithm is to infer
Proposed feature construction approach
In order to tackle the above problem in a computationally efficient fashion, we propose a novel feature construction algorithm whose overall working procedure is illustrated in Fig. 1 and algorithmically described in Algorithm 1. The proposed scheme blends together elements from wrapper and embedded methods for feature processing. On the one hand, the setup relies on a predictive learning model Mθ capable of internally estimating the relevance of each input variable when predicting the target
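For intuition on the meta-heuristic side of the wrapper, a bare-bones Harmony Search loop over integer-valued solution vectors could look as follows. All parameter values (harmony memory size, HMCR, PAR) and the toy fitness are placeholders; in the actual scheme the fitness would be the validation score of the wrapped model trained on the decoded feature set:

```python
import random

# Minimal Harmony Search sketch over integer-valued solution vectors
# (illustrative parameters only; in ACHS the genotype would be a Cartesian
# program encoding and `fitness` the wrapped model's validation error).
def harmony_search(fitness, dim, low, high, hm_size=10, hmcr=0.9, par=0.3, iters=500):
    memory = [[random.randint(low, high) for _ in range(dim)] for _ in range(hm_size)]
    for _ in range(iters):
        new = []
        for d in range(dim):
            if random.random() < hmcr:                      # memory consideration
                val = random.choice(memory)[d]
                if random.random() < par:                   # pitch adjustment
                    val = min(high, max(low, val + random.choice([-1, 1])))
            else:                                           # random re-initialization
                val = random.randint(low, high)
            new.append(val)
        worst = max(range(hm_size), key=lambda i: fitness(memory[i]))
        if fitness(new) < fitness(memory[worst]):           # replace worst harmony
            memory[worst] = new
    return min(memory, key=fitness)

# Toy fitness: squared distance to the all-fives vector (minimization)
best = harmony_search(lambda v: sum((x - 5) ** 2 for x in v), dim=4, low=0, high=9)
print(best)
```

The key property exploited here is that HS improvises each dimension independently from the harmony memory, which matches the constant-length integer genotypes produced by the Cartesian encoding.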
Cases of study and learning models
As already mentioned in the introduction, the performance of the proposed ACHS approach has been experimentally assessed over four different datasets:
- WINE dataset: this is a relatively small dataset consisting of N = 178 samples and D = 13 original features [61]. The data correspond to results of a chemical analysis of wines grown in the same region but derived from three different cultivars. The aim is to classify among 3 different classes of wine. This first well-known dataset serves as
Results obtained with the WINE dataset
We trained two different ACHS-based supervised learning models to obtain D′ = 3 features with enhanced predictive power when classifying the samples within this dataset. For the first one, we used RF to measure feature importance and to classify the data (namely ACHS1 model). The second option was to use 1-NN for classification purposes and ReliefF to compute feature importance (ACHS2). Both ACHS1 and ACHS2 were compared with the results obtained with Principal Component Analysis (PCA) and the
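The relevance feedback used in ACHS2 can be approximated with a compact ReliefF-style score. The sketch below is a binary-class simplification (not the exact ReliefF variant or data used in the experiments), scoring each candidate feature by how well it separates each sample's nearest hit from its nearest miss:

```python
import numpy as np

# Illustrative ReliefF-style relevance estimate (binary-class simplification
# of the score ACHS2 would feed back to the search): features that keep
# same-class neighbors close and opposite-class neighbors far score higher.
def relief_scores(X, y):
    X = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)  # scale to [0, 1]
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(X - X[i]).sum(axis=1)                # L1 distances to sample i
        dist[i] = np.inf
        same, diff = (y == y[i]), (y != y[i])
        hit = np.where(same & (dist == dist[same].min()))[0][0]   # nearest hit
        miss = np.where(diff & (dist == dist[diff].min()))[0][0]  # nearest miss
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n

# Toy data: feature 0 separates the classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
X = np.column_stack([y + 0.05 * rng.standard_normal(40),
                     rng.standard_normal(40)])
print(relief_scores(X, y))   # feature 0 scores clearly higher than feature 1
```

In the wrapper loop, such per-feature scores would bias the search towards retaining constructed features with high estimated predictive relevance.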
Conclusions
This manuscript has delved into a novel feature construction framework for supervised learning problems. The proposed scheme, coined as ACHS, blends together 1) a heuristic wrapper that relies on Cartesian Genetic Programming and the Harmony Search solver; and 2) the predictive relevance of the constructed features produced by the model wrapped by the former. The solution encoding convention provided by Cartesian Genetic Programming is shown to conveniently match the constant-length encoding
Acknowledgements
This work has been funded in part by the Basque Government under the ELKARTEK program (BID3A project, grant ref. KK-2015/0000080). The authors would also like to thank the anonymous referees for their constructive comments and recommendations.
References (66)
- et al., A distributed PSO-SVM hybrid system with feature selection and parameter optimization, Appl. Soft Comput. (2008)
- et al., Parameter determination of support vector machine and feature selection using simulated annealing approach, Appl. Soft Comput. (2008)
- et al., Evolutionary-based feature selection approaches with new criteria for data mining: a case study of credit approval data, Expert Syst. Appl. (2009)
- et al., Particle swarm optimisation for feature selection in classification: novel initialisation and updating mechanisms, Appl. Soft Comput. (2014)
- et al., A coral reefs optimization algorithm with harmony search operators for accurate wind speed prediction, Renew. Energy (2015)
- et al., Feature generation using genetic programming with comparative partner selection for diabetes classification, Expert Syst. Appl. (2013)
- et al., A survey on applications of the harmony search algorithm, Eng. Appl. Artif. Intell. (2013)
- et al., Computer-assisted tree taxonomy by automated image recognition, Eng. Appl. Artif. Intell. (2009)
- et al., Hybridizing extreme learning machines and genetic algorithms to select acoustic features in vehicle classification applications, Neurocomputing (2015)
- Novel derivative of harmony search algorithm for discrete design variables, Appl. Math. Comput. (2008)