A filter-based feature construction and feature selection approach for classification using Genetic Programming

https://doi.org/10.1016/j.knosys.2020.105806

Abstract

Feature construction and feature selection are two common pre-processing methods for classification. Genetic Programming (GP) can be used to solve feature construction and feature selection tasks due to its flexible representation. In this paper, a filter-based multiple feature construction approach using GP, named FCM, that stores top individuals is proposed, and a filter-based feature selection approach using GP, named FS, that uses a correlation-based evaluation method is employed. A hybrid feature construction and feature selection approach named FCMFS, which first constructs multiple features using FCM and then selects effective features using FS, is proposed. Experiments on nine datasets show that features selected by FS or constructed by FCM improve the classification performance compared with the original features. The proposed FCMFS maintains the classification performance with a smaller number of features than FCM, and obtains better classification performance with a smaller number of features than FS on the majority of the nine datasets. Compared with another feature construction and feature selection approach named FSFCM, which first selects features using FS and then constructs features using FCM, FCMFS achieves better classification performance with a smaller number of features. Comparisons with three state-of-the-art techniques show that the proposed FCMFS approach achieves better experimental results in most cases.

Introduction

Classification is an important supervised machine learning method that aims to assign unknown instances to their corresponding categories based on the information contained in predefined features [1], [2]. The quality of the features is therefore a main factor affecting the classification performance [3]. Without prior knowledge, it is difficult to know which features are effective. Therefore, a large number of features is usually predefined, which results in many irrelevant and redundant features. Irrelevant and redundant features are not useful for classification and can even reduce the classification performance [4], [5]. In some real-world classification applications, the available features do not have adequate discrimination ability [3], so the trained classification model cannot achieve adequate classification performance.

Feature selection is used to select effective features and remove irrelevant or redundant features [1], and feature construction is employed to create new higher-level features from the original ones in order to reduce the dimensionality of the features and increase the classification performance [6]. Wrapper and filter are the two typical families of feature construction and selection approaches, distinguished by their evaluation criteria. Wrapper-based approaches use the classification performance of a learning algorithm as the evaluation criterion, while filter-based approaches use information measures such as Information Gain (IG) [7], Information Gain Ratio (IGR) [3] and Correlation [8], [9]. Wrapper-based approaches depend on the learning algorithm, and in general they can achieve better classification performance than filter-based approaches. Since no learning algorithm is involved in the evaluation measures, filter-based approaches are faster and their classification models are more general than those of wrapper-based approaches [10], [11]. Moreover, experiments show that our proposed filter-based feature construction and feature selection method (FCMFS) can also achieve better classification performance than a wrapper-based feature construction method.
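As an illustration of a filter-style criterion such as Information Gain, the following minimal sketch scores a discrete feature without training any classifier. The function names and the toy data are assumptions made for illustration only and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy left after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


# Toy data (hypothetical): four instances, two classes.
labels = ["a", "a", "b", "b"]
informative = [0, 0, 1, 1]    # separates the classes perfectly
uninformative = [0, 1, 0, 1]  # says nothing about the class

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
```

A filter-based method would prefer the first feature because its score is higher; a wrapper-based method would instead train and evaluate a classifier for each candidate.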

To address feature construction and feature selection problems, efficient global search algorithms are needed. Genetic Programming (GP) [12], [13] offers flexible pattern representations such as trees, and can use any kind of logical and mathematical expression inside these representations [14]. Such expressions can transform original features into new higher-level constructed features, and can also be used to select effective features. Therefore, GP can be used to solve feature selection [15], [16], [17] and feature construction [3], [10] tasks.
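The idea of a tree-shaped expression over features can be sketched as follows. This is only an illustration under stated assumptions: the nested-tuple encoding, the four-operator function set, the protected division convention and the reading of "selected features" as the feature indices appearing at the leaves are all hypothetical choices, not details confirmed by the paper.

```python
import operator


def protected_div(a, b):
    """Division that returns 1.0 on a zero divisor (a common GP convention, assumed here)."""
    return a / b if b != 0 else 1.0


# Hypothetical function set for the internal nodes.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": protected_div}


def evaluate(tree, instance):
    """Compute the constructed feature value of one instance.

    A leaf is an int (index of an original feature); an internal node is
    a tuple (operator_symbol, left_subtree, right_subtree).
    """
    if isinstance(tree, int):
        return instance[tree]
    symbol, left, right = tree
    return FUNCTIONS[symbol](evaluate(left, instance), evaluate(right, instance))


def leaf_features(tree):
    """Return the set of original feature indices that appear at the leaves."""
    if isinstance(tree, int):
        return {tree}
    _, left, right = tree
    return leaf_features(left) | leaf_features(right)


# (f0 * f2) + (f3 / f0)
tree = ("+", ("*", 0, 2), ("/", 3, 0))
instance = [2.0, 5.0, 3.0, 8.0]

constructed_value = evaluate(tree, instance)
used_features = leaf_features(tree)
```

Here the same tree plays both roles: evaluating it yields one constructed feature, and its leaves identify a subset of the original features (feature 1 is never used).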

Feature construction approaches transform the original feature space into another, higher-level feature space. In general, GP can be used to construct single features. Otero et al. (2002) [3] used IGR as the fitness function and constructed a single feature. Muharram et al. (2005) [18] proposed a single feature construction method that employed information gain, the Gini index, chi-square, and a combination of information gain and the Gini index as fitness functions. Guo et al. [19], [20] proposed similar single feature construction methods that differ in using the Fisher criterion as the fitness function. Since a single constructed feature often does not have adequate discrimination ability for classification, multiple feature construction approaches have been investigated to improve the classification performance. Neshatian et al. (2012) [10] used a fitness function that maximized the purity of class intervals and constructed as many features as there are classes. Krawiec (2002) [21] proposed an archive-based multiple feature construction method that stored useful individuals during the evolutionary run. Ahmed et al. (2014) [22] divided the best individual into all possible sub-trees, each of which was transformed into a feature. Moreover, cooperative coevolution strategies [23], [24], [25], [26] that create multiple concurrent populations have been used to construct multiple features.

As a GP run evolves, many excellent individuals are often lost. In order to preserve effective constructed features during a GP run, a multiple feature construction approach (FCM) that stores the top β individuals was proposed in our previous work [27]. However, how to set the parameter β remains a problem. In this work, we investigate the impact of β on the experimental results and set β to a value that is as large as possible in order to maintain the classification performance. As a consequence, the constructed features may contain redundant features. Therefore, we employ a GP-based feature selection approach (FS) that uses a correlation-based method to reduce feature redundancy and increase feature relevancy. The resulting approach, named FCMFS, first uses FCM to perform feature construction and then uses FS to perform feature selection.
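The idea of storing the top β individuals during a run can be sketched as a small archive update. This is a minimal sketch under assumptions: the function name `update_archive`, the representation of an individual as a `(fitness, expression)` pair, the removal of duplicate expressions and the convention that larger fitness is better are all hypothetical and are not taken from the paper.

```python
import heapq


def update_archive(archive, population, beta):
    """Merge the current population into the archive and keep the beta fittest.

    Individuals are (fitness, expression) pairs. Duplicate expressions are
    collapsed to their best fitness (an assumption made for this sketch).
    """
    best = {}
    for fitness, expression in list(archive) + list(population):
        if expression not in best or fitness > best[expression]:
            best[expression] = fitness
    top = heapq.nlargest(beta, best.items(), key=lambda item: item[1])
    return [(fitness, expression) for expression, fitness in top]


# Two hypothetical generations with made-up fitness values.
generation_1 = [(0.4, "f0*f1"), (0.9, "f2-f3"), (0.1, "f0")]
generation_2 = [(0.7, "f1+f3"), (0.9, "f2-f3"), (0.2, "f1")]

archive = update_archive([], generation_1, beta=2)
archive = update_archive(archive, generation_2, beta=2)
```

With a larger β the archive retains more constructed features, which is why a subsequent selection step that removes redundant ones becomes useful.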

To facilitate comparison, both the feature selection and the feature construction approaches use standard GP. The overall goal of this paper is to propose a feature construction and feature selection approach (FCMFS), which first constructs multiple features using FCM and then selects effective constructed features using FS, and to investigate the effectiveness of the proposed FCMFS by comparing it with other feature processing methods. To achieve this overall goal, the following four objectives are investigated.

Objective 1. Propose a filter-based multiple feature construction approach using GP (FCM) and a filter-based feature selection approach using GP (FS), and investigate whether features selected by FS or constructed by FCM can obtain better classification performance than original features.

Objective 2. Develop a feature construction and feature selection approach named FCMFS, and investigate whether FCMFS can obtain equivalent classification performance with a smaller number of features compared with FCM, and whether FCMFS can achieve better classification performance with fewer features than FS on the nine datasets.

Objective 3. Investigate another feature construction and feature selection method, named FSFCM, that first selects features using FS and then constructs features using FCM, and verify whether FCMFS can achieve better classification performance than FSFCM.

Objective 4. Investigate whether our proposed FCMFS can achieve better performance than three state-of-the-art methods: one wrapper-based feature construction method [28], one filter-based multiple feature construction method [10] and one single-stage feature construction and feature selection method [29].

The rest of the paper is arranged as follows. The next section describes the background information involved in this paper. Section 3 presents the GP-based feature construction and feature selection approaches. Section 4 describes the experimental design. Section 5 presents the experimental results with discussions. Section 6 provides conclusions and future directions.

Section snippets

Genetic programming

Evolutionary computation (EC) techniques are inspired by Darwin’s theory of evolution [30]. Genetic programming (GP) [12], [31], genetic algorithm (GA), particle swarm optimization (PSO) and ant colony optimization (ACO) are effective EC algorithms due to their global search ability. In addition, some new EC algorithms, such as extremal optimization (EO) algorithms, are used to solve optimization problems [32], [33]. Addressing multi-objective optimization problems using EC algorithms are getting

Methodology

We use standard GP representation methods to solve feature construction and feature selection problems. The individuals are represented as a tree-like structure. The leaf nodes of an individual are derived from original features, constructed features or selected features randomly according to different feature processing methods which are described as follows. The internal nodes are functions that come from a function set. Genetic operators, including reproduction, crossover and mutation, are

Benchmark techniques

To verify the effectiveness of our proposed FCMFS, three state-of-the-art techniques are chosen for comparison: one wrapper-based feature construction method [28], one filter-based multiple feature construction method [10] and one single-stage feature construction and feature selection method [29].

The first is a conventional wrapper-based feature construction method using GP which constructs a single feature, i.e., one GP run only outputs the best individual and the fitness function uses

Experimental results and discussions

We arranged the following experiments. Firstly, FCMFS is compared with three benchmark techniques (FCW, FCMMR and SFCFS) to verify the effectiveness of the proposed FCMFS. Secondly, the effectiveness of the six feature processing methods in Section 3 is compared. (1) FCMFS is compared with two baselines (ALL, FCS) to further verify the effectiveness of the proposed FCMFS, (2) FCM and FS are compared with ALL to verify the effectiveness of the FCM and FS methods, and (3) FCMFS is compared with FCM and FS

Conclusions and future work

This paper proposes a filter-based feature construction and feature selection approach using GP (FCMFS) that is divided into two stages, i.e., first using a multiple feature construction approach that stores top individuals (FCM) and then using a feature selection approach to select an effective feature subset (FS). The experiments on nine datasets show that FCM and FS can obtain better performance than the original features. FCMFS can maintain the classification performance with a smaller number of features

CRediT authorship contribution statement

Jianbin Ma: Conceptualization, Methodology, Software, Validation, Supervision, Investigation, Writing - original draft. Xiaoying Gao: Writing - review & editing, Formal analysis, Visualization, Validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by Hebei Agricultural University (No. ZD201702, No. LG201707) and Hebei Provincial Department of Human Resources and Social Security, China (No. CN201709).

References (64)

  • Dong N. et al., An improvement decomposition-based multi-objective evolutionary algorithm using multi-search strategy, Knowl.-Based Syst. (2019)
  • Mantas C.J. et al., Credal-C4.5: Decision tree based on imprecise probabilities to classify noisy data, Expert Syst. Appl. (2014)
  • Xue B. et al., Particle swarm optimisation for feature selection in classification: novel initialisation and updating mechanisms, Appl. Soft Comput. (2014)
  • Xue B. et al., Particle swarm optimization for feature selection in classification: A multi-objective approach, IEEE Trans. Cybern. (2013)
  • Tran B. et al., Genetic programming for feature construction and selection in classification on high-dimensional data, Memet. Comput. (2016)
  • Otero F.E.B. et al., Genetic programming for attribute construction in data mining
  • Tran B., Evolutionary Computation for Feature Manipulation in Classification on High-Dimensional Data (2018)
  • Muharram M. et al., Evolutionary constructive induction, IEEE Trans. Knowl. Data Eng. (2005)
  • E. Hart, K. Sim, B. Gardiner, K. Kamimura, A hybrid method for feature construction and selection to improve...
  • Hall M. et al., The WEKA data mining software: an update, ACM SIGKDD Explor. Newslett. (2009)
  • Neshatian K. et al., A filter approach to multiple feature construction for symbolic learning classifiers using genetic programming, IEEE Trans. Evol. Comput. (2012)
  • M.G. Smith, L. Bull, Feature construction and selection using genetic programming and a genetic algorithm, in:...
  • Koza J.R., Genetic Programming: On the Programming of Computers by Means of Natural Selection (1992)
  • Banzhaf W. et al., Genetic programming: An introduction on the automatic evolution of computer programs and its applications, J. Combin. Theory (1998)
  • Neshatian K., Feature Manipulation with Genetic Programming (2010)
  • S. Ahmed, M. Zhang, L. Peng, Feature selection and classification of high dimensional mass spectrometry data: A genetic...
  • Harvey D.Y. et al., Automated feature design for numeric sequence classification by genetic programming, IEEE Trans. Evol. Comput. (2014)
  • M.A. Muharram, G.D. Smith, Evolutionary feature construction using information gain and gini index, in: Proceedings of...
  • Guo H. et al., Feature extraction and dimensionality reduction by genetic programming based on the Fisher criterion, Expert Syst. (2008)
  • Krawiec K., Genetic programming-based construction of features for machine learning and knowledge discovery tasks, Genet. Progr. Evol. Mach. (2002)
  • S. Ahmed, M. Zhang, L. Peng, B. Xue, Multiple feature construction for effective biomarker identification and...
  • Lin Y. et al., Evolutionary feature synthesis for object recognition, IEEE Trans. Syst. Man Cybern. C (2005)