DOI: 10.1145/3071178.3071338
research-article

Sensitivity-like analysis for feature selection in genetic programming

Published: 01 July 2017

ABSTRACT

Feature selection is an important process within machine learning problems. Through pressures imposed on models during evolution, genetic programming performs basic feature selection, and so analysis of the evolved models can provide some insights into the utility of input features. Previous work has tended towards a presence model of feature selection, where the frequency of a feature appearing within evolved models is a metric for its utility. In this paper, we identify some drawbacks with using this approach, and instead propose the integration of importance measures for feature selection that measure the influence of a feature within a model. Using sensitivity-like analysis methods inspired by importance measures used in random forest regression, we demonstrate that genetic programming introduces many features into evolved models that have little impact on a given model's behaviour, and this can mask the true importance of salient features. The paper concludes by exploring bloat control methods and adaptive terminal selection methods to influence the identification of useful features within the search performed by genetic programming, with results suggesting that a combination of adaptive terminal selection and bloat control may help to improve generalisation performance.
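The contrast between a presence measure and an importance measure can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a permutation-style importance of the kind commonly used with random forests (the increase in error after shuffling one feature's values), and every name in it (`presence_counts`, `permutation_importance`, `evolved_model`, the toy data) is hypothetical.

```python
import random
from collections import Counter


def presence_counts(models):
    """Presence-style utility: number of models in which each feature appears."""
    counts = Counter()
    for used_features, _predict in models:
        counts.update(set(used_features))
    return counts


def mse(predict, X, y):
    """Mean squared error of a model over a dataset."""
    return sum((predict(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, feature, rng, repeats=10):
    """Mean increase in MSE when the values of one feature are shuffled."""
    baseline = mse(predict, X, y)
    total_increase = 0.0
    for _ in range(repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        # Rebuild the dataset with only the chosen feature column permuted.
        X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
        total_increase += mse(predict, X_perm, y) - baseline
    return total_increase / repeats


# Toy data (assumed for illustration): the target depends on feature 0 only.
rng = random.Random(0)
X = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(50)]
y = [2.0 * row[0] for row in X]


def evolved_model(row):
    """Hypothetical evolved expression: feature 1 is present but cancels out."""
    return 2.0 * row[0] + (row[1] - row[1])


# Each model is paired with the set of features that appear in its expression.
models = [((0, 1), evolved_model)]
counts = presence_counts(models)
imp_x0 = permutation_importance(evolved_model, X, y, 0, rng)
imp_x1 = permutation_importance(evolved_model, X, y, 1, rng)
print(counts, imp_x0, imp_x1)
```

In this toy case the presence count treats both features alike, because each appears once in the expression, while shuffling feature 1 leaves the error unchanged and shuffling feature 0 increases it. This is the kind of masking the abstract describes, shown here only for a hand-built expression.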


Published in

GECCO '17: Proceedings of the Genetic and Evolutionary Computation Conference
July 2017
1427 pages
ISBN: 9781450349208
DOI: 10.1145/3071178

Copyright © 2017 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

GECCO '17 paper acceptance rate: 178 of 462 submissions, 39%
Overall acceptance rate: 1,669 of 4,410 submissions, 38%
