DOI: 10.1145/3071178.3071183
research-article

Genetic programming based feature construction for classification with incomplete data

Published: 01 July 2017

ABSTRACT

Missing values are an unavoidable problem in many real-world datasets. Dealing with incomplete data is a crucial requirement for classification because inadequate treatment of missing values often causes large classification errors. Feature construction has been successfully applied to improve classification with complete data, but it has seldom been applied to incomplete data. Genetic programming-based multiple feature construction (GPMFC) is a recent, promising feature construction method that uses genetic programming to evolve multiple new features from the original features for classification tasks. GPMFC can improve the accuracy and reduce the complexity of many decision tree and rule-based classifiers; however, it cannot work directly with incomplete data. This paper proposes IGPMFC, an extension of GPMFC that tackles incomplete data. IGPMFC uses genetic programming with interval functions to directly evolve multiple features for classification with incomplete data. Experimental results show that IGPMFC can not only substantially improve the accuracy but also reduce the complexity of the learnt classifiers when facing incomplete data.
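
The abstract does not spell out how the interval functions operate, so the following is only an illustrative sketch of one plausible reading, not the paper's implementation. It assumes that a missing value is replaced by the interval spanning the observed minimum and maximum of its feature, that a known value becomes a degenerate interval, and that a constructed feature is an arithmetic expression evaluated with standard interval arithmetic. The names `Interval`, `feature_ranges`, `to_intervals`, and `constructed_feature`, the example expression, and the use of the midpoint as a single feature value are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with standard interval arithmetic."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # The product range is bounded by the four endpoint products.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def midpoint(self):
        return (self.lo + self.hi) / 2.0


def feature_ranges(rows):
    """Per-feature (min, max) over observed values; NaN marks a missing value.

    Assumes every feature has at least one observed value.
    """
    ranges = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        ranges.append((min(observed), max(observed)))
    return ranges


def to_intervals(row, ranges):
    """Assumption: missing -> [feature min, feature max]; known x -> [x, x]."""
    return [Interval(lo, hi) if math.isnan(v) else Interval(v, v)
            for v, (lo, hi) in zip(row, ranges)]


def constructed_feature(x):
    """Hypothetical evolved expression tree: (x0 + x1) * x2."""
    return (x[0] + x[1]) * x[2]


nan = float("nan")
data = [[1.0, 2.0, 3.0],
        [2.0, nan, 1.0],   # second feature missing
        [4.0, 6.0, nan]]   # third feature missing

ranges = feature_ranges(data)
value = constructed_feature(to_intervals(data[1], ranges))
print(value)             # Interval(lo=4.0, hi=8.0)
print(value.midpoint())  # 6.0
```

Under these assumptions an instance with missing values still yields a constructed feature without a separate imputation step, because the uncertainty is carried through the expression as an interval. How the resulting interval is turned into a value that a classifier can consume is a design choice; the midpoint used here is only one option.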

Published in

GECCO '17: Proceedings of the Genetic and Evolutionary Computation Conference
July 2017
1427 pages
ISBN: 9781450349208
DOI: 10.1145/3071178

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

GECCO '17 Paper Acceptance Rate: 178 of 462 submissions, 39%
Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%
