Genetic Programming for Imputation Predictor Selection and Ranking in Symbolic Regression with High-Dimensional Incomplete Data

  • Conference paper
AI 2019: Advances in Artificial Intelligence (AI 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11919)

Abstract

Incompleteness is one of the challenging issues in data science. One approach to tackling it is to use imputation methods to estimate the missing values in incomplete data sets. Although this approach is popular in several machine learning tasks, it has rarely been investigated in symbolic regression. In this work, a genetic programming (GP)-based feature selection and ranking method is proposed and applied to high-dimensional symbolic regression with incomplete data. The main idea is to construct GP programs for each incomplete feature using the other features as predictors. The predictors selected by these GP programs are then ranked based on the fitness values of the best constructed GP programs and the frequency of occurrence of the predictors in these programs. The experimental work is conducted on high-dimensional data where the number of features is greater than the number of instances.
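
The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the sketch assumes that fitness is an error to be minimised, that each best program contributes a weight of 1 / (1 + error), and that a predictor's score is the sum of the weights of the programs it occurs in. The function name `rank_predictors`, the feature names, and the error values are hypothetical, and the GP runs themselves are replaced by hand-written results.

```python
from collections import defaultdict


def rank_predictors(best_programs):
    """Rank predictors for one incomplete feature.

    best_programs: list of (error, predictors_used) pairs, one per GP run,
    where predictors_used lists the features appearing in that run's best
    program. Both the input shape and the scoring rule are assumptions
    made for illustration.
    """
    scores = defaultdict(float)
    for error, used in best_programs:
        # Assumed weighting: lower error gives a larger weight.
        weight = 1.0 / (1.0 + error)
        # Count each predictor once per program (an assumption), so a
        # predictor repeated inside one program is not over-counted.
        for feature in set(used):
            scores[feature] += weight
    # Highest score first; ties broken by feature name for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical best programs from three GP runs for one incomplete feature.
runs = [
    (0.10, ["x3", "x7"]),
    (0.25, ["x3", "x1"]),
    (0.50, ["x7", "x3", "x3"]),
]
ranking = rank_predictors(runs)
print(ranking)
```

In this sketch a predictor ranks highly when it appears often and when the programs it appears in fit well, which mirrors the two criteria named in the abstract. Any other monotone combination of fitness and frequency could be substituted.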

Author information

Corresponding author

Correspondence to Baligh Al-Helali.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Al-Helali, B., Chen, Q., Xue, B., Zhang, M. (2019). Genetic Programming for Imputation Predictor Selection and Ranking in Symbolic Regression with High-Dimensional Incomplete Data. In: Liu, J., Bailey, J. (eds) AI 2019: Advances in Artificial Intelligence. AI 2019. Lecture Notes in Computer Science, vol 11919. Springer, Cham. https://doi.org/10.1007/978-3-030-35288-2_42

  • DOI: https://doi.org/10.1007/978-3-030-35288-2_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-35287-5

  • Online ISBN: 978-3-030-35288-2

  • eBook Packages: Computer Science, Computer Science (R0)