Abstract
We propose a genetic programming (GP) system for measuring the relevance of feature subsets in binary classification tasks. A virtual program structure and an evaluation function are defined so that the constructed GP programs can measure the goodness of feature subsets. The system can detect relevant feature subsets in a range of situations, including multimodal class distributions and mutually correlated features, where other ranking methods have difficulties. Our empirical results indicate that the system ranks subsets well and gives insight into the actual classification performance. The ranking system is also efficient when used for feature selection.
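The abstract's point about multimodal class distributions can be illustrated with a small toy example. The sketch below is not the GP system described in the paper: it swaps in a plain leave-one-out nearest-neighbour score as a stand-in relevance measure, and the data set and the names `make_xor_data`, `subset_score`, and `rank_subsets` are all hypothetical. It only shows why features may need to be scored jointly rather than one at a time.

```python
from itertools import combinations
from math import dist


def make_xor_data():
    """Toy binary task (an assumption for illustration): class = a XOR b.

    Features 0 and 1 carry the class only jointly; feature 2 is unrelated.
    """
    offsets = (-0.1, 0.0, 0.1)
    rows, labels = [], []
    for a in (0, 1):
        for b in (0, 1):
            for i, da in enumerate(offsets):
                for j, db in enumerate(offsets):
                    # Same nine nuisance values repeat in every cluster.
                    nuisance = ((3 * i + j) * 7 % 9) / 45.0
                    rows.append((a + da, b + db, nuisance))
                    labels.append(a ^ b)
    return rows, labels


def subset_score(rows, labels, subset):
    """Leave-one-out nearest-neighbour agreement using only `subset`.

    Stand-in relevance measure, not the paper's GP fitness function.
    When several points tie for nearest, their agreement is averaged.
    """
    proj = [tuple(r[f] for f in subset) for r in rows]
    total = 0.0
    for p, (x, y) in enumerate(zip(proj, labels)):
        dists = [(dist(x, z), labels[q]) for q, z in enumerate(proj) if q != p]
        nearest = min(d for d, _ in dists)
        tied = [lab for d, lab in dists if d <= nearest + 1e-12]
        total += sum(lab == y for lab in tied) / len(tied)
    return total / len(rows)


def rank_subsets(rows, labels):
    """Score every non-empty subset; best first, smaller subsets win ties."""
    n = len(rows[0])
    subsets = [s for k in range(1, n + 1) for s in combinations(range(n), k)]
    scored = [(subset_score(rows, labels, s), s) for s in subsets]
    return sorted(scored, key=lambda t: (-t[0], len(t[1]), t[1]))


rows, labels = make_xor_data()
ranking = rank_subsets(rows, labels)
for score, subset in ranking:
    print(subset, round(score, 3))
```

In this toy setup, features 0 and 1 each score below chance on their own, while the pair (0, 1) ranks first. A ranking of individual features would therefore miss the relevant subset.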
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Neshatian, K., Zhang, M. (2009). Genetic Programming for Feature Subset Ranking in Binary Classification Problems. In: Vanneschi, L., Gustafson, S., Moraglio, A., De Falco, I., Ebner, M. (eds) Genetic Programming. EuroGP 2009. Lecture Notes in Computer Science, vol 5481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01181-8_11
DOI: https://doi.org/10.1007/978-3-642-01181-8_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01180-1
Online ISBN: 978-3-642-01181-8
eBook Packages: Computer Science (R0)