Abstract
Manifold learning, a non-linear approach of dimensionality reduction, assumes that the dimensionality of multiple datasets is artificially high and a reduced number of dimensions is sufficient to maintain the information about the data. In this paper, a large scale comparison of manifold learning techniques is performed for the task of classification. We show the current standing of genetic programming (GP) for the task of classification by comparing the classification results of two GP-based manifold leaning methods: GP-Mal and ManiGP - an experimental manifold learning technique proposed in this paper. We show that GP-based methods can more effectively learn a manifold across a set of 155 different problems and deliver more separable embeddings than many established methods.
Keywords
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
- 1.
- 2.
Empirical tests have surprisingly shown the superior performance of this technique in comparison to tournament selection.
- 3.
References
Al-Madi, N., Ludwig, S.A.: Improving genetic programming classification for binary and multiclass datasets. In: 2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), pp. 166–173. IEEE (2013)
Belkin, M., Niyogi, P.: Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Advances in Neural Information Processing Systems, pp. 585–591 (2002)
Bhowan, U., Zhang, M., Johnston, M.: Genetic programming for classification with unbalanced data. In: Esparcia-Alcázar, A.I., Ekárt, A., Silva, S., Dignum, S., Uyar, A.Ş. (eds.) EuroGP 2010. LNCS, vol. 6021, pp. 1–13. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12148-7_1
Borg, I., Groenen, P.: Modern multidimensional scaling: theory and applications. J. Educ. Meas. 40(3), 277–280 (2003)
Brodersen, K.H., Ong, C.S., Stephan, K.E., Buhmann, J.M.: The balanced accuracy and its posterior distribution. In: 2010 20th International Conference on Pattern Recognition, pp. 3121–3124. IEEE (2010)
Cano, A., Ventura, S., Cios, K.J.: Multi-objective genetic programming for feature extraction and data visualization. Soft. Comput. 21(8), 2069–2089 (2015). https://doi.org/10.1007/s00500-015-1907-y
Coifman, R.R., Lafon, S.: Diffusion maps. Appl. Comput. Harmonic Anal. 21(1), 5–30 (2006)
Cunningham, J.P., Ghahramani, Z.: Linear dimensionality reduction: survey, insights, and generalizations. J. Mach. Learn. Res. 16(1), 2859–2900 (2015)
De Backer, S., Naud, A., Scheunders, P.: Non-linear dimensionality reduction techniques for unsupervised feature extraction. Pattern Recogn. Lett. 19(8), 711–720 (1998)
Donoho, D.L., Grimes, C.: Hessian eigenmaps: locally linear embedding techniques for high-dimensional data. Proc. Nat. Acad. Sci. 100(10), 5591–5596 (2003)
Espejo, P.G., Ventura, S., Herrera, F.: A survey on the application of genetic programming to classification. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 40(2), 121–144 (2009)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7(2), 179–188 (1936)
Fortin, F.A., De Rainville, F.M., Gardner, M.A., Parizeau, M., Gagné, C.: DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 13, 2171–2175 (2012)
Gisbrecht, A., Hammer, B.: Data visualization by nonlinear dimensionality reduction. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 5(2), 51–73 (2015)
Guo, H., Zhang, Q., Nandi, A.K.: Feature extraction and dimensionality reduction by genetic programming based on the fisher criterion. Expert Syst. 25(5), 444–459 (2008)
Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L.A.: Feature Extraction: Foundations and Applications. STUDFUZZ, vol. 207. Springer, Heidelberg (2006). https://doi.org/10.1007/978-3-540-35488-8
Hotelling, H.: Relations between two sets of variates. In: Kotz, S., Johnson, N.L. (eds.) Breakthroughs in Statistics. SSS, pp. 162–190. Springer, New York (1992). https://doi.org/10.1007/978-1-4612-4380-9_14
Khalid, S., Khalil, T., Nasreen, S.: A survey of feature selection and feature extraction techniques in machine learning. In: 2014 Science and Information Conference, pp. 372–378. IEEE (2014)
Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
La Cava, W., Silva, S., Danai, K., Spector, L., Vanneschi, L., Moore, J.H.: Multidimensional genetic programming for multiclass classification. Swarm Evol. Comput. 44, 260–272 (2019)
La Cava, W., Williams, H., Fu, W., Moore, J.H.: Evaluating recommender systems for AI-driven data science. arXiv preprint arXiv:1905.09205 (2019)
Lee, J.A., Verleysen, M.: Nonlinear Dimensionality Reduction. Springer, New York (2007). https://doi.org/10.1007/978-0-387-39351-3
Lensen, A., Xue, B., Zhang, M.: Can genetic programming do manifold learning too? In: Sekanina, L., Hu, T., Lourenço, N., Richter, H., García-Sánchez, P. (eds.) EuroGP 2019. LNCS, vol. 11451, pp. 114–130. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-16670-0_8
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(Nov), 2579–2605 (2008)
Munkres, J.: Algorithms for the assignment and transportation problems. J. Soc. Ind. Appl. Math. 5(1), 32–38 (1957)
Olson, R.S., La Cava, W., Orzechowski, P., Urbanowicz, R.J., Moore, J.H.: PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Min. 10(1), 36 (2017)
Orzechowski, P., Boryczko, K.: Parallel approach for visual clustering of protein databases. Comput. Inform. 29(6+), 1221–1231 (2012)
Orzechowski, P., La Cava, W., Moore, J.H.: Where are we now?: a large benchmark study of recent symbolic regression methods. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1183–1190. ACM (2018)
Pearson, K.: LIII. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philos. Mag. J. Sci. 2(11), 559–572 (1901)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Rainville, D., Fortin, F.A., Gardner, M.A., Parizeau, M., Gagné, C., et al.: DEAP: a python framework for evolutionary algorithms. In: Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation, pp. 85–92. ACM (2012)
Rao, C.R.: The utilization of multiple measurements in problems of biological classification. J. Roy. Stat. Soc.: Ser. B (Methodol.) 10(2), 159–203 (1948)
Raymer, M.L., Punch, W.F., Goodman, E.D., Kuhn, L.A., Jain, A.K.: Dimensionality reduction using genetic algorithms. IEEE Trans. Evol. Comput. 4(2), 164–171 (2000)
Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000)
Sorzano, C.O.S., Vargas, J., Montano, A.P.: A survey of dimensionality reduction techniques. arXiv preprint arXiv:1403.2877 (2014)
Spearmen, C.: General intelligence objectively determined and measured. Am. J. Psychol. 15, 107–197 (1904)
Tenenbaum, J.B., De Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)
Tipping, M.E., Bishop, C.M.: Probabilistic principal component analysis. J. Roy. Stat. Soc.: Ser. B (Stat. Methodol.) 61(3), 611–622 (1999)
Torgerson, W.S.: Multidimensional scaling: I. Theory and method. Psychometrika 17(4), 401–419 (1952). https://doi.org/10.1007/BF02288916
Vural, E., Guillemot, C.: Out-of-sample generalizations for supervised manifold learning for classification. IEEE Trans. Image Process. 25(3), 1410–1424 (2016)
Vural, E., Guillemot, C.: A study of the classification of low-dimensional data with supervised manifold learning. J. Mach. Learn. Res. 18(1), 5741–5795 (2017)
Wattenberg, M., Viégas, F., Johnson, I.: How to use t-SNE effectively. Distill (2016). https://doi.org/10.23915/distill.00002. http://distill.pub/2016/misread-tsne
Weinberger, K.Q., Saul, L.K.: An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In: AAAI, vol. 6, pp. 1683–1686 (2006)
Yao, H., Tian, L.: A genetic-algorithm-based selective principal component analysis (GA-SPCA) method for high-dimensional data feature extraction. IEEE Trans. Geosci. Remote Sens. 41(6), 1469–1478 (2003)
Zhang, Z., Wang, J.: MLLE: modified locally linear embedding using multiple weights. In: Advances in Neural Information Processing Systems, pp. 1593–1600 (2007)
Zhang, Z., Zha, H.: Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J. Sci. Comput. 26(1), 313–338 (2004)
Zhao, D., Lin, Z., Tang, X.: Laplacian PCA and its applications. In: 2007 IEEE 11th International Conference on Computer Vision, pp. 1–8. IEEE (2007)
Acknowledgements
This research was supported in part by PL-Grid Infrastructure and by National Institutes of Health (NIH) grant LM012601. The authors would like to thank Dr. Andrew Lensen from Victoria University of Wellington for his help in running GP-MaL.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Orzechowski, P., Magiera, F., Moore, J.H. (2020). Benchmarking Manifold Learning Methods on a Large Collection of Datasets. In: Hu, T., Lourenço, N., Medvet, E., Divina, F. (eds) Genetic Programming. EuroGP 2020. Lecture Notes in Computer Science(), vol 12101. Springer, Cham. https://doi.org/10.1007/978-3-030-44094-7_9
Download citation
DOI: https://doi.org/10.1007/978-3-030-44094-7_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44093-0
Online ISBN: 978-3-030-44094-7
eBook Packages: Computer ScienceComputer Science (R0)