Abstract
Transfer learning (TL) is the process by which some aspects of a machine learning model generated on a source task are transferred to a target task, to simplify the learning required to solve the target. TL in Genetic Programming (GP) has not received much attention, since it is normally assumed that an evolved symbolic expression is specifically tailored to a problem's data and thus cannot be used in other problems. The goal of this work is to present a broad and diverse study of TL in GP, considering a varied set of source and target tasks and dealing with questions that have received little or no attention in previous GP literature. In particular, this work studies the performance of transferred solutions when the source and target tasks are from different domains and when they do not share a similar input feature space. Additionally, the relationship between the success and failure of transferred solutions is studied across different source and target tasks. Finally, the predictability of TL performance is analyzed for the first time in GP literature. The study is carried out with GP-based constructive induction of features, a wrapper-based approach in which GP constructs feature transformations and an additional learning algorithm fits the final model. The experimental work presents several notable results and contributions. First, TL is capable of generating solutions that, in many cases, outperform baseline methods on classification and regression tasks. Second, some problems are shown to be good source problems, while others are good targets in a TL system. Third, the transferability of solutions between two problems is not necessarily symmetric. Finally, results show that it is possible to predict the success of TL in some cases, particularly in classification tasks.
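The wrapper scheme described above, and the way a transferred solution can be reused, may be illustrated with a minimal sketch. It assumes that a set of feature transformations has already been evolved on a source task: the three expressions in `TRANSFORMS` are hypothetical, hand-written stand-ins for GP trees, and ordinary least squares is assumed as the additional learning algorithm. Transfer is modelled here as applying the source transformations unchanged to the target data and refitting only the learner, which further assumes that the transformations can index the target's raw features directly (two columns in this toy case). The names `construct_features`, `fit_learner` and `evaluate` are illustrative, not taken from the paper.

```python
import numpy as np

# Hypothetical stand-ins for transformations evolved by GP on a source task.
# Each maps the raw feature matrix x (n_samples x 2) to one new feature.
TRANSFORMS = [
    lambda x: x[:, 0] * x[:, 1],
    lambda x: np.sin(x[:, 0]),
    lambda x: x[:, 1] ** 2,
]


def construct_features(x, transforms):
    """Build the constructed feature space: one column per transformation."""
    return np.column_stack([t(x) for t in transforms])


def fit_learner(z, y):
    """Final model (assumed here): least squares with an intercept term."""
    design = np.column_stack([np.ones(len(z)), z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def evaluate(transforms, x, y):
    """Wrapper-style evaluation: transform, fit the learner, report training RMSE."""
    z = construct_features(x, transforms)
    coef = fit_learner(z, y)
    pred = np.column_stack([np.ones(len(z)), z]) @ coef
    return coef, float(np.sqrt(np.mean((y - pred) ** 2)))


# Transfer: reuse the source transformations unchanged on a synthetic target
# task and refit only the learner. Assumes the target has two raw features.
rng = np.random.default_rng(0)
x_target = rng.uniform(-2.0, 2.0, size=(200, 2))
y_target = 3 * x_target[:, 0] * x_target[:, 1] - 2 * x_target[:, 1] ** 2 + 1

coef_transfer, transferred_rmse = evaluate(TRANSFORMS, x_target, y_target)

# Baseline: the same learner fitted directly on the raw features.
RAW = [lambda x: x[:, 0], lambda x: x[:, 1]]
_, raw_rmse = evaluate(RAW, x_target, y_target)

print(f"transferred RMSE: {transferred_rmse:.2e}, raw-feature RMSE: {raw_rmse:.2f}")
```

The synthetic target is deliberately built from x0·x1 and x1², so the transferred features fit it almost exactly while the raw-feature baseline does not; the toy example says nothing about when transfer helps on real source and target tasks.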
Notes
The terms original features, raw features and problem features are used to denote the input features (independent variables) that are contained in a machine learning dataset. These terms are used interchangeably in the rest of this paper.
We do not provide a complete description of all the works in this area; such a survey is beyond the scope of this paper. Instead, we chose representative papers from the groups and subgroups in our taxonomy of research in this area. We focused on distinctive papers that illustrate the main features of each group, and also included works that have achieved good performance on real-world tasks.
These results do not imply that source task features are not important in predicting TL performance, only that target task features are more important for a Random Forest regression model.
All of the predictors are plotted on a [0, 1] scale. For predictors with unbounded domains, min-max normalization was used.
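Min-max normalization rescales each observed value v of a predictor as (v - min) / (max - min), where min and max are taken over that predictor's observed values, so the smallest value maps to 0 and the largest to 1. A minimal sketch follows; the function name `min_max` and the handling of a constant predictor are illustrative assumptions, not details given in the paper.

```python
def min_max(values):
    """Rescale values to [0, 1] using (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant predictor is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(min_max([2.0, 4.0, 10.0]))  # [0.0, 0.25, 1.0]
```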
Acknowledgements
This research was funded by CONACYT (Mexico) Fronteras de la Ciencia 2015-2 Project No. FC-2015-2/944, and the first author was supported by CONACYT graduate scholarship No. 302526. This work was also partially supported by FCT through funding of LASIGE Research Unit (UID/CEC/00408/2019), and projects PERSEIDS (PTDC/EMS-SIS/0642/2014), INTERPHENO (PTDC/ASP-PLA/28726/2017), OPTOX (PTDC/CTA-AMB/30056/2017), BINDER (PTDC/CCI-INF/29168/2017), PREDICT (PTDC/CCI-CIF/29877/2017) and GADgET (DSAIPA/DS/0022/2018). The authors also thank Mauro Castelli from NOVA IMS for suggesting important references on transfer learning with GP.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Muñoz, L., Trujillo, L. & Silva, S. Transfer learning in constructive induction with Genetic Programming. Genet Program Evolvable Mach 21, 529–569 (2020). https://doi.org/10.1007/s10710-019-09368-y
DOI: https://doi.org/10.1007/s10710-019-09368-y