Mini-Batching, Gradient-Clipping, First- versus Second-Order: What Works in Gradient-Based Coefficient Optimisation for Symbolic Regression?
@InProceedings{harrison:2023:GECCO,
  author =            "Joe Harrison and Marco Virgolin and
                       Tanja Alderliesten and Peter Bosman",
  title =             "{Mini-Batching,} {Gradient-Clipping,} First- versus
                       {Second-Order:} What Works in {Gradient-Based}
                       Coefficient Optimisation for Symbolic Regression?",
  booktitle =         "Proceedings of the 2023 Genetic and Evolutionary
                       Computation Conference",
  year =              "2023",
  editor =            "Sara Silva and Luis Paquete and Leonardo Vanneschi and
                       Nuno Lourenco and Ales Zamuda and Ahmed Kheiri and
                       Arnaud Liefooghe and Bing Xue and Ying Bi and
                       Nelishia Pillay and Irene Moser and Arthur Guijt and
                       Jessica Catarino and Pablo Garcia-Sanchez and
                       Leonardo Trujillo and Carla Silva and Nadarajen Veerapen",
  pages =             "1127--1136",
  address =           "Lisbon, Portugal",
  series =            "GECCO '23",
  month =             "15-19 " # jul,
  organisation =      "SIGEVO",
  publisher =         "Association for Computing Machinery",
  publisher_address = "New York, NY, USA",
  keywords =          "genetic algorithms, genetic programming, symbolic
                       regression, gradient descent, explainable AI,
                       coefficient optimisation",
  isbn13 =            "9798400701191",
  DOI =               "doi:10.1145/3583131.3590368",
  size =              "10 pages",
  abstract =          "The aim of Symbolic Regression (SR) is to discover
                       interpretable expressions that accurately describe
                       data. The accuracy of an expression depends on both its
                       structure and coefficients. To keep the structure
                       simple enough to be interpretable, effective
                       coefficient optimisation becomes key. Gradient-based
                       optimisation is clearly effective at training neural
                       networks in Deep Learning (DL), which can essentially
                       be viewed as large, over-parameterised expressions: in
                       this paper, we study how gradient-based optimisation
                       techniques as often used in DL transfer to SR. In
                       particular, we first assess what techniques work well
                       across random SR expressions, independent of any
                       specific SR algorithm. We find that mini-batching and
                       gradient-clipping can be helpful (similar to DL), while
                       second-order optimisers outperform first-order ones
                       (different from DL). Next, we consider whether
                       including gradient-based optimisation in Genetic
                       Programming (GP), a classic SR algorithm, is
                       beneficial. On five real-world datasets, in a
                       generation-based comparison, we find that second-order
                       optimisation outperforms coefficient mutation (or no
                       optimisation). However, in time-based comparisons,
                       performance gaps shrink substantially because the
                       computational expensiveness of second-order
                       optimisation causes GP to perform fewer generations.
                       The interplay of computational costs between the
                       optimisation of structure and coefficients is thus a
                       critical aspect to consider.",
  notes =             "GECCO-2023: A Recombination of the 32nd International
                       Conference on Genetic Algorithms (ICGA) and the 28th
                       Annual Genetic Programming Conference (GP)",
}
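As a rough illustration of the techniques the abstract refers to (not the authors' implementation), the sketch below fits the coefficients of one fixed, hand-picked expression with mini-batch gradient descent plus gradient-norm clipping, and then with SciPy's Levenberg-Marquardt solver as a stand-in for the second-order optimisers compared in the paper. The expression structure, data, and hyperparameters are all illustrative assumptions.

import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# Synthetic data from an assumed target expression f(x) = 2.0*sin(1.5*x) - 0.5.
X = rng.uniform(-3.0, 3.0, size=200)
y = 2.0 * np.sin(1.5 * X) - 0.5 + rng.normal(0.0, 0.1, size=200)

def predict(c, x):
    # Fixed expression structure; only the coefficients c are optimised.
    return c[0] * np.sin(c[1] * x) + c[2]

def grad_mse(c, x, t):
    # Analytic gradient of the mean squared error w.r.t. the coefficients.
    r = predict(c, x) - t
    return np.array([
        2.0 * np.mean(r * np.sin(c[1] * x)),
        2.0 * np.mean(r * c[0] * x * np.cos(c[1] * x)),
        2.0 * np.mean(r),
    ])

# First-order: mini-batch gradient descent with gradient-norm clipping.
c = rng.normal(size=3)
lr, clip_norm, batch_size = 0.05, 1.0, 32
for epoch in range(200):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        g = grad_mse(c, X[batch], y[batch])
        norm = np.linalg.norm(g)
        if norm > clip_norm:
            g *= clip_norm / norm   # clip the gradient to a maximum norm
        c -= lr * g
print("first-order (mini-batch SGD + clipping):", c)

# Second-order flavour: Levenberg-Marquardt on the residuals of the same
# expression, used here as a simple stand-in for a second-order optimiser.
res = least_squares(lambda c: predict(c, X) - y, x0=rng.normal(size=3), method="lm")
print("second-order (Levenberg-Marquardt):", res.x)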