abstract = "Much of the research on the accuracy of symbolic
regression (SR) has focused on artificially constructed
search problems where there is zero noise in the data.
Such problems admit of exact solutions but cannot tell
us how accurate the search process is in a noisy
real-world domain. To explore this question, symbolic
regression is applied here to an area of research which
has been well-travelled by regression modellers: the
prediction of unemployment rates. A respected dataset
was selected, the CEP-OECD Labor Market Institutions
Database, to provide a testing environment for a
variety of searches. Metrics of success for this paper
went beyond the normal yardsticks of statistical
significance to demand plausibility. Here it is assumed
that a plausible model must be able to predict
unemployment rates for six future years beyond the
sample period: this metric is referred to as the
out-of-sample R-squared. We conclude that the two packages
tested, Eureqa and ARC, can produce models that go
beyond the power of traditional stepwise regression.
ARC, in particular, is able to replicate the format of
published economic research because ARC contains a
high-level Regression Query Language (RQL). This research
produced a number of models that are consistent with
published economic research, have in-sample R-squared
values over 0.80, produce no negative unemployment rates, and
achieve out-of-sample R-squared values above 0.45. It is argued
that SR offers significant new advantages to social
science researchers.",