Symbolic regression for the interpretation of quantitative structure-property relationships
Created by W.Langdon from
gp-bibliography.bib Revision:1.8120
@Article{TAKAKI:2022:ailsci,
  author = "Katsushi Takaki and Tomoyuki Miyao",
  title = "Symbolic regression for the interpretation of
    quantitative structure-property relationships",
  journal = "Artificial Intelligence in the Life Sciences",
  volume = "2",
  pages = "100046",
  year = "2022",
  ISSN = "2667-3185",
  DOI = "doi:10.1016/j.ailsci.2022.100046",
  URL = "https://www.sciencedirect.com/science/article/pii/S2667318522000162",
  keywords = "genetic algorithms, genetic programming, Model
    interpretability, Quantitative structure-activity
    relationships, Quantitative structure-property
    relationships, Symbolic regression",
  abstract = "The interpretation of quantitative structure-activity
    or structure-property relationships is important in the
    field of chemoinformatics. Although multivariate linear
    regression models are typically interpretable, they do
    not generally have high predictive abilities. Symbolic
    regression (SR) combined with genetic programming (GP)
    is a well-established technique for generating the
    mathematical expressions that describe the
    relationships within a dataset. However, SR sometimes
    produces complicated expressions that are hard for
    humans to interpret. This paper proposes a method for
    generating simpler expressions by incorporating three
    filters into GP-based SR. The filters are further
    combined with nonlinear least-squares optimization to
    give filter-introduced GP (FIGP), which improves the
    predictive ability of SR models while retaining simple
    expressions. As a proof-of-concept, the quantitative
    estimate of drug-likeness and the synthetic
    accessibility score are predicted based on the chemical
    structures of compounds. Overall, FIGP generates
    less-complicated expressions than previous SR methods.
    In terms of predictive ability, FIGP is better than GP,
    but is outperformed by a support vector machine with a
    radial basis function kernel. Furthermore, quantitative
    structure-activity relationship models are constructed
    for three matching molecular series with biological
    targets. In the case of one target, the activity
    prediction models given by FIGP exhibit better
    predictive ability than multivariate linear regression
    and support vector regression with the radial basis
    function kernel, whereas for the remaining cases, FIGP
    is slightly less accurate than multivariate linear
    regression",
}