Automatic discovery of cross-family sequence features associated with protein function
Created by W.Langdon from
gp-bibliography.bib Revision:1.7954
@Article{oai:biomedcentral.com:1471-2105-7-16,
  author = "Markus Brameier and Josien Haan and Andrea Krings and
    Robert M MacCallum",
  title = "Automatic discovery of cross-family sequence features
    associated with protein function",
  publisher = "BioMed Central Ltd.",
  year = "2006",
  month = jan # "~12",
  journal = "BMC bioinformatics [electronic resource]",
  volume = "7",
  number = "16",
  keywords = "genetic algorithms, genetic programming",
  ISSN = "1471-2105",
  bibsource = "OAI-PMH server at www.biomedcentral.com",
  language = "en",
  oai = "oai:biomedcentral.com:1471-2105-7-16",
  rights = "Copyright 2006 Brameier et al; licensee BioMed Central
    Ltd.",
  URL = "http://www.biomedcentral.com/content/pdf/1471-2105-7-16.pdf",
  URL = "http://www.biomedcentral.com/1471-2105/7/16",
  DOI = "10.1186/1471-2105-7-16",
  size = "16 pages",
  abstract = "Background: Methods for predicting protein function
    directly from amino acid sequences are useful tools in
    the study of uncharacterised protein families and in
    comparative genomics. Until now, this problem has been
    approached using machine learning techniques that
    attempt to predict membership, or otherwise, to
    predefined functional categories or subcellular
    locations. A potential drawback of this approach is
    that the human-designated functional classes may not
    accurately reflect the underlying biology, and
    consequently important sequence-to-function
    relationships may be missed. Results: We show that a
    self-supervised data mining approach is able to find
    relationships between sequence features and functional
    annotations. No preconceived ideas about functional
    categories are required, and the training data is
    simply a set of protein sequences and their
    UniProt/Swiss-Prot annotations. The main technical
    aspect of the approach is the co-evolution of amino
    acid-based regular expressions and keyword-based
    logical expressions with genetic programming. Our
    experiments on a strictly non-redundant set of
    eukaryotic proteins reveal that the strongest and most
    easily detected sequence-to-function relationships are
    concerned with targeting to various cellular
    compartments, which is an area already well studied
    both experimentally and computationally. Of more
    interest are a number of broad functional roles which
    can also be correlated with sequence features. These
    include inhibition, biosynthesis, transcription and
    defence against bacteria. Despite substantial overlaps
    between these functions and their corresponding
    cellular compartments, we find clear differences in the
    sequence motifs used to predict some of these
    functions. For example, the presence of polyglutamine
    repeats appears to be linked more strongly to the
    {"}transcription{"} function than to the general
    {"}nuclear{"} function/location. Conclusion: We have
    developed a novel and useful approach for knowledge
    discovery in annotated sequence data. The technique is
    able to identify functionally important sequence
    features and does not require expert knowledge. By
    viewing protein function from a sequence perspective,
    the approach is also suitable for discovering
    unexpected links between biological processes, such as
    the recently discovered role of ubiquitination in
    transcription.",
  notes = "PMID: 16409628",
}