A general method for incremental self-improvement and multiagent learning
@InCollection{Schmidhuber:1999:ECTA,
  author =    "Juergen Schmidhuber",
  title =     "A general method for incremental self-improvement and
               multiagent learning",
  booktitle = "Evolutionary Computation: Theory and Applications",
  publisher = "Scientific Publ. Co.",
  year =      "1999",
  editor =    "Xin Yao",
  chapter =   "3",
  pages =     "81--123",
  address =   "Singapore",
  keywords =  "genetic algorithms, genetic programming",
  isbn13 =    "978-981-281-747-1",
  URL =       "ftp://ftp.idsia.ch/pub/juergen/xinbook.pdf",
  URL =       "http://www.idsia.ch/~juergen/xinbook/",
  book_doi =  "doi:10.1142/2792",
  abstract =  "I describe a novel paradigm for reinforcement learning (RL)
               with limited computational resources in realistic,
               non-resettable environments. The learner's policy is an
               arbitrary modifiable algorithm mapping environmental inputs
               and internal states to outputs and new internal states. As
               in the real world, any event in system life and any
               learning process computing policy modifications may affect
               future performance and the preconditions of future learning
               processes. There is no need for pre-defined trials. At a
               given time in system life, there is only a single training
               example for evaluating the current long-term usefulness of
               any given previous policy modification, namely the average
               reinforcement per time since that modification occurred. At
               certain times in system life, called checkpoints, such
               singular observations are used by a stack-based
               backtracking method which invalidates certain previous
               policy modifications, such that the history of still valid
               modifications corresponds to a history of long-term
               reinforcement accelerations (up to the current checkpoint,
               each still valid modification has been followed by faster
               reinforcement intake than all previous ones). Until the
               next checkpoint there is time to collect delayed
               reinforcement and to execute additional policy
               modifications; until then no previous policy modifications
               are invalidated; and until then the straightforward,
               temporary generalization assumption is: each modification
               that has so far appeared to contribute to an overall
               speed-up will remain useful. The paradigm provides a
               foundation for (1) meta-learning and (2) multi-agent
               learning. The principles are illustrated in (1) a single,
               self-referential, evolutionary system using an
               assembler-like programming language to modify its own
               policy, and to modify the way it modifies its policy, etc.,
               and (2) another evolutionary system consisting of multiple
               agents, where each agent is in fact just a connection in a
               fully recurrent RL neural net.",
}
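
The checkpoint/backtracking mechanism described in the abstract can be made concrete with a small sketch. The Python code below is an illustrative approximation, not the chapter's implementation: the names (SSASketch, record_modification, checkpoint, _rate_since) are invented for this example, and the policy-modification machinery is reduced to opaque undo callbacks. It only shows the success-story-style test that each still-valid modification must have been followed by faster reinforcement intake than all earlier ones.

```python
class SSASketch:
    """Illustrative sketch of checkpoint-based backtracking over policy
    modifications, measured by average reinforcement per time step."""

    def __init__(self):
        self.t = 0                # lifetime step counter (no pre-defined trials)
        self.total_reward = 0.0   # cumulative reinforcement over system life
        # Stack of still-valid policy modifications:
        # (time of modification, cumulative reward at that time, undo callback)
        self.stack = []

    def step(self, reward):
        """Advance system life by one time step and record the reward received."""
        self.t += 1
        self.total_reward += reward

    def record_modification(self, undo_fn):
        """Push a policy modification together with a callback that undoes it."""
        self.stack.append((self.t, self.total_reward, undo_fn))

    def _rate_since(self, t_mod, reward_at_mod):
        """Average reinforcement per time step since a given moment, measured
        now: the single 'training example' the abstract refers to."""
        elapsed = self.t - t_mod
        if elapsed <= 0:
            return float("-inf")  # no evidence yet; treated as unproven
        return (self.total_reward - reward_at_mod) / elapsed

    def _success_story_holds(self):
        """True iff the reward rates measured from system start and from each
        still-valid modification form a strictly increasing sequence."""
        prev = self._rate_since(0, 0.0)
        for t_mod, reward_at_mod, _ in self.stack:
            rate = self._rate_since(t_mod, reward_at_mod)
            if rate <= prev:
                return False
            prev = rate
        return True

    def checkpoint(self):
        """Stack-based backtracking: pop and undo the most recent modifications
        until every remaining one has been followed by faster reinforcement
        intake than all earlier ones."""
        while self.stack and not self._success_story_holds():
            _, _, undo_fn = self.stack.pop()
            undo_fn()
```

A caller would interleave step() with record_modification(), passing an undo closure that restores the pre-modification policy, and invoke checkpoint() at chosen points in system life. In the chapter's systems the policy itself can decide when to modify itself and when checkpoints occur (which is what enables meta-learning); that self-referential aspect is not modelled in this sketch.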