A Model of External Memory for Navigation in Partially Observable Visual Reinforcement Learning Tasks
Created by W.Langdon from
gp-bibliography.bib Revision:1.7970
@InProceedings{Smith:2019:EuroGP,
author = "Robert Smith and Malcolm Heywood",
title = "A Model of External Memory for Navigation in Partially
Observable Visual Reinforcement Learning Tasks",
booktitle = "EuroGP 2019: Proceedings of the 22nd European
Conference on Genetic Programming",
year = "2019",
month = "24-26 " # apr,
editor = "Lukas Sekanina and Ting Hu and Nuno Lourenco",
series = "LNCS",
volume = "11451",
publisher = "Springer Verlag",
address = "Leipzig, Germany",
pages = "162--177",
organisation = "EvoStar, Species",
keywords = "genetic algorithms, genetic programming, Computer
Video Games, First person shooter",
isbn13 = "978-3-030-16669-4",
URL = "https://www.springer.com/us/book/9783030166694",
DOI = "doi:10.1007/978-3-030-16670-0_11",
size = "16 pages",
abstract = "Visual reinforcement learning implies that decision
making policies are identified under delayed rewards
from an environment. Moreover, state information takes
the form of high-dimensional data, such as video. In
addition, although the video might characterize a 3D
world in high resolution, partial observability will
place significant limits on what the agent can actually
perceive of the world. This means that the agent also
has to: (1) provide efficient encodings of state, (2)
store the encodings of state efficiently in some form
of memory, (3) recall such memories after arbitrary
delays for decision making. In this work, we
demonstrate how an external memory model facilitates
decision making in the complex world of multi-agent
deathmatches in the ViZDoom first person shooter
environment. The ViZDoom environment provides a complex
environment of multiple rooms and resources in which
agents are spawned from multiple different locations. A
unique approach is adopted to defining external memory
for genetic programming agents in which: (1) the state
of memory is shared across all programs. (2) Writing is
formulated as a probabilistic process, resulting in
different regions of memory having short- versus
long-term memory. (3) Read operations are indexed,
enabling programs to identify regions of external
memory with specific temporal properties. We
demonstrate that agents purposefully navigate the world
when external memory is provided, whereas those without
external memory are limited to merely flight or fight
behaviour.",
notes = "http://www.evostar.org/2019/cfp_eurogp.php#abstracts
Part of \cite{Sekanina:2019:GP} EuroGP'2019 held in
conjunction with EvoCOP2019, EvoMusArt2019 and
EvoApplications2019",
}