
Finding Simple Solutions to Multi-Task Visual Reinforcement Learning Problems with Tangled Program Graphs

Chapter in Genetic Programming Theory and Practice XVIII

Abstract

Tangled Program Graphs (TPG) is a genetic programming framework in which emergent modularity incrementally composes programs into teams of programs, and teams of programs into graphs of teams. To date, the framework has been demonstrated on reinforcement learning tasks with stochastic, partially observable state spaces and on time series prediction. However, evolving solutions to reinforcement learning tasks often requires agents to juggle multiple properties simultaneously; hence, we are interested in maintaining a population of diverse agents. Specifically, an agent's performance on a reinforcement learning task controls how much of the task it is exposed to. Premature convergence might therefore preclude solving aspects of a task that the agent only encounters later. Moreover, 'pointless complexity' may also result, in which graphs largely consist of hitchhikers. In this research we benchmark rampant mutation (multiple mutations applied simultaneously during offspring creation) and action programs (multiple actions per state). Several parameterizations are also introduced that potentially penalize the introduction of hitchhikers. Benchmarking over five ViZDoom tasks demonstrates that rampant mutation reduces the likelihood of encountering pathologically bad offspring, while action programs appear to improve performance in four of the five tasks. Finally, TPG parameterizations that actively limit the complexity of solutions appear to result in very efficient, low-dimensional solutions that generalize best across all combinations of 3, 4 and 5 ViZDoom tasks.
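
Rampant mutation, as described in the abstract, replaces the canonical single variation step with a randomly sized batch of mutations applied while creating one offspring. The Python sketch below shows the idea only: the team representation, the operator names and the sampling range (standing in for the 'Rampant Magnitude' parameter of Note 6) are all illustrative assumptions, not the authors' implementation.

    import random

    def rampant_mutate(parent, operators, magnitude=(1, 5), rng=random):
        """Clone `parent` and apply several mutations in one step.

        `operators` are callables from team to team; the `magnitude`
        range stands in for the 'Rampant Magnitude' parameter (Note 6).
        Canonical single mutation is recovered with magnitude=(1, 1).
        """
        offspring = list(parent)                  # shallow clone of the team
        for _ in range(rng.randint(*magnitude)):  # several mutations at once
            offspring = rng.choice(operators)(offspring)
        return offspring

    # Toy variation operators over a team of (action, program) learners,
    # purely to make the sketch executable.
    def add_learner(team):
        return team + [("noop", [])]

    def remove_learner(team):
        # Never drop below two learners; see Note 2 on degenerate teams.
        return team[:-1] if len(team) > 2 else list(team)

    parent = [("move_forward", []), ("attack", [])]
    child = rampant_mutate(parent, [add_learner, remove_learner])

Action programs can be pictured within the same representation: each learner's scalar action above would be replaced by a small program emitting one output per atomic action, so a single state visit can drive several actuators at once, consistent with the abstract's "multiple actions per state".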


Notes

  1. Implies that the interaction represents the special case of an episodic task [24].

  2. Although a minimum of two learners (with different actions) is necessary to avoid defining a degenerate team (Sect. 1.2.2).

  3. An arc marking scheme has since been proposed [9]; however, for the purposes of this work the original team formulation is assumed.

  4. The stochastic nature of each subtask requires that agents be evaluated over multiple initializations (see the evaluation sketch after these notes).

  5. https://github.com/mwydmuch/ViZDoom/tree/master/scenarios.

  6. Reflected in the parameterization of the 'Rampant Magnitude' row in Table 1.1.

  7. Includes introns and hitchhikers.
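
The evaluation protocol implied by Notes 1 and 4 (episodic interaction, scored over several independent initializations) can be sketched against the ViZDoom Python bindings. This is a minimal illustration, not the authors' harness: the scenario file, the two-action set and the `agent` callable are assumptions, with scenarios available from the repository in Note 5.

    import vizdoom as vzd

    def evaluate(agent, config="scenarios/take_cover.cfg", episodes=5):
        """Mean episodic reward for `agent` over several initializations.

        `agent` is any callable mapping a screen buffer to an index into
        `actions`; the scenario and action set are illustrative only.
        """
        game = vzd.DoomGame()
        game.load_config(config)        # a scenario from the repository in Note 5
        game.set_window_visible(False)  # headless evaluation
        game.init()
        actions = [[1, 0], [0, 1]]      # one-hot buttons, e.g. MOVE_LEFT / MOVE_RIGHT
        scores = []
        for _ in range(episodes):       # multiple initializations (Note 4)
            game.new_episode()
            while not game.is_episode_finished():  # episodic task (Note 1)
                state = game.get_state()
                game.make_action(actions[agent(state.screen_buffer)])
            scores.append(game.get_total_reward())
        game.close()
        return sum(scores) / len(scores)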

References

  1. Bjedov, I., Tenaillon, O., Gerard, B., Souza, V., Denamur, E., Radman, M., Taddei, F., Matic, I.: Stress-induced mutagenesis in bacteria. Science 300, 1404–1409 (2003)

  2. Brameier, M., Banzhaf, W.: Linear Genetic Programming. Springer (2007)

  3. Branke, J.: Evolutionary approaches to dynamic environments: a survey. In: GECCO Workshop on Dynamic Optimization Problems, pp. 134–137 (1999)

  4. Cobb, H.G.: An investigation into the use of hypermutation as an adaptive operator in genetic algorithms having continuous, time-dependent non-stationary environments. Technical Report AIC-90-001, Naval Research Laboratory (1990)

  5. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)

  6. Ghosh, A., Tsutsui, S., Tanaka, H.: Function optimization in non-stationary environment using steady state genetic algorithms with aging of individuals. In: IEEE Congress on Evolutionary Computation, pp. 666–671 (1998)

  7. Grefenstette, J.J.: Genetic algorithms for changing environments. In: PPSN, pp. 137–144 (1992)

  8. Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V., Hutter, M.: Learning agile and dynamic motor skills for legged robots. CoRR abs/1901.08652 (2019)

  9. Ianta, A., Amaral, R., Bayer, C., Smith, R.J., Heywood, M.I.: On the impact of tangled program graph marking schemes under the Atari reinforcement learning benchmark. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference (2021, to appear)

  10. Jaderberg, M., Czarnecki, W.M., Dunning, I., Marris, L., Lever, G., Castañeda, A.G., Beattie, C., Rabinowitz, N.C., Morcos, A.S., Ruderman, A., Sonnerat, N., Green, T., Deason, L., Leibo, J.Z., Silver, D., Hassabis, D., Kavukcuoglu, K., Graepel, T.: Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 364, 859–865 (2019)

  11. Kelly, S., Heywood, M.I.: Emergent tangled graph representations for Atari game playing agents. In: European Conference on Genetic Programming, LNCS, vol. 10196, pp. 64–79 (2017)

  12. Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multitask reinforcement learning. Evol. Comput. 26(3), 347–380 (2018)

  13. Kelly, S., Newsted, J., Banzhaf, W., Gondro, C.: A modular memory framework for time series prediction. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference, pp. 949–957 (2020)

  14. Kelly, S., Smith, R.J., Heywood, M.I.: Emergent policy discovery for visual reinforcement learning through tangled program graphs: a tutorial. In: Banzhaf, W., Spector, L., Sheneman, L. (eds.) Genetic Programming Theory and Practice XVI, Genetic and Evolutionary Computation, pp. 37–57 (2018)

  15. Kelly, S., Smith, R.J., Heywood, M.I., Banzhaf, W.: Emergent tangled program graphs in partially observable recursive forecasting and ViZDoom navigation tasks. ACM Trans. Evol. Learn. Optim. 1 (2021)

  16. Kempka, M., Wydmuch, M., Runc, G., Toczek, J., Jaśkowski, W.: ViZDoom: a Doom-based AI research platform for visual reinforcement learning. In: IEEE Conference on Computational Intelligence and Games, pp. 1–8 (2016)

  17. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. Complex Adaptive Systems. MIT Press (1993)

  18. Moriarty, D.E., Schultz, A.C., Grefenstette, J.J.: Evolutionary algorithms for reinforcement learning. J. Artif. Intell. Res. 11, 199–229 (1999)

  19. Parter, M., Kashtan, N., Alon, U.: Facilitated variation: how evolution learns from past environments to generalize to new environments. PLoS Comput. Biol. 4(11), 1–15 (2008)

  20. Smith, R.J., Heywood, M.I.: Scaling tangled program graphs to visual reinforcement learning in ViZDoom. In: European Conference on Genetic Programming, LNCS, vol. 10781, pp. 135–150 (2018)

  21. Smith, R.J., Heywood, M.I.: Evolving Dota 2 shadow fiend bots using genetic programming with external memory. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference, pp. 179–187 (2019)

  22. Smith, R.J., Heywood, M.I.: A model of external memory for navigation in partially observable visual reinforcement learning tasks. In: European Conference on Genetic Programming, LNCS, vol. 11451, pp. 162–177 (2019)

  23. Sünderhauf, N., Brock, O., Scheirer, W.J., Hadsell, R., Fox, D., Leitner, J., Upcroft, B., Abbeel, P., Burgard, W., Milford, M., Corke, P.: The limits and potentials of deep learning for robotics. Int. J. Robot. Res. 37(4–5), 405–420 (2018)

  24. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press (2018)

  25. Teng, G., Papavasiliou, F.N.: Immunoglobulin somatic hypermutation. Annu. Rev. Genet. 41, 107–120 (2007)


Acknowledgements

We gratefully acknowledge support from the NSERC CRD and Discovery programs (Canada).

Author information

Correspondence to Malcolm I. Heywood.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Bayer, C., Amaral, R., Smith, R.J., Ianta, A., Heywood, M.I. (2022). Finding Simple Solutions to Multi-Task Visual Reinforcement Learning Problems with Tangled Program Graphs. In: Banzhaf, W., Trujillo, L., Winkler, S., Worzel, B. (eds) Genetic Programming Theory and Practice XVIII. Genetic and Evolutionary Computation. Springer, Singapore. https://doi.org/10.1007/978-981-16-8113-4_1


  • DOI: https://doi.org/10.1007/978-981-16-8113-4_1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-8112-7

  • Online ISBN: 978-981-16-8113-4

  • eBook Packages: Computer Science, Computer Science (R0)
