DOI: 10.1145/3449639.3459285
research-article

PSB2: the second program synthesis benchmark suite

Published: 26 June 2021

ABSTRACT

For the past six years, researchers in genetic programming and other program synthesis disciplines have used the General Program Synthesis Benchmark Suite to benchmark many aspects of automatic program synthesis systems. These problems have been used to make notable progress toward the goal of general program synthesis: automatically creating the types of software that human programmers code. Many of the systems that have attempted the problems in the original benchmark suite have used it to demonstrate performance improvements granted through new techniques. Over time, the suite has gradually become outdated, hindering the accurate measurement of further improvements. The field needs a new set of more difficult benchmark problems to move beyond what was previously possible.

In this paper, we describe the 25 new general program synthesis benchmark problems that make up PSB2, a new benchmark suite. These problems are curated from a variety of sources, including programming katas and college courses. We selected these problems to be more difficult than those in the original suite, and give results using PushGP showing this increase in difficulty. These new problems give plenty of room for improvement, pointing the way for the next six or more years of general program synthesis research.

Published in

GECCO '21: Proceedings of the Genetic and Evolutionary Computation Conference
June 2021
1219 pages
ISBN: 9781450383509
DOI: 10.1145/3449639

Copyright © 2021 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%
