ABSTRACT
GitHub Copilot, an extension for the Visual Studio Code development environment powered by the large-scale language model Codex, makes automatic program synthesis available to software developers. This model has been extensively studied in the field of deep learning; however, a comparison with genetic programming, which is also known for its performance in automatic program synthesis, has not yet been carried out. In this paper, we evaluate GitHub Copilot on standard program synthesis benchmark problems and compare the achieved results with those reported in the genetic programming literature. In addition, we discuss the properties of both approaches. We find that the performance of the two approaches on the benchmark problems is quite similar; however, in contrast to GitHub Copilot, the program synthesis approaches based on genetic programming are not yet mature enough to support programmers in practical software development. Genetic programming usually needs a huge number of expensive hand-labeled training cases and takes too much time to generate solutions. Furthermore, source code generated by genetic programming approaches is often bloated and difficult to understand. For future work on program synthesis with genetic programming, we suggest that researchers focus on improving execution time, readability, and usability.
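To illustrate the kind of task posed by these benchmark suites, the sketch below shows hand-labeled input-output training cases for a simple benchmark-style problem (a "Fizz Buzz" variant), together with a reference solution of the sort a synthesizer is asked to produce. The case format and the acceptance check are illustrative assumptions, not the exact setup used by any particular benchmark suite or GP system.

```python
# Hand-labeled training cases of the kind a GP system consumes:
# each pair maps an input to its expected output (format is illustrative).
training_cases = [
    (1, "1"), (3, "Fizz"), (5, "Buzz"), (7, "7"), (15, "FizzBuzz"),
]

def fizz_buzz(n: int) -> str:
    """Reference solution a synthesizer would be expected to produce."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

# A candidate program is accepted only if it matches every labeled case;
# generalization beyond the training set is checked on held-out cases.
assert all(fizz_buzz(x) == y for x, y in training_cases)
```

This also makes the cost argument concrete: every such input-output pair must be authored by hand, whereas a Copilot-style model needs only a natural-language description or a function signature as its prompt.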