ABSTRACT
Formally verified correctness is one of the most desirable properties of software systems. But despite great progress made via interactive theorem provers, such as Coq, writing proof scripts for verification remains one of the most effort-intensive (and often prohibitively difficult) software development activities. Recent work has created tools that automatically synthesize proofs or proof scripts. For example, CoqHammer can prove 26.6% of theorems completely automatically by reasoning using precomputed facts, while TacTok and ASTactic, which use machine learning to model proof scripts and then perform biased search through the proof-script space, can prove 12.9% and 12.3% of the theorems, respectively. Further, these three tools are highly complementary; together, they can prove 30.4% of the theorems fully automatically. Our key insight is that control over the learning process can produce a diverse set of models, and that, due to the unique nature of proof synthesis (the existence of the theorem prover, an oracle that infallibly judges a proof's correctness), this diversity can significantly improve these tools' proving power. Accordingly, we develop Diva, which uses a diverse set of models with TacTok's and ASTactic's search mechanism to prove 21.7% of the theorems. That is, Diva proves 68% more theorems than TacTok and 77% more than ASTactic. Complementary to CoqHammer, Diva proves 781 theorems (27% added value) that CoqHammer does not, and 364 theorems no existing tool has proved automatically. Together with CoqHammer, Diva proves 33.8% of the theorems, the largest fraction to date. We explore nine dimensions for learning diverse models, and identify which dimensions lead to the most useful diversity. Further, we develop an optimization to speed up Diva's execution by 40X. 
Our study introduces a completely new idea for using diversity in machine learning to improve the power of state-of-the-art proof-script synthesis techniques, and empirically demonstrates that the improvement is significant on a dataset of 68K theorems from 122 open-source software projects.
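The key insight above, that an infallible proof checker makes model diversity safe to exploit, can be illustrated with a minimal sketch. This is not Diva's implementation: the names `prove_with_ensemble`, `proved_set`, `toy_checker`, `model_a`, and `model_b` are hypothetical, each "model" is reduced to a function that proposes one candidate proof script, and the checker is a toy lookup table standing in for the theorem prover.

```python
from typing import Callable, Dict, Optional, Sequence, Set

# Hypothetical types for illustration only.
# A model maps a theorem name to a candidate proof script (or None if it gives up).
Model = Callable[[str], Optional[str]]
# A checker plays the role of the theorem prover: it accepts or rejects a candidate.
Checker = Callable[[str, str], bool]


def prove_with_ensemble(theorem: str, models: Sequence[Model], checker: Checker) -> Optional[str]:
    """Return the first candidate proof that the checker accepts, or None."""
    for model in models:
        candidate = model(theorem)
        # A wrong candidate costs only time: the checker never accepts a bad proof.
        if candidate is not None and checker(theorem, candidate):
            return candidate
    return None


def proved_set(theorems: Sequence[str], models: Sequence[Model], checker: Checker) -> Set[str]:
    """Collect the theorems for which at least one model yields an accepted proof."""
    return {t for t in theorems if prove_with_ensemble(t, models, checker) is not None}


# Toy stand-in for the prover: each theorem has exactly one accepted proof script.
ACCEPTED: Dict[str, str] = {"t1": "auto", "t2": "induction"}


def toy_checker(theorem: str, proof: str) -> bool:
    return ACCEPTED.get(theorem) == proof


def model_a(theorem: str) -> Optional[str]:
    return "auto"  # always proposes the same script


def model_b(theorem: str) -> Optional[str]:
    return "induction"  # a differently biased model


THEOREMS = ["t1", "t2", "t3"]
only_a = proved_set(THEOREMS, [model_a], toy_checker)
only_b = proved_set(THEOREMS, [model_b], toy_checker)
both = proved_set(THEOREMS, [model_a, model_b], toy_checker)
print(sorted(only_a), sorted(only_b), sorted(both))
```

In this toy setting each model proves one theorem on its own, while the pair proves the union of the two sets. Because the checker rejects every incorrect candidate, adding a model can only grow the set of proved theorems, at the cost of extra search time.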