DOI: 10.1145/3510003.3510138

Diversity-driven automated formal verification

Published: 05 July 2022

ABSTRACT

Formally verified correctness is one of the most desirable properties of software systems. But despite great progress made via interactive theorem provers, such as Coq, writing proof scripts for verification remains one of the most effort-intensive (and often prohibitively difficult) software development activities. Recent work has created tools that automatically synthesize proofs or proof scripts. For example, CoqHammer can prove 26.6% of theorems completely automatically by reasoning using precomputed facts, while TacTok and ASTactic, which use machine learning to model proof scripts and then perform biased search through the proof-script space, can prove 12.9% and 12.3% of the theorems, respectively. Further, these three tools are highly complementary; together, they can prove 30.4% of the theorems fully automatically. Our key insight is that control over the learning process can produce a diverse set of models, and that, due to the unique nature of proof synthesis (the existence of the theorem prover, an oracle that infallibly judges a proof's correctness), this diversity can significantly improve these tools' proving power. Accordingly, we develop Diva, which uses a diverse set of models with TacTok's and ASTactic's search mechanism to prove 21.7% of the theorems. That is, Diva proves 68% more theorems than TacTok and 77% more than ASTactic. Complementary to CoqHammer, Diva proves 781 theorems (27% added value) that CoqHammer does not, and 364 theorems no existing tool has proved automatically. Together with CoqHammer, Diva proves 33.8% of the theorems, the largest fraction to date. We explore nine dimensions for learning diverse models, and identify which dimensions lead to the most useful diversity. Further, we develop an optimization to speed up Diva's execution by 40X. 
Our study introduces a completely new idea for using diversity in machine learning to improve the power of state-of-the-art proof-script synthesis techniques, and empirically demonstrates that the improvement is significant on a dataset of 68K theorems from 122 open-source software projects.
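
The key insight above (several differently trained models, plus a prover that acts as an infallible oracle) can be illustrated with a minimal sketch. This is not Diva's implementation: the names `prove_with_ensemble`, `proved_set`, `toy_check`, and the three toy models are hypothetical, and the "prover" is a lookup table standing in for Coq. It only shows why a checker that never accepts a wrong proof makes it safe to take the union of what diverse models can prove.

```python
from typing import Callable, Iterable, Optional, Set

Proof = str
# Hypothetical types: a model maps a theorem name to a candidate proof script
# (or None), and a checker plays the role of the theorem prover (the oracle).
Model = Callable[[str], Optional[Proof]]
Checker = Callable[[str, Proof], bool]


def prove_with_ensemble(theorem: str, models: Iterable[Model],
                        check: Checker) -> Optional[Proof]:
    """Return the first candidate proof the checker accepts, else None."""
    for model in models:
        candidate = model(theorem)
        # The oracle rejects every wrong candidate, so trying more models
        # can only add proved theorems, never introduce an incorrect proof.
        if candidate is not None and check(theorem, candidate):
            return candidate
    return None


def proved_set(theorems: Iterable[str], models: Iterable[Model],
               check: Checker) -> Set[str]:
    """Collect the theorems for which some model yields an accepted proof."""
    models = list(models)
    return {t for t in theorems
            if prove_with_ensemble(t, models, check) is not None}


# Toy stand-in for the prover: each theorem has exactly one accepted script.
# "t4" has no entry, so no candidate is ever accepted for it.
VALID = {"t1": "auto.", "t2": "induction n; auto.", "t3": "lia."}


def toy_check(theorem: str, proof: Proof) -> bool:
    return VALID.get(theorem) == proof


# Three toy "models" that differ in which proofs they tend to propose.
def model_a(theorem: str) -> Optional[Proof]:
    return "auto."


def model_b(theorem: str) -> Optional[Proof]:
    return "induction n; auto." if theorem == "t2" else "auto."


def model_c(theorem: str) -> Optional[Proof]:
    return "lia." if theorem == "t3" else None


THEOREMS = ["t1", "t2", "t3", "t4"]

if __name__ == "__main__":
    for name, models in [("a", [model_a]), ("b", [model_b]), ("c", [model_c]),
                         ("a+b+c", [model_a, model_b, model_c])]:
        print(name, sorted(proved_set(THEOREMS, models, toy_check)))
```

In this toy setting no single model proves more than two of the four theorems, while the three together prove three. The real tools search a proof-script space rather than emit one fixed candidate, so the sketch captures only the union-under-an-oracle argument.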


Published in

ICSE '22: Proceedings of the 44th International Conference on Software Engineering
May 2022
2508 pages
ISBN: 9781450392211
DOI: 10.1145/3510003

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States
