GEVO: GPU Code Optimization Using Evolutionary Computation

Published: 25 November 2020

Abstract

GPUs are a key enabler of the revolution in machine learning and high-performance computing, functioning as de facto co-processors that accelerate large-scale computation. As the programming stack and tool support have matured, GPUs have become accessible to programmers who may lack detailed knowledge of the underlying architecture and therefore fail to fully leverage the GPU’s computational power. GEVO (Gpu optimization using EVOlutionary computation) is a tool that automatically discovers optimization opportunities and tunes the performance of GPU kernels in the LLVM representation. GEVO uses population-based search to find edits to GPU code compiled to LLVM-IR, improving performance on the desired criteria while retaining the required functionality. We demonstrate that GEVO improves the execution time of general-purpose GPU programs and machine learning (ML) models on an NVIDIA Tesla P100. For the Rodinia benchmarks, GEVO improves GPU kernel runtime performance by an average of 49.48% and by as much as 412% over the fully compiler-optimized baseline. If kernel output accuracy is relaxed to tolerate up to 1% error, GEVO can find kernel variants that outperform the baseline by an average of 51.08%. For the ML workloads, GEVO achieves kernel performance improvements for SVM on the MNIST handwriting recognition (3.24×) and a9a income prediction (2.93×) datasets with no loss of model accuracy. GEVO achieves a 1.79× kernel performance improvement on image classification using ResNet18/CIFAR-10, with less than 1% reduction in model accuracy.
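The population-based search over code edits described above can be illustrated with a small, self-contained sketch. Everything in it is an assumption for illustration, not GEVO's implementation: a toy list of `(op, constant)` instructions stands in for LLVM-IR, instruction count stands in for measured kernel runtime, and the edit operators (delete, copy, swap), binary tournament selection, one-point crossover, and elitist replacement are generic genetic-improvement choices. The names `run`, `cost`, `apply_edits`, and `evolve` are hypothetical. Variants whose worst-case relative error on the test inputs exceeds `tolerance` are rejected, loosely mirroring the exact-output and 1%-error settings in the abstract.

```python
import random

# Hypothetical toy "kernel": a list of (op, constant) pairs applied in order to x.
# GEVO edits real LLVM-IR instructions; this stand-in only mimics the search loop.
Program = list[tuple[str, float]]


def run(program: Program, x: float) -> float:
    """Interpret the toy program on one input value."""
    for op, c in program:
        if op == "add":
            x += c
        elif op == "mul":
            x *= c
    return x


def cost(program: Program) -> int:
    """Runtime proxy (assumption): one unit per instruction, not measured kernel time."""
    return len(program)


def max_rel_error(program: Program, reference: Program, inputs: list[float]) -> float:
    """Worst-case relative error of the variant's outputs against the reference."""
    worst = 0.0
    for x in inputs:
        want, got = run(reference, x), run(program, x)
        worst = max(worst, abs(got - want) / max(abs(want), 1e-12))
    return worst


def apply_edits(program: Program, edits: list[tuple]) -> Program:
    """Apply ("delete", i), ("copy", i, j) or ("swap", i, j) edits to a copy.
    Indices wrap modulo the current length, so every edit list is applicable."""
    prog = list(program)
    for kind, *idx in edits:
        if not prog:
            break
        i = idx[0] % len(prog)
        if kind == "delete":
            del prog[i]
        elif kind == "copy":
            prog.insert(idx[1] % (len(prog) + 1), prog[i])
        elif kind == "swap":
            j = idx[1] % len(prog)
            prog[i], prog[j] = prog[j], prog[i]
    return prog


def random_edit(rng: random.Random) -> tuple:
    """Draw one random edit with raw indices; apply_edits wraps them."""
    kind = rng.choice(["delete", "copy", "swap"])
    if kind == "delete":
        return (kind, rng.randrange(1000))
    return (kind, rng.randrange(1000), rng.randrange(1000))


def evolve(reference, inputs, tolerance=0.0, pop_size=32, generations=40, seed=0):
    """Elitist GA over variable-length edit lists; returns (best_edits, best_variant)."""
    rng = random.Random(seed)
    cache: dict[tuple, float] = {}

    def score(edits: list[tuple]) -> float:
        key = tuple(edits)
        if key not in cache:
            variant = apply_edits(reference, edits)
            ok = max_rel_error(variant, reference, inputs) <= tolerance
            # Variants outside the error tolerance are treated as invalid.
            cache[key] = cost(variant) if ok else float("inf")
        return cache[key]

    def tournament(population):
        a, b = rng.sample(population, 2)
        return a if score(a) <= score(b) else b

    population = [[] for _ in range(pop_size)]  # empty edit list == original program
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            p1, p2 = tournament(population), tournament(population)
            # One-point crossover: prefix of one parent's edits + suffix of the other's.
            child = p1[: rng.randint(0, len(p1))] + p2[rng.randint(0, len(p2)):]
            if rng.random() < 0.7:  # mutation: append a new random edit
                child.append(random_edit(rng))
            children.append(child)
        # Elitism: keep the best pop_size individuals among parents and children.
        population = sorted(population + children, key=score)[:pop_size]
    best = min(population, key=score)
    return best, apply_edits(reference, best)
```

For instance, with a reference such as `[("add", 0.0), ("mul", 1.0), ("mul", 2.0), ("add", 3.0), ("mul", 1.0), ("add", 0.0)]`, the search can delete the no-op instructions while keeping outputs identical on the test inputs. Because the unedited program always scores as valid and elitism never discards the best individual, the returned variant is never worse than the baseline; passing `tolerance=0.01` would additionally admit variants within 1% error.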


      • Published in

        ACM Transactions on Architecture and Code Optimization  Volume 17, Issue 4
        December 2020
        430 pages
        ISSN:1544-3566
        EISSN:1544-3973
        DOI:10.1145/3427420

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 25 November 2020
        • Accepted: 1 July 2020
        • Revised: 1 April 2020
        • Received: 1 November 2019


        Qualifiers

        • research-article
        • Research
        • Refereed
