GEVO: GPU Code Optimization Using Evolutionary Computation

Authors:
Jhe-Yu Liou, Arizona State University, Tempe, AZ
Xiaodong Wang, Facebook, Menlo Park, CA
Stephanie Forrest, Arizona State University and Santa Fe Institute, Santa Fe, NM
Carole-Jean Wu, Arizona State University and Facebook, Menlo Park, CA

ACM Transactions on Architecture and Code Optimization, Volume 17, Issue 4, Article No. 33, pp. 1–28. https://doi.org/10.1145/3418055

Published: 25 November 2020

Abstract

GPUs are a key enabler of the revolution in machine learning and high-performance computing, functioning as de facto co-processors to accelerate large-scale computation. As the programming stack and tool support have matured, GPUs have also become accessible to programmers, who may lack detailed knowledge of the underlying architecture and fail to fully leverage the GPU’s computation power. GEVO (Gpu optimization using EVOlutionary computation) is a tool for automatically discovering optimization opportunities and tuning the performance of GPU kernels in the LLVM representation. GEVO uses population-based search to find edits to GPU code compiled to LLVM-IR and improves performance on desired criteria while retaining required functionality. We demonstrate that GEVO improves the execution time of general-purpose GPU programs and machine learning (ML) models on NVIDIA Tesla P100. For the Rodinia benchmarks, GEVO improves GPU kernel runtime performance by an average of 49.48% and by as much as 412% over the fully compiler-optimized baseline. If kernel output accuracy is relaxed to tolerate up to 1% error, GEVO can find kernel variants that outperform the baseline by an average of 51.08%. For the ML workloads, GEVO achieves kernel performance improvement for SVM on the MNIST handwriting recognition (3.24×) and the a9a income prediction (2.93×) datasets with no loss of model accuracy. GEVO achieves 1.79× kernel performance improvement on image classification using ResNet18/CIFAR-10, with less than 1% model accuracy reduction.
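
The population-based search described in the abstract can be illustrated with a small, self-contained sketch. This is not GEVO's implementation: the toy accumulator "kernel", the deletion-only edit operator, the use of instruction count as a stand-in for measured kernel runtime, and all names (`BASELINE`, `apply_edits`, `fitness`, `evolve`) are assumptions made for illustration. A real setup would edit LLVM-IR and time the kernel on the GPU. The sketch only shows the general shape of the loop: propose edits, reject variants that fail the test cases, and keep the cheapest survivors.

```python
import random

# Toy "kernel": a straight-line program over one accumulator.
# Each instruction is (opcode, constant); several are deliberately redundant.
BASELINE = [("add", 3), ("mul", 1), ("add", 0), ("mul", 2), ("add", 0), ("add", 4)]
TEST_INPUTS = [0, 1, 5, -7]

def run(program, x):
    """Interpret the toy program on input x."""
    for op, c in program:
        x = x + c if op == "add" else x * c
    return x

# Required functionality: outputs of the unmodified program on the test inputs.
EXPECTED = [run(BASELINE, x) for x in TEST_INPUTS]

def apply_edits(edits):
    """An individual is a set of instruction indices to delete from BASELINE."""
    return [ins for i, ins in enumerate(BASELINE) if i not in edits]

def fitness(edits):
    """Lower is better. Instruction count stands in for measured runtime
    (an assumption); variants that fail any test get infinite cost."""
    program = apply_edits(edits)
    if [run(program, x) for x in TEST_INPUTS] != EXPECTED:
        return float("inf")
    return len(program)

def mutate(edits, rng):
    """Toggle the deletion of one randomly chosen instruction."""
    return edits ^ {rng.randrange(len(BASELINE))}

def crossover(a, b, rng):
    """Uniform crossover over the union of both parents' edits."""
    return frozenset(e for e in a | b if rng.random() < 0.5)

def evolve(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    # Every individual starts as the unmodified program (empty edit set).
    population = [frozenset() for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(population, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        # Elitist selection: keep the best pop_size of parents plus children.
        population = sorted(population + children, key=fitness)[:pop_size]
    return population[0]

best = evolve()
print(len(BASELINE), "->", fitness(best), apply_edits(best))
```

Because selection is elitist and the unmodified program is always a valid candidate, the returned variant never costs more than the baseline and always passes the test inputs. Relaxing the equality check in `fitness` to a tolerance would be one way to mimic the error-tolerant setting the abstract mentions.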

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Search methodologies
      1. Heuristic function construction
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published in
ACM Transactions on Architecture and Code Optimization Volume 17, Issue 4
December 2020
430 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3427420
Editor: David Kaeli, Northeastern University, USA
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 November 2020
- Accepted: 1 July 2020
- Revised: 1 April 2020
- Received: 1 November 2019

Author Tags
GPU code optimization
Genetic improvement
LLVM intermediate representation
approximate computing
multi-objective evolutionary computation
Qualifiers
- research-article
- Research
- Refereed