ABSTRACT
Parallel accelerators, such as GPUs, are key enablers of large-scale Machine Learning (ML) applications. However, programmers often lack detailed knowledge of the underlying architecture and fail to fully leverage the hardware's computational power. This paper proposes GEVO-ML, a tool for automatically discovering optimization opportunities and tuning the performance of ML kernels. GEVO-ML extends earlier work on GEVO (Gpu optimization using EVOlutionary computation) by focusing directly on ML frameworks, intermediate languages, and target architectures. It retains the multi-objective evolutionary search developed for GEVO, which searches for edits to GPU code compiled to LLVM-IR and improves performance on desired criteria while retaining required functionality. In earlier work, we studied some ML workloads in GPU settings and found that GEVO could improve kernel speeds by factors ranging from 1.7X to 2.9X, even with access to only a small portion of the overall ML framework. This workshop paper examines the limitations and constraints of GEVO for ML workloads and discusses our GEVO-ML design, which we are currently implementing.
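To make the phrase "multi-objective" concrete, the following is a minimal sketch of Pareto dominance, the usual way to compare candidates scored on more than one criterion. It is not taken from GEVO or GEVO-ML: the two objectives (runtime and output error, both minimised), the function names, and the scores are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical score for one candidate kernel variant:
# (runtime in ms, output error). Both objectives are minimised.
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Score]) -> List[Score]:
    """Keep only the candidates that no other candidate dominates."""
    return [s for s in scores if not any(dominates(t, s) for t in scores if t != s)]


# Made-up scores, not measurements from the paper.
candidates = [(10.0, 0.00), (6.0, 0.02), (7.0, 0.05), (4.0, 0.10), (12.0, 0.01)]
front = pareto_front(candidates)
print(front)
```

In this toy data, the variant scored (7.0, 0.05) is dropped because (6.0, 0.02) is both faster and more accurate, while the fastest variant survives despite its larger error. A search built on this comparison returns a set of trade-offs rather than a single winner.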
Index Terms
- GEVO-ML: a proposal for optimizing ML code with evolutionary computation