Hyperscale Hardware Optimized Neural Architecture Search

Sheng Li, Garrett Andersen, Tao Chen, Liqun Cheng, Julian Grady, Da Huang, Quoc V Le, Andrew Li, Xin Li, Yang Li, Chen Liang, Yifeng Lu, Yun Ni, Ruoming Pang, Mingxing Tan, Martin Wicke, Gang Wu, Shengqi Zhu, Parthasarathy Ranganathan, Norman P Jouppi
ASPLOS 2023 [ACM] [PDF]

Abstract

Recent advances in machine learning have leveraged dramatic increases in computational power, a trend expected to continue in the future. This paper introduces the first Hyperscale Hardware Optimized Neural Architecture Search (H2O-NAS) to automatically design accurate and performant machine learning models tailored to the underlying hardware architecture. H2O-NAS consists of three key components: a new massively parallel “one-shot” search algorithm with intelligent weight sharing, which can scale to search spaces of O(10^280) and handle large volumes of production traffic; hardware-optimized search spaces for diverse ML models on heterogeneous hardware; and a novel two-phase hybrid performance model and a multi-objective reward function optimized for large-scale deployments.
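To make the last component concrete, the sketch below shows one way a multi-objective reward trading off model quality against measured hardware latency could be formulated. It follows the weighted-product form common in hardware-aware NAS (e.g. MnasNet); the function name, the soft-constraint exponent beta, and the example values are illustrative assumptions, not the exact reward used by H2O-NAS.

    def multi_objective_reward(quality: float,
                               latency_ms: float,
                               target_latency_ms: float,
                               beta: float = -0.07) -> float:
        """Weighted-product reward combining model quality and hardware latency.

        Illustrative MnasNet-style soft constraint, not the H2O-NAS reward:
        exceeding the latency target shrinks the reward smoothly, controlled
        by the (assumed) exponent `beta`.
        """
        latency_factor = (latency_ms / target_latency_ms) ** beta
        return quality * latency_factor

    # Example: a candidate at 85% quality running 20% over its latency budget.
    print(multi_objective_reward(quality=0.85, latency_ms=12.0, target_latency_ms=10.0))

In practice such a scalar reward lets a single search balance accuracy against deployment cost, with the exponent tuned to how strict the latency or energy budget is.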

H2O-NAS has been implemented around state-of-the-art machine learning models (e.g. convolutional models, vision transformers, and deep learning recommendation models) and deployed at zettaflop scale in production. Our results demonstrate significant improvements in performance (22% ∼ 56%) and energy efficiency (17% ∼ 25%) at the same or better quality. Our solution is designed for large-scale deployment, streamlining privacy and security processes and reducing manual overhead. This facilitates a smooth and automated transition from research to production.