A rich body of prior work has highlighted communication bottlenecks in distributed training. To alleviate these bottlenecks, a long line of recent research proposes gradient compression methods. In this talk, Dr. Hongyi Wang (CMU) will first evaluate the efficacy of gradient compression methods and compare their scalability with optimized implementations of synchronous data-parallel SGD across more than 200 realistic distributed setups. Surprisingly, gradient compression methods provide a promising speedup in only six of these setups. He will then present an extensive investigation into the root causes of this phenomenon, along with a performance model that can be used to identify the benefits of gradient compression for a variety of system setups. Finally, Dr. Wang will propose a list of desirable properties (along with two algorithmic instances) that a gradient compression method should satisfy in order to provide significant speedup in real distributed training systems.
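For readers unfamiliar with the technique the talk evaluates, the sketch below shows one common form of gradient compression, top-k sparsification, where each worker transmits only the largest-magnitude gradient entries. This is a generic illustration only; the functions `topk_compress` and `topk_decompress` are hypothetical helpers and do not represent the specific methods evaluated or proposed in the talk.

```python
import math
import torch

def topk_compress(grad: torch.Tensor, ratio: float = 0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries.

    Generic top-k sparsification sketch for illustration; not the talk's method.
    Returns the kept values, their flat indices, and the original shape.
    """
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)          # positions of the largest entries
    return flat[indices], indices, grad.shape       # communicate values + indices, not the dense tensor

def topk_decompress(values: torch.Tensor, indices: torch.Tensor, shape: torch.Size) -> torch.Tensor:
    """Rebuild a dense gradient from the transmitted (values, indices) pair."""
    flat = torch.zeros(math.prod(shape), dtype=values.dtype)
    flat[indices] = values
    return flat.view(shape)

# Example: shrink a fake gradient to 1% of its entries before communication.
g = torch.randn(1024, 1024)
vals, idx, shape = topk_compress(g, ratio=0.01)
g_hat = topk_decompress(vals, idx, shape)
```

Whether such compression actually speeds up end-to-end training depends on encoding/decoding overhead and the network bandwidth available, which is the trade-off the talk's evaluation and performance model examine.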
Dr. Hongyi Wang is currently a postdoctoral fellow in the Machine Learning Department at Carnegie Mellon University. He obtained his Ph.D. from the Department of Computer Sciences at the University of Wisconsin-Madison. Dr. Wang has received the Baidu Best Paper Award from the Spicy FL workshop at NeurIPS 2020, top reviewer awards from ICML and NeurIPS, the National Scholarship of China (for undergraduate students), the Huawei Undergraduate Scholarship, and several travel grants from top-tier machine learning conferences. He has served as a PC member for SIGKDD 2022 and AAAI 2021 and 2022, as well as on the artifact evaluation committee of MLSys 2022.