Understanding the Mixture-of-Experts Layer in Deep Learning

Monday, November 14, 2022

The Mixture-of-Experts (MoE) layer is a recently introduced sparsely activated layer in deep learning. By using a router network to send each token to one of several experts, an MoE layer can match the performance of a standard dense layer while significantly reducing inference time.
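To make the routing idea concrete, here is a minimal sketch of a top-1 MoE layer in PyTorch; the class and parameter names (MoELayer, num_experts, d_model, d_hidden) are illustrative choices, not an implementation from the talk:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        # Router: maps each token representation to one logit per expert.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is an independent two-layer feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.router(x)                    # (num_tokens, num_experts)
        probs = F.softmax(logits, dim=-1)
        gate, expert_idx = probs.max(dim=-1)       # top-1 routing per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e                 # tokens routed to expert e
            if mask.any():
                # Only the selected expert runs on these tokens, so per-token
                # compute stays close to that of a single dense feed-forward.
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route 16 tokens of width 64 across 4 experts.
layer = MoELayer(d_model=64, d_hidden=256, num_experts=4)
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64])
```

Although the layer holds the parameters of all experts, each token activates only one of them, which is what keeps the activated compute sparse relative to a dense layer of the same total parameter count.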

Speaker

Yuanzhi Li is an assistant professor in the Machine Learning Department at CMU and an affiliated faculty member at MBZUAI. His primary research areas are deep learning theory and natural language processing. He received his Ph.D. from Princeton University, where he was advised by Sanjeev Arora.
