The Mixture-of-Experts (MoE) layer is a recently introduced sparsely activated layer in deep learning. Using a router network to send each token to one of the experts, the MoE layer can achieve performance comparable to a standard dense layer while significantly reducing inference time.
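To make the routing mechanism in the abstract concrete, here is a minimal PyTorch sketch of a sparsely activated MoE layer with top-1 (switch-style) routing; the class name `Top1MoELayer`, the feed-forward expert shape, and the hyperparameters are illustrative assumptions, not the specific architecture discussed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoELayer(nn.Module):
    """Sketch of a sparsely activated MoE layer with top-1 routing.

    Each token is sent to the single expert chosen by the router, so only
    one expert's parameters are activated per token.
    """

    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # router network
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        gate_probs = F.softmax(self.router(x), dim=-1)   # (num_tokens, num_experts)
        top_prob, top_idx = gate_probs.max(dim=-1)       # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                          # tokens routed to expert e
            if mask.any():
                # Scale by the gate probability so the router stays differentiable.
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out

# Example: route 8 tokens of dimension 16 through 4 experts.
moe = Top1MoELayer(d_model=16, d_hidden=64, num_experts=4)
tokens = torch.randn(8, 16)
print(moe(tokens).shape)  # torch.Size([8, 16])
```

Because only the selected expert runs for each token, the per-token compute stays close to that of a single dense feed-forward block even as the total parameter count grows with the number of experts.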
Yuanzhi Li is an assistant professor in the Machine Learning Department at CMU and an affiliated faculty member at MBZUAI. His primary research areas are deep learning theory and natural language processing. He received his Ph.D. from Princeton University, advised by Sanjeev Arora.