In the rapidly evolving field of deep learning, the need for models that balance expressivity and computational efficiency has become increasingly critical. This talk introduces Orchid, a novel architecture designed to address the quadratic complexity of traditional attention mechanisms while preserving the ability to capture long-range dependencies and enable in-context learning. At the heart of Orchid is a new data-dependent global convolution layer, which adapts its kernel dynamically based on the input sequence through a dedicated conditioning neural network. To ensure scalability and shift equivariance, two simple yet effective conditioning network designs are proposed to enhance the expressivity of the convolution operation.
Orchid achieves quasilinear computational complexity, O(N log N) for sequences of length N, without compromising performance. Extensive evaluations across domains, including language modelling and image classification, demonstrate that Orchid outperforms traditional attention-based architectures, such as BERT and Vision Transformers, often with smaller model sizes. Moreover, Orchid extends the feasible sequence length beyond the practical limits of dense attention layers, marking a significant step toward more efficient and scalable deep learning models for sequence modelling.
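A global convolution can reach O(N log N) cost because it can be evaluated in the frequency domain with the FFT. The sketch below illustrates that idea together with a data-dependent kernel. It assumes a circular convolution and uses a hypothetical conditioning function, `conditioning_spectrum`, that derives the kernel spectrum from the magnitude of the input's Fourier transform. This is a stand-in for illustration, not Orchid's actual conditioning network. Because the magnitude spectrum does not change when the input is shifted, the kernel is shift invariant and the whole layer is shift equivariant.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT in O(N log N); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum):
    """Inverse FFT: conjugate-sign transform scaled by 1/N."""
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]


def conditioning_spectrum(x):
    """Hypothetical conditioning network (an assumption, not Orchid's design).

    It maps |FFT(x)|, which is unchanged by circular shifts of x, to a real
    kernel spectrum. A fixed tanh squashing stands in for a learned network.
    """
    mags = [abs(v) for v in fft(x)]
    scale = max(mags) or 1.0  # avoid division by zero for an all-zero input
    return [math.tanh(m / scale) for m in mags]


def data_dependent_global_conv(x):
    """Circularly convolve x with a kernel computed from x itself, via the FFT."""
    X = fft(x)
    H = conditioning_spectrum(x)  # kernel chosen per input sequence
    Y = [h * v for h, v in zip(H, X)]  # pointwise product = circular convolution
    # H is real and symmetric for real x, so the result is real up to rounding.
    return [v.real for v in ifft(Y)]


x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 1.0]
y = data_dependent_global_conv(x)
# Shifting the input by 3 positions shifts the output by the same amount.
y_shifted = data_dependent_global_conv(x[-3:] + x[:-3])
print([round(v, 6) for v in y_shifted] == [round(v, 6) for v in y[-3:] + y[:-3]])
```

The pointwise product in the frequency domain replaces an O(N²) sum over every pair of positions, which is where the quasilinear cost comes from; the conditioning step is where a learned, input-dependent model would plug in.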
This is joint work with Mehdi Karami (Google Research) and is presented as part of a NeurIPS 2024 paper.
Ali Ghodsi is a Professor at the University of Waterloo, where he earned his PhD. He is also Director of the Data Analytics Lab and a Vector Institute Faculty Affiliate. He specializes in machine learning and artificial intelligence, with a research focus on developing theoretical frameworks and practical algorithms for AI. His work spans applications in natural language processing, bioinformatics, and computer vision. He is the co-author of Elements of Dimensionality Reduction and Manifold Learning (Springer). He has published over 200 papers in leading journals, including Nature Communications, Nature Methods, and Nature Machine Intelligence, as well as at top conferences such as NeurIPS, ICML, and CVPR. Dr. Ghodsi is also an active educator whose lectures combine depth and clarity. His YouTube courses provide accessible insights into advanced AI topics and are widely followed by students and professionals worldwide.