New approaches for machine learning optimization presented at ICML

Wednesday, July 24, 2024

Eduard Gorbunov, a research scientist at the Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), is interested in optimization algorithms that are designed to improve the performance of machine learning-driven applications. Optimization is a critical step in training machine learning models, as it is the process by which a model’s performance is improved.

Gorbunov is coauthor of a recent study that proposes new methods for optimization in what are known as composite settings, in which the objective being minimized combines several terms, such as a loss and a regularizer, and distributed settings, in which the model is trained across more than one machine. The work is being presented at the International Conference on Machine Learning (ICML 2024), one of the largest annual meetings on machine learning. The conference is taking place this week in Vienna.

The study is a collaboration between scientists at King Abdullah University of Science and Technology, Moscow Institute of Physics and Technology, Université de Montréal, Mila (Quebec Artificial Intelligence Institute), Weierstrass Institute for Applied Analysis and Stochastics, University of Innopolis, Ivannikov Institute for System Programming, Skolkovo Institute of Science and Technology and MBZUAI.

There are many ways that researchers can optimize machine learning models, and the most appropriate optimization method can vary based on a model’s architecture, the amount or kind of data it has been designed to analyze and the tasks for which it is intended.

Approaches to optimization have evolved along with the rapid pace of progress in the field of machine learning as a whole. Researchers are continually developing new machine learning techniques and models are becoming larger, which presents practical challenges for training and optimization due to the size of the datasets and computational power that is required to process them. “There are many optimization methods today and the choice of the optimization method is very important, since it can affect the process of the architecture design,” Gorbunov said.

The goal of optimization is to minimize the “loss function,” which measures how far the model’s output is from the correct answer. In many cases, scientists reduce a model’s loss by adjusting the “internal trainable parameters of the model that essentially define how the model works,” Gorbunov said.
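
As a rough, generic illustration of that idea (not code from the study), the sketch below runs plain gradient descent on a small least-squares problem, repeatedly nudging the trainable parameters in the direction that lowers the loss:

```python
import numpy as np

# Toy data: we want parameters w such that X @ w is close to y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

def loss(w):
    # Mean squared error: how far the model's outputs are from the targets.
    return np.mean((X @ w - y) ** 2)

def grad(w):
    # Gradient of the loss with respect to the trainable parameters.
    return 2 * X.T @ (X @ w - y) / len(y)

w = np.zeros(3)          # initial parameters
lr = 0.1                 # step size
for step in range(200):
    w -= lr * grad(w)    # move parameters in the direction that lowers the loss

print(loss(w), w)        # loss is now small and w is close to w_true
```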

One popular optimization technique is gradient clipping, which prevents gradients from becoming too large during training by capping them at a specified threshold, while still allowing consistent and efficient learning. Gradient clipping is particularly useful when data is heavy-tailed, meaning that values far from the mean occur much more often than they would under a normal distribution. However, while gradient clipping caps extreme values in the heavy tail that aren’t necessarily representative of the data, the technique often isn’t compatible with composite and distributed settings.
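
To make the technique concrete, here is a minimal, generic sketch of gradient clipping by norm, not code from the paper: whenever the gradient’s norm exceeds a chosen threshold, the gradient is rescaled down to that threshold.

```python
import numpy as np

def clip_gradient(g, threshold):
    """Rescale the gradient g so its Euclidean norm never exceeds threshold."""
    norm = np.linalg.norm(g)
    if norm > threshold:
        g = g * (threshold / norm)
    return g

# A heavy-tailed gradient spike gets capped; an ordinary gradient passes through.
print(clip_gradient(np.array([300.0, -400.0]), threshold=5.0))  # -> [3., -4.]
print(clip_gradient(np.array([0.3, -0.4]), threshold=5.0))      # unchanged
```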

Gorbunov explained his and his coauthors’ approach: “We propose new methods that clip not the gradients themselves but the difference between gradient and an estimate which in the distributed case is updated on the fly and converges to the gradient at the solution.”

“In composite optimization or in distributed optimization, individual gradients of clients are not necessarily zero at a solution. Therefore, we clip the differences in our methods, which allows us to improve the convergence and show the first results under such a general setup.”
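
Under simplifying assumptions, and purely as an illustration rather than the authors’ exact algorithm, the idea can be sketched as follows: each worker keeps a running estimate (a “shift”) of its gradient, clips only the difference between its current gradient and that estimate, and updates the estimate on the fly as training proceeds. All names and parameters below are hypothetical.

```python
import numpy as np

def clip(v, threshold):
    # Rescale v so its norm never exceeds threshold.
    norm = np.linalg.norm(v)
    return v if norm <= threshold else v * (threshold / norm)

def clipped_difference_step(x, shifts, worker_grads, lr, nu, threshold):
    """One illustrative update: clip (gradient - shift) per worker, not the raw gradient.

    x            -- current model parameters
    shifts       -- per-worker estimates meant to track each worker's gradient
                    at the solution (updated on the fly)
    worker_grads -- list of functions returning each worker's (possibly noisy) gradient
    lr           -- step size
    nu           -- how quickly each shift tracks its worker's gradient
    threshold    -- clipping threshold
    """
    directions = []
    for i, grad_fn in enumerate(worker_grads):
        g = grad_fn(x)
        delta = clip(g - shifts[i], threshold)   # clip the difference, not g itself
        directions.append(shifts[i] + delta)     # worker's clipped gradient estimate
        shifts[i] = shifts[i] + nu * delta       # update the estimate on the fly
    x = x - lr * np.mean(directions, axis=0)     # average across workers and step
    return x, shifts
```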

Gorbunov said that the findings of the study indicate that when doing distributed training with heavy-tailed noise, it’s better to clip the difference between the gradient and the estimate instead of clipping the gradient itself. He also noted that high-probability convergence is “extremely important to consider” and that the choice of optimization algorithm makes a significant difference in training.
