New approaches for machine learning optimization presented at ICML

Wednesday, July 24, 2024

Eduard Gorbunov, a research scientist at the Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), is interested in optimization algorithms designed to improve the performance of machine learning-driven applications. Optimization is a critical step in training machine learning models, as it largely determines how well a trained model ultimately performs.

Gorbunov is a coauthor of a recent study that proposes new methods for optimization in what are known as composite settings, in which the objective being minimized combines a smooth loss with an additional term such as a regularizer, and distributed settings, in which the model is trained on more than one machine. The work is being presented at the International Conference on Machine Learning (ICML 2024), one of the largest annual meetings on machine learning. The conference is taking place this week in Vienna.

The study is a collaboration between scientists at King Abdullah University of Science and Technology, Moscow Institute of Physics and Technology, Université de Montréal, Mila (Quebec Artificial Intelligence Institute), Weierstrass Institute for Applied Analysis and Stochastics, University of Innopolis, Ivannikov Institute for System Programming, Skolkovo Institute of Science and Technology and MBZUAI.

There are many ways to optimize machine learning models, and the most appropriate method varies with a model's architecture, the amount and kind of data it is designed to analyze, and the tasks for which it is intended.

Approaches to optimization have evolved along with the rapid progress of machine learning as a whole. Researchers are continually developing new techniques, and models are growing larger, which presents practical challenges for training and optimization because of the size of the datasets and the computational power required to process them. “There are many optimization methods today and the choice of the optimization method is very important, since it can affect the process of the architecture design,” Gorbunov said.

The goal of optimization is to minimize the “loss function,” which quantifies how far the model’s outputs deviate from the correct answers. In many cases, scientists reduce a model’s loss by adjusting the “internal trainable parameters of the model that essentially define how the model works,” Gorbunov said.
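To make this concrete, here is a minimal sketch, assuming a toy one-parameter model and made-up data, of how gradient descent adjusts a trainable parameter to reduce a loss function:

```python
import numpy as np

# Toy example: fit a line y = w * x by minimizing the mean squared error loss.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])   # data generated roughly by y = 2x

w = 0.0                               # the single trainable parameter
learning_rate = 0.01

for step in range(200):
    predictions = w * x
    # Gradient of the loss L(w) = mean((w*x - y)^2) with respect to w:
    grad = 2 * np.mean((predictions - y) * x)
    w -= learning_rate * grad         # step downhill on the loss surface

print(w)  # approaches ~2.0, the value that minimizes the loss
```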

One popular optimization technique is gradient clipping, which prevents gradients from growing too large during training by limiting them to a specified threshold, while still allowing consistent and efficient learning. Gradient clipping is particularly useful when the noise in training is heavy-tailed, meaning the distribution produces extreme values far from the mean much more often than a normal distribution would. However, while clipping limits the influence of values in the heavy tail that aren’t necessarily representative of the data, the standard technique often isn’t directly applicable in composite and distributed settings.
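As a rough illustration (not code from the study), here is a minimal NumPy sketch of clipping a stochastic gradient by its norm before an SGD-style update; the threshold and numbers are invented:

```python
import numpy as np

def clip_by_norm(grad, threshold):
    # Rescale the gradient so its Euclidean norm never exceeds `threshold`.
    norm = np.linalg.norm(grad)
    if norm > threshold:
        return grad * (threshold / norm)
    return grad

params = np.array([1.0, -2.0, 0.5])   # toy model parameters
learning_rate = 0.01

# A rare, extreme stochastic gradient of the kind heavy-tailed noise produces:
grad = np.array([120.0, -45.0, 3.0])

# The clipped update stays bounded, keeping the training step stable.
params -= learning_rate * clip_by_norm(grad, threshold=10.0)
print(params)
```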

Gorbunov explained his and his coauthors’ approach: “We propose new methods that clip not the gradients themselves but the difference between gradient and an estimate which in the distributed case is updated on the fly and converges to the gradient at the solution.”

“In composite optimization or in distributed optimization, individual gradients of clients are not necessarily zero at a solution. Therefore, we clip the differences in our methods, which allows us to improve the convergence and show the first results under such a general setup.”
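The paper’s algorithms and analysis are more involved, but a simplified, single-worker sketch of the clipped-difference idea might look like the following, where the running estimate `shift`, the stepsize `nu`, and all other names are illustrative rather than the authors’ notation:

```python
import numpy as np

def clipped_difference_step(params, shift, stoch_grad, lr, threshold, nu=0.5):
    # Clip the *difference* between the fresh stochastic gradient and a
    # running estimate (`shift`), rather than the gradient itself.
    diff = stoch_grad - shift
    norm = np.linalg.norm(diff)
    if norm > threshold:
        diff = diff * (threshold / norm)      # clip only the difference
    grad_estimate = shift + diff              # reconstructed, bounded gradient
    new_shift = shift + nu * diff             # estimate is updated on the fly
    new_params = params - lr * grad_estimate  # descent step
    return new_params, new_shift
```

The design point, per the quote above, is that the difference being clipped shrinks toward zero as the estimate converges to the gradient at the solution, even when individual clients’ gradients do not vanish there, which is what makes clipping workable in composite and distributed settings.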

Gorbunov said that the findings of the study indicate that when doing distributed training with heavy-tailed noise, it’s better to clip the difference between the gradient and the estimate than to clip the gradient itself. He also noted that high-probability convergence is “extremely important to consider” and that the choice of optimization algorithm makes a significant difference in training.
