A new strategy for complex optimization problems in machine learning presented at ICLR

Thursday, May 23, 2024

Machine learning models learn by analyzing data with optimization algorithms, such as gradient descent, which iteratively minimize the errors the models make on a given training set. It’s an important procedure, as machine learning models are being employed more and more across the spectrum of human activity, from designing medicines to self-driving cars.
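To make the idea concrete, here is a minimal sketch of gradient descent on a toy least-squares problem; the data, model, and step size are illustrative assumptions, not taken from the study.

```python
import numpy as np

# Toy data: inputs x and targets y generated from a known slope of 3.0.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)

w = 0.0    # model parameter to learn
lr = 0.1   # learning rate (step size)
for step in range(100):
    residual = w * x - y
    grad = 2 * np.mean(residual * x)  # gradient of the mean squared error
    w -= lr * grad                    # move against the gradient

print(w)  # ends up close to the true slope of 3.0
```

Each iteration nudges the parameter in the direction that most reduces the training error, which is the basic loop behind how most machine learning models are fit.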

A recent study by researchers at Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and other institutions proposes new methods for handling complex optimization problems in machine learning, particularly those involving sparsity constraints, which arise in many practical settings.

The study was presented at the Twelfth International Conference on Learning Representations (ICLR 2024), held recently in Vienna. The research was a collaboration between Bin Gu and Huan Xiong, both Assistant Professors of Machine Learning at MBZUAI, and Xinzhe Yuan of ISAM. It builds on earlier work by William de Vazelhes, a Ph.D. student in machine learning at MBZUAI and co-author of the study.

Learning from experience

De Vazelhes is interested in studying and improving the optimization of machine learning models in a variety of different settings. “I think it’s fascinating that a machine can learn from experience instead of hard-coded rules,” de Vazelhes said.

He initially became interested in machine learning through an initiative that is now famous in the artificial intelligence community: DeepMind, an artificial intelligence company later acquired by Google, developed models that could devise effective strategies for playing games on the Atari video game system. In some cases, the models surpassed human performance.

An important concept in machine learning is convergence, which plays a role in optimizing these systems, as it helps to ensure that an algorithm can reach a quality solution. But there are many unknowns when setting out to build a machine learning model for a task.

For a given machine learning model and dataset, using the wrong setup for the optimization algorithm may cause the model to learn little from data, or to learn but only after an enormous number of steps. “With all machine learning models, we have to know how a model will converge, if it will take a week, a month or a million years,” de Vazelhes said.
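As a toy illustration (assumed here, not taken from the study), the same simple objective can converge quickly or diverge entirely depending on the step size chosen for gradient descent:

```python
def run(lr: float, steps: int = 50) -> float:
    # Minimize f(w) = w**2, whose gradient is 2*w, from a fixed start.
    w = 10.0
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(run(0.1))  # contracts by a factor of 0.8 per step toward the minimum at 0
print(run(1.1))  # each step multiplies w by -1.2, so the error blows up
```

With a learning rate of 0.1, each update multiplies the parameter by 0.8 and the iterate shrinks toward the optimum; with 1.1, it is multiplied by -1.2 and grows without bound. Ruling out the second kind of behavior is exactly what convergence analysis is for.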

In the study presented at ICLR, the researchers combine two techniques, zeroth-order methods and hard-thresholding, into an approach they call zeroth-order hard-thresholding, with the goal of tackling specific settings that arise in machine learning.

The approach can be used to address optimization problems where the mathematical formula for the model’s objective function, a way of quantifying what the model is trying to achieve, is not available, and where the solution needs to be sparse, meaning that a user wants to force many of the model’s values to be zero. Sparsity helps simplify models and can make them more interpretable by people.
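The sketch below illustrates the two ingredients under simple assumptions: a gradient estimated purely from function evaluations (a two-point, zeroth-order finite-difference scheme averaged over random directions) and a hard-thresholding step that keeps only the k largest-magnitude entries of the iterate. The quadratic objective, dimensions, and step sizes are illustrative choices, not the paper’s exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 50, 5                     # problem dimension and target sparsity
w_true = np.zeros(d)
w_true[:k] = rng.normal(size=k)  # sparse ground truth to recover

def f(w):
    # Black-box objective: we may evaluate it, but we pretend its
    # analytical gradient is unavailable.
    return np.sum((w - w_true) ** 2)

def zo_gradient(w, mu=1e-4, n_dirs=20):
    # Two-point zeroth-order estimate averaged over random directions.
    g = np.zeros_like(w)
    for _ in range(n_dirs):
        u = rng.normal(size=w.shape)
        g += (f(w + mu * u) - f(w - mu * u)) / (2 * mu) * u
    return g / n_dirs

def hard_threshold(w, k):
    # Keep the k largest-magnitude entries; zero out the rest.
    out = np.zeros_like(w)
    top = np.argsort(np.abs(w))[-k:]
    out[top] = w[top]
    return out

w = np.zeros(d)
for step in range(200):
    w = hard_threshold(w - 0.05 * zo_gradient(w), k)

print(f(w))  # shrinks toward zero as the sparse w_true is recovered
```

Because the gradient estimate is built only from function values, the scheme applies even when the objective’s formula is unknown, and the thresholding step enforces the desired sparsity after every update.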

One aspect of optimization that has drawn de Vazelhes to the topic is that it sits at the intersection of math and coding. “Often in machine learning, some methods work, but it is not perfectly clear why,” he said. “Our approach is rigorous because under some assumptions we can provide actionable predictions on how the changes we make affect the performance, and if the data doesn’t match our theory, we can come up with more realistic assumptions.”

The proposed algorithm offers improved convergence rates, a measure of how quickly an optimizer arrives at a solution, and broad applicability, overcoming challenges faced by zeroth-order methods or hard-thresholding used alone.

The study contributes theoretical insights through convergence analysis and demonstrates practical effectiveness on real-world tasks, such as crafting adversarial attacks on images designed to confuse object-detection systems.
