Machine learning models learn by analyzing data with optimization algorithms, such as gradient descent, which iteratively minimize the errors the models make on a given training set. It’s an important procedure, as machine learning models are being employed more and more across the spectrum of human activity, from drug design to self-driving cars.
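As a rough illustration (not code from the study), the sketch below shows gradient descent fitting a toy linear model by repeatedly stepping against the gradient of its training error; the data, learning rate, and variable names are made-up assumptions for demonstration only.

```python
# Minimal sketch of gradient descent on a toy least-squares problem.
# All data and hyperparameters here are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # training inputs
true_w = np.array([1.5, -2.0, 0.5])            # "ground truth" weights
y = X @ true_w + 0.1 * rng.normal(size=100)    # noisy training targets

w = np.zeros(3)      # model parameters, initialized at zero
lr = 0.1             # step size (learning rate)
for step in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)      # gradient of the mean squared error
    w -= lr * grad                             # step against the gradient to reduce error

print(w)  # after enough iterations, w approaches true_w
```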
A recent study by researchers at Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and other institutions proposes new methods for handling complex optimization problems in machine learning, particularly those involving constraints that arise in many different settings.
The study was presented at the Twelfth International Conference on Learning Representations (ICLR 2024), which was held recently in Vienna. The research was a collaboration between Bin Gu and Huan Xiong, both Assistant Professors of Machine Learning at MBZUAI, and Xinzhe Yuan of ISAM. The work builds on earlier research by William de Vazelhes, a Ph.D. student in machine learning at MBZUAI and co-author of the study.
Learning from experience
De Vazelhes is interested in studying and improving the optimization of machine learning models in a variety of different settings. “I think it’s fascinating that a machine can learn from experience instead of hard-coded rules,” de Vazelhes said.
He initially became interested in machine learning through an initiative that is now famous in the artificial intelligence community: DeepMind, an artificial intelligence company that was later acquired by Google, developed models that could devise effective approaches to playing games on the Atari video game system. In some cases, the models surpassed human performance.
An important concept in machine learning is convergence, which plays a central role in optimizing these systems, as it helps to ensure that an algorithm can reach a good solution. But there are a lot of unknowns when embarking on building a machine learning model to address a task.
For a given machine learning model and dataset, using the wrong setup for the optimization algorithm may cause the model to learn little from data, or to learn but only after an enormous number of steps. “With all machine learning models, we have to know how a model will converge, if it will take a week, a month or a million years,” de Vazelhes said.
In the study presented at ICLR, the researchers fuse two techniques, zeroth-order methods and hard-thresholding, into a single approach they call zeroth-order hard-thresholding, with the goal of tackling specific settings that arise in machine learning.
The approach can be used to address optimization problems where the mathematical formula describing the model’s objective function, which quantifies what the model is trying to achieve, is not available, and where the solution needs to be sparse, meaning that a user wants to force many of the model’s values to be zero. Sparsity helps simplify models and can make them more interpretable by people.
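To make the two ingredients concrete, here is a hedged sketch rather than the authors’ exact algorithm: the gradient of a black-box objective is estimated purely from function evaluations (the zeroth-order part), and after each step the parameters are hard-thresholded so that only the k largest-magnitude entries remain nonzero (the sparsity part). The objective, step size, and sparsity level below are illustrative assumptions.

```python
# Generic zeroth-order hard-thresholding sketch (illustrative, not the paper's method).
import numpy as np

def objective(w):
    # A black-box loss we can only query; a toy quadratic stands in for it here.
    target = np.array([3.0, 0.0, -1.0, 0.0, 2.0])
    return np.sum((w - target) ** 2)

def zeroth_order_grad(f, w, mu=1e-3, n_samples=20):
    # Estimate the gradient from function evaluations alone using random
    # finite differences; no analytic gradient formula is required.
    g = np.zeros_like(w)
    for _ in range(n_samples):
        u = np.random.normal(size=w.shape)
        g += (f(w + mu * u) - f(w)) / mu * u
    return g / n_samples

def hard_threshold(w, k):
    # Keep the k entries with largest magnitude and set everything else to zero.
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    out[idx] = w[idx]
    return out

w = np.zeros(5)
for step in range(300):
    w = hard_threshold(w - 0.05 * zeroth_order_grad(objective, w), k=3)

print(w)  # a sparse solution with at most 3 nonzero entries
```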
One aspect of the work that has drawn de Vazelhes to optimization is that it sits at the intersection of math and coding. “Often in machine learning, some methods work, but it is not perfectly clear why,” he said. “Our approach is rigorous because under some assumptions we can provide actionable predictions on how the changes we make affect the performance, and if the data doesn’t match our theory, we can come up with more realistic assumptions.”
The proposed algorithm offers improved convergence rates, a measure of how quickly a system arrives at a solution, and broad applicability, overcoming challenges faced by zeroth-order methods or hard-thresholding alone.
The study also contributes both theoretical insights, through convergence analysis, and practical applications, demonstrating effectiveness in real-world tasks such as adversarial attacks on images that are designed to confuse object detection systems.
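The adversarial-attack setting is a natural fit for zeroth-order methods because a black-box attacker cannot access a model’s gradients and can only query its outputs. The snippet below is a simplified, hypothetical illustration of that idea, not the experiments in the paper: it perturbs an input to lower the output of a tiny made-up classifier using only score queries.

```python
# Hypothetical black-box attack via zeroth-order gradient estimates (illustration only).
import numpy as np

rng = np.random.default_rng(1)
w_model = rng.normal(size=10)            # hidden weights of a black-box classifier

def score(x):
    # The attacker can call this function but cannot see w_model or its gradient.
    return 1 / (1 + np.exp(-w_model @ x))

x = rng.normal(size=10)                  # the original input
x_adv = x.copy()
mu = 1e-3                                # finite-difference step
for _ in range(100):
    u = rng.normal(size=10)
    g = (score(x_adv + mu * u) - score(x_adv)) / mu * u   # gradient estimate from queries
    x_adv -= 0.05 * np.sign(g)           # small step to push the score down

print(f"original score {score(x):.3f} -> attacked score {score(x_adv):.3f}")
```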