Samuel Horváth, Assistant Professor of Machine Learning at the Mohamed bin Zayed University of Artificial Intelligence, identifies a paradox that challenges scientists’ efforts to advance machine learning.
While progress within the field has in large part been driven by securing access to and processing high-quality data, the best data often reside with users who are unable, or unwilling, to share them. “Most of the time, these data are private, and users don’t want to share them naively,” Horváth said.
Scientists like Horváth, however, are developing methods to resolve this apparent paradox.
Horváth is interested in federated learning, which is designed to preserve a level of security for users while also providing the benefits of models that have access to large pools of data. With federated learning, data are processed by a model in the same location where they are generated, for example, on a user's device.
Under this approach, only certain aspects of the data, or improvements made to the local copy of the model on a user's device, are shared with the wider group of users. This allows everyone to benefit from information generated by individuals while maintaining privacy and security.
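The round-based exchange described above can be sketched with federated averaging (FedAvg), the standard baseline aggregation scheme in federated learning; the exact aggregation Maestro uses may differ, so this is only an illustration of the general pattern. Each device improves its local copy of the model on its own data, and only the updated parameters, never the raw data, are averaged into a shared model.

```python
import numpy as np

def local_update(weights, data, labels, lr=0.1):
    """One gradient step on a device's private data
    (a linear model with squared loss, for illustration only)."""
    preds = data @ weights
    grad = data.T @ (preds - labels) / len(labels)
    return weights - lr * grad

def federated_round(global_weights, device_datasets):
    """Each device trains locally; only weights leave the device."""
    local_models = [
        local_update(global_weights.copy(), x, y)
        for x, y in device_datasets
    ]
    # The server averages the local updates; raw data is never shared.
    return np.mean(local_models, axis=0)

# Three devices, each holding its own private dataset.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
devices = []
for _ in range(3):
    x = rng.normal(size=(20, 2))
    y = x @ true_w + rng.normal(scale=0.01, size=20)
    devices.append((x, y))

w = np.zeros(2)
for _ in range(200):
    w = federated_round(w, devices)
# w now approximates true_w, although no device ever revealed its data.
```

The key property is in `federated_round`: the server sees only model parameters, so each user's raw data stays on their device.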
Despite these benefits, however, federated learning presents its own challenges.
Because data must be processed on user devices, which have limited computational power compared with the much larger machines typically used to train models, researchers must develop extremely efficient algorithms to run on them.
At the 2024 International Conference on Machine Learning (ICML), Horváth presented a study proposing a new framework to efficiently train machine learning models that could be used in federated settings, with coauthors from Brave Software, Databricks and Carnegie Mellon University. They call their approach Maestro, and it is designed to identify components of a machine learning model that don't contribute to the model's overall performance, through a process called trainable decomposition. These components can then be removed, increasing efficiency.
“We want to have the power of a large model, while having them more like an edge device compatible size,” he said.
Increasing efficiency
The deep neural networks that power machine-learning models are composed of many layers of “neurons” that process data. Some of these layers are more important than others. “As you train a model, we often discover that for some of the layers, the capacity, or what we call the true inner dimension of the layers, can be much smaller than they are constructed,” Horváth said.
With the study presented at ICML, Horváth and his coauthors seek to identify redundancy and “decompose each layer into a low-dimensional approximation” of the original layer, he explained. Aspects of the model that are found to be redundant are not used by the model.
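The effect of replacing a layer with a low-dimensional approximation can be illustrated with a truncated SVD of a weight matrix. Note that Maestro learns the decomposition during training rather than applying SVD after the fact, so the sketch below (with an invented 256x256 layer of true rank 8) shows only the kind of compression such a decomposition yields, not the authors' method:

```python
import numpy as np

rng = np.random.default_rng(1)

# A weight matrix whose "true inner dimension" (rank) is far
# smaller than its constructed shape: 256x256 built from rank 8.
r_true = 8
W = rng.normal(size=(256, r_true)) @ rng.normal(size=(r_true, 256))

# Decompose the layer into two thin factors A @ B.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 8                    # rank to keep; Maestro learns this instead
A = U[:, :r] * s[:r]     # shape (256, r)
B = Vt[:r, :]            # shape (r, 256)

params_full = W.size             # 65,536 parameters
params_low = A.size + B.size     # 4,096 parameters, a 16x reduction
error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)  # near zero here
```

Because the layer's true inner dimension really is 8, the two thin factors reproduce it almost exactly with a fraction of the parameters; ranking components by importance, as in the article, also lets the kept rank `r` be dialed down further to trade accuracy for size.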
“The model we get through this process is much smaller, it runs much faster and requires less memory both to be trained and deployed,” Horváth said.
An added benefit is that the technique allows users to scale the size of the model according to the task and resources that are available. For example, a user may be training a model for a particular task on a tight timeline with access to a limited number of graphics processing units, a type of hardware used to train neural networks. Because components of the model are ranked according to their importance, Horváth and his colleagues’ approach makes it possible to cut out unnecessary components for that task.
In the study, Horváth and his coauthors reported that Maestro outperformed another technique designed to increase efficiency, singular value decomposition (SVD). The two were compared on a widely used neural network architecture known as a transformer, which is often used in computer vision and natural language processing tasks.
As they advance their research, the scientists are interested to study how their technique can be applied to large language models and make them more efficient for specific applications, known as downstream tasks. “We want to know if we can remove the redundant information that is not needed for a downstream task,” Horváth said. “We may be able to significantly decrease the model size and then fine-tune it for the task.”