At last count, 17 MBZUAI co-authors have 20 papers accepted at the 40th annual International Conference on Machine Learning (ICML), which will be held in late July in Honolulu, Hawaii.
MBZUAI Visiting Associate Professor of Machine Learning Tongliang Liu leads all authors with seven publications, followed by Deputy Department Chair and Associate Professor of Machine Learning, and Director of the Center for Integrative Artificial Intelligence (CIAI) Kun Zhang with six.
Liu recently gained recognition for his work on trustworthy machine learning systems when he was named to IEEE Intelligent Systems' "AI's 10 to Watch" list.
“His work in theories and algorithms of ML with noisy labels has led to significant contributions and influence in the fields of ML, computer vision, natural language processing (NLP), and data mining, as large-scale datasets in those fields are prone to suffering severe label errors,” according to the IEEE Computer Society announcement.
Deep neural networks require vast amounts of data, and they learn best when that data is labelled correctly. In the real world, however, data is often not as clean as data scientists would prefer, which causes networks to perform sub-optimally. The research community has developed many machine learning algorithms for dealing with noisy labels. They can be roughly divided into two groups: semi-supervised learning (SSL)-based methods, and methods that seek to model and reduce the label noise.
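To make "noisy labels" concrete, here is a minimal, hypothetical sketch (not from the paper) of class-conditional label noise, the setting the noise-modeling family of methods targets. A transition matrix `T` specifies the probability that a clean label is observed flipped to another class:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy class-conditional noise: T[i, j] is the probability that a clean
# label i is observed as label j. Each class keeps its true label with
# probability 0.8, so the overall noise rate is 0.2 by construction.
T = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])

clean_labels = rng.integers(0, 3, size=10_000)

# Corrupt each label by sampling from the row of T for its clean class.
noisy_labels = np.array([rng.choice(3, p=T[y]) for y in clean_labels])

noise_rate = np.mean(noisy_labels != clean_labels)
print(f"observed noise rate: {noise_rate:.3f}")
```

Noise-modeling methods try to estimate a matrix like `T` from the noisy data and correct for it during training, while SSL-based methods instead discard suspect labels and treat those examples as unlabeled.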
In their paper, 'Which is Better for Learning with Noisy Labels: The Semi-supervised Method or Modeling Label Noise?,' Zhang and his co-authors investigate which family of methods is better for learning from noisily annotated datasets: SSL-based methods or noise-modeling methods. Annotation matters because a deep neural network learns from a set of labeled input data to recognize patterns and produce accurate output predictions; annotation is the process of attaching those labels to the input data.
“Machine learning should not be accessible only to those who can pay. Specifically, modern machine learning is migrating to the era of complex models (e.g., deep neural networks), which require a plethora of well-annotated data,” Liu said. “Giant companies have enough money to collect well-annotated data. However, for startups or non-profit organizations, such data is barely acquirable due to the cost of labeling data.”
In total, three MBZUAI faculty members co-authored the work: Liu, Zhang, and Affiliated Associate Professor of Machine Learning Mingming Gong, along with Postdoctoral Fellow Yu Yao.
The team shows that the answer depends on how the data was created. SSL-based methods are sensitive to the data-generating process, while model-based methods are not. The co-authors argue that a hybrid method should be designed to simultaneously model label noise and leverage SSL, improving the overall model's robustness. In addition, the team proposes a Causal-structure Detection method for learning with Noisy Labels (CDNL), an estimator that detects how the data was created, information that is necessary for method design.
“To be reliable and trustworthy, many AI systems need to deal with noise that may occur in the human labeling process because we have imperfect understandings or we don’t have much time,” Zhang said. “At the same time, semi-supervised learning is a paradigm widely used in human learning processes. This work aims to draw connections between them and figure out potential ways for them to benefit each other.”