Two research papers from MBZUAI have been named Senior Area Chair (SAC) Highlights at EMNLP 2025 in Suzhou, China – one of the highest recognitions at the leading conference.
EMNLP (the Conference on Empirical Methods in Natural Language Processing) awarded the distinction to just 36 of the 1,811 papers accepted to the main conference, placing the MBZUAI research among the top 2% of all accepted work. Some 8,174 submissions were made to the conference in total.
One of the winning papers, MAviS: A Multimodal Conversational Assistant for Avian Species, was developed as part of M.Sc. student Yevheniia Kryklyvets’s thesis, under the supervision of Hisham Cholakkal, Assistant Professor of Computer Vision.
MAviS is an AI system that can see, listen, and talk about birds. It combines images, sounds, and text to recognize species, describe what makes them unique, and answer questions in natural language.
Built on a large, open dataset of over a thousand bird species, MAviS was designed to help researchers, conservationists, and enthusiasts identify and understand wildlife more intuitively.
“I am extremely happy that our paper received the SAC Highlights Award at EMNLP 2025,” said Cholakkal. “EMNLP is one of the most competitive and prestigious conferences in NLP and AI. Out of more than 8,000 paper submissions this year, only 36 were selected for this award, making it highly competitive and truly special.
“I am especially proud of all the student authors. This entire project was driven by my MBZUAI M.Sc. and Ph.D. students, and all authors are affiliated to MBZUAI, which makes the achievement even more meaningful.”
Unlike general-purpose models, MAviS focuses on fine-grained ecological intelligence, helping conservationists and researchers identify species and gather real-time insights to better understand and protect the natural world.
“This topic is extremely relevant because biodiversity conservation and ecological monitoring increasingly rely on accurate, multimodal AI systems that can identify species, analyze behaviors, and support experts and citizen scientists,” said Cholakkal. “MAviS demonstrates why domain-specific MM-LLMs are necessary for real-world ecological applications.”
“For me, this award is a validation that biodiversity-focused multimodal AI is not a niche side topic but an area the community sees value in,” added Kryklyvets. “It gives me even more motivation to continue building research that connects advanced AI with societal and environmental impact.
“My hope is to show that multimodal AI can help protect the species, ecosystems, and sounds that make our world and our memories meaningful.”
The full list of authors is: Yevheniia Kryklyvets, Mohammed Irfan Kurpath, Sahal Shaji Mullappilly, Jinxing Zhou, Fahad Shahbaz Khan, Rao Muhammad Anwer, Salman Khan, and Hisham Cholakkal, all from MBZUAI.
The other award-winning paper, Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with Conformal Prediction, was authored by Huanxin Sheng, Xinyi Liu, Hangfeng He, Jieyu Zhao, and Jian Kang.
“I am very proud of all the authors for this achievement, especially the lead author, Huanxin Sheng,” said Kang, Assistant Professor of Statistics and Data Science, and Computer Science at MBZUAI. “This is the very first paper he has written, so it is a wonderful achievement for Huanxin to conduct such high-quality, award-winning research in his first paper.”
The research tackles a fast-emerging challenge in AI: how to measure the uncertainty of large language models (LLMs) when used as automated evaluators or “judges.”
While LLMs are increasingly being asked to score essays, summarize texts, or grade responses, they often provide a single numerical score without indicating how sure they are. The MBZUAI team applied conformal prediction, a statistical method that generates intervals around each prediction, effectively expressing how uncertain or reliable an LLM’s judgement is.
Their approach, known as interval evaluation, turns a single score into a range (for example, “between 3 and 4”), offering a clearer view of confidence and potential error. This method can be applied to automated evaluations across tasks – from summarization to reasoning – ensuring more reliable AI assessment.
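For readers who want a concrete sense of the idea, the sketch below shows split conformal prediction applied to judge scores. It is a minimal illustration, not the paper's implementation: the function name `conformal_interval`, the choice of absolute residuals as the nonconformity score, and the synthetic calibration data are all assumptions made for the example.

```python
# Minimal sketch of split conformal prediction for LLM-judge score intervals,
# assuming a calibration set of (LLM score, human score) pairs.
# Names and data here are illustrative, not from the EMNLP paper.
import numpy as np

def conformal_interval(llm_scores_cal, human_scores_cal, llm_score_new, alpha=0.1):
    """Turn a single LLM judge score into a (lower, upper) interval that
    covers the true (human) score with probability roughly 1 - alpha."""
    # Nonconformity score: absolute gap between LLM and human scores.
    residuals = np.abs(np.asarray(llm_scores_cal) - np.asarray(human_scores_cal))
    n = len(residuals)
    # Conformal quantile with the standard finite-sample correction.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(residuals, q_level, method="higher")
    return llm_score_new - q, llm_score_new + q

# Toy usage: calibration pairs on a 1-5 scale, then an interval for a new score.
rng = np.random.default_rng(0)
human = rng.integers(1, 6, size=200).astype(float)
llm = np.clip(human + rng.normal(0, 0.5, size=200), 1, 5)
print(conformal_interval(llm, human, llm_score_new=3.4))  # e.g. (2.5, 4.3)
```

In a sketch like this, wider intervals flag items where the judge's score is less trustworthy, which is exactly the kind of signal a single point score cannot convey.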
“Because of the emerging ability of LLM-as-a-judge, LLMs can be a cost-effective and adaptive alternative to manual annotation, which is labor-intensive and time-consuming,” explained Kang. “I believe the judges and reviewers appreciated that we are studying a timely problem from an interesting perspective and agreed that applying conformal prediction to analyze the uncertainty is well grounded and reasonable.”