MBZUAI’s Muhammad Maaz named Google Ph.D. Fellow: a first for the Gulf

Wednesday, October 29, 2025

Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) Ph.D. candidate Muhammad Maaz has been awarded the 2025 Global Google Ph.D. Fellowship in Machine Perception, becoming the first student from MBZUAI and the first from the Gulf region to receive this prestigious recognition. 

The fellowship provides up to two years of funding, mentorship from a Google Research Scientist, and US$50,000 to support living expenses, research, and academic travel.

“These fellowships recognize outstanding graduate students who are conducting exceptional and innovative research in computer science and related fields, specifically focusing on candidates who seek to influence the future of technology. We are excited to welcome this global cohort and look forward to partnering with them as they continue to become leaders in their respective areas,” Google stated. 

Maaz is one of 255 Ph.D. students from across 35 countries and 12 research domains who understand that accelerating scientific discovery is vital to solving the world’s toughest challenges.  

A historic achievement for MBZUAI and the region 

Maaz’s selection marks a significant moment for MBZUAI and for artificial intelligence (AI) research in the Middle East. In the 12-year history of the Google Ph.D. Fellowship, no student from the Gulf had previously been named a fellow, and Maaz is only the second from the MENA region to receive it in the Machine Perception category.

“Being selected for this fellowship is an incredible honor,” Maaz said. “I am deeply grateful to my advisor, Dr. Salman Khan, and to MBZUAI’s Computer Vision Department for their constant guidance and belief in me. This achievement reflects not only my personal journey but also the world-class research environment MBZUAI provides.” 

MBZUAI Provost, Professor Timothy Baldwin, described the achievement as both a personal and institutional milestone. 

“We couldn’t be prouder of Maaz. He represents the spirit of curiosity, rigor, and global impact that defines MBZUAI’s students,” Baldwin said. “This prestigious fellowship is further evidence of the caliber of research taking place here in Abu Dhabi. As our first Google Ph.D. Fellow, and the first from the entire Gulf region, he has set a new benchmark for our students, that we will continue to build off. His success reinforces MBZUAI’s mission to advance AI for humanity and to train innovators who are at the forefront of the global future of technology.” 

Exceptional consistency and creativity 

Since beginning his research journey at MBZUAI in 2021 as a master’s student, Maaz has consistently published at top-tier venues such as the Conference on Computer Vision and Pattern Recognition (CVPR), the European Conference on Computer Vision (ECCV), the Conference on Neural Information Processing Systems (NeurIPS), and the annual meeting of the Association for Computational Linguistics (ACL), amassing more than 4,500 citations, a rare distinction for any Ph.D. student worldwide.

Maaz’s advisor praised his exceptional consistency and creativity. 

“Among roughly 50 students and interns I’ve mentored, Maaz is undoubtedly one of the best students I have worked with in terms of both technical talent and professional maturity,” his supervisor, Associate Professor of Computer Vision Salman Khan, wrote in the recommendation letter supporting his nomination. “Maaz combines strong fundamentals, an impressive record of high-impact research, and the self-drive to explore new frontiers in AI,” he added.

Maaz has co-authored papers that redefine approaches to open-world detection, segmentation, prompting, and video-language modeling, areas crucial to the next wave of AI innovation. His works include open-vocabulary detection (NeurIPS 2022), MaPLe (CVPR 2023), GLaMM (CVPR 2024), Video-ChatGPT (ACL 2024), and PerceptionLM (NeurIPS 2025), each earning widespread recognition for its originality and impact.

Maaz’s Ph.D. research, “Towards General-Purpose Video Understanding with Multimodal Large Language Models”, explores how AI can learn to interpret and reason about complex video content. 

His goal is to build multimodal large language models (MLLMs) that can understand long-form, real-world videos with high efficiency and reliability, a critical step toward AI systems that “see” and “reason” more like humans.

His projects have already made a significant impact. His work “Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models”, published at ACL 2024, was among the first large-scale models for video-language understanding and has since garnered more than 1,100 citations. His follow-up work, done in collaboration with Meta, “PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding”, will be presented at NeurIPS 2025 later this year and contributes fully open data and models to facilitate research in vision-language modeling.

From Abu Dhabi to Silicon Valley 

This is not Maaz’s first recognition from one of the world’s largest technology companies. In 2024, Maaz completed a competitive internship at Meta’s headquarters in Silicon Valley, where he worked on PerceptionLM, a multimodal language model integrating vision and language at scale.

The project’s brief was ambitious: build a vision encoder for both images and videos that “works out of the box” and achieves world-leading performance. Working within Meta’s FAIR research team, Maaz was responsible for training and scaling multimodal models to enhance video understanding, video question answering, and action recognition tasks. 

He also developed and optimized models for synthetic video-caption generation, which were later used to train Meta’s perception encoder on millions of video samples. 

“I want to build things that are valuable to people, and the internship broadened my perspective about how AI systems can address their needs,” Maaz said. 

Meta later published the results on its company blog, underscoring the significance of the research in shaping open-access multimodal AI systems. 

Looking ahead: pushing the frontiers of AI 

For Maaz, the fellowship represents both validation and motivation. His future research will focus on scaling multimodal video models for more reliable and generalizable video comprehension, working closely with his Google Research mentor to explore new methods for cross-modal reasoning and long-form video understanding. 

“I’m eager to make the most of the opportunity,” Maaz said. “The Google Ph.D. Fellowship will allow me to deepen my work on general-purpose video understanding and to contribute to the development of models that can see, listen, and reason about the world with human-like depth.” 

Since its founding, MBZUAI has positioned Abu Dhabi as a global hub for AI education, research, and innovation. The University’s students continue to publish in world-leading conferences, collaborate with major industry players, and contribute to shaping the ethical and technological landscape of AI. 

Maaz’s journey — from Lahore to Abu Dhabi to Silicon Valley — reflects the University’s vision to empower global talent to tackle the most complex challenges in AI. 

“This achievement is not just mine,” Maaz said. “It represents what’s possible when students are given the right environment to explore, collaborate, and dream big.” 
