Commencement 2026: Helping AI see the world better – to build a better world

Wednesday, May 06, 2026

Once upon a time, Muhammad Maaz was on his way to becoming an electrical engineer in Pakistan, and AI was far from the global force it is now. But even in 2018, Maaz could see where his future was headed. 

By the final year of his undergraduate degree, he had shifted his focus toward machine learning. From there, he went on to build projects in facial recognition and object tracking for retail and road safety organizations in Pakistan and Saudi Arabia – early explorations that would shape the course of his future research and career. 

Today, as he prepares to graduate with a Ph.D. in Computer Vision from MBZUAI, Maaz stands at the forefront of a new wave of AI research. His aim is to help AI move beyond text and develop a richer, more complete understanding of the world and our visual reality. Not a small ambition, but one perfectly suited to Maaz’s determined, visionary spirit. 

“We are trying to build new technologies and algorithms that are more efficient and address the unsolved problems,” he says. “Without creativity, it’s not possible. And in order to be creative, you need the right environment.” 

So, it’s fortunate that Maaz became part of MBZUAI’s first cohort, finding the ideal environment in which to make his transition from engineer to researcher and innovator. 

Moving from language to perception: the next frontier of AI 

Maaz’s work sits at the cutting edge of one of the most important shifts in artificial intelligence today: the move from text-based models to multimodal systems that can understand images, videos, and the physical world. 

While large language models have transformed how machines process and generate text, the next challenge is far more complex – teaching AI to see, interpret, and reason across different types of data, simultaneously. 

Through his research, Maaz has contributed to foundational advances in this space. His work on Video-ChatGPT explored systems that can “converse” about video content, while his work on GLaMM introduced new approaches to grounding and segmenting visual information. These projects have gained significant traction in the research community, with thousands of citations and widespread adoption.

Later, during his time at Meta, Maaz co-developed PerceptionLM, one of the first fully open-source, reproducible multimodal large language models capable of understanding both images and video at scale. 

“It took around one year,” he recalls. “But we had something on the table that was reproducible from scratch without any proprietary distillation.” 

The result was a system trained on hundreds of millions of visual data samples, enabling users to interact with images and videos in ways that were previously impossible. 

Building impactful real-world applications 

For Maaz, the significance of this work lies not just in technical achievement, but in its real-world applications. 

“The examples are tremendous,” Maaz says, explaining that a multimodal system such as PerceptionLM can transform how we interact with visual data. Instead of manually reviewing hours of CCTV footage, a user could simply ask the system to identify anomalies or summarize key events. In healthcare, such models could assist in surgical settings – helping to detect the rare but serious instances in which foreign objects have been left inside a patient. 

“This is a foundation model,” he says. “So it can be adapted to many other applications.” 

But his vision goes even further. While AI has already made significant strides in the digital world, Maaz believes its greatest impact has yet to be realized in the physical one. 

“One area where AI has not shown a real impact yet is in robotics,” he says. “AI can paint. But I want AI to do my dishes.” 

This focus on practical value also shapes how Maaz thinks about success. In research, he explains, impact is measured not only by publication, but by adoption – and seeing how others build on your efforts. 

“How many people are utilizing or benefiting from the research that you’re doing?” he asks. “Citations mean your research is being used by another researcher, and it’s an indication that you’re doing valuable and impactful work.” 

With more than 6,000 citations and a Global Google Ph.D. Fellowship to his name, it’s safe to say that Maaz’s work has already made a significant mark. But his ultimate goal is bigger. Much bigger. He wants to develop systems that benefit not just researchers or industry leaders, but society as a whole. 

“I want to see AI be helpful for a bigger population, not just the small group of specialists building and studying it. Making life easier for everyone beyond that group would be something meaningful and valuable.” 

Embracing the discipline behind discovery 

Maaz is open about the realities of research: progress is rarely linear, and early success is far from guaranteed. During the first years of his Ph.D., he produced relatively few visible outputs – a period marked by experimentation, setbacks, and uncertainty. 

“In research, you don’t just succeed,” he warns. “You have to fail multiple times to get something fruitful or meaningful on the table. But this is part of the process. And it’s very important to trust experiments and keep going.” 

What sustained him through that period was consistency and the support of his advisors. Mentorship played a critical role, providing both the resources and the direction needed to navigate complex problems. 

“You need compute to solve the technological problems,” he explains. “But you need human backing to carry on.” 

Regular feedback and a clear research direction helped him remain committed, even when results were slow to materialize. Over time, that persistence paid off, culminating in a series of influential contributions that defined his later Ph.D. years. 

Maaz’s internship at Meta was another turning point. Working alongside leading researchers in Silicon Valley, he contributed to large-scale, high-impact projects with significant resources and global reach. Yet the experience also reinforced something important: the work being done at MBZUAI was already on par with the best in the field – and the world. 

“When I wasn’t in Silicon Valley, I thought that those people were doing something beyond our reach,” he recalls. “But after going there and working with them, I realized that we were doing equally good, relevant, and valuable work.”

Following graduation, Maaz is set to return to Meta as a Research Scientist, where he will continue developing multimodal AI systems and expanding on his work with PerceptionLM. But his long-term vision remains grounded in a simple principle: build AI that matters. 

For Maaz, the future of artificial intelligence is not just about making systems more powerful, but more useful – bridging the gap between advanced technology and everyday human needs. “We are trying to minimize human effort in the digital world,” he says. “But in the physical world, AI is no more than a newborn kid. I would be very happy to contribute to this part.” 

As a graduate of MBZUAI’s first cohort, Maaz represents a new generation of researchers, dedicated to shaping how AI sees the world – and how the world sees AI.
