Beyond borders: cultivating the next generation of AI innovators in a global tech hub

By Tim Baldwin, Provost and Professor of Natural Language Processing at MBZUAI

Monday, November 11, 2024

Previously published in MIT Technology Review.

To make AI truly accessible to everyone, we need to invest in creating more academic centers dedicated to collaborative research and development, which can in turn train the next generation of AI innovators. The Mohamed bin Zayed University of Artificial Intelligence in Abu Dhabi shows a path forward.

A few years ago, I had to make one of the biggest decisions of my life. I could continue as a professor at the University of Melbourne, a place I had called home for 20 years and where colleagues and I had established one of the top research groups in natural language processing. Or I could try something entirely different by moving to another part of the world to help build a brand new university focused entirely on AI.

Like anyone facing a major life decision, I thought deeply about my priorities. I have always felt that training and cultivating the next generation of minds is perhaps our most important task as scientists. Given the rapid development of artificial intelligence over the past few years, I realized that the next generation of AI innovators must be educated as inclusively as possible, so that the benefits of the technology are shared widely across the globe.

The world in all its complexity

Today, unfortunately, the rewards of AI are mostly enjoyed by a few countries in what the Oxford Internet Institute dubbed the Compute North. These countries, such as the U.S., the U.K., France, Canada and China, have dominated research and development, and built state-of-the-art AI infrastructure capable of training foundational models. This should come as no surprise, as these countries are home to many of the world’s top universities and large tech corporations.

Most of the world lives in what the Oxford Internet Institute calls Compute Deserts: countries that lack the AI infrastructure needed to develop foundational AI models.

 

Budding scientists in labs and classrooms of cities like London, Paris, Berkeley or Beijing tread well-worn paths to top graduate programs where they deepen their expertise before taking jobs at scrappy startups or tech giants. These innovation ecosystems have led to significant leaps in applications of AI that range from self-driving cars to large language models to medical-image analysis.

But this concentration of innovation comes at a cost for billions of people who live outside these dominant countries and have different cultural backgrounds.

Large language models (LLMs) are illustrative of a significant challenge in AI. Researchers have shown that many of the most popular multilingual LLMs perform poorly with languages other than English, Chinese, and a handful of other (mostly) European languages. Yet, there are approximately 6,000 languages spoken today, many of them in communities in Africa, Asia and South America. Arabic alone is spoken by almost 400 million people and Hindi has 575 million speakers around the world.

Arabic

Model            Knowledge   Commonsense   Misinformation/Bias   Average
LLaMA 2 (7B)     29.0        39.3          47.5                  37.2
LLaMA 2 (13B)    30.0        40.3          47.7                  38.1
Jais (6.7B)      36.6        45.5          49.3                  43.2
Jais (13B)       40.0        49.8          49.8                  46.5

English

Model            Knowledge   Commonsense   Misinformation/Bias   Average
LLaMA 2 (7B)     35.0        58.9          55.4                  53.9
LLaMA 2 (13B)    36.2        60.8          53.7                  55.0
Jais (6.7B)      32.8        53.8          54.0                  50.0
Jais (13B)       34.6        59.5          53.5                  53.9

LLaMA 2 performs up to 50% better in English compared to Arabic, when measured using the LM-Evaluation-Harness framework. Meanwhile, Jais – an LLM co-developed by MBZUAI – exceeds LLaMA 2 in Arabic and is comparable to Meta’s model in English.

 

If we are going to build applications that work across a wider array of the world’s languages — and we must — we need to create new institutions outside the Compute North that consistently and conscientiously invest in building tools designed for the thousands of language communities across the world.

Environments of innovation

One way to design new institutions is to study history and understand how today’s centers of gravity in AI research emerged decades ago. Before Silicon Valley earned its reputation as the center of global technological innovation, it was called Santa Clara Valley and was known for its prune farms. Though few were able to foresee it at the time, this humble valley possessed the key ingredients to become the tech capital of the world. The main catalyst for its makeover was a relatively new university that was unencumbered by the constraints of tradition and the rigid academic culture of its eastern rivals.

By the mid-twentieth century, Stanford University had built a reputation as one of the best places in the world to study electrical engineering. Over the years, through a combination of government investment in the form of grants and focused research, the university birthed countless inventions that advanced computing. In addition, a culture of entrepreneurship encouraged bright students and faculty to strike out beyond campus and build businesses around their innovations. The results speak for themselves: Stanford alumni have founded companies such as Alphabet, NVIDIA, Netflix, and PayPal.

Today, like our predecessor in Santa Clara Valley, we have an opportunity to build a new technology hub centered around a university. That’s why I chose to join the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), the world’s first research university focused entirely on AI. From our position at the geographical crossroads of East and West, our goal is to attract the brightest minds from around the world and equip them with the tools they need to push the boundaries of AI research and development.

A community for inclusive AI

Our student body comes from more than 50 different countries around the globe. We have attracted top researchers such as Monojit Choudhury from Microsoft, Elizabeth Churchill from Google, Ted Briscoe from the University of Cambridge, Sami Haddadin from the Technical University of Munich, and Yoshihiko Nakamura from the University of Tokyo — just to name a few.

These scientists may come from different places but they all share a common vision: they’ve come to MBZUAI because they were intrigued by the interdisciplinary nature of the university, its relentless focus on making AI a force for global progress, and the opportunity the university provides to collaborate with others in disciplines such as robotics, natural language processing, machine learning, and computer vision.

In addition to these traditional AI disciplines, we have built departments in sibling areas that can both contribute to and benefit from AI, including human-computer interaction, statistics and data science, and computational biology.

In 2024, MBZUAI welcomed 200 students from 36 countries, making it the most diverse group of graduates the university has had in its five-year history.

 

Abu Dhabi’s commitment to MBZUAI is part of a broader vision for AI that extends beyond academia. Our scientists have collaborated with G42, an Abu Dhabi-based tech company, on Jais, an Arabic-centric LLM that is the highest-performing open-weight Arabic LLM, and on NANDA, an advanced Hindi large language model. Our Institute of Foundational Models has created LLM360, an initiative designed to level the playing field of large model research and development by publishing fully open-source models and datasets that are competitive with the closed-source or open-weight models available from tech companies in North America or China.

We are also developing language models that specialize in Turkic languages, which have traditionally been underrepresented in natural language processing and yet are spoken by millions of people across a vast swathe of land running from the Mediterranean to western China.

Another recent project has brought together native speakers of 26 languages from 28 different countries to compile a benchmark dataset that evaluates the performance of vision language models and their ability to understand cultural nuances in images.

These kinds of efforts to expand the capabilities of AI to broader communities are necessary if we want to maintain the world’s cultural diversity and provide everyone with AI tools that are useful to them.

At MBZUAI, we have created a unique mix of students and faculty to drive globally inclusive AI innovation for the future. By building a broad community of scientists, entrepreneurs, and thinkers, the university is increasingly establishing itself as a driving force in AI innovation that extends far beyond Abu Dhabi, with the goal of developing technologies that are inclusive of the world’s diverse languages and cultures.
