Ever since Atnafu Lambebo Tonja had his first taste of artificial intelligence as an undergraduate in his native Ethiopia, his academic journey has followed a consistent thread: language. Who it serves, who it excludes, and how technology can bridge that gap.
It’s a journey that has taken him around the world in pursuit of knowledge and impact. From East Africa, he travelled to Mexico, where he earned his Ph.D. with a focus on NLP for low-resource languages. From Mexico, he went to Abu Dhabi, where he deepened and refined his skills as a postdoctoral researcher at MBZUAI.
And in March this year, he took his next big step – flying from the UAE to the UK to take up the prestigious role of Google DeepMind Academic Fellow at the UCL Centre for Artificial Intelligence.
In London, with full funding and mentorship from Google DeepMind and UCL, he will continue his research into inclusive and equitable AI technologies that support linguistic diversity. His focus will be on multilingual and under-resourced language processing, including the development and evaluation of large language models tailored to low-resource languages.
It’s an interest that is deeply rooted in his lived experience. “I was born and raised in the southern part of Ethiopia, where we have more than 45 languages,” he explains. “People from different regions often struggle to communicate with each other. So, for my master’s, I created a machine translation dataset for five Ethiopian languages, and we trained a model that can translate between those languages and English as well.
“That was my starting point. Then, I looked into different problems for underrepresented languages across Africa, where more than 2,000 languages are spoken, but very few are represented in the research space.”
Tonja developed his research skills during his doctoral studies at the Instituto Politécnico Nacional in Mexico. There, he encountered another multilingual landscape, one shaped by Spanish alongside a wide range of Indigenous languages, none of which he spoke.
“I don’t speak Spanish, so I used Google Translate to communicate,” he laughs. “It was fine at school, where we used English, but I had to find a solution for when we went to the mall, local markets, and places like that.
“That’s one of the advantages of technology, but it made me think about what somebody would do if they came to somewhere such as Ethiopia where you don’t have those kinds of applications. Spanish is a high-resource language, so you have tools. But what about communities or languages that don’t have these resources? That gave me even more motivation to do a lot of research in this space.”
During his Ph.D., Tonja spent time as a visiting student at the University of Colorado, Colorado Springs, USA, and worked closely with MBZUAI’s Professor Thamar Solorio, then a professor at the University of Houston. Together, they published the paper “NLP progress in Indigenous Latin American languages”, which examines the marginalization of Indigenous language communities in the face of rapid technological advancement.
Such was the impact Solorio and her work had on him that when she moved to MBZUAI in 2023, Tonja followed close behind.
“I was really interested in her research experience and believed she would guide me on my own research path. So, I joined MBZUAI as a visiting student in 2024.
“During that time, we published the paper ‘The Zeno’s Paradox of Low-Resource Languages’ at EMNLP 2024, which won an outstanding paper award. We collaborated with Professor Monojit Choudhury and Hellina Nigatu, one of his visiting students from UC Berkeley, as well as Benjamin Rosman of the University of the Witwatersrand in South Africa. That paper really shaped how we think about low-resource language itself.
“It was then that I decided to come back to MBZUAI after my Ph.D.”
In Abu Dhabi, Tonja found not just a place to continue his research, but an environment that would fundamentally shape how he approached it.
“MBZUAI is an amazing place for researchers to work on different problems,” he says. “You have access to experienced professors, to students and researchers you can collaborate with, and to the resources you need. You don’t have to worry about anything except the research itself.”
That freedom to focus fully on his work proved transformative. Surrounded by a global community of researchers and supported by extensive resources, Tonja was able to move beyond the constraints that can stifle early-stage research and engage with problems on a more ambitious scale.
“I went back to MBZUAI not just because of the project but because of the environment. I mainly worked with Professor Thamar, but I had many advisors who directly or indirectly advised me how to do things, where to go, what to do. Because after your Ph.D., you start questioning what to do next, so you need that guidance – you need people with experience to help you.
“So, that was very important and it basically shaped my career path, as well as the way that I think about research, collaboration, networking, and building a team.”
One of the pieces of research from his time at MBZUAI that Tonja is most proud of is Afri-MCQA – a multimodal benchmark designed to test how well AI systems understand African cultures across both language and context. The dataset brings together more than 7,500 question–answer pairs, spanning 15 African languages across 12 countries – covering topics from food and clothing to traditions, ceremonies, and everyday life.
“We know that LLMs are very good at general knowledge, but they have a huge gap when it comes to certain countries or communities,” he says. “We wanted to see if they can actually reason or have knowledge about cultural content when it comes to Africa.”
The benchmark is multilingual and multimodal, combining text, speech, and images, with parallel questions in both English and African languages. This allows researchers to evaluate both whether models recognize cultural concepts and whether they can do so across different linguistic contexts.
“We created questions about things such as food, traditions, history, and daily life,” he continues. “If the model doesn’t know about that specific thing, how are we going to trust it when it is used in real-world applications such as education or healthcare?”
The research found that despite their sophistication, many of today’s most advanced models struggle to capture the cultural diversity of the African continent.
“That really shows where the gap is, and how much work still needs to be done,” Tonja adds.
Days after his arrival in London, he learned that the Afri-MCQA paper, co-authored with Solorio, MBZUAI Assistant Professor of Natural Language Processing Alham Fikri Aji, and others, had been accepted at ACL 2026, a leading NLP conference that will be held in San Diego later this year.
“It is actually one of the top 5% of papers to be accepted at the conference, based on Area Chair ratings, which I’m really proud of,” he says. “It also clearly shows the depth of this work and its importance.”
As he settles into life in London and the routine of his three-year fellowship, Tonja explains that he is excited for what comes next in his career. “It’s a new environment with new people and new ways of working, which I hope will bring some new ideas and perspectives,” he says.
“The most important thing is that I get to continue in the research domain that I was working on at MBZUAI. There’s still so much to do, and so many gaps to fill – I’m looking forward to taking things further.”