On 18 December 1973, the United Nations adopted Arabic as its sixth official language, joining Chinese, English, French, Russian and Spanish — a sextet that remains as the body’s linguistic core to this day.
Since 2012, 18 December has been recognized as World Arabic Language Day — a day on which we reflect on the depth and diversity of this remarkable language that has been the lynchpin of cultures and civilizations for more than 2,000 years, and the cornerstone of identity for more than 400 million Arabic speakers worldwide.
This year, the theme of World Arabic Language Day is “Arabic Language and AI: Advancing Innovation While Preserving Cultural Heritage”, a topic that is at the core of MBZUAI’s approach.
Among its various Arabic language initiatives, the University is developing advanced linguistic models that use deep learning techniques to improve machine understanding of Arabic. These models aim to process and understand Arabic texts more effectively, contributing to smart applications capable of instant translation, voice assistance, and handwritten text recognition.
One notable example is Jais — the world’s most advanced Arabic large language model (LLM), which was developed in a collaboration between MBZUAI and G42’s Inception and launched in 2023. The 13-billion parameter model was trained on a newly developed 395-billion-token Arabic and English dataset, helping it to outperform existing Arabic models by a sizeable margin, as well as being competitive with English models of similar size. As an open-source model, it aims to engage the scientific, academic, and developer communities to accelerate the growth of a vibrant Arabic language AI ecosystem.
Another recent example is the Atlas-Chat model, which excels in understanding and responding to the Moroccan dialect. The model achieved 13% better performance than larger models and surpassed systems such as LLaMA, Jais, and AceGPT in terms of text comprehension and rephrasing instructions in Moroccan Arabic.
Faculty and students at the University are also including Arabic in their practical AI solutions, such as BiMediX2 — a healthcare multi-modal model that is capable of understanding and responding to medical queries in both English and Arabic, as well as interpreting and summarizing medical images such as X-rays, MRIs, and CT scans. The model was named by Meta as one of the winners of its inaugural Llama Impact Innovation Awards, winning acclaim for its potential to solve healthcare accessibility challenges across the Middle East, Africa, and beyond.
MBZUAI also promotes scientific research targeting Arabic-specific linguistic challenges, such as dialects and complex syntactic structures, through postgraduate programs and collaborative research initiatives with other academic institutions. The University also actively engages with the wider AI community to raise awareness about the importance of AI for the Arabic language through workshops, seminars, and partnerships.
Research and innovations such as these are vital for the Arabic language, which is on the wrong side of the digital divide. A report from the International Telecommunication Union showed that just 3% of content online is in Arabic, highlighting the need for greater inclusion and accessibility for Arabic speakers, who represent about 5.2% of global internet users.
The root of this problem could lie in the diversity of the language’s dialects, its grammatical structure, and the coexistence of classical, modern standard and colloquial forms. But there are also technical difficulties, such as the limited ability of computers to faithfully represent the complex script, and the lack of open and accessible Arabic language resources. And then there are socio-economic factors, including the fact that the global technology scene has largely been dominated by Western nations and East Asia.
There is no small irony in the fact that the Arabic language helped to lay the foundations for today’s technology. Al-Khwarizmi, for example, introduced the concepts of algorithms and algebra in the ninth century, and these are now the bedrock of modern computer science and AI. Ibn Al-Haytham then pioneered optics and scientific methodology between the 10th and 11th centuries, influencing the development of lenses, cameras, and modern computational imaging.
Elsewhere, Al-Farabi contributed to early concepts of automation and artificial intelligence through his studies in mechanics and logic during the ninth and 10th centuries, and Al-Jazari designed water clocks, automata and innovative machines in the 12th century that are considered early precursors to robotics.
Whatever happened in the years between then and now, AI has the potential to be of service to the Arabic language today and help it become a leading language of innovation and discovery once again. It is already doing this in several ways, from translation services to content generation, and Arabic voice recognition to tools capable of recognizing and analyzing Arabic text in images and videos.
Add these applications to the power of MBZUAI’s Arabic LLMs, and you have a recipe for integrating technology with Arab cultural heritage, helping Arabic gain wider global recognition and thrive in the digital age.