In the past few years, artificial intelligence programs like chatbots and voice assistants that are driven by innovations in natural language processing have assisted users with tasks both large and small. As impressive as these tools may be, they also have significant limitations, which are particularly evident for people who speak languages other than English.
As thousands descended on Singapore last week for the Conference on Empirical Methods in Natural Language Processing, one researcher shared several studies that propose ways scientists can design natural language processing applications that benefit wider swathes of humanity.
The implications of not doing so are far-reaching.
“Technologies are becoming very pervasive in society and progress is happening quickly,” said Muhammad Abdul-Mageed, visiting associate professor of natural language processing and machine learning at MBZUAI. “If we keep developing applications in English and a few other languages while ignoring thousands of other languages and dialects, we will only widen the already existing gaps that exist in the world today.”
At the conference, known as EMNLP, Abdul-Mageed and colleagues shared three studies on how machines process and generate Arabic. A fourth paper described how large language models can account for how meaning is produced by the combination of language and the social context in which it is used. They will also present eight papers at the Arabic Natural Language Processing Conference (ArabicNLP), which is co-located with EMNLP in Singapore and is being held for the first time this year.
One study evaluates the performance in Arabic of ChatGPT, a widely used chatbot developed by OpenAI that received a huge amount of publicity upon its launch a year ago, and GPT-4, a large language model also developed by OpenAI.
While previous studies have examined the applications’ performance in English, their ability to process other languages hasn’t been widely assessed. Indeed, Abdul-Mageed and his colleagues’ investigation is the first to look at ChatGPT’s and GPT-4’s capacity to process varieties of Arabic, including modern standard Arabic (MSA) and several regional dialects.
“Arabic dialects vary at all linguistic levels, such as in morphology and syntax,” Abdul-Mageed said. “And there are times that Arabic speakers may want to use standard Arabic and others where they want to use dialectal Arabic.” The movement between these different kinds of Arabic is a common occurrence for millions of Arabic speakers but can make the analysis of the language difficult for machines.
The researchers conclude that while the OpenAI applications work well in English, it’s unlikely that they would provide utility for speakers of languages like Arabic that are “characterized by extensive linguistic variations” across dialects. That said, GPT-4, the more advanced program, performed better than the GPT-3.5 version of ChatGPT on several tasks.
To measure the performance of natural language processing applications, researchers must establish what are known as benchmarks, which provide a way to evaluate how the models perform on different tasks.
Benchmarks also “can facilitate reproducibility and promote transparency across different studies, acting as a catalyst for advancement in the field,” Abdul-Mageed and his team write in another study that will be presented at EMNLP.
Their framework, called Dolphin, provides benchmarks for classical Arabic, MSA and dialectal Arabic, including Egyptian, Jordanian and Palestinian. All the datasets that the team uses in the study are publicly available.
While much of Abdul-Mageed’s work focuses on Arabic, his interests and commitments extend beyond it.
He has also co-authored a study to be presented at EMNLP that proposes benchmarks for large language models for 64 languages, focusing on what is known as sociopragmatic understanding. In short, the idea of sociopragmatics is that social context affects, and in some cases even determines, meaning.
When we communicate with other people — whether it is in speech, on social media or through a messaging application — meaning is a product of more than simply the words we use: “It also depends on who says what to whom, when, why and how,” Abdul-Mageed said. “Age, social class, educational background and many other variables govern how we communicate with language and how it is perceived.”
Indeed, the meaning of identical statements can shift depending on the context in which the statements are made, and these meanings are not stable across cultures.
In their study, the researchers find that several large language models, including ChatGPT, “struggle to understand” sociopragmatic meaning across languages. Their framework, called SPARROW, can be used to evaluate the ability of large language models to identify sociopragmatic meaning, with the goal of helping to improve this capability.
Though there is clearly a lot of work to be done to develop machines that can provide benefits to speakers of the thousands of languages across the globe, Abdul-Mageed says that he is “optimistic about our ability to work in collaboration beyond country borders and across borders.”
“Can we use technologies to educate people in better ways?” he asks. “If we can get to a point where computers are improving our health and wellbeing, then we are doing something that is in service to humanity.”