Alham Fikri Aji is an assistant professor of natural language processing at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), where he works to develop natural language processing (NLP) systems that are efficient and capable across a wide variety of languages.
Aji describes his research as progressing in two related directions, both focused on “low-resource” NLP. The first is developing energy-efficient NLP models that can run on limited computational resources. The second is directing effort toward languages that currently have far fewer datasets than languages like English and Chinese, which have historically been the focus of NLP research.
“Many languages are underrepresented in the NLP community, and for those languages, we work on improving data collection, performance, and benchmarking to determine how AI technologies handle those languages, along with related cultures and values,” Aji said. “However, communities and researchers working on these low-resource languages often also face restrictions in terms of compute. They do not necessarily have the computational resources needed to power NLP technologies.”
Aji recently co-authored a study that proposes an approach to developing natural language processing tools that are less resource intensive than large language models like OpenAI’s ChatGPT. The research is being presented at the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL), which begins on March 17 in Malta.
Other scientists who contributed to the study are Minghao Wu of Monash University and MBZUAI; Abdul Waheed of MBZUAI; and Chiyu Zhang and Muhammad Abdul-Mageed, both of the University of British Columbia and MBZUAI. Code and data sets for the study can be found on GitHub.
From large to small
In the study to be presented at EACL, Aji and his co-authors pursued an approach called knowledge distillation, in which knowledge from a larger, more complex model, called the “teacher,” is transferred to a smaller, more efficient model, called the “student.” The goal is for the student model to approach the teacher’s performance while being far more resource-efficient in terms of memory, computational power and energy consumption.
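For readers who want a concrete picture, the sketch below shows the classic form of knowledge distillation, in which the student is trained to match the teacher’s softened output distribution. The function names, temperature and weighting are illustrative only; as described next, the LaMini approach transfers knowledge through training data generated by the teacher rather than through the teacher’s output probabilities.

```python
# Minimal sketch of classic (logit-matching) knowledge distillation.
# All names and hyperparameters here are illustrative, not the paper's method.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term with the usual cross-entropy loss."""
    # Soften both distributions with the temperature, then match them.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd_term = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    kd_term = kd_term * (temperature ** 2)

    # Standard supervised loss on the hard labels.
    ce_term = F.cross_entropy(student_logits, labels)

    return alpha * kd_term + (1 - alpha) * ce_term
```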
“We wanted to explore how to do knowledge distillation and bring that into smaller models,” Aji said.
The researchers asked ChatGPT to create what is called synthetic instruction data: prompts generated by a language model that the model can then also be asked to answer. “Instruction data could be asking a model to generate a recipe or creating a travel plan for a trip to Spain,” Aji explained.
To diversify the generated instructions, the researchers created a program that pulled topics at random from Wikipedia and fed them, along with the request for instructions, to ChatGPT. They then asked ChatGPT to provide responses to those instructions. (The researchers used the gpt-3.5-turbo version of ChatGPT in the study.)
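As a rough illustration of that pipeline, the sketch below pulls a random Wikipedia topic, asks gpt-3.5-turbo to write an instruction about it, and then asks the model to answer its own instruction. The prompt wording, the Wikipedia endpoint and the use of the current OpenAI Python client are assumptions made for illustration; the researchers’ actual generation scripts are in their GitHub repository.

```python
# Illustrative topic-seeded instruction generation, assuming an OpenAI API key
# is set in the environment. Prompts and endpoints are placeholders.
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def random_wikipedia_topic() -> str:
    """Fetch the title of a random Wikipedia article to seed generation."""
    resp = requests.get(
        "https://en.wikipedia.org/api/rest_v1/page/random/summary",
        headers={"User-Agent": "lamini-sketch/0.1"},
        timeout=10,
    )
    return resp.json()["title"]

def generate_instruction_and_response(topic: str) -> dict:
    """Ask the teacher model for an instruction on the topic, then answer it."""
    instruction = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Write one self-contained instruction or question "
                       f"related to the topic '{topic}'. Reply with the "
                       f"instruction only.",
        }],
    ).choices[0].message.content

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": instruction}],
    ).choices[0].message.content

    return {"instruction": instruction, "response": response}

if __name__ == "__main__":
    print(generate_instruction_and_response(random_wikipedia_topic()))
```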
The result was a data set, which the researchers call LaMini, consisting of more than 2.5 million examples, which they describe as the largest instruction data set of its kind at the time. Then, through a process called instruction fine-tuning, the researchers used the LaMini data set to further train several language models of varying sizes. This technique involves training a model on instructions paired with desired outputs, with the goal of refining the model’s performance so that it can generate more accurate, relevant and context-appropriate responses.
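The sketch below shows what instruction fine-tuning looks like in practice on a small open model: each instruction is paired with its desired response, and the model is trained to produce that response. The model choice, prompt template and hyperparameters here are placeholders rather than the paper’s configuration.

```python
# Minimal instruction fine-tuning sketch with Hugging Face Transformers.
# Model, template and hyperparameters are illustrative placeholders.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

pairs = [  # stand-in for the distilled instruction data
    {"instruction": "Suggest a simple recipe for lentil soup.",
     "response": "Saute onion and garlic, add lentils, broth and cumin..."},
]

model_name = "gpt2"  # illustrative small student model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

def format_and_tokenize(example):
    # Concatenate the instruction and the desired output into one sequence.
    text = (f"Instruction: {example['instruction']}\n"
            f"Response: {example['response']}{tokenizer.eos_token}")
    return tokenizer(text, truncation=True, max_length=512)

dataset = Dataset.from_list(pairs).map(
    format_and_tokenize, remove_columns=["instruction", "response"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lamini-sketch", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```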
The models the researchers trained included versions of Meta’s LLaMA, Google’s T5 and GPT-2, among other openly available models, ranging in size from 61 million parameters to 7 billion parameters. Collectively, they call these fine-tuned models LaMini-LM.
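Several of the resulting LaMini-LM checkpoints were released publicly on the Hugging Face Hub, so they can be tried in a few lines. The repository name below is one of the published checkpoints at the time of writing; readers should consult the project’s GitHub page for the full, current list.

```python
# Trying out one of the released LaMini-LM checkpoints via the Hugging Face Hub.
# The exact repository id is an assumption; see the project's GitHub for the list.
from transformers import pipeline

generator = pipeline("text2text-generation", model="MBZUAI/LaMini-Flan-T5-783M")
result = generator("Suggest three sights to visit in Malta.", max_length=128)
print(result[0]["generated_text"])
```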
The researchers tested the performance of these models against 15 NLP benchmarks and also with human evaluators. Several of the LaMini-LM models demonstrated comparable performance to larger, standard models while being significantly more resource-efficient. Notably, the LaMini-LM LLaMA model dramatically outperformed the standard LLaMA model.
The researchers found that the instruction data set they developed and used to train models had a significant impact on the models’ performance.
What’s more, the researchers were able to show that their smaller models displayed similar performance to larger LLMs, such as Alpaca 7B, when their performance was evaluated by people.
“The main finding is that our models are approximately ten times smaller than Alpaca 7B but show a similar level of performance,” Aji said.
Cut-to-fit language models
Innovation moves quickly in the field of natural language processing, and the version of ChatGPT that the researchers used in the study is no longer state-of-the-art. Even so, Aji said, the fundamental technique of using instruction tuning to improve a language model’s performance remains relevant, and the base models that were used could be updated with newer versions.
Is the future of natural language processing large models that are general or small models that are specific to certain tasks?
Aji doesn’t think there is one path that the industry will pursue — nor does there need to be.
“It’s not always about the size of a model and it’s good to see that there is active research on making things efficient as well,” he said. “People will consider how to make models that are general while making them efficient. But also if you want to build a model for a specific task, you really don’t need a large model.”
Overall, the research illuminates the path forward for developing efficient yet powerful language models, offering insights into the value of knowledge distillation and the potential for more sustainable AI development.