Knowledge distillation and the greening of LLMs

Tuesday, May 02, 2023

Large language models (LLMs) have burst onto the world stage with the introduction of OpenAI’s Generative Pre-trained Transformer (GPT), Google’s Language Model for Dialogue Applications (LaMDA), and many others. These models take a vast amount of data (much of the public internet, for example) and compress it into a decision-making engine that can power interfaces, answer questions, generate high-quality content, and quite a bit else.

These powerful systems have captured the global public imagination. They have also swallowed up city-sized helpings of electricity, water, and money in the process of being trained and used. Through instruction tuning and knowledge distillation, a team of researchers at MBZUAI, the University of British Columbia, and Monash University is working to drastically cut the electricity, water, and money required to train and use these models, and in the process deliver on the promise of LLMs. The team also aims to make LLMs far more secure.

A number of high-profile organizations have recently featured in the news because they compromised the security and privacy of their data by uploading it to an LLM. Associate Professor of Natural Language Processing Alham Fikri Aji and Visiting Associate Professor of Natural Language Processing and Machine Learning Muhammad Abdul-Mageed, along with team members Minghao Wu, Abdul Waheed, and Chiyu Zhang, have created LaMini-LM, a collection of language models intended for deployment in resource-constrained settings such as consumer laptops and mobile devices. Because data never has to leave the local device or network, this approach largely eliminates those security concerns while allowing institutions of all sizes to deploy the power of an LLM relatively efficiently.

“LaMini-LM is a collection of small-sized, efficient language models distilled from ChatGPT and trained on a large-scale dataset of 2.58M instructions,” Aji said. “We explore different model architectures and sizes, and extensively evaluate their performance across various NLP benchmarks and through human evaluation.”
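At a high level, the dataset behind this approach consists of instructions paired with the teacher model’s responses. The following is a minimal, hypothetical sketch of that collection step in Python; the ask_teacher() helper, the example instructions, and the output file name are placeholders for illustration, not the team’s actual pipeline or the ChatGPT API.

```python
# Hypothetical sketch of the dataset-creation step: collect instruction/response
# pairs from a teacher model. ask_teacher() is a placeholder for whatever
# ChatGPT API call is actually used; it is not part of the LaMini-LM code.
import json

def ask_teacher(instruction: str) -> str:
    """Placeholder for the real teacher call (e.g. a ChatGPT API request)."""
    return "<teacher response goes here>"

instructions = [
    "Explain knowledge distillation in one paragraph.",
    "Summarize the water cost of training a large language model.",
]

# Each instruction/response pair becomes one training example for the student.
with open("distillation_data.jsonl", "w", encoding="utf-8") as f:
    for instruction in instructions:
        response = ask_teacher(instruction)
        f.write(json.dumps({"instruction": instruction, "response": response}) + "\n")
```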

The team developed LaMini-LM by distilling knowledge from ChatGPT, much as a teacher passes on a condensed version of their knowledge to students. They asked ChatGPT questions, received answers, and used those answers to train the LaMini models. Despite taking much less time to train, these smaller LaMini models performed almost as well as their larger counterparts. The team proposes that, instead of having a workforce rely on cloud-based LLMs to answer questions and produce content, a solution such as LaMini-LM could be customized to fit an organization’s use case while helping to keep its data secure.
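The student training step can be pictured as standard fine-tuning on those instruction-response pairs. Below is a minimal sketch using Hugging Face Transformers with a small sequence-to-sequence model; the model choice, hyperparameters, and the distillation_data.jsonl file carried over from the previous sketch are illustrative assumptions, not the authors’ exact recipe.

```python
# Illustrative sketch of fine-tuning a small "student" model on the
# instruction/response pairs collected from the teacher. The model name and
# hyperparameters are assumptions for demonstration, not the LaMini-LM setup.
import json
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-small"  # any small seq2seq model works for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Load the instruction/response pairs produced in the previous step.
with open("distillation_data.jsonl", encoding="utf-8") as f:
    pairs = [json.loads(line) for line in f]

model.train()
for epoch in range(3):
    for example in pairs:
        inputs = tokenizer(example["instruction"], return_tensors="pt",
                           truncation=True, max_length=512)
        labels = tokenizer(example["response"], return_tensors="pt",
                           truncation=True, max_length=512).input_ids
        # The student learns to reproduce the teacher's response for each instruction.
        loss = model(**inputs, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("student-model")
tokenizer.save_pretrained("student-model")
```

Because the student only has to imitate the teacher’s outputs rather than learn from raw web-scale text, even a model small enough to run on a laptop can recover much of the teacher’s instruction-following behavior.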

Aji et al. have posted a paper on their work titled “LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions.” To learn more about the team’s data, the models they have developed, and the NLP and human evaluation results, visit https://mbzuai-nlp.github.io/LaMini/

Aji was quick to stress that this is the first iteration of LaMini-LM and that the team is actively working to improve it. LaMini-LM is part of a wider initiative at the university, carried out with a range of global collaborators, to decarbonize LLMs while making them more nimble and more secure; Vicuna is another prominent example of this kind of work.
