Highlighting LLM safety: How the Libra-Leaderboard is making AI more responsible

Tuesday, February 18, 2025
Haonan Li of MBZUAI startup LibrAI gives a demonstration of the Libra-Leaderboard.

MBZUAI-based safety monitoring start-up LibrAI has taken another step towards ensuring safe, responsible and ethical AI with the launch of its Libra-Leaderboard: an evaluation framework for large language models (LLMs) that seeks to close the gap between capability and safety. 

Founded in 2023 with support from the MBZUAI Incubation and Entrepreneurship Center (MIEC), LibrAI is a platform that allows developers to test their AI models in a secure environment to evaluate and enhance safety, leading to more responsible development and deployment of AI. 

The new leaderboard offers a comprehensive assessment of 26 mainstream LLMs, including Claude, ChatGPT, and Gemini. It uses 57 datasets, most of them from 2023 onwards, to assign each model a score based on its capability and safety, in a bid to guide future AI development.

“Most people are focused on AI’s capabilities – can it communicate in many languages, can it reason, pass a math exam and so on. Our motivation was driven by the need to ensure AI can develop safely; as it develops quickly, we don’t lose sight of safety,” says Haonan Li, postdoctoral research fellow at MBZUAI and chief technology officer at LibrAI.

The LibrAI team has created a balance-encouraging scoring system for its new leaderboard. It evaluates categories such as bias, misinformation and oversensitivity to benign prompts, providing a holistic view of a model’s reliability. The leaderboard then gives each model a ‘capability score’, ‘safety score’, and an ‘overall score’. 

“The overall score is not simply the average of the capability score and safety score,” explains Li. “It penalizes discrepancies between capability and safety scores, ensuring that a higher overall score reflects models where capability and safety are closely aligned. In our view, reducing the gap between capability and safety is what models should be trying to achieve.” 
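The article does not give the exact formula, but one simple way such a discrepancy penalty could work is to start from the mean of the two scores and subtract a term proportional to the gap between them. The function below is a hypothetical sketch, not LibrAI's actual scoring method; the `penalty` weight is an assumed parameter.

```python
def overall_score(capability: float, safety: float, penalty: float = 0.5) -> float:
    """Hypothetical sketch of a discrepancy-penalized overall score.

    Starts from the mean of the capability and safety scores, then
    subtracts a penalty proportional to the gap between them, so a
    balanced model outranks an unbalanced one with the same average.
    """
    return (capability + safety) / 2 - penalty * abs(capability - safety)

# Both models average 0.7, but the balanced one scores higher overall:
print(overall_score(0.7, 0.7))  # 0.7
print(overall_score(0.9, 0.5))  # ~0.5 after the 0.5 * 0.4 gap penalty
```

Under this kind of scheme, a model cannot climb the leaderboard on raw capability alone; closing the safety gap is rewarded directly, which matches the goal Li describes.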

Education and evaluation 

LibrAI also launched the Interactive Safety Arena to complement the Libra-Leaderboard. The Arena is a platform designed to engage the public and educate them on AI safety. It allows users to test AI models with adversarial prompts, receive tutoring, and provide feedback, raising awareness about potential risks in LLMs. 

“The Arena provides an interactive, tutorial-like experience designed to educate people about AI safety,” says Li. “For example, one can enter a risky prompt and send it to two anonymous models. The Arena generates a comparison of the two models’ outputs. The user can choose which model is safer and more helpful. The scoring of the models is reflected in the leaderboard.”

Looking ahead, the LibrAI team plans to update the leaderboard regularly with new datasets and evaluation criteria to address emerging vulnerabilities in AI systems, and to gamify aspects of the Arena to engage more people.

But that is merely the tip of the iceberg, as the team is working on a flagship AI evaluation product that they hope will set a new standard in AI safety.  

“We are creating an evaluator platform that helps organizations pre-emptively assess AI systems for alignment, reliability and ethical compliance. We believe it will be pivotal in equipping industries with the tools they need to safely and responsibly harness AI potential.” 
