MBZUAI Workshop 2025

Wednesday, February 12, 2025 - Thursday, February 13, 2025

9:00 AM - 4:10 PM
Fondation François Sommer, 62 Rue des Archives, Paris, France

Workshop Goals

We are pleased to announce the MBZUAI Workshop 2025 on "Foundations and Advances in Generative AI: Theory and Methods", organized by the Machine Learning Department of Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) in collaboration with the MBZUAI France Lab. Following the first event, successfully held in Abu Dhabi, this workshop aims to foster collaboration and accelerate progress in the machine learning aspects of large language models (LLMs), multimodal model research, and the foundations of Generative AI.

Scheduled for February 12–13, the program includes:

  • Invited Talks: Two days of presentations by leading experts on cutting-edge machine learning and generative AI advancements.
  • Poster Session: A platform for researchers to showcase their work.

Join us for insightful discussions and networking opportunities in this rapidly evolving field!

Registration

To participate in the workshop, please complete the registration form. If you intend to present a poster, please also fill out the relevant poster fields before February 3, 2025, so that we can review your submission.


Areas of Focus

We invite contributions on topics broadly related to machine learning for large models. Key areas of focus will include:

  • Development of novel ML architectures for large model training and inference.
  • Addressing bias and fairness in large model training data and output.
  • Techniques for interpretability and explainability of large model behavior.
  • Representation learning for large multimodal models.
  • Mitigating risks and ensuring safety in large model development.
  • Scaling and efficiency considerations for large-scale model training.
  • Application domains including bio/medical and others.

Organizers

Organizing committee:

Logistics support:


Directions

GitHub Page


Invited Speakers

 

Program on Wednesday, February 12

9:00 AM
Registration and Coffee & Tea!
9:30 AM
Opening Remarks
Eric Xing (MBZUAI & Carnegie Mellon University)
10:10 AM
Statistical Methods for Assessing the Factual Accuracy of Large Language Models
Emmanuel Candès (Stanford University)

We present new statistical methods for obtaining validity guarantees on the output of large language models (LLMs). These methods enhance conformal prediction techniques to filter out claims/remove hallucinations while providing a finite-sample guarantee on the error rate of what is being presented to the user. This error rate is adaptive in the sense that it depends on the prompt to preserve the utility of the output by not removing too many claims. We demonstrate performance on real-world examples. This is joint work with John Cherian and Isaac Gibbs.
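
As background on the conformal-filtering idea, the sketch below shows a simplified split-conformal filter that retains only claims scoring above a calibrated threshold, so that (under exchangeability) a new false claim is retained with probability at most alpha. This is an illustrative simplification, not the speakers' adaptive method, and the scoring setup is hypothetical.

```python
import numpy as np

def conformal_claim_filter(cal_scores, cal_labels, new_scores, alpha=0.1):
    """Simplified split-conformal filter for LLM claims (illustrative only).

    cal_scores : scores of calibration claims (higher = more likely factual)
    cal_labels : 1 if the calibration claim was verified as true, 0 otherwise
    new_scores : scores of the claims extracted from a new LLM answer
    Returns a boolean mask of claims to keep; under exchangeability, a new
    *false* claim is kept with probability at most alpha.
    """
    false_scores = np.sort(cal_scores[cal_labels == 0])
    n = len(false_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # Finite-sample conformal threshold: the k-th smallest false-claim score,
    # or +inf if the calibration set is too small for the requested alpha.
    threshold = false_scores[k - 1] if k <= n else np.inf
    return new_scores > threshold
```
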
10:50 AM
Coffee & Tea Break
11:00 AM
The ChatGLM's Road to AGI
Jie Tang (Tsinghua University)

Large language models have substantially advanced the state of the art in various AI tasks, such as natural language understanding, text generation, image processing, and multimodal modeling. In this talk, we will first review the development of AI over the past decades, in particular from the perspective of China. We will also discuss the opportunities, challenges, and risks of AGI in the future. In the second part of the talk, we will use ChatGLM, an open-source alternative to ChatGPT, as an example to explain the understandings and insights we derived while implementing the model.
11:40 AM
Exploiting Knowledge for Model-based Deep Music Generation
Gaël Richard (Télécom Paris)

We will describe and illustrate the concept of hybrid (or model-based) deep learning for music generation. This paradigm refers here to models that associate data-driven and model-based approaches in a joint framework by integrating our prior knowledge about the data into more controllable deep models. In the music domain, prior knowledge can relate, for instance, to the production or propagation of sound (using an acoustic or physical model) or to how music is composed or structured (using a musicological model). In this presentation, we will first illustrate the concept and potential of such model-based deep learning approaches and then describe in more detail their application to unsupervised music separation with source production models, music timbre transfer with diffusion, and symbolic music generation with transformers using structure-informed positional encoding.
12:20 PM
Auditing and Mitigating Biases in (compressed) Language Models
Julien Velcin (University of Lyon)

The size of language models plays a critical role in their ability to address complex NLP tasks. However, such large LMs can be hard to deploy on edge devices, which creates a need for compressing LLMs. Recent studies have shown that compressing pretrained models can significantly influence how they handle various biases, such as those related to fairness and model calibration. In this talk, I will provide an overview of recent research conducted at the ERIC Lab as part of the DIKé project. In particular, we will see how aggressive quantization can lead to calibration errors and alter the model's confidence in its predictions. Additionally, I will discuss ongoing work on the alignment of LLMs with moral values.
1:00 PM
Lunch
2:00 PM
Intricacies of Game-theoretical LLM Alignment
Michal Valko (INRIA & Stealth Startup)

Ensuring alignment of language models' outputs with human preferences is critical to guarantee a useful, safe, and pleasant user experience. Thus, human alignment has been extensively studied recently, and several methods such as Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimisation (DPO) and Sequence Likelihood Calibration (SLiC) have emerged. In this paper, our contribution is two-fold. First, we show the equivalence between two recent alignment methods, namely Identity Policy Optimisation (IPO) and Nash Mirror Descent (Nash-MD). Second, we introduce a generalisation of IPO, named IPO-MD, that leverages the regularised sampling approach proposed by Nash-MD. This equivalence may seem surprising at first sight, since IPO is an offline method whereas Nash-MD is an online method using a preference model. However, this equivalence can be proven when we consider the online version of IPO, that is, when both generations are sampled by the online policy and annotated by a trained preference model. Optimising the IPO loss with such a stream of data becomes equivalent to finding the Nash equilibrium of the preference model through self-play. Building on this equivalence, we introduce the IPO-MD algorithm, which generates data with a mixture policy (between the online and reference policy), similarly to the general Nash-MD algorithm. We compare online-IPO and IPO-MD to different online versions of existing losses on preference data, such as DPO and SLiC, on a summarisation task.
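
For readers unfamiliar with the IPO objective referenced above, here is a minimal sketch of its standard offline form, assuming summed token log-probabilities have already been computed; tensor names and the default tau are illustrative, and the online/Nash-MD variants discussed in the talk build on this loss.

```python
import torch

def ipo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, tau=0.1):
    """Offline IPO objective for a batch of preference pairs (illustrative).

    Each argument is a 1-D tensor of summed token log-probabilities of the
    chosen / rejected completion under the trained policy or the frozen
    reference policy; tau is the regularization strength.
    """
    # Gap between the log-likelihood ratios of the chosen and rejected completions.
    h = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # IPO regresses this gap towards 1/(2*tau) with a squared loss, rather than
    # pushing it to infinity as a logistic (DPO-style) loss would.
    return ((h - 1.0 / (2.0 * tau)) ** 2).mean()
```
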
2:40 PM
Moshi: A Speech-text Foundation Model for Real-time Dialogue
Alexandre Défossez (Kyutai)

We will discuss Moshi, our recently released model. Moshi is capable of full-duplex dialogue, i.e. it can both speak and listen at any time, offering the most natural speech interaction to date. Moshi is also multimodal; in particular, it is able to leverage its inner text monologue to improve the quality of its generation. We will cover the design choices behind Moshi, in particular the efficient joint sequence modeling permitted by the RQ-Transformer, and the use of large-scale synthetic instruct data.
3:20 PM
Coffee & Tea Break
3:30 PM
Feature-Conditioned Graph Generation using Latent Diffusion Models
Giannis Nikolentzos (University of Peloponnese)

Graph generation has emerged as a crucial task in machine learning, with significant challenges in generating graphs that accurately reflect specific properties. In this talk, I will present Neural Graph Generator, our recently released model which utilizes conditioned latent diffusion models for graph generation. The model employs a variational graph autoencoder for graph compression and a diffusion process in the latent vector space, guided by vectors summarizing graph statistics. Overall, this work represents a shift in graph generation methodologies, offering a more practical and efficient solution for generating diverse graphs with specific characteristics.
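
To make the conditioning mechanism concrete, here is a toy sketch (not the released Neural Graph Generator code) of a denoiser that operates in the latent space produced by a graph autoencoder and is guided by a vector of graph statistics; all dimensions and names are illustrative.

```python
import torch
import torch.nn as nn

class ConditionedLatentDenoiser(nn.Module):
    """Toy denoiser in a graph latent space, conditioned on a vector of graph
    statistics (e.g. number of nodes, density, clustering coefficient)."""

    def __init__(self, latent_dim=32, stats_dim=8, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + stats_dim + 1, hidden),  # +1 for the timestep
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, latent_dim),  # predicts the noise to remove
        )

    def forward(self, z_t, t, stats):
        # Concatenate the noisy latent, the normalized timestep, and the
        # conditioning statistics, then predict the noise component.
        inp = torch.cat([z_t, t.unsqueeze(-1), stats], dim=-1)
        return self.net(inp)
```
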
4:10 PM
Redefining AI Reasoning: From Self-Guided Exploration to Causal Loops, and Transformer-GNN Fusion
Martin Takáč (MBZUAI)

In this talk, we explore three intertwined directions that collectively redefine how AI systems reason about complex tasks. First, we introduce Self-Guided Exploration (SGE), a prompting strategy that enables Large Language Models (LLMs) to autonomously generate multiple “thought trajectories” for solving combinatorial problems. Through iterative decomposition and refinement, SGE delivers significant performance gains on NP-hard tasks—showcasing LLMs’ untapped potential in reasoning, logistics and resource management problems. Next, we delve into the Self-Referencing Causal Cycle (ReCall), a mechanism that sheds new light on LLMs’ ability to recall prior context from future tokens. Contrary to the common belief that unidirectional token generation fundamentally restricts memory, ReCall illustrates how “cycle tokens” create loops in the training data, enabling models to overcome the notorious “reversal curse.” Finally, we present a Transformer-GNN fusion architecture that addresses Transformers’ limitations in processing graph-structured data.
6:00 PM
Poster Session with Buffet at MBZUAI France Lab
To present a poster, please fill out the Google form for review.

Workshop participants are invited to join the poster session at MBZUAI France Lab.
Address: 42 Rue Notre Dame des Victoires, 75002 Paris

Program on Thursday, February 13

9:00 AM
Registration and Coffee & Tea!
9:30 AM
From Diffusion Models to Schrödinger Bridges
Valentin De Bortoli (Google DeepMind, on leave from CNRS)

Diffusion models have revolutionized generative modeling. Conceptually, these methods define a transport mechanism from a noise distribution to a data distribution. Recent advancements have extended this framework to define transport maps between arbitrary distributions, significantly expanding the potential for unpaired data translation. However, existing methods often fail to approximate optimal transport maps, which are theoretically known to possess advantageous properties. In this talk, we will show how one can modify current methodologies to compute Schrödinger bridges—an entropy-regularized variant of dynamic optimal transport. We will demonstrate this methodology on a variety of unpaired data translation tasks.
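
For reference, the Schrödinger bridge named in the title can be written as the following entropy-regularized problem over path measures; the notation is generic and not taken from the talk.

```latex
% Dynamic Schrödinger bridge: the path measure closest (in KL) to a
% reference diffusion Q, subject to prescribed start and end marginals.
\[
  \mathbb{P}^{\star}
  \;=\;
  \operatorname*{arg\,min}_{\mathbb{P}\,:\;\mathbb{P}_0 = \pi_0,\ \mathbb{P}_T = \pi_1}
  \mathrm{KL}\!\left(\mathbb{P} \,\middle\|\, \mathbb{Q}\right)
\]
% where pi_0 and pi_1 are the two distributions to translate between and Q is
% a reference (e.g. noising) process; the entropic regularization comes from
% measuring closeness to Q in KL.
```
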
10:10 AM
Multi-modal Foundation Models for Biology
Thomas Pierrot (InstaDeep)

The human genome sequence provides the underlying code for human biology. Since the sequencing of the human genome 20 years ago, a main challenge in genomics has been the prediction of molecular phenotypes from DNA sequences alone. Models that can “read” the genome of each individual and predict the different regulatory layers and cellular processes hold the promise to better understand, prevent, and treat diseases. Here, we introduce the Nucleotide Transformer (NT), an initiative to build robust and general DNA foundation models that learn the languages of genomic sequences and molecular phenotypes. NT models, ranging from 100M to 2.5B parameters, learn transferable, context-specific representations of nucleotide sequences and can be fine-tuned at low cost to solve a variety of genomics applications. In this talk, we will share insights about how to construct robust foundation models that encode genomic sequences and how to validate them. We will also present recent advances from our group, including a study of the performance of such models on protein tasks as well as our ongoing progress towards more general genomics AI agents that integrate different modalities and have improved transfer capabilities. The training and application of such foundation models in genomics can provide a widely applicable stepping stone towards accurate prediction from DNA sequence alone and a step towards building a virtual cell.
10:50 AM
Coffee & Tea Break
11:00 AM
Towards the Alignment of Geometric and Text Latent Spaces
Maks Ovsjanikov (Google DeepMind & École Polytechnique)

Recent works have shown that, when trained at scale, uni-modal 2D vision and text encoders converge to learned features that share remarkable structural properties, despite arising from different representations. However, the role of 3D encoders with respect to other modalities remains unexplored. Furthermore, existing 3D foundation models that leverage large datasets are typically trained with explicit alignment objectives with respect to frozen encoders from other representations. In this talk, I will discuss some results on the alignment of representations obtained from uni-modal 3D encoders compared to text-based feature spaces. Specifically, I will show that it is possible to extract subspaces of the learned feature spaces that have common structure between geometry and text. This alignment also leads to improvements in downstream tasks, such as zero-shot retrieval. Overall, this work helps to highlight both the shared and unique properties of 3D data compared to other representations.
11:40 AM
A Primer on Physics-informed Machine Learning
Gérard Biau (Sorbonne University)

Physics-informed machine learning typically integrates physical priors into the learning process by minimizing a loss function that includes both a data-driven term and a partial differential equation (PDE) regularization. Building on the formulation of the problem as a kernel regression task, we use Fourier methods to approximate the associated kernel, and propose a tractable estimator that minimizes the physics-informed risk function. We refer to this approach as physics-informed kernel learning (PIKL). This framework provides theoretical guarantees, enabling the quantification of the physical prior’s impact on convergence speed. We demonstrate the numerical performance of the PIKL estimator through simulations, both in the context of hybrid modeling and in solving PDEs. Additionally, we identify cases where PIKL surpasses traditional PDE solvers, particularly in scenarios with noisy boundary conditions. Joint work with Francis Bach (Inria, ENS), Claire Boyer (Université Paris-Saclay), and Nathan Doumèche (Sorbonne Université, EDF R&D).
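
As a concrete illustration of the kernel/Fourier viewpoint, below is a toy sketch of a physics-informed estimator built from random Fourier features for the 1D equation u''(x) = f(x); it is an illustrative simplification under the stated assumptions, not the PIKL estimator from the talk.

```python
import numpy as np

def pikl_fit_1d(x_data, y_data, f_rhs, n_feat=200, lam=1.0, ridge=1e-6, seed=0):
    """Toy physics-informed estimator with random Fourier features on [0, 1].

    The estimator u(x) = phi(x) @ theta minimizes
        mean (u(x_i) - y_i)^2  +  lam * mean (u''(z_j) - f(z_j))^2,
    a linear least-squares problem because both u and u'' are linear in theta
    for Fourier features.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=10.0, size=n_feat)        # random frequencies
    b = rng.uniform(0, 2 * np.pi, size=n_feat)     # random phases

    def phi(x):        # feature map
        return np.cos(np.outer(x, w) + b)

    def phi_dd(x):     # second derivative of the feature map (analytic)
        return -(w ** 2) * np.cos(np.outer(x, w) + b)

    z = np.linspace(0.0, 1.0, 100)                 # collocation points for the PDE term
    A = np.vstack([
        phi(x_data) / np.sqrt(len(x_data)),
        np.sqrt(lam) * phi_dd(z) / np.sqrt(len(z)),
    ])
    rhs = np.concatenate([
        y_data / np.sqrt(len(x_data)),
        np.sqrt(lam) * f_rhs(z) / np.sqrt(len(z)),
    ])
    # Ridge-regularized normal equations for the stacked least-squares system.
    theta = np.linalg.solve(A.T @ A + ridge * np.eye(n_feat), A.T @ rhs)
    return lambda x_new: phi(np.atleast_1d(x_new)) @ theta
```
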
12:20 PM
GFlowNets: A Novel Framework for Diverse Generation in Combinatorial and Continuous Spaces
Salem Lahlou (MBZUAI)

Generative Flow Networks offer a framework for sampling from reward-proportional distributions in combinatorial and continuous spaces. They provide an alternative to established methods such as MCMC that suffer from slow mixing in high-dimensional spaces. By leveraging flow conservation principles, GFlowNets enable exploration in scenarios where the diversity of solutions is crucial, differing from traditional reinforcement learning and generative models. The framework has shown practical utility in molecular design, protein structure prediction, and Bayesian network discovery, particularly when dealing with noisy reward landscapes where maintaining sample diversity is essential. Recent works have also explored GFlowNets as a mechanism for improving the systematic exploration capabilities of large language models. This talk will present the theoretical foundations of GFlowNets and discuss current research directions in expanding their applications.
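
Since the talk centers on the flow-conservation view, here is a minimal sketch of the widely used trajectory balance training objective for GFlowNets; tensor shapes and names are illustrative, and this is generic background rather than the speaker's implementation.

```python
import torch

def trajectory_balance_loss(log_Z, log_pf, log_pb, log_reward):
    """Trajectory balance objective for GFlowNet training (illustrative).

    log_Z      : learned scalar, log of the total flow / partition function
    log_pf     : (batch, T) forward-policy log-probs along each sampled trajectory
    log_pb     : (batch, T) backward-policy log-probs along the same trajectory
    log_reward : (batch,) log R(x) of the terminal object of each trajectory
    """
    # At optimality, Z * prod P_F = R(x) * prod P_B along every trajectory;
    # the loss is the squared violation of this identity in log space.
    residual = log_Z + log_pf.sum(dim=-1) - log_reward - log_pb.sum(dim=-1)
    return (residual ** 2).mean()
```
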
1:00 PM
Lunch
2:00 PM
What's not an Autoregressive LLM?
Lingpeng Kong (University of Hong Kong)

This talk explores alternatives to autoregressive Large Language Models (LLMs), with a particular focus on discrete diffusion models. The presentation covers recent advances in non-autoregressive approaches to text generation, reasoning, and planning tasks. Key developments discussed include Reparameterized Discrete Diffusion Models (RDMs), which show promising results in machine translation and error correction, and applications of discrete diffusion to complex reasoning tasks like countdown games, Sudoku, and chess. The talk also examines sequence-to-sequence text diffusion models, as well as the novel Diffusion of Thoughts (DoTs) framework for chain-of-thought reasoning. These non-autoregressive approaches demonstrate competitive performance while offering potential advantages in terms of parallel processing and flexible generation patterns compared to traditional autoregressive models.
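
To give a feel for the non-autoregressive training setup behind discrete diffusion language models, here is a toy sketch of an absorbing-state (masking) forward process and the corresponding denoising objective; it is generic background, with hypothetical model and optimizer interfaces, not the RDM or DoT implementation.

```python
import torch
import torch.nn.functional as F

def masked_diffusion_step(model, tokens, mask_id, optimizer):
    """One toy training step of an absorbing-state discrete diffusion model.

    model   : maps (batch, seq) token ids to (batch, seq, vocab) logits
    tokens  : clean token ids, shape (batch, seq)
    mask_id : id of the special [MASK] / absorbing token
    """
    batch, seq = tokens.shape
    # Sample a corruption level t per sequence and mask roughly that fraction
    # of positions, mimicking the forward (noising) process.
    t = torch.rand(batch, 1)
    corrupt = torch.rand(batch, seq) < t
    noisy = torch.where(corrupt, torch.full_like(tokens, mask_id), tokens)

    # The denoiser is trained to recover the original tokens at masked positions.
    logits = model(noisy)
    loss = F.cross_entropy(logits[corrupt], tokens[corrupt])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
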
2:40 PM
Causal Representation Learning and Generative AI
Kun Zhang (MBZUAI)

Causality is a fundamental notion in science, engineering, and even in machine learning. Uncovering the causal process behind observed data can naturally help answer 'why' and 'how' questions, inform optimal decisions, and achieve adaptive prediction. In many scenarios, observed variables (such as image pixels and questionnaire results) are often reflections of the underlying causal variables rather than being causal variables themselves. Causal representation learning aims to reveal the underlying hidden causal variables and their relations. In this talk, we show how the modularity property of causal systems makes it possible to recover the underlying causal representations from observational data with identifiability guarantees: under appropriate assumptions, the learned representations are consistent with the underlying causal process. We demonstrate how identifiable causal representation learning can naturally benefit generative AI, with image generation, image editing, and text generation as particular examples.
3:20 PM
Coffee & Tea Break
3:30 PM
Factuality Challenges in the Era of Large Language Models: Can we Keep LLMs Safe and Factual?
Preslav Nakov (MBZUAI)

We will discuss the risks, the challenges, and the opportunities that Large Language Models (LLMs) bring regarding factuality. We will then delve into our recent work on using LLMs for fact-checking, on detecting machine-generated text, and on fighting the ongoing misinformation pollution with LLMs. We will also discuss work on safeguarding LLMs, and the safety mechanisms we incorporated in Jais-chat, the world's best open Arabic-centric foundation and instruction-tuned LLM, based on our Do-Not-Answer dataset. Finally, we will present a number of LLM fact-checking tools recently developed at MBZUAI: (i) LM-Polygraph, a tool to predict an LLM's uncertainty in its output using cheap and fast uncertainty quantification techniques, (ii) Factcheck-Bench, a fine-grained evaluation benchmark and framework for fact-checking the output of LLMs, (iii) Loki, an open-source tool for fact-checking the output of LLMs, developed based on Factcheck-Bench and optimized for speed and quality, (iv) OpenFactCheck, a framework for fact-checking LLM output, for building customized fact-checking systems, and for benchmarking LLMs for factuality, and (v) LLM-DetectAIve, a tool for machine-generated text detection.
3:50 PM
Variational Diffusion Posterior Sampling with Midpoint Guidance
Yazid Janati (FX Conseil)

Diffusion models have recently shown considerable potential in solving Bayesian inverse problems when used as priors. However, sampling from the resulting denoising posterior distributions remains a challenge as it involves intractable terms. To tackle this issue, state-of-the-art approaches formulate the problem as that of sampling from a surrogate diffusion model targeting the posterior and decompose its scores into two terms: the prior score and an intractable guidance term. While the former is replaced by the pre-trained score of the considered diffusion model, the guidance term has to be estimated. In this paper, we propose a novel approach that utilises a decomposition of the transitions which, in contrast to previous methods, allows a trade-off between the complexity of the intractable guidance term and that of the prior transitions. We also show how the proposed algorithm can be extended to handle the sampling of arbitrary unnormalised densities. We validate the proposed approach through extensive experiments on linear and nonlinear inverse problems, including challenging cases with latent diffusion models as priors.
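
For context, guidance-based posterior samplers of the kind discussed above rely on the following decomposition of the noisy posterior score, in which only the guidance term is intractable and must be approximated; the notation is generic.

```latex
% The score of the noisy posterior splits into the pre-trained prior score
% and an intractable guidance term induced by the observation y.
\[
  \nabla_{x_t} \log p_t(x_t \mid y)
  \;=\;
  \nabla_{x_t} \log p_t(x_t)
  \;+\;
  \nabla_{x_t} \log p_t(y \mid x_t).
\]
```
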
4:10 PM
Demonstration-Regularized RL and RLHF
Daniil Tiapkin (École Polytechnique)

Incorporating expert demonstrations has empirically helped to improve the sample efficiency of reinforcement learning (RL). This paper quantifies theoretically to what extent this extra information, such as the supervised fine-tuning data in the reinforcement learning from human feedback (RLHF) pipeline, reduces RL's sample complexity. In particular, we study demonstration-regularized reinforcement learning, which leverages the expert demonstrations by KL-regularization towards a policy learned by behavior cloning. Our findings reveal that using N expert demonstrations enables the identification of an optimal policy at a sample complexity of order O(Poly(dim)/(ε^2 N)) in finite and linear MDPs, where ε is the target precision and dim is the problem dimensionality. Finally, we establish that demonstration-regularized methods are provably efficient for reinforcement learning from human feedback (RLHF). In this respect, we provide theoretical evidence showing the benefits of KL-regularization for RLHF in tabular and linear MDPs. Interestingly, we avoid pessimism injection by employing computationally feasible regularization to handle reward estimation uncertainty, thus setting our approach apart from prior works.
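
A hedged sketch of the kind of objective involved: behavior cloning on the N expert demonstrations yields a policy pi_BC, and the RL objective is regularized towards it in KL; the exact formulation analyzed in the talk may differ.

```latex
% KL-regularization towards a behavior-cloned policy pi_BC fitted on the
% N expert demonstrations; lambda > 0 controls the regularization strength.
\[
  \max_{\pi}\;
  \mathbb{E}_{\pi}\!\left[\sum_{t} r(s_t, a_t)\right]
  \;-\;
  \lambda\,\mathrm{KL}\!\left(\pi \,\middle\|\, \pi_{\mathrm{BC}}\right)
\]
```
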


Venue

Fondation François Sommer, 62 Rue des Archives

Paris, France
