Latent Space Exploration for Safe and Trustworthy AI Models

Wednesday, August 21, 2024

Recent advances in the performance of deep neural network models have substantially broadened their adoption across a wide range of application areas. Given this widespread adoption, ensuring the safety and trustworthiness of AI models is more critical than ever. The standard way of assessing AI model performance is extrinsic evaluation on a set of downstream tasks. While effective for advancing the state of the art, such evaluations offer limited insight into how models learn and solve a task. In this talk, I advocate a deeper exploration of model internals, particularly the latent space, to fully test model capabilities, build better models, and increase trust in them. I will present a few use cases to support this stance. For instance, the trend in models' intrinsic dimensionality explains the robustness-generalization tradeoff during adversarial training, informing the design of robust and scalable adversarial training methods that do not compromise generalization. Moreover, studying the structure and representation of knowledge within the latent space is effective for evaluating the language comprehension capabilities of models and enables interpretation of their predictions.
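
The intrinsic dimensionality idea is easiest to picture with a concrete estimator. Below is a minimal sketch, not taken from the talk, that estimates the intrinsic dimension of a layer's activations with the TwoNN estimator (Facco et al., 2017); the `hidden_states` array, the choice of layer, and the synthetic data in the usage example are all assumptions made for illustration.

```python
# Minimal sketch (illustrative only): TwoNN intrinsic-dimension estimate
# of a model's latent representations.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def two_nn_intrinsic_dimension(hidden_states: np.ndarray) -> float:
    """Estimate intrinsic dimension from the ratio of each point's
    two nearest-neighbour distances, mu = r2 / r1 (Facco et al., 2017)."""
    # Three neighbours: the point itself (distance 0) plus its two nearest.
    nbrs = NearestNeighbors(n_neighbors=3).fit(hidden_states)
    dists, _ = nbrs.kneighbors(hidden_states)
    r1, r2 = dists[:, 1], dists[:, 2]
    mu = r2 / r1
    # Maximum-likelihood estimate: d = N / sum(log mu).
    return hidden_states.shape[0] / np.sum(np.log(mu))

# Usage sketch with stand-in activations; in practice, hidden_states would be
# extracted from a chosen layer before and after adversarial training.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(2000, 768))
print(f"Estimated intrinsic dimension: {two_nn_intrinsic_dimension(hidden_states):.1f}")
```

Tracking how such an estimate changes across layers and across training regimes is one way the latent-space trends discussed in the talk can be quantified.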

 

Post Talk Link:  Click Here

Passcode: y+d9v@h8

Speaker

Hassan Sajjad is an Associate Professor in the Faculty of Computer Science at Dalhousie University, Canada, and the director of the HyperMatrix lab. His research focuses on natural language processing (NLP) and safe and trustworthy AI, particularly text generation, robustness, generalization, alignment, interpretation, and explainability of NLP models. His work has been recognized at several prestigious venues, such as NeurIPS, ICLR, and ACL, and has been featured in prominent tech blogs, including MIT News. Dr. Sajjad regularly serves as an area chair and reviewer for various machine learning and computational linguistics conferences and journals. He served as tutorial chair at EMNLP 2023, gave a tutorial on interpretability at NAACL 2021, and co-organized the BlackboxNLP workshop (2020/21) and the shared task on MT Robustness (2019/20).
