Foundations of Multisensory Artificial Intelligence

Wednesday, January 24, 2024

Building multisensory AI systems that can learn from many sensory inputs, such as text, speech, audio, video, real-world sensors, wearable devices, and medical data, holds great promise for many scientific areas, with practical benefits such as supporting human health and well-being, enabling multimedia content processing, and enhancing real-world autonomous agents.

In this talk, I will discuss my research on the machine learning foundations of multisensory intelligence, as well as practical methods for building multisensory foundation models across many modalities and tasks. In the first half, I will present a new theoretical framework formalizing how modalities interact with each other to give rise to new information for a task. These interactions are the basic building blocks of all multimodal problems, and quantifying them enables users to understand their multimodal datasets and design principled approaches to learn these interactions. In the second part, I will present cross-modal attention and multimodal transformer architectures, which now underpin many of today’s multimodal foundation models. Finally, I will discuss our collaborative efforts in applying multisensory AI for real-world impact: (1) aiding mental health practitioners by predicting daily mood fluctuations in patients using multimodal smartphone data, (2) supporting doctors in cancer prognosis using pathology images and multiomics data, and (3) enabling robust control of physical robots using cameras and touch sensors.
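To make the cross-modal attention idea mentioned above concrete, here is a minimal sketch of scaled dot-product attention in which queries come from one modality and keys/values from another, so each element of the first modality attends over the second. This is an illustrative NumPy toy (function names, dimensions, and the text/image example are my own assumptions), not the speaker's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one modality
    (e.g. text tokens) and keys/values from another (e.g. image patches)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (n_q, n_kv) similarity scores
    weights = softmax(scores, axis=-1)       # each query's distribution over the other modality
    return weights @ values                  # (n_q, d_v) fused representation

rng = np.random.default_rng(0)
text = rng.standard_normal((4, 8))    # 4 "text tokens", dim 8
image = rng.standard_normal((6, 8))   # 6 "image patches", dim 8
fused = cross_modal_attention(text, image, image)
print(fused.shape)  # (4, 8): one vision-informed vector per text token
```

In a multimodal transformer, such cross-modal blocks are typically stacked and interleaved with self-attention, with learned projections producing the queries, keys, and values from each modality's features.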

Speaker

Paul Liang is a Ph.D. student in Machine Learning at CMU, advised by Louis-Philippe Morency and Ruslan Salakhutdinov. He studies the machine learning foundations of multisensory intelligence in order to design practical AI systems that can integrate, learn from, and interact with a diverse range of real-world sensory modalities. His work has been applied in affective computing, mental health, pathology, and robotics. He is a recipient of the Siebel Scholars Award, Waibel Presidential Fellowship, Facebook PhD Fellowship, Center for ML and Health Fellowship, Rising Stars in Data Science, and 3 best paper/honorable mention awards at ICMI and NeurIPS workshops. Outside of research, he loves teaching and advising, and received the Alan J. Perlis Graduate Student Teaching Award for instructing courses and tutorials on multimodal machine learning, and advising students around the world in directed research.
