Structured World Models for Robots

Friday, June 07, 2024

Humans have an innate ability to construct detailed mental representations of the world from limited sensory data. These ‘world models’ are central to natural intelligence, allowing us to perceive, reason about, and act in the physical world. My research seeks to create ‘computational world models’: artificial intelligence techniques that enable robots to understand and operate in the world around them as effectively as humans do. Modern machine learning approaches have achieved impressive successes in domains such as text, images, and video, where abundant training data is readily available, but these successes have not translated to robotics. Building generally capable robotic systems presents unique challenges, including the scarcity of data and the need to adapt learning algorithms to a wide variety of embodiments, environments, and tasks of interest.

In my talk, I will present how my research contributes to the design of computational models for spatial, physical, and multimodal understanding. I will discuss differentiable computing approaches that have advanced the field of spatial perception, enabling an understanding of the structure of the 3D world, its constituent objects, and their semantic and physical properties from videos. I will also detail how my work connects advances in large image, language, and audio models with 3D scenes, enabling robots and computer vision systems to flexibly query these structured world models for a wide range of tasks. Finally, I will outline my vision for the future, in which structured world models and modern scaling-based approaches work in tandem to create versatile robot perception and planning algorithms with the potential to meet and ultimately surpass human-level capabilities.

 

Post Talk Link:  Click Here

Passcode: 09hDc@$i

Speaker

Krishna Murthy is a postdoc at MIT, working with Antonio Torralba and Josh Tenenbaum. He completed his PhD at Mila and the University of Montreal, advised by Liam Paull. His research focuses on building computational world models to help embodied agents perceive, reason about, and act in the physical world. He has led the organization of multiple workshops on themes spanning differentiable programming, physical reasoning, 3D vision and graphics, and ML research dissemination. His research has been recognized with graduate fellowship awards from NVIDIA and Google (2021); a best paper award from Robotics and Automation Letters (2019); and an induction to the RSS Pioneers cohort (2020).
