Visual understanding of images and videos has come a long way from object and action recognition in simple scenes to event captioning and localization, visual question answering, language-based image manipulation, novel view synthesis and other tasks with impressive results on diverse and challenging visual data. Sometimes one even gets the impression that we are approaching the end of the journey. In this talk I will briefly overview the progress in video understanding with some examples of our work. In particular, I will present our recent work on video question answering and dense video captioning, as well as vision-language navigation and manipulation. I will then argue that vision is still in its infancy when it comes to the detailed understanding of the physical world, and will discuss open research directions related to robotics and video generation.
Ivan Laptev is a senior researcher at INRIA Paris, team leader of the Willow lab and the head of the scientific board at VisionLabs. He received a PhD degree in Computer Science from the Royal Institute of Technology in 2004 and a Habilitation degree from École Normale Supérieure in 2013. Ivan's main research interests include visual recognition of human actions, objects and interactions, and more recently robotics. He has published over 100 technical papers, most of which appeared in international journals and major peer-reviewed conferences of the field. He served as an associate editor of IJCV and TPAMI and as a program chair of CVPR 2018; he will be serving as a program chair of ICCV 2023 and ACCV 2024, and he is a regular area chair for CVPR, ICCV and ECCV. He has co-organized several tutorials, workshops and challenges at major computer vision conferences. He has also co-organized a series of INRIA summer schools on computer vision and machine learning (2010-2013) and Machines Can See summits (2017-2022). He received an ERC Starting Grant in 2012 and was awarded a Helmholtz Prize for significant impact on computer vision in 2017.