In this talk we’ll discuss how to build rich 3D representations of the environment that assist people and robots in performing tasks. We’ll first discuss how to build visual 3D maps of environments and use them for visual (re)localization, spatial data access, and navigation. We’ll cover recent methods based on geometry, on learning, and on combinations of the two. One of the questions we will consider is what is best learned and where explicit geometric concepts should be used. We’ll also discuss how to build rich 3D semantic representations that enable queries and interactions with the scene. Our approach enables open-vocabulary queries by leveraging foundation models. While these models are very powerful at recognizing arbitrary objects, some aspects are still missing to enable robotic interactions. We’ll also briefly cover some of our work on action recognition, which is key to building AI assistants and could also help robots learn from examples.
Marc Pollefeys is a Professor of Computer Science at ETH Zurich and the Director of the Microsoft Spatial AI Lab in Zurich, where he works with a team of scientists and engineers to develop advanced perception capabilities. He is a Fellow of IEEE, ACM, and AAIA, and a member of the Academia Europaea. He obtained his PhD from KU Leuven in 1999 and was a professor at UNC Chapel Hill before joining ETH Zurich. He is best known for his work in 3D computer vision, having been the first to develop a software pipeline that automatically turns photographs into 3D models, but he also works on robotics, graphics, and machine learning problems. Other noteworthy projects he has worked on include real-time 3D scanning with mobile devices, a real-time pipeline for 3D reconstruction of cities from vehicle-mounted cameras, camera-based self-driving cars, and the first fully autonomous vision-based drone. In recent years his academic research has focused on combining 3D reconstruction with semantic scene understanding.