In this talk, I will address the importance of multimodality (i.e., using more than one modality, such as video, audio, text, masks, and clinical data) for story-level recognition and generation. First, I will focus on story-level multimodal video understanding: audio, faces, and visual temporal structure come naturally with videos, and we can exploit them for free (FunnyNet-W and the Short Film Dataset). Then, I will show examples of visual generation from text and other modalities (ET, CAD, DynamicGuidance).
Vicky Kalogeiton has been an Assistant Professor at École Polytechnique since 2020. Before that, she was a research fellow at the University of Oxford, working with A. Zisserman. In 2017, she obtained her PhD from the University of Edinburgh and Inria Grenoble, advised by V. Ferrari and C. Schmid; part of her thesis won the best poster award from Grenoble Alpes University. She received her M.Sc. degree in Computer Science from DUTh, Greece, in 2013, where she was awarded the best master's thesis award. Since 2021, she has received several awards for projects she supervised, including a highlight at CVPR 2024, a student honorable mention award at ACCV 2022, and the best paper award at ICCV-W 2021, as well as grants including two MS Azure Academic gifts and an ANR JCJC award for junior researchers in France. Since 2021, she has served regularly as an Area Chair at major vision conferences (Outstanding Area Chair in 2022); before that, she served as a reviewer and was recognized six times as an outstanding reviewer. Her research interests focus on multimodal learning (visual data, text, audio), split into three axes: generative AI, video understanding, and multimodal medical applications.