The biennial IEEE/CVF International Conference on Computer Vision (ICCV) was held in Paris, France this week and featured key contributions from MBZUAI faculty and staff.
Reflecting the exponential growth in artificial intelligence (AI) research and the field of computer vision (CV), the event was sponsored by Amazon, Facebook and Google and attracted a record 8,260 paper submissions, an increase of 34% over the previous conference in 2021.
Of these, 2,161 papers (26.2%) were accepted after the review, rebuttal, and discussion phases, and 152 papers (1.8%) were selected for oral presentation. Thirty of the accepted papers were submitted by researchers from MBZUAI. In addition, Ivan Laptev, the university’s recently appointed Visiting Professor of Computer Vision, served as one of the five ICCV Program Chairs.
Laptev recently joined MBZUAI, where he has opened a new research lab in CV, while on leave from the French National Institute for Research in Digital Science and Technology.
He is best known for his work on action recognition in video, but his recent research has explored the convergence of computer vision, natural language processing and robotics, addressing the problems of vision-language navigation and vision-language manipulation.
One study, led by Syed Talal Wasim, a researcher in computer vision affiliated with the Intelligent Visual Analytics Lab (IVAL) at MBZUAI, developed a new approach to analyzing action in videos called Video-FocalNets, which analyzes spatial and temporal information in moving images separately and at different scales.
Co-authors on the paper are Muhammad Uzair Khattak of MBZUAI, Muzammal Naseer of MBZUAI, Salman Khan of MBZUAI and Australian National University, Mubarak Shah of the University of Central Florida, and Fahad Shahbaz Khan of MBZUAI and Linköping University.
The other paper, written by Guangyi Chen, a postdoctoral research fellow at MBZUAI and Carnegie Mellon University, explored how insights gained from analyzing still images can be carried over to video using fewer resources, notably by adapting models pre-trained on image data to understand concepts in video.
The other authors of the research are Kun Zhang, associate professor of machine learning and director of the Center for Integrative Artificial Intelligence (CIAI) at MBZUAI; Xiao Liu of Eindhoven University of Technology; Guangrun Wang and Philip H.S. Torr of the University of Oxford; Xiao-Ping Zhang of Shenzhen International Graduate School, Tsinghua University and Toronto Metropolitan University; and Yansong Tang of Shenzhen International Graduate School, Tsinghua University.
The main program for this year’s ICCV included plenary lectures by Dr. Dorsa Sadigh, Assistant Professor in the Computer Science Department at Stanford University, who spoke about interactive learning in the era of large language models, and Dr. Pushmeet Kohli, Vice President of Research (AI for Science, Reliable and Responsible AI) at Google DeepMind, who gave a talk entitled ‘The potential of AI in advancing science and the importance of ensuring AI’s responsible use’.
The main program was accompanied by a doctoral consortium, 30 demonstrations, exhibits, and a number of co-located events, including 10 tutorials and 55 workshops. As the first ICCV to be held since COVID-19, the conference attracted over 6,700 in-person attendees.