MBZUAI team win industry computer vision award for best student paper

Friday, October 20, 2023

Dmitry Demidov, a Ph.D. student in the Computer Vision Department at MBZUAI, is part of a team of MBZUAI alumni and faculty who recently received the Best Student Paper Award from the International Conference on Computer Vision Theory and Applications (VISAPP) 2023.

Demidov’s co-authors on ‘Salient Mask-Guided Vision Transformer for Fine-Grained Classification’ were Ph.D. student Muhammad Hamza Sharif; Aliakbar Abdurahim, a researcher in machine learning and former graduate teaching assistant; Hisham Cholakkal, Assistant Professor of Computer Vision; and Fahad Khan, Deputy Department Chair of Computer Vision and Professor of Computer Vision at MBZUAI.

The paper proposes a simple yet effective approach to improving the accuracy of the standard Vision Transformer architecture in Fine-Grained Visual Classification (FGVC), a task that involves telling apart visually similar objects with a level of acuity that usually requires in-depth, specialist knowledge. Examples include recognizing the subtle variations between animal species of the same genus, or between specific types of cars or aircraft.

“Standard models usually require large numbers of high-resolution images to learn from and their performance will decline when there are fewer, lower-resolution images,” Demidov explained. “But even when that is the case, our model still out-performs popular Vision Transformer (ViT) architecture.”


Named Salient Mask-Guided Vision Transformer (SM-ViT), the team’s technique uses a salient mask to distinguish between objects in the foreground and background of an image, and then applies a ViT-like Salient Mask-Guided Encoder (SMGE) to focus on the specific, defining characteristics of an object rather than analysing the image in its entirety.

“Unlike most of the previous ViT-based works, we do not completely disregard the less recognizable image parts, but rather guide the attention scores towards the more beneficial salient patches,” the 24-year-old said.
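The idea of guiding attention rather than discarding background patches can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binary patch-level saliency mask and a simple additive bias applied to the attention scores before normalisation, and the names `guide_attention` and `boost` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def guide_attention(scores, salient_mask, boost=1.0):
    """Bias attention towards salient patches, then normalise.

    scores       -- raw attention scores, one per image patch
    salient_mask -- 1 for foreground (salient) patches, 0 for background
    boost        -- hypothetical guidance strength (an assumption, not from the paper)
    """
    if len(scores) != len(salient_mask):
        raise ValueError("scores and salient_mask must have the same length")
    # Background patches keep their original score; salient ones get a bonus.
    biased = [s + boost * m for s, m in zip(scores, salient_mask)]
    return softmax(biased)


# Four patches with equal raw scores; patches 1 and 2 are foreground.
raw_scores = [0.0, 0.0, 0.0, 0.0]
mask = [0, 1, 1, 0]

plain = softmax(raw_scores)
guided = guide_attention(raw_scores, mask, boost=1.0)
```

In this toy case the salient patches end up with a larger share of the attention than they had before, while the background patches keep a smaller but non-zero share, which mirrors the behaviour described in the quote.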

The team used three data sets to train their model, a process that took just over four hours, and then ran between 50 and 60 iterations, each of which involved the analysis of approximately 30,000 images, before it was capable of out-performing the standard ViT architecture. Demidov cites numerous real-world scenarios where this level of visual acuity would prove invaluable, such as in airports: “You could use cameras not just to identify different types of plane and where they are but to automatically calculate how much fuel they require,” he explained.

Demidov first became involved in pattern recognition research as an undergraduate student in computer engineering at Omsk State Technical University (OmSTU), where he joined a robotics lab working on a project that involved a simultaneous localization and mapping (SLAM) task. After graduating in 2020, he joined MBZUAI, where he obtained his master’s degree in computer vision in 2022.

“Back in 2014, when I was an undergraduate I used a more classical approach and tried to solve computer vision-related tasks without using machine learning, which took a lot of resources and time,” Demidov remembered. “And I realised at that moment that this was the future, and now something that once took me two weeks can be achieved within four or five hours.”

Demidov’s current research interests involve various types of image classification, but in keeping with MBZUAI’s focus on using AI to deliver solutions to real-world challenges, his principal projects use computer vision to improve visual classification when working with extremely limited data sets.

“This problem may seem minor at first glance but is an essentially crucial problem that requires redefining the conventional machine- and deep-learning algorithms and rules of thumb typically used in traditional image classification with a vast number of images,” Demidov said. “Improving the performance of existing solutions on this class of tasks may help in numerous applications, since most of the real projects in industry are usually extremely limited in terms of data.”

The 18th International Conference on Computer Vision Theory and Applications (VISAPP 2023) was held in Lisbon, Portugal, from February 19 to 21, 2023. VISAPP is part of VISIGRAPP, the International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications.
