I see what you’re saying: the Abu Dhabi AI researchers making video dubbing sync

Friday, December 31, 2021

Originally published by Wired Middle East on December 29, 2021

How a team of graduate students at the Mohamed Bin Zayed University of Artificial Intelligence is working to overcome the limitations of audio-visual dubbing technologies.

As anyone who has ever watched a dubbed movie will be aware, there is often a jarring disparity between the words we hear and the lip movements of the person supposedly delivering them. Now, a trio of young graduate students at the Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) is using AI—specifically, deep learning, natural language processing, and computer vision—to fix this problem once and for all.

They call their breakthrough product Auto-DUB. It draws on existing automated dubbing technologies, which commonly manipulate speech-to-speech patterns to generate more lifelike dubbing in moving images. The MBZUAI team diverges from the conventional approach both in the process it has devised and in the end user it aims to target—switching the focus from entertainment to education.

“In recent decades, online education has achieved significant growth, providing access to thousands of courses from top universities around the world,” says team member and machine-learning specialist Akbobek Abilkaiyrkyzy, a graduate of Kazakhstan’s Almaty University of Power Engineering and Telecommunications. “After the Covid outbreak, e-learning has become even more central to people’s lives. We need to ensure that everyone has access to this resource.”

To this end, the students have devised a three-step process that aims to provide a relatively seamless audio-visual transition from one language to another—first by generating subtitles in a language selected by the user, then generating an audio representation of these subtitles, and finally synching the audio with the onscreen speaker’s lip movements. The thinking is that learning a new subject is challenging enough without the distraction of subtitles, poorly dubbed audio, or—worst of all—mistranslations.
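
To make the three-step flow concrete, the sketch below shows how such a pipeline could be wired together in Python. It is an illustrative outline only, not the team’s code: the function names are assumptions, and each stage stands in for a real model (speech recognition plus machine translation for the subtitles, text-to-speech for the audio, and a lip-sync network for the final video).

# Illustrative outline of a three-step dubbing pipeline (not the team's implementation).
# Each stage function is a placeholder for a real model in a working system.
from dataclasses import dataclass


@dataclass
class SubtitleLine:
    start: float  # seconds from the start of the video
    end: float
    text: str     # translated text in the user-selected language


def generate_subtitles(video_path: str, target_lang: str) -> list[SubtitleLine]:
    """Step 1: transcribe the original speech and translate it,
    keeping timestamps so later stages stay aligned with the video."""
    raise NotImplementedError("placeholder for speech recognition + machine translation")


def synthesize_audio(subtitles: list[SubtitleLine]) -> bytes:
    """Step 2: turn the translated subtitles into speech,
    ideally preserving the original speaker's tone and style."""
    raise NotImplementedError("placeholder for text-to-speech")


def lip_sync(video_path: str, dubbed_audio: bytes) -> str:
    """Step 3: re-render the speaker's lip movements to match the
    dubbed audio and return the path of the dubbed video."""
    raise NotImplementedError("placeholder for a lip-sync model")


def auto_dub(video_path: str, target_lang: str) -> str:
    """Run the three stages end to end."""
    subtitles = generate_subtitles(video_path, target_lang)
    audio = synthesize_audio(subtitles)
    return lip_sync(video_path, audio)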

“Most educational videos are predominantly in English, which causes an obstruction for people who are not native to the language,” says computer-vision specialist Gokul Karthik, an MBZUAI researcher and graduate of India’s Thiagarajar College of Engineering. “Our product aims to bridge the language barrier using state-of-the-art machine learning techniques to create accurate translations, lip-sync the dubbed audio, and retain the style of the original speaker.”

Omani machine-learning specialist Ahmed Al-Mahrooqi, who attended North Carolina State University, believes that the MBZUAI team’s system will help overcome the practical issues content providers currently face. “While education platforms usually provide subtitles, these lead to a less desirable user experience,” he says. “But manually dubbing videos requires human expertise—a lot of time and a lot of money. Using AI models, natural language processors, and computer vision, we can dub videos instantly, cheaply, and naturally—preserving the speaker’s tone and style.”

This October, when the MBZUAI team was invited to present its project to an audience at Dubai’s GITEX exhibition, it was Karthik who stood up and explained the complex technology behind their ideas—a presentation brimming with application programming interfaces, speech recognition frameworks, deep neural networks, and sequence modeling systems. He allowed that the team is using existing translator engines in its prototype, but added that “we plan to do everything ourselves” in the final version.

This is heady stuff for a team of graduate students, but, as Al-Mahrooqi points out, they are not working in a vacuum. “Our university provides us with a unique collaborative environment, bringing together like-minded, talented individuals from over 40 countries,” he says of MBZUAI. “We also received mentorship from professors, who gave us expert advice on how to fine-tune the product and make it better and better.”

Al-Mahrooqi goes on to describe the team’s GITEX appearance as an “exciting” day for all of them. “We participated in a competition that involved solving societal problems using AI. We pitched our product in the final round and met industry professionals who encouraged us to take the idea further.”

As for where the idea will lead in the future, Al-Mahrooqi assures us that the team has this covered, too. “We need dubbing platforms that offer consumers the choice of human professionals, who may produce more creative options, or AI bots that offer cheaper and faster solutions,” he says. “This will create a sustainable ecosystem where developers as well as artists can benefit.”

There are, of course, economic opportunities involved in such work—as Al-Mahrooqi points out, e-learning had a market size of over $250 billion in 2020, and inroads into the entertainment industry could yield huge rewards—but the MBZUAI team has ambitions that extend far beyond financial benefit.

“We believe AI will grow within societies to drive prosperity and synergy. This university aims to bridge not only research and industry, but to apply AI to everyday solutions,” says Abilkaiyrkyzy. “We also believe that open source education has a huge potential to improve lives globally, but this goal is far-fetched if we don’t break down the language barriers.”
