Processing language like a human

Monday, August 29, 2022

Many of us ask our phones or smart speakers to ring a friend, queue up a song, or direct us to the nearest gas station. But these handy virtual assistants are not equally effective across languages. If you speak to a virtual assistant in English, you will likely get the result you are looking for. But the error rate of automatic speech recognition increases dramatically for other languages. And for many languages, automated speech recognition must be greatly improved before it can provide reliable results to users.

Hanan Al Darmaki, assistant professor of natural language processing at MBZUAI, is interested in improving automated speech recognition for what researchers call low-resource languages. These are languages for which there is a dearth of data that can be used to train the computational models that are foundational to accurate, automated speech recognition.

...we want to be able to process language according to the way people speak.

Hanan Al Darmaki
MBZUAI Assistant Professor of Natural Language Processing
Advances in the field of automated speech recognition, or ASR, over the past decade have been made possible by a technique called supervised learning. With supervised learning, large data sets of spoken language and corresponding transcripts, together known as labeled data, are fed into computational models called neural networks. When provided enough data, these neural networks learn associations between the spoken and written forms of a particular language, which results in accurate, automated speech recognition.
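For readers who want a concrete picture, here is a minimal sketch of that supervised setup in PyTorch (an illustration only, not any system described in this article): pairs of audio features and transcripts are fed to a small network, and a CTC loss ties audio frames to text. The dimensions, toy vocabulary, and random features are stand-ins for real labeled data.

```python
# A minimal sketch of supervised ASR training: labeled pairs of audio
# features and transcripts train a neural network via CTC loss.
import torch
import torch.nn as nn

vocab = ["<blank>", "a", "b", "c"]            # toy character vocabulary
model = nn.Sequential(                        # toy acoustic model
    nn.Linear(80, 256), nn.ReLU(),            # 80-dim audio features in
    nn.Linear(256, len(vocab)),               # per-frame character scores out
)
ctc = nn.CTCLoss(blank=0)                     # blank symbol at index 0

features = torch.randn(1, 100, 80)            # 1 utterance, 100 audio frames
transcript = torch.tensor([[1, 2, 2, 3]])     # "abbc" as label indices

log_probs = model(features).log_softmax(-1)   # (batch, time, vocab)
loss = ctc(
    log_probs.transpose(0, 1),                # CTC expects (time, batch, vocab)
    transcript,
    input_lengths=torch.tensor([100]),
    target_lengths=torch.tensor([4]),
)
loss.backward()                               # gradients drive the learning
```

With enough such pairs, the network learns which audio patterns correspond to which characters. Without the transcripts, this recipe has nothing to learn from, which is exactly the problem for low-resource languages.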

“There are thousands of languages spoken by people that don’t have the kind of labeled data that exist for a language like English and can be used to train models,” Al Darmaki said. “But even when some of these resources do exist, the recordings of these languages often don’t correspond to what is available in text.”

Modern Standard Arabic goes unsupervised

This lack of correspondence makes training models markedly less effective. And there are additional complexities, some particular to languages like Arabic. The variety of dialects, for example, makes Arabic challenging for automated speech recognition. In addition, few written resources correspond to the many Arabic dialects that millions of people speak every day.

“We have text written in Modern Standard Arabic (MSA), of course, but no one talks like that, and we want to be able to process language according to the way people speak,” she said. “We can’t just throw the data we have for these languages at a neural network. To discover patterns in these languages and map them, we need to do a lot of things differently than we would with a language like English.”

To address this gap for low-resource languages, Al Darmaki has focused her recent research on a technique called unsupervised speech recognition. She co-authored a review article on the topic in the journal Speech Communication earlier this year.

Mapping language with word embedding

Unsupervised speech recognition attempts to identify meaningful units in spoken language without corresponding text. One possible approach is to use a concept called word embedding, which has been successfully used to map words across languages without providing knowledge of those languages to the computational model.

With word embedding, words are translated to sequences of numbers, called vectors. The statistical models that do this translation look at large collections of data and conduct a statistical mapping that considers how often words appear next to each other in sentences. Words that appear in similar linguistic contexts are represented by similar but unique vectors.
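As a rough illustration of how such vectors arise (a toy sketch; production systems such as word2vec train on billions of tokens), one can count co-occurrences in a tiny corpus and factorize the count matrix, after which words that share contexts end up with similar vectors:

```python
# A toy sketch of context-based word embeddings: count how often words
# co-occur within a window, then factorize the counts into dense vectors.
import numpy as np

corpus = [
    "the chair is next to the table".split(),
    "she put the book on the table".split(),
    "he sat on the chair".split(),
]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

window = 2
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[idx[w], idx[sent[j]]] += 1

# Truncated SVD turns the rows of the count matrix into low-dimensional
# vectors; words appearing in similar contexts get similar vectors.
U, S, _ = np.linalg.svd(counts, full_matrices=False)
embeddings = U[:, :4] * S[:4]                 # 4-dimensional word vectors
```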

“Something that was noticed when you look at word embeddings and you calculate distances between words was that the distances relate to meaning,” Al Darmaki said. “The distance between the word for ‘table’ and the word for ‘chair,’ for example, shows that there is some kind of relationship between those words and the relationships are similar across languages.”

The insight is that relationships between objects in the world are mirrored in how those objects are represented across languages. And while word embedding has made it possible to map words across languages, it may also be used to improve automatic speech recognition without the vast pools of labeled data that this work has required in the past.
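The cross-lingual mapping described above can be sketched with orthogonal Procrustes alignment, one common technique for this task (the article does not name a specific method, and the random vectors below stand in for real embeddings): given a few seed translation pairs, learn a rotation that carries one language's embedding space onto the other's.

```python
# A sketch of cross-lingual embedding alignment via orthogonal Procrustes:
# learn a rotation W that maps source-language vectors onto their
# target-language counterparts, using a small seed dictionary.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))              # source vectors for seed words
W_true = np.linalg.qr(rng.standard_normal((4, 4)))[0]
Y = X @ W_true                                # target vectors, same words

# Procrustes solution: W = U @ Vh, where U, S, Vh = svd(X^T Y)
U, _, Vh = np.linalg.svd(X.T @ Y)
W = U @ Vh

# New source words can now be rotated into the target space and translated
# by nearest-neighbor search; here the rotation recovers the mapping exactly.
print(np.allclose(X @ W, Y))                  # True
```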

As for the future of the field of natural language processing, Al Darmaki believes that it is headed in an interesting direction. “There are many philosophical implications that make it difficult to predict the future,” she said. “The progress in natural language processing has been crazy, even over just the past few years.”

And yet, even with the rapid pace with which the field develops, she is pragmatic about the impact researchers like herself can have: “AI is a practical field and we have to care about its applications.”

About Hanan Al Darmaki

Al Darmaki’s research is focused on natural language processing (NLP) and automatic speech recognition (ASR) for low-resource languages. The methods she explores include unsupervised learning, transfer learning, and distant supervision to adapt NLP and ASR models for languages and dialects for which labeled data are scarce or nonexistent. This includes studying regularities in text and speech patterns to discover and map terms across languages or modalities, such as unsupervised dictionary induction, cross-lingual embeddings of speech and text, and unsupervised speech-to-text mapping.

Prior to joining MBZUAI, Al Darmaki was an assistant professor in the department of computer science and software engineering at UAE University (UAEU). While completing her Ph.D., she worked as a teaching assistant and lecturer at George Washington University and interned on research projects at Apple Inc. and Amazon Web Services. Before starting her Ph.D., she worked as a statistical analyst at the Statistics Center-Abu Dhabi (SCAD) and as a network engineer at Dubai Electricity and Water Authority.

Al Darmaki earned a Ph.D. in computer science from George Washington University. She holds a master of philosophy in computer speech, text, and internet technology (CSTIT) from the University of Cambridge, and a bachelor of science in computer engineering from the American University of Sharjah.
