Many of us ask our phones or smart speakers to ring a friend, queue up a song, or direct us to the nearest gas station. But these handy virtual assistants are not equally effective across languages. If you speak to a virtual assistant in English, you will likely get the result you are looking for. In other languages, however, the error rate of automatic speech recognition increases dramatically, and for many languages it must improve substantially before it can provide reliable results to users.

Hanan Al Darmaki, assistant professor of natural language processing at MBZUAI, is interested in improving automatic speech recognition for what researchers call low-resource languages. These are languages for which there is a dearth of data that can be used to train the computational models foundational to accurate, automated speech recognition.
“...we want to be able to process language according to the way people speak.”

Hanan Al Darmaki, MBZUAI Assistant Professor of Natural Language Processing
“There are thousands of languages spoken by people that don’t have the kind of labeled data that exist for a language like English and can be used to train models,” Al Darmaki said. “But even when some of these resources do exist, the recordings of these languages often don’t correspond to what is available in text.”
This lack of correspondence makes trained models markedly less effective. There are additional complexities as well, some particular to languages like Arabic. The variety of dialects, for example, makes Arabic challenging for automatic speech recognition, and few written resources correspond to the many Arabic dialects that millions of people speak every day.
“We have text written in Modern Standard Arabic (MSA), of course, but no one talks like that, and we want to be able to process language according to the way people speak,” she said. “We can’t just throw the data we have for these languages at a neural network. To discover patterns in these languages and map them, we need to do a lot of things differently than we would with a language like English.”
To address this gap for low-resource languages, Al Darmaki has focused her recent research on a technique called unsupervised speech recognition. She co-authored a review article on the topic in the journal Speech Communication earlier this year.
Unsupervised speech recognition attempts to identify meaningful units in spoken language without corresponding text. One possible approach is to use a concept called word embedding, which has been successfully used to map words across languages without providing knowledge of those languages to the computational model.
With word embedding, words are converted into sequences of numbers called vectors. The models that produce these vectors examine large collections of text and learn from how often words appear near each other in sentences. Words that appear in similar linguistic contexts are represented by similar, but distinct, vectors.
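As a rough illustration of the idea, the sketch below builds count-based embeddings from a three-sentence toy corpus using a co-occurrence matrix and a truncated SVD. The corpus, window size, and vector dimension are invented for clarity and bear no relation to the scale of real systems, which train on billions of words.

```python
# A minimal sketch of count-based word embeddings, assuming a toy corpus.
import numpy as np

corpus = [
    "the chair is next to the table",
    "she put the book on the table",
    "he sat on the chair to read the book",
]

# Build the vocabulary.
tokens = [sentence.split() for sentence in corpus]
vocab = sorted({word for sentence in tokens for word in sentence})
index = {word: i for i, word in enumerate(vocab)}

# Count how often each pair of words co-occurs within a small window.
window = 2
counts = np.zeros((len(vocab), len(vocab)))
for sentence in tokens:
    for i, word in enumerate(sentence):
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if i != j:
                counts[index[word], index[sentence[j]]] += 1

# Reduce the sparse count matrix to dense vectors with a truncated SVD.
# Each row of `embeddings` is now the vector for one vocabulary word.
u, s, _ = np.linalg.svd(counts)
embeddings = u[:, :4] * s[:4]
print(embeddings[index["table"]])
```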
“Something that was noticed when you look at word embeddings and you calculate distances between words was that the distances relate to meaning,” Al Darmaki said. “The distance between the word for ‘table’ and the word for ‘chair,’ for example, shows that there is some kind of relationship between those words and the relationships are similar across languages.”
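To make the “table” and “chair” example concrete, the snippet below measures cosine similarity between three vectors. The numbers are hypothetical stand-ins for real embeddings, which have hundreds of dimensions, chosen so that the related pair points in similar directions.

```python
# A toy illustration of the "distance relates to meaning" observation.
# These vectors are made-up placeholders, not real embeddings.
import numpy as np

vectors = {
    "table":  np.array([0.9, 0.8, 0.1]),
    "chair":  np.array([0.8, 0.9, 0.2]),
    "guitar": np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(vectors["table"], vectors["chair"]))   # high: related
print(cosine_similarity(vectors["table"], vectors["guitar"]))  # low: unrelated
```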
The insight is that relationships between objects in the world are mirrored in how those objects are represented across languages. While word embedding has made it possible to map words across languages, it may also be used to improve automatic speech recognition without the vast pools of labeled data that this work has required in the past.
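One common way such cross-lingual mapping is done, shown here as a general sketch rather than Al Darmaki’s specific method, is orthogonal Procrustes alignment: given embeddings of a set of seed translation pairs in two languages, the rotation that best maps one space onto the other has a closed-form solution. The matrices below are random placeholders standing in for real embeddings.

```python
# A sketch of cross-lingual embedding alignment via orthogonal Procrustes.
# X and Y stand in for embeddings of seed translation pairs in two languages;
# here Y is constructed from X by a known rotation so recovery can be checked.
import numpy as np

rng = np.random.default_rng(0)
dim, pairs = 50, 200
X = rng.normal(size=(pairs, dim))                      # source-language vectors
true_rotation = np.linalg.qr(rng.normal(size=(dim, dim)))[0]
Y = X @ true_rotation                                  # target-language vectors

# Closed-form solution: the orthogonal W minimizing ||XW - Y|| is U @ Vh,
# where U, S, Vh is the SVD of X^T Y.
u, _, vh = np.linalg.svd(X.T @ Y)
W = u @ vh

# After alignment, source vectors mapped by W land on their targets.
print(np.allclose(X @ W, Y, atol=1e-6))                # True: rotation recovered
```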
As for the future of the field of natural language processing, Al Darmaki believes that it is headed in an interesting direction. “There are many philosophical implications that make it difficult to predict the future,” she said. “The progress in natural language processing has been crazy, even over just the past few years.”
And yet, even with the rapid pace at which the field develops, she is pragmatic about the impact researchers like herself can have: “AI is a practical field and we have to care about its applications.”
Al Darmaki’s research is focused on natural language processing (NLP) and automatic speech recognition (ASR) for low-resource languages. The methods she explores include unsupervised learning, transfer learning, and distant supervision to adapt NLP and ASR models for languages and dialects for which labeled data are scarce or nonexistent. This includes studying regularities in text and speech patterns to discover and map terms across languages or modalities, such as unsupervised dictionary induction, cross-lingual embeddings of speech and text, and unsupervised speech-to-text mapping.
Prior to joining MBZUAI, Al Darmaki was an assistant professor in the department of computer science and software engineering at UAE University (UAEU). While completing her Ph.D., she worked as a teaching assistant and lecturer at George Washington University and interned on research projects at Apple Inc. and Amazon Web Services. Before starting her Ph.D., she worked as a statistical analyst at the Statistics Center-Abu Dhabi (SCAD), and as a network engineer at Dubai Electricity and Water Authority.
Al Darmaki earned a Ph.D. in computer science from George Washington University. She holds a master of philosophy in computer speech, text, and internet technology (CSTIT) from the University of Cambridge, and a bachelor of science in computer engineering from the American University of Sharjah.