MBZUAI provides unique insights into the challenges facing artificial intelligence researchers in Southeast Asia

Wednesday, November 15, 2023

Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) faculty played a key role in the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL 2023) this week, including winning two major publication awards, publishing eight papers, contributing to the conference organization, delivering tutorials and workshops, and presenting a keynote talk to the world’s leading Natural Language Processing (NLP) experts, with a focus on the Asia Pacific region.

The first award-winning paper was NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages, co-authored by Alham Fikri Aji, Assistant Professor of NLP, and Fajri Koto, an MBZUAI postdoctoral research fellow. The paper received the Best Resource Award in recognition of its innovative approach to high-quality corpus creation for low-resource Indonesian languages. The authors found that for the 12 local Indonesian languages of interest, manual construction of high-quality data through engagement with speaker communities was far superior to scraping web resources in terms of the utility of the data for NLP tasks.

The second award-winning paper was ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting, co-authored by Muhammad Abdul-Mageed, Visiting Associate Professor of Natural Language Processing and Machine Learning, which was awarded an Area Chair award. The authors proposed a novel method for constructing bilingual dictionaries by prompting large language models, and also for improving the quality of bilingual dictionaries that have been constructed through other methods.

Other MBZUAI authors included Thamar Solorio, MBZUAI Professor of Natural Language Processing, who co-authored A Review of Datasets for Aspect-based Sentiment Analysis, and Timothy Baldwin, MBZUAI’s Acting Provost and Professor of NLP. Professor Baldwin published a total of three papers at the conference, including Uncertainty Estimation for Debiased Models: Does Fairness Hurt Reliability? and It’s not only What You Say, It’s also Who It’s Said to: Counterfactual Analysis of Interactive Behavior in the Courtroom. The second of these papers explores how generative language models can deepen our understanding of bias and fairness, specifically the role that demographic and socio-economic factors play in how individuals are addressed, using a case study of US courtroom proceedings.

Using thousands of documented cases spanning a 60-year period, the team quantified the politeness of Advocates' responses to questions from Justices and trained a large language model (LLM) on the data. After confirming that the LLM accurately replicated the levels of politeness and coordination observed in the real-world courtroom data, the authors used it to investigate a hypothetical scenario: what happens to the sociolinguistic dynamic of the discourse when a response is directed to a Justice other than the one who actually asked the question? The model showed that the dynamic changed substantially, providing scientific evidence for the incidence of conscious or unconscious bias, and paving the way for an expanded role for LLMs in the analysis of social interaction.

Alongside the presentation of papers, Professor Preslav Nakov, recently appointed Chair of NLP at MBZUAI, acted as Senior Area Chair for Semantics: Lexical, Sentence-level Semantics, and Textual Inference.

A world leader in fake news detection, offensive language detection, and biomedical text mining, Professor Nakov played an instrumental role in the recent development of Jais, the world’s most sophisticated Arabic-language large language model (LLM), which was open-sourced by the Abu Dhabi-based G42 company Inception in collaboration with the American artificial intelligence company Cerebras.

MBZUAI’s contribution to the conference also included a tutorial, Current Status of NLP in South East Asia with Insights from Multilingualism and Language Diversity, presented by an international team that included Assistant Professor Alham Fikri Aji.

The tutorial offered insights into the challenges the region poses to the development of NLP research. These include the intricate language practices in Southeast Asia, such as multilingualism and code-switching (the region is a melting pot of cultures, religions, and more than 1,000 languages), a scarcity of datasets and models for the training of AI, and comparatively limited access to language technology and computing resources.

Professor Baldwin also presented a keynote talk at the Third Workshop on NLP for Medical Conversations, titled Analysing Patient Journeys across Clinical Visits to Analyse Anti-Microbial Stewardship in Australian Veterinary Clinics.

Professor Baldwin’s talk explored how large-scale, fine-grained analysis of veterinary clinical records can aid in the monitoring of anti-microbial stewardship programs in Australia, focusing on patient journeys across multiple visits to a specific clinic for the treatment of specific ailments.

Reflecting MBZUAI’s role at the forefront of international AI research, the eight papers that were published at IJCNLP-AACL were the latest additions to more than 670 research papers published by university faculty, researchers and students at international conferences in 2023.
