Faculty win EACL 2023 outstanding paper

Thursday, May 04, 2023

MBZUAI faculty members Alham Fikri Aji and Timothy Baldwin, and postdoctoral fellow Fajri Koto, have been honored, alongside their co-authors, with an Outstanding Paper Award at the European Chapter of the Association for Computational Linguistics (EACL) 2023. Aji is an assistant professor of natural language processing, and Baldwin is acting provost and chair of the Department of Natural Language Processing at MBZUAI.

The pair were recognized alongside 12 co-authors from Bloomberg, HKUST, Universitas Indonesia, INACL, the University of Melbourne, Telkom University, Kanda University of International Studies, Kata.ai, University of Zurich, and Google Research for their paper “NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages.”

The award was announced during the EACL conference’s closing session today (May 4, 2023) in Dubrovnik, Croatia.

Indonesian language research

By some estimates, there are more than 7,000 languages in the world. In Indonesia alone, there are around 700, many of which are spoken by over a million people. But due in part to a lack of support for the vast majority of those languages, both for official purposes and in digital applications, many of them are dying out or are classified as endangered. The result is that many speakers of these languages have reduced access in their native language to high-quality content, banking, educational resources, e-government services, and more.

To help preserve these languages, researchers need resources such as benchmark datasets and lexicons. In their latest research, Aji et al. developed the first-ever parallel resource for 10 Indonesian low-resource languages. The resulting resource aims to help boost performance in sentiment analysis and machine translation.

The paper provides extensive analysis and describes the challenges of creating such resources. The authors hope their work will inspire both research on Indonesian languages specifically and research into underrepresented languages more broadly.

To download data from the research, go to: https://huggingface.co/datasets/indonlp/NusaX-MT and https://huggingface.co/datasets/indonlp/NusaX-senti
