Thamar Solorio, senior director of graduate student affairs and professor of natural language processing at the Mohamed bin Zayed University of Artificial Intelligence, recently served as general chair of the annual conference on Empirical Methods in Natural Language Processing (EMNLP), one of the most important gatherings of scientists in the field of natural language processing (NLP).
Solorio was joined in Miami by many MBZUAI community members, who authored nearly 50 studies and demos presented at the meeting. A paper co-authored by Solorio and Monojit Choudhury, professor of natural language processing at MBZUAI, received an Outstanding Paper Award at the conference. Other MBZUAI studies examined the detection of machine-generated content, how large language models (LLMs) process empathy, and how LLMs represent cultures.
We spoke with Solorio to learn about her experience as general chair, what she believes are some of the most interesting trends in NLP, and what it was like to see such a strong contingent of scientists from MBZUAI present their findings at this important event.
MBZUAI: Tell us a bit about what EMNLP was like and some of the topics that were discussed.
Thamar Solorio: EMNLP was the largest gathering focused on NLP this year. We had more than 4,000 people registered for the conference and more than 3,000 attended in person, which was very exciting.
It was a huge honor to be invited to serve as general chair. At the same time, it was a huge responsibility to oversee the organization of this event. That said, I had a lot of help and worked closely with three program chairs who were wonderful.
As we were organizing the conference, the program chairs and I wanted to make sure we had both a very exciting technical program and keynotes that were engaging and motivating for attendees. We wanted attendees to feel excited about where the field of NLP is going and to understand that NLP is an important part of the AI revolution we are currently experiencing.
We highlighted the concepts and skills that are relevant today in NLP and invited people who could speak about those topics and about what we need to do to advance the field.
For example, we had a panel moderated by Monojit on the relevance of NLP in the era of AI, which featured a range of people who offered different perspectives on NLP. It included researchers who focus on traditional NLP tasks and problems, others whose expertise lies in solving core machine learning (ML) problems, and others still who sit between the two fields of NLP and ML. He did a wonderful job of getting the panelists engaged and discussing this important topic.
MBZUAI: What were some of the major themes that were discussed during EMNLP?
TS: One of the biggest themes was cultural awareness. Several papers looked at evaluating how well vision language models represent diversity. When you ask these models to generate an image of breakfast, what kind of image do they provide? Do they show eggs and bacon, which is a very U.S.-centric version of breakfast, or something else?
Other studies looked at a concept called transcreation, which describes translating content from one language to another while making sure that the emotion and sentiment are localized. This means the model doesn't only concentrate on preserving meaning, but also generates a translation that is relevant to the local language and culture. It highlights the challenges of adapting NLP models to diverse cultural contexts.
In the field right now, there’s a boom of interest in cultural diversity and cultural alignment. We want to make models that represent the entire world, not just some countries.
MBZUAI: Could you provide a bit more background about why cultural awareness in NLP is so important and why people are interested in it?
TS: It’s really a sign of our times. We have these models that are embedded into our day-to-day devices, and it has become much more critical to make sure that they are aligned with different cultural values.
It’s shocking and disturbing when someone can’t see themselves represented in the technologies they use daily.
And so it has become a major concern, and that concern is reflected in current research.
MBZUAI: What are some of the ways that the NLP community can continue to support and encourage this kind of research?
TS: One way is by creating specific tracks related to these topics at conferences. There could be a track on multilingual topics, another one on low-resource languages, another on machine translation and so on.
This can help to motivate the community and signal that these topics are important. Another is through the selection of special themes for conferences as a whole.
MBZUAI: Were there other interesting themes or topics you encountered?
TS: There were presentations highlighting efficient algorithms for training and inference, which is very relevant today because of the push for ever larger LLMs and the race to see who can train the biggest one.
Progress to date with LLMs has often been measured by size, by adding more and more data. Yet at some point, probably soon, we are going to run out of data to train these systems; they can ingest data faster than it can be generated. What does it mean when this happens, and how do we move forward from there?
Generating synthetic data can help, but what does it mean when models are trained with a majority of synthetic data? How would that bias the learning? How would that bias the outcomes?
It’s important for us, particularly in academia, to take a step back from this race and think about efficient learning. This is one of the things that I saw being discussed at the conference as well.
MBZUAI: What was it like to see your colleagues from MBZUAI at the conference?
TS: It was great to have such a large group from MBZUAI and we took some really nice pictures together!
Many of us traveled on the same plane from Abu Dhabi to Miami and we organized ourselves to wear blue T-shirts during the reception so that we could represent the university.
It was also a nice feeling to see so many members of the MBZUAI community present their research. And I saw many students, faculty and postdocs engaged in discussions with people they wouldn't typically get to meet outside a major academic conference like this one.
Our presence at EMNLP was strong, loud and clear.
MBZUAI: Is there anything else you’d like to add?
TS: I think that we succeeded in accomplishing the main goal of leaving the conference with renewed energy and motivation to stay engaged in the field of NLP and to continue to advance interesting and meaningful research.