Readers on the web often come across a term or cultural reference that is difficult to understand. People typically deal with this by opening a new browser tab and looking up the term in Google, typing it into a chatbot, or simply ignoring it and moving on without learning what it means.
Researchers from MBZUAI have developed a new tool that could potentially make it much easier for people to gain a deeper understanding of cultural words and phrases they aren’t familiar with. The researchers call their application Culturally Yours, a reading assistant designed to highlight and explain “cultural-specific items” on webpages.
In addition to helping individuals gain a more nuanced understanding of culture, Culturally Yours has the potential to help businesses reach broader audiences across diverse cultural markets, the researchers write in their study. The team presented a demonstration of the technology at the 31st International Conference on Computational Linguistics, which was held in Abu Dhabi in January.
One of the creators of Culturally Yours is Saurabh Kumar Pandey, a research associate at MBZUAI. Pandey is interested in developing technologies that can bridge linguistic and cultural gaps in understanding, particularly for languages and cultures that have traditionally been underrepresented on the web and in AI innovations like large language models. This is one of several recent projects Pandey has contributed to on the topic.
“The intent with this initiative was to develop a system that could identify spans in text that might be unfamiliar to people of different backgrounds,” he says. “At the same time, we wanted to help users understand what these unfamiliar sections of text mean.”
Sougata Saha and Monojit Choudhury of MBZUAI and Harshit Budhiraja, an AI researcher, are co-authors of the study.
A user’s interaction with Culturally Yours begins with providing demographic information so that the system can gain an initial sense of where gaps in cultural understanding might be. Doing so helps address what is known in machine learning as a “cold-start problem.” For example, when someone uses YouTube or Netflix for the first time, these systems typically ask simple questions about the user’s demographics that can help provide preliminary suggestions. Recommendations get better over time based on user behavior.
Similarly, Culturally Yours asks users for demographic information to gain an initial idea about what topics they may have a hard time understanding. While this data doesn’t provide a perfect view of someone’s knowledge, it serves as a “proxy” for cultural understanding, Pandey says.
Once a user provides demographic information, they can share the URL of a webpage to be analyzed. The system identifies culture-specific words and passages in the text, highlights them, and categorizes each one as either “unfamiliar” or “somewhat familiar.”
While this first categorization is based on a user’s demographic information, a user can also provide feedback to the system by selecting additional passages they don’t understand, removing highlighting on spans they do understand, or changing familiarity levels.
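The three kinds of feedback described above can be pictured as simple edits to a set of highlighted spans. The following is a minimal Python sketch for illustration only, not the researchers' implementation: the class name `HighlightedPage`, its method names, and the sample terms are all hypothetical, and only the two familiarity labels come from the description of the system.

```python
from dataclasses import dataclass, field

# The two familiarity labels named in the article.
FAMILIARITY_LEVELS = ("unfamiliar", "somewhat familiar")


@dataclass
class HighlightedPage:
    """Hypothetical container: maps each highlighted span to a familiarity level."""
    highlights: dict[str, str] = field(default_factory=dict)

    def add(self, span: str, level: str = "unfamiliar") -> None:
        """User selects an additional passage they do not understand."""
        if level not in FAMILIARITY_LEVELS:
            raise ValueError(f"unknown familiarity level: {level!r}")
        self.highlights[span] = level

    def remove(self, span: str) -> None:
        """User removes highlighting from a span they do understand."""
        self.highlights.pop(span, None)

    def change_level(self, span: str, level: str) -> None:
        """User changes the familiarity level of an existing highlight."""
        if span not in self.highlights:
            raise KeyError(span)
        self.add(span, level)


# Made-up starting point, standing in for the demographic-based first pass.
page = HighlightedPage({"electoral college": "unfamiliar", "luqaimat": "unfamiliar"})
page.remove("electoral college")                    # reader already knows this term
page.change_level("luqaimat", "somewhat familiar")  # reader has heard of it
page.add("swing state")                             # reader flags a new span
```

How the system then learns from such edits depends on the learning strategy in use.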
“We start with demographic information as a proxy to solve the cold start problem and build on that to get more personalized recommendations,” Pandey explains. “Every user is different. Two users of the same age, from the same state, who speak the same language will have different preferences.”
Culturally Yours uses an LLM to process text on webpages, identifying words and passages that may be unfamiliar to the user. The LLM also provides explanations of these words and phrases. The current iteration of the system uses OpenAI’s GPT-4o, but it can be configured with any closed- or open-source LLM.
To personalize system performance, Pandey and his colleagues employed three learning strategies that are designed to adapt to user behavior. One learning strategy, called free learning, processes the passages users select and deselect, modifying behavior based on these changes without interpreting why they were made. Another approach, called constrained learning, employs semantic proxies to interpret user behavior, incorporating modifications to refine personalization. A third strategy, called semi-constrained learning, mixes free and constrained learning. In this case, when a user makes a modification, the system updates only two of the proxies and processes the changes themselves in place of the two other proxies.
Pandey and his colleagues tested Culturally Yours in a small study. They chose articles written in English on two topics — elections in the U.S. and traditional food of the UAE. These topics provided a good mix of information that was both general and culturally specific.
Thirteen people from eight countries — India, Indonesia, China, Mexico, Sri Lanka, Egypt, Uzbekistan and Kazakhstan — participated in the study. Each used Culturally Yours on six stories, testing the three different learning strategies on both topics. After using the tool, testers were asked for their thoughts on how good the tool was at identifying and explaining cultural-specific information and about its ability to provide personalized results.
The researchers found that the constrained learning approach performed best but noted that a text’s topic likely influences what learning strategy is most effective. In addition, the participants in the study generally thought the tool was helpful.
“People were very keen to use this tool in their day-to-day lives and most users felt it became more personalized to their cultural understanding and preferences,” Pandey says. That said, Pandey and his colleagues note there is room to improve satisfaction levels.
One way Pandey and his colleagues are pushing ahead is by developing a browser extension that provides the same functionality as Culturally Yours without requiring users to copy and paste a URL into a separate web application. “The extension is an advancement from what we describe in the paper, and we thought it would be very convenient,” Pandey says. “With just one click, people will be able to get a definition of text they may not understand.”
In addition, Pandey notes that the benefits of a system like Culturally Yours extend beyond its primary application. The kind of cultural translation the system does could potentially be used to complement the functionality of other systems, like LLMs, and aid in translation of cultural concepts from one language to another. “If we are able to identify cultural-specific items on a large scale and translate them in a proper way, we may be able to use that information and build an LLM” capable of handling a wide variety of languages.