Culture and bias in LLMs: Defining the challenge and mitigating risks

Thursday, November 28, 2024

Over the past few years, many studies in the field of natural language processing (NLP) have considered how large language models (LLMs) represent cultures. It’s an important topic because people all over the world expect LLMs to provide safe and accurate information, no matter their culture or language.

Researchers from the Mohamed bin Zayed University of Artificial Intelligence, the University of Washington and other institutions recently presented studies that explore the complex ways in which culture is represented by LLMs. The research was shared at the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), held in Miami.

Untangling culture and language

Muhammad Farid Adilazuarda is a research assistant at MBZUAI interested in developing efficient multicultural and multilingual language models. He is co-author of a survey that analyzes dozens of recent studies on LLMs and culture and proposes a new framework that can be used to guide future research. In writing the survey, Adilazuarda and his colleagues were partly motivated by a desire to learn how researchers define the term 'culture' as it's used in NLP. They didn't expect to find that no one had done so yet.

Adilazuarda explains that there is no single, widely accepted definition of the concept of culture. And while NLP researchers have often turned to other disciplines, like sociology and anthropology, for guidance and insights about how to think about the concept, there is no consensus. Moreover, language and culture are so tightly intertwined that it is difficult to separate the two. This poses a challenge for researchers who work with LLMs, as they must interpret how models represent culture through the lens of language.

Adilazuarda and his colleagues began by identifying relevant papers presented at the annual meeting of the Association for Computational Linguistics (ACL), the largest scholarly organization in NLP, and at two other AI conferences, along with studies found on Google Scholar. This search gave them a selection of 90 studies.

The team read through the papers and labeled them according to the method researchers used to probe LLMs, the languages and cultures studied, and the definitions of culture the studies used.

As the researchers read through the papers, however, they realized that none explicitly defined the term 'culture.' Papers that offered some kind of definition gave one that was extremely broad and not necessarily helpful. No one in the field was talking about the same thing when they talked about culture.

Proxies of culture

The studies Adilazuarda and his colleagues analyzed evaluated the performance of LLMs on what are known as benchmark datasets. This technique is common in the field and allows researchers to compare the performance of models on the same dataset, providing some understanding of models' strengths and weaknesses.

While benchmarking provides valuable information about the performance of a model, insights derived from these kinds of tests are limited. They illustrate a model’s ability on the benchmark, and don’t necessarily represent how a model will behave when people use it in the real world.

After reviewing all the papers, Adilazuarda and his colleagues formulated the idea that while the studies don’t provide clear definitions of culture, the benchmark datasets in effect act as proxies for culture. The problem is that researchers across the discipline are using different datasets.

Moreover, cultures are extremely diverse and amorphous concepts, Adilazuarda explains, and they can’t be quantified so easily — if at all. Datasets are rather specific and typically only test one aspect of a model’s performance, say, as it relates to religion or gender. “A model’s ability to represent culture can’t be simply represented as a number based on its performance on benchmark datasets,” he says.

“The paper provides a framework for people to talk more meaningfully about culture,” adds Monojit Choudhury, professor of natural language processing at MBZUAI and co-author of the study. “Because culture is so multifaceted, this is an important starting point that encourages future directions.”

In the study, Adilazuarda and his colleagues make several recommendations to advance research on this topic. They encourage other researchers to be clear and specific about the aspects of culture that datasets measure. They ask for more research on topics like quality, time, kinship and a concept called 'aboutness', which relates to "topics and issues that are prioritized or deemed relevant within different cultures".

They also state that the field would benefit from more studies that are designed to be interpretable, as results from black-box models are highly variable depending on the prompt. Indeed, another study presented at EMNLP, authored by Adilazuarda, Choudhury and others, explores what is known as socio-demographic prompting, a technique that can be used on both open-source and black-box models and is designed to illuminate models' representations of culture. It is, however, subject to variability due to the way models process prompts.
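To make the idea concrete, here is a minimal sketch of socio-demographic prompting: the same question is posed while varying only a persona description prepended to the prompt, and the responses are compared across personas. The personas, prompt template and query_model stub below are illustrative assumptions, not the setup used in the EMNLP study.

```python
# Minimal sketch of socio-demographic prompting: ask the same question while
# varying only the socio-demographic persona prepended to the prompt.
# The personas, template, and query_model() stub are illustrative assumptions,
# not the exact setup from the EMNLP paper.

PERSONAS = [
    "a 65-year-old retired teacher from rural Indonesia",
    "a 25-year-old software engineer from Berlin",
    None,  # baseline: no persona, for comparison
]

QUESTION = "Is it acceptable to decline a dinner invitation from a close relative?"

def build_prompt(question: str, persona: str | None) -> str:
    """Prepend a socio-demographic persona to the question, if one is given."""
    if persona is None:
        return question
    return f"Answer the following question as {persona}.\n\n{question}"

def query_model(prompt: str) -> str:
    """Placeholder for a call to an open-source or black-box LLM.

    Swap this stub for a real API or local-model call when experimenting.
    """
    return f"[model response to: {prompt[:60]}...]"

if __name__ == "__main__":
    for persona in PERSONAS:
        # Comparing responses across personas shows how strongly the output
        # shifts with the stated cultural or demographic identity.
        print(f"--- persona: {persona} ---")
        print(query_model(build_prompt(QUESTION, persona)))
```

Because the only thing that changes between runs is the persona, any systematic shift in the answers can be attributed to how the model represents that identity, which is what makes the technique useful for probing cultural representations.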

Finally, the team calls for more collaboration across disciplines, including with fields like human-computer interaction and anthropology.

Pradhyumna Lavania, Ashutosh Modi and Siddhant Singh of the Indian Institute of Technology Kanpur; Jacki O’Neill of Microsoft Research Africa; and Sagnik Mukherjee and Alham Fikri Aji of MBZUAI also contributed to the study.

Identifying cultural biases in LLMs

While Adilazuarda and his colleagues’ study highlights the theoretical challenges of defining and evaluating culture in LLMs, another study presented at EMNLP delves into the practical implications of these challenges and examines how cultural biases can manifest in real-world applications of AI, such as in hiring.

This study looks at how LLMs handle biases related to race as it is understood in the West and caste as it is understood in India. In the context of hiring, the researchers consider what they call covert harms and social threats, which describe text that is harmful but phrased in a way that might evade detection because it doesn’t use profanity.

The work was led by Preetam Prabhu Srikar Dammu and Hayoung Jung, both doctoral students at the University of Washington. Anjali Singh and Tanushree Mitra of the University of Washington and Choudhury also contributed to the study.

A significant amount of research has investigated how LLMs represent concepts related to race. Little work, however, has examined stereotypes produced by LLMs that relate to caste. “The narrative of bias and AI safety is very Western-centric,” Choudhury says. “While models have generally been aligned to make sure that they are not racist or sexist, there are many dimensions of bias.”

AI tools can be used to screen applicants and have the potential to play a significant role in who gets the job and who doesn’t. “If AI rejects a worthy candidate in the first or second round because of some bias, it’s essentially the same thing as not giving the person an opportunity at all,” Dammu adds.

And it’s not just a problem that we may encounter in some far-off future. “It’s already happening,” he notes.

AI in HR

The authors examined the performance of eight LLMs, including open-source and closed-source models, in the context of hiring employees. They generated what are known as conversation seed prompts, which included information about the race or caste identities of hypothetical job applicants. In the context of race, the identities of “white” and “black” were used; in the context of caste, they used “Brahmin” and “Dalit.”

The researchers randomly selected names that were “culturally indicative of different races and castes” and considered four occupations to which the person was applying: software developer, doctor, nurse and teacher, which were chosen “due to their varied societal perceptions and stereotypical associations along both race and class dimensions.”

For each combination of occupation, cultural concept and LLM, they generated 30 conversations, which resulted in 1,920 total conversations.
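The arithmetic works out as 8 models × 2 cultural concepts × 4 occupations × 30 conversations each, or 1,920 in total. The sketch below enumerates those conditions; the identity labels and occupations come from the study, while the model list and generate_conversation stub are illustrative placeholders.

```python
# Enumerating the experimental conditions described above: every combination of
# model, cultural concept, and occupation gets 30 generated conversations.
# Identity labels and occupations are taken from the paper; the model list and
# generate_conversation() stub are illustrative assumptions.
from itertools import product

MODELS = [f"model_{i}" for i in range(8)]  # 8 open- and closed-source LLMs
CONCEPTS = {
    "race": ["white", "black"],
    "caste": ["Brahmin", "Dalit"],
}
OCCUPATIONS = ["software developer", "doctor", "nurse", "teacher"]
CONVERSATIONS_PER_CONDITION = 30

def generate_conversation(model: str, concept: str, occupation: str) -> str:
    """Placeholder for seeding and collecting one model-generated conversation."""
    return f"[{model} | {concept} | {occupation}]"

conversations = [
    generate_conversation(model, concept, occupation)
    for model, concept, occupation, _ in product(
        MODELS, CONCEPTS, OCCUPATIONS, range(CONVERSATIONS_PER_CONDITION)
    )
]

# 8 models x 2 concepts x 4 occupations x 30 = 1,920 conversations in total.
print(len(conversations))  # -> 1920
```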

The team analyzed the responses using a new framework they developed called covert harms and social threats (CHAST). CHAST consists of seven metrics grounded in social science literature, including social identity threat theory and intergroup threat theory, and it takes into account various harms people may face based on their identity. They also developed and released a model that computes these metrics.
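A rough sketch of how such an evaluation pipeline might be structured appears below: each generated conversation is scored against a fixed set of harm metrics by a classifier. The metric names are placeholders for the seven CHAST metrics defined in the paper, and score_metric stands in for the authors’ released model rather than reproducing it.

```python
# Rough sketch of a CHAST-style scoring pipeline: every generated conversation
# is labeled against a fixed set of harm metrics by a classifier.
# The metric names below are placeholders for the seven metrics defined in the
# CHAST paper, and score_metric() stands in for the authors' released model.
from dataclasses import dataclass

CHAST_METRICS = [
    "metric_1",  # placeholder: the seven actual metric names are in the paper
    "metric_2",
]

@dataclass
class ConversationScore:
    conversation: str
    labels: dict[str, bool]  # metric name -> whether the harm is present

def score_metric(conversation: str, metric: str) -> bool:
    """Placeholder for the classifier that detects one covert harm or threat."""
    return False  # swap in a real model call when reproducing the evaluation

def evaluate(conversations: list[str]) -> list[ConversationScore]:
    """Score every conversation on every metric."""
    return [
        ConversationScore(
            conversation=c,
            labels={m: score_metric(c, m) for m in CHAST_METRICS},
        )
        for c in conversations
    ]
```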

Other studies on potential harms of LLMs have used approaches called adversarial attacks and jailbreaking, where people intentionally try to get models to produce harmful responses. Dammu and his colleagues, however, aimed to evaluate the models in their neutral settings by providing context and observing their responses.

They found that open-source models produced more harmful conversations related to caste. They also found that five of the eight models generated more harmful text for older occupations. “The results show more egregious views in the older professions like teacher and doctor, which were specifically reserved for higher castes, historically,” Dammu explains.

Dammu says that while it was interesting that the OpenAI models performed better than the open-source models, “We don’t know what is going on behind the OpenAI API, as we lack access to the actual models or their weights. There could be filters or additional layers in place, even if the base models themselves remain problematic.”

The authors note that the findings underscore the risks of deploying LLMs in real-world applications without safeguards, as they can reinforce societal inequities and stereotypes, particularly in the Global South. “Even in the default setting, LLMs could be harmful,” Dammu says. “Users need to know that even if you aren’t trying to jailbreak an LLM, it can create harmful content and when they generate something that’s questionable, it is often hard to detect because they don’t use explicitly harmful language.”

That said, awareness is the first step. “Once developers know about the problems, they can fix them if they want to,” Choudhury says.
