The power of propaganda and AI’s ability to fight it

Umar among first seven NLP graduates

Monday, June 26, 2023

Muhammad Umar’s research represents a significant contribution to the field of natural language processing (NLP) and to the detection of propaganda on social media platforms, particularly where low- and high-resource languages are mixed.

Globally, vast amounts of research and time are being devoted to languages other than English for preservation, education, and language models. Umar, a Class of 2023 graduate from Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), is contributing in his native language, Roman Urdu, among others.

He received his Master of Science in NLP at the Class of 2023 commencement ceremony at Abu Dhabi Energy Center on June 4, making him part of the first cohort of NLP graduates and the first in his family to receive a master’s degree.

“As an NLP researcher, I recognize the power of language in shaping opinions and influencing public discourse,” said the Pakistani national. “Propaganda is a pervasive tool used to manipulate public opinion, and it is a growing concern in the digital age, especially in bilingual communities where little to no work has been done to detect it.

“With the rise of the internet and social media platforms, the spread of propaganda has become widespread and can have a significant impact on individuals’ attitudes and behavior. Most propaganda detection work has been done on high-resource languages, such as English, leaving low-resource languages largely unexplored. Code-switching, which involves mixing multiple languages in the same text, is common in low-resource language communities and can make propaganda detection more challenging.

“In linguistics, code-switching refers to the practice of alternating between two or more languages or language varieties in a single conversation or text. In the context of my thesis, code-switched social media text specifically refers to social media text that uses a mixture of different languages, including English and Roman Urdu.”

Despite having already graduated, Umar is continuing his research and hopes to submit a paper on detecting propaganda techniques in code-switched text to the 2023 Empirical Methods in Natural Language Processing (EMNLP) conference, one of the primary high-impact conferences for NLP and artificial intelligence (AI) research.

His model can be extended to other underrepresented or low-resource languages. “This is one of the advantages of using deep learning models for NLP tasks,” Umar explains, “as they can be trained on large amounts of text data and then fine-tuned for specific tasks. I ran experiments using several pre-trained monolingual, multilingual, and cross-lingual language models, fine-tuning them on the annotated code-switched dataset I prepared and evaluating their performance. I found XLM-RoBERTa, fine-tuned on Roman Urdu, outperformed all other baseline NLP models on our task and dataset.”

Umar said he feels privileged to be part of the groundbreaking program, which is leading the way in the field of NLP. “Being a part of the first batch has given me the opportunity to work closely with world-class faculty members, cutting-edge technologies, and a diverse group of fellow students who share a passion for NLP,” he continued.

“NLP is a rapidly growing field with significant potential for both positive and negative impacts. On one hand, it has led to the development of sophisticated tools like ChatGPT that can help us with several research-based tasks. On the other hand, there are concerns about how NLP can be used to manipulate or deceive people, especially with the rise of deepfake technology. However, despite these challenges, I firmly believe that NLP has the potential to shape our future in a positive way.”

Relocating to Abu Dhabi in 2020, after receiving his bachelor’s in computer science from Lahore University of Management Sciences (LUMS), Pakistan, was the first time Umar had left his home country for higher studies. “My journey at MBZUAI was challenging at first, but in the end, it was incredibly rewarding,” Umar said. “I struggled initially to balance the demands of coursework with my ongoing research and personal life, but eventually learned to manage my time more effectively and prioritize my responsibilities. My experience at MBZUAI has been nothing short of transformative.

“As a student, I have had access to world-class resources and a supportive learning environment that has allowed me to develop my skills. The faculty and staff at MBZUAI are among the best in their fields, and their expertise and guidance have been invaluable to my growth as a researcher. I have had the opportunity to work on challenging projects, attend seminars and workshops, and collaborate with peers from diverse backgrounds, which has broadened my perspective and expanded my knowledge.

“My MBZUAI education has prepared me well for the demands of the industry; in particular, the research and programming skills I have acquired have equipped me with the ability to analyze complex problems, design innovative solutions, and implement them effectively.”

Umar undertook a voluntary internship as a data scientist at G42 Healthcare, the UAE’s leading company in AI and cloud computing. A total of 33 of the 59 Class of 2022 graduates undertook a voluntary internship last summer to gain invaluable industry experience. He is currently working as a data engineer at G42 Healthcare, where he is responsible for building and maintaining data pipelines that process and analyze large volumes of healthcare data, running quality assessments and checks to ensure the data’s accuracy, completeness, and consistency. Healthcare affects everyone, and NLP, together with machine learning, is increasingly being used to help clinicians devise the best primary treatment plans for patients.

Working with data is ultimately where Umar wants to end up. “Data is the foundation of modern technology and is generated at an incredible rate by the ever-increasing number of devices and systems we use,” he adds. “Data science and data engineering are fields that are dedicated to making sense of this vast amount of data, extracting valuable insights and knowledge to inform decision-making. What I find fascinating about data is its versatility and how it can be applied across a wide range of fields, from finance to healthcare, to education and social media.”
