Addressing NLP problems in low-resource settings

Thursday, July 21, 2022

My research program has focused in large part on developing effective machine learning approaches for the automated processing of spontaneous human language. For example, I’ve devoted significant effort to enabling technology for code-switching data, where some of the unique research challenges arise from the heterogeneous nature of this linguistic phenomenon. In the first part of my talk, I will discuss our recent work on efficiently adapting multilingual transformers to code-switching data. The main contribution is an extension of the subword tokenization model that improves its robustness when modeling unseen tokens.
In the second part of my talk, I will present data augmentation as a way to improve domain adaptation in sequence labeling tasks. The target domain is social media data, another source of spontaneous language samples, and we specifically look at cross-domain settings. Lastly, I will conclude with an overview of other research projects I’m currently leading at the RiTUAL (Research in Text Understanding and Analysis of Language) lab. The common thread among all these research problems is the scarcity of labeled data. Join me for an overview of how I’ve addressed low-resource settings across different downstream tasks.

 

Post Talk Link:  Click Here

Passcode: TjRDW9?%

Speaker

Thamar Solorio is a Professor of Computer Science at the University of Houston (UH) and a visiting scientist at Bloomberg LP. She holds graduate degrees in Computer Science from the Instituto Nacional de Astrofísica, Óptica y Electrónica in Puebla, Mexico. Her research interests include information extraction from social media data, enabling technology for code-switched data, stylistic modeling of text, and, more recently, multimodal approaches for online content understanding. She is the founder and director of the RiTUAL Lab at UH. She is the recipient of an NSF CAREER award for her work on authorship attribution and of the 2014 Emerging Leader ABIE Award in Honor of Denice Denton. She is currently serving a second term as an elected board member of the North American Chapter of the Association for Computational Linguistics and was PC co-chair for NAACL 2019. She recently joined the team of Editors in Chief for the ACL Rolling Review (ARR) system. Her research is currently funded by the NSF and by Adobe.
