Code-switching, defined as the mixing of languages in text and speech, is a worldwide phenomenon, and thus the ability of NLP systems to handle such input is essential. Despite the growing amount of research in this direction, there is still much ground to cover in terms of collecting corpora and advancing NLP systems. In this talk, I’ll present our work on Egyptian Arabic-English code-switching for two downstream tasks: automatic speech recognition (ASR) and machine translation (MT). I’ll first present ArzEn-ST, the speech translation corpus we have collected. Then, for ASR, I’ll present our work on comparing the performance of end-to-end and hybrid systems and on improving recognition by combining the strengths of both. For MT, I’ll present our work on handling code-switching data scarcity through data augmentation and word segmentation. Finally, I’ll discuss ASR evaluation challenges in the context of code-switching and present our work on benchmarking ASR evaluation metrics.
Injy Hamed is a research assistant at the CAMeL Lab at New York University Abu Dhabi and a PhD student at the Institute for Natural Language Processing at the University of Stuttgart. Her research focuses on code-switched natural language processing, where she has worked on corpus collection, speech recognition, and machine translation.