Most natural language processing (NLP) systems are unaware of their users' preferred grammatical genders. Such systems typically generate a single output for a given input without considering any information about the user. Beyond being simply incorrect in many cases, such output patterns cause representational harms by propagating the social biases and inequalities of the world we live in. While these biases can be traced back to the NLP systems' training data, balancing and cleaning the training data cannot guarantee the correctness of a single output produced without accounting for user preferences. In contrast, user-aware NLP systems should be designed to produce outputs that are as gender-specific as the input information they have access to allows. In this talk, I will introduce the task of gender rewriting in multi-user contexts, with a focus on Arabic, a gender-marking, morphologically rich language. I will discuss the Arabic Parallel Gender Corpus we built and the various modeling approaches we developed to solve the task of gender rewriting.
Bashar is a computer science PhD student at New York University and a graduate research assistant at the CAMeL Lab at New York University Abu Dhabi. Bashar's research interests lie in the fields of natural language processing and deep learning. In particular, he is interested in controlled language generation tasks such as grammatical error correction, text simplification, and machine translation. Bashar received a Master of Science in Computer Science from the University of Southern California (USC), where he worked at the USC Information Sciences Institute on low-resource machine translation and event relation extraction. He also holds a Bachelor of Science in Computer Science and Mathematics from the University of Bridgeport.