Transformer Models: from Linguistic Probing to Outlier Weights

Thursday, November 16, 2023

Language models of all sizes have improved rapidly over the last few years. However, beyond performance on downstream tasks, it is hard to gauge the degree of linguistic knowledge they encode, and even more difficult to understand their inner workings. Through linguistic probing of language models such as BERT and RoBERTa, I investigate their ability to encode linguistic properties and find a link between this ability and the phenomenon of outliers, parameters within language models that show unexpected behavior. These findings help us understand some properties of the attention mechanism at the core of such models. Finally, I present early results from ongoing work on fine-tuning LLMs for Italian and on detecting the synthetic news they can generate.
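As a rough illustration of what linguistic probing involves, the sketch below trains a linear probe on frozen BERT representations. The model name, the toy tense-classification task, and the choice of layer are illustrative assumptions for this sketch, not the speaker's actual experimental setup.

```python
# A minimal linguistic-probing sketch: can a simple linear classifier
# recover a linguistic property from a frozen model's hidden states?
# (Illustrative example; task, model, and layer are assumptions.)
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

# Toy labeled data for a binary property (past vs. present tense).
sentences = ["She walked home.", "She walks home.",
             "They played chess.", "They play chess."]
labels = [1, 0, 1, 0]  # 1 = past tense, 0 = present tense

# Encode each sentence and keep the [CLS] hidden state from one layer;
# probing studies typically repeat this across all layers.
features = []
with torch.no_grad():
    for s in sentences:
        inputs = tokenizer(s, return_tensors="pt")
        outputs = model(**inputs, output_hidden_states=True)
        layer = outputs.hidden_states[8]       # an intermediate layer
        features.append(layer[0, 0].numpy())   # [CLS] representation

# If this linear probe succeeds, the property is linearly decodable
# from the frozen representations at that layer.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("probe accuracy:", probe.score(features, labels))
```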


Post Talk Link: Click Here

Passcode: 7+7SY+Gw

Speaker

Giovanni Puccetti is a post-doc at the Institute of Information Science and Technology (ISTI) of CNR in Pisa. He received his PhD in Data Science from Scuola Normale Superiore in Pisa. During his PhD he was a visiting student at the Center for Social Data Science at the University of Copenhagen and at RIKEN in Tokyo. His main scientific interests are the inspection and mechanistic understanding of Large Language Models, as well as the adaptation of such models to non-English languages, with a focus on Italian. On the applied side, he has worked on information extraction from patents, with a focus on Named Entity Recognition of technical entities.
