A great part of the world’s knowledge is stored in long, structured documents such as scientific articles or legal texts. Language models (LMs) can help humans handle these documents, for example by answering questions about them or summarizing them. However, this potential has not been fully explored. First, the rich structure of documents, composed of sections, subsections, etc., is often ignored. Second, due to hallucination, users need to verify LM responses, which can be difficult for long documents. In this talk, I will present research that addresses these gaps. Regarding the use of document structure, we conducted probing experiments to assess how document structure is represented in long-document transformers. Our data shows that the models learn to represent document structure during pre-training, and that this representation can be enhanced via additional inputs. To improve verifiability, we investigated optimal approaches to attribution, i.e., the generation of responses with evidence from the document. We found that large models (e.g., GPT-4) are well able to cite from their inputs, while smaller models benefit from post-hoc evidence retrieval. The talk concludes with an outlook on promising directions for future work.
Post Talk Link: Click Here
Passcode: Umwvvr#1
Jan Buchmann is a fourth-year PhD student in the Ubiquitous Knowledge Processing (UKP) Lab at TU Darmstadt, supervised by Iryna Gurevych. He is interested in harnessing the power of NLP to simplify access to information stored in long documents such as scientific articles or legal texts. He has led projects and contributed research leading to publications at venues such as CL, EACL, CoLM, and EMNLP (forthcoming). Jan is part of the EU-funded Intertext project, which develops novel methods for handling evolving, interconnected texts, rather than treating them as static, isolated objects. Before joining the UKP Lab, Jan studied biochemistry and bioinformatics at the universities of Konstanz and Hamburg.