Abstract
In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition for out-of-distribution languages and time periods. Using a historical newspaper corpus in three languages as a test bed, we use prompts to extract possible named entities. Our results show that a naive approach to prompt-based zero-shot multilingual Named Entity Recognition is error-prone, but they also highlight the potential of such an approach for historical languages lacking labeled datasets. We also find that T0-like models can be probed to predict the publication date and language of a document, which could be very relevant for the study of historical texts.
Original language | English |
---|---|
Title of host publication | Challenges & Perspectives in Creating Large Language Models |
Subtitle of host publication | Proceedings of BigScience Episode #5 Workshop |
Editors | Angela Fan, Suzana Ilic, Thomas Wolf, Matthias Galle |
Place of Publication | Stroudsburg PA, USA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 75-83 |
Number of pages | 9 |
ISBN (Electronic) | 978-1-955917-26-1 |
Publication status | Published - 2022 |
Externally published | Yes |
Event | Challenges and Perspectives in Creating Large Language Models: BigScience Episode #5 Workshop, Virtual, Dublin, Ireland. Duration: 27 May 2022 → 27 May 2022. https://aclanthology.org/2022.bigscience-1.pdf |
Workshop
Workshop | Challenges and Perspectives in Creating Large Language Models |
---|---|
Country/Territory | Ireland |
City | Virtual, Dublin |
Period | 27/05/22 → 27/05/22 |
Other | This workshop is organized by the BigScience initiative and will also serve as the closing session of this year-long initiative, which aims to develop a multilingual large language model and brings together more than 1,000 researchers from more than 60 countries and 250 institutions and research labs. Its goal is to investigate the creation of a large-scale dataset and model from a very wide diversity of angles. |