Abstract
In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition for out-of-distribution languages and time periods. Using a historical newspaper corpus in three languages as a test bed, we use prompts to extract possible named entities. Our results show that a naive approach to prompt-based zero-shot multilingual Named Entity Recognition is error-prone, but they also highlight the potential of such an approach for historical languages lacking labeled datasets. Moreover, we find that T0-like models can be probed to predict the publication date and language of a document, which could be very relevant for the study of historical texts.
| Original language | English |
|---|---|
| Title of host publication | Challenges & Perspectives in Creating Large Language Models |
| Subtitle of host publication | Proceedings of BigScience Episode #5 Workshop |
| Editors | Angela Fan, Suzana Ilic, Thomas Wolf, Matthias Gallé |
| Place of Publication | Stroudsburg PA, USA |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 75-83 |
| Number of pages | 9 |
| ISBN (Electronic) | 978-1-955917-26-1 |
| Publication status | Published - 2022 |
| Externally published | Yes |
| Event | Challenges and Perspectives in Creating Large Language Models: BigScience Episode #5 Workshop, Virtual, Dublin, Ireland. Duration: 27 May 2022 → 27 May 2022. https://aclanthology.org/2022.bigscience-1.pdf |
Workshop
| Workshop | Challenges and Perspectives in Creating Large Language Models |
|---|---|
| Country/Territory | Ireland |
| City | Virtual, Dublin |
| Period | 27/05/22 → 27/05/22 |
| Other | This workshop is organized by the BigScience initiative and will also serve as the closing session of this one-year-long initiative aimed at developing a multilingual large language model, which is gathering 1,000+ researchers from more than 60 countries and 250 institutions and research labs. Its goal is to investigate the creation of a large-scale dataset and model from a very wide diversity of angles. |
Paper title: Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0