Leveraging Pre-Trained Representations to Improve Access to Untranscribed Speech from Endangered Languages

Nay San*, Martijn Bartelds*, Mitchell Browne, Lily Clifford, Fiona Gibson, John Mansfield, David Nash, Jane Simpson, Myfany Turpin, Maria Vollmer, Sasha Wilmoth, Dan Jurafsky

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

    12 Citations (Scopus)

    Abstract

    Pre-trained speech representations like wav2vec 2.0 are a powerful tool for automatic speech recognition (ASR). Yet many endangered languages lack sufficient data for pre-training such models, or are predominantly oral vernaculars without a standardised writing system, precluding fine-tuning. Query-by-example spoken term detection (QbE-STD) offers an alternative for iteratively indexing untranscribed speech corpora by locating spoken query terms. Using data from 7 Australian Aboriginal languages and a regional variety of Dutch, all of which are endangered or vulnerable, we show that QbE-STD can be improved by leveraging representations developed for ASR (wav2vec 2.0: the English monolingual model and XLSR53 multilingual model). Surprisingly, the English model outperformed the multilingual model on 4 Australian language datasets, raising questions around how to optimally leverage self-supervised speech representations for QbE-STD. Nevertheless, we find that wav2vec 2.0 representations (either English or XLSR53) offer large improvements (56-86% relative) over state-of-the-art approaches on our endangered language datasets.
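    The abstract does not spell out how a spoken query is located in untranscribed audio. One common approach to QbE-STD is subsequence dynamic time warping (DTW) over frame-level feature vectors, sketched below. This is a minimal illustration, not the authors' implementation: it assumes the features (for example, from a wav2vec 2.0 layer) have already been extracted into NumPy arrays of shape (frames, dimensions), and the function names `cosine_distance_matrix` and `subsequence_dtw` are hypothetical.

```python
import numpy as np


def cosine_distance_matrix(query, doc):
    """Pairwise cosine distances: rows are query frames, columns are document frames."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    d = doc / np.linalg.norm(doc, axis=1, keepdims=True)
    return 1.0 - q @ d.T


def subsequence_dtw(query, doc):
    """Find the document region that best matches the query.

    Returns (length-normalised cost, start frame, end frame), where the
    frame indices refer to rows of `doc` and are inclusive.
    """
    dist = cosine_distance_matrix(query, doc)
    n, m = dist.shape
    acc = np.full((n, m), np.inf)       # accumulated alignment cost
    start = np.zeros((n, m), dtype=int)  # document frame where each path began

    # The match may begin at any document frame, so the first row is free.
    acc[0, :] = dist[0, :]
    start[0, :] = np.arange(m)

    for i in range(1, n):
        # First column can only be reached by a vertical step.
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        start[i, 0] = start[i - 1, 0]
        for j in range(1, m):
            steps = ((i - 1, j), (i, j - 1), (i - 1, j - 1))
            costs = [acc[p] for p in steps]
            best = int(np.argmin(costs))
            acc[i, j] = costs[best] + dist[i, j]
            start[i, j] = start[steps[best]]

    # The match may also end at any document frame.
    end = int(np.argmin(acc[-1, :]))
    return float(acc[-1, end] / n), int(start[-1, end]), end
```

    A lower returned cost indicates a closer match; ranking the documents in a corpus by this cost for a given query is one way to surface candidate occurrences for a human to verify.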

    Original language: English
    Title of host publication: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 1094-1101
    Number of pages: 8
    ISBN (Electronic): 9781665437394
    Publication status: Published - 2021
    Event: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Cartagena, Colombia
    Duration: 13 Dec 2021 to 17 Dec 2021
