The Pacific expansion: Optimizing phonetic transcription of archival corpora

Rosey Billington, Hywel Stoakes, Nick Thieberger

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

    Abstract

    For most of the world’s languages, detailed phonetic analyses across different aspects of the sound system do not exist, due in part to limitations in available speech data and tools for efficiently processing such data for low-resource languages. Archival language documentation collections offer opportunities to extend the scope and scale of phonetic research on low-resource languages, and developments in methods for automatic recognition and alignment of speech facilitate the preparation of phonetic corpora based on these collections. We present a case study applying speech modelling and forced alignment methods to narrative data for Nafsan, an Oceanic language of central Vanuatu. We examine the accuracy of the forced-aligned phonetic labelling based on limited speech data used in the modelling process, and compare acoustic and durational measures of 17,851 vowel tokens for 11 speakers with previous experimental phonetic data for Nafsan. Results point to the suitability of archival data for large-scale studies of phonetic variation in low-resource languages, and also suggest that this approach can feasibly be used as a starting point in expanding to phonetic comparisons across closely-related Oceanic languages.

    Original language: English
    Title of host publication: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
    Publisher: International Speech Communication Association
    Pages: 1713-1717
    Number of pages: 5
    ISBN (Electronic): 9781713836902
    Publication status: Published - 2021
    Event: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
    Duration: 30 Aug 2021 to 3 Sept 2021

    Publication series

    Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
    Volume: 3
    ISSN (Print): 2308-457X
    ISSN (Electronic): 1990-9772

    Conference

    Conference: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
    Country/Territory: Czech Republic
    City: Brno
    Period: 30/08/21 to 3/09/21
