High level forensic voice comparison based on fused long-term fundamental frequency and word n-gram features

Michael Carne, Shunichi Ishihara, Yuko Kinoshita

    Research output: Contribution to journal › Conference article › peer-review

    Abstract

    Feature robustness is particularly important in forensic applications of speaker recognition, where there are often significant differences in the recording conditions between forensic samples. For this reason, high level features have previously been recommended for use in forensic systems, since they tend to be more robust to the acoustic variability introduced by recording conditions [1]. A drawback of high level features, though, is their poor performance relative to low-level cepstral features. We suggest, however, that it may be possible to improve the performance of high level feature systems by combining acoustic and idiolectal information, and that this may deliver a better trade-off between robustness, interpretability and discrimination performance. In this paper we evaluate a likelihood ratio (LR)-based forensic voice comparison (FVC) system that fuses two high level feature subsystems: word n-grams and long-term fundamental frequency (LTF0). Preliminary experiments demonstrate some promising performance gains. We also examine how the duration of speech data affects the proposed system.

    Original language: English
    Pages (from-to): 5293-5297
    Number of pages: 5
    Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
    Volume: 2022-September
    Publication status: Published - 2022
    Event: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of
    Duration: 18 Sept 2022 → 22 Sept 2022
