TY - JOUR
T1 - High level forensic voice comparison based on fused long-term fundamental frequency and word n-gram features
AU - Carne, Michael
AU - Ishihara, Shunichi
AU - Kinoshita, Yuko
N1 - Publisher Copyright: Copyright © 2022 ISCA.
PY - 2022
Y1 - 2022
N2 - Feature robustness is particularly important in forensic applications of speaker recognition, where there are often significant differences in recording conditions between forensic samples. For this reason, high-level features have previously been recommended for use in forensic systems, since they tend to be more robust to the acoustic variability introduced by recording conditions [1]. A drawback of high-level features, though, is their poor performance relative to low-level cepstral features. We suggest, however, that it may be possible to improve the performance of high-level feature systems by combining acoustic and idiolectal information, and that this may deliver a better trade-off with respect to robustness, interpretability and discrimination performance. In this paper we evaluate a likelihood ratio (LR)-based forensic voice comparison (FVC) system fusing two high-level feature subsystems: word n-grams and long-term fundamental frequency (LTF0). Preliminary experiments demonstrate some promising performance gains. We also examine how the duration of speech data affects the proposed system.
KW - forensic phonetics
KW - forensic voice comparison
KW - high level features
KW - likelihood ratios
KW - n-grams
UR - http://www.scopus.com/inward/record.url?scp=85140059362&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2022-11127
DO - 10.21437/Interspeech.2022-11127
M3 - Conference article
AN - SCOPUS:85140059362
SN - 2308-457X
VL - 2022-September
SP - 5293
EP - 5297
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022
Y2 - 18 September 2022 through 22 September 2022
ER -