TY - JOUR
T1 - Likelihood ratio-based forensic voice comparison with higher level features
T2 - research and reality
AU - Rose, Phil
N1 - Publisher Copyright:
© 2017 Elsevier Ltd
PY - 2017/9
Y1 - 2017/9
N2 - Examples are given of forensic voice comparison with higher-level features in real-world cases and research. A pilot experiment on the estimation of strength of evidence in forensic voice comparison is described; it explores the use of higher-level features extracted over a disyllabic word as a whole, rather than over individual monosyllables as is conventionally practiced. The trajectories of the first three formants and tonal F0 of the hexaphonic disyllabic Cantonese word daihyat ‘first’, taken from controlled but natural non-contemporaneous recordings of 23 male speakers, are modeled with polynomials, and multivariate likelihood ratios are estimated from their coefficients. Evaluation with the log-likelihood-ratio cost (Cllr) validity metric shows that, surprisingly, optimum performance is obtained with lower-order polynomials: a cubic fit for F2, and quadratic fits for F1 and F3. Fusion of F-pattern and tonal F0 results in considerable improvement over the individual features, reducing the Cllr to ca. 0.1. The forensic potential of the daihyat data is demonstrated by fusion with three other Cantonese higher-level features (the F-pattern of /i/, short-term F0, and syllabic nasal cepstral spectrum), which reduces the Cllr still further to 0.03. Important pros and cons of higher-level features and likelihood ratios are discussed, the latter illustrated with data from real forensic casework in Japanese and three varieties of English.
AB - Examples are given of forensic voice comparison with higher-level features in real-world cases and research. A pilot experiment on the estimation of strength of evidence in forensic voice comparison is described; it explores the use of higher-level features extracted over a disyllabic word as a whole, rather than over individual monosyllables as is conventionally practiced. The trajectories of the first three formants and tonal F0 of the hexaphonic disyllabic Cantonese word daihyat ‘first’, taken from controlled but natural non-contemporaneous recordings of 23 male speakers, are modeled with polynomials, and multivariate likelihood ratios are estimated from their coefficients. Evaluation with the log-likelihood-ratio cost (Cllr) validity metric shows that, surprisingly, optimum performance is obtained with lower-order polynomials: a cubic fit for F2, and quadratic fits for F1 and F3. Fusion of F-pattern and tonal F0 results in considerable improvement over the individual features, reducing the Cllr to ca. 0.1. The forensic potential of the daihyat data is demonstrated by fusion with three other Cantonese higher-level features (the F-pattern of /i/, short-term F0, and syllabic nasal cepstral spectrum), which reduces the Cllr still further to 0.03. Important pros and cons of higher-level features and likelihood ratios are discussed, the latter illustrated with data from real forensic casework in Japanese and three varieties of English.
KW - Cantonese
KW - F-pattern trajectories
KW - Forensic voice comparison
KW - Higher-level features
KW - Likelihood ratio
KW - Segmental cepstrum
KW - Short term F0
KW - Tonal F0 trajectory
UR - http://www.scopus.com/inward/record.url?scp=85018601194&partnerID=8YFLogxK
U2 - 10.1016/j.csl.2017.03.003
DO - 10.1016/j.csl.2017.03.003
M3 - Article
SN - 0885-2308
VL - 45
SP - 475
EP - 502
JO - Computer Speech and Language
JF - Computer Speech and Language
ER -