TY - JOUR
T1 - Strength of linguistic text evidence
T2 - A fused forensic text comparison system
AU - Ishihara, Shunichi
N1 - Publisher Copyright:
© 2017 Elsevier B.V.
PY - 2017/9
Y1 - 2017/9
N2 - Compared to other forensic comparative sciences, studies of the efficacy of the likelihood ratio (LR) framework in forensic authorship analysis are lagging. An experiment is described concerning the estimation of strength of linguistic text evidence within that framework. The LRs were estimated by trialling three different procedures: one is based on the multivariate kernel density (MVKD) formula, with each group of messages being modelled as a vector of authorship attribution features; the other two involve N-grams based on word tokens and characters, respectively. The LRs that were separately estimated from the three different procedures are logistic-regression-fused to obtain a single LR for each author comparison. This study used predatory chatlog messages sampled from 115 authors. To see how the number of word tokens affects the performance of a forensic text comparison (FTC) system, token numbers used for modelling each group of messages were progressively increased: 500, 1000, 1500 and 2500 tokens. The performance of the FTC system is assessed using the log-likelihood-ratio cost (Cllr), which is a gradient metric for the quality of LRs, and the strength of the derived LRs is charted as Tippett plots. It is demonstrated in this study that (i) out of the three procedures, the MVKD procedure with authorship attribution features performed best in terms of Cllr, and that (ii) the fused system outperformed all three of the single procedures. When the token length is 1500, for example, the fused system achieved a Cllr value of 0.15. Some unrealistically strong LRs were observed in the results. Reasons for these are discussed, and a possible solution to the problem, namely the empirical lower and upper bound LR (ELUB) method is trialled and applied to the LRs of the best-achieving fusion system.
AB - Compared to other forensic comparative sciences, studies of the efficacy of the likelihood ratio (LR) framework in forensic authorship analysis are lagging. An experiment is described concerning the estimation of strength of linguistic text evidence within that framework. The LRs were estimated by trialling three different procedures: one is based on the multivariate kernel density (MVKD) formula, with each group of messages being modelled as a vector of authorship attribution features; the other two involve N-grams based on word tokens and characters, respectively. The LRs that were separately estimated from the three different procedures are logistic-regression-fused to obtain a single LR for each author comparison. This study used predatory chatlog messages sampled from 115 authors. To see how the number of word tokens affects the performance of a forensic text comparison (FTC) system, token numbers used for modelling each group of messages were progressively increased: 500, 1000, 1500 and 2500 tokens. The performance of the FTC system is assessed using the log-likelihood-ratio cost (Cllr), which is a gradient metric for the quality of LRs, and the strength of the derived LRs is charted as Tippett plots. It is demonstrated in this study that (i) out of the three procedures, the MVKD procedure with authorship attribution features performed best in terms of Cllr, and that (ii) the fused system outperformed all three of the single procedures. When the token length is 1500, for example, the fused system achieved a Cllr value of 0.15. Some unrealistically strong LRs were observed in the results. Reasons for these are discussed, and a possible solution to the problem, namely the empirical lower and upper bound LR (ELUB) method is trialled and applied to the LRs of the best-achieving fusion system.
KW - Authorship attribution features
KW - Forensic text comparison
KW - Likelihood ratio
KW - Logistic-regression fusion
KW - Multivariate kernel density
KW - N-grams
UR - http://www.scopus.com/inward/record.url?scp=85024896321&partnerID=8YFLogxK
U2 - 10.1016/j.forsciint.2017.06.040
DO - 10.1016/j.forsciint.2017.06.040
M3 - Article
SN - 0379-0738
VL - 278
SP - 184
EP - 197
JO - Forensic Science International
JF - Forensic Science International
ER -