TY - JOUR
T1 - A likelihood ratio-based evaluation of strength of authorship attribution evidence in SMS messages using N-grams
AU - Ishihara, Shunichi
PY - 2014
Y1 - 2014
N2 - An experiment in forensic text comparison (FTC) within the likelihood ratio (LR) framework is described. The experiment attempts to determine the strength of authorship attribution evidence modelled with N-grams, which is perhaps one of the most basic automatic modelling techniques. The SMS messages of multiple authors selected from the SMS corpus compiled by the National University of Singapore were used for same- and different-author comparisons. The number of words used for the N-gram modelling was varied (200, 1000, 2000 or 3000 words), and then the performance of each set was assessed. The performance of the LR-based FTC system was assessed with the log likelihood ratio cost (Cllr). It is shown in this study that N-grams can be employed within an LR framework to discriminate same-author and different-author SMS texts, but a fairly large amount of data are needed to do it well (i.e. to obtain Cllr < 0.75). It is concluded that the LR framework warrants further examination with different features and processing techniques.
AB - An experiment in forensic text comparison (FTC) within the likelihood ratio (LR) framework is described. The experiment attempts to determine the strength of authorship attribution evidence modelled with N-grams, which is perhaps one of the most basic automatic modelling techniques. The SMS messages of multiple authors selected from the SMS corpus compiled by the National University of Singapore were used for same- and different-author comparisons. The number of words used for the N-gram modelling was varied (200, 1000, 2000 or 3000 words), and then the performance of each set was assessed. The performance of the LR-based FTC system was assessed with the log likelihood ratio cost (Cllr). It is shown in this study that N-grams can be employed within an LR framework to discriminate same-author and different-author SMS texts, but a fairly large amount of data are needed to do it well (i.e. to obtain Cllr < 0.75). It is concluded that the LR framework warrants further examination with different features and processing techniques.
KW - Forensic text comparison
KW - Likelihood ratio
KW - Log-likelihood ratio cost
KW - N-gram language model
KW - SMS messages
KW - Tippett plot
UR - http://www.scopus.com/inward/record.url?scp=84903386913&partnerID=8YFLogxK
U2 - 10.1558/ijsll.v21i1.23
DO - 10.1558/ijsll.v21i1.23
M3 - Article
SN - 1748-8885
VL - 21
SP - 23
EP - 49
JO - International Journal of Speech, Language and the Law
JF - International Journal of Speech, Language and the Law
IS - 1
ER -