Strength of forensic text comparison evidence from stylometric features: A multivariate likelihood ratio-based analysis

Shunichi Ishihara*

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    10 Citations (Scopus)


    An experiment in forensic text comparison (FTC) within the likelihood ratio (LR) framework is described, in which authorship attribution was modelled with word- and character-based stylometric features. Chatlog messages of 115 authors were selected from a chatlog archive containing real pieces of chatlog evidence used to prosecute paedophiles. Four different text lengths (500, 1000, 1500 or 2500 words) were used for modelling in order to investigate how system performance is influenced by sample size. The strength of authorship attribution evidence (the LR) was estimated with the Multivariate Kernel Density formula. Performance was primarily assessed with the log-likelihood ratio cost (Cllr), but assessments with other metrics, e.g. credible interval and equal error rate, are also given. Taking into account the small number of features used for modelling authorship attribution, the results are promising. Even with a small sample size of 500 words, the system achieved a discrimination accuracy of c. 76% (Cllr = 0.68258). With a sample size of 2500 words, a discrimination accuracy of c. 94% (Cllr = 0.21707) was obtained. A larger sample size is beneficial to FTC, resulting in an improvement in discriminability, an increase in the magnitude of the consistent-with-fact LRs and a decrease in the magnitude of the contrary-to-fact LRs. It was found that ‘Average character number per word token’, ‘Punctuation character ratio’, and vocabulary richness features are robust features, which work well regardless of sample size. The results demonstrate the efficacy of the LR framework for analysing authorship attribution evidence.
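
The Cllr values quoted in the abstract are easier to interpret against the metric's usual definition. The following is a minimal sketch, assuming the standard formulation of the log-likelihood ratio cost (half the sum of the mean penalty over same-author comparisons and the mean penalty over different-author comparisons); the function name `cllr` and the example LR values are hypothetical and are not taken from the study.

```python
import math


def cllr(same_author_lrs, different_author_lrs):
    """Log-likelihood ratio cost under its standard definition (assumed here).

    same_author_lrs: LRs from comparisons where the texts share an author.
    different_author_lrs: LRs from comparisons where the authors differ.
    Lower is better; a system that always outputs LR = 1 scores exactly 1.
    """
    if not same_author_lrs or not different_author_lrs:
        raise ValueError("both lists of LRs must be non-empty")

    # Same-author comparisons are penalised when the LR is small.
    same_penalty = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    same_penalty /= len(same_author_lrs)

    # Different-author comparisons are penalised when the LR is large.
    diff_penalty = sum(math.log2(1 + lr) for lr in different_author_lrs)
    diff_penalty /= len(different_author_lrs)

    return 0.5 * (same_penalty + diff_penalty)


# Hypothetical LRs for illustration only.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])
informative = cllr([100.0, 50.0], [0.01, 0.02])
```

Under this definition, a system that returns LR = 1 for every comparison yields a Cllr of 1, so the reported values of 0.68258 and 0.21707 both fall below that uninformative baseline, with the 2500-word condition further from it.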

    Original language: English
    Pages (from-to): 67-98
    Number of pages: 32
    Journal: International Journal of Speech, Language and the Law
    Issue number: 1
    Publication status: Published - 2017

