Likelihood ratio estimation for authorship text evidence: An empirical comparison of score- and feature-based methods

Shunichi Ishihara*, Michael Carne

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    10 Citations (Scopus)

    Abstract

    This study compares score- and feature-based methods for estimating forensic likelihood ratios for text evidence. Three feature-based methods built on different Poisson-based models with logistic regression fusion are introduced and evaluated: a one-level Poisson model, a one-level zero-inflated Poisson model and a two-level Poisson-gamma model. These are compared with a score-based method that employs the cosine distance as a score-generating function. The two types of methods are compared using the same data (i.e., documents attributable to 2,157 authors) and the same feature set, which is a bag-of-words model using the 400 most frequently occurring words. Their performance is evaluated via the log-likelihood ratio cost (Cllr) and its two components: the discrimination cost (Cllr^min) and the calibration cost (Cllr^cal). The results show that (1) the feature-based methods outperform the score-based method by a Cllr value of 0.14–0.2 when their best results are compared and (2) a feature selection procedure can further improve performance for the feature-based methods. Some distinctive performance characteristics associated with likelihood ratios produced using the feature-based methods are described, and their implications are discussed with real forensic casework in mind.
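
    As a rough illustration of two quantities named in the abstract, the sketch below computes a cosine distance between two word-count vectors and the log-likelihood ratio cost (Cllr) from two sets of likelihood ratios. It is a minimal sketch and not the authors' implementation: the function names `cosine_distance` and `cllr` and the toy inputs are assumptions made for illustration, and the Cllr formula used is the standard definition (the mean of log2(1 + 1/LR) over same-author comparisons and the mean of log2(1 + LR) over different-author comparisons, averaged).

```python
import math


def cosine_distance(x, y):
    """Cosine distance (1 - cosine similarity) between two count vectors.

    Hypothetical helper: x and y stand in for bag-of-words count vectors.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_x * norm_y)


def cllr(same_author_lrs, diff_author_lrs):
    """Log-likelihood ratio cost from likelihood ratios (not log LRs).

    same_author_lrs: LRs from comparisons where the authors truly match.
    diff_author_lrs: LRs from comparisons where the authors truly differ.
    """
    # Penalty for same-author comparisons grows as the LR falls below 1.
    same_cost = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    same_cost /= len(same_author_lrs)
    # Penalty for different-author comparisons grows as the LR rises above 1.
    diff_cost = sum(math.log2(1 + lr) for lr in diff_author_lrs)
    diff_cost /= len(diff_author_lrs)
    return 0.5 * (same_cost + diff_cost)
```

    Under this definition, a system that always outputs a likelihood ratio of 1 has a Cllr of exactly 1, and lower values indicate better performance.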

    Original language: English
    Article number: 111268
    Journal: Forensic Science International
    Volume: 334
    Publication status: Published - May 2022
