Feature-based forensic text comparison using a Poisson model for likelihood ratio estimation

Michael Carne, Shunichi Ishihara

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    Abstract

    Score-based and feature-based methods are the two main approaches to estimating a forensic likelihood ratio (LR), which quantifies the strength of evidence. This forensic text comparison (FTC) study compares a score-based method using the Cosine distance with a feature-based method built on a Poisson model, using texts collected from 2,157 authors. Distance measures (e.g. Burrows’s Delta, Cosine distance) are a standard tool in authorship attribution studies, so a score-based method using a distance measure is a natural first step for estimating LRs for textual evidence. However, textual data often violate the statistical assumptions underlying distance-based models. Furthermore, such models assess only the similarity, not the typicality, of the objects (i.e. documents) under comparison. A Poisson model is theoretically more appropriate than distance-based measures for authorship attribution, but it has never been tested with linguistic text evidence within the LR framework. The log-LR cost (Cllr) was used to assess the performance of the two methods. This study demonstrates that (1) the feature-based method outperforms the score-based method by a Cllr value of ca. 0.09 under the best-performing settings; and (2) the performance of the feature-based method can be further improved by feature selection.
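As a rough illustration of the two quantities the abstract names, the following Python sketch computes a log-LR from Poisson-distributed feature counts and evaluates a set of LRs with Cllr. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the use of fixed plug-in rates for the author and the background population, and the independence of features are all assumptions made here for illustration. The Cllr function follows the standard definition of the log-LR cost.

```python
import math


def poisson_log_lr(counts, n_tokens, author_rates, background_rates):
    """Natural-log LR for one questioned document (hypothetical helper).

    Assumes each feature count is an independent Poisson variable with
    mean n_tokens * rate, and that all rates are strictly positive.
    """
    log_lr = 0.0
    for x, lam, mu in zip(counts, author_rates, background_rates):
        # log Pois(x; n*lam) - log Pois(x; n*mu); the log(x!) terms cancel.
        log_lr += x * (math.log(lam) - math.log(mu)) - n_tokens * (lam - mu)
    return log_lr


def cllr(same_author_lrs, different_author_lrs):
    """Log-LR cost: 0 is ideal, 1 matches an uninformative system (LR = 1)."""
    ss_cost = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    ds_cost = sum(math.log2(1 + lr) for lr in different_author_lrs)
    return 0.5 * (ss_cost / len(same_author_lrs)
                  + ds_cost / len(different_author_lrs))


# Toy numbers, invented purely for illustration.
log_lr = poisson_log_lr(counts=[10], n_tokens=100,
                        author_rates=[0.10], background_rates=[0.05])
print(log_lr > 0)  # the count fits the author's rate better than the background
print(cllr([100.0, 20.0], [0.01, 0.5]))
```

The author rate enters the numerator (similarity) and the background rate the denominator (typicality), which is the distinction the abstract draws against purely distance-based scores. A system that always outputs LR = 1 gives a Cllr of exactly 1, so values below 1 indicate that the LRs carry useful information.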
    Original language: English
    Title of host publication: Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association
    Place of Publication: Australia
    Publisher: Australasian Language Technology Association
    Pages: 32-42
    Publication status: Published - 2020
    Event: 18th Annual Workshop of the Australasian Language Technology Association - Australia, Australia
    Duration: 1 Jan 2020 → …

    Conference

    Conference: 18th Annual Workshop of the Australasian Language Technology Association
    Country/Territory: Australia
    Period: 1/01/20 → …
    Other: Tue Dec 01 00:00:00 AEST 2020
