An effect of background population sample size on the performance of a likelihood ratio-based forensic text comparison system: A Monte Carlo simulation with Gaussian mixture model

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    Abstract

    This Monte Carlo simulation-based study explores how the sample size of the background database affects a likelihood ratio (LR)-based forensic text comparison (FTC) system built on multivariate authorship attribution features. The study used text messages written by 240 authors randomly selected from an archive of chatlog messages. The strength of evidence (= LR) was estimated using the multivariate kernel density likelihood ratio (MVKD) formula with logistic-regression calibration. Results are reported on two aspects, the system performance (= accuracy) and the stability of that performance, both assessed with the standard metric for LR-based systems, the log-likelihood ratio cost (Cllr). Both the system performance and its stability improve non-linearly as the sample size (= author count) in the background database increases, and the more features are used for modelling, the more background data the system generally requires for optimal results. The implications of the findings for real casework are also discussed.
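The abstract names Cllr as the evaluation metric without spelling it out. As a minimal sketch, the commonly used definition averages a log penalty over same-author comparisons (penalising small LRs) and over different-author comparisons (penalising large LRs). The function name `cllr` and its argument names are illustrative assumptions, not taken from the paper, and the inputs are assumed to be plain LRs rather than log LRs.

```python
import math


def cllr(same_author_lrs, diff_author_lrs):
    """Log-likelihood-ratio cost for two lists of LRs (hypothetical helper).

    same_author_lrs: LRs from comparisons known to share an author.
    diff_author_lrs: LRs from comparisons known to have different authors.
    Values are assumed to be strictly positive LRs, not log LRs.
    """
    if not same_author_lrs or not diff_author_lrs:
        raise ValueError("both LR lists must be non-empty")

    # Same-author comparisons should yield LR > 1, so small LRs are penalised.
    same_penalty = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    same_penalty /= len(same_author_lrs)

    # Different-author comparisons should yield LR < 1, so large LRs are penalised.
    diff_penalty = sum(math.log2(1 + lr) for lr in diff_author_lrs)
    diff_penalty /= len(diff_author_lrs)

    return 0.5 * (same_penalty + diff_penalty)
```

Under this definition, a system that always returns LR = 1 (no information) scores exactly 1, and values closer to 0 indicate better performance.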
    Original language: English
    Title of host publication: Proceedings of Australasian Language Technology Association Workshop 2016 Workshop
    Editors: Trevor Cohn
    Place of Publication: Pennsylvania, USA
    Publisher: Association for Computational Linguistics
    Pages: 124-132
    Edition: Peer Reviewed
    ISBN (Print): 9781510833166
    Publication status: Published - 2016
    Event: Australasian Language Technology Association Workshop (ALTA 2016) - Caulfield, Australia
    Duration: 1 Jan 2016 → …
    http://alta2016.alta.asn.au/U16/U16-1.pdf

    Conference

    Conference: Australasian Language Technology Association Workshop (ALTA 2016)
    Period: 1/01/16 → …
    Other: December 5–7 2016
