The influence of background data size on the performance of a score-based likelihood ratio system: A case of forensic text comparison

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    Abstract

    This study investigates the robustness and stability of a likelihood ratio-based (LR-based) forensic text comparison (FTC) system against the size of the background population data. The focus is on a score-based approach for estimating authorship LRs. Each document is represented with a bag-of-words model, and the cosine distance is used as the score-generating function. A set of population data that differed in the number of scores was synthesised 20 times using the Monte Carlo simulation technique. The FTC system's performance with different population sizes was evaluated by a gradient metric, the log-LR cost (Cllr). The experimental results revealed two outcomes: 1) the score-based approach is rather robust against a small population size, in that, with the scores obtained from 40 to 60 authors in the database, the stability and performance of the system become fairly comparable to those of the system with the maximum number of authors (720); and 2) poor performance in terms of Cllr, which occurred because of limited background population data, is largely due to poor calibration. The results also indicated that the score-based approach is more robust against data scarcity than the feature-based approach; however, this finding requires further study.
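    The two quantities named in the abstract, the cosine-distance score and the log-LR cost, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenisation and the function names `cosine_distance` and `cllr` are assumptions made for illustration, and the step that converts scores into LRs (which is where the background population data enters) is not shown. The Cllr formula used is the standard one: half the sum of the mean of log2(1 + 1/LR) over same-author comparisons and the mean of log2(1 + LR) over different-author comparisons.

```python
import math
from collections import Counter


def cosine_distance(doc_a: str, doc_b: str) -> float:
    """Cosine distance (1 - cosine similarity) between two documents.

    Each document is reduced to a bag-of-words count vector.
    Tokenisation by lowercasing and whitespace splitting is an
    assumption for this sketch, not the paper's procedure.
    """
    bag_a = Counter(doc_a.lower().split())
    bag_b = Counter(doc_b.lower().split())
    # Only words present in both bags contribute to the dot product.
    dot = sum(bag_a[w] * bag_b[w] for w in bag_a.keys() & bag_b.keys())
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty document is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def cllr(same_author_lrs, diff_author_lrs) -> float:
    """Log-LR cost for LRs (not log-LRs) from known-origin comparisons.

    Lower is better; a system that always outputs LR = 1 scores 1.0.
    """
    same_cost = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    diff_cost = sum(math.log2(1 + lr) for lr in diff_author_lrs)
    return 0.5 * (same_cost / len(same_author_lrs)
                  + diff_cost / len(diff_author_lrs))
```

    In this sketch, documents with no words in common have a distance of 1.0, and LRs that point in the correct direction (above 1 for same-author pairs, below 1 for different-author pairs) bring Cllr below 1.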
    Original language: English
    Title of host publication: The influence of background data size on the performance of a score-based likelihood ratio system: A case of forensic text comparison
    Place of Publication: Australia
    Publisher: Australasian Language Technology Association
    Pages: 1-11
    Publication status: Published - 2020
    Event: The Australasian Language Technology Association Workshop 2020 - Virtual
    Duration: 1 Jan 2020 → …
    https://aclanthology.org/2020.alta-1.3.pdf

    Conference

    Conference: The Australasian Language Technology Association Workshop 2020
    Period: 1/01/20 → …
    Other: 2020