Abstract
This study compares score- and feature-based methods for estimating forensic likelihood ratios for text evidence. Three feature-based methods built on different Poisson-based models with logistic regression fusion are introduced and evaluated: a one-level Poisson model, a one-level zero-inflated Poisson model and a two-level Poisson-gamma model. These are compared with a score-based method that employs the cosine distance as a score-generating function. The two types of methods are compared using the same data (i.e., documents attributable to 2,157 authors) and the same feature set, which is a bag-of-words model using the 400 most frequently occurring words. Their performance is evaluated via the log-likelihood ratio cost (Cllr) and its composites: the discrimination cost (Cllrmin) and the calibration cost (Cllrcal). The results show that (1) the feature-based methods outperform the score-based method by a Cllr value of 0.14–0.2 when their best results are compared and (2) a feature selection procedure can further improve performance for the feature-based methods. Some distinctive performance characteristics associated with likelihood ratios produced using the feature-based methods are described, and their implications are discussed with real forensic casework in mind.
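For readers unfamiliar with the evaluation metric named in the abstract, Cllr is conventionally computed from two sets of likelihood ratios: those from same-author comparisons and those from different-author comparisons. The following is a minimal sketch assuming that standard definition; the function name `cllr` and the toy values are illustrative assumptions and are not taken from the study.

```python
import math


def cllr(same_author_lrs, different_author_lrs):
    """Log-likelihood ratio cost under its standard definition.

    same_author_lrs: likelihood ratios from same-author comparisons
    different_author_lrs: likelihood ratios from different-author comparisons
    Both arguments are hypothetical inputs holding positive LR values.
    """
    # Penalty for same-author comparisons grows as the LR falls below 1.
    same_cost = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    same_cost /= len(same_author_lrs)
    # Penalty for different-author comparisons grows as the LR rises above 1.
    diff_cost = sum(math.log2(1 + lr) for lr in different_author_lrs)
    diff_cost /= len(different_author_lrs)
    return 0.5 * (same_cost + diff_cost)


# Toy values only: an uninformative system that always reports LR = 1.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])
# Toy values only: a system whose LRs point the right way.
informative = cllr([100.0, 50.0], [0.01, 0.02])
```

Under this definition a system that always reports a likelihood ratio of 1 has a Cllr of exactly 1, and lower values indicate better performance, which is why the abstract reports the advantage of the feature-based methods as a reduction in Cllr.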
| Original language | English |
|---|---|
| Article number | 111268 |
| Journal | Forensic Science International |
| Volume | 334 |
| DOIs | |
| Publication status | Published - May 2022 |
| Title | Likelihood ratio estimation for authorship text evidence: An empirical comparison of score- and feature-based methods |