Abstract
This study investigates how the reliability of likelihood-ratio (LR)-based forensic text comparison (FTC) systems is affected by sampling variability in the number of authors in the databases. When 30–40 authors (each contributing two 4 kB documents) are included in each of the test, reference and calibration databases, the experimental results demonstrate that: 1) the overall performance (validity) of the FTC system reaches the same level as that of a system with 720 authors, and 2) the variability of the system's performance (reliability) starts to converge. A similar trend can be observed in the magnitude of fluctuation of the derived LRs. The variability in overall system performance is mostly due to large variability in calibration rather than in discrimination. Furthermore, FTC systems are more prone to instability when the dimensionality of the feature vector is high.
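The abstract refers to calibration, discrimination, and overall validity of an LR-based system. As a rough illustration only, not the paper's implementation, the sketch below shows one common way such a pipeline is evaluated: logistic-regression calibration of comparison scores into log10 LRs drawn from a calibration database, with the log-LR cost (Cllr) as a validity metric. All function names and the synthetic scores are assumptions introduced for the example.

```python
# Minimal sketch (not the authors' code) of score calibration and a
# validity metric of the kind discussed in the abstract.
import numpy as np
from sklearn.linear_model import LogisticRegression


def calibrate_scores(cal_scores, cal_labels, test_scores):
    """Fit logistic-regression calibration on the calibration database and
    return calibrated log10 likelihood ratios for the test scores."""
    model = LogisticRegression()
    model.fit(cal_scores.reshape(-1, 1), cal_labels)
    # decision_function gives the posterior log-odds; removing the prior
    # log-odds of the calibration set leaves an (approximate) natural-log LR.
    log_odds = model.decision_function(test_scores.reshape(-1, 1))
    prior_log_odds = np.log(cal_labels.mean() / (1 - cal_labels.mean()))
    return (log_odds - prior_log_odds) / np.log(10)


def cllr(log10_lrs, labels):
    """Log-LR cost (Cllr): lower is better, 1.0 is an uninformative system."""
    lr = 10.0 ** log10_lrs
    same = lr[labels == 1]   # same-author comparisons
    diff = lr[labels == 0]   # different-author comparisons
    return 0.5 * (np.mean(np.log2(1 + 1 / same)) + np.mean(np.log2(1 + diff)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def sample_scores(n):
        # Synthetic same-author (1) vs. different-author (0) scores.
        labels = rng.integers(0, 2, n)
        scores = rng.normal(loc=np.where(labels == 1, 1.5, -1.5), scale=1.0)
        return scores, labels

    cal_scores, cal_labels = sample_scores(400)
    test_scores, test_labels = sample_scores(400)
    llrs = calibrate_scores(cal_scores, cal_labels, test_scores)
    print(f"Cllr on test set: {cllr(llrs, test_labels):.3f}")
```

Repeating such an evaluation over resampled author sets of different sizes is the kind of procedure one would use to study how validity and its variability (reliability) change with the number of authors in each database.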
Original language | English |
---|---|
Title of host publication | Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association
Editors | Pradeesh Parameswaran, Jennifer Biggs, David Powers |
Place of Publication | Adelaide, SA |
Publisher | Australasian Language Technology Association |
Pages | 1-9 |
Publication status | Published - 2022 |
Event | The 20th Annual Workshop of the Australasian Language Technology Association, Adelaide, SA, Australia. Duration: 1 Jan 2022 → … (https://alta2022.alta.asn.au/papers)
Conference
Conference | The 20th Annual Workshop of the Australasian Language Technology Association |
---|---|
Country/Territory | Australia |
Period | 1/01/22 → … |
Other | 14-16 December 2022 |
Internet address | https://alta2022.alta.asn.au/papers