Short text authorship attribution via sequence kernels, Markov chains and author unmasking: An investigation

Conrad Sanderson*, Simon Guenter

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

    123 Citations (Scopus)

    Abstract

    We present an investigation of recently proposed character and word sequence kernels for the task of authorship attribution based on relatively short texts. Performance is compared with two corresponding probabilistic approaches based on Markov chains. Several configurations of the sequence kernels are studied on a relatively large dataset (50 authors), where each author covered several topics. Utilising Moffat smoothing, the two probabilistic approaches obtain similar performance, which in turn is comparable to that of character sequence kernels and is better than that of word sequence kernels. The results further suggest that when using a realistic setup that takes into account the case of texts which are not written by any hypothesised authors, the amount of training material has more influence on discrimination performance than the amount of test material. Moreover, we show that the recently proposed author unmasking approach is less useful when dealing with short texts.
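    As an illustration of the kind of Markov chain approach the abstract refers to, the following is a minimal sketch of character-level attribution: one first-order character model per author, with a test text assigned to the author whose model gives it the highest log-likelihood. It is not the authors' implementation. In particular, it uses simple add-alpha smoothing rather than the Moffat smoothing named in the abstract, and the function names (`train_char_bigram`, `log_likelihood`, `attribute`) and the `alpha` and `vocab_size` parameters are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_char_bigram(text):
    """Count character transitions (previous char -> current char) in a text."""
    counts = defaultdict(Counter)
    for prev, cur in zip(text, text[1:]):
        counts[prev][cur] += 1
    return counts


def log_likelihood(counts, text, vocab_size, alpha=1.0):
    """Log-probability of the transitions in `text` under a first-order model.

    Assumption: add-alpha smoothing stands in for the Moffat smoothing
    mentioned in the abstract; it only keeps unseen transitions non-zero.
    """
    total = 0.0
    for prev, cur in zip(text, text[1:]):
        ctx = counts.get(prev, {})  # .get avoids creating new entries
        numerator = ctx.get(cur, 0) + alpha
        denominator = sum(ctx.values()) + alpha * vocab_size
        total += math.log(numerator / denominator)
    return total


def attribute(models, text, vocab_size):
    """Return the author whose model assigns `text` the highest log-likelihood."""
    return max(models, key=lambda a: log_likelihood(models[a], text, vocab_size))
```

    This closed-set sketch always returns one of the known authors. The setup described in the abstract also covers texts not written by any hypothesised author, which would need an additional decision step (for example a score threshold) that is not shown here.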

    Original language: English
    Title of host publication: COLING/ACL 2006 - EMNLP 2006
    Subtitle of host publication: 2006 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
    Pages: 482-491
    Number of pages: 10
    Publication status: Published - 2006
    Event: 11th Conference on Empirical Methods in Natural Language Processing, EMNLP 2006, Held in Conjunction with COLING/ACL 2006 - Sydney, NSW, Australia
    Duration: 22 Jul 2006 - 23 Jul 2006

    Publication series

    Name: COLING/ACL 2006 - EMNLP 2006: 2006 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

    Conference

    Conference: 11th Conference on Empirical Methods in Natural Language Processing, EMNLP 2006, Held in Conjunction with COLING/ACL 2006
    Country/Territory: Australia
    City: Sydney, NSW
    Period: 22/07/06 - 23/07/06
