On authorship attribution via Markov chains and sequence kernels

Conrad Sanderson*, Simon Guenter

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, peer-review

    8 Citations (Scopus)

    Abstract

    We investigate the use of recently proposed character and word sequence kernels for the task of authorship attribution and compare their performance with two probabilistic approaches based on Markov chains of characters and words. Several configurations of the sequence kernels are studied using a relatively large dataset, where each author covered several topics. Utilising Moffat smoothing, the two probabilistic approaches obtain similar performance, which in turn is comparable to that of character sequence kernels and is better than that of word sequence kernels. The results further suggest that when using a realistic setup that takes into account the case of texts which are not written by any hypothesised authors, about 5000 reference words are required to obtain good discrimination performance.
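As a rough illustration of the character-level Markov chain approach mentioned in the abstract, the following minimal sketch trains one first-order character model per candidate author and attributes a text to the author whose model gives it the highest average log-likelihood. It is not the authors' implementation: the function names, the `order` and `alpha` parameters, and the toy texts in the usage note are all hypothetical, and simple additive (add-alpha) smoothing is swapped in here in place of the Moffat smoothing used in the paper.

```python
import math
from collections import Counter, defaultdict


def train_char_markov(text, order=1):
    """Count (context -> next character) transitions observed in `text`."""
    counts = defaultdict(Counter)
    for i in range(order, len(text)):
        counts[text[i - order:i]][text[i]] += 1
    return counts


def avg_log_likelihood(text, counts, alphabet_size, order=1, alpha=1.0):
    """Average per-character log-likelihood of `text` under one author's model.

    Assumption: add-alpha smoothing stands in for Moffat smoothing.
    """
    total = 0.0
    n = 0
    for i in range(order, len(text)):
        ctx_counts = counts.get(text[i - order:i], Counter())
        numerator = ctx_counts[text[i]] + alpha
        denominator = sum(ctx_counts.values()) + alpha * alphabet_size
        total += math.log(numerator / denominator)
        n += 1
    return total / n if n else float("-inf")


def attribute(text, models, alphabet_size, order=1):
    """Return the best-scoring author and the score of every candidate."""
    scores = {
        author: avg_log_likelihood(text, model, alphabet_size, order)
        for author, model in models.items()
    }
    return max(scores, key=scores.get), scores
```

With toy reference texts such as `"abababababab"` for one author and `"aaaaaabbbbbb"` for another, a query like `"ababab"` scores higher under the first model. Handling texts written by none of the hypothesised authors, as the abstract discusses, would additionally need some rejection rule (for example a threshold on the best score), which this sketch leaves out.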

    Original language: English
    Title of host publication: Proceedings - 18th International Conference on Pattern Recognition, ICPR 2006
    Pages: 437-440
    Number of pages: 4
    Publication status: Published - 2006
    Event: 18th International Conference on Pattern Recognition, ICPR 2006 - Hong Kong, China
    Duration: 20 Aug 2006 to 24 Aug 2006

    Publication series

    Name: Proceedings - International Conference on Pattern Recognition
    Volume: 3
    ISSN (Print): 1051-4651

    Conference

    Conference: 18th International Conference on Pattern Recognition, ICPR 2006
    Country/Territory: China
    City: Hong Kong
    Period: 20/08/06 to 24/08/06
