TY - GEN
T1 - On authorship attribution via Markov chains and sequence kernels
AU - Sanderson, Conrad
AU - Guenter, Simon
PY - 2006
Y1 - 2006
AB - We investigate the use of recently proposed character and word sequence kernels for the task of authorship attribution and compare their performance with two probabilistic approaches based on Markov chains of characters and words. Several configurations of the sequence kernels are studied using a relatively large dataset, where each author covered several topics. Utilising Moffat smoothing, the two probabilistic approaches obtain similar performance, which in turn is comparable to that of character sequence kernels and is better than that of word sequence kernels. The results further suggest that when using a realistic setup that takes into account the case of texts which are not written by any hypothesised authors, about 5000 reference words are required to obtain good discrimination performance.
UR - https://www.scopus.com/pages/publications/34147123127
U2 - 10.1109/ICPR.2006.899
DO - 10.1109/ICPR.2006.899
M3 - Conference Paper
SN - 0769525210
SN - 9780769525211
T3 - Proceedings - International Conference on Pattern Recognition
SP - 437
EP - 440
BT - Proceedings - 18th International Conference on Pattern Recognition, ICPR 2006
T2 - 18th International Conference on Pattern Recognition, ICPR 2006
Y2 - 20 August 2006 through 24 August 2006
ER -