Cross language prediction of vandalism on Wikipedia using article views and revisions

Khoi Nguyen Tran, Peter Christen

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    11 Citations (Scopus)

    Abstract

    Vandalism is a major issue on Wikipedia, accounting for about 2% (350,000+) of edits in the first 5 months of 2012. The majority of vandalism is caused by humans, who can leave traces of their malicious behaviour in access and edit logs. We propose detecting vandalism using a range of classifiers in a monolingual setting, and evaluate their performance when applied across languages on two data sets: the relatively unexplored hourly count of views of each Wikipedia article, and the commonly used edit history of articles. Within the same language (English and German), these classifiers achieve up to 87% precision, 87% recall, and an F1-score of 87%. Applying these classifiers across languages achieves similarly high results of up to 83% precision, recall, and F1-score. These results show that characteristic vandal traits can be learned from view and edit patterns, and that models built in one language can be applied to other languages.

    Original language: English
    Title of host publication: Advances in Knowledge Discovery and Data Mining - 17th Pacific-Asia Conference, PAKDD 2013, Proceedings
    Pages: 268-279
    Number of pages: 12
    Edition: PART 2
    Publication status: Published - 2013
    Event: 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013 - Gold Coast, QLD, Australia
    Duration: 14 Apr 2013 to 17 Apr 2013

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Number: PART 2
    Volume: 7819 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013
    Country/Territory: Australia
    City: Gold Coast, QLD
    Period: 14/04/13 to 17/04/13
