A framework for measuring the impact of web spam

Timothy Jones*, David Hawking, Ramesh Sankaranarayana

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    5 Citations (Scopus)

    Abstract

    Web spam potentially causes three deleterious effects: unnecessary work for crawlers and search engines; diversion of traffic away from legitimate businesses; and annoyance to search engine users through poorer results. Past research on web spam has focused on spamming techniques, spam suppression techniques, and methods for classifying web content as spam or non-spam. Here we focus on the deterioration of search result quality caused by the presence of spam in a country-scale web. We present a framework for measuring the degradation in quality of search results caused by the presence of web spam. We index the 80-million-page UK2006 web spam collection on one machine. We trial the proposed framework in an experiment with the UK2006 collection and demonstrate that simple removal of spam pages from result sets can increase result quality. We conclude that the framework is a reasonable vehicle for research in this area and outline changes necessary for planned future experiments.

    Original language: English
    Title of host publication: ADCS 2007 - Proceedings of the Twelfth Australasian Document Computing Symposium
    Pages: 108-111
    Number of pages: 4
    Publication status: Published - 2007
    Event: 12th Australasian Document Computing Symposium, ADCS 2007 - Melbourne, VIC, Australia
    Duration: 10 Dec 2007

    Publication series

    Name: ADCS 2007 - Proceedings of the Twelfth Australasian Document Computing Symposium

    Conference

    Conference: 12th Australasian Document Computing Symposium, ADCS 2007
    Country/Territory: Australia
    City: Melbourne, VIC
    Period: 10/12/07
