TY - GEN
T1 - A framework for measuring the impact of web spam
AU - Jones, Timothy
AU - Hawking, David
AU - Sankaranarayana, Ramesh
PY - 2007
Y1 - 2007
N2 - Web spam potentially causes three deleterious effects: unnecessary work for crawlers and search engines; diversion of traffic away from legitimate businesses; and annoyance to search engine users through poorer results. Past research on web spam has focused on spamming techniques, spam suppression techniques, and methods for classifying web content as spam or non-spam. Here we focus on the deterioration of search result quality caused by the presence of spam in a country-scale web. We present a framework for measuring the degradation in quality of search results caused by the presence of web spam. We index the 80 million page UK2006 web spam collection on one machine. We trial the proposed framework in an experiment with the UK2006 collection and demonstrate that simple removal of spam pages from result sets can increase result quality. We conclude that the framework is a reasonable vehicle for research in this area and outline changes necessary for planned future experiments.
AB - Web spam potentially causes three deleterious effects: unnecessary work for crawlers and search engines; diversion of traffic away from legitimate businesses; and annoyance to search engine users through poorer results. Past research on web spam has focused on spamming techniques, spam suppression techniques, and methods for classifying web content as spam or non-spam. Here we focus on the deterioration of search result quality caused by the presence of spam in a country-scale web. We present a framework for measuring the degradation in quality of search results caused by the presence of web spam. We index the 80 million page UK2006 web spam collection on one machine. We trial the proposed framework in an experiment with the UK2006 collection and demonstrate that simple removal of spam pages from result sets can increase result quality. We conclude that the framework is a reasonable vehicle for research in this area and outline changes necessary for planned future experiments.
KW - Adversarial information retrieval
KW - Web information retrieval
KW - Web spam
UR - http://www.scopus.com/inward/record.url?scp=84876701476&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9780646484372
T3 - ADCS 2007 - Proceedings of the Twelfth Australasian Document Computing Symposium
SP - 108
EP - 111
BT - ADCS 2007 - Proceedings of the Twelfth Australasian Document Computing Symposium
T2 - 12th Australasian Document Computing Symposium, ADCS 2007
Y2 - 10 December 2007 through 10 December 2007
ER -