Reordering an index to speed query processing without loss of effectiveness

David Hawking*, Timothy Jones

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Citations (Scopus)

Abstract

Following Long and Suel, we empirically investigate the importance of document order in search engines which rank documents using a combination of dynamic (query-dependent) and static (query-independent) scores, and use document-at-a-time (DAAT) processing. When inverted file postings are in collection order, assigning document numbers in order of descending static score supports lossless early termination while maintaining good compression. Since static scores may not be available until all documents have been gathered and indexed, we build a tool for reordering an existing index and show that it operates in less than 20% of the original indexing time. We note that this additional cost is easily recouped by savings at query processing time. We compare best early-termination points for several different index orders on three enterprise search collections (a whole-of-government index with two very different query sets, and a collection from a UK university). We also present results for the same orders for ClueWeb09-CatB. Our evaluation focuses on finding results likely to be clicked on by users of Web or website search engines - Nav and Key results in the TREC 2011 Web Track judging scheme. The orderings tested are Original, Reverse, Random, and QIE (descending order of static score). For three enterprise search test sets we find that QIE order can achieve close-to-maximal search effectiveness with much lower computational cost than for other orderings. Additionally, reordering has negligible impact on compressed index size for indexes that contain position information. Our results for an artificial query set against the TREC ClueWeb09 Category B collection are much more equivocal and we canvass possible explanations for future investigation.
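
To make the central idea of the abstract concrete, the following is a minimal sketch, not the authors' tool. It assumes a toy in-memory index, a document score equal to static score plus the sum of per-term dynamic weights, and an upper bound on the dynamic score taken from the largest weight in each query term's posting list. All function names (`reorder_by_static_score`, `remap_postings`, `top_k_early_termination`) are hypothetical. Because new document numbers are assigned in descending static-score order, a DAAT scan can stop as soon as the static score of the current document plus that upper bound cannot beat the k-th best score found so far, without changing the top-k result.

```python
import heapq
from itertools import groupby


def reorder_by_static_score(static_scores):
    """Return {old_id: new_id} so that new id 0 has the highest static score."""
    order = sorted(range(len(static_scores)), key=lambda d: -static_scores[d])
    return {old: new for new, old in enumerate(order)}


def remap_postings(postings, mapping):
    """Rewrite every posting list with new ids, kept in ascending id order."""
    return {
        term: sorted((mapping[doc], weight) for doc, weight in plist)
        for term, plist in postings.items()
    }


def top_k_early_termination(postings, static, query_terms, k):
    """DAAT scan over id-ordered postings; `static` is indexed by new doc id.

    Returns (top-k list of (score, doc) in descending order, docs scored).
    Assumes posting lists are non-empty and `static` is non-increasing.
    """
    lists = [postings[t] for t in query_terms if t in postings]
    # Assumed bound: no document can gain more than this from dynamic scores.
    max_dynamic = sum(max(w for _, w in plist) for plist in lists)
    heap = []  # min-heap holding the current best k (score, doc) pairs
    scored = 0
    # heapq.merge is lazy, so breaking out really does skip later postings.
    for doc, group in groupby(heapq.merge(*lists), key=lambda p: p[0]):
        if len(heap) == k and static[doc] + max_dynamic <= heap[0][0]:
            break  # no later document can enter the top k
        scored += 1
        score = static[doc] + sum(w for _, w in group)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
    return sorted(heap, reverse=True), scored
```

In this sketch the reordering is a one-off pass over an existing index, and the saving appears at query time as a smaller number of documents scored. The real trade-offs reported in the abstract (reordering cost, compressed index size, effectiveness at a given termination point) depend on the collection and are not modelled here.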

Original language: English
Title of host publication: Proceedings of the 17th Australasian Document Computing Symposium, ADCS 2012
Pages: 17-24
Number of pages: 8
Publication status: Published - 2012
Event: 17th Australasian Document Computing Symposium, ADCS 2012 - Dunedin, New Zealand
Duration: 5 Dec 2012 - 6 Dec 2012

Conference

Conference: 17th Australasian Document Computing Symposium, ADCS 2012
Country/Territory: New Zealand
City: Dunedin
Period: 5/12/12 - 6/12/12