Server selection methods in hybrid portal search

David Hawking, Paul Thomas

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    37 Citations (Scopus)

    Abstract

    The TREC .GOV collection makes a valuable web testbed for distributed information retrieval methods because it is naturally partitioned and includes 725 web-oriented queries with judged answers. It can usefully model aspects of government and large corporate portals. Analysis of the .gov data shows that a purely distributed approach would not be feasible for providing search on a .gov portal because of the large number (17,000+) of web sites and the high proportion that do not provide a search interface. An alternative hybrid approach, combining both distributed and centralized techniques, is proposed and server selection methods are evaluated within this framework using web-oriented evaluation methodology. A number of well-known algorithms are compared against representatives (highest anchor ranked page (HARP) and anchor weighted sum (AWSUM)) of a family of new selection methods which use link anchortext extracted from an auxiliary crawl to provide descriptions of sites which are not themselves crawled. Of the previously published methods, ReDDE substantially outperformed three variants of CORI and also outperformed a method based on Kullback-Leibler Divergence (extended) except on topic distillation. HARP and AWSUM performed best overall but were outperformed on the topic distillation task by extended KL Divergence.
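
The abstract names HARP and AWSUM but does not define them. The sketch below is one plausible reading of the two names, not the authors' implementation: it assumes an anchortext index has already returned a scored, ranked list of pages for a query, that HARP orders sites by the rank of their best page, and that AWSUM orders sites by the sum of their pages' scores. The function names, the `(site, score)` input format, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical input: pages returned by an anchortext-only ranking for one
# query, as (site, score) pairs already sorted by descending score.
results = [("a.gov", 0.9), ("b.gov", 0.5), ("b.gov", 0.5), ("c.gov", 0.3)]


def harp(ranked_pages):
    """Assumed HARP reading: order sites by the rank of their best page."""
    best_rank = {}
    for rank, (site, _score) in enumerate(ranked_pages, start=1):
        # Keep only the first (highest) rank seen for each site.
        best_rank.setdefault(site, rank)
    return sorted(best_rank, key=best_rank.get)


def awsum(ranked_pages):
    """Assumed AWSUM reading: order sites by the sum of their page scores."""
    totals = defaultdict(float)
    for site, score in ranked_pages:
        totals[site] += score
    # Highest total first; ties broken by site name for a stable order.
    return sorted(totals, key=lambda site: (-totals[site], site))


print(harp(results))
print(awsum(results))
```

On this toy input the two readings disagree: the best-page ordering puts `a.gov` first, while the summed ordering favours `b.gov` because its two moderately scored pages add up to more than the single top page.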

    Original language: English
    Title of host publication: SIGIR 2005 - Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
    Pages: 75-82
    Number of pages: 8
    Publication status: Published - 2005
    Event: 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2005 - Salvador, Brazil
    Duration: 15 Aug 2005 to 19 Aug 2005

    Publication series

    Name: SIGIR 2005 - Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

    Conference

    Conference: 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2005
    Country/Territory: Brazil
    City: Salvador
    Period: 15/08/05 to 19/08/05
