Query-independent evidence in home page finding

Trystan Upstill*, Nick Craswell, David Hawking

*Corresponding author for this work

    Research output: Contribution to journal / Article / peer-review

    47 Citations (Scopus)

    Abstract

    Hyperlink recommendation evidence, that is, evidence based on the structure of a web's link graph, is widely exploited by commercial Web search systems. However, there is little published work to support its popularity. Another form of query-independent evidence, URL-type, has been shown to be beneficial on a home page finding task. We compared the usefulness of these types of evidence on the home page finding task, combined with both content and anchor text baselines. Our experiments made use of five query sets spanning three corpora: one enterprise crawl and the WT10g and VLC2 Web test collections. We found that, in optimal conditions, all of the query-independent methods studied (in-degree, URL-type, and two variants of PageRank) offered a better-than-random improvement on a content-only baseline. However, only URL-type offered a better-than-random improvement on an anchor text baseline. In realistic settings, for either baseline, only URL-type offered consistent gains. In combination with URL-type, the anchor text baseline was more useful for finding popular home pages, but URL-type with content was more useful for finding randomly selected home pages. We conclude that a general home page finding system should combine evidence from document content, anchor text, and URL-type classification.

    Original language: English
    Pages (from-to): 286-313
    Number of pages: 28
    Journal: ACM Transactions on Information Systems
    Volume: 21
    Issue number: 3
    Publication status: Published - Jul 2003
