Improving Topic Coherence with Regularized Topic Models

David Newman, Edwin Bonilla, Wray Buntine

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    Abstract

    Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflects broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data.
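
    No code accompanies this record; as a loose illustration of the mechanism sketched in the abstract, the Python fragment below smooths a topic's word distribution through a word co-occurrence matrix built from an external corpus, so that the topic is pulled toward broad co-occurrence patterns in that data. This is a minimal sketch in the spirit of the paper's structured-prior idea, not its actual regularizers; every name (cooccurrence_prior, smooth_topic, nu) and the toy numbers are hypothetical.

        import numpy as np

        def cooccurrence_prior(external_counts):
            """Row-normalized word co-occurrence matrix from an external corpus.

            external_counts[w, v] = how often words w and v co-occur, e.g.
            within a sliding window over a large reference corpus.
            """
            C = external_counts.astype(float) + 1e-12   # avoid zero rows
            return C / C.sum(axis=1, keepdims=True)

        def smooth_topic(phi, C, nu=0.5):
            """Blend a topic-word distribution phi (length V, sums to 1) with
            its co-occurrence-smoothed version; nu in [0, 1] sets the strength.
            Words that co-occur with the topic's heavy words gain probability.
            """
            smoothed = C.T @ phi              # mass flows along co-occurrence links
            phi_new = (1.0 - nu) * phi + nu * smoothed
            return phi_new / phi_new.sum()    # renormalize for safety

        # Toy example: 4-word vocabulary; external data says words 0 and 1
        # co-occur strongly, as do words 2 and 3.
        counts = np.array([[0, 8, 1, 1],
                           [8, 0, 1, 1],
                           [1, 1, 0, 6],
                           [1, 1, 6, 0]])
        C = cooccurrence_prior(counts)
        phi = np.array([0.7, 0.1, 0.1, 0.1])  # topic concentrated on word 0
        print(smooth_topic(phi, C))           # probability shifts toward word 1

    In a full topic model this smoothing step would sit inside the training loop (for example, applied to each topic's word distribution at every iteration), which is where the strength nu would trade fidelity to the target collection against coherence.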
    Original language: English
    Title of host publication: Advances in Neural Information Processing Systems 24
    Editors: Rich Zemel, John Shawe-Taylor, Peter Bartlett, Fernando Pereira and Kilian Weinberger
    Place of publication: Granada, Spain
    Publisher: Neural Information Processing Systems Foundation
    Pages: 9
    Edition: Peer Reviewed
    ISBN (Print): 9781618395993
    Publication status: Published - 2011
    Event: Neural Information Processing Systems (NIPS 2011) - Granada, Spain
    Duration: 1 Jan 2011 → …
    https://papers.nips.cc/paper/4487-contextual-gaussian-process-bandit-optimization

    Conference

    Conference: Neural Information Processing Systems (NIPS 2011)
    Period: 1/01/11 → …
    Other: December 13-15, 2011
