Experiments with non-parametric topic models

Wray L. Buntine, Swapnil Mishra

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    45 Citations (Scopus)

    Abstract

    In topic modelling, various alternative priors have been developed, for instance asymmetric and symmetric priors for the document-topic and topic-word matrices respectively, the hierarchical Dirichlet process prior for the document-topic matrix and the hierarchical Pitman-Yor process prior for the topic-word matrix. For information retrieval, language models exhibiting word burstiness are important. Indeed, this burstiness effect has been shown to help topic models as well, and this requires additional word probability vectors for each document. Here we show how to combine these ideas to develop high-performing non-parametric topic models exhibiting burstiness based on standard Gibbs sampling. Experiments are done to explore the behaviour of the models under different conditions and to compare the algorithms with those previously published. The full non-parametric topic models with burstiness are only a small factor slower than standard Gibbs sampling for LDA and require double the memory, making them very competitive. We look at the comparative behaviour of different models and present some experimental insights.

    Original language: English
    Title of host publication: KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    Publisher: Association for Computing Machinery (ACM)
    Pages: 881-890
    Number of pages: 10
    ISBN (Print): 9781450329569
    Publication status: Published - 2014
    Event: 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014 - New York, NY, United States
    Duration: 24 Aug 2014 to 27 Aug 2014

    Publication series

    Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

    Conference

    Conference: 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014
    Country/Territory: United States
    City: New York, NY
    Period: 24/08/14 to 27/08/14
