TY - JOUR
T1 - Nonparametric Bayesian topic modelling with the hierarchical Pitman–Yor processes
AU - Lim, Kar Wai
AU - Buntine, Wray
AU - Chen, Changyou
AU - Du, Lan
N1 - Publisher Copyright: © 2016 Elsevier Inc.
PY - 2016/11/1
Y1 - 2016/11/1
AB - The Dirichlet process and its extension, the Pitman–Yor process, are stochastic processes that take probability distributions as a parameter. These processes can be stacked up to form a hierarchical nonparametric Bayesian model. In this article, we present efficient methods for the use of these processes in this hierarchical context, and apply them to latent variable models for text analytics. In particular, we propose a general framework for designing these Bayesian models, which are called topic models in the computer science community. We then propose a specific nonparametric Bayesian topic model for modelling text from social media. We focus on tweets (posts on Twitter) in this article due to their ease of access. We find that our nonparametric model performs better than existing parametric models in both goodness of fit and real world applications.
KW - Bayesian nonparametric methods
KW - Hierarchical Pitman–Yor processes
KW - Markov chain Monte Carlo
KW - Topic models
UR - http://www.scopus.com/inward/record.url?scp=84979608871&partnerID=8YFLogxK
U2 - 10.1016/j.ijar.2016.07.007
DO - 10.1016/j.ijar.2016.07.007
M3 - Article
SN - 0888-613X
VL - 78
SP - 172
EP - 191
JO - International Journal of Approximate Reasoning
JF - International Journal of Approximate Reasoning
ER -