TY - JOUR
T1 - Score Matching for Compositional Distributions
AU - Scealy, Janice L.
AU - Wood, Andrew T.A.
N1 - Publisher Copyright: © 2022 American Statistical Association.
PY - 2022/3/3
Y1 - 2022/3/3
AB - Compositional data are challenging to analyse due to the non-negativity and sum-to-one constraints on the sample space. With real data, it is often the case that many of the compositional components are highly right-skewed, with large numbers of zeros. Major limitations of currently available models for compositional data include one or more of the following: insufficient flexibility in terms of distributional shape; difficulty in accommodating zeros in the data in estimation; and lack of computational viability in moderate to high dimensions. In this article, we propose a new model, the polynomially tilted pairwise interaction (PPI) model, for analysing compositional data. Maximum likelihood estimation is difficult for the PPI model. Instead, we propose novel score matching estimators, which entails extending the score matching approach to Riemannian manifolds with boundary. These new estimators are available in closed form and simulation studies show that they perform well in practice. As our main application, we analyse real microbiome count data with fixed totals using a multinomial latent variable model with a PPI model for the latent variable distribution. We prove that, under certain conditions, the new score matching estimators are consistent for the parameters in the new multinomial latent variable model.
KW - Dirichlet distribution
KW - Latent variables
KW - Microbiome data
KW - Multinomial distribution
KW - Parameter estimation
KW - Zeros
UR - http://www.scopus.com/inward/record.url?scp=85126025323&partnerID=8YFLogxK
U2 - 10.1080/01621459.2021.2016422
DO - 10.1080/01621459.2021.2016422
M3 - Article
SN - 0162-1459
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
ER -