Score Matching for Compositional Distributions

Janice L. Scealy*, Andrew T.A. Wood

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    4 Citations (Scopus)

    Abstract

    Compositional data are challenging to analyse due to the non-negativity and sum-to-one constraints on the sample space. With real data, it is often the case that many of the compositional components are highly right-skewed, with large numbers of zeros. Major limitations of currently available models for compositional data include one or more of the following: insufficient flexibility in terms of distributional shape; difficulty in accommodating zeros in the data in estimation; and lack of computational viability in moderate to high dimensions. In this article, we propose a new model, the polynomially tilted pairwise interaction (PPI) model, for analysing compositional data. Maximum likelihood estimation is difficult for the PPI model. Instead, we propose novel score matching estimators, which entails extending the score matching approach to Riemannian manifolds with boundary. These new estimators are available in closed form and simulation studies show that they perform well in practice. As our main application, we analyse real microbiome count data with fixed totals using a multinomial latent variable model with a PPI model for the latent variable distribution. We prove that, under certain conditions, the new score matching estimators are consistent for the parameters in the new multinomial latent variable model.

    Original language: English
    Number of pages: 13
    Journal: Journal of the American Statistical Association
    Publication status: Published - 3 Mar 2022
