TY - JOUR
T1 - Gaussian Asymptotic Limits for the α-transformation in the Analysis of Compositional Data
AU - Pantazis, Yannis
AU - Tsagris, Michail
AU - Wood, Andrew T.A.
N1 - Publisher Copyright: © 2019, Indian Statistical Institute.
PY - 2019/2
Y1 - 2019/2
N2 - Compositional data consists of vectors of proportions whose components sum to 1. Such vectors lie in the standard simplex, which is a manifold with boundary. One issue that has been rather controversial within the field of compositional data analysis is the choice of metric on the simplex. One popular possibility has been to use the metric implied by log-transforming the data, as proposed by Aitchison (Biometrika 70, 57–65, 1983, 1986) and another popular approach has been to use the standard Euclidean metric inherited from the ambient space. Tsagris et al. (2011) proposed a one-parameter family of power transformations, the α-transformations, which include both the metric implied by Aitchison’s transformation and the Euclidean metric as particular cases. Our underlying philosophy is that, with many datasets, it may make sense to use the data to help us determine a suitable metric. A related possibility is to apply the α-transformations to a parametric family of distributions, and then estimate α along with the other parameters. However, as we shall see, when one follows this last approach with the Dirichlet family, some care is needed in a certain limiting case which arises (α → 0), as we found out when fitting this model to real and simulated data. Specifically, when the maximum likelihood estimator of α is close to 0, the other parameters tend to be large. The main purpose of the paper is to study this limiting case both theoretically and numerically and to provide insight into these numerical findings.
AB - Compositional data consists of vectors of proportions whose components sum to 1. Such vectors lie in the standard simplex, which is a manifold with boundary. One issue that has been rather controversial within the field of compositional data analysis is the choice of metric on the simplex. One popular possibility has been to use the metric implied by log-transforming the data, as proposed by Aitchison (Biometrika 70, 57–65, 1983, 1986) and another popular approach has been to use the standard Euclidean metric inherited from the ambient space. Tsagris et al. (2011) proposed a one-parameter family of power transformations, the α-transformations, which include both the metric implied by Aitchison’s transformation and the Euclidean metric as particular cases. Our underlying philosophy is that, with many datasets, it may make sense to use the data to help us determine a suitable metric. A related possibility is to apply the α-transformations to a parametric family of distributions, and then estimate α along with the other parameters. However, as we shall see, when one follows this last approach with the Dirichlet family, some care is needed in a certain limiting case which arises (α → 0), as we found out when fitting this model to real and simulated data. Specifically, when the maximum likelihood estimator of α is close to 0, the other parameters tend to be large. The main purpose of the paper is to study this limiting case both theoretically and numerically and to provide insight into these numerical findings.
KW - Dirichlet distribution
KW - Log-ratio transformation
KW - Manifold
KW - Metric
KW - Power transformation
KW - Primary 62E20
KW - Secondary 62H12
UR - http://www.scopus.com/inward/record.url?scp=85061473915&partnerID=8YFLogxK
U2 - 10.1007/s13171-018-00160-1
DO - 10.1007/s13171-018-00160-1
M3 - Article
SN - 0976-836X
VL - 81
SP - 63
EP - 82
JO - Sankhya A
JF - Sankhya A
IS - 1
ER -