TY - JOUR
T1 - Improved Classification for Compositional Data Using the α-transformation
AU - Tsagris, Michail
AU - Preston, Simon
AU - Wood, Andrew T.A.
N1 - Publisher Copyright: © 2016, Classification Society of North America.
PY - 2016/7/1
Y1 - 2016/7/1
AB - In compositional data analysis, an observation is a vector containing nonnegative values, only the relative sizes of which are considered to be of interest. Without loss of generality, a compositional vector can be taken to be a vector of proportions that sum to one. Data of this type arise in many areas including geology, archaeology, biology, economics and political science. In this paper we investigate methods for classification of compositional data. Our approach centers on the idea of using the α-transformation to transform the data and then to classify the transformed data via regularized discriminant analysis and the k-nearest neighbors algorithm. Using the α-transformation generalizes two rival approaches in compositional data analysis, one (when α = 1) that treats the data as though they were Euclidean, ignoring the compositional constraint, and another (when α = 0) that employs Aitchison’s centered log-ratio transformation. A numerical study with several real datasets shows that whether using α = 1 or α = 0 gives better classification performance depends on the dataset, and moreover that using an intermediate value of α can sometimes give better performance than using either 1 or 0.
KW - Classification
KW - Compositional data
KW - Jensen-Shannon divergence
KW - α-metric
KW - α-transformation
UR - http://www.scopus.com/inward/record.url?scp=84982803322&partnerID=8YFLogxK
U2 - 10.1007/s00357-016-9207-5
DO - 10.1007/s00357-016-9207-5
M3 - Article
SN - 0176-4268
VL - 33
SP - 243
EP - 261
JO - Journal of Classification
JF - Journal of Classification
IS - 2
ER -