TY - JOUR
T1 - Cauchy robust principal component analysis with applications to high-dimensional data sets
AU - Fayomi, Aisha
AU - Pantazis, Yannis
AU - Tsagris, Michail
AU - Wood, Andrew T.A.
N1 - Publisher Copyright: © 2023, The Author(s).
PY - 2024/2
Y1 - 2024/2
AB - Principal component analysis (PCA) is a standard dimensionality reduction technique used in various research and applied fields. From an algorithmic point of view, classical PCA can be formulated in terms of operations on a multivariate Gaussian likelihood. As a consequence of the implied Gaussian formulation, the principal components are not robust to outliers. In this paper, we propose a modified formulation, based on the use of a multivariate Cauchy likelihood instead of the Gaussian likelihood, which has the effect of robustifying the principal components. We present an algorithm to compute these robustified principal components. We additionally derive the relevant influence function of the first component and examine its theoretical properties. Simulation experiments on high-dimensional datasets demonstrate that the estimated principal components based on the Cauchy likelihood typically outperform, or are on a par with, existing robust PCA techniques. Moreover, the Cauchy PCA algorithm we have used has much lower computational cost in very high dimensional settings than the other public domain robust PCA methods we consider.
KW - Cauchy log-likelihood
KW - High-dimensional data
KW - Principal component analysis
KW - Robust
UR - http://www.scopus.com/inward/record.url?scp=85175739797&partnerID=8YFLogxK
U2 - 10.1007/s11222-023-10328-x
DO - 10.1007/s11222-023-10328-x
M3 - Article
SN - 0960-3174
VL - 34
JO - Statistics and Computing
JF - Statistics and Computing
IS - 1
M1 - 26
ER -