Robust PCA for high-dimensional data based on characteristic transformation

Lingyu He, Yanrong Yang, Bo Zhang*

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    1 Citation (Scopus)

    Abstract

    In this paper, we propose a novel robust principal component analysis (PCA) for high-dimensional data in the presence of various heterogeneities, in particular heavy tails and outliers. A transformation motivated by the characteristic function is constructed to improve the robustness of classical PCA. The suggested method has the distinct advantage of handling heavy-tail-distributed data, whose covariances may not exist (may be infinite, for instance), in addition to the usual outliers. The proposed approach is also a case of kernel principal component analysis (KPCA) and gains its robustness and non-linearity from a bounded, non-linear kernel function. The merits of the new method are illustrated by several statistical properties, including an upper bound on the excess error and the behaviour of the large eigenvalues under a spiked covariance model. Additionally, through a variety of simulations, we demonstrate the advantages of our approach over classical PCA. Finally, using protein-expression data from mice of various genotypes in a biological study, we apply the novel robust PCA to classify the mice and find that our approach is more effective at identifying abnormal mice than classical PCA.
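    The core idea can be sketched in a few lines. As a hedged illustration only (the paper's actual transformation, kernel, and estimator may differ), suppose each coordinate x is mapped to the bounded pair (cos(tx), sin(tx)) for some scale t, mimicking the real and imaginary parts of the characteristic function; classical PCA is then run on the transformed data. Because the transformed values lie in [-1, 1], the sample covariance always exists, even for heavy-tailed inputs such as Cauchy data. The function names and the choice t = 1.0 below are assumptions for illustration, not the authors' specification.

    ```python
    import numpy as np

    def characteristic_transform(X, t=1.0):
        """Map each coordinate x to (cos(t*x), sin(t*x)).

        The image is bounded in [-1, 1], so second moments of the
        transformed data always exist, even when the raw data are
        heavy-tailed with infinite variance. (Illustrative sketch,
        not the paper's exact transformation.)
        """
        return np.hstack([np.cos(t * X), np.sin(t * X)])

    def robust_pca_scores(X, n_components=2, t=1.0):
        """Classical PCA applied to the characteristic-transformed data."""
        Z = characteristic_transform(X, t)
        Zc = Z - Z.mean(axis=0)                  # centre in transformed space
        cov = Zc.T @ Zc / (len(Zc) - 1)          # bounded sample covariance
        vals, vecs = np.linalg.eigh(cov)         # eigendecomposition
        order = np.argsort(vals)[::-1][:n_components]
        return Zc @ vecs[:, order]               # principal component scores

    # Heavy-tailed data: Cauchy samples, whose covariance does not exist
    rng = np.random.default_rng(0)
    X = rng.standard_cauchy((200, 5))
    scores = robust_pca_scores(X, n_components=2)
    print(scores.shape)  # (200, 2)
    ```

    Unlike PCA on the raw Cauchy data, whose sample covariance is dominated by a few extreme observations, the transformed scores remain finite and stable, which is the robustness property the abstract describes.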

    Original language: English
    Article number: anzs.12385
    Pages (from-to): 127-151
    Number of pages: 25
    Journal: Australian and New Zealand Journal of Statistics
    Volume: 65
    Issue number: 2
    DOIs
    Publication status: Published - Jun 2023
