Visualisation of "high p, small n" data

Y. E. Pittelkow, S. R. Wilson*

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Development of methods for visualisation of high-dimensional data where the number of observations, n, is small compared to the number of variables, p, is of increasing importance. One major application is the burgeoning field of microarray (gene expression) experiments. Because of their high cost, the number of chips (n) is O(10 - 10^2) while the number (p) of genes (including expressed sequence tags) on each chip is O(10^3 - 10^4). Based on synthetic data simulated in accord with current biological interpretation of microarray data, we have adapted the biplot that simultaneously plots the genes and the chips to display relevant experimental information. Other ordination techniques are also useful for visually exploring microarray data. The biological information that can be revealed by applying these exploratory, visual techniques is illustrated using data from gene expression experiments. When ordination methods, or dimension reduction methods such as PCA and its many variants, are used, in association with gene selection methods, it is well known that "selection bias" can result. We show an application of bootstrap methodology to ordination methods that can be used to account for this bias. Such methods are invaluable when visualization methods are used for pattern recognition, such as when identifying previously unknown sub-classes of tumours in molecular classification.
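
    As a rough illustration of the two ideas named in the abstract, the sketch below computes standard PCA biplot coordinates from a singular value decomposition and repeats a gene selection step inside each bootstrap resample of the chips. It is not the authors' adapted biplot or their bootstrap procedure, neither of which is specified in this record. The function names, the variance-based selection rule, and the parameters n_keep and n_boot are all assumptions made for illustration.

```python
import numpy as np


def biplot_coordinates(X, k=2):
    """Return chip markers (n x k) and gene markers (p x k) for a PCA biplot.

    X is an n x p array: rows are chips, columns are genes.
    """
    Xc = X - X.mean(axis=0)                       # centre each gene across chips
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    chips = U[:, :k] * s[:k]                      # row markers (component scores)
    genes = Vt[:k].T                              # column markers (loadings)
    return chips, genes


def bootstrap_selection_frequency(X, n_keep=50, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each gene is selected.

    Assumed selection rule: keep the n_keep genes with the largest variance.
    The selection is redone inside every resample, so the frequencies reflect
    how unstable the selected gene set is rather than a single optimistic pick.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)         # resample chips with replacement
        variances = X[rows].var(axis=0)
        top = np.argsort(variances)[-n_keep:]     # indices of the selected genes
        counts[top] += 1
    return counts / n_boot
```

    Plotting the rows of chips and genes on the same axes gives the biplot. Genes with a low selection frequency are those whose apparent importance depends most on the particular set of chips observed.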

    Original language: English
    Pages (from-to): 533-541
    Number of pages: 9
    Journal: Computational Statistics
    Volume: 22
    Issue number: 4
    Publication status: Published - Dec 2007
