Integrative exploration of large high-dimensional datasets

Christopher Pardy, Sally Galbraith, Susan R. Wilson

    Research output: Contribution to journal > Article > peer-review

    7 Citations (Scopus)

    Abstract

    Large, high-dimensional datasets containing different types of variables are becoming increasingly common, and exploring such data calls for integrated methods. For example, a single genomic experiment can contain large quantities of different types of data (including clinical data), which makes it a challenge to describe coherently the patterns of variability within and between the inter-related datasets. Mutual information (MI) is a widely used information-theoretic dependency measure that can also identify nonlinear and nonmonotonic associations. First, we develop a computationally efficient implementation of MI between a discrete and a continuous variable. This implementation allows us to apply a coherent approach to all comparisons arising from continuous and categorical data. As commonly applied, MI can have high levels of bias. We therefore present a novel development of MI that reduces this bias, which we term bias-corrected mutual information (BCMI). Further, BCMI is useful as an association measure that can be incorporated in subsequent analyses such as clustering and visualisation procedures. To demonstrate our approach, we re-examine a genomic dataset. This dataset contains single nucleotide polymorphisms (SNPs, a discrete variable), gene expression levels and clinical data (all continuous variables). Our approach allows us to integrate these different types of data by exploring associations both within and between these types of variables.
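As a rough illustration of the kind of quantity the abstract describes, the sketch below estimates MI between a discrete and a continuous variable and then subtracts an estimate of the bias. It is not the authors' BCMI: it assumes a simple plug-in estimator on equal-frequency bins, and it swaps in a permutation-based bias correction as a stand-in. The function names `plugin_mi` and `permutation_corrected_mi`, and the parameters `n_bins`, `n_perm` and `seed`, are hypothetical choices made for this example.

```python
import numpy as np


def plugin_mi(labels, values, n_bins=8):
    """Plug-in MI estimate (in nats) between a discrete and a continuous variable.

    Assumption: the continuous variable is discretised into equal-frequency
    bins. This is a simple stand-in, not the estimator used in the paper.
    """
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float)
    # Interior quantiles serve as bin edges, so bins hold roughly equal counts.
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    binned = np.searchsorted(edges, values, side="right")  # integers 0..n_bins-1
    _, label_idx = np.unique(labels, return_inverse=True)
    # Joint contingency table of (label, bin) counts, normalised to probabilities.
    joint = np.zeros((label_idx.max() + 1, n_bins))
    np.add.at(joint, (label_idx, binned), 1.0)
    joint /= joint.sum()
    p_label = joint.sum(axis=1, keepdims=True)
    p_bin = joint.sum(axis=0, keepdims=True)
    nonzero = joint > 0
    ratio = joint[nonzero] / (p_label @ p_bin)[nonzero]
    return float(np.sum(joint[nonzero] * np.log(ratio)))


def permutation_corrected_mi(labels, values, n_bins=8, n_perm=200, seed=0):
    """Observed plug-in MI minus its mean under random label permutations.

    Permuting the labels breaks any association, so the mean permuted MI
    approximates the upward bias of the plug-in estimate.
    """
    rng = np.random.default_rng(seed)
    observed = plugin_mi(labels, values, n_bins)
    null = [plugin_mi(rng.permutation(labels), values, n_bins)
            for _ in range(n_perm)]
    return observed - float(np.mean(null))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    genotype = rng.integers(0, 3, size=500)  # simulated SNP coded 0/1/2
    # Nonmonotonic dependence: heterozygotes differ from both homozygotes.
    expression = (genotype - 1) ** 2 + 0.3 * rng.normal(size=500)
    unrelated = rng.normal(size=500)
    print("dependent:  ", permutation_corrected_mi(genotype, expression))
    print("independent:", permutation_corrected_mi(genotype, unrelated))
```

In this simulated example the association is nonmonotonic, so a rank correlation would be close to zero, while the corrected MI should be clearly positive for the dependent pair and close to zero for the independent pair.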

    Original language: English
    Pages (from-to): 178-199
    Number of pages: 22
    Journal: Annals of Applied Statistics
    Volume: 12
    Issue number: 1
    Publication status: Published - Mar 2018
