Complementary hierarchical clustering

Gen Nowak*, Robert Tibshirani

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

22 Citations (Scopus)

Abstract

When applying hierarchical clustering algorithms to cluster patient samples from microarray data, the clustering patterns generated by most algorithms tend to be dominated by groups of highly differentially expressed genes that have closely related expression patterns. Sometimes, these genes may not be relevant to the biological process under study or their functions may already be known. The problem is that these genes can potentially drown out the effects of other genes that are relevant or have novel functions. We propose a procedure called complementary hierarchical clustering that is designed to uncover the structures arising from these novel genes that are not as highly expressed. Simulation studies show that the procedure is effective when applied to a variety of examples. We also define a concept called relative gene importance that can be used to identify the influential genes in a given clustering. Finally, we analyze a microarray data set from 295 breast cancer patients, using clustering with the correlation-based distance measure. The complementary clustering reveals a grouping of the patients which is uncorrelated with a number of known prognostic signatures and significantly differing distant metastasis-free probabilities.

Original languageEnglish
Pages (from-to)467-483
Number of pages17
JournalBiostatistics
Volume9
Issue number3
DOIs
Publication statusPublished - Jul 2008
Externally publishedYes

Fingerprint

Dive into the research topics of 'Complementary hierarchical clustering'. Together they form a unique fingerprint.

Cite this