Controlling false discovery rates in RNA-sequencing data

Conrad Burden, Sumaira Qureshi, Susan Wilson

    Research output: Contribution to journalMeeting Abstract

    Abstract

    High throughput sequencing technologies are supplanting microarrays as the preferred technology for detecting and quantifying differential gene expression. The raw data produced by the a technique known as RNA-sequencing (RNA-seq), consists of integer counts of reverse transcribed cDNA fragment reads mapped onto each gene or transcript isoform in a reference genome or transcriptome. Many software packages exist for analysing RNA-seq datasets consisting of tables of mapped read counts from biological or technical replicate experiments under two or more conditions, the purpose being to detect which genes are differentially expressed between conditions. Two state-of-the-art packages, DESeq and edgeR, are based on a negative binomial model of read counts. Our tests with simulated data constructed according to the statistical model assumed by these packages reveal that both packages generate a non-uniform p-value spectrum from -hypothesis data. We demo! nstrate how specific knowledge of the non-uniformity can be exploited to develop a graphical technique based on the Storey-Tibshirani method for improving estimates of p-values and false discovery rates in databases where differential expression is present. We have developed an add-on package for DESeq and edgeR, called Polyfit, which implements this method, and evaluate its performance against DESeq, edgeR and another recently introduced package, PoissonSeq, using simulated data.
    Original languageEnglish
    Pages (from-to)1339-1344pp
    JournalProceedings of the 2013 World Statistics Conference
    Publication statusPublished - 2013
    EventISI World Statistics Conference - Hong Kong, Hong Kong
    Duration: 1 Jan 2013 → …
    http://www.isi-web.org, 2013.isiproceedings.org

    Fingerprint

    Dive into the research topics of 'Controlling false discovery rates in RNA-sequencing data'. Together they form a unique fingerprint.

    Cite this