Abstract
As the cost eciency of the next generation DNA sequencing technology keeps improving,
there is an ever-increasing demand for high-throughput software to align the enormous number of short reads (patterns) with reference genomes (such as the human genome). In the past few years, a number of very fast alignment software (e.g., Maq, SOAP2, ZOOM, Bowtie, BWA) have been developed; most of them are exploiting some kind of compressed index (like BWT) of the reference sequence. These tools can align at a speed of a few hundred seconds per one million reads. Yet such speed is still behind the throughput of the next generation sequencing machines. In this paper we show the rst implementation of a compressed index on the GPU. The technical issues include how to reduce the memory accesses to the index from individual threads (cores) and how to control the branching and divergence of the threads to avoid unnecessary idle time. Based on this new index, we are able to exploit the parallelism of GPU to speed up the alignment of short reads. Our experiments show that with respect to the human genome, the GPU takes only a few seconds to perform exact alignment of one million length-100 reads, and tens of seconds when a few mismatches are required. Further improvement has been obtained by utilizing the host CPU to share the load of the GPU concurrently.
there is an ever-increasing demand for high-throughput software to align the enormous number of short reads (patterns) with reference genomes (such as the human genome). In the past few years, a number of very fast alignment software (e.g., Maq, SOAP2, ZOOM, Bowtie, BWA) have been developed; most of them are exploiting some kind of compressed index (like BWT) of the reference sequence. These tools can align at a speed of a few hundred seconds per one million reads. Yet such speed is still behind the throughput of the next generation sequencing machines. In this paper we show the rst implementation of a compressed index on the GPU. The technical issues include how to reduce the memory accesses to the index from individual threads (cores) and how to control the branching and divergence of the threads to avoid unnecessary idle time. Based on this new index, we are able to exploit the parallelism of GPU to speed up the alignment of short reads. Our experiments show that with respect to the human genome, the GPU takes only a few seconds to perform exact alignment of one million length-100 reads, and tens of seconds when a few mismatches are required. Further improvement has been obtained by utilizing the host CPU to share the load of the GPU concurrently.
Original language | English |
---|---|
Publication status | Published - 2011 |
Externally published | Yes |
Event | Third Workshop on Massive Data Algorithmics - Paris, Paris, France Duration: 16 Jun 2011 → 16 Jun 2011 https://cs.au.dk/~gerth/madalgo/old-madalgo.au.dk/html_sider/2_5_Events/MASSIVE_2011/Call_For_Papers_2011.html |
Conference
Conference | Third Workshop on Massive Data Algorithmics |
---|---|
Country/Territory | France |
City | Paris |
Period | 16/06/11 → 16/06/11 |
Internet address |