Alignment in a SNAP: Cancer Diagnosis in the Genomic Age
Matei Zaharia, Bill Bolosky, Kristal Curtis, David Patterson, Armando Fox, Scott Shenker, Ion Stoica, Taylor Sittler. UCSF, San Francisco, CA; UC Berkeley, Berkeley, CA; Microsoft, Redmond, WA
Background: As the cost of DNA sequencing continues to drop at a pace exceeding that of Moore's Law, there is a growing need for tools that can efficiently analyze ever-larger bodies of sequence data. It is estimated that the $1000 genome will be reached by mid-2013. At that price, sequencing a person's genome will enter the realm of routine clinical practice, and each cancer patient is expected to have both their own genome and their cancer's genome sequenced. Assembling and interpreting this information from the massive numbers of short reads generated by current sequencing machines requires significant technological advancement. Here, we address the first step in the interpretation of a cancer genome from raw sequence information: sequence alignment.
Design: We tested SNAP (Scalable Nucleotide Alignment Package) against the most popular short-read aligners, including BWA, Bowtie, and SOAP. Trials used reads generated from the hg19 build of the human genome with simulated mutations, insertions, and deletions. Additional trials demonstrating superior performance against longer reads and actual whole-genome sequencing data sets will be presented at the conference.
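The read-generation step can be illustrated with a minimal sketch. It is not the simulator used in these trials: the function name `simulate_read`, the per-base error model, and the `sub_rate` and `indel_rate` parameters are all assumptions made for illustration. The sketch samples a window from a reference string and applies random substitutions, insertions, and deletions while recording the true start position for later scoring.

```python
import random

BASES = "ACGT"

def simulate_read(reference, read_len, sub_rate, indel_rate, rng):
    """Sample one read from `reference` and apply simulated edits.

    Hypothetical helper: the error model (independent per-base events,
    indels split evenly between insertions and deletions) is an assumption.
    Returns (true_start, read). The read may be shorter than `read_len`
    if deletions push the window past the end of the reference.
    """
    start = rng.randrange(len(reference) - read_len + 1)
    out = []
    pos = start
    while len(out) < read_len and pos < len(reference):
        r = rng.random()
        if r < indel_rate / 2:
            # Insertion: emit a random base without consuming the reference.
            out.append(rng.choice(BASES))
        elif r < indel_rate:
            # Deletion: skip one reference base.
            pos += 1
        elif r < indel_rate + sub_rate:
            # Substitution: emit a base that differs from the reference.
            out.append(rng.choice([b for b in BASES if b != reference[pos]]))
            pos += 1
        else:
            # Match: copy the reference base unchanged.
            out.append(reference[pos])
            pos += 1
    return start, "".join(out)

if __name__ == "__main__":
    rng = random.Random(42)
    # Toy random reference standing in for a chromosome from hg19.
    reference = "".join(rng.choice(BASES) for _ in range(10_000))
    start, read = simulate_read(reference, 100, sub_rate=0.02, indel_rate=0.005, rng=rng)
    print(start, read)
```

Keeping the true start position alongside each read is what makes it possible to score an aligner's output afterwards.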
Results: SNAP significantly outperforms existing aligners in terms of speed while achieving higher accuracy.
| Aligner | Seconds per Million Reads | Accuracy (%) | False Positive (%) |
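The abstract does not define how the three reported columns are computed, so the following is only a sketch under stated assumptions: accuracy is taken as the percentage of all reads aligned to their true position, false positive as the percentage of all reads aligned to a wrong position, and unaligned reads count toward neither. The function `summarize` and its `tolerance` parameter are hypothetical names introduced for illustration.

```python
def summarize(truth, aligned, elapsed_seconds, tolerance=0):
    """Compute the three table metrics for one aligner run.

    truth:   dict of read id -> true start position (from the simulator)
    aligned: dict of read id -> reported start position; reads the aligner
             left unaligned are simply absent
    The metric definitions here are assumptions, not taken from the abstract.
    """
    total = len(truth)
    # A read is correct if its reported position is within `tolerance` bases.
    correct = sum(
        1 for read_id, pos in aligned.items()
        if abs(pos - truth[read_id]) <= tolerance
    )
    wrong = len(aligned) - correct
    return {
        "seconds_per_million_reads": elapsed_seconds / total * 1_000_000,
        "accuracy_pct": 100.0 * correct / total,
        "false_positive_pct": 100.0 * wrong / total,
    }

if __name__ == "__main__":
    truth = {"r1": 10, "r2": 20, "r3": 30, "r4": 40}
    aligned = {"r1": 10, "r2": 25, "r3": 30}  # r4 was left unaligned
    print(summarize(truth, aligned, elapsed_seconds=2.0))
```

A nonzero `tolerance` is one way to avoid penalizing reads whose reported start shifts by a few bases because of an indel near the read's edge.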