Pathogen Discovery in Hematological Neoplasms and Inflammatory Diseases by High Throughput Sequencing of Human Tissues.
Akinyemi I Ojesina, Aleksandar Kostic, Joonil Jung, Chandra Pedamallu, Gad Getz, Jaroslaw Maciejewski, Margaret Shipp, Ethel Cesarman, Jon Aster, Matthew Meyerson. Dana-Farber Cancer Institute, Boston, MA; Broad Institute of MIT and Harvard, Cambridge, MA; Harvard Medical School, Boston, MA; Cleveland Clinic, OH; Weill Cornell Medical College, New York, NY; Brigham and Women's Hospital, Boston
Background: Many diseases are thought to be caused by pathogens. However, many of these pathogens remain unknown, possibly because they are present at very low frequencies in the diseased tissues. The advent of high throughput sequencing provides a unique opportunity to detect such low-abundance pathogens. Our work is based on the premise that tissues from pathogen-driven diseases should contain both human and pathogen nucleic acids. Therefore, high throughput sequencing, followed by computational subtraction of human sequences, should enrich for candidate pathogen sequences.
Design: We generated cDNA libraries from primary tissues across several hematological cancers and inflammatory diseases. These libraries were subjected to high throughput Illumina sequencing to generate 30-60 million 76 bp paired-end sequence reads per sample. Quality-filtered reads were analyzed using our automated pipeline, PathSeq, which carries out several subtraction steps involving alignments to i) human genome sequence databases; ii) human transcriptome sequence databases; and iii) other vertebrate sequence databases. Residual sequence reads were then compared with microbial databases, either individually or as part of de novo assembled contigs.
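The subtraction-then-classification flow described above can be illustrated with a minimal sketch. This is not the PathSeq implementation: it assumes a toy k-mer overlap test in place of real sequence alignment, and the names `subtract`, `classify`, `k`, and `min_fraction` are hypothetical. It only shows the order of operations: reads are filtered against each host database in turn, and whatever remains is compared with microbial references. De novo assembly of residual reads is not covered.

```python
from typing import Dict, Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Iterable[str], k: int) -> Set[str]:
    """Collect every k-mer found in a group of reference sequences."""
    index: Set[str] = set()
    for ref in references:
        index |= kmers(ref, k)
    return index


def matches(read: str, index: Set[str], k: int, min_fraction: float = 0.5) -> bool:
    """Toy stand-in for alignment: True if enough of the read's k-mers are in the index."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return False  # read shorter than k: treated as unmatched
    shared = sum(1 for km in read_kmers if km in index)
    return shared / len(read_kmers) >= min_fraction


def subtract(reads: Iterable[str], host_databases: List[List[str]], k: int = 8) -> List[str]:
    """Remove reads matching any host database, applying the databases in order."""
    residual = list(reads)
    for database in host_databases:
        index = build_index(database, k)
        residual = [r for r in residual if not matches(r, index, k)]
    return residual


def classify(residual: Iterable[str], microbial_db: Dict[str, str], k: int = 8) -> Dict[str, int]:
    """Count residual reads matching each named microbial reference."""
    residual = list(residual)
    hits: Dict[str, int] = {}
    for name, ref in microbial_db.items():
        index = build_index([ref], k)
        count = sum(1 for r in residual if matches(r, index, k))
        if count:
            hits[name] = count
    return hits
```

In this sketch, `host_databases` would be passed in the order given in the text (human genome, human transcriptome, other vertebrates), so that each step only has to examine reads that survived the previous one.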
Results: Using both frozen and formalin-fixed paraffin-embedded (FFPE) tissues, we have identified unique pathogenic sequences previously unassociated with hematological cancers and inflammatory diseases. In addition, our pathogen discovery pipeline works with both transcriptome and whole genome sequencing (WGS) data, and it is applicable to data across all high throughput sequencing platforms. Most notably, we are able to detect as few as 1 viral sequence per billion total sequences in WGS data, which demonstrates the sensitivity of our method. Furthermore, we are implementing a cloud computing version of the pipeline for public use by the general scientific community.
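To give a sense of scale for an abundance of 1 viral sequence per billion total sequences, the following sketch works through simple sampling arithmetic. It is an illustration only, not part of the authors' analysis: it assumes reads are drawn independently, so the chance of observing at least one viral read among N reads at fraction f is 1 - (1 - f)^N. The function names are hypothetical.

```python
import math


def prob_at_least_one(fraction: float, total_reads: int) -> float:
    """Probability of sampling >= 1 target read, assuming independent draws.

    Computes 1 - (1 - fraction) ** total_reads in a numerically stable way.
    Requires 0 < fraction < 1.
    """
    return -math.expm1(total_reads * math.log1p(-fraction))


def reads_needed(fraction: float, target_prob: float) -> int:
    """Smallest read count giving at least target_prob of seeing one target read.

    Requires 0 < fraction < 1 and 0 < target_prob < 1.
    """
    return math.ceil(math.log1p(-target_prob) / math.log1p(-fraction))
```

Under this assumption, `prob_at_least_one(1e-9, 10**9)` is about 0.63, and `reads_needed(1e-9, 0.95)` is roughly three billion reads, which indicates why sensitivity at this level is tied to very deep WGS data.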
Conclusions: We have developed an integrated pipeline, PathSeq, for pathogen discovery in both frozen and FFPE tissues using a high throughput sequencing-based computational subtraction process. This presentation will include highlights of our pipeline and the results of our PathSeq analyses in hematological cancers and inflammatory diseases.
Category: Special Category - Pan-genomic/Pan-proteomic approaches to Cancer
Monday, February 28, 2011 9:30 AM
Poster Session I Stowell-Orbison/Surgical Pathology/Autopsy Awards Poster Session # 242, Monday Morning