Enabling Translational Research by Integrating Molecular Pathology Data with Tumor Annotation Data for Research in Head and Neck Cancers
Harpreet Singh, Waqas Amin, Ann M Egloff, Jennifer R Hetrick, Jennifer Grandis, Anil V Parwani. University of Pittsburgh, Pittsburgh, PA; University of Pittsburgh Medical Center, Pittsburgh, PA
Background: The SPORE Head and Neck Tumor Database is a bioinformatics-supported system that incorporates demographic, clinical, pathological, and molecular data into a single architecture, organized around a set of common data elements (CDEs), in order to expedite head and neck cancer research. The database is built to provide semantic and syntactic interoperability of data sets and to make the data flexible, shareable, and understandable across multiple systems and end users.
Design: The database model provides a web-based data annotation and query engine based on common data elements (CDEs) drawn from the College of American Pathologists (CAP) Checklist and North American Association of Central Cancer Registries (NAACCR) standards. The system uses a three-tiered architecture, implemented on Oracle Application Server v10.1.2.3 running on Windows 2007 and Oracle RDBMS v11.1.0 running on Community Enterprise Operating System (CentOS 5.3) virtual hosts, which are supported by IBM cluster hardware. The data annotation engine is a flexible, dynamic web-based tool, while the data query engine lets investigators search de-identified information within the warehouse through a customizable interface.
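The annotation and query design described above can be illustrated with a minimal sketch. It is not the system's actual implementation, which runs on Oracle: the CDE names (`patient_name`, `primary_site`, `stage`), the `Case` class, and the `deidentify` and `query` helpers are all hypothetical, chosen only to show how CDE-keyed annotations might be filtered and returned without identifying fields.

```python
from dataclasses import dataclass, field

# Hypothetical identifying CDEs; the real CAP/NAACCR-derived element set
# is not listed in the abstract.
IDENTIFYING_CDES = {"patient_name", "medical_record_number"}


@dataclass
class Case:
    case_id: str
    annotations: dict = field(default_factory=dict)  # CDE name -> value


def deidentify(case):
    """Return a copy of the case with identifying CDEs removed."""
    kept = {k: v for k, v in case.annotations.items()
            if k not in IDENTIFYING_CDES}
    return Case(case.case_id, kept)


def query(cases, **criteria):
    """Return de-identified cases whose annotations match every criterion."""
    return [deidentify(c) for c in cases
            if all(c.annotations.get(k) == v for k, v in criteria.items())]


# Illustrative records only; these are not data from the database.
cases = [
    Case("C1", {"patient_name": "A", "primary_site": "larynx", "stage": "II"}),
    Case("C2", {"patient_name": "B", "primary_site": "oral cavity", "stage": "II"}),
]
results = query(cases, primary_site="larynx")
```

Keying every annotation by a CDE name means the same query function works for any data set that adopts the shared vocabulary, which is the interoperability property the design aims for.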
Results: The database contains multimodal datasets that are accessible to investigators via an easy-to-use query tool. The database currently holds 7662 cases and provides demographic, clinical, pathology, treatment, follow-up, patient and tumor genomic sequencing, and other molecular data for 12281 tumor accessions. The recent integration of, and link to, whole genome sequence data from 92 patients is one example of how valuable such a robust, highly annotated database is for researchers. The database allows access to a sequence analysis data set of 9423 annotation results within the same interface and can flexibly accommodate additional data sets in the future.
Conclusions: The database provides informatics support to facilitate basic, clinical, and translational research. It offers investigators a mechanism to efficiently select and access richly annotated biospecimens that meet their research interests and requirements, with the goal of integrating laboratory data from multiple investigators in order to develop a comprehensive characterization of individual patients and tumors. The tool protects patient privacy by providing only de-identified data, subject to Institutional Review Board and scientific committee review and approval.
Monday, March 19, 2012 1:00 PM
Poster Session II # 297, Monday Afternoon